Six Sigma and the PC World
Investigational approach benefits from the power of statistical tools
Mark Anawis
Fig 1: Cause and effect diagram
Six Sigma is, at heart, a business philosophy that seeks to deliver products and services with a minimum of defects. The resulting cost savings and gains in business efficiency have aligned business needs with the scientific community, with statistics serving as the bridge between the two. Advances in computing power and the ready availability of computer applications such as JMP and Minitab have put the needed computing tools in the hands of the average scientist and engineer.
Fig 2: Distribution of Y, boxplot, and test of normality
Define
The Define phase produces a succinct description of the problem and its potential causes. It is the first step in gathering data for further analysis and experimentation. With a member of each appropriate functional area present, the team conducts a brainstorming session. Although sophisticated tools are not required at this stage, an application-specific tool is beneficial because its output can be easily modified and disseminated. One example is a cause and effect diagram (also called a fishbone or Ishikawa chart), in which the head describes the problem and the major branches represent the groupings of potential causes (Figure 1).
The major groupings can take the form of Man, Machine, Material, Method, Measurement and Environment or an alternative but suitable set. Organization of dependencies of causes can be defined through Parent-Child relationships. These relationships establish the hierarchy such that the defect is the original Parent, and the major categories are Children. To create subcategories, these Children become additionally defined as Parents with their own Children. A process map provides understanding of where the problem occurs in the overall flow.
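The Parent-Child hierarchy described above maps naturally onto a tree. The following is a minimal sketch in Python, not a representation of how JMP or any other package stores the diagram; the class name `Cause`, the defect text, and the sub-cause are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cause:
    """One node of the cause and effect hierarchy (hypothetical structure)."""
    name: str
    children: List["Cause"] = field(default_factory=list)

    def add(self, name):
        """Attach a Child to this node and return it so it can become a Parent."""
        child = Cause(name)
        self.children.append(child)
        return child

    def outline(self, depth=0):
        """Return the hierarchy as indented lines, Parents above their Children."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


# The defect is the original Parent (the head of the fishbone).
defect = Cause("Assay result out of specification")  # hypothetical defect

# The major groupings are its Children (the main branches).
for group in ("Man", "Machine", "Material", "Method", "Measurement", "Environment"):
    defect.add(group)

# A Child becomes a Parent when it is given its own Children (a subcategory).
machine = defect.children[1]
machine.add("Pipettor calibration drift")  # hypothetical sub-cause

print("\n".join(defect.outline()))
```

Because every node has the same shape, subcategories can be nested to any depth without changing the structure.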
Fig 3: Pareto Plot
Measure
The Measure phase includes a data-gathering plan and data collection to establish a baseline. A measurement systems analysis is important at this phase to determine whether the problem can be adequately measured. Once the critical independent (Xs) and dependent (Ys) variables have been determined, a distribution of the Ys with an outlier box plot can provide insight into groupings and outliers. Different distribution types (Normal, LogNormal, Weibull, Exponential, Gamma, Beta, etc.) can be fit to the data and compared with Goodness of Fit tests (Figure 2).
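As a rough illustration of what an outlier box plot reports, the sketch below computes the quartiles and flags points lying more than 1.5 interquartile ranges beyond them. It uses only the Python standard library; the function name `boxplot_summary` and the data are made up, and statistical packages may use a slightly different quartile definition.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, 1.5 * IQR fences, and the points that fall outside the fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical baseline measurements of a single Y.
y = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 12.5]
summary = boxplot_summary(y)
print(summary["outliers"])
```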
If the process is normal, or can be transformed to be normal, a process capability analysis can be performed to determine the expected percentage of occurrences outside of specifications. If not, quantiles can be used to estimate the percentage of occurrences outside of specifications. A Normal Quantile plot also can provide a visualization of normality and of the areas of the distribution that deviate from it. A Pareto chart is often helpful at this phase to rank causes (Xs) (Figure 3). Time series and control charts can provide insight into when the defect first appeared.
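For normally distributed data, the expected percentage outside of specifications follows directly from the fitted mean and standard deviation. The sketch below is one way to compute it with the Python standard library. The specification limits and data are hypothetical. The indices use the overall sample standard deviation, which many packages report as Pp and Ppk rather than Cp and Cpk.

```python
from statistics import NormalDist, mean, stdev


def capability(values, lsl, usl):
    """Capability summary under an assumed normal distribution."""
    mu = mean(values)
    sigma = stdev(values)  # overall (long-term) sample standard deviation
    fitted = NormalDist(mu, sigma)
    frac_below = fitted.cdf(lsl)        # expected fraction below the lower limit
    frac_above = 1.0 - fitted.cdf(usl)  # expected fraction above the upper limit
    return {
        "mean": mu,
        "sigma": sigma,
        # Spread index: specification width relative to six standard deviations.
        "pp": (usl - lsl) / (6 * sigma),
        # Centering-aware index: distance to the nearer limit over three sigma.
        "ppk": min(usl - mu, mu - lsl) / (3 * sigma),
        "pct_out": 100.0 * (frac_below + frac_above),
    }


# Hypothetical measurements and specification limits.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]
result = capability(measurements, lsl=9.4, usl=10.6)
print(result)
```

When the data cannot be treated as normal, the same percentage can instead be estimated from empirical quantiles, as the text notes.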
Fig 4: Variability chart
Analyze
The Analyze phase establishes a causal relationship between Xs and Ys in order to determine the root cause(s). Variability charts can show data side-by-side for more than one grouping to provide a visual context of the problem (Figure 4).
Hypothesis tests can define the process and deviations from the process. For normally distributed data, this includes tests for the mean, such as Z and t tests, as well as tests for the variance, such as F and chi-square tests. For non-normally distributed data, this includes tests for the median, such as the Wilcoxon test, as well as tests for the variance, such as Levene’s test. Confidence intervals also are used to define the process.
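As a small worked example of a test for the mean, the sketch below runs a two-sided, one-sample Z test and builds the matching confidence interval. It assumes normally distributed data with a known process standard deviation; the target mean, standard deviation, and sample are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(values, mu0, sigma, alpha=0.05):
    """Two-sided one-sample Z test with a (1 - alpha) confidence interval."""
    n = len(values)
    xbar = mean(values)
    std_err = sigma / sqrt(n)
    z = (xbar - mu0) / std_err
    standard_normal = NormalDist()
    p_value = 2.0 * (1.0 - standard_normal.cdf(abs(z)))
    z_crit = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    interval = (xbar - z_crit * std_err, xbar + z_crit * std_err)
    return z, p_value, interval


# Hypothetical sample; historical process mean 10.0 and known sigma 0.3.
sample = [10.4, 10.6, 10.3, 10.5, 10.7, 10.4, 10.6, 10.5, 10.5]
z, p_value, interval = z_test(sample, mu0=10.0, sigma=0.3)
print(z, p_value, interval)
```

A p value below the chosen alpha, or equivalently a confidence interval that excludes the historical mean, suggests the process has shifted. When the standard deviation must be estimated from the sample, a t test is the usual substitute.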
Models can be fit using either a single X versus a single Y or more complex combinations. Single-X, single-Y analyses include simple linear regression, ANOVA, logistic regression, and contingency analysis. More complex models involve multiple Xs, Ys, interaction terms and higher-order terms. Although more difficult to evaluate, the complex models come with diagnostic tools (e.g., R-squared, ANOVA, lack-of-fit tests and residual plots) and probability (p) values to aid in the selection of factors that contribute to the defects. Factors with low p values (< 0.05) are strong candidates for further evaluation. Data mining algorithms such as partitioning (decision trees) can be effective at exploring relationships between Xs and a Y without needing to develop a model first (Figure 5).
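The simplest case above, one X against one Y, can be sketched in a few lines. The following least-squares fit returns the slope, the intercept, R-squared, and the residuals that a residual plot would display. The function name `fit_line` and the data are hypothetical, and the p values a statistics package would report are not computed here.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in residuals)        # unexplained variation
    ss_tot = sum((yi - ybar) ** 2 for yi in y)    # total variation
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared, residuals


# Hypothetical factor settings (X) and responses (Y).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
slope, intercept, r_squared, residuals = fit_line(x, y)
print(slope, intercept, r_squared)
```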
Fig 5: Partition
Improve
The Improve phase eliminates or reduces the effect of the root cause(s). Once the critical factors contributing to the problem have been identified, design of experiments (DOE) can be used to optimize the process. This methodology allows the evaluation of numerous Xs with an economy of runs. If the process has not been well characterized at this point and the Analyze phase has yielded numerous Xs, it may be necessary to conduct a screening DOE first. Although screening designs allow the evaluation of numerous factors, they typically do not allow the estimation of interaction effects or curvature.
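One way to see why an economical screening design gives up interaction information is to compare a full two-level factorial with a half fraction. The sketch below is illustrative only: the factor labels A, B, and C and the generator C = A*B are assumptions, not a design taken from this article.

```python
from itertools import product

# Full two-level factorial in three factors A, B, C: 2**3 = 8 runs.
full = list(product((-1, 1), repeat=3))

# Half fraction: 4 runs, with the level of C set by the generator C = A * B.
half = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]


def main_effect_c(design):
    """Column of coded levels used to estimate the main effect of C."""
    return [c for (_, _, c) in design]


def interaction_ab(design):
    """Column of coded levels used to estimate the A x B interaction."""
    return [a * b for (a, b, _) in design]


# In the half fraction the two columns are identical, so the effect of C
# cannot be separated from the A x B interaction (they are aliased).
aliased_in_half = main_effect_c(half) == interaction_ab(half)

# In the full factorial the two columns are orthogonal (dot product of zero),
# so both effects can be estimated independently.
dot_in_full = sum(c * ab for c, ab in zip(main_effect_c(full), interaction_ab(full)))

print(len(full), len(half), aliased_in_half, dot_in_full)
```

The half fraction saves runs, and the price is that any apparent effect of C may actually be the A x B interaction.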
Once a small number of Xs has been identified, a response surface methodology (RSM) design can be used to characterize the system. Once the RSM experiment is performed, models can be fit to the data to select the values at which the critical factors should be set to optimize the system. The goal of the optimization can be to maximize, minimize, or meet a target value for the Ys, to minimize their variability, or both.
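The model usually fit to response surface data is a second-order polynomial in the k critical factors. As a sketch in standard textbook notation (the symbols below are not taken from this article), the fitted model and its stationary point can be written as:

```latex
\hat{y} = b_0 + \sum_{i=1}^{k} b_i x_i + \sum_{i=1}^{k} b_{ii} x_i^2 + \sum_{i<j} b_{ij} x_i x_j
        = b_0 + \mathbf{x}^{\top}\mathbf{b} + \mathbf{x}^{\top}\mathbf{B}\,\mathbf{x},
\qquad
\mathbf{x}_s = -\tfrac{1}{2}\,\mathbf{B}^{-1}\mathbf{b}
```

Here b is the vector of linear coefficients, and B is the symmetric matrix with the quadratic coefficients b_ii on its diagonal and b_ij / 2 in its off-diagonal positions. The stationary point x_s is a maximum if B is negative definite, a minimum if B is positive definite, and a saddle point otherwise, in which case the best settings lie on the boundary of the experimental region.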
Description and Interpretation of Special Causes Tests
Test 1 | One point beyond Zone A
Test 2 | Nine points in a row in a single (upper or lower) side of Zone C or beyond
Test 3 | Six points in a row steadily increasing or decreasing
Test 4 | Fourteen points in a row alternating up and down
Test 5 | Two out of three points in a row in Zone A or beyond
Test 6 | Four out of five points in Zone B or beyond
Test 7 | Fifteen points in a row in Zone C, above and below the center line
Test 8 | Eight points in a row on both sides of the center line with none in Zones C
Fig 6: Runs rules
Control
The Control phase monitors the process to ensure that the improvements have been sustained. Statistical process control (SPC) is the tool used to determine whether the process continues to perform consistently through the use of control charts. The objective is to determine whether an observed variation is due to random chance or an assignable cause. This is aided by the application of runs rules, such as the Western Electric set, to identify changes to the process that need further investigation to remove assignable cause variation (Figure 6).
Variable data is plotted on paired charts, such as the X-bar/R or I/mR pairs, each of which consists of a location chart (subgroup means or individual values) and a variability chart (ranges or moving ranges). X-bar/R charts are used where subgrouping is important in determining the process variability, whereas I/mR charts are used where either there is only a single value per subgroup or the subgroup-to-subgroup variability is of more interest. Attribute data (counts of defects rather than continuous data) is plotted on charts such as p, np, c, or u; the choice depends on whether the data follows a binomial or Poisson distribution and whether the sample size is fixed or varies. Capability analysis also can be performed in conjunction with control charts to provide an overall summary of the process performance.
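For the I/mR pair, the control limits are conventionally derived from the average moving range of consecutive points. The sketch below uses the standard constants for a moving range of two observations (2.66 for the individuals chart and 3.267 for the moving range chart); the function name `imr_limits` and the data are hypothetical.

```python
from statistics import mean


def imr_limits(values):
    """Control limits (lower, center, upper) for an individuals/moving-range pair."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        # Individuals chart: center line plus or minus 2.66 * average moving range.
        "I": (center - 2.66 * mr_bar, center, center + 2.66 * mr_bar),
        # Moving range chart: lower limit is zero, upper limit is 3.267 * average.
        "mR": (0.0, mr_bar, 3.267 * mr_bar),
    }


# Hypothetical individual measurements taken in time order.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 9.8, 10.1]
limits = imr_limits(readings)
print(limits)
```

Points falling outside these limits, or patterns caught by the runs rules, are the signals that warrant investigation for an assignable cause.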
Conclusion
Future trends in Six Sigma will be influenced by the increase in data captured in databases, the automation of data analysis, and developments in data mining algorithms. Although query tools exist separate from analytical applications, query functionality is available within several of the analytical packages. This combination of data importation and analysis can be automated through use of scripting languages. Furthermore, the integration of several data mining algorithms in these applications will continue to aid the user in root cause identification and predictive modeling.
Note: All figures created in this article used JMP 8.0.1.
Acronyms
ANOVA Analysis of Variance | DMAIC Define, Measure, Analyze, Improve and Control | DOE Design of Experiments | RSM Response Surface Methodology | SPC Statistical Process Control
Mark Anawis is a Principal Scientist and ASQ Six Sigma Black Belt at Abbott. He may be reached at editor@ScientificComputing.com.