
# Hypothesis Testing: A Look at Two Common Approaches
Wed, 08/08/2012 - 11:40am

Depending upon the situation, either classical or Bayesian testing may be the better choice.

Figure 1: Classical Hypothesis Statements
One-tailed testing examples: H0: μ ≤ μ0, Ha: μ > μ0; or H0: μ ≥ μ0, Ha: μ < μ0
Two-tailed testing example: H0: μ = μ0, Ha: μ ≠ μ0
Thomas Huxley once said “The great tragedy of science is the slaying of a beautiful hypothesis by an ugly fact.” There are two main approaches to hypothesis testing: classical and Bayesian. The related field of decision theory, which is concerned with principles and algorithms for selecting the best outcome among a set of alternatives, is complex and will not be discussed.

In the classical approach, one of the main objectives of the statistical analysis of data is to make inferences about a population by examining a representative sample from that population, since we often cannot examine the entire population. First, a null hypothesis is constructed, which equates to “no difference.” This typically means that the sample observation arose purely due to chance. From the null hypothesis, a contrasting alternative hypothesis is constructed, which equates to “difference.” This typically means that the observation arose due to a cause.

The alternative hypothesis is often what we are hoping to prove by gathering enough evidence to overturn the null hypothesis. Statisticians take the stance that you either “reject the null hypothesis” or “fail to reject the null hypothesis.” You never “accept the null hypothesis,” since this implies that you think it is true, whereas you are typically trying to prove the alternative hypothesis. The “fail to reject the null hypothesis” carries with it the implication that the data is insufficient to overturn the null hypothesis. It is important that the hypotheses are defined prior to testing.

Figure 2: Two Types of Errors in Classical Hypothesis Testing

| | H0 is true | H0 is not true |
| --- | --- | --- |
| H0 is rejected | Type I error (innocent in jail) | Correct decision |
| H0 is not rejected | Correct decision | Type II error (guilty goes free) |
The form of the null and alternative hypothesis governs whether the testing is one-tailed or two-tailed. One-tailed testing is used when the null hypothesis is less than or equal to a value, or greater than or equal to a value. Two-tailed testing is used when the null hypothesis is equal to a value, since we are interested in both values greater than or less than a value. Figure 1 describes the forms.

The evaluation of the hypotheses depends on the probabilities of error we are willing to accept. There are two types of errors. A Type I error occurs when we reject the null hypothesis when, in fact, it is true. This is more easily understood as the court situation of falsely convicting an innocent person. A Type II error occurs when we fail to reject the null hypothesis when, in fact, it is false. This is more easily understood as the court situation of letting a guilty person go free. Figure 2 summarizes the outcomes.
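The meaning of a Type I error can be made concrete with a short simulation. The sketch below (with invented parameters) repeatedly samples data for which the null hypothesis is actually true and applies a two-tailed z-test at α = 0.05; the fraction of false rejections comes out near 5 percent, as the definition of α predicts.

```python
# Simulation: when H0 is true, a test at alpha = 0.05 commits a
# Type I error roughly 5% of the time. All numbers are illustrative.
import random
from statistics import NormalDist

random.seed(1)
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
n, trials = 25, 2000
rejections = 0
for _ in range(trials):
    # H0 is true here: the data really come from a normal with mean 0, sd 1
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = (sum(sample) / n) / (1.0 / n ** 0.5)
    if abs(z) > z_crit:
        rejections += 1  # H0 rejected although it is true: a Type I error
type_i_rate = rejections / trials
```

The observed rejection rate hovers around the chosen α, which is exactly what “the probability of a Type I error we are willing to accept” means.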

Figure 3: Common Classical Statistical Tests – Assumptions and Use

| Statistical Test | Assumptions and Use |
| --- | --- |
| Z | Normal distribution, known population standard deviation, means comparison to target or population |
| t | Normal distribution, sample standard deviation, 2 sample means comparison |
| Chi-squared (variance) | Normal distribution, variance comparison to specified variance |
| Chi-squared (independence) | Normal distribution, 2 variable test of association |
| Chi-squared (goodness of fit) | Test of adequacy of distribution type |
| F (variance) | Normal distribution, 2 sample variance test |
| F (ANOVA) | Normal distribution, significance of model variables |

The assumptions for the type of statistical test to be done need to be outlined. These differ by test, but examples are:
• independent sample selection
• the expected type of distribution
• whether the mean or standard deviation is known

In situations where conclusions about the type of distribution cannot be made, non-parametric tests can be used.
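One of the simplest non-parametric tests is the sign test, which assumes nothing about the shape of the distribution, only independent observations. The sketch below (data values are invented) computes an exact two-sided p-value for the null hypothesis that the median equals a specified value.

```python
# Exact two-sided sign test for H0: median == mu0 (a non-parametric test).
from math import comb

def sign_test_p_value(data, mu0):
    """Under H0 the signs of (x - mu0) follow Binomial(n, 0.5)."""
    signs = [x > mu0 for x in data if x != mu0]  # ties are dropped
    n, k = len(signs), sum(signs)
    # Probability of a sign count at least as extreme as the one observed
    extreme = min(k, n - k)
    tail = sum(comb(n, i) for i in range(extreme + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p = sign_test_p_value([5.1, 5.3, 5.6, 5.8, 6.0, 6.2, 6.5, 6.7], mu0=5.0)
```

Because all eight observations exceed 5.0, the p-value is small (2/256 ≈ 0.008) even though no distributional form was assumed.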

The decision rules define the strength and type of evidence used to reject or fail to reject the null hypothesis. Probability (p) values are often used. The p-value is the probability of observing a statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true. A significance level is chosen in advance: p-values less than this level cause us to reject the null hypothesis, while p-values greater than this level cause us to fail to reject it.
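This decision rule can be sketched for the two-tailed Z test from Figure 3 (known population standard deviation); the sample figures below are invented for illustration.

```python
# Two-tailed one-sample z-test computed from first principles.
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-tailed p-value for H0: mu == mu0 with known population sigma."""
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    # P(|Z| >= |z|) under the standard normal null distribution
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05  # significance level chosen before looking at the data
p = z_test_p_value(sample_mean=10.4, mu0=10.0, sigma=1.2, n=36)
decision = "reject H0" if p < alpha else "fail to reject H0"
```

Here z = 2.0 and p ≈ 0.046, so at the 0.05 level the null hypothesis is rejected; at a stricter 0.01 level we would fail to reject it, which shows why the level must be fixed in advance.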

Figure 5: Bayesian Hypothesis Testing

| | State A is true | State B is true |
| --- | --- | --- |
| Accept state A | 0 | cost (A\|B) |
| Accept state B | cost (B\|A) | 0 |

The probability of committing a Type I error which we are willing to accept is called alpha (α). A typical value used is 0.05. The probability of committing a Type II error which we are willing to accept is called beta (β). A typical value used is 0.1. Often, the probability of not committing a Type II error is referred to as power (1 – β). The sample size, the size of the difference to be detected, and the standard deviation all need to be considered. It is the size of the difference to be detected that causes the most difficulty. A helpful question is: how large a difference is meaningful?
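These quantities tie together in the standard sample-size formula for a two-tailed z-test, n = ((z₁₋α/₂ + z₁₋β) · σ / δ)². The α = 0.05 and β = 0.1 values follow the article; σ and δ below are assumed for illustration.

```python
# Sample size needed to detect a difference delta with significance alpha
# and power 1 - beta, for a two-tailed z-test.
from math import ceil
from statistics import NormalDist

def sample_size(alpha, beta, sigma, delta):
    """Smallest n detecting a mean shift of delta with the given alpha and power."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = nd.inv_cdf(1 - beta)        # power requirement
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

n = sample_size(alpha=0.05, beta=0.10, sigma=2.0, delta=1.0)
```

Note how n grows with the square of σ/δ: halving the meaningful difference δ quadruples the required sample size, which is why deciding how large a difference is meaningful matters so much.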

Figure 6: Bayesian Decision Rule
If O’’R > 1, accept state A
If O’’R < 1, accept state B

The tests commonly used are summarized in Figure 3.

The statistic for the observations is then calculated. Software packages often include the probability value associated with the statistic to allow for the final decision to reject the null hypothesis or fail to reject the null hypothesis.
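As a sketch of this step, the equal-variance two-sample t statistic from Figure 3 can be computed by hand (the two samples below are invented); in practice a software package would also report the associated p-value.

```python
# Equal-variance (pooled) two-sample t statistic for H0: mean(a) == mean(b).
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Pooled-variance t statistic; software would attach the p-value."""
    na, nb = len(a), len(b)
    # Pooled estimate of the common variance
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

t = pooled_t_statistic([9.8, 10.1, 10.3, 9.9], [10.6, 10.9, 10.7, 11.0])
```

The statistic here is about −5.4 with 6 degrees of freedom; comparing it (or its p-value) against the chosen significance level gives the final reject / fail-to-reject decision.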

Several criticisms of the classical approach are that the null hypothesis under consideration may not be plausible, that no further evaluation may be done once the hypothesis is rejected or fails to be rejected, and that prior information (probability) is not considered.

The Bayesian approach starts with stating the hypotheses (or states) to be compared and the prior probabilities of each. Two states are summarized in Figure 4.

The cost of accepting state A when state B is true (cost (A|B)) is compared to the cost of accepting state B when state A is true (cost (B|A)). The comparisons are summarized in Figure 5.

The loss ratio (R) is cost (B|A) / cost (A|B). The posterior odds ratio (O’’) is the posterior probability of state A (P’’(A)) divided by the posterior probability of state B (P’’(B)). The product O’’R can be used as a decision rule, as summarized in Figure 6.
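The whole Bayesian procedure fits in a few lines. In the sketch below, the priors, likelihoods, and costs are invented for illustration; the posterior probabilities come from Bayes’ theorem, and the decision follows the O’’R rule of Figure 6.

```python
# Bayesian two-state decision using the posterior odds ratio O'' and
# the loss ratio R = cost(B|A) / cost(A|B).
def bayes_decision(prior_a, prior_b, like_a, like_b,
                   cost_a_given_b, cost_b_given_a):
    """Return the accepted state under the O''R > 1 rule."""
    # Unnormalized posterior weights suffice, since only the ratio is used
    post_a = prior_a * like_a
    post_b = prior_b * like_b
    odds = post_a / post_b                        # O'' = P''(A) / P''(B)
    loss_ratio = cost_b_given_a / cost_a_given_b  # R = cost(B|A) / cost(A|B)
    return "A" if odds * loss_ratio > 1 else "B"

state = bayes_decision(prior_a=0.3, prior_b=0.7,
                       like_a=0.8, like_b=0.2,
                       cost_a_given_b=10.0, cost_b_given_a=5.0)
```

With these numbers the data favor state A (posterior odds about 1.7), but accepting A wrongly is twice as costly as accepting B wrongly, so O’’R ≈ 0.86 and state B is accepted, which is how the cost terms shift the decision away from the bare evidence.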

Classical hypothesis testing can be used in situations where plausible inferences about a population can be made. However, Bayesian hypothesis testing should be considered where there are several competing states of different probabilities and costs.

Mark Anawis is a Principal Scientist and ASQ Six Sigma Black Belt at Abbott. He may be reached at editor@ScientificComputing.com.