Hypothesis Test for a Mean
This lesson explains how to conduct a hypothesis test of a mean when the following conditions are met:
 The sampling method is simple random sampling.
 The sampling distribution is normal or nearly normal.
Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply.
 The population distribution is normal.
 The population distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.
 The population distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
 The sample size is greater than 40, without outliers.
This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.
State the Hypotheses
Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.
The table below shows three sets of hypotheses. Each makes a statement about how the population mean μ is related to a specified value M. (In the table, the symbol ≠ means "not equal to".)
Set  Null hypothesis  Alternative hypothesis  Number of tails 

1  μ = M  μ ≠ M  2 
2  μ ≥ M  μ < M  1 
3  μ ≤ M  μ > M  1 
The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.
Formulate an Analysis Plan
The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.
 Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
 Test method. Use the one-sample t-test to determine whether the hypothesized mean differs significantly from the observed sample mean.
Analyze Sample Data
Using sample data, conduct a one-sample t-test. This involves finding the standard error, degrees of freedom, test statistic, and the P-value associated with the test statistic.
 Standard error. Compute the standard error (SE) of the sampling distribution.
SE = s * sqrt{ ( 1/n ) * [ ( N - n ) / ( N - 1 ) ] }
where s is the standard deviation of the sample, N is the population size, and n is the sample size. When the population size is much larger (at least 20 times larger) than the sample size, the standard error can be approximated by:
SE = s / sqrt( n )
 Degrees of freedom. The degrees of freedom (DF) is equal to the sample size (n) minus one. Thus, DF = n - 1.
 Test statistic. The test statistic is a t statistic (t) defined by the following equation:
t = ( x - μ ) / SE
where x is the sample mean, μ is the hypothesized population mean, and SE is the standard error.
 P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the t statistic, given the degrees of freedom computed above. (See sample problems at the end of this lesson for examples of how this is done.)
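The quantities in this step can be collected into a short helper; this is a sketch using only the Python standard library, with `one_sample_t` as a hypothetical name (the P-value itself would still be looked up from a t table or the t Distribution Calculator):

```python
import math

def one_sample_t(sample, mu0):
    """Return (SE, DF, t) for a one-sample t-test of H0: mu = mu0."""
    n = len(sample)
    xbar = sum(sample) / n
    # sample standard deviation (n - 1 in the denominator)
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    se = s / math.sqrt(n)   # standard error of the mean
    df = n - 1              # degrees of freedom
    t = (xbar - mu0) / se   # test statistic
    return se, df, t
```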
Sample Size Calculator
As you probably noticed, the process of hypothesis testing can be complex. When you need to test a hypothesis about a mean score, consider using the Sample Size Calculator. The calculator is fairly easy to use, and it is free.
Interpret Results
If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
Test Your Understanding
In this section, two sample problems illustrate how to conduct a hypothesis test of a mean score. The first problem involves a two-tailed test; the second problem, a one-tailed test.
Problem 1: Two-Tailed Test
An inventor has developed a new, energyefficient lawn mower engine. He claims that the engine will run continuously for 5 hours (300 minutes) on a single gallon of regular gasoline. From his stock of 2000 engines, the inventor selects a simple random sample of 50 engines for testing. The engines run for an average of 295 minutes, with a standard deviation of 20 minutes. Test the null hypothesis that the mean run time is 300 minutes against the alternative hypothesis that the mean run time is not 300 minutes. Use a 0.05 level of significance. (Assume that run times for the population of engines are normally distributed.)
Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:
Null hypothesis: μ = 300
Alternative hypothesis: μ ≠ 300
 Formulate an analysis plan. For this analysis, the significance level is 0.05. The test method is a one-sample t-test.
SE = s / sqrt(n) = 20 / sqrt(50) = 20/7.07 = 2.83
DF = n - 1 = 50 - 1 = 49
t = ( x - μ ) / SE = (295 - 300)/2.83 = -1.77
where s is the standard deviation of the sample, x is the sample mean, μ is the hypothesized population mean, and n is the sample size.
Since we have a two-tailed test, the P-value is the probability that a t statistic having 49 degrees of freedom is less than -1.77 or greater than 1.77. We use the t Distribution Calculator to find that P(t < -1.77) is about 0.04.
 If you enter -1.77 as the sample mean in the t Distribution Calculator, you will find that P(t < -1.77) is about 0.04. By symmetry, P(t < 1.77) is about 0.96, so P(t > 1.77) is 1 minus 0.96, or 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.
 Interpret results. Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.
Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the population was normally distributed, and the sample size was small relative to the population size (less than 5%).
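As a check on the arithmetic in Problem 1, the same numbers can be reproduced in a few lines of Python (values hard-coded from the problem statement):

```python
import math

s, n = 20, 50           # sample standard deviation, sample size
xbar, mu0 = 295, 300    # sample mean, hypothesized mean

se = s / math.sqrt(n)   # standard error
df = n - 1              # degrees of freedom
t = (xbar - mu0) / se   # test statistic

print(round(se, 2), df, round(t, 2))
```

This prints 2.83, 49, and -1.77, matching the values computed above.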
Problem 2: One-Tailed Test
Bon Air Elementary School has 1000 students. The principal of the school thinks that the average IQ of students at Bon Air is at least 110. To prove her point, she administers an IQ test to 20 randomly selected students. Among the sampled students, the average IQ is 108 with a standard deviation of 10. Based on these results, should the principal accept or reject her original hypothesis? Assume a significance level of 0.01. (Assume that test scores in the population of students are normally distributed.)
Null hypothesis: μ >= 110
Alternative hypothesis: μ < 110
 Formulate an analysis plan. For this analysis, the significance level is 0.01. The test method is a one-sample t-test.
SE = s / sqrt(n) = 10 / sqrt(20) = 10/4.472 = 2.236
DF = n - 1 = 20 - 1 = 19
t = ( x - μ ) / SE = (108 - 110)/2.236 = -0.894
Here is the logic of the analysis: Given the alternative hypothesis (μ < 110), we want to know whether the observed sample mean is small enough to cause us to reject the null hypothesis.
The observed sample mean produced a t statistic of -0.894. We use the t Distribution Calculator to find that P(t < -0.894) is about 0.19.
 This means we would expect to find a sample mean of 108 or smaller in 19 percent of our samples, if the true population IQ were 110. Thus the P-value in this analysis is 0.19.
 Interpret results. Since the P-value (0.19) is greater than the significance level (0.01), we cannot reject the null hypothesis.
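The arithmetic in Problem 2 can be checked the same way (values hard-coded from the problem statement):

```python
import math

s, n = 10, 20           # sample standard deviation, sample size
xbar, mu0 = 108, 110    # sample mean, hypothesized mean

se = s / math.sqrt(n)   # standard error
df = n - 1              # degrees of freedom
t = (xbar - mu0) / se   # test statistic

print(round(se, 3), df, round(t, 3))
```

This prints 2.236, 19, and -0.894, matching the values computed above.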
8.4.3 Hypothesis Testing for the Mean
$\quad$ $H_0$: $\mu=\mu_0$, $\quad$ $H_1$: $\mu \neq \mu_0$.
$\quad$ $H_0$: $\mu \leq \mu_0$, $\quad$ $H_1$: $\mu > \mu_0$.
$\quad$ $H_0$: $\mu \geq \mu_0$, $\quad$ $H_1$: $\mu \lt \mu_0$.
Two-sided Tests for the Mean:
Therefore, we can suggest the following test. Choose a threshold, and call it $c$. If $|W| \leq c$, accept $H_0$, and if $|W|>c$, accept $H_1$. How do we choose $c$? If $\alpha$ is the required significance level, we must have
 As discussed above, we let \begin{align}%\label{} W(X_1,X_2, \cdots,X_n)=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}. \end{align} Note that, assuming $H_0$, $W \sim N(0,1)$. We will choose a threshold, $c$. If $|W| \leq c$, we accept $H_0$, and if $|W|>c$, accept $H_1$. To choose $c$, we let \begin{align} P(|W| > c \; | \; H_0) =\alpha. \end{align} Since the standard normal PDF is symmetric around $0$, we have \begin{align} P(|W| > c \; | \; H_0) = 2 P(W>c \; | \; H_0). \end{align} Thus, we conclude $P(W>c \; | \; H_0)=\frac{\alpha}{2}$. Therefore, \begin{align} c=z_{\frac{\alpha}{2}}. \end{align} Therefore, we accept $H_0$ if \begin{align} \left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \leq z_{\frac{\alpha}{2}}, \end{align} and reject it otherwise.
 We have \begin{align} \beta (\mu) &=P(\textrm{type II error}) = P(\textrm{accept }H_0 \; | \; \mu) \\ &= P\left(\left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \lt z_{\frac{\alpha}{2}}\; | \; \mu \right). \end{align} If $X_i \sim N(\mu,\sigma^2)$, then $\overline{X} \sim N(\mu, \frac{\sigma^2}{n})$. Thus, \begin{align} \beta (\mu)&=P\left(\left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \lt z_{\frac{\alpha}{2}}\; | \; \mu \right)\\ &=P\left(\mu_0- z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \leq \overline{X} \leq \mu_0+ z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right)\\ &=\Phi\left(z_{\frac{\alpha}{2}}+\frac{\mu_0-\mu}{\sigma / \sqrt{n}}\right)-\Phi\left(-z_{\frac{\alpha}{2}}+\frac{\mu_0-\mu}{\sigma / \sqrt{n}}\right). \end{align}
 Let $S^2$ be the sample variance for this random sample. Then, the random variable $W$ defined as \begin{equation} W(X_1,X_2, \cdots, X_n)=\frac{\overline{X}-\mu_0}{S / \sqrt{n}} \end{equation} has a $t$-distribution with $n-1$ degrees of freedom, i.e., $W \sim T(n-1)$. Thus, we can repeat the analysis of Example 8.24 here. The only difference is that we need to replace $\sigma$ by $S$ and $z_{\frac{\alpha}{2}}$ by $t_{\frac{\alpha}{2},n-1}$. Therefore, we accept $H_0$ if \begin{align} |W| \leq t_{\frac{\alpha}{2},n-1}, \end{align} and reject it otherwise. Let us look at a numerical example of this case.
$\quad$ $H_0$: $\mu=170$, $\quad$ $H_1$: $\mu \neq 170$.
 Let's first compute the sample mean and the sample standard deviation. The sample mean is \begin{align}%\label{} \overline{X}&=\frac{X_1+X_2+X_3+X_4+X_5+X_6+X_7+X_8+X_9}{9}\\ &=165.8 \end{align} The sample variance is given by \begin{align}%\label{} {S}^2=\frac{1}{9-1} \sum_{k=1}^9 (X_k-\overline{X})^2&=68.01 \end{align} The sample standard deviation is given by \begin{align}%\label{} S&= \sqrt{S^2}=8.25 \end{align} The following MATLAB code can be used to obtain these values: x=[176.2,157.9,160.1,180.9,165.1,167.2,162.9,155.7,166.2]; m=mean(x); v=var(x); s=std(x); Now, our test statistic is \begin{align} W(X_1,X_2, \cdots, X_9)&=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}\\ &=\frac{165.8-170}{8.25 / 3}=-1.52 \end{align} Thus, $|W|=1.52$. Also, we have \begin{align} t_{\frac{\alpha}{2},n-1} = t_{0.025,8} \approx 2.31 \end{align} The above value can be obtained in MATLAB using the command $\mathtt{tinv(0.975,8)}$. Thus, we conclude \begin{align} |W| \leq t_{\frac{\alpha}{2},n-1}. \end{align} Therefore, we accept $H_0$. In other words, we do not have enough evidence to conclude that the average height in the city is different from the average height in the country.
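For readers without MATLAB, here is a Python rendering of the same computation (same data and formulas; variable names mirror the MATLAB snippet):

```python
import math

x = [176.2, 157.9, 160.1, 180.9, 165.1, 167.2, 162.9, 155.7, 166.2]
n = len(x)
m = sum(x) / n                                 # sample mean
v = sum((xi - m) ** 2 for xi in x) / (n - 1)   # sample variance
s = math.sqrt(v)                               # sample standard deviation

W = (m - 170) / (s / math.sqrt(n))             # test statistic
print(round(m, 1), round(s, 2), round(W, 2))
```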
Let us summarize what we have obtained for the two-sided test for the mean.
Case  Test Statistic  Acceptance Region 

$X_i \sim N(\mu, \sigma^2)$, $\sigma$ known  $W=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}$  $|W| \leq z_{\frac{\alpha}{2}}$ 
$n$ large, $X_i$ non-normal  $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$  $|W| \leq z_{\frac{\alpha}{2}}$ 
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ unknown  $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$  $|W| \leq t_{\frac{\alpha}{2},n-1}$ 
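The acceptance rule in the table can be written as a small helper; `two_sided_accept` is a hypothetical name, and the critical value ($z_{\frac{\alpha}{2}}$ or $t_{\frac{\alpha}{2},n-1}$) is supplied from a table rather than computed:

```python
def two_sided_accept(xbar, mu0, scale, n, crit):
    """Two-sided test of H0: mu = mu0: accept H0 iff |W| <= crit.

    scale is sigma (if known) or the sample standard deviation S;
    crit is z_{alpha/2} or t_{alpha/2, n-1}, read from a table.
    """
    W = (xbar - mu0) / (scale / n ** 0.5)
    return abs(W) <= crit
```

With the numbers from the height example, `two_sided_accept(165.8, 170, 8.25, 9, 2.31)` returns `True`, i.e., $H_0$ is accepted.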
One-sided Tests for the Mean:
 As before, we define the test statistic as \begin{align}%\label{} W(X_1,X_2, \cdots,X_n)=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}. \end{align} If $H_0$ is true (i.e., $\mu \leq \mu_0$), we expect $\overline{X}$ (and thus $W$) to be relatively small, while if $H_1$ is true, we expect $\overline{X}$ (and thus $W$) to be larger. This suggests the following test: Choose a threshold, and call it $c$. If $W \leq c$, accept $H_0$, and if $W>c$, accept $H_1$. How do we choose $c$? If $\alpha$ is the required significance level, we must have \begin{align} P(\textrm{type I error}) &= P(\textrm{Reject }H_0 \; | \; H_0) \\ &= P(W > c \; | \; \mu \leq \mu_0) \leq \alpha. \end{align} Here, the probability of type I error depends on $\mu$. More specifically, for any $\mu \leq \mu_0$, we can write \begin{align} P(\textrm{type I error} \; | \; \mu) &= P(\textrm{Reject }H_0 \; | \; \mu) \\ &= P(W > c \; | \; \mu)\\ &=P \left(\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}> c \; | \; \mu\right)\\ &=P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}+\frac{\mu-\mu_0}{\sigma / \sqrt{n}}> c \; | \; \mu\right)\\ &=P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}> c+\frac{\mu_0-\mu}{\sigma / \sqrt{n}} \; | \; \mu\right)\\ &\leq P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}> c \; | \; \mu\right) \quad (\textrm{ since }\mu \leq \mu_0)\\ &=1-\Phi(c) \quad \big(\textrm{ since given }\mu, \frac{\overline{X}-\mu}{\sigma / \sqrt{n}} \sim N(0,1) \big). \end{align} Thus, we can choose $\alpha=1-\Phi(c)$, which results in \begin{align} c=z_{\alpha}. \end{align} Therefore, we accept $H_0$ if \begin{align} \frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \leq z_{\alpha}, \end{align} and reject it otherwise.
Case  Test Statistic  Acceptance Region 

$X_i \sim N(\mu, \sigma^2)$, $\sigma$ known  $W=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}$  $W \leq z_{\alpha}$ 
$n$ large, $X_i$ non-normal  $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$  $W \leq z_{\alpha}$ 
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ unknown  $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$  $W \leq t_{\alpha,n-1}$ 
$\quad$ $H_0$: $\mu \geq \mu_0$, $\quad$ $H_1$: $\mu \lt \mu_0$,
Case  Test Statistic  Acceptance Region 

$X_i \sim N(\mu, \sigma^2)$, $\sigma$ known  $W=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}$  $W \geq -z_{\alpha}$ 
$n$ large, $X_i$ non-normal  $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$  $W \geq -z_{\alpha}$ 
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ unknown  $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$  $W \geq -t_{\alpha,n-1}$ 
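The two one-sided rules can likewise be sketched in one function; `one_sided_accept` is a hypothetical name and the critical value is again supplied from a table:

```python
def one_sided_accept(xbar, mu0, scale, n, crit, tail):
    """One-sided tests from the two tables above.

    tail='upper': H0: mu <= mu0, accept H0 iff W <= crit
    tail='lower': H0: mu >= mu0, accept H0 iff W >= -crit
    crit is z_alpha or t_{alpha, n-1}, read from a table.
    """
    W = (xbar - mu0) / (scale / n ** 0.5)
    return W <= crit if tail == 'upper' else W >= -crit
```

For instance, with the IQ numbers from earlier (sample mean 108, $\mu_0 = 110$, $S = 10$, $n = 20$) and a critical value of roughly $t_{0.01,19} \approx 2.54$, the lower-tailed rule accepts $H_0$.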
Hypothesis Testing for Means & Proportions
Lisa Sullivan, PhD
Professor of Biostatistics
Boston University School of Public Health
Introduction
This is the first of three modules that will address the second area of statistical inference, hypothesis testing, in which a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The process of hypothesis testing involves setting up two competing hypotheses, the null hypothesis and the alternative hypothesis. One selects a random sample (or multiple samples when there are more comparison groups), computes summary statistics, and then assesses the likelihood that the sample data support the research or alternative hypothesis. Similar to estimation, the process of hypothesis testing is based on probability theory and the Central Limit Theorem.
This module will focus on hypothesis testing for means and proportions. The next two modules in this series will address analysis of variance and chi-squared tests.
Learning Objectives
After completing this module, the student will be able to:
 Define null and research hypothesis, test statistic, level of significance and decision rule
 Distinguish between Type I and Type II errors and discuss the implications of each
 Explain the difference between one- and two-sided tests of hypothesis
 Estimate and interpret p-values
 Explain the relationship between confidence interval estimates and p-values in drawing inferences
 Differentiate hypothesis testing procedures based on type of outcome variable and number of samples
Introduction to Hypothesis Testing
The techniques for hypothesis testing depend on
 the type of outcome variable being analyzed (continuous, dichotomous, discrete)
 the number of comparison groups in the investigation
 whether the comparison groups are independent (i.e., physically separate such as men versus women) or dependent (i.e., matched or paired such as pre and postassessments on the same participants).
In estimation we focused explicitly on techniques for one and two samples and discussed estimation for a specific parameter (e.g., the mean or proportion of a population), for differences (e.g., difference in means, the risk difference) and ratios (e.g., the relative risk and odds ratio). Here we will focus on procedures for one and two samples when the outcome is either continuous (and we focus on means) or dichotomous (and we focus on proportions).
General Approach: A Simple Example
The Centers for Disease Control (CDC) reported on trends in weight, height and body mass index from the 1960's through 2002. 1 The general trend was that Americans were much heavier and slightly taller in 2002 as compared to 1960; both men and women gained approximately 24 pounds, on average, between 1960 and 2002. In 2002, the mean weight for men was reported at 191 pounds. Suppose that an investigator hypothesizes that weights are even higher in 2006 (i.e., that the trend continued over the subsequent 4 years). The research hypothesis is that the mean weight in men in 2006 is more than 191 pounds. The null hypothesis is that there is no change in weight, and therefore the mean weight is still 191 pounds in 2006.
Null Hypothesis  H0: μ = 191 (no change) 
Research Hypothesis  H1: μ > 191 (investigator's belief) 
In order to test the hypotheses, we select a random sample of American males in 2006 and measure their weights. Suppose we have resources available to recruit n=100 men into our sample. We weigh each participant and compute summary statistics on the sample data. Suppose in the sample we determine the following: n = 100, sample mean = 197.1 pounds, and sample standard deviation = 25.6 pounds.
Do the sample data support the null or research hypothesis? The sample mean of 197.1 is numerically higher than 191. However, is this difference more than would be expected by chance? In hypothesis testing, we assume that the null hypothesis holds until proven otherwise. We therefore need to determine the likelihood of observing a sample mean of 197.1 or higher when the true population mean is 191 (i.e., if the null hypothesis is true or under the null hypothesis). We can compute this probability using the Central Limit Theorem. Specifically,
P(Z > (197.1 - 191)/(25.6/√100)) = P(Z > 2.38) = 0.0087.
(Notice that we use the sample standard deviation in computing the Z score. This is generally an appropriate substitution as long as the sample size is large, n > 30.) Thus, there is less than a 1% probability of observing a sample mean as large as 197.1 when the true population mean is 191. Do you think that the null hypothesis is likely true? Based on how unlikely it is to observe a sample mean of 197.1 under the null hypothesis (i.e., <1% probability), we might infer, from our data, that the null hypothesis is probably not true.
Suppose that the sample data had turned out differently. Suppose that we instead observed the following in 2006: n = 100, sample mean = 192.1 pounds, and sample standard deviation = 25.6 pounds.
How likely is it to observe a sample mean of 192.1 or higher when the true population mean is 191 (i.e., if the null hypothesis is true)? We can again compute this probability using the Central Limit Theorem. Specifically,
P(Z > (192.1 - 191)/(25.6/√100)) = P(Z > 0.43) = 0.334.
There is a 33.4% probability of observing a sample mean as large as 192.1 when the true population mean is 191. Do you think that the null hypothesis is likely true?
Neither of the sample means that we obtained allows us to know with certainty whether the null hypothesis is true or not. However, our computations suggest that, if the null hypothesis were true, the probability of observing a sample mean >197.1 is less than 1%. In contrast, if the null hypothesis were true, the probability of observing a sample mean >192.1 is about 33%. We can't know whether the null hypothesis is true, but the sample that provided a mean value of 197.1 provides much stronger evidence in favor of rejecting the null hypothesis than the sample that provided a mean value of 192.1. Note that this does not mean that a sample mean of 192.1 indicates that the null hypothesis is true; it just doesn't provide compelling evidence to reject it.
In essence, hypothesis testing is a procedure to compute a probability that reflects the strength of the evidence (based on a given sample) for rejecting the null hypothesis. In hypothesis testing, we determine a threshold or cut-off point (called the critical value) to decide when to believe the null hypothesis and when to believe the research hypothesis. It is important to note that it is possible to observe any sample mean when the null hypothesis is true (in this example, when the true mean is 191), but some sample means are very unlikely. Based on the two samples above it would seem reasonable to believe the research hypothesis when x̄ = 197.1, but to believe the null hypothesis when x̄ = 192.1. What we need is a threshold value such that if x̄ is above that threshold then we believe that H1 is true and if x̄ is below that threshold then we believe that H0 is true. The difficulty in determining a threshold for x̄ is that it depends on the scale of measurement. In this example, the threshold, sometimes called the critical value, might be 195 (i.e., if the sample mean is 195 or more then we believe that H1 is true and if the sample mean is less than 195 then we believe that H0 is true). If we were instead interested in assessing an increase in blood pressure over time, the critical value would be different, because blood pressures are measured in millimeters of mercury (mmHg) as opposed to in pounds. In the following we will explain how the critical value is determined and how we handle the issue of scale.
First, to address the issue of scale in determining the critical value, we convert our sample data (in particular the sample mean) into a Z score. We know from the module on probability that the center of the Z distribution is zero and extreme values are those that exceed 2 or fall below -2. Z scores above 2 and below -2 represent approximately 5% of all Z values. If the observed sample mean is close to the mean specified in H0 (here μ = 191), then Z will be close to zero. If the observed sample mean is much larger than the mean specified in H0, then Z will be large.
In hypothesis testing, we select a critical value from the Z distribution. This is done by first determining what is called the level of significance, denoted α ("alpha"). What we are doing here is drawing a line at extreme values. The level of significance is the probability that we reject the null hypothesis (in favor of the alternative) when it is actually true and is also called the Type I error rate.
α = Level of significance = P(Type I error) = P(Reject H0 | H0 is true).
Because α is a probability, it ranges between 0 and 1. The most commonly used value in the medical literature for α is 0.05, or 5%. Thus, if an investigator selects α=0.05, then they are allowing a 5% probability of incorrectly rejecting the null hypothesis in favor of the alternative when the null is in fact true. Depending on the circumstances, one might choose to use a level of significance of 1% or 10%. For example, if an investigator wanted to reject the null only if there were even stronger evidence than that ensured with α=0.05, they could choose α=0.01 as their level of significance. The typical values for α are 0.01, 0.05 and 0.10, with α=0.05 the most commonly used value.
Suppose in our weight study we select α=0.05. We need to determine the value of Z that holds 5% of the values above it (see below).
The critical value of Z for α =0.05 is Z = 1.645 (i.e., 5% of the distribution is above Z=1.645). With this value we can set up what is called our decision rule for the test. The rule is to reject H 0 if the Z score is 1.645 or more.
With the first sample we have
Because 2.38 > 1.645, we reject the null hypothesis. (The same conclusion can be drawn by comparing the 0.0087 probability of observing a sample mean as extreme as 197.1 to the level of significance of 0.05. If the observed probability is smaller than the level of significance we reject H0.) Because the Z score exceeds the critical value, we conclude that the mean weight for men in 2006 is more than 191 pounds, the value reported in 2002. If we observed the second sample (i.e., sample mean = 192.1), we would not be able to reject the null hypothesis because the Z score is 0.43, which is not in the rejection region (i.e., the region in the tail end of the curve above 1.645). With the second sample we do not have sufficient evidence (because we set our level of significance at 5%) to conclude that weights have increased. Again, the same conclusion can be reached by comparing probabilities. The probability of observing a sample mean as extreme as 192.1 is 33.4%, which is not below our 5% level of significance.
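The Z scores and tail probabilities in this example can be reproduced with the standard normal CDF (via `math.erf`); the sample standard deviation of 25.6 used here is the value implied by the Z scores reported in the text:

```python
import math

def z_and_p(xbar, mu0, s, n):
    """Z statistic and upper-tail probability P(Z > z); Phi via math.erf."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
    return z, 1 - phi

z1, p1 = z_and_p(197.1, 191, 25.6, 100)  # first sample
z2, p2 = z_and_p(192.1, 191, 25.6, 100)  # second sample
```

Here `z1` rounds to 2.38 with `p1` ≈ 0.009, and `z2` rounds to 0.43 with `p2` ≈ 0.334, matching the probabilities quoted above.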
Hypothesis Testing: Upper, Lower, and Two Tailed Tests
The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data supports the null or alternative hypotheses. The procedure can be broken down into the following five steps.
 Step 1. Set up hypotheses and select the level of significance α.
H0: Null hypothesis (no change, no difference);
H1: Research hypothesis (investigator's belief); α = 0.05
Upper-tailed, Lower-tailed, Two-tailed Tests
The research or alternative hypothesis can take one of three forms. An investigator might believe that the parameter has increased, decreased or changed. For example, an investigator might hypothesize:
 H1: μ > μ0, where μ0 is the comparator or null value (e.g., μ0 = 191 in our example about weight in men in 2006) and an increase is hypothesized; this type of test is called an upper-tailed test;
 H1: μ < μ0, where a decrease is hypothesized and this is called a lower-tailed test; or
 H1: μ ≠ μ0, where a difference is hypothesized and this is called a two-tailed test.
The exact form of the research hypothesis depends on the investigator's belief about the parameter of interest and whether it has possibly increased, decreased or is different from the null value. The research hypothesis is set up by the investigator before any data are collected.

 Step 2. Select the appropriate test statistic.
The test statistic is a single number that summarizes the sample information. An example of a test statistic is the Z statistic computed as follows:
Z = (x̄ - μ0) / (s/√n)
When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.
 Step 3. Set up decision rule.
The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject H 0 if Z > 1.645). The decision rule for a specific test depends on 3 factors: the research or alternative hypothesis, the test statistic and the level of significance. Each is discussed below.
 The decision rule depends on whether an uppertailed, lowertailed, or twotailed test is proposed. In an uppertailed test the decision rule has investigators reject H 0 if the test statistic is larger than the critical value. In a lowertailed test the decision rule has investigators reject H 0 if the test statistic is smaller than the critical value. In a twotailed test the decision rule has investigators reject H 0 if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
 The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution again depending on the specific alternative hypothesis and the level of significance.
 The third factor is the level of significance. The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value. For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645.
The following figures illustrate the rejection regions defined by the decision rule for upper, lower and twotailed Z tests with α=0.05. Notice that the rejection regions are in the upper, lower and both tails of the curves, respectively. The decision rules are written below each figure.
Rejection Region for Upper-Tailed Z Test (H1: μ > μ0) with α =0.05. The decision rule is: Reject H0 if Z ≥ 1.645.
Rejection Region for Lower-Tailed Z Test (H1: μ < μ0) with α =0.05. The decision rule is: Reject H0 if Z < -1.645.
Rejection Region for Two-Tailed Z Test (H1: μ ≠ μ0) with α =0.05. The decision rule is: Reject H0 if Z < -1.960 or if Z > 1.960.
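These three decision rules can be summarized in a short helper; `reject_H0` is a hypothetical name and the critical value is taken from the Z table:

```python
def reject_H0(z, crit, test):
    """Decision rules for Z tests; crit is the critical value (e.g. 1.645)."""
    if test == 'upper':
        return z > crit        # upper-tailed: reject H0 if Z > crit
    if test == 'lower':
        return z < -crit       # lower-tailed: reject H0 if Z < -crit
    return abs(z) > crit       # two-tailed: reject H0 if |Z| > crit
```

With the weight example, `reject_H0(2.38, 1.645, 'upper')` returns `True`, while `reject_H0(0.43, 1.645, 'upper')` returns `False`.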
The complete table of critical values of Z for upper, lower and twotailed tests can be found in the table of Z values to the right in "Other Resources." Critical values of t for upper, lower and twotailed tests can be found in the table of t values in "Other Resources."
 Step 4. Compute the test statistic.
Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2.
 Step 5. Conclusion.
The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely). If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H0. Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined above can be abbreviated. The hypotheses (Step 1) should always be set up in advance of any analysis and the significance criterion should also be determined (e.g., α =0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following: If p < α then reject H0.
H0: μ = 191, H1: μ > 191, α = 0.05. The research hypothesis is that weights have increased, and therefore an upper-tailed test is used.
Because the sample size is large (n > 30) the appropriate test statistic is
In this example, we are performing an upper-tailed test (H1: μ > 191), with a Z test statistic and selected α = 0.05. Reject H0 if Z > 1.645.
We now substitute the sample data into the formula for the test statistic identified in Step 2. We reject H0 because 2.38 > 1.645. We have statistically significant evidence, at α = 0.05, to show that the mean weight in men in 2006 is more than 191 pounds.
Because we rejected the null hypothesis, we now approximate the p-value, which is the likelihood of observing the sample data if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance where we can still reject H0. In this example, we observed Z = 2.38 and for α = 0.05, the critical value was 1.645. Because 2.38 exceeded 1.645 we rejected H0. In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance. Using the table of critical values for upper-tailed tests, we can approximate the p-value. If we select α = 0.025, the critical value is 1.960, and we still reject H0 because 2.38 > 1.960. If we select α = 0.010 the critical value is 2.326, and we still reject H0 because 2.38 > 2.326. However, if we select α = 0.005, the critical value is 2.576, and we cannot reject H0 because 2.38 < 2.576. Therefore, the smallest α where we still reject H0 is 0.010. This is the p-value. A statistical computing package would produce a more precise p-value which would be in between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.
Type I and Type II Errors
In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not).
When we run a test of hypothesis and decide to reject H0 (e.g., because the test statistic exceeds the critical value in an upper-tailed test), then either we make a correct decision because the research hypothesis is true, or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).

Table - Conclusions in Test of Hypothesis

  Do Not Reject H0  Reject H0

H0 is True  Correct Decision  Type I Error
H0 is False  Type II Error  Correct Decision
In the first step of the hypothesis test, we select a level of significance, α, and α = P(Type I error). Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α = 0.05, and our test tells us to reject H0, then there is a 5% probability that we commit a Type I error. Most investigators are very comfortable with this and are confident when rejecting H0 that the research hypothesis is true (as it is the more likely scenario when we reject H0). When we run a test of hypothesis and decide not to reject H0 (e.g., because the test statistic is below the critical value in an upper-tailed test), then either we make a correct decision because the null hypothesis is true, or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β = P(Type II error) = P(Do not reject H0 | H0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error, because β depends on several factors, including the sample size, α, and the research hypothesis. When we do not reject H0, it may be very likely that we are committing a Type II error (i.e., failing to reject H0 when in fact it is false). Therefore, when tests are run and the null hypothesis is not rejected, we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H0, we conclude that we do not have significant evidence to show that H1 is true. We do not conclude that H0 is true. The most common reason for a Type II error is a small sample size.

Tests with One Sample, Continuous Outcome

Hypothesis testing applications with a continuous outcome variable in a single population are performed according to the five-step procedure outlined above. A key component is setting up the null and research hypotheses.
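The claim that α controls the Type I error rate can be checked by simulation. The sketch below is hypothetical (not from the source): it draws many samples from a population in which H0 is true and counts how often a two-tailed test at α = 0.05 rejects; the rejection rate should hover near 5%.

```python
import random
import statistics

def z_statistic(sample, mu0=0.0):
    """One-sample test statistic: (x̄ − μ0) / (s / √n)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)
    return (xbar - mu0) / (s / n ** 0.5)

random.seed(42)  # reproducible simulation
n_sims, n, rejections = 2000, 40, 0
for _ in range(n_sims):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]  # H0 (μ = 0) is true
    if abs(z_statistic(sample)) > 1.960:                 # two-tailed test, α = 0.05
        rejections += 1

type_i_rate = rejections / n_sims
# The rate should be close to 0.05 (slightly above, since s is estimated
# from a sample of only 40 and we use the normal rather than the t cutoff).
```

The analogous rate for Type II errors (β) cannot be fixed in advance this way, because, as the text notes, it depends on the sample size, α, and the true alternative.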
The objective is to compare the mean in a single population to a known mean (μ0). The known value is generally derived from another study or report, for example a study in a similar, but not identical, population or a study performed some years ago. The latter is called a historical control. It is important in setting up the hypotheses in a one-sample test that the mean specified in the null hypothesis is a fair and reasonable comparator. This will be discussed in the examples that follow.

Test Statistics for Testing H0: μ = μ0

If n > 30: Z = (x̄ − μ0) / (s/√n)

If n < 30: t = (x̄ − μ0) / (s/√n), with df = n − 1
Note that statistical computing packages will use the t statistic exclusively and make the necessary adjustments for comparing the test statistic to appropriate values from probability tables to produce a p-value. The National Center for Health Statistics (NCHS) published a report in 2005 entitled Health, United States, containing extensive information on major trends in the health of Americans. Data are provided for the US population as a whole and for specific ages, sexes, and races. The NCHS report indicated that in 2002 Americans paid an average of $3,302 per year on health care and prescription drugs. An investigator hypothesizes that expenditures decreased in 2005, primarily due to the availability of generic drugs. To test the hypothesis, a sample of 100 Americans is selected and their expenditures on health care and prescription drugs in 2005 are measured. The sample data are summarized as follows: n = 100, x̄ = $3,190 and s = $890. Is there statistical evidence of a reduction in expenditures on health care and prescription drugs in 2005? Is the sample mean of $3,190 evidence of a true reduction in the mean, or is it within chance fluctuation? We will run the test using the five-step approach.
H0: μ = 3,302 versus H1: μ < 3,302, α = 0.05. The research hypothesis is that expenditures have decreased, and therefore a lower-tailed test is used, with a Z statistic and a 5% level of significance. Reject H0 if Z < −1.645.
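Substituting the sample data (n = 100, x̄ = 3,190, s = 890, μ0 = 3,302) into the Z formula can be sketched as follows (a numerical verification, not part of the original text):

```python
import math

n, xbar, s, mu0 = 100, 3190.0, 890.0, 3302.0
z = (xbar - mu0) / (s / math.sqrt(n))   # = -112 / 89
print(round(z, 2))                      # -1.26, not below the critical value -1.645
```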
We do not reject H0 because Z = −1.26 > −1.645. We do not have statistically significant evidence at α = 0.05 to show that the mean expenditures on health care and prescription drugs are lower in 2005 than the mean of $3,302 reported in 2002. Recall that when we fail to reject H0 in a test of hypothesis, either the null hypothesis is true (here the mean expenditures in 2005 are the same as those in 2002 and equal to $3,302) or we committed a Type II error (i.e., we failed to reject H0 when in fact it is false). In summarizing this test, we conclude that we do not have sufficient evidence to reject H0. We do not conclude that H0 is true, because there may be a moderate to high probability that we committed a Type II error. It is possible that the sample size is not large enough to detect a difference in mean expenditures. The NCHS reported that the mean total cholesterol level in 2002 for all adults was 203. Total cholesterol levels in participants who attended the seventh examination of the Offspring in the Framingham Heart Study are summarized as follows: n = 3,310, x̄ = 200.3, and s = 36.8. Is there statistical evidence of a difference in mean cholesterol levels in the Framingham Offspring? Here we want to assess whether the sample mean of 200.3 in the Framingham sample is statistically significantly different from 203 (i.e., beyond what we would expect by chance). We will run the test using the five-step approach. H0: μ = 203 versus H1: μ ≠ 203, α = 0.05. The research hypothesis is that cholesterol levels are different in the Framingham Offspring, and therefore a two-tailed test is used.
This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H0 if Z < −1.960 or if Z > 1.960. We reject H0 because Z = −4.22 < −1.960. We have statistically significant evidence at α = 0.05 to show that the mean total cholesterol level in the Framingham Offspring is different from the national average of 203 reported in 2002. Because we reject H0, we also approximate a p-value. Using the two-sided significance levels, p < 0.0001.

Statistical Significance versus Clinical (Practical) Significance

This example raises an important concept of statistical versus clinical or practical significance. From a statistical standpoint, the total cholesterol levels in the Framingham sample are highly statistically significantly different from the national average, with p < 0.0001 (i.e., there is less than a 0.01% chance that we are incorrectly rejecting the null hypothesis). However, the sample mean in the Framingham Offspring study is 200.3, less than 3 units different from the national mean of 203. The reason that the data are so highly statistically significant is the very large sample size. It is always important to assess both the statistical and the clinical significance of data. This is particularly relevant when the sample size is large. Is a 3-unit difference in total cholesterol a meaningful difference? Consider again the NCHS-reported mean total cholesterol level in 2002 for all adults of 203. Suppose a new drug is proposed to lower total cholesterol. A study is designed to evaluate the efficacy of the drug in lowering cholesterol. Fifteen patients are enrolled in the study and asked to take the new drug for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows: n = 15, x̄ = 195.9 and s = 28.7. Is there statistical evidence of a reduction in mean total cholesterol in patients after using the new drug for 6 weeks? We will run the test using the five-step approach.
H0: μ = 203 versus H1: μ < 203, α = 0.05.
Because the sample size is small (n < 30), the appropriate test statistic is

t = (x̄ − μ0) / (s/√n)

This is a lower-tailed test, using a t statistic and a 5% level of significance. In order to determine the critical value of t, we need the degrees of freedom, df, defined as df = n − 1. In this example df = 15 − 1 = 14. The critical value for a lower-tailed test with df = 14 and α = 0.05 is −2.145, and the decision rule is as follows: Reject H0 if t < −2.145. We do not reject H0 because t = −0.96 > −2.145. We do not have statistically significant evidence at α = 0.05 to show that the mean total cholesterol level is lower than the national mean in patients taking the new drug for 6 weeks. Again, because we failed to reject the null hypothesis, we make a weaker concluding statement allowing for the possibility that we may have committed a Type II error (i.e., failed to reject H0 when in fact the drug is efficacious). This example raises an important issue in terms of study design. In this example we assume in the null hypothesis that the mean cholesterol level is 203. This is taken to be the mean cholesterol level in patients without treatment. Is this an appropriate comparator? Alternative and potentially more efficient study designs to evaluate the effect of the new drug could involve two treatment groups, where one group receives the new drug and the other does not, or we could measure each patient's baseline or pre-treatment cholesterol level and then assess changes from baseline to 6 weeks post-treatment. These designs are also discussed here.

Video - Comparing a Sample Mean to Known Population Mean (8:20)

Link to transcript of the video

Tests with One Sample, Dichotomous Outcome

Hypothesis testing applications with a dichotomous outcome variable in a single population are also performed according to the five-step procedure. Similar to tests for means, a key component is setting up the null and research hypotheses.
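Stepping back to the one-sample t test for the cholesterol drug example just concluded (n = 15, x̄ = 195.9, s = 28.7, μ0 = 203), the computation can be verified with a short sketch:

```python
import math

n, xbar, s, mu0 = 15, 195.9, 28.7, 203.0
t = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1                      # 14 degrees of freedom
critical = -2.145               # lower-tailed critical value, df = 14, alpha = 0.05
reject = t < critical
print(round(t, 2), reject)      # -0.96 False -> do not reject H0
```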
The objective is to compare the proportion of successes in a single population to a known proportion (p0). That known proportion is generally derived from another study or report and is sometimes called a historical control. It is important in setting up the hypotheses in a one-sample test that the proportion specified in the null hypothesis is a fair and reasonable comparator. In one-sample tests for a dichotomous outcome, we set up our hypotheses against an appropriate comparator. We select a sample and compute descriptive statistics on the sample data. Specifically, we compute the sample size (n) and the sample proportion, p̂, which is computed by taking the ratio of the number of successes to the sample size. We then determine the appropriate test statistic (Step 2) for the hypothesis test. The formula for the test statistic is given below.

Test Statistic for Testing H0: p = p0

Z = (p̂ − p0) / √(p0(1 − p0)/n), if min(np0, n(1 − p0)) > 5

The formula above is appropriate for large samples, defined when the smaller of np0 and n(1 − p0) is at least 5. This is similar, but not identical, to the condition required for appropriate use of the confidence interval formula for a population proportion, i.e., min(n p̂, n(1 − p̂)) > 5. Here we use the proportion specified in the null hypothesis as the true proportion of successes rather than the sample proportion. If we fail to satisfy the condition, then alternative procedures, called exact methods, must be used to test the hypothesis about the population proportion. Example: The NCHS report indicated that in 2002 the prevalence of cigarette smoking among American adults was 21.1%. Data on prevalent smoking in n = 3,536 participants who attended the seventh examination of the Offspring in the Framingham Heart Study indicated that 482/3,536 = 13.6% of the respondents were currently smoking at the time of the exam. Suppose we want to assess whether the prevalence of smoking is lower in the Framingham Offspring sample given the focus on cardiovascular health in that community.
Is there evidence of a statistically lower prevalence of smoking in the Framingham Offspring study as compared to the prevalence among all Americans? H0: p = 0.211 versus H1: p < 0.211, α = 0.05. We must first check that the sample size is adequate. Specifically, we need to check min(np0, n(1 − p0)) = min(3,536(0.211), 3,536(1 − 0.211)) = min(746, 2,790) = 746. The sample size is more than adequate, so the following formula can be used: Z = (p̂ − p0) / √(p0(1 − p0)/n). This is a lower-tailed test, using a Z statistic and a 5% level of significance. Reject H0 if Z < −1.645. We reject H0 because Z = −10.93 < −1.645. We have statistically significant evidence at α = 0.05 to show that the prevalence of smoking in the Framingham Offspring is lower than the prevalence nationally (21.1%). Here, p < 0.0001. The NCHS report indicated that in 2002, 75% of children aged 2 to 17 saw a dentist in the past year. An investigator wants to assess whether use of dental services is similar in children living in the city of Boston. A sample of 125 children aged 2 to 17 living in Boston is surveyed and 64 reported seeing a dentist over the past 12 months. Is there a significant difference in use of dental services between children living in Boston and the national data? Calculate this on your own before checking the answer.

Video - Hypothesis Test for One Sample and a Dichotomous Outcome (3:55)

Tests with Two Independent Samples, Continuous Outcome

There are many applications where it is of interest to compare two independent groups with respect to their mean scores on a continuous outcome. Here we compare means between groups, but rather than generating an estimate of the difference, we will test whether the observed difference (increase, decrease, or difference) is statistically significant or not. Remember that hypothesis testing gives an assessment of statistical significance, whereas estimation gives an estimate of effect, and both are important.
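As a numerical check of the smoking-prevalence test above: following the text, the sample proportion is rounded to 0.136 before substitution (a verification sketch, not part of the original text):

```python
import math

n, p0 = 3536, 0.211
phat = round(482 / n, 3)            # 0.136, as reported in the text
se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
z = (phat - p0) / se
print(round(z, 2))                  # about -10.93, far below -1.645
```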
Here we discuss the comparison of means when the two comparison groups are independent or physically separate. The two groups might be determined by a particular attribute (e.g., sex, diagnosis of cardiovascular disease) or might be set up by the investigator (e.g., participants assigned to receive an experimental treatment or placebo). The first step in the analysis involves computing descriptive statistics on each of the two samples. Specifically, we compute the sample size, mean, and standard deviation in each sample and denote these summary statistics as follows: for sample 1: n1, x̄1, and s1; for sample 2: n2, x̄2, and s2. The designation of sample 1 and sample 2 is arbitrary. In a clinical trial setting the convention is to call the treatment group 1 and the control group 2. However, when comparing men and women, for example, either group can be 1 or 2. In the two independent samples application with a continuous outcome, the parameter of interest in the test of hypothesis is the difference in population means, μ1 − μ2. The null hypothesis is always that there is no difference between groups with respect to means, i.e., H0: μ1 − μ2 = 0. The null hypothesis can also be written as follows: H0: μ1 = μ2. In the research hypothesis, an investigator can hypothesize that the first mean is larger than the second (H1: μ1 > μ2), that the first mean is smaller than the second (H1: μ1 < μ2), or that the means are different (H1: μ1 ≠ μ2). The three different alternatives represent upper-, lower-, and two-tailed tests, respectively. The following test statistics are used to test these hypotheses.

Test Statistics for Testing H0: μ1 = μ2

If n1 > 30 and n2 > 30: Z = (x̄1 − x̄2) / (Sp √(1/n1 + 1/n2))

If n1 < 30 or n2 < 30: t = (x̄1 − x̄2) / (Sp √(1/n1 + 1/n2)), with df = n1 + n2 − 2
NOTE: The formulas above assume equal variability in the two populations (i.e., the population variances are equal, or σ1² = σ2²). This means that the outcome is equally variable in each of the comparison populations. For analysis, we have samples from each of the comparison populations. If the sample variances are similar, then the assumption about variability in the populations is probably reasonable. As a guideline, if the ratio of the sample variances, s1²/s2², is between 0.5 and 2 (i.e., if one variance is no more than double the other), then the formulas above are appropriate. If the ratio of the sample variances is greater than 2 or less than 0.5, then alternative formulas must be used to account for the heterogeneity in variances. The test statistics include Sp, the pooled estimate of the common standard deviation (again assuming that the variances in the populations are similar), computed as the weighted average of the standard deviations in the samples as follows:

Sp = √(((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2))

Because we are assuming equal variances between groups, we pool the information on variability (sample variances) to generate an estimate of the variability in the population. (Note: Because Sp is a weighted average of the standard deviations in the samples, Sp will always lie between s1 and s2.) Data measured on n = 3,539 participants who attended the seventh examination of the Offspring in the Framingham Heart Study are shown below.
Suppose we now wish to assess whether there is a statistically significant difference in mean systolic blood pressures between men and women using a 5% level of significance. H0: μ1 = μ2 versus H1: μ1 ≠ μ2, α = 0.05. Because both samples are large (n > 30), we can use the Z test statistic as opposed to t. Note that statistical computing packages use t throughout. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The guideline suggests investigating the ratio of the sample variances, s1²/s2². Suppose we call the men group 1 and the women group 2. Again, this is arbitrary; it only needs to be noted when interpreting the results. The ratio of the sample variances is 17.5²/20.1² = 0.76, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is Z = (x̄1 − x̄2) / (Sp √(1/n1 + 1/n2)). We now substitute the sample data into the formula for the test statistic identified in Step 2. Before substituting, we first compute Sp, the pooled estimate of the common standard deviation. Notice that the pooled estimate of the common standard deviation, Sp, falls in between the standard deviations in the comparison groups (i.e., 17.5 and 20.1). Sp is slightly closer in value to the standard deviation in the women (20.1), as there were slightly more women in the sample. Recall, Sp is a weighted average of the standard deviations in the comparison groups, weighted by the respective sample sizes. Now the test statistic: We reject H0 because 2.66 > 1.960. We have statistically significant evidence at α = 0.05 to show that there is a difference in mean systolic blood pressures between men and women. The p-value is p < 0.010. Here again we find that there is a statistically significant difference in mean systolic blood pressures between men and women at p < 0.010.
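The behavior of Sp noted above (it always lands between s1 and s2, pulled toward the larger group) can be illustrated with a small sketch. The standard deviations below are those from the example (17.5 and 20.1); the group sizes are hypothetical, since the data table is not reproduced here:

```python
import math

def pooled_sd(n1, s1, n2, s2):
    """Sp = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1 + n2 - 2))."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Hypothetical group sizes; group 2 is larger, so Sp is pulled toward s2.
n1, s1 = 50, 17.5
n2, s2 = 150, 20.1
sp = pooled_sd(n1, s1, n2, s2)
assert min(s1, s2) < sp < max(s1, s2)   # Sp always lies between s1 and s2
print(round(sp, 2))
```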
Notice that there is a very small difference in the sample means (128.2 − 126.5 = 1.7 units), but this difference is beyond what would be expected by chance. Is this a clinically meaningful difference? The large sample size in this example is driving the statistical significance. A 95% confidence interval for the difference in mean systolic blood pressures is 1.7 ± 1.26, or (0.44, 2.96). The confidence interval provides an assessment of the magnitude of the difference between means, whereas the test of hypothesis and p-value provide an assessment of the statistical significance of the difference. Above we performed a study to evaluate a new drug designed to lower total cholesterol. The study involved one sample of patients; each patient took the new drug for 6 weeks and had their cholesterol measured. As a means of evaluating the efficacy of the new drug, the mean total cholesterol following 6 weeks of treatment was compared to the NCHS-reported mean total cholesterol level in 2002 for all adults of 203. At the end of the example, we discussed the appropriateness of the fixed comparator as well as an alternative study design to evaluate the effect of the new drug involving two treatment groups, where one group receives the new drug and the other does not. Here, we revisit the example with a concurrent or parallel control group, which is very typical in randomized controlled trials or clinical trials (refer to the EP713 module on Clinical Trials). A new drug is proposed to lower total cholesterol. A randomized controlled trial is designed to evaluate the efficacy of the medication in lowering cholesterol. Thirty participants are enrolled in the trial and are randomly assigned to receive either the new drug or a placebo. The participants do not know which treatment they are assigned. Each participant is asked to take the assigned treatment for 6 weeks. At the end of 6 weeks, each patient's total cholesterol level is measured and the sample statistics are as follows.
Is there statistical evidence of a reduction in mean total cholesterol in patients taking the new drug for 6 weeks as compared to participants taking placebo? We will run the test using the five-step approach. H0: μ1 = μ2 versus H1: μ1 < μ2, α = 0.05. Because both samples are small (n < 30), we use the t test statistic. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The ratio of the sample variances, s1²/s2² = 28.7²/30.3² = 0.90, falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is t = (x̄1 − x̄2) / (Sp √(1/n1 + 1/n2)). This is a lower-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in the t table (in More Resources to the right). In order to determine the critical value of t we need the degrees of freedom, df, defined as df = n1 + n2 − 2 = 15 + 15 − 2 = 28. The critical value for a lower-tailed test with df = 28 and α = 0.05 is −1.701, and the decision rule is: Reject H0 if t < −1.701. Now the test statistic: we reject H0 because t = −2.92 < −1.701. We have statistically significant evidence at α = 0.05 to show that the mean total cholesterol level is lower in patients taking the new drug for 6 weeks as compared to patients taking placebo, p < 0.005. The clinical trial in this example finds a statistically significant reduction in total cholesterol, whereas in the previous example where we had a historical control (as opposed to a parallel control group) we did not demonstrate efficacy of the new drug. Notice that the mean total cholesterol level in patients taking placebo is 227.4, which is very different from the mean cholesterol reported among all Americans in 2002 of 203 and used as the comparator in the prior example. The historical control value may not have been the most appropriate comparator, as cholesterol levels have been increasing over time.
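The trial computation can be sketched as follows. The group statistics assumed here are the new drug group (n = 15, x̄ = 195.9, s = 28.7) and the placebo group (n = 15, s = 30.3, with mean 227.4, the value consistent with the reported t of −2.92):

```python
import math

n1, x1, s1 = 15, 195.9, 28.7   # new drug
n2, x2, s2 = 15, 227.4, 30.3   # placebo (mean consistent with the reported t = -2.92)

# Pooled estimate of the common standard deviation.
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t = (x1 - x2) / (sp * math.sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2
print(round(sp, 2), round(t, 2), df)   # Sp = 29.51, t = -2.92, df = 28
```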
In the next section, we present another design that can be used to assess the efficacy of the new drug.

Video - Comparison of Two Independent Samples With a Continuous Outcome (8:02)

Tests with Matched Samples, Continuous Outcome

In the previous section we compared two groups with respect to their mean scores on a continuous outcome. An alternative study design is to compare matched or paired samples. The two comparison groups are said to be dependent, and the data can arise from a single sample of participants where each participant is measured twice (possibly before and after an intervention) or from two samples that are matched on specific characteristics (e.g., siblings). When the samples are dependent, we focus on difference scores in each participant or between members of a pair, and the test of hypothesis is based on the mean difference, μd. The null hypothesis again reflects "no difference" and is stated as H0: μd = 0. Note that there are some instances where it is of interest to test whether there is a difference of a particular magnitude (e.g., μd = 5), but in most instances the null hypothesis reflects no difference (i.e., μd = 0). The appropriate formula for the test of hypothesis depends on the sample size. The formulas are shown below and are identical to those presented for testing the mean of a single sample (e.g., when comparing against an external or historical control), except here we focus on difference scores.

Test Statistics for Testing H0: μd = 0

If n > 30: Z = (x̄d − μd) / (sd/√n)

If n < 30: t = (x̄d − μd) / (sd/√n), with df = n − 1

A new drug is proposed to lower total cholesterol and a study is designed to evaluate the efficacy of the drug in lowering cholesterol. Fifteen patients agree to participate in the study and each is asked to take the new drug for 6 weeks. However, before starting the treatment, each patient's total cholesterol level is measured. The initial measurement is a pre-treatment or baseline value.
After taking the drug for 6 weeks, each patient's total cholesterol level is measured again and the data are shown below. The rightmost column contains difference scores for each patient, computed by subtracting the 6-week cholesterol level from the baseline level. The differences represent the reduction in total cholesterol over 6 weeks. (The differences could have been computed by subtracting the baseline total cholesterol level from the level measured at 6 weeks. The way in which the differences are computed does not affect the outcome of the analysis, only the interpretation.)
Because the differences are computed by subtracting the cholesterol levels measured at 6 weeks from the baseline values, positive differences indicate reductions and negative differences indicate increases (e.g., participant 12 increases by 2 units over 6 weeks). The goal here is to test whether there is a statistically significant reduction in cholesterol. Because of the way in which we computed the differences, we want to look for an increase in the mean difference (i.e., a positive reduction). In order to conduct the test, we need to summarize the differences: the sample size n, the mean difference x̄d, and the standard deviation of the differences, sd. The calculations are shown below.
Is there statistical evidence of a reduction in mean total cholesterol in patients after using the new medication for 6 weeks? We will run the test using the five-step approach. H0: μd = 0 versus H1: μd > 0, α = 0.05. NOTE: If we had computed differences by subtracting the baseline level from the level measured at 6 weeks, then negative differences would have reflected reductions and the research hypothesis would have been H1: μd < 0.
This is an upper-tailed test, using a t statistic and a 5% level of significance. The appropriate critical value can be found in the t table at the right, with df = 15 − 1 = 14. The critical value for an upper-tailed test with df = 14 and α = 0.05 is 2.145, and the decision rule is: Reject H0 if t > 2.145. We now substitute the sample data into the formula for the test statistic identified in Step 2. We reject H0 because 4.61 > 2.145. We have statistically significant evidence at α = 0.05 to show that there is a reduction in cholesterol levels over 6 weeks. Here we illustrate the use of a matched design to test the efficacy of a new drug to lower total cholesterol. We also considered a parallel design (randomized clinical trial) and a study using a historical comparator. It is extremely important to design studies that are best suited to detect a meaningful difference when one exists. There are often several alternatives, and investigators work with biostatisticians to determine the best design for each application. It is worth noting that the matched design used here can be problematic in that observed differences may only reflect a "placebo" effect. All participants took the assigned medication, but is the observed reduction attributable to the medication or a result of their participation in the study?

Video - Hypothesis Testing With a Matched Sample and a Continuous Outcome (3:11)

Tests with Two Independent Samples, Dichotomous Outcome

There are several approaches that can be used to test hypotheses concerning two independent proportions. Here we present one approach; the chi-square test of independence is an alternative, equivalent, and perhaps more popular approach to the same analysis. Hypothesis testing with the chi-square test is addressed in the third module in this series: BS704_HypothesisTestingChiSquare.
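A generic sketch of the matched-pairs computation follows. The difference scores below are hypothetical (the study table is not reproduced here); the point is the mechanics of testing H0: μd = 0 against H1: μd > 0:

```python
import math
import statistics

# Hypothetical baseline-minus-6-week difference scores (positive = reduction).
diffs = [10, 5, 0, 8, 6, -2, 4, 7, 3, 9, 2, -1, 5, 6, 4]

n = len(diffs)
xbar_d = statistics.mean(diffs)       # mean difference
s_d = statistics.stdev(diffs)         # standard deviation of the differences
t = (xbar_d - 0) / (s_d / math.sqrt(n))   # test statistic for H0: mu_d = 0
df = n - 1
reject = t > 2.145                    # upper-tailed critical value, df = 14, alpha = 0.05
```

Note that only the column of within-pair differences enters the test; the paired design removes between-patient variability from the comparison.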
In tests of hypothesis comparing proportions between two independent groups, one test is performed and the results can be interpreted to apply to a risk difference, relative risk, or odds ratio. As a reminder, the risk difference is computed by taking the difference in proportions between comparison groups, the risk ratio is computed by taking the ratio of proportions, and the odds ratio is computed by taking the ratio of the odds of success in the comparison groups. Because the null values for the risk difference, the risk ratio, and the odds ratio are different, the hypotheses in tests of hypothesis look slightly different depending on which measure is used. When performing tests of hypothesis for the risk difference, relative risk, or odds ratio, the convention is to label the exposed or treated group 1 and the unexposed or control group 2. For example, suppose a study is designed to assess whether there is a significant difference in proportions in two independent comparison groups. The test of interest is as follows: H0: p1 = p2 versus H1: p1 ≠ p2. The following are the hypotheses for testing for a difference in proportions using the risk difference, the risk ratio, and the odds ratio. First, the hypotheses above are equivalent to the following:

H0: p1 − p2 = 0 versus H1: p1 − p2 ≠ 0 (i.e., H0: RD = 0 versus H1: RD ≠ 0)

H0: RR = 1 versus H1: RR ≠ 1

H0: OR = 1 versus H1: OR ≠ 1
Suppose a test is performed to test H0: RD = 0 versus H1: RD ≠ 0 and the test rejects H0 at α = 0.05. Based on this test we can conclude that there is significant evidence, at α = 0.05, of a difference in proportions: significant evidence that the risk difference is not zero, and significant evidence that the risk ratio and odds ratio are not one. The risk difference is analogous to the difference in means when the outcome is continuous. Here the parameter of interest is the difference in proportions in the population, RD = p1 − p2, and the null value for the risk difference is zero. In a test of hypothesis for the risk difference, the null hypothesis is always H0: RD = 0. This is equivalent to H0: RR = 1 and H0: OR = 1. In the research hypothesis, an investigator can hypothesize that the first proportion is larger than the second (H1: p1 > p2, which is equivalent to H1: RD > 0, H1: RR > 1 and H1: OR > 1), that the first proportion is smaller than the second (H1: p1 < p2, which is equivalent to H1: RD < 0, H1: RR < 1 and H1: OR < 1), or that the proportions are different (H1: p1 ≠ p2, which is equivalent to H1: RD ≠ 0, H1: RR ≠ 1 and H1: OR ≠ 1). The three different alternatives represent upper-, lower-, and two-tailed tests, respectively. The formula for the test of hypothesis for the difference in proportions is given below.

Test Statistic for Testing H0: p1 = p2

Z = (p̂1 − p̂2) / √(p̂(1 − p̂)(1/n1 + 1/n2))

where p̂ is the overall proportion of successes, p̂ = (x1 + x2)/(n1 + n2).
The formula above is appropriate for large samples, defined as at least 5 successes and at least 5 failures in each of the two samples, i.e., min(n1 p̂1, n1(1 − p̂1), n2 p̂2, n2(1 − p̂2)) > 5. If there are fewer than 5 successes or failures in either comparison group, then alternative procedures, called exact methods, must be used to estimate the difference in population proportions. The following table summarizes data from n = 3,799 participants who attended the fifth examination of the Offspring in the Framingham Heart Study. The outcome of interest is prevalent CVD, and we want to test whether the prevalence of CVD is significantly higher in smokers as compared to non-smokers.
The prevalence of CVD (or proportion of participants with prevalent CVD) among non-smokers is 298/3,055 = 0.0975 and the prevalence of CVD among current smokers is 81/744 = 0.1089. Here smoking status defines the comparison groups, and we will call the current smokers group 1 (exposed) and the non-smokers group 2 (unexposed). The test of hypothesis is conducted below using the five-step approach. H0: p1 = p2 versus H1: p1 ≠ p2, α = 0.05.
We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group. In this example, we have more than enough successes (cases of prevalent CVD) and failures (persons free of CVD) in each comparison group. The sample size is more than adequate, so the following formula can be used: Z = (p̂1 − p̂2) / √(p̂(1 − p̂)(1/n1 + 1/n2)). Reject H0 if Z < −1.960 or if Z > 1.960. We now substitute the sample data into the formula for the test statistic identified in Step 2. We first compute the overall proportion of successes: p̂ = (81 + 298)/(744 + 3,055) = 379/3,799 = 0.0998. We now substitute to compute the test statistic: Z = (0.1089 − 0.0975) / √(0.0998(1 − 0.0998)(1/744 + 1/3,055)).
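Substituting the CVD data into the two-proportion formula can be sketched as follows (a verification, not part of the original text; working from the raw counts gives roughly 0.92-0.93, matching the reported 0.927 up to intermediate rounding):

```python
import math

x1, n1 = 81, 744      # current smokers with prevalent CVD
x2, n2 = 298, 3055    # non-smokers with prevalent CVD

p1, p2 = x1 / n1, x2 / n2
phat = (x1 + x2) / (n1 + n2)          # overall proportion of successes, about 0.0998
se = math.sqrt(phat * (1 - phat) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
# z is about 0.92-0.93 (reported as 0.927 in the text);
# since -1.960 < z < 1.960, H0 is not rejected.
```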
We do not reject H0 because −1.960 < 0.927 < 1.960. We do not have statistically significant evidence at α=0.05 to show that there is a difference in prevalent CVD between smokers and non-smokers. A 95% confidence interval for the difference in prevalent CVD (or risk difference) between smokers and non-smokers is 0.0114 ± 0.0247, or between −0.0133 and 0.0361. Because the 95% confidence interval for the risk difference includes zero, we again conclude that there is no statistically significant difference in prevalent CVD between smokers and non-smokers. Smoking has been shown over and over to be a risk factor for cardiovascular disease. What might explain the fact that we did not observe a statistically significant difference using data from the Framingham Heart Study? HINT: Here we consider prevalent CVD; would the results have been different if we considered incident CVD? A randomized trial is designed to evaluate the effectiveness of a newly developed pain reliever designed to reduce pain in patients following joint replacement surgery. The trial compares the new pain reliever to the pain reliever currently in use (called the standard of care). A total of 100 patients undergoing joint replacement surgery agreed to participate in the trial. Patients were randomly assigned to receive either the new pain reliever or the standard pain reliever following surgery and were blind to the treatment assignment. Before receiving the assigned treatment, patients were asked to rate their pain on a scale of 0 to 10, with higher scores indicative of more pain. Each patient was then given the assigned treatment and after 30 minutes was again asked to rate their pain on the same scale. The primary outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a clinically meaningful reduction). The following data were observed in the trial.
We now test whether there is a statistically significant difference in the proportions of patients reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) using the five-step approach. H0: p1 = p2 versus H1: p1 ≠ p2, α=0.05. Here the new or experimental pain reliever is group 1 and the standard pain reliever is group 2. We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group. In this example, we have min(50(0.46), 50(1 − 0.46), 50(0.22), 50(1 − 0.22)) = min(23, 27, 11, 39) = 11. The sample size is adequate, so the two-sample Z test for proportions can be used. We reject H0 because 2.526 > 1.960. We have statistically significant evidence at α=0.05 to show that there is a difference in the proportions of patients on the new pain reliever reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) as compared to patients on the standard pain reliever. A 95% confidence interval for the difference in proportions of patients on the new pain reliever reporting a meaningful reduction as compared to patients on the standard pain reliever is 0.24 ± 0.18, or between 0.06 and 0.42. Because the 95% confidence interval does not include zero, we conclude that there is a statistically significant difference in proportions, which is consistent with the test of hypothesis result. Again, the procedures discussed here apply to applications where there are two independent comparison groups and a dichotomous outcome. There are other applications in which it is of interest to compare a dichotomous outcome in matched or paired samples. For example, in a clinical trial we might wish to test the effectiveness of a new antibiotic eye drop for the treatment of bacterial conjunctivitis. Participants use the new antibiotic eye drop in one eye and a comparator (placebo or active control treatment) in the other.
The success of the treatment (yes/no) is recorded for each participant for each eye. Because the two assessments (success or failure) are paired, we cannot use the procedures discussed here. The appropriate test is called McNemar's test (sometimes called McNemar's test for dependent proportions). Video: Hypothesis Testing With Two Independent Samples and a Dichotomous Outcome (2:55). Here we presented hypothesis testing techniques for means and proportions in one- and two-sample situations. Tests of hypothesis involve several steps, including specifying the null and alternative (or research) hypothesis, selecting and computing an appropriate test statistic, setting up a decision rule and drawing a conclusion. There are many details to consider in hypothesis testing. The first is to determine the appropriate test. We discussed Z and t tests here for different applications. The appropriate test depends on the distribution of the outcome variable (continuous or dichotomous), the number of comparison groups (one, two) and whether the comparison groups are independent or dependent. The following table summarizes the different tests of hypothesis discussed here.
Once the type of test is determined, the details of the test must be specified. Specifically, the null and alternative hypotheses must be clearly stated. The null hypothesis always reflects the "no change" or "no difference" situation. The alternative or research hypothesis reflects the investigator's belief. The investigator might hypothesize that a parameter (e.g., a mean, proportion, difference in means or proportions) will increase, will decrease, or will be different under specific conditions (sometimes the conditions are different experimental conditions and other times the conditions are simply different groups of participants). Once the hypotheses are specified, data are collected and summarized. The appropriate test is then conducted according to the five-step approach. If the test leads to rejection of the null hypothesis, an approximate p-value is computed to summarize the significance of the findings. When tests of hypothesis are conducted using statistical computing packages, exact p-values are computed. Because the statistical tables in this textbook are limited, we can only approximate p-values. If the test fails to reject the null hypothesis, then a weaker concluding statement is made, for the following reason. In hypothesis testing, there are two types of errors that can be committed. A Type I error occurs when a test incorrectly rejects the null hypothesis. This is referred to as a false positive result, and the probability that this occurs is equal to the level of significance, α. The investigator chooses the level of significance in Step 1 and purposely chooses a small value such as α=0.05 to control the probability of committing a Type I error. A Type II error occurs when a test fails to reject the null hypothesis when in fact it is false. The probability that this occurs is equal to β.
Unfortunately, the investigator cannot specify β at the outset because it depends on several factors, including the sample size (smaller samples have higher β), the level of significance (β decreases as α increases), and the difference in the parameter under the null and alternative hypotheses. We noted in several examples in this chapter the relationship between confidence intervals and tests of hypothesis. The approaches are different, yet related. It is possible to draw a conclusion about statistical significance by examining a confidence interval. For example, if a 95% confidence interval does not contain the null value (e.g., zero when analyzing a mean difference or risk difference, one when analyzing relative risks or odds ratios), then one can conclude that a two-sided test of hypothesis would reject the null at α=0.05. It is important to note that the correspondence between a confidence interval and test of hypothesis relates to a two-sided test and that the confidence level corresponds to a specific level of significance (e.g., 95% to α=0.05, 90% to α=0.10 and so on). The exact significance of the test, the p-value, can only be determined using the hypothesis testing approach, and the p-value provides an assessment of the strength of the evidence, not an estimate of the effect. Answers to Selected Problems: Dental services problem (bottom of page 5).
α=0.05
First, determine whether the sample size is adequate. Here it is, so the one-sample Z test for a proportion can be used.
Reject H0 if Z is less than or equal to −1.96 or if Z is greater than or equal to 1.96.
We reject the null hypothesis because −6.15 < −1.96. Therefore there is a statistically significant difference in the proportion of children in Boston using dental services compared to the national proportion.
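The one-proportion Z test used in this problem can be sketched generically. The numbers below are illustrative only; the dental-services counts themselves are not reproduced in this excerpt:

```python
import math

def one_proportion_z(p_hat, p0, n):
    """Z statistic for H0: p = p0 given a sample proportion p_hat.

    The standard error uses the null value p0, as in the test above.
    """
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

# Hypothetical illustration: 60 successes out of n = 100 against a null of 0.5
z = one_proportion_z(0.6, 0.5, 100)
print(round(z, 2))  # 2.0
```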
5.3 - Hypothesis Testing for One-Sample Mean. In the previous section, we learned how to perform a hypothesis test for one proportion. The concepts of hypothesis testing remain constant for any hypothesis test. In these next few sections, we will present the hypothesis test for one mean. We start with our knowledge of the sampling distribution of the sample mean. Hypothesis Test for One-Sample Mean Section. Recall that under certain conditions, the sampling distribution of the sample mean, \(\bar{x} \), is approximately normal with mean \(\mu \), standard error \(\dfrac{\sigma}{\sqrt{n}} \), and estimated standard error \(\dfrac{s}{\sqrt{n}} \). \(H_0\colon \mu=\mu_0\). Conditions: the population distribution is normal, or the sample size is large.
Test Statistic: If at least one of the conditions is satisfied, then \( t=\dfrac{\bar{x}-\mu_0}{\frac{s}{\sqrt{n}}} \) will follow a t-distribution with \(n-1 \) degrees of freedom. Notice that when working with continuous data we use a t statistic as opposed to the z statistic. This is because the sample size impacts the sampling distribution and needs to be taken into account. We do this by recognizing "degrees of freedom". We will not go into too much detail about degrees of freedom in this course. Let's look at an example. Example 5-1 Section. This depends on the standard deviation of \(\bar{x} \). \begin{align} t^*&=\dfrac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}\\&=\dfrac{8.3-8.5}{\frac{1.2}{\sqrt{61}}}\\&=-1.3 \end{align} Thus, we are asking if \(-1.3\) is very far away from zero, since that corresponds to the case when \(\bar{x}\) is equal to \(\mu_0 \). If it is far away, then it is unlikely that the null hypothesis is true and one rejects it. Otherwise, one cannot reject the null hypothesis. Hypothesis tests about the mean, by Marco Taboga, PhD. This lecture explains how to conduct hypothesis tests about the mean of a normal distribution. We tackle two different cases: when we know the variance of the distribution, we use a z-statistic to conduct the test; when the variance is unknown, we use the t-statistic. In each case we derive the power and the size of the test. We conclude with two solved exercises on size and power. The assumptions are the same as those made in the lecture on confidence intervals for the mean. A test of hypothesis based on the z-statistic is called a z-test. If the statistic falls in the critical region, the null hypothesis is rejected; otherwise, it is not rejected. We explain how to choose the critical value in the page on critical values.
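The t statistic from Example 5-1 can be checked with a short Python sketch, using the values shown in the calculation above (\(\bar{x}=8.3\), \(\mu_0=8.5\), \(s=1.2\), \(n=61\)):

```python
import math

x_bar, mu0, s, n = 8.3, 8.5, 1.2, 61

# One-sample t statistic: (sample mean - null value) / estimated standard error
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
print(round(t_stat, 1))  # -1.3
```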
This case is similar to the previous one. The only difference is that we now relax the assumption that the variance of the distribution is known. The test of hypothesis based on the resulting statistic is called a t-test. If the statistic falls in the critical region, the null hypothesis is rejected; otherwise, we do not reject it. The page on critical values explains how this equation is solved. Below you can find some exercises with explained solutions. Suppose that a statistician observes 100 independent realizations of a normal random variable. The mean and the variance of the random variable, which the statistician does not know, are equal to 1 and 4 respectively. Find the probability that the statistician will reject the null hypothesis that the mean is equal to zero if she runs a t-test based on the 100 observed realizations. A statistician observes 100 independent realizations of a normal random variable. She performs a t-test of the null hypothesis that the mean of the variable is equal to zero. How to cite: Taboga, Marco (2021). "Hypothesis tests about the mean", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentalsofstatistics/hypothesistestingmean. Most of the learning materials found on this website are now available in a traditional textbook format.
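The first exercise can be checked numerically. Below is a minimal Monte Carlo sketch, assuming a two-sided t-test at α = 0.05 and taking the critical value 1.984 for 99 degrees of freedom from a t table; the estimated rejection probability is the power of the test:

```python
import math
import random

random.seed(0)

n, mu_true, sigma = 100, 1.0, 2.0   # variance 4, so sigma = 2
t_crit = 1.984                       # two-sided 5% critical value, 99 df (t table)

rejections = 0
trials = 10_000
for _ in range(trials):
    sample = [random.gauss(mu_true, sigma) for _ in range(n)]
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)  # unbiased sample variance
    t_stat = mean / math.sqrt(s2 / n)                    # H0: mean = 0
    if abs(t_stat) > t_crit:
        rejections += 1

power = rejections / trials
print(power)  # close to 1: the test almost always rejects H0 when the true mean is 1
```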
Statistics - Hypothesis Testing a Mean. A population mean is the average value in a population. Hypothesis tests are used to check a claim about the size of that population mean. Hypothesis Testing a Mean: the following steps are used for a hypothesis test: check the conditions, define the claims, decide the significance level, calculate the test statistic, and conclude.
For example:
And we want to check the claim: "The average age of Nobel Prize winners when they received the prize is more than 55." By taking a sample of 30 randomly selected Nobel Prize winners we could find that: the mean age in the sample (\(\bar{x}\)) is 62.1, and the standard deviation of age in the sample (\(s\)) is 13.46. From this sample data we check the claim with the steps below. 1. Checking the Conditions. The conditions for testing a hypothesis about a population mean are: the sample is randomly selected, and the population data is normally distributed or the sample size is moderately large.
A moderately large sample size, like 30, is typically large enough. In the example, the sample size was 30 and it was randomly selected, so the conditions are fulfilled. Note: Checking if the data is normally distributed can be done with specialized statistical tests. 2. Defining the Claims. We need to define a null hypothesis (\(H_{0}\)) and an alternative hypothesis (\(H_{1}\)) based on the claim we are checking. The claim was that the average age of Nobel Prize winners when they received the prize is more than 55. In this case, the parameter is the mean age of Nobel Prize winners when they received the prize (\(\mu\)). The null and alternative hypotheses are then: Null hypothesis: the average age was 55. Alternative hypothesis: the average age was more than 55. Which can be expressed with symbols as: \(H_{0}\): \(\mu = 55 \) and \(H_{1}\): \(\mu > 55 \). This is a 'right-tailed' test, because the alternative hypothesis claims that the mean is more than in the null hypothesis. If the data supports the alternative hypothesis, we reject the null hypothesis and accept the alternative hypothesis. 3. Deciding the Significance Level. The significance level (\(\alpha\)) is the uncertainty we accept when rejecting the null hypothesis in a hypothesis test. The significance level is a percentage probability of accidentally making the wrong conclusion. Typical significance levels are: \(\alpha = 0.1\) (10%), \(\alpha = 0.05\) (5%), and \(\alpha = 0.01\) (1%).
A lower significance level means that the evidence in the data needs to be stronger to reject the null hypothesis. There is no "correct" significance level; it only states the uncertainty of the conclusion. Note: A 5% significance level means that when we reject a null hypothesis, we expect to reject a true null hypothesis 5 out of 100 times. 4. Calculating the Test Statistic. The test statistic is used to decide the outcome of the hypothesis test. The test statistic is a standardized value calculated from the sample. The formula for the test statistic (TS) of a population mean is: \(\displaystyle \frac{\bar{x} - \mu}{s} \cdot \sqrt{n} \), where \(\bar{x}-\mu\) is the difference between the sample mean (\(\bar{x}\)) and the claimed population mean (\(\mu\)), \(s\) is the sample standard deviation, and \(n\) is the sample size. In our example: the claimed (\(H_{0}\)) population mean (\(\mu\)) was \( 55 \), the sample mean (\(\bar{x}\)) was \(62.1\), the sample standard deviation (\(s\)) was \(13.46\), and the sample size (\(n\)) was \(30\). So the test statistic (TS) is then: \(\displaystyle \frac{62.1-55}{13.46} \cdot \sqrt{30} = \frac{7.1}{13.46} \cdot \sqrt{30} \approx 0.528 \cdot 5.477 = \underline{2.889}\). You can also calculate the test statistic using programming language functions: with Python, use the scipy and math libraries; with R, use built-in math and statistics functions. 5. Concluding. There are two main approaches for making the conclusion of a hypothesis test: the critical value approach and the P-value approach.
Note: The two approaches are only different in how they present the conclusion. The Critical Value Approach. For the critical value approach we need to find the critical value (CV) for the significance level (\(\alpha\)). For a population mean test, the critical value (CV) is a T-value from a student's t-distribution. This critical T-value (CV) defines the rejection region for the test. The rejection region is an area of probability in the tails of the t-distribution. Because the claim is that the population mean is more than 55, the rejection region is in the right tail. The student's t-distribution is adjusted for the uncertainty from smaller samples. This adjustment is called degrees of freedom (df), which is the sample size \((n) - 1\). In this case the degrees of freedom (df) is: \(30 - 1 = \underline{29} \). Choosing a significance level (\(\alpha\)) of 0.01, or 1%, we can find the critical T-value from a T-table, or with a programming language function: with Python, use the Scipy Stats library t.ppf() function to find the T-value for an \(\alpha\) = 0.01 at 29 degrees of freedom (df); with R, use the built-in qt() function. Using either method we can find that the critical T-value is \(\approx \underline{2.462}\). For a right-tailed test we need to check if the test statistic (TS) is bigger than the critical value (CV). If the test statistic is bigger than the critical value, the test statistic is in the rejection region. When the test statistic is in the rejection region, we reject the null hypothesis (\(H_{0}\)). Here, the test statistic (TS) was \(\approx \underline{2.889}\) and the critical value was \(\approx \underline{2.462}\). A graph of this test would show the test statistic falling inside the right-tail rejection region. Since the test statistic was bigger than the critical value, we reject the null hypothesis. This means that the sample data supports the alternative hypothesis.
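The critical-value comparison can be sketched in Python (assuming scipy is available, since the text itself points to the Scipy Stats t.ppf() function):

```python
from scipy.stats import t

alpha, df = 0.01, 29

# Right-tailed critical value: the T-value with probability alpha above it
cv = t.ppf(1 - alpha, df)
print(round(cv, 3))   # 2.462

ts = 2.889            # test statistic computed in step 4
print(ts > cv)        # True, so H0 is rejected
```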
And we can summarize the conclusion stating: the sample data supports the claim that "The average age of Nobel Prize winners when they received the prize is more than 55" at a 1% significance level. The P-Value Approach. For the P-value approach we need to find the P-value of the test statistic (TS). If the P-value is smaller than the significance level (\(\alpha\)), we reject the null hypothesis (\(H_{0}\)). The test statistic was found to be \( \approx \underline{2.889} \). For a population mean test, the test statistic is a T-value from a student's t-distribution. Because this is a right-tailed test, we need to find the P-value of a T-value bigger than 2.889. The student's t-distribution is adjusted according to degrees of freedom (df), which is the sample size \((n) - 1 = 30 - 1 = \underline{29}\). We can find the P-value using a T-table, or with a programming language function: with Python, use the Scipy Stats library t.cdf() function to find the P-value of a T-value bigger than 2.889 at 29 degrees of freedom (df); with R, use the built-in pt() function. Using either method we can find that the P-value is \(\approx \underline{0.0036}\). This tells us that the significance level (\(\alpha\)) would need to be bigger than 0.0036, or 0.36%, to reject the null hypothesis. This P-value is smaller than any of the common significance levels (10%, 5%, 1%), so the null hypothesis is rejected at all of these significance levels. The sample data supports the claim that "The average age of Nobel Prize winners when they received the prize is more than 55" at a 10%, 5%, or 1% significance level. Note: An outcome of a hypothesis test that rejects the null hypothesis with a P-value of 0.36% means that we only expect to reject a true null hypothesis 36 out of 10000 times.
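The P-value step can likewise be sketched with scipy (assumed to be available; t.sf() is the right-tail complement of t.cdf()):

```python
from scipy.stats import t

ts, df = 2.889, 29

# Right-tailed P-value: probability of a T-value bigger than the test statistic
p_value = 1 - t.cdf(ts, df)   # equivalently: t.sf(ts, df)
print(p_value)                # approximately 0.0036, as in the text

alpha = 0.01
print(p_value < alpha)        # True, so H0 is rejected
```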
Calculating a P-Value for a Hypothesis Test with Programming. Many programming languages can calculate the P-value to decide the outcome of a hypothesis test. Using software and programming to calculate statistics is more common for bigger sets of data, as calculating manually becomes difficult. The P-value calculated here will tell us the lowest possible significance level where the null hypothesis can be rejected. With Python, use the scipy and math libraries to calculate the P-value for a right-tailed hypothesis test for a mean. Here, the sample size is 30, the sample mean is 62.1, the sample standard deviation is 13.46, and the test is for a mean bigger than 55. With R, use built-in math and statistics functions to find the P-value for a right-tailed hypothesis test for a mean. Left-Tailed and Two-Tailed Tests. This was an example of a right-tailed test, where the alternative hypothesis claimed that the parameter is bigger than the null hypothesis claim. Equivalent step-by-step guides exist for left-tailed and two-tailed tests.
Choosing the Right Statistical Test: Types & Examples. Published on January 28, 2020 by Rebecca Bevans. Revised on June 22, 2023. Statistical tests are used in hypothesis testing. They can be used to determine whether a predictor variable has a statistically significant relationship with an outcome variable, or to estimate the difference between two or more groups.
Statistical tests assume a null hypothesis of no relationship or no difference between groups. Then they determine whether the observed data fall outside of the range of values predicted by the null hypothesis. If you already know what types of variables you're dealing with, you can use the flowchart (see "Statistical tests flowchart" below) to choose the right statistical test for your data. Statistical tests work by calculating a test statistic, a number that describes how much the relationship between variables in your test differs from the null hypothesis of no relationship. It then calculates a p value (probability value). The p value estimates how likely it is that you would see the difference described by the test statistic if the null hypothesis of no relationship were true. If the value of the test statistic is more extreme than the statistic calculated from the null hypothesis, then you can infer a statistically significant relationship between the predictor and outcome variables. If the value of the test statistic is less extreme than the one calculated from the null hypothesis, then you can infer no statistically significant relationship between the predictor and outcome variables.
You can perform statistical tests on data that have been collected in a statistically valid manner, either through an experiment, or through observations made using probability sampling methods. For a statistical test to be valid, your sample size needs to be large enough to approximate the true distribution of the population being studied. To determine which statistical test to use, you need to know whether your data meets certain statistical assumptions and what types of variables you're dealing with.
Statistical assumptions. Statistical tests make some common assumptions about the data they are testing: independence of observations, homogeneity of variance, and normality of the data.
If your data do not meet the assumptions of normality or homogeneity of variance, you may be able to perform a nonparametric statistical test, which allows you to make comparisons without any assumptions about the data distribution. If your data do not meet the assumption of independence of observations, you may be able to use a test that accounts for structure in your data (repeated-measures tests or tests that include blocking variables). Types of variables. The types of variables you have usually determine what type of statistical test you can use. Quantitative variables represent amounts of things (e.g. the number of trees in a forest). Types of quantitative variables include continuous variables and discrete variables.
Categorical variables represent groupings of things (e.g. the different tree species in a forest). Types of categorical variables include ordinal variables (data with an order), nominal variables (names of groups), and binary variables (yes/no outcomes).
Choose the test that fits the types of predictor and outcome variables you have collected (if you are doing an experiment, these are the independent and dependent variables). Consult the tables below to see which test best matches your variables. Parametric tests usually have stricter requirements than nonparametric tests, and are able to make stronger inferences from the data. They can only be conducted with data that adheres to the common assumptions of statistical tests. The most common types of parametric tests include regression tests, comparison tests, and correlation tests. Regression tests. Regression tests look for cause-and-effect relationships. They can be used to estimate the effect of one or more continuous variables on another variable.
Comparison tests. Comparison tests look for differences among group means. They can be used to test the effect of a categorical variable on the mean value of some other characteristic. T-tests are used when comparing the means of precisely two groups (e.g., the average heights of men and women). ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults).
Correlation tests. Correlation tests check whether variables are related without hypothesizing a cause-and-effect relationship. These can be used to test whether two variables you want to use in (for example) a multiple regression test are autocorrelated.
Nonparametric tests don’t make as many assumptions about the data, and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren’t as strong as with parametric tests.
This flowchart helps you choose among parametric tests. For nonparametric alternatives, check the table above. If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.
Statistical tests commonly assume that: (1) the observations are independent, (2) the groups being compared have similar variance, and (3) the data are normally distributed.
If your data does not meet these assumptions you might still be able to use a nonparametric statistical test, which has fewer requirements but also makes weaker inferences. A test statistic is a number calculated by a statistical test. It describes how far your observed data is from the null hypothesis of no relationship between variables or no difference among sample groups. The test statistic tells you how different two or more groups are from the overall population mean, or how different a linear slope is from the slope predicted by a null hypothesis. Different test statistics are used in different statistical tests. Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p value, or probability value. Statistical significance is arbitrary; it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis. When the p value falls below the chosen alpha value, then we say the result of the test is statistically significant. Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age). Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips). You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results. Discrete and continuous variables are two types of quantitative variables: discrete variables represent counts, while continuous variables represent measurable amounts.
Cite this Scribbr article: Bevans, R. (2023, June 22). Choosing the Right Statistical Test: Types & Examples. Scribbr. Retrieved July 5, 2024, from https://www.scribbr.com/statistics/statisticaltests/ Hypothesis Testing. Hypothesis testing is a tool for making statistical inferences about the population data. It is an analysis tool that tests assumptions and determines how likely something is within a given standard of accuracy. Hypothesis testing provides a way to verify whether the results of an experiment are valid. A null hypothesis and an alternative hypothesis are set up before performing the hypothesis testing. This helps to arrive at a conclusion regarding the sample obtained from the population. In this article, we will learn more about hypothesis testing, its types, steps to perform the testing, and associated examples.
What is Hypothesis Testing in Statistics? Hypothesis testing uses sample data from the population to draw useful conclusions regarding the population probability distribution. It tests an assumption made about the data using different types of hypothesis testing methodologies. The hypothesis testing results in either rejecting or not rejecting the null hypothesis. Hypothesis Testing Definition. Hypothesis testing can be defined as a statistical tool that is used to identify if the results of an experiment are meaningful or not. It involves setting up a null hypothesis and an alternative hypothesis. These two hypotheses will always be mutually exclusive. This means that if the null hypothesis is true then the alternative hypothesis is false, and vice versa. An example of hypothesis testing is setting up a test to check if a new medicine works on a disease in a more efficient manner. Null Hypothesis. The null hypothesis is a concise mathematical statement that is used to indicate that there is no difference between two possibilities. In other words, there is no difference between certain characteristics of data. This hypothesis assumes that the outcomes of an experiment are based on chance alone. It is denoted as \(H_{0}\). Hypothesis testing is used to conclude if the null hypothesis can be rejected or not. Suppose an experiment is conducted to check if girls are shorter than boys at the age of 5. The null hypothesis will say that they are the same height. Alternative Hypothesis. The alternative hypothesis is an alternative to the null hypothesis. It is used to show that the observations of an experiment are due to some real effect. It indicates that there is a statistical significance between two possible outcomes and can be denoted as \(H_{1}\) or \(H_{a}\). For the above-mentioned example, the alternative hypothesis would be that girls are shorter than boys at the age of 5.
Hypothesis Testing P Value. In hypothesis testing, the p value is used to indicate whether the results obtained after conducting a test are statistically significant or not. It also indicates the probability of making an error in rejecting or not rejecting the null hypothesis. This value is always a number between 0 and 1. The p value is compared to an alpha level, \(\alpha\), or significance level. The alpha level can be defined as the acceptable risk of incorrectly rejecting the null hypothesis. The alpha level is usually chosen between 1% and 5%. Hypothesis Testing Critical Region. All sets of values that lead to rejecting the null hypothesis lie in the critical region. Furthermore, the value that separates the critical region from the non-critical region is known as the critical value. Hypothesis Testing Formula. Depending upon the type of data available and the size, different types of hypothesis testing are used to determine whether the null hypothesis can be rejected or not. The hypothesis testing formulas for some important test statistics are given below:
We will learn more about these test statistics in the upcoming section.

Types of Hypothesis Testing
Selecting the correct test for performing hypothesis testing can be confusing. These tests are used to determine a test statistic on the basis of which the null hypothesis can either be rejected or not rejected. Some of the important tests used for hypothesis testing are given below.

Hypothesis Testing Z Test
A z test is a hypothesis testing method used for a large sample size (n ≥ 30). It is used to determine whether there is a difference between the population mean and the sample mean when the population standard deviation is known. It can also be used to compare the means of two samples. The formulas for the z test statistic are as follows:

One sample: z = \(\frac{\overline{x}-\mu}{\sigma / \sqrt{n}}\)

Two samples: z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\)
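As a minimal sketch, the one-sample z statistic, z = (x̄ − μ)/(σ/√n), and its two-sided p-value can be computed with only the Python standard library. The numbers here are made-up illustrations, not values from the text:

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """One-sample z statistic: distance of the sample mean from the
    hypothesized mean, measured in standard errors."""
    return (xbar - mu0) / (sigma / sqrt(n))

# Illustrative numbers (not from the text): xbar=52, mu0=50, sigma=10, n=100.
z = one_sample_z(52, 50, 10, 100)
# Two-sided p-value from the standard normal CDF.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_two_sided, 4))  # 2.0 0.0455
```

Since 0.0455 < 0.05, a two-tailed test at the 5% level would reject the null hypothesis for this sample.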
Hypothesis Testing t Test
The t test is another method of hypothesis testing, used for a small sample size (n < 30). It is also used to compare the sample mean and the population mean; however, the population standard deviation is not known, and the sample standard deviation is used instead. The means of two samples can also be compared using the t test.
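The one-sample t statistic has the same shape as the z statistic, with the sample standard deviation s in place of σ. A sketch with a small hypothetical sample (the data values below are assumptions for illustration):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """One-sample t statistic: like the z statistic, but the unknown
    population sigma is replaced by the sample standard deviation."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)          # sample standard deviation (n - 1 denominator)
    t = (xbar - mu0) / (s / sqrt(n))
    return t, n - 1            # statistic and degrees of freedom

# Hypothetical small sample (n < 30), testing H0: mu = 4.0.
t, df = one_sample_t([4.1, 3.8, 4.4, 3.9, 4.2, 4.0, 3.7, 4.3], 4.0)
print(round(t, 3), df)  # 0.577 7
```

The statistic is then compared against the Student's t distribution with n − 1 degrees of freedom rather than the standard normal.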
Hypothesis Testing Chi-Square
The chi-square test is a hypothesis testing method used to check whether the variables in a population are independent. It is used when the test statistic is chi-squared distributed.

One-Tailed Hypothesis Testing
One-tailed hypothesis testing is done when the rejection region lies in only one direction. It is also known as directional hypothesis testing because the effect can be tested in one direction only. This type of testing is further classified into the right-tailed test and the left-tailed test.

Right-Tailed Hypothesis Testing
The right-tail test is also known as the upper-tail test. It is used to check whether the population parameter is greater than some value. The null and alternative hypotheses for this test are given as follows:

\(H_{0}\): The population parameter is ≤ some value
\(H_{1}\): The population parameter is > some value

If the test statistic is greater than the critical value, the null hypothesis is rejected.

Left-Tailed Hypothesis Testing
The left-tail test is also known as the lower-tail test. It is used to check whether the population parameter is less than some value. The hypotheses can be written as follows:

\(H_{0}\): The population parameter is ≥ some value
\(H_{1}\): The population parameter is < some value

The null hypothesis is rejected if the test statistic is less than the critical value.

Two-Tailed Hypothesis Testing
In this hypothesis testing method, the critical region lies on both sides of the sampling distribution. It is also known as a non-directional hypothesis testing method. The two-tailed test is used when it needs to be determined whether the population parameter differs from some value.
The hypotheses can be set up as follows:

\(H_{0}\): the population parameter = some value
\(H_{1}\): the population parameter ≠ some value

The null hypothesis is rejected if the absolute value of the test statistic exceeds the critical value, that is, if the statistic falls in either rejection region.

Hypothesis Testing Steps
Hypothesis testing can be performed in five simple steps. The most important step is to correctly set up the hypotheses and identify the right method for hypothesis testing. The basic steps to perform hypothesis testing are as follows:
Hypothesis Testing Example
The best way to solve a problem on hypothesis testing is by applying the 5 steps mentioned in the previous section. Suppose a researcher claims that the mean weight of men is greater than 100 kg, with a standard deviation of 15 kg. 30 men are chosen, with an average weight of 112.5 kg. Using hypothesis testing, check whether there is enough evidence to support the researcher's claim. The confidence level is 95%.

Step 1: This is an example of a right-tailed test. Set up the null hypothesis as \(H_{0}\): \(\mu\) = 100.
Step 2: The alternative hypothesis is given by \(H_{1}\): \(\mu\) > 100.
Step 3: As this is a one-tailed test, \(\alpha\) = 100% − 95% = 5%. This can be used to determine the critical value: 1 − \(\alpha\) = 1 − 0.05 = 0.95 gives the required area under the curve. Using a normal distribution table, the area 0.95 corresponds to z = 1.645. A similar process can be followed for a t test; the only additional requirement is to calculate the degrees of freedom, given by n − 1.
Step 4: Calculate the z test statistic, because the sample size is 30 and the sample mean, population mean, and population standard deviation are all known. With z = \(\frac{\overline{x}-\mu}{\sigma / \sqrt{n}}\), \(\mu\) = 100, \(\overline{x}\) = 112.5, n = 30, and \(\sigma\) = 15, we get z = \(\frac{112.5-100}{15 / \sqrt{30}}\) = 4.56.
Step 5: Conclusion. As 4.56 > 1.645, the null hypothesis can be rejected.

Hypothesis Testing and Confidence Intervals
Confidence intervals form an important part of hypothesis testing, because the alpha level can be determined from a given confidence level. Suppose the confidence level is 95%. Subtracting it from 100% gives 100% − 95% = 5%, or 0.05. This is the alpha value for a one-tailed hypothesis test. To obtain the alpha value for a two-tailed hypothesis test, divide this value by 2: 0.05 / 2 = 0.025.
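The five steps of the worked example (H0: μ = 100 vs H1: μ > 100, x̄ = 112.5, σ = 15, n = 30, α = 0.05) can be checked numerically with a short stdlib sketch:

```python
from math import sqrt
from statistics import NormalDist

# The worked example above: H0: mu = 100 vs H1: mu > 100 (right-tailed),
# with xbar = 112.5, sigma = 15, n = 30, alpha = 0.05.
mu0, xbar, sigma, n, alpha = 100, 112.5, 15, 30, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value, about 1.645

# Reject H0 when the statistic exceeds the right-tailed critical value.
print(round(z, 2), round(z_crit, 3), z > z_crit)  # 4.56 1.645 True
```

The computed statistic and critical value match the table lookup in Step 3, and the comparison reproduces the conclusion in Step 5.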
FAQs on Hypothesis Testing

What is hypothesis testing?
Hypothesis testing in statistics is a tool that is used to make inferences about the population data. It is also used to check if the results of an experiment are valid.

What is the z Test in Hypothesis Testing?
The z test in hypothesis testing is used to find the z test statistic for normally distributed data. The z test is used when the standard deviation of the population is known and the sample size is greater than or equal to 30.

What is the t Test in Hypothesis Testing?
The t test in hypothesis testing is used when the data follow a Student's t distribution. It is used when the sample size is less than 30 and the standard deviation of the population is not known.

What is the formula for the z test in Hypothesis Testing?
The formula for a one-sample z test in hypothesis testing is z = \(\frac{\overline{x}-\mu}{\sigma / \sqrt{n}}\), and for two samples it is z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

What is the p-Value in Hypothesis Testing?
The p-value helps to determine if the test results are statistically significant. In hypothesis testing, the null hypothesis can either be rejected or not rejected based on the comparison between the p-value and the alpha level.

What is One-Tail Hypothesis Testing?
When the rejection region is only on one side of the distribution curve, it is known as one-tail hypothesis testing. The right-tail test and the left-tail test are the two types of directional hypothesis testing.

What is the Alpha Level in Two-Tail Hypothesis Testing?
To get the alpha level in a two-tail hypothesis test, divide \(\alpha\) by 2. This is done because there are two rejection regions in the curve.
8.5 Hypothesis Tests for One Population Mean μ
Recall that there are two different procedures used to construct confidence intervals for one population mean [latex]\mu[/latex]: the one-sample Z interval (used when the population standard deviation [latex]\sigma[/latex] is known) and the one-sample t-interval (used when [latex]\sigma[/latex] is unknown). In a similar vein, there are two different procedures for hypothesis tests for one population mean: the one-sample Z test is used when [latex]\sigma[/latex] is known, and the one-sample t test is used when [latex]\sigma[/latex] is unknown.

8.5.1 One-Sample Z Test When σ is Known
Assumptions:
Example: One-Sample Z Test
A machine fills beer into bottles whose volume is supposed to be 341 ml, but the exact amount varies from bottle to bottle. We randomly picked 100 bottles and obtained the sample mean volume of 339 ml. Assume the population standard deviation is [latex]\sigma = 5[/latex] ml. Test at the 5% significance level whether the machine is NOT working properly. Check the assumptions:
[latex]z_o = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{339 - 341}{5 / \sqrt{100}} = \frac{-2}{0.5} = -4.[/latex]
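The observed statistic and its two-tailed P-value for this example can be reproduced with a few lines of Python (a sketch using the standard normal CDF from the standard library):

```python
from math import sqrt
from statistics import NormalDist

# The bottle-filling example: mu0 = 341, xbar = 339, sigma = 5, n = 100.
z0 = (339 - 341) / (5 / sqrt(100))        # observed statistic, -4.0
p_value = 2 * NormalDist().cdf(-abs(z0))  # two-tailed P-value
print(z0, p_value < 0.05)                 # -4.0 True
```

The P-value (about 0.00006) is far below 0.05, so the null hypothesis that the machine works properly is rejected.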
If using the critical value approach, steps 1–3 are the same, and steps 4–6 become:
The P-value approach is preferred for the following reasons:
8.5.2 One-Sample t-Test When σ is Unknown
Example: One-Sample t Test
A computer company claims that the average lifetime of its laptops is about 4 years. A simple random sample of 36 laptops yields an average lifetime of 3.5 years with a sample standard deviation of 4.2 years. Test at the 1% significance level whether the mean lifetime of this brand of laptops is less than 4 years.
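A sketch of the test statistic for this example; the left-tail critical value quoted in the comment is a standard t-table value, stated here as an assumption rather than computed:

```python
from math import sqrt

# The laptop example: H0: mu = 4 vs H1: mu < 4 (left-tailed),
# with xbar = 3.5, s = 4.2, n = 36, so df = 35.
xbar, mu0, s, n = 3.5, 4.0, 4.2, 36

t0 = (xbar - mu0) / (s / sqrt(n))
# From a t table, the 1% left-tail critical value with df = 35 is about -2.438;
# t0 is well above it, so H0 is not rejected at the 1% level.
print(round(t0, 3))  # -0.714
```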
If we use the critical value approach, steps 1–3 are the same, and steps 4–6 become:
Exercise: P-value for One-Sample t-Test
Use the same setting as the previous example (one-sample t-test with df = 35) to find the P-values of the following hypothesis tests.
Exercise: One-Sample t-Test
The number of cell phone users has increased dramatically since 1997. Suppose the mean local monthly bill was $50 for cell phone users in the United States in 2006. A simple random sample of 50 cell phone users was obtained in 2019, and the sample mean local monthly bill was [latex]\bar{x} = \$55[/latex] with a sample standard deviation [latex]s = \$25[/latex].
Introduction to Applied Statistics Copyright © 2024 by Wanhua Su is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
What Is Hypothesis Testing in Statistics? Types and Examples
Lesson 10 of 24. By Avijeet Biswal

In today’s data-driven world, decisions are based on data all the time. Hypotheses play a crucial role in that process, whether in making business decisions, in the health sector, in academia, or in quality improvement. Without hypotheses and hypothesis tests, you risk drawing the wrong conclusions and making bad decisions. In this tutorial, you will look at hypothesis testing in statistics.
What Is Hypothesis Testing in Statistics?
Hypothesis testing is a type of statistical analysis in which you put your assumptions about a population parameter to the test. It is used to estimate the relationship between two statistical variables. Let's discuss a few examples of statistical hypotheses from real life.
Now that you know about hypothesis testing, look at the two types of hypothesis testing in statistics.

Hypothesis Testing Formula
Z = (x̅ − μ0) / (σ / √n)
How Does Hypothesis Testing Work?
An analyst performs hypothesis testing on a statistical sample to present evidence of the plausibility of the null hypothesis. Measurements and analyses are conducted on a random sample of the population to test a theory. Analysts use a random population sample to test two hypotheses: the null and alternative hypotheses. The null hypothesis is typically an equality hypothesis between population parameters; for example, a null hypothesis may claim that the population mean return equals zero. The alternative hypothesis is essentially the inverse of the null hypothesis (e.g., the population mean return is not equal to zero). As a result, they are mutually exclusive, and only one can be correct; one of the two, however, will always be correct.

Null Hypothesis and Alternate Hypothesis
The null hypothesis is the assumption that the event will not occur. A null hypothesis has no bearing on the study's outcome unless it is rejected. H0 is the symbol for it, and it is pronounced "H-naught". The alternate hypothesis is the logical opposite of the null hypothesis. The acceptance of the alternative hypothesis follows the rejection of the null hypothesis. H1 is the symbol for it.

Let's understand this with an example. A sanitizer manufacturer claims that its product kills 95 percent of germs on average. To put this company's claim to the test, create a null and alternate hypothesis. H0 (Null Hypothesis): Average = 95%. Alternative Hypothesis (H1): The average is less than 95%. Another straightforward example to understand this concept is determining whether or not a coin is fair and balanced. The null hypothesis states that the probability of a show of heads is equal to the likelihood of a show of tails. In contrast, the alternate theory states that the probability of a show of heads and tails would be very different.
Hypothesis Testing Calculation With Examples
Let's consider a hypothesis test for the average height of women in the United States. Suppose our null hypothesis is that the average height is 5'4". We gather a sample of 100 women and determine that their average height is 5'5". The population standard deviation is 2 inches. To calculate the z-score, we use the formula z = (x̅ − μ0) / (σ / √n). Working in inches, x̅ = 65, μ0 = 64, σ = 2, and n = 100, so z = (65 − 64) / (2 / √100) = 1 / 0.2 = 5. We reject the null hypothesis, as a z-score of 5 is far beyond any common critical value, and conclude that there is evidence to suggest that the average height of women in the US is greater than 5'4".

Steps of Hypothesis Testing
Hypothesis testing is a statistical method to determine if there is enough evidence in a sample of data to infer that a certain condition is true for the entire population. Here’s a breakdown of the typical steps involved in hypothesis testing:

Formulate Hypotheses
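Recomputing the height example numerically, with everything expressed in inches (x̅ = 65, μ0 = 64, σ = 2, n = 100), confirms the z-score:

```python
from math import sqrt

# The height example, worked in inches:
# mu0 = 64 (5'4"), xbar = 65 (5'5"), sigma = 2, n = 100.
z = (65 - 64) / (2 / sqrt(100))
print(z)  # 5.0
```

The standard error is 2/√100 = 0.2 inches, and the one-inch difference in means is five standard errors, so z = 5.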
Choose the Significance Level (α)
The significance level, often denoted by alpha (α), is the probability of rejecting the null hypothesis when it is true. Common choices for α are 0.05 (5%), 0.01 (1%), and 0.10 (10%).

Select the Appropriate Test
Choose a statistical test based on the type of data and the hypothesis. Common tests include t-tests, chi-square tests, ANOVA, and regression analysis. The selection depends on data type, distribution, sample size, and whether the hypothesis is one-tailed or two-tailed.

Collect Data
Gather the data that will be analyzed in the test. This data should be representative of the population to infer conclusions accurately.

Calculate the Test Statistic
Based on the collected data and the chosen test, calculate a test statistic that reflects how much the observed data deviates from the null hypothesis.

Determine the p-value
The p-value is the probability of observing test results at least as extreme as the results observed, assuming the null hypothesis is correct. It helps determine the strength of the evidence against the null hypothesis.

Make a Decision
Compare the p-value to the chosen significance level:
Report the Results
Present the findings from the hypothesis test, including the test statistic, p-value, and the conclusion about the hypotheses.

Perform Post-hoc Analysis (if necessary)
Depending on the results and the study design, further analysis may be needed to explore the data more deeply or to address multiple comparisons if several hypotheses were tested simultaneously.

Types of Hypothesis Testing

Z Test
To determine whether a discovery or relationship is statistically significant, hypothesis testing can use a z-test. It usually checks to see if two means are the same (the null hypothesis). A z-test can be applied only when the population standard deviation is known and the sample size is 30 data points or more.

t-Test
A statistical test called a t-test is employed to compare the means of two groups. It is frequently used in hypothesis testing to determine whether two groups differ or whether a procedure or treatment affects the population of interest.

Chi-Square
You utilize a chi-square test for hypothesis testing concerning whether your data are as predicted. To determine whether the expected and observed results are well-fitted, the chi-square test analyzes the differences between categorical variables from a random sample. The test's fundamental premise is that the observed values in your data should be compared to the predicted values that would be present if the null hypothesis were true.

Hypothesis Testing and Confidence Intervals
Both confidence intervals and hypothesis tests are inferential techniques that depend on approximating the sampling distribution. Data from a sample is used to estimate a population parameter using confidence intervals. Data from a sample is used in hypothesis testing to examine a given hypothesis. We must have a postulated parameter to conduct hypothesis testing. Bootstrap distributions and randomization distributions are created using comparable simulation techniques.
The observed sample statistic is the focal point of a bootstrap distribution, whereas the null hypothesis value is the focal point of a randomization distribution. A variety of feasible population parameter estimates are included in confidence intervals. In this lesson, we created just two-tailed confidence intervals. There is a direct connection between these two-tailed confidence intervals and two-tailed hypothesis tests: the two typically provide the same results. In other words, a hypothesis test at the 0.05 level will virtually always fail to reject the null hypothesis if the 95% confidence interval contains the hypothesized value, and it will nearly certainly reject the null hypothesis if the 95% confidence interval does not include the hypothesized parameter.

Simple and Composite Hypothesis Testing
Depending on the population distribution, you can classify a statistical hypothesis into two types.
Simple Hypothesis: A simple hypothesis specifies an exact value for the parameter.
Composite Hypothesis: A composite hypothesis specifies a range of values.
A company claiming that its average sales for this quarter are 1000 units is an example of a simple hypothesis. If the company claims that the sales are in the range of 900 to 1000 units, that is a case of a composite hypothesis.

One-Tailed and Two-Tailed Hypothesis Testing
The one-tailed test, also called a directional test, considers a critical region of data that would result in the null hypothesis being rejected if the test sample falls into it, which inevitably means accepting the alternate hypothesis. In a one-tailed test, the critical distribution area is one-sided, meaning the test sample is either greater or less than a specific value.
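The two-tailed confidence-interval/test connection described above can be sketched for a known-sigma z test; all numbers below are illustrative assumptions, not values from the text:

```python
from math import sqrt
from statistics import NormalDist

# Sketch of the two-tailed CI / two-tailed test duality, known-sigma case.
# Illustrative numbers: xbar=52, mu0=50, sigma=10, n=100, alpha=0.05.
xbar, mu0, sigma, n, alpha = 52.0, 50.0, 10.0, 100, 0.05

z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96
half_width = z_crit * sigma / sqrt(n)
ci = (xbar - half_width, xbar + half_width)       # 95% confidence interval

reject = abs((xbar - mu0) / (sigma / sqrt(n))) > z_crit
outside = not (ci[0] <= mu0 <= ci[1])
# The 0.05-level test rejects H0 exactly when mu0 lies outside the 95% CI.
print(reject, outside)  # True True
```

For this z test the equivalence is exact, because both the interval and the rejection rule are built from the same critical value and standard error.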
In a two-tailed test, the test sample is checked to be greater or less than a range of values, implying that the critical distribution area is two-sided. If the sample falls within this critical area, the alternate hypothesis will be accepted, and the null hypothesis will be rejected.

Right-Tailed Hypothesis Testing
If the greater-than (>) sign appears in your hypothesis statement, you are using a right-tailed test, also known as an upper test; in other words, the disparity is to the right. For instance, you can contrast the battery life before and after a change in production. Your hypothesis statements can be the following if you want to know whether the battery life is longer than the original (let's say 90 hours):
The crucial point in this situation is that the alternate hypothesis (H1), not the null hypothesis, decides whether you get a right-tailed test.

Left-Tailed Hypothesis Testing
Alternative hypotheses that assert the true value of a parameter is lower than the null hypothesis are tested with a left-tailed test; they are indicated by the "<" symbol. Suppose H0: mean = 50 and H1: mean ≠ 50. According to H1, the mean can be greater than or less than 50; this is an example of a two-tailed test. In a similar manner, if H0: mean ≥ 50, then H1: mean < 50. Here the mean is less than 50, so it is a one-tailed test.

Type 1 and Type 2 Errors
A hypothesis test can result in two types of errors.
Type 1 Error: A Type I error occurs when sample results reject the null hypothesis despite it being true.
Type 2 Error: A Type II error occurs when the null hypothesis is not rejected even though it is false.
Suppose a teacher evaluates an examination paper to decide whether a student passes or fails. H0: The student has passed. H1: The student has failed. A Type I error is the teacher failing the student [rejecting H0] although the student scored the passing marks [H0 was true]. A Type II error is the case where the teacher passes the student [does not reject H0] although the student did not score the passing marks [H1 is true].

Level of Significance
The alpha value is a criterion for determining whether a test statistic is statistically significant. In a statistical test, alpha represents an acceptable probability of a Type I error. Because alpha is a probability, it can be anywhere between 0 and 1. In practice, the most commonly used alpha values are 0.01, 0.05, and 0.1, which represent a 1%, 5%, and 10% chance of a Type I error, respectively (i.e., rejecting the null hypothesis when it is in fact correct). A p-value is a metric that expresses the likelihood that an observed difference could have occurred by chance.
As the p-value decreases, the statistical significance of the observed difference increases. If the p-value is too low, you reject the null hypothesis. Here you have taken an example in which you are trying to test whether a new advertising campaign has increased the product's sales. The p-value is the likelihood that the null hypothesis, which states that there is no change in the sales due to the new advertising campaign, is true. If the p-value is 0.30, then there is a 30% chance that there is no increase or decrease in the product's sales. If the p-value is 0.03, then there is a 3% probability that there is no increase or decrease in the sales value due to the new advertising campaign. As you can see, the lower the p-value, the higher the chance of the alternate hypothesis being true, which means that the new advertising campaign causes an increase or decrease in sales.

Why Is Hypothesis Testing Important in Research Methodology?
Hypothesis testing is crucial in research methodology for several reasons:
When Did Hypothesis Testing Begin?
Hypothesis testing as a formalized process began in the early 20th century, primarily through the work of statisticians such as Ronald A. Fisher, Jerzy Neyman, and Egon Pearson. The development of hypothesis testing is closely tied to the evolution of statistical methods during this period.
The dialogue between Fisher's and Neyman–Pearson's approaches shaped the methods and philosophy of statistical hypothesis testing used today. Fisher emphasized the evidential interpretation of the p-value, while Neyman and Pearson advocated a decision-theoretic approach in which hypotheses are either accepted or rejected based on predetermined significance levels and power considerations. The application and methodology of hypothesis testing have since become a cornerstone of statistical analysis across various scientific disciplines, marking a significant statistical development.

Limitations of Hypothesis Testing
Hypothesis testing has some limitations that researchers should be aware of:
After reading this tutorial, you should have a much better understanding of hypothesis testing, one of the most important concepts in the field of Data Science. The majority of hypotheses are based on speculation about observed behavior, natural phenomena, or established theories.

1. What is hypothesis testing in statistics with example?
Hypothesis testing is a statistical method used to determine if there is enough evidence in sample data to draw conclusions about a population. It involves formulating two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (Ha), and then collecting data to assess the evidence. An example: testing whether a new drug improves patient recovery (Ha) compared to the standard treatment (H0) based on collected patient data.

2. What is H0 and H1 in statistics?
In statistics, H0 and H1 represent the null and alternative hypotheses. The null hypothesis, H0, is the default assumption that no effect or difference exists between groups or conditions. The alternative hypothesis, H1, is the competing claim suggesting an effect or a difference. Statistical tests determine whether to reject the null hypothesis in favor of the alternative hypothesis based on the data.

3. What is a simple hypothesis with an example?
A simple hypothesis is a specific statement predicting a single relationship between two variables. It posits a direct and uncomplicated outcome. For example, a simple hypothesis might state, "Increased sunlight exposure increases the growth rate of sunflowers."
Here, the hypothesis suggests a direct relationship between the amount of sunlight (independent variable) and the growth rate of sunflowers (dependent variable), with no additional variables considered.

4. What are the 2 types of hypothesis testing?
The choice between one-tailed and two-tailed tests depends on the specific research question and the directionality of the expected effect.

5. What are the 3 major types of hypothesis?
The three major types of hypotheses are:
About the Author
Avijeet is a Senior Research Analyst at Simplilearn. Passionate about Data Analytics, Machine Learning, and Deep Learning, Avijeet is also interested in politics, cricket, and football.
Mean test for high-dimensional data based on covariance matrix with linear structures
In this work, the mean test is considered under the condition that the number of dimensions p is much larger than the sample size n, when the covariance matrix can be represented as a linear structure. First, the estimator of the coefficients in the linear structure of the covariance matrix is constructed, and an efficient covariance matrix estimator is then naturally given. Next, a new test statistic similar to the classical Hotelling's \(T^2\) test is proposed by replacing the sample covariance matrix with the given estimator of the covariance matrix. The asymptotic normality of the estimator of the coefficients and that of the new statistic for the mean test are then separately obtained under some mild conditions. Simulation results show that the performance of the proposed test statistic is almost the same as that of the Hotelling's \(T^2\) test statistic for which the covariance matrix is known. The new test statistic not only controls the nominal level reasonably well; it also gains greater empirical power than competing tests. It is found that the power of the mean test improves greatly when the structure information of the covariance matrix is considered, especially in high-dimensional cases. Moreover, an example with real data is provided to show the application of our approach.
References

Anderson TW (2003) An introduction to multivariate statistical analysis, 3rd edn. Wiley Series in Probability and Statistics
Bai ZD, Saranadasa H (1996) Effect of high dimension: by an example of a two sample problem. Stat Sin 6:311–329
Bickel P, Levina E (2008) Regularized estimation of large covariance matrices. Ann Stat 36:199–227
Cai TT, Liu W (2011) Adaptive thresholding for sparse covariance matrix estimation. J Am Stat Assoc 106:627–684
Chen LS, Paul D, Prentice RL, Wang P (2011) A regularized Hotelling's \(T^2\) test for pathway analysis in proteomic studies. J Am Stat Assoc 106:1345–1360
Chen SX, Qin YL (2010) A two-sample test for high-dimensional data with applications to gene-set testing. Ann Stat 38:808–835
Chen SX, Zhang LX, Zhong PS (2010) Tests for high dimensional covariance matrices. J Am Stat Assoc 109:810–819
Dong K, Pang H, Tong TJ et al (2016) Shrinkage-based diagonal Hotelling's tests for high-dimensional small sample size data. J Multivar Anal 143:127–142
Feng L, Zou CL, Wang ZJ (2016) Multivariate-sign-based high-dimensional tests for the two-sample location problem. J Am Stat Assoc 111:721–735
Feng L, Zou CL, Wang ZJ, Zhu LX (2017) Composite \(T^2\) test for high-dimensional data. Stat Sin 27:1419–1436
Guo WW, Cui HJ (2019) Projection tests for high-dimensional spiked covariance matrices. J Multivar Anal 169:21–32
Hotelling H (1931) The generalization of Student's ratio.
Ann Math Stat 2:360–378 Huang Y, Li CC, Li RZ, Yang SS (2022) An overview of tests on highdimensional means. J Multivar Anal 188:104813 Hu J, Bai ZD (2016) A review of 20 years of Naive tests of significance for highdimensional mean vectors and covariance matrices. Sci China Math 59:2281–2300 Liu W, Li YQ (2020) Signbased test for mean vector in highdimensional and sparse settings. Acta Math Sin 36:96–111 Lkhagvadorj S et al (2009) Microaaray gene expression profiles of fasting induced changes in liver and adipose tissues of pigs expressing the melanocortin4 receptor D298N variant. Physiol Genom 38:98–111 Muirhead RJ (1982) Aspects of multivariate statistical theory. Wiley, New York Book Google Scholar Park J, Nag Ayyala D (2013) A test for the mean vector in large dimension and small samples. J Stat Plan Inference 143:929–943 Pan GM, Zhou W (2011) Central limit theorem for Hotelling’s \(T^2\) statistic under large dimension. Ann Appl Probab 21:1860–1910 Srivastava MS, Du M (2008) A test for the mean vector with fewer observations than the dimension. J Multivar Anal 99:386–402 Srivastava MS, Katayama S, Kano Y (2013) A two sample test in high dimensional data. J Multivar Anal 119:349–358 Wang L, Peng B, Li RZ (2015) A highdimensional nonparametric multivariate test for mean vector. J Am Stat Assoc 110:1658–1669 Wang G, Cui H (2023) Cross projection test for highdimensional mean vectors. Satistica Sinica (online) Zheng SR, Chen Z, Cui HJ, Li RZ (2019) Hypothesis testing on linear structures of high dimensional covariance matrix. Ann Stat 47:3300–3334 Download references AcknowledgementsThis work is supported by the National Natural Science Foundation of China (12031016, 11971324, 11471223), the startup fund from Weifang University, and the Beijing Science and Technology Innovation Platform Construction Project Funding. Dr Simon WANG at the Language Centre, Hong Kong Baptist University has helped edit the manuscript. Author informationAuthors and affiliations. 
School of Mathematics and Statistics, Weifang University, No.5147, Dongfeng East Street, Weifang, China Guanpeng Wang Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Kowloon, Hong Kong Yuyuan Wang School of Mathematical Sciences, Capital Normal University, No. 105, North West Third Ring Road, Beijing, China Guanpeng Wang & Hengjian Cui You can also search for this author in PubMed Google Scholar Corresponding authorCorrespondence to Hengjian Cui . Ethics declarationsConflict of interest. On behalf of all authors, the corresponding author states that there is no Conflict of interest. Additional informationPublisher's note. Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Appendix A: Proof of the main resultsIn this section, we will proof our theoretical properties for Lemma and Theorems in above aforementioned. Proof of Theorem 1According to the minimized expression of formula ( 3 ), and \(\widehat{\varvec{\theta }}_n\) is a solution that satisfies the following equation Solve the Eq. ( A.1 ), we have \(\widehat{\varvec{\theta }}_n =\textbf{B}^{1}\textbf{s},\) where \(\textbf{B}=(\text {tr}(\textbf{A}_i\textbf{A}_j))_{K\times K}\) and \(\textbf{s}=(\text {tr}(\textbf{S}_n\textbf{A}_i))_{K\times 1}\) . Because the sample covariance matrix \(\textbf{S}_n\) is an unbiased estimator for population covariance matrix \(\varvec{\Sigma }\) , and it’s easy to have In summary, this completes proof the content (a) of Theorem 1 . In order to derive the content (b) of Theorem 1 . Firstly, we compute the expectation of \(\textbf{D}_{kl} = \textrm{E}[\text {tr}(\textbf{S}_n \textbf{A}_k)\cdot \text {tr}(\textbf{S}_n\textbf{A}_l)]\) . 
Since the sample covariance matrix \(\textbf{S}_n\) is invariant with respect to location parameter \(\varvec{\mu }\) , without loss of generality, let \(\varvec{\mu }=0\) , our calculation yields the following result Next, we only give the expression of \(\textrm{E}(X_1^T\textbf{A}_kX_1X_1^T\textbf{A}_kX_1)\) . Under the condition (C1), through the simple calculation of quadratic expectation, we can get It can be derived that Then we have the covariance matrix of vector \(\widehat{\varvec{\theta }}_n\) Note that for any \(\textbf{a}\in {R}^K\) , under the condition (C1), so we have From the Eq. ( A.2 ), then Since the parameter \(\varvec{\theta }\) in a compact set parameter space \({\varvec{\Theta }}\) , we can assume that \(\max \limits _{1\le k\le K}\theta _k\le c\) where c is a positive constant. Under the requirement of \(\max \limits _{1\le k\le K}\lambda _1(\textbf{A}_k^2)=1\) , it is noticed that for any \(\textbf{a}\in {R}^K, \textbf{a}\ne \textbf{0}\) , \((\text {tr}({\varvec{\Sigma }} \textbf{A}_i{\varvec{\Sigma }} \textbf{A}_j))_{K\times K}\) and \(\textbf{B}\) satisfy the following inequality For any \(\textbf{b}\in R^K, \textbf{b}^T\textbf{b}=1\) , by setting \(\textbf{a}=\textbf{B}^{1/2}\textbf{b}\) , then we have and combining ( A.3 ) In summary, this completes proof the content (b) of Theorem 1 . Finally, we will proof the content (c) of Theorem 1 . Similarly, we consider a simple condition that \(\textbf{B}=(b_{ij})_{K\times K}\) is diagonal matrix where \(b_{ij}=\text {tr}(\textbf{A}_i\textbf{A}_j)=0,~i\ne j\) . In this case, considering the asymptotic distribution for each elements \(\widehat{\theta }_{k}\) in \(\widehat{\varvec{\theta }}_n\) . 
Recalling the expression of \(\widehat{\theta }_{k}\) and we have Since the term \((X_i\varvec{\mu })^T\textbf{A}_k(X_i\varvec{\mu }),\) \(i=1,\ldots ,n\) ’s are independent and then we have by CLT that It’s easy to show that \(\textrm{E}(I_2)=0\) and \(\text {Var}(I_2)= 4\text {tr}^2({\varvec{\Sigma }}\textbf{A}_k)/[n(n1)\text {tr}^2(\textbf{A}_k^2)]\) , then we have \(\sqrt{n}I_2{\mathop {\rightarrow }\limits ^\textrm{P}}0.\) According to ( A.5 ), which implies that For the estimation vector \(\widehat{\varvec{\theta }}_n\) of the linear coefficients, we have the following asymptotic normality where \(V= \left( (\kappa 3)\text {tr}(\Gamma ^T\textbf{A}_i\Gamma \circ \Gamma ^T\textbf{A}_j\Gamma )+2\text {tr}({\varvec{\Sigma }} \textbf{A}_i {\varvec{\Sigma }} \textbf{A}_j) \right) _{K\times K}.\) For the general case \(\textbf{B}=(\text {tr}(\textbf{A}_i\textbf{A}_j))\) , it’s easy to prove that the asymptotic distribution of the vector \(\widehat{\varvec{\theta }}_n\) is normality distribution in ( A.6 ). Therefore, this finishes the proof of Theorem 1 . Proof of Lemma 1Rewrite \(\textbf{u}^T{\varvec{\Sigma }}^{1}\textbf{u}\) for any nonzero p dimensional vector \(\textbf{u}\in \mathbb {R}^p\) as \(\textbf{u}^T{\varvec{\Sigma }}^{1}\textbf{u}=\textbf{u}^T({\varvec{\Sigma }}^{1}\widehat{\varvec{\Sigma }}^{1})\textbf{u}+\textbf{u}^T\widehat{\varvec{\Sigma }}^{1}\textbf{u},\) then we have where \(\mathbf{A^*}=\widehat{\varvec{\Sigma }}^{1/2}({\varvec{\Sigma }}^{1}\widehat{\varvec{\Sigma }}^{1})\widehat{\varvec{\Sigma }}^{1/2}\) , and the Frobenius norms \(\Vert \textbf{A}^*\Vert _F^2=\sum _{i,j}{A^*_{ij}}^2.\) Next, it will prove that \(\text {tr}(\mathbf{A^*}^2)\) tends to 0 with probability. Note that Under condition (C2) holds, it is easy to get \(\text {tr}\left( \textbf{B}^{1/2}(\text {tr}({\varvec{\Sigma }} \textbf{A}_i{\varvec{\Sigma }} \textbf{A}_j))_{K\times K}\textbf{B}^{1/2}\right) \le cK,\) where c is a positive constant. 
Wang, G., Wang, Y. & Cui, H. (2024). Mean test for high-dimensional data based on covariance matrix with linear structures. Metrika. https://doi.org/10.1007/s00184-024-00971-3 (received 8 February 2023; accepted 31 May 2024; published 2 July 2024).
Full Hypothesis Test Examples. Example 8.2.4. Jeffrey, as an eight-year-old, established a mean time of 16.43 seconds for swimming the 25-yard freestyle, with a standard deviation of 0.8 seconds. His dad, Frank, thought that Jeffrey could swim the 25-yard freestyle faster using goggles.
Here is the logic of the analysis: given the alternative hypothesis (μ < 110), we want to know whether the observed sample mean is small enough to cause us to reject the null hypothesis. The observed sample mean produced a t statistic of -0.894.
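The snippet quotes the t statistic but not the underlying data. As a hedged sketch, the values below (n = 20, sample mean 108, sample standard deviation 10) are hypothetical — they are not from the source, but they reproduce a t statistic of magnitude 0.894 for the stated test of H0: μ = 110 against Ha: μ < 110 (negative, since the sample mean falls below 110):

```python
import math

# Hypothetical data (the snippet omits the actual sample):
# H0: mu = 110 vs Ha: mu < 110
mu0, xbar, s, n = 110, 108, 10, 20

# t statistic: (sample mean - hypothesized mean) / standard error
t = (xbar - mu0) / (s / math.sqrt(n))
print(round(t, 3))  # -0.894
```

A t this close to zero (well inside typical critical values for a one-tailed test) would not lead to rejecting the null hypothesis.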
In this case, the null hypothesis is a simple hypothesis and the alternative hypothesis is a two-sided hypothesis (i.e., it includes both $\mu \lt \mu_0$ and $\mu > \mu_0$). We call this hypothesis test a two-sided test.
Z scores above 2 and below -2 represent approximately 5% of all Z values. If the observed sample mean is close to the mean specified in H0 (here μ = 191), then Z will be close to zero. If the observed sample mean is much larger than the mean specified in H0, then Z will be large. In hypothesis testing, we select a critical value from the Z ...
It depends on the level of significance \(\alpha \) (step 2 of conducting a hypothesis test), and on the probability that the sample data would produce the observed result. In the next section, we set up the six steps for a hypothesis test for one mean.
The mean pregnancy length is 266 days. We test the following hypotheses. H0: μ = 266. Ha: μ < 266. Suppose a random sample of 40 women who smoke during their pregnancy have a mean pregnancy length of 260 days with a standard deviation of 21 days. The P-value is 0.04.
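The pregnancy-length test can be checked numerically with only the standard library. This sketch uses a normal approximation for the lower-tail p-value; it lands close to the quoted 0.04 (the exact value from a t-distribution with df = 39 is slightly larger, about 0.04):

```python
import math

# Data from the snippet: n = 40 smokers, sample mean 260 days,
# sample sd 21 days; H0: mu = 266 vs Ha: mu < 266.
n, xbar, s, mu0 = 40, 260.0, 21.0, 266.0

t = (xbar - mu0) / (s / math.sqrt(n))  # standardized test statistic

def norm_cdf(z):
    """Standard normal CDF built from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = norm_cdf(t)  # lower-tail p-value (normal approximation)
print(round(t, 3), round(p, 3))  # -1.807 0.035
```

Since the p-value is below 0.05, the data favor the alternative that smokers' pregnancies are shorter on average.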
5.3 - Hypothesis Testing for One-Sample Mean. In the previous section, we learned how to perform a hypothesis test for one proportion. The concepts of hypothesis testing remain constant for any hypothesis test. In these next few sections, we will present the hypothesis test for one mean. We start with our knowledge of the sampling distribution ...
Hypothesis testing example. You want to test whether there is a relationship between gender and height. Based on your knowledge of human physiology, you formulate a hypothesis that men are, on average, taller than women. ... Stating results in a statistics assignment: In our comparison of mean height between men and women we found an average ...
The null hypothesis. We test the null hypothesis that the mean is equal to a specific value: \(H_0: \mu = \mu_0\). The test statistic. We construct the test statistic by using the sample mean and the adjusted sample variance. The test statistic, called the t-statistic, is \(t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}\), where \(s\) is the adjusted (sample) standard deviation. The test of hypothesis based on it is called the t-test.
Full Hypothesis Test Examples. Example 8.3.6. Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor thinks the mean score is higher than 65. He samples ten statistics students and obtains the scores 65, 65, 70, 67, 66, 63, 63, 68, 72, 71.
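The ten scores quoted in the example are enough to carry out the test by hand. A minimal standard-library sketch:

```python
import math
import statistics

# Scores from the example; H0: mu = 65 vs Ha: mu > 65 (right-tailed).
scores = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]

n = len(scores)
xbar = statistics.mean(scores)   # 67.0
s = statistics.stdev(scores)     # sample standard deviation, about 3.197

t = (xbar - 65) / (s / math.sqrt(n))
print(round(t, 3))  # about 1.978; one-tailed p ~ 0.04 from a t-table, df = 9
```

With a p-value near 0.04, the instructor's claim that the mean exceeds 65 would be supported at the 5% level but not at the 1% level.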
Significance tests give us a formal process for using sample data to evaluate the likelihood of some claim about a population value. Learn how to conduct significance tests and calculate p-values to see how likely a sample result is to occur by random chance. You'll also see how we use p-values to make conclusions about hypotheses.
Null hypothesis: Mean IQ scores for children whose mothers smoke 10 or more cigarettes a day during pregnancy are the same as the mean for those whose mothers do not smoke, in populations similar to the one from which this sample was drawn. Alternative hypothesis: Mean IQ scores for children whose mothers smoke 10 or more cigarettes a day during pregnancy
The test statistic is used to decide the outcome of the hypothesis test. The test statistic is a standardized value calculated from the sample. The formula for the test statistic (TS) of a population mean is: \(\frac{\bar{x} - \mu}{s} \cdot \sqrt{n}\). Here \(\bar{x} - \mu\) is the difference between the sample mean (\(\bar{x}\)) and the claimed population mean (\(\mu\)).
ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults). Flattened table row (predictor variable / outcome variable): paired t-test — categorical, 1 predictor / quantitative, groups come from the same population.
In this video there was no critical value set for this experiment. In the last seconds of the video, Sal briefly mentions a p-value of 5% (0.05), which would have a critical value of z = ±1.96. Since the experiment produced a z-score of -3, which is more extreme than -1.96, we reject the null hypothesis.
A z test is a way of hypothesis testing that is used for a large sample size (n ≥ 30). It is used to determine whether there is a difference between the population mean and the sample mean when the population standard deviation is known.
Answer. Set up the hypothesis test: A 5% level of significance means that \(\alpha = 0.05\). This is a test of a single population mean. \(H_0: \mu = 65\); \(H_a: \mu > 65\). Since the instructor thinks the average score is higher, use a ">". The ">" means the test is right-tailed.
One-Sample Z Test. A machine fills beer into bottles whose volume is supposed to be 341 ml, but the exact amount varies from bottle to bottle. We randomly picked 100 bottles and obtained the sample mean volume of 339 ml. Assume the population standard deviation \(\sigma = 5\) ml. Test at the 5% significance level whether the machine is NOT working properly.
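This worked example can be verified directly. Because σ is known, the Z statistic and a two-sided p-value (built from the standard-library error function) settle the question:

```python
import math

# Beer-bottle example: n = 100, sample mean 339 ml, sigma = 5 ml;
# H0: mu = 341 vs Ha: mu != 341 (two-sided), alpha = 0.05.
n, xbar, sigma, mu0, alpha = 100, 339.0, 5.0, 341.0, 0.05

z = (xbar - mu0) / (sigma / math.sqrt(n))  # (339 - 341) / 0.5 = -4.0

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p = 2 * norm_cdf(-abs(z))   # two-sided p-value, about 6.3e-05
print(z, p < alpha)         # -4.0 True -> reject H0
```

A z of -4.0 is far beyond the ±1.96 cutoff at the 5% level, so we conclude the machine is not filling to 341 ml on average.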
At 6:43 Sal used the same "standard deviation of the difference of the sample mean" as in the video before. But in the last video of this chapter (named "Hypothesis test comparing population proportions"), he calculated a new standard deviation for the null hypothesis. ... In this video, I actually want to do a hypothesis test, really ...
8.3: Hypothesis Testing of Single Mean is shared under a license and was authored, remixed, and/or curated by LibreTexts. Previously, hypothesis testing for population means was described in the case of large samples. The statistical validity of the tests was ensured by the Central Limit Theorem, with essentially no ...
For a mean, the process of hypothesis testing can be conducted to look at data more closely. Dive into hypothesis testing, setting up the problem, and analyzing data, including some examples to ...
Example: Suppose H0: mean = 50 and H1: mean not equal to 50. According to H1, the mean can be greater than or less than 50. This is an example of a two-tailed test. ... Results are sample-specific: hypothesis testing is based on analyzing a sample from a population, and the conclusions drawn are specific to that particular sample.
To test whether the new pain reliever works more quickly than the standard one, \(50\) patients with minor surgeries were given the new pain reliever and their times to relief were recorded. The experiment yielded sample mean \(\bar{x}=3.1\) minutes and sample standard deviation \(s=1.5\) minutes.
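The snippet does not quote the standard reliever's mean time, so a complete test cannot be reconstructed from it. As a hedged sketch, the value mu0 = 3.5 minutes below is an assumed placeholder (not from the source), chosen only to show the large-sample lower-tailed test on the quoted n = 50, x̄ = 3.1, s = 1.5:

```python
import math

# n, xbar, s come from the snippet; mu0 = 3.5 is a HYPOTHETICAL
# baseline for the standard reliever, assumed for illustration only.
n, xbar, s, mu0 = 50, 3.1, 1.5, 3.5

# Large-sample test of H0: mu = mu0 vs Ha: mu < mu0
z = (xbar - mu0) / (s / math.sqrt(n))
print(round(z, 3), z < -1.645)  # -1.886 True at the 5% one-tailed cutoff
```

Under that assumed baseline, z falls below the 5% one-tailed critical value of -1.645, so the new reliever would be judged faster; with the true baseline the arithmetic is identical.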
Hypothesis testing involves designing a study and analyzing the data in order to see if the mean of the study significantly differs from the population mean. Designing a study can be done in three ...
The test statistic for a sample mean is a z-score (z), which is calculated using the formula: z = (X̄ - μ) / (σ / √n), where X̄ is the sample mean, μ is the population mean, σ is the population standard deviation, and n is the sample size. ... Hypothesis Testing with One Sample: A Comprehensive Guide. Hypothesis Testing with Two Samples ...
Even though this situation is not likely (knowing the population standard deviations is not likely), the following example illustrates hypothesis testing for independent means, ... .36 for sigma2, 3 for the first sample mean, 20 for n1, 2.9 for the second sample mean, and 20 for n2. Arrow down to \(\mu_1\): and arrow to \(> \mu_2\).
A random survey of 75 death row inmates revealed that the mean length of time on death row is 17.4 years with a standard deviation of 6.3 years. Conduct a hypothesis test to determine if the population mean time on death row could likely be 15 years. Is this a test of one mean or proportion? State the null and alternative hypotheses.
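The requested test is a test of one mean (time on death row is a quantitative measurement, not a proportion), with H0: μ = 15 against Ha: μ ≠ 15. The test statistic follows directly from the quoted numbers:

```python
import math

# Death-row example: n = 75 inmates, sample mean 17.4 years,
# sample sd 6.3 years; H0: mu = 15 vs Ha: mu != 15 (two-tailed).
n, xbar, s, mu0 = 75, 17.4, 6.3, 15.0

t = (xbar - mu0) / (s / math.sqrt(n))
print(round(t, 2))  # 3.3
```

A t statistic near 3.3 with df = 74 lies well beyond typical two-tailed critical values, so a population mean of 15 years is not plausible for these data.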
Assumptions. When you perform a hypothesis test of a single population mean \(\mu\) using a Student's \(t\)-distribution (often called a \(t\)-test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed.