two tailed hypothesis test p value


S.3.2 Hypothesis Testing (P-Value Approach)

The P-value approach involves determining "likely" or "unlikely" by determining the probability, assuming the null hypothesis were true, of observing a more extreme test statistic in the direction of the alternative hypothesis than the one observed. If the P-value is small, say less than (or equal to) \(\alpha\), then it is "unlikely." And, if the P-value is large, say more than \(\alpha\), then it is "likely."

If the P-value is less than (or equal to) \(\alpha\), then the null hypothesis is rejected in favor of the alternative hypothesis. And, if the P-value is greater than \(\alpha\), then the null hypothesis is not rejected.

Specifically, the four steps involved in using the P-value approach to conducting any hypothesis test are:

  • Specify the null and alternative hypotheses.
  • Using the sample data and assuming the null hypothesis is true, calculate the value of the test statistic. Again, to conduct the hypothesis test for the population mean μ, we use the t-statistic \(t^*=\frac{\bar{x}-\mu}{s/\sqrt{n}}\), where μ is the value specified by the null hypothesis. Under the null hypothesis, t* follows a t-distribution with n - 1 degrees of freedom.
  • Using the known distribution of the test statistic, calculate the P-value: "If the null hypothesis is true, what is the probability that we'd observe a more extreme test statistic in the direction of the alternative hypothesis than we did?" (Note how this question is equivalent to the question answered in criminal trials: "If the defendant is innocent, what is the chance that we'd observe such extreme criminal evidence?")
  • Set the significance level, \(\alpha\), the probability of making a Type I error, to be small (0.01, 0.05, or 0.10). Compare the P-value to \(\alpha\). If the P-value is less than (or equal to) \(\alpha\), reject the null hypothesis in favor of the alternative hypothesis. If the P-value is greater than \(\alpha\), do not reject the null hypothesis.
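The second and fourth steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own software: the function names `t_statistic` and `decide` and the GPA sample are assumptions made up for this example, and the P-value passed to `decide` is simply the 0.0127 used in the example that follows.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t*, degrees of freedom) for a one-sample t-test of H0: mu = mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    t_star = (x_bar - mu0) / (s / math.sqrt(n))
    return t_star, n - 1

def decide(p_value, alpha=0.05):
    """Reject H0 when the P-value is less than or equal to alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

# Hypothetical GPA sample (illustrative numbers, not from the text).
gpas = [3.1, 3.4, 2.9, 3.6, 3.3, 3.0, 3.5, 3.2]
t_star, df = t_statistic(gpas, mu0=3.0)
print(f"t* = {t_star:.3f} with {df} degrees of freedom")

# The same P-value leads to different decisions at different alpha levels.
print(decide(0.0127, alpha=0.05))
print(decide(0.0127, alpha=0.01))
```

Keeping the decision rule separate from the test statistic makes it plain that the comparison with \(\alpha\) is the same regardless of which test produced the P-value.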

Example S.3.2.1

Mean GPA

In our example concerning the mean grade point average, suppose that our random sample of n = 15 students majoring in mathematics yields a test statistic t* equaling 2.5. Since n = 15, our test statistic t* has n - 1 = 14 degrees of freedom. Also, suppose we set our significance level α at 0.05 so that we have only a 5% chance of making a Type I error.

Right Tailed

The P-value for conducting the right-tailed test H0: μ = 3 versus HA: μ > 3 is the probability that we would observe a test statistic greater than t* = 2.5 if the population mean \(\mu\) really were 3. Recall that probability equals the area under the probability curve. The P-value is therefore the area under a \(t_{n-1} = t_{14}\) curve and to the right of the test statistic t* = 2.5. It can be shown using statistical software that the P-value is 0.0127. The graph depicts this visually.

[Figure: t-distribution graph showing the right tail beyond a t value of 2.5]

The P-value, 0.0127, tells us it is "unlikely" that we would observe such an extreme test statistic t* in the direction of HA if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P-value, 0.0127, is less than \(\alpha\) = 0.05, we reject the null hypothesis H0: μ = 3 in favor of the alternative hypothesis HA: μ > 3.

Note that we would not reject H0: μ = 3 in favor of HA: μ > 3 if we lowered our willingness to make a Type I error to \(\alpha\) = 0.01 instead, as the P-value, 0.0127, is then greater than \(\alpha\) = 0.01.

Left Tailed

In our example concerning the mean grade point average, suppose that our random sample of n = 15 students majoring in mathematics yields a test statistic t* instead equaling -2.5. The P-value for conducting the left-tailed test H0: μ = 3 versus HA: μ < 3 is the probability that we would observe a test statistic less than t* = -2.5 if the population mean μ really were 3. The P-value is therefore the area under a \(t_{n-1} = t_{14}\) curve and to the left of the test statistic t* = -2.5. It can be shown using statistical software that the P-value is 0.0127. The graph depicts this visually.

[Figure: t-distribution graph showing the left tail below a t value of -2.5]

The P-value, 0.0127, tells us it is "unlikely" that we would observe such an extreme test statistic t* in the direction of HA if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P-value, 0.0127, is less than α = 0.05, we reject the null hypothesis H0: μ = 3 in favor of the alternative hypothesis HA: μ < 3.

Note that we would not reject H0: μ = 3 in favor of HA: μ < 3 if we lowered our willingness to make a Type I error to α = 0.01 instead, as the P-value, 0.0127, is then greater than \(\alpha\) = 0.01.

Two Tailed

In our example concerning the mean grade point average, suppose again that our random sample of n = 15 students majoring in mathematics yields a test statistic t* equaling -2.5. The P-value for conducting the two-tailed test H0: μ = 3 versus HA: μ ≠ 3 is the probability that we would observe a test statistic less than -2.5 or greater than 2.5 if the population mean μ really were 3. That is, the two-tailed test requires taking into account the possibility that the test statistic could fall into either tail (hence the name "two-tailed" test). The P-value is, therefore, the area under a \(t_{n-1} = t_{14}\) curve to the left of -2.5 and to the right of 2.5. It can be shown using statistical software that the P-value is 0.0127 + 0.0127, or 0.0254. The graph depicts this visually.

[Figure: t-distribution graph of the two-tailed probability for t values of -2.5 and 2.5]

Note that, because the t-distribution is symmetric, the P-value for a two-tailed test is two times the P-value for either of the one-tailed tests. The P-value, 0.0254, tells us it is "unlikely" that we would observe such an extreme test statistic t* in the direction of HA if the null hypothesis were true. Therefore, our initial assumption that the null hypothesis is true must be incorrect. That is, since the P-value, 0.0254, is less than α = 0.05, we reject the null hypothesis H0: μ = 3 in favor of the alternative hypothesis HA: μ ≠ 3.

Note that we would not reject H0: μ = 3 in favor of HA: μ ≠ 3 if we lowered our willingness to make a Type I error to α = 0.01 instead, as the P-value, 0.0254, is then greater than \(\alpha\) = 0.01.
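The three tail areas quoted above can be reproduced without a statistics package. The sketch below is one possible approach, not the software the text refers to: it integrates the t density numerically with Simpson's rule, and the helper names `t_pdf`, `t_right_tail`, and `p_value` are made up for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|] and symmetry about zero."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # area between 0 and |t|
    upper = 0.5 - central    # area to the right of |t|
    return upper if t >= 0 else 1.0 - upper

def p_value(t_star, df, tail):
    """P-value for a 'right', 'left', or 'two' tailed t-test."""
    if tail == "right":
        return t_right_tail(t_star, df)
    if tail == "left":
        return t_right_tail(-t_star, df)  # left-tail area, by symmetry
    if tail == "two":
        return 2 * t_right_tail(abs(t_star), df)
    raise ValueError("tail must be 'right', 'left', or 'two'")

print(round(p_value(2.5, 14, "right"), 4))
print(round(p_value(-2.5, 14, "left"), 4))
print(round(p_value(-2.5, 14, "two"), 4))
```

The two-tailed result agrees with the text's 0.0254 up to rounding, since the text doubles the already rounded one-tailed value.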

Now that we have reviewed the critical value and P-value approach procedures for each of the three possible hypotheses, let's look at three new examples: one of a right-tailed test, one of a left-tailed test, and one of a two-tailed test.

The good news is that, whenever possible, we will take advantage of the test statistics and P-values reported in statistical software, such as Minitab, to conduct our hypothesis tests in this course.


When comparing two groups, you must distinguish between one- and two-tail P values. Some books refer to one-sided and two-sided P values, which mean the same thing.

What does one-tail mean?

It is easiest to understand the distinction in context. So let’s imagine that you are comparing the mean of two groups (with an unpaired t test). Both one- and two-tail P values are based on the same null hypothesis, that two populations really are the same and that an observed discrepancy between sample means is due to chance.

A two-tailed P value answers this question:

Assuming the null hypothesis is true, what is the chance that randomly selected samples would have means as far apart as (or further than) you observed in this experiment with either group having the larger mean?

To interpret a one-tail P value, you must predict which group will have the larger mean before collecting any data. The one-tail P value answers this question:

Assuming the null hypothesis is true, what is the chance that randomly selected samples would have means as far apart as (or further than) observed in this experiment with the specified group having the larger mean?

If the observed difference went in the direction predicted by the experimental hypothesis, the one-tailed P value is half the two-tailed P value (with most, but not quite all, statistical tests).

When is it appropriate to use a one-tail P value?

A one-tailed test is appropriate when previous data, physical limitations, or common sense tells you that the difference, if any, can only go in one direction. You should only choose a one-tail P value when both of the following are true.

• You predicted which group will have the larger mean (or proportion) before you collected any data. If you only made the "prediction" after seeing the data, don't even think about using a one-tail P value.

• If the other group had ended up with the larger mean – even if it is quite a bit larger – you would have attributed that difference to chance and called the difference 'not statistically significant'.

Here is an example in which you might appropriately choose a one-tailed P value: You are testing whether a new antibiotic impairs renal function, as measured by serum creatinine. Many antibiotics poison kidney cells, resulting in reduced glomerular filtration and increased serum creatinine. As far as I know, no antibiotic is known to decrease serum creatinine, and it is hard to imagine a mechanism by which an antibiotic would increase the glomerular filtration rate. Before collecting any data, you can state that there are two possibilities: Either the drug will not change the mean serum creatinine of the population, or it will increase the mean serum creatinine in the population. You consider it impossible that the drug will truly decrease mean serum creatinine of the population and plan to attribute any observed decrease to random sampling. Accordingly, it makes sense to calculate a one-tailed P value. In this example, a two-tailed P value tests the null hypothesis that the drug does not alter the creatinine level; a one-tailed P value tests the null hypothesis that the drug does not increase the creatinine level.

The issue in choosing between one- and two-tailed P values is not whether or not you expect a difference to exist. If you already knew whether or not there was a difference, there is no reason to collect the data. Rather, the issue is whether the direction of a difference (if there is one) can only go one way. You should only use a one-tailed P value when you can state with certainty (and before collecting any data) that in the overall populations there either is no difference or there is a difference in a specified direction. If your data end up showing a difference in the “wrong” direction, you should be willing to attribute that difference to random sampling without even considering the notion that the measured difference might reflect a true difference in the overall populations. If a difference in the “wrong” direction would intrigue you (even a little), you should calculate a two-tailed P value.

How Prism reports one-tail P values

When you ask Prism to report a one-tail P value, it assumes the actual difference or effect went in the direction you predicted, so the one-sided P value reported by Prism is always smaller than the two-tail P value (almost always exactly half of it).

If, in fact, the observed difference or effect goes in the opposite direction to what you predicted, the one-sided P value reported by Prism is wrong. The actual one-tail P value will equal 1.0 minus the reported one. For example, if the reported one-tail P value is 0.04 and the actual difference is in the opposite direction to what you predicted, then the actual one-sided P value is 0.96.

What if you didn't predict the direction of the difference or effect before collecting data?

If you didn't predict the direction of the effect before collecting data, you should not be reporting one-sided P values. It is cheating to say "well, I would have predicted...". If you didn't record the prediction, then you should not use a one-sided P value.

What if there are not two directions to the test?

The concept of one- and two-tail P values only makes sense for hypotheses where there are two directions to the effect, an increase or a decrease. If you are comparing three or more groups (ANOVA), then the concept of one- and two-tail P value makes no sense, and Prism doesn't ask you to make this choice.

How to convert between one- and two-tail P values

If the actual effect went in the direction you predicted:

• The one-tail P value is half the two-tail P value.

• The two-tail P value is twice the one-tail P value (assuming you correctly predicted the direction of the difference).

This rule works perfectly for almost all statistical tests. Some tests (Fisher's test) are not symmetrical, so these rules are only approximate for these tests.

If the actual effect went in the opposite direction to what you predicted:

• The one-tail P value equals 1.0 minus half the two-tail P value.
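The conversion rules above are easy to encode. The following is a small sketch under the stated assumption that the test is symmetrical (so it would be only approximate for a test such as Fisher's); the function names are invented for this illustration and are not part of Prism.

```python
def one_tail_from_two_tail(p_two, direction_as_predicted):
    """Convert a two-tail P value to a one-tail P value for a symmetrical test."""
    half = p_two / 2
    # If the effect went the "wrong" way, the one-tail P value is 1 minus half.
    return half if direction_as_predicted else 1.0 - half

def two_tail_from_one_tail(p_one):
    """Valid only when the effect went in the predicted direction."""
    return 2 * p_one

print(one_tail_from_two_tail(0.08, direction_as_predicted=True))
print(one_tail_from_two_tail(0.08, direction_as_predicted=False))
```

With a two-tail P value of 0.08, these calls give 0.04 and 0.96, matching the earlier example of a reported one-tail value of 0.04 whose actual value is 0.96 when the effect runs opposite to the prediction.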

© 1995- 2019 GraphPad Software, LLC. All rights reserved.


Hypothesis Testing for Means & Proportions


Hypothesis Testing: Upper-, Lower-, and Two-Tailed Tests


The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data supports the null or alternative hypotheses. The procedure can be broken down into the following five steps.  

  • Step 1. Set up hypotheses and select the level of significance α.

H0: Null hypothesis (no change, no difference);

H1: Research hypothesis (investigator's belief); α = 0.05

  • Step 2. Select the appropriate test statistic.  

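The test statistic is a single number that summarizes the sample information. An example of a test statistic is the Z statistic computed as follows: \(Z=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}\), where \(\bar{x}\) is the sample mean, \(\mu_0\) is the mean specified in H0, s is the sample standard deviation, and n is the sample size.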

When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.

  • Step 3.  Set up decision rule.  

The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject H 0 if Z > 1.645). The decision rule for a specific test depends on 3 factors: the research or alternative hypothesis, the test statistic and the level of significance. Each is discussed below.

  • The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is proposed. In an upper-tailed test the decision rule has investigators reject H 0 if the test statistic is larger than the critical value. In a lower-tailed test the decision rule has investigators reject H 0 if the test statistic is smaller than the critical value.  In a two-tailed test the decision rule has investigators reject H 0 if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
  • The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution again depending on the specific alternative hypothesis and the level of significance.  
  • The third factor is the level of significance. The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value.   For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645.  

The following figures illustrate the rejection regions defined by the decision rule for upper-, lower- and two-tailed Z tests with α=0.05. Notice that the rejection regions are in the upper, lower and both tails of the curves, respectively. The decision rules are written below each figure.

Standard normal distribution with lower tail at -1.645 and alpha=0.05

Rejection Region for Lower-Tailed Z Test (H1: μ < μ0) with α = 0.05

The decision rule is: Reject H0 if Z < -1.645.

Standard normal distribution with two tails

Rejection Region for Two-Tailed Z Test (H 1 : μ ≠ μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.960 or if Z > 1.960.

The complete table of critical values of Z for upper, lower and two-tailed tests can be found in the table of Z values to the right in "Other Resources."

Critical values of t for upper, lower and two-tailed tests can be found in the table of t values in "Other Resources."
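The Z critical values quoted above can also be computed rather than looked up. The sketch below uses Python's standard-library `statistics.NormalDist`; the function names `z_critical` and `reject_h0` are made up for this illustration.

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Critical value of Z for an 'upper', 'lower', or 'two' tailed test."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "upper":
        return z.inv_cdf(1 - alpha)
    if tail == "lower":
        return z.inv_cdf(alpha)
    if tail == "two":
        return z.inv_cdf(1 - alpha / 2)  # reject when |Z| exceeds this value
    raise ValueError("tail must be 'upper', 'lower', or 'two'")

def reject_h0(z_stat, alpha, tail):
    """Apply the decision rule for the chosen tail."""
    crit = z_critical(alpha, tail)
    if tail == "upper":
        return z_stat > crit
    if tail == "lower":
        return z_stat < crit
    return abs(z_stat) > crit

for tail in ("upper", "lower", "two"):
    print(tail, round(z_critical(0.05, tail), 3))
```

At α = 0.05 this yields 1.645, -1.645, and 1.960, the cutoffs used in the decision rules above.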

  • Step 4. Compute the test statistic.  

Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2.

  • Step 5. Conclusion.  

The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely).  

If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H 0 .

Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined above can be abbreviated. The hypotheses (step 1) should always be set up in advance of any analysis and the significance criterion should also be determined (e.g., α = 0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following: If p < α then reject H0.

  • Step 1. Set up hypotheses and determine level of significance

H0: μ = 191; H1: μ > 191; α = 0.05

The research hypothesis is that weights have increased, and therefore an upper tailed test is used.

  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30) the appropriate test statistic is \(Z=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}\).

  • Step 3. Set up decision rule.  

In this example, we are performing an upper tailed test (H1: μ > 191), with a Z test statistic and selected α = 0.05. Reject H0 if Z > 1.645.

  • Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2, which gives Z = 2.38.

  • Step 5. Conclusion.

We reject H0 because 2.38 > 1.645. We have statistically significant evidence at α = 0.05 to show that the mean weight in men in 2006 is more than 191 pounds. Because we rejected the null hypothesis, we now approximate the p-value, which is the probability of observing sample data at least as extreme as ours if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance where we can still reject H0. In this example, we observed Z = 2.38 and for α = 0.05, the critical value was 1.645. Because 2.38 exceeded 1.645 we rejected H0. In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance. Using the table of critical values for upper tailed tests, we can approximate the p-value. If we select α = 0.025, the critical value is 1.960, and we still reject H0 because 2.38 > 1.960. If we select α = 0.010 the critical value is 2.326, and we still reject H0 because 2.38 > 2.326. However, if we select α = 0.005, the critical value is 2.576, and we cannot reject H0 because 2.38 < 2.576. Therefore, the smallest α in the table at which we still reject H0 is 0.010. This is our approximate p-value. A statistical computing package would produce a more precise p-value which would be in between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.
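The bracketing argument above can be checked directly. As a rough sketch using only the Python standard library (the variable names are illustrative), the code below walks through the same α levels and then computes the upper-tail area beyond Z = 2.38.

```python
from statistics import NormalDist

z_observed = 2.38
standard_normal = NormalDist()

# Compare the observed Z with the upper-tailed critical value at each alpha.
for alpha in (0.05, 0.025, 0.010, 0.005):
    critical = standard_normal.inv_cdf(1 - alpha)
    decision = "reject H0" if z_observed > critical else "do not reject H0"
    print(f"alpha = {alpha:.3f}: critical value = {critical:.3f}, {decision}")

# Upper-tailed p-value: area under the standard normal curve to the right of Z.
p_exact = 1 - standard_normal.cdf(z_observed)
print(f"p = {p_exact:.4f}")
```

The computed p-value is about 0.0087, which falls between 0.005 and 0.010 as the table-based approximation predicts.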

In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H 0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not). When we run a test of hypothesis and decide to reject H 0 (e.g., because the test statistic exceeds the critical value in an upper tailed test) then either we make a correct decision because the research hypothesis is true or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).

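Table - Conclusions in Test of Hypothesis

                  Do Not Reject H0      Reject H0
H0 is true        Correct decision      Type I error
H0 is false       Type II error         Correct decision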

In the first step of the hypothesis test, we select a level of significance, α, and α = P(Type I error), that is, the probability of rejecting H0 when H0 is in fact true. Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α = 0.05, then a test of a true null hypothesis will incorrectly reject H0 only 5% of the time. Most investigators are very comfortable with this and are confident when rejecting H0 that the research hypothesis is true (as it is the more likely scenario when we reject H0).
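A small simulation can make the meaning of α concrete. The sketch below is illustrative only: it assumes normally distributed data with a known standard deviation, and the population values (mean 191, standard deviation 25), sample size, and function name are hypothetical choices made for this example. It repeatedly samples from a population where H0 is true and counts how often an upper-tailed Z test rejects.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, trials=4000, seed=1):
    """Fraction of upper-tailed Z tests that reject H0 when H0 is true."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha)
    mu0, sigma = 191.0, 25.0  # hypothetical population; H0: mu = mu0 holds
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / (sigma / math.sqrt(n))
        if z > critical:
            rejections += 1  # a Type I error, since H0 is true by construction
    return rejections / trials

rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should land near 0.05, varying a little from run to run if the seed is changed.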

When we run a test of hypothesis and decide not to reject H 0 (e.g., because the test statistic is below the critical value in an upper tailed test) then either we make a correct decision because the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β=P(Type II error) = P(Do not Reject H 0 | H 0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error because β depends on several factors including the sample size, α, and the research hypothesis. When we do not reject H 0 , it may be very likely that we are committing a Type II error (i.e., failing to reject H 0 when in fact it is false). Therefore, when tests are run and the null hypothesis is not rejected we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H 0 , we conclude that we do not have significant evidence to show that H 1 is true. We do not conclude that H 0 is true.


 The most common reason for a Type II error is a small sample size.
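To see how sample size drives β, here is a minimal sketch for an upper-tailed Z test. It assumes a known standard deviation and a specific true mean under the research hypothesis; the numbers (191, 195, 25) and the function name `type_ii_error` are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha=0.05):
    """Beta for an upper-tailed Z test when the true mean is mu1 > mu0."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)
    # How many standard errors the true mean lies above the null mean.
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))
    # Probability that Z stays below the critical value, so H0 is not rejected.
    return z.cdf(critical - shift)

for n in (25, 100, 400):
    beta = type_ii_error(mu0=191, mu1=195, sigma=25, n=n)
    print(f"n = {n:3d}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

With these assumed values, β is large for the smallest sample and shrinks steadily as n grows, which is the pattern the statement above describes.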


Content ©2017. All Rights Reserved. Date last modified: November 6, 2017. Wayne W. LaMorte, MD, PhD, MPH

P-Value And Statistical Significance: What It Is & Why It Matters

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


The p-value in statistics quantifies the evidence against a null hypothesis. A low p-value suggests data is inconsistent with the null, potentially favoring an alternative hypothesis. Common significance thresholds are 0.05 or 0.01.

P-Value Explained in Normal Distribution

Hypothesis testing

When you perform a statistical test, a p-value helps you determine the significance of your results in relation to the null hypothesis.

The null hypothesis (H0) states no relationship exists between the two variables being studied (one variable does not affect the other). It states the results are due to chance and are not significant in supporting the idea being investigated. Thus, the null hypothesis assumes that whatever you try to prove did not happen.

The alternative hypothesis (Ha or H1) is the one you would believe if the null hypothesis is concluded to be untrue.

The alternative hypothesis states that the independent variable affected the dependent variable, and the results are significant in supporting the theory being investigated (i.e., the results are not due to random chance).

What a p-value tells you

A p-value, or probability value, is a number describing how likely it is that you would obtain data at least as extreme as yours if the null hypothesis were true.

The level of statistical significance is often expressed as a p-value between 0 and 1.

The smaller the p-value, the less compatible the results are with the null hypothesis, and the stronger the evidence that you should reject the null hypothesis.

Remember, a p-value doesn’t tell you if the null hypothesis is true or false. It just tells you how likely you’d see the data you observed (or more extreme data) if the null hypothesis was true. It’s a piece of evidence, not a definitive proof.

Example: Test Statistic and p-Value

Suppose you’re conducting a study to determine whether a new drug has an effect on pain relief compared to a placebo. If the new drug has no impact, your test statistic will usually be close to the value predicted by the null hypothesis (no difference between the drug and placebo groups), and the resulting p-value will usually be large, although by chance alone it will fall below 0.05 about 5% of the time. Conversely, if the new drug indeed reduces pain substantially, your test statistic will diverge further from what’s expected under the null hypothesis, and the p-value will decrease. The p-value will never reach exactly zero because there’s always a slim possibility, though highly improbable, that the observed results occurred by random chance.

P-value interpretation

The significance level (alpha) is a set probability threshold (often 0.05), while the p-value is the probability you calculate based on your study or analysis.

A p-value less than or equal to a predetermined significance level (often 0.05 or 0.01) is statistically significant, meaning the observed data provide evidence against the null hypothesis.

This suggests the effect under study may represent a real relationship rather than just random chance.

For instance, if you set α = 0.05, you would reject the null hypothesis if your p-value ≤ 0.05.

A p-value this small indicates that data at least as extreme as yours would occur less than 5% of the time if the null hypothesis were true. It is not the probability that the null hypothesis is correct.

Therefore, we reject the null hypothesis in favor of the alternative hypothesis.

Example: Statistical Significance

Upon analyzing the pain relief effects of the new drug compared to the placebo, the computed p-value is less than 0.01, which falls well below the predetermined alpha value of 0.05. Consequently, you conclude that there is a statistically significant difference in pain relief between the new drug and the placebo.

What does a p-value of 0.001 mean?

A p-value of 0.001 is highly statistically significant beyond the commonly used 0.05 threshold. It indicates strong evidence of a real effect or difference, rather than just random variation.

Specifically, a p-value of 0.001 means there is only a 0.1% chance of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is correct.

Such a small p-value provides strong evidence against the null hypothesis, leading to rejecting the null in favor of the alternative hypothesis.

A p-value greater than the significance level (typically p > 0.05) is not statistically significant. It indicates that the data do not provide sufficient evidence against the null hypothesis; it is not strong evidence for the null hypothesis.

This means we retain (fail to reject) the null hypothesis. You should note that you cannot accept the null hypothesis; you can only reject it or fail to reject it.

Note: when the p-value is below your threshold of significance (e.g., p < 0.05), it does not mean that there is a 95% probability that the alternative hypothesis is true.

One-Tailed Test

[Figure: one-tailed test]

Two-Tailed Test

[Figure: two-tailed test]

How do you calculate the p-value?

Most statistical software packages like R, SPSS, and others automatically calculate your p-value. This is the easiest and most common way.

Online resources and tables are available to estimate the p-value based on your test statistic and degrees of freedom.

These tables help you understand how often you would expect to see your test statistic under the null hypothesis.

Understanding the Statistical Test:

Different statistical tests are designed to answer specific research questions or hypotheses. Each test has its own underlying assumptions and characteristics.

For example, you might use a t-test to compare means, a chi-squared test for categorical data, or a correlation test to measure the strength of a relationship between variables.

Be aware that the number of independent variables you include in your analysis can influence the magnitude of the test statistic needed to produce the same p-value.

This factor is particularly important to consider when comparing results across different analyses.

Example: Choosing a Statistical Test

If you’re comparing the effectiveness of just two different drugs in pain relief, a two-sample t-test is a suitable choice for comparing these two groups. However, when you’re examining the impact of three or more drugs, it’s more appropriate to employ an Analysis of Variance (ANOVA). Utilizing multiple pairwise comparisons in such cases can lead to artificially low p-values and an overestimation of the significance of differences between the drug groups.
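The risk of running many pairwise tests can be quantified with a back-of-the-envelope calculation. The sketch below assumes, as a simplification, that the pairwise comparisons are independent (in practice they share data, so the figure is only approximate); the function name is made up for this illustration.

```python
import math

def familywise_error_rate(groups, alpha=0.05):
    """Chance of at least one false positive across all pairwise comparisons,
    assuming every null hypothesis is true and the tests are independent."""
    comparisons = math.comb(groups, 2)  # number of distinct pairs of groups
    return 1 - (1 - alpha) ** comparisons

for groups in (2, 3, 4, 5):
    rate = familywise_error_rate(groups)
    print(f"{groups} groups: {math.comb(groups, 2)} comparisons, "
          f"familywise error rate = {rate:.3f}")
```

Under these assumptions, three groups already push the chance of at least one spurious "significant" difference to roughly 14%, well above the nominal 5%.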

How to report

A statistically significant result cannot prove that a research hypothesis is correct (which implies 100% certainty).

Instead, we may state our results “provide support for” or “give evidence for” our research hypothesis (as a statistically significant result can still occur by chance when the null hypothesis is correct; with α = 0.05, this happens in about 5% of tests of a true null hypothesis).

Example: Reporting the results

In our comparison of the pain relief effects of the new drug and the placebo, we observed that participants in the drug group experienced a significant reduction in pain (M = 3.5; SD = 0.8) compared to those in the placebo group (M = 5.2; SD = 0.7), resulting in an average difference of 1.7 points on the pain scale (t(98) = -9.36; p < .001).

The 6th edition of the APA style manual (American Psychological Association, 2010) states the following on the topic of reporting p-values:

“When reporting p values, report exact p values (e.g., p = .031) to two or three decimal places. However, report p values less than .001 as p < .001.

The tradition of reporting p values in the form p < .10, p < .05, p < .01, and so forth, was appropriate in a time when only limited tables of critical values were available.” (p. 114)

  • Do not use 0 before the decimal point for the statistical value p as it cannot exceed 1. In other words, write p = .001 instead of p = 0.001.
  • Please pay attention to issues of italics (p is always italicized) and spacing (either side of the = sign).
  • p = .000 (as output by some statistical packages such as SPSS) is impossible and should be written as p < .001.
  • The opposite of significant is “nonsignificant,” not “insignificant.”
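The formatting rules above can be sketched as a small helper. This is an illustrative function with a hypothetical name, not part of any statistics package, and it does not handle italics, which depend on the output medium.

```python
def format_p_apa(p: float) -> str:
    """Format a p-value following the rules listed above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.001:
        return "p < .001"  # never report p = .000
    # Three decimal places, then drop the zero before the decimal point.
    return "p = " + f"{p:.3f}".lstrip("0")

print(format_p_apa(0.031))
print(format_p_apa(0.00004))
```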

Why is the p -value not enough?

A lower p-value is sometimes interpreted as meaning there is a stronger relationship between two variables.

However, statistical significance means only that data at least as extreme as those observed would be unlikely (a probability below 5%) if the null hypothesis were true. It is not the probability that the null hypothesis is true, and it does not measure how large the effect is.

To understand the strength of the difference between the two groups (control vs. experimental) a researcher needs to calculate the effect size.
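As one possible illustration, Cohen's d is a common effect size for a difference between two means. The sketch below applies it to the means and standard deviations from the reporting example above. The group sizes are not given there, so equal group sizes are assumed, and the choice of Cohen's d itself is an assumption rather than something the text prescribes.

```python
from math import sqrt

def cohens_d(mean1: float, sd1: float, mean2: float, sd2: float) -> float:
    """Cohen's d with a pooled SD, assuming two groups of equal size."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

# Drug group (M = 3.5, SD = 0.8) versus placebo group (M = 5.2, SD = 0.7).
d = cohens_d(3.5, 0.8, 5.2, 0.7)
print(f"d = {d:.2f}")
```

The negative sign only reflects the order of subtraction: the drug group's mean pain score is lower than the placebo group's.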

When do you reject the null hypothesis?

In statistical hypothesis testing, you reject the null hypothesis when the p-value is less than or equal to the significance level (α) you set before conducting your test. The significance level is the probability of rejecting the null hypothesis when it is true. Commonly used significance levels are 0.01, 0.05, and 0.10.

Remember, rejecting the null hypothesis doesn’t prove the alternative hypothesis; it just suggests that the alternative hypothesis may be plausible given the observed data.

The p -value is conditional upon the null hypothesis being true but is unrelated to the truth or falsity of the alternative hypothesis.

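What does a p-value of 0.05 mean?

A p-value of 0.05 means that, if the null hypothesis were true, results at least as extreme as those observed would be expected 5% of the time. If your p-value is less than or equal to the significance level (commonly 0.05), you would conclude that your result is statistically significant. This means the evidence is strong enough to reject the null hypothesis in favor of the alternative hypothesis.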

Are all p-values below 0.05 considered statistically significant?

Not necessarily. The threshold of 0.05 is commonly used, but it’s just a convention. A p-value counts as statistically significant only if it is at or below the significance level chosen before the analysis, so p = 0.03 is significant at α = 0.05 but not at α = 0.01. The p-value itself also depends on factors like the study design, sample size, and the magnitude of the observed effect.

A p-value below 0.05 means there is evidence against the null hypothesis, suggesting a real effect. However, it’s essential to consider the context and other factors when interpreting results.

Researchers also look at effect size and confidence intervals to determine the practical significance and reliability of findings.

How does sample size affect the interpretation of p-values?

Sample size can impact the interpretation of p-values. A larger sample size provides more reliable and precise estimates of the population, leading to narrower confidence intervals.

With a larger sample, even small differences between groups or effects can become statistically significant, yielding lower p-values. In contrast, smaller sample sizes may not have enough statistical power to detect smaller effects, resulting in higher p-values.

Therefore, a larger sample size increases the chances of finding statistically significant results when there is a genuine effect, making the findings more trustworthy and robust.
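A small sketch of this effect, under simplifying assumptions that are not from the text: a one-sample z-test with known standard deviation, where the observed mean differs from the hypothesized mean by exactly 0.1 standard deviations. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p_from_z(effect_in_sd: float, n: int) -> float:
    """Two-tailed p-value of a z-test for an observed effect measured in SD units."""
    z = effect_in_sd * sqrt(n)  # the test statistic grows with sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same small effect (0.1 SD) at three sample sizes.
for n in (25, 100, 400):
    print(n, round(two_tailed_p_from_z(0.1, n), 4))
```

The effect is identical in all three cases; only the sample size changes, and the p-value drops below 0.05 only for the largest sample.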

Can a non-significant p-value indicate that there is no effect or difference in the data?

No, a non-significant p-value does not necessarily indicate that there is no effect or difference in the data. It means that the observed data do not provide strong enough evidence to reject the null hypothesis.

There could still be a real effect or difference, but it might be smaller or more variable than the study was able to detect.

Other factors like sample size, study design, and measurement precision can influence the p-value. It’s important to consider the entire body of evidence and not rely solely on p-values when interpreting research findings.

Can P values be exactly zero?

While a p-value can be extremely small, it cannot be exactly zero. When a p-value is reported as p = 0.000, the actual p-value is too small for the software to display at the chosen number of decimal places. This is often interpreted as strong evidence against the null hypothesis. Report p-values less than 0.001 as p < .001.
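A brief illustration of how a nonzero p-value ends up displayed as 0.000. It assumes a standard normal test statistic of 10, chosen only for demonstration, and uses the identity that the two-tailed normal p-value equals erfc(|z| / √2).

```python
from math import erfc, sqrt

z = 10.0                 # hypothetical, very extreme z-score
p = erfc(z / sqrt(2))    # two-tailed p-value under the standard normal

print(f"{p:.3f}")        # rounded to three decimals, as many packages display it
print(f"{p:.3e}")        # scientific notation shows the value is tiny but positive
```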

Further Information

  • P-values and significance tests (Khan Academy)
  • Hypothesis testing and p-values (Khan Academy)
  • Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”.
  • Criticism of using the “p < 0.05”.
  • Publication manual of the American Psychological Association
  • Statistics for Psychology Book Download



t-test Calculator


Welcome to our t-test calculator! Here you can not only easily perform one-sample t-tests, but also two-sample t-tests, as well as paired t-tests.

Do you prefer to find the p-value from a t-test, or would you rather find the t-test critical values? Well, this t-test calculator can do both! 😊

What does a t-test tell you? Take a look at the text below, where we explain what actually gets tested when various types of t-tests are performed. Also, we explain when to use t-tests (in particular, whether to use the z-test vs. t-test) and what assumptions your data should satisfy for the results of a t-test to be valid. If you've ever wanted to know how to do a t-test by hand, we provide the necessary t-test formula, as well as tell you how to determine the number of degrees of freedom in a t-test.

When to use a t-test?

A t-test is one of the most popular statistical tests for location, i.e., it deals with the mean value(s) of the population(s).

There are different types of t-tests that you can perform:

  • A one-sample t-test;
  • A two-sample t-test; and
  • A paired t-test.

In the next section, we explain when to use which. Remember that a t-test can only be used for one or two groups. If you need to compare three (or more) means, use the analysis of variance (ANOVA) method.

The t-test is a parametric test, meaning that your data has to fulfill some assumptions:

  • The data points are independent; AND
  • The data, at least approximately, follow a normal distribution.

If your sample doesn't fit these assumptions, you can resort to nonparametric alternatives. Visit our Mann–Whitney U test calculator or the Wilcoxon rank-sum test calculator to learn more. Other possibilities include the Wilcoxon signed-rank test or the sign test.

Which t-test?

Your choice of t-test depends on whether you are studying one group or two groups:

One sample t-test

Choose the one-sample t-test to check if the mean of a population is equal to some pre-set hypothesized value.

The average volume of a drink sold in 0.33 l cans — is it really equal to 330 ml?

The average weight of people from a specific city — is it different from the national average?

Two-sample t-test

Choose the two-sample t-test to check if the difference between the means of two populations is equal to some pre-determined value when the two samples have been chosen independently of each other.

In particular, you can use this test to check whether the two groups are different from one another.

The average difference in weight gain in two groups of people: one group was on a high-carb diet and the other on a high-fat diet.

The average difference in the results of a math test from students at two different universities.

This test is sometimes referred to as an independent samples t-test, or an unpaired samples t-test.

Paired t-test

A paired t-test is used to investigate the change in the mean of a population before and after some experimental intervention, based on a paired sample, i.e., when each subject has been measured twice: before and after treatment.

In particular, you can use this test to check whether, on average, the treatment has had any effect on the population.

The change in student test performance before and after taking a course.

The change in blood pressure in patients before and after administering some drug.

How to do a t-test?

So, you've decided which t-test to perform. These next steps will tell you how to calculate the p-value from a t-test or its critical values, and then which decision to make about the null hypothesis.

Decide on the alternative hypothesis:

Use a two-tailed t-test if you only care whether the population's mean (or, in the case of two populations, the difference between the populations' means) agrees or disagrees with the pre-set value.

Use a one-tailed t-test if you want to test whether this mean (or difference in means) is greater/less than the pre-set value.

Compute your T-score value:

Formulas for the test statistic in t-tests include the sample size, as well as its mean and standard deviation. The exact formula depends on the t-test type; check the sections dedicated to each particular test for more details.

Determine the degrees of freedom for the t-test:

The degrees of freedom are the number of observations in a sample that are free to vary as we estimate statistical parameters. In the simplest case, the number of degrees of freedom equals your sample size minus the number of parameters you need to estimate. Again, the exact formula depends on the t-test you want to perform; check the sections below for details.

The degrees of freedom are essential, as they determine the distribution followed by your T-score (under the null hypothesis). If there are d degrees of freedom, then the distribution of the test statistic is the t-Student distribution with d degrees of freedom. This distribution has a shape similar to N(0,1) (bell-shaped and symmetric) but has heavier tails. If the number of degrees of freedom is large (more than 30), which typically happens for large samples, the t-Student distribution is practically indistinguishable from N(0,1).
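The heavier tails can be checked numerically. The sketch below evaluates the standard density of the t-distribution and compares it with the N(0,1) density at x = 3, a point chosen arbitrarily for illustration; the function names are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x: float, d: float) -> float:
    """Density of the t-distribution with d degrees of freedom."""
    log_const = lgamma((d + 1) / 2) - lgamma(d / 2) - 0.5 * log(d * pi)
    return exp(log_const - (d + 1) / 2 * log(1 + x * x / d))

def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution N(0,1)."""
    return exp(-x * x / 2) / sqrt(2 * pi)

# Compare the tail densities at x = 3 for growing degrees of freedom.
for d in (2, 5, 30, 1000):
    print(d, t_pdf(3.0, d), normal_pdf(3.0))
```

For few degrees of freedom the t density at x = 3 is several times the normal density, and the gap closes as d grows.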

💡 The t-Student distribution owes its name to William Sealy Gosset, who, in 1908, published his paper on the t-test under the pseudonym "Student". Gosset worked at the famous Guinness Brewery in Dublin, Ireland, and devised the t-test as an economical way to monitor the quality of beer. Cheers! 🍺🍺🍺

p-value from t-test

Recall that the p-value is the probability (calculated under the assumption that the null hypothesis is true) that the test statistic will produce values at least as extreme as the T-score produced for your sample. As probabilities correspond to areas under the density function, the p-value from a t-test can be nicely illustrated with the help of the following pictures:

[Figure: p-value from t-test]

The following formulae say how to calculate the p-value from a t-test. By \(\mathrm{cdf}_{t,d}\) we denote the cumulative distribution function of the t-Student distribution with \(d\) degrees of freedom:

p-value from left-tailed t-test:

\(\text{p-value} = \mathrm{cdf}_{t,d}(t_{\text{score}})\)

p-value from right-tailed t-test:

\(\text{p-value} = 1 - \mathrm{cdf}_{t,d}(t_{\text{score}})\)

p-value from two-tailed t-test:

\(\text{p-value} = 2 \times \mathrm{cdf}_{t,d}(-|t_{\text{score}}|)\)

or, equivalently: \(\text{p-value} = 2 - 2 \times \mathrm{cdf}_{t,d}(|t_{\text{score}}|)\)

However, the cdf of the t-distribution is given by a somewhat complicated formula. To find the p-value by hand, you would need to resort to statistical tables, where approximate cdf values are collected, or to specialized statistical software. Fortunately, our t-test calculator determines the p-value from t-test for you in the blink of an eye!
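For readers who want to see the three formulas in action without tables, here is a minimal standard-library sketch. It approximates the cdf by integrating the t density with Simpson's rule, which is a stand-in chosen for self-containment and not the calculator's actual method; the function names are hypothetical. In practice a statistics library (for example SciPy's `scipy.stats.t`) would normally be used, and the subtraction from 1 loses accuracy for extremely small p-values.

```python
from math import exp, lgamma, log, pi

def t_pdf(x: float, d: float) -> float:
    """Density of the t-distribution with d degrees of freedom."""
    log_const = lgamma((d + 1) / 2) - lgamma(d / 2) - 0.5 * log(d * pi)
    return exp(log_const - (d + 1) / 2 * log(1 + x * x / d))

def t_cdf(x: float, d: float, steps: int = 2000) -> float:
    """Approximate cdf: Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, d) + t_pdf(a, d)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, d)
    upper = 0.5 + total * h / 3  # cdf evaluated at +|x|
    return upper if x >= 0 else 1 - upper

def p_value(t_score: float, d: float, tail: str = "two") -> float:
    """p-value for a left-, right-, or two-tailed t-test."""
    if tail == "left":
        return t_cdf(t_score, d)
    if tail == "right":
        return 1 - t_cdf(t_score, d)
    if tail == "two":
        return 2 * t_cdf(-abs(t_score), d)
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Hypothetical example: T-score of 2.228 with 10 degrees of freedom.
print(round(p_value(2.228, 10), 3))
```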

t-test critical values

Recall that in the critical values approach to hypothesis testing, you need to set a significance level, α, before computing the critical values, which in turn give rise to critical regions (a.k.a. rejection regions).

Formulas for critical values employ the quantile function of the t-distribution, i.e., the inverse of the cdf:

Critical value for left-tailed t-test: \(\mathrm{cdf}_{t,d}^{-1}(\alpha)\)

critical region:

\((-\infty, \mathrm{cdf}_{t,d}^{-1}(\alpha)]\)

Critical value for right-tailed t-test: \(\mathrm{cdf}_{t,d}^{-1}(1-\alpha)\)

critical region:

\([\mathrm{cdf}_{t,d}^{-1}(1-\alpha), \infty)\)

Critical values for two-tailed t-test: \(\pm\mathrm{cdf}_{t,d}^{-1}(1-\alpha/2)\)

critical region:

\((-\infty, -\mathrm{cdf}_{t,d}^{-1}(1-\alpha/2)] \cup [\mathrm{cdf}_{t,d}^{-1}(1-\alpha/2), \infty)\)

To decide the fate of the null hypothesis, just check if your T-score lies within the critical region:

If your T-score belongs to the critical region, reject the null hypothesis and accept the alternative hypothesis.

If your T-score is outside the critical region, then you don't have enough evidence to reject the null hypothesis.
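The critical-value rule can be sketched with the standard library alone. The quantile function is obtained here by bisection on a numerically integrated cdf, which is an illustrative shortcut and not how the calculator is known to work; the search bracket of [-100, 100] is an assumption that covers common choices of α and degrees of freedom. All function names are hypothetical.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, d):
    """Density of the t-distribution with d degrees of freedom."""
    log_const = lgamma((d + 1) / 2) - lgamma(d / 2) - 0.5 * log(d * pi)
    return exp(log_const - (d + 1) / 2 * log(1 + x * x / d))

def t_cdf(x, d):
    """Approximate cdf via Simpson's rule on [0, |x|] and symmetry."""
    a = abs(x)
    steps = 2 * max(100, int(200 * a))  # even count, step width of roughly 0.0025 or less
    h = a / steps
    total = t_pdf(0.0, d) + t_pdf(a, d)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, d)
    upper = 0.5 + total * h / 3
    return upper if x >= 0 else 1 - upper

def t_quantile(prob, d, lo=-100.0, hi=100.0):
    """Inverse cdf by bisection inside the assumed bracket [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, d) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def in_critical_region(t_score, alpha, d, tail="two"):
    """True if the T-score falls in the rejection region."""
    if tail == "left":
        return t_score <= t_quantile(alpha, d)
    if tail == "right":
        return t_score >= t_quantile(1 - alpha, d)
    if tail == "two":
        return abs(t_score) >= t_quantile(1 - alpha / 2, d)
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Hypothetical example: alpha = 0.05 and 10 degrees of freedom.
print(round(t_quantile(0.975, 10), 3))
print(in_critical_region(2.5, 0.05, 10))
```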

How to use our t-test calculator

Choose the type of t-test you wish to perform:

  • A one-sample t-test (to test the mean of a single group against a hypothesized mean);
  • A two-sample t-test (to compare the means for two groups); or
  • A paired t-test (to check how the mean from the same group changes after some intervention).

Decide on the alternative hypothesis:

  • Two-tailed;
  • Left-tailed; or
  • Right-tailed.

This t-test calculator allows you to use either the p-value approach or the critical regions approach to hypothesis testing!

Enter your T-score and the number of degrees of freedom. If you don't know them, provide some data about your sample(s): sample size, mean, and standard deviation, and our t-test calculator will compute the T-score and degrees of freedom for you.

Once all the parameters are present, the p-value, or critical region, will immediately appear underneath the t-test calculator, along with an interpretation!

One-sample t-test

The null hypothesis is that the population mean is equal to some value \(\mu_0\).

The alternative hypothesis is that the population mean is:

  • different from \(\mu_0\);
  • smaller than \(\mu_0\); or
  • greater than \(\mu_0\).

One-sample t-test formula:

\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\]

where:

  • \(\mu_0\): mean postulated in the null hypothesis;
  • \(n\): sample size;
  • \(\bar{x}\): sample mean; and
  • \(s\): sample standard deviation.

Number of degrees of freedom in t-test (one-sample) = \(n - 1\).
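A minimal sketch of the one-sample T-score, using Python's `statistics` module. The can volumes are invented for illustration and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (T-score, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    # stdev() is the sample standard deviation (divides by n - 1).
    t_score = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t_score, n - 1

# Hypothetical can volumes in ml, tested against the stated 330 ml.
volumes = [329.1, 330.4, 328.7, 329.9, 329.5, 330.1]
print(one_sample_t(volumes, 330.0))
```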

Two-sample t-test

The null hypothesis is that the actual difference between these groups' means, \(\mu_1\) and \(\mu_2\), is equal to some pre-set value, \(\Delta\).

The alternative hypothesis is that the difference \(\mu_1 - \mu_2\) is:

  • Different from \(\Delta\);
  • Smaller than \(\Delta\); or
  • Greater than \(\Delta\).

In particular, if this pre-determined difference is zero (\(\Delta = 0\)):

The null hypothesis is that the population means are equal.

The alternative hypothesis is one of the following:

  • \(\mu_1\) and \(\mu_2\) are different from one another;
  • \(\mu_1\) is smaller than \(\mu_2\); or
  • \(\mu_1\) is greater than \(\mu_2\).

Formally, to perform a t-test, we should additionally assume that the variances of the two populations are equal (this assumption is called the homogeneity of variance).

There is a version of a t-test that can be applied without the assumption of homogeneity of variance: it is called a Welch's t-test. For your convenience, we describe both versions.

Two-sample t-test if variances are equal

Use this test if you know that the two populations' variances are the same (or very similar).

Two-sample t-test formula (with equal variances):

\[t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]

where \(s_p\) is the so-called pooled standard deviation, which we compute as:

\[s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}\]

and:

  • \(\Delta\): mean difference postulated in the null hypothesis;
  • \(n_1\): first sample size;
  • \(\bar{x}_1\): mean for the first sample;
  • \(s_1\): standard deviation in the first sample;
  • \(n_2\): second sample size;
  • \(\bar{x}_2\): mean for the second sample; and
  • \(s_2\): standard deviation in the second sample.

Number of degrees of freedom in t-test (two samples, equal variances) = \(n_1 + n_2 - 2\).
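The equal-variance case can be sketched from summary statistics as follows. The numbers in the usage line are invented for illustration and the function name is hypothetical.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2, delta=0.0):
    """Return (T-score, degrees of freedom) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled standard deviation: variances weighted by their degrees of freedom.
    s_p = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    t_score = (mean1 - mean2 - delta) / (s_p * sqrt(1 / n1 + 1 / n2))
    return t_score, df

# Hypothetical summary statistics for two groups of 10 subjects each.
print(pooled_t(10.0, 2.0, 10, 8.0, 2.0, 10))
```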

Two-sample t-test if variances are unequal (Welch's t-test)

Use this test if the variances of your populations are different.

Two-sample Welch's t-test formula if variances are unequal:

\[t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

where:

  • \(s_1\): standard deviation in the first sample;
  • \(s_2\): standard deviation in the second sample.

The number of degrees of freedom in a Welch's t-test (two-sample t-test with unequal variances) is difficult to determine exactly. We can approximate it with the help of the following Satterthwaite formula:

\[\frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}\]

Alternatively, you can take the smaller of \(n_1 - 1\) and \(n_2 - 1\) as a conservative estimate for the number of degrees of freedom.

🔎 The Satterthwaite formula for the degrees of freedom can be rewritten as a scaled weighted harmonic mean of the degrees of freedom of the respective samples, \(n_1 - 1\) and \(n_2 - 1\), with weights proportional to \((s_1^2/n_1)^2\) and \((s_2^2/n_2)^2\).
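A short sketch of Welch's T-score together with the Satterthwaite approximation, computed from summary statistics. The input numbers are invented and the function name is hypothetical.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2, delta=0.0):
    """Return (T-score, approximate degrees of freedom) for Welch's t-test."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors of each mean
    t_score = (mean1 - mean2 - delta) / sqrt(v1 + v2)
    # Satterthwaite approximation; usually not an integer.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_score, df

# Hypothetical groups with clearly different spreads and sizes.
t_score, df = welch_t(10.0, 1.0, 5, 8.0, 3.0, 10)
print(t_score, df, min(5 - 1, 10 - 1))  # last value: the conservative estimate
```

The approximate degrees of freedom always lie between the conservative estimate, the smaller of \(n_1 - 1\) and \(n_2 - 1\), and the pooled value \(n_1 + n_2 - 2\).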

Paired t-test

As we commonly perform a paired t-test when we have data about the same subjects measured twice (before and after some treatment), let us adopt the convention of referring to the samples as the pre-group and post-group.

The null hypothesis is that the true difference between the means of pre- and post-populations is equal to some pre-set value, \(\Delta\).

The alternative hypothesis is that the actual difference between these means is:

  • Different from \(\Delta\);
  • Smaller than \(\Delta\); or
  • Greater than \(\Delta\).

Typically, this pre-determined difference is zero. We can then reformulate the hypotheses as follows:

The null hypothesis is that the pre- and post-means are the same, i.e., the treatment has no impact on the population.

The alternative hypothesis:

  • The pre- and post-means are different from one another (treatment has some effect);
  • The pre-mean is smaller than the post-mean (treatment increases the result); or
  • The pre-mean is greater than the post-mean (treatment decreases the result).

Paired t-test formula

In fact, a paired t-test is technically the same as a one-sample t-test! Let us see why it is so. Let \(x_1, \ldots, x_n\) be the pre observations and \(y_1, \ldots, y_n\) the respective post observations. That is, \(x_i, y_i\) are the before and after measurements of the i-th subject.

For each subject, compute the difference, \(d_i := x_i - y_i\). All that happens next is just a one-sample t-test performed on the sample of differences \(d_1, \ldots, d_n\). Take a look at the formula for the T-score:

\[t = \frac{\bar{x} - \Delta}{s / \sqrt{n}}\]

where:

  • \(\Delta\): mean difference postulated in the null hypothesis;
  • \(n\): size of the sample of differences, i.e., the number of pairs;
  • \(\bar{x}\): mean of the sample of differences; and
  • \(s\): standard deviation of the sample of differences.

Number of degrees of freedom in t-test (paired): \(n - 1\)
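The reduction to a one-sample test can be sketched directly: form the differences, then apply the one-sample formula to them. The measurements are invented for illustration and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post, delta=0.0):
    """Return (T-score, degrees of freedom) for a paired t-test."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one value per subject")
    diffs = [x - y for x, y in zip(pre, post)]  # d_i = x_i - y_i
    n = len(diffs)
    t_score = (mean(diffs) - delta) / (stdev(diffs) / sqrt(n))
    return t_score, n - 1

# Hypothetical before/after measurements for four subjects.
print(paired_t([10, 12, 14, 16], [9, 10, 13, 12]))
```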

t-test vs Z-test

We use a Z-test when we want to test the population mean of a normally distributed dataset, which has a known population variance. If the number of degrees of freedom is large, then the t-Student distribution is very close to N(0,1).

Hence, if there are many data points (at least 30), you may swap a t-test for a Z-test, and the results will be almost identical. However, for small samples with unknown variance, remember to use the t-test because, in such cases, the t-Student distribution differs significantly from the N(0,1)!

🙋 Have you concluded you need to perform the z-test? Head straight to our z-test calculator!

What is a t-test?

A t-test is a widely used statistical test that analyzes the means of one or two groups of data. For instance, a t-test is performed on medical data to determine whether a new drug really helps.

What are different types of t-tests?

Different types of t-tests are:

  • One-sample t-test;
  • Two-sample t-test; and
  • Paired t-test.

How to find the t value in a one sample t-test?

To find the t-value:

  • Subtract the null hypothesis mean from the sample mean value.
  • Divide the difference by the standard deviation of the sample.
  • Multiply the resultant with the square root of the sample size.

.fontsize-ensurer.reset-size3.size1{font-size:0.71428571em;}.css-4okk7a .katex .sizing.reset-size3.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size2{font-size:0.85714286em;}.css-4okk7a .katex .sizing.reset-size3.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size3{font-size:1em;}.css-4okk7a .katex .sizing.reset-size3.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size4{font-size:1.14285714em;}.css-4okk7a .katex .sizing.reset-size3.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size5{font-size:1.28571429em;}.css-4okk7a .katex .sizing.reset-size3.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size6{font-size:1.42857143em;}.css-4okk7a .katex .sizing.reset-size3.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size7{font-size:1.71428571em;}.css-4okk7a .katex .sizing.reset-size3.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size8{font-size:2.05714286em;}.css-4okk7a .katex .sizing.reset-size3.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size9{font-size:2.46857143em;}.css-4okk7a .katex .sizing.reset-size3.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size10{font-size:2.96285714em;}.css-4okk7a .katex .sizing.reset-size3.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size11{font-size:3.55428571em;}.css-4okk7a .katex .sizing.reset-size4.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size1{font-size:0.625em;}.css-4okk7a .katex .sizing.reset-size4.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size2{font-size:0.75em;}.css-4okk7a .katex .sizing.reset-size4.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size3{font-size:0.875em;}.css-4okk7a .katex .sizing.reset-size4.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size4{font-size:1em;}.css-4okk7a .katex .sizing.reset-size4.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size5{font-size:1.125em;}.css-4okk7a .katex .sizing.reset-size4.size6,.css-4okk7a .katex 
.fontsize-ensurer.reset-size4.size6{font-size:1.25em;}.css-4okk7a .katex .sizing.reset-size4.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size7{font-size:1.5em;}.css-4okk7a .katex .sizing.reset-size4.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size8{font-size:1.8em;}.css-4okk7a .katex .sizing.reset-size4.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size9{font-size:2.16em;}.css-4okk7a .katex .sizing.reset-size4.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size10{font-size:2.5925em;}.css-4okk7a .katex .sizing.reset-size4.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size11{font-size:3.11em;}.css-4okk7a .katex .sizing.reset-size5.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size1{font-size:0.55555556em;}.css-4okk7a .katex .sizing.reset-size5.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size2{font-size:0.66666667em;}.css-4okk7a .katex .sizing.reset-size5.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size3{font-size:0.77777778em;}.css-4okk7a .katex .sizing.reset-size5.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size4{font-size:0.88888889em;}.css-4okk7a .katex .sizing.reset-size5.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size5{font-size:1em;}.css-4okk7a .katex .sizing.reset-size5.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size6{font-size:1.11111111em;}.css-4okk7a .katex .sizing.reset-size5.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size7{font-size:1.33333333em;}.css-4okk7a .katex .sizing.reset-size5.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size8{font-size:1.6em;}.css-4okk7a .katex .sizing.reset-size5.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size9{font-size:1.92em;}.css-4okk7a .katex .sizing.reset-size5.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size10{font-size:2.30444444em;}.css-4okk7a .katex .sizing.reset-size5.size11,.css-4okk7a .katex 
.fontsize-ensurer.reset-size5.size11{font-size:2.76444444em;}.css-4okk7a .katex .sizing.reset-size6.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size1{font-size:0.5em;}.css-4okk7a .katex .sizing.reset-size6.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size2{font-size:0.6em;}.css-4okk7a .katex .sizing.reset-size6.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size3{font-size:0.7em;}.css-4okk7a .katex .sizing.reset-size6.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size4{font-size:0.8em;}.css-4okk7a .katex .sizing.reset-size6.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size5{font-size:0.9em;}.css-4okk7a .katex .sizing.reset-size6.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size6{font-size:1em;}.css-4okk7a .katex .sizing.reset-size6.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size7{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size6.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size8{font-size:1.44em;}.css-4okk7a .katex .sizing.reset-size6.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size9{font-size:1.728em;}.css-4okk7a .katex .sizing.reset-size6.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size10{font-size:2.074em;}.css-4okk7a .katex .sizing.reset-size6.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size11{font-size:2.488em;}.css-4okk7a .katex .sizing.reset-size7.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size1{font-size:0.41666667em;}.css-4okk7a .katex .sizing.reset-size7.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size2{font-size:0.5em;}.css-4okk7a .katex .sizing.reset-size7.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size3{font-size:0.58333333em;}.css-4okk7a .katex .sizing.reset-size7.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size4{font-size:0.66666667em;}.css-4okk7a .katex .sizing.reset-size7.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size5{font-size:0.75em;}.css-4okk7a .katex 
.sizing.reset-size7.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size6{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size7.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size7{font-size:1em;}.css-4okk7a .katex .sizing.reset-size7.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size8{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size7.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size9{font-size:1.44em;}.css-4okk7a .katex .sizing.reset-size7.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size10{font-size:1.72833333em;}.css-4okk7a .katex .sizing.reset-size7.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size11{font-size:2.07333333em;}.css-4okk7a .katex .sizing.reset-size8.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size1{font-size:0.34722222em;}.css-4okk7a .katex .sizing.reset-size8.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size2{font-size:0.41666667em;}.css-4okk7a .katex .sizing.reset-size8.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size3{font-size:0.48611111em;}.css-4okk7a .katex .sizing.reset-size8.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size4{font-size:0.55555556em;}.css-4okk7a .katex .sizing.reset-size8.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size5{font-size:0.625em;}.css-4okk7a .katex .sizing.reset-size8.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size6{font-size:0.69444444em;}.css-4okk7a .katex .sizing.reset-size8.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size7{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size8.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size8{font-size:1em;}.css-4okk7a .katex .sizing.reset-size8.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size9{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size8.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size10{font-size:1.44027778em;}.css-4okk7a .katex .sizing.reset-size8.size11,.css-4okk7a .katex 
.fontsize-ensurer.reset-size8.size11{font-size:1.72777778em;}.css-4okk7a .katex .sizing.reset-size9.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size1{font-size:0.28935185em;}.css-4okk7a .katex .sizing.reset-size9.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size2{font-size:0.34722222em;}.css-4okk7a .katex .sizing.reset-size9.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size3{font-size:0.40509259em;}.css-4okk7a .katex .sizing.reset-size9.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size4{font-size:0.46296296em;}.css-4okk7a .katex .sizing.reset-size9.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size5{font-size:0.52083333em;}.css-4okk7a .katex .sizing.reset-size9.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size6{font-size:0.5787037em;}.css-4okk7a .katex .sizing.reset-size9.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size7{font-size:0.69444444em;}.css-4okk7a .katex .sizing.reset-size9.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size8{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size9.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size9{font-size:1em;}.css-4okk7a .katex .sizing.reset-size9.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size10{font-size:1.20023148em;}.css-4okk7a .katex .sizing.reset-size9.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size11{font-size:1.43981481em;}.css-4okk7a .katex .sizing.reset-size10.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size1{font-size:0.24108004em;}.css-4okk7a .katex .sizing.reset-size10.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size2{font-size:0.28929605em;}.css-4okk7a .katex .sizing.reset-size10.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size3{font-size:0.33751205em;}.css-4okk7a .katex .sizing.reset-size10.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size4{font-size:0.38572806em;}.css-4okk7a .katex .sizing.reset-size10.size5,.css-4okk7a .katex 
.fontsize-ensurer.reset-size10.size5{font-size:0.43394407em;}.css-4okk7a .katex .sizing.reset-size10.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size6{font-size:0.48216008em;}.css-4okk7a .katex .sizing.reset-size10.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size7{font-size:0.57859209em;}.css-4okk7a .katex .sizing.reset-size10.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size8{font-size:0.69431051em;}.css-4okk7a .katex .sizing.reset-size10.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size9{font-size:0.83317261em;}.css-4okk7a .katex .sizing.reset-size10.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size10{font-size:1em;}.css-4okk7a .katex .sizing.reset-size10.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size11{font-size:1.19961427em;}.css-4okk7a .katex .sizing.reset-size11.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size1{font-size:0.20096463em;}.css-4okk7a .katex .sizing.reset-size11.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size2{font-size:0.24115756em;}.css-4okk7a .katex .sizing.reset-size11.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size3{font-size:0.28135048em;}.css-4okk7a .katex .sizing.reset-size11.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size4{font-size:0.32154341em;}.css-4okk7a .katex .sizing.reset-size11.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size5{font-size:0.36173633em;}.css-4okk7a .katex .sizing.reset-size11.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size6{font-size:0.40192926em;}.css-4okk7a .katex .sizing.reset-size11.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size7{font-size:0.48231511em;}.css-4okk7a .katex .sizing.reset-size11.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size8{font-size:0.57877814em;}.css-4okk7a .katex .sizing.reset-size11.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size9{font-size:0.69453376em;}.css-4okk7a .katex 
.sizing.reset-size11.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size10{font-size:0.83360129em;}.css-4okk7a .katex .sizing.reset-size11.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size11{font-size:1em;}.css-4okk7a .katex .delimsizing.size1{font-family:KaTeX_Size1;}.css-4okk7a .katex .delimsizing.size2{font-family:KaTeX_Size2;}.css-4okk7a .katex .delimsizing.size3{font-family:KaTeX_Size3;}.css-4okk7a .katex .delimsizing.size4{font-family:KaTeX_Size4;}.css-4okk7a .katex .delimsizing.mult .delim-size1>span{font-family:KaTeX_Size1;}.css-4okk7a .katex .delimsizing.mult .delim-size4>span{font-family:KaTeX_Size4;}.css-4okk7a .katex .nulldelimiter{display:inline-block;width:0.12em;}.css-4okk7a .katex .delimcenter{position:relative;}.css-4okk7a .katex .op-symbol{position:relative;}.css-4okk7a .katex .op-symbol.small-op{font-family:KaTeX_Size1;}.css-4okk7a .katex .op-symbol.large-op{font-family:KaTeX_Size2;}.css-4okk7a .katex .op-limits>.vlist-t{text-align:center;}.css-4okk7a .katex .accent>.vlist-t{text-align:center;}.css-4okk7a .katex .accent .accent-body{position:relative;}.css-4okk7a .katex .accent .accent-body:not(.accent-full){width:0;}.css-4okk7a .katex .overlay{display:block;}.css-4okk7a .katex .mtable .vertical-separator{display:inline-block;min-width:1px;}.css-4okk7a .katex .mtable .arraycolsep{display:inline-block;}.css-4okk7a .katex .mtable .col-align-c>.vlist-t{text-align:center;}.css-4okk7a .katex .mtable .col-align-l>.vlist-t{text-align:left;}.css-4okk7a .katex .mtable .col-align-r>.vlist-t{text-align:right;}.css-4okk7a .katex .svg-align{text-align:left;}.css-4okk7a .katex svg{display:block;position:absolute;width:100%;height:inherit;fill:currentColor;stroke:currentColor;fill-rule:nonzero;fill-opacity:1;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;}.css-4okk7a .katex svg path{stroke:none;}.css-4okk7a .katex 
img{border-style:none;min-width:0;min-height:0;max-width:none;max-height:none;}.css-4okk7a .katex .stretchy{width:100%;display:block;position:relative;overflow:hidden;}.css-4okk7a .katex .stretchy::before,.css-4okk7a .katex .stretchy::after{content:'';}.css-4okk7a .katex .hide-tail{width:100%;position:relative;overflow:hidden;}.css-4okk7a .katex .halfarrow-left{position:absolute;left:0;width:50.2%;overflow:hidden;}.css-4okk7a .katex .halfarrow-right{position:absolute;right:0;width:50.2%;overflow:hidden;}.css-4okk7a .katex .brace-left{position:absolute;left:0;width:25.1%;overflow:hidden;}.css-4okk7a .katex .brace-center{position:absolute;left:25%;width:50%;overflow:hidden;}.css-4okk7a .katex .brace-right{position:absolute;right:0;width:25.1%;overflow:hidden;}.css-4okk7a .katex .x-arrow-pad{padding:0 0.5em;}.css-4okk7a .katex .cd-arrow-pad{padding:0 0.55556em 0 0.27778em;}.css-4okk7a .katex .x-arrow,.css-4okk7a .katex .mover,.css-4okk7a .katex .munder{text-align:center;}.css-4okk7a .katex .boxpad{padding:0 0.3em 0 0.3em;}.css-4okk7a .katex .fbox,.css-4okk7a .katex .fcolorbox{box-sizing:border-box;border:0.04em solid;}.css-4okk7a .katex .cancel-pad{padding:0 0.2em 0 0.2em;}.css-4okk7a .katex .cancel-lap{margin-left:-0.2em;margin-right:-0.2em;}.css-4okk7a .katex .sout{border-bottom-style:solid;border-bottom-width:0.08em;}.css-4okk7a .katex .angl{box-sizing:border-box;border-top:0.049em solid;border-right:0.049em solid;margin-right:0.03889em;}.css-4okk7a .katex .anglpad{padding:0 0.03889em 0 0.03889em;}.css-4okk7a .katex .eqn-num::before{counter-increment:katexEqnNo;content:'(' counter(katexEqnNo) ')';}.css-4okk7a .katex .mml-eqn-num::before{counter-increment:mmlEqnNo;content:'(' counter(mmlEqnNo) ')';}.css-4okk7a .katex .mtr-glue{width:50%;}.css-4okk7a .katex .cd-vert-arrow{display:inline-block;position:relative;}.css-4okk7a .katex .cd-label-left{display:inline-block;position:absolute;right:calc(50% + 0.3em);text-align:left;}.css-4okk7a .katex 
.cd-label-right{display:inline-block;position:absolute;left:calc(50% + 0.3em);text-align:right;}.css-4okk7a .katex-display{display:block;margin:1em 0;text-align:center;}.css-4okk7a .katex-display>.katex{display:block;white-space:nowrap;}.css-4okk7a .katex-display>.katex>.katex-html{display:block;position:relative;}.css-4okk7a .katex-display>.katex>.katex-html>.tag{position:absolute;right:0;}.css-4okk7a .katex-display.leqno>.katex>.katex-html>.tag{left:0;right:auto;}.css-4okk7a .katex-display.fleqn>.katex{text-align:left;padding-left:2em;}.css-4okk7a body{counter-reset:katexEqnNo mmlEqnNo;}.css-4okk7a table{width:-webkit-max-content;width:-moz-max-content;width:max-content;}.css-4okk7a .tableBlock{max-width:100%;margin-bottom:1rem;overflow-y:scroll;}.css-4okk7a .tableBlock thead,.css-4okk7a .tableBlock thead th{border-bottom:1px solid #333!important;}.css-4okk7a .tableBlock th,.css-4okk7a .tableBlock td{padding:10px;text-align:left;}.css-4okk7a .tableBlock th{font-weight:bold!important;}.css-4okk7a .tableBlock caption{caption-side:bottom;color:#555;font-size:12px;font-style:italic;text-align:center;}.css-4okk7a .tableBlock caption>p{margin:0;}.css-4okk7a .tableBlock th>p,.css-4okk7a .tableBlock td>p{margin:0;}.css-4okk7a .tableBlock [data-background-color='aliceblue']{background-color:#f0f8ff;color:#000;}.css-4okk7a .tableBlock [data-background-color='black']{background-color:#000;color:#fff;}.css-4okk7a .tableBlock [data-background-color='chocolate']{background-color:#d2691e;color:#fff;}.css-4okk7a .tableBlock [data-background-color='cornflowerblue']{background-color:#6495ed;color:#fff;}.css-4okk7a .tableBlock [data-background-color='crimson']{background-color:#dc143c;color:#fff;}.css-4okk7a .tableBlock [data-background-color='darkblue']{background-color:#00008b;color:#fff;}.css-4okk7a .tableBlock [data-background-color='darkseagreen']{background-color:#8fbc8f;color:#000;}.css-4okk7a .tableBlock 
[data-background-color='deepskyblue']{background-color:#00bfff;color:#000;}.css-4okk7a .tableBlock [data-background-color='gainsboro']{background-color:#dcdcdc;color:#000;}.css-4okk7a .tableBlock [data-background-color='grey']{background-color:#808080;color:#fff;}.css-4okk7a .tableBlock [data-background-color='lemonchiffon']{background-color:#fffacd;color:#000;}.css-4okk7a .tableBlock [data-background-color='lightpink']{background-color:#ffb6c1;color:#000;}.css-4okk7a .tableBlock [data-background-color='lightsalmon']{background-color:#ffa07a;color:#000;}.css-4okk7a .tableBlock [data-background-color='lightskyblue']{background-color:#87cefa;color:#000;}.css-4okk7a .tableBlock [data-background-color='mediumblue']{background-color:#0000cd;color:#fff;}.css-4okk7a .tableBlock [data-background-color='omnigrey']{background-color:#f0f0f0;color:#000;}.css-4okk7a .tableBlock [data-background-color='white']{background-color:#fff;color:#000;}.css-4okk7a .tableBlock [data-text-align='center']{text-align:center;}.css-4okk7a .tableBlock [data-text-align='left']{text-align:left;}.css-4okk7a .tableBlock [data-text-align='right']{text-align:right;}.css-4okk7a .tableBlock [data-vertical-align='bottom']{vertical-align:bottom;}.css-4okk7a .tableBlock [data-vertical-align='middle']{vertical-align:middle;}.css-4okk7a .tableBlock [data-vertical-align='top']{vertical-align:top;}.css-4okk7a .tableBlock__font-size--xxsmall{font-size:10px;}.css-4okk7a .tableBlock__font-size--xsmall{font-size:12px;}.css-4okk7a .tableBlock__font-size--small{font-size:14px;}.css-4okk7a .tableBlock__font-size--large{font-size:18px;}.css-4okk7a .tableBlock__border--some tbody tr:not(:last-child){border-bottom:1px solid #e2e5e7;}.css-4okk7a .tableBlock__border--bordered td,.css-4okk7a .tableBlock__border--bordered th{border:1px solid #e2e5e7;}.css-4okk7a .tableBlock__border--borderless tbody+tbody,.css-4okk7a .tableBlock__border--borderless td,.css-4okk7a .tableBlock__border--borderless th,.css-4okk7a 
.tableBlock__border--borderless tr,.css-4okk7a .tableBlock__border--borderless thead,.css-4okk7a .tableBlock__border--borderless thead th{border:0!important;}.css-4okk7a .tableBlock:not(.tableBlock__table-striped) tbody tr{background-color:unset!important;}.css-4okk7a .tableBlock__table-striped tbody tr:nth-of-type(odd){background-color:#f9fafc!important;}.css-4okk7a .tableBlock__table-compactl th,.css-4okk7a .tableBlock__table-compact td{padding:3px!important;}.css-4okk7a .tableBlock__full-size{width:100%;}.css-4okk7a .textBlock{margin-bottom:16px;}.css-4okk7a .textBlock__text-formatting--finePrint{font-size:12px;}.css-4okk7a .textBlock__text-infoBox{padding:0.75rem 1.25rem;margin-bottom:1rem;border:1px solid transparent;border-radius:0.25rem;}.css-4okk7a .textBlock__text-infoBox p{margin:0;}.css-4okk7a .textBlock__text-infoBox--primary{background-color:#cce5ff;border-color:#b8daff;color:#004085;}.css-4okk7a .textBlock__text-infoBox--secondary{background-color:#e2e3e5;border-color:#d6d8db;color:#383d41;}.css-4okk7a .textBlock__text-infoBox--success{background-color:#d4edda;border-color:#c3e6cb;color:#155724;}.css-4okk7a .textBlock__text-infoBox--danger{background-color:#f8d7da;border-color:#f5c6cb;color:#721c24;}.css-4okk7a .textBlock__text-infoBox--warning{background-color:#fff3cd;border-color:#ffeeba;color:#856404;}.css-4okk7a .textBlock__text-infoBox--info{background-color:#d1ecf1;border-color:#bee5eb;color:#0c5460;}.css-4okk7a .textBlock__text-infoBox--dark{background-color:#d6d8d9;border-color:#c6c8ca;color:#1b1e21;}.css-4okk7a .text-overline{-webkit-text-decoration:overline;text-decoration:overline;}.css-4okk7a.css-4okk7a{color:#2B3148;background-color:transparent;font-family:"Roboto","Helvetica","Arial",sans-serif;font-size:20px;line-height:24px;overflow:visible;padding-top:0px;position:relative;}.css-4okk7a.css-4okk7a:after{content:'';-webkit-transform:scale(0);-moz-transform:scale(0);-ms-transform:scale(0);transform:scale(0);position:absolute;border:2px 
solid #EA9430;border-radius:2px;inset:-8px;z-index:1;}.css-4okk7a .js-external-link-button.link-like,.css-4okk7a .js-external-link-anchor{color:inherit;border-radius:1px;-webkit-text-decoration:underline;text-decoration:underline;}.css-4okk7a .js-external-link-button.link-like:hover,.css-4okk7a .js-external-link-anchor:hover,.css-4okk7a .js-external-link-button.link-like:active,.css-4okk7a .js-external-link-anchor:active{text-decoration-thickness:2px;text-shadow:1px 0 0;}.css-4okk7a .js-external-link-button.link-like:focus-visible,.css-4okk7a .js-external-link-anchor:focus-visible{outline:transparent 2px dotted;box-shadow:0 0 0 2px #6314E6;}.css-4okk7a p,.css-4okk7a div{margin:0px;display:block;}.css-4okk7a pre{margin:0px;display:block;}.css-4okk7a pre code{display:block;width:-webkit-fit-content;width:-moz-fit-content;width:fit-content;}.css-4okk7a pre:not(:first-child){padding-top:8px;}.css-4okk7a ul,.css-4okk7a ol{display:block margin:0px;padding-left:20px;}.css-4okk7a ul li,.css-4okk7a ol li{padding-top:8px;}.css-4okk7a ul ul,.css-4okk7a ol ul,.css-4okk7a ul ol,.css-4okk7a ol ol{padding-top:0px;}.css-4okk7a ul:not(:first-child),.css-4okk7a ol:not(:first-child){padding-top:4px;} Test setup



Institute for Digital Research and Education

FAQ: What are the differences between one-tailed and two-tailed tests?

When you conduct a test of statistical significance, whether it is from a correlation, an ANOVA, a regression or some other kind of test, you are given a p-value somewhere in the output.  If your test statistic is symmetrically distributed, you can select one of three alternative hypotheses. Two of these correspond to one-tailed tests and one corresponds to a two-tailed test.  However, the p-value presented is (almost always) for a two-tailed test.  But how do you choose which test?  Is the p-value appropriate for your test? And, if it is not, how can you calculate the correct p-value for your test given the p-value in your output?  

What is a two-tailed test?

First, let’s start with the meaning of a two-tailed test.  If you are using a significance level of 0.05, a two-tailed test allots half of your alpha to testing the statistical significance in one direction and half of your alpha to testing statistical significance in the other direction.  This means that .025 is in each tail of the distribution of your test statistic. When using a two-tailed test, regardless of the direction of the relationship you hypothesize, you are testing for the possibility of the relationship in both directions.  For example, we may wish to compare the mean of a sample to a given value x using a t-test.  Our null hypothesis is that the mean is equal to x . A two-tailed test will test both whether the mean is significantly greater than x and whether the mean is significantly less than x . The mean is considered significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05.
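The split of alpha across both tails can be sketched in a few lines of Python. This is only an illustration under a stated assumption: it uses the standard normal distribution as a stand-in for the distribution of the test statistic (reasonable for a large sample), and the function name `two_tailed_test` is hypothetical rather than part of any statistical package.

```python
from statistics import NormalDist


def two_tailed_test(z, alpha=0.05):
    """Two-tailed test for a statistic z, ASSUMED to be standard normal under H0.

    Returns (critical value, two-tailed p-value, whether H0 is rejected).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Half of alpha goes in each tail, so the cutoff sits at 1 - alpha/2.
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of a statistic at least this far from zero in either direction.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return critical, p_value, p_value < alpha


critical, p_value, reject = two_tailed_test(2.1)
print(f"critical = {critical:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

With alpha = 0.05 the cutoff is about 1.96, so a statistic beyond ±1.96 falls in the top or bottom 2.5% of the distribution and gives a p-value below 0.05.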

What is a one-tailed test?

Next, let’s discuss the meaning of a one-tailed test.  If you are using a significance level of .05, a one-tailed test allots all of your alpha to testing the statistical significance in the one direction of interest.  This means that .05 is in one tail of the distribution of your test statistic. When using a one-tailed test, you are testing for the possibility of the relationship in one direction and completely disregarding the possibility of a relationship in the other direction.  Let’s return to our example comparing the mean of a sample to a given value x using a t-test.  Our null hypothesis is that the mean is equal to x . A one-tailed test will test either if the mean is significantly greater than x or if the mean is significantly less than x , but not both. Then, depending on the chosen tail, the mean is significantly greater than or less than x if the test statistic is in the top 5% of its probability distribution or bottom 5% of its probability distribution, resulting in a p-value less than 0.05.  The one-tailed test provides more power to detect an effect in one direction by not testing the effect in the other direction. A discussion of when this is an appropriate option follows.   
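The gain in power can be seen by evaluating the same test statistic both ways. The sketch below again assumes a standard normal test statistic for simplicity; the function name `one_tailed_p` and the value 1.8 are illustrative choices, not taken from the FAQ.

```python
from statistics import NormalDist


def one_tailed_p(z, alternative):
    """One-tailed p-value for a statistic z, ASSUMED standard normal under H0.

    alternative: "greater" tests the upper tail, "less" tests the lower tail.
    """
    std_normal = NormalDist()
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    raise ValueError("alternative must be 'greater' or 'less'")


z = 1.8  # hypothetical test statistic
p_upper = one_tailed_p(z, "greater")
p_two = 2 * min(p_upper, one_tailed_p(z, "less"))
print(f"one-tailed (greater): p = {p_upper:.4f}")  # below 0.05
print(f"two-tailed:           p = {p_two:.4f}")    # above 0.05

# The same size of effect in the untested direction goes undetected.
print(f"one-tailed (greater), z = -1.8: p = {one_tailed_p(-1.8, 'greater'):.4f}")
```

Here the statistic 1.8 clears the one-tailed cutoff (about 1.645) but not the two-tailed cutoff (about 1.96), while a statistic of -1.8 tested in the "greater" direction yields a p-value near 1.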

When is a one-tailed test appropriate?

Because the one-tailed test provides more power to detect an effect, you may be tempted to use a one-tailed test whenever you have a hypothesis about the direction of an effect. Before doing so, consider the consequences of missing an effect in the other direction.  Imagine you have developed a new drug that you believe is an improvement over an existing drug.  You wish to maximize your ability to detect the improvement, so you opt for a one-tailed test. In doing so, you fail to test for the possibility that the new drug is less effective than the existing drug.  The consequences in this example are extreme, but they illustrate a danger of inappropriate use of a one-tailed test.

So when is a one-tailed test appropriate? If you consider the consequences of missing an effect in the untested direction and conclude that they are negligible and in no way irresponsible or unethical, then you can proceed with a one-tailed test. For example, imagine again that you have developed a new drug. It is cheaper than the existing drug and, you believe, no less effective.  In testing this drug, you are only interested in testing if it is less effective than the existing drug.  You do not care if it is significantly more effective.  You only wish to show that it is not less effective. In this scenario, a one-tailed test would be appropriate.

When is a one-tailed test NOT appropriate?

Choosing a one-tailed test for the sole purpose of attaining significance is not appropriate.  Choosing a one-tailed test after running a two-tailed test that failed to reject the null hypothesis is not appropriate, no matter how "close" to significant the two-tailed test was.  Using statistical tests inappropriately can lead to invalid results that are not replicable and are highly questionable, a steep price to pay for a significance star in your results table!

Deriving a one-tailed test from two-tailed output

The default among statistical packages performing tests is to report two-tailed p-values.  Because the most commonly used test statistic distributions (standard normal, Student’s t) are symmetric about zero, most one-tailed p-values can be derived from the two-tailed p-values.   

Below, we have the output from a two-sample t-test in Stata.  The test is comparing the mean male score to the mean female score.  The null hypothesis is that the difference in means is zero.  The two-sided alternative is that the difference in means is not zero.  There are two one-sided alternatives that one could opt to test instead: that the male score is higher than the female score (diff  > 0) or that the female score is higher than the male score (diff < 0).  In this instance, Stata presents results for all three alternatives.  Under the headings Ha: diff < 0 and Ha: diff > 0 are the results for the one-tailed tests. In the middle, under the heading Ha: diff != 0 (which means that the difference is not equal to 0), are the results for the two-tailed test. 

    Two-sample t test with equal variances

    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
        male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
      female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
    ---------+--------------------------------------------------------------------
    combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
    ---------+--------------------------------------------------------------------
        diff |           -4.869947    1.304191               -7.441835   -2.298059
    ------------------------------------------------------------------------------
    Degrees of freedom: 198

                      Ho: mean(male) - mean(female) = diff = 0

         Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
           t =  -3.7341                t =  -3.7341              t =  -3.7341
       P < t =   0.0001          P > |t| =   0.0002          P > t =   0.9999

Note that the test statistic, -3.7341, is the same for all of these tests.  The two-tailed p-value is P > |t|. This can be rewritten as P(t > 3.7341) + P(t < -3.7341).  Because the t-distribution is symmetric about zero, these two probabilities are equal: P > |t| = 2 * P(t < -3.7341).  Thus, we can see that the two-tailed p-value is twice the one-tailed p-value for the alternative hypothesis that (diff < 0).  The other one-tailed alternative hypothesis has a p-value of P(t > -3.7341) = 1 - P(t < -3.7341) = 1 - 0.0001 = 0.9999.   So, depending on the direction of the one-tailed hypothesis, its p-value is either 0.5*(two-tailed p-value) or 1-0.5*(two-tailed p-value) if the test statistic is symmetrically distributed about zero. 
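The conversion rule can be sketched in a few lines of Python. This is a minimal illustration, not part of Stata: the function name `one_tailed_from_two_tailed` and its `alternative` labels are hypothetical, and the rule only holds when the test statistic's null distribution is symmetric about zero.

```python
def one_tailed_from_two_tailed(p_two_tailed, test_statistic, alternative):
    """Convert a two-tailed p-value to a one-tailed p-value.

    Assumes the null distribution of the test statistic is symmetric
    about zero (e.g. standard normal or Student's t).
    alternative: "less" for Ha: diff < 0, "greater" for Ha: diff > 0.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    half = p_two_tailed / 2
    # The statistic supports the alternative when its sign matches the
    # hypothesized direction.
    in_predicted_direction = (
        (alternative == "less" and test_statistic < 0)
        or (alternative == "greater" and test_statistic > 0)
    )
    return half if in_predicted_direction else 1 - half


# Values taken from the two-sample t-test output above.
p_less = one_tailed_from_two_tailed(0.0002, -3.7341, "less")
p_greater = one_tailed_from_two_tailed(0.0002, -3.7341, "greater")
print(round(p_less, 4), round(p_greater, 4))
```

With the reported two-tailed p-value of 0.0002 and t = -3.7341, the two results match the one-tailed values Stata prints under Ha: diff < 0 and Ha: diff > 0.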

In this example, the two-tailed p-value suggests rejecting the null hypothesis of no difference. Had we opted for the one-tailed test of (diff > 0), we would fail to reject the null because of our choice of tails. 

The output below is from a regression analysis in Stata.  Unlike the example above, only the two-sided p-values are presented in this output.

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  2,   197) =   46.58
           Model |  7363.62077     2  3681.81039           Prob > F      =  0.0000
        Residual |  15572.5742   197  79.0486001           R-squared     =  0.3210
    -------------+------------------------------           Adj R-squared =  0.3142
           Total |   22936.195   199  115.257261           Root MSE      =  8.8909

    ------------------------------------------------------------------------------
           socst |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         science |   .2191144   .0820323     2.67   0.008     .0573403    .3808885
            math |   .4778911   .0866945     5.51   0.000     .3069228    .6488594
           _cons |   15.88534   3.850786     4.13   0.000     8.291287    23.47939
    ------------------------------------------------------------------------------

For each regression coefficient, the tested null hypothesis is that the coefficient is equal to zero.  Thus, the one-tailed alternatives are that the coefficient is greater than zero and that the coefficient is less than zero. To get the p-value for the one-tailed test of the variable science having a coefficient greater than zero, you would divide the .008 by 2, yielding .004 because the effect is going in the predicted direction. This is P(>2.67). If you had made your prediction in the other direction (the opposite direction of the model effect), the p-value would have been 1 – .004 = .996.  This is P(<2.67). For all three p-values, the test statistic is 2.67. 


  • © 2021 UC REGENTS

Hypothesis Testing

Type ii error.

The first step in hypothesis testing is to calculate the test statistic. The formula for the test statistic depends on whether the population standard deviation (σ) is known or unknown. If σ is known, our hypothesis test is known as a z test and we use the z distribution. If σ is unknown, our hypothesis test is known as a t test and we use the t distribution. Use of the t distribution relies on the degrees of freedom, which is equal to the sample size minus one. Furthermore, if the population standard deviation σ is unknown, the sample standard deviation s is used instead. To switch from σ known to σ unknown, click on $\boxed{\sigma}$ and select $\boxed{s}$ in the Hypothesis Testing Calculator.

Next, the test statistic is used to conduct the test using either the p-value approach or critical value approach. The particular steps taken in each approach largely depend on the form of the hypothesis test: lower tail, upper tail or two-tailed. The form can easily be identified by looking at the alternative hypothesis (H a ). If there is a less than sign in the alternative hypothesis then it is a lower tail test, a greater than sign indicates an upper tail test, and a not-equal sign indicates a two-tailed test. To switch from a lower tail test to an upper tail or two-tailed test, click on $\boxed{\geq}$ and select $\boxed{\leq}$ or $\boxed{=}$, respectively.

In the p-value approach, the test statistic is used to calculate a p-value. If the test is a lower tail test, the p-value is the probability of getting a value for the test statistic at least as small as the value from the sample. If the test is an upper tail test, the p-value is the probability of getting a value for the test statistic at least as large as the value from the sample. In a two-tailed test, the p-value is the probability of getting a value for the test statistic at least as unlikely as the value from the sample.

To test the hypothesis in the p-value approach, compare the p-value to the level of significance. If the p-value is less than or equal to the level of significance, reject the null hypothesis. If the p-value is greater than the level of significance, do not reject the null hypothesis. This method remains unchanged regardless of whether it's a lower tail, upper tail or two-tailed test. To change the level of significance, click on $\boxed{.05}$. Note that if the test statistic is given, you can calculate the p-value from the test statistic by clicking on the switch symbol twice.
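As a rough sketch of the p-value approach for the z test (σ known), the three tail forms can be computed with Python's standard library. The function names and the `tail` labels below are illustrative assumptions, not part of the calculator; a t test would need a t-distribution CDF, which the standard library does not provide.

```python
from statistics import NormalDist


def z_test_p_value(z, tail):
    """p-value for a z statistic; tail is 'lower', 'upper', or 'two'."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "lower":
        # Probability of a statistic at least as small as the one observed.
        return std_normal.cdf(z)
    if tail == "upper":
        # Probability of a statistic at least as large as the one observed.
        return 1 - std_normal.cdf(z)
    if tail == "two":
        # Both tails beyond |z|; valid because the normal curve is symmetric.
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'lower', 'upper', or 'two'")


def reject_null(p_value, alpha=0.05):
    """Same decision rule for every tail form: reject when p <= alpha."""
    return p_value <= alpha


print(z_test_p_value(-1.96, "lower"), reject_null(z_test_p_value(2.5, "upper")))
```

Note that only the p-value calculation depends on the form of the test; the comparison with the level of significance is identical in all three cases.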

In the critical value approach, the level of significance ($\alpha$) is used to calculate the critical value. In a lower tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the lower tail of the sampling distribution of the test statistic. In an upper tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the upper tail of the sampling distribution of the test statistic. In a two-tailed test, the critical values are the values of the test statistic providing areas of $\alpha / 2$ in the lower and upper tail of the sampling distribution of the test statistic.

To test the hypothesis in the critical value approach, compare the critical value to the test statistic. Unlike the p-value approach, the method we use to decide whether to reject the null hypothesis depends on the form of the hypothesis test. In a lower tail test, if the test statistic is less than or equal to the critical value, reject the null hypothesis. In an upper tail test, if the test statistic is greater than or equal to the critical value, reject the null hypothesis. In a two-tailed test, if the test statistic is less than or equal to the lower critical value or greater than or equal to the upper critical value, reject the null hypothesis.
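The critical value approach can be sketched the same way for a z test. The helper names below are hypothetical, and the sketch assumes the standard normal distribution as the sampling distribution of the test statistic.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Critical value(s) of the standard normal for a given tail form."""
    std_normal = NormalDist()
    if tail == "lower":
        return (std_normal.inv_cdf(alpha),)        # area alpha in the lower tail
    if tail == "upper":
        return (std_normal.inv_cdf(1 - alpha),)    # area alpha in the upper tail
    if tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)  # area alpha/2 in each tail
        return (-upper, upper)
    raise ValueError("tail must be 'lower', 'upper', or 'two'")


def reject_null_by_critical_value(z, alpha, tail):
    """Decision rule; unlike the p-value rule, it depends on the tail form."""
    crit = critical_values(alpha, tail)
    if tail == "lower":
        return z <= crit[0]
    if tail == "upper":
        return z >= crit[0]
    return z <= crit[0] or z >= crit[1]


print(critical_values(0.05, "two"))
print(reject_null_by_critical_value(-1.7, 0.05, "lower"))
```

For any given statistic and α, this rule and the p-value rule lead to the same decision; they are two views of the same comparison.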

When conducting a hypothesis test, there is always a chance that you come to the wrong conclusion. There are two types of errors you can make: Type I Error and Type II Error. A Type I Error is committed if you reject the null hypothesis when the null hypothesis is true. Ideally, we'd like to not reject the null hypothesis when the null hypothesis is true. A Type II Error is committed if you fail to reject the null hypothesis when the alternative hypothesis is true. Ideally, we'd like to reject the null hypothesis when the alternative hypothesis is true.
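A small simulation can make the Type I Error concrete. The sketch below assumes a two-tailed z test on samples drawn from a population where the null hypothesis really is true (mean 0, σ = 1); the sample size, trial count, and seed are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_type_i_error_rate(alpha=0.05, n=25, trials=5000, seed=0):
    """Fraction of tests that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Draw a sample from the null population: mean 0, sigma 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / sqrt(n))       # z statistic with known sigma
        if abs(z) >= z_crit:
            rejections += 1                       # a Type I Error
    return rejections / trials


rate = simulate_type_i_error_rate()
print(rate)
```

The printed rate should land close to the chosen level of significance, since α is by construction the probability of rejecting a true null hypothesis.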

Hypothesis testing is closely related to the statistical area of confidence intervals. If the hypothesized value of the population mean is outside of the confidence interval, we can reject the null hypothesis. Confidence intervals can be found using the Confidence Intervals Calculator . The calculator on this page does hypothesis tests for one population mean. Sometimes we're interested in hypothesis tests about two population means. These can be solved using the Two Populations Calculator . The probability of a Type II Error can be calculated by clicking on the link at the bottom of the page.

What Is a Two-Tailed Test?


Adam Hayes, Ph.D., CFA, is a financial writer with 15+ years Wall Street experience as a derivatives trader. Besides his extensive derivative trading expertise, Adam is an expert in economics and behavioral finance. Adam received his master's in economics from The New School for Social Research and his Ph.D. from the University of Wisconsin-Madison in sociology. He is a CFA charterholder as well as holding FINRA Series 7, 55 & 63 licenses. He currently researches and teaches economic sociology and the social studies of finance at the Hebrew University in Jerusalem.

Investopedia / Joules Garcia

A two-tailed test, in statistics, is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater than or less than a certain range of values. It is used in null-hypothesis testing and testing for statistical significance . If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.

Key Takeaways

  • In statistics, a two-tailed test is a method in which the critical area of a distribution is two-sided and tests whether a sample is greater or less than a range of values.
  • It is used in null-hypothesis testing and testing for statistical significance.
  • If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.
  • By convention two-tailed tests are used to determine significance at the 5% level, meaning each side of the distribution is cut at 2.5%.

A basic concept of inferential statistics is hypothesis testing , which assesses whether sample data are consistent with a claim about a population parameter. A hypothesis test that is designed to show whether the mean of a sample is significantly greater than or significantly less than the mean of a population is referred to as a two-tailed test. The two-tailed test gets its name from testing the area under both tails of a normal distribution , although the test can be used with other, non-normal distributions.

A two-tailed test is designed to examine both sides of a specified data range as designated by the probability distribution involved. The probability distribution should represent the likelihood of a specified outcome based on predetermined standards. This requires the setting of a limit designating the highest (or upper) and lowest (or lower) accepted variable values included within the range. Any data point that exists above the upper limit or below the lower limit is considered out of the acceptance range and in an area referred to as the rejection range.

There is no inherent standard about the number of data points that must exist within the acceptance range. In instances where precision is required, such as in the creation of pharmaceutical drugs, a rejection rate of 0.001% or less may be instituted. In instances where precision is less critical, such as the number of food items in a product bag, a rejection rate of 5% may be appropriate.

A two-tailed test can also be used practically during certain production activities in a firm, such as with the production and packaging of candy at a particular facility. If the production facility designates 50 candies per bag as its goal, with an acceptable distribution of 45 to 55 candies, any bag found with an amount below 45 or above 55 is considered within the rejection range.

To confirm the packaging mechanisms are properly calibrated to meet the expected output, random sampling may be taken to confirm accuracy. A simple random sample takes a small, random portion of the entire population to represent the entire data set, where each member has an equal probability of being chosen.

For the packaging mechanisms to be considered accurate, an average of 50 candies per bag with an appropriate distribution is desired. Additionally, the number of bags that fall within the rejection range needs to fall within the probability distribution limit considered acceptable as an error rate. Here, the null hypothesis would be that the mean is 50 while the alternate hypothesis would be that it is not 50.

If, after conducting the two-tailed test, the z-score falls in the rejection region, meaning that the deviation is too far from the desired mean, then adjustments to the facility or associated equipment may be required to correct the error. Regular use of two-tailed testing methods can help ensure production stays within limits over the long term.

Be careful to note if a statistical test is one- or two-tailed as this will greatly influence a model's interpretation.

When a hypothesis test is set up to show that the sample mean would be only higher than the population mean, this is referred to as a  one-tailed test . A formulation of this hypothesis would be, for example, that "the returns on an investment fund would be  at least  x%." One-tailed tests could also be set up to show that the sample mean could be only less than the population mean. The key difference from a two-tailed test is that in a two-tailed test, the sample mean could be different from the population mean by being  either  higher or lower than it.

If the sample being tested falls into the one-sided critical area, the alternative hypothesis will be accepted instead of the null hypothesis. A one-tailed test is also known as a directional hypothesis or directional test.

A two-tailed test, on the other hand, is designed to examine both sides of a specified data range to test whether a sample is greater than or less than the range of values.

Example of a Two-Tailed Test

As a hypothetical example, imagine that a new  stockbroker , named XYZ, claims that their brokerage fees are lower than those of your current stockbroker, ABC. Data available from an independent research firm indicates that the mean and standard deviation of brokerage charges for all ABC broker clients are $18 and $6, respectively.

A sample of 100 clients of ABC is taken, and brokerage charges are calculated with the new rates of XYZ broker. If the mean of the sample is $18.75 and the sample standard deviation is $6, can any inference be made about the difference in the average brokerage bill between ABC and XYZ broker?

  • H 0 : Null Hypothesis: mean = 18
  • H 1 : Alternative Hypothesis: mean <> 18 (This is what we want to prove.)
  • Rejection region: Z <= -Z 2.5  or Z >= Z 2.5  (assuming 5% significance level, split 2.5% on either side).
  • Z = (sample mean – mean) / (std-dev / sqrt(no. of samples)) = (18.75 – 18) / (6 / sqrt(100)) = 1.25

This calculated Z value falls between the two limits defined by: - Z 2.5  = -1.96 and Z 2.5  = 1.96.

We therefore conclude that there is insufficient evidence to infer that there is any difference between the rates of your existing broker and the new broker, and the null hypothesis cannot be rejected. Alternatively, the p-value = P(Z < -1.25) + P(Z > 1.25) = 2 * 0.1056 = 0.2112 = 21.12%, which is greater than 0.05 or 5% and leads to the same conclusion.
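The arithmetic in this example can be checked with a short Python sketch using the standard library's normal distribution. The variable names are illustrative; the inputs are the figures given in the example.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the brokerage example.
sample_mean = 18.75
hypothesized_mean = 18.0
std_dev = 6.0
n = 100
alpha = 0.05

std_normal = NormalDist()

# Test statistic: how many standard errors the sample mean is from 18.
z = (sample_mean - hypothesized_mean) / (std_dev / sqrt(n))

# Two-tailed p-value: area in both tails beyond |z|.
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# Two-tailed critical value for a 5% significance level.
z_crit = std_normal.inv_cdf(1 - alpha / 2)

reject = abs(z) >= z_crit
print(round(z, 2), round(p_value, 4), round(z_crit, 2), reject)
```

The computed p-value is about 0.21, differing from the table-based figure in the text only in the fourth decimal place because the table value 0.1056 is rounded. Both the critical value comparison and the p-value comparison fail to reject the null hypothesis.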

How Is a Two-Tailed Test Designed?

A two-tailed test is designed to determine whether a claim is true or not given a population parameter. It examines both sides of a specified data range as designated by the probability distribution involved. As such, the probability distribution should represent the likelihood of a specified outcome based on predetermined standards.

What Is the Difference Between a Two-Tailed and One-Tailed Test?

A two-tailed hypothesis test is designed to show whether the sample mean is significantly greater than  or  significantly less than the mean of a population. The two-tailed test gets its name from testing the area under both tails (sides) of a normal distribution. A one-tailed hypothesis test, on the other hand, is set up to show only one test; that the sample mean would be higher than the population mean, or, in a separate test, that the sample mean would be lower than the population mean.

What Is a Z-score?

A Z-score numerically describes a value's relationship to the mean of a group of values and is measured in terms of the number of standard deviations from the mean. If a Z-score is 0, it indicates that the data point's score is identical to the mean score whereas Z-scores of 1.0 and -1.0 would indicate values one standard deviation above or below the mean. In approximately normal data sets, about 99.7% of values have a Z-score between -3 and 3, meaning they lie within three standard deviations above and below the mean.



Statistics LibreTexts

2.1.5: Critical Values, p-values, and Significance


  • Michelle Oja
  • Taft College


A low probability value casts doubt on the null hypothesis. How low must the probability value be in order to conclude that the null hypothesis is false? Although there is clearly no right or wrong answer to this question, it is conventional to conclude the null hypothesis is false if the probability value is less than 0.05. More conservative researchers conclude the null hypothesis is false only if the probability value is less than 0.01. When a researcher concludes that the null hypothesis is false, the researcher is said to have rejected the null hypothesis. The probability value below which the null hypothesis is rejected is called the α level or simply \(α\) (“alpha”). It is also called the significance level. If α is not explicitly specified, assume that \(α\) = 0.05.

The significance level is a threshold we set before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use. If our data produce values that meet or exceed this threshold, then we have sufficient evidence to reject the null hypothesis; if not, we fail to reject the null (we never “accept” the null).

There are two criteria we use to assess whether our data meet the thresholds established by our chosen significance level, and they both have to do with our discussions of probability and distributions. Recall that probability refers to the likelihood of an event, given some situation or set of conditions. In hypothesis testing, that situation is the assumption that the null hypothesis value is the correct value, or that there is no effect. The value laid out in H0 is our condition under which we interpret our results. To reject this assumption, and thereby reject the null hypothesis, we need results that would be very unlikely if the null was true. Now recall that values of z which fall in the tails of the standard normal distribution represent unlikely values. That is, the proportion of the area under the curve as or more extreme than \(z\) is very small as we get into the tails of the distribution. Our significance level corresponds to the area under the tail that is exactly equal to α: if we use our normal criterion of \(α\) = .05, then 5% of the area under the curve becomes what we call the rejection region (also called the critical region) of the distribution. This is illustrated in Figure \(\PageIndex{1}\).

Standard normal curve with the right tail shaded starting between 1 SD and 2 SDs higher than the mean, and called the Rejection Region.

The shaded rejection region takes up 5% of the area under the curve. Any result which falls in that region is sufficient evidence to reject the null hypothesis.

The rejection region is bounded by a specific \(z\)-value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the critical value, \(z_{crit}\) (“\(z\)-crit”) or \(z*\) (hence the other name “critical region”). Finding the critical value works exactly the same as finding the z-score corresponding to any area under the curve like we did in Unit 1. If we go to the normal table, we will find that the z-score corresponding to 5% of the area under the curve is equal to -1.645 (\(z\) = -1.64 corresponds to 0.0505 and \(z\) = -1.65 corresponds to 0.0495, so .05 is between them) if we look at the proportion below the z-score.  It would be +1.645 if we are looking for the top 5% of scores. The direction must be determined by your alternative hypothesis; drawing then shading the distribution is helpful for keeping directionality straight.
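The table lookup described here can be reproduced in Python as a quick check. This is only a sketch using the standard library's `NormalDist`; it stands in for the printed normal table and is not part of the original text.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Proportion of the area below the two table entries that bracket 5%.
for z in (-1.64, -1.65):
    print(z, round(std_normal.cdf(z), 4))

# Going the other way: the z-score that cuts off exactly 5% in each tail.
lower_crit = std_normal.inv_cdf(0.05)  # bottom 5% of scores
upper_crit = std_normal.inv_cdf(0.95)  # top 5% of scores
print(round(lower_crit, 3), round(upper_crit, 3))
```

The two critical values are mirror images of each other, so the only thing the alternative hypothesis changes is which tail, and therefore which sign, applies.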

We will only talk about non-directional hypotheses when discussing Confidence Intervals, so here is a brief introduction. For non-directional research hypotheses, the critical region must be split between both tails. But we don’t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail’s rejection region. For \(α\) = .05, this means 2.5% of the area is in each tail, which, based on the z-table, corresponds to critical values of \(z*\) = ±1.96. This is shown in Figure \(\PageIndex{2}\).

The Rejection Region is now shaded on both tails more extreme than 2 SDs from the center of the standard normal curve.

Thus, any \(z\)-score falling outside ±1.96 (greater than 1.96 in absolute value) falls in the rejection region. Remember that as \(z\) gets larger in absolute value (more standard deviations from the mean), the corresponding area under the curve beyond \(z\) gets smaller. A smaller area means a smaller probability of obtaining that result, or a more extreme result, under the condition that the null hypothesis is true (that there really is no difference between the mean of the sample and the mean of the population). The smaller that probability, the stronger the evidence that there is a difference between the mean of the sample and the mean of the population.

The \(z\)-statistic is very useful when we are doing our calculations by hand. However, when we use computer software, it will report to us a \(p\)-value, which is simply the proportion of the area under the curve in the tails beyond our obtained \(z\)-statistic. We can directly compare this \(p\)-value to \(α\) to test our null hypothesis: if \(p < α\), we reject \(H_0\), but if \(p > α\), we fail to reject. Note also that the reverse is always true: if we use critical values to test our hypothesis, we will always know if \(p\) is greater than or less than \(α\). If we reject, we know that \(p < α\) because the obtained statistic falls farther out into the tail than the critical value that corresponds to \(α\), so the proportion (\(p\)-value) for that statistic will be smaller. Conversely, if we fail to reject, we know that the proportion will be larger than \(α\) because the statistic will not be as far into the tail. This is illustrated for a one-tailed test in Figure \(\PageIndex{3}\).

Three standard normal curves with differing Rejection Regions shaded depending on different probabilities.

When the null hypothesis is rejected, the effect is said to be statistically significant.

Statistical Significance

If we reject the null hypothesis, we can state that the means are different, or, in other words, that the difference between the means is statistically significant. It is very important to keep in mind that statistical significance means only that the null hypothesis (of no mean differences) is rejected; it does not mean that the effect is important, which is what “significant” usually means. When an effect is statistically significant, you can have confidence the effect is not exactly zero. Finding that an effect is statistically significant does not tell you about how large or important the effect is. Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.

Why does the word “significant” in the phrase “statistically significant” mean something so different from other uses of the word? Interestingly, this is because the meaning of “significant” in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was “significant” if it signified something. Thus, finding that an effect is statistically significant meant that the statistics signified that the effect is real and not due to chance. Over the years, the meaning of “significant” changed, leading to the potential misinterpretation. In writing research reports, you should use the word "substantial" if you are not talking about a statistically significant calculation to avoid any confusion.

Still confused? The next section covers this information slightly differently, then there's a summary that discusses it all differently again. These are tough concepts, so try your best and keep at it!


Mathematics LibreTexts

10.1: Comparing Two Independent Population Means (Hypothesis test)



  • The two independent samples are simple random samples from two distinct populations.
  • if the sample sizes are small, the distributions are important (should be normal)
  • if the sample sizes are large, the distributions are not important (need not be normal)

The test comparing two independent population means with unknown and possibly unequal population standard deviations is called the Aspin-Welch \(t\)-test. The degrees of freedom formula was developed by Aspin and Welch.

The comparison of two population means is very common. A difference between the two samples depends on both the means and the standard deviations. Very different means can occur by chance if there is great variation among the individual samples. In order to account for the variation, we take the difference of the sample means, \(\bar{X}_{1} - \bar{X}_{2}\), and divide by the standard error in order to standardize the difference. The result is a t-score test statistic.

Because we do not know the population standard deviations, we estimate them using the two sample standard deviations from our independent samples. For the hypothesis test, we calculate the estimated standard deviation, or standard error , of the difference in sample means , \(\bar{X}_{1} - \bar{X}_{2}\).

The standard error is:

\[\sqrt{\dfrac{(s_{1})^{2}}{n_{1}} + \dfrac{(s_{2})^{2}}{n_{2}}}\]

The test statistic ( t -score) is calculated as follows:

\[t = \dfrac{(\bar{x}_{1}-\bar{x}_{2}) - (\mu_{1} - \mu_{2})}{\sqrt{\dfrac{(s_{1})^{2}}{n_{1}} + \dfrac{(s_{2})^{2}}{n_{2}}}}\]

  • \(s_{1}\) and \(s_{2}\), the sample standard deviations, are estimates of \(\sigma_{1}\) and \(\sigma_{2}\), respectively.
  • \(\sigma_{1}\) and \(\sigma_{2}\) are the unknown population standard deviations.
  • \(\bar{x}_{1}\) and \(\bar{x}_{2}\) are the sample means. \(\mu_{1}\) and \(\mu_{2}\) are the population means.
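
The standard error and t-score formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `welch_t_statistic` and the summary statistics passed to it are hypothetical and do not come from the text.

```python
import math

def welch_t_statistic(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Return (standard_error, t_score) for two independent samples.

    The variances are not pooled, matching the formulas above.
    hypothesized_diff is mu_1 - mu_2 under the null hypothesis.
    """
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t_score = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return standard_error, t_score

# Hypothetical summary statistics, chosen only for illustration.
se, t = welch_t_statistic(mean1=10.0, s1=2.0, n1=16, mean2=9.0, s2=3.0, n2=36)
print(f"standard error = {se:.4f}, t = {t:.4f}")
```

A larger standard error shrinks the t-score, which is how the variation within the samples is accounted for when judging the difference in means.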

The number of degrees of freedom (\(df\)) requires a somewhat complicated calculation. However, a computer or calculator calculates it easily. The \(df\) are not always a whole number. The test statistic calculated previously is approximated by the Student's t -distribution with \(df\) as follows:

Degrees of freedom

\[df = \dfrac{\left(\dfrac{(s_{1})^{2}}{n_{1}} + \dfrac{(s_{2})^{2}}{n_{2}}\right)^{2}}{\left(\dfrac{1}{n_{1}-1}\right)\left(\dfrac{(s_{1})^{2}}{n_{1}}\right)^{2} + \left(\dfrac{1}{n_{2}-1}\right)\left(\dfrac{(s_{2})^{2}}{n_{2}}\right)^{2}}\]

We can also use a conservative estimate of the degrees of freedom by taking \(df\) to be the smaller of \(n_{1}-1\) and \(n_{2}-1\).

When both sample sizes \(n_{1}\) and \(n_{2}\) are five or larger, the Student's t approximation is very good. Notice that the sample variances \((s_{1})^{2}\) and \((s_{2})^{2}\) are not pooled. (If the question comes up, do not pool the variances.)

It is not necessary to compute the degrees of freedom by hand. A calculator or computer easily computes it.
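
As a rough illustration of what the calculator does, the degrees of freedom formula and the conservative alternative can be written as two small functions. The function names are hypothetical; the inputs are the sample standard deviations and sample sizes that appear in Example 1 (girls: s = 0.866, n = 9; boys: s = 1, n = 16).

```python
def welch_degrees_of_freedom(s1, n1, s2, n2):
    """Degrees of freedom from the unpooled (Aspin-Welch) formula."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def conservative_degrees_of_freedom(n1, n2):
    """Conservative choice: the smaller of n1 - 1 and n2 - 1."""
    return min(n1, n2) - 1

df = welch_degrees_of_freedom(s1=0.866, n1=9, s2=1.0, n2=16)
print(f"Welch df = {df:.2f}")  # about 18.85, not a whole number
print("conservative df =", conservative_degrees_of_freedom(9, 16))  # 8
```

The conservative value is never larger than the formula's value, so it gives wider tails and a more cautious test.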

Example \(\PageIndex{1}\): Independent groups

The average amount of time boys and girls aged seven to 11 spend playing sports each day is believed to be the same. A study is done and data are collected, resulting in the data in Table \(\PageIndex{1}\). Each population has a normal distribution.

Is there a difference in the mean amount of time boys and girls aged seven to 11 play sports each day? Test at the 5% level of significance.

The population standard deviations are not known. Let g be the subscript for girls and b be the subscript for boys. Then, \(\mu_{g}\) is the population mean for girls and \(\mu_{b}\) is the population mean for boys. This is a test of two independent groups, two population means.

Random variable: \(\bar{X}_{g} - \bar{X}_{b} =\) difference in the sample mean amount of time girls and boys play sports each day.

  • \(H_{0}: \mu_{g} = \mu_{b}\)  
  • \(H_{0}: \mu_{g} - \mu_{b} = 0\)
  • \(H_{a}: \mu_{g} \neq \mu_{b}\)  
  • \(H_{a}: \mu_{g} - \mu_{b} \neq 0\)

The words "the same" tell you \(H_{0}\) has an "=". Since there are no other words to indicate \(H_{a}\), assume it says "is different." This is a two-tailed test.

Distribution for the test: Use \(t_{df}\) where \(df\) is calculated using the \(df\) formula for independent groups, two population means. Using a calculator, \(df\) is approximately 18.8462. Do not pool the variances.

Calculate the p -value using a Student's t -distribution: \(p\text{-value} = 0.0054\)

This is a normal distribution curve representing the difference in the average amount of time girls and boys play sports all day. The mean is equal to zero, and the values -1.2, 0, and 1.2 are labeled on the horizontal axis. Two vertical lines extend from -1.2 and 1.2 to the curve. The region to the left of x = -1.2 and the region to the right of x = 1.2 are shaded to represent the p-value. The area of each region is 0.0028.

\[s_{g} = 0.866\]

\[s_{b} = 1\]

\[\bar{x}_{g} - \bar{x}_{b} = 2 - 3.2 = -1.2\]

Half the \(p\text{-value}\) is below –1.2 and half is above 1.2.

Make a decision: Since \(\alpha > p\text{-value}\), reject \(H_{0}\). This means you reject \(\mu_{g} = \mu_{b}\). The means are different.

Press STAT . Arrow over to TESTS and press 4:2-SampTTest . Arrow over to Stats and press ENTER . Arrow down and enter 2 for the first sample mean, 0.866 for Sx1, 9 for n1, 3.2 for the second sample mean, 1 for Sx2, and 16 for n2. Arrow down to μ1: and arrow to does not equal μ2. Press ENTER . Arrow down to Pooled: and No . Press ENTER . Arrow down to Calculate and press ENTER . The \(p\text{-value}\) is \(p = 0.0054\), the \(df\) are approximately 18.8462, and the test statistic is -3.14. Do the procedure again but instead of Calculate do Draw.

Conclusion: At the 5% level of significance, the sample data show there is sufficient evidence to conclude that the mean number of hours that girls and boys aged seven to 11 play sports per day is different (mean number of hours boys aged seven to 11 play sports per day is greater than the mean number of hours played by girls OR the mean number of hours girls aged seven to 11 play sports per day is greater than the mean number of hours played by boys).
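
For readers without a TI calculator, the same test can be sketched in Python. This assumes SciPy is installed; `scipy.stats.ttest_ind_from_stats` with `equal_var=False` performs the unpooled (Welch) test from summary statistics. The variable names are illustrative.

```python
from scipy import stats

# Summary statistics from Example 1 (girls first, boys second).
t_score, p_value = stats.ttest_ind_from_stats(
    mean1=2.0, std1=0.866, nobs1=9,
    mean2=3.2, std2=1.0, nobs2=16,
    equal_var=False,  # do not pool the variances
)

alpha = 0.05
print(f"t = {t_score:.2f}, two-tailed p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The result should agree with the calculator output quoted above (test statistic near -3.14, p-value near 0.0054), up to rounding.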

Exercise \(\PageIndex{1}\)

Two samples are shown in Table. Both have normal distributions. The means for the two populations are thought to be the same. Is there a difference in the means? Test at the 5% level of significance.

The \(p\text{-value}\) is \(0.4125\), which is much higher than 0.05, so we decline to reject the null hypothesis. There is not sufficient evidence to conclude that the means of the two populations are not the same.

When the sum of the sample sizes is larger than 30 (\(n_{1} + n_{2} > 30\)), you can use the normal distribution to approximate the Student's \(t\).

Example \(\PageIndex{2}\)

A study is done by a community group in two neighboring colleges to determine which one graduates students with more math classes. College A samples 11 graduates. Their average is four math classes with a standard deviation of 1.5 math classes. College B samples nine graduates. Their average is 3.5 math classes with a standard deviation of one math class. The community group believes that a student who graduates from college A has taken more math classes, on the average. Both populations have a normal distribution. Test at a 1% significance level. Answer the following questions.

  • a. Is this a test of two means or two proportions?
  • b. Are the population standard deviations known or unknown?
  • c. Which distribution do you use to perform the test?
  • d. What is the random variable?
  • e. What are the null and alternate hypotheses? Write the null and alternate hypotheses in words and in symbols.
  • f. Is this test right-, left-, or two-tailed?
  • g. What is the \(p\text{-value}\)?
  • h. Do you reject or not reject the null hypothesis?

Answers:

  • a. Two means
  • b. Unknown
  • c. Student's t
  • d. \(\bar{X}_{A} - \bar{X}_{B}\)
  • e. \(H_{0}: \mu_{A} \leq \mu_{B}\) and \(H_{a}: \mu_{A} > \mu_{B}\)
  • f. Right-tailed
  • h. Do not reject.
  • i. At the 1% level of significance, from the sample data, there is not sufficient evidence to conclude that a student who graduates from college A has taken more math classes, on the average, than a student who graduates from college B.
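
The p-value for this example can be estimated with a short sketch that applies the unpooled formulas directly and takes the right-tail area of the Student's t-distribution. It assumes SciPy is available; the variable names are illustrative, and the printed value is a computed estimate rather than a figure quoted from the text.

```python
import math
from scipy import stats

# Summary statistics from the college example.
mean_a, s_a, n_a = 4.0, 1.5, 11
mean_b, s_b, n_b = 3.5, 1.0, 9

v_a = s_a**2 / n_a
v_b = s_b**2 / n_b
t_score = (mean_a - mean_b) / math.sqrt(v_a + v_b)
df = (v_a + v_b) ** 2 / (v_a**2 / (n_a - 1) + v_b**2 / (n_b - 1))

# H_a is mu_A > mu_B, so the p-value is the area to the right of t_score.
p_value = stats.t.sf(t_score, df)

alpha = 0.01
print(f"t = {t_score:.3f}, df = {df:.2f}, right-tailed p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

Because the p-value is far above 0.01, the sketch reaches the same decision as the answer above: do not reject the null hypothesis.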

Exercise \(\PageIndex{2}\)

A study is done to determine if Company A retains its workers longer than Company B. Company A samples 15 workers, and their average time with the company is five years with a standard deviation of 1.2. Company B samples 20 workers, and their average time with the company is 4.5 years with a standard deviation of 0.8. The populations are normally distributed.

  • a. Are the population standard deviations known?
  • b. Conduct an appropriate hypothesis test. At the 5% significance level, what is your conclusion?

Answers:

  • a. They are unknown.
  • b. The \(p\text{-value} = 0.0878\). At the 5% level of significance, there is insufficient evidence to conclude that the workers of Company A stay longer with the company.

Example \(\PageIndex{3}\)

A professor at a large community college wanted to determine whether there is a difference in the means of final exam scores between students who took his statistics course online and the students who took his face-to-face statistics class. He believed that the mean of the final exam scores for the online class would be lower than that of the face-to-face class. Was the professor correct? The randomly selected 30 final exam scores from each group are listed in Table \(\PageIndex{3}\) and Table \(\PageIndex{4}\).

Is the mean of the Final Exam scores of the online class lower than the mean of the Final Exam scores of the face-to-face class? Test at a 5% significance level. Answer the following questions:

  • Are the population standard deviations known or unknown?
  • What are the null and alternative hypotheses? Write the null and alternative hypotheses in words and in symbols.
  • Is this test right, left, or two tailed?
  • At the ___ level of significance, from the sample data, there ______ (is/is not) sufficient evidence to conclude that ______.

(See the conclusion in Example, and write yours in a similar fashion)

Be careful not to mix up the information for Group 1 and Group 2!

  • Student's \(t\)
  • \(\bar{X}_{1} - \bar{X}_{2}\)
  • \(H_{0}: \mu_{1} = \mu_{2}\) Null hypothesis: the means of the final exam scores are equal for the online and face-to-face statistics classes.
  • \(H_{a}: \mu_{1} < \mu_{2}\) Alternative hypothesis: the mean of the final exam scores of the online class is less than the mean of the final exam scores of the face-to-face class.
  • left-tailed

This is a normal distribution curve with mean equal to zero. A vertical line near the tail of the curve to the left of zero extends from the axis to the curve. The region under the curve to the left of the line is shaded representing p-value = 0.0011.

Figure \(\PageIndex{3}\).

  • Reject the null hypothesis

At the 5% level of significance, from the sample data, there is sufficient evidence to conclude that the mean of the final exam scores for the online class is less than the mean of final exam scores of the face-to-face class.

First put the data for each group into two lists (such as L1 and L2). Press STAT. Arrow over to TESTS and press 4:2-SampTTest. Make sure Data is highlighted and press ENTER. Arrow down and enter L1 for the first list and L2 for the second list. Arrow down to \(\mu_{1}\): and arrow to \(< \mu_{2}\) (less than), because this is a left-tailed test. Press ENTER. Arrow down to Pooled: No. Press ENTER. Arrow down to Calculate and press ENTER.

Cohen's Standards for Small, Medium, and Large Effect Sizes

Cohen's \(d\) is a measure of effect size based on the differences between two means. Cohen’s \(d\), named for United States statistician Jacob Cohen, measures the relative strength of the differences between the means of two populations based on sample data. The calculated value of effect size is then compared to Cohen’s standards of small, medium, and large effect sizes.

Cohen's \(d\) is the measure of the difference between two means divided by the pooled standard deviation: \(d = \dfrac{\bar{x}_{1}-\bar{x}_{2}}{s_{\text{pooled}}}\) where \(s_{\text{pooled}} = \sqrt{\dfrac{(n_{1}-1)s^{2}_{1} + (n_{2}-1)s^{2}_{2}}{n_{1}+n_{2}-2}}\)

Example \(\PageIndex{4}\)

Calculate Cohen’s \(d\) for Example \(\PageIndex{2}\). Is the size of the effect small, medium, or large? Explain what the size of the effect means for this problem.

\(\bar{x}_{1} = 4,\ s_{1} = 1.5,\ n_{1} = 11\)

\(\bar{x}_{2} = 3.5,\ s_{2} = 1,\ n_{2} = 9\)

\(d = 0.384\)

The effect is small because 0.384 is between Cohen’s value of 0.2 for small effect size and 0.5 for medium effect size. The size of the differences of the means for the two colleges is small indicating that there is not a significant difference between them.
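
The arithmetic behind \(d = 0.384\) can be checked with a minimal sketch. The function names and the labelling helper are hypothetical; the cutoffs 0.2, 0.5, and 0.8 are the values for Cohen's standards used in this section.

```python
import math

def cohens_d(mean1, s1, n1, mean2, s2, n2):
    """Difference in sample means divided by the pooled standard deviation."""
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_variance)

def effect_size_label(d):
    """Rough label following the way the cutoffs are applied in this section."""
    size = abs(d)
    if size < 0.2:
        return "very small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

d = cohens_d(mean1=4.0, s1=1.5, n1=11, mean2=3.5, s2=1.0, n2=9)
print(round(d, 3), effect_size_label(d))  # 0.384 small
```

Note that, unlike the test statistic, Cohen's \(d\) does pool the two sample variances.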

Example \(\PageIndex{5}\)

Calculate Cohen’s \(d\) for Example \(\PageIndex{3}\). Is the size of the effect small, medium or large? Explain what the size of the effect means for this problem.

\(d = 0.834\); Large, because 0.834 is greater than Cohen’s 0.8 for a large effect size. The size of the differences between the means of the Final Exam scores of online students and students in a face-to-face class is large indicating a significant difference.

Example 10.2.6

Weighted alpha is a measure of risk-adjusted performance of stocks over a period of a year. A high positive weighted alpha signifies a stock whose price has risen while a small positive weighted alpha indicates an unchanged stock price during the time period. Weighted alpha is used to identify companies with strong upward or downward trends. The weighted alpha for the top 30 stocks of banks in the northeast and in the west as identified by Nasdaq on May 24, 2013 are listed in Table and Table, respectively.

Is there a difference in the weighted alpha of the top 30 stocks of banks in the northeast and in the west? Test at a 5% significance level. Answer the following questions:

  • Calculate Cohen’s d and interpret it.
  • Student's \(t\)
  • \(H_{0}: \mu_{1} = \mu_{2}\) Null hypothesis: the means of the weighted alphas are equal.
  • \(H_{a}: \mu_{1} \neq \mu_{2}\) Alternative hypothesis : the means of the weighted alphas are not equal.
  • \(p\text{-value} = 0.8787\)
  • Do not reject the null hypothesis

This is a normal distribution curve with mean equal to zero. Both the right and left tails of the curve are shaded. Each tail represents 1/2(p-value) = 0.4394.

Figure \(\PageIndex{4}\).

  • \(d = 0.040\), Very small, because 0.040 is less than Cohen’s value of 0.2 for small effect size. The size of the difference of the means of the weighted alphas for the two regions of banks is small indicating that there is not a significant difference between their trends in stocks.

References

  • Data from Graduating Engineer + Computer Careers. Available online at www.graduatingengineer.com
  • Data from Microsoft Bookshelf .
  • Data from the United States Senate website, available online at www.Senate.gov (accessed June 17, 2013).
  • “List of current United States Senators by Age.” Wikipedia. Available online at en.Wikipedia.org/wiki/List_of...enators_by_age (accessed June 17, 2013).
  • “Sectoring by Industry Groups.” Nasdaq. Available online at www.nasdaq.com/markets/barcha...&base=industry (accessed June 17, 2013).
  • “Strip Clubs: Where Prostitution and Trafficking Happen.” Prostitution Research and Education, 2013. Available online at www.prostitutionresearch.com/ProsViolPosttrauStress.html (accessed June 17, 2013).
  • “World Series History.” Baseball-Almanac, 2013. Available online at http://www.baseball-almanac.com/ws/wsmenu.shtml (accessed June 17, 2013).

Two population means from independent samples where the population standard deviations are not known

  • Random Variable: \(\bar{X}_{1} - \bar{X}_{2} =\) the difference of the sampling means
  • Distribution: Student's t -distribution with degrees of freedom (variances not pooled)

Formula Review

Standard error: \[SE = \sqrt{\dfrac{(s_{1}^{2})}{n_{1}} + \dfrac{(s_{2}^{2})}{n_{2}}}\]

Test statistic ( t -score): \[t = \dfrac{(\bar{x}_{1}-\bar{x}_{2}) - (\mu_{1}-\mu_{2})}{\sqrt{\dfrac{(s_{1})^{2}}{n_{1}} + \dfrac{(s_{2})^{2}}{n_{2}}}}\]

Degrees of freedom:

\[df = \dfrac{\left(\dfrac{(s_{1})^{2}}{n_{1}} + \dfrac{(s_{2})^{2}}{n_{2}}\right)^{2}}{\left(\dfrac{1}{n_{1} - 1}\right)\left(\dfrac{(s_{1})^{2}}{n_{1}}\right)^{2} + \left(\dfrac{1}{n_{2} - 1}\right)\left(\dfrac{(s_{2})^{2}}{n_{2}}\right)^{2}}\]

  • \(s_{1}\) and \(s_{2}\) are the sample standard deviations, and \(n_{1}\) and \(n_{2}\) are the sample sizes.
  • \(\bar{x}_{1}\) and \(\bar{x}_{2}\) are the sample means.

Alternatively, take \(df\) to be the smaller of \(n_{1}-1\) and \(n_{2}-1\).

Cohen’s \(d\) is the measure of effect size:

\[d = \dfrac{\bar{x}_{1} - \bar{x}_{2}}{s_{\text{pooled}}}\]

\[s_{\text{pooled}} = \sqrt{\dfrac{(n_{1} - 1)s^{2}_{1} + (n_{2} - 1)s^{2}_{2}}{n_{1} + n_{2} - 2}}\]

  • The domain of the random variable (RV) is not necessarily a numerical set; the domain may be expressed in words; for example, if \(X =\) hair color, then the domain is {black, blond, gray, green, orange}.
  • We can tell what specific value x of the random variable \(X\) takes only after performing the experiment.

Statistics - Hypothesis Testing a Proportion (Two Tailed)

A population proportion is the share of a population that belongs to a particular category .

Hypothesis tests are used to check a claim about the size of that population proportion.

Hypothesis Testing a Proportion

The following steps are used for a hypothesis test:

  • Check the conditions
  • Define the claims
  • Decide the significance level
  • Calculate the test statistic
  • Make the conclusion

For example:

  • Population : Nobel Prize winners
  • Category : Women

And we want to check the claim:

"The share of Nobel Prize winners that are women is not 50%"

By taking a sample of 100 randomly selected Nobel Prize winners we could find that:

10 out of 100 Nobel Prize winners in the sample were women

The sample proportion is then: \(\displaystyle \frac{10}{100} = 0.1\), or 10%.

From this sample data we check the claim with the steps below.

1. Checking the Conditions

The conditions for a hypothesis test of a proportion are:

  • The sample is randomly selected
  • There are only two options: being in the category, or not being in the category
  • The sample needs at least 5 members in the category and at least 5 members not in the category

In our example, 10 of the 100 randomly selected Nobel Prize winners were women.

The rest were not women, so there are 90 in the other category.

The conditions are fulfilled in this case.

Note: It is possible to do a hypothesis test without having 5 of each category. But special adjustments need to be made.
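
The sample-size part of these conditions is easy to check in code. The sketch below uses a hypothetical helper name and only covers the count requirement; whether the sample was randomly selected cannot be verified from the counts alone.

```python
def meets_count_condition(in_category, sample_size, minimum=5):
    """True when both groups (in and not in the category) have at least `minimum` members."""
    not_in_category = sample_size - in_category
    return in_category >= minimum and not_in_category >= minimum

# The example sample: 10 women among 100 Nobel Prize winners.
print(meets_count_condition(in_category=10, sample_size=100))  # True
```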

2. Defining the Claims

We need to define a null hypothesis (\(H_{0}\)) and an alternative hypothesis (\(H_{1}\)) based on the claim we are checking.

The claim was:

"The share of Nobel Prize winners that are women is not 50%"

In this case, the parameter is the proportion of Nobel Prize winners that are women (\(p\)).

The null and alternative hypotheses are then:

Null hypothesis : 50% of Nobel Prize winners are women.

Alternative hypothesis : The share of Nobel Prize winners that are women is not 50%

Which can be expressed with symbols as:

\(H_{0}\): \(p = 0.50 \)

\(H_{1}\): \(p \neq 0.50 \)

This is a ' two-tailed ' test, because the alternative hypothesis claims that the proportion is different (larger or smaller) than in the null hypothesis.

If the data supports the alternative hypothesis, we reject the null hypothesis and accept the alternative hypothesis.


3. Deciding the Significance Level

The significance level (\(\alpha\)) is the uncertainty we accept when rejecting the null hypothesis in a hypothesis test.

The significance level is the probability of rejecting the null hypothesis when it is actually true.

Typical significance levels are:

  • \(\alpha = 0.1\) (10%)
  • \(\alpha = 0.05\) (5%)
  • \(\alpha = 0.01\) (1%)

A lower significance level means that the evidence in the data needs to be stronger to reject the null hypothesis.

There is no "correct" significance level - it only states the uncertainty of the conclusion.

Note: A 5% significance level means that, when the null hypothesis is true, we expect to reject it about 5 out of 100 times.

4. Calculating the Test Statistic

The test statistic is used to decide the outcome of the hypothesis test.

The test statistic is a standardized value calculated from the sample.

The formula for the test statistic (TS) of a population proportion is:

\(\displaystyle \frac{\hat{p} - p}{\sqrt{p(1-p)}} \cdot \sqrt{n} \)

\(\hat{p}-p\) is the difference between the sample proportion (\(\hat{p}\)) and the claimed population proportion (\(p\)).

\(n\) is the sample size.

In our example:

The claimed (\(H_{0}\)) population proportion (\(p\)) was \( 0.50 \)

The sample size (\(n\)) was \(100\)

So the test statistic (TS) is then:

\(\displaystyle \frac{0.1-0.5}{\sqrt{0.5(1-0.5)}} \cdot \sqrt{100} = \frac{-0.4}{\sqrt{0.5(0.5)}} \cdot \sqrt{100} = \frac{-0.4}{\sqrt{0.25}} \cdot \sqrt{100} = \frac{-0.4}{0.5} \cdot 10 = \underline{-8}\)

You can also calculate the test statistic using programming language functions:

With Python use the scipy and math libraries to calculate the test statistic for a proportion.
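
A minimal sketch of that calculation follows. The variable names are illustrative, and only the standard `math` module is needed for the test statistic itself; SciPy comes in later for critical values and P-values.

```python
import math

n = 100    # sample size
x = 10     # number of women in the sample
p0 = 0.50  # proportion claimed in the null hypothesis

p_hat = x / n  # sample proportion
test_statistic = (p_hat - p0) / math.sqrt(p0 * (1 - p0)) * math.sqrt(n)

print(test_statistic)  # approximately -8
```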

With R use the built-in math functions to calculate the test statistic for a proportion.

5. Concluding

There are two main approaches for making the conclusion of a hypothesis test:

  • The critical value approach compares the test statistic with the critical value of the significance level.
  • The P-value approach compares the P-value of the test statistic with the significance level.

Note: The two approaches are only different in how they present the conclusion.

The Critical Value Approach

For the critical value approach we need to find the critical value (CV) of the significance level (\(\alpha\)).

For a population proportion test, the critical value (CV) is a Z-value from a standard normal distribution .

This critical Z-value (CV) defines the rejection region for the test.

The rejection region is an area of probability in the tails of the standard normal distribution.

Because the claim is that the population proportion is different from 50%, the rejection region is split between the left and right tails.

Choosing a significance level (\(\alpha\)) of 0.01, or 1%, we can find the critical Z-value from a Z-table , or with a programming language function:

Note: Because this is a two-tailed test the tail area (\(\alpha\)) needs to be split in half (divided by 2).

With Python use the Scipy Stats library norm.ppf() function to find the Z-value for an \(\alpha\)/2 = 0.005 in the left tail.
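
A short sketch of that lookup, assuming SciPy is installed (the variable names are illustrative):

```python
from scipy.stats import norm

alpha = 0.01

# Two-tailed test: put alpha / 2 in each tail.
critical_left = norm.ppf(alpha / 2)
critical_right = norm.ppf(1 - alpha / 2)

print(round(critical_left, 4), round(critical_right, 4))  # -2.5758 2.5758
```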

With R use the built-in qnorm() function to find the Z-value for an \(\alpha\)/2 = 0.005 in the left tail.

Using either method we can find that the critical Z-value in the left tail is \(\approx \underline{-2.5758}\)

Since a normal distribution is symmetric, we know that the critical Z-value in the right tail will be the same number, only positive: \(\underline{2.5758}\)

For a two-tailed test we need to check if the test statistic (TS) is smaller than the negative critical value (-CV), or bigger than the positive critical value (CV).

If the test statistic is smaller than the negative critical value, the test statistic is in the rejection region .

If the test statistic is bigger than the positive critical value, the test statistic is in the rejection region .

When the test statistic is in the rejection region, we reject the null hypothesis (\(H_{0}\)).

Here, the test statistic (TS) was \(\approx \underline{-8}\) and the critical value was \(\approx \underline{-2.5758}\)


Since the test statistic was smaller than the negative critical value we reject the null hypothesis.

This means that the sample data supports the alternative hypothesis.

And we can summarize the conclusion stating:

The sample data supports the claim that "The share of Nobel Prize winners that are women is not 50%" at a 1% significance level .
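
The two-tailed decision rule of the critical value approach can be condensed into one comparison, since the rejection region is symmetric around zero. The function name below is hypothetical, and SciPy is assumed to be installed.

```python
from scipy.stats import norm

def rejects_null_two_tailed(test_statistic, alpha):
    """True when the test statistic falls in either tail of the rejection region."""
    critical_value = norm.ppf(1 - alpha / 2)  # positive critical Z-value
    return bool(abs(test_statistic) > critical_value)

print(rejects_null_two_tailed(-8.0, alpha=0.01))  # True
```

Comparing the absolute value against the positive critical value is equivalent to checking the left and right tails separately.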

The P-Value Approach

For the P-value approach we need to find the P-value of the test statistic (TS).

If the P-value is smaller than the significance level (\(\alpha\)), we reject the null hypothesis (\(H_{0}\)).

The test statistic was found to be \( \approx \underline{-8} \)

For a population proportion test, the test statistic is a Z-Value from a standard normal distribution .

Because this is a two-tailed test, we need to find the P-value of a Z-value smaller than -8 and multiply it by 2 .

We can find the P-value using a Z-table , or with a programming language function:

With Python use the Scipy Stats library norm.cdf() function to find the P-value of a Z-value smaller than -8 for a two-tailed test:
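
A minimal sketch, assuming SciPy is installed (the variable names are illustrative):

```python
from scipy.stats import norm

test_statistic = -8.0

# Area to the left of -8, doubled because the test is two-tailed.
p_value = 2 * norm.cdf(test_statistic)

print(p_value)  # a value on the order of 1e-15
```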

With R use the built-in pnorm() function to find the P-value of a Z-value smaller than -8 for a two-tailed test:

Using either method we can find that the P-value is \(\approx \underline{1.24 \cdot 10^{-15}}\) or \(0.00000000000000124\)

This tells us that the significance level (\(\alpha\)) would need to be bigger than 0.000000000000124%, to reject the null hypothesis.

This P-value is smaller than any of the common significance levels (10%, 5%, 1%).

So the null hypothesis is rejected at all of these significance levels.

The sample data supports the claim that "The share of Nobel Prize winners that are women is not 50%" at a 10%, 5%, and 1% significance level .

Calculating a P-Value for a Hypothesis Test with Programming

Many programming languages can calculate the P-value to decide the outcome of a hypothesis test.

Using software and programming to calculate statistics is more common for bigger sets of data, as calculating manually becomes difficult.

The P-value calculated here will tell us the lowest possible significance level where the null-hypothesis can be rejected.

With Python use the scipy and math libraries to calculate the P-value for a two-tailed hypothesis test for a proportion.

Here, the sample size is 100, the occurrences are 10, and the test is for a proportion different from 0.50.
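
One way to put the pieces together is sketched below. The function name is hypothetical, SciPy is assumed to be installed, and the calculation uses the standard normal distribution with the null-hypothesis proportion in the standard error, as in the test statistic formula above.

```python
import math
from scipy.stats import norm

def two_tailed_proportion_test(count, nobs, p0):
    """Return (z, p_value) for H0: p = p0 against H1: p != p0."""
    p_hat = count / nobs
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / nobs)
    p_value = 2 * norm.sf(abs(z))  # area in both tails beyond |z|
    return z, p_value

z, p_value = two_tailed_proportion_test(count=10, nobs=100, p0=0.50)
print(z, p_value)

alpha = 0.01
print("reject H0" if p_value < alpha else "do not reject H0")
```

Using `norm.sf(abs(z))` instead of `norm.cdf(z)` makes the function work whether the test statistic is negative or positive.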

With R use the built-in prop.test() function to find the P-value for a two-tailed hypothesis test for a proportion.

Here, the sample size is 100, the occurrences are 10, and the test is for a proportion different from 0.50.

Note: The conf.level in the R code is the reverse of the significance level.

Here, the significance level is 0.01, or 1%, so the conf.level is 1-0.01 = 0.99, or 99%.

Left-Tailed and Right-Tailed Tests

This was an example of a two-tailed test, where the alternative hypothesis claimed that the parameter is different from the null hypothesis claim.

You can check out an equivalent step-by-step guide for other types here:

  • Right-Tailed Test
  • Left-Tailed Test



COMMENTS

  1. S.3.2 Hypothesis Testing (P-Value Approach)

    Two-Tailed. In our example concerning the mean grade point average, suppose again that our random sample of n = 15 students majoring in mathematics yields a test statistic t* instead equaling -2.5. The P-value for conducting the two-tailed test H 0: μ = 3 versus H A: μ ≠ 3 is the probability that we would observe a test statistic less than -2.5 or greater than 2.5 if the population mean ...

  2. The p-value and rejecting the null (for one- and two-tail tests)

    The p-value (or the observed level of significance) is the smallest level of significance at which you can reject the null hypothesis, assuming the null hypothesis is true. You can also think about the p-value as the total area of the region of rejection. Remember that in a one-tailed test, the regi

  3. Two-Tailed Hypothesis Tests: 3 Example Problems

    t-test statistic:-0.288525; two-tailed p-value: 0.776; Since the p-value is not less than .05, the engineer fails to reject the null hypothesis. ... This is an example of a two-tailed hypothesis test because the alternative hypothesis contains the not equal "≠" sign. The botanist believes that the new fertilizer will influence plant ...

  4. How to Find the P value: Process and Calculations

    In this case, our t-value of 2.289 produces a p value between 0.02 and 0.05 for a two-tailed test. Our results are statistically significant, and they are consistent with the calculator's more precise results. Displaying the P value in a Chart. In the example above, you saw how to calculate a p-value starting with the sample statistics.

  5. One-Tailed and Two-Tailed Hypothesis Tests Explained

    With a two-tailed hypothesis test, you'll obtain a two-sided confidence interval. The confidence interval tells us that the population mean is likely to fall between 3.372 and 4.828. ... Then, you divide the p-value from a two-tailed test in half to get the p-value for a one tailed test. You'd still compare it to your original alpha. For ...

  6. p-value Calculator

    It is the alternative hypothesis that determines what "extreme" actually means, so the p-value depends on the alternative hypothesis that you state: left-tailed, right-tailed, or two-tailed. In the formulas below, S stands for a test statistic, x for the value it produced for a given sample, and Pr(event | H 0 ) is the probability of an event ...

  7. Understanding P-values

    Reporting p values. P values of statistical tests are usually reported in the results section of a research paper, along with the key information needed for readers to put the p values in context - for example, correlation coefficient in a linear regression, or the average difference between treatment groups in a t-test.. Example: Reporting the results In our comparison of mouse diet A and ...

  8. One-tail vs. two-tail P values

    In this example, a two-tailed P value tests the null hypothesis that the drug does not alter the creatinine level; a one-tailed P value tests the null hypothesis that the drug does not increase the creatinine level. The issue in choosing between one- and two-tailed P values is not whether or not you expect a difference to exist.

  9. Hypothesis Testing: Upper-, Lower, and Two Tailed Tests

    In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance. Using the table of critical values for upper tailed tests, we can approximate the p-value. If we select α=0.025, the critical value is 1.96, and we still reject H 0 because 2.38 > 1.960.

  10. One-tailed and two-tailed tests (video)

    A one tailed test does not leave more room to conclude that the alternative hypothesis is true. The benefit (increased certainty) of a one tailed test doesn't come free, as the analyst must know "something more", which is the direction of the effect, compared to a two tailed test. ( 3 votes)

  11. Understanding P-Values and Statistical Significance

    The p-value in statistics quantifies the evidence against a null hypothesis. A low p-value suggests data is inconsistent with the null, potentially favoring an alternative hypothesis. Common significance thresholds are 0.05 or 0.01. ... Two-Tailed Test In a normal distribution, the significance level corresponds to regions in the tails of the ...

  12. How Hypothesis Tests Work: Significance Levels (Alpha) and P values

    Using P values and Significance Levels Together. If your P value is less than or equal to your alpha level, reject the null hypothesis. The P value results are consistent with our graphical representation. The P value of 0.03112 is significant at the alpha level of 0.05 but not 0.01.

  13. Hypothesis testing and p-values (video)

    In this video there was no critical value set for this experiment. In the last seconds of the video, Sal briefly mentions a p-value of 5% (0.05), which would have a critical value of z = (+/-) 1.96. Since the experiment produced a z-score of 3, which is more extreme than 1.96, we reject the null hypothesis.

  14. t-test Calculator

    These next steps will tell you how to calculate the p-value from t-test or its critical values, and then which decision to make about the null hypothesis. Decide on the alternative hypothesis : Use a two-tailed t-test if you only care whether the population's mean (or, in the case of two populations, the difference between the populations ...

  15. FAQ: What are the differences between one-tailed and two-tailed tests?

    So, depending on the direction of the one-tailed hypothesis, its p-value is either .5*(two-tailed p-value) or 1-.5*(two-tailed p-value) if the test statistic is symmetrically distributed about zero. In this example, the two-tailed p-value suggests rejecting the null hypothesis of no difference.
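
    The conversion rule above can be written as a small function. This is a sketch that holds only under the stated condition (a test statistic symmetrically distributed about zero); the function and parameter names are hypothetical.

```python
def one_tailed_p(two_tailed_p: float, effect_in_predicted_direction: bool) -> float:
    """Convert a two-tailed p-value to a one-tailed p-value.

    Assumes the test statistic is symmetric about zero under the null hypothesis.
    """
    half = 0.5 * two_tailed_p
    # If the observed effect points the way the one-tailed alternative predicts,
    # the one-tailed p-value is half the two-tailed one; otherwise it is 1 - half.
    return half if effect_in_predicted_direction else 1.0 - half


print(one_tailed_p(0.03, True))
print(one_tailed_p(0.03, False))
```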

  16. P-Value Method for Hypothesis Testing

    So, we find the probability of the Z-score by going to +3 vertically and 0.05 horizontally in the table. The probability for the Z-score comes out to be 0.99886. Now, to calculate the p-value: P-value = 1 - Prob(Z-score) = 1 - 0.99886 ≈ 0.001. Since this is a 2-tailed test, we will multiply the p-value by 2.
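
    The table lookup can be reproduced numerically. The sketch below assumes the row +3 and column 0.05 refer to z = 3.05 on a standard normal table; `normal_cdf` is an illustrative helper, not part of the quoted source.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


z = 3.05                        # row +3, column 0.05 of the z-table
table_value = normal_cdf(z)     # the value read from the table
one_tail = 1 - table_value      # area in the upper tail
two_tail = 2 * one_tail         # doubled for a two-tailed test

print(f"table value = {table_value:.5f}")
print(f"one-tailed p = {one_tail:.5f}, two-tailed p = {two_tail:.5f}")
```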

  17. Hypothesis Testing Calculator

    In a two-tailed test, the p-value is the probability of getting a value for the test statistic at least as unlikely as the value from the sample. To test the hypothesis in the p-value approach, compare the p-value to the level of significance. If the p-value is less than or equal to the level of significance, reject the null hypothesis.

  18. One- and two-tailed tests

    A two-tailed test applied to the normal distribution. A one-tailed test, showing the p-value as the size of one tail. In statistical significance testing, a one-tailed test and a two-tailed test are alternative ways of computing the statistical significance of a parameter inferred from a data set, in terms of a test statistic. A two-tailed test is appropriate if the estimated value is greater ...

  19. What Is a Two-Tailed Test? Definition and Example

    Two-Tailed Test: A two-tailed test is a statistical test in which the critical area of a distribution is two-sided and tests whether a sample is greater than or less than a certain range of values ...

  20. 5 Tips for Interpreting P-Values Correctly in Hypothesis Testing

    3. Avoid Threshold Thinking. A common pitfall in interpreting p-values is falling into the threshold thinking trap. The most commonly used cut-off value for whether a calculated p-value is statistically significant is 0.05. Typically, a p-value of less than 0.05 is considered statistically significant evidence against the null hypothesis.

  21. 8.4: Hypothesis Test Examples for Proportions

    When you calculate the \(p\)-value and draw the picture, the \(p\)-value is the area in the left tail, the right tail, or split evenly between the two tails. For this reason, we call the hypothesis test left, right, or two tailed. The alternative hypothesis, \(H_{a}\), tells you if the test is left, right, or two-tailed.

  22. One Tailed And Two Tailed Hypothesis Tests

    In this video we discuss one tailed, two tailed hypothesis tests and we also cover p values for hypothesis testing. We discuss when to use one and two taile...

  23. 2.1.5: Critical Values, p-values, and Significance

    Figure 2.1.5.1: The rejection region for a one-tailed test. (CC-BY-NC-SA Foster et al. from An Introduction to Psychological Statistics) The shaded rejection region takes up 5% of the area under the curve. Any result which falls in that region is sufficient evidence to reject the null hypothesis.

  24. 10.1: Comparing Two Independent Population Means (Hypothesis test)

    Then, μg is the population mean for girls and μb is the population mean for boys. This is a test of two independent groups, two population means. Random variable: X̄g − X̄b = difference in the sample mean amount of time girls and boys play sports each day. H0: μg = μb. H0: μg − μb = 0.

  25. Statistics

    With Python, use the scipy and math libraries to calculate the P-value for a two-tailed hypothesis test for a proportion. Here, the sample size is 100, the occurrences are 10, and the test is for a proportion different from 0.50. ... # Output the p-value of the test statistic (two-tailed test) print(2*stats.norm.cdf(test_stat))
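
    The excerpt elides most of its code, so the following is a separate, standard-library-only sketch of the same calculation rather than a reconstruction of the original. It assumes the usual large-sample z-test for a proportion, with the standard error computed under the null value 0.50; all variable names are illustrative. Note that it doubles the tail beyond |z|, which works for a test statistic of either sign, whereas `2*stats.norm.cdf(test_stat)` is only valid when the statistic is negative.

```python
import math

n = 100              # sample size
occurrences = 10     # number of "successes" in the sample
p0 = 0.50            # proportion claimed under the null hypothesis

p_hat = occurrences / n

# z-statistic, with the standard error evaluated under H0.
test_stat = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Two-tailed p-value: twice the standard normal tail area beyond |z|.
p_value = 2 * 0.5 * math.erfc(abs(test_stat) / math.sqrt(2))

print(f"z = {test_stat:.2f}, two-tailed p = {p_value:.3e}")
```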

  26. Doubling &/or halving p-values for one- vs. two-tailed tests

    If you do a two-tailed test and computation gives you p = 0.03, then p < 0.05. The result is significant. If you do a one-tailed test, you will get a different result, depending on which tail you investigate. It will be either a lot larger or only half as big. α = 0.05 is the usual convention, no matter whether you ...

  27. One-tailed vs Two-tailed Tests: P-Value Differences

    The p-value is a measure of the strength of the evidence against the null hypothesis. In one-tailed tests, since you are only looking at one end of the distribution, the p-value is calculated ...

  28. PDF Paired two-sample t-test: examples

    Create a null hypothesis and alternative hypothesis. Calculate tcalc. Compare tcalc to various values (i.e., widths of CIs). Determine the probability, the p-value, of seeing tcalc as extreme as we do. Decide to "fail to reject H0" or "reject H0" based on the p-value. H0: μd = 0 is consistent with non-small p-values. HA: would give us small p-values.