Fiveable


2.8 Least Squares Regression

8 min read • December 29, 2022

Athena_Codes

Jed Quiaoit

The least squares regression line (LSRL) is the best-fitting line in one specific sense: among all possible lines, it minimizes the sum of the squared residuals. (Remember from previous sections that residuals are the differences between the observed values of the response variable, y, and the predicted values, ŷ, from the model.)

The least squares criterion measures how well a line fits by the sum of the squared residuals, Σ(y − ŷ)², and chooses the line that makes this sum as small as possible, so that, overall, the predicted values are as close to the observed values as possible.
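To see the least squares criterion in action, here is a minimal Python sketch. The five data points are hypothetical, made up for illustration, and the helper names `sse` and `least_squares_line` are our own. It fits the line with the standard least squares formulas and then checks that nudging the intercept or slope in several directions always increases the sum of squared residuals.

```python
from statistics import mean

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def sse(a, b, xs, ys):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

a, b = least_squares_line(xs, ys)
best = sse(a, b, xs, ys)

# Nudge the intercept and/or slope: every nearby line has a larger SSE.
for da in (-0.5, 0.0, 0.5):
    for db in (-0.2, 0.0, 0.2):
        if (da, db) != (0.0, 0.0):
            assert sse(a + da, b + db, xs, ys) > best

print(round(a, 2), round(b, 2), round(best, 2))  # 2.2 0.6 2.4
```

Because the sum of squared residuals is a bowl-shaped (quadratic) function of a and b, the LSRL sits at the single lowest point of that bowl whenever the x-values are not all equal.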

The least squares regression line is given by the formula ŷ = a + bx, where ŷ is the predicted value of the response variable, x is the predictor or explanatory variable, a is the y-intercept (the value of ŷ when x is zero), and b is the slope (the change in ŷ per unit change in x). The y-intercept and slope can be calculated from the one-variable statistics of x and y (their means and standard deviations) together with the correlation coefficient r.

The residuals are squared so that positive and negative residuals cannot cancel each other out; for the LSRL, the raw residuals always add up to zero, so their plain sum says nothing about how well the line fits. Squaring also penalizes large deviations from the line more heavily than small ones. 🪢
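The cancellation problem is easy to see numerically. In this sketch (hypothetical points made up for illustration; the LSRL for them works out to ŷ = 2.2 + 0.6x), both the LSRL and a poor horizontal line through ȳ = 4 have residuals that sum to zero, so only the sum of squared residuals tells them apart.

```python
# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def residuals(a, b):
    """Residuals y - y-hat for the line y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

lsrl = residuals(2.2, 0.6)   # the least squares line for these points
flat = residuals(4.0, 0.0)   # horizontal line at y-bar = 4: a much poorer fit

# Raw residuals cancel for both lines, so their sum cannot tell the lines apart.
print(abs(sum(lsrl)) < 1e-9, abs(sum(flat)) < 1e-9)  # True True

# Squared residuals cannot cancel, and they clearly favor the LSRL.
print(round(sum(e ** 2 for e in lsrl), 2), round(sum(e ** 2 for e in flat), 2))  # 2.4 6.0
```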

The slope is the predicted change in the response variable for each one-unit increase in the explanatory variable. To find the slope, we use the formula: ⛰️

$$b = r\,\frac{s_y}{s_x}$$

(where b is the slope, r is the correlation coefficient between x and y, s_y is the standard deviation of y, and s_x is the standard deviation of x.)

The slope formula combines the spread of each variable with the strength of the linear relationship between them. The ratio s_y/s_x puts the slope in units of y per unit of x, and the correlation coefficient, r, scales that ratio and gives it its sign: when r = ±1 the slope is ±s_y/s_x, and when r = 0 the slope is 0.
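As a check on the slope formula, this sketch (hypothetical data again; `correlation` is our own helper, not a library function) computes r from z-scores, then compares b = r(s_y/s_x) with the equivalent formula b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)². Both give the same slope.

```python
from statistics import mean, stdev

xs = [1, 2, 3, 4, 5]   # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]   # hypothetical response values

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of products of z-scores."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

r = correlation(xs, ys)
b_from_summary = r * stdev(ys) / stdev(xs)   # b = r * s_y / s_x

x_bar, y_bar = mean(xs), mean(ys)
b_direct = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))

print(round(r, 4), round(b_from_summary, 4), round(b_direct, 4))  # 0.7746 0.6 0.6
```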

Template for Interpretation

When asked to interpret the slope of an LSRL, follow the template below:

⭐ "There is a predicted increase/decrease of ______ (slope, in units of the y variable) for every 1 (unit of the x variable)."

  • Correct definition
  • The word "predicted"

LSRL—y-intercept

Once you have calculated the slope of the least squares regression line, you can use the point-slope form to find the y-intercept and the equation of the line.

The point-slope form of a linear equation is given by:

ŷ − y₁ = m(x − x₁)

where ŷ is the predicted value of the response variable, m is the slope of the line, x is the predictor or explanatory variable, and (x₁, y₁) is a point on the line.

The LSRL always passes through the point (x̄, ȳ), where x̄ is the mean of the predictor variable and ȳ is the mean of the response variable. Therefore, we can use this point in the point-slope form to find the y-intercept of the line.

Substituting (x̄, ȳ) for the point and b for the slope in the point-slope form, we have:

ŷ - ȳ = b(x - x̄)

Solving for ŷ, we get:

ŷ = bx + (-bx̄ + ȳ)

The expression in parentheses, ȳ − bx̄, is the y-intercept of the line: the predicted value of the response variable when the explanatory variable is zero. 💛
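Here is a minimal sketch of the intercept step, using hypothetical points made up for illustration: compute b, set a = ȳ − bx̄, and confirm that the fitted line predicts ȳ when x = x̄. The function name `predict` is our own.

```python
from statistics import mean

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
# Slope from the least squares formula.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
# Intercept from y-hat - y-bar = b(x - x-bar), i.e. a = y-bar - b * x-bar.
a = y_bar - b * x_bar

def predict(x):
    """Predicted response y-hat = a + b*x."""
    return a + b * x

print(round(a, 4), round(b, 4))            # 2.2 0.6
print(abs(predict(x_bar) - y_bar) < 1e-9)  # True: the line passes through (x-bar, y-bar)
```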

Template time! When asked to interpret the y-intercept of an LSRL, follow the template below:

⭐ "The predicted value of (y in context) is _____ when (x value in context) is 0 (units in context)."

LSRL—Coefficient of Determination

The coefficient of determination, also known as R-squared, is a statistic used to evaluate the fit of a linear regression model (how well the LSRL fits the data). It measures the proportion of the variability in the response variable (y) that can be explained by the model. 🍄

For a least squares line, R-squared equals the square of the correlation coefficient r, whether r is computed between x and y or between the observed and predicted values of the response variable; this is why it is written r² (or R²). It ranges from 0 to 1, with a value of 0 indicating that the LSRL explains none of the variation in y (no linear relationship between the explanatory and response variables) and a value of 1 indicating a perfect linear relationship (every point lies on the line).

There is also another formula for r²:

$$r^2 = \frac{\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$$

This formula compares the total variation of y about its mean, Σ(y − ȳ)², with the variation that remains around the LSRL, the sum of squared residuals Σ(y − ŷ)². In other words, r² is the fraction of the variation in y that is removed by using the LSRL. When interpreting it, we say that it is the "percentage of the variation in y that can be explained by the linear model with x."
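The two descriptions of r² agree. This sketch (hypothetical data made up for illustration; variable names are our own) computes r² by squaring r and again as the fraction of the total variation Σ(y − ȳ)² that the LSRL removes.

```python
from statistics import mean

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)   # total variation of y about its mean

b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left around the LSRL

r = s_xy / (s_xx * s_yy) ** 0.5
r_squared_from_r = r ** 2
r_squared_from_sums = (s_yy - sse) / s_yy

print(round(r_squared_from_r, 4), round(r_squared_from_sums, 4))  # 0.6 0.6
```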

Template time yet again! When asked to interpret a coefficient of determination for a least squares regression model, use the template below:

⭐ "____% of the variation in (y in context) can be explained by the linear relationship with (x in context)."

  • Mention the linear relationship

LSRL—Standard Deviation of the Residuals

The last statistic we will talk about is the standard deviation of the residuals, called s. It gives the typical size of a residual, that is, roughly how far the observed y-values typically fall from the LSRL. The formula for s is: 🐫

$$s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$

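This looks similar to the sample standard deviation, except that we divide by n - 2 instead of n - 1. Why? Roughly, because two quantities (the slope and the y-intercept) are estimated from the data before the residuals can be computed. We will learn more about s when we study inference for regression in Unit 9.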
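A short sketch of s for hypothetical points made up for illustration: fit the LSRL, then divide the sum of squared residuals by n - 2 before taking the square root.

```python
import math
from statistics import mean

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
# Divide by n - 2 (not n - 1) because the slope and intercept were estimated.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(round(s, 4))  # 0.8944
```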

Reading a Computer Printout

On the AP test, you will very likely be expected to read a computer printout of a regression analysis. Here is a sample printout showing where to find most of the statistics you will need (the rest you will learn in Unit 9): 🖥️

https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FScreen%20Shot%202020-04-25%20at%2011.47-0GRdP9LpGgn7.png?alt=media&token=ac6e86fd-aaa8-4ae7-8ae1-0f0627d2c3a7

Courtesy of Starnes, Daren S. and Tabor, Josh. The Practice of Statistics—For the AP Exam, 5th Edition. Cengage Publishing.

💡 Always use R-Sq, NEVER R-Sq(adj)!

🎥 Watch: AP Stats - Least Squares Regression Lines

Practice Problem

A researcher is studying the relationship between the amount of sleep (in hours) and the performance on a cognitive test. She collects data from 50 participants and fits a linear regression model to the data. The summary of the model is shown below:

Summary of Linear Regression Model:

Response variable: Performance on cognitive test (y)

Explanatory variable: Amount of sleep (x)

Slope (b): -2.5

Y-intercept (a): 50

Correlation coefficient (r): -0.7

R-squared: 0.49

a) Interpret the slope of the model in the context of the problem.

b) Interpret the y-intercept of the model in the context of the problem.

c) Interpret the correlation coefficient of the model in the context of the problem.

d) Interpret the R-squared value of the model in the context of the problem.

e) Based on the summary of the model, do you think that the amount of sleep has a significant effect on the performance on the cognitive test? Why or why not?

f) Suppose the researcher collects data from an additional 50 participants and fits a new linear regression model to the combined data. The summary of the new model is shown below:

Slope (b): -1.9

Y-intercept (a): 48

Correlation coefficient (r): -0.6

R-squared: 0.36

Compare the two models and explain how the new model differs from the original model in terms of the strength and direction of the relationship between the amount of sleep and the performance on the cognitive test.

a) The slope of the model is -2.5, which means that for every one-hour increase in the amount of sleep, the performance on the cognitive test is predicted to decrease by 2.5 points.

b) The y-intercept of the model is 50, which means that the performance on the cognitive test is predicted to be 50 points when the amount of sleep is zero.

c) The correlation coefficient of the model is -0.7, which indicates a moderately strong negative linear relationship between the amount of sleep and the performance on the cognitive test. The negative sign means that participants who sleep more tend to score lower on the cognitive test.

d) The R-squared value of the model is 0.49, which means that 49% of the variation in performance on the cognitive test can be explained by the linear model with amount of sleep. The model captures a substantial portion of the variation, but the remaining 51% is left unexplained, so other factors likely also contribute to performance on the cognitive test.

e) From this summary alone, we cannot conclude that the amount of sleep has a significant effect on performance on the cognitive test. The correlation (r = -0.7) shows a fairly strong negative linear association, but judging statistical significance requires the inference methods for regression covered in Unit 9. Also, the problem does not describe a randomized experiment, so an association between sleep and test performance does not show that sleep causes the change in performance. Finally, R-squared is 0.49, so 51% of the variation in performance is not explained by the linear model with sleep; other factors likely matter as well.

f) In the new model, the slope is -1.9, which is less steep than the slope in the original model (-2.5). The new model predicts a smaller decrease in test performance for each additional hour of sleep (1.9 points instead of 2.5 points). Note that the slope describes the rate of change, not the strength of the relationship; strength is measured by r.

The y-intercept is also slightly lower in the new model (48) compared to the original model (50).

The correlation coefficient is closer to 0 in the new model (-0.6) than in the original model (-0.7), so the negative linear relationship is somewhat weaker.

Finally, the R-squared value is lower in the new model (0.36) than in the original model (0.49), so the linear model with sleep explains less of the variation in test performance.

Overall, the direction of the relationship is the same (negative) in both models, but the relationship is weaker in the new model.
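The summary values in this problem can be checked with a few lines of Python. The sketch below uses only the numbers given in the problem; the choice of 7 and 8 hours is arbitrary, and the names are our own. It confirms that R-squared equals r² for both models and that each slope is the predicted change in test score for one additional hour of sleep.

```python
# Summary values from the practice problem.
original = {"a": 50, "b": -2.5, "r": -0.7, "r_squared": 0.49}
combined = {"a": 48, "b": -1.9, "r": -0.6, "r_squared": 0.36}

def predict(model, hours):
    """Predicted test score y-hat = a + b * hours."""
    return model["a"] + model["b"] * hours

for name, model in (("original", original), ("combined", combined)):
    # For a least squares line, R-squared equals r squared.
    assert abs(model["r"] ** 2 - model["r_squared"]) < 1e-9
    # The slope is the predicted change in score for one more hour of sleep.
    change = predict(model, 8) - predict(model, 7)
    print(name, round(change, 2))
# original -2.5
# combined -1.9
```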

Key Terms to Review

Coefficient of Determination (R-squared)

Correlation Coefficient

Least squares criterion

Least Squares Regression Line

One-variable statistics

Point-Slope Form

Predictor Variable

Response Variable

Standard Deviation

Y-intercept


© 2024 Fiveable Inc. All rights reserved.

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

AP Statistics

  • Use graphical representations of two categorical variables to compare data and determine if variables are potentially associated.
  • Evaluate data in two-way tables
  • Generate joint relative frequencies from data in two-way tables.
  • Assignment 2-1:
  • Represent bivariate data with a scatterplot
  • Describe the relationship in a scatterplot
  • Calculate and interpret the correlation coefficient, r, in context.
  • Assignment 2.2: Pg. 132 #1-5
  • Screencast Quiz
  • Write the equation for a regression line.
  • Interpret the slope and y-intercept of a regression line in context.
  • Detect when extrapolation occurs.
  • Assignment 2.3:
  • Calculate residuals and make a residual plot.
  • Determine if regression model is a good fit for data.
  • Interpret, in context, the correlation coefficient and the coefficient of determination.
  • Interpret s_e, the standard error.
  • Analyze the impact of influential points on a regression
  • Assignment 2.4:
  • 2-4 Screencast
  • 2-4 Screencast Quiz
  • Minitab Activity
  • Residual Plots Activity
  • Residual Plots FRQ Practice
  • Perform a power regression.
  • Perform transformation regressions.
  • Assignment 2.5:
  • Screencast Examples
  • Screencast Example 3
  • Checkpoint Quiz

Stats Medic: Helping math teachers bring statistics to life

Regression Line, Predictions & Residuals (Topics 2.6-2.8)

Chapter 3, Day 4: Learning Targets

Make predictions using regression lines, keeping in mind the dangers of extrapolation.

Calculate and interpret a residual.

Interpret the slope and y intercept of a least-squares regression line.

Activity: How good are the predictions for Barbie?  


Students use the online applet to find the line of best fit for some Barbie Bungee data collected by one of the groups. The group forgot to record a value for 5 rubber bands, so students will use the line of best fit to make a prediction. It is then revealed that the group found their measurement for 5 rubber bands, leading students to think about how close their prediction was to the actual value (residual!).

Notice that in the activity, we avoided "formal" language (residual, extrapolation). This is done intentionally to keep the activity accessible to all students.  We always layer on the formality when we debrief the activity.  Remember: Experience first, formalize later.

We let students write their own interpretation for slope and y-intercept when they were working on the activity. During the debrief, we formalize by dialing in the necessary components of interpreting slope and y-intercept.

Teaching Tip:

Students are most familiar with slope-intercept form for equations of lines (y = mx + b). So why do statisticians prefer y = a + bx? The answer is that in the real world there is often more than one explanatory variable that can help predict the response variable. Statisticians might create a model with three explanatory variables (x₁, x₂, x₃) that looks like this:

$$y = a + b_1x_1 + b_2x_2 + b_3x_3$$

The y-intercept (a) is a starting point for making a prediction for the response variable, and each time we add one more explanatory variable we refine that prediction. This process is called multiple regression (not part of the AP® Statistics course).

Always have students use context rather than x and y when writing out a regression equation. Also, make sure they don’t forget the “hat” on the response variable, or the word “predicted” in front.  This will help them with calculating residuals and interpreting slope and y-intercept.

Students will often mix up the order in a calculation of a residual by taking (predicted y − actual y). An easy way to remember the correct order of subtraction is to think AP = Actual − Predicted. They should be able to remember this one.
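The "AP" rule can be written as a one-line function. The line ŷ = 2.2 + 0.6x and the observed point (3, 5) below are hypothetical, chosen only to show how the sign flips when the subtraction order is reversed; the function name `residual` is our own.

```python
def residual(actual, predicted):
    """Residual = Actual - Predicted (remember "AP")."""
    return actual - predicted

# Hypothetical fitted line and observation, for illustration only.
a, b = 2.2, 0.6
x, actual_y = 3, 5
predicted_y = a + b * x   # about 4.0

print(round(residual(actual_y, predicted_y), 2))  # 1.0: the point lies above the line
print(round(predicted_y - actual_y, 2))           # -1.0: the wrong order flips the sign
```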



Module 3 Assignment: Linear Regression

In this activity we will:

  • Find a regression line and plot it on the scatterplot.
  • Examine the effect of outliers on the regression line.
  • Use the regression line to make predictions and evaluate how reliable these predictions are.

The modern Olympic Games have changed dramatically since their inception in 1896. For example, many commentators have remarked on the change in the quality of athletic performances from year to year. Regression will allow us to investigate the change in winning times for one event—the 1,500 meter race.

Instructions

Click on the link corresponding to your statistical package to see instructions for completing the activity, and then answer the questions below.

R  |  StatCrunch  |  Minitab  |  Excel  |  TI Calculator

Observe that the form of the relationship between the 1,500 meter race’s winning time and the year is linear. The least squares regression line is therefore an appropriate way to summarize the relationship and examine the change in winning times over the course of the last century. We will now find the least squares regression line and plot it on a scatterplot.

Question 1:

Give the equation for the least squares regression line, and interpret it in context.

Question 2:

Give the equation for this new line and compare it with the line you found for the whole dataset, commenting on the effect of the outlier.

Question 3:

Our least squares regression line uses year as the explanatory variable and the winning time in the 1,500 meter race as the response variable. Use the least squares regression line you found in Question 2 to predict the winning 1,500 meter time at the 2008 Olympic Games in Beijing. Comment on your prediction.

Concepts in Statistics Copyright © 2023 by CUNY School of Professional Studies is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.
