meaning of regression analysis in research

Regression Analysis

Regression analysis is a quantitative research method used when a study involves modelling and analysing several variables, where the relationship includes a dependent variable and one or more independent variables. In simple terms, it tests the nature of the relationship between a dependent variable and one or more independent variables.

The basic form of regression models includes unknown parameters (β), independent variables (X), and the dependent variable (Y).

A regression model specifies the relation of the dependent variable (Y) to a function of the independent variables (X) and the unknown parameters (β):

Y ≈ f (X, β)

A regression equation can be used to predict the value of ‘y’ when the value of ‘x’ is given, where ‘y’ and ‘x’ are two sets of measures from a sample of size ‘n’. The formulae for the regression equation would be:
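Assuming the standard least-squares line y = a + bx (an assumption, since the formulae themselves are not reproduced in this text), the slope b and intercept a can be computed from sums over the n pairs. The following is a minimal sketch; the function name and the data are hypothetical.

```python
def fit_simple_regression(x, y):
    """Return (a, b) for the line y = a + b*x using the least-squares sum formulas.

    Assumed formulas (standard textbook form, not taken from the source):
        b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2)
        a = (Sy - b*Sx) / n
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical sample that lies exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a, b = fit_simple_regression(x, y)
predicted = a + b * 6  # predict y for a new x value
```

In practice the same numbers would come from the software packages mentioned below; the sketch only shows what they calculate.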

Do not be intimidated by the visual complexity of the correlation and regression formulae above. You do not have to apply the formulae manually; correlation and regression analyses can be run in popular analytical software such as Microsoft Excel, Microsoft Access, SPSS and others.

Linear regression analysis is based on the following set of assumptions:

1. Assumption of linearity. There is a linear relationship between the dependent and independent variables.

2. Assumption of homoscedasticity. The variance of the errors (residuals) is constant across all values of the independent variables.

3. Assumption of absence of collinearity or multicollinearity. The independent variables are not highly correlated with one another.

4. Assumption of normal distribution. The errors (residuals) are normally distributed.

John Dudovskiy

Regression Analysis – Methods, Types and Examples

Regression Analysis

Regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or ‘predictors’).

Regression Analysis Methodology

Here is a general methodology for performing regression analysis:

Define the research question: Clearly state the research question or hypothesis you want to investigate. Identify the dependent variable (also called the response variable or outcome variable) and the independent variables (also called predictor variables or explanatory variables) that you believe are related to the dependent variable.
Collect data: Gather the data for the dependent variable and independent variables. Ensure that the data is relevant, accurate, and representative of the population or phenomenon you are studying.
Explore the data: Perform exploratory data analysis to understand the characteristics of the data, identify any missing values or outliers, and assess the relationships between variables through scatter plots, histograms, or summary statistics.
Choose the regression model: Select an appropriate regression model based on the nature of the variables and the research question. Common regression models include linear regression, multiple regression, logistic regression, polynomial regression, and time series regression, among others.
Assess assumptions: Check the assumptions of the regression model. Some common assumptions include linearity (the relationship between variables is linear), independence of errors, homoscedasticity (constant variance of errors), and normality of errors. Violation of these assumptions may require additional steps or alternative models.
Estimate the model: Use a suitable method to estimate the parameters of the regression model. The most common method is ordinary least squares (OLS), which minimizes the sum of squared differences between the observed and predicted values of the dependent variable.
Interpret the results: Analyze the estimated coefficients, p-values, confidence intervals, and goodness-of-fit measures (e.g., R-squared) to interpret the results. Determine the significance and direction of the relationships between the independent variables and the dependent variable.
Evaluate model performance: Assess the overall performance of the regression model using appropriate measures, such as R-squared, adjusted R-squared, and root mean squared error (RMSE). These measures indicate how well the model fits the data and how much of the variation in the dependent variable is explained by the independent variables.
Test assumptions and diagnose problems: Check the residuals (the differences between observed and predicted values) for any patterns or deviations from assumptions. Conduct diagnostic tests, such as examining residual plots, testing for multicollinearity among independent variables, and assessing heteroscedasticity or autocorrelation, if applicable.
Make predictions and draw conclusions: Once you have a satisfactory model, use it to make predictions on new or unseen data. Draw conclusions based on the results of the analysis, considering the limitations and potential implications of the findings.
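The estimation and evaluation steps above can be sketched for the one-predictor case. The function name `evaluate_fit` and the data are hypothetical, chosen only for illustration; this is a sketch of ordinary least squares with R-squared and RMSE, not a full workflow.

```python
import math


def evaluate_fit(x, y):
    """Fit y = intercept + slope*x by OLS and report basic fit measures."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residuals are observed minus predicted values.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in residuals)          # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)    # total variation
    return {
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
        "r_squared": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
    }


# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = evaluate_fit(x, y)
```

The returned residuals are the input to the diagnostic step: plotting them against the predictor is a common first check for patterns that would contradict the model assumptions.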

Types of Regression Analysis

Types of Regression Analysis are as follows:

Linear Regression

Linear regression is the most basic and widely used form of regression analysis. It models the linear relationship between a dependent variable and one or more independent variables. The goal is to find the best-fitting line that minimizes the sum of squared differences between observed and predicted values.

Multiple Regression

Multiple regression extends linear regression by incorporating two or more independent variables to predict the dependent variable. It allows for examining the simultaneous effects of multiple predictors on the outcome variable.

Polynomial Regression

Polynomial regression models non-linear relationships between variables by adding polynomial terms (e.g., squared or cubic terms) to the regression equation. It can capture curved or nonlinear patterns in the data.
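As a sketch of how polynomial terms enter the regression equation, the example below builds columns 1, x, x² and solves the least-squares normal equations with a small Gaussian-elimination helper. Both function names and the data are hypothetical, and the approach is one simple option among several.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(x, y, degree):
    """Return least-squares coefficients [b0, b1, ..., b_degree]."""
    k = degree + 1
    rows = [[xi ** p for p in range(k)] for xi in x]  # design matrix
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 0.5x + 2x^2.
x = [-2, -1, 0, 1, 2]
y = [1 + 0.5 * xi + 2 * xi ** 2 for xi in x]
coeffs = fit_polynomial(x, y, degree=2)
```

Note that the model is still linear in its coefficients, which is why the linear least-squares machinery applies even though the fitted curve is not a straight line.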

Logistic Regression

Logistic regression is used when the dependent variable is binary or categorical. It models the probability of the occurrence of a certain event or outcome based on the independent variables. Logistic regression estimates the coefficients using the logistic function, which transforms the linear combination of predictors into a probability.

Ridge Regression and Lasso Regression

Ridge regression and Lasso regression are techniques used for addressing multicollinearity (high correlation between independent variables) and variable selection. Both methods introduce a penalty term to the regression equation to shrink or eliminate less important variables. Ridge regression uses L2 regularization, while Lasso regression uses L1 regularization.
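The effect of the two penalties can be illustrated for a single centered predictor, where both estimators have a closed form. This sketch assumes the objectives stated in the comments (the scaling of the penalty varies between textbooks and libraries); the function names, data, and penalty values are hypothetical.

```python
import math


def _centered_sums(x, y):
    """Return (sxx, sxy) for mean-centered x and y."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxx, sxy


def ridge_slope(x, y, lam):
    """Minimise sum((y - b*x)^2) + lam * b^2 on centered data (L2 penalty)."""
    sxx, sxy = _centered_sums(x, y)
    return sxy / (sxx + lam)


def lasso_slope(x, y, lam):
    """Minimise sum((y - b*x)^2) + lam * |b| on centered data (L1 penalty)."""
    sxx, sxy = _centered_sums(x, y)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)  # soft-thresholding
    return math.copysign(shrunk, sxy) / sxx


# Hypothetical observations; the unpenalised OLS slope is 0.6.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ols = ridge_slope(x, y, lam=0.0)
ridge = ridge_slope(x, y, lam=10.0)
lasso_small = lasso_slope(x, y, lam=4.0)
lasso_large = lasso_slope(x, y, lam=12.0)
```

The ridge slope shrinks toward zero but never reaches it, while a large enough L1 penalty sets the lasso slope to exactly zero, which is why Lasso can act as a variable selection method.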

Time Series Regression

Time series regression analyzes the relationship between a dependent variable and independent variables when the data is collected over time. It accounts for autocorrelation and trends in the data and is used in forecasting and studying temporal relationships.

Nonlinear Regression

Nonlinear regression models are used when the relationship between the dependent variable and independent variables is not linear. These models can take various functional forms and require estimation techniques different from those used in linear regression.

Poisson Regression

Poisson regression is employed when the dependent variable represents count data. It models the relationship between the independent variables and the expected count, assuming a Poisson distribution for the dependent variable.

Generalized Linear Models (GLM)

GLMs are a flexible class of regression models that extend the linear regression framework to handle different types of dependent variables, including binary, count, and continuous variables. GLMs incorporate various probability distributions and link functions.

Regression Analysis Formulas

Regression analysis involves estimating the parameters of a regression model to describe the relationship between the dependent variable (Y) and one or more independent variables (X). Here are the basic formulas for linear regression, multiple regression, and logistic regression:

Linear Regression:

Simple Linear Regression Model: Y = β0 + β1X + ε

Multiple Linear Regression Model: Y = β0 + β1X1 + β2X2 + … + βnXn + ε

In both formulas:

Y represents the dependent variable (response variable).
X represents the independent variable(s) (predictor variable(s)).
β0, β1, β2, …, βn are the regression coefficients or parameters that need to be estimated.
ε represents the error term or residual (the difference between the observed and predicted values).

Multiple Regression:

Multiple regression extends the concept of simple linear regression by including multiple independent variables.

Multiple Regression Model: Y = β0 + β1X1 + β2X2 + … + βnXn + ε

The formulas are similar to those in linear regression, with the addition of more independent variables.
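For two independent variables, the coefficients of Y = β0 + β1X1 + β2X2 + ε can be estimated by solving the two least-squares normal equations on mean-centered data. The sketch below uses Cramer's rule for that 2×2 system; the function name and data are hypothetical, and models with more predictors would normally rely on a linear algebra or statistics library.

```python
def fit_two_predictors(x1, x2, y):
    """Return (b0, b1, b2) for y = b0 + b1*x1 + b2*x2 by ordinary least squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]

    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))

    # Normal equations: s11*b1 + s12*b2 = s1y and s12*b1 + s22*b2 = s2y.
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
b0, b1, b2 = fit_two_predictors(x1, x2, y)
```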

Logistic Regression:

Logistic regression is used when the dependent variable is binary or categorical. The logistic regression model applies a logistic or sigmoid function to the linear combination of the independent variables.

Logistic Regression Model: p = 1 / (1 + e^-(β0 + β1X1 + β2X2 + … + βnXn))

In the formula:

p represents the probability of the event occurring (e.g., the probability of success or belonging to a certain category).
X1, X2, …, Xn represent the independent variables.
e is the base of the natural logarithm.

The logistic function ensures that the predicted probabilities lie between 0 and 1, allowing for binary classification.
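The logistic formula above translates directly into code. In this sketch the coefficient values are hypothetical (in practice they would be estimated from data), and the function name is an illustration only.

```python
import math


def predict_probability(coefficients, features):
    """Apply p = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn)).

    coefficients: [b0, b1, ..., bn]; features: [x1, ..., xn].
    """
    z = coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], features))
    # Two algebraically equal forms, chosen to avoid overflow in math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical coefficients and one observation with two predictors.
p = predict_probability([-1.5, 0.8, 0.4], [2.0, 1.0])
predicted_class = 1 if p >= 0.5 else 0  # 0.5 is a common but arbitrary cut-off
```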

Regression Analysis Examples

Regression Analysis Examples are as follows:

Stock Market Prediction: Regression analysis can be used to predict stock prices based on various factors such as historical prices, trading volume, news sentiment, and economic indicators. Traders and investors can use this analysis to make informed decisions about buying or selling stocks.
Demand Forecasting: In retail and e-commerce, regression analysis can help forecast demand for products. By analyzing historical sales data along with real-time data such as website traffic, promotional activities, and market trends, businesses can adjust their inventory levels and production schedules to meet customer demand more effectively.
Energy Load Forecasting: Utility companies often use real-time regression analysis to forecast electricity demand. By analyzing historical energy consumption data, weather conditions, and other relevant factors, they can predict future energy loads. This information helps them optimize power generation and distribution, ensuring a stable and efficient energy supply.
Online Advertising Performance: Regression analysis can be used to assess the performance of online advertising campaigns. By analyzing real-time data on ad impressions, click-through rates, conversion rates, and other metrics, advertisers can adjust their targeting, messaging, and ad placement strategies to maximize their return on investment.
Predictive Maintenance: Regression analysis can be applied to predict equipment failures or maintenance needs. By continuously monitoring sensor data from machines or vehicles, regression models can identify patterns or anomalies that indicate potential failures. This enables proactive maintenance, reducing downtime and optimizing maintenance schedules.
Financial Risk Assessment: Real-time regression analysis can help financial institutions assess the risk associated with lending or investment decisions. By analyzing real-time data on factors such as borrower financials, market conditions, and macroeconomic indicators, regression models can estimate the likelihood of default or assess the risk-return tradeoff for investment portfolios.

Importance of Regression Analysis

Importance of Regression Analysis is as follows:

Relationship Identification: Regression analysis helps in identifying and quantifying the relationship between a dependent variable and one or more independent variables. It allows us to determine how changes in independent variables impact the dependent variable. This information is crucial for decision-making, planning, and forecasting.
Prediction and Forecasting: Regression analysis enables us to make predictions and forecasts based on the relationships identified. By estimating the values of the dependent variable using known values of independent variables, regression models can provide valuable insights into future outcomes. This is particularly useful in business, economics, finance, and other fields where forecasting is vital for planning and strategy development.
Causality Assessment: While correlation does not imply causation, regression analysis provides a framework for assessing causality by considering the direction and strength of the relationship between variables. It allows researchers to control for other factors and assess the impact of a specific independent variable on the dependent variable. This helps in determining the causal effect and identifying significant factors that influence outcomes.
Model Building and Variable Selection: Regression analysis aids in model building by determining the most appropriate functional form of the relationship between variables. It helps researchers select relevant independent variables and eliminate irrelevant ones, reducing complexity and improving model accuracy. This process is crucial for creating robust and interpretable models.
Hypothesis Testing: Regression analysis provides a statistical framework for hypothesis testing. Researchers can test the significance of individual coefficients, assess the overall model fit, and determine if the relationship between variables is statistically significant. This allows for rigorous analysis and validation of research hypotheses.
Policy Evaluation and Decision-Making: Regression analysis plays a vital role in policy evaluation and decision-making processes. By analyzing historical data, researchers can evaluate the effectiveness of policy interventions and identify the key factors contributing to certain outcomes. This information helps policymakers make informed decisions, allocate resources effectively, and optimize policy implementation.
Risk Assessment and Control: Regression analysis can be used for risk assessment and control purposes. By analyzing historical data, organizations can identify risk factors and develop models that predict the likelihood of certain outcomes, such as defaults, accidents, or failures. This enables proactive risk management, allowing organizations to take preventive measures and mitigate potential risks.

When to Use Regression Analysis

Prediction: Regression analysis is often employed to predict the value of the dependent variable based on the values of independent variables. For example, you might use regression to predict sales based on advertising expenditure, or to predict a student’s academic performance based on variables like study time, attendance, and previous grades.
Relationship analysis: Regression can help determine the strength and direction of the relationship between variables. It can be used to examine whether there is a linear association between variables, identify which independent variables have a significant impact on the dependent variable, and quantify the magnitude of those effects.
Causal inference: Regression analysis can be used to explore cause-and-effect relationships by controlling for other variables. For example, in a medical study, you might use regression to determine the impact of a specific treatment while accounting for other factors like age, gender, and lifestyle.
Forecasting: Regression models can be utilized to forecast future trends or outcomes. By fitting a regression model to historical data, you can make predictions about future values of the dependent variable based on changes in the independent variables.
Model evaluation: Regression analysis can be used to evaluate the performance of a model or test the significance of variables. You can assess how well the model fits the data, determine if additional variables improve the model’s predictive power, or test the statistical significance of coefficients.
Data exploration: Regression analysis can help uncover patterns and insights in the data. By examining the relationships between variables, you can gain a deeper understanding of the data set and identify potential patterns, outliers, or influential observations.

Applications of Regression Analysis

Here are some common applications of regression analysis:

Economic Forecasting: Regression analysis is frequently employed in economics to forecast variables such as GDP growth, inflation rates, or stock market performance. By analyzing historical data and identifying the underlying relationships, economists can make predictions about future economic conditions.
Financial Analysis: Regression analysis plays a crucial role in financial analysis, such as predicting stock prices or evaluating the impact of financial factors on company performance. It helps analysts understand how variables like interest rates, company earnings, or market indices influence financial outcomes.
Marketing Research: Regression analysis helps marketers understand consumer behavior and make data-driven decisions. It can be used to predict sales based on advertising expenditures, pricing strategies, or demographic variables. Regression models provide insights into which marketing efforts are most effective and help optimize marketing campaigns.
Health Sciences: Regression analysis is extensively used in medical research and public health studies. It helps examine the relationship between risk factors and health outcomes, such as the impact of smoking on lung cancer or the relationship between diet and heart disease. Regression analysis also helps in predicting health outcomes based on various factors like age, genetic markers, or lifestyle choices.
Social Sciences: Regression analysis is widely used in social sciences like sociology, psychology, and education research. Researchers can investigate the impact of variables like income, education level, or social factors on various outcomes such as crime rates, academic performance, or job satisfaction.
Operations Research: Regression analysis is applied in operations research to optimize processes and improve efficiency. For example, it can be used to predict demand based on historical sales data, determine the factors influencing production output, or optimize supply chain logistics.
Environmental Studies: Regression analysis helps in understanding and predicting environmental phenomena. It can be used to analyze the impact of factors like temperature, pollution levels, or land use patterns on phenomena such as species diversity, water quality, or climate change.
Sports Analytics: Regression analysis is increasingly used in sports analytics to gain insights into player performance, team strategies, and game outcomes. It helps analyze the relationship between various factors like player statistics, coaching strategies, or environmental conditions and their impact on game outcomes.

Advantages and Disadvantages of Regression Analysis

Advantages of Regression Analysis	Disadvantages of Regression Analysis
Provides a quantitative measure of the relationship between variables	Assumes a linear relationship between variables, which may not always hold true
Helps in predicting and forecasting outcomes based on historical data	Requires a large sample size to produce reliable results
Identifies and measures the significance of independent variables on the dependent variable	Assumes no multicollinearity, meaning that independent variables should not be highly correlated with each other
Provides estimates of the coefficients that represent the strength and direction of the relationship between variables	Assumes the absence of outliers or influential data points
Allows for hypothesis testing to determine the statistical significance of the relationship	Can be sensitive to the inclusion or exclusion of certain variables, leading to different results
Can handle both continuous and categorical variables	Assumes the independence of observations, which may not hold true in some cases
Offers a visual representation of the relationship through the use of scatter plots and regression lines	May not capture complex non-linear relationships between variables without appropriate transformations
Provides insights into the marginal effects of independent variables on the dependent variable	Requires the assumption of homoscedasticity, meaning that the variance of errors is constant across all levels of the independent variables

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

What is Regression Analysis?

The estimation of relationships between a dependent variable and one or more independent variables

Regression analysis is a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables. It can be utilized to assess the strength of the relationship between variables and for modeling the future relationship between them.

Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most common models are simple linear and multiple linear. Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship.

Regression analysis offers numerous applications in various disciplines, including finance.

Regression Analysis – Linear Model Assumptions

Linear regression analysis is based on six fundamental assumptions:

The dependent and independent variables show a linear relationship, described by a slope and an intercept.
The independent variable is not random.
The expected value of the residual (error) is zero.
The variance of the residual (error) is constant across all observations.
The residual (error) values are not correlated across observations.
The residual (error) values follow the normal distribution.
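Two of the residual assumptions listed above can be checked with simple arithmetic once a model has been fitted. The sketch below computes the mean residual and the Durbin-Watson statistic for first-order autocorrelation; the residual values are hypothetical, and these two numbers are only a starting point, not a complete diagnostic procedure.

```python
def mean_residual(residuals):
    """Average residual; OLS with an intercept makes this zero up to rounding."""
    return sum(residuals) / len(residuals)


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals.

    Values near 2 suggest little first-order autocorrelation; values toward
    0 or 4 suggest positive or negative autocorrelation respectively.
    """
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(r * r for r in residuals)
    return num / den


# Hypothetical residuals from a fitted model, in observation order.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
avg = mean_residual(residuals)
dw = durbin_watson(residuals)
```

Constant variance and normality are usually judged from residual plots or formal tests available in statistical software, which this sketch does not attempt.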

Regression Analysis – Simple Linear Regression

Simple linear regression is a model that assesses the relationship between a dependent variable and an independent variable. The simple linear model is expressed using the following equation:

Y = a + bX + ϵ

Y – Dependent variable
X – Independent (explanatory) variable
a – Intercept
b – Slope
ϵ – Residual (error)

Regression Analysis – Multiple Linear Regression

Multiple linear regression analysis is essentially similar to the simple linear model, with the exception that multiple independent variables are used in the model. The mathematical representation of multiple linear regression is:

Y = a + bX1 + cX2 + dX3 + ϵ

X1, X2, X3 – Independent (explanatory) variables
b, c, d – Slopes

Multiple linear regression follows the same conditions as the simple linear model. However, since there are several independent variables in multiple linear analysis, there is another mandatory condition for the model:

Non-collinearity: Independent variables should show minimal correlation with each other. If the independent variables are highly correlated with each other, it will be difficult to assess the true relationships between the dependent and independent variables.
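One common way to quantify collinearity is the variance inflation factor (VIF). With exactly two independent variables it reduces to 1 / (1 − r²), where r is their correlation. The sketch below covers only that two-predictor case; the function names and data are hypothetical, and any cut-off for "too high" is a rule of thumb rather than a fixed standard.

```python
import math


def correlation(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def vif_two_predictors(x1, x2):
    """VIF for either predictor in a two-predictor model: 1 / (1 - r^2).

    Grows without bound as |r| approaches 1; perfectly collinear inputs
    may raise ZeroDivisionError.
    """
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor values.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
r = correlation(x1, x2)
vif = vif_two_predictors(x1, x2)
```

A VIF of 1 means the predictors are uncorrelated; larger values mean the coefficient estimates become less precise.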

Regression Analysis in Finance

Regression analysis comes with several applications in finance. For example, the statistical method is fundamental to the Capital Asset Pricing Model (CAPM). Essentially, the CAPM equation is a model that determines the relationship between the expected return of an asset and the market risk premium.

The analysis is also used to forecast the returns of securities, based on different factors, or to forecast the performance of a business.

1. Beta and CAPM

In finance, regression analysis is used to calculate the Beta (volatility of returns relative to the overall market) for a stock. It can be done in Excel using the Slope function.
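The slope of a regression of stock returns on market returns equals their covariance divided by the variance of the market returns, which is the calculation a spreadsheet slope function performs. A minimal sketch, with a hypothetical function name and made-up return series:

```python
def beta(stock_returns, market_returns):
    """Slope of the regression of stock returns on market returns."""
    n = len(market_returns)
    if n != len(stock_returns) or n < 2:
        raise ValueError("return series must have the same length (at least 2)")
    mean_m = sum(market_returns) / n
    mean_s = sum(stock_returns) / n
    cov = sum(
        (m - mean_m) * (s - mean_s) for m, s in zip(market_returns, stock_returns)
    )
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var


# Hypothetical periodic returns; the stock is constructed to move
# 1.2 times as much as the market plus a constant.
market = [0.010, -0.020, 0.030, 0.015, -0.005]
stock = [0.002 + 1.2 * m for m in market]
stock_beta = beta(stock, market)
```

Real return series include noise, so the estimate depends on the sample period and return frequency chosen.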

2. Forecasting Revenues and Expenses

When forecasting financial statements for a company, it may be useful to do a multiple regression analysis to determine how changes in certain assumptions or drivers of the business will impact revenue or expenses in the future. For example, there may be a very high correlation between the number of salespeople employed by a company, the number of stores they operate, and the revenue the business generates.

Simple Linear Regression - Forecasting Revenues and Expenses

The above example shows how to use the Forecast function in Excel to calculate a company’s revenue, based on the number of ads it runs.

Excel remains a popular tool for basic regression analysis in finance. However, many more advanced statistical tools are available.

Python and R are both powerful coding languages that have become popular for all types of financial modeling, including regression. These techniques form a core part of data science and machine learning, where models are trained to detect these relationships in data.

Regression: Definition, Analysis, Calculation, and Example

What Is Regression?

Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and character of the relationship between a dependent variable and one or more independent variables.

Linear regression is the most common form of this technique. In its simplest form, called simple regression and typically estimated by ordinary least squares (OLS), linear regression establishes the linear relationship between two variables.

Linear regression is graphically depicted using a straight line of best fit with the slope defining how the change in one variable impacts a change in the other. The y-intercept of a linear regression relationship represents the value of the dependent variable when the value of the independent variable is zero. Nonlinear regression models also exist, but are far more complex.

Key Takeaways

Regression is a statistical technique that relates a dependent variable to one or more independent variables.
A regression model is able to show whether changes observed in the dependent variable are associated with changes in one or more of the independent variables.
It does this by essentially determining a best-fit line and seeing how the data is dispersed around this line.
Regression helps economists and financial analysts in things ranging from asset valuation to making predictions.
For regression results to be properly interpreted, several assumptions about the data and the model itself must hold.

In economics, regression is used to help investment managers value assets and understand the relationships between factors such as commodity prices and the stocks of businesses dealing in those commodities.

While a powerful tool for uncovering the associations between variables observed in data, it cannot easily indicate causation. Regression as a statistical technique should not be confused with the concept of regression to the mean, also known as mean reversion.

Regression captures the correlation between variables observed in a data set and quantifies whether those correlations are statistically significant or not.

The two basic types of regression are simple linear regression and multiple linear regression, although there are nonlinear regression methods for more complicated data and analysis. Simple linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y, while multiple linear regression uses two or more independent variables to predict the outcome. Analysts can use stepwise regression to examine each independent variable contained in the linear regression model.

Regression can help finance and investment professionals. For instance, a company might use it to predict sales based on weather, previous sales, gross domestic product (GDP) growth, or other types of conditions. The capital asset pricing model (CAPM) is an often-used regression model in finance for pricing assets and discovering the costs of capital.

Regression and Econometrics

Econometrics is a set of statistical techniques used to analyze data in finance and economics. An example of the application of econometrics is to study the income effect using observable data. An economist may, for example, hypothesize that as a person increases their income, their spending will also increase.

If the data show that such an association is present, a regression analysis can then be conducted to understand the strength of the relationship between income and consumption and whether or not that relationship is statistically significant.

Note that you can have several independent variables in an analysis, for example, changes to GDP and inflation in addition to unemployment in explaining stock market prices. When more than one independent variable is used, it is referred to as multiple linear regression. This is the most commonly used tool in econometrics.

Econometrics is sometimes criticized for relying too heavily on the interpretation of regression output without linking it to economic theory or looking for causal mechanisms. It is crucial that the findings revealed in the data are able to be adequately explained by a theory.

Linear regression models often use a least-squares approach to determine the line of best fit. The least-squares technique chooses the line that minimizes the sum of squared residuals, where each residual is the vertical distance between an observed data point and the value the regression line predicts for it.
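The least-squares idea can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names and the five data points are made up for the example, and it uses the standard closed-form solution for a single explanatory variable.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_simple_ols(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
print(f"intercept={a:.2f}, slope={b:.2f}, sum of squares={best:.3f}")
```

Any other line, for example one with a slightly shifted intercept or slope, gives a larger sum of squares on the same data. That is what "best fit" means here.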

Once this process has been completed (usually done today with software), a regression model is constructed. The general form of each type of regression model is:

Simple linear regression:

$$
\begin{aligned}
&Y = a + bX + u
\end{aligned}
$$

Multiple linear regression:

$$
\begin{aligned}
&Y = a + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_tX_t + u \\
&\textbf{where:} \\
&Y = \text{the dependent variable you are trying to predict or explain} \\
&X = \text{the explanatory (independent) variable(s) you are using to predict or associate with } Y \\
&a = \text{the y-intercept} \\
&b = \text{(beta coefficient) the slope of the explanatory variable(s)} \\
&u = \text{the regression residual or error term}
\end{aligned}
$$
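As a rough sketch of how software might estimate a and the b coefficients of a multiple linear regression, the Python below solves the least-squares normal equations with plain Gaussian elimination. The helper names and the data are hypothetical. The data is generated without an error term so that the fit recovers the chosen coefficients exactly. Real statistical packages generally use more numerically robust methods than this.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_multiple_ols(rows, ys):
    """rows holds [x1, x2, ...] per observation; returns [a, b1, b2, ...]."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 column gives the intercept
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


# Hypothetical observations of (X1, X2), invented for illustration.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
# Y built as 2.0 + 0.5*X1 - 1.5*X2 with no error term (u = 0).
ys = [2.0 + 0.5 * x1 - 1.5 * x2 for x1, x2 in rows]

coeffs = fit_multiple_ols(rows, ys)
print([round(c, 6) for c in coeffs])
```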

Example of How Regression Analysis Is Used in Finance

Regression is often used to determine how specific factors—such as the price of a commodity, interest rates, particular industries, or sectors—influence the price movement of an asset. The aforementioned CAPM is based on regression, and it's utilized to project the expected returns for stocks and to generate costs of capital. A stock’s returns are regressed against the returns of a broader index, such as the S&P 500, to generate a beta for the particular stock.

Beta is the stock’s risk in relation to the market or index and is reflected as the slope in the CAPM. The return for the stock in question would be the dependent variable Y, while the independent variable X would be the market risk premium.
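To make the idea of beta as a regression slope concrete, here is a small Python sketch. The return series are invented, and the stock's excess returns are constructed with a known slope of 1.2 so the result is easy to check. With real data the points would scatter around the line.

```python
def regression_slope(xs, ys):
    """Slope of the least-squares line: covariance(x, y) / variance(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Hypothetical per-period market returns in excess of the risk-free rate (X).
market_excess = [0.010, -0.020, 0.015, 0.030, -0.005, 0.020]
# Hypothetical stock excess returns (Y), built with a true slope of 1.2.
stock_excess = [0.001 + 1.2 * m for m in market_excess]

beta = regression_slope(market_excess, stock_excess)
print(f"estimated beta = {beta:.2f}")
```

A beta above 1 in this sketch means the stock's excess return tends to move more than one-for-one with the market's.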

Additional variables such as the market capitalization of a stock and valuation ratios can be added to the CAPM to get better estimates for returns. These size and value factors are known as the Fama-French factors, named after the professors who developed the multiple linear regression model to better explain asset returns. Recent returns (momentum) are another common addition.

Why Is It Called Regression?

Although there is some debate about the origins of the name, the statistical technique described above was most likely termed “regression” by Sir Francis Galton in the 19th century to describe the tendency of biological data (such as the heights of people in a population) to regress to some mean level. In other words, while there are shorter and taller people, only outliers are very tall or short, and most people cluster somewhere around (or “regress” to) the average.

What Is the Purpose of Regression?

In statistical analysis, regression is used to identify the associations between variables occurring in some data. It can show the magnitude of such an association and determine its statistical significance. Regression is a powerful tool for statistical inference and has been used to try to predict future outcomes based on past observations.

How Do You Interpret a Regression Model?

A regression model output may be in the form of Y = 1.0 + 3.2(X1) - 2.0(X2) + 0.21.

Here we have a multiple linear regression that relates some variable Y to two explanatory variables, X1 and X2. We would interpret the model as follows: Y changes by 3.2 units for every one-unit change in X1 (if X1 goes up by 2, Y goes up by 6.4, etc.), holding all else constant. That means that, controlling for X2, X1 has this observed relationship. Likewise, holding X1 constant, every one-unit increase in X2 is associated with a decrease of 2.0 units in Y. We can also note the y-intercept of 1.0, meaning that the predicted value of Y is 1.0 when X1 and X2 are both zero. The final term, 0.21, is the error term (residual).
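The interpretation can be checked numerically. The sketch below hard-codes the coefficients from the example equation and ignores the error term, so it computes predicted values only. The function name is just for illustration.

```python
def predict(x1, x2):
    """Predicted Y from the example model, ignoring the error term."""
    return 1.0 + 3.2 * x1 - 2.0 * x2


baseline = predict(1.0, 1.0)

# Raise X1 by 2 while holding X2 constant.
change_from_x1 = predict(3.0, 1.0) - baseline

# Raise X2 by 1 while holding X1 constant.
change_from_x2 = predict(1.0, 2.0) - baseline

# Both explanatory variables at zero gives the intercept.
at_origin = predict(0.0, 0.0)

print(round(change_from_x1, 2), round(change_from_x2, 2), at_origin)
```

Each coefficient is read as the change in predicted Y per one-unit change in its own variable, with the other variable fixed.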

What Are the Assumptions That Must Hold for Regression Models?

To properly interpret the output of a regression model, the following main assumptions about the underlying data process of what you are analyzing must hold:

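The relationship between the dependent variable and the explanatory variables is linear;
There must be homoskedasticity, meaning the variance of the error term must remain constant across all values of the explanatory variables;
The explanatory variables are not strongly correlated with one another (no multicollinearity);
The error terms (residuals) are normally distributed.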
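One informal way to probe the constant-variance assumption is to fit the line, compute the residuals, and compare their spread at low and high values of the explanatory variable. The Python below is only a rough sketch on invented data whose noise deliberately grows with x. It is an eyeball check, not a formal statistical test.

```python
def residuals_of_fit(xs, ys):
    """Fit a least-squares line and return the residual for each observation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical data: the noise gets larger as x grows (heteroskedastic by design).
xs = [float(x) for x in range(1, 11)]
noise = [0.1, -0.1, 0.2, -0.2, 0.3, -0.9, 1.2, -1.5, 1.8, -2.1]
ys = [2.0 + 0.5 * x + e for x, e in zip(xs, noise)]

res = residuals_of_fit(xs, ys)
spread_low = variance(res[:5])   # residuals for the five smallest x values
spread_high = variance(res[5:])  # residuals for the five largest x values
ratio = spread_high / spread_low
print(f"variance ratio (high x / low x) = {ratio:.1f}")
```

A ratio far from 1, as in this constructed example, suggests the error variance is not constant and that the model's standard errors may be unreliable.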

Regression is a statistical method that tries to determine the strength and character of the relationship between one dependent variable and a series of other variables. It is used in finance, investing, and other disciplines.

Regression analysis uncovers the associations between variables observed in data, but cannot easily indicate causation.

The complete guide to regression analysis.

What is regression analysis and why is it useful? While most of us have heard the term, understanding regression analysis in detail may be something you need to brush up on. Here’s what you need to know about this popular method of analysis.

When you rely on data to drive and guide business decisions, as well as predict market trends, just gathering and analyzing what you find isn’t enough — you need to ensure it’s relevant and valuable.

The challenge, however, is that so many variables can influence business data: market conditions, economic disruption, even the weather! As such, it’s essential you know which variables are affecting your data and forecasts, and what data you can discard.

And one of the most effective ways to determine data value and monitor trends (and the relationships between them) is to use regression analysis, a set of statistical methods used for the estimation of relationships between independent and dependent variables.

In this guide, we’ll cover the fundamentals of regression analysis, from what it is and how it works to its benefits and practical applications.

What is regression analysis?

Regression analysis is a statistical method. It’s used for analyzing different factors that might influence an objective – such as the success of a product launch, business growth, a new marketing campaign – and determining which factors are important and which ones can be ignored.

Regression analysis can also help leaders understand how different variables impact each other and what the outcomes are. For example, when forecasting financial performance, regression analysis can help leaders determine how changes in the business can influence revenue or expenses in the future.

Running an analysis of this kind, you might find that there’s a high correlation between the number of marketers employed by the company, the leads generated, and the opportunities closed.

This seems to suggest that a high number of marketers and a high number of leads generated influences sales success. But do you need both factors to close those sales? By analyzing the effects of these variables on your outcome, you might learn that when leads increase but the number of marketers employed stays constant, there is no impact on the number of opportunities closed, but if the number of marketers increases, leads and closed opportunities both rise.

Regression analysis can help you tease out these complex relationships so you can determine which areas you need to focus on in order to get your desired results, and avoid wasting time with those that have little or no impact. In this example, that might mean hiring more marketers rather than trying to increase leads generated.

How does regression analysis work?

Regression analysis starts with variables that are categorized into two types: dependent and independent variables. The variables you select depend on the outcomes you’re analyzing.

Understanding variables:

1. Dependent variable

This is the main variable that you want to analyze and predict. Examples include operational (O) data, such as your quarterly or annual sales, and experience (X) data, such as your net promoter score (NPS) or customer satisfaction score (CSAT).

These variables are also called response variables, outcome variables, or left-hand-side variables (because they appear on the left-hand side of a regression equation).

There are three easy ways to identify them:

Is the variable measured as an outcome of the study?
Does the variable depend on another in the study?
Do you measure the variable only after other variables are altered?

2. Independent variable

Independent variables are the factors that could affect your dependent variables. For example, a price rise in the second quarter could make an impact on your sales figures.

You can identify independent variables with the following list of questions:

Is the variable manipulated, controlled, or used as a subject grouping method by the researcher?
Does this variable come before the other variable in time?
Are you trying to understand whether or how this variable affects another?

Independent variables are often referred to differently in regression depending on the purpose of the analysis. You might hear them called:

Explanatory variables

Explanatory variables are those which explain an event or an outcome in your study. For example, explaining why your sales dropped or increased.

Predictor variables

Predictor variables are used to predict the value of the dependent variable. For example, predicting how much sales will increase when new product features are rolled out.

Experimental variables

These are variables that can be manipulated or changed directly by researchers to assess the impact. For example, assessing how different product pricing ($10 vs $15 vs $20) will impact the likelihood to purchase.

Subject variables (also called fixed effects)

Subject variables can’t be changed directly, but vary across the sample. For example, age, gender, or income of consumers.

Unlike experimental variables, you can’t randomly assign or change subject variables, but you can design your regression analysis to determine the different outcomes of groups of participants with the same characteristics. For example, ‘how do price rises impact sales based on income?’

Carrying out regression analysis

So regression is about the relationships between dependent and independent variables. But how exactly do you do it?

Assuming you have your data collection done already, the first and foremost thing you need to do is plot your results on a graph. Doing this makes interpreting regression analysis results much easier as you can clearly see the correlations between dependent and independent variables.

Let’s say you want to carry out a regression analysis to understand the relationship between the number of ads placed and revenue generated.

On the Y-axis, you place the revenue generated. On the X-axis, the number of digital ads. By plotting the information on the graph, and drawing a line (called the regression line) through the middle of the data, you can see the relationship between the number of digital ads placed and revenue generated.

This regression line is the line that provides the best description of the relationship between your independent variables and your dependent variable. In this example, we’ve used a simple linear regression model.

Statistical analysis software can draw this line for you and precisely calculate the regression line. The software then provides a formula for the slope of the line, adding further context to the relationship between your dependent and independent variables.

Simple linear regression analysis

A simple linear model uses a single straight line to determine the relationship between a single independent variable and a dependent variable.

This regression model is mostly used when you want to determine the relationship between two variables (like price increases and sales) or the value of the dependent variable at certain points of the independent variable (for example the sales levels at a certain price rise).

While linear regression is useful, it does require you to make some assumptions.

For example, it requires you to assume that:

The data was collected using a statistically valid sample collection method that is representative of the target population.
The observed relationship between the variables can’t be explained by a ‘hidden’ third variable; in other words, there are no spurious correlations.
The relationship between the independent variable and dependent variable is linear, meaning that the best fit along the data points is a straight line and not a curved one.

Multiple regression analysis

As the name suggests, multiple regression analysis is a type of regression that uses multiple variables. It uses multiple independent variables to predict the outcome of a single dependent variable. Of the various kinds of multiple regression, multiple linear regression is one of the best-known.

Multiple linear regression is a close relative of the simple linear regression model in that it looks at the impact of several independent variables on one dependent variable. However, like simple linear regression, multiple regression analysis also requires you to make some basic assumptions.

For example, you will be assuming that:

there is a linear relationship between the dependent and independent variables (it creates a straight line and not a curve through the data points)
the independent variables aren’t highly correlated in their own right

An example of multiple linear regression would be an analysis of how marketing spend, revenue growth, and general market sentiment affect the share price of a company.

With multiple linear regression models you can estimate how these variables will influence the share price, and to what extent.
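Before fitting a model like this, the assumption that the independent variables aren't highly correlated with each other can be screened with pairwise correlation coefficients. The Python below is a minimal sketch: the figures are invented, and the variable names simply mirror the share-price example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for three candidate independent variables.
marketing_spend = [10, 12, 15, 18, 20, 25]
revenue_growth = [1.0, 1.3, 1.4, 1.9, 2.1, 2.4]
market_sentiment = [0.2, -0.1, 0.4, 0.0, -0.3, 0.1]

r_spend_growth = pearson(marketing_spend, revenue_growth)
r_spend_sentiment = pearson(marketing_spend, market_sentiment)

print(f"spend vs growth:    {r_spend_growth:.2f}")
print(f"spend vs sentiment: {r_spend_sentiment:.2f}")
```

In this made-up data, marketing spend and revenue growth move almost in lockstep, so a model containing both would struggle to separate their individual effects. Sentiment is only weakly related to spend and poses less of a problem.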

Multivariate linear regression

Multivariate linear regression involves more than one dependent variable as well as multiple independent variables, making it more complicated than linear or multiple linear regressions. However, this also makes it much more powerful and capable of making predictions about complex real-world situations.

For example, if an organization wants to establish or estimate how the COVID-19 pandemic has affected employees in its different markets, it can use multivariate linear regression, with several employee outcomes as dependent variables (such as mental health self-rating scores and employee sick days) and the different facets of the pandemic as independent variables (such as geographical region, proportion of employees working at home and lockdown durations).

Through multivariate linear regression, you can look at relationships between variables in a holistic way and quantify the relationships between them. As you can clearly visualize those relationships, you can make adjustments to dependent and independent variables to see which conditions influence them. Overall, multivariate linear regression provides a more realistic picture than looking at a single variable.

However, because multivariate techniques are complex, they involve high-level mathematics that require a statistical program to analyze the data.

Logistic regression

Logistic regression models the probability of a binary outcome based on independent variables.

So, what is a binary outcome? It’s when there are only two possible scenarios: either the event happens (1) or it doesn’t (0). Examples include yes/no outcomes, pass/fail outcomes, and so on. In other words, the outcome can be described as being in one of two categories.

Logistic regression makes predictions based on independent variables that are assumed or known to have an influence on the outcome. For example, the probability of a sports team winning their game might be affected by independent variables like weather, day of the week, whether they are playing at home or away and how they fared in previous matches.
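A bare-bones version of logistic regression with a single independent variable might look like the Python below. It is a sketch under stated assumptions: the data is invented (wins in a team's previous seven matches against whether it won the next game), and the coefficients are fitted with plain gradient descent, whereas statistical software typically uses faster methods.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(w*x + b) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data: wins in the previous seven matches, and the game result.
previous_wins = [0, 1, 2, 3, 4, 5, 6, 7]
won_game = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 = win, 0 = no win

w, b = fit_logistic(previous_wins, won_game)
p_poor_form = sigmoid(w * 0 + b)
p_good_form = sigmoid(w * 7 + b)
print(f"P(win | 0 previous wins) = {p_poor_form:.2f}")
print(f"P(win | 7 previous wins) = {p_good_form:.2f}")
```

The output of the model is a probability rather than a value on the original scale, which is what distinguishes it from linear regression.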

What are some common mistakes with regression analysis?

Across the globe, businesses are increasingly relying on quality data and insights to drive decision-making — but to make accurate decisions, it’s important that the data collected and statistical methods used to analyze it are reliable and accurate.

Using the wrong data or the wrong assumptions can result in poor decision-making, lead to missed opportunities to improve efficiency and savings, and — ultimately — damage your business long term.

Assumptions

When running regression analysis, be it a simple linear or multiple regression, it’s really important to check that the assumptions your chosen method requires have been met. If your data points don’t conform to a straight line of best fit, for example, you need to apply additional statistical modifications to accommodate the non-linear data. For example, if you are looking at income data, which is typically right-skewed and closer to linear on a logarithmic scale, you should take the natural log of income as your variable and then transform the results back to the original scale after the model is created.
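The log-transform workflow can be sketched as follows in Python. The example assumes income is the dependent variable and years of experience is the independent variable. Both the variable names and the data are hypothetical, and the incomes are generated from an exact exponential curve so the result is easy to verify.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: income grows about 8% per year of experience.
years = [0, 2, 4, 6, 8, 10]
incomes = [30000 * math.exp(0.08 * t) for t in years]

# Step 1: model the natural log of income, which is linear in years.
log_incomes = [math.log(v) for v in incomes]
a, b = fit_line(years, log_incomes)

# Step 2: predict on the log scale, then transform back with exp().
predicted_income = math.exp(a + b * 12)
print(f"slope on log scale = {b:.3f}")
print(f"predicted income at 12 years = {predicted_income:,.0f}")
```

On the log scale the slope reads as an approximate proportional change: here, each extra year is associated with roughly 8% more income. With noisy real data, the back-transformed value is better read as a typical (median-like) prediction than as an average.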

Correlation vs. causation

It’s a well-worn phrase that bears repeating: correlation does not equal causation. While variables that are linked by causality will usually show correlation, the reverse is not always true. Moreover, there is no statistic that can determine causality (although the design of your study overall can).

If you observe a correlation in your results, such as in the first example we gave in this article where there was a correlation between leads and sales, you can’t assume that one thing has influenced the other. Instead, you should use it as a starting point for investigating the relationship between the variables in more depth.

Choosing the wrong variables to analyze

Before you use any kind of statistical method, it’s important to understand the subject you’re researching in detail. Doing so means you’re making informed choices of variables and you’re not overlooking something important that might have a significant bearing on your dependent variable.

Model building

The variables you include in your analysis are just as important as the variables you choose to exclude. That’s because the strength of each independent variable is influenced by the other variables in the model. Other techniques, such as Key Drivers Analysis, are able to account for these variable interdependencies.

Benefits of using regression analysis

There are several benefits to using regression analysis to judge how changing variables will affect your business and to ensure you focus on the right things when forecasting.

Here are just a few of those benefits:

Make accurate predictions

Regression analysis is commonly used when forecasting and forward planning for a business. For example, when predicting sales for the year ahead, a number of different variables will come into play to determine the eventual result.

Regression analysis can help you determine which of these variables are likely to have the biggest impact based on previous events and help you make more accurate forecasts and predictions.

Identify inefficiencies

Using a regression equation a business can identify areas for improvement when it comes to efficiency, either in terms of people, processes, or equipment.

For example, regression analysis can help a car manufacturer determine order numbers based on external factors like the economy or environment.

They can then use the same regression equation to determine how many members of staff and how much equipment they need to meet orders.

Drive better decisions

Improving processes or business outcomes is always on the minds of owners and business leaders, but without actionable data, they’re simply relying on instinct, and this doesn’t always work out.

This is particularly true when it comes to issues of price. For example, to what extent will raising the price (and to what level) affect next quarter’s sales?

There’s no way to know this without data analysis. Regression analysis can help provide insights into the correlation between price rises and sales based on historical data.

How do businesses use regression? A real-life example

Marketing and advertising spending are common topics for regression analysis. Companies use regression when trying to assess the value of ad spend and marketing spend on revenue.

A typical example is using a regression equation to assess the correlation between ad costs and conversions of new customers. In this instance,

our dependent variable (the factor we’re trying to assess the outcomes of) will be our conversions
the independent variable (the factor we’ll change to assess how it changes the outcome) will be the daily ad spend
the regression equation will try to determine whether an increase in ad spend has a direct correlation with the number of conversions we have

The analysis is relatively straightforward — using historical data from an ad account, we can use daily data to judge ad spend vs conversions and how changes to the spend alter the conversions.

By assessing this data over time, we can make predictions not only on whether increasing ad spend will lead to increased conversions but also what level of spending will lead to what increase in conversions. This can help to optimize campaign spend and ensure marketing delivers good ROI.

This is an example of a simple linear model. If we wanted to build a more complex regression model, we could also factor in other independent variables such as seasonality, GDP, and the current reach of our chosen advertising networks.

By increasing the number of independent variables, we can get a better understanding of whether ad spend is resulting in an increase in conversions, whether it’s exerting an influence in combination with another set of variables, or if we’re dealing with a correlation with no causal impact – which might be useful for predictions anyway, but isn’t a lever we can use to increase sales.

Using the estimated effect of each independent variable, we can more accurately predict how spend will change the number of conversions from advertising.

Regression analysis tools

Regression analysis is an important tool when it comes to better decision-making and improved business outcomes. To get the best out of it, you need to invest in the right kind of statistical analysis software.

The best option is likely to be one that sits at the intersection of powerful statistical analysis and intuitive ease of use, as this will empower everyone from beginners to expert analysts to uncover meaning from data, identify hidden trends and produce predictive models without statistical training being required.

To help prevent costly errors, choose a tool that automatically runs the right statistical tests and visualizations and then translates the results into simple language that anyone can put into action.

With software that’s both powerful and user-friendly, you can isolate key experience drivers, understand what influences the business, apply the most appropriate regression methods, identify data issues, and much more.

With Qualtrics’ Stats iQ™, you don’t have to worry about the regression equation because our statistical software will run the appropriate equation for you automatically based on the variable type you want to monitor. You can also use several equations, including linear regression and logistic regression, to gain deeper insights into business outcomes and make more accurate, data-driven decisions.

What is Regression Analysis in Data Science?

Understanding regression analysis in data science

Regression analysis stands as a central statistical technique in data science.

Regression analysis delves into the intricate relationship between dependent and independent variables, facilitating predictions of future scenarios and discerning variable impacts.

Core concepts of regression analysis

At its essence, regression analysis entails the positioning of a line or curve within a set of data points.

This line signifies the relationship between variables, assuming a linear bond. The main constituents include:

Dependent variable: The prediction or explanation subject.
Independent variables: Variables that influence the dependent variable.
Coefficients: Representing the intercepts and slopes of the equation.
Error term: Known as the residual, it’s the gap between actual and predicted data.

This technique illuminates the interplay between variables, vital in sectors like economics where discerning these relationships guides decision-making.

However, the foundational assumption of linearity is crucial, and overlooking elements like outliers can skew outcomes.

Regression’s significance in data science

Regression analysis is a powerful statistical technique that quantifies relationships between variables, identifies significant predictors, and bases predictions on discerned patterns.

By employing regression, data scientists can pinpoint determinants of specific outcomes, imperative in areas like marketing where capturing variables’ impact on consumer behavior is essential.

Additionally, the technique aids in quantifying uncertainty, helping distinguish genuine associations from random occurrences.

Types of regression analysis

Linear regression: This basic form hypothesizes a linear relationship between variables. It’s routinely employed across various sectors to gauge how independent variables influence the dependent counterpart.
Logistic regression: Suited for binary outcomes, logistic regression predicts probabilities based on independent variables. It’s paramount in areas where outcomes have two categories.
Polynomial regression: Venturing beyond linearity, polynomial regression embraces non-linear associations by integrating polynomial terms, offering more nuanced curve fits.
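Polynomial regression can be fitted with the same least-squares machinery as linear regression, because the model is still linear in its coefficients: the powers of x simply act as extra independent variables. The Python below is a minimal sketch with invented helper names and data generated from an exact quadratic, so the fit recovers the chosen coefficients.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] for sum(c_p * x**p)."""
    X = [[x ** p for p in range(degree + 1)] for x in xs]  # 1, x, x^2, ...
    k = degree + 1
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


# Hypothetical curved data: y = 1 + 2x - 0.5x^2, with no noise.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x - 0.5 * x ** 2 for x in xs]

poly_coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in poly_coeffs])
```

Raising the degree lets the curve bend more, but a high-degree polynomial can also chase noise in real data, so the degree is usually kept low.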

Undertaking Regression Analysis

Data collection and refinement: Collect relevant data and prepare it for analysis. This involves cleaning the data, handling missing values, and transforming variables if necessary.
Model choice and application: The appropriate regression model must be selected once the data is ready. This involves considering the type of relationship, the distribution of variables, and the assumptions of the chosen model. The model is then fitted to the data using statistical algorithms.
Decoding results: Understanding the results is pivotal after fitting the model, from variable significance to goodness of fit and practical implications.
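
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are hypothetical, missing values are handled by simply dropping incomplete rows (one of several possible strategies), and a simple linear model is assumed to be the appropriate choice.

```python
# Hypothetical raw records as (x, y) pairs; None marks a missing value.
raw = [(1.0, 3.0), (2.0, None), (3.0, 7.0), (4.0, 9.0), (None, 4.0), (5.0, 11.0)]

# Step 1: data refinement. Drop rows that contain a missing value.
clean = [(x, y) for x, y in raw if x is not None and y is not None]
xs = [x for x, _ in clean]
ys = [y for _, y in clean]

# Step 2: model application. Fit y = intercept + slope * x by least squares.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Step 3: decoding results. R-squared is the share of the variance in y
# that the fitted line explains (a common goodness-of-fit measure).
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1.0 - ss_res / ss_tot
```

The hypothetical data lie exactly on a line, so the fit is perfect; real data would leave non-zero residuals and an R-squared below 1.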

Relevance in predictive modeling


Regression analysis is the backbone of predictive modeling.

By pinpointing key influencing elements, it augments the precision and reliability of predictions.

Furthermore, regression assists in estimating prospective trends in forecasting, enabling informed decision-making.

Hurdles and restrictions

Though potent, regression analysis is not without its challenges:

Assumptions and pitfalls: Regression relies heavily on several assumptions. Any deviation can lead to skewed interpretations. Being vigilant about these assumptions is paramount.
Addressing hurdles: To navigate these challenges, various strategies are employed. Transforming variables or using more sophisticated regression types can mitigate issues.

In data science, regression analysis is a powerful tool that paves the way for insightful predictions and informed decision-making.

For data scientists and researchers, understanding and correctly applying regression analysis remains indispensable in their analytical toolkit.


A Refresher on Regression Analysis

Understanding one of the most important types of data analysis.

You probably know by now that whenever possible you should be making data-driven decisions at work. But do you know how to parse through all the data available to you? The good news is that you probably don’t need to do the number crunching yourself (hallelujah!) but you do need to correctly understand and interpret the analysis created by your colleagues. One of the most important types of data analysis is called regression analysis.

Amy Gallo is a contributing editor at Harvard Business Review, cohost of the Women at Work podcast, and the author of two books: Getting Along: How to Work with Anyone (Even Difficult People) and the HBR Guide to Dealing with Conflict. She writes and speaks about workplace dynamics.

Published: 31 January 2022

The clinician’s guide to interpreting a regression analysis

Sofia Bzovsky, Mark R. Phillips, Robyn H. Guymer, Charles C. Wykoff, Lehana Thabane, Mohit Bhandari & Varun Chaudhary

on behalf of the R.E.T.I.N.A. study group

Eye, volume 36, pages 1715–1717 (2022)


Introduction

When researchers are conducting clinical studies to investigate factors associated with, or treatments for disease and conditions to improve patient care and clinical practice, statistical evaluation of the data is often necessary. Regression analysis is an important statistical method that is commonly used to determine the relationship between several factors and disease outcomes or to identify relevant prognostic factors for diseases [ 1 ].

This editorial will acquaint readers with the basic principles of and an approach to interpreting results from two types of regression analyses widely used in ophthalmology: linear, and logistic regression.

Linear regression analysis

Linear regression is used to quantify a linear relationship or association between a continuous response/outcome variable or dependent variable with at least one independent or explanatory variable by fitting a linear equation to observed data [ 1 ]. The variable that the equation solves for, which is the outcome or response of interest, is called the dependent variable [ 1 ]. The variable that is used to explain the value of the dependent variable is called the predictor, explanatory, or independent variable [ 1 ].

In a linear regression model, the dependent variable must be continuous (e.g. intraocular pressure or visual acuity), whereas, the independent variable may be either continuous (e.g. age), binary (e.g. sex), categorical (e.g. age-related macular degeneration stage or diabetic retinopathy severity scale score), or a combination of these [ 1 ].

When investigating the effect or association of a single independent variable on a continuous dependent variable, this type of analysis is called a simple linear regression [ 2 ]. In many circumstances though, a single independent variable may not be enough to adequately explain the dependent variable. Often it is necessary to control for confounders and in these situations, one can perform a multivariable linear regression to study the effect or association with multiple independent variables on the dependent variable [ 1 , 2 ]. When incorporating numerous independent variables, the regression model estimates the effect or contribution of each independent variable while holding the values of all other independent variables constant [ 3 ].

When interpreting the results of a linear regression, there are a few key outputs for each independent variable included in the model:

Estimated regression coefficient: The estimated regression coefficient indicates the direction and strength of the relationship or association between the independent and dependent variables [ 4 ]. Specifically, for a continuous independent variable, the regression coefficient describes the change in the dependent variable for each one-unit change in the independent variable [ 4 ]. For instance, if examining the relationship between a continuous predictor variable and intra-ocular pressure (dependent variable), a regression coefficient of 2 means that for every one-unit increase in the predictor, there is a two-unit increase in intra-ocular pressure. If the independent variable is binary or categorical, then the one-unit change represents switching from the reference category to another category [ 4 ]. For instance, if examining the relationship between a binary predictor variable, such as sex, where ‘female’ is set as the reference category, and intra-ocular pressure (dependent variable), a regression coefficient of 2 means that, on average, males have an intra-ocular pressure that is 2 mm Hg higher than females.

Confidence Interval (CI): The CI, typically set at 95%, is a measure of the precision of the coefficient estimate of the independent variable [ 4 ]. A wide CI indicates a low level of precision, whereas a narrow CI indicates a higher precision [ 5 ].

P value: The p value for the regression coefficient indicates whether the relationship between the independent and dependent variables is statistically significant at the chosen significance level [ 6 ].
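
To illustrate where these three outputs come from, the sketch below computes a slope, its 95% CI and a two-sided p value for a simple linear regression. It is not the method used in any study cited here. The data and the function name `slope_inference` are hypothetical, and the code assumes a normal approximation for the test statistic; statistical packages normally use the t distribution with n − 2 degrees of freedom, which gives wider intervals in small samples.

```python
import math
from statistics import NormalDist


def slope_inference(x, y, confidence=0.95):
    """Return (slope, (ci_low, ci_high), p_value) for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares and the standard error of the slope.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)

    # ASSUMPTION: normal approximation instead of the t distribution.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (slope - z_crit * se, slope + z_crit * se)

    # Two-sided p value for the null hypothesis that the slope is zero.
    z = slope / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, ci, p_value


# Hypothetical predictor and outcome values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, ci, p_value = slope_inference(x, y)
```

A CI that excludes zero and a p value below 0.05 point the same way: the data are hard to reconcile with a slope of zero.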

Logistic regression analysis

As with linear regression, logistic regression is used to estimate the association between one or more independent variables with a dependent variable [ 7 ]. However, the distinguishing feature in logistic regression is that the dependent variable (outcome) must be binary (or dichotomous), meaning that the variable can only take two different values or levels, such as ‘1 versus 0’ or ‘yes versus no’ [ 2 , 7 ]. The effect size of predictor variables on the dependent variable is best explained using an odds ratio (OR) [ 2 ]. ORs are used to compare the relative odds of the occurrence of the outcome of interest, given exposure to the variable of interest [ 5 ]. An OR equal to 1 means that the odds of the event in one group are the same as the odds of the event in another group; there is no difference [ 8 ]. An OR > 1 implies that one group has a higher odds of having the event compared with the reference group, whereas an OR < 1 means that one group has a lower odds of having an event compared with the reference group [ 8 ]. When interpreting the results of a logistic regression, the key outputs include the OR, CI, and p-value for each independent variable included in the model.
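
As a small illustration of how an odds ratio is read, the sketch below computes an OR from a two-by-two table of counts and shows its link to a logistic regression coefficient. The counts and the function name `odds_ratio` are hypothetical. For a model with a single binary predictor, the exponential of the fitted coefficient equals this sample OR.

```python
import math


def odds_ratio(group_events, group_non_events, ref_events, ref_non_events):
    """Odds of the event in a group divided by the odds in the reference group."""
    odds_group = group_events / group_non_events
    odds_ref = ref_events / ref_non_events
    return odds_group / odds_ref


# Hypothetical counts: 20 of 100 patients in the group of interest had the
# event, versus 40 of 100 patients in the reference group.
or_value = odds_ratio(20, 80, 40, 60)

# A logistic regression coefficient is a log odds ratio,
# so exponentiating it recovers the OR.
coefficient = math.log(or_value)
recovered_or = math.exp(coefficient)
```

Here the OR is below 1, so the group of interest has lower odds of the event than the reference group, and the corresponding coefficient is negative.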

Clinical example

Sen et al. investigated the association between several factors (independent variables) and visual acuity outcomes (dependent variable) in patients receiving anti-vascular endothelial growth factor therapy for macular oedema secondary to central retinal vein occlusion by means of both linear and logistic regression [ 9 ]. Multivariable linear regression demonstrated that age (Estimate −0.33, 95% CI −0.48 to −0.19, p < 0.001) was significantly associated with best-corrected visual acuity (BCVA) at 100 weeks at the alpha = 0.05 significance level [ 9 ]. The regression coefficient of −0.33 means that the BCVA at 100 weeks decreases by 0.33 with each additional year of age, holding the other variables in the model constant.
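
As a worked illustration of how to read this coefficient, consider two patients who differ in age by 10 years (a hypothetical gap chosen for illustration). Assuming the fitted relationship is linear across the age range and the other variables in the model are held constant, the expected difference in BCVA at 100 weeks would be:

```latex
\Delta \mathrm{BCVA} = \hat{\beta}_{\text{age}} \times \Delta \text{age} = -0.33 \times 10 = -3.3
```

The result is expressed in whatever units the study used for BCVA.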

Multivariable logistic regression also demonstrated that age and ellipsoid zone status were significantly associated with achieving a BCVA letter score >70 letters at 100 weeks at the alpha = 0.05 significance level. Patients ≥75 years of age were at decreased odds of achieving a BCVA letter score >70 letters at 100 weeks compared to those <50 years of age, since the OR is less than 1 (OR 0.96, 95% CI 0.94 to 0.98, p = 0.001) [ 9 ]. Similarly, patients between the ages of 50–74 years were also at decreased odds of achieving a BCVA letter score >70 letters at 100 weeks compared to those <50 years of age, since the OR is less than 1 (OR 0.15, 95% CI 0.04 to 0.48, p = 0.001) [ 9 ]. As well, those with a non-intact ellipsoid zone were at decreased odds of achieving a BCVA letter score >70 letters at 100 weeks compared to those with an intact ellipsoid zone (OR 0.20, 95% CI 0.07 to 0.56; p = 0.002). On the other hand, patients with an ungradable/questionable ellipsoid zone were at increased odds of achieving a BCVA letter score >70 letters at 100 weeks compared to those with an intact ellipsoid zone, since the OR is greater than 1 (OR 2.26, 95% CI 1.14 to 4.48; p = 0.02) [ 9 ].

The narrower the CI, the more precise the estimate is; and the smaller the p value (relative to alpha = 0.05), the greater the evidence against the null hypothesis of no effect or association.

Simply put, linear and logistic regression are useful tools for appreciating the relationship between predictor/explanatory and outcome variables for continuous and dichotomous outcomes, respectively, that can be applied in clinical practice, such as to gain an understanding of risk factors associated with a disease of interest.

References

1. Schneider A, Hommel G, Blettner M. Linear regression analysis. Dtsch Ärztebl Int. 2010;107:776–82.
2. Bender R. Introduction to the use of regression models in epidemiology. In: Verma M, editor. Cancer epidemiology. Methods in molecular biology. Humana Press; 2009:179–95.
3. Schober P, Vetter TR. Confounding in observational research. Anesth Analg. 2020;130:635.
4. Schober P, Vetter TR. Linear regression in medical research. Anesth Analg. 2021;132:108–9.
5. Szumilas M. Explaining odds ratios. J Can Acad Child Adolesc Psychiatry. 2010;19:227–9.
6. Thiese MS, Ronna B, Ott U. P value interpretations and considerations. J Thorac Dis. 2016;8:E928–31.
7. Schober P, Vetter TR. Logistic regression in medical research. Anesth Analg. 2021;132:365–6.
8. Zabor EC, Reddy CA, Tendulkar RD, Patil S. Logistic regression in clinical studies. Int J Radiat Oncol Biol Phys. 2022;112:271–7.
9. Sen P, Gurudas S, Ramu J, Patrao N, Chandra S, Rasheed R, et al. Predictors of visual acuity outcomes after anti-vascular endothelial growth factor treatment for macular edema secondary to central retinal vein occlusion. Ophthalmol Retin. 2021;5:1115–24.

R.E.T.I.N.A. study group

Varun Chaudhary 1,2 , Mohit Bhandari 1,2 , Charles C. Wykoff 5,6 , Sobha Sivaprasad 8 , Lehana Thabane 2,7 , Peter Kaiser 9 , David Sarraf 10 , Sophie J. Bakri 11 , Sunir J. Garg 12 , Rishi P. Singh 13,14 , Frank G. Holz 15 , Tien Y. Wong 16,17 , and Robyn H. Guymer 3,4

Author information

Authors and affiliations.

Department of Surgery, McMaster University, Hamilton, ON, Canada

Sofia Bzovsky, Mohit Bhandari & Varun Chaudhary

Department of Health Research Methods, Evidence & Impact, McMaster University, Hamilton, ON, Canada

Mark R. Phillips, Lehana Thabane, Mohit Bhandari & Varun Chaudhary

Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia

Robyn H. Guymer

Department of Surgery, (Ophthalmology), The University of Melbourne, Melbourne, VIC, Australia

Retina Consultants of Texas (Retina Consultants of America), Houston, TX, USA

Charles C. Wykoff

Blanton Eye Institute, Houston Methodist Hospital, Houston, TX, USA

Biostatistics Unit, St. Joseph’s Healthcare Hamilton, Hamilton, ON, Canada

Lehana Thabane

NIHR Moorfields Biomedical Research Centre, Moorfields Eye Hospital, London, UK

Sobha Sivaprasad

Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Peter Kaiser

Retinal Disorders and Ophthalmic Genetics, Stein Eye Institute, University of California, Los Angeles, CA, USA

David Sarraf

Department of Ophthalmology, Mayo Clinic, Rochester, MN, USA

Sophie J. Bakri

The Retina Service at Wills Eye Hospital, Philadelphia, PA, USA

Sunir J. Garg

Center for Ophthalmic Bioinformatics, Cole Eye Institute, Cleveland Clinic, Cleveland, OH, USA

Rishi P. Singh

Cleveland Clinic Lerner College of Medicine, Cleveland, OH, USA

Department of Ophthalmology, University of Bonn, Bonn, Germany

Frank G. Holz

Singapore Eye Research Institute, Singapore, Singapore

Tien Y. Wong

Singapore National Eye Centre, Duke-NUS Medical School, Singapore, Singapore


Contributions

SB was responsible for writing, critical review and feedback on manuscript. MRP was responsible for conception of idea, critical review and feedback on manuscript. RHG was responsible for critical review and feedback on manuscript. CCW was responsible for critical review and feedback on manuscript. LT was responsible for critical review and feedback on manuscript. MB was responsible for conception of idea, critical review and feedback on manuscript. VC was responsible for conception of idea, critical review and feedback on manuscript.

Corresponding author

Correspondence to Varun Chaudhary .

Ethics declarations

Competing interests.

SB: Nothing to disclose. MRP: Nothing to disclose. RHG: Advisory boards: Bayer, Novartis, Apellis, Roche, Genentech Inc. (unrelated to this study). CCW: Consultant: Acuela, Adverum Biotechnologies, Inc, Aerpio, Alimera Sciences, Allegro Ophthalmics, LLC, Allergan, Apellis Pharmaceuticals, Bayer AG, Chengdu Kanghong Pharmaceuticals Group Co, Ltd, Clearside Biomedical, DORC (Dutch Ophthalmic Research Center), EyePoint Pharmaceuticals, Genentech/Roche, GyroscopeTx, IVERIC bio, Kodiak Sciences Inc, Novartis AG, ONL Therapeutics, Oxurion NV, PolyPhotonix, Recens Medical, Regeneron Pharmaceuticals, Inc, REGENXBIO Inc, Santen Pharmaceutical Co, Ltd, and Takeda Pharmaceutical Company Limited; Research funds: Adverum Biotechnologies, Inc, Aerie Pharmaceuticals, Inc, Aerpio, Alimera Sciences, Allergan, Apellis Pharmaceuticals, Chengdu Kanghong Pharmaceutical Group Co, Ltd, Clearside Biomedical, Gemini Therapeutics, Genentech/Roche, Graybug Vision, Inc, GyroscopeTx, Ionis Pharmaceuticals, IVERIC bio, Kodiak Sciences Inc, Neurotech LLC, Novartis AG, Opthea, Outlook Therapeutics, Inc, Recens Medical, Regeneron Pharmaceuticals, Inc, REGENXBIO Inc, Samsung Pharm Co, Ltd, Santen Pharmaceutical Co, Ltd, and Xbrane Biopharma AB (unrelated to this study). LT: Nothing to disclose. MB: Research funds: Pendopharm, Bioventus, Acumed (unrelated to this study). VC: Advisory Board Member: Alcon, Roche, Bayer, Novartis; Grants: Bayer, Novartis (unrelated to this study).


About this article

Cite this article.

Bzovsky, S., Phillips, M.R., Guymer, R.H. et al. The clinician’s guide to interpreting a regression analysis. Eye 36 , 1715–1717 (2022). https://doi.org/10.1038/s41433-022-01949-z


Received : 08 January 2022

Revised : 17 January 2022

Accepted : 18 January 2022

Published : 31 January 2022

Issue Date : September 2022

DOI : https://doi.org/10.1038/s41433-022-01949-z


A Beginner’s Guide to Regression Analysis

Published by Owen Ingram on September 1st, 2021. Revised on July 5, 2022.

Are you comfortable making data-driven decisions at work? If not, what is stopping you? Often the answer is simply “too much data getting in the way.” Fortunately, there is a well-established way to parse through large amounts of data.

With this approach, you do not have to do the number crunching by hand.

That approach is “regression,” which lets you estimate how variables are related and use those relationships to make predictions.

What is Regression Analysis?

Here is a scenario to help you understand what regression is and how it helps you make better strategic decisions in research.

Let’s say you are the CEO of a company and are trying to predict the profit margin for the next month. You might have many factors in mind that could affect that number: the number of sales in the month, the number of employees not taking leave, or the number of hours each worker puts in daily. But what if things do not go as planned? The “what if” list can go on forever. All these factors are variables, and regression analysis is the process of mathematically working out which of them actually have an impact and which do not.

So, we can say that regression analysis helps you find the relationship between a dependent variable and a set of independent variables. There are different ways to model this relationship, which in statistics are called “regression models.”

We will look at each in the next section.

Types of Regression Models

If you are not sure which type of regression model you should use for a particular study, this section might help you.

Though there are numerous types of regression models depending on the type of variables, these are the most common ones: linear regression, logistic regression, ridge regression, lasso regression, polynomial regression, and Bayesian linear regression.

Linear Regression

Linear regression is the real workhorse of the industry and is probably the first type that comes to mind. It is usually fitted by the method known as Linear Least Squares or Ordinary Least Squares. The model relates a dependent variable to a predictor variable through a straight line, hence the name linear regression. If the data contain more than one independent variable, the model is called Multi-Linear Regression (more commonly, multiple linear regression).

Logistic Regression

Logistic regression comes into play when the dependent variable is discrete, meaning the target can take only one of two values: for instance, true or false, yes or no, 0 or 1. In this case, a sigmoid curve describes the relationship between the independent variables and the probability of the outcome.

When using this regression model for data analysis, two things should be taken into consideration:

Make sure there is no multicollinearity, that is, no strong correlation among the independent variables in the dataset (the same applies to the linear regression model)
Also, ensure that the dataset is large and that both values of the target variable are reasonably well represented
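
The sigmoid curve mentioned above can be sketched as follows. The intercept and slope are hypothetical values chosen for illustration, not estimates from real data, and the function names are likewise illustrative.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, intercept, slope):
    """Probability that the outcome equals 1 for a given predictor value x."""
    return sigmoid(intercept + slope * x)


# Hypothetical coefficients: the linear part equals zero at x = 40,
# which is where the predicted probability crosses 0.5.
p_low = predict_probability(20.0, intercept=-4.0, slope=0.1)
p_mid = predict_probability(40.0, intercept=-4.0, slope=0.1)
p_high = predict_probability(60.0, intercept=-4.0, slope=0.1)
```

Because the slope is positive, the predicted probability rises with `x`, and it always stays between 0 and 1, which is what makes the curve suitable for a yes/no outcome.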

Ridge Regression

Ridge regression is used when there is a high correlation among the independent variables. With multicollinear data, least-squares estimates remain unbiased, but their variances become large, so an individual estimate can land far from the true value.

Ridge regression therefore adds a penalty on the size of the coefficients, which introduces a small bias in exchange for lower variance. This makes it less vulnerable to overfitting. Are you familiar with the term ‘overfitting’?

Overfitting in statistics is a modeling error that occurs when the function is fitted too closely to a limited set of data points. An overfitted model describes the noise in that sample instead of the underlying relationship, so it performs poorly on new data.
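
To make the ridge penalty concrete, here is a minimal sketch for the simplest possible case: one predictor, centered data and no intercept. It does not demonstrate collinearity, which needs at least two predictors. It only shows how the penalty weight, called `lam` here, pulls the estimate towards zero. The data and the function name `ridge_slope` are hypothetical.

```python
def ridge_slope(x, y, lam):
    """Slope b minimizing sum((y - b*x)**2) + lam * b**2 for centered x and y."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # The penalty adds lam to the denominator, shrinking the slope.
    return sxy / (sxx + lam)


# Hypothetical centered data (both x and y have mean zero).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.0, 2.1, 3.9]

ols_slope = ridge_slope(x, y, lam=0.0)      # no penalty: ordinary least squares
shrunk_slope = ridge_slope(x, y, lam=10.0)  # penalized estimate
```

With `lam=0.0` the function returns the ordinary least-squares slope; increasing `lam` shrinks the slope towards zero but never sets it exactly to zero.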

Lasso Regression

Lasso regression is well suited to performing regularization alongside feature selection. It penalizes the absolute size of the regression coefficients. What happens next? Some coefficients are shrunk all the way to exactly zero, unlike in ridge regression, where coefficients shrink towards zero but generally do not reach it.

This is why lasso is used for feature selection: it keeps only a limited set of features from the dataset and sets the coefficients of all the others to zero, which reduces overfitting. But what if the independent variables are highly collinear?

In that case, the model tends to keep one variable from the correlated group and set the others to zero. We can say that it is somewhat like ridge regression but with variable selection.
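
The way lasso sets coefficients exactly to zero can be sketched with the soft-thresholding rule. This rule gives the lasso solution only in a special case, assumed here: the predictors are standardized and uncorrelated (an orthonormal design), and the objective is half the residual sum of squares plus `lam` times the sum of absolute coefficients. The least-squares coefficients below are hypothetical.

```python
def soft_threshold(b, lam):
    """Shrink b towards zero by lam, and set it to zero if |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


# Hypothetical least-squares coefficients for four features.
ols_coefficients = [3.0, -0.4, 0.2, -1.5]

# Lasso coefficients under the orthonormal-design assumption.
lasso_coefficients = [soft_threshold(b, 0.5) for b in ols_coefficients]

# Features whose coefficients survive the penalty are the "selected" ones.
selected = [i for i, b in enumerate(lasso_coefficients) if b != 0.0]
```

The two small coefficients are set to zero and dropped from the model, while the two large ones are kept but shrunk. For correlated predictors there is no such closed form, and iterative solvers are used instead.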

Polynomial Regression

This is another type of regression that is almost the same as Multi-Linear Regression but with some changes. In the polynomial regression model, the relationship between the dependent and independent variables is modelled as an nth-degree polynomial. While the best-fit line in a Multi-Linear Regression model is straight, in polynomial regression it is a curve fitted to the data points. The shape of this curve depends on the degree n.

This model is also prone to overfitting. It is best to inspect the curve towards the ends of the data range, as higher-degree polynomials can give strange and unexpected results on extrapolation.
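
A polynomial regression is still linear in its coefficients, so it can be fitted by least squares on the powers of x. The sketch below does this with the normal equations and a small Gaussian-elimination solver, a choice made here only to keep the example dependency-free. The data and function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(x, y, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] via the normal equations."""
    powers = range(degree + 1)
    xtx = [[sum(xi ** (i + j) for xi in x) for j in powers] for i in powers]
    xty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in powers]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2x + 3x^2 without noise.
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]
coefficients = fit_polynomial(x, y, degree=2)
```

Because the data were generated without noise, the fit recovers the generating coefficients up to rounding. With noisy data, raising `degree` too far is exactly how the overfitting mentioned above arises.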

Bayesian Linear Regression

The last type of regression model we are going to discuss is Bayesian linear regression. Have you heard of Bayes’ theorem? This regression type uses it to estimate the regression coefficients.

It is a lot like both ridge regression and linear regression, but the estimates tend to be more stable. Instead of computing a single least-squares estimate, the model finds the posterior distribution of the coefficients.
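
As a sketch of what “finding the posterior distribution” means, consider the simplest conjugate case: one predictor, no intercept, centered data, a known noise variance, and a normal prior centered on zero for the slope. All of these are assumptions made for illustration, as are the data and the function name `posterior_slope`. Under them, the posterior for the slope is again normal, with the mean and variance computed below.

```python
def posterior_slope(x, y, noise_var, prior_var):
    """Posterior mean and variance of b in y = b*x + noise, with prior b ~ N(0, prior_var)."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Precision (inverse variance) from the data plus precision from the prior.
    precision = sxx / noise_var + 1.0 / prior_var
    mean = (sxy / noise_var) / precision
    variance = 1.0 / precision
    return mean, variance


# Hypothetical centered data, noise variance and prior variance.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.0, 2.1, 3.9]
mean, variance = posterior_slope(x, y, noise_var=1.0, prior_var=0.1)
```

In this special case the posterior mean equals a ridge estimate with penalty `noise_var / prior_var`, which is one way to see the resemblance to ridge regression noted above. The posterior variance additionally states how uncertain that estimate is.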

FAQs About Regression Analysis

What is regression?

It is a technique to find out the relationship between the dependent and independent variables

What is a linear regression model?

Linear Regression Model helps determine the relationship between different continuous variables by fitting a linear equation for dealing with data.

What is the difference between multi-linear regression and polynomial regression?

The only difference between Multi-Linear Regression and polynomial regression is that in the latter the relationship between ‘x’ and ‘y’ is modelled as an nth-degree polynomial, so the fitted line is a curve, while in Multi-Linear Regression the line is straight.

What is overfitting in statistics?

When a function in statistics corresponds too closely to a particular set of data, some modeling error is possible. This modeling error is called overfitting.

What is ridge regression?

It is a method of estimating the coefficients of multiple regression models in which the independent variables are highly correlated. It can also be used when the number of predictor variables is higher than the number of observations.

Regression Analysis: Definition, Types, Usage & Advantages

Regression analysis is perhaps one of the most widely used statistical methods for investigating or estimating the relationship between a set of independent and dependent variables. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities.

It is also used as a blanket term for various data analysis techniques utilized in quantitative research for modeling and analyzing numerous variables. In the regression method, the independent variable is a predictor or an explanatory element, and the dependent variable is the outcome or a response to a specific query.


Definition of Regression Analysis


Regression analysis is often used to model or analyze data. Most survey analysts use it to understand the relationship between the variables, which can be further utilized to predict the precise outcome.

For Example – Suppose a soft drink company wants to expand its manufacturing unit to a newer location. Before moving forward, the company wants to analyze its revenue generation model and the various factors that might impact it. Hence, the company conducts an online survey with a specific questionnaire.

After using regression analysis, it becomes easier for the company to analyze the survey results and understand the relationship between different variables like electricity and revenue – here, revenue is the dependent variable.


In addition, understanding the relationship between different independent variables like pricing, number of workers, and logistics with the revenue helps the company estimate the impact of varied factors on sales and profits.

Survey researchers often use this technique to examine and find a correlation between different variables of interest. It provides an opportunity to gauge the influence of different independent variables on a dependent variable.

Overall, regression analysis saves the survey researchers’ additional efforts in arranging several independent variables in tables and testing or calculating their effect on a dependent variable. Different types of analytical research methods are widely used to evaluate new business ideas and make informed decisions.

Types of Regression Analysis

Researchers usually start by learning linear and logistic regression first. Due to the widespread knowledge of these two methods and ease of application, many analysts think there are only two types of models. Each model has its own specialty and ability to perform if specific conditions are met.

This blog explains the commonly used seven types of multiple regression analysis methods that can be used to interpret the enumerated data in various formats.

01. Linear Regression Analysis

It is one of the most widely known modeling techniques, as it is among the first regression analysis methods people pick up when learning predictive modeling. Here, the dependent variable is continuous, and the independent variable can be continuous or discrete, with a linear regression line.

Please note that multiple linear regression has more than one independent variable, whereas simple linear regression has only one. Linear regression is best used only when there is a linear relationship between the independent variables and the dependent variable.

A business can use linear regression to measure the effectiveness of the marketing campaigns, pricing, and promotions on sales of a product. Suppose a company selling sports equipment wants to understand if the funds they have invested in the marketing and branding of their products have given them substantial returns or not.

Linear regression is a suitable statistical method for interpreting the results. It also helps isolate the impact of each marketing and branding activity while controlling for the other factors that affect sales.

If the company is running two or more advertising campaigns simultaneously, one on television and two on radio, then linear regression can easily analyze the independent and combined influence of running these advertisements together.


02. Logistic Regression Analysis

Logistic regression is commonly used to determine the probability of event success and event failure. Logistic regression is used whenever the dependent variable is binary, like 0/1, True/False, or Yes/No. Thus, it can be said that logistic regression is used to analyze either the close-ended questions in a survey or the questions demanding numeric responses in a survey.

Please note that, unlike linear regression, logistic regression does not need a linear relationship between the dependent and independent variables. Logistic regression applies a non-linear log transformation to the odds of the outcome; therefore, it handles various types of relationships between a dependent and an independent variable.

Logistic regression is widely used to analyze categorical data, particularly binary response data in business data modeling. It is most often used when the dependent variable is categorical, for example to predict whether a health claim made by a person is real (1) or fraudulent (0), or whether a tumor is malignant (1) or not (0).

Businesses use logistic regression to predict whether the consumers in a particular demographic will purchase their product or will buy from the competitors based on age, income, gender, race, state of residence, previous purchase, etc.

03. Polynomial Regression Analysis

Polynomial regression is commonly used to analyze curvilinear data, where the power of an independent variable is greater than 1. In this regression analysis method, the best-fit line is not a straight line but a curve fitted to the data points.

Please note that polynomial regression is useful when some of the variables enter the model with exponents and others do not.

Additionally, it can model non-linear relationships, offering the liberty to choose the exponent for each variable and full control over the modeling features.

When combined with response surface analysis, polynomial regression is considered one of the sophisticated statistical methods commonly used in multisource feedback research. Polynomial regression is used mostly in finance and insurance-related industries where the relationship between dependent and independent variables is curvilinear.

Suppose a person wants to plan a budget by determining how long it would take to earn a specific sum. Polynomial regression, by taking into account his/her income and predicted expenses, can estimate the time he/she needs to work to earn that sum.
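A short sketch shows the difference between fitting a straight line and fitting a curve. The data below are generated from a known quadratic rule (an assumption made purely so the result can be checked), and NumPy's polyfit is used for both fits.

```python
import numpy as np

# Hypothetical curvilinear data generated from y = 2 + 1.5x - 0.3x^2.
x = np.arange(0, 10, dtype=float)
y = 2.0 + 1.5 * x - 0.3 * x**2

# Fit a straight line (degree 1) and a curve (degree 2).
line = np.polyfit(x, y, deg=1)
curve = np.polyfit(x, y, deg=2)

def sse(coeffs):
    """Sum of squared errors of a fitted polynomial on the data."""
    return float(np.sum((np.polyval(coeffs, x) - y) ** 2))

print("Straight-line error:", round(sse(line), 3))
print("Curve error:", round(sse(curve), 3))
print("Curve coefficients (x^2, x, constant):", np.round(curve, 3))
```

The degree-2 fit recovers the generating rule, while the straight line leaves a clearly larger error.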

04. Stepwise Regression Analysis

This is a semi-automated process in which a statistical model is built by either adding or removing independent variables based on the t-statistics of their estimated coefficients.

If used properly, stepwise regression can be a powerful tool. It works well when you are working with a large number of independent variables. It fine-tunes the model by moving variables in and out one at a time, following a fixed rule rather than at random.

Stepwise regression analysis is recommended when there are multiple independent variables and the selection among them is to be done automatically, without human intervention.

Please note that, in stepwise regression modeling, a variable is added to or removed from the set of explanatory variables at each step. The variable to add or remove is chosen depending on the test statistics of the estimated coefficients.

Suppose you have a set of independent variables like age, weight, body surface area, duration of hypertension, basal pulse, and stress index, and you want to analyze their impact on blood pressure.

In stepwise regression, the best subset of independent variables is chosen automatically; the procedure either starts with no variables and proceeds forward (adding one variable at a time) or starts with all variables in the model and proceeds backward (removing one variable at a time).

Thus, using regression analysis, you can calculate the impact of each or a group of variables on blood pressure.
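The forward variant can be sketched as follows. Note the assumptions: instead of t-statistics, this toy version uses the drop in the residual sum of squares as its entry rule, the threshold min_gain is an arbitrary illustrative value, and the three coded predictors and the blood pressure values are invented.

```python
import numpy as np

def rss(columns, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_stepwise(predictors, y, min_gain=0.01):
    """Add one variable at a time while it explains enough extra variation."""
    selected = []
    total = rss([], y)          # variation left by the intercept-only model
    current = total
    while len(selected) < len(predictors):
        trials = {
            name: rss([predictors[n] for n in selected + [name]], y)
            for name in predictors if name not in selected
        }
        best = min(trials, key=trials.get)
        if (current - trials[best]) / total < min_gain:
            break               # no remaining variable helps enough
        selected.append(best)
        current = trials[best]
    return selected

# Hypothetical predictors coded as -1/+1 (for example, below/above average).
predictors = {
    "age":    np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float),
    "weight": np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float),
    "stress": np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float),
}
# Blood pressure built from age and stress only, so weight should be left out.
bp = 120 + 3 * predictors["age"] + 2 * predictors["stress"]

chosen = forward_stepwise(predictors, bp)
print(chosen)
```

A backward version would start from all three variables and drop the least useful one at each step.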

05. Ridge Regression Analysis

Ridge regression is based on the ordinary least squares method and is used to analyze data with multicollinearity (data where independent variables are highly correlated). Collinearity can be explained as a near-linear relationship between variables.

Whenever there is multicollinearity, the least-squares estimates are unbiased, but their variances are large, so they may be far away from the true value. Ridge regression reduces the standard errors by adding some degree of bias to the regression estimates, with the aim of providing more reliable estimates.

Please note that the assumptions of ridge regression are the same as those of least-squares regression, except that normality is not assumed. Although ridge regression shrinks the value of the coefficients, they never reach exactly zero, which means it cannot be used to select variables.

Suppose you are crazy about two guitarists performing live at an event near you, and you go to watch their performance to find out who is the better guitarist. But when the performance starts, you notice that both are playing at the same time.

Is it possible to find out which guitarist has the biggest impact on the sound when they are both playing loud and fast? Because they are playing together, it is very difficult to tell their contributions apart, making this a good illustration of multicollinearity, which tends to increase the standard errors of the coefficients.

Ridge regression addresses multicollinearity in cases like these and includes bias or a shrinkage estimation to derive results.
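Here is a minimal sketch of that shrinkage using the closed-form ridge estimate on centered data. The two predictors are hypothetical and deliberately almost identical, and the penalty value of 1.0 is an arbitrary choice for illustration.

```python
import numpy as np

# Two hypothetical, nearly identical predictors (strong multicollinearity).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = x1 + np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])
y = x1 + x2 + np.array([0.5, -0.3, 0.2, -0.4, 0.1, 0.3])

# Center the data so no intercept term is needed.
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)
y_c = y - y.mean()

def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

ols = ridge(X, y_c, lam=0.0)      # penalty 0 reproduces least squares
shrunk = ridge(X, y_c, lam=1.0)   # penalty 1.0 shrinks the coefficients

print("Least squares:", np.round(ols, 3))
print("Ridge:", np.round(shrunk, 3))
```

The ridge coefficients are pulled toward each other and toward zero, but neither of them becomes exactly zero.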

06. Lasso Regression Analysis

Lasso (Least Absolute Shrinkage and Selection Operator) is similar to ridge regression; however, it uses an absolute-value penalty instead of the squared penalty used in ridge regression.

It was popularized by Robert Tibshirani in 1996 as an alternative to the traditional least-squares estimate, with the intention of reducing the problems related to overfitting when the data has a large number of independent variables.

Lasso can both select variables and regularize them by means of a soft threshold. Applying lasso regression makes it easier to derive a subset of predictors that minimizes prediction error for a quantitative response.

Please note that regression coefficients that reach zero after shrinkage are excluded from the lasso model. Conversely, explanatory variables with non-zero coefficients are the ones most strongly associated with the response variable; the explanatory variables can be quantitative, categorical, or both.

Suppose an automobile company wants to analyze the average fuel consumption of cars in the US. As a sample, it chose 32 car models and 10 features of automobile design: number of cylinders, displacement, gross horsepower, rear axle ratio, weight, ¼ mile time, V/S engine, transmission, number of gears, and number of carburetors.

The response variable mpg (miles per gallon) is strongly correlated with some of these variables, such as weight, displacement, number of cylinders, and horsepower. The problem can be analyzed by using the glmnet package in R and lasso regression for feature selection.
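The soft threshold behind lasso's variable selection can be sketched from scratch in Python. This is not the glmnet workflow mentioned above, only a small coordinate-descent illustration: the three coded features and the response are invented, and the penalty of 0.5 is an arbitrary choice.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, and set it to zero if |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, sweeps=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(sweeps):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            partial = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial / n
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return beta

# Hypothetical centered design features coded as -1/+1.
a = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
b = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
c = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
X = np.column_stack([a, b, c])

# Response depends strongly on a, weakly on b, and not at all on c.
y = 3.0 * a + 0.2 * b

beta = lasso_cd(X, y, lam=0.5)
print(np.round(beta, 3))
```

The strong feature keeps a shrunken coefficient, while the weak and irrelevant features are set exactly to zero and so drop out of the model.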

07. Elastic Net Regression Analysis

Elastic net is a mixture of the ridge and lasso regression models, trained with both L1 and L2 penalties. The elastic net brings about a grouping effect wherein strongly correlated predictors tend to be in or out of the model together. Using the elastic net regression model is recommended when the number of predictors is far greater than the number of observations.

Please note that the elastic net regression model came into existence as an alternative to the lasso regression model, because lasso’s variable selection was too dependent on the data, making it unstable. By using elastic net regression, statisticians were able to combine the penalties of ridge and lasso regression to get the best out of both models.

A clinical research team having access to a microarray data set on leukemia (LEU) was interested in constructing a diagnostic rule based on the expression level of presented gene samples for predicting the type of leukemia. The data set consisted of a large number of genes and only a few samples.

Apart from that, they were given a specific set of samples to be used as training samples, out of which some were infected with type 1 leukemia (acute lymphoblastic leukemia) and some with type 2 leukemia (acute myeloid leukemia).

Model fitting and tuning parameter selection by tenfold CV were carried out on the training data. Then they compared the performance of those methods by computing their prediction mean-squared error on the test data to get the necessary results.
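The grouping effect mentioned earlier can be illustrated with a small sketch. It is not the leukemia analysis itself: the data are invented, two predictors are made identical on purpose, and the penalty settings (lam=0.5, and alpha=0.5 for an equal mix of L1 and L2) are arbitrary illustrative choices.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, and set it to zero if |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def elastic_net_cd(X, y, lam, alpha, sweeps=200):
    """Coordinate descent for
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    alpha=1 gives lasso; alpha=0 gives ridge."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(sweeps):
        for j in range(p):
            partial = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial / n
            denom = X[:, j] @ X[:, j] / n + lam * (1.0 - alpha)
            beta[j] = soft_threshold(rho, lam * alpha) / denom
    return beta

# Two hypothetical predictors that are perfectly correlated (identical).
a = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
X = np.column_stack([a, a])
y = 3.0 * a

lasso_beta = elastic_net_cd(X, y, lam=0.5, alpha=1.0)
enet_beta = elastic_net_cd(X, y, lam=0.5, alpha=0.5)

print("Lasso:", np.round(lasso_beta, 3))
print("Elastic net:", np.round(enet_beta, 3))
```

In this toy case lasso keeps only one of the two identical predictors, while the elastic net gives them equal coefficients, so they stay in the model together.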

A market research survey focuses on three major metrics: Customer Satisfaction, Customer Loyalty, and Customer Advocacy. Remember, although these metrics tell us about customer health and intentions, they fail to tell us ways of improving the position. Therefore, an in-depth survey questionnaire intended to ask consumers the reason behind their dissatisfaction is definitely a way to gain practical insights.

However, it has been found that people often struggle to put forth their motivation or demotivation or describe their satisfaction or dissatisfaction. In addition to that, people often give undue importance to some rational factors, such as price, packaging, etc. This is where regression analysis helps: overall, it acts as a predictive analytics and forecasting tool in market research.

When used as a forecasting tool, regression analysis can determine an organization’s sales figures by taking into account external market data. A multinational company conducts a market research survey to understand the impact of various factors such as GDP (Gross Domestic Product), CPI (Consumer Price Index), and other similar factors on its revenue generation model.

Regression analysis that takes forecasted market indicators into account can then be used to predict the tentative revenue for future quarters and even future years. However, the further you go into the future, the less reliable the data becomes, leaving a wider margin of error.

Case study of using regression analysis

A water purifier company wanted to understand the factors leading to brand favorability. The survey was the best medium for reaching out to existing and prospective customers. A large-scale consumer survey was planned, and a discreet questionnaire was prepared using the best survey tool .

A number of questions related to the brand, favorability, satisfaction, and probable dissatisfaction were effectively asked in the survey. After getting optimum responses to the survey, regression analysis was used to narrow down the top ten factors responsible for driving brand favorability.

All the ten attributes derived (mentioned in the image below) in one or the other way highlighted their importance in impacting the favorability of that specific water purifier brand.

It is easy to run a regression analysis using Excel or SPSS, but while doing so, the importance of four numbers in interpreting the data must be understood.

The first two numbers out of the four numbers directly relate to the regression model itself.

F-Value: It helps in measuring the statistical significance of the regression model as a whole. What matters is the significance level (p-value) attached to the F-Value, which Excel labels ‘Significance F’: a value below 0.05 is conventionally taken to mean that the survey analysis output is unlikely to be due to chance.
R-Squared: This is the proportion of the movement (variance) in the dependent variable that the independent variables explain. If the R-Squared value is 0.7, the tested independent variables together explain 70% of the dependent variable’s movement. A high value suggests the model fits the survey data well, although it does not by itself guarantee accurate predictions.

The other two numbers relate to each of the independent variables while interpreting regression analysis.

P-Value: Like the significance of the F-Value, the P-Value is a measure of statistical significance, but here it applies to an individual independent variable and indicates how statistically significant that variable’s effect is. Once again, we are looking for a value of less than 0.05.
Coefficient: The fourth number is the coefficient estimated for each independent variable. It tells us ‘by what value the dependent variable is expected to increase when the independent variable under consideration increases by one while all other independent variables are held at the same value’.

In a few cases, the simple coefficient is replaced by a standardized coefficient demonstrating the contribution from each independent variable to move or bring about a change in the dependent variable.
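For readers who want to see where these numbers come from, here is a minimal sketch that computes the coefficient, R-Squared, and the F-Value for a one-variable regression. The data are hypothetical. The p-values are deliberately left out because they require an F or t distribution function (for example, from a statistics library), which this dependency-light sketch does not use.

```python
import numpy as np

# Hypothetical survey data: one independent variable x and one dependent variable y.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

n, k = len(y), 1                     # observations, independent variables
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

sse = float(np.sum((y - fitted) ** 2))     # unexplained variation
sst = float(np.sum((y - y.mean()) ** 2))   # total variation
ssr = sst - sse                            # explained variation

r_squared = 1.0 - sse / sst
f_stat = (ssr / k) / (sse / (n - k - 1))
slope = beta[1]                            # coefficient of x

print("Coefficient:", round(slope, 3))
print("R-Squared:", round(r_squared, 4))
print("F-Value:", round(f_stat, 1))
```

Excel and SPSS report the same quantities, together with the p-values, in their regression output tables.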

01. Get access to predictive analytics

Do you know utilizing regression analysis to understand the outcome of a business survey is like having the power to unveil future opportunities and risks?

For example, by predicting how many viewers a particular television advertisement slot is likely to reach, a business can use that estimate to set a maximum bid for that slot. The finance and insurance industry as a whole depends a lot on regression analysis of survey data to identify trends and opportunities for more accurate planning and decision-making.

02. Enhance operational efficiency

Do you know businesses use regression analysis to optimize their business processes?

For example, before launching a new product line, businesses conduct consumer surveys to better understand the impact of various factors on the product’s production, packaging, distribution, and consumption.

A data-driven foresight helps eliminate the guesswork, hypothesis, and internal politics from decision-making. A deeper understanding of the areas impacting operational efficiencies and revenues leads to better business optimization.

03. Quantitative support for decision-making

Business surveys today generate a lot of data related to finance, revenue, operation, purchases, etc., and business owners are heavily dependent on various data analysis models to make informed business decisions.

For example, regression analysis helps enterprises to make informed strategic workforce decisions. Conducting and interpreting the outcome of employee surveys like Employee Engagement Surveys, Employee Satisfaction Surveys, Employer Improvement Surveys, Employee Exit Surveys, etc., boosts the understanding of the relationship between employees and the enterprise.

It also helps get a fair idea of certain issues impacting the organization’s working culture, working environment, and productivity. Furthermore, intelligent business-oriented interpretations reduce the huge pile of raw data into actionable information to make a more informed decision.

04. Prevent mistakes from happening due to intuitions

By knowing how to use regression analysis for interpreting survey results, one can easily provide factual support to management for making informed decisions; but do you know that it also helps in keeping out faults in judgment?

For example, a mall manager thinks that extending the closing time of the mall will result in more sales. Regression analysis may contradict this belief by predicting that the increased revenue from additional sales won’t cover the increased operating expenses arising from longer working hours.

Regression analysis is a useful statistical method for modeling and comprehending the relationships between variables. It provides numerous advantages to various data types and interactions. Researchers and analysts may gain useful insights into the factors influencing a dependent variable and use the results to make informed decisions.


Regression Analysis

In book: A Concise Guide to Market Research (pp.193-233)

Ludwig-Maximilians-University of Munich

University of Melbourne


What Is Regression Analysis in Business Analytics?

14 Dec 2021

Countless factors impact every facet of business. How can you consider those factors and know their true impact?

Imagine you seek to understand the factors that influence people’s decision to buy your company’s product. They range from customers’ physical locations to satisfaction levels among sales representatives to your competitors' Black Friday sales.

Understanding the relationships between each factor and product sales can enable you to pinpoint areas for improvement, helping you drive more sales.

To learn how each factor influences sales, you need to use a statistical analysis method called regression analysis .

If you aren’t a business or data analyst, you may not run regressions yourself, but knowing how regression analysis works can provide important insight into which factors impact product sales and, thus, which are worth improving.

Foundational Concepts for Regression Analysis

Before diving into regression analysis, you need to build foundational knowledge of statistical concepts and relationships.

Independent and Dependent Variables

Start with the basics. What relationship are you aiming to explore? Try formatting your answer like this: “I want to understand the impact of [the independent variable] on [the dependent variable].”

The independent variable is the factor that could impact the dependent variable . For example, “I want to understand the impact of employee satisfaction on product sales.”

In this case, employee satisfaction is the independent variable, and product sales is the dependent variable. Identifying the dependent and independent variables is the first step toward regression analysis.

Correlation vs. Causation

One of the cardinal rules of statistically exploring relationships is to never assume correlation implies causation. In other words, just because two variables move in the same direction doesn’t mean one caused the other to occur.

If two or more variables are correlated , their directional movements are related. If two variables are positively correlated , it means that as one goes up or down, so does the other. Alternatively, if two variables are negatively correlated , one goes up while the other goes down.

A correlation’s strength can be quantified by calculating the correlation coefficient , sometimes represented by r . The correlation coefficient falls between negative one and positive one.

r = -1 indicates a perfect negative correlation.

r = 1 indicates a perfect positive correlation.

r = 0 indicates no correlation.
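As a quick illustration of these three cases, the correlation coefficient can be computed by hand in a few lines of Python. The small data sets below are invented so that each one lands on one of the three reference values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical examples of the three reference cases.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])        # move together
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])        # move in opposite directions
r_none = pearson_r([1, 2, 3, 4, 5], [2, 1, 0, 1, 2])      # no linear relationship

print(round(r_positive, 3), round(r_negative, 3), round(r_none, 3))
```

Note that the last data set has a clear U-shaped pattern, yet r is zero, because r only measures linear association.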

Causation means that one variable caused the other to occur. Proving a causal relationship between variables requires a true experiment with a control group (which doesn’t receive the independent variable) and an experimental group (which receives the independent variable).

While regression analysis provides insights into relationships between variables, it doesn’t prove causation. It can be tempting to assume that one variable caused the other—especially if you want it to be true—which is why you need to keep this in mind any time you run regressions or analyze relationships between variables.

With the basics under your belt, here’s a deeper explanation of regression analysis so you can leverage it to drive strategic planning and decision-making.

What Is Regression Analysis?

Regression analysis is the statistical method used to determine the structure of a relationship between two variables (single variable linear regression) or three or more variables (multiple regression).

According to the Harvard Business School Online course Business Analytics , regression is used for two primary purposes:

To study the magnitude and structure of the relationship between variables
To forecast a variable based on its relationship with another variable

Both of these insights can inform strategic business decisions.

“Regression allows us to gain insights into the structure of that relationship and provides measures of how well the data fit that relationship,” says HBS Professor Jan Hammond, who teaches Business Analytics, one of three courses that comprise the Credential of Readiness (CORe) program . “Such insights can prove extremely valuable for analyzing historical trends and developing forecasts.”

One way to think of regression is by visualizing a scatter plot of your data with the independent variable on the X-axis and the dependent variable on the Y-axis. The regression line is the line that best fits the scatter plot data. The regression equation represents the line’s slope and the relationship between the two variables, along with an estimation of error.

Physically creating this scatter plot can be a natural starting point for parsing out the relationships between variables.

Types of Regression Analysis

There are two types of regression analysis: single variable linear regression and multiple regression.

Single variable linear regression is used to determine the relationship between two variables: the independent and dependent. The equation for a single variable linear regression looks like this:

Y = α + βx + ε, where ŷ = α + βx

In the equation:

ŷ is the expected value of Y (the dependent variable) for a given value of X (the independent variable).
x is the independent variable.
α is the Y-intercept, the point at which the regression line intersects with the vertical axis.
β is the slope of the regression line, or the average change in the dependent variable as the independent variable increases by one.
ε is the error term, equal to Y – ŷ, or the difference between the actual value of the dependent variable and its expected value.

Multiple regression, on the other hand, is used to determine the relationship between three or more variables: the dependent variable and at least two independent variables. The multiple regression equation looks complex but is similar to the single variable linear regression equation:

Y = α + β₁x₁ + β₂x₂ + … + βₖxₖ + ε

Each component of this equation represents the same thing as in the previous equation, with the addition of the subscript k, which is the total number of independent variables being examined. For each independent variable you include in the regression, multiply the slope of the regression line by the value of the independent variable, and add it to the rest of the equation.
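To connect the symbols to numbers, here is a minimal sketch of a single variable linear regression computed with the standard least-squares formulas. The employee satisfaction scores and sales figures are hypothetical and chosen only to keep the arithmetic simple.

```python
def fit_line(xs, ys):
    """Least-squares estimates of the intercept (alpha) and slope (beta)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Hypothetical data: satisfaction score (x) and product sales in $1000s (y).
satisfaction = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 20, 24]

alpha, beta = fit_line(satisfaction, sales)

# Expected value for each x, and the error term (actual minus expected).
predicted = [alpha + beta * x for x in satisfaction]
errors = [y - y_hat for y, y_hat in zip(sales, predicted)]

print("Intercept:", round(alpha, 2))
print("Slope:", round(beta, 2))
print("Errors:", [round(e, 2) for e in errors])
```

Here the slope says that each one-point rise in satisfaction goes with an average rise of about 2.9 in sales, and the errors show how far each actual value sits from the fitted line.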

How to Run Regressions

You can use a host of statistical programs—such as Microsoft Excel, SPSS, and STATA—to run both single variable linear and multiple regressions. If you’re interested in hands-on practice with this skill, Business Analytics teaches learners how to create scatter plots and run regressions in Microsoft Excel, as well as make sense of the output and use it to drive business decisions.

Calculating Confidence and Accounting for Error

It’s important to note: This overview of regression analysis is introductory and doesn’t delve into calculations of confidence level, significance, variance, and error. When working in a statistical program, these calculations may be provided or require that you implement a function. When conducting regression analysis, these metrics are important for gauging how significant your results are and how much importance to place on them.

Why Use Regression Analysis?

Once you’ve generated a regression equation for a set of variables, you effectively have a roadmap for the relationship between your independent and dependent variables. If you input a specific X value into the equation, you can see the expected Y value.

This can be critical for predicting the outcome of potential changes, allowing you to ask, “What would happen if this factor changed by a specific amount?”

Returning to the earlier example, running a regression analysis could allow you to find the equation representing the relationship between employee satisfaction and product sales. You could input a higher level of employee satisfaction and see how sales might change accordingly. This information could lead to improved working conditions for employees, backed by data that shows the tie between high employee satisfaction and sales.

Whether predicting future outcomes, determining areas for improvement, or identifying relationships between seemingly unconnected variables, understanding regression analysis can enable you to craft data-driven strategies and determine the best course of action with all factors in mind.

http://orcid.org/0000-0002-7839-8130 Parveen Ali 1 , 2 ,
http://orcid.org/0000-0003-0157-5319 Ahtisham Younas 3 , 4
1 School of Nursing and Midwifery , University of Sheffield , Sheffield , South Yorkshire , UK
2 Sheffield University Interpersonal Violence Research Group , The University of Sheffield SEAS , Sheffield , UK
3 Faculty of Nursing , Memorial University of Newfoundland , St. John's , Newfoundland and Labrador , Canada
4 Swat College of Nursing , Mingora, Swat , Pakistan
Correspondence to Ahtisham Younas, Memorial University of Newfoundland, St. John's, NL A1C 5S7, Canada; ay6133{at}mun.ca

https://doi.org/10.1136/ebnurs-2021-103425

statistics & research methods

Introduction

A nurse educator is interested in finding out the academic and non-academic predictors of success in nursing students. Given the complexity of educational and clinical learning environments, many demographic, clinical and academic factors (age, gender, previous educational training, personal stressors, learning demands, motivation, assignment workload, etc) influence nursing students’ success, so she was able to list various potential contributing factors relatively easily. Nevertheless, not all of the identified factors will be plausible predictors of increased success. Therefore, she could use a powerful statistical procedure called regression analysis to identify whether the likelihood of increased success is influenced by factors such as age, stressors, learning demands, motivation and education.

What is regression?

Purposes of regression analysis.

Regression analysis has four primary purposes: description, estimation, prediction and control. 1 , 2 By description, regression can explain the relationship between dependent and independent variables. Estimation means that by using the observed values of independent variables, the value of the dependent variable can be estimated. 2 Regression analysis can be useful for predicting the outcomes and changes in dependent variables based on the relationships of dependent and independent variables. Finally, regression enables controlling for the effect of one or more independent variables while investigating the relationship of one independent variable with the dependent variable. 1

Types of regression analyses

There are commonly three types of regression analyses, namely, linear, logistic and multiple regression. The differences among these types are outlined in table 1 in terms of their purpose, nature of dependent and independent variables, underlying assumptions, and nature of curve. 1 , 3 However, more detailed discussion for linear regression is presented as follows.

Comparison of linear, logistic and multiple regression

Linear regression and interpretation

Linear regression analysis involves examining the relationship between one independent variable and one dependent variable. Statistically, the relationship between one independent variable (x) and a dependent variable (y) is expressed as: y = β₀ + β₁x + ε. In this equation, β₀ is the y intercept and refers to the estimated value of y when x is equal to 0. The coefficient β₁ is the regression coefficient and denotes the estimated increase in the dependent variable for every unit increase in the independent variable. The symbol ε is a random error component and signifies the imprecision of regression, indicating that, in actual practice, the independent variables cannot perfectly predict the change in any dependent variable. 1 Multiple linear regression follows the same logic as univariate linear regression except that (a) in multiple regression, there is more than one independent variable and (b) there should be non-collinearity among the independent variables.

Factors affecting regression

Linear and multiple regression analyses are affected by factors, namely, sample size, missing data and the nature of sample. 2

A small sample may only reveal connections among variables that are strongly related. Therefore, sample size must be chosen based on the number of independent variables and the expected strength of the relationship.

Many missing values in the data set may affect the sample size. Therefore, all the missing values should be adequately dealt with before conducting regression analyses.

The subsamples within the larger sample may mask the actual effect of independent and dependent variables. Therefore, if subsamples are predefined, a regression within the sample could be used to detect true relationships. Otherwise, the analysis should be undertaken on the whole sample.

Building on her research interest mentioned in the beginning, let us consider a study by Ali and Naylor. 4 They were interested in identifying the academic and non-academic factors which predict the academic success of nursing diploma students. This purpose is consistent with one of the above-mentioned purposes of regression analysis (ie, prediction). Ali and Naylor’s chosen academic independent variables were preadmission qualification, previous academic performance and school type and the non-academic variables were age, gender, marital status and time gap. To achieve their purpose, they collected data from 628 nursing students between the age range of 15–34 years. They used both linear and multiple regression analyses to identify the predictors of student success. For analysis, they examined the relationship of academic and non-academic variables across different years of study and noted that academic factors accounted for 36.6%, 44.3% and 50.4% variability in academic success of students in year 1, year 2 and year 3, respectively. 4
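Statements that a set of factors "accounted for" a percentage of variability are usually read as a coefficient of determination (R²). Assuming that reading, the sketch below shows how such a figure is computed from observed and predicted values. The numbers are invented and are not taken from Ali and Naylor's data.

```python
# Coefficient of determination: the share of variability in the observed
# values that the model's predictions account for. Data are hypothetical.

def r_squared(observed, predicted):
    mean_y = sum(observed) / len(observed)
    # Unexplained variation: squared gaps between observations and predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Total variation: squared gaps between observations and their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_res / ss_tot

observed = [52, 55, 61, 64, 68]
predicted = [51.8, 55.9, 60.0, 64.1, 68.2]
r2 = r_squared(observed, predicted)
print(f"R squared = {r2:.3f}")
```

A value of 0.366 from this calculation would correspond to the statement that the predictors account for 36.6% of the variability in the outcome.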

Ali and Naylor presented the relationship among these variables using scatter plots, which are commonly used graphs for data display in regression analysis (see examples of various scatter plots in figure 1). 4 In a scatter plot, the clustering of the dots denotes the strength of the relationship, whereas the direction indicates the nature of the relationship among variables as positive (ie, an increase in one variable is accompanied by an increase in the other) or negative (ie, an increase in one variable is accompanied by a decrease in the other).
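The strength and direction that a scatter plot shows visually can also be summarised in a single number. The sketch below uses Pearson's correlation coefficient as one common summary, which is an assumption on our part since the text does not name a statistic. The two data series are invented to show a perfectly positive and a perfectly negative relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sign gives direction, magnitude gives strength."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # hypothetical: y increases with x
falling = [10, 8, 6, 4, 2]    # hypothetical: y decreases as x increases
r_positive = pearson_r(x, rising)
r_negative = pearson_r(x, falling)
print(r_positive, r_negative)
```

Values near +1 or -1 correspond to tightly clustered dots, and values near 0 to a diffuse cloud.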

An Example of Scatter Plot for Regression.

Table 2 presents the results of regression analysis for academic and non-academic variables for year 4 students’ success. The significant predictors of student success are denoted with a significant p value. For every significant predictor, the beta value indicates the percentage increase in students’ academic success with one unit increase in the variable.

Regression model for the final year students (N=343)

Conclusions

Regression analysis is a powerful and useful statistical procedure with many implications for nursing research. It enables researchers to describe, predict and estimate the relationships and draw plausible conclusions about the interrelated variables in relation to any studied phenomena. Regression also allows for controlling one or more variables when researchers are interested in examining the relationship among specific variables. Some of the key considerations are presented that may be useful for researchers undertaking regression analysis. While planning and conducting regression analysis, researchers should consider the type and number of dependent and independent variables as well as the nature and size of sample. Choosing a wrong type of regression analysis with small sample may result in erroneous conclusions about the studied phenomenon.

Ethics statements

Patient consent for publication.

Not required.

Montgomery DC ,
Schneider A ,

Twitter @parveenazamali, @Ahtisham04

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Provenance and peer review Commissioned; internally peer reviewed.

What is Regression Analysis? Types and Applications

Ayush Singh Rawat
Jun 07, 2021

Introduction

“The field of Artificial Intelligence and machine learning is set to conquer most of the human disciplines; from art and literature to commerce and sociology; from computational biology and decision analysis to games and puzzles.” ~Anand Krish

Regression analysis is a way to find trends in data.

For example, you might guess that there’s a connection between how much you eat and how much you weigh; regression analysis can help you quantify that relationship.

Regression analysis will provide you with an equation for a graph so that you can make predictions about your data.

For example, if you’ve been putting on weight over the last few years, it can predict how much you’ll weigh in ten years time if you continue to put on weight at the same rate.

It will also give you a slew of statistics (including a p-value and a correlation coefficient) to tell you how accurate your model is.

Introduction to Regression Analysis

Regression analysis is a statistical technique for analysing and comprehending the connection between two or more variables of interest. The methodology used to do regression analysis aids in understanding which elements are significant, which may be ignored, and how they interact with one another.

Regression is a statistical approach used in finance, investment, and other fields to identify the strength and type of a connection between one dependent variable (typically represented by Y) and a sequence of other variables (known as independent variables).

Regression is essentially the "best guess" at utilising a collection of data to generate some form of forecast. It is the process of fitting a line or curve to a set of points on a graph.

Regression analysis is a mathematical method for determining which of those factors has an effect. It provides answers to the following questions:

Which factors are most important

Which of these may we disregard

How do those elements interact with one another, and perhaps most significantly, how confident are we in all of these variables

These elements are referred to as variables in regression analysis. You have your dependent variable, which is the key aspect you're attempting to understand or forecast. Then there are your independent variables, which are the elements you assume have an effect on your dependent variable.

Types of Regression Analysis

Simple linear regression

The relationship between a dependent variable and a single independent variable is described using a basic linear regression methodology. A Simple Linear Regression model reveals a linear or slanted straight line relation, thus the name.

The simple linear model is expressed using the following equation:

Y = a + bX + ϵ

Y – Dependent variable
X – Independent (explanatory) variable
a – Intercept
b – Slope
ϵ – Residual (error)

The dependent variable needs to be continuous/real, which is the most crucial component of Simple Linear Regression. On the other hand, the independent variable can be evaluated using either continuous or categorical values.

Multiple linear regression

Multiple linear regression (MLR), often known as multiple regression, is a statistical process that uses multiple explanatory factors to predict the outcome of a response variable.

MLR is a method of representing the linear relationship between explanatory (independent) and response (dependent) variables.

The mathematical representation of multiple linear regression is:

y = β0 + β1x1 + … + βnxn + ϵ

Where, y = the dependent variable’s predicted value

β0 = the y-intercept

β1x1 = β1 is the regression coefficient of the first independent variable x1 (that is, the effect on the predicted y value of a one-unit increase in x1)

… = Repeat for as many independent variables as you're testing.

βnxn = the last independent variable's regression coefficient (βn) multiplied by that variable (xn)

ϵ = model error (i.e. how much variation there is in our estimate of y)

Multiple linear regression rests on the same assumptions as simple linear regression. Because it involves several independent variables, the model has one additional requirement:

Non-collinearity: the independent variables should show little or no correlation with one another. If the independent variables were strongly correlated, it would be hard to determine the true relationship between each of them and the dependent variable.
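As a rough illustration of how the coefficients of a multiple linear regression can be estimated, the sketch below solves the least-squares normal equations in plain Python for two independent variables. The helper names (`solve`, `fit_multiple`) and the data are our own assumptions. The y values are generated without noise from y = 3 + 2·x1 + 0.5·x2, so the fit should recover those coefficients.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_multiple(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bn] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]         # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]      # hypothetical (x1, x2) pairs
ys = [3 + 2 * x1 + 0.5 * x2 for x1, x2 in rows]      # noise-free illustration
coeffs = fit_multiple(rows, ys)
print([round(c, 3) for c in coeffs])
```

If x2 were an exact multiple of x1, the system would have no unique solution, which is the extreme form of the collinearity problem described above.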

Non-linear regression

Nonlinear regression is a form of regression analysis in which data are fitted to a model that is expressed as a nonlinear mathematical function.

Simple linear regression connects two variables (X and Y) in a straight line (y = mx + b), whereas nonlinear regression connects two variables (X and Y) in a nonlinear (curved) relationship.

The goal of the model is to minimise the sum of squares as much as possible. The sum of squares is a statistic that tracks how much Y observations differ from the nonlinear (curved) function that was used to anticipate Y.

In the same way that linear regression modelling aims to graphically trace a specific response from a set of factors, nonlinear regression modelling aims to do the same.

Because the function is generated by a series of approximations (iterations) that may be dependent on trial-and-error, nonlinear models are more complex to develop than linear models.

The Gauss-Newton methodology and the Levenberg-Marquardt approach are two well-known approaches used by mathematicians.
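To give a feel for the iterative approximation described above, here is a deliberately simplified Gauss-Newton sketch for a one-parameter curve, y = exp(b·x). The model, starting value and data are assumptions chosen for illustration; real problems usually have several parameters and need safeguards (such as the damping used in Levenberg-Marquardt) that are omitted here.

```python
import math

def sum_of_squares(xs, ys, b):
    """Sum of squared differences between observed y and the curve exp(b*x)."""
    return sum((y - math.exp(b * x)) ** 2 for x, y in zip(xs, ys))

def gauss_newton_exp(xs, ys, b, steps=20):
    """Fit y = exp(b*x) by repeatedly linearising the curve around the current b."""
    for _ in range(steps):
        residuals = [y - math.exp(b * x) for x, y in zip(xs, ys)]
        # Derivative of the model with respect to b at each x.
        jac = [x * math.exp(b * x) for x in xs]
        # One-parameter Gauss-Newton update: (J'r) / (J'J).
        b += sum(j * r for j, r in zip(jac, residuals)) / sum(j * j for j in jac)
    return b

xs = [0, 1, 2, 3]
ys = [math.exp(0.5 * x) for x in xs]        # noise-free data with true b = 0.5
b_hat = gauss_newton_exp(xs, ys, b=0.3)     # 0.3 is an arbitrary starting guess
print(round(b_hat, 6), sum_of_squares(xs, ys, b_hat))
```

Each pass reduces the sum of squares for this example, and the estimate settles close to the value used to generate the data.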

What are applications of Regression Analysis ?

Most of the regression analysis is done to carry out processes in finances. So, here are 5 applications of Regression Analysis in the field of finance and others relating to it.

Forecasting:

The most common use of regression analysis in business is for forecasting future opportunities and threats. Demand analysis, for example, forecasts the amount of things a customer is likely to buy.

When it comes to business, though, demand is not the only dependent variable. Regression analysis can anticipate significantly more than just direct income.

For example, we may predict the highest bid for an advertisement by forecasting the number of consumers who would pass in front of a specific billboard.

Insurance firms depend extensively on regression analysis to forecast policyholder creditworthiness and the amount of claims that might be filed in a particular time period.

The Capital Asset Pricing Model (CAPM), which establishes the link between an asset's projected return and the related market risk premium, relies on the linear regression model.

It is also frequently used in financial analysis by financial analysts to anticipate corporate returns and operational performance.

The beta coefficient of a stock is calculated using regression analysis. Beta is a measure of return volatility in relation to total market risk.

Because it reflects the slope of the CAPM regression, we can quickly calculate it in Excel using the SLOPE function.
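The same slope calculation can be sketched in a few lines of Python. The function below takes its arguments in the same order as Excel's SLOPE(known_y's, known_x's); the return series are invented, and in practice beta is often estimated on returns in excess of a risk-free rate, a step that is skipped here.

```python
def slope(known_ys, known_xs):
    """Least-squares slope of y on x: covariance of x and y over variance of x."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    return sxy / sxx

# Hypothetical periodic returns, for illustration only.
market_returns = [0.01, -0.02, 0.03, 0.00, 0.02]
stock_returns = [0.015, -0.03, 0.045, 0.0, 0.03]   # moves 1.5x the market

beta = slope(stock_returns, market_returns)
print(f"beta = {beta:.2f}")
```

A beta above 1 in this sketch means the stock's returns swing more than the market's in the same direction.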

Comparing with competition:

It may be used to compare a company's financial performance to that of a certain counterpart.

It may also be used to determine the relationship between two firms' stock prices (this can be extended to find correlation between 2 competing companies, 2 companies operating in an unrelated industry etc).

It can assist the firm in determining which aspects are influencing their sales in contrast to the comparative firm. These techniques can assist small enterprises in achieving rapid success in a short amount of time.

Identifying problems:

Regression is useful not just for providing factual evidence for management choices, but also for detecting judgement mistakes.

A retail store manager, for example, may assume that extending shopping hours will significantly boost sales.

However, regression analysis might suggest that the increase in income isn't enough to cover the increase in operational cost as a result of longer working hours (such as additional employee labour charges).

As a result, this research may give quantitative backing for choices and help managers avoid making mistakes based on their intuitions.

Reliable source

Many businesses and their top executives are now adopting regression analysis (and other types of statistical analysis ) to make better business decisions and reduce guesswork and gut instinct.

Regression enables firms to take a scientific approach to management. Both small and large enterprises are frequently bombarded with an excessive amount of data.

Managers may use regression analysis to filter through data and choose the relevant factors to make the best decisions possible.

For a long time, regression analysis has been utilised extensively by enterprises to transform data into useful information, and it continues to be a valuable asset to many leading sectors.

The significance of regression analysis lies in the fact that it is all about data: here, data refers to the facts and statistics that describe your company.

The benefit of regression analysis is that it allows you to crunch the data to help you make better business decisions now and in the future.

What is Regression Analysis and Why Should I Use It?

Regression analysis is a powerful statistical method that allows you to examine the relationship between two or more variables of interest.

While there are many types of regression analysis, at their core they all examine the influence of one or more independent variables on a dependent variable.

Regression analysis provides detailed insight that can be applied to further improve products and services.

Here at Alchemer, we offer hands-on application training events during which customers learn how to become super users of our software.

In order to understand the value being delivered at these training events, we distribute follow-up surveys to attendees with the goals of learning what they enjoyed, what they didn’t, and what we can improve on for future sessions.

The data collected from these feedback surveys allows us to measure the levels of satisfaction that our attendees associate with our events, and what variables influence those levels of satisfaction.

Could it be the topics covered in the individual sessions of the event? The length of the sessions? The food or catering services provided? The cost to attend? Any of these variables have the potential to impact an attendee’s level of satisfaction.

By performing a regression analysis on this survey data, we can determine whether or not these variables have impacted overall attendee satisfaction, and if so, to what extent.

This information then informs us about which elements of the sessions are being well received, and where we need to focus attention so that attendees are more satisfied in the future.

What is regression analysis and what does it mean to perform a regression?

Regression analysis is a reliable method of identifying which variables have impact on a topic of interest. The process of performing a regression allows you to confidently determine which factors matter most, which factors can be ignored, and how these factors influence each other.

In order to understand regression analysis fully, it’s essential to comprehend the following terms:

Dependent Variable: This is the main factor that you’re trying to understand or predict.
Independent Variables: These are the factors that you hypothesize have an impact on your dependent variable.

In our application training example above, attendees’ satisfaction with the event is our dependent variable. The topics covered, length of sessions, food provided, and the cost of a ticket are our independent variables.

How does regression analysis work?

In order to conduct a regression analysis, you’ll need to define a dependent variable that you hypothesize is being influenced by one or several independent variables.

You’ll then need to establish a comprehensive dataset to work with. Administering surveys to your audiences of interest is a terrific way to establish this dataset. Your survey should include questions addressing all of the independent variables that you are interested in.

Let’s continue using our application training example. In this case, we’d want to measure the historical levels of satisfaction with the events from the past three years or so (or however long a period you judge gives you enough data), as well as any information possible in regards to the independent variables.

Perhaps we’re particularly curious about how the price of a ticket to the event has impacted levels of satisfaction.

To begin investigating whether or not there is a relationship between these two variables, we would begin by plotting these data points on a chart, which would look like the following theoretical example.

(Plotting your data is the first step in figuring out if there is a relationship between your independent and dependent variables)

Our dependent variable (in this case, the level of event satisfaction) should be plotted on the y-axis, while our independent variable (the price of the event ticket) should be plotted on the x-axis.

Once your data is plotted, you may begin to see correlations. If the theoretical chart above did indeed represent the impact of ticket prices on event satisfaction, then we’d be able to confidently say that the higher the ticket price, the higher the levels of event satisfaction.

But how can we tell the degree to which ticket price affects event satisfaction?

To begin answering this question, draw a line through the middle of all of the data points on the chart. This line is referred to as your regression line, and it can be precisely calculated using a standard statistics program like Excel.

We’ll use a theoretical chart once more to depict what a regression line should look like.

The regression line represents the relationship between your independent variable and your dependent variable.

Excel will even provide a formula for the line, including its slope, which adds further context to the relationship between your independent and dependent variables.

The formula for a regression line might look something like Y = 100 + 7X + error term .

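This tells you that if there is no “X”, then Y = 100. If X is our increase in ticket price, this informs us that if there is no increase in ticket price, event satisfaction will still be 100 points. The 7 is the slope: each one-unit increase in ticket price is associated with a 7-point increase in event satisfaction.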
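Read as code, the example line Y = 100 + 7X (ignoring the error term) is just a small function. The names below are made up for illustration.

```python
INTERCEPT = 100   # value of Y when X is 0
SLOPE = 7         # change in Y for each one-unit increase in X

def predict_satisfaction(price_increase):
    """Predicted event satisfaction from the example line Y = 100 + 7X."""
    return INTERCEPT + SLOPE * price_increase

for x in (0, 1, 3):
    print(x, predict_satisfaction(x))
```

With no price increase the prediction is the intercept, 100, and each extra unit of X adds 7 points.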

You’ll notice that the formula above includes an error term. Regression lines always include an error term because in reality, independent variables are never perfect predictors of dependent variables. This makes sense when looking at the impact of ticket prices on event satisfaction: there are clearly other variables contributing to event satisfaction besides price.

Your regression line is simply an estimate based on the data available to you. So, the larger your error term, the less certain your regression line is.

Why should your organization use regression analysis?

Regression analysis is a helpful statistical method that can be leveraged across an organization to determine the degree to which particular independent variables are influencing dependent variables.

The possible scenarios for conducting regression analysis to yield valuable, actionable business insights are endless.

The next time someone in your business is proposing a hypothesis that states that one factor, whether you can control that factor or not, is impacting a portion of the business, suggest performing a regression analysis to determine just how confident you should be in that hypothesis! This will allow you to make more informed business decisions, allocate resources more efficiently, and ultimately boost your bottom line.

What is Regression Analysis? Definition, Types, and Examples

Kate Williams

Last Updated:

22 January 2024

If you want to find data trends or predict sales based on certain variables, then regression analysis is the way to go.

In this article, we will learn about regression analysis, types of regression analysis, business applications, and its use cases. Feel free to jump to a section that’s relevant to you.

What is the definition of regression analysis?
Regression analysis: FAQs
Why is regression analysis important?
Types of regression analysis and when to use them
How is regression analysis used by businesses
Use cases of regression analysis

What is Regression Analysis?

Need a quick regression definition? In simple terms, regression analysis identifies the variables that have an impact on another variable .

The regression model is primarily used in finance, investing, and other areas to determine the strength and character of the relationship between one dependent variable and a series of other variables.

Regression Analysis: FAQs

Let us look at some of the most commonly asked questions about regression analysis before we head deep into understanding everything about the regression method.

1. What does multiple regression analysis mean?

Multiple regression analysis is a statistical method that is used to predict the value of a dependent variable based on the values of two or more independent variables.

2. In regression analysis, what is the predictor variable called?

The predictor variable is the name given to an independent variable that we use in regression analysis.

The predictor variable provides information about an associated dependent variable regarding a certain outcome. At their core, predictor variables are those that are linked with particular outcomes.

3. What is a residual plot in a regression analysis?

A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis.

Moreover, the residual plot shows how far each data point lies (vertically) from the graph of the prediction equation of the regression model. A point above the graph has a positive residual and a point below it has a negative residual. If the residuals are small and scattered randomly around zero, the model is considered to fit the data well.
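The values that go on the vertical axis of a residual plot are easy to compute once you have a prediction equation. The sketch below assumes an already-fitted line with intercept 100 and slope 7 and four invented observations.

```python
def residuals(xs, ys, intercept, slope):
    """Observed value minus predicted value for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4]               # hypothetical independent variable
ys = [108, 113, 121, 127]       # hypothetical observed outcomes
res = residuals(xs, ys, intercept=100, slope=7)
print(list(zip(xs, res)))       # these pairs are the points of the residual plot
```

A positive residual marks a point above the line, a negative one a point below it, and zero a point lying exactly on it.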

4. What is linear regression analysis?

Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable that you want to predict is referred to as the dependent variable. The variable that you are using to predict the other value is called the independent variable.

Why is Regression Analysis Important?

There are many business applications of regression analysis.

For any machine learning problem that involves continuous numbers, regression analysis is essential. Some of those instances could be:
Testing automobiles
Weather analysis and prediction
Sales and promotions forecasting
Financial forecasting
Time series forecasting
Regression analysis also helps you understand whether the relationship between two variables can open up potential business opportunities.
For example, if you change one variable (say delivery speed), regression analysis will tell you the kind of effect that it has on other variables (such as customer satisfaction, small value orders, etc).
Regression analysis is one of the standard ways to solve prediction problems in machine learning. Plotting points on a chart and fitting the best-fit line shows how far predictions are likely to be from the observed values.
The insights from these patterns help businesses see the kind of difference these changes make to their bottom line.

5 Types of Regression Analysis and When to Use Them

1. Linear Regression Analysis

This type of regression analysis is one of the most basic types of regression and is used extensively in machine learning.
Linear regression has a predictor variable and a dependent variable which are related to each other linearly.
It is therefore used in cases where the relationship between the variables is linear.

Let’s say you are looking to measure the impact of email marketing on your sales. A linear fit can be misleading when the data contain aberrations (outliers), so check for them before relying on the result, especially in big data sets.

2. Logistic Regression Analysis

If your dependent variable has discrete values, that is, if it can take only one of two values, then logistic regression is the way to go.
The two values could be either 0 or 1, black or white, true or false, proceed or not proceed, and so on.
To show the relationship between the target and independent variables, logistic regression uses a sigmoid curve.
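The sigmoid curve mentioned above turns any number into a value between 0 and 1, which is then read as a probability. The sketch below shows the curve and a prediction for a single independent variable; the intercept and coefficient are hypothetical values picked for illustration, not fitted from data.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1 / (1 + math.exp(-z))

def predict_probability(x, intercept, coef):
    """Probability that the outcome is 1 for a given x under a logistic model."""
    return sigmoid(intercept + coef * x)

# Hypothetical model: the 50% point sits at x = 2 because -4 + 2*2 = 0.
INTERCEPT, COEF = -4.0, 2.0
for x in (0, 2, 4):
    p = predict_probability(x, INTERCEPT, COEF)
    label = 1 if p >= 0.5 else 0          # a common, but adjustable, cut-off
    print(x, round(p, 3), label)
```

The 0.5 cut-off used to turn a probability into a yes/no label is a convention that can be changed to suit the problem.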

This type of regression is best used when there are large data sets that have a chance of equal occurrence of values in target variables. There should not be a huge correlation between the independent variables in the dataset.

3. Lasso Regression Analysis

Lasso regression is a regularization technique that reduces the model’s complexity.
How does it do that? By limiting the absolute size of the regression coefficients.
As a result, coefficient values are pulled towards zero, and some can become exactly zero. Ridge regression also shrinks coefficients, but it does not set them exactly to zero.

Lasso regression is advantageous because it performs feature selection: predictors whose coefficients are shrunk to zero drop out of the model. Since it keeps only the required features, lasso regression helps to avoid overfitting.

4. Ridge Regression Analysis

If there is a high correlation between independent variables , ridge regression is the recommended tool.
It is also a regularization technique that reduces the complexity of the model .

Ridge regression manages to make the model less prone to overfitting by introducing a small amount of bias known as the ridge regression penalty, with the help of a bias matrix.
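The difference between the two penalties is easiest to see in a special case. Assuming standardised, mutually uncorrelated predictors and one common scaling of the penalty, each ordinary least-squares coefficient is adjusted on its own: lasso subtracts a fixed amount and stops at zero, while ridge divides by a constant. The sketch below applies both rules to invented coefficients; it illustrates that special case only and is not a general-purpose solver.

```python
def lasso_shrink(b, lam):
    """Soft-thresholding: move b towards zero by lam, stopping at exactly zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def ridge_shrink(b, lam):
    """Proportional shrinkage: smaller in size, but never exactly zero unless b is."""
    return b / (1 + lam)

ols_coefficients = [3.0, 0.4, -1.5]   # hypothetical unpenalised estimates
lam = 0.5                             # hypothetical penalty strength

lasso = [lasso_shrink(b, lam) for b in ols_coefficients]
ridge = [ridge_shrink(b, lam) for b in ols_coefficients]
print("lasso:", lasso)
print("ridge:", [round(b, 3) for b in ridge])
```

In this example the small middle coefficient is removed entirely by lasso but only reduced by ridge.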

5. Polynomial Regression Analysis

Polynomial regression models a non-linear dataset with the help of a linear model .
Its working is similar to that of multiple linear regression. But it uses a non-linear curve and is mainly employed when data points are available in a non-linear fashion.
It transforms the data points into polynomial features of a given degree and manages to model them in the form of a linear model.

Polynomial regression involves fitting the data points using a polynomial curve. Since this model is susceptible to overfitting, businesses are advised to check the fitted curve towards the ends of the data range so that they get accurate results.
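The idea of turning data points into polynomial features and then fitting a linear model can be sketched as follows. Each x is expanded into the features 1, x and x², and ordinary least squares is run on those features. The helper names and data are assumptions; the y values come without noise from y = 1 + 2x + 3x², so the fit should recover those coefficients.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] of a polynomial in x."""
    # Polynomial features: each x becomes [1, x, x^2, ..., x^degree].
    features = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * y for f, y in zip(features, ys)) for i in range(k)]
    return solve(xtx, xty)

xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]   # noise-free illustration
poly = fit_polynomial(xs, ys, degree=2)
print([round(c, 3) for c in poly])
```

The model is curved in x but still linear in its coefficients, which is why linear least squares is enough to fit it.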

While there are many more regression analysis techniques, these are the most popular ones.

How is regression analysis used by businesses?

Regression stats help businesses understand what their data points represent and how to use them with the help of business analytics techniques.

Using a regression model, you can see how the typical value of the dependent variable changes when one independent variable is varied while the other independent variables are held fixed.

Data professionals use this incredibly powerful statistical tool to remove unwanted variables and select the ones that are more important for the business.

Here are some uses of regression analysis:

1. Business Optimization

The whole objective of regression analysis is to make use of the collected data and turn it into actionable insights.
With the help of regression analysis, decisions no longer need to rest on guesswork or hunches.
Data-driven decision-making improves the output that the organization provides.
Regression charts also help organizations experiment with inputs that might not have been considered earlier; because these experiments are backed by data, their chances of success are higher.
When a lot of data is available, the insights also tend to be more accurate.

2. Predictive Analytics

Businesses that want to stay ahead of the competition need to be able to predict future trends. Organizations use regression analysis to estimate what the future holds for them.
To forecast trends, data analysts predict how the dependent variable changes for specific values of the independent variables.
You can use multivariate linear regression for tasks such as charting growth plans, forecasting sales volumes, predicting inventory required, and so on.
A typical workflow has four steps:
Find out more about the area so that you can gather data from different sources
Collect the data required for the relevant variables
Specify and estimate your regression model
If you have a model which fits the data, then use it to come up with predictions
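The last two steps above (estimate a model, then predict only if it fits) can be sketched as follows. This is a minimal illustration: the ad-spend and sales figures are invented, and the fit threshold of 0.8 is an assumed cut-off rather than a standard value.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical history: monthly ad spend (thousands) and units sold.
ad_spend = [10, 12, 15, 18, 20, 25]
units_sold = [120, 135, 160, 190, 205, 250]

intercept, slope = fit_line(ad_spend, units_sold)
fit = r_squared(ad_spend, units_sold, intercept, slope)

# Forecast only when the model explains the historical data well enough.
FIT_THRESHOLD = 0.8  # assumed cut-off, chosen for illustration
planned_spend = 30
forecast = intercept + slope * planned_spend if fit >= FIT_THRESHOLD else None

print(f"slope={slope:.2f}, R^2={fit:.3f}, forecast={forecast}")
```

Note that the planned spend of 30 lies outside the observed range, so the forecast is an extrapolation and should be treated with more caution than a prediction inside the range.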

3. Decision-making

To run effectively, businesses need to make better decisions and be aware of how each decision will affect them. If they do not understand the consequences of their decisions, it is difficult for them to function smoothly.
Businesses need to collect information about each of their departments: sales, operations, marketing, finance, HR, expenditures, budgetary allocation, and so on. Analyzing the relevant parameters helps businesses improve their outcomes.
Regression analysis helps businesses understand their data and gain insights into their operations. Business analysts use regression analysis extensively to make strategic business decisions.

4. Understanding failures

One of the most important things that most businesses miss doing is reflecting on their failures.
Without examining why a marketing campaign failed or why their churn rate increased over the last two years, they will never find ways to make it right.
Regression analysis provides quantitative support for this kind of decision-making.

5. Predicting Success

You can use regression analysis to predict the probability of success of an organization in various aspects.
Additionally, regression analysis examines sales data, including current sales, to understand and predict future success rates.

6. Risk Analysis

When analyzing data, analysts sometimes make the mistake of treating correlation and causation as the same thing. Businesses should remember that correlation is not causation.
Financial organizations use regression analysis to assess their risk and to guide sound business decisions.

7. Provides New Insights

Looking at a huge set of data will help you get new insights. But data, without analysis, is meaningless.
With the help of regression analysis, you can find the relationship between a variety of variables to uncover patterns.
For example, regression models might indicate that there are more returns from a particular seller. So the eCommerce company can get in touch with the seller to understand how they send their products.

Each issue uncovered this way has its own solution. Without regression analysis, it might have been difficult to understand exactly what the issue was in the first place.

8. Analyze marketing effectiveness

When the company wants to know if the funds they have invested in marketing campaigns for a particular brand will give them enough ROI, then regression analysis is the way to go.
It is possible to check the isolated impact of each of the campaigns by controlling the factors that will have an impact on the sales.
Businesses invest in a number of marketing channels: email marketing, paid ads, Instagram influencers, etc. Regression analysis can capture the isolated ROI of each of these channels as well as their combined ROI.

7 Use Cases of Regression Analysis

1. Credit Card

Credit card companies use regression analysis to understand various user factors such as the consumer’s future behavior, prediction of credit balance, risk of customer’s credit default, etc.
All of these data points help the company implement specific EMI options based on the results.
This will help credit card companies take note of the risky customers.

2. Finance

Simple linear regression, usually fitted by Ordinary Least Squares (OLS), gives an overall rationale for placing the line of best fit among the data points.
One of the most common applications of this statistical model is the Capital Asset Pricing Model (CAPM), which describes the relationship between the returns and risks of investing in a security.
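In the CAPM setting, the quantity usually estimated by regression is beta: the slope obtained when a security's excess returns are regressed on the market's excess returns. The sketch below assumes that excess returns (returns minus a risk-free rate) have already been computed; the function name and the return series are hypothetical.

```python
def capm_alpha_beta(asset_excess, market_excess):
    """Regress asset excess returns on market excess returns.

    Returns (alpha, beta), where beta is the OLS slope and alpha the intercept.
    """
    n = len(market_excess)
    mean_m = sum(market_excess) / n
    mean_a = sum(asset_excess) / n
    cov = sum(
        (m - mean_m) * (a - mean_a)
        for m, a in zip(market_excess, asset_excess)
    )
    var = sum((m - mean_m) ** 2 for m in market_excess)
    beta = cov / var
    alpha = mean_a - beta * mean_m
    return alpha, beta


# Hypothetical monthly excess returns; the asset moves 1.5x the market.
market_excess = [0.01, -0.02, 0.03, 0.00, 0.02]
asset_excess = [1.5 * m for m in market_excess]

alpha, beta = capm_alpha_beta(asset_excess, market_excess)
print(f"alpha={alpha:.4f}, beta={beta:.2f}")
```

A beta above 1 suggests the security tends to move more than the market, and a beta below 1 suggests it moves less. Real return series are noisy, so the estimate would come with uncertainty that this sketch does not report.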

3. Pharmaceuticals

Pharmaceutical companies use regression analysis on quantitative stability data to estimate the shelf life of a product, because it captures the nature of the relationship between an attribute and time.
Medical researchers use regression analysis to understand whether changes in drug dosage have an impact on the blood pressure of patients.

For example, researchers will administer different dosages of a certain drug to patients and observe changes in their blood pressure. They will fit a simple regression model where they use dosage as the predictor variable and blood pressure as the response variable.

4. Text Editing

Logistic regression is a popular choice in a number of natural language processing (NLP) tasks. The text is first preprocessed and converted into numerical features.
After this, you can use logistic regression to classify the text fragment.
Email sorting, toxic speech detection, topic classification for questions, etc., are some of the areas where logistic regression shows great results.
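A text classifier of this kind can be sketched with word counts as features and a logistic regression trained by gradient descent. Everything here is an assumption made for illustration: the four training messages, the whitespace tokenizer, the learning rate, and the number of epochs.

```python
import math


def tokenize(text):
    """Very simple preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocab(texts):
    """Assign a column index to every distinct word in the training texts."""
    words = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(words)}


def vectorize(text, vocab):
    """Bag-of-words counts; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for w in tokenize(text):
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, lr=0.1, epochs=200):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to z
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


def predict_proba(text, vocab, weights, bias):
    x = vectorize(text, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical training set: 1 = spam, 0 = not spam.
texts = [
    "win a free prize now",
    "free money win big",
    "meeting agenda for monday",
    "project report due monday",
]
labels = [1, 1, 0, 0]

vocab = build_vocab(texts)
weights, bias = train([vectorize(t, vocab) for t in texts], labels)

print(predict_proba("free prize", vocab, weights, bias))      # above 0.5
print(predict_proba("monday meeting", vocab, weights, bias))  # below 0.5
```

The output is a probability between 0 and 1, so the same model can sort email, flag toxic speech, or assign topics by choosing a threshold or training one model per class.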

5. Hospitality

You can use regression analysis to predict and recognize the intentions of users. For example, where do the customers want to go? What are they planning to do?
It can even make a prediction before the customer has typed anything in the search bar, based on how they started.
It is not possible to build such a huge and complex system from scratch. There are already several machine learning algorithms that have accumulated data and have simple models that make such predictions possible.

6. Professional sports

Data scientists working with professional sports teams use regression analysis to understand the effect that training regimens will have on the performance of players.
They will find out how different types of exercises, like weightlifting sessions or Zumba sessions, affect the number of points that a player scores for their team (let’s say basketball).
Using Zumba and weightlifting as the predictor variables, and the total points scored as the response variable, they will fit the regression model.

Depending on the final values, the analysts will recommend that a player participate in more or fewer weightlifting or Zumba sessions to maximize their performance.
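A regression with two predictor variables, as in the sports example, has a closed-form solution on mean-centered data. The sketch below is a minimal illustration: the session counts and points are invented, and the points are generated from an assumed exact relationship so that the recovered coefficients are easy to check.

```python
def fit_two_predictors(x1, x2, y):
    """OLS for y = b0 + b1 * x1 + b2 * x2, solved on mean-centered data."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical weekly sessions per player and points scored.
weightlifting = [1, 2, 3, 4, 5, 2]
zumba = [3, 1, 4, 2, 5, 5]
# Assumed relationship used to generate the example data.
points = [10 + 2 * w + 0.5 * z for w, z in zip(weightlifting, zumba)]

b0, b1, b2 = fit_two_predictors(weightlifting, zumba, points)
print(f"baseline={b0:.2f}, per weightlifting session={b1:.2f}, per Zumba session={b2:.2f}")
```

Each coefficient estimates the change in points for one extra session of that type while the other is held fixed, which is what an analyst would compare before recommending more or fewer sessions. With real, noisy data the estimates would not be exact.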

7. Agriculture

Agricultural scientists use regression analysis to understand how different fertilizers affect the yield of crops.
For example, the analysts might use different types of fertilizers and amounts of water on fields to understand if there is an impact on the crop’s yield.
Based on the final results, the agriculture analysts will change the amount of fertilizer and water to maximize the crop output.

Wrapping Up

Regression analysis helps you separate out the effects of individual variables in complicated research questions. It allows you to make informed decisions, guides resource allocation, and can improve your bottom line if you use the statistical method effectively.



Regression Analysis: Types, Importance and Limitations


Prevalence of alcohol-impaired driving: a systematic review with a gender-driven approach and meta-analysis of gender differences

Published: 26 July 2024


Guido Pelletti, Rafael Boscolo-Berto, Laura Anniballi, Arianna Giorgetti, Filippo Pirani, Mara Cavallaro, Luca Giorgini, Paolo Fais (ORCID: orcid.org/0000-0002-2270-9956), Jennifer Paola Pascali & Susi Pelotti


A growing number of studies have investigated the factors that contribute to driving under the influence (DUI) of alcohol in relation to gender. However, a gendered approach to the scientific evidence is missing in the literature. To fill this gap, a gender-driven systematic review of real case studies from the last two decades was performed. In addition to the gender of the drivers involved, major independent variables such as the period of recruitment, the type of drivers recruited, and the geographical area where the study was conducted were examined. Afterwards, a meta-analysis was performed comparing alcohol-positive rates (APR) between male and female drivers in three subgroups of drivers: those involved in road traffic accidents, those randomly tested on the road, and volunteers.

Three databases were searched for eligible studies in October 2023. Real-case studies reporting APR in men and women convicted for DUI of alcohol worldwide were included. Univariate analysis by ANOVA with post-hoc tests identified the independent variables with a significant impact on the dependent variable APR, according to a relationship subsequently investigated by standard multiple linear regression. The meta-analysis of random effects estimates was performed to investigate the change in overall effect size (measured by Cohen’s d standardized mean difference test) and 95% confidence interval (CI).

Among papers addressing driver gender, univariate analysis of independent variables revealed a higher Alcohol Positive Rate (APR) in men, particularly in drivers involved in crashes, with a noticeable decrease over time. Analyzing the gender of drivers involved in crashes, the meta-analysis showed that men had a significantly higher APR (30.7%; 95%CI 26.8–35.0) compared to women (13.2%; 95%CI 10.7–16.1). However, in drivers randomly tested, there was no significant difference in APR between genders (2.1% for men and 1.4% for women), while in volunteers, there was a statistically significant difference in APR with 3.4% (95%CI 1.5–7.6) for men and 1.1% (95%CI 0.5–2.7) for women.

Despite a progressive decrease in the epidemiological prevalence of alcohol-related DUI over time, this phenomenon remains at worryingly high levels among drivers involved in road traffic accidents in both genders, with a higher prevalence in men. It’s important for policymakers, professionals, and scientists to consider gender when planning research, analysis, interventions, and policies related to psychoactive substances, such as alcohol or other licit drugs. Forensic sciences can play a vital role in this regard, enabling a thorough analysis of gender gaps in different populations.


Data availability.

Not applicable.


Fell JC (2019) Approaches for reducing alcohol-impaired driving: evidence-based legislation, law enforcement strategies, sanctions, and alcohol-control policies. Forensic Sci Rev 31:161–184

Terranova C, Cestonaro C, Cinquetti A, Trevissoi F, Favretto D, Viel G, AnnaAprile (2024) Sex differences and driving impairment related to psychoactive substances. Traffic Injury Prev DOI. https://doi.org/10.1080/15389588.2024.2325607

Download references

Author information

Guido Pelletti and Rafael Boscolo-Berto equally contributed to this paper.

Authors and Affiliations

Department of Medical and Surgical Sciences, Unit of Legal Medicine, University of Bologna, Via Irnerio 49, Bologna, 40126, Italy

Guido Pelletti, Laura Anniballi, Arianna Giorgetti, Filippo Pirani, Mara Cavallaro, Luca Giorgini, Paolo Fais, Jennifer Paola Pascali & Susi Pelotti

Institute of Human Anatomy, Department of Neurosciences, University of Padova, Via A. Gabelli 65, Padua, 35127, Italy

Rafael Boscolo-Berto

Contributions

GP: conceptualization; writing-original draft; RBB: methodology; formal analysis; LA: data curation; visualization; AG: data curation; formal analysis; FP: data curation; formal analysis; MC: data curation; visualization; LG: data curation; visualization; PF: data curation; visualization; JPP: SP: conceptualization; supervision. All authors have read and approved the final version of the manuscript.

Corresponding author

Correspondence to Paolo Fais.

Ethics declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Compliance with ethical standards

No approval of an ethical committee is needed for this type of study.

Human ethics and consent to participate

Not applicable

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Pelletti, G., Boscolo-Berto, R., Anniballi, L. et al. Prevalence of alcohol-impaired driving: a systematic review with a gender-driven approach and meta-analysis of gender differences. Int J Legal Med (2024). https://doi.org/10.1007/s00414-024-03291-3

Received: 19 April 2024

Accepted: 09 July 2024

Published: 26 July 2024

DOI: https://doi.org/10.1007/s00414-024-03291-3

Forensic Toxicology
Traffic medicine

COMMENTS

Regression Analysis
Regression analysis is a quantitative research method which is used when the study involves modelling and analysing several variables, where the relationship includes a dependent variable and one or more independent variables. In simple terms, regression analysis is a quantitative method used to test the nature of relationships between a dependent variable and one or more independent variables.
Regression Analysis
Regression analysis also helps in predicting health outcomes based on various factors like age, genetic markers, or lifestyle choices. Social Sciences: Regression analysis is widely used in social sciences like sociology, psychology, and education research. Researchers can investigate the impact of variables like income, education level, or ...
Regression analysis
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables ...
Regression Analysis
Multiple linear regression analysis is essentially similar to the simple linear model, with the exception that multiple independent variables are used in the model. The mathematical representation of multiple linear regression is: Y = a + b X1 + c X2 + d X3 + ϵ. Where: Y - Dependent variable. X1, X2, X3 - Independent (explanatory) variables.
Explained: Regression analysis
The regression analysis creates the single line that best summarizes the distribution of points. Mathematically, the line representing a simple linear regression is expressed through a basic equation: Y = a₀ + a₁X. Here X is hours spent studying per week, the "independent variable." Y is the exam scores, the "dependent variable ...
Regression: Definition, Analysis, Calculation, and Example
Regression is a statistical measure used in finance, investing and other disciplines that attempts to determine the strength of the relationship between one dependent variable (usually denoted by ...
Regression Analysis: The Complete Guide
Regression analysis is a statistical method. It's used for analyzing different factors that might influence an objective - such as the success of a product launch, business growth, a new marketing campaign - and determining which factors are important and which ones can be ignored.
Regression Analysis
The aim of linear regression analysis is to estimate the coefficients of the regression equation b_0 and b_k (k ∈ K) so that the sum of the squared residuals (i.e., the sum over all squared differences between the observed values y_i of the i-th observation and the corresponding predicted values ŷ_i) is minimized. The lower part of Fig. 1 illustrates this approach, which is ...
Sage Research Methods
Subject index. Understanding Regression Analysis: An Introductory Guide presents the fundamentals of regression analysis, from its meaning to uses, in a concise, easy-to-read, and non-technical style. It illustrates how regression coefficients are estimated, interpreted, and used in a variety of settings within the social sciences, business ...
What is Regression Analysis in Data Science?
Regression's significance in data science. Regression analysis is a powerful statistical technique that quantifies relationships between variables, identifies significant predictors, and bases predictions on discerned patterns. By employing regression, data scientists can pinpoint determinants of specific outcomes, imperative in areas like ...
Regression Analysis
Definition 7.1: Regression analysis is a statistical method for analyzing a relationship between two or more variables in such a manner that one of the variables can be predicted or explained by the information on the other variables. ... As such, it is an active field of research and a large number of statistical tests and diagnostic graphs ...
A Refresher on Regression Analysis
Understanding one of the most important types of data analysis. By Amy Gallo, November 04, 2015. You probably know by now that ...
The clinician's guide to interpreting a regression analysis
Regression analysis is an important statistical method that is commonly ... (or dichotomous), meaning that the variable can ... Schober P, Vetter TR. Logistic regression in medical research ...
What Is Regression Analysis? Types, Importance, and Benefits
Regression analysis is a powerful tool used to derive statistical inferences for the future using observations from the past. It identifies the connections between variables occurring in a dataset and determines the magnitude of these associations and their significance on outcomes.
A Beginner's Guide to Regression Analysis
Logistic Regression. Logistic regression comes into play when the dependent variable is discrete, typically binary. This means that the target takes one of two values: for instance, true or false, yes or no, 0 or 1, and so on. In this case, a sigmoid curve describes the relationship between the independent variables and the probability of the outcome.
Regression Analysis: Definition, Types, Usage & Advantages
Overall, regression analysis saves the survey researchers' additional efforts in arranging several independent variables in tables and testing or calculating their effect on a dependent variable. Different types of analytical research methods are widely used to evaluate new business ideas and make informed decisions.
(PDF) Regression Analysis
7.1 Introduction. Regression analysis is one of the most frequently used tools in market research. In its simplest form, regression analysis allows market researchers to analyze relationships ...
What Is Regression Analysis in Business Analytics?
Regression analysis is the statistical method used to determine the structure of a relationship between two variables (single linear regression) or three or more variables (multiple regression). According to the Harvard Business School Online course Business Analytics, regression is used for two primary purposes: To study the magnitude and ...
Understanding and interpreting regression analysis
Linear regression analysis involves examining the relationship between one independent and one dependent variable. Statistically, the relationship between one independent variable (x) and a dependent variable (y) is expressed as: y = β₀ + β₁x + ε. In this equation, β₀ is the y-intercept and refers to the estimated value of y when x is equal to 0.
What is Regression Analysis? Types and Applications
Introduction to Regression Analysis . Regression analysis is a statistical technique for analysing and comprehending the connection between two or more variables of interest. The methodology used to do regression analysis aids in understanding which elements are significant, which may be ignored, and how they interact with one another.
What is Regression Analysis and Why Should I Use It?
Regression analysis is a reliable method of identifying which variables have an impact on a topic of interest. The process of performing a regression allows you to confidently determine which factors matter most, which factors can be ignored, and how these factors influence each other. In order to understand regression analysis fully, it's ...
What is Regression Analysis? Definition, Types, and Examples
5 Types of Regression Analysis and When to Use Them. 1. Linear Regression Analysis. This type of regression analysis is one of the most basic types of regression and is used extensively in machine learning. Linear regression has a predictor variable and a dependent variable, which are related to each other linearly.
Regression Analysis: Types, Importance and Limitations
Regression analysis helps a business make predictions and forecasts for the near and long term. It supports business decisions by providing the necessary information about the dependent target and its predictors. Regression analysis also enables a business to correct errors by properly analysing the results derived from its decisions.
Prevalence of alcohol-impaired driving: a systematic review with a
Univariate analysis by ANOVA with post-hoc tests identified the independent variables with a significant impact on the dependent variable APR, according to a relationship subsequently investigated by standard multiple linear regression. The meta-analysis of random effects estimates was performed to investigate the change in overall effect size ...
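
Several of the excerpts above describe simple linear regression as finding the line Y = a₀ + a₁X that minimizes the sum of squared residuals. The following is a minimal sketch of that idea in plain Python. The data (hours studied and exam scores) and the function names are made up for illustration and are not taken from any of the quoted sources.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum over all observations of (observed y - predicted y) squared."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: hours spent studying per week (X) and exam scores (Y).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

a0, a1 = fit_simple_ols(hours, scores)
ssr = sum_squared_residuals(hours, scores, a0, a1)
print(f"intercept a0 = {a0:.2f}, slope a1 = {a1:.2f}, SSR = {ssr:.2f}")
```

Moving the intercept or slope away from the fitted values can only increase the sum of squared residuals, which is what "best summarizes the distribution of points" means in the least-squares sense.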

Regression Analysis

Regression Analysis – Methods, Types and Examples

Regression Analysis

Regression Analysis Methodology

Types of Regression Analysis

Linear Regression

Multiple Regression

Polynomial Regression

Logistic Regression

Ridge Regression and Lasso Regression

Time Series Regression

Nonlinear Regression

Poisson Regression

Generalized Linear Models (GLM)

Regression Analysis Formulas

Regression Analysis Examples

Importance of Regression Analysis

When to Use Regression Analysis

Applications of Regression Analysis

Advantages and Disadvantages of Regression Analysis

About the author

Muhammad Hassan

What is Regression Analysis?

Regression Analysis in Finance

Regression Analysis – Linear Model Assumptions

Regression Analysis – Simple Linear Regression

Y = a + bX + ϵ

Regression Analysis – Multiple Linear Regression

Y = a + b X1 + c X2 + d X3 + ϵ
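
A minimal sketch of how the coefficients a, b, c and d in the equation above could be estimated by ordinary least squares, here by solving the normal equations in plain Python. The data are hypothetical and noise-free (generated from a = 1, b = 2, c = 3, d = -1, so ϵ = 0), and the helper names are illustrative assumptions rather than part of any particular library.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (predictors may be collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_ols(rows, y):
    """Return [a, b, c, ...] for y = a + b*X1 + c*X2 + ... via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical observations of (X1, X2, X3).
predictors = [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (1.0, 1.0, 0.0),
    (1.0, 0.0, 1.0),
    (2.0, 1.0, 3.0),
]
# Noise-free responses generated from Y = 1 + 2*X1 + 3*X2 - 1*X3.
response = [1 + 2 * x1 + 3 * x2 - x3 for x1, x2, x3 in predictors]

coefficients = fit_multiple_ols(predictors, response)
print([round(v, 6) for v in coefficients])
```

With real, noisy data the recovered coefficients would only approximate the underlying values, and a numerically more robust solver than the normal equations is usually preferred.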

1. Beta and CAPM

2. Forecasting Revenues and Expenses

MIT News | Massachusetts Institute of Technology

Explained: Regression analysis

What Is Regression?

Regression: Definition, Analysis, Calculation, and Example

Key Takeaways

Regression and Econometrics

Example of How Regression Analysis Is Used in Finance

Why Is It Called Regression?

What Is the Purpose of Regression?

How Do You Interpret a Regression Model?

What Are the Assumptions That Must Hold for Regression Models?

What is regression analysis?

How does regression analysis work?

Understanding variables:

2. Independent variable

Simple linear regression analysis

Multiple regression analysis

Multivariate linear regression

Logistic regression

Make accurate predictions

Identify inefficiencies

Drive better decisions

How do businesses use regression? A real-life example

Regression analysis tools

What is Regression Analysis in Data Science?