Regression Analysis

Regression analysis models the relationship between a response variable and one or more predictors in order to predict outcomes, which makes it a core tool in data science and analytics. At MathMultiverse, we break down linear regression, evaluation metrics, and applications with clear examples and visualizations.

Linear Regression

Simple linear regression fits a straight line to data, with intercept \( \beta_0 \), slope \( \beta_1 \), and error term \( \epsilon \):

\[ y = \beta_0 + \beta_1 x + \epsilon \]

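The coefficients are chosen to minimize the sum of squared errors (SSE), where \( \hat{y}_i = \beta_0 + \beta_1 x_i \) is the fitted value for observation \( i \):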

\[ \text{SSE} = \sum (y_i - \hat{y}_i)^2 \]

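Setting the derivatives of SSE with respect to \( \beta_0 \) and \( \beta_1 \) to zero gives the least-squares estimates, where \( \bar{x} \) and \( \bar{y} \) are the sample means: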

\[ \beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \quad \beta_0 = \bar{y} - \beta_1 \bar{x} \]
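The closed-form estimates above translate directly into code. The following is a minimal sketch using only the Python standard library; the function name `fit_line` and the sample data are illustrative assumptions, not part of the original article.

```python
def fit_line(xs, ys):
    """Return (beta0, beta1) for the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator of beta1: sum of cross-deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator of beta1: sum of squared x-deviations.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical data lying exactly on y = 1 + 2x, chosen only to exercise the function.
beta0, beta1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(beta0, beta1)  # 1.0 2.0
```

The guard on `s_xx` reflects the formula itself: when every \( x_i \) equals \( \bar{x} \), the denominator is zero and no unique slope exists.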

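The Pearson correlation coefficient \( r \) measures the strength and direction of the linear relationship, with \( -1 \le r \le 1 \):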

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]
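As a rough illustration of the correlation formula, the sketch below computes \( r \) from the same sums of deviations. The name `pearson_r` and the sample data are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data on an exact increasing line, so r should be 1.
r = pearson_r([0, 1, 2, 3], [1, 3, 5, 7])
print(r)  # 1.0
```

The numerator is the same quantity that appears in the slope \( \beta_1 \), so \( r \) and \( \beta_1 \) always share the same sign.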

Evaluation Metrics

R²

Proportion of the variance in \( y \) explained by the model:

\[ R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} \]
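A minimal sketch of this metric, assuming the observed values and the model's predictions are already available as two lists. The function name `r_squared` and the numbers are hypothetical.

```python
def r_squared(ys, y_hats):
    """Coefficient of determination: 1 - SSE / SST."""
    if len(ys) != len(y_hats) or not ys:
        raise ValueError("ys and y_hats must be non-empty and of equal length")
    y_bar = sum(ys) / len(ys)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # residual sum of squares
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total sum of squares
    if sst == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - sse / sst


# Hypothetical observations and predictions: SSE = 1, SST = 20.
observed = [2, 4, 6, 8]
predicted = [2.5, 3.5, 6.5, 7.5]
print(round(r_squared(observed, predicted), 4))  # 0.95
```

A value of 1 means the predictions match the observations exactly, while a value of 0 means the model does no better than always predicting \( \bar{y} \).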

RMSE

Root mean squared error: the square root of the average squared prediction error, expressed in the same units as \( y \):

\[ \text{RMSE} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n}} \]
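The formula can be sketched in a few lines. The function name `rmse` and the sample values below are illustrative assumptions.

```python
import math


def rmse(ys, y_hats):
    """Root mean squared error between observed and predicted values."""
    if len(ys) != len(y_hats) or not ys:
        raise ValueError("ys and y_hats must be non-empty and of equal length")
    mean_squared_error = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats)) / len(ys)
    return math.sqrt(mean_squared_error)


# Hypothetical values: every residual is +/-0.5, so the RMSE is 0.5.
print(rmse([2, 4, 6, 8], [2.5, 3.5, 6.5, 7.5]))  # 0.5
```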

MAE

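Mean absolute error: the average magnitude of the prediction errors, which is less sensitive to large individual errors than RMSE: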

\[ \text{MAE} = \frac{\sum |y_i - \hat{y}_i|}{n} \]
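A short sketch of the MAE computation follows. The name `mae` and the data are hypothetical, and the data contain a single large residual to show how the metric treats it.

```python
def mae(ys, y_hats):
    """Mean absolute error between observed and predicted values."""
    if len(ys) != len(y_hats) or not ys:
        raise ValueError("ys and y_hats must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(ys, y_hats)) / len(ys)


# Hypothetical values: three exact predictions and one that is off by 8.
observed = [10, 20, 30, 40]
predicted = [10, 20, 30, 48]
print(mae(observed, predicted))  # 2.0
```

For these same residuals the RMSE is \( \sqrt{64/4} = 4 \), twice the MAE, because squaring gives the single large error more weight.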

Examples

Study Hours vs. Scores

Data (hours, score): {(1, 50), (2, 60), (3, 75), (4, 85)}

Fitted line and prediction for 5 study hours:

\[ \hat{y} = 37.5 + 12x \]

\[ \hat{y}(5) = 37.5 + 12 \cdot 5 = 97.5 \]
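The coefficients for this data set can be checked by hand with the formulas for \( \beta_1 \) and \( \beta_0 \) given earlier. The intermediate sums are worked out here and are not part of the original example:

\[ \bar{x} = \frac{1 + 2 + 3 + 4}{4} = 2.5, \quad \bar{y} = \frac{50 + 60 + 75 + 85}{4} = 67.5 \]

\[ \sum (x_i - \bar{x})(y_i - \bar{y}) = 26.25 + 3.75 + 3.75 + 26.25 = 60 \]

\[ \sum (x_i - \bar{x})^2 = 2.25 + 0.25 + 0.25 + 2.25 = 5 \]

\[ \beta_1 = \frac{60}{5} = 12, \quad \beta_0 = 67.5 - 12 \cdot 2.5 = 37.5 \]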

Advertising vs. Sales

Data (advertising, sales): {(10, 100), (20, 150), (30, 200)}

Fitted line and prediction at \( x = 40 \):

\[ \hat{y} = 50 + 5x \]

\[ \hat{y}(40) = 50 + 5 \cdot 40 = 250 \]
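In this data set every point lies exactly on the fitted line, which a quick residual check makes visible. The sketch below hard-codes the coefficients from the example; the helper name `predict` is an assumption.

```python
# Advertising vs. sales data from the example above.
data = [(10, 100), (20, 150), (30, 200)]


def predict(x):
    """Fitted line from the example: y-hat = 50 + 5x."""
    return 50 + 5 * x


# Residual = observed value minus fitted value.
residuals = [y - predict(x) for x, y in data]
sse = sum(e ** 2 for e in residuals)
print(residuals, sse)  # [0, 0, 0] 0
print(predict(40))     # 250
```

Because SSE is 0, \( R^2 = 1 \) for this fit. Real data rarely fit this cleanly, so a perfect fit on three points should not be read as evidence that the prediction at 40 is reliable.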

Temperature vs. Ice Cream Sales

Data (temperature, sales): {(20, 30), (25, 40), (30, 55), (35, 65)}

Fitted line and prediction at \( x = 40 \):

\[ \hat{y} = -18.5 + 2.4x \]

\[ \hat{y}(40) = -18.5 + 2.4 \cdot 40 = 77.5 \]
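To connect this example to the evaluation metrics, the sketch below scores the fitted line on its own training data. The coefficients come from the example; the variable names are assumptions, and the metrics are computed inline rather than with any particular library.

```python
import math

# Temperature vs. ice cream sales data from the example above.
data = [(20, 30), (25, 40), (30, 55), (35, 65)]
beta0, beta1 = -18.5, 2.4  # fitted coefficients from the example

ys = [y for _, y in data]
y_hats = [beta0 + beta1 * x for x, _ in data]
residuals = [y - y_hat for y, y_hat in zip(ys, y_hats)]

n = len(ys)
y_bar = sum(ys) / n
sse = sum(e ** 2 for e in residuals)     # residual sum of squares
sst = sum((y - y_bar) ** 2 for y in ys)  # total sum of squares

r2 = 1 - sse / sst
rmse = math.sqrt(sse / n)
mae = sum(abs(e) for e in residuals) / n

print(round(r2, 4), round(rmse, 3), round(mae, 3))  # 0.9931 1.118 1.0
```

These are in-sample scores, so they describe how well the line fits the four observed points rather than how well it would predict new temperatures.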

Visualizations

[Chart: Study Hours vs. Scores]

Applications

  • Economics: Sales forecast, \( y = 100 + 20x \), predicts 500 for 20 units.
  • Science: Experimental data, \( y = 9x \), predicts 36 for 4 units.
  • Marketing: Customer behavior, \( y = 13.33 + 17.5x \), predicts 100.83 for 5 clicks.
  • Healthcare: Drug dosage, \( y = 5 + 1.5x \), predicts 65 for 40 mg.
  • Real Estate: Price prediction, \( y = 100 + 2x \), predicts 260 for 80 sq ft.
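Each prediction in the list above is a single evaluation of \( \hat{y} = \beta_0 + \beta_1 x \). As a small sketch, the coefficients and inputs below are copied from the list, while the dictionary layout and the `predict` helper are assumptions made for illustration.

```python
def predict(beta0, beta1, x):
    """Evaluate the fitted line beta0 + beta1 * x."""
    return beta0 + beta1 * x


# (intercept, slope, input value) for each application listed above.
applications = {
    "Economics": (100, 20, 20),
    "Science": (0, 9, 4),
    "Marketing": (13.33, 17.5, 5),
    "Healthcare": (5, 1.5, 40),
    "Real Estate": (100, 2, 80),
}

predictions = {
    name: round(predict(b0, b1, x), 2)
    for name, (b0, b1, x) in applications.items()
}

for name, value in predictions.items():
    print(f"{name}: {value}")
```

Rounding to two decimals keeps the Marketing result at 100.83 despite floating-point representation of 13.33.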