Regression Analysis
Regression analysis models relationships between variables to predict outcomes, making it essential for data science and analytics. At MathMultiverse, we break down linear regression, evaluation metrics, and applications with clear examples and visualizations.
Linear Regression
Simple linear regression fits a straight line to paired observations \((x_i, y_i)\):
\[ y = \beta_0 + \beta_1 x + \epsilon \]
The line is chosen to minimize the sum of squared errors (SSE) between the observed values \(y_i\) and the fitted values \(\hat{y}_i\):
\[ \text{SSE} = \sum (y_i - \hat{y}_i)^2 \]
The least-squares coefficients are:
\[ \beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \quad \beta_0 = \bar{y} - \beta_1 \bar{x} \]
The Pearson correlation coefficient \(r\) measures the strength and direction of the linear relationship:
\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]
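These formulas translate directly into code. A minimal sketch in plain Python follows; the helper name fit_line is illustrative, and the sample values are the study-hours data used in the examples further down.

```python
import math

def fit_line(x, y):
    """Least-squares slope, intercept, and Pearson r for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squared deviations around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    beta1 = sxy / sxx                      # slope
    beta0 = y_bar - beta1 * x_bar          # intercept
    r = sxy / math.sqrt(sxx * syy)         # correlation coefficient
    return beta0, beta1, r

# Study-hours data from the example below
print(fit_line([1, 2, 3, 4], [50, 60, 75, 85]))  # (37.5, 12.0, r about 0.997)
```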
Evaluation Metrics
R²
The proportion of the variance in \(y\) explained by the model; values close to 1 indicate a good fit:
\[ R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} \]
RMSE
The root mean squared error, the square root of the mean squared prediction error, expressed in the same units as \(y\):
\[ \text{RMSE} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n}} \]
MAE
The mean absolute error, the average magnitude of the prediction errors; it is less sensitive to large outliers than RMSE:
\[ \text{MAE} = \frac{\sum |y_i - \hat{y}_i|}{n} \]
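All three metrics can be computed directly from the observed values and the model's predictions. A minimal sketch, assuming plain Python lists and an illustrative helper name regression_metrics:

```python
import math

def regression_metrics(y, y_hat):
    """R^2, RMSE, and MAE for observed values y and predictions y_hat."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    r2 = 1 - sse / sst
    rmse = math.sqrt(sse / n)
    mae = sum(abs(yi - fi) for yi, fi in zip(y, y_hat)) / n
    return r2, rmse, mae

# Predictions from y = 37.5 + 12x on the study-hours data below
y = [50, 60, 75, 85]
y_hat = [37.5 + 12 * x for x in [1, 2, 3, 4]]
print(regression_metrics(y, y_hat))  # (about 0.993, about 1.118, 1.0)
```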
Examples
Study Hours vs. Scores
Data: {(1, 50), (2, 60), (3, 75), (4, 85)}
\[ y = 37.5 + 12x \]
\[ \hat{y}(5) = 37.5 + 12 \cdot 5 = 97.5 \]
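As a cross-check, the same fit and prediction can be reproduced with NumPy's polyfit (assuming NumPy is available):

```python
import numpy as np

# Fit y = beta0 + beta1 * x to the study-hours data
hours = np.array([1, 2, 3, 4])
scores = np.array([50, 60, 75, 85])
beta1, beta0 = np.polyfit(hours, scores, deg=1)  # coefficients, highest degree first
print(beta0, beta1)                              # approximately 37.5 and 12.0
print(beta0 + beta1 * 5)                         # predicted score for 5 hours: 97.5
```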
Advertising vs. Sales
Data: {(10, 100), (20, 150), (30, 200)}
\[ y = 50 + 5x \]
\[ \hat{y}(40) = 50 + 5 \cdot 40 = 250 \]
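This example can also be reproduced with scipy.stats.linregress, which reports the correlation alongside the coefficients (assuming SciPy is available):

```python
from scipy import stats

# Advertising spend vs. sales
spend = [10, 20, 30]
sales = [100, 150, 200]
result = stats.linregress(spend, sales)
print(result.intercept, result.slope)         # 50.0, 5.0
print(result.intercept + result.slope * 40)   # predicted sales at x = 40: 250.0
print(result.rvalue)                          # essentially 1.0: the points are collinear
```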
Temperature vs. Ice Cream Sales
Data: {(20, 30), (25, 40), (30, 55), (35, 65)}
\[ y = -18.5 + 2.4x \]
\[ \hat{y}(40) = -18.5 + 2.4 \cdot 40 = 77.5 \]
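The same fit with scikit-learn's LinearRegression, assuming scikit-learn is available (the estimator expects a 2-D feature matrix):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Temperature vs. ice cream sales
X = np.array([20, 25, 30, 35]).reshape(-1, 1)  # feature matrix must be 2-D
y = np.array([30, 40, 55, 65])
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_[0])        # approximately -18.5 and 2.4
print(model.predict([[40]]))                   # predicted sales at 40 degrees: [77.5]
```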
Visualizations
Figure: Study Hours vs. Scores.
Applications
- Economics: Sales forecast, \( y = 100 + 20x \), predicts 500 for 20 units.
- Science: Experimental data, \( y = 9x \), predicts 36 for 4 units.
- Marketing: Customer behavior, \( y = 13.33 + 17.5x \), predicts 100.83 for 5 clicks.
- Healthcare: Drug dosage, \( y = 5 + 1.5x \), predicts 65 for 40 mg.
- Real Estate: Price prediction, \( y = 100 + 2x \), predicts 260 for 80 sq ft. (A quick check of all five predictions appears in the sketch below.)
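Each application is just a straight-line model evaluated at one point, so a short loop confirms the five predictions. The coefficients come from the list above; the dictionary layout is only illustrative.

```python
# Quick check of the five application predictions: (intercept, slope, x)
models = {
    "Economics":   (100.0,  20.0, 20),
    "Science":     (  0.0,   9.0,  4),
    "Marketing":   ( 13.33, 17.5,  5),
    "Healthcare":  (  5.0,   1.5, 40),
    "Real Estate": (100.0,   2.0, 80),
}
for field, (beta0, beta1, x) in models.items():
    print(f"{field}: y({x}) = {beta0 + beta1 * x:.2f}")
# Prints 500.00, 36.00, 100.83, 65.00, 260.00 respectively
```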