Regression concepts

Basic

Concepts of Regression Analysis
Regression Analysis is used for finding the best relationship between
dependent variable Y and one or more independent variables X1, X2 etc.
Let us first focus on simple linear regression. Simple means there is
only one independent variable X and linear means we will only find the
best linear relationship between X and Y.
We will be given a set of observations X,Y and our goal will be to
find the best equation Ycap = a + b.X
Note that Ycap is the estimated value of Y, a is the intercept on the vertical axis
and b is the slope of the line.
The best relationship is the one where error between actual Y and estimated
value of y given by symbol Ycap has the following properties.
1. Sum of errors is equal to 0. (Sum(Y – Ycap) = 0)
2. Sum of squared errors is lowest compared to any other line.
or ( Sum( (Y – Ycap)^2) ) = minimum compared to any pther line
This sum of squared error is also called unexplainable or residual variation.
Any line which follows these two properties is also called the least squares line.
You can take any set of numbers and use formulas below or EXCEL to find the best line.
The obvious question is if this line is good enough. Can we use this line for projecting
trends or forecasting. To answer this question we perform Correlation analysis.
The strating point of Correlation analysis is to find the total variation of Y. Recall from
basic statistics formulas that total variation = Sum ( ( Y – Ybar) ^ 2)
Note that deviding this total variation by n – 1 gives us variance and the square root of
variance gives us the standard deviation.
In general, if we start out with a large amount of variation and after doing the regression
we end up with a very small of residual variation, we claim that we have done a good job
of explaining and our model is good. Suppose our total variation was 200 and we are
left with a residual or unexplainable variation of 30, we have explained 170 out of 200 units
of variation. Our coefficient of determination would be 170/200 or 0.85. In other words
Coeff. of determination tells us the proportion of variation we have suceeded in explaining
by doing the regression model. At most we can explain will be all of it, in that case
R-squared = 1. If we do not explain anything then R-squared = 0.
When R-squared is 1 the value of R (Coeff. of correlation) will be = +1 or -1.
This is when we have a perfect model. When Rsqaured is 0, R will also be 0 and we
say that X and Y have no correlation. Correlation measures degree of linear relationship
between X and Y. If R is closer to +1 or -1, we can conclude that there appears to be
strong linear relationship between X and Y. If R is close to 0, we will say that there
does not appear to be linear relationship between X and Y.
When R is + it is called positive correlaion or we say that X and Y have direct relationship.
When R is – it is called negative correlaion or we say that X and Y have inverse relationship.
We will first learn how to find the regression line and R-squared using formulas.
Soppose the sales for the last 9 periods for a company were 80000, 90000, 120000,
90000, 110000, 12000,15000, 90000, 140000 units respectively. We want to forecast
for the tenth period using trend analysis or regression analysis.
Note that we have two variables. Sales and Time. We will make the data handing
easier by using sales in thousands and time will start from 1,2,3 etc. Note that n = 9
Time is the independent variable X and sales is the dependent variable Y.
X Y X.Y X^2 Ycap Y – Ycap (Y-Ycap)^2 Y – Ybar (Y-Ybar)^2
1 80 80 1 88 -8 64 -30 900
9 2 90 180 4 93.5 -3.5 12.25 -20 400
# of Obs 3 120 360 9 99 21 441 10 100
4 90 360 16 104.5 -14.5 210.25 -20 400
5 110 550 25 110 0 0 0 0
6 120 720 36 115.5 4.5 20.25 10 100
7 150 1050 49 121 29 841 40 1600
8 90 720 64 126.5 -36.5 1332.25 -20 400
9 140 1260 81 132 8 64 30 900
Totals 45 990 5280 285 0 2985 0 4800
Xbar = 5 b = 5.5 Residual Total
Ybar = 110 a = 82.5
Ycap = 82.5 + 5.5 X R-Squared = 0.378125 R = 0.6149186938
Formulas used above: Ybar = Sum(Y) / n Xbar = Sum(X) / n
b = (Sum(XY) – n * Xbar * Ybar) / (Sum(X^2) – n * (Xbar^2))
a = Ybar – b * Xbar
Ycap = a + b * X
Unexplainable or Residual Variation = Sum ( (Y – Ycap) ^ 2 ) )
Total Variation = Sum ( ( Y – Ybar) ^ 2 )
R- Squared = ( Total – Residual) / Total
You can solve this problem using Regression in EXCEL’s Data Analysis Package
Go to TOOLS. If you do not have DATA ANALYSIS Choose ADD IN
Select DATA ANALYSIS and it will be added to your Tools menu.
Within data analysis choose regression. You can chck labels to show names.
Time Sales
1 80
2 90
3 120
4 90
5 110
6 120
7 150
8 90
9 140
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.6149186938
R Square 0.378125
Adjusted R Square 0.2892857143
Standard Error 20.6501470074
Observations 9
ANOVA
df SS MS F Significance F
Regression 1 1815 1815 4.256281407 0.0780101022
Residual 7 2985 426.4285714286
Total 8 4800
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 82.5 15.0019839958 5.4992726311 0.0009072363 47.0259701991 117.9740298009 47.0259701991 117.9740298009
Time 5.5 2.6659225152 2.0630757153 0.0780101022 -0.8039005226 11.8039005226 -0.8039005226 11.8039005226
RESIDUAL OUTPUT
Observation Predicted Sales Residuals
1 88 -8
2 93.5 -3.5
3 99 21
4 104.5 -14.5
5 110 0
6 115.5 4.5
7 121 29
8 126.5 -36.5
9 132 8

Time Line Fit Plot

Sales 1 2 3 4 5 6 7 8 9 80 90 120 90 110 120 150 90 140 Predicted Sales 1 2 3 4 5 6 7 8 9 88 93.5 99 104.5 110 115.5 121 126.5 132Time

 

Sales

 

 

 

Formulas

Trend Analysis
n = 14 Using Formulas
Y X X.Y X^2 Ycap Y-Ycap (Y-Ycap)^2 Y-Ybar (Y-Ybar)^2
# Mergers Year
4 1 4 1 18.543 -14.543 211.495 -27.214 740.617
17 2 34 4 20.492 -3.492 12.196 -14.214 202.046
19 3 57 9 22.442 -3.442 11.846 -12.214 149.189
45 4 180 16 24.391 20.609 424.722 13.786 190.046
25 5 125 25 26.341 -1.341 1.797 -6.214 38.617
37 6 222 36 28.290 8.710 75.862 5.786 33.474
44 7 308 49 30.240 13.760 189.350 12.786 163.474
35 8 280 64 32.189 2.811 7.902 3.786 14.332
27 9 243 81 34.138 -7.138 50.958 -4.214 17.760
31 10 310 100 36.088 -5.088 25.887 -0.214 0.046
21 11 231 121 38.037 -17.037 290.272 -10.214 104.332
38 12 456 144 39.987 -1.987 3.947 6.786 46.046
45 13 585 169 41.936 3.064 9.386 13.786 190.046
49 14 686 196 43.886 5.114 26.156 17.786 316.332
Total 437 105 3721 1015 0.000 1341.776 0.000 2206.357
AVERAGE 31.2142857143 7.5
b= 1.949450549 Unexp Var = 1341.776 R-Square= 0.391859188
a= 16.59340659 Total Var = 2206.357 R = 0.625986572
Ycap = 16.59 + 1.94 X Exp Var = 864.581 May or May not br good Model
Standard Error = 10.574245442 Standard Err of b = 0.7010656462
Calculated T Stat = 2.7806961581 Calculated F Stat = 7.7322682773
USING EXCEL BUILT-IN FUNTION
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.625986572
R Square 0.391859188
Adjusted R Square 0.341180787
Standard Error 10.57424475
Observations 14
ANOVA
df SS MS F Significance F
Regression 1 864.5813187 864.5813187 7.732272141 0.016628916
Residual 12 1341.775824 111.814652
Total 13 2206.357143
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 16.59340659 5.969358487 2.779763793 0.016657688 3.587291958 29.59952123 3.587291958 29.59952123
Year 1.949450549 0.7010656 2.780696341 0.016628916 0.421959851 3.476941248 0.421959851 3.476941248
RESIDUAL OUTPUT
Observation Predicted # Mergers Residuals
1 18.54285714 -14.54285714
2 20.49230769 -3.492307692
3 22.44175824 -3.441758242
4 24.39120879 20.60879121
5 26.34065934 -1.340659341
6 28.29010989 8.70989011
7 30.23956044 13.76043956
8 32.18901099 2.810989011
9 34.13846154 -7.138461538
10 36.08791209 -5.087912088
11 38.03736264 -17.03736264
12 39.98681319 -1.986813187
13 41.93626374 3.063736264
14 43.88571429 5.114285714

Time Series

Example of Forecasting
Three period Simple Three period Weighted Average
Moving Average 0.2 0.3 0.5
Error Error
Period Sales Yt Forecast Ft Abs(Yt – Ft) (Yt-Ft)^2 APE Forecast Ft Abs(Yt – Ft) (Yt-Ft)^2 APE
1 24
2 25
3 27
4 24 25.333 1.333 1.778 5.556 25.800 1.800 3.240 7.500
5 20 25.333 5.333 28.444 26.667 25.100 5.100 26.010 25.500
6 21 23.667 2.667 7.111 12.698 22.600 1.600 2.560 7.619
7 23 21.667 1.333 1.778 5.797 21.300 1.700 2.890 7.391
8 28 21.333 6.667 44.444 23.810 21.800 6.200 38.440 22.143
9 24 24.000 0.000 0.000 0.000 25.100 1.100 1.210 4.583
10 26 25.000 1.000 1.000 3.846 25.000 1.000 1.000 3.846
11 24 26.000 2.000 4.000 8.333 25.800 1.800 3.240 7.500
12 29 24.667 4.333 18.778 14.943 24.600 4.400 19.360 15.172
26.333 2.741 11.926 11.294 26.900 2.744 10.883 11.251
Forecast MAD MSSE MAPE Forecast MAD MSSE MAPE
Exponential Smoothing Method Exponential Smoothing Method
Alpha = 0.2 Alpha = 0.7
Error Error
Period Sales Yt Forecast Ft Abs(Yt – Ft) (Yt-Ft)^2 APE Forecast Ft Abs(Yt – Ft) (Yt-Ft)^2 APE
1 24
2 25 24.000 1.000 1.000 4.000 24.000 1.000 1.000 4.000
3 27 24.200 2.800 7.840 10.370 24.700 2.300 5.290 8.519
4 24 24.760 0.760 0.578 3.167 26.310 2.310 5.336 9.625
5 20 24.608 4.608 21.234 23.040 24.693 4.693 22.024 23.465
6 21 23.686 2.686 7.217 12.792 21.408 0.408 0.166 1.942
7 23 23.149 0.149 0.022 0.648 21.122 1.878 3.525 8.164
8 28 23.119 4.881 23.821 17.431 22.437 5.563 30.950 19.869
9 24 24.095 0.095 0.009 0.398 26.331 2.331 5.434 9.713
10 26 24.076 1.924 3.700 7.399 24.699 1.301 1.692 5.003
11 24 24.461 0.461 0.213 1.921 25.610 1.610 2.591 6.707
12 29 24.369 4.631 21.447 15.969 24.483 4.517 20.404 15.576
25.295 2.181 7.916 8.831 27.645 2.537 8.947 10.235
Forecast MAD MSSE MAPE Forecast MAD MSSE MAPE
Naïve Method Number 1 Naïve Method Number 2
Forecast Ft ERROR:#NAME? Forecast Ft 24.5833333333
Error Error
Period Sales Yt Forecast Ft Abs(Yt – Ft) (Yt-Ft)^2 APE Forecast Ft Abs(Yt – Ft) (Yt-Ft)^2 APE
1 24 24.583 0.583 0.340 2.431
2 25 24 1.000 1.000 4.000 24.583 0.417 0.174 1.667
3 27 25 2.000 4.000 7.407 24.583 2.417 5.840 8.951
4 24 27 3.000 9.000 12.500 24.583 0.583 0.340 2.431
5 20 24 4.000 16.000 20.000 24.583 4.583 21.007 22.917
6 21 20 1.000 1.000 4.762 24.583 3.583 12.840 17.063
7 23 21 2.000 4.000 8.696 24.583 1.583 2.507 6.884
8 28 23 5.000 25.000 17.857 24.583 3.417 11.674 12.202
9 24 28 4.000 16.000 16.667 24.583 0.583 0.340 2.431
10 26 24 2.000 4.000 7.692 24.583 1.417 2.007 5.449
11 24 26 2.000 4.000 8.333 24.583 0.583 0.340 2.431
12 29 24 5.000 25.000 17.241 24.583 4.417 19.507 15.230
24.5833333333 29 2.818 9.909 11.378 24.583 2.014 6.410 8.340
Average Forecast MAD MSSE MAPE Forecast MAD MSSE MAPE
Trend Analysis
Using Regression
X Y X^2 X.Y Error
Period Sales Yt Forecast Ft Abs(Yt – Ft) (Yt-Ft)^2 APE
1 24 1 24 23.333 0.667 0.444 2.778
2 25 4 50 23.561 1.439 2.072 5.758
3 27 9 81 23.788 3.212 10.318 11.897
4 24 16 96 24.015 0.015 0.000 0.063
5 20 25 100 24.242 4.242 17.998 21.212
6 21 36 126 24.470 3.470 12.039 16.522
7 23 49 161 24.697 1.697 2.880 7.378
8 28 64 224 24.924 3.076 9.460 10.985
9 24 81 216 25.152 1.152 1.326 4.798
10 26 100 260 25.379 0.621 0.386 2.389
11 24 121 264 25.606 1.606 2.579 6.692
12 29 144 348 25.833 3.167 10.028 10.920
6.5 24.5833333333 650 1950 26.061 2.030 5.794 8.449
Average Average Forecast MAD MSSE MAPE
Slope b = 0.2273
Intecpt a= 23.1061
Equation is Estimated Y = 23.1061 + 0.2273 X


You Need a Professional Writer To Work On Your Paper?