LINEAR REGRESSION
- Introduction
- very often, when 2 (or more) variables are observed, the relationship between them can be
visualized
- predictions are often required in economics and the physical sciences, based on existing
and historical data
- regression analysis is used to help formulate these predictions and relationships
- linear regression is a special kind of regression analysis in which 2 variables are
studied and a straight-line relationship is assumed
- linear regression is important because
- there exist many relationships that are of this form
- it provides close approximations to complicated relationships which would otherwise be
difficult to describe
- the 2 variables are divided into (i) independent variable and (ii) dependent variable
- Dependent Variable is the variable that we want to forecast
- Independent Variable is the variable that we use to make the forecast
- e.g. Time vs. GNP (time is independent, GNP is dependent)
- scatter diagrams are used to present graphically the relationship between the 2
variables
- usually the independent variable is drawn on the horizontal axis (X) and the dependent
variable on the vertical axis (Y)
- the regression line is also called the regression line of Y on X
- Assumptions
- there is a linear relationship as determined (observed) from the scatter diagram
- the dependent values (Y) are independent of each other, i.e. a large value of Y on the
first observation does not make a large value more likely on the second and subsequent
observations. In simple terms, there should be no autocorrelation
- for each value of X the corresponding Y values are normally distributed
- the standard deviations of the Y values for each value of X are the same, i.e. homoscedasticity
- Process
- observe and note what is happening in a systematic way
- form some kind of theory about the observed facts
- draw a scatter diagram to visualize relationship
- generate the relationship by mathematical formula
- make use of the mathematical formula to predict
- Method of Least Squares
- from a scatter diagram, there is virtually no limit as to the number of lines that can
be drawn to make a linear relationship between the 2 variables
- the objective is to create a BEST FIT line to the data concerned
- the criterion is called the method of least squares
- i.e. the sum of the squares of the vertical deviations of the points from the line must
be a minimum (vertical, because the dependent variable is drawn on the vertical axis)
- the linear relationship between the dependent variable (Y) and the independent variable
(X) can be written as Y = a + bX, where a and b are parameters describing the vertical
intercept and the slope of the regression line respectively
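The least-squares criterion can be illustrated with a short Python sketch. The function name `sum_squared_deviations` and the four data points are made up for illustration and do not come from these notes. For any candidate line Y = a + bX, the function adds up the squared vertical deviations, so that competing lines can be compared.

```python
def sum_squared_deviations(xs, ys, a, b):
    """Sum of squared vertical deviations of the points from Y = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# Three candidate lines (a, b); the first is the least-squares line
# for this data, the other two are arbitrary alternatives.
candidates = [(0.0, 1.9), (0.0, 2.0), (1.0, 1.5)]
for a, b in candidates:
    print(f"Y = {a} + {b}X  ->  {sum_squared_deviations(xs, ys, a, b):.2f}")
```

For this data the first line gives the smallest sum (0.70, against 1.00 and 1.50), which is what makes it the best-fit line under this criterion.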
- Calculating a and b
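A minimal sketch of the calculation, assuming the standard least-squares results b = (nΣXY - ΣXΣY) / (nΣX² - (ΣX)²) and a = mean(Y) - b·mean(X). The function name `fit_line` and the sample data are illustrative assumptions, not part of these notes.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line Y = a + b*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are equal; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = sum_y / n - b * sum_x / n              # intercept: mean(Y) - b*mean(X)
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

Only the sums ΣX, ΣY, ΣXY and ΣX² are needed, which is why a table of X, Y, X² and XY columns is usually prepared first.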
- Correlation
- when the value of one variable is related to the value of another, they are said to be
correlated
- there are 3 types of correlation: (i) perfectly correlated; (ii) partially correlated;
(iii) uncorrelated
- Coefficient of Correlation (r) measures the strength and direction of such a relationship


- the value of r ranges from -1 (perfectly correlated in the negative direction) to +1
(perfectly correlated in the positive direction)
- when r = 0, the 2 variables are not linearly correlated
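The three cases can be checked numerically. The sketch below assumes the usual product-moment formula r = (nΣXY - ΣXΣY) / sqrt((nΣX² - (ΣX)²)(nΣY² - (ΣY)²)); the function name `correlation` and the data are hypothetical.

```python
from math import sqrt


def correlation(xs, ys):
    """Coefficient of correlation r between two equal-length samples."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data showing the two extremes and the uncorrelated case.
xs = [1, 2, 3, 4]
print(correlation(xs, [3, 5, 7, 9]))    # points on a rising line
print(correlation(xs, [9, 7, 5, 3]))    # points on a falling line
print(correlation(xs, [1, -1, -1, 1]))  # no linear relationship
```

The formula is undefined when either variable is constant, since the denominator is then zero; the sketch does not guard against that case.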
- Coefficient of Determination
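The coefficient of determination is conventionally written r² and, for a straight-line fit, equals the square of the coefficient of correlation. It is usually defined as the proportion of the total variation in Y that is explained by the regression line. The symbols below are assumed here rather than taken from the notes: \hat{Y} is the value on the regression line and \bar{Y} is the mean of Y.

```latex
r^2 \;=\; \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}
    \;=\; 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}
```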
- Standard Error of Estimate (SEE)
- a measure of the variability of the observed points about the regression line, i.e.
their dispersion around the regression line
- it tells how much the observed (raw) values of the dependent variable vary from the
values expected from the regression

- the SEE allows us to generate a confidence interval for the regression line, as we did
in the estimation of means
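A minimal sketch of the SEE, assuming the common definition SEE = sqrt(Σ(Y - Ŷ)² / (n - 2)), where Ŷ = a + bX is the value on the regression line. Some texts divide by n instead; n - 2 is used here because it matches the n-2 degrees of freedom quoted for the t-distribution in these notes. The function name and data are hypothetical.

```python
from math import sqrt


def standard_error_of_estimate(xs, ys, a, b):
    """SEE of the line Y = a + b*X: sqrt(sum of squared residuals / (n - 2))."""
    n = len(xs)
    if n <= 2:
        raise ValueError("at least 3 observations are needed")
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (n - 2))


# Hypothetical data; a = 0 and b = 1.9 are its least-squares estimates.
see = standard_error_of_estimate([1, 2, 3, 4], [2, 4, 5, 8], a=0.0, b=1.9)
print(round(see, 4))  # 0.5916
```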
- Confidence interval for the regression line (estimating the expected value)
- estimating the mean value of Y for a given value of X is a very important practical
problem
- e.g. if a corporation's profit Y is linearly related to its advertising expenditures X,
the corporation may want to estimate the mean profit for a given expenditure X
- this is given by the formula

- at n-2 degrees of freedom for the t-distribution
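A commonly used form of this interval is written out below, as an assumption about what the notes intend. Here X_0 is the chosen value of X, \hat{Y}_0 = a + bX_0 is the corresponding point on the regression line, n is the number of observations, and t is the critical value of the t-distribution with n-2 degrees of freedom at the chosen confidence level.

```latex
\hat{Y}_0 \;\pm\; t_{\alpha/2,\,n-2}\;\mathrm{SEE}\,
\sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}}
```

The interval is narrowest at X_0 = \bar{X} and widens as X_0 moves away from the mean of the observed X values.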
- Confidence interval for individual prediction
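- for technical reasons (an individual value of Y scatters about its mean, in addition to
the uncertainty in the estimated mean itself), the above formula must be amended and is given by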
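In its usual textbook form, assumed here, the amended interval adds 1 under the square root: Ŷ₀ ± t · SEE · sqrt(1 + 1/n + (X₀ - mean(X))² / Σ(X - mean(X))²). The sketch below computes both intervals so that they can be compared. The function name `regression_interval`, its parameters and the inputs are hypothetical, and the t value is supplied from a t-table rather than computed.

```python
from math import sqrt


def regression_interval(x0, a, b, see, n, x_mean, sxx, t_crit, individual=False):
    """Interval around the fitted value a + b*x0.

    sxx is the sum of (X - mean(X))**2 and t_crit the t-table value for
    n - 2 degrees of freedom. With individual=True the extra "1 +" term
    widens the interval to cover a single new observation.
    """
    y0 = a + b * x0
    inside = 1.0 / n + (x0 - x_mean) ** 2 / sxx
    if individual:
        inside += 1.0
    half_width = t_crit * see * sqrt(inside)
    return y0 - half_width, y0 + half_width


# Hypothetical inputs: n = 4 points, fitted line Y = 0 + 1.9X,
# and t_crit = 4.303 (95% two-sided, 2 degrees of freedom).
args = dict(a=0.0, b=1.9, see=sqrt(0.35), n=4, x_mean=2.5, sxx=5.0, t_crit=4.303)
print(regression_interval(3.0, **args))                   # mean of Y at X = 3
print(regression_interval(3.0, individual=True, **args))  # one new Y at X = 3
```

Both intervals are centred on the same fitted value; the interval for an individual prediction is always the wider of the two.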

An Example
|      | Accounting X | Statistics Y | X²       | Y²       | XY       |
|------|--------------|--------------|----------|----------|----------|
| 1    | 74.00        | 81.00        | 5476.00  | 6561.00  | 5994.00  |
| 2    | 93.00        | 86.00        | 8649.00  | 7396.00  | 7998.00  |
| 3    | 55.00        | 67.00        | 3025.00  | 4489.00  | 3685.00  |
| 4    | 41.00        | 35.00        | 1681.00  | 1225.00  | 1435.00  |
| 5    | 23.00        | 30.00        | 529.00   | 900.00   | 690.00   |
| 6    | 92.00        | 100.00       | 8464.00  | 10000.00 | 9200.00  |
| 7    | 64.00        | 55.00        | 4096.00  | 3025.00  | 3520.00  |
| 8    | 40.00        | 52.00        | 1600.00  | 2704.00  | 2080.00  |
| 9    | 71.00        | 76.00        | 5041.00  | 5776.00  | 5396.00  |
| 10   | 33.00        | 24.00        | 1089.00  | 576.00   | 792.00   |
| 11   | 30.00        | 48.00        | 900.00   | 2304.00  | 1440.00  |
| 12   | 71.00        | 87.00        | 5041.00  | 7569.00  | 6177.00  |
| Sum  | 687.00       | 741.00       | 45591.00 | 52525.00 | 48407.00 |
| Mean | 57.25        | 61.75        | 3799.25  | 4377.08  | 4033.92  |
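The totals in the Sum row are all that is needed to work the example through. The sketch below assumes the standard least-squares and product-moment formulas; the variable names are illustrative.

```python
from math import sqrt

# Totals from the "Sum" row of the table (n = 12 students).
n = 12
sum_x, sum_y = 687.0, 741.0          # Accounting (X), Statistics (Y)
sum_x2, sum_y2 = 45591.0, 52525.0    # sums of X squared and Y squared
sum_xy = 48407.0                     # sum of the products XY

num = n * sum_xy - sum_x * sum_y     # numerator shared by b and r
den_x = n * sum_x2 - sum_x ** 2
den_y = n * sum_y2 - sum_y ** 2

b = num / den_x                      # slope
a = sum_y / n - b * sum_x / n        # intercept: mean(Y) - b*mean(X)
r = num / sqrt(den_x * den_y)        # coefficient of correlation
r2 = r ** 2                          # coefficient of determination

print(f"Y = {a:.4f} + {b:.4f}X")       # Y = 7.0194 + 0.9560X
print(f"r = {r:.2f}, r^2 = {r2:.4f}")  # r = 0.92, r^2 = 0.8453
```

The values of r and r² agree with those quoted in the interpretation of this example.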

Figure 1: Scatter Diagram of Raw Data




Figure 2: Scatter Diagram and Regression Line

Interpretation/Conclusion
There is a linear relation between the results of Accounting and Statistics, as shown
by the scatter diagram in Figure 1. A linear regression analysis was done using the
least-squares method. The resultant regression line is represented by
Y = 7.02 + 0.956X, in which X represents the results of
Accounting and Y those of Statistics. Figure 2 shows the regression line. In this example,
the choice of dependent and independent variables is arbitrary. It can be said that the
results of Statistics are correlated with those of Accounting, or vice versa.
The Coefficient of Determination (r²)
is 0.8453. Since this value is close to 1, the two variables are strongly correlated:
nearly 85% of the variation in Y is explained by the regression line.
The Coefficient of Correlation (r) has a value of 0.92. This indicates that the two
variables are positively correlated (Y increases as X increases).
Copyright: © Southern Cross University, 1995. Permission is hereby granted
to use this document for personal use and in courses of instruction at educational
institutions provided that the article is used in full and this copyright statement is
reproduced. Permission is also given to mirror this document on Worldwide Web servers. Any
other usage is expressly prohibited without the express permission of Southern Cross
University.