Simple Regression
How do we know what will
happen in the future if we adopt a policy?
One way is to look at what
happened in other places when similar policies were adopted.
Using the Chi Square
test, we could test nominal data.
The simplest example is
the 2 by 2 crosstab.
Crosstab of Dichotomous (Yes-No) Data

                         | Dependent Variable |
                         |   No    |   Yes    |
Dependent Variable | No  |    7    |    13    |
                   | Yes |   13    |     7    |
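As a reminder of how the Chi Square test works on a crosstab like this one, here is a minimal Python sketch. It uses the four counts from the table and the usual formula (observed minus expected, squared, divided by expected, summed over all cells). It applies no continuity correction, and the variable names are only for illustration.

```python
# Observed counts from the 2 by 2 crosstab (rows: No, Yes; columns: No, Yes).
observed = [[7, 13],
            [13, 7]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count if the two variables were unrelated.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

print(round(chi_square, 2))
```

With one degree of freedom, the critical value at the .05 level is 3.841, so the statistic is compared against that number to judge significance.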
But sometimes the policy
variable, or the expected outcomes of that policy, are interval level data.
Linear Regression (also called
Ordinary Least Squares) can be used on interval level data.
For example, imagine a
municipality is considering adopting a 1.5 percent local payroll tax.
What will happen to wages?
What will happen to new job creation?
It would be very easy if we
could take 2 similar municipalities, impose a 1.5 % payroll tax on one, and no
such tax on the other, and see how their subsequent development differs.
But in a free society, we
cannot impose our experiments on other people.
Municipalities, however, may
conduct their own experiments.
One city may impose a half
percent payroll tax, another a two percent tax, and another a five
percent tax. It is possible that no
one has ever imposed a 1.5% payroll tax before.
Different levels of payroll tax
are interval data.
On the outcome side, the
economies of these cities may grow at different rates from two to six percent a
year. Different growth rates are
also interval data.
When both our dependent and independent variables are interval
data, we can use regression analysis. From
our results we can estimate the change in our dependent variable (like
new job creation) for each percentage point of payroll tax.
Thus we could predict the response to a 1.5 percent payroll tax, even if
no one had ever had a payroll tax of that rate before.
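To make the prediction idea concrete, here is a minimal Python sketch that fits a least squares line and reads off the predicted growth at a 1.5 percent tax. The five cities and their numbers are hypothetical, invented only for illustration; they are not real data, and the variable names are assumptions as well.

```python
# Hypothetical data, invented for illustration only.
tax_rate = [0.5, 1.0, 2.0, 3.0, 5.0]   # X: payroll tax, in percent
growth = [5.8, 5.6, 4.6, 4.2, 2.3]     # Y: annual growth, in percent

n = len(tax_rate)
mean_x = sum(tax_rate) / n
mean_y = sum(growth) / n

# Least squares slope: how much Y changes for each one-unit change in X.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tax_rate, growth))
sxx = sum((x - mean_x) ** 2 for x in tax_rate)
slope = sxy / sxx

# Intercept: predicted Y when X is zero.
intercept = mean_y - slope * mean_x

# Predict growth at a tax rate that no city in the data actually used.
predicted_growth = intercept + slope * 1.5
print(slope, intercept, predicted_growth)
```

Note that 1.5 falls between the lowest and highest tax rates in the data. Predicting far outside the observed range (say, a 20 percent tax) would rest on much weaker ground.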
The basic assumption of simple
regression is that changes in X cause changes in Y.
The first thing to do is put
your Ys in one column, and your Xs in another, and graph them.
Linear regression assumes
that the relationship is a straight line. If
it is not, you may have to do a transformation of the data to make them a
straight line. This is something
that you would learn in a course devoted entirely to linear regression.
The assumptions of linear
regression are:
1. Both X and Y are
interval data
2. The relationship
between X and Y is linear.
3. The errors (differences
between expected and observed values of Y) are normally distributed with a mean
of zero. This results in a bell
shaped curve. Note: in the last class
we saw that probabilistic reasoning and a bell shaped curve are used to estimate
significance.
4. The spread of the errors is constant
regardless of the value of X. (If the
errors were greater over a certain range, usually the high values, then our
predictions and estimates of significance would not be as valid over that
range.)
5. Errors must be
independent of each other. Violations
of this assumption can happen if the subject of the experiment has a memory, or
if past independent variables are still influencing behavior.
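Assumptions 3 and 4 are usually checked by looking at the errors (residuals) after the line has been fitted. The following is a rough Python sketch of that idea. The six data points are hypothetical, `fit_line` is a helper written just for this example, and splitting the data into a low-X half and a high-X half is only one informal way to eyeball whether the errors grow with X. With real data you would normally plot the residuals instead.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data, sorted by X, invented for illustration only.
tax_rate = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
growth = [5.9, 5.4, 4.8, 3.9, 3.3, 2.2]

intercept, slope = fit_line(tax_rate, growth)

# Residual = observed Y minus the Y predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(tax_rate, growth)]

# Assumption 3: the errors should average out to zero.
mean_residual = sum(residuals) / len(residuals)

# Assumption 4: the typical size of the error should be similar at low and high X.
half = len(residuals) // 2
avg_abs_low = sum(abs(r) for r in residuals[:half]) / half
avg_abs_high = sum(abs(r) for r in residuals[half:]) / (len(residuals) - half)

print(mean_residual, avg_abs_low, avg_abs_high)
```

A least squares line with an intercept always produces residuals that average zero, so the first check mainly confirms the arithmetic. The comparison of the two halves is the more informative one.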
EXCEL Instructions
The instructions for
interpreting a regression analysis are in your Essential Statistics book.
The instructions for generating
a simple regression on EXCEL are as follows:
Enter your data, Y or
dependent variable in one column, and your X(s) or independent variable(s) in
the next column.
Click your mouse on: Tools/
Data Analysis/ Regression.
On the resulting page:
Click on the X range and highlight your Xs
Click on the Y range and highlight your Ys
Hit enter and your output will appear on a new worksheet
(usually sheet 4).
Your Adjusted R Square is the
proportion of the variation in Y explained by changes in X.
Your "Intercept" is the
predicted value of Y when X is zero.
Your "X Variable 1" is
the change in Y for each unit of X.
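If you want to see where these three numbers come from, the following Python sketch computes them by hand for a small data set. The data are hypothetical and invented for illustration. The Adjusted R Square line uses the standard adjustment 1 - (1 - R Square)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of X variables (one, in simple regression).

```python
# Hypothetical data, invented for illustration only.
tax_rate = [0.5, 1.0, 2.0, 3.0, 5.0]   # X
growth = [5.8, 5.6, 4.6, 4.2, 2.3]     # Y

n = len(tax_rate)   # number of observations
k = 1               # number of X variables in simple regression

mean_x = sum(tax_rate) / n
mean_y = sum(growth) / n
sxx = sum((x - mean_x) ** 2 for x in tax_rate)
syy = sum((y - mean_y) ** 2 for y in growth)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tax_rate, growth))

x_variable_1 = sxy / sxx                      # change in Y per unit of X
intercept = mean_y - x_variable_1 * mean_x    # predicted Y when X is zero
r_square = sxy ** 2 / (sxx * syy)             # share of variation in Y explained
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(intercept, x_variable_1, r_square, adjusted_r_square)
```

Adjusted R Square is always a little lower than R Square, because it discounts for the number of X variables relative to the number of observations.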
Homework is on the Blackboard Course site.