Violent Crime data and regression analysis
State | VCPC | $PoliceExPC | $JudicialExPC | $CorrectionExPC | Income (PC) | Dummy
Alabama | 0.008 | 0.019 | 0.033 | 0.044 | 16301.0 | 1
Alaska | 0.008 | 0.089 | 0.172 | 0.226 | 21826.0 | 1
Arizona | 0.007 | 0.025 | 0.019 | 0.088 | 17023.0 | 0
Arkansas | 0.006 | 0.017 | 0.013 | 0.049 | 15393.0 | 0
California | 0.011 | 0.026 | 0.035 | 0.098 | 21381.0 | 0
Colorado | 0.006 | 0.012 | 0.034 | 0.075 | 19992.0 | 0
Connecticut | 0.005 | 0.026 | 0.055 | 0.135 | 27172.0 | 0
Delaware | 0.007 | 0.052 | 0.073 | 0.128 | 20827.0 | 0
Florida | 0.012 | 0.015 | 0.032 | 0.078 | 19403.0 | 1
Georgia | 0.007 | 0.016 | 0.008 | 0.087 | 18088.0 | 1
Hawaii | 0.003 | 0.011 | 0.097 | 0.096 | 22109.0 | 0
Idaho | 0.003 | 0.028 | 0.024 | 0.059 | 16174.0 | 0
Illinois | 0.010 | 0.020 | 0.017 | 0.056 | 21624.0 | 0
Indiana | 0.005 | 0.019 | 0.009 | 0.056 | 18208.0 | 0
Iowa | 0.003 | 0.017 | 0.040 | 0.056 | 18107.0 | 0
Kansas | 0.005 | 0.015 | 0.033 | 0.071 | 19100.0 | 0
Kentucky | 0.005 | 0.027 | 0.039 | 0.056 | 16283.0 | 0
Louisiana | 0.011 | 0.029 | 0.022 | 0.070 | 15793.0 | 1
Maine | 0.001 | 0.019 | 0.025 | 0.043 | 18128.0 | 0
Maryland | 0.010 | 0.037 | 0.042 | 0.119 | 22976.0 | 0
Massachusetts | 0.008 | 0.029 | 0.057 | 0.113 | 23549.0 | 0
Michigan | 0.008 | 0.021 | 0.021 | 0.096 | 19589.0 | 0
Minnesota | 0.004 | 0.023 | 0.028 | 0.045 | 20291.0 | 0
Mississippi | 0.004 | 0.014 | 0.009 | 0.035 | 13902.0 | 1
Missouri | 0.007 | 0.020 | 0.018 | 0.040 | 18808.0 | 0
Montana | 0.002 | 0.030 | 0.042 | 0.048 | 16054.0 | 0
Nebraska | 0.004 | 0.024 | 0.018 | 0.052 | 19151.0 | 0
Nevada | 0.009 | 0.022 | 0.013 | 0.111 | 21030.0 | 0
New Hampshire | 0.001 | 0.023 | 0.059 | 0.044 | 21537.0 | 0
New Jersey | 0.006 | 0.036 | 0.038 | 0.086 | 25903.0 | 0
New Mexico | 0.009 | 0.029 | 0.047 | 0.085 | 15192.0 | 0
New York | 0.011 | 0.016 | 0.063 | 0.114 | 24021.0 | 0
North Carolina | 0.007 | 0.023 | 0.036 | 0.087 | 17549.0 | 1
North Dakota | 0.001 | 0.011 | 0.016 | 0.026 | 17101.0 | 0
Ohio | 0.005 | 0.015 | 0.012 | 0.068 | 18804.0 | 0
Oklahoma | 0.006 | 0.015 | 0.031 | 0.063 | 16344.0 | 0
Oregon | 0.005 | 0.024 | 0.050 | 0.066 | 18343.0 | 0
Pennsylvania | 0.004 | 0.026 | 0.021 | 0.052 | 20511.0 | 0
Rhode Island | 0.004 | 0.028 | 0.073 | 0.107 | 20256.0 | 0
South Carolina | 0.010 | 0.024 | 0.013 | 0.091 | 15993.0 | 1
South Dakota | 0.002 | 0.020 | 0.029 | 0.047 | 17183.0 | 0
Tennessee | 0.008 | 0.014 | 0.019 | 0.072 | 17367.0 | 1
Texas | 0.008 | 0.012 | 0.016 | 0.087 | 18093.0 | 1
Utah | 0.003 | 0.020 | 0.033 | 0.057 | 15096.0 | 0
Vermont | 0.001 | 0.046 | 0.050 | 0.073 | 18649.0 | 0
Virginia | 0.004 | 0.041 | 0.035 | 0.099 | 20560.0 | 0
Washington | 0.005 | 0.030 | 0.011 | 0.092 | 20851.0 | 0
West Virginia | 0.002 | 0.013 | 0.025 | 0.021 | 15432.0 | 0
Wisconsin | 0.003 | 0.010 | 0.027 | 0.077 | 18943.0 | 0
Wyoming | 0.003 | 0.023 | 0.044 | 0.058 | 18660.0 | 0
Original data: HomeworkData.xls
PA453, Graduate School of Public Affairs, University of Missouri-Columbia
Winter semester. April 2001
I use Excel's regression function to identify which variables affect violent crime per capita across all 50 states (the population). The study examines the relationship between violent crime per capita and justice activity expenditures, together with other variables.
(Assumptions)
1. Sample size is 50 (1993 data).
2. The dependent variable is violent crime per capita (VCPC).
3. The predictor variables are police expenditure per capita ($PoliceExPC), judicial expenditure per capita ($JudicialExPC), and correction expenditure per capita ($CorrectionExPC).
4. Every one of these variables is quantitative; the only categorical variable is the South dummy, which is added in a later model.
5. I use Excel's Regression data-analysis tool to obtain an estimated multiple regression model, that is, a model with more than one predictor variable.
(The input screen)

(The regression model)
Y = 0.0027 - 0.0368*POLICE - 0.0567*JUDICIAL + 0.0775*CORRECTION
This is a multiple linear, or first-order, regression model with 3 predictor variables.
1. 0.0027 is the estimated intercept.
2. -0.0368 is the slope coefficient of police expenditure per capita, and POLICE is the generic term for that predictor variable.
3. -0.0567 is the slope coefficient of judicial expenditure per capita, and JUDICIAL is the generic term for that predictor variable.
4. 0.0775 is the slope coefficient of correction expenditure per capita, and CORRECTION is the generic term for that predictor variable.
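As a minimal sketch of how the fitted equation is used, the Python snippet below plugs one state's expenditure figures into the model. The function name `predict_vcpc` is hypothetical, and the judicial coefficient is taken as negative, in line with the interpretation given in this report.

```python
# Coefficients copied from the fitted three-predictor equation.
# Assumption: the judicial coefficient is negative (-0.0567).
INTERCEPT = 0.0027
B_POLICE = -0.0368
B_JUDICIAL = -0.0567
B_CORRECTION = 0.0775


def predict_vcpc(police, judicial, correction):
    """Return the fitted violent crime per capita for one state."""
    return (INTERCEPT
            + B_POLICE * police
            + B_JUDICIAL * judicial
            + B_CORRECTION * correction)


# Alabama's expenditure figures from the data table.
alabama_fit = predict_vcpc(police=0.019, judicial=0.033, correction=0.044)
print(f"Fitted VCPC for Alabama: {alabama_fit:.4f}")
```

The gap between this fitted value and Alabama's observed VCPC of 0.008 is that state's residual.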
(Interpretation in terms of statistical relationship)
1. As police expenditure per capita increases by one dollar, violent crime per capita decreases by 0.0368 cases, holding all other predictor variables in the model constant.
2. As judicial expenditure per capita increases by one dollar, violent crime per capita decreases by 0.0567 cases, holding all other predictor variables in the model constant.
3. As correction expenditure per capita increases by one dollar, violent crime per capita increases by 0.0775 cases, holding all other predictor variables in the model constant.
(Interpretation of individual significant relationships)
Predictor | P-value
$PoliceExPC | 0.35154
$JudicialExPC | 0.003892
$CorrectionExPC | 3.44E-06
1. The p-value of police expenditure per capita is 0.35154, which is larger than the 0.05 significance level. Therefore there is no significant relationship between violent crime per capita and police expenditure per capita.
2. The p-value of judicial expenditure per capita is 0.0039, which is smaller than the 0.05 significance level. Therefore there is a significant relationship between violent crime per capita and judicial expenditure per capita.
3. The p-value of correction expenditure per capita is 3.44E-06, which is smaller than the 0.05 significance level. Therefore there is a significant relationship between violent crime per capita and correction expenditure per capita.
(Analysis of Variance: ANOVA)
I begin by stating the null and alternative hypotheses for violent crime per capita. I use ANOVA to determine whether one or more of the predictor variables are related to VCPC in the population. In multiple regression analysis, the null is always the no-relationship hypothesis.
(Setting hypothesis)
Null: None of the predictor variables is statistically related to violent crime per capita (all slope coefficients equal zero).
Alternative: Not all slope coefficients equal zero.
(Significance level)
Based on the consequences of making a Type I error, I set the significance level, α, at 0.05.
I use the F ratio to test the null hypothesis that none of the predictor variables is related to VCPC.
F ratio = (mean square: regression) / (mean square: residual)
The F ratio is 9.822, so the mean square regression is 9.822 times as large as the mean square residual.
If none of the 3 variables were related to violent crime per capita, the probability of obtaining an F ratio of 9.822 or greater would be 0.0000402. This is smaller than the 0.05 significance level. Under the decision rule of hypothesis testing, I do not reject the null if significance F is at least the chosen significance level of 0.05; otherwise I reject it. Therefore I reject the null hypothesis.
(Decision rule)
Significance F ≥ 0.05 ----- do not reject the null
Significance F < 0.05 ----- reject the null
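The decision rule can be sketched in a few lines of Python. The helper name `decide` is hypothetical; the probabilities are the significance F and the coefficient p-values quoted in this report.

```python
ALPHA = 0.05  # significance level chosen in the text


def decide(p_value, alpha=ALPHA):
    """Apply the decision rule: reject the null only when p < alpha."""
    return "reject the null" if p_value < alpha else "do not reject the null"


# Significance F for the overall model and the per-coefficient p-values,
# all copied from the regression output quoted in this report.
reported = {
    "overall model (significance F)": 0.0000402,
    "$PoliceExPC": 0.35154,
    "$JudicialExPC": 0.003892,
    "$CorrectionExPC": 3.44e-06,
}

decisions = {name: decide(p) for name, p in reported.items()}
for name, outcome in decisions.items():
    print(f"{name}: {outcome}")
```

Only police expenditure fails to reach significance in the three-predictor model, even though the overall model is significant.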
Consequently, I conclude that the overall model is significant: taken together, the justice activity expenditures help explain violent crime per capita.
(Simple linear regression model)
Y = VCPC
X = $PoliceExPC, $JudicialExPC, $CorrectionExPC, or Income (PC)
(Preliminary analysis)
Before computing the estimated regression models, I draw a scatter diagram of each predictor variable against VCPC, the dependent variable.
(Best-fitting linear model)
Y = a*X + b
Y value ----- violent crime per capita
X value ----- each predictor variable
a ----- slope
b ----- intercept
The parameters a and b can never be known for certain unless the study includes data for the entire population. In the linear model, the parameters do not appear as exponents. Although I do not know a and b with certainty, I can estimate them with a simple linear regression model.
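The least-squares estimates of the slope and intercept can be sketched in plain Python as follows. Here `a` stands for the slope and `b` for the intercept, `fit_line` is a hypothetical helper, and the five data points are only the first five rows of the data table, so the printed line will not match the 50-state Excel trendlines.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx             # slope
    b = y_mean - a * x_mean   # intercept: the line passes through the means
    return a, b


# Illustration only: $CorrectionExPC and VCPC for the first five states
# in the data table (Alabama through California), not the full data set.
correction = [0.044, 0.226, 0.088, 0.049, 0.098]
vcpc = [0.008, 0.008, 0.007, 0.006, 0.011]

a, b = fit_line(correction, vcpc)
print(f"Y = {a:.4f}x + {b:.4f}")
```

Passing all 50 rows of a predictor column and the VCPC column to the same function should give estimates comparable to the trendline Excel draws on the scatter diagram.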
X-value | Best-fit linear equation | Feature of scatter diagram
Police expenditure per capita | Y = 0.0215x + 0.0052 | The data contain no systematic upward or downward pattern. Nor are any clusters evident.
Judicial expenditure per capita | Y = -0.0019x + 0.0057 | The data contain no systematic upward or downward pattern. Nor are any clusters evident.
Correction expenditure per capita | Y = 0.0376x + 0.0028 | The data contain no clear systematic upward or downward pattern, although some relationship is apparent. Nor are any clusters evident.
Income per capita | Y = 2E-07x + 0.0022 | The data contain no systematic upward or downward pattern. Nor are any clusters evident.
(Screen outcome)




(ANOVA)
(Hypothesis testing)
I begin by stating the null and alternative hypotheses for violent crime per capita. In simple regression analysis, the null is always the no-relationship hypothesis.
(Null hypothesis)
Null: The single predictor variable is not statistically related to violent crime per capita.
Alternative: The predictor variable is related to violent crime per capita.
(Significance level)
Based on the consequences of making a Type I error (rejecting the null hypothesis when it is true), I set the significance level, α, at 0.05.
(Output screen of PoliceExPC data)

(Results for all predictors)
Predictor variable | P-value (significance F) | Null hypothesis | Result
Police expenditure per capita (PoliceExPC) | 0.5156 > 0.05 significance level | Police expenditure per capita is not statistically related to violent crime per capita. Slope of PEPC = 0 | Do not reject the null
Judicial expenditure per capita (JudicialExPC) | 0.9014 > 0.05 significance level | Judicial expenditure per capita is not statistically related to violent crime per capita. Slope of JEPC = 0 | Do not reject the null
Correction expenditure per capita (CorrectionExPC) | 0.00144 < 0.05 significance level | Correction expenditure per capita is not statistically related to violent crime per capita. Slope of CEPC = 0 | Reject the null
Per capita income (PCI) | 0.224 > 0.05 significance level | Per capita income is not statistically related to violent crime per capita. Slope of PCI = 0 | Do not reject the null
(Conclusion)
1. Police expenditure per capita is not significantly related to violent crime per capita (significance F = 0.5156).
2. Judicial expenditure per capita is not significantly related to violent crime per capita (significance F = 0.9014).
3. Correction expenditure per capita is significantly related to violent crime per capita (significance F = 0.00144).
4. Per capita income is not significantly related to violent crime per capita (significance F = 0.224); on its own, per capita income does not explain violent crime per capita.
(ANOVA)
(Setting hypothesis)
Null: None of the predictor variables is statistically related to violent crime per capita.
Alternative: Not all slope coefficients equal zero.
(Significance level)
Based on the consequences of making a Type I error, I set the significance level, α, at 0.05.
When I add per capita income to the other independent variables (the three justice activity expenditures), significance F is 0.001. This is smaller than the 0.05 significance level, so I reject the null. There is a significant relationship between violent crime per capita and the combination of per capita income and the per capita justice activity expenditures.
(Output screen)

(The regression model)
Y = 0.0047 - 0.0431 * POLICE - 0.0551 * JUDICIAL + 0.0843 * CORRECTION + 0.00000013 * INCOME
This is a multiple linear, or first-order, regression model with 4 predictor variables.
1. 0.0047 is the estimated intercept.
2. -0.0431 is the slope coefficient of police expenditure per capita, and POLICE is the generic term for that predictor variable.
3. -0.0551 is the slope coefficient of judicial expenditure per capita, and JUDICIAL is the generic term for that predictor variable.
4. 0.0843 is the slope coefficient of correction expenditure per capita, and CORRECTION is the generic term for that predictor variable.
5. 0.00000013 is the slope coefficient of income per capita, and INCOME is the generic term for that predictor variable.
I apply the ANOVA portion of the regression data-analysis output.
The variation in violent crime per capita is measured by the total sum of squares, SST. SST measures the variation in the dependent variable due to the 3 predictor variables plus all other possible predictor variables not yet in the model.
SST = (0.00780 - 0.00568)² + ... (one squared deviation from the mean VCPC, 0.00568, for each of the 50 states)
= 0.000426
Of the 0.000426 units of variation, the 3 predictor variables account for 0.000166 units. The remaining 0.00026 units are due to predictor variables not yet in the model.
(Degrees of freedom: df)
1. The total degrees of freedom are n-1, where n equals the sample size. So 50-1=49.
2. The regression degrees of freedom equal the number of predictor variables, k. So k=3.
3. The residual degrees of freedom are the remainder, n-1-k. So 50-1-3=46.
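As a rough arithmetic check, the sums of squares and degrees of freedom quoted above can be combined into mean squares and an F ratio. This is a sketch using the rounded figures from the text, so the result only approximates the 9.822 reported by Excel.

```python
n = 50          # number of states
k = 3           # predictor variables in the model
sst = 0.000426  # total sum of squares, as reported
ssr = 0.000166  # variation accounted for by the three predictors
sse = sst - ssr  # residual variation (about 0.00026)

df_total = n - 1
df_regression = k
df_residual = n - 1 - k

ms_regression = ssr / df_regression
ms_residual = sse / df_residual
f_ratio = ms_regression / ms_residual
share_explained = ssr / sst  # fraction of SST the predictors account for

print(f"df: total={df_total}, regression={df_regression}, residual={df_residual}")
print(f"F ratio = {f_ratio:.2f}, share of variation explained = {share_explained:.2f}")
```

With these rounded inputs the F ratio comes out near 9.8, and the three predictors account for a little under 40 percent of the total variation.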
To create a dummy variable for South versus non-South, I assign a value of 1 to each South state and a value of 0 to each non-South state.
DUMMY (predictor variable) = 1 if South state
= 0 if non-South state
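The coding step can be sketched in Python as below. The set lists exactly the states that carry a 1 in the Dummy column of the data table; the names `SOUTH_CODED` and `south_dummy` are hypothetical.

```python
# States coded 1 in the Dummy column of the data table.
SOUTH_CODED = {
    "Alabama", "Alaska", "Florida", "Georgia", "Louisiana", "Mississippi",
    "North Carolina", "South Carolina", "Tennessee", "Texas",
}


def south_dummy(state):
    """Return 1 for a state coded as South in the table, otherwise 0."""
    return 1 if state in SOUTH_CODED else 0


for state in ["Georgia", "Missouri"]:
    print(state, south_dummy(state))
```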
(ANOVA)
(Output screen)

(The regression model)
Y = 0.0026 - 0.0338 * POLICE - 0.0506 * JUDICIAL + 0.0690 * CORRECTION + 0.0022 * DUMMY
This is a multiple linear, or first-order, regression model with 4 predictor variables.
1. 0.0026 is the estimated intercept.
2. -0.0338 is the slope coefficient of police expenditure per capita, and POLICE is the generic term for that predictor variable.
3. -0.0506 is the slope coefficient of judicial expenditure per capita, and JUDICIAL is the generic term for that predictor variable.
4. 0.0690 is the slope coefficient of correction expenditure per capita, and CORRECTION is the generic term for that predictor variable.
5. 0.0022 is the coefficient of the South dummy variable, and DUMMY is the generic term for that predictor variable.
(Interpretation)
South state
Y = 0.0026 - 0.0338 * POLICE - 0.0506 * JUDICIAL + 0.0690 * CORRECTION + 0.0022 * DUMMY(=1)
= 0.0026 - 0.0338 * POLICE - 0.0506 * JUDICIAL + 0.0690 * CORRECTION + 0.0022
Non-South state
Y = 0.0026 - 0.0338 * POLICE - 0.0506 * JUDICIAL + 0.0690 * CORRECTION + 0.0022 * DUMMY(=0)
= 0.0026 - 0.0338 * POLICE - 0.0506 * JUDICIAL + 0.0690 * CORRECTION
∆Y when the dummy increases from 0 to 1 = 0.0022
DUMMY shows how much higher VCPC is for a state coded 1 than for a state coded 0. If DUMMY is 0, ∆Y is 0; if DUMMY is 1, ∆Y is 0.0022. In other words, holding all other predictor variables in the model constant, a South state is predicted to have 0.0022 more violent crime cases per capita than a non-South state.
The figure indicates that for the 50 states in 1993, non-South states (coded as zero) had lower violent crime per capita than South states did. However, it does not establish that region is related to violent crime per capita in every year.
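The effect of the dummy can be illustrated with a short Python sketch that holds spending fixed and switches only the dummy. The spending levels below are hypothetical round numbers, and the judicial coefficient is taken as negative, as in the earlier models.

```python
def predict_with_dummy(police, judicial, correction, south):
    """Fitted VCPC from the four-predictor equation with the South dummy."""
    return (0.0026
            - 0.0338 * police
            - 0.0506 * judicial
            + 0.0690 * correction
            + 0.0022 * south)


# Hypothetical expenditure levels, held fixed while only the dummy changes.
spending = dict(police=0.020, judicial=0.030, correction=0.070)

south_fit = predict_with_dummy(south=1, **spending)
non_south_fit = predict_with_dummy(south=0, **spending)
gap = south_fit - non_south_fit
print(f"South: {south_fit:.4f}, non-South: {non_south_fit:.4f}, gap: {gap:.4f}")
```

Whatever spending levels are chosen, the gap between the two fitted values equals the dummy coefficient, because the dummy only shifts the intercept.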

Null: None of the predictor variables is statistically related to violent crime per capita.
Alternative: Not all slope coefficients equal zero.
(Significance level)
Based on the consequences of making a Type I error, I set the significance level, α, at 0.05.
(Hypothesis testing)
When I add the dummy variable to the other independent variables (the three justice activity expenditures), significance F is 0.00000514. This is smaller than the 0.05 significance level, so I reject the null. There is a significant relationship between violent crime per capita and the combination of the dummy variable and the per capita justice activity expenditures.