Violent Crime data and regression analysis

 

State          | VCPC  | $PoliceExPC | $JudicalExPC | $CorrectionExPC | Income (PC) | Dummy
Alabama        | 0.008 | 0.019       | 0.033        | 0.044           | 16301.0     | 1
Alaska         | 0.008 | 0.089       | 0.172        | 0.226           | 21826.0     | 1
Arizona        | 0.007 | 0.025       | 0.019        | 0.088           | 17023.0     | 0
Arkansas       | 0.006 | 0.017       | 0.013        | 0.049           | 15393.0     | 0
California     | 0.011 | 0.026       | 0.035        | 0.098           | 21381.0     | 0
Colorado       | 0.006 | 0.012       | 0.034        | 0.075           | 19992.0     | 0
Connecticut    | 0.005 | 0.026       | 0.055        | 0.135           | 27172.0     | 0
Delaware       | 0.007 | 0.052       | 0.073        | 0.128           | 20827.0     | 0
Florida        | 0.012 | 0.015       | 0.032        | 0.078           | 19403.0     | 1
Georgia        | 0.007 | 0.016       | 0.008        | 0.087           | 18088.0     | 1
Hawaii         | 0.003 | 0.011       | 0.097        | 0.096           | 22109.0     | 0
Idaho          | 0.003 | 0.028       | 0.024        | 0.059           | 16174.0     | 0
Illinois       | 0.010 | 0.020       | 0.017        | 0.056           | 21624.0     | 0
Indiana        | 0.005 | 0.019       | 0.009        | 0.056           | 18208.0     | 0
Iowa           | 0.003 | 0.017       | 0.040        | 0.056           | 18107.0     | 0
Kansas         | 0.005 | 0.015       | 0.033        | 0.071           | 19100.0     | 0
Kentucky       | 0.005 | 0.027       | 0.039        | 0.056           | 16283.0     | 0
Louisiana      | 0.011 | 0.029       | 0.022        | 0.070           | 15793.0     | 1
Maine          | 0.001 | 0.019       | 0.025        | 0.043           | 18128.0     | 0
Maryland       | 0.010 | 0.037       | 0.042        | 0.119           | 22976.0     | 0
Massachusetts  | 0.008 | 0.029       | 0.057        | 0.113           | 23549.0     | 0
Michigan       | 0.008 | 0.021       | 0.021        | 0.096           | 19589.0     | 0
Minnesota      | 0.004 | 0.023       | 0.028        | 0.045           | 20291.0     | 0
Mississippi    | 0.004 | 0.014       | 0.009        | 0.035           | 13902.0     | 1
Missouri       | 0.007 | 0.020       | 0.018        | 0.040           | 18808.0     | 0
Montana        | 0.002 | 0.030       | 0.042        | 0.048           | 16054.0     | 0
Nebraska       | 0.004 | 0.024       | 0.018        | 0.052           | 19151.0     | 0
Nevada         | 0.009 | 0.022       | 0.013        | 0.111           | 21030.0     | 0
New Hampshire  | 0.001 | 0.023       | 0.059        | 0.044           | 21537.0     | 0
New Jersey     | 0.006 | 0.036       | 0.038        | 0.086           | 25903.0     | 0
New Mexico     | 0.009 | 0.029       | 0.047        | 0.085           | 15192.0     | 0
New York       | 0.011 | 0.016       | 0.063        | 0.114           | 24021.0     | 0
North Carolina | 0.007 | 0.023       | 0.036        | 0.087           | 17549.0     | 1
North Dakota   | 0.001 | 0.011       | 0.016        | 0.026           | 17101.0     | 0
Ohio           | 0.005 | 0.015       | 0.012        | 0.068           | 18804.0     | 0
Oklahoma       | 0.006 | 0.015       | 0.031        | 0.063           | 16344.0     | 0
Oregon         | 0.005 | 0.024       | 0.050        | 0.066           | 18343.0     | 0
Pennsylvania   | 0.004 | 0.026       | 0.021        | 0.052           | 20511.0     | 0
Rhode Island   | 0.004 | 0.028       | 0.073        | 0.107           | 20256.0     | 0
South Carolina | 0.010 | 0.024       | 0.013        | 0.091           | 15993.0     | 1
South Dakota   | 0.002 | 0.020       | 0.029        | 0.047           | 17183.0     | 0
Tennessee      | 0.008 | 0.014       | 0.019        | 0.072           | 17367.0     | 1
Texas          | 0.008 | 0.012       | 0.016        | 0.087           | 18093.0     | 1
Utah           | 0.003 | 0.020       | 0.033        | 0.057           | 15096.0     | 0
Vermont        | 0.001 | 0.046       | 0.050        | 0.073           | 18649.0     | 0
Virginia       | 0.004 | 0.041       | 0.035        | 0.099           | 20560.0     | 0
Washington     | 0.005 | 0.030       | 0.011        | 0.092           | 20851.0     | 0
West Virginia  | 0.002 | 0.013       | 0.025        | 0.021           | 15432.0     | 0
Wisconsin      | 0.003 | 0.010       | 0.027        | 0.077           | 18943.0     | 0
Wyoming        | 0.003 | 0.023       | 0.044        | 0.058           | 18660.0     | 0

 

Original data: HomeworkData.xls

PA453, Graduate School of Public Affairs, University of Missouri-Columbia

Winter semester. April 2001

 

 

 

A)          Calculation of a multiple regression model

 

I use Excel's regression function to find which variables affect violent crime per capita across all 50 states (the population). The study examines the relationship between violent crime per capita and justice-activity expenditures, along with other variables.

 

(Assumption)

 

1.            The sample size is 50 (1993 data).

2.            The dependent variable is violent crime per capita (VCPC).

3.            The predictor variables are police expenditure per capita ($PoliceExPC), judicial expenditure per capita ($JudicialExPC), and correction expenditure per capita ($CorrectionExPC).

4.            Every variable is quantitative.

5.            I use Excel's regression data-analysis tool to obtain an estimated multiple regression model, that is, a model with more than one predictor variable.

 

(The input screen)

 

 

 

(The regression model)

 

Y = 0.0027 - 0.0368 * POLICE - 0.0567 * JUDICIAL + 0.0775 * CORRECTION

This is a multiple linear, or first-order, regression model with 3 predictor variables.

1.            0.0027 is the estimate of the population intercept.

2.            -0.0368 is the slope coefficient of police expenditure per capita; POLICE is the generic term for that predictor variable.

3.            -0.0567 is the slope coefficient of judicial expenditure per capita; JUDICIAL is the generic term for that predictor variable.

4.            0.0775 is the slope coefficient of correction expenditure per capita; CORRECTION is the generic term for that predictor variable.
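
As a minimal sketch of how the estimated equation is used, the Python snippet below plugs one state's expenditure values into the model. The function name predict_vcpc is hypothetical; the coefficients are the ones reported above, and the inputs are Alabama's (rounded) values from the data table.

```python
# Coefficients of the estimated model reported above.
INTERCEPT = 0.0027
B_POLICE = -0.0368
B_JUDICIAL = -0.0567
B_CORRECTION = 0.0775


def predict_vcpc(police, judicial, correction):
    """Return the violent crime per capita predicted by the fitted equation."""
    return (INTERCEPT
            + B_POLICE * police
            + B_JUDICIAL * judicial
            + B_CORRECTION * correction)


# Alabama's expenditure values from the data table (rounded to 3 decimals).
alabama_pred = predict_vcpc(police=0.019, judicial=0.033, correction=0.044)
print(f"Predicted VCPC for Alabama: {alabama_pred:.4f}")
```

The prediction (about 0.0035) can be compared with Alabama's observed VCPC of 0.008; the gap between the two is that state's residual.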

 

(Interpretation in terms of statistical relationship)

 

1.            As police expenditure per capita increases by one dollar, violent crime per capita decreases by 0.0368 cases, holding all other predictor variables in the model constant.

2.            As judicial expenditure per capita increases by one dollar, violent crime per capita decreases by 0.0567 cases, holding all other predictor variables in the model constant.

3.            As correction expenditure per capita increases by one dollar, violent crime per capita increases by 0.0775 cases, holding all other predictor variables in the model constant.

 

(Significance of the individual relationships)

 

 

Predictor variable | P-value
$PoliceExPC        | 0.35154
$JudicalExPC       | 0.003892
$CorrectionExPC    | 3.44E-06

 

1.            The p-value of police expenditure per capita is 0.35154, which is larger than the 0.05 significance level. Therefore there is no significant relationship between violent crime per capita and police expenditure per capita.

2.            The p-value of judicial expenditure per capita is 0.0039, which is smaller than the 0.05 significance level. Therefore there is a significant relationship between violent crime per capita and judicial expenditure per capita.

3.            The p-value of correction expenditure per capita is 3.44E-06, which is smaller than the 0.05 significance level. Therefore there is a significant relationship between violent crime per capita and correction expenditure per capita.
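
The comparison of each p-value with the significance level can be written out as a short check. This is only an illustrative sketch: the helper name is_significant is hypothetical, and the p-values are copied from the output table above.

```python
ALPHA = 0.05  # significance level used in this report

# P-values reported in the regression output above.
p_values = {
    "$PoliceExPC": 0.35154,
    "$JudicalExPC": 0.003892,
    "$CorrectionExPC": 3.44e-06,
}


def is_significant(p_value, alpha=ALPHA):
    """A predictor is significant when its p-value is below alpha."""
    return p_value < alpha


decisions = {name: is_significant(p) for name, p in p_values.items()}
for name, significant in decisions.items():
    verdict = "significant" if significant else "not significant"
    print(f"{name}: p = {p_values[name]:g} -> {verdict}")
```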

 

 

 (Analysis of Variance: ANOVA)

 

I begin by stating the null and alternative hypotheses for violent crime per capita. I use ANOVA to determine whether one or more of the predictor variables are related to VCPC in the population. In multiple regression analysis, the null is always the "no relationship" hypothesis.

 

(Setting hypotheses)

Null: None of the predictor variables is statistically related to violent crime per capita. (All slope coefficients equal zero.)

Alternative: Not all slope coefficients equal zero.

 

(Significance level)

Based on the consequences of making a Type 1 error, we set the significance level, α, at 0.05.

 

(Computing the F-ratio)

I use the F-ratio to test the null hypothesis that none of the predictor variables is related to VCPC.

F-ratio = (mean square: regression) / (mean square: residual)

The F-ratio is 9.822; that is, the regression mean square is 9.822 times as large as the residual mean square.

 

(Hypothesis testing)

If none of the 3 predictor variables were related to violent crime per capita, the probability of obtaining an F-ratio of 9.822 or greater would be 0.0000402 (Significance F). Under the decision rule, I do not reject the null if Significance F is at least the chosen significance level of 0.05, and I reject it otherwise. Because 0.0000402 is smaller than 0.05, I reject the null hypothesis.

 

Decision rule

If Significance F ≥ 0.05 ----- do not reject the null

If Significance F < 0.05 ----- reject the null

 

Consequently, we can conclude that the overall model is significant. Taken together, the justice-activity expenditure variables help explain violent crime per capita.

 

 

 

B)          Significance of each relationship and its null hypothesis

 

(Simple linear regression model)

 

Y = VCPC

X = $PoliceExPC, $JudicialExPC, $CorrectionExPC, or Income (PC), one at a time

 

(Preliminary analysis)

 

Before computing the estimated regression models, I draw a scatter diagram of each predictor variable against VCPC, the dependent variable.

 

(Best-fitting line, or model)

 

Y=@*X+b

Y value ----- violent crime per capita

X value ----- each predictor variable

@ ----- slope

b ----- intercept

 

@ and b can never be known for sure unless the study includes the data for every state. In the linear model, the parameters @ and b do not appear as exponents. Though I do not know @ and b with certainty, I can estimate these parameters from the simple linear regression model.
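
Excel estimates the slope and intercept by ordinary least squares. The sketch below shows the standard closed-form formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x) in plain Python. The function name fit_line is hypothetical, and only the first five rows of the data table are used for illustration, so the printed equation is not expected to match the 50-state equations reported in this section.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: co-variation of x and y; Sxx: variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustration only: Alabama, Alaska, Arizona, Arkansas, California.
correction = [0.044, 0.226, 0.088, 0.049, 0.098]  # $CorrectionExPC
vcpc = [0.008, 0.008, 0.007, 0.006, 0.011]        # VCPC

slope, intercept = fit_line(correction, vcpc)
print(f"Y = {slope:.4f}x + {intercept:.4f}")
```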

 

X-value | Best-fit linear equation | Feature of scatter diagram
Police expenditure per capita | Y = 0.0215x + 0.0052 | The data contain no systematic upward or downward pattern, and no clusters are evident.
Judicial expenditure per capita | Y = -0.0019x + 0.0057 | The data contain no systematic upward or downward pattern, and no clusters are evident.
Correction expenditure per capita | Y = 0.0376x + 0.0028 | The data contain no clear systematic upward or downward pattern, although some relationship is visible. No clusters are evident.
Income per capita | Y = 2E-07x + 0.0022 | The data contain no systematic upward or downward pattern, and no clusters are evident.

 

(Screen outcome)

 

 

 

 

 

 

 

(ANOVA)

 

(Hypothesis testing)

I begin by stating the null and alternative hypotheses for violent crime per capita. In simple regression analysis, the null is always the "no relationship" hypothesis.

 

(Null hypothesis)

Null: The predictor variable is not statistically related to violent crime per capita (tested separately for each predictor).

Alternative: The predictor variable is related to violent crime per capita.

 

(Significance level)

Based on the consequences of making a Type 1 error (rejecting the null hypothesis when it is true), we set the significance level, α, at 0.05.

 

(Output screen for the PoliceExPC data)

 

 

(Results of others)

 

Predictor variable | P-value (Significance F) | Null hypothesis | Result
Police expenditure per capita (PoliceExPC) | 0.5156 > 0.05 significance level | Police expenditure per capita is not statistically related to violent crime per capita (@ of PEPC = 0). | Do not reject the null
Judicial expenditure per capita (JudicialExPC) | 0.9014 > 0.05 significance level | Judicial expenditure per capita is not statistically related to violent crime per capita (@ of JEPC = 0). | Do not reject the null
Correction expenditure per capita (CorrectionExPC) | 0.00144 < 0.05 significance level | Correction expenditure per capita is not statistically related to violent crime per capita (@ of CEPC = 0). | Reject the null
Per capita income (PCI) | 0.224 > 0.05 significance level | Per capita income is not statistically related to violent crime per capita (@ of PCI = 0). | Do not reject the null

 

(Conclusion)

 

1.            Police expenditure per capita is not significantly related to violent crime per capita.

2.            Judicial expenditure per capita is not significantly related to violent crime per capita.

3.            Correction expenditure per capita is significantly related to violent crime per capita.

4.            Per capita income is not significantly related to violent crime per capita. (Per capita income alone does not explain violent crime per capita.)

 

 

 

C) Relationship between violent crime per capita and the combination of per capita income and justice-activity expenditures

(ANOVA)

 

(Setting hypotheses)

Null: None of the predictor variables is statistically related to violent crime per capita.

Alternative: Not all slope coefficients equal zero.

 

(Significance level)

Based on the consequences of making a Type 1 error, we set the significance level, α, at 0.05.

 

(Hypothesis testing)

When I add per capita income to the other independent variables (the three justice-activity expenditures), Significance F is 0.001. This is smaller than the 0.05 significance level. Therefore I reject the null. It means that there is a significant relationship between violent crime per capita and the combination of per capita income and per capita expenditures on justice activities.

 

(Output screen)

 

 

 

(The regression model)

 

Y = 0.0047 - 0.0431 * POLICE - 0.0551 * JUDICIAL + 0.0843 * CORRECTION + 0.00000013 * INCOME

This is a multiple linear, or first-order, regression model with 4 predictor variables.

 

1.            0.0047 is the estimate of the population intercept.

2.            -0.0431 is the slope coefficient of police expenditure per capita; POLICE is the generic term for that predictor variable.

3.            -0.0551 is the slope coefficient of judicial expenditure per capita; JUDICIAL is the generic term for that predictor variable.

4.            0.0843 is the slope coefficient of correction expenditure per capita; CORRECTION is the generic term for that predictor variable.

5.            0.00000013 is the slope coefficient of income per capita; INCOME is the generic term for that predictor variable.
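
The income coefficient looks tiny largely because income is measured in dollars, with values in the tens of thousands. A small sketch of the arithmetic, assuming a hypothetical income difference of 1,000 dollars between two states whose other predictors are equal:

```python
B_INCOME = 0.00000013      # slope coefficient of income per capita (reported above)
income_difference = 1000   # hypothetical difference in per capita income, in dollars

# Predicted change in VCPC, holding the other predictors constant.
delta_vcpc = B_INCOME * income_difference
print(f"Predicted change in VCPC: {delta_vcpc:.5f}")
```

Under that assumption the predicted difference in VCPC is 0.00013.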

 

 

 

 

D)          Variation of violent crime per capita as the dependent variable and appropriate measures

 

I use the ANOVA portion of the output from the regression data-analysis tool.

 

(Decomposition of the total sum of squares)

The variation of violent crime per capita is measured by the total sum of squares, SST. SST measures the variation in the dependent variable due to the 3 predictor variables plus all other possible predictor variables not yet in the model.

 

SST = Σ (VCPC - 0.00568)², summed over all 50 states, where 0.00568 is the mean of VCPC

      = 0.000426
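
For illustration, the same calculation can be run on the VCPC column of the data table. This is a sketch, not the Excel output: the table lists values rounded to three decimals, so the result (mean about 0.00574, SST about 0.000436) differs slightly from the reported 0.00568 and 0.000426, presumably because Excel worked with unrounded data.

```python
# VCPC column of the data table, Alabama through Wyoming (3-decimal values).
vcpc = [
    0.008, 0.008, 0.007, 0.006, 0.011, 0.006, 0.005, 0.007, 0.012, 0.007,
    0.003, 0.003, 0.010, 0.005, 0.003, 0.005, 0.005, 0.011, 0.001, 0.010,
    0.008, 0.008, 0.004, 0.004, 0.007, 0.002, 0.004, 0.009, 0.001, 0.006,
    0.009, 0.011, 0.007, 0.001, 0.005, 0.006, 0.005, 0.004, 0.004, 0.010,
    0.002, 0.008, 0.008, 0.003, 0.001, 0.004, 0.005, 0.002, 0.003, 0.003,
]

mean_vcpc = sum(vcpc) / len(vcpc)
# Total sum of squares: squared deviations of VCPC from its mean.
sst = sum((y - mean_vcpc) ** 2 for y in vcpc)

print(f"n = {len(vcpc)}, mean = {mean_vcpc:.5f}, SST = {sst:.6f}")
```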

 

Of the 0.000426 units of variation, the 3 predictor variables account for 0.000166 units. The remaining 0.00026 units are due to predictor variables not yet in the model.

 

(Degrees of freedom: df)

1.            The total degrees of freedom equal n-1, where n is the sample size. So 50-1=49.

2.            The regression degrees of freedom equal the number of predictor variables, k. So k=3.

3.            The residual degrees of freedom are the remainder, n-1-k. So 50-1-3=46.
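
The sums of squares and degrees of freedom above are enough to rebuild the F-ratio used in Part A. The sketch below does that arithmetic with the rounded figures quoted in this section, so it gives about 9.79 rather than exactly the 9.822 that Excel reports; variable names are illustrative.

```python
n = 50           # sample size
k = 3            # number of predictor variables
sst = 0.000426   # total sum of squares (reported above)
ssr = 0.000166   # variation accounted for by the 3 predictors (reported above)

sse = sst - ssr              # residual sum of squares
df_total = n - 1
df_regression = k
df_residual = n - 1 - k

ms_regression = ssr / df_regression   # mean square: regression
ms_residual = sse / df_residual       # mean square: residual
f_ratio = ms_regression / ms_residual

print(f"df: total={df_total}, regression={df_regression}, residual={df_residual}")
print(f"F-ratio from rounded sums of squares: {f_ratio:.2f}")
```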

 

 

E)           Dummy model for south versus non-south

 

When I create a dummy variable for south versus non-south, I assign a value of 1 to each south state and a value of 0 to each non-south state.

 

DUMMY (predictor variable) = 1 if south state

                                           = 0 if non-south state

 

 

(ANOVA)

 

(Output screen)

 

 

 

(The regression model)

 

Y = 0.0026 - 0.0338 * POLICE - 0.0506 * JUDICIAL + 0.0690 * CORRECTION + 0.0022 * DUMMY

This is a multiple linear, or first-order, regression model with 4 predictor variables.

 

1.            0.0026 is the estimate of the population intercept.

2.            -0.0338 is the slope coefficient of police expenditure per capita; POLICE is the generic term for that predictor variable.

3.            -0.0506 is the slope coefficient of judicial expenditure per capita; JUDICIAL is the generic term for that predictor variable.

4.            0.0690 is the slope coefficient of correction expenditure per capita; CORRECTION is the generic term for that predictor variable.

5.            0.0022 is the slope coefficient of the south versus non-south dummy variable; DUMMY is the generic term for that predictor variable.

 

 

(Interpretation)

 

South state

Y = 0.0026 - 0.0338 * POLICE - 0.0506 * JUDICIAL + 0.0690 * CORRECTION + 0.0022 * DUMMY(=1)

 

   = 0.0026 - 0.0338 * POLICE - 0.0506 * JUDICIAL + 0.0690 * CORRECTION + 0.0022

 

Non-south state

Y = 0.0026 - 0.0338 * POLICE - 0.0506 * JUDICIAL + 0.0690 * CORRECTION + 0.0022 * DUMMY(=0)

 

   = 0.0026 - 0.0338 * POLICE - 0.0506 * JUDICIAL + 0.0690 * CORRECTION

 

 

∆Y as the dummy increases from 0 to 1 = 0.0022

DUMMY shows how much higher VCPC is for a state coded 1 than for a state coded 0. If DUMMY is 0, ∆Y is 0. If DUMMY is 1, ∆Y is 0.0022. As the dummy variable increases by one unit (from 0 to 1), violent crime per capita increases by 0.0022 cases, holding all other predictor variables in the model constant. In other words, a south state is predicted to have 0.0022 more violent crimes per capita than a non-south state with the same expenditures.

The figure indicates that for the 50 states in 1993, non-south states (coded as zero) had lower violent crime per capita than south states did, with the other predictors held constant. However, it does not by itself mean that region is related to violent crime per capita in the population of all states in every year.
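
The dummy interpretation can be checked numerically: two states with identical expenditures differ in predicted VCPC by exactly the dummy coefficient. In this sketch the function name predict_vcpc and the expenditure values are hypothetical; only the coefficients come from the model reported above.

```python
def predict_vcpc(police, judicial, correction, dummy):
    """Fitted dummy-variable model reported above (dummy: 1 = south, 0 = non-south)."""
    return (0.0026
            - 0.0338 * police
            - 0.0506 * judicial
            + 0.0690 * correction
            + 0.0022 * dummy)


# Hypothetical expenditure values, identical for both states.
spending = dict(police=0.020, judicial=0.030, correction=0.070)

south = predict_vcpc(dummy=1, **spending)
non_south = predict_vcpc(dummy=0, **spending)
difference = south - non_south
print(f"south - non-south = {difference:.4f}")
```

Whatever expenditure values are chosen, the difference stays at 0.0022, because the dummy shifts the intercept and leaves the slopes unchanged.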

 

 

(Setting hypotheses)

Null: None of the predictor variables is statistically related to violent crime per capita.

Alternative: Not all slope coefficients equal zero.

 

(Significance level)

Based on the consequences of making a Type 1 error, we set the significance level, α, at 0.05.

 

(Hypothesis testing)

When I add the dummy variable to the other independent variables (the three justice-activity expenditures), Significance F is 0.00000514. This is smaller than the 0.05 significance level. Therefore I reject the null. It means that there is a significant relationship between violent crime per capita and the combination of the dummy variable and per capita expenditures on justice activities.