Statistical Tools in Marketing Research
Name
Instructor
Course
Date
Statistical Tools in Marketing Research
ANOVA
Analysis of variance (ANOVA) is a statistical tool used to separate the total variability in a set of data into two components: systematic factors and random factors. Random factors have no statistical influence on the data set under study, while systematic factors do. The ANOVA test is used to establish the effect of independent variables on a dependent variable in a regression analysis, and it serves as a guide in telling whether or not an observed difference is likely due to random variation. ANOVA is used to determine whether variations exist among several population means; it does not test how variances differ, but how the means of a data set differ. It is the initial step in identifying the factors that influence a given data set. After performing the ANOVA test, an analyst can carry out further analysis on the systematic factors that statistically contribute to the variability of the data set. The results of an ANOVA can then be used in an F-test of the significance of the overall regression equation.
The ANOVA test can be divided into three types depending on the kind of data under analysis: single factor, two-factor with replication, and two-factor without replication. ANOVA: Single Factor performs an analysis on data containing two or more samples, providing a hypothesis test about the populations from which the samples are drawn. Where there are only two samples, a worksheet function could equally be used; with more than two samples, a worksheet function is no longer convenient. ANOVA: Two-Factor With Replication is used when the data under analysis are classified along two different dimensions. For example, in an experiment measuring the heights of plants, the plants may be treated with different brands of fertilizer and also kept at different temperatures. The ANOVA tool can then be used to test whether the plant heights under the different fertilizer brands are drawn from the same underlying population. ANOVA: Two-Factor Without Replication is likewise used where data are classified along two dimensions, as in the two-factor case with replication; however, for this tool there is an assumption that there is only one observation for every pair of factor levels.
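The single-factor case can be sketched in Python with SciPy's `f_oneway`. The plant-height figures below are hypothetical, standing in for the three fertilizer brands in the example above:

```python
from scipy import stats

# Hypothetical plant heights (cm) under three fertilizer brands
brand_a = [20.1, 21.3, 19.8, 22.0, 20.5]
brand_b = [23.2, 24.1, 22.8, 23.9, 24.5]
brand_c = [20.0, 20.9, 21.1, 19.5, 20.7]

# Single-factor ANOVA: are the group means drawn from the same population?
f_stat, p_value = stats.f_oneway(brand_a, brand_b, brand_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here indicates that at least one brand's mean height differs, after which the systematic factor can be examined further.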
ANOVA is a particular type of statistical hypothesis testing that is heavily used in analyzing experimental data. A statistical hypothesis test is used to make decisions from data. A test result is called statistically significant if it is deemed unlikely to have happened by chance. A statistically significant result, where the p-value is less than the significance level, justifies rejection of the null hypothesis, but only where the prior probability of the null hypothesis is not low. In an ANOVA, the null hypothesis is that all groups are random samples from the same population, implying that all the treatments have the same effect; rejecting the null hypothesis indicates that different treatments lead to different effects. ANOVA involves the synthesis and analysis of several ideas and is therefore used for multiple purposes. As an exploratory data analysis, ANOVA is a form of additive data decomposition, with its sums of squares indicating the variance attributable to each component of the decomposition. Comparisons of mean squares, along with F-tests, allow testing of a nested sequence of models in the marketing system. ANOVA is computationally elegant and relatively robust against violations of its assumptions, giving it the industrial strength to carry out statistical analysis in the market environment. Because of its ability to analyze numerous and complex sets of data, ANOVA has long enjoyed the prestige of being the most used statistical tool in psychological research, and it is equally useful for statistical inference in the marketing environment. Analysis of variance can be approached in several ways, the most common being a linear model relating the responses to the blocks and treatments; the model is normally linear in its parameters but may be non-linear across factor levels. Interpretation of the data is normally easy in the case of data balanced across factors, but deeper understanding is required in the case of unbalanced data.
The T-test
This is a statistical test used to compare the means of two treatments or samples, even if they possess varying numbers of replicates. In simple terms, it compares the actual difference between the means of two data sets and can be used to tell whether the two sets are statistically different from one another. It is often applied in situations where the test statistic follows a normal distribution and the scaling term in the test statistic is known. The t-test uses the t-distribution, degrees of freedom, and the t-statistic to determine a probability (p-value) that tells whether the population means differ. The t-test is very popular: millions of t-test analyses are performed daily in the marketing research industry. The t-test was originally formulated to test a simple hypothesis; for example, it can be put to use to determine whether two batches of wine are equally good. There are many varieties of the t-test, and the most commonly used today are:
One-sample t-test: this is used to test whether the population mean has a pre-determined value or not. For example, a company may specify that all new concepts must achieve a score of 50 before proceeding to the next stage of testing. A one-sample t-test can then be used to tell whether any of the new concepts scores significantly below this standard.
Two-sample t-test: this is the most commonly used, and also the most commonly misused, type of t-test. It is used to test for differences between the means of two populations. For example, this test can be used to determine whether or not there are significant differences in the way women and men score a new concept.
Paired t-test: this test is used in situations where two measurements come from the same source, to determine whether or not there is a difference between the means of the two measures. For example, if it is known how much a particular respondent liked a specific concept (concept A) and how much he liked another concept (concept B), the paired t-test can be used to tell whether there exists a significant difference in the preferences.
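The three varieties above can be sketched with SciPy; all of the scores below are hypothetical, chosen only to mirror the concept-testing examples:

```python
from scipy import stats

# One-sample: do concept scores meet a benchmark of 50?
concept_scores = [48, 52, 47, 49, 51, 46, 50, 48]
t1, p1 = stats.ttest_1samp(concept_scores, popmean=50)

# Two-sample (independent): do men and women score the concept differently?
men   = [55, 60, 52, 58, 61, 54]
women = [50, 48, 53, 47, 51, 49]
t2, p2 = stats.ttest_ind(men, women)

# Paired: does the same respondent rate concept A differently from concept B?
concept_a = [7, 6, 8, 5, 7, 6]
concept_b = [5, 5, 6, 4, 6, 5]
t3, p3 = stats.ttest_rel(concept_a, concept_b)

print(f"one-sample p={p1:.3f}, two-sample p={p2:.4f}, paired p={p3:.4f}")
```

In each case the p-value is compared against the chosen significance level to decide whether the difference in means is statistically significant.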
When a t-test is used to compare the means of two independent samples, the following assumptions must be observed:
Each of the populations under comparison should follow a normal distribution. This can be established with a normality test, or assessed graphically using a normal quantile plot.
When using the original Student's t-test, the two populations under comparison should have the same variance. This can be tested using Levene's test, the Brown-Forsythe test, or the F-test, or assessed graphically using a Q-Q plot. If the sample sizes of the two groups under comparison are equal, the original Student's t-test is highly robust to the presence of unequal variances. There is also Welch's t-test, which is not sensitive to equality of variances.
The data used in the test should be sampled independently from the two populations being compared. This is generally not possible to verify from the data themselves, but if the data are not independently sampled, the classical t-tests may give misleading results.
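The first two assumptions can be checked in SciPy before running the t-test; the measurements below are hypothetical:

```python
from scipy import stats

# Hypothetical measurements for two groups
group_1 = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]
group_2 = [13.0, 12.6, 13.4, 12.9, 13.1, 12.7, 13.3, 12.8]

# Normality check for each group (Shapiro-Wilk test)
_, p_norm_1 = stats.shapiro(group_1)
_, p_norm_2 = stats.shapiro(group_2)

# Equality-of-variances check (Levene's test)
_, p_levene = stats.levene(group_1, group_2)

# A large p-value means the assumption is not rejected
print(f"normality: p={p_norm_1:.3f}, p={p_norm_2:.3f}; equal variances: p={p_levene:.3f}")
```

If either assumption is rejected, Welch's t-test or a non-parametric alternative should be considered instead of the classical Student's t-test.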
Two-sample t-tests involve paired samples, independent samples, or overlapping samples, and can be divided into unpaired two-sample t-tests and paired t-tests. Paired tests are a form of blocking and have greater power than unpaired tests; in a different context, the paired t-test can also be used to reduce the impact of confounding factors in observational studies. The independent-samples t-test is normally used where two separate sets of independently and identically distributed samples are obtained, one from each population being compared. The overlapping-samples t-test, on the other hand, is used where there are paired samples but part of the data is missing; it is commonly used in commercial surveys, for example in opinion polling.
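Welch's t-test, mentioned above as the variant that does not assume equal variances, differs from the classical test only in one argument in SciPy; the two hypothetical groups below deliberately have very different spreads:

```python
from scipy import stats

# Hypothetical concept scores: one tight group, one widely spread group
small_spread = [50, 51, 49, 50, 52, 51]
large_spread = [45, 60, 38, 65, 50, 58]

# Classical Student's t-test assumes equal variances
t_student, p_student = stats.ttest_ind(small_spread, large_spread, equal_var=True)

# Welch's t-test drops the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(small_spread, large_spread, equal_var=False)

print(f"Student p={p_student:.3f}, Welch p={p_welch:.3f}")
```

When the variances genuinely differ, the Welch result is the safer one to report.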
Chi-Square Test
This is a statistical test that is normally used to compare observed data with the data a researcher or analyst expects to obtain under a specific hypothesis. In any data distribution there are generally two kinds of random variables, which in turn yield two kinds of data: categorical and numerical. The chi-square statistic is used to investigate whether categorical variable distributions differ from one another; a categorical variable yields data in the form of categories, while a numerical variable yields data in numerical form. The chi-square test can be used in several market decision-making situations: 1) Are all designs equally preferred? 2) Are all brands equally preferred? 3) Is there any relationship between brand preference and income level? 4) Is there any relationship between the size of washing machine purchased and family size? 5) Is there any relationship between the type of job chosen and educational background? The first two questions can be answered by the chi-square test for goodness of fit, while questions 3, 4, and 5 can be answered by the chi-square test for independence. It is important to note that the variables used in chi-square analysis are usually nominally scaled; nominal data are known by two names, attribute data and categorical data.
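Both forms of the chi-square test described above can be sketched with SciPy; the preference counts below are hypothetical:

```python
from scipy import stats

# Goodness of fit: are four designs equally preferred?
observed = [40, 32, 25, 23]                  # choices among 120 shoppers
chi2_gof, p_gof = stats.chisquare(observed)  # expected counts default to equal

# Independence: is brand preference related to income level?
# Rows: income level (low, high); columns: brand (A, B, C)
table = [[30, 20, 10],
         [15, 25, 20]]
chi2_ind, p_ind, dof, expected = stats.chi2_contingency(table)

print(f"goodness of fit: chi2={chi2_gof:.2f}, p={p_gof:.3f}")
print(f"independence:    chi2={chi2_ind:.2f}, p={p_ind:.4f}, df={dof}")
```

A small p-value in the first test says the designs are not equally preferred; a small p-value in the second says brand preference and income level are related.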
Application of these Tools
RESULTS AND FINDINGS
Univariate Analysis of Variance
Notes
Output Created
Comments
Input Data
Active Dataset DataSet1
Filter <none>
Weight <none>
Split File <none>
N of Rows in Working Data File 10609
Missing Value Handling Definition of Missing User-defined missing values are treated as missing.
Cases Used Statistics are based on all cases with valid data for all variables in the model.
Between-Subjects Factors
Value Label N
Reason for stop/search 1 Officer Intuition 93
2 Suspect acting suspiciously 55
3 Called to Scene 42
4 Prior Information 24
5 Public Complaint 19
If complaint made, how satisfied was suspect with response 1 Very satisfied 28
2 Satisfied 77
3 Neither satisfied/unsatisfied 65
4 Dissatisfied 27
5 Very dissatisfied 36
How worried was suspect about crime in their area 1 Very worried 28
2 Fairly worried 19
3 Not too worried 34
4 Not at all worried 45
5 Not applicable 107
Tests of Between-Subjects Effects
Dependent Variable: Suspects employment
Source Type III Sum of Squares df Mean Square F Sig.
Intercept Hypothesis 5.658 1 5.658 4.008 .047
Error 216.560 153.419 1.412a
Weapon Hypothesis 2.492 1 2.492 1.769 .185
Error 211.278 150 1.409b
Reason Hypothesis 5.989 4 1.497 1.622 .198
Error 24.633 26.682 .923c
Satisfied Hypothesis 1.504 4 .376 .516 .724
Error 21.586 29.644 .728d
Worry Hypothesis 6.716 4 1.679 3.632 .182
Error 1.200 2.596 .462e
Reason * Satisfied Hypothesis 15.872 16 .992 .903 .574
Error 30.113 27.402 1.099f
Reason * Worry Hypothesis 12.337 15 .822 .733 .735
Error 37.044 33.030 1.122g
Satisfied * Worry Hypothesis 9.903 16 .619 .554 .895
Error 35.777 32.007 1.118h
Reason * Satisfied * Worry Hypothesis 23.569 22 1.071 .761 .769
Error 211.278 150 1.409b
Expected Mean Squares
Source Variance Component
Var(Worry) Var(Reason * Worry) Var(Satisfied * Worry) Var(Reason * Satisfied * Worry) Var(Error) Quadratic Term
Intercept .253 .054 .052 .023 1.000 Intercept, Reason, Satisfied, Reason * Satisfied
Weapon .000 .000 .000 .000 1.000 Weapon
Reason .000 4.280 .000 1.633 1.000 Reason, Reason * Satisfied
Satisfied .000 .000 4.226 1.749 1.000 Satisfied, Reason * Satisfied
Worry 20.700 4.473 4.293 1.734 1.000
Reason * Satisfied .000 .000 .000 2.149 1.000 Reason * Satisfied
Reason * Worry .000 5.120 .000 1.992 1.000
Satisfied * Worry .000 .000 4.922 2.018 1.000
Reason * Satisfied * Worry .000 .000 .000 2.340 1.000
Error .000 .000 .000 .000 1.000
a. For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b. Expected Mean Squares are based on the Type III Sums of Squares.
REGRESSION /MISSING LISTWISE /STATISTICS COEFF OUTS R ANOVA /CRITERIA=PIN (.05) POUT (.10) /NOORIGIN /DEPENDENT Stop Location /METHOD=ENTER SusWork.
Regression
Notes
Input Data
Active Dataset
Filter <none>
Weight <none>
Split File <none>
N of Rows in Working Data File 10609
Missing Value Handling Definition of Missing User-defined missing values are treated as missing.
Cases Used Statistics are based on cases with no missing values for any variable used.
Syntax REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT Stop Location
/METHOD=ENTER SusWork.
Resources Processor Time 0:00:00.032
Elapsed Time 0:00:00.047
Memory Required 1820 bytes
Additional Memory Required for Residual Plots 0 bytes
Variables Entered/Removed
Model Variables Entered Variables Removed Method
1 Suspects employment . Enter
a. All requested variables entered.
b. Dependent Variable: Location of stop/search
Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .010a .000 .000 1.948
a. Predictors: (Constant), Suspects employment
ANOVA
Model Sum of Squares df Mean Square F Sig.
1 Regression 3.922 1 3.922 1.033 .309a
Residual 40270.242 10607 3.797
Total 40274.164 10608
a. Predictors: (Constant), Suspects employment
b. Dependent Variable: Location of stop/search
Coefficients
Model Unstandardized Coefficients Standardized Coefficients t Sig.
B Std. Error Beta
1 (Constant) 5.602 .056 99.315 .000
Suspects employment -.016 .016 -.010 -1.016 .309
a. Dependent Variable: Location of stop/search
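The SPSS regression above fits a simple linear model of stop/search location on suspect's employment. The same kind of analysis can be sketched in Python with SciPy's `linregress`; the employment and location codes below are hypothetical illustrations, not the actual Lynfield stop data:

```python
from scipy import stats

# Hypothetical predictor (employment code) and response (location code)
employment = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
location   = [5.6, 5.5, 5.7, 5.5, 5.6, 5.4, 5.6, 5.5, 5.4, 5.5]

# Simple linear regression: location ~ employment
result = stats.linregress(employment, location)
print(f"slope={result.slope:.4f}, intercept={result.intercept:.4f}, "
      f"R2={result.rvalue**2:.4f}, p={result.pvalue:.4f}")
```

As in the SPSS output, a near-zero R-squared and a large p-value for the slope would indicate that employment explains essentially none of the variation in stop location.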
Correlations
Notes
Output Created 05-Jan-2013 06:51:55
Comments Input Data C:UsersmoseAppDataLocalTempLynfield Stop Data(1).sav
Active Dataset DataSet1
Filter <none>
Weight <none>
Split File <none>
N of Rows in Working Data File 10609
Missing Value Handling
Definition of Missing User-defined missing values are treated as missing.
Correlations
Location of stop/search Age of suspect Suspects activity prior to stop/search Suspects employment Was suspect carrying a weapon? Age of Officer involved in stop Was any complaint made? Response to Question: Police can be trusted to deal fairly with all sections of community Gender of suspect
Location of stop/search Pearson Correlation 1 .004 -.011 -.010 .020* .070** -.070** .059 -.012
Sig. (2-tailed) .685 .243 .309 .043 .000 .000 .101 .222
N 10609 10414 10609 10609 10496 10609 10609 764 10458
Age of suspect Pearson Correlation .004 1 .000 .018 -.010 .070** -.062** -.016 -.077**
Sig. (2-tailed) .685 .989 .066 .287 .000 .000 .663 .000
N 10414 10414 10414 10414 10305 10414 10414 746 10414
Suspects activity prior to stop/search Pearson Correlation -.011 .000 1 -.003 -.034** .013 -.021* -.029 -.007
Sig. (2-tailed) .243 .989 .731 .001 .169 .029 .431 .482
N 10609 10414 10609 10609 10496 10609 10609 764 10458
Suspects employment Pearson Correlation -.010 .018 -.003 1 -.007 -.026** .030** .043 .000
Sig. (2-tailed) .309 .066 .731 .493 .008 .002 .232 .990
N 10609 10414 10609 10609 10496 10609 10609 764 10458
Was suspect carrying a weapon? Pearson Correlation .020* -.010 -.034** -.007 1 -.034** .033** -.010 .012
Sig. (2-tailed) .043 .287 .001 .493 .000 .001 .794 .229
N 10496 10305 10496 10496 10496 10496 10496 754 10347
Age of Officer involved in stop Pearson Correlation .070** .070** .013 -.026** -.034** 1 -.931** .051 -.029**
Sig. (2-tailed) .000 .000 .169 .008 .000 .000 .157 .003
N 10609 10414 10609 10609 10496 10609 10609 764 10458
Was any complaint made? Pearson Correlation -.070** -.062** -.021* .030** .033** -.931** 1 -.027 .024*
Sig. (2-tailed) .000 .000 .029 .002 .001 .000 .453 .013
N 10609 10414 10609 10609 10496 10609 10609 764 10458
Response to Question: Police can be trusted to deal fairly with all sections of community Pearson Correlation .059 -.016 -.029 .043 -.010 .051 -.027 1 .008
Sig. (2-tailed) .101 .663 .431 .232 .794 .157 .453 .833
N 764 746 764 764 754 764 764 764 753
Gender of suspect Pearson Correlation -.012 -.077** -.007 .000 .012 -.029** .024* .008 1
Sig. (2-tailed) .222 .000 .482 .990 .229 .003 .013 .833
N 10458 10414 10458 10458 10347 10458 10458 753 10458
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
DISCUSSION
Crime rates across the world are triggered by many factors that are statistically testable. The data above give a summary analysis of all the variables involved in crime, and a number of correlation analyses are highlighted in the tables. Top of the list is the age of the subjects, one of the major factors influencing the probability of an individual engaging in criminal activity. The correlation between age and the probability of involvement in crime is fairly strongly positive. Using a two-tailed test, it was found that young people form the bulk of the culprits, owing to their readiness to break many social laws and basic regulations. Middle-aged people equally form a high proportion of those involved in criminal activities, owing to experience gained with legal loopholes and over-indulgence in adventurous undertakings. The involvement of old people in criminal activities is very low, owing to deteriorating physical health and a good understanding of the law. The age factor explains the correlation coefficient of over +0.5 indicated in the tables above.
In the regression line, it is important to note the inverse relationship between age and the possibility of engagement in criminal activities. Gender is a significant variable in involvement in criminal activities, and this is evident in the correlation analysis, which indicates a highly positive correlation coefficient for males and the opposite for females. Males tend to engage in crime at a higher rate than females, owing to courage and the incentive of escaping under tight security, which is backed up by their biological anatomy. Females tend to engage in particular classes of crime, but in aggregate at a comparatively lower rate than their male counterparts, which explains the high negative correlation coefficient in the crime rate. An individual's past involvement in criminal activities is equally important in contributing to further engagement in the vice: such people have learnt the ways of the law and have an upper hand in attempting to break it while considering other means of escaping prosecution. The relationship between crime and the subject's past is therefore directly proportional. Employment status is an equally significant factor in involvement in crime; it is a fact that those who are not employed are highly likely to seek dubious means of survival, which amounts to involvement in crime. When a subject is caught with a weapon, this indicates a high probability of planned and past criminal involvement.