Statistical Tools in Marketing Research
Name
Instructor
Course
Date
Statistical Tools in Marketing Research
ANOVA
Analysis of variance (ANOVA) is a statistical tool used to separate the total variability in a set of data into two components: systematic factors and random factors. Random factors have no statistical influence on the data set under study, while systematic factors do. The ANOVA test is used to establish the effect of independent variables on a dependent variable in a regression analysis, and it serves as a guide in telling whether or not an observed difference is likely due to random variation. ANOVA is used to determine whether variations exist among several population means; it does not test how variances differ, but how the means of a data set differ. It is the initial step in identifying the factors that influence a given data set. After performing the ANOVA test, an analyst can carry out further analysis on the systematic factors that statistically contribute to the variability of the data set. The results of an ANOVA can then be used in an F-test of the significance of the overall regression equation.
The ANOVA test can be divided into three types depending on the kind of data under analysis: single factor, two-factor with replication, and two-factor without replication. ANOVA: Single Factor performs an analysis on data containing two or more samples, providing a hypothesis test about the populations from which the samples are drawn. Where there are only two samples, a worksheet function could equally be used; with more than two samples, a worksheet function is no longer convenient. ANOVA: Two-Factor With Replication is used when the data under analysis are classified along two different dimensions. For example, in an experiment measuring the heights of plants, the plants may be treated with different brands of fertilizer and also kept at different temperatures. The ANOVA tool can then be used to test whether the plant heights under the different fertilizer brands are drawn from the same underlying population. ANOVA: Two-Factor Without Replication is likewise used where data are classified along two dimensions, as in the two-factor case with replication; however, for this tool there is an assumption that there is only one observation for every pair of factor levels.
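The single-factor case can be sketched in Python with SciPy's `f_oneway`. The plant-height figures below are hypothetical, standing in for the three fertilizer brands in the example above:

```python
from scipy import stats

# Hypothetical plant heights (cm) under three fertilizer brands
brand_a = [20.1, 21.3, 19.8, 22.0, 20.5]
brand_b = [23.2, 24.1, 22.8, 23.9, 24.5]
brand_c = [20.0, 20.9, 21.1, 19.5, 20.7]

# Single-factor ANOVA: are the group means drawn from the same population?
f_stat, p_value = stats.f_oneway(brand_a, brand_b, brand_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here indicates that at least one brand's mean height differs, after which the systematic factor can be examined further.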
ANOVA is a particular type of statistical hypothesis testing that is heavily used in analyzing experimental data. A statistical hypothesis test is used to make decisions from data. A test result is called statistically significant if it is deemed unlikely to have happened by chance. A statistically significant result, where the p-value is less than the significance level, justifies rejection of the null hypothesis, but only where the prior probability of the null hypothesis is not low. In an ANOVA, the null hypothesis is that all groups are random samples from the same population, implying that all the treatments have the same effect; rejecting the null hypothesis indicates that different treatments lead to different effects. ANOVA involves the synthesis and analysis of several ideas and is therefore used for multiple purposes. As an exploratory data analysis, ANOVA is a form of additive data decomposition, with its sums of squares indicating the variance attributable to each component of the decomposition. Comparisons of mean squares, along with F-tests, allow testing of a nested sequence of models in the marketing system. ANOVA is computationally elegant and relatively robust against violations of its assumptions, giving it the industrial strength to carry out statistical analysis in the market environment. Because of its ability to analyze numerous and complex sets of data, ANOVA has long enjoyed the prestige of being the most used statistical tool in psychological research, and it is equally useful for statistical inference in the marketing environment. Analysis of variance can be approached in several ways, the most common being a linear model relating the responses to the blocks and treatments; the model is normally linear in its parameters but may be non-linear across factor levels. Interpretation of the data is normally easy in the case of data balanced across factors, but deeper understanding is required in the case of unbalanced data.
The T-test
This is a statistical test used to compare the means of two treatments or samples, even if they possess varying numbers of replicates. In simple terms, it compares the actual difference between the means of two data sets and can be used to tell whether the two sets are statistically different from one another. It is often applied in situations where the test statistic follows a normal distribution and the scaling term in the test statistic is known. The t-test uses the t-distribution, degrees of freedom, and the t-statistic to determine a probability (p-value) that tells whether the population means differ. The t-test is very popular: millions of t-test analyses are performed daily in the marketing research industry. The t-test was originally formulated to test a simple hypothesis; for example, it can be put to use to determine whether two batches of wine are equally good. There are many varieties of the t-test, and the most commonly used today are:
One-sample t-test: this is used to test whether the population mean has a pre-determined value or not. For example, a company may specify that all new concepts must achieve a score of 50 before proceeding to the next stage of testing. A one-sample t-test can then be used to tell whether any of the new concepts scores significantly below this standard.
Two-sample t-test: this is the most commonly used, and also the most commonly misused, type of t-test. It is used to test for differences between the means of two populations. For example, this test can be used to determine whether or not there are significant differences in the way women and men score a new concept.
Paired t-test: this test is used in situations where two measurements come from the same source, to determine whether or not there is a difference between the means of the two measures. For example, if it is known how much a particular respondent liked a specific concept (concept A) and how much he liked another concept (concept B), the paired t-test can be used to tell whether there exists a significant difference in the preferences.
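The three varieties above can be sketched with SciPy; all of the scores below are hypothetical, chosen only to mirror the concept-testing examples:

```python
from scipy import stats

# One-sample: do concept scores meet a benchmark of 50?
concept_scores = [48, 52, 47, 49, 51, 46, 50, 48]
t1, p1 = stats.ttest_1samp(concept_scores, popmean=50)

# Two-sample (independent): do men and women score the concept differently?
men   = [55, 60, 52, 58, 61, 54]
women = [50, 48, 53, 47, 51, 49]
t2, p2 = stats.ttest_ind(men, women)

# Paired: does the same respondent rate concept A differently from concept B?
concept_a = [7, 6, 8, 5, 7, 6]
concept_b = [5, 5, 6, 4, 6, 5]
t3, p3 = stats.ttest_rel(concept_a, concept_b)

print(f"one-sample p={p1:.3f}, two-sample p={p2:.4f}, paired p={p3:.4f}")
```

In each case the p-value is compared against the chosen significance level to decide whether the difference in means is statistically significant.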
When a t-test is used to compare the means of two independent samples, the following assumptions must be observed:
Each of the populations under comparison should follow a normal distribution. This can be established with a normality test, or assessed graphically using a normal quantile plot.
When using the original Student's t-test, the two populations under comparison should have the same variance. This can be tested using Levene's test, the Brown-Forsythe test, or the F-test, or assessed graphically using a Q-Q plot. If the sample sizes of the two groups under comparison are equal, the original Student's t-test is highly robust to the presence of unequal variances. There is also Welch's t-test, which is not sensitive to equality of variances.
The data used in the test should be sampled independently from the two populations being compared. This is generally not possible to verify from the data themselves, but if the data are not independently sampled, the classical t-tests may give misleading results.
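The first two assumptions can be checked in SciPy before running the t-test; the measurements below are hypothetical:

```python
from scipy import stats

# Hypothetical measurements for two groups
group_1 = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]
group_2 = [13.0, 12.6, 13.4, 12.9, 13.1, 12.7, 13.3, 12.8]

# Normality check for each group (Shapiro-Wilk test)
_, p_norm_1 = stats.shapiro(group_1)
_, p_norm_2 = stats.shapiro(group_2)

# Equality-of-variances check (Levene's test)
_, p_levene = stats.levene(group_1, group_2)

# A large p-value means the assumption is not rejected
print(f"normality: p={p_norm_1:.3f}, p={p_norm_2:.3f}; equal variances: p={p_levene:.3f}")
```

If either assumption is rejected, Welch's t-test or a non-parametric alternative should be considered instead of the classical Student's t-test.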
Two-sample t-tests involve paired samples, independent samples, or overlapping samples, and can be divided into unpaired two-sample t-tests and paired t-tests. Paired tests are a form of blocking and have greater power than unpaired tests; in a different context, the paired t-test can also be used to reduce the impact of confounding factors in observational studies. The independent-samples t-test is normally used where two separate sets of independently and identically distributed samples are obtained, one from each population being compared. The overlapping-samples t-test, on the other hand, is used where there are paired samples but part of the data is missing; it is commonly used in commercial surveys, for example in opinion polling.
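Welch's t-test, mentioned above as the variant that does not assume equal variances, differs from the classical test only in one argument in SciPy; the two hypothetical groups below deliberately have very different spreads:

```python
from scipy import stats

# Hypothetical concept scores: one tight group, one widely spread group
small_spread = [50, 51, 49, 50, 52, 51]
large_spread = [45, 60, 38, 65, 50, 58]

# Classical Student's t-test assumes equal variances
t_student, p_student = stats.ttest_ind(small_spread, large_spread, equal_var=True)

# Welch's t-test drops the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(small_spread, large_spread, equal_var=False)

print(f"Student p={p_student:.3f}, Welch p={p_welch:.3f}")
```

When the variances genuinely differ, the Welch result is the safer one to report.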
Chi-Square Test
This is a statistical test that is normally used to compare observed data with the data a researcher or analyst expects to obtain under a specific hypothesis. In any data distribution there are generally two kinds of random variables, which in turn yield two kinds of data: categorical and numerical. The chi-square statistic is used to investigate whether categorical variable distributions differ from one another; a categorical variable yields data in the form of categories, while a numerical variable yields data in numerical form. The chi-square test can be used in several market decision-making situations: 1) Are all designs equally preferred? 2) Are all brands equally preferred? 3) Is there any relationship between brand preference and income level? 4) Is there any relationship between the size of washing machine purchased and family size? 5) Is there any relationship between the type of job chosen and educational background? The first two questions can be answered by the chi-square test for goodness of fit, while questions 3, 4, and 5 can be answered by the chi-square test for independence. It is important to note that the variables used in chi-square analysis are usually nominally scaled; nominal data are known by two names, attribute data and categorical data.
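Both forms of the chi-square test described above can be sketched with SciPy; the preference counts below are hypothetical:

```python
from scipy import stats

# Goodness of fit: are four designs equally preferred?
observed = [40, 32, 25, 23]                  # choices among 120 shoppers
chi2_gof, p_gof = stats.chisquare(observed)  # expected counts default to equal

# Independence: is brand preference related to income level?
# Rows: income level (low, high); columns: brand (A, B, C)
table = [[30, 20, 10],
         [15, 25, 20]]
chi2_ind, p_ind, dof, expected = stats.chi2_contingency(table)

print(f"goodness of fit: chi2={chi2_gof:.2f}, p={p_gof:.3f}")
print(f"independence:    chi2={chi2_ind:.2f}, p={p_ind:.4f}, df={dof}")
```

A small p-value in the first test says the designs are not equally preferred; a small p-value in the second says brand preference and income level are related.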
Application of these Tools
RESULTS AND FINDINGS
Univariate Analysis of Variance
Notes
Output Created
Comments
Input Data
Active Dataset DataSet1
Filter <none>
Weight <none>
Split File <none>
N of Rows in Working Data File 10609
Missing Value Handling Definition of Missing User-defined missing values are treated as missing.
Cases Used Statistics are based on all cases with valid data for all variables in the model.
Between-Subjects Factors
Value Label N
Reason for stop/search 1 Officer Intuition 93
2 Suspect acting suspiciously 55
3 Called to Scene 42
4 Prior Information 24
5 Public Complaint 19
If complaint made, how satisfied was suspect with response 1 Very satisfied 28
2 Satisfied 77
3 Neither satisfied/unsatisfied 65
4 Dissatisfied 27
5 Very dissatisfied 36
How worried was suspect about crime in their area 1 Very worried 28
2 Fairly worried 19
3 Not too worried 34
4 Not at all worried 45
5 Not applicable 107
Tests of Between-Subjects Effects
Dependent Variable: Suspects employment
Source Type III Sum of Squares df Mean Square F Sig.
Intercept Hypothesis 5.658 1 5.658 4.008 .047
Error 216.560 153.419 1.412a
Weapon Hypothesis 2.492 1 2.492 1.769 .185
Error 211.278 150 1.409b
Reason Hypothesis 5.989 4 1.497 1.622 .198
Error 24.633 26.682 .923c
Satisfied Hypothesis 1.504 4 .376 .516 .724
Error 21.586 29.644 .728d
Worry Hypothesis 6.716 4 1.679 3.632 .182
Error 1.200 2.596 .462e
Reason * Satisfied Hypothesis 15.872 16 .992 .903 .574
Error 30.113 27.402 1.099f
Reason * Worry Hypothesis 12.337 15 .822 .733 .735
Error 37.044 33.030 1.122g
Satisfied * Worry Hypothesis 9.903 16 .619 .554 .895
Error 35.777 32.007 1.118h
Reason * Satisfied * Worry Hypothesis 23.569 22 1.071 .761 .769
Error 211.278 150 1.409b
Expected Mean Squares
Source Variance Component
Var(Worry) Var(Reason * Worry) Var(Satisfied * Worry) Var(Reason * Satisfied * Worry) Var(Error) Quadratic Term
Intercept .253 .054 .052 .023 1.000 Intercept, Reason, Satisfied, Reason * Satisfied
Weapon .000 .000 .000 .000 1.000 Weapon
Reason .000 4.280 .000 1.633 1.000 Reason, Reason * Satisfied
Satisfied .000 .000 4.226 1.749 1.000 Satisfied, Reason * Satisfied
Worry 20.700 4.473 4.293 1.734 1.000
Reason * Satisfied .000 .000 .000 2.149 1.000 Reason * Satisfied
Reason * Worry .000 5.120 .000 1.992 1.000
Satisfied * Worry .000 .000 4.922 2.018 1.000
Reason * Satisfied * Worry .000 .000 .000 2.340 1.000
Error .000 .000 .000 .000 1.000
a. For each source, the expected mean square equals the sum of the coefficients in the cells times the variance components, plus a quadratic term involving effects in the Quadratic Term cell.
b. Expected Mean Squares are based on the Type III Sums of Squares.
REGRESSION /MISSING LISTWISE /STATISTICS COEFF OUTS R ANOVA /CRITERIA=PIN (.05) POUT (.10) /NOORIGIN /DEPENDENT Stop Location /METHOD=ENTER SusWork.
Regression
Notes
Input Data
Active Dataset
Filter <none>
Weight <none>
Split File <none>
N of Rows in Working Data File 10609
Missing Value Handling Definition of Missing User-defined missing values are treated as missing.
Cases Used Statistics are based on cases with no missing values for any variable used.
Syntax REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT Stop Location
/METHOD=ENTER SusWork.
Resources Processor Time 0:00:00.032
Elapsed Time 0:00:00.047
Memory Required 1820 bytes
Additional Memory Required for Residual Plots 0 bytes
Variables Entered/Removed
Model Variables Entered Variables Removed Method
1 Suspects employment . Enter
a. All requested variables entered.
b. Dependent Variable: Location of stop/search
Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .010a .000 .000 1.948
a. Predictors: (Constant), Suspects employment
ANOVA
Model Sum of Squares df Mean Square F Sig.
1 Regression 3.922 1 3.922 1.033 .309a
Residual 40270.242 10607 3.797
Total 40274.164 10608
a. Predictors: (Constant), Suspects employment
b. Dependent Variable: Location of stop/search
Coefficients
Model Unstandardized Coefficients Standardized Coefficients t Sig.
B Std. Error Beta
1 (Constant) 5.602 .056 99.315 .000
Suspects employment -.016 .016 -.010 -1.016 .309
a. Dependent Variable: Location of stop/search
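The SPSS regression above fits a simple linear model of stop/search location on suspect's employment. The same kind of analysis can be sketched in Python with SciPy's `linregress`; the employment and location codes below are hypothetical illustrations, not the actual Lynfield stop data:

```python
from scipy import stats

# Hypothetical predictor (employment code) and response (location code)
employment = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
location   = [5.6, 5.5, 5.7, 5.5, 5.6, 5.4, 5.6, 5.5, 5.4, 5.5]

# Simple linear regression: location ~ employment
result = stats.linregress(employment, location)
print(f"slope={result.slope:.4f}, intercept={result.intercept:.4f}, "
      f"R2={result.rvalue**2:.4f}, p={result.pvalue:.4f}")
```

As in the SPSS output, a near-zero R-squared and a large p-value for the slope would indicate that employment explains essentially none of the variation in stop location.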
Correlations
Notes
Output Created 05-Jan-2013 06:51:55
Comments Input Data C:UsersmoseAppDataLocalTempLynfield Stop Data(1).sav
Active Dataset DataSet1
Filter <none>
Weight <none>
Split File <none>
N of Rows in Working Data File 10609
Missing Value Handling
Definition of Missing User-defined missing values are treated as missing.
Correlations
Location of stop/search Age of suspect Suspects activity prior to stop/search Suspects employment Was suspect carrying a weapon? Age of Officer involved in stop Was any complaint made? Response to Question: Police can be trusted to deal fairly with all sections of community Gender of suspect
Location of stop/search Pearson Correlation 1 .004 -.011 -.010 .020* .070** -.070** .059 -.012
Sig. (2-tailed) .685 .243 .309 .043 .000 .000 .101 .222
N 10609 10414 10609 10609 10496 10609 10609 764 10458
Age of suspect Pearson Correlation .004 1 .000 .018 -.010 .070** -.062** -.016 -.077**
Sig. (2-tailed) .685 .989 .066 .287 .000 .000 .663 .000
N 10414 10414 10414 10414 10305 10414 10414 746 10414
Suspects activity prior to stop/search Pearson Correlation -.011 .000 1 -.003 -.034** .013 -.021* -.029 -.007
Sig. (2-tailed) .243 .989 .731 .001 .169 .029 .431 .482
N 10609 10414 10609 10609 10496 10609 10609 764 10458
Suspects employment Pearson Correlation -.010 .018 -.003 1 -.007 -.026** .030** .043 .000
Sig. (2-tailed) .309 .066 .731 .493 .008 .002 .232 .990
N 10609 10414 10609 10609 10496 10609 10609 764 10458
Was suspect carrying a weapon? Pearson Correlation .020* -.010 -.034** -.007 1 -.034** .033** -.010 .012
Sig. (2-tailed) .043 .287 .001 .493 .000 .001 .794 .229
N 10496 10305 10496 10496 10496 10496 10496 754 10347
Age of Officer involved in stop Pearson Correlation .070** .070** .013 -.026** -.034** 1 -.931** .051 -.029**
Sig. (2-tailed) .000 .000 .169 .008 .000 .000 .157 .003
N 10609 10414 10609 10609 10496 10609 10609 764 10458
Was any complaint made? Pearson Correlation -.070** -.062** -.021* .030** .033** -.931** 1 -.027 .024*
Sig. (2-tailed) .000 .000 .029 .002 .001 .000 .453 .013
N 10609 10414 10609 10609 10496 10609 10609 764 10458
Response to Question: Police can be trusted to deal fairly with all sections of community Pearson Correlation .059 -.016 -.029 .043 -.010 .051 -.027 1 .008
Sig. (2-tailed) .101 .663 .431 .232 .794 .157 .453 .833
N 764 746 764 764 754 764 764 764 753
Gender of suspect Pearson Correlation -.012 -.077** -.007 .000 .012 -.029** .024* .008 1
Sig. (2-tailed) .222 .000 .482 .990 .229 .003 .013 .833
N 10458 10414 10458 10458 10347 10458 10458 753 10458
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
DISCUSSION
Crime rates across the world are triggered by many factors that are statistically testable. The data above give a summary analysis of all the variables involved in crime, and a number of correlation analyses are highlighted in the tables. Top of the list is the age of the subjects, one of the major factors influencing the probability of an individual engaging in criminal activity. The correlation between age and the probability of involvement in crime is fairly strongly positive. Using a two-tailed test, it was found that young people form the bulk of the culprits, owing to their readiness to break many social laws and basic regulations. Middle-aged people equally form a high proportion of those involved in criminal activities, owing to experience gained with legal loopholes and over-indulgence in adventurous undertakings. The involvement of old people in criminal activities is very low, owing to deteriorating physical health and a good understanding of the law. The age factor explains the correlation coefficient of over +0.5 indicated in the tables above.
In the regression line, it is important to note the inverse relationship between age and the possibility of engagement in criminal activities. Gender is a significant variable in involvement in criminal activities, and this is evident in the correlation analysis, which indicates a highly positive correlation coefficient for males and the opposite for females. Males tend to engage in crime at a higher rate than females, owing to courage and the incentive of escaping under tight security, which is backed up by their biological anatomy. Females tend to engage in particular classes of crime, but in aggregate at a comparatively lower rate than their male counterparts, which explains the high negative correlation coefficient in the crime rate. An individual's past involvement in criminal activities is equally important in contributing to further engagement in the vice: such people have learnt the ways of the law and have an upper hand in attempting to break it while considering other means of escaping prosecution. The relationship between crime and the subject's past is therefore directly proportional. Employment status is an equally significant factor in involvement in crime; it is a fact that those who are not employed are highly likely to seek dubious means of survival, which amounts to involvement in crime. When a subject is caught with a weapon, this indicates a high probability of planned and past criminal involvement.