# Stats

Essay by yyyuki November 2014

Mabel Xia

Question 7:

- Changes in average level from group to group: Wednesdays had the lowest mean absences, and Mondays had the highest.
- Differences in variability: The data for Monday had the largest spread.
- The data for Wednesday and Friday are slightly skewed to the right, while the rest of the data (Monday, Tuesday and Thursday) are relatively symmetrical.

We also see:

- Clusters / multi-modal: No
- Outliers: No
- Skewness: Slight
- Spreads / variability: Large differences: No
- Total sample size: ntot = 75

Descriptives: Absent

| Day | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean, Lower Bound | 95% CI for Mean, Upper Bound | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|
| Monday | 15 | 54.73 | 5.599 | 1.446 | 51.63 | 57.83 | 45 | 63 |
| Tuesday | 15 | 51.93 | 3.494 | .902 | 50.00 | 53.87 | 48 | 58 |
| Wednesday | 15 | 50.13 | 3.701 | .956 | 48.08 | 52.18 | 45 | 56 |
| Thursday | 15 | 50.47 | 2.696 | .696 | 48.97 | 51.96 | 45 | 56 |
| Friday | 15 | 51.13 | 3.907 | 1.009 | 48.97 | 53.30 | 47 | 59 |
| Total | 75 | 51.68 | 4.224 | .488 | 50.71 | 52.65 | 45 | 63 |

(b) - (c)

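ANOVA: Absent

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 203.253 | 4 | 50.813 | 3.184 | .018 |
| Within Groups | 1117.067 | 70 | 15.958 | | |
| Total | 1320.320 | 74 | | | |

Multiple Comparisons (Tukey HSD). Dependent Variable: Absent

| (I) Day | (J) Day | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|
| Monday | Tuesday | 2.800 | 1.459 | .317 | -1.28 | 6.88 |
| Monday | Wednesday | 4.600* | 1.459 | .019 | .52 | 8.68 |
| Monday | Thursday | 4.267* | 1.459 | .036 | .18 | 8.35 |
| Monday | Friday | 3.600 | 1.459 | .110 | -.48 | 7.68 |
| Tuesday | Monday | -2.800 | 1.459 | .317 | -6.88 | 1.28 |
| Tuesday | Wednesday | 1.800 | 1.459 | .732 | -2.28 | 5.88 |
| Tuesday | Thursday | 1.467 | 1.459 | .852 | -2.62 | 5.55 |
| Tuesday | Friday | .800 | 1.459 | .982 | -3.28 | 4.88 |
| Wednesday | Monday | -4.600* | 1.459 | .019 | -8.68 | -.52 |
| Wednesday | Tuesday | -1.800 | 1.459 | .732 | -5.88 | 2.28 |
| Wednesday | Thursday | -.333 | 1.459 | .999 | -4.42 | 3.75 |
| Wednesday | Friday | -1.000 | 1.459 | .959 | -5.08 | 3.08 |
| Thursday | Monday | -4.267* | 1.459 | .036 | -8.35 | -.18 |
| Thursday | Tuesday | -1.467 | 1.459 | .852 | -5.55 | 2.62 |
| Thursday | Wednesday | .333 | 1.459 | .999 | -3.75 | 4.42 |
| Thursday | Friday | -.667 | 1.459 | .991 | -4.75 | 3.42 |
| Friday | Monday | -3.600 | 1.459 | .110 | -7.68 | .48 |
| Friday | Tuesday | -.800 | 1.459 | .982 | -4.88 | 3.28 |
| Friday | Wednesday | 1.000 | 1.459 | .959 | -3.08 | 5.08 |
| Friday | Thursday | .667 | 1.459 | .991 | -3.42 | 4.75 |

*. The mean difference is significant at the 0.05 level.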
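As a cross-check on the ANOVA output, the F statistic and the Tukey standard error can be approximately rebuilt from the summary statistics alone. The sketch below is not part of the original analysis; it assumes only the group sizes, means and standard deviations printed in the Descriptives table, so small rounding differences from the SPSS figures are expected. All variable names are illustrative.

```python
import math

# (n, mean, std. deviation) per day, copied from the Descriptives table.
groups = {
    "Monday": (15, 54.73, 5.599),
    "Tuesday": (15, 51.93, 3.494),
    "Wednesday": (15, 50.13, 3.701),
    "Thursday": (15, 50.47, 2.696),
    "Friday": (15, 51.13, 3.907),
}

k = len(groups)
n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-groups and within-groups sums of squares from summary statistics.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
ss_within = sum((n - 1) * s ** 2 for n, _, s in groups.values())

df_between = k - 1
df_within = n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# Standard error of a difference between two day means (equal n = 15 per day).
se_diff = math.sqrt(ms_within * 2 / 15)

print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
print(f"Std. error of a mean difference = {se_diff:.3f}")
```

The results should land close to the tabulated values (F of about 3.18 on 4 and 70 degrees of freedom, standard error of about 1.459); any gap comes from the rounded inputs.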

- The samples of the days of the week of absenteeism are random samples.
- The samples of the days of the week of absenteeism are independent of each other.
- The underlying distributions (populations) of absenteeism for the days of the week are normally distributed.
- The standard deviations of the underlying distributions (populations) of absenteeism for the days of the week are the same.

$$\frac{\text{Largest standard deviation}}{\text{Smallest standard deviation}} = \frac{5.599}{2.696} = 2.077 \text{ (3 d.p.)}$$
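The ratio above can be checked with a few lines of Python. This is only a sketch: the standard deviations are those reported in the Descriptives table, and the cut-off of 2 is the rule of thumb used in this answer, not a formal test.

```python
# Sample standard deviations per day, from the Descriptives table.
std_devs = {
    "Monday": 5.599,
    "Tuesday": 3.494,
    "Wednesday": 3.701,
    "Thursday": 2.696,
    "Friday": 3.907,
}

ratio = max(std_devs.values()) / min(std_devs.values())

# Rule of thumb used in this answer: the ratio should be less than 2.
equal_spread_reasonable = ratio < 2

print(f"Largest / smallest standard deviation = {ratio:.3f}")
print("Equal-spread assumption reasonable:", equal_spread_reasonable)
```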

- The samples are random, as absenteeism records for 15 randomly selected weeks were collected.
- There are concerns with the assumption of independence between the days of the week, since the five daily figures appear to come from the same 15 weeks. On the other hand, an absence on Monday does not cause an absence on Wednesday; one absence does not cause another, so one absence is not dependent on another.
- The normality assumption is uncertain. The sample size of each group is the same, but the total sample size of 75 may be too small to assume that the observations are normally distributed.
- The assumption of equal population standard deviations does not appear reasonable, as the ratio of the largest to the smallest sample standard deviation (2.077) is more than 2.
- Overall, there are doubts about the validity of the F-test, as it seems that only the assumption of random samples has been reasonably met.

(g)

The underlying means for absenteeism of the days of the week are identical:

$H_0: \mu_{\text{Monday}} = \mu_{\text{Tuesday}} = \mu_{\text{Wednesday}} = \mu_{\text{Thursday}} = \mu_{\text{Friday}}$

$H_1$: Differences exist between some of the underlying means for absenteeism of the 5 days.

The P-value for the F-test is 0.018. Since this is below 0.05, we have evidence against the null hypothesis, which means that there is at least one difference among the underlying mean hours of absenteeism for the 5 days. However, this is an observational study; therefore, we cannot use the result of this study alone to claim that there is a difference in the underlying mean absenteeism for the 5 days.
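The reported P-value can be reproduced approximately from the F statistic and its degrees of freedom (3.184 on 4 and 70). The sketch below uses only the Python standard library and integrates the Beta density numerically with Simpson's rule; the function name `f_upper_tail` and the step count are illustrative choices, and the routine assumes the numerator degrees of freedom are at least 2. A statistics package would normally be used instead.

```python
import math


def f_upper_tail(f_stat, df1, df2, steps=2000):
    """Approximate P(F >= f_stat) for an F(df1, df2) distribution.

    Uses the identity CDF_F(f) = I_x(df1/2, df2/2) with
    x = df1*f / (df1*f + df2), and evaluates the regularized incomplete
    beta function I_x by Simpson's rule. Assumes df1 >= 2 and an even
    number of steps.
    """
    a, b = df1 / 2, df2 / 2
    x = df1 * f_stat / (df1 * f_stat + df2)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    norm = math.exp(-log_beta)

    def density(t):
        # Beta(a, b) density at t, for 0 <= t < 1.
        return norm * t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    cdf = total * h / 3
    return 1 - cdf


p_value = f_upper_tail(3.184, 4, 70)
print(f"P-value = {p_value:.4f}")
```

The value printed should agree with the Sig. column of the ANOVA table to about three decimal places.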

(h)

There is no evidence that the underlying mean hours of absenteeism for Mondays and Fridays are different (P-value = 0.110). With 95% confidence, we estimate that the underlying mean hours of absenteeism for Mondays is somewhere between 0.48 hours lower and 7.68 hours higher than the underlying mean hours of absenteeism for Fridays.

There are significant differences between Monday and Wednesday (P-value = 0.019) and between Monday and Thursday (P-value = 0.036).
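The same conclusions can be read off the Tukey confidence intervals: a pair differs significantly at the 5% level exactly when its 95% interval excludes zero. The short sketch below applies that check to the Monday rows of the Multiple Comparisons table; the dictionary layout and names are illustrative only.

```python
# Monday minus other day: (mean difference, lower bound, upper bound),
# copied from the Tukey HSD table.
monday_vs = {
    "Tuesday": (2.800, -1.28, 6.88),
    "Wednesday": (4.600, 0.52, 8.68),
    "Thursday": (4.267, 0.18, 8.35),
    "Friday": (3.600, -0.48, 7.68),
}

# A difference is significant when the interval lies entirely on one side of 0.
significant = [
    day for day, (_, lower, upper) in monday_vs.items()
    if lower > 0 or upper < 0
]

# Half-width of each interval (the Tukey margin of error).
half_widths = {
    day: (upper - lower) / 2 for day, (_, lower, upper) in monday_vs.items()
}

print("Significantly different from Monday:", significant)
for day, hw in half_widths.items():
    print(f"Monday vs {day}: margin of error = {hw:.2f}")
```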

It is not possible to determine which single day has the lowest underlying mean hours of absenteeism. The statistical results show that there is no significant difference between Monday and Tuesday (P-value = 0.317) or between Monday and Friday (P-value = 0.110). Wednesday and Thursday each differ significantly from only one day of the week (Monday), while Tuesday and Friday show no significant difference from any other day. Hence, we do not have enough evidence to single out one day as having the lowest underlying mean hours of absenteeism.
