ANOVA
# Set working directory
library(car)
library(emmeans)
library(rstatix)
library(tidyverse)
options('max.print' = 100000)
Introducción ANOVA
Uno de los procesos más usados para comparar diferentes “tratamientos” sobre un grupo experimental es sin duda el Análisis de Varianza (ANOVA). Desarrollado originalmente por R.A. Fisher y ha tenido una gran influencia en la estadística. El análisis de varianza se usa principalmente para dos propósitos: 1) para estimar y desarrollar prueba de hipótesis de varianzas de una población. 2) Estimar y probar medias poblacionales. En esta introducción resolveremos ejemplos para la estimación de medias poblacionales.
ANOVA Modelos de una via (One-Way ANOVA)
Carne proveniente de caza, entre otras aquellas del venado de cola-blanca, y ardilla gris, son usadas como alimento or familias, cazadores y otros individuos por razones culturales, personales, o de salud. En un estudio de Holben et al. estimó el contenido de selenio de carne de venado cola-blanca libre (venison), y ardilla gris (squirrel) obtenidos de una región de baja concentración de selenio en EEUU. Estos contenidos de selenio fueron también comparados con concentraciones de carne de ganado producido en la misma región (RRB) y en una región externa (NRRB). Nos interesa saber si la concentración de selenio \((\mu g/100g)\) es distinta en los grupos de carne. (EXA_C08_S02_01.csv).
La hipótesis nula de este modelo es que todos los promedios poblacionales (cada promedio de la población de cada caso) son iguales entre ellos \(\mu_1 = \mu_2 = \mu_3 \dots = \mu_n\), y la hipótesis alternativa es que “por lo menos uno de ellos es distinto”.
getOption('max.print')
[1] 100000
# Exa_C08_S02_01 One-way
#
2.1 <- read_csv(file="~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/EXA_C08_S02_01mod.csv", show_col_types = FALSE)
Exa8.boxplot(Selenium ~ Group, data = Exa8.2.1)
ggplot(data = Exa8.2.1, aes(x = Group, y = Selenium)) + geom_boxplot()
2.1_lm <- lm(Selenium ~ Group, data = Exa8.2.1)
Exa8.2.1_lm <- anova(Exa8.2.1_lm)) (anova_Exa8.
Analysis of Variance Table
Response: Selenium
Df Sum Sq Mean Sq F value Pr(>F)
Group 3 18935 6311.7 22.614 5.345e-12 ***
Residuals 140 39074 279.1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
2.1_aov <- aov(Selenium ~ Group, data = Exa8.2.1)
Exa8.#
summary(Exa8.2.1_aov)
Df Sum Sq Mean Sq F value Pr(>F)
Group 3 18935 6312 22.61 5.34e-12 ***
Residuals 140 39074 279
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model.tables(Exa8.2.1_aov, type="means")
Tables of means
Grand mean
35.44736
Group
NRB RRB SQU VEN
62.05 28.52 37.42 25.88
rep 19.00 30.00 53.00 42.00
model.tables(Exa8.2.1_aov, type="effects")
Tables of effects
Group
NRB RRB SQU VEN
26.6 -6.925 1.97 -9.572
rep 19.0 30.000 53.00 42.000
#
TukeyHSD(Exa8.2.1_aov)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Selenium ~ Group, data = Exa8.2.1)
$Group
diff lwr upr p adj
RRB-NRB -33.523982 -46.260203 -20.787762 0.0000000
SQU-NRB -24.629335 -36.244663 -13.014007 0.0000010
VEN-NRB -36.170840 -48.180851 -24.160828 0.0000000
SQU-RRB 8.894648 -1.030122 18.819418 0.0961032
VEN-RRB -2.646857 -13.030768 7.737053 0.9108599
VEN-SQU -11.541505 -20.515363 -2.567647 0.0057777
plot(TukeyHSD(Exa8.2.1_aov))
Más ejemplos
- Pacientes de enfermedades reumáticas u osteoporosis generalmente sufren de pérdidas críticas de densidad mineral osea (Bone Mineral Density, BMD). Un medicamento usado para recuperar o prevenir una pérdida mayor de BMD, es el Alendronato. Holcomb y Rothenberg examinaron a 96 mujeres tomando alendronato para determinar si había alguna diferencia en el promedio de cambio en densidad ósea entre cinco clasificaciones diagnostica primarias. El Grupo1 era de pacientes diagnosticados con artritis reumatoide (RA). Grupo2 era de pacientes con una variedad de diagnósticos incluyendo, Lupus, granulomatosis de Wegener y poliarteritis, y otros desordenes vasculares (LUPUS). Grupo3 consistió en pacientes diagnosticados con polimialgia reumática o artritis temporal (PMRTA). El Grupo4 estaba integrado por pacientes con artrosis (OA). Y el Grupo 5 de pacientes con diagnóstico de osteoporosis (O) sin otros desordenes reumáticos. ¿Puede determinar que los promedios de los grupos presentan alguna diferencia? Haga una gráfica de cajas de los grupos.
# Exer. 8.2.2 Bone Mineral Density, BMD
# Patients suffering from rheumatic diseases or osteoporosis often suffer critical loss in bone
# mineral density (BMD).
#
2.2 <- read_csv(file="~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/EXR_C08_S02_02.csv", show_col_types = FALSE)
EXR8.
boxplot(BMD ~ factor(GROUP), data = EXR8.2.2)
<- aov(BMD ~ factor(GROUP), data=EXR8.2.2)
Exr2_2.aov anova(Exr2_2.aov)
Analysis of Variance Table
Response: BMD
Df Sum Sq Mean Sq F value Pr(>F)
factor(GROUP) 4 355.5 88.864 2.2772 0.06697 .
Residuals 91 3551.1 39.024
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# anova() to print the ANOVA table
2.2_M <- EXR8.2.2 %>%
EXR8.mutate(GROUP = case_when(GROUP == 1 ~ "RA",
== 2 ~ "LUPUS",
GROUP == 3 ~ "PMRTA",
GROUP == 4 ~ "OA",
GROUP == 5 ~ "O"))
GROUP
2.2_M <- EXR8.2.2_M %>%
EXR8.mutate(GROUP = GROUP %>% fct_relevel("O", "RA", "LUPUS", "PMRTA", "OA"))
2.2_M %>% ggplot( aes(x = GROUP, y = BMD) ) +
EXR8.geom_boxplot() +
geom_jitter(width = 0.1, alpha = 0.6, aes(color = GROUP))
#
<- lm(BMD ~ factor(GROUP), data=EXR8.2.2_M)
Exr2_2_lm S(Exr2_2_lm)
Call: lm(formula = BMD ~ factor(GROUP), data = EXR8.2.2_M)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.672 1.975 4.896 4.22e-06 ***
factor(GROUP)RA -5.202 2.226 -2.336 0.02166 *
factor(GROUP)LUPUS -5.094 2.870 -1.775 0.07929 .
factor(GROUP)PMRTA -7.491 2.518 -2.975 0.00375 **
factor(GROUP)OA -4.467 2.351 -1.900 0.06061 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard deviation: 6.247 on 91 degrees of freedom
Multiple R-squared: 0.09099
F-statistic: 2.277 on 4 and 91 DF, p-value: 0.06697
AIC BIC
631.06 646.45
emmeans(Exr2_2_lm, pairwise ~ GROUP)
$emmeans
GROUP emmean SE df lower.CL upper.CL
O 9.67 1.98 91 5.748 13.60
RA 4.47 1.03 91 2.430 6.51
LUPUS 4.58 2.08 91 0.442 8.71
PMRTA 2.18 1.56 91 -0.921 5.28
OA 5.20 1.28 91 2.672 7.74
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
O - RA 5.202 2.23 91 2.336 0.1430
O - LUPUS 5.094 2.87 91 1.775 0.3946
O - PMRTA 7.491 2.52 91 2.975 0.0301
O - OA 4.467 2.35 91 1.900 0.3248
RA - LUPUS -0.108 2.32 91 -0.047 1.0000
RA - PMRTA 2.289 1.87 91 1.225 0.7371
RA - OA -0.735 1.64 91 -0.449 0.9915
LUPUS - PMRTA 2.397 2.60 91 0.921 0.8881
LUPUS - OA -0.627 2.44 91 -0.257 0.9990
PMRTA - OA -3.024 2.02 91 -1.500 0.5653
P value adjustment: tukey method for comparing a family of 5 estimates
emmeans(Exr2_2_lm, trt.vs.ctrl ~ GROUP)
$emmeans
GROUP emmean SE df lower.CL upper.CL
O 9.67 1.98 91 5.748 13.60
RA 4.47 1.03 91 2.430 6.51
LUPUS 4.58 2.08 91 0.442 8.71
PMRTA 2.18 1.56 91 -0.921 5.28
OA 5.20 1.28 91 2.672 7.74
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
RA - O -5.20 2.23 91 -2.336 0.0740
LUPUS - O -5.09 2.87 91 -1.775 0.2387
PMRTA - O -7.49 2.52 91 -2.975 0.0138
OA - O -4.47 2.35 91 -1.900 0.1889
P value adjustment: dunnettx method for 4 tests
- Ilich-Ernst y colaboradores estudiaron el consumo de calcio en la dieta de un grupos de 113 mujeres adultas sanas de edades entre 20 y 88 años. Los investigadores segregaron a los sujetos de estudio por grupos de edad de la siguiente manera: Grupo A 20.0 – 45.9 años; grupo B 46.00 – 55. 9 años; grupo C56.0 – 65.9 años; y grupo D, de más de 66 años. El consumo de calsio estuvo medido en mg/día. Los datos están listados en el archivo EXR_C08_S02_03.csv. Siguiendo un ANOVA, ¿se puede concluir que hay uns diferencia en los promedios de las poblaciones? Haga un HSD Tukey para estimar las diferencias entre las distintas poblaciones, sea Alpha = .05. Grafique los datos usando una gráfica de cajas y explique los resultados.
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Exr 8.2.3
2.3 <- read_csv(file="~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/EXR_C08_S02_03.csv", show_col_types = FALSE)
Exr8.
plot(calcium ~ factor(Group), data = Exr8.2.3)
<- aov(calcium ~ Group, data=Exr8.2.3)
Exr3.aov anova(Exr3.aov)
Analysis of Variance Table
Response: calcium
Df Sum Sq Mean Sq F value Pr(>F)
Group 3 5931208 1977069 9.3588 1.476e-05 ***
Residuals 109 23026500 211252
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(Exr3.aov)
TukeyHSD(Exr3.aov)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = calcium ~ Group, data = Exr8.2.3)
$Group
diff lwr upr p adj
B-A -455.72078 -865.7066 -45.73492 0.0230454
C-A -574.53605 -913.5892 -235.48294 0.0001356
D-A -596.63447 -905.3867 -287.88228 0.0000109
C-B -118.81527 -509.0844 271.45386 0.8568796
D-B -140.91369 -505.1676 223.34020 0.7443051
D-C -22.09842 -304.1436 259.94679 0.9969598
ANOVA de dos vías
Para el caso de un ANOVA de dos vías, tenemos que la función repuesta responde a dos factores que a su vez tiene dos o más niveles por factor: \(Y \sim factor_1 + factor_2\)
Del ejemplo 8.3.1 de Daniel para un ANOVA de dos vías: Un fisioterapeuta quería comparar tres métodos para enseñar a sus pacientes a usar una prótesis. Pensó que la proporción de aprendizage sería diferente dependiendo de la edad de los parcientes y desarrolló un experimento que le permitió tomar en cuenta la edad. Se construye la tábla de los datos del problema.
# Build a table, index by index since
# Two-Way ANOVA EXAMPLE 8.3.1 has NO-data file, therefore we make the example table.
#
<- tibble(
Learn Age = factor(rep(c(1,2,3,4,5), 3)),
Method = factor(rep(1:3,c(5,5,5))),
Rate = c(7,8,9,10,11,9,9,9,9,12,10,10,12,12,14)
)
Learn
# A tibble: 15 × 3
Age Method Rate
<fct> <fct> <dbl>
1 1 1 7
2 2 1 8
3 3 1 9
4 4 1 10
5 5 1 11
6 1 2 9
7 2 2 9
8 3 2 9
9 4 2 9
10 5 2 12
11 1 3 10
12 2 3 10
13 3 3 12
14 4 3 12
15 5 3 14
#
plot(Rate ~ Age + Method, data = Learn)
Para las gráficas de cajas ejecutadas por un ggplot (o boxplot) es necesario mandar por separado cada grupo en la variable predictora (o independiente), o separar dos páneles, uno por cada grupo.
# For the Age group
ggplot(data = Learn, aes(x = Age, y = Rate) ) +
geom_boxplot() +
geom_jitter(width = 0.1, alpha=0.6, aes(color = Age))
# For the Rate group
ggplot(data = Learn, aes(x = Method, y = Rate) ) +
geom_boxplot() +
geom_jitter(width = 0.1, alpha=0.6, aes(color = Method))
# now the ANOVA model
= aov(Rate ~ Method + Age, data = Learn)
Rate_aov anova(Rate_aov)
Analysis of Variance Table
Response: Rate
Df Sum Sq Mean Sq F value Pr(>F)
Method 2 18.5333 9.2667 21.385 0.0006165 ***
Age 4 24.9333 6.2333 14.385 0.0010017 **
Residuals 8 3.4667 0.4333
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
layout(matrix(c(1,2), nrow=1, ncol=2, byrow=TRUE))
TukeyHSD(Rate_aov)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Rate ~ Method + Age, data = Learn)
$Method
diff lwr upr p adj
2-1 0.6 -0.5896489 1.789649 0.3666717
3-1 2.6 1.4103511 3.789649 0.0006358
3-2 2.0 0.8103511 3.189649 0.0034083
$Age
diff lwr upr p adj
2-1 0.3333333 -1.5235390 2.190206 0.9676094
3-1 1.3333333 -0.5235390 3.190206 0.1877558
4-1 1.6666667 -0.1902056 3.523539 0.0810838
5-1 3.6666667 1.8097944 5.523539 0.0009146
3-2 1.0000000 -0.8568723 2.856872 0.4057524
4-2 1.3333333 -0.5235390 3.190206 0.1877558
5-2 3.3333333 1.4764610 5.190206 0.0017351
4-3 0.3333333 -1.5235390 2.190206 0.9676094
5-3 2.3333333 0.4764610 4.190206 0.0154324
5-4 2.0000000 0.1431277 3.856872 0.0348816
plot(TukeyHSD(Rate_aov))
abline(v=0, col=2)
Ejemplos two-way ANOVA
En un estudio de efectos pulmonares en conejillos de Indias, Lacroix et al. expuso a 18 conejillos de Indias sensibilizados con albúmina y 18 no-sensibilizados con aire regular, benzaldehido y acetildehido. Al final de la exposición los conejillos de Indias fueron anesteciados y la respuesta a alergias medida por lavado broncoalveolar (BAL). La tabla de resultados se muestra en REV_C08_16 y muestra la cuenta de células alveolares \((\times 10^6)\) por grupo de tratamiento (aire, benzaldehido y acetildehido) y sensibilizados a albúmina. Pruebe por las diferencias en a) entre sensibilizados y no sensibilizados por albúmina, b) entre los tres tratamientos. Sea \(\alpha=.05\) en todas las pruebas.
.16 <- read_csv("~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/REV_C08_16.csv", show_col_types = FALSE)
Rev8boxplot(Count ~ factor(Treat), data = Rev8.16)
boxplot(Count ~ factor(Sens), data = Rev8.16)
boxplot(Count ~ factor(Sens) + factor(Treat), data = Rev8.16)
#
.16_M <- Rev8.16 %>%
Rev8mutate( Treat = case_when( Treat == "Air" ~ "1Air",
== "Act" ~ "2Act",
Treat == "Benz" ~ "3Benz"))
Treat
.16_aov <- aov(Count ~ factor(Sens) + factor(Treat), data = Rev8.16)
Rev8anova(Rev8.16_aov)
Analysis of Variance Table
Response: Count
Df Sum Sq Mean Sq F value Pr(>F)
factor(Sens) 1 7906.2 7906.2 20.718 7.276e-05 ***
factor(Treat) 2 7688.6 3844.3 10.074 0.0004042 ***
Residuals 32 12211.5 381.6
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# With car Anova
.16_lm <- lm(Count ~ factor(Sens) + factor(Treat), data = Rev8.16_M)
Rev8Anova(Rev8.16_lm)
Anova Table (Type II tests)
Response: Count
Sum Sq Df F value Pr(>F)
factor(Sens) 7906.2 1 20.718 7.276e-05 ***
factor(Treat) 7688.6 2 10.074 0.0004042 ***
Residuals 12211.5 32
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
anova(Rev8.16_lm)
Analysis of Variance Table
Response: Count
Df Sum Sq Mean Sq F value Pr(>F)
factor(Sens) 1 7906.2 7906.2 20.718 7.276e-05 ***
factor(Treat) 2 7688.6 3844.3 10.074 0.0004042 ***
Residuals 32 12211.5 381.6
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans(Rev8.16_lm, pairwise ~ Sens + Treat)
$emmeans
Sens Treat emmean SE df lower.CL upper.CL
No 1Air 31.4 6.51 32 18.17 44.7
Yes 1Air 61.1 6.51 32 47.81 74.3
No 2Act 48.0 6.51 32 34.69 61.2
Yes 2Act 77.6 6.51 32 64.33 90.9
No 3Benz 12.2 6.51 32 -1.07 25.5
Yes 3Benz 41.8 6.51 32 28.56 55.1
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
No 1Air - Yes 1Air -29.64 6.51 32 -4.552 0.0009
No 1Air - No 2Act -16.52 7.98 32 -2.071 0.3273
No 1Air - Yes 2Act -46.16 10.30 32 -4.483 0.0012
No 1Air - No 3Benz 19.25 7.98 32 2.413 0.1820
No 1Air - Yes 3Benz -10.39 10.30 32 -1.009 0.9114
Yes 1Air - No 2Act 13.12 10.30 32 1.275 0.7963
Yes 1Air - Yes 2Act -16.52 7.98 32 -2.071 0.3273
Yes 1Air - No 3Benz 48.88 10.30 32 4.748 0.0005
Yes 1Air - Yes 3Benz 19.25 7.98 32 2.413 0.1820
No 2Act - Yes 2Act -29.64 6.51 32 -4.552 0.0009
No 2Act - No 3Benz 35.76 7.98 32 4.484 0.0011
No 2Act - Yes 3Benz 6.12 10.30 32 0.595 0.9907
Yes 2Act - No 3Benz 65.40 10.30 32 6.352 <.0001
Yes 2Act - Yes 3Benz 35.76 7.98 32 4.484 0.0011
No 3Benz - Yes 3Benz -29.64 6.51 32 -4.552 0.0009
P value adjustment: tukey method for comparing a family of 6 estimates
emmeans(Rev8.16_lm, trt.vs.ctrl ~ Sens + Treat)
$emmeans
Sens Treat emmean SE df lower.CL upper.CL
No 1Air 31.4 6.51 32 18.17 44.7
Yes 1Air 61.1 6.51 32 47.81 74.3
No 2Act 48.0 6.51 32 34.69 61.2
Yes 2Act 77.6 6.51 32 64.33 90.9
No 3Benz 12.2 6.51 32 -1.07 25.5
Yes 3Benz 41.8 6.51 32 28.56 55.1
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Yes 1Air - No 1Air 29.6 6.51 32 4.552 0.0003
No 2Act - No 1Air 16.5 7.98 32 2.071 0.1723
Yes 2Act - No 1Air 46.2 10.30 32 4.483 0.0004
No 3Benz - No 1Air -19.2 7.98 32 -2.413 0.0868
Yes 3Benz - No 1Air 10.4 10.30 32 1.009 0.7331
P value adjustment: dunnettx method for 5 tests
#
.16_lm_int <- aov(Count ~ factor(Sens) * factor(Treat), data = Rev8.16_M)
Rev8# model with interaction
anova(Rev8.16_lm_int)
Analysis of Variance Table
Response: Count
Df Sum Sq Mean Sq F value Pr(>F)
factor(Sens) 1 7906.2 7906.2 23.0210 4.119e-05 ***
factor(Treat) 2 7688.6 3844.3 11.1938 0.0002336 ***
factor(Sens):factor(Treat) 2 1908.5 954.3 2.7786 0.0781461 .
Residuals 30 10303.0 343.4
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
.16_lm_int_emm_p <- emmeans(Rev8.16_lm_int, pairwise ~ Sens + Treat)
Rev8.16_lm_int_emm_p Rev8
$emmeans
Sens Treat emmean SE df lower.CL upper.CL
No 1Air 24.3 7.57 30 8.87 39.8
Yes 1Air 68.2 7.57 30 52.74 83.6
No 2Act 45.1 7.57 30 29.62 60.5
Yes 2Act 80.5 7.57 30 65.02 95.9
No 3Benz 22.2 7.57 30 6.74 37.6
Yes 3Benz 31.8 7.57 30 16.37 47.3
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
No 1Air - Yes 1Air -43.88 10.7 30 -4.101 0.0036
No 1Air - No 2Act -20.75 10.7 30 -1.939 0.3993
No 1Air - Yes 2Act -56.16 10.7 30 -5.249 0.0002
No 1Air - No 3Benz 2.12 10.7 30 0.199 1.0000
No 1Air - Yes 3Benz -7.51 10.7 30 -0.702 0.9803
Yes 1Air - No 2Act 23.12 10.7 30 2.161 0.2850
Yes 1Air - Yes 2Act -12.28 10.7 30 -1.148 0.8572
Yes 1Air - No 3Benz 46.00 10.7 30 4.299 0.0021
Yes 1Air - Yes 3Benz 36.37 10.7 30 3.399 0.0215
No 2Act - Yes 2Act -35.41 10.7 30 -3.309 0.0268
No 2Act - No 3Benz 22.88 10.7 30 2.138 0.2959
No 2Act - Yes 3Benz 13.24 10.7 30 1.238 0.8150
Yes 2Act - No 3Benz 58.28 10.7 30 5.447 0.0001
Yes 2Act - Yes 3Benz 48.65 10.7 30 4.547 0.0011
No 3Benz - Yes 3Benz -9.63 10.7 30 -0.900 0.9434
P value adjustment: tukey method for comparing a family of 6 estimates
.16_lm_int_emm_c <- emmeans(Rev8.16_lm_int, trt.vs.ctrl ~ Sens + Treat)
Rev8.16_lm_int_emm_c Rev8
$emmeans
Sens Treat emmean SE df lower.CL upper.CL
No 1Air 24.3 7.57 30 8.87 39.8
Yes 1Air 68.2 7.57 30 52.74 83.6
No 2Act 45.1 7.57 30 29.62 60.5
Yes 2Act 80.5 7.57 30 65.02 95.9
No 3Benz 22.2 7.57 30 6.74 37.6
Yes 3Benz 31.8 7.57 30 16.37 47.3
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Yes 1Air - No 1Air 43.88 10.7 30 4.101 0.0013
No 2Act - No 1Air 20.75 10.7 30 1.939 0.2205
Yes 2Act - No 1Air 56.16 10.7 30 5.249 0.0001
No 3Benz - No 1Air -2.12 10.7 30 -0.199 0.9958
Yes 3Benz - No 1Air 7.51 10.7 30 0.702 0.8862
P value adjustment: dunnettx method for 5 tests
Ejemplo El interés del estudio de Hartman-Maeir et al. fue estimar los perfiles del déficit de concienciación entre pacientes con infarto cerebral que están en rehabilitación. Estudió 35 pacientes con lesiones por infarto en el hemisferio derecho y 19 pacientes con lesiones en el hemisferio izquierdo. Además agrupó las lesiones por su tamaño como: 2 = 1-3 cm, 3 = 3-5cm, y 4 = 5 cm o más grandes. Una de las medidas importantes fue la calificación de la concienciación por su propia limitación. Las calificaciones tuvieron un rango de 8 a 24, con una calificación más alta significando mayor concienciación (REV_C08_22.csv). Pruebe la diferencia por tamaño de lesión y lado de hemisferio, sea \(\alpha = .05\).
8.22 <- read_csv("~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/REV_C08_22.csv", show_col_types = FALSE)
Rev.
boxplot(SCORES ~ factor(SIDE) + factor(SIZE), data = Rev.8.22)
boxplot(SCORES ~ factor(SIZE) + factor(SIDE), data = Rev.8.22)
#
8.22_lm <- aov(SCORES ~ factor(SIZE) + factor(SIDE), data = Rev.8.22)
Rev.anova(Rev.8.22_lm)
Analysis of Variance Table
Response: SCORES
Df Sum Sq Mean Sq F value Pr(>F)
factor(SIZE) 2 28.690 14.3451 3.5094 0.03748 *
factor(SIDE) 1 2.578 2.5782 0.6307 0.43084
Residuals 50 204.380 4.0876
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# contrasts
emmeans(Rev.8.22_lm, pairwise ~ SIDE )
$emmeans
SIDE emmean SE df lower.CL upper.CL
L 11.1 0.470 50 10.11 12.0
R 10.6 0.345 50 9.91 11.3
Results are averaged over the levels of: SIZE
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
L - R 0.459 0.578 50 0.794 0.4308
Results are averaged over the levels of: SIZE
8.22_lm_int <- lm(SCORES ~ factor(SIZE) * factor(SIDE), data = Rev.8.22)
Rev.Anova(Rev.8.22_lm_int)
Anova Table (Type II tests)
Response: SCORES
Sum Sq Df F value Pr(>F)
factor(SIZE) 29.981 2 3.7121 0.0317 *
factor(SIDE) 2.578 1 0.6384 0.4282
factor(SIZE):factor(SIDE) 10.540 2 1.3050 0.2806
Residuals 193.840 48
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
emmeans(Rev.8.22_lm_int, trt.vs.ctrl ~ SIZE)
$emmeans
SIZE emmean SE df lower.CL upper.CL
2 10.0 0.429 48 9.16 10.9
3 10.5 0.560 48 9.36 11.6
4 12.1 0.535 48 11.01 13.2
Results are averaged over the levels of: SIDE
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
SIZE3 - SIZE2 0.469 0.706 48 0.664 0.7243
SIZE4 - SIZE2 2.063 0.686 48 3.009 0.0081
Results are averaged over the levels of: SIDE
P value adjustment: dunnettx method for 2 tests
# emmip(Rev.8.22_lm_int, SIDE ~ SIZE)
# With rstatix
8.22_aov <- anova_test(data = Rev.8.22, SCORES ~ factor(SIZE) * factor(SIDE))
Rev.get_anova_table(Rev.8.22_aov)
ANOVA Table (type II tests)
Effect DFn DFd F p p<.05 ges
1 factor(SIZE) 2 48 3.712 0.032 * 0.134
2 factor(SIDE) 1 48 0.638 0.428 0.013
3 factor(SIZE):factor(SIDE) 2 48 1.305 0.281 0.052
# Post hoc
emmeans_test(Rev.8.22, SCORES ~ SIZE, p.adjust.method = "fdr")
# A tibble: 3 × 9
term .y. group1 group2 df statistic p p.adj p.adj.signif
* <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 SIZE SCORES 2 3 51 -0.837 0.407 0.407 ns
2 SIZE SCORES 2 4 51 -2.65 0.0108 0.0323 *
3 SIZE SCORES 3 4 51 -1.56 0.125 0.187 ns
Ejemplo 18 The effects of thermal pollution on Corbicula fluminea (Asiatic clams) at three different geographical locations were analyzed by John Brooker (REV_C08_18). Sample data on clam shell length, width, and height are displayed in the following table. Determine if there is a significant difference in mean length, height, or width (measured in mm) of the clam shell at the three different locations by performing three analyses. What inferences can be made from your results? What are the assumptions underlying your inferences? What are the target populations?
8.18 <- read_csv("~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/REV_C08_18_mod.csv", show_col_types = FALSE)
Rev.boxplot(Measure ~ factor(site) + factor(Geometry), data = Rev.8.18)
#
8.18_lm <- lm(Measure ~ factor(site) + factor(Geometry), data = Rev.8.18)
Rev.Anova(Rev.8.18_lm)
Anova Table (Type II tests)
Response: Measure
Sum Sq Df F value Pr(>F)
factor(site) 1.869 2 5.0704 0.007045 **
factor(Geometry) 286.132 2 776.3727 < 2.2e-16 ***
Residuals 39.988 217
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Pairwise post hoc
options('max.print' = 10000)
emmeans(Rev.8.18_lm, pairwise ~ site + Geometry)
$emmeans
site Geometry emmean SE df lower.CL upper.CL
1 height 4.30 0.0642 217 4.17 4.43
2 height 4.12 0.0658 217 3.99 4.25
3 height 4.09 0.0634 217 3.97 4.22
1 length 7.07 0.0642 217 6.95 7.20
2 length 6.90 0.0658 217 6.77 7.03
3 length 6.87 0.0634 217 6.74 6.99
1 width 5.85 0.0642 217 5.73 5.98
2 width 5.68 0.0658 217 5.55 5.81
3 width 5.65 0.0634 217 5.52 5.77
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
site1 height - site2 height 0.1768 0.0716 217 2.468 0.2534
site1 height - site3 height 0.2061 0.0694 217 2.969 0.0788
site1 height - site1 length -2.7742 0.0706 217 -39.310 <.0001
site1 height - site2 length -2.5974 0.1005 217 -25.835 <.0001
site1 height - site3 length -2.5681 0.0990 217 -25.942 <.0001
site1 height - site1 width -1.5541 0.0706 217 -22.021 <.0001
site1 height - site2 width -1.3773 0.1005 217 -13.699 <.0001
site1 height - site3 width -1.3479 0.0990 217 -13.616 <.0001
site2 height - site3 height 0.0294 0.0709 217 0.414 1.0000
site2 height - site1 length -2.9509 0.1005 217 -29.351 <.0001
site2 height - site2 length -2.7742 0.0706 217 -39.310 <.0001
site2 height - site3 length -2.7448 0.1001 217 -27.430 <.0001
site2 height - site1 width -1.7308 0.1005 217 -17.215 <.0001
site2 height - site2 width -1.5541 0.0706 217 -22.021 <.0001
site2 height - site3 width -1.5247 0.1001 217 -15.237 <.0001
site3 height - site1 length -2.9803 0.0990 217 -30.106 <.0001
site3 height - site2 length -2.8035 0.1001 217 -28.016 <.0001
site3 height - site3 length -2.7742 0.0706 217 -39.310 <.0001
site3 height - site1 width -1.7602 0.0990 217 -17.780 <.0001
site3 height - site2 width -1.5834 0.1001 217 -15.823 <.0001
site3 height - site3 width -1.5541 0.0706 217 -22.021 <.0001
site1 length - site2 length 0.1768 0.0716 217 2.468 0.2534
site1 length - site3 length 0.2061 0.0694 217 2.969 0.0788
site1 length - site1 width 1.2201 0.0706 217 17.289 <.0001
site1 length - site2 width 1.3969 0.1005 217 13.894 <.0001
site1 length - site3 width 1.4262 0.0990 217 14.407 <.0001
site2 length - site3 length 0.0294 0.0709 217 0.414 1.0000
site2 length - site1 width 1.0434 0.1005 217 10.378 <.0001
site2 length - site2 width 1.2201 0.0706 217 17.289 <.0001
site2 length - site3 width 1.2495 0.1001 217 12.486 <.0001
site3 length - site1 width 1.0140 0.0990 217 10.243 <.0001
site3 length - site2 width 1.1908 0.1001 217 11.900 <.0001
site3 length - site3 width 1.2201 0.0706 217 17.289 <.0001
site1 width - site2 width 0.1768 0.0716 217 2.468 0.2534
site1 width - site3 width 0.2061 0.0694 217 2.969 0.0788
site2 width - site3 width 0.0294 0.0709 217 0.414 1.0000
P value adjustment: tukey method for comparing a family of 9 estimates
ANOVA Modelos de medidas repetidas
Los modelos de medidas repetidas son aquellos en los que la misma
variable es medida en los mismos sujetos en dos o más ocasiones. En
donde las ocaciones pueden significar condiciones distintas como
diferentes tratamientos o diferentes puntos de tiempo.
Las ventajas de usar este tipo de modelos se han mencionado y son que
permiten controlar variables relacionadas con el sujeto y no con el
estudio que se está desarrollado .
Ejemplo Licciardone et al. examined subjects with chronic, nonspecific low back pain. In this study, 18 of the subjects completed a survey questionnaire assessing physical functioning at baseline, and after 1, 3, and 6 months. Table (EXA_C08_S04_01) shows the data for these subjects who received a sham treatment that appeared to be genuine osteopathic manipulation. Higher values indicate better physical functioning. The goal of the experiment was to determine if subjects would report improvement over time even though the treatment they received would provide minimal improvement. We wish to know if there is a difference in the mean survey values among the four points in time.
4.1 <- read_csv("~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/EXA_C08_S04_01mod.csv", show_col_types = FALSE)
Exa8.
ggplot( data = Exa8.4.1, aes(x = Time, y = Assessment) ) +
geom_boxplot() +
geom_jitter(width = .15, alpha = 0.6, aes(color = Time))
# Find outliers (extreme if present)
4.1 %>%
Exa8.group_by(Time) %>%
identify_outliers(Assessment)
# A tibble: 2 × 5
Time Assessment Subject is.outlier is.extreme
<chr> <dbl> <dbl> <lgl> <lgl>
1 Month3 20 16 TRUE FALSE
2 Month6 25 16 TRUE FALSE
# Test normal distribution
4.1 %>%
Exa8.group_by(Time) %>%
shapiro_test(Assessment)
# A tibble: 4 × 4
Time variable statistic p
<chr> <chr> <dbl> <dbl>
1 Baseline Assessment 0.900 0.0568
2 Month1 Assessment 0.960 0.611
3 Month3 Assessment 0.884 0.0303
4 Month6 Assessment 0.935 0.235
# ANOVA test from the rstatix package
4.1_aov <- anova_test(data = Exa8.4.1, dv = Assessment, wid = Subject, within = Time)
Exa8.get_anova_table(Exa8.4.1_aov)
ANOVA Table (type III tests)
Effect DFn DFd F p p<.05 ges
1 Time 2.22 37.68 5.501 0.006 * 0.08
# Pairwise comparisons
emmeans_test(Exa8.4.1, Assessment ~ Time, p.adjust.method = "fdr")
# A tibble: 6 × 9
term .y. group1 group2 df statistic p p.adj p.adj.signif
* <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 Time Assessment Baseline Month1 68 0.372 0.711 0.711 ns
2 Time Assessment Baseline Month3 68 -1.74 0.0871 0.242 ns
3 Time Assessment Baseline Month6 68 -1.20 0.235 0.352 ns
4 Time Assessment Month1 Month3 68 -2.11 0.0387 0.232 ns
5 Time Assessment Month1 Month6 68 -1.57 0.121 0.242 ns
6 Time Assessment Month3 Month6 68 0.537 0.593 0.711 ns
# # # # # # # # # # # # # # # # # # #
# Now with the R basic anova function, how to make a repeated measures
ggplot( data = Exa8.4.1, aes(x = Time, y = Assessment) ) +
geom_boxplot() +
geom_jitter(width = .1, alpha=.6, aes(color = Time)) +
theme_bw()
4.1_mod <- aov( Assessment ~ factor(Time) + Error(factor(Subject)), data = Exa8.4.1 )
Exa8.summary(Exa8.4.1_mod)
Error: factor(Subject)
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 17 20238 1190
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
factor(Time) 3 2396 798.6 5.501 0.00237 **
Residuals 51 7404 145.2
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans(Exa8.4.1_mod, pairwise ~ Time )
$emmeans
Time emmean SE df lower.CL upper.CL
Baseline 61.9 4.75 30.4 52.2 71.6
Month1 59.4 4.75 30.4 49.7 69.1
Month3 73.6 4.75 30.4 63.9 83.3
Month6 70.0 4.75 30.4 60.3 79.7
Warning: EMMs are biased unless design is perfectly balanced
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Baseline - Month1 2.50 4.02 51 0.622 0.9244
Baseline - Month3 -11.67 4.02 51 -2.905 0.0269
Baseline - Month6 -8.06 4.02 51 -2.006 0.1994
Month1 - Month3 -14.17 4.02 51 -3.527 0.0048
Month1 - Month6 -10.56 4.02 51 -2.628 0.0534
Month3 - Month6 3.61 4.02 51 0.899 0.8053
P value adjustment: tukey method for comparing a family of 4 estimates