ANOVA

# Set working directory
library(car)
library(emmeans)
library(rstatix)
library(tidyverse)

options('max.print' = 100000)

Introducción ANOVA

Uno de los procesos más usados para comparar diferentes “tratamientos” sobre un grupo experimental es sin duda el Análisis de Varianza (ANOVA). Desarrollado originalmente por R.A. Fisher y ha tenido una gran influencia en la estadística. El análisis de varianza se usa principalmente para dos propósitos: 1) para estimar y desarrollar prueba de hipótesis de varianzas de una población. 2) Estimar y probar medias poblacionales. En esta introducción resolveremos ejemplos para la estimación de medias poblacionales.

ANOVA Modelos de una via (One-Way ANOVA)

Carne proveniente de caza, entre otras aquellas del venado de cola-blanca, y ardilla gris, son usadas como alimento or familias, cazadores y otros individuos por razones culturales, personales, o de salud. En un estudio de Holben et al. estimó el contenido de selenio de carne de venado cola-blanca libre (venison), y ardilla gris (squirrel) obtenidos de una región de baja concentración de selenio en EEUU. Estos contenidos de selenio fueron también comparados con concentraciones de carne de ganado producido en la misma región (RRB) y en una región externa (NRRB). Nos interesa saber si la concentración de selenio \((\mu g/100g)\) es distinta en los grupos de carne. (EXA_C08_S02_01.csv).

La hipótesis nula de este modelo es que todos los promedios poblacionales (cada promedio de la población de cada caso) son iguales entre ellos \(\mu_1 = \mu_2 = \mu_3 \dots = \mu_n\), y la hipótesis alternativa es que “por lo menos uno de ellos es distinto”.

getOption('max.print')
[1] 100000
# Exa_C08_S02_01 One-way
#
Exa8.2.1 <- read_csv(file="~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/EXA_C08_S02_01mod.csv", show_col_types = FALSE)
boxplot(Selenium ~ Group, data = Exa8.2.1)

ggplot(data = Exa8.2.1, aes(x = Group, y = Selenium)) + geom_boxplot()

Exa8.2.1_lm <- lm(Selenium ~ Group, data = Exa8.2.1)
(anova_Exa8.2.1_lm <- anova(Exa8.2.1_lm))
Analysis of Variance Table

Response: Selenium
           Df Sum Sq Mean Sq F value    Pr(>F)    
Group       3  18935  6311.7  22.614 5.345e-12 ***
Residuals 140  39074   279.1                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
Exa8.2.1_aov <- aov(Selenium ~ Group, data = Exa8.2.1)
#
summary(Exa8.2.1_aov)
             Df Sum Sq Mean Sq F value   Pr(>F)    
Group         3  18935    6312   22.61 5.34e-12 ***
Residuals   140  39074     279                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model.tables(Exa8.2.1_aov, type="means")
Tables of means
Grand mean
         
35.44736 

 Group 
      NRB   RRB   SQU   VEN
    62.05 28.52 37.42 25.88
rep 19.00 30.00 53.00 42.00
model.tables(Exa8.2.1_aov, type="effects")
Tables of effects

 Group 
     NRB    RRB   SQU    VEN
    26.6 -6.925  1.97 -9.572
rep 19.0 30.000 53.00 42.000
#
TukeyHSD(Exa8.2.1_aov)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Selenium ~ Group, data = Exa8.2.1)

$Group
              diff        lwr        upr     p adj
RRB-NRB -33.523982 -46.260203 -20.787762 0.0000000
SQU-NRB -24.629335 -36.244663 -13.014007 0.0000010
VEN-NRB -36.170840 -48.180851 -24.160828 0.0000000
SQU-RRB   8.894648  -1.030122  18.819418 0.0961032
VEN-RRB  -2.646857 -13.030768   7.737053 0.9108599
VEN-SQU -11.541505 -20.515363  -2.567647 0.0057777
plot(TukeyHSD(Exa8.2.1_aov))

Más ejemplos

  1. Pacientes de enfermedades reumáticas u osteoporosis generalmente sufren de pérdidas críticas de densidad mineral osea (Bone Mineral Density, BMD). Un medicamento usado para recuperar o prevenir una pérdida mayor de BMD, es el Alendronato. Holcomb y Rothenberg examinaron a 96 mujeres tomando alendronato para determinar si había alguna diferencia en el promedio de cambio en densidad ósea entre cinco clasificaciones diagnostica primarias. El Grupo1 era de pacientes diagnosticados con artritis reumatoide (RA). Grupo2 era de pacientes con una variedad de diagnósticos incluyendo, Lupus, granulomatosis de Wegener y poliarteritis, y otros desordenes vasculares (LUPUS). Grupo3 consistió en pacientes diagnosticados con polimialgia reumática o artritis temporal (PMRTA). El Grupo4 estaba integrado por pacientes con artrosis (OA). Y el Grupo 5 de pacientes con diagnóstico de osteoporosis (O) sin otros desordenes reumáticos. ¿Puede determinar que los promedios de los grupos presentan alguna diferencia? Haga una gráfica de cajas de los grupos.
# Exer. 8.2.2 Bone Mineral Density, BMD
# Patients suffering from rheumatic diseases or osteoporosis often suffer critical loss in bone
# mineral density (BMD).
#
EXR8.2.2 <- read_csv(file="~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/EXR_C08_S02_02.csv", show_col_types = FALSE)

boxplot(BMD ~ factor(GROUP), data = EXR8.2.2)

Exr2_2.aov <- aov(BMD ~ factor(GROUP), data=EXR8.2.2)
anova(Exr2_2.aov)
Analysis of Variance Table

Response: BMD
              Df Sum Sq Mean Sq F value  Pr(>F)  
factor(GROUP)  4  355.5  88.864  2.2772 0.06697 .
Residuals     91 3551.1  39.024                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# anova() to print the ANOVA table

EXR8.2.2_M <- EXR8.2.2 %>% 
  mutate(GROUP = case_when(GROUP == 1 ~ "RA", 
                           GROUP == 2 ~ "LUPUS",
                           GROUP == 3 ~ "PMRTA", 
                           GROUP == 4 ~ "OA",
                           GROUP == 5 ~ "O"))

EXR8.2.2_M <- EXR8.2.2_M %>% 
  mutate(GROUP = GROUP %>% fct_relevel("O", "RA", "LUPUS", "PMRTA", "OA"))
  
EXR8.2.2_M %>% ggplot( aes(x = GROUP, y = BMD) )  + 
  geom_boxplot() +
  geom_jitter(width = 0.1, alpha = 0.6, aes(color = GROUP))

#

Exr2_2_lm <- lm(BMD ~ factor(GROUP), data=EXR8.2.2_M)
S(Exr2_2_lm)
Call: lm(formula = BMD ~ factor(GROUP), data = EXR8.2.2_M)

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)           9.672      1.975   4.896 4.22e-06 ***
factor(GROUP)RA      -5.202      2.226  -2.336  0.02166 *  
factor(GROUP)LUPUS   -5.094      2.870  -1.775  0.07929 .  
factor(GROUP)PMRTA   -7.491      2.518  -2.975  0.00375 ** 
factor(GROUP)OA      -4.467      2.351  -1.900  0.06061 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard deviation: 6.247 on 91 degrees of freedom
Multiple R-squared: 0.09099
F-statistic: 2.277 on 4 and 91 DF,  p-value: 0.06697 
   AIC    BIC 
631.06 646.45 
emmeans(Exr2_2_lm, pairwise ~ GROUP)
$emmeans
 GROUP emmean   SE df lower.CL upper.CL
 O       9.67 1.98 91    5.748    13.60
 RA      4.47 1.03 91    2.430     6.51
 LUPUS   4.58 2.08 91    0.442     8.71
 PMRTA   2.18 1.56 91   -0.921     5.28
 OA      5.20 1.28 91    2.672     7.74

Confidence level used: 0.95 

$contrasts
 contrast      estimate   SE df t.ratio p.value
 O - RA           5.202 2.23 91   2.336  0.1430
 O - LUPUS        5.094 2.87 91   1.775  0.3946
 O - PMRTA        7.491 2.52 91   2.975  0.0301
 O - OA           4.467 2.35 91   1.900  0.3248
 RA - LUPUS      -0.108 2.32 91  -0.047  1.0000
 RA - PMRTA       2.289 1.87 91   1.225  0.7371
 RA - OA         -0.735 1.64 91  -0.449  0.9915
 LUPUS - PMRTA    2.397 2.60 91   0.921  0.8881
 LUPUS - OA      -0.627 2.44 91  -0.257  0.9990
 PMRTA - OA      -3.024 2.02 91  -1.500  0.5653

P value adjustment: tukey method for comparing a family of 5 estimates 
emmeans(Exr2_2_lm, trt.vs.ctrl ~ GROUP)
$emmeans
 GROUP emmean   SE df lower.CL upper.CL
 O       9.67 1.98 91    5.748    13.60
 RA      4.47 1.03 91    2.430     6.51
 LUPUS   4.58 2.08 91    0.442     8.71
 PMRTA   2.18 1.56 91   -0.921     5.28
 OA      5.20 1.28 91    2.672     7.74

Confidence level used: 0.95 

$contrasts
 contrast  estimate   SE df t.ratio p.value
 RA - O       -5.20 2.23 91  -2.336  0.0740
 LUPUS - O    -5.09 2.87 91  -1.775  0.2387
 PMRTA - O    -7.49 2.52 91  -2.975  0.0138
 OA - O       -4.47 2.35 91  -1.900  0.1889

P value adjustment: dunnettx method for 4 tests 
  1. Ilich-Ernst y colaboradores estudiaron el consumo de calcio en la dieta de un grupos de 113 mujeres adultas sanas de edades entre 20 y 88 años. Los investigadores segregaron a los sujetos de estudio por grupos de edad de la siguiente manera: Grupo A 20.0 – 45.9 años; grupo B 46.00 – 55. 9 años; grupo C56.0 – 65.9 años; y grupo D, de más de 66 años. El consumo de calsio estuvo medido en mg/día. Los datos están listados en el archivo EXR_C08_S02_03.csv. Siguiendo un ANOVA, ¿se puede concluir que hay uns diferencia en los promedios de las poblaciones? Haga un HSD Tukey para estimar las diferencias entre las distintas poblaciones, sea Alpha = .05. Grafique los datos usando una gráfica de cajas y explique los resultados.
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Exr 8.2.3
Exr8.2.3 <- read_csv(file="~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/EXR_C08_S02_03.csv", show_col_types = FALSE)

plot(calcium ~ factor(Group), data = Exr8.2.3)

Exr3.aov <- aov(calcium ~ Group, data=Exr8.2.3)
anova(Exr3.aov)
Analysis of Variance Table

Response: calcium
           Df   Sum Sq Mean Sq F value    Pr(>F)    
Group       3  5931208 1977069  9.3588 1.476e-05 ***
Residuals 109 23026500  211252                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(Exr3.aov)

TukeyHSD(Exr3.aov)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = calcium ~ Group, data = Exr8.2.3)

$Group
          diff       lwr        upr     p adj
B-A -455.72078 -865.7066  -45.73492 0.0230454
C-A -574.53605 -913.5892 -235.48294 0.0001356
D-A -596.63447 -905.3867 -287.88228 0.0000109
C-B -118.81527 -509.0844  271.45386 0.8568796
D-B -140.91369 -505.1676  223.34020 0.7443051
D-C  -22.09842 -304.1436  259.94679 0.9969598

ANOVA de dos vías

Para el caso de un ANOVA de dos vías, tenemos que la función repuesta responde a dos factores que a su vez tiene dos o más niveles por factor: \(Y \sim factor_1 + factor_2\)

Del ejemplo 8.3.1 de Daniel para un ANOVA de dos vías: Un fisioterapeuta quería comparar tres métodos para enseñar a sus pacientes a usar una prótesis. Pensó que la proporción de aprendizage sería diferente dependiendo de la edad de los parcientes y desarrolló un experimento que le permitió tomar en cuenta la edad. Se construye la tábla de los datos del problema.

# Build a table, index by index since
# Two-Way ANOVA EXAMPLE 8.3.1 has NO-data file, therefore we make the example table.
#

Learn <- tibble(
  Age = factor(rep(c(1,2,3,4,5), 3)),
  Method = factor(rep(1:3,c(5,5,5))),
  Rate = c(7,8,9,10,11,9,9,9,9,12,10,10,12,12,14)
  )

Learn
# A tibble: 15 × 3
   Age   Method  Rate
   <fct> <fct>  <dbl>
 1 1     1          7
 2 2     1          8
 3 3     1          9
 4 4     1         10
 5 5     1         11
 6 1     2          9
 7 2     2          9
 8 3     2          9
 9 4     2          9
10 5     2         12
11 1     3         10
12 2     3         10
13 3     3         12
14 4     3         12
15 5     3         14
#
plot(Rate ~ Age + Method, data = Learn)

Para las gráficas de cajas ejecutadas por un ggplot (o boxplot) es necesario mandar por separado cada grupo en la variable predictora (o independiente), o separar dos páneles, uno por cada grupo.

# For the Age group

ggplot(data = Learn, aes(x = Age, y = Rate) )  + 
  geom_boxplot() + 
  geom_jitter(width = 0.1, alpha=0.6, aes(color = Age))

# For the Rate group
ggplot(data = Learn, aes(x = Method, y = Rate) )  + 
  geom_boxplot() + 
  geom_jitter(width = 0.1, alpha=0.6, aes(color = Method))

# now the ANOVA model
Rate_aov = aov(Rate ~ Method + Age, data = Learn)
anova(Rate_aov)
Analysis of Variance Table

Response: Rate
          Df  Sum Sq Mean Sq F value    Pr(>F)    
Method     2 18.5333  9.2667  21.385 0.0006165 ***
Age        4 24.9333  6.2333  14.385 0.0010017 ** 
Residuals  8  3.4667  0.4333                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
layout(matrix(c(1,2), nrow=1, ncol=2, byrow=TRUE))
TukeyHSD(Rate_aov)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Rate ~ Method + Age, data = Learn)

$Method
    diff        lwr      upr     p adj
2-1  0.6 -0.5896489 1.789649 0.3666717
3-1  2.6  1.4103511 3.789649 0.0006358
3-2  2.0  0.8103511 3.189649 0.0034083

$Age
         diff        lwr      upr     p adj
2-1 0.3333333 -1.5235390 2.190206 0.9676094
3-1 1.3333333 -0.5235390 3.190206 0.1877558
4-1 1.6666667 -0.1902056 3.523539 0.0810838
5-1 3.6666667  1.8097944 5.523539 0.0009146
3-2 1.0000000 -0.8568723 2.856872 0.4057524
4-2 1.3333333 -0.5235390 3.190206 0.1877558
5-2 3.3333333  1.4764610 5.190206 0.0017351
4-3 0.3333333 -1.5235390 2.190206 0.9676094
5-3 2.3333333  0.4764610 4.190206 0.0154324
5-4 2.0000000  0.1431277 3.856872 0.0348816
plot(TukeyHSD(Rate_aov))
abline(v=0, col=2)

Ejemplos two-way ANOVA

En un estudio de efectos pulmonares en conejillos de Indias, Lacroix et al. expuso a 18 conejillos de Indias sensibilizados con albúmina y 18 no-sensibilizados con aire regular, benzaldehido y acetildehido. Al final de la exposición los conejillos de Indias fueron anesteciados y la respuesta a alergias medida por lavado broncoalveolar (BAL). La tabla de resultados se muestra en REV_C08_16 y muestra la cuenta de células alveolares \((\times 10^6)\) por grupo de tratamiento (aire, benzaldehido y acetildehido) y sensibilizados a albúmina. Pruebe por las diferencias en a) entre sensibilizados y no sensibilizados por albúmina, b) entre los tres tratamientos. Sea \(\alpha=.05\) en todas las pruebas.

Rev8.16 <- read_csv("~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/REV_C08_16.csv", show_col_types = FALSE)
boxplot(Count ~ factor(Treat), data = Rev8.16)

boxplot(Count ~ factor(Sens), data = Rev8.16)

boxplot(Count ~ factor(Sens) + factor(Treat), data = Rev8.16)

#
Rev8.16_M <- Rev8.16 %>% 
  mutate( Treat = case_when( Treat == "Air" ~ "1Air", 
                             Treat == "Act" ~ "2Act", 
                             Treat == "Benz" ~ "3Benz"))

Rev8.16_aov <- aov(Count ~ factor(Sens) + factor(Treat), data = Rev8.16)
anova(Rev8.16_aov)
Analysis of Variance Table

Response: Count
              Df  Sum Sq Mean Sq F value    Pr(>F)    
factor(Sens)   1  7906.2  7906.2  20.718 7.276e-05 ***
factor(Treat)  2  7688.6  3844.3  10.074 0.0004042 ***
Residuals     32 12211.5   381.6                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# With car Anova
Rev8.16_lm <- lm(Count ~ factor(Sens) + factor(Treat), data = Rev8.16_M)
Anova(Rev8.16_lm)
Anova Table (Type II tests)

Response: Count
               Sum Sq Df F value    Pr(>F)    
factor(Sens)   7906.2  1  20.718 7.276e-05 ***
factor(Treat)  7688.6  2  10.074 0.0004042 ***
Residuals     12211.5 32                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
anova(Rev8.16_lm)
Analysis of Variance Table

Response: Count
              Df  Sum Sq Mean Sq F value    Pr(>F)    
factor(Sens)   1  7906.2  7906.2  20.718 7.276e-05 ***
factor(Treat)  2  7688.6  3844.3  10.074 0.0004042 ***
Residuals     32 12211.5   381.6                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans(Rev8.16_lm, pairwise ~ Sens + Treat)
$emmeans
 Sens Treat emmean   SE df lower.CL upper.CL
 No   1Air    31.4 6.51 32    18.17     44.7
 Yes  1Air    61.1 6.51 32    47.81     74.3
 No   2Act    48.0 6.51 32    34.69     61.2
 Yes  2Act    77.6 6.51 32    64.33     90.9
 No   3Benz   12.2 6.51 32    -1.07     25.5
 Yes  3Benz   41.8 6.51 32    28.56     55.1

Confidence level used: 0.95 

$contrasts
 contrast             estimate    SE df t.ratio p.value
 No 1Air - Yes 1Air     -29.64  6.51 32  -4.552  0.0009
 No 1Air - No 2Act      -16.52  7.98 32  -2.071  0.3273
 No 1Air - Yes 2Act     -46.16 10.30 32  -4.483  0.0012
 No 1Air - No 3Benz      19.25  7.98 32   2.413  0.1820
 No 1Air - Yes 3Benz    -10.39 10.30 32  -1.009  0.9114
 Yes 1Air - No 2Act      13.12 10.30 32   1.275  0.7963
 Yes 1Air - Yes 2Act    -16.52  7.98 32  -2.071  0.3273
 Yes 1Air - No 3Benz     48.88 10.30 32   4.748  0.0005
 Yes 1Air - Yes 3Benz    19.25  7.98 32   2.413  0.1820
 No 2Act - Yes 2Act     -29.64  6.51 32  -4.552  0.0009
 No 2Act - No 3Benz      35.76  7.98 32   4.484  0.0011
 No 2Act - Yes 3Benz      6.12 10.30 32   0.595  0.9907
 Yes 2Act - No 3Benz     65.40 10.30 32   6.352  <.0001
 Yes 2Act - Yes 3Benz    35.76  7.98 32   4.484  0.0011
 No 3Benz - Yes 3Benz   -29.64  6.51 32  -4.552  0.0009

P value adjustment: tukey method for comparing a family of 6 estimates 
emmeans(Rev8.16_lm, trt.vs.ctrl ~ Sens + Treat)
$emmeans
 Sens Treat emmean   SE df lower.CL upper.CL
 No   1Air    31.4 6.51 32    18.17     44.7
 Yes  1Air    61.1 6.51 32    47.81     74.3
 No   2Act    48.0 6.51 32    34.69     61.2
 Yes  2Act    77.6 6.51 32    64.33     90.9
 No   3Benz   12.2 6.51 32    -1.07     25.5
 Yes  3Benz   41.8 6.51 32    28.56     55.1

Confidence level used: 0.95 

$contrasts
 contrast            estimate    SE df t.ratio p.value
 Yes 1Air - No 1Air      29.6  6.51 32   4.552  0.0003
 No 2Act - No 1Air       16.5  7.98 32   2.071  0.1723
 Yes 2Act - No 1Air      46.2 10.30 32   4.483  0.0004
 No 3Benz - No 1Air     -19.2  7.98 32  -2.413  0.0868
 Yes 3Benz - No 1Air     10.4 10.30 32   1.009  0.7331

P value adjustment: dunnettx method for 5 tests 
#
Rev8.16_lm_int <- aov(Count ~ factor(Sens) * factor(Treat), data = Rev8.16_M)
# model with interaction
anova(Rev8.16_lm_int)
Analysis of Variance Table

Response: Count
                           Df  Sum Sq Mean Sq F value    Pr(>F)    
factor(Sens)                1  7906.2  7906.2 23.0210 4.119e-05 ***
factor(Treat)               2  7688.6  3844.3 11.1938 0.0002336 ***
factor(Sens):factor(Treat)  2  1908.5   954.3  2.7786 0.0781461 .  
Residuals                  30 10303.0   343.4                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Rev8.16_lm_int_emm_p <- emmeans(Rev8.16_lm_int, pairwise ~ Sens + Treat)
Rev8.16_lm_int_emm_p
$emmeans
 Sens Treat emmean   SE df lower.CL upper.CL
 No   1Air    24.3 7.57 30     8.87     39.8
 Yes  1Air    68.2 7.57 30    52.74     83.6
 No   2Act    45.1 7.57 30    29.62     60.5
 Yes  2Act    80.5 7.57 30    65.02     95.9
 No   3Benz   22.2 7.57 30     6.74     37.6
 Yes  3Benz   31.8 7.57 30    16.37     47.3

Confidence level used: 0.95 

$contrasts
 contrast             estimate   SE df t.ratio p.value
 No 1Air - Yes 1Air     -43.88 10.7 30  -4.101  0.0036
 No 1Air - No 2Act      -20.75 10.7 30  -1.939  0.3993
 No 1Air - Yes 2Act     -56.16 10.7 30  -5.249  0.0002
 No 1Air - No 3Benz       2.12 10.7 30   0.199  1.0000
 No 1Air - Yes 3Benz     -7.51 10.7 30  -0.702  0.9803
 Yes 1Air - No 2Act      23.12 10.7 30   2.161  0.2850
 Yes 1Air - Yes 2Act    -12.28 10.7 30  -1.148  0.8572
 Yes 1Air - No 3Benz     46.00 10.7 30   4.299  0.0021
 Yes 1Air - Yes 3Benz    36.37 10.7 30   3.399  0.0215
 No 2Act - Yes 2Act     -35.41 10.7 30  -3.309  0.0268
 No 2Act - No 3Benz      22.88 10.7 30   2.138  0.2959
 No 2Act - Yes 3Benz     13.24 10.7 30   1.238  0.8150
 Yes 2Act - No 3Benz     58.28 10.7 30   5.447  0.0001
 Yes 2Act - Yes 3Benz    48.65 10.7 30   4.547  0.0011
 No 3Benz - Yes 3Benz    -9.63 10.7 30  -0.900  0.9434

P value adjustment: tukey method for comparing a family of 6 estimates 
Rev8.16_lm_int_emm_c <- emmeans(Rev8.16_lm_int, trt.vs.ctrl ~ Sens + Treat)
Rev8.16_lm_int_emm_c
$emmeans
 Sens Treat emmean   SE df lower.CL upper.CL
 No   1Air    24.3 7.57 30     8.87     39.8
 Yes  1Air    68.2 7.57 30    52.74     83.6
 No   2Act    45.1 7.57 30    29.62     60.5
 Yes  2Act    80.5 7.57 30    65.02     95.9
 No   3Benz   22.2 7.57 30     6.74     37.6
 Yes  3Benz   31.8 7.57 30    16.37     47.3

Confidence level used: 0.95 

$contrasts
 contrast            estimate   SE df t.ratio p.value
 Yes 1Air - No 1Air     43.88 10.7 30   4.101  0.0013
 No 2Act - No 1Air      20.75 10.7 30   1.939  0.2205
 Yes 2Act - No 1Air     56.16 10.7 30   5.249  0.0001
 No 3Benz - No 1Air     -2.12 10.7 30  -0.199  0.9958
 Yes 3Benz - No 1Air     7.51 10.7 30   0.702  0.8862

P value adjustment: dunnettx method for 5 tests 

Ejemplo El interés del estudio de Hartman-Maeir et al. fue estimar los perfiles del déficit de concienciación entre pacientes con infarto cerebral que están en rehabilitación. Estudió 35 pacientes con lesiones por infarto en el hemisferio derecho y 19 pacientes con lesiones en el hemisferio izquierdo. Además agrupó las lesiones por su tamaño como: 2 = 1-3 cm, 3 = 3-5cm, y 4 = 5 cm o más grandes. Una de las medidas importantes fue la calificación de la concienciación por su propia limitación. Las calificaciones tuvieron un rango de 8 a 24, con una calificación más alta significando mayor concienciación (REV_C08_22.csv). Pruebe la diferencia por tamaño de lesión y lado de hemisferio, sea \(\alpha = .05\).

Rev.8.22 <- read_csv("~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/REV_C08_22.csv", show_col_types = FALSE)

boxplot(SCORES ~ factor(SIDE) + factor(SIZE), data = Rev.8.22)

boxplot(SCORES ~ factor(SIZE) + factor(SIDE), data = Rev.8.22)

#
Rev.8.22_lm <- aov(SCORES ~ factor(SIZE) + factor(SIDE), data = Rev.8.22)
anova(Rev.8.22_lm)
Analysis of Variance Table

Response: SCORES
             Df  Sum Sq Mean Sq F value  Pr(>F)  
factor(SIZE)  2  28.690 14.3451  3.5094 0.03748 *
factor(SIDE)  1   2.578  2.5782  0.6307 0.43084  
Residuals    50 204.380  4.0876                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# contrasts
emmeans(Rev.8.22_lm, pairwise ~ SIDE )
$emmeans
 SIDE emmean    SE df lower.CL upper.CL
 L      11.1 0.470 50    10.11     12.0
 R      10.6 0.345 50     9.91     11.3

Results are averaged over the levels of: SIZE 
Confidence level used: 0.95 

$contrasts
 contrast estimate    SE df t.ratio p.value
 L - R       0.459 0.578 50   0.794  0.4308

Results are averaged over the levels of: SIZE 
Rev.8.22_lm_int <- lm(SCORES ~ factor(SIZE) * factor(SIDE), data = Rev.8.22)
Anova(Rev.8.22_lm_int)
Anova Table (Type II tests)

Response: SCORES
                           Sum Sq Df F value Pr(>F)  
factor(SIZE)               29.981  2  3.7121 0.0317 *
factor(SIDE)                2.578  1  0.6384 0.4282  
factor(SIZE):factor(SIDE)  10.540  2  1.3050 0.2806  
Residuals                 193.840 48                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
emmeans(Rev.8.22_lm_int, trt.vs.ctrl ~ SIZE)
$emmeans
 SIZE emmean    SE df lower.CL upper.CL
    2   10.0 0.429 48     9.16     10.9
    3   10.5 0.560 48     9.36     11.6
    4   12.1 0.535 48    11.01     13.2

Results are averaged over the levels of: SIDE 
Confidence level used: 0.95 

$contrasts
 contrast      estimate    SE df t.ratio p.value
 SIZE3 - SIZE2    0.469 0.706 48   0.664  0.7243
 SIZE4 - SIZE2    2.063 0.686 48   3.009  0.0081

Results are averaged over the levels of: SIDE 
P value adjustment: dunnettx method for 2 tests 
# emmip(Rev.8.22_lm_int, SIDE ~ SIZE)

# With rstatix
Rev.8.22_aov <- anova_test(data = Rev.8.22, SCORES ~ factor(SIZE) * factor(SIDE))
get_anova_table(Rev.8.22_aov)
ANOVA Table (type II tests)

                     Effect DFn DFd     F     p p<.05   ges
1              factor(SIZE)   2  48 3.712 0.032     * 0.134
2              factor(SIDE)   1  48 0.638 0.428       0.013
3 factor(SIZE):factor(SIDE)   2  48 1.305 0.281       0.052
# Post hoc
emmeans_test(Rev.8.22, SCORES ~ SIZE, p.adjust.method = "fdr")
# A tibble: 3 × 9
  term  .y.    group1 group2    df statistic      p  p.adj p.adj.signif
* <chr> <chr>  <chr>  <chr>  <dbl>     <dbl>  <dbl>  <dbl> <chr>       
1 SIZE  SCORES 2      3         51    -0.837 0.407  0.407  ns          
2 SIZE  SCORES 2      4         51    -2.65  0.0108 0.0323 *           
3 SIZE  SCORES 3      4         51    -1.56  0.125  0.187  ns          

Ejemplo 18 The effects of thermal pollution on Corbicula fluminea (Asiatic clams) at three different geographical locations were analyzed by John Brooker (REV_C08_18). Sample data on clam shell length, width, and height are displayed in the following table. Determine if there is a significant difference in mean length, height, or width (measured in mm) of the clam shell at the three different locations by performing three analyses. What inferences can be made from your results? What are the assumptions underlying your inferences? What are the target populations?

Rev.8.18 <- read_csv("~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/REV_C08_18_mod.csv", show_col_types = FALSE)
boxplot(Measure ~ factor(site) + factor(Geometry), data = Rev.8.18)

#
Rev.8.18_lm <- lm(Measure ~ factor(site) + factor(Geometry), data = Rev.8.18)
Anova(Rev.8.18_lm)
Anova Table (Type II tests)

Response: Measure
                  Sum Sq  Df  F value    Pr(>F)    
factor(site)       1.869   2   5.0704  0.007045 ** 
factor(Geometry) 286.132   2 776.3727 < 2.2e-16 ***
Residuals         39.988 217                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Pairwise post hoc
options('max.print' = 10000)
emmeans(Rev.8.18_lm, pairwise ~ site + Geometry)
$emmeans
 site Geometry emmean     SE  df lower.CL upper.CL
    1 height     4.30 0.0642 217     4.17     4.43
    2 height     4.12 0.0658 217     3.99     4.25
    3 height     4.09 0.0634 217     3.97     4.22
    1 length     7.07 0.0642 217     6.95     7.20
    2 length     6.90 0.0658 217     6.77     7.03
    3 length     6.87 0.0634 217     6.74     6.99
    1 width      5.85 0.0642 217     5.73     5.98
    2 width      5.68 0.0658 217     5.55     5.81
    3 width      5.65 0.0634 217     5.52     5.77

Confidence level used: 0.95 

$contrasts
 contrast                    estimate     SE  df t.ratio p.value
 site1 height - site2 height   0.1768 0.0716 217   2.468  0.2534
 site1 height - site3 height   0.2061 0.0694 217   2.969  0.0788
 site1 height - site1 length  -2.7742 0.0706 217 -39.310  <.0001
 site1 height - site2 length  -2.5974 0.1005 217 -25.835  <.0001
 site1 height - site3 length  -2.5681 0.0990 217 -25.942  <.0001
 site1 height - site1 width   -1.5541 0.0706 217 -22.021  <.0001
 site1 height - site2 width   -1.3773 0.1005 217 -13.699  <.0001
 site1 height - site3 width   -1.3479 0.0990 217 -13.616  <.0001
 site2 height - site3 height   0.0294 0.0709 217   0.414  1.0000
 site2 height - site1 length  -2.9509 0.1005 217 -29.351  <.0001
 site2 height - site2 length  -2.7742 0.0706 217 -39.310  <.0001
 site2 height - site3 length  -2.7448 0.1001 217 -27.430  <.0001
 site2 height - site1 width   -1.7308 0.1005 217 -17.215  <.0001
 site2 height - site2 width   -1.5541 0.0706 217 -22.021  <.0001
 site2 height - site3 width   -1.5247 0.1001 217 -15.237  <.0001
 site3 height - site1 length  -2.9803 0.0990 217 -30.106  <.0001
 site3 height - site2 length  -2.8035 0.1001 217 -28.016  <.0001
 site3 height - site3 length  -2.7742 0.0706 217 -39.310  <.0001
 site3 height - site1 width   -1.7602 0.0990 217 -17.780  <.0001
 site3 height - site2 width   -1.5834 0.1001 217 -15.823  <.0001
 site3 height - site3 width   -1.5541 0.0706 217 -22.021  <.0001
 site1 length - site2 length   0.1768 0.0716 217   2.468  0.2534
 site1 length - site3 length   0.2061 0.0694 217   2.969  0.0788
 site1 length - site1 width    1.2201 0.0706 217  17.289  <.0001
 site1 length - site2 width    1.3969 0.1005 217  13.894  <.0001
 site1 length - site3 width    1.4262 0.0990 217  14.407  <.0001
 site2 length - site3 length   0.0294 0.0709 217   0.414  1.0000
 site2 length - site1 width    1.0434 0.1005 217  10.378  <.0001
 site2 length - site2 width    1.2201 0.0706 217  17.289  <.0001
 site2 length - site3 width    1.2495 0.1001 217  12.486  <.0001
 site3 length - site1 width    1.0140 0.0990 217  10.243  <.0001
 site3 length - site2 width    1.1908 0.1001 217  11.900  <.0001
 site3 length - site3 width    1.2201 0.0706 217  17.289  <.0001
 site1 width - site2 width     0.1768 0.0716 217   2.468  0.2534
 site1 width - site3 width     0.2061 0.0694 217   2.969  0.0788
 site2 width - site3 width     0.0294 0.0709 217   0.414  1.0000

P value adjustment: tukey method for comparing a family of 9 estimates 

ANOVA Modelos de medidas repetidas

Los modelos de medidas repetidas son aquellos en los que la misma variable es medida en los mismos sujetos en dos o más ocasiones. En donde las ocaciones pueden significar condiciones distintas como diferentes tratamientos o diferentes puntos de tiempo.
Las ventajas de usar este tipo de modelos se han mencionado y son que permiten controlar variables relacionadas con el sujeto y no con el estudio que se está desarrollado .

Ejemplo Licciardone et al. examined subjects with chronic, nonspecific low back pain. In this study, 18 of the subjects completed a survey questionnaire assessing physical functioning at baseline, and after 1, 3, and 6 months. Table (EXA_C08_S04_01) shows the data for these subjects who received a sham treatment that appeared to be genuine osteopathic manipulation. Higher values indicate better physical functioning. The goal of the experiment was to determine if subjects would report improvement over time even though the treatment they received would provide minimal improvement. We wish to know if there is a difference in the mean survey values among the four points in time.

Exa8.4.1 <- read_csv("~/Dropbox/GitHub/ProbEstad/DataSets/ch08_all/EXA_C08_S04_01mod.csv", show_col_types = FALSE)

ggplot( data = Exa8.4.1, aes(x = Time, y = Assessment) ) + 
  geom_boxplot() + 
  geom_jitter(width = .15, alpha = 0.6, aes(color = Time))

# Find outliers (extreme if present)
Exa8.4.1 %>% 
  group_by(Time) %>% 
  identify_outliers(Assessment)
# A tibble: 2 × 5
  Time   Assessment Subject is.outlier is.extreme
  <chr>       <dbl>   <dbl> <lgl>      <lgl>     
1 Month3         20      16 TRUE       FALSE     
2 Month6         25      16 TRUE       FALSE     
# Test normal distribution
Exa8.4.1 %>% 
  group_by(Time) %>% 
  shapiro_test(Assessment)
# A tibble: 4 × 4
  Time     variable   statistic      p
  <chr>    <chr>          <dbl>  <dbl>
1 Baseline Assessment     0.900 0.0568
2 Month1   Assessment     0.960 0.611 
3 Month3   Assessment     0.884 0.0303
4 Month6   Assessment     0.935 0.235 
# ANOVA test from the rstatix package
Exa8.4.1_aov <- anova_test(data = Exa8.4.1, dv = Assessment, wid = Subject, within = Time)
get_anova_table(Exa8.4.1_aov)
ANOVA Table (type III tests)

  Effect  DFn   DFd     F     p p<.05  ges
1   Time 2.22 37.68 5.501 0.006     * 0.08
# Pairwise comparisons
emmeans_test(Exa8.4.1, Assessment ~ Time, p.adjust.method = "fdr")
# A tibble: 6 × 9
  term  .y.        group1   group2    df statistic      p p.adj p.adj.signif
* <chr> <chr>      <chr>    <chr>  <dbl>     <dbl>  <dbl> <dbl> <chr>       
1 Time  Assessment Baseline Month1    68     0.372 0.711  0.711 ns          
2 Time  Assessment Baseline Month3    68    -1.74  0.0871 0.242 ns          
3 Time  Assessment Baseline Month6    68    -1.20  0.235  0.352 ns          
4 Time  Assessment Month1   Month3    68    -2.11  0.0387 0.232 ns          
5 Time  Assessment Month1   Month6    68    -1.57  0.121  0.242 ns          
6 Time  Assessment Month3   Month6    68     0.537 0.593  0.711 ns          
# # # # # # # # # # # # # # # # # # # 
# Now with the R basic anova function, how to make a repeated measures
ggplot( data = Exa8.4.1, aes(x = Time, y = Assessment) ) + 
  geom_boxplot() + 
  geom_jitter(width = .1, alpha=.6, aes(color = Time)) +
  theme_bw()

Exa8.4.1_mod <- aov( Assessment ~ factor(Time) + Error(factor(Subject)), data = Exa8.4.1 )
summary(Exa8.4.1_mod)

Error: factor(Subject)
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 17  20238    1190               

Error: Within
             Df Sum Sq Mean Sq F value  Pr(>F)   
factor(Time)  3   2396   798.6   5.501 0.00237 **
Residuals    51   7404   145.2                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans(Exa8.4.1_mod, pairwise ~ Time )
$emmeans
 Time     emmean   SE   df lower.CL upper.CL
 Baseline   61.9 4.75 30.4     52.2     71.6
 Month1     59.4 4.75 30.4     49.7     69.1
 Month3     73.6 4.75 30.4     63.9     83.3
 Month6     70.0 4.75 30.4     60.3     79.7

Warning: EMMs are biased unless design is perfectly balanced 
Confidence level used: 0.95 

$contrasts
 contrast          estimate   SE df t.ratio p.value
 Baseline - Month1     2.50 4.02 51   0.622  0.9244
 Baseline - Month3   -11.67 4.02 51  -2.905  0.0269
 Baseline - Month6    -8.06 4.02 51  -2.006  0.1994
 Month1 - Month3     -14.17 4.02 51  -3.527  0.0048
 Month1 - Month6     -10.56 4.02 51  -2.628  0.0534
 Month3 - Month6       3.61 4.02 51   0.899  0.8053

P value adjustment: tukey method for comparing a family of 4 estimates