# libraries
library(tidyverse)

Hypothesis testing examples

Means of two populations

In this part we work through examples and exercises that compare the means of two populations using a t test. The examples are taken from Daniel, chapter 7.

Example 7.3.12

Can we conclude that, on the average, lymphocytes and tumor cells differ in size? The measures in EXR_C07_S03_12.csv are the cell diameters (μm) of 40 lymphocytes and 50 tumor cells obtained from biopsies of tissue from patients with melanoma. Let \(\alpha = .05\)

Tumor <- read_csv("DataSets/ch07_all/EXR_C07_S03_12.csv", show_col_types = FALSE)

Tumor

# A tibble: 90 × 2
    Size Group
   <dbl> <dbl>
 1   9       1
 2   6.3     1
 3   8.6     1
 4   7.4     1
 5   8.8     1
 6   9.4     1
 7   5.7     1
 8   7       1
 9   8.7     1
10   5.2     1
# … with 80 more rows

Tumor <- Tumor %>% mutate(Group = factor(Group))

Tumor %>% 
  ggplot(aes(x = Group, y = Size)) +
  geom_boxplot() +
  geom_jitter(aes(color = Group)) +
  labs(y = "Cell diameter", 
       title = "Lymphocytes and tumor cells of patients with melanoma")

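# The exercise asks whether the sizes differ (a two-sided question); this call tests the
# one-sided alternative mean(group 1) < mean(group 2). Doubling the one-sided p-value
# gives the two-sided p-value, which here is still far below 0.05.
t.test(Size ~ Group, data = Tumor, alternative = "less", var.equal = FALSE, conf.level = 0.95)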


    Welch Two Sample t-test

data:  Size by Group
t = -22.396, df = 78.005, p-value < 2.2e-16
alternative hypothesis: true difference in means between group 1 and group 2 is less than 0
95 percent confidence interval:
      -Inf -10.15464
sample estimates:
mean in group 1 mean in group 2 
           6.95           17.92
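
The t statistic and the fractional degrees of freedom in a Welch test come from the two sample variances, and they can be reproduced by hand. The following is a minimal sketch on simulated data, not on the melanoma measurements; the helper `welch_by_hand()` and the means and standard deviations used in the simulation are assumptions made only for illustration.

```r
# Hypothetical helper: Welch t statistic and Welch-Satterthwaite degrees of freedom.
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x)            # squared standard error of mean(x)
  vy <- var(y) / length(y)            # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df, p_less = pt(t_stat, df))   # lower-tail p-value
}

set.seed(1)
x <- rnorm(40, mean = 7, sd = 1.5)    # simulated values, for illustration only
y <- rnorm(50, mean = 18, sd = 3)

by_hand  <- welch_by_hand(x, y)
built_in <- t.test(x, y, alternative = "less", var.equal = FALSE)

by_hand
built_in$statistic
built_in$parameter
```

The hand computation and `t.test()` should agree on both the statistic and the degrees of freedom, which explains why the reported df is not a whole number.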

Repeated measures

Sometimes comparing the means of two independent samples does not show a clear difference, and no effect can be concluded. This is often the result of differences in the variances or of other external factors that make the measurements diverge. However, if both conditions of interest can be measured on a single random sample of subjects, this extraneous variation can be removed, which gives a method with greater statistical power.
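
The gain from pairing can be illustrated with a small simulation. This is a sketch on made-up data, not on any of the Daniel data sets: the names `subject`, `before` and `after`, and the chosen means and standard deviations, are assumptions for illustration. It also shows that a paired t test is the same as a one-sample t test on the within-subject differences.

```r
set.seed(42)
n <- 30
subject <- rnorm(n, mean = 250, sd = 40)     # large between-subject variation
before  <- subject + rnorm(n, sd = 5)        # measurement under condition 1
after   <- subject - 10 + rnorm(n, sd = 5)   # condition 2, simulated reduction of 10

# Ignoring the pairing: Welch two-sample test on the raw measurements.
unpaired <- t.test(after, before, alternative = "less")

# Using the pairing: the between-subject variation cancels in the differences.
paired     <- t.test(after, before, paired = TRUE, alternative = "less")
one_sample <- t.test(after - before, mu = 0, alternative = "less")

c(unpaired = unpaired$p.value, paired = paired$p.value)
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
```

Both tests estimate the same mean difference, but the paired version divides it by a much smaller standard error, so its p-value is smaller.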

Example 19

William Tindall (A-28) performed a retrospective study of the records of patients receiving care for hypercholesterolemia. The table at REV_C07_19.csv gives measurements of total cholesterol for patients before and 6 weeks after taking a statin drug. Is there sufficient evidence at the α = .01 level of significance for us to conclude that the drug would result in reduction in total cholesterol in a population of similar hypercholesterolemia patients?

Cholesterol <- read_csv("DataSets/ch07_all/REV_C07_19.csv", show_col_types = FALSE)

Chol_long <- Cholesterol %>%
  pivot_longer(cols = c("Before", "After") , names_to = "Treat", values_to = "Chol")

boxplot(Chol ~ Treat, data = Chol_long, ylab = "Cholesterol level")

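# Treat is sorted alphabetically (After, Before), so the differences are After - Before
# and a reduction corresponds to alternative = "less".
# var.equal is omitted because it has no effect on a paired test.
t.test(Chol ~ Treat, data = Chol_long, alternative = "less", paired = TRUE, conf.level = 0.99)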


    Paired t-test

data:  Chol by Treat
t = -29.495, df = 106, p-value < 2.2e-16
alternative hypothesis: true difference in means is less than 0
99 percent confidence interval:
      -Inf -73.06903
sample estimates:
mean of the differences 
              -79.42991

# Relevel Treat so that Before is level 1 and After is level 2.
# The test takes level 1 minus level 2, so the differences become Before - After
# and a reduction now corresponds to alternative = "greater".
Chol_long <- Chol_long %>% mutate( Treat = Treat %>% fct_relevel("Before", "After"))

# boxplot(Chol ~ Treat, data = Chol_long, ylab = "Cholesterol level")
Chol_long %>% 
  ggplot(aes(x = Treat, y = Chol)) +
  geom_boxplot() +
  geom_jitter(aes(color = Treat)) +
  labs(y = "Cholesterol level", 
       title = "Hypercholesterolemia statin treatment")

# var.equal is omitted because it has no effect on a paired test.
t.test(Chol ~ Treat, data = Chol_long, alternative = "greater", paired = TRUE, conf.level = 0.99)


    Paired t-test

data:  Chol by Treat
t = 29.495, df = 106, p-value < 2.2e-16
alternative hypothesis: true difference in means is greater than 0
99 percent confidence interval:
 73.06903      Inf
sample estimates:
mean of the differences 
               79.42991
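
The two paired tests above are mirror images of each other: swapping the order of the groups flips the sign of the differences, so the alternative has to be flipped as well. A minimal sketch of this, using a simulated stand-in for the cholesterol table (the data frame `wide` and its values are assumptions; only the column names Before and After come from the example). It passes the two columns directly to `t.test()`, which avoids relying on factor level order; to our knowledge recent R releases (4.4.0 onward) also reject `paired = TRUE` in the formula interface, so the vector form is the safer one to copy.

```r
# Simulated stand-in for the Cholesterol table, for illustration only.
set.seed(7)
wide <- data.frame(Before = rnorm(25, mean = 260, sd = 30))
wide$After <- wide$Before - 60 + rnorm(25, sd = 15)

# Differences are always first argument minus second argument.
after_minus_before <- t.test(wide$After, wide$Before,
                             paired = TRUE, alternative = "less")
before_minus_after <- t.test(wide$Before, wide$After,
                             paired = TRUE, alternative = "greater")

c(after_minus_before$statistic, before_minus_after$statistic)
c(after_minus_before$p.value, before_minus_after$p.value)
```

The statistics differ only in sign and the p-values coincide, so either formulation answers the question of whether the treatment reduces cholesterol.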

Example 24

Kindergarten students were the participants in a study conducted by Susan Bazyk et al. (A-32). The researchers studied the fine motor skills of 37 children receiving occupational therapy. They used an index of fine motor skills that measured hand use, eye–hand coordination, and manual dexterity before and after 7 months of occupational therapy. Higher values indicate stronger fine motor skills. The scores appear in the table in REV_C07_24.csv. Can one conclude on the basis of these data that after 7 months, the fine motor skills in a population of similar subjects would be stronger? Let α = .05. Determine the p value.

Dexterity <- read_csv("DataSets/ch07_all/REV_C07_24.csv", show_col_types = FALSE)

Dext_long <- Dexterity %>%
  pivot_longer(cols = c("Pre", "Post") , names_to = "Therapy", values_to = "Index")

Dext_long <- Dext_long %>% mutate( Therapy = Therapy %>% fct_relevel("Pre", "Post"))

# boxplot(Index ~ Therapy, data = Dext_long, ylab = "Fine motor skills index")
Dext_long %>% 
  ggplot(aes(x = Therapy, y = Index)) +
  geom_boxplot() +
  geom_jitter(aes(color = Therapy)) +
  labs(y = "Index of Motor Dexterity", 
       title = "Occupational therapy for Motor dexterity")

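# With levels (Pre, Post) the differences are Pre - Post, so stronger skills after
# therapy correspond to alternative = "less". var.equal is omitted because it has
# no effect on a paired test.
t.test(Index ~ Therapy, data = Dext_long, alternative = "less", paired = TRUE, conf.level = 0.95)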


    Paired t-test

data:  Index by Therapy
t = -3.7706, df = 36, p-value = 0.0002927
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -4.313528
sample estimates:
mean of the differences 
              -7.810811
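
The exercise asks for the p value, which can be recovered from the reported statistic alone. The sketch below uses only numbers printed in the output above (t, df and the mean difference); the variable names are ours. It also reconstructs the one-sided confidence bound from the standard error implied by t = mean difference / standard error. Because the inputs are rounded as printed, the results agree with the output only up to rounding.

```r
t_stat    <- -3.7706     # t statistic reported above
df        <- 36          # degrees of freedom reported above (37 pairs - 1)
mean_diff <- -7.810811   # mean of the differences (Pre - Post) reported above

# p-value for alternative = "less": area to the left of t under the t distribution.
p_value <- pt(t_stat, df)

# Standard error implied by the reported values, and the one-sided 95% upper bound.
se    <- mean_diff / t_stat
upper <- mean_diff + qt(0.95, df) * se

p_value
upper
```

Since the p value is below α = .05, the null hypothesis is rejected and the data support stronger fine motor skills after therapy.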