# libraries
library(tidyverse)
Hypothesis testing examples
Means of two populations
In this part we work through examples and exercises on comparing the means of two populations using a t test. The examples are from Daniel, chapter 7.
Example 7.3.12
Can we conclude that, on the average, lymphocytes and tumor cells differ in size? The measures in EXR_C07_S03_12.csv are the cell diameters (μm) of 40 lymphocytes and 50 tumor cells obtained from biopsies of tissue from patients with melanoma. Let \(\alpha = .05\)
Tumor <- read_csv("DataSets/ch07_all/EXR_C07_S03_12.csv", show_col_types = FALSE)
Tumor
# A tibble: 90 × 2
Size Group
<dbl> <dbl>
1 9 1
2 6.3 1
3 8.6 1
4 7.4 1
5 8.8 1
6 9.4 1
7 5.7 1
8 7 1
9 8.7 1
10 5.2 1
# … with 80 more rows
# Treat Group as a categorical variable before plotting and testing
Tumor <- Tumor %>% mutate(Group = factor(Group))
Tumor %>%
  ggplot(aes(x = Group, y = Size)) +
  geom_boxplot() +
  geom_jitter(aes(color = Group)) +
  labs(y = "Cell diameter",
       title = "Lymphocytes and tumor cells of patients with melanoma")
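# The exercise asks whether the mean sizes differ, which is a two-sided question
# (alternative = "two.sided"). The call below tests the one-sided alternative
# that the mean of group 1 is less than the mean of group 2.
t.test(Size ~ Group , data = Tumor, alternative = "less", var.equal = FALSE, conf.level = 0.95)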
Welch Two Sample t-test
data: Size by Group
t = -22.396, df = 78.005, p-value < 2.2e-16
alternative hypothesis: true difference in means between group 1 and group 2 is less than 0
95 percent confidence interval:
-Inf -10.15464
sample estimates:
mean in group 1 mean in group 2
6.95 17.92
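To make the Welch output easier to read, the following is a minimal sketch of how the statistic, the approximate degrees of freedom, and the one-sided p-value can be computed by hand and checked against `t.test()`. The vectors `x` and `y` are simulated and hypothetical; they are not the measurements in EXR_C07_S03_12.csv, so the numbers will not match the output above.

```r
# Simulated data only: these are NOT the values in EXR_C07_S03_12.csv.
set.seed(1)
x <- rnorm(40, mean = 7,  sd = 1.5)  # hypothetical group 1
y <- rnorm(50, mean = 18, sd = 3)    # hypothetical group 2

# Squared standard error of each sample mean (variances are not pooled)
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch statistic: difference in means over the unpooled standard error
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# One-sided p-value for the alternative "mean of x is less than mean of y"
p_manual <- pt(t_manual, df = df_manual)

# Compare with the built-in test
fit <- t.test(x, y, alternative = "less", var.equal = FALSE)
all.equal(unname(fit$statistic), t_manual)
all.equal(unname(fit$parameter), df_manual)
all.equal(fit$p.value, p_manual)
```

The non-integer degrees of freedom in the output above (df = 78.005) come from this approximation; with `var.equal = TRUE` the test would instead pool the variances and use n1 + n2 - 2 degrees of freedom.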
Repeated measures
Sometimes comparing the measurements of two independent samples does not show a clear difference, and no effect can be concluded. This can happen because of differences in the variances or because of other external factors that make the measurements diverge. However, if the two conditions of interest can be compared within a single random sample of subjects, this spurious variation can be removed, which gives a method with greater statistical power.
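The gain in power can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated, and the names `baseline`, `before`, and `after`, as well as the effect size of 5 units, are hypothetical. Each subject has a very different baseline, which hides the effect in an unpaired comparison but cancels out when the within-subject differences are analyzed.

```r
# Simulated data only; all names and numbers are illustrative assumptions.
set.seed(2)
n <- 30
baseline <- rnorm(n, mean = 100, sd = 20)  # large between-subject variation
before <- baseline + rnorm(n, sd = 3)      # first measurement on each subject
after  <- baseline - 5 + rnorm(n, sd = 3)  # second measurement, shifted by -5

# Unpaired (Welch) test: ignores that both measurements share a subject
unpaired <- t.test(before, after, alternative = "greater", var.equal = FALSE)

# Paired test: works on the within-subject differences before - after
paired <- t.test(before, after, alternative = "greater", paired = TRUE)

# A paired t test is a one-sample t test on the differences
one_sample <- t.test(before - after, alternative = "greater")

c(unpaired = unpaired$p.value, paired = paired$p.value)
all.equal(unname(paired$statistic), unname(one_sample$statistic))
```

Both tests estimate the same mean difference, but the paired test divides it by a much smaller standard error because the between-subject variation drops out of the differences.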
Example 19
William Tindall (A-28) performed a retrospective study of the records of patients receiving care for hypercholesterolemia. The table at REV_C07_19.csv gives measurements of total cholesterol for patients before and 6 weeks after taking a statin drug. Is there sufficient evidence at the α = .01 level of significance for us to conclude that the drug would result in reduction in total cholesterol in a population of similar hypercholesterolemia patients?
Cholesterol <- read_csv("DataSets/ch07_all/REV_C07_19.csv", show_col_types = FALSE)
# Reshape to long format: one row per patient and measurement
Chol_long <- Cholesterol %>%
  pivot_longer(cols = c("Before", "After"), names_to = "Treat", values_to = "Chol")
boxplot(Chol ~ Treat, data = Chol_long, ylab = "Cholesterol level")
t.test(Chol ~ Treat , data = Chol_long, alternative = "less", paired = TRUE, var.equal = TRUE, conf.level = 0.99)
Paired t-test
data: Chol by Treat
t = -29.495, df = 106, p-value < 2.2e-16
alternative hypothesis: true difference in means is less than 0
99 percent confidence interval:
-Inf -73.06903
sample estimates:
mean of the differences
-79.42991
# Reorder the factor levels so that Before comes first and After second.
# By default the levels are sorted alphabetically (After, Before), so the test above
# works with After - Before. With Before (1) and After (2) the difference becomes
# Before - After, which is positive when cholesterol drops, so the alternative is "greater".
Chol_long <- Chol_long %>% mutate(Treat = Treat %>% fct_relevel("Before", "After"))
# boxplot(Chol ~ Treat, data = Chol_long, ylab = "Cholesterol level")
Chol_long %>%
  ggplot(aes(x = Treat, y = Chol)) +
  geom_boxplot() +
  geom_jitter(aes(color = Treat)) +
  labs(y = "Cholesterol level",
       title = "Hypercholesterolemia statin treatment")
t.test(Chol ~ Treat , data = Chol_long, alternative = "greater", paired = TRUE, var.equal = TRUE, conf.level = 0.99)
Paired t-test
data: Chol by Treat
t = 29.495, df = 106, p-value < 2.2e-16
alternative hypothesis: true difference in means is greater than 0
99 percent confidence interval:
73.06903 Inf
sample estimates:
mean of the differences
79.42991
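The two outputs above differ only in sign because the formula interface subtracts the second factor level from the first. The sketch below shows this with a few made-up values; `chol`, `treat`, and the data frame `d` are hypothetical and are not taken from REV_C07_19.csv. It uses base R `factor(levels = ...)` in place of `fct_relevel()`, and an unpaired test, purely to show the effect of the level order on the sign.

```r
# Made-up values for illustration only (not the data in REV_C07_19.csv)
d <- data.frame(
  chol  = c(250, 180, 240, 175, 260, 190, 245, 170),
  treat = rep(c("Before", "After"), times = 4)
)

# Default level order is alphabetical
default_order <- factor(d$treat)
levels(default_order)   # "After" "Before"

# Explicit level order: Before first, After second
releveled <- factor(d$treat, levels = c("Before", "After"))
levels(releveled)       # "Before" "After"

# The statistic is computed as (first level) - (second level)
t_default   <- t.test(d$chol ~ default_order)  # After - Before
t_releveled <- t.test(d$chol ~ releveled)      # Before - After

unname(t_default$statistic)
unname(t_releveled$statistic)
```

The two statistics have the same magnitude and opposite signs, which is why the one-sided alternative has to switch between "less" and "greater" when the level order changes.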
Example 24
Kindergarten students were the participants in a study conducted by Susan Bazyk et al. (A-32). The researchers studied the fine motor skills of 37 children receiving occupational therapy. They used an index of fine motor skills that measured hand use, eye–hand coordination, and manual dexterity before and after 7 months of occupational therapy. Higher values indicate stronger fine motor skills. The scores appear in the table in REV_C07_24.csv. Can one conclude on the basis of these data that after 7 months, the fine motor skills in a population of similar subjects would be stronger? Let α = .05. Determine the p value.
Dexterity <- read_csv("DataSets/ch07_all/REV_C07_24.csv", show_col_types = FALSE)
# Reshape to long format: one row per child and measurement
Dext_long <- Dexterity %>%
  pivot_longer(cols = c("Pre", "Post"), names_to = "Therapy", values_to = "Index")
# Put Pre first so that the test works with Pre - Post
Dext_long <- Dext_long %>% mutate(Therapy = Therapy %>% fct_relevel("Pre", "Post"))
# boxplot(Index ~ Therapy, data = Dext_long, ylab = "Fine motor skills index")
Dext_long %>%
  ggplot(aes(x = Therapy, y = Index)) +
  geom_boxplot() +
  geom_jitter(aes(color = Therapy)) +
  labs(y = "Index of Motor Dexterity",
       title = "Occupational therapy for Motor dexterity")
t.test(Index ~ Therapy , data = Dext_long, alternative = "less", paired = TRUE, var.equal = TRUE, conf.level = 0.95)
Paired t-test
data: Index by Therapy
t = -3.7706, df = 36, p-value = 0.0002927
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -4.313528
sample estimates:
mean of the differences
-7.810811
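As a check on how the pieces of this output fit together, the sketch below rebuilds the standard error, the one-sided p-value, and the upper confidence bound from the reported mean difference, t statistic, and degrees of freedom. It uses only the rounded numbers printed above, so the results agree with the output only up to rounding.

```r
# Values copied from the paired t-test output above (rounded as printed)
mean_diff <- -7.810811   # mean of the differences Pre - Post
t_stat    <- -3.7706
df        <- 36          # 37 children, so n - 1 = 36
alpha     <- 0.05

# t = mean_diff / se, so the standard error of the mean difference is
se <- mean_diff / t_stat

# One-sided p-value for the alternative "true difference is less than 0"
p_value <- pt(t_stat, df = df)

# Upper bound of the one-sided 95% confidence interval (-Inf, upper)
upper <- mean_diff + qt(1 - alpha, df = df) * se

c(se = se, p_value = p_value, upper = upper)
```

Since the p-value is below α = .05 and the whole interval lies below 0, the data support the conclusion that the index is higher after therapy than before.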