Final Projects
Probability and Statistics
Bachelor's Program in Technology (Licenciatura en Tecnología), ENES Juriquilla.
Final project 1
Noting that non-Hodgkin’s lymphomas (NHL) represent a heterogeneous
group of diseases in which prognosis is difficult to predict,
Christiansen et al. reported on the prognostic aspects of soluble
intercellular adhesion molecule-1 (sICAM-1) in NHL. Among the data
collected were the serum sICAM-1 (ng/ml) levels in four groups of
subjects: healthy controls (C), high-grade NHL (hNHL), low-grade NHL
(lNHL), and patients with hairy cell leukemia (HCL). The data are in the file
REV_C08_58.csv.
- Perform a statistical analysis of the data that you think would yield
useful information for the researchers.
- Determine p values for each computed test statistic.
- State all assumptions that are necessary to validate your
analysis.
- Describe the population(s) about which you think inferences based on
your analysis would be applicable.
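One candidate analysis for several independent groups, such as the four described above, is a one-way analysis of variance. The following is a minimal Python sketch under stated assumptions, not a prescribed method: it uses only the standard library, the helper names (`one_way_anova`, `f_survival`, `reg_inc_beta`) are hypothetical, and the numbers in the demo are toy values, not the sICAM-1 data. The p value is the upper tail of the F distribution, obtained from the regularized incomplete beta function.

```python
import math

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def _guard(v, tiny=1e-300):
    """Keep a denominator away from exact zero."""
    return v if abs(v) > tiny else tiny

def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c = 1.0
    d = 1.0 / _guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 / _guard(1.0 + num * d)
            c = _guard(1.0 + num / c)
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_survival(f_stat, df1, df2):
    """P(F > f_stat) for an F distribution with (df1, df2) degrees of freedom."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))

# Toy values only (NOT the data in the CSV file).
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df1, df2 = one_way_anova(toy_groups)
print(f_stat, df1, df2, f_survival(f_stat, df1, df2))
```

The F test assumes independent samples, approximately normal populations, and equal population variances; if those assumptions look doubtful for the real data, a rank-based alternative may be more appropriate.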
Final project 2
In Kreiter et al., medical school exams were delivered by computer. Because there were not enough computer stations to test the entire class simultaneously, the exams were administered over 2 days. Both students and faculty wondered if students testing on day 2 might have an advantage due to extra study time or a breach in test security. Thus, the researchers examined a large medical class (n = 193) tested over 2 days with three 2-hour 80-item multiple-choice exams. Students were assigned testing days via pseudorandom assignment. Of interest was whether taking a particular exam on day 1 or day 2 had a significant impact on scores. Use the data set LDS_C08_MEDSCORES.csv to determine whether test, day, or their interaction has a significant impact on test scores. Let \(\alpha = .05\).
Final project 3
Refer to the serum angiotensin-converting enzyme data on 1600 subjects (LDS_C08_SACEDATA.csv). Sarcoidosis, found throughout the world, is a systemic granulomatous disease of unknown cause. The assay of serum angiotensin-converting enzyme (SACE) is helpful in the diagnosis of active sarcoidosis. The activity of SACE is usually increased in patients with the disease, while normal levels occur in subjects who have not had the disease, those who have recovered, and patients with other granulomatous disorders. The data are the SACE values for four populations of subjects classified according to status regarding sarcoidosis: never had, A; active, B; stable, C; recovered, D. Select a simple random sample of 15 subjects from each population and perform an analysis to determine if you can conclude that the population means are different. Let \(\alpha = .05\). Use Tukey’s test to test for significant differences among individual pairs of means. Compare these results with those of a second sample of 30 subjects from each population. Finally, repeat the analysis with all the data and compare the results.
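The sampling step (a simple random sample of a fixed size from each population) can be sketched in Python as follows. This is only an illustration under assumptions: the column names `group` and `value` are hypothetical, since the layout of the CSV file is not described here, and the rows in the demo are synthetic placeholders rather than SACE values. In practice the rows could be read with `csv.DictReader`.

```python
import random

def sample_per_group(rows, group_key, n, seed=0):
    """Draw a simple random sample (without replacement) of n rows per group.

    rows      : list of dicts, one per subject
    group_key : name of the column holding the population label (assumed name)
    seed      : fixed seed so that the sample can be reproduced in the report
    """
    rng = random.Random(seed)
    by_group = {}
    for row in rows:
        by_group.setdefault(row[group_key], []).append(row)
    # rng.sample raises ValueError if a group has fewer than n members.
    return {g: rng.sample(members, n) for g, members in sorted(by_group.items())}

# Synthetic placeholder rows (NOT the real data): 4 populations of 400 subjects.
rows = [{"group": g, "value": float(i)} for g in "ABCD" for i in range(400)]

first = sample_per_group(rows, "group", 15, seed=1)    # first sample, n = 15
second = sample_per_group(rows, "group", 30, seed=2)   # second sample, n = 30
print({g: len(v) for g, v in first.items()})
print({g: len(v) for g, v in second.items()})
```

Using a different seed for the second sample keeps the two draws independent of each other while leaving both reproducible.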
Final project 4 (Assigned)
Refer to the urinary colony-stimulating factor data on 1500 subjects (LDS_C08_CSFDATA.csv). The data are the urinary colony-stimulating factor (CSF) levels in five populations: normal subjects and subjects with four different diseases. Each observation represents the mean colony count of four plates from a single urine specimen from a given subject. Select a simple random sample of size 15 from each of the five populations and perform an analysis of variance to determine if one may conclude that the population means are different. Let \(\alpha = .05\). Use Tukey’s HSD statistic to test for significant differences among all possible pairs of sample means. Prepare a narrative report on the results of your analysis. Compare these results with those of a second sample of 30 subjects from each population. Finally, repeat the analysis with all the data and compare the results.
Final project 5 (Assigned)
Refer to the red blood cell data on 1050 subjects (LDS_C08_RBCDATA.csv). Suppose that you are a statistical consultant to a medical researcher who is interested in learning something about the relationship between blood folate concentrations in adult females and the quality of their diet. The researcher has available three populations of subjects: those whose diet quality is rated as good, those whose diets are fair, and those with poor diets. For each subject there is also available her red blood cell (RBC) folate value (in \(\mu g/liter\) of red cells). Draw a simple random sample of size 10 from each population and determine whether the researcher can conclude that the three populations differ with respect to mean RBC folate value. Use Tukey’s test to make all possible comparisons. Let \(\alpha = .05\) and find the p value for each test. Compare these results with those of a second random sample of 25 subjects from each population. Finally, repeat the analysis with all the data and compare the results.
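For the pairwise comparisons, a hedged sketch of the Tukey statistic in its Tukey-Kramer form (which also covers unequal sample sizes) is shown below. For groups i and j it computes q = |mean_i - mean_j| / sqrt((MSE / 2)(1/n_i + 1/n_j)), where MSE is the within-groups mean square from the one-way ANOVA. The function name and the toy values are assumptions for illustration only. The critical value of the studentized range distribution is not computed here; it has to be taken from a table or from statistical software.

```python
import math
from itertools import combinations

def tukey_q_statistics(groups):
    """Return ({(label_i, label_j): q}, df_error) for a dict label -> values."""
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    # Within-groups (error) sum of squares and mean square.
    sse = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_error = n_total - k
    mse = sse / df_error
    q = {}
    for a, b in combinations(groups, 2):
        se = math.sqrt(mse / 2.0 * (1.0 / len(groups[a]) + 1.0 / len(groups[b])))
        q[(a, b)] = abs(means[a] - means[b]) / se
    return q, df_error

# Toy values only (NOT RBC folate data).
toy = {"good": [1, 2, 3], "fair": [2, 3, 4], "poor": [3, 4, 5]}
q_stats, df_error = tukey_q_statistics(toy)
for pair, q in q_stats.items():
    print(pair, round(q, 4), "df_error =", df_error)
```

A pair is declared significantly different when its q exceeds the tabulated critical value for the chosen \(\alpha\), the number of groups, and `df_error`.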
Final project 6
Refer to the serum cholesterol data on 350 subjects under three diet regimens (LDS_C08_SERUMCHO.csv). A total of 347 adult males between the ages of 30 and 65 participated in a study to investigate the relationship between the consumption of meat and serum cholesterol levels. Each subject ate beef as his only meat for a period of 20 weeks, pork as his only meat for another period of 20 weeks, and chicken or fish as his only meat for another 20-week period. At the end of each period, serum cholesterol determinations (mg/100 ml) were made on each subject. Select a simple random sample of 10 subjects from the population of 350. Use two-way analysis of variance to determine whether one should conclude that there is a difference in population mean serum cholesterol levels among the three diets. Let \(\alpha = .05\). Compare these results with those of a second random sample of 30 subjects from the population. Finally, repeat the analysis with all the data and compare the results.
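Because every subject is measured under all three diets, one reading of "two-way analysis of variance" here is the randomized complete block layout, with subjects as blocks and diets as treatments. The sketch below follows that reading, which is an assumption about the intended design; the function name and the toy table are illustrative only and are not cholesterol values.

```python
def block_anova(table):
    """Two-way ANOVA without replication (randomized complete block design).

    table: list of rows, one row per block (subject), one column per
           treatment (diet). Returns (F_treatments, df_treatments, df_error).
    """
    n = len(table)         # number of blocks
    k = len(table[0])      # number of treatments
    grand = sum(sum(row) for row in table) / (n * k)
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    row_means = [sum(row) / k for row in table]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_treat = n * sum((m - grand) ** 2 for m in col_means)
    ss_block = k * sum((m - grand) ** 2 for m in row_means)
    ss_error = ss_total - ss_treat - ss_block

    df_treat = k - 1
    df_error = (n - 1) * (k - 1)
    f_treat = (ss_treat / df_treat) / (ss_error / df_error)
    return f_treat, df_treat, df_error

# Toy table (NOT real data): 3 subjects (rows) by 3 diets (columns).
toy_table = [[1, 2, 4],
             [2, 4, 5],
             [3, 3, 6]]
print(block_anova(toy_table))
```

The resulting F is compared with the F distribution on `df_treat` and `df_error` degrees of freedom; removing the subject-to-subject variation as a block effect is what distinguishes this from a one-way analysis of the same numbers.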
Final project 7
Potteiger et al. wished to determine if sodium citrate ingestion
would improve cycling performance and facilitate favorable metabolic
conditions during the cycling ride. Subjects were eight trained male
competitive cyclists whose mean age was 25.4 years with a standard
deviation of 6.5. Each participant completed a 30-km cycling time trial
under two conditions, following ingestion of sodium citrate and
following ingestion of a placebo. Blood samples were collected prior to
treatment ingestion (PRE-ING); prior to exercising (PRE-EX); during the
cycling ride at completion of 10, 20, and 30 km; and 15 minutes after
cessation of exercise (POST-EX). The values of partial pressures of
oxygen \((P_{O_2})\) and carbon dioxide
\((P_{CO_2})\) for each subject, under
each condition, at each measurement time are in REV_C08_44.csv. Group 1
= sodium citrate and 2 = Placebo.
- Perform a statistical analysis of the data (including hypothesis
testing and confidence interval construction) that you think would yield
useful information for the researchers.
- Determine p values for each computed test statistic.
- State all assumptions that are necessary to validate your
analysis.
Final project 8 (Assigned)
Sloan et al. note that cardiac sympathetic activation and
parasympathetic withdrawal result in heart rate increases during
psychological stress. As indicators of cardiac adrenergic activity,
plasma epinephrine (E) and norepinephrine (NE) generally increase in
response to psychological challenge. Power spectral analysis of heart
period variability also provides estimates of cardiac autonomic nervous
system activity. The authors conducted a study to determine the
relationship between neurohumoral and two different spectral estimates
of cardiac sympathetic nervous system activity during a quiet resting
baseline and in response to a psychologically challenging arithmetic
task. Subjects were healthy, medication-free male and female volunteers
with a mean age of 37.8 years. None had a history of cardiac,
respiratory, or vascular disease. Among the data collected were the
measurements on E, NE, low-frequency (LF) and very-low-frequency (VLF)
power spectral indices, and low-frequency/high-frequency ratios (LF/HF).
Measurements are given for three periods: baseline (B), a mental
arithmetic task (MA), and change from baseline to task (DELTA), in file
REV_C09_42.csv.
- Perform a statistical analysis of the data (including hypothesis
testing and confidence interval construction) that you think would yield
useful information for the researchers.
- Construct graphs that you think would be helpful in illustrating the
relationships among variables.
- Determine p values for each computed test statistic.
(Hint: first compute the correlations of E and NE against the heart-rate
variables, comparing the resting (B) and task (MA) conditions. Only then
fit a linear model (lm), and only for the variables that show a
correlation.)
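The correlation screening suggested in the hint can be sketched in Python as follows; the same function would be called once per pair of variables and per condition (B and MA). This is an illustration under assumptions: the helper names are hypothetical, the demo values are toy numbers rather than the study data, and the p value comes from a Monte Carlo permutation test, swapped in here so that the example needs only the standard library. It is not the t-distribution p value that a routine such as R's `cor.test` would report.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)   # fails if either series is constant

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 df; undefined if |r| = 1."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p value for H0: no association."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Toy values only (NOT the E, NE, or spectral data).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
print(r, t_statistic(r, len(x)), permutation_p_value(x, y))
```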
Final project 9
Refer to the data for 1050 subjects with essential hypertension (LDS_C09_HYPERTEN.csv). Suppose that you are a statistical consultant to a medical research team interested in essential hypertension. Select a simple random sample of 80 subjects from the population and perform the analyses that you think would be useful to the researchers. Present your findings and conclusions in narrative form and illustrate with graphs where appropriate. Select a second simple random sample of 15 subjects and compare its results with those of the first sample of 80.
Final project 10 (Assigned)
Refer to the data for 1050 subjects with cerebral edema (LDS_C09_CEREBRAL.csv). Cerebral edema with consequent increased intracranial pressure frequently accompanies lesions resulting from head injury and other conditions that adversely affect the integrity of the brain. Available treatments for cerebral edema vary in effectiveness and undesirable side effects. One such treatment is glycerol, administered either orally or intravenously. Of interest to clinicians is the relationship between intracranial pressure and glycerol plasma concentration. Suppose that you are a statistical consultant with a research team investigating the relationship between these two variables. Select a simple random sample from the population and perform the analysis that you think would be useful to the researchers. Present your findings and conclusions in narrative form and illustrate with graphs where appropriate. Compare these results with those of two different random samples.
Final project 11
Refer to the data for 1200 patients with rheumatoid arthritis
(CALCIUM). One hundred patients received the medicine at each dose
level. Suppose that you are a medical researcher wishing to gain
insight into the nature of the relationship between dose level of
prednisolone and total body calcium. Select a simple random sample of
three patients from each dose level group and do the following. Use the
total number of pairs of observations to obtain the least-squares
equation describing the relationship between dose level (the independent
variable) and total body calcium.
- Draw a scatter diagram of the data and plot the equation.
- Compute r and test for significance at the .05 level. Find the p
value.
- Compare your results with those of two different random samples of
three patients from each dose.
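The least-squares equation \(\hat{y} = b_0 + b_1 x\) requested above has slope \(b_1 = S_{xy}/S_{xx}\) and intercept \(b_0 = \bar{y} - b_1\bar{x}\). A minimal Python sketch follows; the function name and the toy pairs are assumptions for illustration and are not the prednisolone or calcium values.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx                  # fails if all x values are equal
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy pairs only: x plays the role of dose level, y of total body calcium.
dose = [1, 1, 2, 2, 3, 3]
calcium = [10.0, 12.0, 9.0, 11.0, 8.0, 8.0]
b0, b1 = least_squares_line(dose, calcium)
print("intercept:", b0, "slope:", b1)

# Fitted values, e.g. for drawing the line over the scatter diagram.
fitted = [b0 + b1 * d for d in sorted(set(dose))]
print(fitted)
```

Repeating the call on each of the random samples gives the slopes and intercepts to compare across samples.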
Final project 12 (Assigned)
Yasu et al. used noninvasive magnetic resonance spectroscopy to
determine the short- and long-term effects of percutaneous transvenous
mitral commissurotomy (PTMC) on exercise capacity and metabolic
responses of skeletal muscles during exercise. Data were collected on 11
patients (2 males, 9 females) with symptomatic mitral stenosis. Their
mean age was 52 years with a standard deviation of 11. Among the data
collected were the following measurements on changes in mitral valve
area (d-MVA) and peak oxygen consumption (d-Vo2) 3, 30, and 90 days
post-PTMC in REV_C09_46.csv.
- Perform a statistical analysis of the data (including hypothesis
testing and confidence interval construction) that you think would yield
useful information for the researchers.
- Construct graphs that you think would be helpful in illustrating the
relationships among variables.
- Determine p values for each computed test statistic.
Final project 13 (Assigned)
The purpose of a study by Halligan et al. was to evaluate diurnal
variation in blood pressure (BP) in women who were normotensive and
those with pre-eclampsia. The subjects were similar in age, weight, and
mean duration of gestation (35 weeks). The BP readings the researchers
collected are in REV_C09_39.csv. As part of their analysis they studied the
relationship between mean day and night measurements and day/night
differences for both diastolic and systolic BP in each group. C1 = group
(0 = normotensive, 1 = pre-eclamptic); C2 = day diastolic; C3 = night
diastolic; C4 = day systolic; C5 = night systolic.
- Draw a scatter diagram of the data and plot the least-squares regression equation.
- Compute r and test for significance at the .05 level. Find the p
value.
Final project 14 (Assigned)
Another variable of interest in the study by Reiss et al. was activated partial
thromboplastin time (aPTT), the standard test used to monitor heparin
anticoagulation. Use the data in EXR_C09_S07_02.csv to examine the
correlation between aPTT levels as measured by the CoaguCheck
point-of-care assay and standard laboratory hospital assay in 90
subjects receiving heparin alone, heparin with warfarin, and warfarin
and enoxaparin.
- Draw a scatter diagram of the data and plot the equation for each of
the cases.
- Compute r and test for significance at the .05 level. Find the p value
of the models.
Final project 15
The objective of a study by Sakhaee et al. was to ascertain body
content of aluminum (Al) noninvasively using the increment in serum and
urinary Al following the intravenous administration of deferoxamine
(DFO) in patients with kidney stones and osteoporotic women undergoing
long-term treatment with potassium citrate (K3Cit) or tricalcium
dicitrate (Ca3Cit2), respectively. Subjects consisted of 10 patients
with calcium nephrolithiasis and five patients with osteoporosis who
were maintained on potassium citrate or calcium citrate for 2–8 years,
respectively, plus 16 normal volunteers without a history of regular
aluminum-containing antacid use. Among the data collected were the
following 24-hour urinary aluminum excretion measurements (\(\mu g/day\)) before (PRE) and after (POST)
2-hour infusion of DFO, in REV_C13_20.csv.
- Apply one of the nonparametric techniques.
- Apply one of the ANOVA techniques.
- Formulate relevant hypotheses, perform the appropriate tests, and find
p values.
- State the statistical decisions and clinical conclusions that the
results of your hypothesis tests justify.
- Describe the population(s) to which you think your inferences are
applicable.
- State the assumptions necessary for the validity of your analyses.
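For paired before/after measurements such as PRE and POST, one possible nonparametric technique is the sign test. The sketch below is only one option among several (the Wilcoxon signed-rank test is another), and the function name and demo values are assumptions, not the urinary aluminum data. It drops zero differences and computes an exact two-sided p value from the binomial distribution with success probability 1/2.

```python
from math import comb

def sign_test(before, after):
    """Exact two-sided sign test for paired data.

    Returns (number of positive differences, number of nonzero
    differences, p value). Pairs with no change are discarded.
    """
    diffs = [b - a for a, b in zip(before, after) if b != a]
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    tail = min(positives, n - positives)
    # P(X <= tail) under Binomial(n, 1/2), doubled for a two-sided test.
    p_one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return positives, n, min(1.0, 2.0 * p_one_sided)

# Toy values only (NOT the measurements in the CSV file).
pre = [1, 2, 3, 4, 5, 6]
post = [2, 3, 4, 5, 6, 5]
print(sign_test(pre, post))
```

The sign test assumes only that the pairs are independent and that the differences come from a continuous distribution; it tests whether the median difference is zero.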
Final project 16
The purpose of a study by Kim et al. was to investigate the serial
changes in Lp(a) lipoprotein levels with the loss of female sex hormones
by surgical menopause and with estrogen replacement therapy in the same
women. Subjects were 44 premenopausal women who underwent a
transabdominal hysterectomy (TAH). Thirty-one of the women had a TAH and
unilateral salpingo-oophorectomy (USO), and 13 had a TAH and bilateral
salpingo-oophorectomy (BSO). The women ranged in age from 30 to 53
years. Subjects in the BSO group received .625 mg of conjugated equine
estrogen daily 2 months after the operation. The data in REV_C13_23.csv
are the subjects’ total cholesterol levels before (TC0), 2 months after
(TC2), and 4 months after (TC4) the surgical procedure and hormone
replacement therapy.
- Apply one of the nonparametric techniques.
- Apply one of the ANOVA techniques.
- Formulate relevant hypotheses, perform the appropriate tests, and find
p values.
- State the statistical decisions and clinical conclusions that the
results of your hypothesis tests justify.
- Describe the population(s) to which you think your inferences are
applicable.
- State the assumptions necessary for the validity of your analyses.
Final project 17 (Assigned)
Heijdra et al. state that many patients with severe chronic
obstructive pulmonary disease (COPD) have low arterial oxygen saturation
during the night. These investigators conducted a study to determine
whether there is a causal relationship between respiratory muscle
dysfunction and nocturnal saturation. Subjects were 20 (5 females, 15
males) patients with COPD randomly assigned to receive either
target-flow inspiratory muscle training (TF-IMT) at 60 percent of their
maximal inspiratory mouth pressure (PImax) or sham TF-IMT at 10 percent
of PImax. Among the data collected were the endurance times (Time, s)
for each subject at the beginning of training and 10 weeks later, in
REV_C13_25.csv.
- Apply one of the nonparametric techniques.
- Apply one of the ANOVA techniques.
- Formulate relevant hypotheses, perform the appropriate tests, and find
p values.
- State the statistical decisions and clinical conclusions that the
results of your hypothesis tests justify.
- Describe the population(s) to which you think your inferences are
applicable.
- State the assumptions necessary for the validity of your analyses.
Final project 18 (Assigned)
The purpose of a study by Maltais et al. was to compare and correlate
the increase in arterial lactic acid (La) during exercise and the
oxidative capacity of the skeletal muscle in patients with chronic
obstructive pulmonary disease (COPD) and control subjects (C). There
were nine subjects in each group. The mean age of the patients was 62
years with a standard deviation of 5. Control subjects had a mean age of
54 years with a standard deviation of 3. Among the data collected were
the values for the activity of phosphofructokinase (PFK), hexokinase
(HK), and lactate dehydrogenase (LDH) for the two groups in the file
REV_C13_27.csv.
- Apply one of the nonparametric techniques.
- Apply one of the ANOVA techniques.
- Formulate relevant hypotheses, perform the appropriate tests, and find
p values.
- State the statistical decisions and clinical conclusions that the
results of your hypothesis tests justify.
- State the assumptions necessary for the validity of your analyses.
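For two independent groups such as the patients and the controls, one possible nonparametric technique is the Mann-Whitney test. The sketch below is an illustration under assumptions: the function name and demo values are hypothetical (not enzyme activities from the file), and the p value uses the large-sample normal approximation without a tie or continuity correction, so with groups as small as nine subjects an exact table may be preferable.

```python
import math
from statistics import NormalDist

def mann_whitney_u(x, y):
    """Return (U for sample x, approximate two-sided p value).

    U counts the (x_i, y_j) pairs with x_i > y_j, with ties counted as 1/2.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)   # no tie correction
    z = (u - mean_u) / sd_u
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u, p_value

# Toy values only (NOT PFK, HK, or LDH measurements).
group_a = [1, 2, 3]
group_b = [4, 5, 6]
print(mann_whitney_u(group_a, group_b))
```

The test assumes independent random samples from continuous distributions that differ, if at all, only in location.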