Final Projects

Probability and Statistics

Licenciatura en Tecnología (Bachelor's Degree in Technology), ENES Juriquilla.

Final project 1

Noting that non-Hodgkin’s lymphomas (NHL) represent a heterogeneous group of diseases in which prognosis is difficult to predict, Christiansen et al. reported on the prognostic aspects of soluble intercellular adhesion molecule-1 (sICAM-1) in NHL. Among the data collected were the serum sICAM-1 (ng/ml) levels in four groups of subjects: healthy controls (C), high-grade NHL (hNHL), low-grade NHL (lNHL), and patients with hairy cell leukemia (HCL). The data are in the file REV_C08_58.csv.
- Perform a statistical analysis of the data that you think would yield useful information for the researchers.
- Determine p values for each computed test statistic.
- State all assumptions that are necessary to validate your analysis.
- Describe the population(s) about which you think inferences based on your analysis would be applicable.

Final project 2

In Kreiter et al., medical school exams were delivered via computer format. Because there were not enough computer stations to test the entire class simultaneously, the exams were administered over 2 days. Both students and faculty wondered if students testing on day 2 might have an advantage due to extra study time or a breach in test security. Thus, the researchers examined a large medical class (n = 193) tested over 2 days with three 2-hour 80-item multiple-choice exams. Students were assigned testing days via pseudorandom assignment. Of interest was whether taking a particular exam on day 1 or day 2 had a significant impact on scores. Use the data set LDS_C08_MEDSCORES.csv to determine if test, day, or their interaction has a significant impact on test scores. Let \(\alpha = .05\).

Final project 3

Refer to the serum angiotensin-converting enzyme data on 1600 subjects (LDS_C08_SACEDATA.csv). Sarcoidosis, found throughout the world, is a systemic granulomatous disease of unknown cause. The assay of serum angiotensin-converting enzyme (SACE) is helpful in the diagnosis of active sarcoidosis. The activity of SACE is usually increased in patients with the disease, while normal levels occur in subjects who have not had the disease, those who have recovered, and patients with other granulomatous disorders. The data are the SACE values for four populations of subjects classified according to status regarding sarcoidosis: never had, A; active, B; stable, C; recovered, D. Select a simple random sample of 15 subjects from each population and perform an analysis to determine if you can conclude that the population means are different. Let \(\alpha = .05\). Use Tukey’s test to test for significant differences among individual pairs of means. Compare these results with those of a second sample of 30 subjects from each population, and finally use all the data and compare your results.
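
One possible way to organize the sampling and the overall test is sketched below in Python, using only the standard library. It is an illustration under stated assumptions, not a prescribed method: the four populations are synthetic stand-ins generated inside the code (the real values would be read from LDS_C08_SACEDATA.csv, whose column layout is not described here), the function names `anova_f` and `permutation_p` are hypothetical, and the p value is approximated with a permutation test instead of being read from the F distribution.

```python
import random
from statistics import fmean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of samples (lists of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def permutation_p(groups, n_perm=2000, seed=0):
    """Approximate p value: share of label shufflings with F >= observed F."""
    rng = random.Random(seed)
    observed = anova_f(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if anova_f(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic placeholder populations (NOT the SACE data); sizes and
# parameters are arbitrary and only make the example runnable.
rng = random.Random(42)
populations = {
    "A": [rng.gauss(50, 10) for _ in range(400)],
    "B": [rng.gauss(65, 10) for _ in range(400)],
    "C": [rng.gauss(52, 10) for _ in range(400)],
    "D": [rng.gauss(50, 10) for _ in range(400)],
}

# Simple random sample of 15 subjects from each population.
samples = [rng.sample(values, 15) for values in populations.values()]
f_stat = anova_f(samples)
p_value = permutation_p(samples)
print(f"F = {f_stat:.2f}, permutation p = {p_value:.4f}")
```

Drawing the samples again with a size of 30, and then passing the full lists, gives the three analyses that the project asks to compare.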

Final project 4 (Assigned)

Refer to the urinary colony-stimulating factor data on 1500 subjects (LDS_C08_CSFDATA.csv). The data are the urinary colony-stimulating factor (CSF) levels in five populations: normal subjects and subjects with four different diseases. Each observation represents the mean colony count of four plates from a single urine specimen from a given subject. Select a simple random sample of size 15 from each of the five populations and perform an analysis of variance to determine if one may conclude that the population means are different. Let \(\alpha = .05\). Use Tukey’s HSD statistic to test for significant differences among all possible pairs of sample means. Prepare a narrative report on the results of your analysis. Compare these results with those of a second sample of 30 subjects from each population, and finally use all the data and compare your results.
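
The pairwise step can be sketched as follows. This is a minimal illustration that assumes equal sample sizes and that the critical value of the studentized range, here the hypothetical parameter `q_crit`, is looked up by the reader in a table for the chosen \(\alpha\), number of groups, and error degrees of freedom. The toy numbers are invented and are not CSF data; the value 4.34 is the tabled value for \(\alpha = .05\), three groups, and 6 error degrees of freedom and should be checked against your own table.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def tukey_hsd(groups, q_crit):
    """Tukey HSD for equal-sized groups.

    Returns (hsd, rows), where each row is
    (i, j, absolute difference of sample means, difference exceeds hsd).
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal sample sizes")
    means = [fmean(g) for g in groups]
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mse = sse / (k * n - k)          # error mean square from the ANOVA
    hsd = q_crit * sqrt(mse / n)     # honestly significant difference
    rows = []
    for i, j in combinations(range(k), 2):
        diff = abs(means[i] - means[j])
        rows.append((i, j, diff, diff > hsd))
    return hsd, rows


# Invented toy data: three groups of three observations each.
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
hsd, rows = tukey_hsd(toy, q_crit=4.34)
print(f"HSD = {hsd:.3f}")
for i, j, diff, significant in rows:
    print(f"groups {i} vs {j}: |difference| = {diff:.2f}, significant = {significant}")
```

For the project itself the same function would receive the five samples of 15 (or 30) observations together with the critical value for five groups and the corresponding error degrees of freedom.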

Final project 5 (Assigned)

Refer to the red blood cell data on 1050 subjects (LDS_C08_RBCDATA.csv). Suppose that you are a statistical consultant to a medical researcher who is interested in learning something about the relationship between blood folate concentrations in adult females and the quality of their diet. The researcher has available three populations of subjects: those whose diet quality is rated as good, those whose diets are fair, and those with poor diets. For each subject there is also available her red blood cell (RBC) folate value (in \(\mu g/liter\) of red cells). Draw a simple random sample of size 10 from each population and determine whether the researcher can conclude that the three populations differ with respect to mean RBC folate value. Use Tukey’s test to make all possible comparisons. Let \(\alpha = .05\) and find the p value for each test. Compare these results with those of a second random sample of 25 subjects from each population, and finally use all the data and compare your results.

Final project 6

Refer to the serum cholesterol data on 350 subjects under three diet regimens (LDS_C08_SERUMCHO.csv). A total of 347 adult males between the ages of 30 and 65 participated in a study to investigate the relationship between the consumption of meat and serum cholesterol levels. Each subject ate beef as his only meat for a period of 20 weeks, pork as his only meat for another period of 20 weeks, and chicken or fish as his only meat for another 20-week period. At the end of each period, serum cholesterol determinations (mg/100 ml) were made on each subject. Select a simple random sample of 10 subjects from the population of 350. Use two-way analysis of variance to determine whether one should conclude that there is a difference in population mean serum cholesterol levels among the three diets. Let \(\alpha = .05\). Compare these results with those of a second random sample of 30 subjects from the population, and finally use all the data and compare your results.
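
Because every subject is measured under all three diets, one reasonable reading of "two-way analysis of variance" here is a randomized complete block layout with subjects as blocks and diets as treatments. The sketch below computes that decomposition; the function name `randomized_block_anova` is hypothetical and the 3 x 3 table is invented toy data, not serum cholesterol values. The resulting F would then be compared with the F distribution on the reported degrees of freedom.

```python
from statistics import fmean


def randomized_block_anova(table):
    """table[i][j] is the response of subject (block) i under treatment j."""
    n_blocks = len(table)
    n_treat = len(table[0])
    grand = fmean(x for row in table for x in row)
    block_means = [fmean(row) for row in table]
    treat_means = [fmean(row[j] for row in table) for j in range(n_treat)]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_blocks = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_error = ss_total - ss_blocks - ss_treat

    df_treat = n_treat - 1
    df_error = (n_blocks - 1) * (n_treat - 1)
    f_treat = (ss_treat / df_treat) / (ss_error / df_error)
    return {
        "ss_treatments": ss_treat,
        "ss_blocks": ss_blocks,
        "ss_error": ss_error,
        "df": (df_treat, df_error),
        "f_treatments": f_treat,
    }


# Invented toy table: rows are subjects, columns are three diets.
toy = [
    [4.0, 6.0, 8.0],
    [5.0, 6.0, 10.0],
    [6.0, 9.0, 9.0],
]
result = randomized_block_anova(toy)
print(result)
```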

Final project 7

Potteiger et al. wished to determine if sodium citrate ingestion would improve cycling performance and facilitate favorable metabolic conditions during the cycling ride. Subjects were eight trained male competitive cyclists whose mean age was 25.4 years with a standard deviation of 6.5. Each participant completed a 30-km cycling time trial under two conditions, following ingestion of sodium citrate and following ingestion of a placebo. Blood samples were collected prior to treatment ingestion (PRE-ING); prior to exercising (PRE-EX); during the cycling ride at completion of 10, 20, and 30 km; and 15 minutes after cessation of exercise (POST-EX). The values of partial pressures of oxygen \((P\mathrm{O}_2)\) and carbon dioxide \((P\mathrm{CO}_2)\) for each subject, under each condition, at each measurement time are in REV_C08_44.csv. Group 1 = sodium citrate and Group 2 = placebo.
- Perform a statistical analysis of the data (including hypothesis testing and confidence interval construction) that you think would yield useful information for the researchers.
- Determine p values for each computed test statistic.
- State all assumptions that are necessary to validate your analysis.

Final project 8 (Assigned)

Sloan et al. note that cardiac sympathetic activation and parasympathetic withdrawal result in heart rate increases during psychological stress. As indicators of cardiac adrenergic activity, plasma epinephrine (E) and norepinephrine (NE) generally increase in response to psychological challenge. Power spectral analysis of heart period variability also provides estimates of cardiac autonomic nervous system activity. The authors conducted a study to determine the relationship between neurohumoral and two different spectral estimates of cardiac sympathetic nervous system activity during a quiet resting baseline and in response to a psychologically challenging arithmetic task. Subjects were healthy, medication-free male and female volunteers with a mean age of 37.8 years. None had a history of cardiac, respiratory, or vascular disease. Among the data collected were the measurements on E, NE, low-frequency (LF) and very-low-frequency (VLF) power spectral indices, and low-frequency/high-frequency ratios (LF/HF). Measurements are given for three periods: baseline (B), a mental arithmetic task (MA), and change from baseline to task (DELTA), in file REV_C09_42.csv.
- Perform a statistical analysis of the data (including hypothesis testing and confidence interval construction) that you think would yield useful information for the researchers.
- Construct graphs that you think would be helpful in illustrating the relationships among variables.
- Determine p values for each computed test statistic.
(Hint: first compute the correlations of E and NE against the heart-rate frequency variables, comparing the resting (B) and task (MA) conditions, before attempting a linear model (lm). Fit the linear model only for the variables that show a correlation.)
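
The screening step suggested in the hint could look roughly like the sketch below. It computes Pearson's r for each hormone against each spectral index and keeps only the pairs whose approximate 95 percent confidence interval, obtained with Fisher's z transformation, excludes zero. The interval assumes roughly bivariate normal data and more than three observations. The column names E, NE, LF, and VLF come from the exercise, but the numbers and the helper names `pearson_r`, `fisher_ci`, and `screen` are invented for illustration.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for rho via Fisher's z (needs n > 3)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_crit / sqrt(n - 3)
    centre = atanh(r)
    return tanh(centre - half_width), tanh(centre + half_width)


def screen(columns, hormones, indices):
    """Return {(hormone, index): (r, low, high)} for intervals excluding 0."""
    kept = {}
    for h in hormones:
        for s in indices:
            r = pearson_r(columns[h], columns[s])
            low, high = fisher_ci(r, len(columns[h]))
            if low > 0 or high < 0:
                kept[(h, s)] = (r, low, high)
    return kept


# Invented toy values for one period (for example, baseline B).
baseline = {
    "E": [1, 2, 3, 4, 5, 6, 7, 8],
    "NE": [8, 7, 6, 5, 4, 3, 2, 1],
    "LF": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1],
    "VLF": [5, 3, 6, 2, 7, 4, 5, 4],
}
kept = screen(baseline, ["E", "NE"], ["LF", "VLF"])
for pair, (r, low, high) in kept.items():
    print(pair, f"r = {r:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Running `screen` once on the B columns and once on the MA columns gives two lists of retained pairs that can be compared before any linear model is fitted.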

Final project 9

Refer to the data for 1050 subjects with essential hypertension (LDS_C09_HYPERTEN.csv). Suppose that you are a statistical consultant to a medical research team interested in essential hypertension. Select a simple random sample of 80 subjects from the population and perform the analyses that you think would be useful to the researchers. Present your findings and conclusions in narrative form and illustrate with graphs where appropriate. Select a second simple random sample of 15 subjects and compare with your results of the first sample of 80.

Final project 10 (Assigned)

Refer to the data for 1050 subjects with cerebral edema (LDS_C09_CEREBRAL.csv). Cerebral edema with consequent increased intracranial pressure frequently accompanies lesions resulting from head injury and other conditions that adversely affect the integrity of the brain. Available treatments for cerebral edema vary in effectiveness and undesirable side effects. One such treatment is glycerol, administered either orally or intravenously. Of interest to clinicians is the relationship between intracranial pressure and glycerol plasma concentration. Suppose that you are a statistical consultant with a research team investigating the relationship between these two variables. Select a simple random sample from the population and perform the analysis that you think would be useful to the researchers. Present your findings and conclusions in narrative form and illustrate with graphs where appropriate. Compare these results with those of two different random samples.

Final project 11

Refer to the data for 1200 patients with rheumatoid arthritis (CALCIUM). One hundred patients received the medicine at each dose level. Suppose that you are a medical researcher wishing to gain insight into the nature of the relationship between dose level of prednisolone and total body calcium. Select a simple random sample of three patients from each dose level group and do the following.
- Use the total number of pairs of observations to obtain the least-squares equation describing the relationship between dose level (the independent variable) and total body calcium.
- Draw a scatter diagram of the data and plot the equation.
- Compute r and test for significance at the .05 level. Find the p value.
- Compare your results with those of two different random samples of three patients from each dose.
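
The numerical part of these steps can be sketched as below. The function `least_squares` is a hypothetical helper and the five (x, y) pairs are invented toy numbers, not dose or calcium values. It returns the slope, the intercept, r, and the statistic \(t = r\sqrt{(n-2)/(1-r^2)}\), whose p value would then be obtained from a t distribution with \(n - 2\) degrees of freedom using a table or statistical software.

```python
from math import sqrt
from statistics import fmean


def least_squares(x, y):
    """Return (slope, intercept, r, t) for the simple linear regression of y on x.

    The t statistic is undefined when the points lie exactly on a line
    (r equal to 1 or -1), so this sketch does not handle that case.
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return slope, intercept, r, t


# Invented toy pairs: x plays the role of dose level, y of the response.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r, t = least_squares(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} x, r = {r:.3f}, t = {t:.3f} on {len(x) - 2} df")
```

The fitted pairs `(a, intercept + slope * a)` give the line to draw over the scatter diagram.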

Final project 12 (Assigned)

Yasu et al. used noninvasive magnetic resonance spectroscopy to determine the short- and long-term effects of percutaneous transvenous mitral commissurotomy (PTMC) on exercise capacity and metabolic responses of skeletal muscles during exercise. Data were collected on 11 patients (2 males, 9 females) with symptomatic mitral stenosis. Their mean age was 52 years with a standard deviation of 11. Among the data collected were the following measurements on changes in mitral valve area (d-MVA) and peak oxygen consumption (d-Vo2) 3, 30, and 90 days post-PTMC in REV_C09_46.csv.
- Perform a statistical analysis of the data (including hypothesis testing and confidence interval construction) that you think would yield useful information for the researchers.
- Construct graphs that you think would be helpful in illustrating the relationships among variables.
- Determine p values for each computed test statistic.

Final project 13 (Assigned)

The purpose of a study by Halligan et al. was to evaluate diurnal variation in blood pressure (BP) in women who were normotensive and those with pre-eclampsia. The subjects were similar in age, weight, and mean duration of gestation (35 weeks). The researchers collected the BP readings in REV_C09_39.csv. As part of their analysis they studied the relationship between mean day and night measurements and day/night differences for both diastolic and systolic BP in each group. C1 = group (0 = normotensive, 1 = pre-eclamptic); C2 = day diastolic; C3 = night diastolic; C4 = day systolic; C5 = night systolic.
- Draw a scatter diagram of the data and plot the equation.
- Compute r and test for significance at the .05 level. Find the p value.

Final project 14 (Assigned)

Another variable of interest in the study by Reiss et al. was partial thromboplastin (aPTT), the standard test used to monitor heparin anticoagulation. Use the data in EXR_C09_S07_02.csv to examine the correlation between aPTT levels as measured by the CoaguCheck point-of-care assay and standard laboratory hospital assay in 90 subjects receiving heparin alone, heparin with warfarin, and warfarin and enoxaparin.
- Draw a scatter diagram of the data and plot the equation for each of the cases.
- Compute r and test for significance at the .05 level. Find the p value of the models.

Final project 15

The objective of a study by Sakhaee et al. was to ascertain body content of aluminum (Al) noninvasively using the increment in serum and urinary Al following the intravenous administration of deferoxamine (DFO) in patients with kidney stones and osteoporotic women undergoing long-term treatment with potassium citrate (K3Cit) or tricalcium dicitrate (Ca3Cit2), respectively. Subjects consisted of 10 patients with calcium nephrolithiasis and five patients with osteoporosis who were maintained on potassium citrate or calcium citrate for 2–8 years, respectively, plus 16 normal volunteers without a history of regular aluminum-containing antacid use. Among the data collected were the following 24-hour urinary aluminum excretion measurements (\(\mu g/day\)) before (PRE) and after (POST) 2-hour infusion of DFO, in REV_C13_20.csv.
- Apply one of the nonparametric techniques.
- Apply one of the ANOVA techniques.
- Formulate relevant hypotheses, perform the appropriate tests, and find p values.
- State the statistical decisions and clinical conclusions that the results of your hypothesis tests justify.
- Describe the population(s) to which you think your inferences are applicable.
- State the assumptions necessary for the validity of your analyses.

Final project 16

The purpose of a study by Kim et al. was to investigate the serial changes in Lp(a) lipoprotein levels with the loss of female sex hormones by surgical menopause and with estrogen replacement therapy in the same women. Subjects were 44 premenopausal women who underwent a transabdominal hysterectomy (TAH). Thirty-one of the women had a TAH and unilateral salpingo-oophorectomy (USO), and 13 had a TAH and bilateral salpingo-oophorectomy (BSO). The women ranged in age from 30 to 53 years. Subjects in the BSO group received .625 mg of conjugated equine estrogen daily 2 months after the operation. The data in REV_C13_23.csv were the subjects’ total cholesterol levels before (TC0), 2 months after (TC2), and 4 months after (TC4) the surgical procedure and hormone replacement therapy.
- Apply one of the nonparametric techniques.
- Apply one of the ANOVA techniques.
- Formulate relevant hypotheses, perform the appropriate tests, and find p values.
- State the statistical decisions and clinical conclusions that the results of your hypothesis tests justify.
- Describe the population(s) to which you think your inferences are applicable.
- State the assumptions necessary for the validity of your analyses.

Final project 17 (Assigned)

Heijdra et al. state that many patients with severe chronic obstructive pulmonary disease (COPD) have low arterial oxygen saturation during the night. These investigators conducted a study to determine whether there is a causal relationship between respiratory muscle dysfunction and nocturnal saturation. Subjects were 20 (5 females, 15 males) patients with COPD randomly assigned to receive either target-flow inspiratory muscle training (TF-IMT) at 60 percent of their maximal inspiratory mouth pressure (PImax) or sham TF-IMT at 10 percent of PImax. Among the data collected were the endurance times (Time, s) for each subject at the beginning of training and 10 weeks later, in REV_C13_25.csv.
- Apply one of the nonparametric techniques.
- Apply one of the ANOVA techniques.
- Formulate relevant hypotheses, perform the appropriate tests, and find p values.
- State the statistical decisions and clinical conclusions that the results of your hypothesis tests justify.
- Describe the population(s) to which you think your inferences are applicable.
- State the assumptions necessary for the validity of your analyses.
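
For paired before-and-after measurements such as these, one candidate nonparametric technique is the Wilcoxon signed-rank test. The sketch below is only one option among those the exercise allows. It drops zero differences, gives tied absolute differences their average rank, and obtains an exact two-sided p value by enumerating every assignment of signs, which is practical only for small samples. The helper names and the six toy pairs are invented and are not the endurance times.

```python
from itertools import product


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Return (W+, exact two-sided p value) for paired samples."""
    diffs = [b - a for a, b in zip(before, after) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2              # expected W+ under the null
    observed = abs(w_plus - centre)
    extreme = 0
    # Under the null every difference is equally likely to be + or -.
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Invented toy pairs (start of training, 10 weeks later).
start = [10, 12, 9, 14, 11, 13]
later = [15, 18, 13, 21, 14, 12]
w_plus, p_value = wilcoxon_signed_rank(start, later)
print(f"W+ = {w_plus}, exact two-sided p = {p_value}")
```

The test would be applied separately within the training group and the sham group, or to the change scores, depending on the hypothesis being formulated.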

Final project 18 (Assigned)

The purpose of a study by Maltais et al. was to compare and correlate the increase in arterial lactic acid (La) during exercise and the oxidative capacity of the skeletal muscle in patients with chronic obstructive pulmonary disease (COPD) and control subjects (C). There were nine subjects in each group. The mean age of the patients was 62 years with a standard deviation of 5. Control subjects had a mean age of 54 years with a standard deviation of 3. Among the data collected were the values for the activity of phosphofructokinase (PFK), hexokinase (HK), and lactate dehydrogenase (LDH) for the two groups in the file REV_C13_27.csv.
- Apply one of the nonparametric techniques.
- Apply one of the ANOVA techniques.
- Formulate relevant hypotheses, perform the appropriate tests, and find p values.
- State the statistical decisions and clinical conclusions that the results of your hypothesis tests justify.
- State the assumptions necessary for the validity of your analyses.
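
With two independent groups, a natural nonparametric candidate is the Mann-Whitney U test. The sketch below counts, for every pair of observations, how often a value from the first group exceeds a value from the second, and then uses the large-sample normal approximation without tie or continuity correction. With only nine subjects per group, exact tables or software would be preferable, so treat the p value as a rough guide. The function name and the toy numbers are invented and are not enzyme activities.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U for x, z, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    # Each pair scores 1 if the x value is larger, 0.5 if tied, 0 otherwise.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


# Invented toy values for the two groups.
copd = [3.1, 4.2, 5.0, 6.3]
control = [1.0, 2.5, 3.0, 4.0]
u, z, p = mann_whitney_u(copd, control)
print(f"U = {u}, z = {z:.3f}, approximate two-sided p = {p:.4f}")
```

The same call would be repeated for PFK, HK, and LDH, each time passing the COPD values and the control values of that enzyme.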