Smith College

Senior Projects and Theses in Statistics

2006

Eugenia Kim
"A Cautionary Note Regarding Count Models of Alcohol Consumption in Randomized Controlled Trials"
Advisor: Nick Horton (http://www.math.smith.edu/~nhorton)

Alcohol consumption is a common outcome in randomized alcohol treatment studies. A challenge in modeling consumption outcomes is to appropriately account for the distribution of drinking, which is often highly skewed, particularly in subjects with alcohol dependence. Use of count models for outcomes in a randomized clinical trial setting is appropriate. A natural model for counts is the single-parameter Poisson distribution. One disadvantage of the Poisson is that it makes a strong assumption (namely, that the mean equals the variance). Extensions of the Poisson, such as the over-dispersed Poisson, negative binomial, and two-stage (hurdle) or zero-inflated models, have been proposed.
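
As a rough illustration of the mean-equals-variance assumption (a sketch, not code from the thesis), the following Python snippet compares the variance-to-mean ratio of simulated Poisson counts with that of negative binomial counts generated as a gamma-mixed Poisson. The mean of 5, the shape of 0.5, the sample size, the seed, and all function names are assumptions chosen only for illustration.

```python
import math
import random
import statistics


def draw_poisson(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def draw_negative_binomial(rng, mean, shape):
    """Draw a negative binomial count as a Poisson with a gamma-distributed rate."""
    lam = rng.gammavariate(shape, mean / shape)  # gamma rate with the requested mean
    return draw_poisson(rng, lam)


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(2006)  # fixed seed so the run is repeatable
poisson_counts = [draw_poisson(rng, 5.0) for _ in range(5000)]
negbin_counts = [draw_negative_binomial(rng, 5.0, 0.5) for _ in range(5000)]

poisson_ratio = dispersion_ratio(poisson_counts)
negbin_ratio = dispersion_ratio(negbin_counts)
print(f"Poisson variance/mean:           {poisson_ratio:.2f}")
print(f"Negative binomial variance/mean: {negbin_ratio:.2f}")
```

A ratio well above 1, as the negative binomial draws should show, is the kind of overdispersion that the standard Poisson model cannot represent.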

Our methods are motivated by the analysis of the ASAP (Assessing Spectrum of Alcohol Problems) study, a randomized clinical trial comparing motivational interviewing to usual care for a sample of alcohol-involved inpatients at an urban hospital. These subjects were followed to see if there were differences in drinking outcomes that could be attributed to randomized group assignment (see figure). We compare the performance of various count models in a series of simulation studies of a randomized clinical trial, as well as apply them to the ASAP trial.

The standard Poisson model fit poorly in both the simulation studies and the ASAP study. In the simulation studies, when the underlying distributions were not Poisson, the Poisson model did not maintain the appropriate Type-I error rate (in one scenario it rejected the null hypothesis more than half the time when there were no group differences). For the analyses of the ASAP study, the Poisson model fit poorly and gave a different answer than any of the other models.
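
The Type-I error problem can be sketched with a small simulation. This is a hypothetical setup, not the thesis's simulation design: two groups are drawn from the same distribution, and each replicate is tested with the Wald test for the group effect from a two-group Poisson model (log ratio of group means, with model-based standard error sqrt(1/total0 + 1/total1)). The group size, number of replicates, mean, shape, and seed are all assumptions.

```python
import math
import random


def draw_count(rng, mean, shape=None):
    """Poisson count if shape is None, otherwise a gamma-mixed Poisson count."""
    lam = mean if shape is None else rng.gammavariate(shape, mean / shape)
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:  # Knuth's multiplication method
        k += 1
        p *= rng.random()
    return k


def poisson_wald_rejects(group0, group1, critical=1.96):
    """Two-sided Wald test of the log rate ratio under a Poisson model."""
    total0, total1 = sum(group0), sum(group1)
    if total0 == 0 or total1 == 0:
        return False  # log ratio undefined; treat as no rejection
    log_ratio = math.log((total1 / len(group1)) / (total0 / len(group0)))
    std_err = math.sqrt(1.0 / total0 + 1.0 / total1)
    return abs(log_ratio / std_err) > critical


def type1_error_rate(rng, shape, n_per_group=100, n_sims=500, mean=5.0):
    """Fraction of null replicates in which the Poisson Wald test rejects."""
    rejections = 0
    for _ in range(n_sims):
        group0 = [draw_count(rng, mean, shape) for _ in range(n_per_group)]
        group1 = [draw_count(rng, mean, shape) for _ in range(n_per_group)]
        rejections += poisson_wald_rejects(group0, group1)
    return rejections / n_sims


rng = random.Random(7)
rate_poisson_data = type1_error_rate(rng, shape=None)   # data truly Poisson
rate_overdispersed = type1_error_rate(rng, shape=0.5)   # overdispersed data
print(f"Rejection rate, Poisson data:       {rate_poisson_data:.3f}")
print(f"Rejection rate, overdispersed data: {rate_overdispersed:.3f}")
```

With truly Poisson data the rejection rate should sit near the nominal 0.05, while with overdispersed data the model-based standard error is too small and the rate should be far higher.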

Use of the Poisson model is not appropriate for the analysis of this dataset. The negative binomial provides an excellent fit for these data. Use of the standard Poisson model when the mean and variance are not equal is not recommended, though extensions of the Poisson (incorporating an overdispersion parameter or use of the negative binomial distribution and/or zero-inflated models) addressed many of these shortcomings. As always, analysts are obliged to look at their data and utilize models that provide an appropriate fit in their situation.

An article based on this work is available at http://www.biomedcentral.com/1471-2288/7/9.

Ashley Smith
"Statistical Methods for Hearing Research"
Advisors: Nick Horton and Susan Voss (http://www.math.smith.edu/~nhorton)

The focus of my summer research was the development of statistical methods for research in hearing science. This is related to a number of research projects that seek to characterize measurements of reflectance within the ear, impedance data, and otoacoustic emissions and their relationship to intracranial pressure (ICP). We focused on the question of how variability within a subject (intra-subject variability) compares with variability between subjects (inter-subject variability) when looking at energy reflectance (ER) over a large range of frequencies.

Working with ear-canal reflectance data measured by Smith students who worked previously with Susan Voss and Nicholas Horton, I fit a variety of random effects variance components models in Stata that compared intrasubject variability with intersubject variability. Frequencies were grouped into smaller ranges of frequencies in order to compare varying levels of variability. There were some complications due to missing values and convergence for some of the models. In general, intersubject variability was larger than intrasubject variability (session effect), though for frequencies between 2000-4000 Hz the subject variability was considerably smaller than the effect of session (as seen in the Figure). We also explored the use of smoothing as an approach to convergence issues. Exploratory models were fit that estimated 3 levels of variance components (adding estimation of left vs. right variability).
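
To make the between-subject versus within-subject comparison concrete, here is a minimal Python sketch on simulated data. It uses the classical method-of-moments (one-way ANOVA) estimator for a balanced design, which is a simpler stand-in for the random effects models fit in Stata, and it assumes no missing values. The numbers of subjects and sessions, the standard deviations, and the function names are hypothetical.

```python
import random
import statistics


def simulate(rng, n_subjects, n_sessions, sd_between, sd_within, grand_mean=0.5):
    """One row per subject, one measurement per session (balanced design)."""
    data = []
    for _ in range(n_subjects):
        subject_mean = rng.gauss(grand_mean, sd_between)
        data.append([rng.gauss(subject_mean, sd_within) for _ in range(n_sessions)])
    return data


def variance_components(data):
    """Method-of-moments estimates of (between-subject, within-subject) variance."""
    n_sessions = len(data[0])
    subject_means = [statistics.mean(row) for row in data]
    # Mean square within: average of the per-subject sample variances.
    ms_within = statistics.mean([statistics.variance(row) for row in data])
    # Mean square between: sessions per subject times variance of subject means.
    ms_between = n_sessions * statistics.variance(subject_means)
    var_between = max(0.0, (ms_between - ms_within) / n_sessions)
    return var_between, ms_within


rng = random.Random(42)
data = simulate(rng, n_subjects=200, n_sessions=4, sd_between=0.2, sd_within=0.1)
var_between, var_within = variance_components(data)
print(f"Between-subject variance: {var_between:.4f} (simulated truth 0.04)")
print(f"Within-subject variance:  {var_within:.4f} (simulated truth 0.01)")
```

Running the same calculation separately within each frequency band would give the kind of band-by-band comparison described above, although unbalanced or incomplete data would call for a likelihood-based fit instead.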

2005

Mariel Finucane
"Incomplete-Data Regression Models for Longitudinal Studies of HIV/AIDS"
Advisor: Nick Horton (http://www.math.smith.edu/~nhorton)

The increasingly devastating global epidemic of HIV/AIDS demands effective medical and social interventions. Statisticians have an integral role to play in the implementation of rigorous mathematical methods to the study of the causes and effects of HIV/AIDS in the human body and in communities. In this thesis, I will focus on two statistical problems that arise in the analysis of HIV data. First, when repeated measures are taken on an individual in longitudinal studies, these observations are statistically dependent on each other and can thus skew estimates of variability. Second, it is practically never the case that all intended measurements are obtained; the result is missing data, which can induce bias into the estimation of regression parameters. Classical statistical methods are unable to address either of these problems. I will describe a computationally tractable model that can accommodate incomplete longitudinal data, and then apply this model to a previously published dataset of HIV infected individuals receiving HAART who had a history of alcohol problems in order to determine the effect of alcohol use on HIV disease progression over time.

An article based on this work is available at http://www.epi-perspectives.com/content/4/1/8.

Suzanne Switzer
"The Increasing Sophistication of Statistical Methods in The New England Journal of Medicine"
Advisor: Nick Horton (http://www.math.smith.edu/~nhorton)

Previous surveys (Emerson and Colditz 1979 and 1989) of Original Articles published in The New England Journal of Medicine revealed increasing use of statistical methods over time. We update these findings with data from 311 articles published from 2004-2005. A substantial fraction of articles utilized relatively sophisticated statistical methodologies such as survival analysis (61%), multiple regression (51%) or power calculations (39%). Only 13% of the articles used just simple descriptive statistics (e.g. percentages, means, confidence intervals). Knowledge of material typically included in an introductory statistics course increased this percentage to 21%. This increasing sophistication has implications for medical training and continuing education.

An excerpt of the published letter for Horton NJ and Switzer SS (Smith College '06). Statistical methods in the Journal. New England Journal of Medicine, 2005; 353(18):1977-1979 is available at http://content.nejm.org/cgi/content/extract/353/18/1977

An article based on this work appeared in Chance and is available at http://maven.smith.edu/~nhorton/doctor.pdf.

2004

Emily Shapiro
"Statistical Sleuthing During Epidemics: Maternal Influenza and Schizophrenia"
Advisor: Nick Horton (http://www.math.smith.edu/~nhorton)

Schizophrenia is among the most debilitating of mental illnesses. Researchers have long sought to understand the underlying causes of schizophrenia in the hope of finding a means of prevention. We review the putative association of maternal influenza with later schizophrenia, and discuss statistical and methodological issues that arise.

Published paper for Horton NJ and Shapiro EC (Smith College '04). Statistical sleuthing during epidemics: maternal influenza and schizophrenia. Chance, 2005; 18(1):11-18 is available at http://www.math.smith.edu/~nhorton/schiz.pdf

Linjuan Qian
"Use of R as a Toolbox for Mathematical Statistics Exploration"
Advisor: Nick Horton (http://www.math.smith.edu/~nhorton)

The R language, a freely available environment for statistical computing and graphics, is widely used in many fields. This "expert-friendly" system has a powerful command language and programming environment, combined with an active user community. We discuss how R is ideal as a platform to support experimentation in mathematical statistics, both at the undergraduate and graduate levels. Using a series of case studies and activities, we describe how R can be utilized in a mathematical statistics course as a toolbox for experimentation. Examples include the calculation of a running variance, maximization of a non-linear function, resampling of a statistic, simple Bayesian modeling, sampling from a multivariate normal distribution, and estimation of power. These activities, often requiring only a few dozen lines of code, offer the student the opportunity to explore statistical concepts and experiment. In addition, they provide an introduction to the framework and idioms available in this rich environment.

Published paper for Horton NJ, Brown ER, and Qian L (Smith College '05). Use of R as a toolbox for mathematical statistics exploration. The American Statistician, 2004; 58(4):343-357 is available at http://www.math.smith.edu/~nhorton/R/