2006
Eugenia Kim
"A Cautionary Note Regarding Count Models of Alcohol Consumption
in Randomized Controlled Trials"
Advisor: Nick Horton (http://www.math.smith.edu/~nhorton)
Alcohol consumption is a common outcome in randomized alcohol treatment studies.
A challenge in modeling consumption outcomes is to appropriately account for
the distribution of drinking, which is often highly skewed, particularly in
subjects with alcohol dependence. Count models are appropriate for such outcomes in a randomized
clinical trial setting. A natural model for counts is the single-parameter
Poisson distribution. One disadvantage of the Poisson is that it makes strong
assumptions (namely, that the mean equals the variance). Extensions of the Poisson,
such as the over-dispersed Poisson, negative binomial, and two-stage (hurdle)
or zero-inflated models, have been proposed.
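For reference, the conventional mean-variance relationships of these count models are summarized below. The symbols (mu for the mean, phi for an overdispersion factor, k for the negative binomial dispersion parameter, pi for the zero-inflation probability) are our notation and may not match the parameterization used in the thesis.

```latex
\begin{aligned}
\text{Poisson:}\quad & E(Y) = \mu, & \operatorname{Var}(Y) &= \mu \\
\text{Overdispersed Poisson:}\quad & E(Y) = \mu, & \operatorname{Var}(Y) &= \phi\,\mu,\quad \phi > 1 \\
\text{Negative binomial:}\quad & E(Y) = \mu, & \operatorname{Var}(Y) &= \mu + \mu^2/k \\
\text{Zero-inflated Poisson:}\quad & E(Y) = (1-\pi)\mu, & \operatorname{Var}(Y) &= (1-\pi)\mu\,(1+\pi\mu)
\end{aligned}
```

Each extension relaxes the Poisson restriction Var(Y) = E(Y). For example, a negative binomial with mean 5 and k = 0.5 has variance 5 + 25/0.5 = 55, eleven times its mean, and a zero-inflated Poisson places probability pi + (1 - pi)exp(-mu) on zero, more than the Poisson's exp(-mu).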
Our methods are motivated by the analysis of the ASAP (Assessing Spectrum of Alcohol Problems) study, a randomized clinical trial comparing motivational interviewing with usual care in a sample of alcohol-involved inpatients at an urban hospital. Subjects were followed to determine whether there were differences in drinking outcomes attributable to randomized group assignment (see figure). We compare the performance of various count models in a series of simulation studies of a randomized clinical trial and also apply them to the ASAP trial.
The standard Poisson model provided a poor fit in both the simulation studies and the ASAP study. In the simulation studies, when the underlying distributions were not Poisson, the Poisson model did not maintain the nominal Type-I error rate (in one scenario it rejected the null hypothesis more than half the time when there were no group differences). In the ASAP analyses, the Poisson model fit poorly and gave a different answer from all of the other models.
The Poisson model is not appropriate for the analysis of this dataset; the negative binomial provides an excellent fit for these data. Use of the standard Poisson model when the mean and variance are not equal is not recommended, though extensions of the Poisson (incorporating an overdispersion parameter, using the negative binomial distribution, and/or using zero-inflated models) address many of these shortcomings. As always, analysts are obliged to look at their data and use models that provide an appropriate fit in their situation.
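The Type-I error problem described above can be reproduced in miniature. The sketch below is a hedged illustration, not the thesis's simulation design: it assumes two equal-sized arms with no true difference, negative binomial counts generated as a gamma-Poisson mixture, and large-sample Wald tests at the 5% level. With one binary covariate, the Poisson regression estimate of the log rate ratio and its model-based standard error have closed forms, so no model-fitting library is needed; the quasi-Poisson variant rescales that standard error by the square root of a Pearson-based dispersion estimate. The function names, mean, dispersion, sample size, and replicate counts are arbitrary illustrative choices.

```python
import math
import random


def draw_poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals before time lam."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def draw_count(mean, shape, rng):
    """Poisson draw if shape is None; otherwise a negative binomial draw built as a
    gamma-Poisson mixture with E(Y) = mean and Var(Y) = mean + mean**2 / shape."""
    if shape is None:
        return draw_poisson(mean, rng)
    lam = rng.gammavariate(shape, mean / shape)  # gamma with mean `mean`
    return draw_poisson(lam, rng)


def rejection_rates(mean, shape, n_per_group=100, reps=1000, seed=1):
    """Simulate two arms with NO true difference; return the share of replicates in
    which a two-sided 5% Wald test rejects under (a) a Poisson model and
    (b) a quasi-Poisson model with Pearson-estimated dispersion."""
    rng = random.Random(seed)
    z_crit = 1.959963984540054  # two-sided 5% standard normal critical value
    reject_pois = reject_quasi = valid = 0
    for _ in range(reps):
        y0 = [draw_count(mean, shape, rng) for _ in range(n_per_group)]
        y1 = [draw_count(mean, shape, rng) for _ in range(n_per_group)]
        s0, s1 = sum(y0), sum(y1)
        if s0 == 0 or s1 == 0:
            continue  # log rate ratio undefined; skip this replicate
        valid += 1
        # Equal group sizes, so the ratio of totals equals the ratio of means.
        log_rr = math.log(s1 / s0)
        # Poisson model-based SE of the log rate ratio: sqrt(1/s0 + 1/s1).
        se_pois = math.sqrt(1 / s0 + 1 / s1)
        # Pearson estimate of the dispersion phi on n - 2 degrees of freedom.
        m0, m1 = s0 / n_per_group, s1 / n_per_group
        pearson = sum((y - m0) ** 2 / m0 for y in y0) + sum((y - m1) ** 2 / m1 for y in y1)
        phi = pearson / (2 * n_per_group - 2)
        se_quasi = se_pois * math.sqrt(phi)
        reject_pois += int(abs(log_rr) / se_pois > z_crit)
        reject_quasi += int(abs(log_rr) / se_quasi > z_crit)
    return reject_pois / valid, reject_quasi / valid
```

For example, `rejection_rates(mean=5, shape=None)` draws Poisson data, where the Poisson test should reject close to 5% of the time, while `rejection_rates(mean=5, shape=0.5)` draws counts whose variance is about eleven times the mean; there the Poisson test rejects far more often than 5%, while the dispersion-corrected test stays much closer to nominal.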
An article based on this work is available at http://www.biomedcentral.com/1471-2288/7/9.
Ashley Smith
"Statistical Methods for Hearing Research"
Advisors: Nick Horton and Susan Voss (http://www.math.smith.edu/~nhorton)
Working with ear-canal reflectance data measured by Smith students who previously worked with Susan Voss and Nicholas Horton, I fit a variety of random-effects variance components models in Stata to compare intrasubject and intersubject variability. Frequencies were grouped into smaller ranges so that levels of variability could be compared across ranges. Missing values and convergence problems complicated some of the models. In general, intersubject variability was larger than intrasubject variability (session effect), though for frequencies between 2000 and 4000 Hz the subject variability was considerably smaller than the effect of session (as seen in the Figure). We also explored smoothing as a way to address convergence issues. Exploratory models were also fit that estimated three levels of variance components (adding left vs. right ear variability).
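The intersubject versus intrasubject comparison can be illustrated with a closed-form estimator. The thesis fit these models in Stata; the sketch below instead uses the classical ANOVA (method-of-moments) estimators for a balanced two-level layout, a swapped-in technique that has no convergence issues but, unlike likelihood-based fits, does not handle missing values or unbalanced designs. The function name and data layout (one list of repeated session measurements per subject) are hypothetical.

```python
def variance_components(groups):
    """Method-of-moments (ANOVA) variance components for a balanced one-way
    random-effects layout. groups[i] holds the repeated measurements (e.g. sessions)
    for subject i. Returns (between_subject_variance, within_subject_variance)."""
    a = len(groups)      # number of subjects
    n = len(groups[0])   # measurements per subject
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced layout with >= 2 subjects and >= 2 replicates")
    subject_means = [sum(g) / n for g in groups]
    grand_mean = sum(subject_means) / a
    ss_between = n * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, subject_means) for y in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (a * (n - 1))
    # E(MS_between) = sigma_w^2 + n * sigma_b^2 and E(MS_within) = sigma_w^2,
    # so solve for sigma_b^2 and truncate negative estimates at zero.
    sigma_b2 = max((ms_between - ms_within) / n, 0.0)
    return sigma_b2, ms_within
```

For example, with three subjects measured twice, `[[1, 3], [5, 7], [9, 11]]`, the between-subject component is 15.0 and the within-subject component is 2.0, so intersubject variability dominates.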
2005
Mariel Finucane
"Incomplete-Data Regression Models for Longitudinal Studies of
HIV/AIDS"
Advisor: Nick Horton (http://www.math.smith.edu/~nhorton)
The increasingly devastating global epidemic of HIV/AIDS demands effective medical
and social interventions. Statisticians have an integral role to play in applying
rigorous mathematical methods to the study of the causes and
effects of HIV/AIDS in the human body and in communities. In this thesis, I
will focus on two statistical problems that arise in the analysis of HIV data.
First, when repeated measures are taken on an individual in longitudinal studies,
these observations are statistically dependent on each other, and ignoring this
dependence can distort estimates of variability. Second, it is practically never the case that all
intended measurements are obtained; the result is missing data, which can induce
bias in the estimation of regression parameters. Classical statistical methods that assume
independent, complete observations address neither of these problems. I will describe a computationally
tractable model that can accommodate incomplete longitudinal data, and then
apply this model to a previously published dataset of HIV-infected individuals
receiving HAART who had a history of alcohol problems, in order to determine
the effect of alcohol use on HIV disease progression over time.
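The first problem, dependence among repeated measures, can be seen in a small simulation. This is a minimal sketch that does not reproduce the thesis's model and ignores missing data entirely: it assumes balanced data with a normally distributed subject-level random intercept, and compares the standard error of the overall mean computed as if all observations were independent with one computed from per-subject means. The function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate_subjects(m, k, sd_subject, sd_within, seed=0):
    """m subjects, k measurements each; a shared random intercept per subject
    makes measurements on the same subject positively correlated."""
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        intercept = rng.gauss(0.0, sd_subject)
        data.append([intercept + rng.gauss(0.0, sd_within) for _ in range(k)])
    return data


def naive_and_cluster_se(data):
    """SE of the overall mean two ways: treating every observation as independent
    (naive) and using one summary per subject (cluster-level, balanced data)."""
    flat = [y for subject in data for y in subject]
    naive = statistics.stdev(flat) / math.sqrt(len(flat))
    subject_means = [statistics.fmean(subject) for subject in data]
    cluster = statistics.stdev(subject_means) / math.sqrt(len(subject_means))
    return naive, cluster
```

With `simulate_subjects(200, 5, 2.0, 1.0)` the within-subject correlation is 4/(4 + 1) = 0.8, the design effect is 1 + (5 - 1)(0.8) = 4.2, and the naive standard error is roughly half the cluster-level one, so treating the repeated measures as independent overstates precision.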
An article based on this work is available at http://www.epi-perspectives.com/content/4/1/8.
Suzanne Switzer
"The Increasing Sophistication of Statistical Methods in The New
England Journal of Medicine"
Advisor: Nick Horton (http://www.math.smith.edu/~nhorton)
Previous surveys (Emerson and Colditz 1979 and 1989) of Original Articles published
in The New England Journal of Medicine revealed increasing use of statistical
methods over time. We update these findings with data from 311 articles published
in 2004-2005. A substantial fraction of articles used relatively sophisticated
statistical methods such as survival analysis (61%), multiple regression
(51%), or power calculations (39%). Only 13% of the articles used just simple
descriptive statistics (e.g., percentages, means, confidence intervals). Including
material typically covered in an introductory statistics course increased
this percentage to 21%. This increasing sophistication has implications for
medical training and continuing education.
An excerpt of the published letter is available at http://content.nejm.org/cgi/content/extract/353/18/1977: Horton NJ and Switzer SS (Smith College '06). Statistical methods in the Journal. New England Journal of Medicine, 2005; 353(18):1977-1979.
An article based on this work appeared in Chance and is available at http://maven.smith.edu/~nhorton/doctor.pdf.
2004
Emily Shapiro
"Statistical Sleuthing During Epidemics: Maternal Influenza and
Schizophrenia"
Advisor: Nick Horton (http://www.math.smith.edu/~nhorton)
Schizophrenia is among the most debilitating of mental illnesses.
Researchers have long sought to understand the underlying causes of schizophrenia
in the hope of finding a means of prevention. We review the
putative association of maternal influenza with later schizophrenia, and
discuss statistical and methodological issues that arise.
The published paper is available at http://www.math.smith.edu/~nhorton/schiz.pdf: Horton NJ and Shapiro EC (Smith College '04). Statistical sleuthing during epidemics: maternal influenza and schizophrenia. Chance, 2005; 18(1):11-18.
Linjuan Qian
"Use of R as a Toolbox for Mathematical Statistics Exploration"
Advisor: Nick Horton (http://www.math.smith.edu/~nhorton)
The R language, a freely available environment for statistical computing and
graphics, is widely used in many fields. This "expert-friendly" system has a
powerful command language and programming environment, combined with an active
user community. We discuss how R is ideal as a platform to support experimentation
in mathematical statistics, both at the undergraduate and graduate levels. Using
a series of case studies and activities, we describe how R can be utilized in
a mathematical statistics course as a toolbox for experimentation. Examples
include the calculation of a running variance, maximization of a non-linear
function, resampling of a statistic, simple Bayesian modeling, sampling from
a multivariate normal distribution, and estimation of power. These activities, often requiring
only a few dozen lines of code, offer the student the opportunity to explore
statistical concepts and experiment. In addition, they provide an introduction
to the framework and idioms available in this rich environment.
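The paper's activities are written in R; as a language-neutral illustration of the first example listed, the sketch below computes a running variance in Python using Welford's online update, which adjusts the mean and the sum of squared deviations one observation at a time without storing the data. Whether the paper uses this particular update is an assumption, and the class name is hypothetical.

```python
class RunningVariance:
    """Welford's online algorithm for the running mean and sample variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        """Incorporate one new observation x."""
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # old deviation times new deviation

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); nan until two values are seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")
```

Feeding in 2, 4, 4, 4, 5, 5, 7, 9 gives a running mean of 5 and a sample variance of 32/7, about 4.571, matching the two-pass formula. Updating from the deviation about the current mean avoids the cancellation error of the one-pass "sum of squares minus n times mean squared" shortcut.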
The published paper is available at http://www.math.smith.edu/~nhorton/R/: Horton NJ, Brown ER, and Qian L (Smith College '05). Use of R as a toolbox for mathematical statistics exploration. The American Statistician, 2004; 58(4):343-357.