2005
Joseph Richards
"Classification of Geologic Units on Ganiki Planitia Quadrangle
(V14) Venus Using Statistical Clustering Methods"
Advisor: Jo Hardin (jo.hardin@pomona.edu)
In this project, we analyzed data collected by NASA’s Venus orbiter Magellan
from the Ganiki Planitia Quadrangle (V14) to determine an optimal way to allocate
geologic units into groups corresponding to unit type. Previously, a team led
by Dr. Eric Grosfils created a geologic map of the region using Magellan radar
images. However, they produced that map without analyzing the numerical
data encoded in these images and without considering the physical property data
sets of slope, surface emissivity, and surface reflectivity. We extracted the
quantitative data corresponding to radar backscatter, elevation, slope, surface
emissivity, and surface reflectivity from Magellan images. Then, using the existing
geologic map as a baseline, we employed mixture models and the Expectation-Maximization
(EM) algorithm to devise an optimal geologic map based on the data and to identify
units whose data fit a different unit type more closely than the type
assigned by the mapping. We also created modified versions of the EM algorithm,
considering issues such as the area of each geologic unit and the importance
of different variables. Results showed that most units were classified the same
way as specified by the original geologic map, while a handful of units were
consistently assigned to different groups. We conclude that these areas should
be more thoroughly examined geologically.
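As a rough illustration of the approach described above (not the code used in the thesis), the following sketch fits a univariate Gaussian mixture with the EM algorithm, initializes the component means from a baseline grouping, and flags observations whose most probable component differs from the baseline label. The function names, the single-variable setup, and the toy values are assumptions made for illustration only; the actual analysis used several variables per geologic unit.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_mixture_em(values, baseline_labels, n_iter=100, min_var=1e-6):
    """Fit a Gaussian mixture by EM, starting from the baseline grouping.

    Returns the fitted component means and the responsibilities
    (posterior component probabilities) for every observation.
    """
    n = len(values)
    k = max(baseline_labels) + 1

    # Initialize each component mean from the units the baseline map put in it.
    means = []
    for j in range(k):
        group = [v for v, lab in zip(values, baseline_labels) if lab == j]
        means.append(sum(group) / len(group))

    # Start every component with the pooled variance and equal weight.
    grand_mean = sum(values) / n
    pooled_var = sum((v - grand_mean) ** 2 for v in values) / n
    variances = [max(pooled_var, min_var)] * k
    weights = [1.0 / k] * k

    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for v in values:
            dens = [weights[j] * normal_pdf(v, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(var_j, min_var)  # floor avoids a collapsing component

    return means, resp


def flag_reassigned(resp, baseline_labels):
    """Indices whose most probable component differs from the baseline label."""
    flagged = []
    for i, (r, lab) in enumerate(zip(resp, baseline_labels)):
        best = max(range(len(r)), key=lambda j: r[j])
        if best != lab:
            flagged.append(i)
    return flagged


# Hypothetical toy data: the last value is labeled 1 but sits with group 0.
toy_values = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1, 0.05]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
toy_means, toy_resp = fit_mixture_em(toy_values, toy_labels)
print(flag_reassigned(toy_resp, toy_labels))
```

Starting EM from the baseline grouping keeps component j tied to baseline unit type j, so a disagreement between the fitted assignment and the baseline label can be read directly as a candidate for re-examination.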
Lee Strassenburg
"A Statistical Comparison of the Average Waiting Times Between
Flares in Lupus Patients"
Advisor: Jo Hardin (jo.hardin@pomona.edu)
In this study I investigate the effect of race and type of lupus on the manifestation
of the disease, specifically through the average number of days between recurrences
of flares due to the disease. Lupus is an autoimmune disease that doctors and
researchers do not completely understand. The most immediate concerns related
to understanding the disease include a lack of knowledge as to what causes lupus
and how to treat it. Lupus is a disease with recurring flares interspersed with
periods of remission. Certain groups of lupus patients have more frequent relapses
than other groups. Researchers are trying to determine what factors are significant
in causing the onset of flares, and which groups of lupus patients might be
more susceptible to shorter periods of time between flares, or more intense
flares. The dataset I use to study lupus was collected by the Department of
Nephrology at The Ohio State University. In this manuscript I explore the underlying
distributions for the waiting times between flares for each of the groups I
am comparing: African-Americans versus Caucasians, and renal versus non-renal
lupus patients. Because of the large number of censored data points, survival
analysis is the most readily available method to analyze the waiting time between
flares. In order to understand the data more completely, I use both nonparametric
and parametric methods to compare and to estimate the differences between African-Americans
and Caucasians and between renal and non-renal lupus patients. Specifically,
I use Kaplan-Meier curves and the logrank test to compare the survival rates
between groups, the likelihood ratio test to compare estimates under the assumption
that lupus flares are a Poisson process, and the Kolmogorov-Smirnov goodness-of-fit
test to determine which distribution most closely resembles the waiting time
between flares in lupus patients. With this information, doctors will be better
equipped to monitor the progress of lupus patients and to recommend further
treatment and follow-ups.
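To make the nonparametric step concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimator for right-censored waiting times. It is an illustration only, not the analysis code from the thesis; the function name and the example waiting times are hypothetical.

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier estimate of the survival function.

    times    -- waiting time for each subject
    observed -- True if the flare was observed at that time,
                False if the waiting time was right-censored
    Returns a list of (time, survival probability) pairs, one per
    distinct time at which at least one event was observed.
    """
    pairs = sorted(zip(times, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        removed = 0
        # Collect everyone whose waiting time ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                events += 1
            removed += 1
            i += 1
        if events:
            # Multiply by the conditional probability of no event at t.
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical waiting times in days; False marks a censored observation.
example_curve = kaplan_meier([5, 8, 8, 12, 20], [True, True, False, True, False])
for day, prob in example_curve:
    print(day, round(prob, 3))
```

Censored subjects contribute to the risk set up to their censoring time but never count as events, which is why this estimator suits data with many censored observations.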
2003
Veronica Montes De Oca
"Methods for Analyzing Health Care Claims Data"
Advisor: Jo Hardin (jo.hardin@pomona.edu)
In the paper "Utilization and cost of behavioral health services: Employee
characteristics and workplace health promotion," published in The Journal of
Behavioral Services & Research, an analysis of behavioral
health care utilization and costs was performed on health care claims data,
including demographic and employment-related factors. A two-part modeling method
was used. The first part consisted of logistic regression to
identify the factors influencing the likelihood of utilizing behavioral health
services. The second part was linear regression to identify
the factors influencing the cost of behavioral health claims.
In my thesis, I will use the linear regression method with a different set of health care claims data and focus on the factors influencing the cost of radiology claims, procedures, and diagnoses. In addition, I will use bootstrapping as a nonparametric method of validating traditional linear regression. I will show the advantages and disadvantages of each method.
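As a small sketch of how bootstrapping can be used to check a linear regression (an illustration under simplifying assumptions, not the thesis analysis), the code below resamples (x, y) pairs with replacement, refits a simple least-squares line each time, and reports a percentile interval for the slope. The single predictor, the function names, and the synthetic data are all hypothetical.

```python
import random


def ols_fit(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_slope_interval(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope, resampling cases."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when all resampled x values coincide
        slopes.append(ols_fit(bx, by)[0])
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic data: y is roughly 2x + 1 with a small deterministic disturbance.
demo_x = [float(i) for i in range(20)]
demo_y = [2.0 * x + 1.0 + 0.5 * ((int(x) * 7) % 5 - 2) for x in demo_x]
demo_slope, demo_intercept = ols_fit(demo_x, demo_y)
demo_lower, demo_upper = bootstrap_slope_interval(demo_x, demo_y)
print(demo_slope, demo_lower, demo_upper)
```

Comparing this interval with the one from the usual normal-theory formula gives one way to see whether the regression assumptions matter for a given data set; skewed cost data are a common case where the two can differ.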