Pomona College

Senior Theses

2005

Joseph Richards
"
Classification of Geologic Units on Ganiki Planitia Quadrangle (V14) Venus Using Statistical Clustering Methods
"
Advisor: Jo Hardin (jo.hardin@pomona.edu)

In this project, we analyzed data collected by NASA’s Venus orbiter Magellan from the Ganiki Planitia Quadrangle (V14) to determine an optimal way to allocate geologic units into groups corresponding to unit type. Previously, a team led by Dr. Eric Grosfils created a geologic map of the region using Magellan radar images. However, their analysis did not use the numerical data encoded in these images, nor did it consider the physical property data sets of slope, surface emissivity, and surface reflectivity. We extracted the quantitative data corresponding to radar backscatter, elevation, slope, surface emissivity, and surface reflectivity from the Magellan images. Then, using the existing geologic map as a baseline, we employed mixture models and the Expectation-Maximization (EM) algorithm to devise an optimal geologic map based on the data and to identify units whose data fit more closely with a different unit type than the one assigned by the original mapping. We also created modified versions of the EM algorithm that account for issues such as the area of each geologic unit and the relative importance of different variables. Results showed that most units were classified the same way as in the original geologic map, while a handful of units were consistently assigned to different groups. We conclude that these areas should be examined more thoroughly geologically.
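As a rough illustration of the mixture-model idea described above (not the thesis's actual code), the following sketch fits a two-component, one-dimensional Gaussian mixture with EM and flags observations whose fitted group differs from a baseline label. The synthetic data, the function names, and the single variable are all assumptions made for the example; the thesis worked with several Magellan-derived variables at once.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, labels), where labels[i] is the
    component with the larger posterior probability for data[i].
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    means = [min(data), max(data)]          # crude initialization at the extremes
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300        # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # keep the variance strictly positive
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels

# Synthetic stand-in for one measured variable in two unit types (hypothetical).
random.seed(0)
data = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(5, 1) for _ in range(200)]
baseline = [0] * 200 + [1] * 200            # plays the role of the existing map

weights, means, variances, labels = em_two_gaussians(data)
# Observations whose fitted group disagrees with the baseline assignment.
mismatches = [i for i, (b, l) in enumerate(zip(baseline, labels)) if b != l]
```

Here `mismatches` plays the role of the units that the data place in a different group than the baseline map does.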


Lee Strassenburg
"
A Statistical Comparison of the Average Waiting Times Between Flares in Lupus Patients
"
Advisor: Jo Hardin (jo.hardin@pomona.edu)

In this study I investigate the effect of race and type of lupus on the manifestation of the disease, specifically through the average number of days between recurrences of flares due to the disease. Lupus is an autoimmune disease that doctors and researchers do not completely understand. The most immediate concerns related to understanding the disease include a lack of knowledge as to what causes lupus and how to treat it. Lupus is a disease with recurring flares interspersed with periods of remission. Certain groups of lupus patients have more frequent relapses than other groups. Researchers are trying to determine what factors are significant in causing the onset of flares, and which groups of lupus patients might be more susceptible to shorter periods of time between flares, or to more intense flares. The dataset I use to study lupus was collected by the Department of Nephrology at The Ohio State University. In this manuscript I explore the underlying distributions for the waiting times between flares for each of the groups I am comparing: African-Americans versus Caucasians, and renal versus non-renal lupus patients. Because of the large number of censored data points, survival analysis is the most readily available method to analyze the waiting time between flares. In order to understand the data more completely, I use both nonparametric and parametric methods to compare and to estimate the differences between African-Americans and Caucasians and between renal and non-renal lupus patients. Specifically, I use Kaplan-Meier curves and the logrank test to compare the survival rates between groups, the likelihood ratio test to compare estimates under the assumption that lupus flares follow a Poisson process, and the Kolmogorov-Smirnov goodness-of-fit test to determine which distribution most closely resembles the waiting time between flares in lupus patients. With this information, doctors will be better equipped to monitor the progress of lupus patients and to recommend further treatment and follow-ups.
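To make the role of censoring concrete, here is a minimal sketch of the Kaplan-Meier estimator together with the maximum likelihood estimate of the flare rate under an exponential waiting-time model (the waiting-time distribution implied by a Poisson process). The waiting times below are invented for illustration and are not from the Ohio State dataset; the function names are likewise assumptions.

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier survival estimate.

    times:    waiting time for each interval (e.g. days until flare or censoring)
    observed: True if a flare was seen at that time, False if censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    event_times = sorted({t for t, o in zip(times, observed) if o})
    surv = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)               # still under observation
        events = sum(1 for u, o in zip(times, observed) if o and u == t)
        surv *= 1 - events / at_risk
        curve.append((t, surv))
    return curve

def exponential_rate_mle(times, observed):
    """MLE of the event rate under exponential waiting times with censoring:
    number of observed events divided by total time at risk."""
    return sum(observed) / sum(times)

# Hypothetical waiting times in days; False marks a censored interval.
times = [30, 45, 45, 60, 90, 120]
observed = [True, True, False, True, False, True]

curve = kaplan_meier(times, observed)
rate = exponential_rate_mle(times, observed)
```

Censored intervals still count toward the number at risk and the total exposure time, which is why simply averaging the observed waiting times would understate the typical time between flares.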

2003

Veronica Montes De Oca
"
Methods for Analyzing Health Care Claims Data
"
Advisor: Jo Hardin (jo.hardin@pomona.edu)

In the paper "Utilization and cost of behavioral health services: Employee characteristics and workplace health promotion," published in The Journal of Behavioral Services & Research, behavioral health care utilization and costs were analyzed using health care claims data, including demographic and employment-related factors. A two-part modeling method was used. The first part used logistic regression to identify the factors influencing the likelihood of utilizing behavioral health services. The second part used linear regression to identify the factors influencing the cost of behavioral health claims.
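The two-part structure can be sketched as follows, assuming a single hypothetical predictor and synthetic data rather than real claims: a logistic regression (fitted here by plain gradient ascent) models whether any service is used, and an ordinary least squares fit on users only models the cost given use. The function names and simulated coefficients are assumptions for the example.

```python
import math
import random

def fit_logistic(x, y, lr=0.5, n_iter=1000):
    """One-predictor logistic regression fitted by gradient ascent
    on the average log-likelihood. Returns (intercept, slope)."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def fit_ols(x, y):
    """Simple linear regression by least squares. Returns (intercept, slope)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

# Synthetic data: x is a hypothetical centered covariate.
random.seed(1)
x = [random.uniform(-2, 2) for _ in range(500)]
used = [1 if random.random() < 1 / (1 + math.exp(-(-0.5 + 1.0 * xi))) else 0 for xi in x]
cost = [100 + 50 * xi + random.gauss(0, 10) if u else 0.0 for xi, u in zip(x, used)]

# Part 1: likelihood of any use. Part 2: cost among users only.
logit_b0, logit_b1 = fit_logistic(x, used)
x_users = [xi for xi, u in zip(x, used) if u]
cost_users = [c for c, u in zip(cost, used) if u]
cost_b0, cost_b1 = fit_ols(x_users, cost_users)

def expected_cost(xi):
    """Combine both parts: P(use) times expected cost given use."""
    p_use = 1 / (1 + math.exp(-(logit_b0 + logit_b1 * xi)))
    return p_use * (cost_b0 + cost_b1 * xi)
```

Splitting the model this way keeps the many zero-cost records from distorting the regression for cost.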

In my thesis, I will apply the linear regression method to a different set of health care claims data, focusing on the factors that influence the cost of radiology claims, procedures, and diagnoses. In addition, I will use bootstrapping as a nonparametric method of validating traditional linear regression. I will show the advantages and disadvantages of each method.
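One common way to bootstrap a regression, shown here as a sketch under assumed synthetic data, is to resample (x, y) cases with replacement, refit the slope each time, and read a percentile interval off the resampled slopes. The thesis does not say which bootstrap variant it uses, so the case-resampling scheme, the function names, and the data are assumptions.

```python
import random

def ols_slope(x, y):
    """Least squares slope for a one-predictor linear regression."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def bootstrap_slope_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope, resampling cases."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic data with a true slope of 2 (hypothetical, not claims data).
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(100)]
y = [5 + 2 * xi + rng.gauss(0, 3) for xi in x]

slope = ols_slope(x, y)
ci_low, ci_high = bootstrap_slope_ci(x, y)
```

Comparing this interval with the one from standard regression theory is one way to check whether the usual normal-error assumptions are reasonable for the data.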