DATASURFING ON THE WORLD WIDE WEB - Part II

Robin H. Lock
Department of Mathematics, Computer Science, and Statistics
St. Lawrence University
Canton, NY 13617
rlock@stlawu.edu

Outline for a talk at the 2016 Joint Statistical Meetings


ABSTRACT: This is a continuation of a presentation from the last JSM in Chicago (1996). At that time we looked at web sources for students and instructors to obtain real data for use in projects and class examples. What’s changed in this regard over the past 20 years? Where are some places to go now to get easy access to useful data? What new challenges have emerged for obtaining data from the ever-expanding web?


CATEGORIES OF DATA SOURCES

  • Dataset Archives with Teaching Support
  • Pages of Data Links
  • Government Sources
  • R Packages
  • Data from Visualizations
  • More Data for Countries
  • Fun and Games
  • Data Scraping

  • Dataset Archives with Teaching Support

  • Journal of Statistics Education Data Archive More than 100 datasets and documentation contributed by statistics teachers for classroom use. At least 80 of these datasets are tied to longer JSE articles discussing their use in statistics classes. Jenny Baglivo has made a quick summary of some of her favorites from this collection.
  • DASL - Data and Story Library A collection of datasets and related documentation (stories) which may be searched by data subjects and/or statistical techniques. Thanks to Paul Velleman and DataDesk for taking over hosting of this project.
  • TSHS Resources Portal A new collection of resources started by the ASA's Section on Teaching Statistics in the Health Sciences. A limited number of datasets at this point, but they are just getting started and have good support for using the data in class.
  • Statistical Datasets A collection of links to datasets and documentation organized by statistical methods (multiple regression, time series, etc.). Housed at the University of Massachusetts - Amherst, with many datasets from textbooks (especially Hosmer and Lemeshow).

  • Pages of Data Links

  • Datasurfing A page I maintain for getting students started on finding data on the Web.
  • Sports Data Page Links to current and archived data related to various sports.
  • Data Sources Maintained by David Rosen and linked from the WWW Virtual Library: Statistics.

  • Government Sources

  • Data.gov "The home of the U.S. Government's open data." Searchable links to hundreds of thousands of datasets. Try "College Scorecard" to get one click away from downloading a .csv file with information on almost a hundred variables for more than 7000 colleges and universities.
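
Once a .csv file such as the College Scorecard download is on disk, a few lines of code are enough for a first look. The following is a minimal Python sketch, not a description of the actual file: the column names (name, state, admission_rate), the three sample rows, and the "NULL" missing-value marker are all assumptions made for illustration, so check the real header row before adapting it.

```python
import csv
import io

# Hypothetical stand-in for a downloaded file. The column names, values,
# and the "NULL" missing-value marker are assumptions, not the real layout.
SAMPLE_CSV = """name,state,admission_rate
College A,NY,0.45
College B,NY,NULL
College C,VT,0.80
"""

def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def numeric_column(rows, column, missing=("", "NULL", "NA")):
    """Return the values of one column as floats, skipping missing entries."""
    values = []
    for row in rows:
        cell = row[column].strip()
        if cell not in missing:
            values.append(float(cell))
    return values

rows = read_rows(SAMPLE_CSV)
rates = numeric_column(rows, "admission_rate")
mean_rate = sum(rates) / len(rates)
print(len(rows), "rows;", len(rates), "usable admission rates; mean =", round(mean_rate, 3))
```

For a real download, replace SAMPLE_CSV with the contents of the file, for example by reading it with open(path, newline="").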

  • R Packages

  • Rdatasets A collection of datasets from various R packages (e.g. datasets, car, Ecdat, MASS, HistData, survival, ...) maintained by Vincent Arel-Bundock. The current list has 758 datasets from more than 30 R packages, with links to the data as .csv files and to documentation (without needing R).

    Several R packages with good data for teaching (requires R to get the data) include ...

  • Mosaic A collection of datasets from the Mosaic package developed by Randall Pruim, Daniel Kaplan, and Nicholas Horton.
  • Lock5Data Datasets from the textbook "Statistics: Unlocking the Power of Data" by Lock^5 (Wiley), also available at lock5stat.com.
  • Stat2Data Datasets from the textbook "Stat2: Building Models for a World of Data" by Cannon et al. (Freeman).
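
Because Rdatasets links each dataset as a plain .csv file, the data can be pulled into other tools without R. The Python sketch below shows one way this might look. It assumes that the file's first column holds the R row names under a blank header (worth verifying against the file you download), and the two-row sample is only an illustrative stand-in. The helper names fetch_text and parse_table are hypothetical, and fetch_text is defined but not called here, since it needs a real .csv link copied from the Rdatasets page.

```python
import csv
import io
from urllib.request import urlopen

def fetch_text(url):
    """Download a text file, e.g. a .csv link copied from the Rdatasets page."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")

def parse_table(text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Assumption: a blank first header cell marks the R row-name column.
    if header[0] == "":
        header[0] = "rowname"
    return [dict(zip(header, row)) for row in reader if row]

# Illustrative stand-in for a downloaded file (not fetched from the web).
SAMPLE = '"","mpg","cyl"\n"Mazda RX4",21,6\n"Datsun 710",22.8,4\n'

table = parse_table(SAMPLE)
print(table[0])
```

All values come back as strings, so convert numeric columns with float() before computing summaries.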

  • Data from Visualizations

  • Gapminder Country Data Download the country data that drive the neat interactive displays at Hans Rosling's Gapminder World.

  • More Data for Countries

  • World Bank Open Data Search by individual countries, general categories, or specific indicators.
  • CIA Factbook Lots of country-level data, but trickier to get in downloadable format. Look for "Country Comparisons". Variables there have a "Download Data" link, but countries are ordered by that particular variable.
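
When each variable arrives in its own file, with the countries sorted by that variable, the files have to be matched on country name before two variables can be compared. The Python sketch below illustrates one way to do that. The country names, column names, and values are invented for illustration and do not reflect the format of any actual download.

```python
import csv
import io

# Invented data: each file lists countries in a different order,
# and not every country appears in both files.
LIFE = """country,life_expectancy
Alpha,80.1
Beta,75.3
Gamma,70.2
"""
GDP = """country,gdp_per_capita
Gamma,52000
Alpha,31000
Delta,9000
"""

def column_by_country(text, value_column):
    """Map each country name to its numeric value for one variable."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["country"]: float(row[value_column]) for row in reader}

life = column_by_country(LIFE, "life_expectancy")
gdp = column_by_country(GDP, "gdp_per_capita")

# Keep only countries present in both files, in alphabetical order.
common = sorted(life.keys() & gdp.keys())
merged = [(c, life[c], gdp[c]) for c in common]

# Countries found in just one file, often a sign of spelling differences.
unmatched = sorted(life.keys() ^ gdp.keys())

for row in merged:
    print(row)
print("unmatched:", unmatched)
```

Checking the unmatched list is worthwhile because different sources may spell the same country's name differently.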

  • Fun and Games

  • World Bank Open Data Search by individual countries, general categories, or specific indicators.

  • Data Scraping

  • IMDb TV Episode Ratings Search by individual countries, general categories, or specific indicators.