A FRAMEWORK FOR TEACHING STATISTICS WITHIN THE K-12 MATHEMATICS CURRICULUM
[Draft version
Introduction
Statistical literacy is required for daily personal choices. Statistics provide information on the composition of foods and thus inform our choices at the grocery store. Statistics help to establish the safety and effectiveness of drugs to help us choose a treatment. Statistics help to establish the safety of toys to assure that our little ones are not at risk. Our investment choices are guided by a plethora of statistical information about stocks and bonds. The Nielsen ratings decide which shows will survive on television and thus affect what is available. Many products have a previous statistical history and our choices of products can be affected by awareness of this history. The design of an automobile is aided by anthropometrics, the statistics of the human body, to enhance passenger comfort. Statistical ratings of fuel efficiency, safety and reliability are available to help us select a vehicle.
Efforts to improve quality and accountability are prominent among the many ways that statistical thinking and tools can be used to enhance productivity. The competitive marketplace demands quality. Quality control practices such as the statistical monitoring of design and manufacturing processes identify where improvement can be made and lead to better product quality. Systems of accountability can help produce more effective employees and organizations, but many accountability systems now in place are not based on sound statistical principles and may, in fact, have the opposite effect from the one desired. Good accountability systems require proper use of statistical tools to determine and apply appropriate criteria.
The Food and Drug Administration requires extensive testing of drugs to determine effectiveness and side effects before they can be sold. A recent advertisement for a drug designed to reduce blood clots stated “PLAVIX, added to aspirin and your current medications, helps raise your protection against heart attack or stroke.” But the advertisement also warns that “The risk of bleeding may increase with PLAVIX...”
This was determined by a clinical trial involving over 12,000 subjects. Among the 6259 taking PLAVIX + aspirin 3.7% showed major bleeding problems while only 2.7% of the 6303 taking the placebo had major bleeding. This is viewed as a “statistically significant” result.
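To see what "statistically significant" can mean for these figures, the comparison can be sketched as a two-proportion z-test. This is only an illustration under stated assumptions: the advertisement does not say which test the trial used, the function name `two_proportion_z` is hypothetical, and the rounded percentages quoted above stand in for the exact counts.

```python
from math import erfc, sqrt

def two_proportion_z(p1, n1, p2, n2):
    """z statistic and two-sided p-value for the difference of two proportions.

    Assumes independent groups and a normal approximation; the trial's
    actual analysis method is not stated in the source.
    """
    # Pooled proportion under the assumption that both groups share one rate.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Figures quoted in the text: 3.7% of 6259 versus 2.7% of 6303.
z, p_value = two_proportion_z(0.037, 6259, 0.027, 6303)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

Under these assumptions, a difference of one percentage point between groups of this size is several standard errors from zero, which is why it would be hard to attribute to chance alone.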
Statistical literacy involves a healthy dose of skepticism about “scientific” findings. Is the information about side effects of PLAVIX treatment reliable? A statistically literate person should ask such questions and be able to answer them intelligently. A statistically literate high school graduate will be able to understand the conclusions from scientific investigations and to offer an informed opinion about the legitimacy of the reported results. To quote from Mathematics and Democracy: The Case for Quantitative Literacy, such knowledge “empowers people by giving them tools to think for themselves, to ask intelligent questions of experts, and to confront authority confidently. These are skills required to thrive in the modern world.”
The Case for Statistics Education
Over the past quarter century, statistics (often labeled data analysis and probability) has become a key component of the K-12 mathematics curriculum. Advances in technology and in modern methods of data analysis of the 1980s, coupled with the data richness of society in the information age, led to the development of curriculum materials geared toward introducing statistical concepts into the school curriculum as early as the elementary grades. This grass-roots effort was given sanction by the National Council of Teachers of Mathematics (NCTM) when their influential document Curriculum and Evaluation Standards for School Mathematics, published in 1989, included Data Analysis and Probability as one of the five content strands. As this document and its 2000 replacement entitled Principles and Standards for School Mathematics became the basis for reform of mathematics curricula in many states, the acceptance of and interest in statistics as part of mathematics education gained strength.
In recent years many mathematics educators and statisticians have devoted large segments of their careers to the improvement of statistics education materials and pedagogical techniques.
NCTM is not the only group calling for improved statistics education beginning at the school level. The National Assessment of Educational Progress (NAEP) is developed around the same strands as in the NCTM Standards, with data analysis and probability questions playing an increasingly prominent role in the NAEP exam.
The emerging quantitative literacy movement calls for greater emphasis on practical quantitative skills that will help assure success for high school graduates in life and work; many of these skills are statistical in nature. To quote from Mathematics and Democracy: The Case for Quantitative Literacy:
· Quantitative literacy, also called numeracy, is the natural tool for comprehending information in the computer age. The expectation that ordinary citizens be quantitatively literate is primarily a phenomenon of the late twentieth century.
· Unfortunately, despite years of study and life experience in an environment immersed in data, many educated adults remain functionally innumerate.
· Quantitative literacy empowers people by giving them tools to think for themselves, to ask intelligent questions of experts, and to confront authority confidently. These are the skills required to thrive in the modern world.
A recent study from the American Diploma Project, entitled Ready or Not: Creating a High School Diploma That Counts, recommends "must have" competencies needed for high school graduates "to succeed in postsecondary education or in high-performance, high-growth jobs." These competencies include, in addition to algebra and geometry, aspects of data analysis, statistics, and other applications that are vitally important for other subjects as well as for employment in today's data-rich economy.
Statistics education as proposed in this Framework can enable the "must have" competencies for graduates to “thrive in the modern world”.
NCTM Standards and the Framework
The main objective of this document is to provide a conceptual Framework for K-12 statistics education. The foundation for this Framework rests on the NCTM Principles and Standards for School Mathematics.
The Framework is intended to support the objectives of the NCTM Principles and Standards. It is intended to complement the NCTM recommendations, not to supplant them.
The NCTM Principles and Standards describes the statistics content strand as follows.
Instructional programs from pre-kindergarten through grade 12 should enable all students to—
· formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them;
· select and use appropriate statistical methods to analyze data;
· develop and evaluate inferences and predictions that are based on data;
· understand and apply basic concepts of probability.
The Data Analysis and Probability Standard recommends that students formulate questions that can be answered using data and addresses what is involved in gathering and using the data wisely. Students should learn how to collect data, organize their own or others' data, and display the data in graphs and charts that will be useful in answering their questions. This Standard also includes learning some methods for analyzing data and some ways of making inferences and drawing conclusions from data. The basic concepts and applications of probability are also addressed, with an emphasis on the way that probability and statistics are related.
The NCTM document elaborates on these themes somewhat and provides examples of the types of lessons and activities that might be used in a classroom. Statistics, however, is a relatively new subject for many teachers who have not had an opportunity to develop sound knowledge of the principles and concepts underlying the practices of data analysis that they are now called upon to teach. These teachers do not clearly understand the difference between statistics and mathematics. They do not see the statistics curriculum for grades K-12 as a cohesive and coherent curriculum strand. These teachers may not see how the overall statistics curriculum provides a developmental sequence of learning experiences.
This Framework provides a conceptual structure for statistics education which gives a coherent picture of the overall curriculum. This structure adds to but does not replace the NCTM recommendations.
The Difference between Statistics and Mathematics
"Statistics is a methodological discipline. It exists not for itself but rather to offer to other fields of study a coherent set of ideas and tools for dealing with data. The need for such a discipline arises from the omnipresence of variability."
Cobb and Moore
A major objective of statistics education is to help students develop statistical thinking. Statistical thinking, in large part, must deal with this omnipresence of variability; statistical problem solving and decision making depend on understanding, explaining and quantifying the variability in the data.
It is this focus on variability in data that sets statistics apart from mathematics.
The Nature of Variability
There are many different sources of variability in data. Some of the important sources are described below.
Repeated measurements on the same individual vary. Sometimes two measurements vary because the measuring device produces unreliable results, as when we try to measure a large distance with a small ruler. At other times, variability results from changes in the system being measured. For example, even with a very precise measuring device, your recorded blood pressure would differ from one moment to the next.
Variability is inherent in nature. Individuals are different. When we measure the same quantity across several individuals we are bound to get some differences in the measurements. Although some of this may be due to our measuring instrument, most of it is simply due to the fact that individuals differ. People naturally have different heights, different aptitudes and abilities, or different opinions and emotional responses. When we measure any one of these traits we are bound to get variability in the measurements. Different seeds of the same variety of bean will grow to different sizes when subjected to the same environment because no two seeds are exactly alike; there is bound to be variability from seed to seed in the measurements of growth.
If we plant one pack of bean seeds in one field, and another pack of seeds in another location with a different climate, then an observed difference in growth between the seeds in one location and those in the other might be due to inherent differences in the seeds (natural variability), or it might be due to the fact that the locations are not the same. If one type of fertilizer is used on one field and another type on the other, then observed differences might be due to the difference in fertilizers. For that matter, the observed difference might be due to a factor that we haven't even thought about. A more carefully designed experiment can help us to determine the effects of different factors.
This one basic idea, comparing natural variability to the variability induced by other factors, forms the heart of modern statistics. It has allowed medical science to conclude that some drugs are effective and safe, whereas others are ineffective or have harmful side effects. It has been employed by agricultural scientists to demonstrate that a variety of corn grows better in one climate than another, that one fertilizer is more effective than another, or that one type of feed is better for beef cattle than another.
In a voter poll, it seems reasonable to use the proportion of voters surveyed (a sample) who support a particular candidate as an estimate of the unknown proportion of all voters who support that candidate. But if a second sample of the same size is used, it is almost certain that there would not be the same proportion of voters in the sample who support the candidate. The value of the proportion will vary from sample to sample. This is called sampling variability. So what is to keep one sample from estimating that the true proportion is .60 and another from saying it is .40? Such a discrepancy is possible but unlikely if proper sampling techniques are used. Poll results are useful because these techniques can assure that unacceptable differences among samples are quite unlikely.
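Sampling variability can be made concrete with a small simulation. The sketch below is illustrative only: the true proportion of .50, the sample size of 1000, the number of polls, and the function names are all assumptions chosen for the example and are not taken from any actual poll.

```python
import random

def sample_proportion(true_p, n, rng):
    """Proportion of supporters in one simulated random sample of n voters."""
    supporters = sum(1 for _ in range(n) if rng.random() < true_p)
    return supporters / n

def simulate_polls(true_p=0.50, n=1000, polls=200, seed=1):
    """Repeat the same poll many times and collect the sample proportions."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [sample_proportion(true_p, n, rng) for _ in range(polls)]

props = simulate_polls()
print(f"smallest estimate: {min(props):.3f}")
print(f"largest estimate:  {max(props):.3f}")
```

The estimates differ from poll to poll, yet with random samples of this size they stay clustered near the true proportion, so one poll reporting .60 and another reporting .40 would be very unlikely.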
The Role of Context
"The focus on variability naturally gives statistics a particular content that sets it apart from mathematics itself and from other mathematical sciences, but there is more than just content that distinguishes statistical thinking from mathematics. Statistics requires a different kind of thinking, because data are not just numbers, they are numbers with a context."
Cobb and Moore
Many mathematics problems arise from applied contexts, but the context is removed to reveal mathematical patterns.
Statisticians, like mathematicians, look for patterns, but the meaning of the patterns depends on the context.
"In mathematics, context obscures structure. In data analysis, context provides meaning."
Cobb and Moore
A graph that appears occasionally in the business section of newspapers shows a plot of the Dow Jones Industrial Average (DJIA) over a ten-year period. The variability of stock prices draws the attention of an investor. This stock index may go up or down over some intervals of time, and may fall or rise sharply over a short period. In context, the graph raises questions. A serious investor is interested not only in when or how rapidly the index goes up or down, but also in why. What was going on in the world when the market went up? What was going on when it went down? But strip away the context. Remove time (years) from the horizontal axis and call it "X", remove stock value (DJIA) from the vertical axis and call it "Y", and there remains a graph of very little interest or mathematical content!
Probability
Probability is an important part of any mathematical education. It is a part of mathematics that enriches the subject as a whole by its interactions with other uses of mathematics. Probability is an essential tool in applied mathematics and mathematical modeling. It is also an essential tool in statistics.
But the use of probability as a mathematical model and the use of probability as a tool in statistics employ not only different approaches but different kinds of reasoning.
Two problems and the nature of the solutions will illustrate the difference.
Problem 1
Assume a coin is "fair."
Question: If we toss the coin 5 times, how many heads will we get?
Problem 2
You pick up a coin.
Question: Is this a fair coin?
Problem 1 is a mathematical probability problem.
Problem 2 is a statistics problem which can use the mathematical probability model determined in problem 1 as a tool to seek a solution.
The answer to neither question is deterministic. Coin tossing produces random outcomes, which suggests that the answer is probabilistic. The solution to problem 1 starts with the assumption that the coin is fair and proceeds to logically deduce the numerical probabilities for each possible number of heads 0, 1, ..., 5.
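The deduction for problem 1 can be written out in a few lines. The sketch below assumes five independent tosses of a fair coin and uses the standard binomial counting argument; the function name `fair_coin_distribution` is introduced here only for illustration.

```python
from math import comb

def fair_coin_distribution(tosses=5):
    """Probability of each possible number of heads for a fair coin.

    Each of the 2**tosses sequences is equally likely, and comb(tosses, k)
    of them contain exactly k heads.
    """
    total = 2 ** tosses
    return {k: comb(tosses, k) / total for k in range(tosses + 1)}

for heads, prob in fair_coin_distribution().items():
    print(f"{heads} heads: {prob:.5f}")
# 0 heads: 0.03125
# 1 heads: 0.15625
# 2 heads: 0.31250
# 3 heads: 0.31250
# 4 heads: 0.15625
# 5 heads: 0.03125
```

No data are collected here; every number follows logically from the assumption that the coin is fair.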
The solution to problem 2 starts with an unfamiliar coin; we don't know if it is fair or biased. The search for an answer is experimental: toss the coin and see what happens. Examine the resulting data to see if they look like they came from a fair coin or a biased coin. There are several possible approaches, including the following. Toss the coin 5 times and record the number of heads. Then do it again: toss the coin 5 times and record the number of heads. Repeat 100 times. Compile the frequencies of outcomes for each possible number of heads. Compare these results to the frequencies predicted by the mathematical model for a fair coin in problem 1. If the empirical frequencies from the experiment are quite dissimilar from those predicted by the mathematical model for a fair coin, and the differences are not likely to be caused by random variation in coin tosses, then we conclude the coin is not fair. In this case we induce an answer by making a general conclusion from observations of experimental results.
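One way to carry out this comparison is sketched below. Everything specific in it is an assumption made for illustration: the unfamiliar coin is simulated with a hypothetical heads probability of 0.8, the discrepancy measure (squared differences scaled by the expected frequency) is one common choice among several, and "not likely to be caused by random variation" is judged by simulating many fair-coin experiments rather than by a formal table.

```python
import random
from math import comb

TOSSES, REPEATS = 5, 100

def head_counts(p_heads, rng):
    """Frequencies of 0..5 heads over REPEATS rounds of TOSSES tosses."""
    freq = [0] * (TOSSES + 1)
    for _ in range(REPEATS):
        heads = sum(1 for _ in range(TOSSES) if rng.random() < p_heads)
        freq[heads] += 1
    return freq

# Frequencies that the fair-coin model of problem 1 predicts for 100 rounds.
expected = [REPEATS * comb(TOSSES, k) / 2 ** TOSSES for k in range(TOSSES + 1)]

def discrepancy(freq):
    """Sum of squared (observed - expected) differences, scaled by expected."""
    return sum((o - e) ** 2 / e for o, e in zip(freq, expected))

def chance_of_discrepancy(observed_freq, rng, trials=2000):
    """Share of simulated fair-coin experiments at least as far from the model."""
    target = discrepancy(observed_freq)
    hits = sum(1 for _ in range(trials)
               if discrepancy(head_counts(0.5, rng)) >= target)
    return hits / trials

rng = random.Random(2024)
observed = head_counts(0.8, rng)  # stand-in for tossing the unfamiliar coin
p_value = chance_of_discrepancy(observed, rng)
print("observed:", observed)
print("expected:", expected)
print("share of fair-coin experiments this far from the model:", p_value)
```

If almost no fair-coin experiment strays as far from the model as the observed frequencies do, random variation is an implausible explanation and the coin is judged not fair. A small share is evidence, not proof; that is the inductive character of the conclusion.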
Two important uses of "randomization" in statistical work occur in sampling and experimental design. When sampling we "select at random" and in experiments we "randomly assign individuals to different treatments." Randomization does much more than just remove bias in selections and assignments. Randomization leads to chance variability in outcomes which can be described with probability models.
The probability of an event tells us approximately what percentage of the time the event is expected to happen when the basic process is repeated over and over again.
Probability theory does not say very much about one toss of the coin; it makes predictions about the long-run behavior of the coin tosses.
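The long-run interpretation can be illustrated by tracking the proportion of heads as the number of tosses grows. This is a simulation sketch only; the seed, the checkpoints, and the function name are arbitrary choices made for the example.

```python
import random

def running_proportion_of_heads(n_tosses, checkpoints, seed=7):
    """Proportion of heads observed so far, recorded at each checkpoint."""
    rng = random.Random(seed)
    heads = 0
    result = {}
    for toss in range(1, n_tosses + 1):
        if rng.random() < 0.5:  # simulated fair coin
            heads += 1
        if toss in checkpoints:
            result[toss] = heads / toss
    return result

checkpoints = {10, 100, 1_000, 10_000, 100_000}
proportions = running_proportion_of_heads(100_000, checkpoints)
for tosses in sorted(proportions):
    print(f"after {tosses:>7} tosses: {proportions[tosses]:.4f}")
```

The proportion after ten tosses can be far from one half, while the proportion after many thousands of tosses is typically very close to it. The model says little about any single toss and a great deal about the long run.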
Probability tells us little about the consequences of random selection for one sample but describes the variation we expect to see in samples when the sampling process is repeated a large number of times.
Probability tells us little about the consequences of random assignment for one experiment but describes the variation we expect to see in the results when the experiment is replicated a large number of times.
When randomness is present, the statistician wants to know whether the observed result is due to chance or to something else. This is the idea of statistical significance.
The Role of Mathematics in Statistics Education
The evidence that statistics is different from mathematics is not presented to argue that mathematics is not important to statistics education or that statistics education should not be a part of mathematics education. To the contrary, statistics education becomes increasingly mathematical as the level of understanding goes up.
But data collection design, exploration of data, and the interpretation of results should be emphasized in statistics education for statistical literacy. These are heavily dependent on context, but at the introductory level involve limited formal mathematics.
Probability plays an important role in statistical analysis, but formal mathematical probability should have its own place in the curriculum. Pre-college statistics education should emphasize the ways that probability is used in statistical thinking; an intuitive grasp of probability will suffice at these levels.
The Framework
Underlying Principles
Statistical Problem Solving
Statistical problem solving is an investigative process that involves four components:
Formulate Questions
· clarify the problem at hand
· formulate one (or more) questions that can be answered with data
Collect Data
· design a plan to collect appropriate data
· employ the plan to collect the data
Analyze Data
· select appropriate graphical or numerical methods
· use these methods to analyze the data
Interpret Results
· interpret the analysis
· relate the interpretation to the original question.
The Role of Variability in the Problem Solving Process
Formulate Question
Anticipating Variability - Making the statistics question distinction
The formulation of a statistics question requires an understanding of the difference between a question which anticipates a deterministic answer and a question which anticipates an answer based on data which vary.
The question "How tall am I?" will be answered with a single height. It is not a statistics question. The question "How tall are adult men in the
The poser of the question "How does sunlight affect the growth of a plant?" should anticipate that the growth of two plants of the same type exposed to the same sunlight will likely differ. This is a statistics question.
The anticipation of variability is the basis for understanding the statistics question distinction; both are required for proper question formulation.
Collect Data
Acknowledging Variability - Designing for differences
Data collection designs must acknowledge variability in data and frequently are intended to reduce variability. Random sampling is intended to reduce the differences between sample and population, and the sample size influences the effect of sampling variability (error). Experimental designs are chosen to acknowledge the differences between groups subjected to different treatments. Random assignment to the groups is intended to reduce differences between the groups due to factors which are not manipulated in the experiment. Some experimental designs pair subjects so that they are similar. Twins are frequently paired in medical experiments so that observed differences might be more likely attributed to the difference in treatments rather than differences in the subjects.
The understanding of data collection designs which acknowledge differences is required for effective collection of data.
Analyze Data
Accounting of Variability - Using Distributions
The main purpose of statistical analysis is to give an accounting of the variability in the data. When results of an election poll state that "42% of those polled support a particular candidate with margin of error +/- 3% at the 95% confidence level", the focus is on sampling variability. The poll gives an estimate of the support among all voters. The margin of error indicates how far the sample result (42% +/- 3%) might differ from the actual percentage of all voters who support the candidate. The confidence level tells us how often the method employed will produce correct results. This analysis is based on the distribution of estimates from repeated random sampling.
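The connection between sample size and a margin of error of about 3% can be sketched with the usual large-sample formula for a proportion. The poll's sample size is not given in the text, so the value of 1040 below is an assumption chosen for illustration, as is the multiplier 1.96 that corresponds to 95% confidence under a normal approximation.

```python
from math import sqrt

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and a normal approximation to the
    distribution of estimates from repeated sampling.
    """
    return z * sqrt(p_hat * (1 - p_hat) / n)

moe = margin_of_error(0.42, 1040)  # 1040 is a hypothetical sample size
print(f"42% +/- {100 * moe:.1f}%")
# 42% +/- 3.0%
```

Quadrupling the sample size under the same assumptions would only halve the margin of error, because the sample size enters through a square root.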
When test scores are described as "normally distributed with mean 450 and standard deviation 100" the focus is on how the scores differ from the mean. The normal distribution describes a bell-shaped pattern of scores and the standard deviation indicates the level of variation of the scores from the mean.
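A brief sketch shows how the stated mean and standard deviation translate into statements about variation. It uses Python's standard-library `statistics.NormalDist` with the figures from the text; the cutoffs of 350, 550, and 650 are chosen here only as examples.

```python
from statistics import NormalDist

# Normal model for the scores described in the text.
scores = NormalDist(mu=450, sigma=100)

# Share of scores within one standard deviation of the mean (350 to 550).
within_one_sd = scores.cdf(550) - scores.cdf(350)

# Share of scores more than two standard deviations above the mean.
above_650 = 1 - scores.cdf(650)

print(f"between 350 and 550: {within_one_sd:.3f}")
print(f"above 650:           {above_650:.3f}")
```

Under this model roughly two thirds of the scores fall within 100 points of the mean, and only a small share lie more than 200 points above it.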
Accounting of variability with the use of distributions is the key idea in the analysis of data.
Interpret Results
Allowing for Variability - Looking beyond the data
Statistical interpretations are made in the presence of variability and must allow for it.
The result of an election poll must be interpreted as an estimate which can vary from sample to sample. The generalization of the poll results to the entire population of voters looks beyond the sample of voters surveyed and must allow for the possibility of variability of results among different samples. The results of a randomized comparative medical experiment must be interpreted in the presence of variability due to the fact that different individuals respond differently to the same treatment as well as the variability due to randomization. The generalization of the results looks beyond the data collected from the subjects who participated in the experiment and must allow for these sources of variability.
Looking beyond the data to make generalizations must allow for variability in the data.
Maturing over Levels
The mature statistician understands the role of variability in the statistical problem solving process. At the point of question formulation, the statistician anticipates the data collection, the nature of the analysis, and the possible interpretations, all of which must consider possible sources of variability. In the end, the mature practitioner reflects upon all aspects of data collection and analysis as well as the question itself when interpreting results. Likewise, the practitioner links data collection and analysis to each other and to the other two components.
The beginning student cannot be expected to make all of these linkages. They require years of experience as well as training. Statistical education should be viewed as a developmental process. To meet the proposed goals, this report will provide a framework for statistical education over three levels. If the goal were to produce a mature practicing statistician, there would certainly be several levels beyond these. There is no attempt to tie these levels to specific grade levels.
The Framework uses three developmental levels, A, B, and C. Although these three levels may parallel grade levels, they are based on development, not age. Thus, a middle school student who has had no prior experience with statistics will need to begin with Level A concepts and activities before moving to Level B. This holds true for a secondary student as well: if a student has not had Level A and B experiences prior to high school, then it is not appropriate to jump into Level C expectations. The learning is more teacher driven at Level A, but becomes student driven at Levels B and C.
The Framework Model
The conceptual structure for statistics education is provided in the two dimensional model shown in Figure **. One dimension is defined by the problem solving process components plus the nature of the variability considered and how we focus on variability. The second dimension is comprised of the three developmental levels.
Each of the first four rows describes a process component as it develops across levels. The fifth row indicates the nature of the variability considered at a given level. It is understood that work at Level B assumes and develops further the concepts from Level A, and likewise Level C assumes and uses concepts from the lower levels.
FIGURE **
| Process Component | Level A | Level B | Level C |
| Formulate Question | Beginning awareness of the statistics question distinction; Teachers pose questions of interest; Questions restricted to classroom | Increased awareness of the statistics question distinction; Students begin to pose their own questions of interest; Questions not restricted to classroom | Students can make the statistics question distinction; Students pose their own questions of interest; Questions seek generalization |
| Collect Data | Do not yet design for differences; Census of classroom; Simple experiment | Beginning awareness of design for differences; Sample surveys; Begin to use random selection; Comparative experiment; Begin to use random allocation | Students make designs for differences; Sampling designs with random selection; Experimental designs with randomization |
| Analyze Data | Use particular properties of distributions in context of specific example; Display variability within a group; Compare individual to individual; Compare individual to group | Learn to use particular properties of distributions as tools of analysis; Quantify variability within a group; Compare group to group in displays; Acknowledge sampling error; Some quantification of association; Simple models for association | Understand and use distributions in analysis as a global concept; Measure variability within a group; Measure variability between groups; Compare group to group using displays and measures of variability; Describe and quantify sampling error; Quantification of association; Fitting of Models for association |
| Interpret Results | Do not look beyond the data; No generalization beyond the classroom; Note difference between two individuals with different conditions; Observe association in displays | Acknowledge that looking beyond the data is feasible; Acknowledge that a sample may or may not be representative of larger population; Note difference between two groups with different conditions; Aware of distinction between observational study and experiment; Note differences in strength of association; Basic interpretation of models for association; Aware of the distinction between “association” and “cause and effect” | Are able to look beyond the data in some contexts; Generalize from sample to population; Aware of the effect of randomization on the results of experiments; Understand the difference between observational studies and experiments; Interpret measures of strength of association; Interpret models for association; Distinguishes between conclusions from association studies and experiments |
| Nature of Variability / Focus on Variability | Measurement variability; Natural variability; Induced variability; Variability within a group | Sampling variability; Variability within a group and variability between groups; Co-variability | Chance variability; Variability in model fitting |
Illustrations
All four steps of the problem solving process are used at all three levels, but the depth of understanding and sophistication of methods used increases across the levels A, B, C. This maturation in understanding the problem solving process and its underlying concepts is paralleled by an increasing complexity in the role of variability. The illustrations of learning activities given here are intended to clarify the differences across the developmental levels for each component of the problem solving process. A later section in this report will give illustrations of the complete problem solving process for learning activities at each level.
Formulate Question
Example 1
A: How long are the words on this page?
B: Are the words in a chapter of a fifth grade book longer than the words in a chapter of a third grade book?
C: Do fifth grade books use longer words than third grade books?
Example 2
A: What type of music is most popular among students in our class?
B: How do the favorite types of music compare among different classes?
C: What type of music is most popular among students in our school?
Example 3
A: In our class, are the heights and arm spans of students approximately the same?
B: Is the relationship between arm span and height for the students in our class the same as the relationship between arm span and height for the students in another class?
C: Is height a useful predictor of arm span for the students in our school?
Example 4
A: Will a plant placed by the window grow taller than a plant placed away from the window?
B: Will five plants placed by the window grow taller than five plants placed away from the window?
C: How does the level of sunlight affect the growth of a plant?
Collect Data
Example 1
A: How long are the words on this page?
The length of every word on the page is determined and recorded.
B: Are the words in a chapter of a fifth grade book longer than the words in a chapter of a third grade book?
A simple random sample of words from each chapter is used.
C: Do fifth grade books use longer words than third grade books?
Other sampling designs are considered, compared and some are used. For example, rather than select words in a simple random sample, a simple random sample of pages from the book is selected and all of the words on the pages chosen are used for the sample.
Note- At each level, issues of measurement should be addressed. The length of word depends on the definition of “word.” For instance, is a number a word? Consistency of definition is important to reduce measurement variability.
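The two sampling designs mentioned for Level C can be compared in a short sketch. The "book" below is entirely made up (40 pages of 50 placeholder words each), and the function names are hypothetical; the point is only to show how a simple random sample of words differs from a simple random sample of whole pages.

```python
import random

def simple_random_sample_of_words(pages, n_words, rng):
    """Choose n_words at random from all the words in the book."""
    all_words = [word for page in pages for word in page]
    return rng.sample(all_words, n_words)

def sample_of_whole_pages(pages, n_pages, rng):
    """Choose n_pages at random and use every word on the chosen pages."""
    chosen_pages = rng.sample(pages, n_pages)
    return [word for page in chosen_pages for word in page]

def mean_word_length(words):
    return sum(len(w) for w in words) / len(words)

# Hypothetical stand-in for a book: 40 pages, 50 made-up "words" per page.
rng = random.Random(5)
book = [["x" * rng.randint(1, 9) for _ in range(50)] for _ in range(40)]

srs = simple_random_sample_of_words(book, 100, rng)
page_sample = sample_of_whole_pages(book, 2, rng)
print(f"simple random sample of words: {mean_word_length(srs):.2f}")
print(f"all words on 2 random pages:   {mean_word_length(page_sample):.2f}")
```

Both designs yield 100 words here, but the page-based design is easier to carry out by hand. Whether it estimates word length as well depends on how much word length varies from page to page, which is the kind of comparison Level C students are asked to consider.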
Example 4
A: Will a plant placed by the window grow taller than a plant placed away from the window?
A seedling is planted in a pot which is placed on the window sill. A second seedling of the same type and size is planted in a pot which is placed away from the window sill. After six weeks the change in height for each is measured and recorded.
B: Will five plants of a particular type placed by the window grow taller than five plants of the same type placed away from the window?
Five seedlings of the same type and size are planted in a pan which is placed on the window sill. Five seedlings of the same type and size are planted in a pan which is placed away from the window sill. Random numbers are used to decide which plants go in the window. After six weeks the change in height for each seedling is measured and recorded.
C: How does the level of sunlight affect the growth of plants?
Fifteen seedlings of the same type and size are selected. Three pans are used, with five of these seedlings planted in each. Fifteen seedlings of another type are selected. Five of these are planted in each of the three pans. The three pans are placed in locations with three different levels of light. Random numbers are used to decide which plants go in which pan. After six weeks the change in height for each seedling is measured and recorded.
Note- At each level, issues of measurement should be addressed. The method of measuring change in height must be clearly understood and applied in order to reduce measurement variability.
Analyze Data
Example 2
A: What type of music is most popular among students in our class?
A bar graph is used to display the number of students who choose each music category.
B: How do the favorite types of music compare among different classes?
For each class, a bar graph is used to display the percentage of students who choose each music category. These are scaled uniformly for comparison.
C: What type of music is most popular among students in our school?
A bar graph is used to display the percentage of students who choose each music category. Because a random sample is used, an estimate of the margin of error is given.
Note- At each level, issues of measurement should be addressed. A questionnaire will be used to gather students’ music preferences. The design and wording of the questionnaire must be carefully considered to avoid possible biases in the responses. The choice of music categories could also affect results.
Example 3
A: In our class, are the heights and arm spans of students approximately the same?
The difference between height and arm span is determined for each individual.
An X-Y plot is constructed with X=height, Y=arm span. The line Y=X is drawn on this graph.
B: Is the relationship between arm span and height for the students in our class the same as the relationship between arm span and height for the students in another class?
For each class, an X-Y plot is constructed with X=height, Y=arm span. An "eye ball" line is drawn on each graph to describe the relationship between height and arm span. The equation of this line is determined. An elementary measure of association is determined.
C: Is height a useful predictor of arm span for the students in our school?
The least squares regression line is determined and assessed for use as a prediction model.
Note- At each level, issues of measurement should be addressed. The methods used to measure height and arm span must be clearly understood and applied in order to reduce measurement variability. For instance, do we measure height with shoes on or off?
Interpret Results
Example 1
A: How long are the words on this page?
The frequency plot of all word lengths is examined and summarized. In particular, students will note the longest and shortest word lengths, the most common lengths and least common lengths, and the length in the middle.
B: Are the words in a chapter of a fifth grade book longer than the words in a chapter of a third grade book?
The students interpret a comparison of the distribution of a sample of word lengths from the fifth grade book with the distribution of a sample of word lengths from the third grade book, using a box plot to represent each. The students also acknowledge that samples are being used, which may or may not be representative of the complete chapters.
The box plot for a sample of word lengths from the fifth grade book is placed beside the box plot of the sample from the third grade book.
C: Do fifth grade books use longer words than third grade books?
The interpretation at level C includes the interpretation at level B, but also must consider generalizing from the books included in the study to a greater population of books.
Example 4
A: Will a plant placed by the window grow taller than a plant placed away from the window?
In this simple experiment, the interpretation is just a matter of comparing one measurement of change in size to another.
B: Will five plants placed by the window grow taller than five plants placed away from the window?
In this experiment, the student must interpret a comparison of one group of five measurements with another group.
If a difference is noted, then the student acknowledges that it is likely caused by the differences in light conditions.
C: How does the level of sunlight affect the growth of a plant?
There are several comparisons of groups possible with this design. If a difference is noted, then the student acknowledges that it is likely caused by the differences in light conditions or the differences in types of plants. It is also acknowledged that the randomization used in the experiment can possibly cause some of the observed differences.
Nature of Variability
Variability within a group
This is the only type considered at Level A.
In Example 1, differences among word lengths on a single page are considered; this is variability within a group of word lengths.
In Example 2, differences among how many students choose each category of music are considered; this is variability within a group of frequencies.
Variability within a group and variability between groups
At level B, students begin to make comparisons of groups of measurements.
In Example 1, a group of word lengths from a fifth grade book is compared with a group from a third grade book. Such a comparison not only notes differences between the two groups, such as the difference between median or mean word lengths, but must also take into consideration how much word lengths differ within each group.
Induced variability
In Example 4, Level B, the experiment is designed to determine if there will be a difference between the growth of plants in sunlight and the growth of those away from sunlight. We want to determine if an imposed difference in the environments will induce a difference in growth.
Sampling variability
In Example 1, Level B, samples of words from a chapter are used. Students observe that two different samples will produce different groups of word lengths. This is sampling variability.
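A minimal sketch of this idea, assuming a made-up list of word lengths for the chapter and illustrative seeds (none of these values come from the examples above): two simple random samples drawn from the same chapter will usually contain different words and give different means.

```python
import random
from statistics import mean

# Hypothetical word lengths standing in for all the words in one chapter.
chapter_word_lengths = [3, 5, 2, 7, 4, 4, 6, 1, 8, 3, 5, 4, 2, 9, 3, 6, 4, 5, 7, 2]

def draw_sample(population, size, seed):
    """Return a simple random sample (without replacement) of the given size."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return rng.sample(population, size)

# Two samples of 8 words each from the same chapter.
sample_one = draw_sample(chapter_word_lengths, 8, seed=1)
sample_two = draw_sample(chapter_word_lengths, 8, seed=2)

print("Sample 1:", sorted(sample_one), "mean =", mean(sample_one))
print("Sample 2:", sorted(sample_two), "mean =", mean(sample_two))
```

Students could repeat the draw several times and record how much the sample means move around.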
Co-variability
Example 3, Level B or C, investigates the "statistical" relationship between height and arm span. The nature of this statistical relationship is described in terms of how the two variables "co-vary". For instance, if the heights of two students differ by 2 centimeters, then we would like our model of the relationship to tell us by how much we might expect their arm spans to differ.
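As one illustration, suppose the relationship is modeled by a straight line (an assumption made here for the sketch; the symbols a and b are illustrative), with x = height and predicted arm span written as y-hat. Then the expected difference in arm span depends only on the slope:

```latex
\hat{y} = a + b\,x,
\qquad
\hat{y}_2 - \hat{y}_1 = b\,(x_2 - x_1) = 2b
\quad \text{when } x_2 - x_1 = 2 \text{ cm}.
```

Under this model, a slope near 1 would say that students who differ by 2 centimeters in height are expected to differ by about 2 centimeters in arm span.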
Random variability from sampling
When random selection is used, then differences between samples will be random. This random variation is what leads to the predictability of results.
In Example 2, Level C, this random variation is not only considered but is the basis for understanding the concept of "margin of error".
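The sketch below shows one common way a margin of error for a sample proportion can be approximated. The formula z * sqrt(p(1 - p)/n) with z = 1.96 (roughly 95% confidence) is an assumption brought in for illustration, not a method stated in this document, and the survey counts are hypothetical.

```python
from math import sqrt

def margin_of_error(successes, sample_size, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the large-sample formula z * sqrt(p * (1 - p) / n). This is an
    assumed method for illustration; z = 1.96 gives roughly 95% confidence.
    """
    p_hat = successes / sample_size
    return z * sqrt(p_hat * (1 - p_hat) / sample_size)

# Hypothetical random sample: 20 of 50 sampled students prefer rap.
p_hat = 20 / 50
moe = margin_of_error(20, 50)
print(f"Estimated proportion: {p_hat:.2f} +/- {moe:.2f}")
```

The formula relies on random selection; it says nothing useful about a sample chosen for convenience.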
Random variability resulting from assignment to groups in experiments
In Example 4, Level C, plants are randomly assigned to groups. Students consider how this randomization might produce differences in results, although a formal analysis is not done.
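A minimal sketch of random assignment, assuming fifteen seedlings of one type and three pans with illustrative labels (the function name, labels, and seed are hypothetical): shuffling the seedlings and dealing them out in equal groups gives every seedling the same chance of landing in each pan.

```python
import random

def assign_to_groups(units, group_names, seed=None):
    """Randomly split units into equal-sized groups, one per group name."""
    if len(units) % len(group_names) != 0:
        raise ValueError("this sketch assumes the units divide evenly among groups")
    rng = random.Random(seed)  # a seed is used only to make the sketch repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(group_names)
    return {
        name: shuffled[i * size:(i + 1) * size]
        for i, name in enumerate(group_names)
    }

# Hypothetical labels for fifteen seedlings of the same type and size.
seedlings = [f"seedling_{i}" for i in range(1, 16)]
pans = assign_to_groups(seedlings, ["low light", "medium light", "high light"], seed=0)

for pan, plants in pans.items():
    print(pan, plants)
```

Running the assignment again with a different seed gives a different split, which is one source of the random differences in results that students are asked to consider.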
Random variation in model fitting
In Example 3, Level C, students assess how well a regression line will predict arm span from height. This assessment is based on the notion of random differences between actual arm spans and the arm spans predicted by the model.
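The sketch below fits a least squares line and lists the differences between actual and predicted arm spans (the residuals). The heights and arm spans are made-up values in centimeters, and the helper name is hypothetical.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical measurements in centimeters.
heights = [150, 155, 160, 165, 170, 175]
arm_spans = [148, 156, 159, 167, 169, 177]

intercept, slope = least_squares_line(heights, arm_spans)

# Residual = actual arm span minus the arm span predicted from height.
residuals = [y - (intercept + slope * x) for x, y in zip(heights, arm_spans)]

print(f"predicted arm span = {intercept:.1f} + {slope:.2f} * height")
for h, r in zip(heights, residuals):
    print(f"height {h}: residual {r:+.1f} cm")
```

Small residuals scattered with no pattern suggest the line is a reasonable prediction model for these data.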
Moving Forward with Detailed Descriptions of Each Level
As this document transitions into detailed descriptions of each level,
it’s important to note that the examples selected for illustrating key concepts
and the problem solving process of statistical reasoning are based on real data
and real world context. The stakeholders
reading the document will need to be flexible and adaptable in using these
examples to fit their teaching needs and situation.
Level A
Objectives of Level A
Children are surrounded by data. They may think of data as tallying a
student’s favorite object or as measurements on other students in their
classroom such as arm span and number of books in their school bag.
• It is in Level A that children need to develop data sense - an understanding that data are more than just numbers. Statistics changes numbers into information.
• Students should learn that data are generated with respect to particular contexts or situations and can be used to answer questions about the context or situation.
• Students should have opportunities to generate questions about a particular context (such as their classroom) and determine what data might be collected to answer these questions.
• Students should learn how to use basic statistical tools to analyze the data and make informal or casual inferences in answering the posed questions.
• Students should develop basic ideas of probability in order to support their later use of probability in drawing inferences at levels B and C.
Statistics helps us make better decisions. It is preferable that
students actually collect data but not necessary in every case. Teachers should
take advantage of naturally occurring situations in which students notice a
pattern about some data and begin to raise questions. For example, when taking
daily attendance one morning, students might note that many students are
absent. The teacher could capitalize on this opportunity to have the students
formulate questions that could be answered with attendance data.
Specifically, Level A recommendations in the Investigative Process include:
1. Formulate the Question
• Teachers help pose questions
• Distinguish between statistical solution and fixed answer
• Question should be in a context of interest to the student
2. Collect Data to Answer the Question
• Census of the classroom
• Understand individual to individual variability
• Simple experiment with non-random assignment of treatment
• Understand variability due to a condition
3. Analyze the Data
• Compare individual to individual
• Compare individual to a group
• Introduce the idea of a distribution
• Describe a distribution
• Association between two variables
• Tools for exploring distributions and association:
    Bar Graph
    Dotplot
    Stem-and-Leaf Plot
    Scatterplot
    Tables (using counts)
    Mean, Median, Mode, Range
    Modal Category
4. Interpretation of the Data
• Inference to the classroom
• Acknowledge results may be different in another class or group
• Recognize limitation of scope of inference to the classroom
Example 1 Choosing the Band for the End of the Year Party – Conducting a Survey
Children at Level A may be interested in the favorite type of music
among students at a certain grade level. An end of the year party is being
planned and there is only enough money to hire one musical group for the party.
The class might investigate the question:
What type of music is most popular among students?
This question attempts to measure a characteristic in the population of
children in the grade that will have the party. The characteristic, favorite
music type, is a categorical variable: each child in that grade would be
placed in a particular non-numerical category based on his or her favorite
music type. The resulting data are often called categorical data.
The Level A class would most likely conduct a census of the students in a particular classroom to gauge what the favorite music type might be for the whole grade.
At Level A, we want students to
recognize there will be individual to individual variability.
A survey of 24 students in one of the classrooms at that grade is taken, with
the data summarized below in a frequency count table. A frequency count table
is a tabular summary of categorical data. Students might first use tally marks
to record the responses before finding frequencies for each category. Below
are summary data for one possible classroom survey, giving the count of tally
marks for each category.
Favorite Music Type    Frequency or Count
Country                8
Rap                    12
Rock                   4
A Level A student might first use a picture
graph to represent the counts for each category. A picture graph uses a
picture of some sort (such as a type of musical band) to represent each
element. Thus, each child who favors a particular music type would put their
cut-out of that type of band directly onto the graph the teacher has created on
the board. Instead of a picture of a band, another representation such as an X
or a square can be used to represent each element of the data set. A child who
prefers ‘Country’ would go to the board and place a dot or X or color in a
square above the column labeled “Country.” In both of these cases, there is a
deliberate recording of each element, one at a time.
[Picture graph: "Our Favorite Type of Tunes". Vertical axis: Number of People Who Like This Kind of Music, scaled from 1 to 12. Horizontal axis: Type of Music, with columns for Country, Rap, and Rock.]
Note that a picture graph refers to a graph where an object such as a
construction paper cut-out is used to represent one element on the graph. (A
cut-out of a tooth might be used to record how many teeth were lost by children
in a kindergarten class each month.) The term pictograph is often used to refer to a graph in which a picture or
symbol is used to represent several items that belong in the same category. For
example, on a graph showing the distribution of car riders, walkers, and bus
riders in a class, a cut-out of a school bus might be used to represent 5 bus
riders. Thus, if the class had 13 bus riders, there would be approximately 2.5
busses on the graph. This type of graph requires a basic understanding of
proportional or multiplicative reasoning, and for this reason we do not
advocate its use at Level A except possibly with students who are nearly ready
for Level B. Similarly, circle graphs require an understanding of proportional
reasoning, so we do not advocate their use at Level A except possibly at the
top of level A.
A bar graph takes the student to a summative level because the data
must be summarized from some other representation, such as a picture graph, a
tally or frequency count table. The bar on a bar graph is drawn as a continuous
rectangle reaching up to the desired number on the y-axis.
A bar graph is displayed below for the census taken of the classroom
represented in the above frequency count table.

Students at Level A
should recognize the mode as a way to
describe a ‘representative’ or ‘typical’ value for the distribution.
The mode is the representative value that
students naturally use first. The mode is most useful for categorical
data. Students should understand that
the mode is the category that
contains the most data points, often referred to as the modal category. In our favorite
music example, rap music was preferred by more children than any other type; thus the mode
or modal category of the data set is rap music. Students could use this information to
help the teachers in seeking a musical group for the end of the year party that
specializes in rap music.
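The tallying and the search for the modal category can be sketched as follows, using the counts from the frequency count table for this example (the variable names are illustrative).

```python
from collections import Counter

# One entry per student, matching the counts in the frequency count table:
# Country 8, Rap 12, Rock 4.
responses = ["Country"] * 8 + ["Rap"] * 12 + ["Rock"] * 4

# Tally the responses into a frequency count for each category.
counts = Counter(responses)

# The modal category is the category with the largest count.
modal_category, modal_count = counts.most_common(1)[0]

# Print a simple text version of the graph, one X per student.
for category in ["Country", "Rap", "Rock"]:
    print(f"{category:<8}{counts[category]:>3}  " + "X" * counts[category])
print("Modal category:", modal_category)
```

Note that the mode here is a category (Rap), not the count 12.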
The vertical axes on the bar graphs constructed above could be scaled
in terms of the proportion or percentage of the sample size for each category.
Since this involves proportional reasoning, converting frequencies to
proportions (or percentages) will be developed in Level B.
Because most of the
data collected at Level A will involve a census of the student’s classroom, the
first stage is for students to learn to read and interpret at a simple level
what the data show about their own class.
It is important to
consider the sort of question, “What might have caused the data to look like
this?”
Then, it is important for children to think about if and how their findings would “scale up” to a larger group, such as the entire grade level, the whole school, all children in the school system, all children in the state, or all people in the nation. They should note variables (such as age or geographic location) that might affect the data in the larger set. In the music example above, students might speculate that if they collected data on music preference from their teachers, the teachers might prefer a different type of music. Or what would happen if they collected music preference from middle school students in their school system? We want Level A students to begin recognizing the limitations of the scope of inference to a specific classroom.
The Simple Experiment
Another type of design for collecting data appropriate at
Level A is a simple experiment, which consists
of taking measurements on a particular condition or group. Level A students may
be interested in timing the swing of a pendulum or seeing how far a toy car
runs off the end of a slope from a fixed starting position (future Pinewood
Derby participants?). Data on numerical
variables are obtained by taking measurements such as height, length
(how far can a child jump under certain conditions?), or temperature, or by counting
objects (e.g., determining the number of letters in your first name, the
number of pockets on clothing worn by children in the class, or the number of
siblings each child has). These are often called numerical data. Also, measuring the same thing several times
and finding a mean helps to lay the foundation for the fact that the mean has
less variability as an estimate of the true value than does a single reading.
This will be fully developed at Level C.
Example 2 Growing Beans – A Simple Comparative Experiment
A simple comparative experiment is like a
science experiment in which children compare the results of two conditions or
groups. For example, children might plant dried beans in soil and let them
sprout and then compare which one grows fastest–the one in the light or the one
in the dark. The treatments or groups to be compared are the type of lighting
environment – light or dark. The type of lighting environment is an example of
a categorical variable. Measurements of the plants’ heights can be taken at
regular intervals (e.g., every day) to collect data to answer the question of
whether one lighting environment is better for growing beans. The heights
collected are an example of numerical data.
An appropriate graphical representation for numerical data
of one variable at Level A is a dotplot.
A stem-and-leaf plot is an additional
option for numerical data on one variable. Both the dotplot and stem-and-leaf
plot can also be used to compare two or more similar sets of numerical data. In
creating a dotplot, the x-axis should
be labeled with a range of values that the numerical variable can assume. For
example, in the bean growth experiment children might record in a dotplot the
heights of beans that were grown in the dark (labeled D) and in the light
(labeled L).

It is clear from
the dotplot that the plants in the light environment tend to
be taller than the plants in the dark environment.
Looking for clusters
and gaps in the distribution helps students to identify the shape of the distribution. Students
should develop a sense of why a distribution takes on a particular shape for
the context of the variable being considered.
Another simple
comparative experiment that Level A students may conduct is to compare boys and
girls with respect to the lengths of their jumps. Students may measure the jumping
distance for the students in their class. Once the numerical data are gathered,
the children might compare the lengths of jumps of girls and boys using a
back-to-back ordered stem-and-leaf plot.
            Girls | Stem | Boys
                  |  10  |
                  |   9  |
                  |   8  |
                  |   7  |
                  |   6  | 1
                  |   5  | 2 6 9
            9 7 2 |   4  | 1 3 5 5 5
    5 5 3 3 3 2 1 |   3  | 1 1 2 5 6 7
9 8 7 7 6 4 4 3 2 |   2  | 2 3 4 6
                  |   1  |
Inches jumped in the standing broad jump
From the stem and
leaf plot, the students can get a sense of shape (more symmetric for the boys
than for the girls) with boys tending toward having longer jumps.
Making Use of Available Data
Most children love
to eat hot dogs but are aware that too much sodium is not necessarily healthy.
Is there a difference in the sodium content between beef hot dogs (labeled B
below) and poultry hot dogs (labeled P below)? To investigate this question,
students can make use of available data.
Using data from the June 1986 issue of
Consumer Reports magazine, parallel dotplots can be constructed.

Students will notice
that the distribution for the poultry hot dogs has two distinct clusters. What might
explain the gap and two clusters? It could be another variable, such as the
price of the poultry hot dogs, with more expensive hot dogs having less sodium.
It can also be observed that the beef sodium amounts are more spread out (or
variable) than the poultry sodium amounts. It also appears that the center of the
distribution for the poultry hot dogs is higher than the center for the beef.
Describing shape
connects the student to properties of geometry. As students advance to Level B,
the importance of describing shape will lead to an understanding of what
measures are appropriate for describing center and spread.
Describing Center and Spread
Students should
understand that the median describes
the center of a numerical data set in terms of how many data points are above
and below it. Half of the data points lie above the median and half lie below
it. Children can create a human graph to show how many letters are in their
first name. All of the children with 2-letter names can stand in a line with
all of the children having 3-letter names standing in a parallel line right
next to them, etc. Once all children are assembled, the teacher can ask one
child from each end of the graph to sit down, repeating this procedure until
one child is left standing, representing the median. With Level A students, we
advocate using an odd number of data points so that the median is clear until
students have mastered understanding of a mid-point.
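The "sit down from both ends" procedure can be sketched as follows, assuming a hypothetical class of nine children and made-up name lengths (the function name is illustrative).

```python
def median_by_pairing_off(values):
    """Find the median of an odd-sized data set by removing one value
    from each end of the ordered data until a single value remains."""
    if len(values) % 2 == 0:
        raise ValueError("this sketch assumes an odd number of data points")
    standing = sorted(values)          # children line up in order of name length
    while len(standing) > 1:
        standing = standing[1:-1]      # one child from each end sits down
    return standing[0]

# Hypothetical name lengths for nine children.
name_lengths = [5, 3, 6, 4, 7, 4, 5, 2, 9]
print("Median name length:", median_by_pairing_off(name_lengths))
```

With an even number of children, two would be left standing, which is why an odd number of data points is suggested at this level.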
Students should
understand the mean as a fair share at
Level A. In the name length example above, the mean would be interpreted as
“How long would our names be if they were all the same length?” This can be
illustrated in small groups by having children take one snap cube for each
letter in their name. In their small groups, have them put all of the cubes in
the center of the table and redistribute them one at a time so that each child
has the same number. Depending on the children’s experiences with fractions,
they may say that the mean name length is 4 R. 2 or 4 1/2 or 4.5. Another
example would be for the teacher to collect 8 pencils of varying lengths from
children and lay them end-to-end on the chalk rail. Finding the mean will
answer the question “How long would each pencil be if they were all the same
length?” That is, if we could glue all of the pencils together and cut them
into 8 equal sections, how long would the sections be? This can be modeled
using adding machine tape or string by tearing off a piece of tape that is the
same length as all 8 pencils laid end-to-end. Then fold the tape in half three
times to get eighths, showing the length of one pencil out of eight pencils of
equal length. Both of these demonstrations can be mapped directly onto the
algorithm for finding the mean: combine all data elements (put all cubes in the
middle, lay all pencils end-to-end and measure, add all elements) and share
fairly (distribute the cubes, fold the tape, and divide by the number of data
elements). Level A students should master the computation (by hand or using
appropriate technology) of the mean so that more sophisticated definitions of
the mean can be developed at Levels B and C.
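The fair share idea maps onto a short calculation. The sketch below assumes a hypothetical small group of four children whose name lengths total 18, which reproduces the "4 R. 2", "4 1/2", and "4.5" answers mentioned above.

```python
# Hypothetical name lengths for a small group of four children.
name_lengths = [3, 4, 5, 6]

# Combine: every child puts one cube per letter in the middle of the table.
total_cubes = sum(name_lengths)

# Share fairly: deal the cubes back out one at a time.
share, left_over = divmod(total_cubes, len(name_lengths))
print(f"Each child gets {share} cubes with {left_over} left over")

# The same sharing expressed as a decimal is the mean.
mean_length = total_cubes / len(name_lengths)
print("Mean name length:", mean_length)
```

The two left-over cubes shared among four children are the extra half cube each, which connects the remainder form to the fraction and decimal forms.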
Use caution when
calculating a mean and median. For example, when collecting categorical data on
favorite type of music, the number of
children in the sample who prefer each type of music is summarized as a
frequency. It is easy to confuse categorical and numerical data in this case
and try to find the mean or median favorite type of music. However, one cannot
use the frequency counts to describe the categorical data in terms of a mean or
median because this is only appropriate for numerical data.
The mean and median
are measures of location for
describing the center of a numerical data set. Determining the maximum and
minimum values of a numerical data set assists children in describing the
position or location of the smallest and largest value in a data set. These two
measures of location lead to a measure of
spread for the distribution, the range.
In addition to
describing the center of a data set, it is useful to know how the data are
spread out. Measures of spread only make sense with numerical or measurement
data.
The range is a single number that tells how
far it is from the minimum element to the maximum element. In looking at the
stem and leaf plot formed in Example 2 for the jumping distances, the range
differs for the jumping distance of boys (range = 39 inches) and girls (range =
27 inches). Girls are more consistent in
their jumping distances than boys.
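The two ranges can be checked with a short calculation. The values below are read from the back-to-back stem-and-leaf plot of jumping distances (stems as tens, leaves as ones), as best as they can be recovered from it; the helper name is illustrative.

```python
# Jump distances in inches, read from the stem-and-leaf plot.
boys = [22, 23, 24, 26, 31, 31, 32, 35, 36, 37,
        41, 43, 45, 45, 45, 52, 56, 59, 61]
girls = [22, 23, 24, 24, 26, 27, 27, 28, 29,
         31, 32, 33, 33, 33, 35, 35, 42, 47, 49]

def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

print("Boys:  min", min(boys), "max", max(boys), "range", data_range(boys))
print("Girls: min", min(girls), "max", max(girls), "range", data_range(girls))
```

Because the range uses only the two extreme values, a single unusually long or short jump can change it a great deal.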
Looking for an Association
Students should be
able to look at the possible association
of a numerical variable and a categorical variable by comparing dotplots of
a numerical variable disaggregated by a categorical variable. For example,
using the parallel dot plots showing the growth habits of beans in the light
and dark, students should look for similarities within each category and
differences between the categories. As mentioned earlier, students should
readily recognize from the dot plot that the beans grown in the light environment
have grown taller overall and reason that it is best for beans to have a light
environment. Measures of center and spread can also be compared. For example,
students could calculate or make a visual estimate of the mean height of the
beans grown in the light and the beans grown in the dark to substantiate their
claim that light conditions are better for beans. They might also note that the
range for plants grown in the dark is 4 and for plants grown in the light is 5.
Putting that information together with the mean should enable students to
further solidify their conclusions about the advantages of growing beans in the
light. Considering the hot dog data, general impressions from the dot plots are
that there is more variation in the sodium content for beef hot dogs. For beef hot dogs the sodium contents are
between 250 and 650, while for poultry hot dogs all are between 350 and
600. Neither the centers nor the shapes
for the distributions are obvious from the dot plots. It is interesting to note the two apparent
clusters of data for poultry hot dogs.
Nine of the 17 poultry hot dogs have sodium content between 350 and 450
mg, while 8 of the 17 poultry hot dogs have sodium content between 500 and 600
mg. A possible explanation for this
division is that some poultry hot dogs are made from chicken, while others are
made from turkey.
Example 3 Purchasing Sweatsuits – The Role of Height and Armspan?
What about the
association between two numerical variables?
Parent-teacher organizations at elementary schools often sell ‘spirit wear’,
such as sweatshirts and sweatpants with the school name and mascot, as a
popular fund raiser. The organizers need to have some guidelines on how to order
sizes. Should they offer the shirt and pants separately or offer the sweatshirt
and sweatpants as one outfit? Are the heights and arm spans of elementary
students closely related or, due to growing patterns of children, are they different?
Thus, some useful questions to answer are:
Is height a useful
predictor of arm span?
How strong is the
association between height and arm span?
A scatterplot can be used to graphically
represent data when values of two numerical variables are obtained from the
same individual or object. Can we use arm span to predict a person’s height?
Students can measure each other’s arm spans and heights, and then construct a
scatterplot to look for a relationship between these two numerical variables.
Data on height and arm span are measured in centimeters for 26 students. [Note
that for illustrative purposes, the data presented below are for college
students.]

With the use of a
scatterplot, Level A students can visually look
for trends and patterns.
For example, in the
arm span vs. height scatterplot above, students should be able to identify the
consistent relationship between the two variables: as one gets larger, so does
the other. Based on this sample, the organizers might feel comfortable in
ordering some complete outfits of sweatshirt and sweatpants based on sizes.
However, some students may need to order the sweatshirt and sweatpants
separately based on sizes. Another important question the organizers will need
to ask is whether this sample is representative of all the students in the
school. How was the sample taken?
Students at Level A
can also use a scatterplot to graphically look at the relationship of a
numerical variable over time, referred to as a time plot.
For example,
children might chart the outside temperature at various times during the day by
recording the values themselves or by using data from a newspaper or the
internet.

Time Plot
When the student
advances to Level B, these trends and patterns will be quantified with
measures of association.
Understanding variability
Students should
explore possible reasons that data look the way they do and differentiate between variation and error.
For example, in graphing the colors of candies in a small packet, children
might expect the colors to be evenly distributed (or they may know from prior
experience that they are not). Children could speculate about why certain
colors appear more or less frequently due to variation (e.g., cost of dyes,
market research on people’s preferences, etc.). Children could also identify
possible places where errors could have occurred in their handling of the
data/candies (e.g., dropped candies, candies stuck in bag, eaten candies,
candies given away to others, colors not recorded because they don’t match
personal preference, miscounting). Teachers should capitalize on naturally-occurring “errors” that happen
when collecting data in the classroom and help students speculate about the impact of these errors on the final
results. For example, when asking students to vote for their favorite food, it
is common for students to vote twice, to forget to vote, to record their vote
in the wrong spot, to misunderstand what is being asked, to change their minds,
or to want to vote for an option that is not listed. Counting errors are also
common among young children and can lead to incorrect tallies of data points in
categories. Teachers can help students think about how these events might
affect the final outcome if only one person did this, if several people did it,
or if many people did it. Students can generate additional examples of ways
that errors might occur in a particular data-gathering situation.
The notions of error
and variability should be used to explain the outliers, clusters, and gaps that
students observe in the graphical representations of the data. An understanding
of error versus natural or expected variability will help students to interpret
whether an outlier is usual (to be expected) or unusual (could
it be a recording error?).
At level A, it is
imperative that students begin to understand this concept of variability. As
students move from Level A to Level B, then Level C, it is important to always
keep at the forefront that understanding
variability is the essence of developing data sense.
The role of probability
Level A students need to develop basic ideas of probability in order to
support their later use of probability in drawing inferences at levels B and C.
At level A, students should understand that probability is a measure of the chance that something will happen. It
is a measure of certainty or uncertainty. Events should be seen as lying on
a continuum from impossible to certain, with less likely, equally likely, and
more likely lying in between. Students learn to informally assign numbers to
the likelihood that something will occur. An example of assigning numbers on a
number line is given below.
0               1/4                1/2                 3/4             1
|---------------|------------------|-------------------|---------------|
Impossible      Unlikely or        Equally likely to   Likely or       Certain
                less likely        occur and not occur more likely
Students should have experiences finding probabilities using empirical data. Through experimentation (or simulation), students should develop an explicit understanding of the notion that the more times you repeat an experiment, the closer the results will tend to be to the expected mathematical model. At Level A we are only considering simple models based on equally likely outcomes or, at the most, something based on this such as the sum of the faces on two number cubes. For example, very young children can state that a penny should land on heads half the time and on tails half of the time when flipped. The student has given the expected model and probability for tossing a head or tail, assuming that the coin is ‘fair’. However, if a child flips a penny 10 times to obtain empirical data, it is quite possible that s/he will not get 5 heads and 5 tails. If all children in the class flip a penny 10 times and the results are aggregated across the class, we would expect to see the results begin stabilizing toward the expected probabilities of 50% heads and 50% tails. This is known as the Law of Large Numbers. Thus, at Level A, probability experiments should focus on obtaining empirical data to develop relative frequency interpretations that children can easily translate to models with known and understandable ‘mathematical’ probabilities. The classic activities of flipping coins, spinning simple spinners, and tossing a number cube are reliable tools to use in helping Level A students develop an understanding of probability. The concept of relative frequency interpretations will be important at Level B when the student works with proportional reasoning – going from counts or frequencies to proportions or percentages.
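A classroom coin-flipping activity can also be imitated with a short simulation. The sketch below assumes a fair coin modeled by a pseudo-random number generator; the seed and the numbers of flips are arbitrary choices made for illustration.

```python
import random

def proportion_heads(num_flips, rng):
    """Simulate num_flips tosses of a fair coin and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

rng = random.Random(2024)  # seeded only so the sketch is repeatable

# The proportion of heads for 10 flips can be far from 0.5;
# for many flips it tends to settle near 0.5.
for num_flips in (10, 100, 1000, 10000):
    print(f"{num_flips:>6} flips: proportion of heads = {proportion_heads(num_flips, rng):.3f}")
```

Aggregating the results of every child's 10 flips across the class plays the same role as increasing the number of flips here.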
As students work with empirical data, such as flipping a coin, they can develop an understanding of the concept of randomness. They will see that when flipping a coin 10 times, although we would expect 5 heads and 5 tails, the actual results will vary from one student to the next. They will also see that if a head results, that doesn’t mean that the next flip will result in a tail. With a random process, there is always uncertainty as to how the coin will land from one toss to the next. However, at Level A, students can begin to develop the notion that although we have uncertainty and variability in our results, by examining what happens to the random process in the long run, we can quantify the uncertainty and variability with probabilities – giving a predictive number for the likelihood of an outcome in the long run. At Level B, students will develop the concept of the simple random sample and will see the role that probability plays with randomness.
Conclusion
If students become comfortable with the ideas and concepts described above, they will be prepared to further develop and enhance their understanding of the key concepts for data sense at Level B.
It is also important
to recognize that helping students develop data sense at Level A allows
mathematics instruction to be driven by data.
The traditional mathematics strands of algebra, functions, geometry, and
measurement can all be developed with the use of data. Making sense of data
should be an integrated part of the mathematics curriculum starting in kindergarten.
Objectives of Level B
Instruction at Level B should build on the statistical base developed at Level A and set the stage for statistics at Level C. Instructional activities at Level B should continue to emphasize the four main components in the investigative process, and should have the spirit of genuine statistical practice. Students who complete Level B should see statistical reasoning as a process for solving problems through data and quantitative reasoning. At Level B:
Students become more aware of the statistical question distinction (a question with an answer based on data that vary versus a question with a deterministic answer).
Students make decisions about what variables to measure and how to measure them in order to address the question posed.
Students use and expand the graphical, tabular and numerical summaries introduced at Level A to investigate more sophisticated problems.
Students develop a basic understanding of the role of probability in random selection when selecting a sample and in random assignment when conducting an experiment.
Students investigate problems with more emphasis placed on possible associations among two or more variables and understand how a more sophisticated collection of graphical, tabular and numerical summaries is used to address these questions.
Students recognize ways that statistics is used or misused in their world.
Specifically, Level B recommendations in the Investigative Process include:
1. Formulate Questions
Students begin to pose their own questions
Students address questions involving a group larger than their classroom and begin to recognize the distinction between a population, a census, and a sample
2. Collect Data
Students conduct censuses of two or more classrooms
Students conduct non-random sample surveys and begin to use random selection
Students conduct comparative experiments and begin to use random allocation
3. Analyze Data
Students expand understanding of a data distribution
Students quantify variability within a group
Students compare two or more distributions in displays and with summary measures
Students use expanded tools for summarizing and comparing distributions including:
Histograms
The IQR (Inter-Quartile Range) and MAD (Mean Absolute Deviation)
Five-Number Summaries and Boxplots
Students acknowledge sampling error
Students quantify the strength of association between two variables, develop simple models for association between two numeric variables, and use expanded tools for exploring association including:
Contingency Tables for two categorical variables
Time Series
The QCR (Quadrant Count Ratio) as a measure of strength of association
Simple lines for modeling association between two numeric variables
4. Interpret Results
Students describe differences between two or more groups with respect to center, spread, and shape
Students acknowledge that a sample may not be representative of the larger population
Students understand basic interpretations of measures of association and models for association
Students become aware of the distinction between an observational study and a designed experiment
Students become aware of the distinction between “association” and “cause and effect”
Students recognize sampling variability in summary measures
Example 1, Level A Revisited: Choosing a Band for the School Dance
Many of the graphical, tabular and numerical summaries introduced at Level A can be used and expanded to investigate more sophisticated problems at Level B. Let’s revisit planning for the school dance problem introduced in Level A where a Level A class investigated the question:
What type of music is most popular among students in our
class?
by conducting a census of the class. That is, the class was considered to be the entire population and data were collected on every member of the population. A similar investigation at Level B would include recognition that one class may not be representative of the opinions of all students at their school, and Level B students might want to compare the opinions of their class with other classes from their school. A Level B class might investigate the questions:
What type of music is most popular among students at our
school?
How do the favorite types of music compare between
different classes?
Since class sizes may be different,
in order to make comparisons, results should be summarized with relative
frequencies or percentages. Percentages are useful in that they allow us
to think of having comparable results for groups of size 100. Level B students
will see more emphasis in proportional reasoning throughout the mathematics
curriculum, and they should be comfortable summarizing and interpreting data in
terms of percentages or fractions.
The results from two classes are
summarized in the table below using both frequency counts and relative
frequency percentages.
                 Class 1                                 Class 2
Favorite   Frequency   Relative Frequency    Favorite   Frequency   Relative Frequency
Country        8             33%             Country        5             17%
Rap           12             50%             Rap           11             37%
Rock           4             17%             Rock          14             47%
Total         24            100%             Total         30            101%
The bar graph below compares the
percentage for favorite music between the two classes.

Students at Level B should begin
to recognize that there is not only variability from one individual to another
within a group, but that there is variability in results from one group to
another. This second type of variability is illustrated by the fact that in
Class 1 the most popular music is rap music while in Class 2 it is rock music.
That is, the mode for Class 1 is rap music, while the mode for Class 2 is rock
music.
The results from the two samples might be combined in order to have a larger sample of the entire school. The combined results indicate that rap music was the favorite type of music for 43% of the students, rock music was preferred by 33%, while only 24% of the students selected country music as their favorite. Level B students should recognize that although this is a larger sample, it may not be representative of the entire population (all students at their school). In statistics, randomness and probability are incorporated into the sample selection procedure in order to provide a method that is fair and to improve the chances of selecting a representative sample. For example, if the class decides to select what is called a simple random sample of 54 students, then each possible sample of 54 students has the same probability of being selected. This application of probability illustrates one of the roles of probability in statistics. Although Level B students may not actually employ a random selection procedure when collecting data, issues related to obtaining representative samples should be discussed at Level B.
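The percentages quoted above can be checked with a short calculation. The sketch below is illustrative only; the function name relative_frequencies is an assumption, and the counts are taken from the table for the two classes.

```python
# Frequency counts for favorite music type, taken from the table above.
class1 = {"Country": 8, "Rap": 12, "Rock": 4}
class2 = {"Country": 5, "Rap": 11, "Rock": 14}


def relative_frequencies(counts):
    """Convert frequency counts to percentages rounded to whole numbers."""
    total = sum(counts.values())
    return {music: round(100 * n / total) for music, n in counts.items()}


# Combine the two classes by adding the counts for each music type.
combined = {music: class1[music] + class2[music] for music in class1}

print(relative_frequencies(class1))    # Class 1 percentages
print(relative_frequencies(class2))    # Class 2 percentages
print(relative_frequencies(combined))  # combined sample of 54 students

# The mode is the category with the largest count.
print(max(class1, key=class1.get), max(class2, key=class2.get))
```

Because each percentage is rounded separately, the Class 2 column adds to 101% rather than 100%.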
Connecting Two Categorical Variables
Since rap was the most popular music for the combined two classes, the students might argue for a rap band for the dance. However, more than half of those surveyed preferred either rock or country music. Will these students be unhappy if a rap band is chosen? Not necessarily, since many students who like rock music may also like rap music. To investigate this problem, students might explore two additional questions.
Do students who like rock music tend to like or dislike rap
music?
Do students who like country music tend to like or dislike
rap music?
To address these questions, the survey should ask students not only their favorite type of music, but also whether or not they like rap, rock, and country music.
The Two-Way Frequency Table (or Contingency Table) below provides a way
to investigate possible connections between two categorical variables.
Two-Way Frequency Table

                            Like Rap Music?
                            Yes      No      Row Totals
Like Rock Music?   Yes       27       6          33
                   No         4      17          21
Column Totals                31      23          54
According to these results, of the 33 students who liked rock music, 27 also liked rap music. That is, 82% (27/33) of the students who like rock music also like rap music. This indicates that students who like rock music tend to like rap music as well. A similar analysis could be performed to determine if students who like country tend to like or dislike rap music. Once again, notice the use of proportional reasoning in interpreting these results.
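The row percentages used in this interpretation can be computed directly from the two-way table. The following is a small sketch; the dictionary layout and the function name pct_like_rap are assumptions chosen for illustration.

```python
# Counts from the two-way table, keyed by (likes rock?, likes rap?).
table = {
    ("Yes", "Yes"): 27, ("Yes", "No"): 6,
    ("No", "Yes"): 4,   ("No", "No"): 17,
}


def pct_like_rap(likes_rock):
    """Percentage who like rap among students with the given rock answer."""
    row_total = table[(likes_rock, "Yes")] + table[(likes_rock, "No")]
    return 100 * table[(likes_rock, "Yes")] / row_total


print(round(pct_like_rap("Yes")))  # among students who like rock
print(round(pct_like_rap("No")))   # among students who do not like rock
```

Comparing the two row percentages (about 82% versus about 19%) is what supports the statement that students who like rock music tend to like rap music as well.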
Questionnaires and Their Difficulties
At Level B, students should begin to learn about surveys and the many pitfalls to avoid when designing and conducting a survey. One issue involves the wording of questions. Questions must be unambiguous and easy to understand. For example, the question:
Are you against the school implementing a non-door policy on bathroom
stalls?
is worded in a confusing way. An alternative way to pose
this question is:
The school is considering implementing a non-door policy on bathroom
stalls. What is your opinion regarding this policy?
Strongly Oppose     Oppose     No Opinion     Support     Strongly Support
Questions should avoid leading the respondent to an answer.
For example, the question
Since our football team hasn’t had a winning season in 20 years and is
costing the school money rather than generating funds, do you feel we should
concentrate more on another sport such as soccer or basketball?
is worded in a way that is biased against the football team.
The responses to a question with coded responses should include all possible answers, and the answers should not overlap. For example, for the question:
How much time do you spend studying at home on a typical night?
the responses:
none     up to 1 hour     more than 1 hour
would confuse a student who spends 1 hour a night studying.
There are many other issues concerning question formulation
and conducting sample surveys. One issue that should be discussed at Level B
involves how the interviewer asks the questions as well as how accurately the
responses are recorded. It is important for students to realize that the
conclusions from their study depend on the accuracy of their data.
Measure of Center – The Mean as Balance Point
Another idea developed at Level A that can be expanded at Level B is the mean for a collection of numeric data. At Level A the mean is interpreted as the “fair share” value for data. That is, the mean is the value you would get if all the data are combined and then redistributed evenly so that each value is the same. Another interpretation of the mean is that it is the balance point of the corresponding data distribution. Following is an outline of an activity that illustrates the notion of the mean as a balance point.
Nine students were asked: “How many pets do you have?”
The resulting data are: 1, 3, 4, 4, 4, 5, 7, 8, 9. These data are summarized in the dot plot below.
Note that in the actual activity, stick-on notes are used as “dots” instead of X’s.
                X
                X
 X         X    X    X         X    X    X
-+----+----+----+----+----+----+----+----+-
 1    2    3    4    5    6    7    8    9
If the pets are combined into one group there are a total of 45 pets. If the pets are redistributed evenly among the 9 students, then each student would get 5 pets. That is, the mean number of pets is 5. The dot plot representing the result that all 9 students have exactly 5 pets is shown below:
                     X
                     X
                     X
                     X
                     X
                     X
                     X
                     X
                     X
-+----+----+----+----+----+----+----+----+-
 1    2    3    4    5    6    7    8    9
It is hopefully obvious that if a pivot is placed at the value 5 then the horizontal axis will “balance” at this pivot point. That is, the “balance point” for the horizontal axis for this dot plot is 5. What is the balance point for the dot plot displaying the original data?
We begin by noting what happens if one of the dots over 5 is removed and placed over the value 7 as shown below.
                     X
                     X
                     X
                     X
                     X
                     X
                     X
                     X         X
-+----+----+----+----+----+----+----+----+-
 1    2    3    4    5    6    7    8    9
Clearly, if the pivot remains at 5, the horizontal axis will
tilt right. What can be done to the remaining dots over 5 to “re-balance” the
horizontal axis at the pivot point?
Since 7 is 2 above 5, one
solution is to move a dot 2 below 5
to 3 as shown below:
                     X
                     X
                     X
                     X
                     X
                     X
           X         X         X
-+----+----+----+----+----+----+----+----+-
 1    2    3    4    5    6    7    8    9
Clearly, the horizontal axis is now re-balanced at the pivot
point. Is this the only way to re-balance the axis at 5? Another way to
re-balance the axis at the pivot point would be to move two dots from 5 to 4 as
shown below:
                     X
                     X
                     X
                     X
                X    X
                X    X         X
-+----+----+----+----+----+----+----+----+-
 1    2    3    4    5    6    7    8    9
The horizontal axis is now re-balanced at the pivot point. That is, the “balance point” for the horizontal axis for this dot plot is 5. Replacing each “X” (dot) in this plot with the distance between the value and 5, we have:
                     0
                     0
                     0
                     0
                1    0
                1    0         2
-+----+----+----+----+----+----+----+----+-
 1    2    3    4    5    6    7    8    9
Notice that the total distance for the two values below the 5 (the two 4’s) is the same as the total distance for the one value above the 5 (the 7). For this reason, the balance point of the horizontal axis is 5. Replacing each value in the dot plot of the original data by its distance from 5 yields the following plot.
                1
                1
 4         2    1    0         2    3    4
-+----+----+----+----+----+----+----+----+-
 1    2    3    4    5    6    7    8    9
Notice that the total distance for the values below 5 is 9, the same as the total distance for the values above 5. For this reason, the mean (5) is the balance point of the horizontal axis.
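The balance-point property can be verified numerically for the pets data. This is a minimal sketch; the variable names are assumptions chosen for illustration.

```python
# Number of pets reported by the nine students.
pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]

# "Fair share" value: pool all the pets and redistribute them evenly.
mean = sum(pets) / len(pets)

# Total distance from the mean for values below it and for values above it.
distance_below = sum(mean - x for x in pets if x < mean)
distance_above = sum(x - mean for x in pets if x > mean)

print(mean)                            # 5.0
print(distance_below, distance_above)  # 9.0 9.0
```

The two totals agree, which is what it means for the axis to balance at the mean. Trying another pivot, such as 4, gives unequal totals.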
Both the mean and median are often referred to as measures of center. At Level A the median was also introduced as the quantity that has the same number of data values on each side of it in the ordered data. This sameness of each side is the reason the median is a measure of center. The previous activity demonstrates that the total distance for the values below the mean is the same as the total distance for the values above the mean and illustrates why the mean is also considered to be a measure of center.
A Measure of Spread – The Mean Absolute Deviation
Statistics is concerned with variability in data. One important idea is to quantify how much variability exists in a collection of numerical data. Quantities that measure the degree of variability in data are called measures of spread. At Level A students are introduced to the Range as a measure of spread in numeric data. At Level B students should be introduced to the idea of comparing data values to a central value such as the mean or the median, and quantifying how different the data are from this central value.
In the number of pets example, how different are the original data values from the mean? One way to measure the degree of variability from the mean is to determine the total distance for all values from the mean. Using the final dot plot from the previous example, the total distance the nine data values are from the mean of 5 pets is 18 pets. The magnitude of this quantity depends on several factors, including the number of measurements. To adjust for the number of measurements, the total distance from the mean is divided by the number of measurements. The resulting quantity is called the Mean Absolute Deviation or MAD. The MAD is the average distance of all the data from the mean. That is,
MAD = (Total Distance for all Values from the Mean) / (Number of Data Values)
The MAD for the data on number of pets from the previous activity is:
MAD = 18/9 = 2
The MAD indicates that the actual number of pets for the 9 students differs from the mean of 5 pets on average by 2 pets.
The MAD is an indicator of spread based on all the data and provides a measure of average variation in the data from the mean. The MAD also serves as a precursor to the standard deviation developed at Level C.
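The MAD calculation for the pets data can be sketched as follows. The function name mean_absolute_deviation is an assumption for illustration and is not a standard library routine.

```python
def mean_absolute_deviation(values):
    """Average distance of the data values from their mean."""
    mean = sum(values) / len(values)
    total_distance = sum(abs(x - mean) for x in values)
    return total_distance / len(values)


pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]

total_distance = sum(abs(x - 5) for x in pets)  # distances from the mean of 5
print(total_distance)                 # 18
print(mean_absolute_deviation(pets))  # 2.0
```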
Representing Data Distributions – The Frequency Table, Histogram
At Level B, students should develop additional tabular and graphical devices for representing data distributions for numeric variables. Several of these can build upon representations developed at Level A. For example, students at Level B might explore the problem of placing an order for hats. To prepare an order, one needs to know which hat sizes are most common and which occur least often. To obtain information about hat sizes it is necessary to measure head circumferences. In planning an order for adults, students might collect preliminary data on the head circumferences of their parents, guardians, or other adults. Such data would be the result of a non-random sample. The data summarized in the following stem and leaf plot are head circumferences measured in millimeters for a sample of 55 adults.
51 | 3
52 | 5
53 | 133455
54 | 2334699
55 | 12222345
56 | 0133355588
57 | 113477
58 | 02334458
59 | 1558
60 | 13
61 | 28
Based on the stem and leaf plot, some head sizes do appear to be more common than others. Head circumferences in the 560’s are most common. Head circumferences fall off in somewhat of a symmetric manner on both sides of the 560’s with very few smaller than 530 mm or larger than 600 mm.
In practice, a decision of how many hats to order would be based on a much larger sample, possibly hundreds or even thousands of adults. If a larger sample were available, a stem and leaf plot would not be a practical device for summarizing the data distribution. An alternative to the stem and leaf display is to form a distribution based on groups or intervals of data. This method can be illustrated through a smaller data set such as the 55 head circumferences but is applicable for larger sets as well. The grouped frequency and grouped relative frequency distributions and the relative frequency histogram that correspond to the above stem-and-leaf plot are:
        Limits on Head    Interval of Head                   Relative
Stem    Circumference     Circumferences      Frequency      Frequency (%)
51      510 - 519         510 - < 520             1               1.8
52      520 - 529         520 - < 530             1               1.8
53      530 - 539         530 - < 540             6              10.9
54      540 - 549         540 - < 550             7              12.7
55      550 - 559         550 - < 560             8              14.5
56      560 - 569         560 - < 570            10              18.2
57      570 - 579         570 - < 580             6              10.9
58      580 - 589         580 - < 590             8              14.5
59      590 - 599         590 - < 600             4               7.3
60      600 - 609         600 - < 610             2               3.6
61      610 - 619         610 - < 620             2               3.6
TOTAL                                            55              99.8
Relative Frequency (%)

If the hat manufacturer requires that orders be in multiples of 250 hats then, based on the above results, how many hats of each size should be ordered? Using the relative frequency distribution, the number of each size to order for an order of 250 hats would be:
Hat Size Number to Order
51 5
52 5
53 27
54 32
55 36
56 46
57 27
58 36
59 18
60 9
61 9
Once again, notice how students at Level B would utilize proportional reasoning in determining the number to order of each size.
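The order quantities in the table can be reproduced from the relative frequency column. The sketch below assumes that each quantity is the relative frequency applied to 250 and rounded half up to a whole hat; this rounding rule is an assumption that happens to match the table, and the function name hats_to_order is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Relative frequencies (%) by hat size (stem), from the grouped table above.
relative_frequency_pct = {
    51: "1.8", 52: "1.8", 53: "10.9", 54: "12.7", 55: "14.5", 56: "18.2",
    57: "10.9", 58: "14.5", 59: "7.3", 60: "3.6", 61: "3.6",
}


def hats_to_order(pct, order_size):
    """Apply a percentage to the order size, rounding half up to a whole hat."""
    exact = Decimal(pct) * order_size / 100
    return int(exact.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


order = {size: hats_to_order(pct, 250) for size, pct in relative_frequency_pct.items()}
print(order)
print(sum(order.values()))
```

Decimal arithmetic is used so that values such as 45.5 round up reliably. Rounded quantities do not always add up to the order size, so the total should be checked, as the last line does.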
Comparing Distributions – The Boxplot
Problems that require comparing distributions between two or more groups are common in statistics. For example, at Level A students compared the amount of sodium between “beef” and “poultry” hotdogs by examining parallel dotplots. At Level B more sophisticated representations should be developed for comparing distributions. One of the most useful graphical devices for comparing distributions of numeric data is the boxplot. The boxplot (also called a box-and-whiskers plot) is a graph based on a division of the ordered data into four groups with the same number of data values in each group (approximately one-fourth). The four groups are determined from the Five-Number Summary (the minimum data value, the first quartile, the median, the third quartile, and the maximum data value). The Five-Number Summaries and comparative box plots for the data on sodium content for beef (labeled B) and poultry (labeled P) hot dogs introduced in Level A are given below.
                  Beef Hot Dogs (n = 20)    Poultry Hot Dogs (n = 17)
Minimum                   253                        357
First Quartile            320.5                      379
Median                    380.5                      430
Third Quartile            478                        535
Maximum                   645                        588

Interpreting results based on a boxplot analysis requires comparisons based on global characteristics of each distribution (center, spread, and shape). For example, the median sodium content for poultry hot dogs is 430 mg, almost 50 mg more than the median sodium content for beef hot dogs. The medians indicate that a central value for poultry hot dogs typically has more sodium than a central value for beef hot dogs. The range for the beef hot dogs is 392 mg versus 231 mg for the poultry hot dogs. The ranges indicate that overall, there is more spread (variation) in beef hot dogs compared to poultry hot dogs. Another measure of spread that should be introduced at Level B is the interquartile range or IQR. The IQR is the difference between the third and first quartiles and indicates the range of the middle 50% of data. The IQRs are 157.5 mg for beef and 156 mg for poultry hot dogs. The IQRs suggest that the spread within the middle half of data for beef hot dogs is similar to the spread within the middle half of data for poultry hot dogs. The box plots also suggest that each distribution is slightly skewed right. That is, each distribution appears to have somewhat more variation in the upper half. Finally, it is interesting to note that more than 25% of beef hot dogs have less sodium than all poultry hot dogs. On the other hand, the highest sodium levels are for beef hot dogs.
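The ranges and IQRs quoted above follow directly from the Five-Number Summaries. The following sketch shows the arithmetic; the dictionary keys and the function name spread_measures are assumptions for illustration.

```python
# Five-Number Summaries for sodium content (mg), from the table above.
beef = {"min": 253, "q1": 320.5, "median": 380.5, "q3": 478, "max": 645}
poultry = {"min": 357, "q1": 379, "median": 430, "q3": 535, "max": 588}


def spread_measures(summary):
    """Return (range, IQR) computed from a five-number summary."""
    data_range = summary["max"] - summary["min"]
    iqr = summary["q3"] - summary["q1"]
    return data_range, iqr


print(spread_measures(beef))                # (392, 157.5)
print(spread_measures(poultry))             # (231, 156)
print(poultry["median"] - beef["median"])   # 49.5

# The first quartile for beef is below the poultry minimum, so more than
# 25% of beef hot dogs have less sodium than every poultry hot dog.
print(beef["q1"] < poultry["min"])          # True
```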
Note that there are several variations of boxplots. At Level C a boxplot analysis might include an analysis for outliers (values that are extremely large or small when compared to the variation in the majority of the data). If outliers are identified, these are often detached from the “whiskers” of the plot. Outlier analysis is not appropriate at Level B and whiskers are attached to the minimum and maximum data values.
Measuring the Strength of Association between Two Quantitative Variables
At Level B, more sophisticated data representations should be developed for the investigation of problems that involve the study of the relationship between two numeric variables. At Level A, the problem of packaging sweatsuits (shirt and pants together or separate) was examined through a study of the relationship between height and arm span. There are several statistical questions related to this problem that can be addressed at Level B which require a more in-depth analysis of the height-arm span data. For example,
How strong is the association between height and arm span?
Is height a useful predictor of arm span?
Following are data on height and arm span measured in centimeters for 26 students. For convenience, the data on height have been ordered.
Height Arm Span Height Arm Span
155 151 173 170
162 162 175 166
162 161 176 172
163 172 176 172
164 167 178 173
164 155 178 166
165 164 181 183
165 164 183 181
166 167 183 179
166 164 183 174
168 165 183 179
171 164 185 177
171 168 188 185
The height and arm span data (measured in centimeters) for 26 students are displayed in the following scatterplot. The scatterplot suggests a fairly strong increasing relationship between height and arm span, and the relationship between height and arm span appears to be quite linear.

Measuring the strength of association between two variables is an important statistical concept that should be introduced at Level B. The scatterplot below for the Height/Arm Span data includes a vertical line drawn through the mean height and a horizontal line drawn through the mean arm span.

The two lines divide the scatter plot into four regions (or quadrants). The upper right region (Quadrant 1) contains points that correspond to individuals with above average height and above average arm span. The upper left region (Quadrant 2) contains points that correspond to individuals with below average height and above average arm span. The lower left region (Quadrant 3) contains points that correspond to individuals with below average height and below average arm span. The lower right region (Quadrant 4) contains points that correspond to individuals with above average height and below average arm span.
Notice that most points in the scatter plot are in either Quadrant 1 or Quadrant 3. That is, most people with above average height also have above average arm span (Quadrant 1) and most people with below average height also have below average arm span (Quadrant 3). In the data listed above, one person has below average height with above average arm span (Quadrant 2) and two people have above average height with below average arm span (Quadrant 4). These results indicate that there is a positive association between the variables height and arm span. Generally stated, two numeric variables are positively associated when above average values of one variable tend to occur with above average values of the other and when below average values of one variable tend to occur with below average values of the other. Negative association between two numeric variables can be stated in a similar way.
A correlation coefficient is a quantity that measures the direction and strength of an association between two variables. Note that in the previous example, points in Quadrants 1 and 3 contribute to the positive association between height and arm span, and there are a total of 23 points in these two quadrants. Points in Quadrants 2 and 4 contribute to the negative association between height and arm span, and there are a total of 3 points in these two quadrants. One correlation coefficient between height and arm span is given by the QCR (Quadrant Count Ratio):
QCR = (23 – 3) / 26 = .77
A QCR of .77 indicates that there is a fairly strong positive association between the two variables height and arm span. This indicates that a person’s height is a useful predictor of his/her arm span.
In general, the QCR is defined as:
QCR = [(Number of Points in Quadrants 1 & 3) – (Number of Points in Quadrants 2 & 4)] / (Number of Points in all Four Quadrants)
The QCR has the following properties:
The QCR is unitless
The QCR is always between –1 and +1 inclusive
The QCR is a measure of the strength of association based only on the number of points in each quadrant and, consequently, has its shortcomings. At Level C the shortcomings of the QCR can be addressed and used as a foundation for developing Pearson’s correlation coefficient.
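The QCR for the height and arm span data can be computed with a short sketch like the one below. The function name quadrant_count_ratio is an assumption for illustration. Points lying exactly on one of the mean lines are left out of both counts, which is also an assumption; no such points occur in this data set.

```python
# (height, arm span) pairs in centimeters, copied from the data table above.
data = [
    (155, 151), (162, 162), (162, 161), (163, 172), (164, 167), (164, 155),
    (165, 164), (165, 164), (166, 167), (166, 164), (168, 165), (171, 164),
    (171, 168), (173, 170), (175, 166), (176, 172), (176, 172), (178, 173),
    (178, 166), (181, 183), (183, 181), (183, 179), (183, 174), (183, 179),
    (185, 177), (188, 185),
]


def quadrant_count_ratio(points):
    """Return (QCR, count in Quadrants 1 & 3, count in Quadrants 2 & 4)."""
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    # Same-sign deviations place a point in Quadrant 1 or 3;
    # opposite-sign deviations place it in Quadrant 2 or 4.
    q13 = sum((x - mean_x) * (y - mean_y) > 0 for x, y in points)
    q24 = sum((x - mean_x) * (y - mean_y) < 0 for x, y in points)
    return (q13 - q24) / (q13 + q24), q13, q24


qcr, q13, q24 = quadrant_count_ratio(data)
print(q13, q24)        # 23 3
print(round(qcr, 2))   # 0.77
```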
Modeling Linear Association
The height – arm span data were collected at Level A in order to study the problem of packaging sweatsuits. Should the shirt and pants be packaged separately or together? A QCR of .77 suggests a fairly strong positive association between height and arm span, which indicates that height is a useful predictor of arm span and that the shirt and pants could be packaged together. If packaged together, how can a person decide which size sweatsuit to buy? Certainly, the pant-size of a sweatsuit depends on a person’s height and the shirt-size depends on a person’s arm span. Since many people know their height, but may not know their arm span, can height be used to help people decide which size sweatsuit they wear? Specifically,
Can the relationship between height and arm span be described using a linear function?
The study of linear relationships will be introduced in other areas of the mathematics curriculum to students at Level B. The degree to which these ideas have been developed will determine how we might proceed at this point. For example, if students have not yet been introduced to the equation of a line, then they might simply draw a line through the “center of the data” as shown below:

This line can be used to predict a person’s arm span if his or her height is known. For example, to predict the arm span for a person who is 170 cm tall, we would draw a vertical line up from the X-axis at Height = 170. At the point this vertical line intersects the line, draw a horizontal line to the Y-axis. The value where this horizontal line intersects the Y-axis is the predicted arm span. Based on the graph above, it appears that we would predict an arm span of approximately 166 cm for a person who is 170 cm tall.
If students are familiar with the equation for a line and know how to determine the equation from two points, then they might determine the Mean – Mean line, which is determined as follows. Order the data according to the X-coordinates and divide the data into two “halves” based on this ordering. In the case that you have an odd number of measurements, remove the middle point from the analysis. Determine the means for the X-coordinates and Y-coordinates in each half and determine the equation of the line that passes through these two points. Using the previous data:
Lower Half (13 Points)        Upper Half (13 Points)
Mean Height = 164.8           Mean Height = 180.2
Mean Arm Span = 163.4         Mean Arm Span = 175.2
The equation of the line that goes through the points (164.8, 163.4) and (180.2, 175.2) is: Predicted Arm Span ≈ 37.1 + .766(Height). This equation can be used to predict a person’s arm span more accurately than an eyeball line. For example, if a person is 170 cm tall, then we would predict his/her arm span to be approximately 37.1 + .766(170) = 167.3 cm.
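The Mean – Mean line procedure can be sketched in code as follows. The function name mean_mean_line is an assumption for illustration. Because the sketch uses unrounded means, its slope (0.765) and intercept (about 37.3) differ slightly from the values in the text, which are based on means rounded to one decimal place.

```python
# (height, arm span) pairs in centimeters, copied from the data table above.
data = [
    (155, 151), (162, 162), (162, 161), (163, 172), (164, 167), (164, 155),
    (165, 164), (165, 164), (166, 167), (166, 164), (168, 165), (171, 164),
    (171, 168), (173, 170), (175, 166), (176, 172), (176, 172), (178, 173),
    (178, 166), (181, 183), (183, 181), (183, 179), (183, 174), (183, 179),
    (185, 177), (188, 185),
]


def mean_mean_line(points):
    """Return (intercept, slope) of the line through the two half-means."""
    ordered = sorted(points)             # order by X-coordinate (height)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]  # skips the middle point if n is odd

    def center(group):
        xs = [x for x, _ in group]
        ys = [y for _, y in group]
        return sum(xs) / len(xs), sum(ys) / len(ys)

    (x1, y1), (x2, y2) = center(lower), center(upper)
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope


intercept, slope = mean_mean_line(data)
predicted = intercept + slope * 170      # predicted arm span for height 170 cm
print(round(slope, 3), round(intercept, 1), round(predicted, 1))
```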
The Role of Randomness
In statistics, we will often want
to extend the results beyond the particular group studied to a larger group,
the population. We are trying to
gain information about the population by examining a portion of the population,
called a sample. Such
generalizations are only valid, however, if the data are representative of that
larger group. A representative sample is one in which the relevant
characteristics of the sample members are generally the same as the
characteristics of the population. Improper or biased sample selection tends to
systematically favor certain outcomes and can produce misleading results and
erroneous conclusions.
Random sampling is a way to remove
bias in sample selection and tends to produce representative samples. Random
sampling attempts to reduce bias in sample selection by being fair to each
member of the population. At Level B, students should experience the
consequences of non-random selection and develop a basic understanding of the
principles involved in random selection procedures. Following
is a description of an activity that allows students to compare sample results that are based on personal (non-random)
selection versus sample results that are based on random selection.
Consider the 80 circles on the next page. What is the average diameter for these 80 circles? Each student should take about 15 seconds and select five circles that he/she thinks best represent the sizes of all 80 circles. After selecting their sample, each student should find the average diameter for the circles in her/his personal sample. Note that the diameter for the small circles is 1 cm, for the medium sized circles is 2 cm, and for the large circles is 3 cm.
Next, each student should number the circles from 1 to 80 and use a table of random digits to select a random sample of size 5. Each student should find the average diameter for the circles in her/his random sample. The sample mean diameters for the entire class results can be summarized for the two selection procedures with back-to-back stem and leaf plots.
How do the means for the two sample selection procedures compare with the true mean diameter of 1.25 cm? Personal selection will tend to yield sample means that are generally larger than 1.25. That is, personal selection tends to be biased with a systematic favoring toward the larger circles and an overestimation of the population mean. Random selection tends to produce sample means that both underestimate and overestimate the population mean, but on the average, yield the population mean. That is, random selection tends to be unbiased.
In the previous example, the fact that the sample means vary
from one sample to another illustrates an idea that was
introduced earlier in the favorite music type survey. This is the notion
that results vary from one sample to another. Imposing randomness into the sampling procedure allows us to use probability to predict the
long-run behavior in the variability in the sample means from repeated
sampling. The variation in results from repeated sampling is described through
what is called the sampling distribution.
Sampling distributions will be explored in depth at Level C.
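The behavior of random selection in the circles activity can be sketched with a short simulation. The composition of the 80 circles used here (65 small, 10 medium, 5 large) is an assumption chosen only so that the population mean is 1.25 cm; the actual page of circles may differ, and the function name is hypothetical.

```python
import random
import statistics

# Hypothetical population: 65 small (1 cm), 10 medium (2 cm), 5 large (3 cm).
# This mix is an assumption that gives a mean diameter of 1.25 cm.
circles = [1] * 65 + [2] * 10 + [3] * 5


def random_sample_means(population, sample_size, n_samples, seed=0):
    """Return the means of n_samples simple random samples (without replacement)."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


population_mean = statistics.mean(circles)
means = random_sample_means(circles, sample_size=5, n_samples=1000)

print("population mean:", population_mean)
print("average of the sample means:", round(statistics.mean(means), 2))
print("smallest and largest sample mean:", min(means), max(means))
```

Individual sample means fall on both sides of the population mean, while their average should land close to it. A class can compare this pattern with the means from its personal selections.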
Eighty Circles

Comparative Experiments
Another important statistical idea that
should be introduced at Level B is that of comparative experimental studies. Comparative experimental studies involve
comparisons of the effects of two or more treatments.
At Level B, studies comparing two treatments are adequate. For example, we
might want to study the effects of listening to rock music on one’s ability to
memorize. Before undertaking a study such as this, it
is important that students have the opportunity to identify and, as much as
possible, adjust for potential extraneous sources of variation that may interfere
with the results. To address these issues, the class needs to develop a design
strategy for collecting appropriate data.
One simple experiment would be to randomly divide the class into two equal
sized (or near equal sized) groups. Random assignment provides a fair way to assign students to each
group that tends to average out
differences in student ability and other characteristics that might affect the
result. For example, suppose a class has 28 students. The 28 students are randomly assigned to
one of two groups of 14. One way to accomplish this is to place 28 pieces
of paper in a box, 14 labeled “M” and 14 labeled “S.” Shake the box well and have each student
randomly choose a piece of paper. The 14
M’s will listen to music and the 14 S’s will have silence.
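The paper-in-a-box procedure can also be carried out with a random number generator. The following is a minimal sketch, assuming a roster of 28 students identified by number; the roster and the function name are hypothetical.

```python
import random


def assign_groups(students, seed=None):
    """Shuffle labels 'M' (music) and 'S' (silence) and deal one to each student."""
    half = len(students) // 2
    labels = ["M"] * half + ["S"] * (len(students) - half)
    rng = random.Random(seed)
    rng.shuffle(labels)  # plays the role of shaking the box
    return dict(zip(students, labels))


# Hypothetical roster of 28 students, identified by number.
roster = [f"student_{i:02d}" for i in range(1, 29)]
assignment = assign_groups(roster, seed=1)

music = [s for s, g in assignment.items() if g == "M"]
silence = [s for s, g in assignment.items() if g == "S"]
print(len(music), "students listen to music;", len(silence), "work in silence")
```

Because the labels are shuffled before being dealt out, every split of the class into two groups of 14 is equally likely, which is the property that random assignment relies on.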
Each student will be shown a list
of words. Rules for how long students
have to study the words and how long they have to reproduce the words must be
determined. For example, students may
have two minutes to study the words, a one-minute pause, and then two minutes
to reproduce (write down) as many words as possible. The number of words
remembered under each condition (listening to music or silence) is the variable
of interest.
The five-number summaries and
comparative box plots for a hypothetical set of data are shown below. These
results suggest that students generally memorize fewer words when listening to
music compared to when there is silence.
With the exception of the maximum value in the Music Group (which is
classified as an outlier), all summary measures for the Music Group (labeled M)
are lower than the corresponding summary measures for the Silence Group
(labeled S). Without the outlier, the degree of variation in the scores appears
to be similar for both groups, and both distributions appear to be reasonably
symmetric.
Five-Number Summaries
                 Music   Silence
Minimum            3        6
First Quartile     6        8
Median             7       10
Third Quartile     9       12
Maximum           15       14
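The text classifies the maximum of 15 in the Music Group as an outlier but does not state the rule used. One common convention, assumed here, flags any value more than 1.5 times the interquartile range beyond a quartile. The sketch below applies that convention to the five-number summaries; the function name is hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences using the k * IQR convention (an assumption)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Five-number summaries from the table above: (min, Q1, median, Q3, max).
summaries = {"Music": (3, 6, 7, 9, 15), "Silence": (6, 8, 10, 12, 14)}

for group, (minimum, q1, median, q3, maximum) in summaries.items():
    lower, upper = outlier_fences(q1, q3)
    print(group, "IQR:", q3 - q1, "fences:", (lower, upper),
          "max is outlier:", maximum > upper,
          "min is outlier:", minimum < lower)
```

Under this convention the upper fence for the Music Group is 13.5, so the maximum of 15 is flagged, while no value in the Silence Group falls outside its fences.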

Time Series
Another important
statistical idea that should be introduced at Level B is that of time
series. Problems that explore trends in
data over time are quite common. For
example, the population of the
How has the number of live births changed over the past 30 years?
The United States Census
Bureau publishes vital statistics in its annual Statistical Abstract of the
United States. The data below represent the number of live births per year for
residents of the United States since 1970.
Note that in 1970, the value 3,731 represents 3,731,000 live births.
Year Births(x1,000) Year Births(x1,000)
1970 3731 1985 3761
1971 3556 1986 3757
1972 3258 1987 3809
1973 3137 1988 3910
1974 3160 1989 4041
1975 3144 1990 4158
1976 3168 1991 4111
1977 3327 1992 4065
1978 3333 1993 4000
1979 3494 1994 3979
1980 3612 1995 3900
1981 3629 1996 3891
1982 3681 1997 3881
1983 3639 1998 3942
1984 3669 1999 3959
The scatterplot below shows the number of live births over time. This graph indicates that:
from 1970 to 1975, the number of live births generally declined,
from 1976 to 1990, the number of live births generally increased,
from 1991 to 1997, the number of live births generally declined,
and it appears that the number of live births may have started to increase since 1997.
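The trends read from the graph can be checked against the table directly. The following sketch computes the net change in births over each of the periods named above; the helper name is hypothetical and the values are copied from the table.

```python
# Live births in thousands, copied from the table above (1970 to 1999).
births = [
    3731, 3556, 3258, 3137, 3160, 3144, 3168, 3327, 3333, 3494,
    3612, 3629, 3681, 3639, 3669, 3761, 3757, 3809, 3910, 4041,
    4158, 4111, 4065, 4000, 3979, 3900, 3891, 3881, 3942, 3959,
]
by_year = dict(zip(range(1970, 2000), births))


def net_change(start, end):
    """Change in births (thousands) from year start to year end."""
    return by_year[end] - by_year[start]


for start, end in [(1970, 1975), (1976, 1990), (1991, 1997), (1997, 1999)]:
    print(f"{start} to {end}: {net_change(start, end):+d} thousand")

low_year = min(by_year, key=by_year.get)
high_year = max(by_year, key=by_year.get)
print("lowest:", low_year, by_year[low_year])
print("highest:", high_year, by_year[high_year])
```

A net change summarizes only the endpoints of a period, so it supports the word "generally" in the list above but does not rule out small year-to-year reversals such as the one between 1973 and 1974.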

Summary of Level B
Understanding the statistical concepts of Level B enables a student to begin to appreciate that data analysis is an investigative process consisting of formulating their own questions, collecting data through various sources (censuses, non-random and random sample surveys, and comparative experiments with random allocation), analyzing data through graphs and simple summary measures, and interpreting results with an eye toward inference to a population based on a sample. As they begin to formulate their own questions, they become aware that the world around them is filled with data that affect their own lives, and they begin to appreciate that statistics can help them make decisions based on data, logic and investigation, and not whim.
Level C
Levels A and B of the statistics guidelines introduce students to:
· Statistics as an investigatory process
· The importance of using data to answer appropriately framed questions
· Types of variables (categorical versus numerical)
· Graphical displays (bar graph, histogram, box plot, scatterplot)
· Numerical summaries (counts, proportions, mean, median, range, quartiles, interquartile range, correlation)
· Common designs of studies (census, simple random sample, randomized designs for experiments)
· The process of drawing conclusions from data
· The role of probability in statistical investigations
All of these ideas are revisited at Level C, but the types of studies emphasized here will be of a deeper statistical nature. Statistical studies at this level will require students to draw on basic concepts from earlier work, to extend the concepts to cover a wider scope of investigatory issues, and to develop a deeper understanding of inferential reasoning along with increased ability to explain that reasoning to others.
Objectives of Level C
At Level C, the emphasis will be on interpretation and the use of statistical methods to answer questions rather than on the mechanics of computing summary statistics or drawing graphs. In general, students should be able to
· Formulate questions that can be answered with data.
Specifically, Level C includes recommendations on:
1. The Investigatory Process
What is the question and how are data collected and analyzed to provide an answer?
2. Design of a Statistical Study
Sample survey
Experiment
Observational study
3. Analysis of Data
Mean and standard deviation
Sampling distributions (through simulation)
Association (categorical variables)
Regression and correlation (measurement variables)
4. Interpretation: Statistical Inference
Statistical Significance
P-values
Margin of Error
5. The Role of Probability
Review of probability as essential to statistical inference
Special role of the normal distribution
Getting Started
Data and stories that surround the data must be of interest to students! That is the main point to remember when teaching data analysis. The second point is that the data and stories must have enough depth to demonstrate the need for statistical thinking. The following example illustrates these points.
Students are interested in issues that affect their lives, and issues of health often fall into that category. News items are an excellent place to look for items of current interest, including items on health, and one health-related topic making lots of news lately is obesity. The following relates to a news story that is rich enough to provide a context for many of the statistical topics to be covered in a high school setting.
A recent newspaper article begins with the following
lines. “Ask anyone: Americans are
getting fatter and fatter. Advertising
campaigns say they are. So do federal officials and the scientists they rely
on. … In 1991, 23 percent of Americans fell into the obese category; now 31
percent do, a more than 30 percent increase. But Dr. Jeffrey Friedman, an
obesity researcher, at
The following are suggested questions to explore with students who have a Level B background in statistics, but are moving on to Level C.
a. Sketch what you think a distribution of weights of American adults might have looked like in 1991. Adjust the sketch to show what the distribution of weights might look like today (actually, 2002). Before making your sketches, think about the shape, center and spread of your distributions. Will the distribution be skewed or symmetric? Will the median be smaller, larger or about the same size as the mean? Will the spread increase as you move from the 1991 distribution to the 2002 distribution?
b. Which sounds more newsworthy, “Obesity has increased by over 30%” or “On the average, the weight of Americans has increased by less than 10 pounds?” Explain your reasoning.
c.
The title of the article is The Fat Epidemic: He Says It’s an Illusion. [See New
York Times,
d. The data on which the percentages are based
come from the
Students should realize that a distribution of weights is going to be skewed toward the larger values (many are overweight but few are underweight). This generally produces a situation in which the mean is larger than the median. Because 8% shifted over the obesity line between 1991 and 2002, but the average weight (or center) did not shift very much, the upper tail of the distribution must have gotten “fatter,” indicating a larger spread for the 2002 data. Students will have a variety of interesting answers for parts (b) and (c). The role of the teacher is to help students understand whether or not their answers are supported by the facts. Part (d) foreshadows a concept to be studied at Level C.
For the curious, here is how obesity is defined. “Body Mass Index (BMI) is a mathematical calculation used to determine whether a patient is overweight. BMI is calculated by dividing a person's body weight in kilograms by their height in meters squared (weight [kg] / height [m]²) or by using the conversion with pounds (lbs) and inches (in) squared as shown below. This number can be misleading, however, for very muscular people, or for pregnant or lactating women.
[Weight (lbs) ÷ height (in)²] × 704.5 = BMI
Being obese and being overweight are not the same condition. A BMI of 30 or more is considered obese and a BMI between 25 to 29.9 is considered overweight.”
[Source: http://www.obesity.org/]
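The quoted definition translates directly into a small calculation. The sketch below uses the conversion factor 704.5 and the cutoffs of 25 and 30 exactly as quoted; the function names and the example person are hypothetical, and values between 29.9 and 30 are treated as overweight, which is an assumption about how the quoted ranges join.

```python
def bmi_metric(weight_kg, height_m):
    """BMI from weight in kilograms and height in meters."""
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb, height_in, factor=704.5):
    """BMI from pounds and inches, using the conversion factor quoted above."""
    return weight_lb / height_in ** 2 * factor


def category(bmi):
    """Classify using the quoted cutoffs (25 and 30)."""
    if bmi >= 30:
        return "obese"
    if bmi >= 25:
        return "overweight"
    return "not overweight"


# Hypothetical person: 180 lbs, 68 inches tall.
value = bmi_imperial(180, 68)
print(round(value, 1), category(value))
```

Students can use a calculation like this to see how a modest change in weight near a cutoff moves a person from one category to the next, which connects to the question of why the percentage classified as obese can change sharply while average weight changes little.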
The Investigatory Process
As stated at the beginning of Level A, data are more than just numbers. Students need to understand the types of questions that can be answered with data. “Is the overall health of high school students declining in this country?” That is too big a question to answer by a statistical investigation (or even many statistical investigations). Certain aspects of the health of students, however, can be investigated by formulating more specific questions like “What is the rate of obesity among high school students?”; “What is the average daily caloric intake for high school seniors?”; “Is a three-day-a-week exercise regimen enough to maintain heart rate and weight within acceptable limits?” Question formulation, then, becomes the starting point for a statistical investigation.
Most questions that can be answered through data collection and interpretation require data from a designed study, either a sample survey or an experiment. These two types of statistical investigations have some common elements, as each requires randomization both for purposes of reducing bias and building a foundation for statistical inference, and each makes use of the common inference mechanisms of margin of error in estimation and p-value in hypothesis testing (both to be explained later). But these two types of investigations have very different objectives and requirements. Sample surveys are used to estimate or make decisions about characteristics (parameters) of populations and a well-defined, fixed population is the main ingredient of such a study. Experiments are used to estimate or compare the effects of different experimental conditions, called treatments, and require well-defined treatments and experimental units on which to study those treatments.
Estimating the proportion of residents of a city that would support an increase in taxes for education requires a sample survey. If the selection of residents is random, then the results from the sample can be extended to represent the population from which the sample was selected. A measure of sampling error (margin of error) can be calculated to ascertain how far the estimate is likely to be from the true value. Testing to see if a new medication to improve breathing for asthma patients produces greater lung capacity than a standard medication requires an experiment in which a group of patients who have consented to participate in the study are randomly assigned to either the new or the standard medication. With this type of randomized comparative design, an investigator can determine, with a measured degree of uncertainty, whether or not the new medication caused an improvement in lung capacity. Randomized experiments are, in fact, the only type of statistical study capable of establishing cause and effect. Any generalization extends only to the types of units used in the experiment, however, as the experimental units are not usually sampled randomly from a larger population. To generalize to a larger class of experimental units, more experiments would have to be conducted. That is one reason why replication is a hallmark of good science.
Studies that have no random selection of sampling units or random assignment of treatments to experimental units are called observational studies in this document. A study of how many students in your high school have asthma and how this breaks down among gender and age groups would be of this type. Observational studies are not amenable to statistical inference in the usual sense of that term, but they can provide valuable insights on the distribution of measured values and the types of associations among variables that might be expected.
Students should understand the key features of both sample surveys and experimental designs, including how to set up simple versions of both types of investigations, how to analyze the data appropriately (as the correct analysis is related to the design), and how to clearly and precisely state conclusions for these designed studies. Key elements of the design and implementation of data collection plans for these types of studies will be addressed here; the analysis and interpretation components will be addressed in the following two sections.
Students should understand that obtaining good results from a sample survey depends on four basic features - the population, the sample, the randomization process that connects the two, and the accuracy of the measurements made on the sampled elements. For example, to investigate a question on health of students, a survey might be planned for a high school. What is the population to be investigated? Is it all the students in the school (which changes on a daily basis)? Perhaps the questions of interest involve only juniors and seniors. Once the population is defined as precisely as possible, what is an appropriate sample size and how can a random sample be selected? Is there, for example, a list of students who can then be numbered for random selection? Once the sampled students are found, what questions are to be asked of them? Are the questions fair and unbiased (as far as possible) and can or will the students actually answer them accurately?
Two samples of size 50 from the same population of students will not give the same result on, say, the proportion of students who eat a healthy breakfast. This variation from sample to sample is called sampling error and this is the type of error that we can measure statistically. Other errors (or biases) that slip in because the sample was not a random sample to begin with, because the sample was selected from the wrong population, because a number of the students selected in the sample refused to participate, or because the questions were poorly written and the responses ambiguous, are not easily measured. These types of errors should be considered carefully before the study begins so that plans can be made to reduce them as far as possible.
Students should understand that obtaining good results from an experiment depends upon four basic features - well-defined treatments, appropriate experimental units to which these treatments can be assigned, the randomization process that is used to assign treatments to experimental units, and accurate measurements of the results of the experiment. Experimental units generally are not randomly selected from a population of possible units. Rather, they are the ones that happen to be available for the study. In experiments with human subjects, the people involved have to sign an agreement stating that they are willing to participate in an experimental study. In experiments with agricultural crops, the experimental units are the field plots that happen to be available. In an industrial experiment on process improvement the units may be the production lines in operation in a given week.
As in the sample survey, variability due to the random assignment of treatments to experimental units can be measured statistically, but variability resulting from a poor design cannot. Suppose treatment A gets assigned to patients over the age of 60 and treatment B to patients under the age of 50. If the treatment responses differ, it is now impossible to tell whether the difference is due to the treatments themselves or the ages of the patients. (This kind of bias in experiments is called confounding.) The randomization process, if properly done, will usually balance treatment groups so that this type of bias is minimized.
Students should understand that observational studies are useful for suggesting patterns in data and relationships between variables, but do not provide a strong basis for estimating population parameters or establishing differences among treatments. Asking the students in your room whether or not they eat a healthy breakfast is not going to help you establish the proportion of healthy breakfast eaters in the school because the students in your room may not be representative of the students in the school. Random sampling is the only way to be confident of a representative sample for statistical purposes. Similarly, feeding your cats diet A and your neighbor's cats diet B is not going to allow you to claim that one diet is better than the other in terms of weight control because there was no random assignment of experimental units (cats) to treatments (diets), and as a consequence there may be many biasing or confounding variables. Studies of the type suggested above are merely observational; they may suggest patterns and relationships but they are not a reliable basis for statistical inference.
When analyzing data from well-designed sample surveys, students should understand that an appropriate analysis is one that can lead to justifiable inferential statements based on estimates of population parameters. The ability to draw conclusions about a population using information from a sample depends on information provided by the distribution (called the sampling distribution) of the sample statistic that is being used to summarize the sample data. The two most common statistics used in applications are the sample proportion for categorical data and the sample mean for measurement data. When the sample mean is the statistic of interest, the appropriate measure of spread (variation) in the sample is the sample standard deviation. Students should gain an understanding of the behavior of proportions, means and standard deviations through the practice of calculating and interpreting these summary statistics on many real data sets.
Properties of the sampling distribution for a sample proportion can be illustrated quite simply by using random digits as a device to model various populations. Suppose a population is assumed to have 60% “successes” (p = .6) and we are to take a random sample of n=40 cases from this population. How far can we expect the sample proportion of successes to deviate from this population value? This can be answered by determining what the sampling distribution of sample proportions looks like through repeated selection of samples of 40 random digits (with replacement) from a population in which 6 of the ten digits from 0 to 9 are labeled “success” and 4 are not. A simulated distribution constructed using sample proportions from 200 different random samples of size 40 from a population with 60% successes is shown in Figure 1.
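[Figure 1: Simulated sampling distribution of the sample proportion, 200 random samples of size 40 from a population with 60% successes]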
Getting into the habit of using shape, center and spread to describe distributions, one can state that this simulated sampling distribution of sample proportions has a mound shape (approximately normal). The 200 sample proportions have a mean of .59 (very close to p = .6) and a standard deviation of .08. By studying this sampling distribution and others that can be generated the same way, students will see patterns emerge and will see that the sampling distributions for sample proportions center at p, the population proportion of “successes”, and that the standard deviation of the sampling distribution turns out to be approximately
$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $\hat{p}$ is the observed sample proportion in a single sample and n is the sample size. In
addition, if the sample size is reasonably large, the shape of the sampling
distribution is approximately normal.
A follow-up analysis of these simulated sampling distributions shows students that about 95% of the sample proportions lie within a distance of
$$2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
from the true value of p. This distance, sometimes called the margin of error, is useful in estimating where a population proportion might lie in relation to a sample proportion for a new study in which the true population proportion is not known.
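The random-digit simulation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used to produce Figure 1: digits 0 through 5 are taken to be the six "success" digits, the seed is arbitrary, and the function name is hypothetical. Because the population proportion is known in a simulation, the sketch uses p itself in the expression sqrt(p(1 - p)/n).

```python
import math
import random
import statistics


def simulate_proportions(p_digits, n, n_samples, seed=0):
    """Sample proportions from n_samples samples of n random digits.

    A digit 0..9 counts as a "success" when it is below p_digits,
    so p_digits=6 models a population with 60% successes.
    """
    rng = random.Random(seed)
    return [
        sum(rng.randrange(10) < p_digits for _ in range(n)) / n
        for _ in range(n_samples)
    ]


p, n = 0.6, 40
props = simulate_proportions(p_digits=6, n=n, n_samples=200)

center = statistics.mean(props)
spread = statistics.stdev(props)
theory = math.sqrt(p * (1 - p) / n)
within = sum(abs(x - p) <= 2 * theory for x in props) / len(props)

print("mean of sample proportions:", round(center, 3))
print("SD of sample proportions:", round(spread, 3), "formula:", round(theory, 3))
print("fraction within 2 SDs of p:", within)
```

Rerunning with a different seed, sample size, or number of success digits lets students see which features of the sampling distribution stay the same and which change.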
Properties of the sampling distribution for a sample mean can be illustrated in a similar way. Figure 2 shows the distribution of the sample mean when samples of 10 random digits are selected (with replacement). This models sampling from a population that has a uniform distribution with equal numbers of 0’s, 1’s, 2’s, and so on.
[Figure 2: Simulated sampling distribution of the sample mean, 200 samples of 10 random digits]
This distribution can be described as approximately normal with a mean of 4.53 (the mean of the 200 sample means from the simulation) and a standard deviation of 0.87 (the standard deviation of the 200 sample means). The population described has a mean of 4.5 and a standard deviation of 2.9. By studying this sampling distribution and others produced similarly, students will see patterns and can see that the standard deviation of the sampling distribution for a sample mean is approximately the population standard deviation divided by the square root of the sample size, in this case 2.9/√10 = 0.92.
The margin of error in estimating a population mean from a single random sample is approximately
$$2\frac{s}{\sqrt{n}}$$
where s denotes the sample standard deviation for the observed sample. The sample mean should be within this distance of the true population mean about 95% of the time in repeated random sampling.
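A similar sketch illustrates the sampling distribution of the sample mean for samples of 10 random digits, followed by a margin of error computed from a single sample as two times s divided by the square root of n. The seeds and function name are hypothetical, and the numbers produced will differ somewhat from those reported for Figure 2.

```python
import math
import random
import statistics


def simulate_means(n, n_samples, seed=0):
    """Means of n_samples samples, each of n random digits 0..9."""
    rng = random.Random(seed)
    return [
        statistics.mean([rng.randrange(10) for _ in range(n)])
        for _ in range(n_samples)
    ]


digits = list(range(10))
pop_mean = statistics.mean(digits)      # mean of the digits 0..9
pop_sd = statistics.pstdev(digits)      # population standard deviation

means = simulate_means(n=10, n_samples=200)
print("mean of sample means:", round(statistics.mean(means), 2))
print("SD of sample means:", round(statistics.stdev(means), 2))
print("population SD / sqrt(n):", round(pop_sd / math.sqrt(10), 2))

# Margin of error from one observed sample, using its own standard deviation s.
rng = random.Random(1)
sample = [rng.randrange(10) for _ in range(10)]
s = statistics.stdev(sample)
print("one sample mean:", statistics.mean(sample),
      "margin of error:", round(2 * s / math.sqrt(len(sample)), 2))
```

The unrounded population standard deviation is about 2.87, so the ratio printed here comes out near 0.91 rather than the 0.92 obtained from the rounded value 2.9.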
Regression analysis refers to the study of relationships between variables. If the “cloud” of points in a scatterplot of paired numerical data has a linear shape, a straight line may be a realistic model of the relationship between the variables under study. The least squares line runs through the center (in some sense) of the cloud of points. Residuals are defined to be the deviations in the y direction between the points in the scatterplot and the least squares line; spread is now the variation around the least squares line, as measured by the standard deviation of the residuals. When using a fitted model to predict a value of y from x, the margin of error depends on the standard deviation of the residuals.
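The least squares line and the standard deviation of the residuals can be computed from first principles. The sketch below uses made-up paired data for illustration only, and the use of n - 2 in the denominator is one common convention assumed here rather than something stated in the text.

```python
import math


def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residual_sd(xs, ys):
    """Standard deviation of the residuals, using n - 2 in the denominator."""
    slope, intercept = least_squares(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r ** 2 for r in residuals) / (len(xs) - 2))


# Hypothetical paired data, made up for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
slope, intercept = least_squares(xs, ys)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("SD of residuals:", round(residual_sd(xs, ys), 3))
```

A small residual standard deviation relative to the spread of the y values indicates that the points lie close to the line, which in turn means a smaller margin of error when predicting y from x.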
The key to statistical inference is the sampling distribution of the sample statistic, which provides information on the population parameters being estimated or the treatment effects being tested. As described in the previous section, knowledge of the sampling distribution for a statistic, like a sample proportion or sample mean, leads to a margin of error that provides information about the maximum likely distance between a sample estimate and the population parameter being estimated. Another way to state this key concept of inference is that an estimator plus and minus the margin of error produces an interval of plausible values for the population parameter, any one of which could have produced the observed sample result as a reasonably likely outcome.
Do the treatments differ? In analyzing experimental data this is one of the first questions asked. This question of difference is generally posed in terms of differences between the centers of the data distributions (although it could be posed as a difference between 90th percentiles or any other measure of an aspect of a distribution). Because the mean is the most commonly used statistic for measuring center of a distribution, this question of differences is generally posed as a question about the differences in means. The analysis of experimental data, then, usually involves a comparison of means.
Unlike sample surveys, experiments do not depend on random samples from a fixed population. Instead, they require random assignment of treatments to pre-selected experimental units. The key question, then, is of the following form: “Could the observed difference in treatment means be due to the random assignment (chance) alone, or can it be attributed to the treatments administered?”
The following examples are designed to illustrate the points made above by carefully considering the four phases of a statistical analysis (question, design, analysis, interpretation) in a variety of real contexts.
A survey of student music preferences was introduced at level A, where the analysis consisted of making counts of student responses and displaying the data in a bar graph. At level B the analysis was expanded to considering relative frequencies of preferences and cross-classified responses for two types of music displayed on a two-way table. The following survey questions were suggested:
What kinds of music do you like? A Survey
1. What kinds of music do you like?
a. Do you like country music? Yes or No
b. Do you like rap music? Yes or No
c. Do you like rock music? Yes or No
2. Which of the following types of music do you like most? Select only one.
Country     Rap/Hip Hop     Rock
It was proposed that the survey questions be asked of a representative sample of students from a school so that generalizations can be made to the school level. This could be accomplished by selecting a simple random sample of 50 students from the school. The results can then be generalized to the school (but not beyond) and the level C discussion will center on basic principles of generalization, or statistical inference.
The level C analysis begins with a two-way table of counts that summarizes the data on two of the questions, “Do you like rock music?” and “Do you like rap music?” The table provides a way to examine the responses to each question separately as well as a way to explore possible connections (association) between the two categorical variables. The two-way table is shown in Table 1.
TABLE 1
                          Like Rock Music?
                          Yes    No    Row Totals
Like Rap Music?   Yes     25      4    29
                  No       6     15    21
Column Totals             31     19    50 = Grand Total
As demonstrated at level B, there are a variety of ways to interpret data summarized in a two-way table such as Table 1. Some examples based on all 50 students in the survey include:
25 of the 50 students (50%) liked both rap and rock music.
29 of the 50 students (58%) liked rap music.
19 of the 50 students (38%) did not like rock music.
One type of statistical inference relates to conjectures (hypotheses) made before the data were collected. Suppose a student says “I think more than 50% of the students in the school like rap music.” The statistical question is, then, do the sample data support this claim or not. One way to arrive at an answer is to set up a hypothetical population that has 50% successes (like even and odd digits produced by a random number generator) and repeatedly take samples of size 50 from it, each time recording the proportion of even digits. The sampling distribution of proportions so generated will be similar to the one in Figure 3.
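[Figure 3: Simulated sampling distribution of the sample proportion, samples of size 50 from a population with 50% successes]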
A sample proportion greater than or equal to the observed .58 occurs 12 times out of 100. This suggests that the result of .58 is not very unusual in sampling from a population with .5 as the “true” proportion of students who like rap music, so there is little evidence to support the student’s claim. A value of .50 (or maybe even something smaller) is plausible based on what was observed in the sample. The fraction of times the observed result is matched or exceeded (.12 in this investigation) is called the approximate p-value. A small p-value would have supported the student’s claim.
Suppose another student hypothesized that only 40% of the students in the school like rap music. Now, the samples of size 50 must be repeatedly selected from a population that has 40% successes (like random digits with 1 through 4 representing success and the other digits representing failure). Figure 4 shows the results of one such simulation. The observed result of .58 was reached only 1 time out of 100 (approximate p-value = .01). It is not likely that a population in which 40% of the students like rap music would have produced a sample proportion of 58% in a random sample of size 50. This student’s claim can be rejected!
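Both approximate p-values can be obtained from one small simulation. This sketch departs from the description above in two labeled ways: it draws each case with a random number generator rather than with random digits, and it uses 10,000 simulated samples rather than 100. The function name and seed are hypothetical, and the results will differ somewhat from the .12 and .01 reported in the text.

```python
import random


def approx_p_value(p0, n, observed_count, n_samples, seed=0):
    """Fraction of simulated samples with at least observed_count successes.

    Each simulated sample has n independent cases with success probability p0.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        successes = sum(rng.random() < p0 for _ in range(n))
        if successes >= observed_count:
            hits += 1
    return hits / n_samples


# 29 of the 50 sampled students (58%) liked rap music.
p_half = approx_p_value(p0=0.5, n=50, observed_count=29, n_samples=10_000)
p_forty = approx_p_value(p0=0.4, n=50, observed_count=29, n_samples=10_000)
print("approximate p-value when p = .5:", p_half)
print("approximate p-value when p = .4:", p_forty)
```

Counting successes rather than comparing proportions avoids any rounding issue when checking whether a simulated sample matches or exceeds the observed 29 out of 50.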

Another way of stating the above is that .5 is a plausible value for the true
population proportion, based on the sample evidence, but .4 is not. A set of plausible values can be found by
using the margin of error, as explained previously. The margin of error is estimated to be about
$$2\sqrt{\frac{(.58)(.42)}{50}} \approx .14$$
Thus, any population proportion between .58 - .14 = .44 and .58+.14=.72 can be considered a plausible value for the true proportion of students who like rap music. Notice that .5 is well within this interval but .4 is not.
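The arithmetic behind the interval of plausible values can be checked with a short calculation. This sketch assumes the margin of error is two times the square root of the sample proportion times its complement divided by n, which reproduces the .14 used above; the function name is hypothetical.

```python
import math


def margin_of_error(p_hat, n):
    """Approximate 95% margin of error for a sample proportion."""
    return 2 * math.sqrt(p_hat * (1 - p_hat) / n)


p_hat, n = 0.58, 50
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe
print("margin of error:", round(moe, 2))
print("plausible values:", round(low, 2), "to", round(high, 2))
for claim in (0.5, 0.4):
    print(claim, "is plausible:", low <= claim <= high)
```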
Another type of question to ask about the student preferences on music is of the form “Do those who like rock music also tend to like rap music?” In other words, is there an association between liking rock music and liking rap music?
The same data from the random sample of 50 students is used to answer this question.
According to Table 1, a total of 31 students in the survey liked rock music. Among those students, the proportion who also like rap music is 25/31 = .81. Among the 19 students who do not like rock music, 4/19 = .21 is the proportion who like rap music. The large difference between these two proportions (.60) suggests that most students who like rock also like rap. There appears to be a strong association between liking rock music and liking rap music.
But could this association simply be due to chance (as introduced through the random sampling)? If there were no association between the two variables, the 31 students who like rock would behave as a random selection from the 50 in the sample. To simulate this situation we make up a population of 29 1’s (those who like rap) and 21 0’s (those who do not like rap) and mix them together. Then, we select 31 (representing those who like rock) at random and see how many 1’s (those who like rap) we get. It is this entry that goes into the (yes, yes) cell of the table, and from that data the difference in proportions can be calculated. Repeating the process many times produces a simulated sampling distribution for the difference between two proportions, as shown in Figure 5.
FIGURE 5
The observed difference of .6 was never reached in 100 trials, indicating that the observed difference is very unlikely to be due to chance alone. There is evidence of a real association between liking rock music and liking rap music.
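The simulation described above can be sketched in a few lines. This is one possible implementation under stated assumptions, not the procedure used to produce Figure 5: the function name, the fixed seed, and the use of Python's `random` module are illustrative choices, and a different seed gives slightly different results.

```python
import random

def simulate_differences(n_rap=29, n_total=50, n_rock=31, trials=100, seed=1):
    """Simulate the difference in proportions when there is no association."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    # 1 = likes rap, 0 = does not like rap
    population = [1] * n_rap + [0] * (n_total - n_rap)
    diffs = []
    for _ in range(trials):
        # Randomly pick the students who play the role of "likes rock"
        rap_in_rock = sum(rng.sample(population, n_rock))
        rap_in_no_rock = n_rap - rap_in_rock
        diffs.append(rap_in_rock / n_rock
                     - rap_in_no_rock / (n_total - n_rock))
    return diffs

observed = 25 / 31 - 4 / 19
diffs = simulate_differences()
count = sum(d >= observed for d in diffs)
print(f"observed difference: {observed:.2f}")
print(f"simulated differences at least that large: {count} of {len(diffs)}")
```

Sampling 31 values without replacement mirrors mixing the 1’s and 0’s and drawing the "rock" group by hand; the remaining 19 values form the "not rock" group automatically.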
What is the effect of different lengths of light and dark on the growth of radish seedlings? This question was posed to a class of biology students, who then set about designing and carrying out an experiment to investigate it. Not every possible ratio of light to dark can be investigated in one experiment, so the students decided to focus the question on three treatments: 24 hours of light, 12 hours of light and 12 hours of darkness, and 24 hours of darkness. This covers the extreme cases and one in the middle.
With the help of a teacher, the class decided to use plastic bags as "growth chambers." The plastic bags would permit the students to observe and measure the germination of the seeds without disturbing them. Two layers of moist paper towel were put into a disposable plastic bag, with a line stapled about 1/3 of the way from the bottom of the bag (side opposite opening) to hold the paper towel in place and provide a seam to hold the radish seeds in place for germination. Although three growth chambers would be sufficient to examine the three treatments, this class made four growth chambers, with one designated for the 24 hours of light treatment, one for the 12 hours of light and 12 hours of darkness treatment, and two for the 24 hours of darkness treatment. One hundred twenty seeds were available for the study. Thirty of the seeds were chosen at random and placed along the stapled seam of the 24 hours of light bag. Thirty seeds were then chosen at random from the remaining 90 seeds and placed in the 12 hours of light and 12 hours of darkness bag. Finally, 30 of the remaining 60 seeds were chosen at random and placed in one of the 24 hours of darkness bags, and the final 30 seeds were placed in the final bag. After three days, the lengths of radish seedlings for the germinating seeds were measured and recorded. These data are provided in Table 2; the measurements are in millimeters.
| Treatment 1: 24 light | Treatment 2: 12 light, 12 dark | Treatment 3: 24 dark | Treatment 3: 24 dark (continued) |
|---|---|---|---|
| 2 | 3 | 5 | 20 |
| 3 | 4 | 5 | 20 |
| 5 | 5 | 8 | 22 |
| 5 | 9 | 8 | 24 |
| 5 | 10 | 8 | 25 |
| 5 | 10 | 8 | 25 |
| 5 | 10 | 10 | 25 |
| 7 | 10 | 10 | 25 |
| 7 | 10 | 10 | 25 |
| 7 | 11 | 10 | 26 |
| 8 | 13 | 10 | 29 |
| 8 | 15 | 11 | 30 |
| 8 | 15 | 14 | 30 |
| 9 | 15 | 14 | 30 |
| 10 | 17 | 15 | 30 |
| 10 | 20 | 15 | 30 |
| 10 | 20 | 15 | 30 |
| 10 | 20 | 15 | 31 |
| 10 | 20 | 15 | 33 |
| 10 | 20 | 15 | 35 |
| 10 | 21 | 16 | 35 |
| 10 | 21 | 20 | 35 |
| 14 | 22 | 20 | 35 |
| 15 | 22 | 20 | 35 |
| 15 | 23 | 20 | 35 |
| 20 | 25 | 20 | 36 |
| 21 | 25 | 20 | 37 |
| 21 | 27 | 20 | 38 |
|  |  | 20 | 40 |
The first step is to make a plot of the data to look for patterns and any unusual departures from pattern. Box plots are ideal for comparing data from more than one treatment, as you can see in Figure 6. Both the centers and the spreads increase as the amount of darkness increases. There are three outliers (one at 20 mm and two at 21 mm) in the Treatment 1 (24 hours of light) data. Otherwise, the distributions are fairly symmetric, which is good for statistical inference.
FIGURE 6
Treatment 1 is 24 hours of light, Treatment 2 is 12 hours of light and 12 hours of darkness, and Treatment 3 is 24 hours of darkness.
The summary statistics for these data are as follows:
| Treatment | n | Mean | Median | St Dev |
|---|---|---|---|---|
| 1 | 28 | 9.643 | 9.500 | 5.027 |
| 2 | 28 | 15.82 | 16.00 | 6.76 |
| 3 | 58 | 21.86 | 20.00 | 9.75 |
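These summary statistics can be reproduced from the Table 2 measurements with Python's standard `statistics` module. The sketch below is illustrative; the list names `t1`, `t2`, and `t3` are not from the original, and `stdev` is the sample standard deviation (divisor n - 1).

```python
from statistics import mean, median, stdev

# Seedling lengths in mm, transcribed from Table 2
t1 = [2, 3, 5, 5, 5, 5, 5, 7, 7, 7, 8, 8, 8, 9,
      10, 10, 10, 10, 10, 10, 10, 10, 14, 15, 15, 20, 21, 21]
t2 = [3, 4, 5, 9, 10, 10, 10, 10, 10, 11, 13, 15, 15, 15,
      17, 20, 20, 20, 20, 20, 21, 21, 22, 22, 23, 25, 25, 27]
t3 = [5, 5, 8, 8, 8, 8, 10, 10, 10, 10, 10, 11, 14, 14,
      15, 15, 15, 15, 15, 15, 16, 20, 20, 20, 20, 20, 20, 20, 20,
      20, 20, 22, 24, 25, 25, 25, 25, 25, 26, 29,
      30, 30, 30, 30, 30, 30, 31, 33,
      35, 35, 35, 35, 35, 35, 36, 37, 38, 40]

# One row per treatment: n, mean, median, sample standard deviation
for label, data in (("1", t1), ("2", t2), ("3", t3)):
    print(label, len(data), round(mean(data), 3),
          median(data), round(stdev(data), 3))
```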
Experiments are designed to compare treatments, usually by comparing means. So, the original question on the effect of different lengths of light and dark on the growth of radish seedlings can be turned into two questions on treatment means. Is there evidence that 12 hours of light and 12 hours of dark (Treatment 2) has a higher mean than 24 hours of light (Treatment 1)? Is there evidence that 24 hours of dark (Treatment 3) has a higher mean than 12 hours of light and 12 hours of dark (Treatment 2)? Based on the box plots and the summary statistics, it looks like the means differ, but students should now realize that this casual observation needs to be verified by ruling out chance as a possible explanation. The Treatment 2 mean is 6.2 mm larger than the Treatment 1 mean. If there is no real difference between the two treatments, then the observed difference must be due to the random assignment of seeds to the bags; one bag was simply lucky enough to get a preponderance of good and lively seeds. If a difference this large (6.2 mm) were likely to result from randomization alone, then we would see differences of this magnitude quite often when we repeatedly re-randomize the measurements and calculate a new difference in observed means. This, however, is not the case, as is seen in Figure 7. This figure was produced by mixing the measurements from Treatments 1 and 2 together, randomly splitting them into two groups of 28 measurements each, recording the difference in means for the two groups, and repeating the process 200 times.
FIGURE 7
The observed difference of 6.2 mm was exceeded only one time in 200 tries, for an approximate p-value of 1/200. This is very small, and gives strong evidence to support the hypothesis that there is a real difference between the means for Treatments 1 and 2. The observed difference of 6.2 mm is very unlikely to be due simply to chance.
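The re-randomization procedure for Treatments 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind Figure 7: the function name `randomization_diffs` and the fixed seed are hypothetical choices, and the count of exceedances will vary from one seed to another.

```python
import random
from statistics import mean

# Seedling lengths in mm for Treatments 1 and 2, transcribed from Table 2
t1 = [2, 3, 5, 5, 5, 5, 5, 7, 7, 7, 8, 8, 8, 9,
      10, 10, 10, 10, 10, 10, 10, 10, 14, 15, 15, 20, 21, 21]
t2 = [3, 4, 5, 9, 10, 10, 10, 10, 10, 11, 13, 15, 15, 15,
      17, 20, 20, 20, 20, 20, 21, 21, 22, 22, 23, 25, 25, 27]

def randomization_diffs(group_a, group_b, trials=200, seed=1):
    """Re-randomize pooled data; return mean(new B) - mean(new A) per trial."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(trials):
        rng.shuffle(pooled)  # mix the measurements together
        diffs.append(mean(pooled[n_a:]) - mean(pooled[:n_a]))
    return diffs

observed = mean(t2) - mean(t1)
diffs = randomization_diffs(t1, t2)
p_value = sum(d >= observed for d in diffs) / len(diffs)
print(f"observed difference: {observed:.1f} mm")
print(f"approximate p-value: {p_value}")
```

Because the split point is taken from the length of the first group, the same function also handles groups of unequal size, such as 28 and 58 measurements.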
In a comparison of the means for Treatments 2 and 3 the same procedure is used, except that the combined measurements are split into groups of 28 and 58 each time. Again, the observed difference of 6 mm was exceeded only 1 time out of 200 trials (see Figure 8), giving strong evidence of a real difference between the means for Treatments 2 and 3. In summary, all three means show real differences that cannot be explained by the random assignment of seeds to the bags; the more hours of darkness, the greater the growth of the seedling, at least for these three light versus darkness times.
FIGURE 8

What is the density of the earth? This is a question that intrigued the great scientist Henry Cavendish, who attempted to answer it in 1798.
Cavendish estimated the density of the earth by using the crude tools that were available to him at the time. He did not literally take a random sample; he measured on different days and at different times, as he was able. But, the density of the earth does not change over time, so his measurements can be thought of as a random sample of all the measurements he could have taken on this constant. The variation in the measurements is due to his measurement error, not to changes in the earth’s density. The earth’s density is the constant that is being estimated.
This is a typical example of an estimation problem that occurs in science. There is no real “population” of measurements that can be sampled; rather, the sample data are assumed to be a random selection from the conceptual population of all measurements that could have been taken. At this point there may be some confusion between an “experiment” and a “sample survey” because Cavendish actually conducted a scientific investigation to get his measurements. The key, however, is that he conducted essentially the same investigation many times with the goal of estimating a constant, much like interviewing many people to estimate the proportion who favor a certain candidate for office. He did not randomly assign treatments to experimental units for the purpose of comparing treatment effects.
A famous Cavendish dataset contains his 29 measurements of the density of the earth, in grams per cubic centimeter. The data are shown below. [Source: http://lib.stat.cmu.edu/DASL/]
|
Density |
5.46 |