3. Population Parameters and Sample Statistics
Measuring population and sample characteristics (Parameter and statistic)
A very important concept is how to describe the numerical characteristics of populations and samples. For example, if we want to understand the income of all US citizens, our population is all US citizens. Let's say that we collect data on the income of every single US citizen and find that the median income is $55,000 per year. What do we call the measure (the $55,000) that describes our population (all US citizens)? A parameter!
- Parameter
  - A measure of a characteristic of a population (includes measures like mean, median, mode, standard deviation, etc.)
  - The “true” value we want to know
  - A fixed value for a given population: it does not change from one sample to the next
Realistically, however, collecting this data from every single US citizen is not feasible. When that is the case, we must rely on samples. Say we collect a random sample of 10,000 US citizens and find that the median income of our sample is $58,000. What do we call the measure ($58,000) that describes our sample (10,000 US citizens)? A statistic!
Now say we gather data from a new random sample of 10,000 US citizens. We calculate the median income of this sample (its statistic) and find that it is $53,000. Notice that the statistic of our new sample is different from the statistic of our first sample. That is expected: each random sample contains, by chance, a slightly different set of people, so its statistic comes out slightly different too.
- Statistic
  - A measure of a characteristic of a sample (includes measures like mean, median, mode, standard deviation, etc.)
  - Unique to each sample, so it changes depending on the sample
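The contrast between a fixed parameter and a varying statistic can be seen in a small simulation. The sketch below is only an illustration: the population is a hypothetical list of 200,000 synthetic incomes (not real US data), and the names `population`, `parameter`, and `sample_median` are made up for this example.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 200,000 synthetic yearly incomes (not real data).
population = [rng.lognormvariate(10.9, 0.75) for _ in range(200_000)]

# Parameter: computed from every member of the population,
# so for this population it is a single fixed number.
parameter = statistics.median(population)


def sample_median(n):
    """Draw a simple random sample of size n (without replacement)
    from the population and return the sample's median (a statistic)."""
    return statistics.median(rng.sample(population, n))


# Two independent random samples of 10,000 people each.
stat_a = sample_median(10_000)
stat_b = sample_median(10_000)

print(f"parameter (population median): {parameter:,.0f}")
print(f"statistic, first sample:       {stat_a:,.0f}")
print(f"statistic, second sample:      {stat_b:,.0f}")
```

Running it prints one population median and two sample medians. The two sample medians should differ slightly from each other and from the parameter, while recomputing the parameter on the same population always gives the same number.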
In the real world, we often do not know the population parameter, but it is precisely what we want to know. We therefore must rely on samples and statistics to estimate what the population parameter likely is. This is called inference! We can never be completely sure of the population parameter when we do this, but we can get close, or at least work out a range that we are reasonably confident includes the population parameter.
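One way to get such a range from a single sample is sketched below. It uses the percentile bootstrap, which is a technique chosen here for illustration and is not prescribed by the text above: the sample is resampled with replacement many times, and the middle 95% of the resulting medians is reported as the range. The sample is hypothetical synthetic data, and `bootstrap_median_interval` is a made-up helper name.

```python
import math
import random
import statistics


def bootstrap_median_interval(sample, n_resamples=1000, level=0.95, seed=0):
    """Percentile-bootstrap range for the median, using only the sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample the sample with replacement and record each resample's median.
    medians = sorted(
        statistics.median(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    low = medians[round(tail * n_resamples)]
    high = medians[round((1 - tail) * n_resamples) - 1]
    return low, high


rng = random.Random(7)

# Hypothetical sample: 2,000 synthetic incomes drawn from a lognormal model.
sample = [rng.lognormvariate(10.9, 0.75) for _ in range(2_000)]

low, high = bootstrap_median_interval(sample)

# Because the data are simulated, the "true" median is known: exp(mu).
# In a real study this value would be unknown.
true_median = math.exp(10.9)

print(f"sample median (statistic): {statistics.median(sample):,.0f}")
print(f"95% bootstrap range:       {low:,.0f} to {high:,.0f}")
print(f"true median (parameter):   {true_median:,.0f}")
```

The range usually, but not always, contains the true median. That matches the point above: inference gives a range we are fairly confident in, not certainty.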