What Is Statistical Significance?
Friday, November 11, 2011 | by Michaela Mora

I hear questions related to statistical significance on a daily basis. They are usually some variation of “How much sample do we need to be significant?” This often reflects some confusion about the term.
Statistical significance is a concern when we want to detect differences between two or more groups being compared (people, objects, ads, etc.) that are not due to chance.
As sample size increases, the margin of error around a percentage or a mean gets smaller. We get not only more precise estimates but also more sensitivity to detect differences that are not due to chance. In a large sample, a difference of 1 or 2 percentage points may be significant. In a smaller sample, estimates vary more from sample to sample, so a difference may need to exceed 10 percentage points before it is significant.
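The link between sample size and precision is easier to see with numbers. The following is a minimal Python sketch. It assumes a simple random sample, the usual normal approximation for a proportion, and the most conservative case, p = 50%. `margin_of_error` is a hypothetical helper name and does not come from any survey tool.

```python
import math
from statistics import NormalDist

def margin_of_error(p: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of a normal-approximation confidence interval for a
    proportion p estimated from a simple random sample of size n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value, ~1.96 at 95%
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1000, 2500):
    moe = margin_of_error(0.5, n)
    print(f"n={n:>5}: +/-{moe * 100:.1f} percentage points")
# n=  100: +/-9.8 percentage points
# n=  400: +/-4.9 percentage points
# n= 1000: +/-3.1 percentage points
# n= 2500: +/-2.0 percentage points
```

The margin shrinks with the square root of n, so going from 100 to 400 respondents only halves it. This is why extra precision gets expensive quickly.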
In survey research, we often talk as if the results were exact point estimates. In fact, we should talk in ranges, because there is always a margin of error around any estimate. So if the margin of error is +/-3% and we get a value of 50% for a variable, the true value of the variable is likely to be between 47% and 53%.
Now suppose we measure the same variable in another group, with a sample size whose margin of error is +/-5%, and get a value of 57%. The true value for that group is then expected to be between 52% and 62%. The 7-percentage-point difference seems large, but these ranges alone don't let us call it statistically significant, because they overlap (47%–53% and 52%–62%). The true value in the second group could be 52% or 53%, and those values also fall inside the first group's range. Keep in mind that comparing intervals this way is a conservative shortcut. A formal test of the difference between the two groups can still find it significant when the intervals overlap slightly.
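The interval-overlap rule is a quick shortcut, but it is conservative. The sketch below compares it with a direct test of the difference. It assumes both margins of error were computed at 95% confidence from independent simple random samples, and it uses the normal approximation throughout. `intervals_overlap` and `difference_z` are hypothetical helper names.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # ~1.96

def intervals_overlap(p1, moe1, p2, moe2):
    """True if the ranges p1 +/- moe1 and p2 +/- moe2 share any values."""
    return p1 + moe1 >= p2 - moe2 and p2 + moe2 >= p1 - moe1

def difference_z(p1, moe1, p2, moe2, z=Z95):
    """z statistic for p2 - p1 from two independent samples, recovering
    each standard error from its margin of error (SE = MOE / z)."""
    se_diff = math.sqrt((moe1 / z) ** 2 + (moe2 / z) ** 2)
    return (p2 - p1) / se_diff

# Group 1: 50% +/-3 points; group 2: 57% +/-5 points
print(intervals_overlap(0.50, 0.03, 0.57, 0.05))  # True
z_stat = difference_z(0.50, 0.03, 0.57, 0.05)
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.3f}")
```

With these numbers the ranges overlap, yet the direct test gives z of about 2.35, which is above the 1.96 cutoff for 95% confidence. When the overlap is small, test the difference itself rather than relying on the eyeball check.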
How confident are we about this? We usually say 95% confident. This means that if we repeated the study 100 times, about 95 of the intervals we built would contain the true value and about 5 would miss it. This is called the confidence level, and the range defined by the margin of error is called the confidence interval. In short, we want a procedure that captures the true value in the vast majority of repetitions. Unfortunately, for a given sample size, confidence and precision pull in opposite directions. If you want to be 99% confident, you have to accept a wider confidence interval to be that sure it includes the true value.
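The trade-off between confidence and precision can be shown with a small sketch. It holds a hypothetical sample of 1,000 fixed, assumes p = 50% and the normal approximation, and varies only the confidence level. `margin_of_error` is an illustrative helper name.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence):
    """Normal-approximation margin of error for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # larger z for higher confidence
    return z * math.sqrt(p * (1 - p) / n)

n = 1000  # hypothetical fixed sample size
for confidence in (0.90, 0.95, 0.99):
    moe = margin_of_error(0.5, n, confidence)
    print(f"{confidence:.0%} confidence: +/-{moe * 100:.1f} points")
# 90% confidence: +/-2.6 points
# 95% confidence: +/-3.1 points
# 99% confidence: +/-4.1 points
```

The data and the sample stay the same. Asking for more confidence only widens the interval.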
If no comparison is involved, it makes no sense to talk about statistical significance. However, we are still concerned with the precision of the estimates for the total sample. We want our margin of error to be as small as our budget and tolerance for risk allow. Greater precision requires a larger sample, which in turn costs more money. Greater certainty, in turn, costs some precision. There is always a trade-off to make.
Next time you are considering sample size for a survey, be ready to answer these questions:
Unfortunately, the difference between the sample you want and the one you can afford is often significant (pun intended), so budget questions are always in the mix. For more help on calculating sample size and margin of error, use our Sample Size and Margin of Error Calculators.

I recently got a request for advice via Twitter with this question: What % of segment needs to be interviewed to gain reliable insight for product optimization?
Reliability has to do with the consistency of results across data collection instruments and across the points in time when the data is collected. I see this question as being more about validity and representativeness, which are related to population heterogeneity and the sample source.
To determine the sample size for a segment, we need to ask:
Depending on budget and timeline constraints, you could use two approaches to sampling for segments:
As you can see, estimating the sample size for a segment is no different from estimating it for the total sample. There is no magic percentage that determines how large the sample should be. Sorry.
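This sketch shows why no fixed percentage works. It applies the standard sample-size formula to segments of different, hypothetical sizes. It assumes simple random sampling within each segment, a +/-5-point margin at 95% confidence, p = 50%, and the finite population correction. `required_sample` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def required_sample(moe, population, confidence=0.95, p=0.5):
    """Sample size for a target margin of error, using the finite
    population correction for a segment of known size."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z ** 2 * p * (1 - p) / moe ** 2    # sample size for a very large population
    n = n0 / (1 + (n0 - 1) / population)    # finite population correction
    return math.ceil(n)

for segment_size in (1_000, 10_000, 1_000_000):
    n = required_sample(0.05, segment_size)
    print(f"segment of {segment_size:>9,}: n = {n}, i.e. {n / segment_size:.2%} of the segment")
```

Once the segment is reasonably large, the required n barely moves. The share of the segment you need to interview therefore ranges from roughly 28% of a 1,000-person segment to about 0.04% of a million-person one.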

I meet many clients who worry about sample size. They try to ensure their sample is large enough to find statistically significant differences and to make inferences about a larger population. What they often don’t know is that these statistical tests were designed to work within the framework of probability sampling theory.
Since the advent of online panels and the growth of online surveys using panel-provided samples, testing for significant differences with standard parametric tests has become a moot point in many research studies.
Nowadays, many online surveys use samples provided by online panels, but these are mostly convenience (non-probability) samples. Online panels include only respondents who are willing to participate in studies. They exclude people who are unwilling to join a panel, even though those people may be members of the target population we are after.
In probability sampling, each possible respondent from the target population has a known, nonzero probability of being chosen. Probability sampling helps us avoid some of the selection biases that can make a sample unrepresentative of the target population. For more on this, read Does A Large Sample Size Guarantee A Representative Sample?
A single probability sample is not guaranteed to be representative of a target population, but we can quantify how often samples will meet some criterion of representativeness. This is the notion behind confidence intervals. The probability sampling procedure guarantees that each unit in the population of interest could appear in the sample.
By taking into account all possible random samples that can be drawn from a population, we can estimate how often the true value can be expected to fall within a specific range around the estimate. So a 95% confidence interval really means this: if we repeated the procedure 100 times, the intervals we computed would contain the true value of the variable about 95 times. Suppose an opinion poll indicates that 50% of people favor a political decision, with a +/-3% margin of error at a 95% confidence level. The range from 47% to 53% was produced by a procedure that captures the true share of people in favor about 95 out of 100 times the poll is repeated. When we test for significant differences, we are checking whether the value we are comparing against falls outside that range.
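A small simulation can check the repeated-sampling idea directly. This sketch assumes a hypothetical population in which exactly 50% favor the decision, simple random samples of 1,000, and the normal-approximation interval. The function name and its parameters are illustrative.

```python
import math
import random
from statistics import NormalDist

def simulate_coverage(true_p=0.5, n=1000, confidence=0.95, repeats=1000, seed=42):
    """Repeat a simple random 'poll' many times and return the share of
    normal-approximation intervals p_hat +/- MOE that contain true_p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = 0
    for _ in range(repeats):
        # Each respondent independently favors the decision with probability true_p
        in_favor = sum(rng.random() < true_p for _ in range(n))
        p_hat = in_favor / n
        moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / repeats

print(f"Share of intervals containing the true value: {simulate_coverage():.3f}")
```

The printed share should land close to 0.95, though not exactly, because of simulation noise and the approximation. Each individual interval shifts from poll to poll. It is the procedure, not any single interval, that is right about 95% of the time.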
Unfortunately, taking a probability sample is hard and costly. For most consumer research and social behavior studies, we don’t know the actual size of the population of consumers who behave in certain ways or consume certain products, and finding out would make the research prohibitively expensive. This is why we often have to settle for convenience samples like the ones offered by online panels. They can still offer valuable insights if the study is designed with care. But, again, statistical testing on a convenience sample is pointless, because the assumptions of probability sampling are violated.
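A bigger convenience sample doesn't fix the problem, as this simulation shows. All of its numbers are hypothetical. In the simulated population, 40% of people would never join a panel, and they feel differently about the question from those who would. The margin of error shrinks as n grows, but the estimate stays centered on the panel members' view.

```python
import math
import random
from statistics import NormalDist

# Hypothetical population: 60% would join an online panel, 40% never would,
# and the two groups differ on the question we care about.
SHARE_JOINERS = 0.60
P_FAVOR_JOINERS = 0.60
P_FAVOR_NON_JOINERS = 0.30
TRUE_P = SHARE_JOINERS * P_FAVOR_JOINERS + (1 - SHARE_JOINERS) * P_FAVOR_NON_JOINERS  # about 0.48

def convenience_poll(n, rng):
    """Panel-style sample: every respondent comes from the joiners only."""
    in_favor = sum(rng.random() < P_FAVOR_JOINERS for _ in range(n))
    return in_favor / n

rng = random.Random(7)
z = NormalDist().inv_cdf(0.975)
for n in (200, 2000, 20000):
    p_hat = convenience_poll(n, rng)
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    covers = p_hat - moe <= TRUE_P <= p_hat + moe
    print(f"n={n:>6}: estimate {p_hat:.3f} +/- {moe:.3f}, covers true {TRUE_P:.2f}? {covers}")
```

A significance test on these data would look more convincing as n grows, while being wrong in the same direction every time. The margin of error cannot capture this kind of systematic bias.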
Online panels are here to stay, and they will continue to be a source of affordable samples for market research. Research using a convenience sample is often better than no research at all, provided the survey is well designed and screening criteria are used to define the target population.
A more appropriate case for testing statistically significant differences is a random sample drawn from a customer database. The database is essentially a population frame: we can count all its members and calculate each member’s probability of being chosen.
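This sketch shows what makes a customer list usable as a frame. It assumes a hypothetical database of 12,000 customer IDs and simple random sampling without replacement. Under that design, each customer's inclusion probability is exactly n/N, which is the known probability that significance tests rely on. `draw_simple_random_sample` is a hypothetical helper built on Python's standard `random.Random.sample`.

```python
import random

def draw_simple_random_sample(frame, n, seed=None):
    """Simple random sample without replacement from a complete list (frame).
    Every member has the same known inclusion probability n / len(frame)."""
    rng = random.Random(seed)
    sample = rng.sample(frame, n)
    inclusion_probability = n / len(frame)
    return sample, inclusion_probability

# Hypothetical customer database of 12,000 IDs
customers = [f"CUST-{i:05d}" for i in range(12_000)]
sample, prob = draw_simple_random_sample(customers, 600, seed=2011)
print(len(sample), prob)  # 600 0.05
print(f"Each sampled customer represents {1 / prob:.0f} customers")  # 20
```

The design weight 1/prob is what lets results be projected back to the whole customer base. A convenience sample has no such known probability to plug in.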
However, you may not have a customer database, or you may want to survey non-customers. In that case, use a convenience sample if that is what your research budget can afford or if there is no other way to reach the actual population frame (the list to pull the sample from), but don’t fret about testing for significant differences. You may feel more confident if you can replicate the results in repeated surveys. Still, always be cautious about inferences drawn from convenience samples, since the data could contain hidden systematic bias.
Whenever you use convenience samples, consider the following questions when analyzing the results:
1. Who is systematically excluded from the sample?
2. What groups are over- or underrepresented in the sample?
3. Have the results been replicated with different samples and data collection methods?
Testing for significant differences may give you peace of mind even when using convenience samples. If so, use it to confirm the “direction” of the data, but refrain from drawing inferences about a larger population.
To learn more about our consumer data service, visit Consumer Shopping Behavior Insights. To request consumer shopping behavior data and insights, don’t hesitate to contact us.

Determining the sample size is one of the first steps in planning a survey. Unfortunately, no magic formula will tell us the perfect sample size, because there are several factors we need to think about:
Below is a table illustrating how the margin of error and the confidence level interact with sample size. To get the same level of precision (e.g., +/-3.2%), larger samples are needed as the confidence level increases. For example, suppose we want an estimate within +/-3.2% at a 95% confidence level, meaning a procedure expected to capture the true value in 95 out of 100 repetitions of the survey. For that, we need a sample of 950.
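The same relationship can be computed directly. This sketch assumes a simple random sample from a large population, p = 50% (the most conservative case), and the normal approximation. `sample_size_for_moe` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def sample_size_for_moe(moe, confidence, p=0.5):
    """Smallest n whose normal-approximation margin of error for a
    proportion is at most `moe` (simple random sample, large population)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / moe ** 2)

for confidence in (0.90, 0.95, 0.99):
    print(f"+/-3.2% at {confidence:.0%}: n = {sample_size_for_moe(0.032, confidence)}")
```

Under these assumptions, an exact +/-3.2% at 95% needs about 940 completes. The sample of 950 quoted above gives a margin of about +/-3.18%, which rounds to +/-3.2%.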

SAMPLE SIZE CALCULATION CHECKLIST
In summary, to determine the sample size needed for a survey, we need to answer the following questions:
So the answer to the question “What is the right sample size for a survey?” is: it depends. I hope this gives you some guidance in choosing a sample size, but the final decision is up to you. To calculate sample size and margin of error, use our Sample Size and Margin of Error Calculators.
If you have wondered what sample size is needed to get a representative sample, read Does A Large Sample Size Guarantee A Representative Sample?