What Does It Mean If a Statistic Is Resistant?
Here's a scenario: you've got a dataset of 100 salaries in a company. Now you calculate the average salary. One person, the CEO, makes $5 million. The other ninety-nine people make between $40,000 and $80,000. That one outlier inflates the average to $90,000 or more, which doesn't represent what most employees actually earn.
But if you used the median salary instead, you'd get something around $55,000 — much closer to reality.
That right there is the difference between a statistic that gets wrecked by outliers and one that shrugs them off. The median is resistant. The mean is not.
What Is a Resistant Statistic?
A resistant statistic is a measure that stays relatively stable even when your data contains extreme values — outliers, typos, or weird anomalies that don't reflect the broader pattern. It "resists" being pulled around by those outliers.
The most common example people learn first is the median versus the mean.
- The mean (average) is non-resistant. Add one absurdly large value to your dataset and the mean skyrockets. Add one absurdly small value and it plummets. It's sensitive to every single number in your data, including the weird ones.
- The median (the middle value when you sort everything) is resistant. You could change the highest value from $5 million to $50 million and the median wouldn't budge an inch. It only cares about position, not magnitude.
This isn't just some academic distinction. It shapes how you interpret data every single day.
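To make the comparison concrete, here is a minimal sketch in Python using only the standard library. The salary figures are invented for illustration (99 staff spread evenly between $40,000 and $80,000, plus one CEO), so the exact numbers printed are not the ones quoted above.

```python
from statistics import mean, median

# Hypothetical data: 99 staff salaries spread evenly from $40,000 to $80,000,
# plus one CEO at $5,000,000. These figures are made up for illustration.
staff = [40_000 + i * 40_000 / 98 for i in range(99)]
salaries = staff + [5_000_000]

print(f"mean:   {mean(salaries):,.0f}")
print(f"median: {median(salaries):,.0f}")

# Make the outlier ten times larger: the mean moves, the median does not.
bigger = staff + [50_000_000]
print(f"mean with $50M CEO:   {mean(bigger):,.0f}")
print(f"median with $50M CEO: {median(bigger):,.0f}")
```

The median only needs the two middle values of the sorted list, and neither of them is the CEO, so changing the CEO's pay leaves it untouched.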
Other Resistant Measures Worth Knowing
The median isn't the only resistant player in town. A few others show up regularly:
- Trimmed mean: You calculate the mean after lopping off a certain percentage of the highest and lowest values (say, the top 5% and bottom 5%). This gives you something between a mean and a median — the benefits of both.
- Winsorized mean: Similar idea, but instead of removing outliers, you replace them with the nearest "acceptable" value. So if your top 5% are wildly high, you treat them all as if they were just at the 95th percentile threshold.
- Interquartile range (IQR): A resistant measure of spread, unlike the standard deviation, which gets inflated by outliers. The IQR is simply the distance between the 25th and 75th percentiles, which spans the middle 50% of your data.
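The three measures in this list can be sketched in a few lines of Python. This is one reasonable way to write them, not a standard implementation: the function names, the `proportion` parameter, and the sample data are all assumptions made for the example, and libraries differ in how they round the cut-off and interpolate percentiles.

```python
from statistics import mean, quantiles

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the values from each end."""
    data = sorted(values)
    k = int(len(data) * proportion)  # how many values to cut from each tail
    return mean(data[k:len(data) - k])

def winsorized_mean(values, proportion=0.05):
    """Mean after clamping each tail to the nearest value that is kept."""
    data = sorted(values)
    k = int(len(data) * proportion)
    if k == 0:
        return mean(data)
    low, high = data[k], data[-k - 1]
    return mean([min(max(x, low), high) for x in data])

def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

# Illustrative data: 19 typical values and one outlier.
data = [50] * 19 + [5_000]
print(mean(data), trimmed_mean(data), winsorized_mean(data), iqr(data))
```

Both functions assume `proportion` is below 0.5, otherwise nothing is left to average.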
What Does "Resistant" Actually Mean, Mathematically?
If you want the formal definition: a statistic is resistant if the result doesn't change much, or doesn't change at all, when you modify a small portion of your data, even drastically.
In more technical terms, the breakdown point of a statistic is the smallest proportion of contaminated data that can cause the statistic to take on an arbitrarily large (or small) value. The median has a breakdown point of 50%: you could corrupt just under half your data and the median would still be reasonable. The mean's breakdown point is just 1/n (essentially zero for any real dataset), so a single bad value is enough. One bad apple spoils the whole bunch.
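A small experiment shows the breakdown point in action. The sketch below corrupts a growing number of values in an invented dataset of 101 numbers and watches when each statistic escapes the original range. The helper `corrupt` and the "bad" value of 1e12 are assumptions for the demonstration.

```python
from statistics import mean, median

def corrupt(values, k, bad=1e12):
    """Return a copy with the first k values replaced by a huge bad value."""
    return [bad] * k + list(values[k:])

clean = list(range(1, 102))  # 101 illustrative values: 1, 2, ..., 101

for k in (0, 1, 25, 50, 51):
    dirty = corrupt(clean, k)
    print(f"{k:>2} corrupted  mean={mean(dirty):.3g}  median={median(dirty):.3g}")
```

With 101 values, the median stays inside the original range of 1 to 101 until 51 values (just over half) are corrupted, while the mean is ruined by the very first one.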
Why It Matters
Here's why this matters in practice: most real-world data is messy. You will encounter outliers. They come from measurement errors, data entry mistakes, genuinely unusual cases, or just the natural skew of real phenomena like income or real estate prices.
If you blindly use non-resistant statistics without checking your data, you'll make decisions based on numbers that don't reflect reality.
Think about the salary example again. If you're a job candidate negotiating pay and the company says "the average salary here is $90,000," you might feel good about that offer. But if that number is being dragged up by one executive making millions, you're about to get lowballed. The median would have told you the truth.
This plays out everywhere:
- Housing prices: The median home price in a city tells you more about what typical buyers face than the average, because a few mansions can skew the average dramatically upward.
- Test scores: If most students scored 70-85% but one person got a perfect score and five people got zeros, the zeros can drag the average well below what a typical student scored.
- Business metrics: Revenue averages can look great while most deals are underperforming, if a few huge contracts are carrying the weight.
The point isn't that the mean is "bad." It's that you need to know which tool fits your situation. Using a non-resistant statistic on data with outliers is like using a hammer on a screw: it might work sometimes, but you'll cause damage.
How It Works
Understanding resistance comes down to how a statistic uses the data values themselves.
The mean adds up every single number and divides by the count. Every value gets a vote, and the vote is weighted by size. So if you have 99 values at 50 and one value at 5,000, that one outlier contributes 5,000 to the sum while each of the 99 normal values contributes 50. The outlier carries 100 times the weight of any single typical value, and slightly more than the other 99 combined (5,000 versus 4,950). That's why it swings the result so much.
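The arithmetic for that example is short enough to check directly. The variable names below are just for illustration.

```python
from statistics import median

values = [50] * 99 + [5_000]

total = sum(values)                # 99 * 50 + 5,000 = 9,950
mean_value = total / len(values)   # 9,950 / 100 = 99.5
outlier_share = 5_000 / total      # fraction of the sum owed to one value

print(mean_value, median(values), round(outlier_share, 3))
```

One value out of a hundred supplies about half of the sum, which is why the mean lands near 100 even though 99 of the values are exactly 50.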
The median, by contrast, only cares about rank order. It finds the middle. It doesn't care whether the largest value is 50 or 5,000; it just needs to know which value sits in the middle of the sorted list. Making an extreme value more extreme doesn't change its rank, so the median stays put.
Trimmed and Winsorized means give you a compromise. You keep the general spirit of the mean (it averages actual values rather than looking only at position) but you limit how much damage outliers can do by removing or capping them.
When to Use Which
Use the mean when your data is roughly symmetric and doesn't have significant outliers. If you've checked for outliers and your distribution looks normal (bell-shaped), the mean gives you more information because it uses every data point.
Use the median when your data is skewed or contains outliers you can't explain or correct. This is especially true for things like income, home prices, response times, and other variables that naturally have a floor but no ceiling.
Use a trimmed mean when you want to acknowledge that outliers exist but you're not sure they're all garbage. Maybe some are real. Trimming gives you a middle ground.
Common Mistakes / What Most People Get Wrong
The biggest mistake is assuming the mean is always the "correct" average. It's not wrong; it's just answering a different question. The mean tells you about the total divided by the count. The median tells you about the typical case. Neither is universally better. The mistake is using one without thinking about what question you're actually trying to answer.
Another error: not checking for outliers at all. Before you report any statistic, you should glance at your data's distribution. A simple box plot or just sorting the values and looking at the extremes will tell you whether the mean is trustworthy. Most people skip this step because they learned that statistics are supposed to be "objective" and automatic. They're not.
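If you'd rather check in code than by eye, one common convention (the rule most box plots use for drawing whiskers, and not something specific to this article) flags any value more than 1.5 IQRs outside the quartiles. A minimal sketch, with a hypothetical function name and made-up sample data:

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in values if x < low or x > high]

sample = [48, 52, 55, 51, 49, 53, 50, 47, 54, 500]  # illustrative data
print(tukey_outliers(sample))
```

A flagged value is a prompt to look closer, not proof of an error. Because the fences are built from quartiles, the check itself is resistant to the outliers it is hunting for.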
A subtler mistake: confusing "resistant" with "always better." Some people hear that the median is resistant and decide to use it for everything. But if your data genuinely doesn't have outliers, the median can actually waste information. You're throwing away the extra insight the mean provides.
Also worth noting: resistance isn't the only property that matters. Statistics also differ in efficiency (how well they use the data), bias (whether they systematically miss the true value), and simplicity (how easy they are to explain). The mean wins on efficiency for clean data. The median wins on resistance. Neither is perfect.
Practical Tips / What Actually Works
Here's what I'd do if you're analyzing any dataset:
- Plot your data first. A histogram or box plot takes 30 seconds and shows you whether you have outliers. Don't calculate anything until you've looked.
- Report both the mean and median when there's a meaningful difference. This is honest reporting. If they diverge, that itself is interesting information about your data's shape.
- Ask what question you're answering. "What's the typical experience?" points to median. "What's the total per unit?" points to mean.
- Consider a trimmed mean if you want something in between. A 10% trimmed mean (dropping the top and bottom 10%) is a common choice that balances resistance with information.
- Don't "clean" outliers without thinking. Sometimes outliers are errors and should be fixed or removed. Sometimes they're the most important data points. Know which situation you're in.
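Reporting both numbers side by side is easy to automate. The helper below is a sketch of one way to do it; the name `summarize`, the `relative_gap` field, and the response-time data are all invented for the example.

```python
from statistics import mean, median

def summarize(values):
    """Report mean and median together, plus how far apart they are."""
    m, med = mean(values), median(values)
    # Gap expressed as a fraction of the median; undefined if the median is 0.
    gap = (m - med) / med if med else float("nan")
    return {"n": len(values), "mean": m, "median": med, "relative_gap": gap}

# Illustrative response times in milliseconds, with one very slow request.
response_ms = [120, 130, 125, 135, 128, 122, 131, 127, 129, 4000]
report = summarize(response_ms)
print(report)
```

A large gap between the two is a cue to go back and plot the data before trusting either number.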
FAQ
Is the median always resistant? Yes. The median has a 50% breakdown point, meaning you could change just under half your data arbitrarily and the median would still be reasonable. It's the most resistant common measure.
What's the difference between resistant and robust? In statistics, these terms overlap but aren't identical. "Resistant" specifically means the statistic doesn't change much when data is modified. "Robust" is broader: it means the statistic performs well even when assumptions (like normality) are violated. All resistant statistics are robust, but some robust statistics aren't technically resistant.
Can a mean ever be resistant? Not in the traditional sense. However, a trimmed mean or Winsorized mean is a modified version of the mean that gains resistance. That's why these variants exist.
Why do textbooks use the mean so much? Because in ideal conditions (normal distribution, no outliers), the mean is the most efficient estimator: it uses all available information and has the smallest variance of any unbiased estimator of the center. The problem is those ideal conditions don't always hold in real data.
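The efficiency claim can be checked with a quick simulation. The sketch below draws many clean, normally distributed samples and compares how much the mean and the median bounce around from sample to sample. The sample size, trial count, and seed are arbitrary choices for the demonstration.

```python
import random
from statistics import fmean, median, pvariance

rng = random.Random(0)  # fixed seed so the run is repeatable

def sampling_variance(estimator, trials=2000, n=101):
    """Variance of an estimator across many samples of clean normal data."""
    estimates = [
        estimator([rng.gauss(0, 1) for _ in range(n)]) for _ in range(trials)
    ]
    return pvariance(estimates)

var_mean = sampling_variance(fmean)
var_median = sampling_variance(median)
print(f"mean: {var_mean:.4f}  median: {var_median:.4f}  "
      f"ratio: {var_mean / var_median:.2f}")
```

On clean normal data the median is noisier than the mean. For large samples the ratio of the two variances tends toward 2/π, about 0.64, which is the price the median pays for its resistance.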
Should I always use the median for salary or price data? Generally, yes — especially for one-time looks at a dataset. But if you're tracking changes over time, be consistent. And if you've verified your data has no significant outliers, the mean is fine. The key is making an informed choice rather than defaulting to one without looking.
Closing
The concept of resistant statistics comes down to this: some numbers in your data matter more than others, depending on what you're measuring. Knowing whether your statistic is sensitive or resistant to outliers isn't a technical nicety. It's what separates a number that tells the truth from one that misleads you.
Next time you see an average, pause for a second. Ask yourself what's hiding in the data. The answer might change everything.