NNumvella

Standard Deviation Explained: What It Is and How to Calculate It

By The Numvella Team · 4 min read

Two datasets can have the same average yet look completely different — one tightly clustered, the other wildly spread out. Standard deviation is the number that captures that spread. This guide explains what it measures, how to calculate it by hand, and how to read it.

What standard deviation measures

Standard deviation is the typical distance of a value from the mean. A small standard deviation means the data clusters tightly around the average; a large one means it's widely dispersed. It's expressed in the same units as the data, which is why it's reported far more often than its cousin the variance (which is in squared units).

Population vs sample standard deviation

If your numbers are the entire group you care about, use the population standard deviation, dividing by N. If they're a sample drawn from a larger population, use the sample standard deviation, dividing by N − 1. That "minus one" (Bessel's correction) makes the sample estimate less biased. In most real-world statistics — surveys, experiments, quality control — you have a sample, so the sample version is the usual choice.

The formula

The population standard deviation is σ = √( Σ(x − μ)² ÷ N ): take each value's distance from the mean μ, square it, average those squares, and take the square root. The sample version s uses N − 1 in the denominator instead of N.

Step-by-step example

Take the dataset 2, 4, 6, 8, 10. First the mean: (2 + 4 + 6 + 8 + 10) ÷ 5 = 6. Now each deviation from the mean, squared:

ValueDeviation (x − 6)Squared
2−416
4−24
600
824
10416

The squared deviations sum to 40. For the population standard deviation, divide by N = 5 to get a variance of 8, then take the square root: σ = √8 ≈ 2.83. For the sample standard deviation, divide by N − 1 = 4 to get 10, so s = √10 ≈ 3.16.

Real-world examples

  • Test scores: two classes can average 75%, but the one with the smaller standard deviation has more consistent students.
  • Investing: standard deviation of returns is a common measure of volatility — higher means a bumpier ride.
  • Manufacturing: tight tolerances mean a small standard deviation; a rising one signals a process drifting out of control.

How to interpret it: the 68-95-99.7 rule

For data that follows a normal (bell-curve) distribution, about 68% of values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three. So if test scores average 70 with a standard deviation of 10, roughly 95% of students scored between 50 and 90. This rule turns an abstract number into an intuitive sense of how unusual a value is.

Standard deviation pairs naturally with the average — if you're choosing between mean and median first, see mean vs median.

Variance vs. standard deviation

Variance is the average of the squared deviations — the number you get just before the final square root. In our example the population variance was 8 and the standard deviation √8 ≈ 2.83. They carry the same information, but variance is in squared units (squared test points, squared dollars), which is hard to interpret. Taking the square root returns the measure to the original units, which is why standard deviation is what gets reported. Variance still matters in further statistics, where squared quantities add together cleanly.

Why we square the deviations

A natural first instinct is to average the raw distances from the mean — but those deviations always sum to zero, because the positives and negatives cancel exactly (that's what makes it the mean). Squaring fixes this by making every deviation positive, and it has a useful side effect: it penalizes large deviations more than small ones, so a single far-out value moves the standard deviation more than several near ones. Taking the square root at the end undoes the squaring's effect on the units.

Standard deviation and the bell curve

Standard deviation becomes especially powerful with normally distributed data, because it lets you locate any value with a z-score: how many standard deviations it sits from the mean. A z-score of +2 means a value is two standard deviations above average — in the top ~2.5% of a normal distribution. Test designers, pollsters, and quality engineers use z-scores to turn a raw number into "how unusual is this?" The 68-95-99.7 rule above is just the most-used set of z-score landmarks.

A quick gut check

Before trusting a calculated standard deviation, sanity-check it against the data's range. For a modest dataset, the standard deviation is usually somewhere between a quarter and a half of the range (the max minus the min). Our example ranged from 2 to 10 — a range of 8 — and the standard deviation was about 2.8 to 3.2, comfortably in that band. If your result is wildly outside it, you've probably mixed up population and sample, forgotten to square the deviations, or mistyped a value.

Frequently asked questions

How do I calculate standard deviation?

Find the mean, subtract it from each value and square the result, average those squares (the variance), then take the square root. Divide by N for a population or N − 1 for a sample.

What is the difference between population and sample standard deviation?

Population divides by N and is used when your data is the whole group; sample divides by N − 1 and is used when your data is a sample estimating a larger population.

What does the 68-95-99.7 rule mean?

For a normal distribution, about 68% of values lie within one standard deviation of the mean, 95% within two, and 99.7% within three.

What is a high standard deviation?

It's relative to the data. A high standard deviation means values are spread far from the mean; a low one means they cluster close to it.