Standard deviation is the positive square root of the variance and the most widely used measure of dispersion in statistical analysis.
## Core concept
Standard deviation measures how spread out data values are from their arithmetic mean. It is denoted by σ (sigma) for population data and s for sample data. Unlike range or quartile deviation, standard deviation uses all data points and is particularly sensitive to extreme values.
Why it matters:

- It is generally the preferred measure of dispersion for comparing datasets, because it uses every observation
- It forms the foundation for probability distributions and hypothesis testing
- It is used in investment risk analysis, quality control, and forecasting
- It is directly tied to the normal distribution (the 68–95–99.7 rule)
## Formula / rule
For ungrouped data (population): $$\sigma = \sqrt{\frac{\sum(x_i - \bar{x})^2}{N}}$$
For ungrouped data (sample): $$s = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}$$
For grouped data (population): $$\sigma = \sqrt{\frac{\sum f(x_i - \bar{x})^2}{N}}$$
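where:

- $x_i$ = individual values (ungrouped data) or class midpoints (grouped data)
- $\bar{x}$ = arithmetic mean (often written $\mu$ when it is a population mean)
- $f$ = frequency of each class
- $N$ = total number of values in the population ($N = \sum f$ for grouped data)
- $n$ = sample size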
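The three formulas above translate directly into code. Below is a minimal Python sketch, assuming the data arrive as plain lists of numbers; the function names (`population_sd`, `sample_sd`, `grouped_population_sd`) are illustrative and not from any library. For grouped data it derives each $x_i$ as a class midpoint from hypothetical `(lower, upper)` class limits.

```python
import math


def population_sd(values):
    """sigma = sqrt(sum((x - mean)^2) / N) for ungrouped population data."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sample_sd(values):
    """s = sqrt(sum((x - mean)^2) / (n - 1)) for ungrouped sample data."""
    n = len(values)
    if n < 2:
        raise ValueError("sample SD needs at least two values")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def grouped_population_sd(classes, freqs):
    """Population SD for grouped data.

    classes: list of (lower, upper) class limits; x_i is each class midpoint.
    freqs:   list of frequencies f, one per class; N = sum(f).
    """
    mids = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)


data = [10, 12, 14, 16, 18]
print(round(population_sd(data), 2))  # 2.83
print(round(sample_sd(data), 2))      # 3.16

# Hypothetical grouped data: classes 0-10, 10-20, 20-30 with frequencies 2, 3, 5
print(round(grouped_population_sd([(0, 10), (10, 20), (20, 30)], [2, 3, 5]), 2))  # 7.81
```

The only difference between the first two functions is the divisor, $N$ versus $n - 1$; everything else is shared.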
Shortcut formula (population, ungrouped data; avoids computing each deviation separately): $$\sigma = \sqrt{\frac{\sum x_i^2}{N} - \left(\frac{\sum x_i}{N}\right)^2}$$
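As a quick check, the shortcut formula gives the same σ as the definitional one. The Python sketch below (function names are hypothetical) computes both on the worked-example data. One caveat: in floating-point arithmetic the shortcut subtracts two large, nearly equal quantities, so for data whose mean is large relative to its spread it can lose precision. The deviation form is the safer choice in software.

```python
import math


def sd_definitional(values):
    """Population SD from deviations about the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_shortcut(values):
    """Population SD via sqrt(sum(x^2)/N - (sum(x)/N)^2)."""
    n = len(values)
    mean_of_squares = sum(x * x for x in values) / n
    square_of_mean = (sum(values) / n) ** 2
    # max(..., 0.0) guards against a tiny negative result caused by rounding error
    return math.sqrt(max(mean_of_squares - square_of_mean, 0.0))


data = [10, 12, 14, 16, 18]
# Both equal sqrt(8): 1020/5 - 14^2 = 204 - 196 = 8
print(sd_definitional(data), sd_shortcut(data))
```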
## Common exam applications
- Comparing variability: Two datasets with the same mean can have different standard deviations; higher SD indicates greater dispersion
- Coefficient of Variation (CV): A unit-free measure for comparing relative dispersion when means differ: $CV = \frac{\sigma}{\bar{x}} \times 100\%$; the dataset with the lower CV is the more consistent
- Normal distribution: For normally distributed data, approximately 68% of values lie within ±1σ of the mean, 95% within ±2σ, and 99.7% within ±3σ
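- Combined SD: When combining two groups of sizes $n_1, n_2$ with means $\bar{x}_1, \bar{x}_2$ and SDs $\sigma_1, \sigma_2$, use $\sigma_{12} = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$, where $\bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$ is the combined mean, $d_1 = \bar{x}_1 - \bar{x}_{12}$ and $d_2 = \bar{x}_2 - \bar{x}_{12}$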
- Quality control: Setting control limits in manufacturing (e.g., ±3σ from target)
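Three of these applications reduce to short calculations. The Python sketch below is illustrative: the helper names are hypothetical, the combined-SD helper assumes both group SDs are population SDs (divisor $N$) and applies the two-group formula, and the normal-distribution coverage uses the standard library's `statistics.NormalDist` (Python 3.8+).

```python
import math
from statistics import NormalDist


def coefficient_of_variation(sd, mean):
    """CV as a percentage: (sd / mean) * 100."""
    return sd / mean * 100


def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined population SD of two groups from their sizes, means and SDs.

    sigma_12 = sqrt((n1*(sd1^2 + d1^2) + n2*(sd2^2 + d2^2)) / (n1 + n2))
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n   # combined mean
    d1, d2 = mean1 - mean, mean2 - mean    # each group's mean minus combined mean
    return math.sqrt((n1 * (sd1**2 + d1**2) + n2 * (sd2**2 + d2**2)) / n)


def normal_coverage(k):
    """Share of a normal distribution lying within +/- k sigma of the mean."""
    z = NormalDist()  # standard normal: mu = 0, sigma = 1
    return z.cdf(k) - z.cdf(-k)


# CV for data with mean 14 and sigma = sqrt(8)
print(round(coefficient_of_variation(math.sqrt(8), 14), 1))  # 20.2

# Groups {10, 12, 14} and {16, 18}: means 12 and 17, population SDs sqrt(8/3) and 1.
# The result matches the SD of all five values pooled, sqrt(8), up to rounding.
print(combined_sd(3, 12, math.sqrt(8 / 3), 2, 17, 1.0))

for k in (1, 2, 3):
    print(k, round(normal_coverage(k), 3))  # prints 1 0.683, 2 0.954, 3 0.997
```

The coverage values show that the "95%" in the rule is a rounding of roughly 95.4%.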
## Worked example
Question: Calculate the standard deviation of the dataset: 10, 12, 14, 16, 18
Solution:

- Mean: $\bar{x} = \frac{10+12+14+16+18}{5} = \frac{70}{5} = 14$

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|-------|-----------------|---------------------|
| 10 | −4 | 16 |
| 12 | −2 | 4 |
| 14 | 0 | 0 |
| 16 | 2 | 4 |
| 18 | 4 | 16 |
| | | $\sum = 40$ |

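$$\sigma = \sqrt{\frac{40}{5}} = \sqrt{8} \approx 2.83$$

This treats the five values as a population. If they were a sample, divide by $n - 1 = 4$ instead: $s = \sqrt{40/4} = \sqrt{10} \approx 3.16$.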
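Python's standard-library `statistics` module offers a quick check of the hand calculation: `pstdev` divides by $N$ (population) and `stdev` by $n - 1$ (sample). This sketch assumes Python 3; since the question does not say whether the five values are a population or a sample, both are shown.

```python
from statistics import mean, pstdev, pvariance, stdev

data = [10, 12, 14, 16, 18]

print(float(mean(data)))       # 14.0
print(float(pvariance(data)))  # 8.0  (sum of squared deviations 40, divided by N = 5)
print(round(pstdev(data), 2))  # 2.83  population SD, divisor N
print(round(stdev(data), 2))   # 3.16  sample SD, divisor n - 1
```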
## Common mistakes
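- Using $n - 1$ for population data: Divide by $N$ for a population; use $n - 1$ (Bessel's correction) only for sample data, where it makes $s^2$ an unbiased estimator of the population variance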
- Confusing variance with SD: Variance = $\sigma^2$; SD = $\sigma$. Always take the square root
- Ignoring units: SD is in the same unit as the original data; variance is in squared units
- Wrong $x_i$ in grouped data: Always use class midpoints, not class limits or boundaries
- Assuming a lower SD is always better: A lower SD means more consistency, which is desirable in profit analysis; in market comparisons, whether low or high variability is preferable depends on the business goal
## Key relationship
$$\text{Variance} = (\text{Standard Deviation})^2$$
Standard deviation is preferred over variance in practice because it is expressed in the same units as the data and is easier to interpret.