U30 Uses and Abuses of Statistics: Essential Formulas
This topic examines how statistical information can be presented in a misleading way, intentionally or unintentionally. Understanding these common pitfalls is crucial for critically evaluating data claims.
1 Misleading Graphs and Charts
Truncated Y-Axis
Starting the vertical axis at a value $y_0 > 0$ exaggerates small differences. For two bars with heights $h_1, h_2 > y_0$, the ratio of the drawn bar lengths becomes $\frac{h_1 - y_0}{h_2 - y_0}$, which differs from the true ratio $\frac{h_1}{h_2}$ whenever $h_1 \neq h_2$. The closer $y_0$ is to the bar heights, the larger the distortion.
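As a quick numeric check of the truncated-axis formula, the sketch below compares the two ratios for hypothetical values 52 and 50 drawn on an axis that starts at 48. The function names `visual_ratio` and `true_ratio` are illustrative only, not from any library.

```python
def visual_ratio(h1: float, h2: float, y0: float) -> float:
    """Ratio of the drawn bar lengths when the y-axis starts at y0."""
    # The formula only describes bars that rise above the axis start.
    if not (h1 > y0 and h2 > y0):
        raise ValueError("both bars must extend above the axis start y0")
    return (h1 - y0) / (h2 - y0)


def true_ratio(h1: float, h2: float) -> float:
    """Ratio of the actual values, as seen on an axis starting at 0."""
    return h1 / h2


# Hypothetical values: 52 vs 50, plotted on an axis that starts at 48.
h1, h2, y0 = 52.0, 50.0, 48.0
print(f"true ratio:   {true_ratio(h1, h2):.2f}")        # 1.04
print(f"visual ratio: {visual_ratio(h1, h2, y0):.2f}")  # 2.00
```

A 4% difference is drawn as one bar twice as tall as the other.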
2 Misuse of Averages
Choosing a Favorable Measure of Central Tendency
For a data set $x_1, x_2, \dots, x_n$, the mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the median, and the mode can differ substantially, especially in skewed distributions. The mean is sensitive to extreme values (outliers), while the median is not.
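If a data set is $\{1, 2, 2, 3, 100\}$, the mean is $\frac{108}{5} = 21.6$, while the median is $2$. Reporting the mean as the "typical" value would be misleading: four of the five values are $3$ or less, and the single outlier $100$ pulls the mean far above them.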
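The same example can be checked with Python's standard `statistics` module. This is a minimal sketch; the extra count of values below the mean is added only to make the skew visible.

```python
from statistics import mean, median, mode

data = [1, 2, 2, 3, 100]

print(mean(data))    # 21.6
print(median(data))  # 2
print(mode(data))    # 2

# How many values actually lie below the "average"?
below_mean = sum(1 for x in data if x < mean(data))
print(f"{below_mean} of {len(data)} values are below the mean")  # 4 of 5
```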
3 Correlation vs. Causation
Pearson's Correlation Coefficient
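A strong correlation, with $r$ close to $+1$ or $-1$, does not imply causation. For paired data $(x_1, y_1), \dots, (x_n, y_n)$, the correlation coefficient is:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$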
A third, lurking variable may drive both quantities and so produce the association. Remember: "$r$ measures linear association, not cause."
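The sketch below computes Pearson's $r$ directly from its standard definition and applies it to hypothetical data in which daily temperature acts as the lurking variable: it pushes both ice cream sales and sunburn cases up, so the two series are strongly correlated even though neither causes the other. The data, offsets, and the helper name `pearson_r` are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient, computed from the definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: temperature (the lurking variable) drives both series;
# the small fixed offsets stand in for day-to-day noise.
temps = [18, 21, 24, 27, 30, 33]
ice_cream_sales = [2 * t + d for t, d in zip(temps, [1, -1, 2, 0, -2, 1])]
sunburn_cases = [t - 15 + d for t, d in zip(temps, [0, 1, -1, 1, 0, -1])]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"r = {r:.2f}")  # r = 0.96
```

A value near $+1$ here reflects only the shared dependence on temperature; banning ice cream would not prevent sunburn.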
4 Biased Sampling and Question Wording
Non-Representative Samples
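If a sample of size $n$ is drawn from a population of size $N$, but the sampling frame excludes a subgroup that differs on the characteristic being measured, estimates such as the sample proportion $\hat{p}$ will be biased. The true population parameter is $p = \frac{X}{N}$, where $X$ is the number of population members with the characteristic; the biased estimate is $\hat{p}_{b} = \frac{X_s}{n}$, where $X_s$ is the number with the characteristic in the non-representative sample. Increasing $n$ does not remove this bias.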
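To make the bias concrete, here is a small simulation under invented assumptions: a population of $N = 1000$ people in which a reachable group of 600 has the characteristic at a 50% rate, while an excluded group of 400 (missing from the frame) has it at 20%, so $p = 380/1000 = 0.38$. Samples are drawn only from the reachable group with Python's `random.sample`; the group sizes, rates, and seed are all hypothetical.

```python
import random

# Hypothetical population; 1 = has the characteristic, 0 = does not.
reachable = [1] * 300 + [0] * 300   # in the sampling frame: 50% have it
excluded = [1] * 80 + [0] * 320     # left out of the frame: 20% have it
population = reachable + excluded

N = len(population)
X = sum(population)
p = X / N                            # true parameter p = X / N
print(f"p = {p:.2f}")                # p = 0.38

# One simple random sample of size n drawn only from the flawed frame.
random.seed(1)
n = 100
sample = random.sample(reachable, n)
X_s = sum(sample)
p_hat_b = X_s / n                    # biased estimate X_s / n
print(f"p_hat_b = {p_hat_b:.2f}")

# Repeat the flawed survey many times: the estimates centre on the frame's
# rate (0.50), not on p, so repetition cannot correct the bias.
reps = [sum(random.sample(reachable, n)) / n for _ in range(2000)]
avg_estimate = sum(reps) / len(reps)
print(f"average of 2000 estimates = {avg_estimate:.3f}")
```

Averaging many samples only reduces random error; the systematic gap of about $0.12$ between the frame's rate and $p$ remains.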
Leading Questions
Leading questions can shift responses. For example, "Do you support the necessary policy A?" and "Do you support policy A?" can yield different response proportions, because the word "necessary" nudges respondents toward agreement.