Some algebraic properties of z-scores

Load packages

library(tidyverse)

Motivation

Z-scores (z-values) are a useful and widely employed tool to gauge and compare measurements. For instance, z-scores help to compare the relative position of some measurements with respect to their distributions. In this post, we will prove some basic (algebraic) properties of z-values. There’s nothing new to that, it’s just I’d like to have it neat and concise somewhere to quickly find it. I’ll add some explanation for the ease of reception.

Definition of z-scores

A z-score measures the distance of a measurement to its (arithmetic) mean value. This distance is then set in relation to the “standard” distance of those measures to their mean.

That is,

\[ z = \frac{x_i - \bar{x}}{s_x} \]

where \(s_x\) is the standard deviation of \(x\), and \(x_i\) is one measurement of the variable \(X\). \(\bar{x}\) represents the average (mean) value.

Let’s call the distance from the mean \(d\), then the difference of measurement \(i\) to the mean is

\[ d_i = x_i - \bar{x} \]

It doesn’t really matter whether we subtract \(x_i\) from the mean or the other way round. The absolute value of \(d_i\) remains the same; it’s just that measurement would appear as smaller or larger than the mean.

Now we express \(d\) in terms of the standard difference (sd) of the measurements to its mean:

\[ z = \frac{d}{s_x} \]

Intuition

Z-scores have a mean of zero

By calculating \(z\) we subtract the mean value, i.e. \(\bar{x}\) from each measurement. As a consequence, the mean z-score will have a mean that is exactly \(\bar{x}\) lower than the initial mean (ie., \(\bar{x}\)). Subtracting from a value exactly the same value will yield zero.

Z-scores have a SD of 1

Each measurement is set in relation to the SD of \(X\), ie., we devide each \(d_i\) by \(s_x\). In consequence, the sum of all measurements will be lower then the initial sum of \(x_1, x_2, \cdots\) of exactly \(s_x\). Dividing a value by the same value yields 1. It that’s true for the sum, then it will also be true for the mean. As the SD is basically the mean of the ratio \(\frac{d}{s_x}\), the SD of \(Z\) will also equal 1.

Proofs

Z-scores have a mean of zero

In order to show that \(\bar{x}=0\) it suffices to show that \(\sum{x}=0\). That’s because if a sum equals zero, the mean must equal zero too.

\[ \begin{align} \sum{z} &= \sum \frac{x_i - \bar{X}}{s_x} \\ &= \frac{1}{s_x} \sum{x_i - \bar{x}} \\ &= \frac{1}{s_x} \sum{x_i} - n\cdot \bar{x} \\ &= \frac{1}{s_x} (n \cdot \bar{x} - n\cdot \bar{x}) \\ &= 0 \end{align} \]

Note that the sum of alle measurements equals \(n\)-times the mean, i.e.,

\[ \sum{x_i} = n\bar{x} \\ \]

This fact stems for the definition of the mean:

\[ \bar{x} = \frac{1}{n} \sum x_i \qquad \text{multiply both sides by }n \]

\[ \Leftrightarrow n\bar{x} = \sum x_i \\ \]

Z-scores have a SD (and a variance) of 1

Remember that SD of some variable \(x\) is defined as:

\[ sd(x) = \frac{1}{n}\sum{ \left( ( x_i - \bar{x} ) ^2 \right)} \]

If \(\bar{x}=0\) – as it is the case for z-scores – this simplifies to

\[ sd(x) = \frac{1}{n}\sum{ ( x_i ) ^2 } \]

In order to show that the SD of \(Z\) equals 1 it suffices to show that the mean equals 1. That’s because if a sum equals 1, the mean must equal 1 too.

Note that the SD of Z is defines such that

\[ s(Z) = \frac{1}{n} \sum z_i^2 \]

In comparison, let \(sum(Z)\) be defines such that

\[ sum(Z)= \sum z_i^2 \]

Hence, if \(sum(Z) = 1\) it follows that \(sd(Z) = 1\).

\[ \begin{align} \sum Z_i^2 &= \sum \left( \frac{x_i - \bar{x}}{s_x} \right)^2 \\ &= \frac{1}{s_x^2} \sum(x_i - \bar{x})^2 \qquad \text{note that }ns_x^2 = \sum(x_i - \bar{x})^2\\ \end{align} \]

Hence

\[ \begin{align} &= \frac{1}{s_x^2}n s_x^2 \\ &= n \end{align} \]

We arrive at the fact the the sum of the squared z-scores equals \(n\). Dividing by \(n\) (or \(n-1\)) yields 1.

In other words, the SD of \(Z\) equals 1.

Note that

\[ sd(Z) = 1 \Leftrightarrow Var(Z)=1 \]

where \(Var\) is the Variance. Whenever sd equals 1 the variance will equal 1, too.

We do not distinguish between the estimated population variance where we standardize be \(n-1\) and the sample variance where we standardize by \(n\). The principle remains the same, and the conclusions remain the same.

The average squared z-score equals 1

As discussed above the avarage squared z-score equals the SD of the z-score for a given variable \(x\):

\[ \frac{1}{n} \sum z_i^2 = 1 \]