Norm and distance

2025-09-20 · 7 min read

#linear-algebra #math #vectors

This blog post is based on Jong-han Kim's Linear Algebra.

Norm

The Euclidean norm (or just norm) of an $n$-vector $x$ is

$$\lVert x \rVert = \sqrt{x^2_1 + x^2_2 + \dots + x^2_n} = \sqrt{x^T x}$$

Properties

for any $n$-vectors $x$ and $y$, and any scalar $\beta$:

- homogeneity: $\lVert \beta x \rVert = \lvert\beta\rvert \lVert x \rVert$
- triangle inequality: $\lVert x + y \rVert \leq \lVert x \rVert + \lVert y \rVert$
- nonnegativity: $\lVert x \rVert \geq 0$
- definiteness: $\lVert x \rVert = 0$ only if $x = 0$


RMS value

Mean-square value of $n$-vector $x$ is

$$\frac{x^2_1 + \dots + x^2_n}{n} = \frac{\lVert x \rVert^2}{n}$$

Root-mean-square value (RMS value) is

$$\mathbf{rms}(x) = \sqrt{\frac{x^2_1 + \dots + x^2_n}{n}} = \frac{\lVert x \rVert}{\sqrt{n}}$$
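As a quick sanity check (plain Python, not part of the original notes), the norm and RMS definitions can be computed directly; the example vector is arbitrary:

```python
import math

def norm(x):
    # Euclidean norm: sqrt(x_1^2 + ... + x_n^2)
    return math.sqrt(sum(xi ** 2 for xi in x))

def rms(x):
    # RMS value is the norm divided by sqrt(n)
    return norm(x) / math.sqrt(len(x))

x = [1.0, -2.0, 2.0]
print(norm(x))  # 3.0
print(rms(x))   # 3 / sqrt(3) = sqrt(3) ≈ 1.732
```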

Norm of block vectors

suppose $a$, $b$, $c$ are vectors

$$\lVert(a, b, c)\rVert^2 = a^Ta + b^Tb + c^Tc = \lVert a\rVert^2 + \lVert b\rVert^2 + \lVert c\rVert^2$$

so we have

$$\lVert(a, b, c)\rVert = \sqrt{\lVert a\rVert^2 + \lVert b\rVert^2 + \lVert c\rVert^2} = \lVert(\lVert a\rVert, \lVert b\rVert, \lVert c\rVert)\rVert$$

i.e., the norm of a block vector is the norm of the vector of the blocks' norms (not of the squared norms).
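A small numerical check of the block-vector identity (illustrative Python; the blocks below are arbitrary example vectors):

```python
import math

def norm(v):
    return math.sqrt(sum(x ** 2 for x in v))

a, b, c = [1.0, 2.0], [3.0], [4.0, 0.0, 0.0]
stacked = a + b + c  # the block vector (a, b, c), stacked end to end

lhs = norm(stacked)                      # ||(a, b, c)||
rhs = norm([norm(a), norm(b), norm(c)])  # norm of the vector of norms
print(lhs, rhs)  # both equal sqrt(30)
```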

Chebyshev inequality

suppose that $k$ of the numbers $\lvert x_1\rvert, \dots, \lvert x_n\rvert$ are $\geq a$;
then $k$ of the numbers $x^2_1, \dots, x^2_n$ are $\geq a^2$,
so $\lVert x\rVert^2 = x^2_1 + \dots + x^2_n \geq k a^2$.
Hence $k \leq \lVert x\rVert^2 / a^2$: the number of $x_i$ with $\lvert x_i\rvert \geq a$ is no more than $\lVert x\rVert^2 / a^2$

fraction of entries with $\lvert x_i\rvert \geq a$ is no more than $\left(\frac{\mathbf{rms}(x)}{a}\right)^2$

e.g., no more than 4% of entries can satisfy $\lvert x_i\rvert \geq 5\,\mathbf{rms}(x)$ (since $(1/5)^2 = 4\%$)
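The counting argument can be verified numerically; the vector and threshold below are arbitrary examples:

```python
import math

x = [0.1, -0.3, 10.0, 0.2, -0.1, 0.0, 0.4, -0.2]
a = 5.0

count = sum(1 for xi in x if abs(xi) >= a)   # entries with |x_i| >= a
bound = sum(xi ** 2 for xi in x) / a ** 2    # Chebyshev bound ||x||^2 / a^2
print(count, bound)  # the count never exceeds the bound
```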


Distance

Euclidean distance between $n$-vectors $a$ and $b$ is

$$\mathbf{dist}(a, b) = \lVert a - b\rVert$$

agrees with ordinary distance for $n = 1, 2, 3$
$\mathbf{rms}(a - b)$ is the RMS deviation between $a$ and $b$
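A minimal distance computation in Python (example points chosen to give an exact answer):

```python
import math

def dist(a, b):
    # Euclidean distance: norm of the difference a - b
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
print(dist(a, b))  # sqrt(9 + 16 + 0) = 5.0
```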


Triangle inequality

Triangle with vertices at positions $a$, $b$, $c$
edge lengths are $\lVert a - b\rVert$, $\lVert b - c\rVert$, $\lVert a - c\rVert$
by triangle inequality

$$\lVert a - c\rVert = \lVert(a-b) + (b-c)\rVert \leq \lVert a-b\rVert + \lVert b-c\rVert$$

Standard deviation

for $n$-vector $x$, $\mathbf{avg}(x) = \mathbf{1}^T x / n$
de-meaned vector is $\tilde{x} = x - \mathbf{avg}(x)\mathbf{1}$ (so $\mathbf{avg}(\tilde{x}) = 0$); the standard deviation of $x$ is

$$\mathbf{std}(x) = \mathbf{rms}(\tilde{x}) = \frac{\lVert x - (\mathbf{1}^T x/n)\mathbf{1}\rVert}{\sqrt{n}}$$

$\mathbf{std}(x)$ gives the typical amount the $x_i$ vary from $\mathbf{avg}(x)$
$\mathbf{std}(x) = 0$ only if $x = \alpha\mathbf{1}$ for some $\alpha$
Greek letters $\mu, \sigma$ are commonly used for the mean and standard deviation
a basic formula:

$$\mathbf{rms}(x)^2 = \mathbf{avg}(x)^2 + \mathbf{std}(x)^2$$

The Core Identity

The statistical identity relating the root mean square (rms), average (avg), and standard deviation (std) of a data vector $\mathbf{x}$ is given by:

$$\text{rms}(\mathbf{x})^2 = \text{avg}(\mathbf{x})^2 + \text{std}(\mathbf{x})^2$$

This identity is derived from the geometric decomposition of a vector in $n$-dimensional space, based on the Pythagorean theorem.

Vector Definitions

Let $\mathbf{x}$ be a data vector in $\mathbf{R}^n$ and $\mathbf{1}$ be the vector of ones in $\mathbf{R}^n$.

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad \mathbf{1} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$

We define the average (mean) of $\mathbf{x}$ as $\mu = \text{avg}(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\mathbf{1}^T\mathbf{x}$.

The vector $\mathbf{x}$ can be decomposed into two fundamental components:

  1. Average Component Vector: a vector where each element is the mean, $\mu$. This represents the constant part of the data.
$$\mu\mathbf{1} = \begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{bmatrix}$$
  2. Deviation (De-meaned) Vector: the vector of deviations from the mean. This represents the fluctuating part of the data.
$$\tilde{\mathbf{x}} = \mathbf{x} - \mu\mathbf{1} = \begin{bmatrix} x_1 - \mu \\ x_2 - \mu \\ \vdots \\ x_n - \mu \end{bmatrix}$$

Orthogonal Decomposition

The decomposition of $\mathbf{x}$ is written as $\mathbf{x} = \mu\mathbf{1} + \tilde{\mathbf{x}}$. The key geometric insight is that these two component vectors are orthogonal, meaning their dot product is zero.

Proof of Orthogonality:

$$\begin{align*} (\mu\mathbf{1})^T \tilde{\mathbf{x}} &= (\mu\mathbf{1})^T (\mathbf{x} - \mu\mathbf{1}) \\ &= \mu\mathbf{1}^T\mathbf{x} - \mu^2\mathbf{1}^T\mathbf{1} \end{align*}$$

By definition, $\mathbf{1}^T\mathbf{x} = n\mu$ and the dot product $\mathbf{1}^T\mathbf{1} = n$. Substituting these in:

$$\begin{align*} &= \mu(n\mu) - \mu^2(n) \\ &= n\mu^2 - n\mu^2 \\ &= 0 \end{align*}$$

Since their dot product is zero, the vectors are orthogonal: $\mu\mathbf{1} \perp \tilde{\mathbf{x}}$.

The Pythagorean Theorem & Final Derivation

Because the components are orthogonal, they form a right-angled triangle in $\mathbb{R}^n$. The Pythagorean theorem applies to their squared norms (lengths):

$$\|\mathbf{x}\|^2 = \|\mu\mathbf{1}\|^2 + \|\tilde{\mathbf{x}}\|^2$$

Dividing the entire equation by $n$ turns each squared norm into a mean square: $\|\mu\mathbf{1}\|^2/n = n\mu^2/n = \text{avg}(\mathbf{x})^2$, and $\|\tilde{\mathbf{x}}\|^2/n = \text{std}(\mathbf{x})^2$ by definition. The final derivation is:

$$\begin{align*} \frac{\|\mathbf{x}\|^2}{n} &= \frac{\|\mu\mathbf{1}\|^2}{n} + \frac{\|\tilde{\mathbf{x}}\|^2}{n} \\[1em] \text{rms}(\mathbf{x})^2 &= \text{avg}(\mathbf{x})^2 + \text{std}(\mathbf{x})^2 \end{align*}$$
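The identity can be confirmed numerically (plain Python; the data vector below is an arbitrary example):

```python
import math

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)

mu = sum(x) / n                                       # avg(x)
rms = math.sqrt(sum(xi ** 2 for xi in x) / n)         # rms(x)
std = math.sqrt(sum((xi - mu) ** 2 for xi in x) / n)  # std(x) = rms of de-meaned vector

print(rms ** 2, mu ** 2 + std ** 2)  # both sides of the identity agree
```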

Mean return and risk

if $x$ is a time series of returns on an investment over $n$ periods, $\mathbf{avg}(x)$ is the mean return and $\mathbf{std}(x)$ measures how the returns vary around it, i.e., the risk.

Cauchy-Schwarz inequality

For any two $n$-vectors $\mathbf{a}$ and $\mathbf{b}$, the absolute value of their dot product is less than or equal to the product of their norms.

$$\lvert\mathbf{a}^T\mathbf{b}\rvert \leq \lVert\mathbf{a}\rVert\lVert\mathbf{b}\rVert$$

This is true because the geometric definition of the dot product is $\lVert\mathbf{a}\rVert\lVert\mathbf{b}\rVert\cos(\theta)$, and the absolute value $\lvert\cos(\theta)\rvert$ cannot exceed 1. Written out in terms of their components, the inequality is:

$$|a_1b_1 + \dots + a_nb_n| \leq (a^2_1 + \dots + a^2_n)^{1/2}(b^2_1 + \dots + b^2_n)^{1/2}$$
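A quick check of the inequality on arbitrary example vectors:

```python
import math

def norm(v):
    return math.sqrt(sum(x ** 2 for x in v))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

a = [1.0, -2.0, 3.0]
b = [4.0, 0.0, -1.0]
print(abs(dot(a, b)), norm(a) * norm(b))  # |a^T b| <= ||a|| ||b||
```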

The Triangle Inequality

The norm of the sum of two vectors is less than or equal to the sum of their individual norms. Geometrically, this means the length of any side of a triangle is less than or equal to the sum of the lengths of the other two sides.

$$\|\mathbf{a}+\mathbf{b}\| \leq \|\mathbf{a}\| + \|\mathbf{b}\|$$

Proof

This inequality can be proven using the Cauchy-Schwarz inequality as follows.

$$\begin{align*} \|\mathbf{a}+\mathbf{b}\|^2 &= (\mathbf{a}+\mathbf{b})^T(\mathbf{a}+\mathbf{b}) \\ &= \|\mathbf{a}\|^2 + 2\mathbf{a}^T\mathbf{b} + \|\mathbf{b}\|^2 \\ &\leq \|\mathbf{a}\|^2 + 2|\mathbf{a}^T\mathbf{b}| + \|\mathbf{b}\|^2 \quad (\text{since } x \le |x|) \\ &\leq \|\mathbf{a}\|^2 + 2\|\mathbf{a}\|\|\mathbf{b}\| + \|\mathbf{b}\|^2 \quad (\text{by the Cauchy-Schwarz inequality}) \\ &= (\|\mathbf{a}\| + \|\mathbf{b}\|)^2 \end{align*}$$

Taking the square root of both sides completes the proof of the triangle inequality.

$$\|\mathbf{a}+\mathbf{b}\| \leq \|\mathbf{a}\| + \|\mathbf{b}\|$$
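A numerical spot-check of the triangle inequality on example vectors:

```python
import math

def norm(v):
    return math.sqrt(sum(x ** 2 for x in v))

a = [1.0, 2.0]
b = [3.0, -4.0]
s = [ai + bi for ai, bi in zip(a, b)]  # the sum a + b

print(norm(s), norm(a) + norm(b))  # norm of sum vs. sum of norms
```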

Derivation of Cauchy-Schwarz inequality

It's clearly true if either $a$ or $b$ is $0$,
so assume $\alpha = \lVert a\rVert$ and $\beta = \lVert b\rVert$ are nonzero.
We have

$$\begin{align*} 0 &\leq \lVert\beta a-\alpha b\rVert^2 \\ &= \lVert\beta a\rVert^2 - 2(\beta a)^T(\alpha b) + \lVert\alpha b\rVert^2 \\ &= \beta^2\lVert a\rVert^2 - 2\beta\alpha(a^Tb) + \alpha^2\lVert b\rVert^2 \\ &= 2\lVert a\rVert^2\lVert b\rVert^2 - 2\lVert a\rVert\lVert b\rVert(a^Tb) \end{align*}$$

divide by $2\lVert a\rVert\lVert b\rVert$ to get $a^Tb \leq \lVert a\rVert\lVert b\rVert$; applying the same argument to $-a$ gives $-a^Tb \leq \lVert a\rVert\lVert b\rVert$, so $\lvert a^Tb\rvert \leq \lVert a\rVert\lVert b\rVert$


Angle

Angle between two nonzero vectors $a, b$ is defined as

$$\angle(a,b) = \arccos{\left(\frac{a^Tb}{\lVert a\rVert\lVert b\rVert}\right)}$$

so that

$$a^Tb = \lVert a\rVert\lVert b\rVert \cos{(\angle(a, b))}$$

coincides with ordinary angle between vectors in 2D and 3D.
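The angle formula translates directly to code (a sketch in plain Python, not from the original notes); the clamp guards against floating-point rounding pushing the cosine slightly outside $[-1, 1]$:

```python
import math

def angle(a, b):
    # angle(a, b) = arccos(a^T b / (||a|| ||b||))
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(x ** 2 for x in a))
    nb = math.sqrt(sum(x ** 2 for x in b))
    # clamp to [-1, 1] so rounding error cannot take acos out of its domain
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

print(angle([1.0, 0.0], [0.0, 1.0]))  # pi/2 for orthogonal vectors
print(angle([1.0, 1.0], [2.0, 2.0]))  # ~0 for aligned vectors
```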

Classification of angles

$\theta = \angle(a,b)$

- $\theta = 0$: aligned ($a^Tb = \lVert a\rVert\lVert b\rVert$)
- $\theta = \pi/2$: orthogonal, written $a \perp b$ ($a^Tb = 0$)
- $\theta = \pi$: anti-aligned ($a^Tb = -\lVert a\rVert\lVert b\rVert$)
- $\theta < \pi/2$: acute angle ($a^Tb > 0$)
- $\theta > \pi/2$: obtuse angle ($a^Tb < 0$)

Spherical distance

if $a, b$ are on a sphere of radius $R$, the distance along the sphere is $R\angle(a, b)$

spherical distance image

Source: Wikipedia Great-circle distance
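A sketch of the spherical distance in plain Python (the radius and points are illustrative; Earth's mean radius in km is used as the example $R$):

```python
import math

def angle(a, b):
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(x ** 2 for x in a))
    nb = math.sqrt(sum(x ** 2 for x in b))
    # clamp guards against rounding error outside acos's domain [-1, 1]
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

R = 6371.0               # assumed radius (Earth's mean radius, km)
a = [R, 0.0, 0.0]        # two points a quarter great circle apart
b = [0.0, R, 0.0]
print(R * angle(a, b))   # R * pi/2 ≈ 10007.5 km
```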


Correlation coefficient

vectors $a$ and $b$, and their de-meaned vectors

$$\tilde{a} = a - \mathbf{avg}(a)\mathbf{1}, \quad \tilde{b} = b - \mathbf{avg}(b)\mathbf{1}$$

correlation coefficient (between $a$ and $b$, with $\tilde{a} \neq 0$, $\tilde{b} \neq 0$)

$$\rho = \frac{\tilde{a}^T\tilde{b}}{\lVert \tilde{a}\rVert\lVert \tilde{b}\rVert}$$
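The correlation coefficient in code (a plain-Python sketch; the example vectors are chosen so that $b$ is a positive multiple of $a$, giving $\rho = 1$):

```python
import math

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    at = [x - ma for x in a]  # de-meaned a
    bt = [x - mb for x in b]  # de-meaned b
    num = sum(x * y for x, y in zip(at, bt))
    den = math.sqrt(sum(x * x for x in at)) * math.sqrt(sum(y * y for y in bt))
    return num / den

a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]  # b = 2a, perfectly correlated
print(corr(a, b))  # ≈ 1.0 (up to rounding)
```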