Linear functions

This post is based on Jong-han Kim's Linear Algebra.

Superposition and linear functions

Consider a function $f: \mathbf{R}^n \rightarrow \mathbf{R}$.
We say $f$ satisfies the superposition property if

f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)

holds for all $n$-vectors $x, y$ and all scalars $\alpha, \beta$.
A function that satisfies superposition is called linear.

Linear versus affine
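To make the contrast concrete, here is a minimal Python sketch that spot-checks superposition on random inputs. The vector `A = (1, -2, 3)`, the constant `B = 4`, and the helper `satisfies_superposition` are hypothetical choices for illustration. A random check can only give evidence of linearity, not a proof. The affine function fails because $f(\alpha x + \beta y) - \alpha f(x) - \beta f(y) = b(1 - \alpha - \beta)$, which is nonzero whenever $b \neq 0$ and $\alpha + \beta \neq 1$.

```python
import random

# Hypothetical example data, chosen only for illustration.
A = [1.0, -2.0, 3.0]
B = 4.0

def linear(x):
    # f(x) = a^T x
    return sum(ai * xi for ai, xi in zip(A, x))

def affine(x):
    # g(x) = a^T x + b: linear plus a constant
    return linear(x) + B

def satisfies_superposition(f, n, trials=100, tol=1e-9, seed=0):
    """Spot-check f(alpha*x + beta*y) == alpha*f(x) + beta*f(y) on random inputs.

    Returning True is evidence of linearity, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(n)]
        y = [rng.uniform(-10, 10) for _ in range(n)]
        alpha, beta = rng.uniform(-5, 5), rng.uniform(-5, 5)
        z = [alpha * xi + beta * yi for xi, yi in zip(x, y)]
        if abs(f(z) - (alpha * f(x) + beta * f(y))) > tol:
            return False
    return True

print(satisfies_superposition(linear, 3))  # True
print(satisfies_superposition(affine, 3))  # False
```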

The inner product function

With $a$ an $n$-vector, the function

f(x) = a^Tx = a_1 x_1 + a_2 x_2 + \dots + a_n x_n

is the inner product function.
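A direct Python sketch of the definition follows. The factory name `inner_product_function` and the sample vector $a = (1, -2, 3)$ are hypothetical; the function simply sums $a_i x_i$ over all $i$.

```python
def inner_product_function(a):
    """Return f(x) = a^T x = a_1 x_1 + ... + a_n x_n for a fixed n-vector a."""
    def f(x):
        if len(x) != len(a):
            raise ValueError("x must have the same length as a")
        return sum(ai * xi for ai, xi in zip(a, x))
    return f

f = inner_product_function([1, -2, 3])  # hypothetical a
print(f([4, 5, 6]))  # 1*4 - 2*5 + 3*6 = 12
```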

The inner product function is linear

\begin{align*} f(\alpha x + \beta y) &= a^T(\alpha x + \beta y) \\ &= a^T(\alpha x) + a^T(\beta y) \\ &= \alpha(a^T x) + \beta(a^T y) \\ &= \alpha f(x) + \beta f(y) \end{align*}
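As a small numeric check of this derivation, take the hypothetical values below; both sides of the superposition identity come out to 17.

```latex
\begin{align*}
a &= (1, -2, 3), \quad x = (1, 0, 2), \quad y = (0, 1, 1), \quad \alpha = 2, \quad \beta = 3 \\
\alpha x + \beta y &= (2, 0, 4) + (0, 3, 3) = (2, 3, 7) \\
f(\alpha x + \beta y) &= 1 \cdot 2 + (-2) \cdot 3 + 3 \cdot 7 = 17 \\
\alpha f(x) + \beta f(y) &= 2(1 + 0 + 6) + 3(0 - 2 + 3) = 2 \cdot 7 + 3 \cdot 1 = 17
\end{align*}
```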

All linear functions are inner products

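Suppose $f: \mathbf{R}^n \rightarrow \mathbf{R}$ is linear.
Then it can be expressed as $f(x) = a^T x$ for some $n$-vector $a$.
Specifically, $a_i = f(e_i)$, where $e_i$ is the $i$th unit vector.
This follows from writing $x = x_1e_1 + \dots + x_ne_n$ and applying superposition: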

\begin{align*} f(x) &= f(x_1e_1 + x_2e_2 + \dots + x_ne_n) \\ &= x_1f(e_1) + x_2f(e_2) + \dots + x_nf(e_n) \\ &= a_1x_1 + a_2x_2 + \dots + a_nx_n = a^T x \end{align*}
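The recipe $a_i = f(e_i)$ can be sketched in Python: evaluate a linear "black box" at each unit vector to recover $a$, then confirm $f(x) = a^T x$ at another point. The function `black_box` (here $f(x) = 2x_1 - x_2 + 5x_3$) and the helper names are hypothetical, and the recipe only gives the right answer if the black box really is linear.

```python
def unit_vector(i, n):
    """Return the i-th standard unit vector e_i in R^n (0-indexed)."""
    e = [0.0] * n
    e[i] = 1.0
    return e

def recover_coefficients(f, n):
    """Assuming f: R^n -> R is linear, return a with a_i = f(e_i)."""
    return [f(unit_vector(i, n)) for i in range(n)]

# Hypothetical black-box linear function: f(x) = 2 x_1 - x_2 + 5 x_3
def black_box(x):
    return 2 * x[0] - x[1] + 5 * x[2]

a = recover_coefficients(black_box, 3)
print(a)  # [2.0, -1.0, 5.0]

# Check that f(x) = a^T x at an arbitrary point
x = [3.0, -1.0, 0.5]
print(black_box(x), sum(ai * xi for ai, xi in zip(a, x)))  # 9.5 9.5
```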

Affine functions

A function that is linear plus a constant is called affine.
Its general form is $f(x) = a^T x + b$, with $a$ an $n$-vector and $b$ a scalar.
A function $f: \mathbf{R}^n \rightarrow \mathbf{R}$ is affine if and only if

f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)

holds for all $\alpha, \beta$ with $\alpha + \beta = 1$, and all $n$-vectors $x, y$.
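The restricted superposition condition can be checked numerically. In this Python sketch the affine function $g(x) = x_1 + 2x_2 - 3$ and the helper `combine` are hypothetical. The identity holds for $\alpha + \beta = 1$ and fails for $\alpha = 2, \beta = 3$, where the gap equals $b(1 - \alpha - \beta) = 12$. The last lines use the fact that, for an affine function, $b = g(0)$ and $a_i = g(e_i) - g(0)$.

```python
# Hypothetical affine function g(x) = a^T x + b with a = (1, 2), b = -3.
A = [1.0, 2.0]
B = -3.0

def g(x):
    return sum(ai * xi for ai, xi in zip(A, x)) + B

def combine(alpha, x, beta, y):
    """Return alpha*x + beta*y for two vectors of equal length."""
    return [alpha * xi + beta * yi for xi, yi in zip(x, y)]

x, y = [1.0, 4.0], [-2.0, 0.5]

# Restricted superposition: holds when alpha + beta = 1.
alpha, beta = 0.25, 0.75
print(g(combine(alpha, x, beta, y)), alpha * g(x) + beta * g(y))  # -1.5 -1.5

# With alpha + beta != 1 the identity fails (because b != 0).
alpha, beta = 2.0, 3.0
print(g(combine(alpha, x, beta, y)), alpha * g(x) + beta * g(y))  # 12.0 0.0

# Read off the parameters: b = g(0), a_i = g(e_i) - g(0).
b = g([0.0, 0.0])
a = [g([1.0, 0.0]) - b, g([0.0, 1.0]) - b]
print(a, b)  # [1.0, 2.0] -3.0
```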

First-order Taylor approximation

Suppose $f: \mathbf{R}^n \rightarrow \mathbf{R}$ is differentiable.
The first-order Taylor approximation of $f$ near the point $z$ is

\hat{f}(x) = f(z) + \frac{\partial f}{\partial x_1}(z)(x_1 - z_1) + \dots + \frac{\partial f}{\partial x_n}(z)(x_n - z_n)

$\hat{f}(x)$ is very close to $f(x)$ when the $x_i$ are all near $z_i$.
$\hat{f}$ is an affine function of $x$, and it can be written using the inner product as

\hat{f}(x) = f(z) + \nabla f(z)^T(x - z)

where the $n$-vector $\nabla f(z)$ is the gradient of $f$ at $z$:

\nabla f(z) = \left( \frac{\partial f}{\partial x_1}(z), \dots, \frac{\partial f}{\partial x_n}(z) \right)
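Here is a minimal Python sketch of $\hat{f}(x) = f(z) + \nabla f(z)^T(x - z)$ for a hypothetical function $f(x) = x_1^2 + 3x_1x_2$, whose gradient $(2x_1 + 3x_2,\ 3x_1)$ is worked out by hand rather than computed automatically. Around $z = (1, 2)$, the error $f(x) - \hat{f}(x)$ is about 0.025 at $(1.1, 2.05)$, about 0.00025 at $(1.01, 2.005)$, and 4 at the distant point $(2, 3)$, which illustrates that the approximation is only good near $z$.

```python
# Hypothetical example: f(x) = x_1^2 + 3 x_1 x_2, gradient derived by hand.
def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]

def grad_f(x):
    # (df/dx_1, df/dx_2) = (2 x_1 + 3 x_2, 3 x_1)
    return [2 * x[0] + 3 * x[1], 3 * x[0]]

def taylor_first_order(f, grad, z):
    """Return f_hat(x) = f(z) + grad(z)^T (x - z), the affine approximation at z."""
    fz, gz = f(z), grad(z)
    def f_hat(x):
        return fz + sum(gi * (xi - zi) for gi, xi, zi in zip(gz, x, z))
    return f_hat

z = [1.0, 2.0]
f_hat = taylor_first_order(f, grad_f, z)

# Compare f and f_hat at points increasingly far from z.
for x in ([1.01, 2.005], [1.1, 2.05], [2.0, 3.0]):
    print(x, f(x), f_hat(x), abs(f(x) - f_hat(x)))
```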

Regression model

The regression model is an affine function of $x$:

\hat{y} = x^T\beta + \nu

Here $x$ is a feature vector; its elements $x_i$ are called regressors.
The $n$-vector $\beta$ is the weight vector.
The scalar $\nu$ is the offset.
The scalar $\hat{y}$ is the prediction.
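A prediction from this model is just an inner product plus the offset. In the Python sketch below, `predict` is a hypothetical helper and the weights $\beta = (0.5, -1, 2)$ and offset $\nu = 3$ are made-up values, not fitted to any data.

```python
def predict(x, beta, nu):
    """Regression model prediction y_hat = x^T beta + nu."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    return sum(xi * bi for xi, bi in zip(x, beta)) + nu

# Hypothetical weights and offset, not fitted to any data.
beta = [0.5, -1.0, 2.0]
nu = 3.0

x = [2.0, 1.0, 0.25]  # feature vector
print(predict(x, beta, nu))  # 0.5*2 - 1*1 + 2*0.25 + 3 = 3.5
```

Setting every feature to zero makes the prediction equal to the offset $\nu$, which is why $\nu$ is what makes the model affine rather than linear.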