1 Univariate regression and multivariate regression

Any elementary econometrics, statistics, or machine learning textbook derives the solution of multiple linear regression in detail, so the derivation is not repeated here.

We first fix the notation used in this article. Let $y$ denote the $N$-dimensional vector of the dependent variable and assume $y = X\beta + \epsilon$. With $p$ independent variables, arrange them into an $N \times (p+1)$ matrix $X$ whose first column $x_{\cdot 0} = 1_N$ is the all-ones intercept column. The least squares estimate is then


$$\hat\beta = (X'X)^{-1}X'y$$

If instead we run a univariate regression without an intercept term, the independent variable is an $N$-dimensional vector $x$, the model is $y = x\beta + \epsilon$, and the least squares estimate of $\beta$ is


$$\hat\beta = \dfrac{x'y}{x'x}$$

What is the connection between the two? In multivariate regression, if the column vectors of $X$ are mutually orthogonal, i.e. $X'X$ is a diagonal matrix, then each coefficient estimate reduces to $\hat\beta_j = \dfrac{x_{\cdot j}'y}{x_{\cdot j}'x_{\cdot j}}$.
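
As a quick numerical check of this claim, here is a minimal NumPy sketch (not from the book; the simulated data and the trick of orthonormalizing a random design via `np.linalg.qr` are only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3

# Build an N x (p+1) design matrix with mutually orthogonal columns:
# start from an intercept column plus random predictors, then orthonormalize.
X_raw = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
X, _ = np.linalg.qr(X_raw)  # columns of X are now orthonormal, hence orthogonal

y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=N)

# Multivariate least squares: (X'X)^{-1} X'y
beta_multi = np.linalg.solve(X.T @ X, X.T @ y)

# One univariate regression per column: x_j'y / (x_j'x_j)
beta_uni = np.array([X[:, j] @ y / (X[:, j] @ X[:, j]) for j in range(p + 1)])

print(np.allclose(beta_multi, beta_uni))  # True, because X'X is diagonal
```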

This gives us a hint: can we construct a set of mutually orthogonal columns ourselves?

2 Gram-Schmidt process

We compute $\hat\beta_p$ by the following procedure:

  1. Set $z_{\cdot 0} = x_{\cdot 0} = 1_N$.
  2. For $j = 1, \ldots, p$: regress $x_{\cdot j}$ on each of $z_{\cdot 0}, \ldots, z_{\cdot j-1}$, obtaining the coefficients $\hat\gamma_{lj} = \dfrac{z_{\cdot l}'x_{\cdot j}}{z_{\cdot l}'z_{\cdot l}}$ for $l = 0, \ldots, j-1$, and set $z_{\cdot j} = x_{\cdot j} - \sum_{k=0}^{j-1}\hat\gamma_{kj} z_{\cdot k}$;
  3. Regress $y$ on $z_{\cdot p}$ to obtain the final estimate $\hat\beta_p = \dfrac{z_{\cdot p}'y}{z_{\cdot p}'z_{\cdot p}}$.

The variable $x_{\cdot p}$ appears only in $z_{\cdot p}$ (with coefficient $1$), and $z_{\cdot p}$ is orthogonal to $z_{\cdot 0}, \ldots, z_{\cdot p-1}$, which gives the result above. If $\epsilon \sim N(0, \sigma^2 I_N)$, the variance of this estimate can be written as


$$\text{Var}(\hat\beta_p)=\dfrac{z_{\cdot p}'}{z_{\cdot p}'z_{\cdot p}} \text{Var}(y) \dfrac{z_{\cdot p}}{z_{\cdot p}'z_{\cdot p}} = \dfrac{\sigma^2}{z_{\cdot p}'z_{\cdot p}}$$

Note that any predictor can be treated as the $p$-th one, so each $\hat\beta_j$ can be derived in this way.
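
The following is a minimal NumPy sketch of this procedure (regression by successive orthogonalization); the data are simulated and the variable names are only illustrative, and the last coefficient is compared against an ordinary least squares fit:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # x_0 = 1_N plus p predictors
y = X @ np.array([0.5, 1.0, -2.0, 3.0]) + rng.normal(size=N)

# Steps 1-2: Gram-Schmidt on the columns of X
Z = np.empty_like(X)
Z[:, 0] = X[:, 0]
for j in range(1, p + 1):
    z = X[:, j].copy()
    for l in range(j):
        gamma_lj = Z[:, l] @ X[:, j] / (Z[:, l] @ Z[:, l])
        z -= gamma_lj * Z[:, l]
    Z[:, j] = z

# Step 3: regress y on z_p to get beta_hat_p
beta_p = Z[:, p] @ y / (Z[:, p] @ Z[:, p])

# Compare with the last coefficient of ordinary least squares
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_p, beta_ls[p]))  # True
```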

3 QR decomposition

Set $\hat\gamma_{jj} = 1$ for $j = 0, \ldots, p$, collect all the $\hat\gamma_{ij}$ into a $(p+1)\times(p+1)$ upper triangular matrix $\Gamma$, and write $Z = (z_{\cdot 0}, z_{\cdot 1}, \ldots, z_{\cdot p})$. Then


$$X = Z\Gamma$$

Construct a $(p+1)\times(p+1)$ diagonal matrix $D$ with diagonal entries $D_{ii} = \Vert z_{\cdot i}\Vert$, so that $Z'Z = D^2$. Inserting $D^{-1}D = I_{p+1}$ into the expression above gives


$$X = Z\Gamma = ZD^{-1}D\Gamma$$

Let $Q = ZD^{-1}$ and $R = D\Gamma$; then $X = QR$.

Since the column vectors of $Z$ are mutually orthogonal, $Q'Q = D^{-1}Z'ZD^{-1} = I_{p+1}$, and $R$ is again an upper triangular matrix. Substituting $X = QR$ into $\hat\beta = (X'X)^{-1}X'y$ and using $Q'Q = I_{p+1}$, the least squares estimate becomes


$$\hat\beta = R^{-1}Q'y$$

The fitted values are then


$$\hat y = QQ'y$$
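
A short NumPy sketch of these two identities, using `numpy.linalg.qr` on simulated data (the comparison against `np.linalg.lstsq` is just a sanity check):

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 150, 4
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
y = rng.normal(size=N)

Q, R = np.linalg.qr(X)  # X = QR, Q'Q = I_{p+1}, R upper triangular

# Least squares via QR: beta_hat = R^{-1} Q'y (solve the triangular system)
beta_qr = np.linalg.solve(R, Q.T @ y)
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_qr, beta_ls))  # True

# Fitted values: y_hat = Q Q' y
y_hat = Q @ (Q.T @ y)
print(np.allclose(y_hat, X @ beta_qr))  # True
```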

Since $R$ is an upper triangular matrix whose last row is $(0, \ldots, 0, \Vert z_{\cdot p}\Vert)$, its inverse $R^{-1}$ is also upper triangular, with last row $(0, \ldots, 0, 1/\Vert z_{\cdot p}\Vert)$. Combined with $Q = (z_{\cdot 0}/\Vert z_{\cdot 0}\Vert, z_{\cdot 1}/\Vert z_{\cdot 1}\Vert, \ldots, z_{\cdot p}/\Vert z_{\cdot p}\Vert)$, the last row of $R^{-1}Q'$ is $z_{\cdot p}'/\Vert z_{\cdot p}\Vert^2$, and therefore


$$\hat\beta_p = z_{\cdot p}'y/\Vert z_{\cdot p}\Vert^2$$

This is also consistent with the results of section 2.
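
To tie this back to Section 2, here is a small numerical check (again a sketch on simulated data) that the last row of $R^{-1}Q'$ equals $z_{\cdot p}'/\Vert z_{\cdot p}\Vert^2$. Note that `numpy.linalg.qr` may flip the signs of $Q$'s columns and of $R$'s diagonal, but the product $Q_{\cdot p}R_{pp}$ still recovers $z_{\cdot p}$:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 120, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
y = rng.normal(size=N)

Q, R = np.linalg.qr(X)

# Up to a common sign, Q[:, p] = z_p / ||z_p|| and R[p, p] = ||z_p||,
# so their product recovers z_p regardless of the sign convention.
z_p = Q[:, p] * R[p, p]

# Last row of R^{-1}Q' equals z_p' / ||z_p||^2 ...
last_row = (np.linalg.inv(R) @ Q.T)[p]
print(np.allclose(last_row, z_p / (z_p @ z_p)))  # True

# ... so the last coefficient is beta_hat_p = z_p'y / ||z_p||^2
beta_hat = np.linalg.solve(R, Q.T @ y)
print(np.allclose(beta_hat[p], z_p @ y / (z_p @ z_p)))  # True
```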

References

  • Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media, 2009.