* better latex support (changed delimiter from \( to regular $)
This commit is contained in:
parent
3a2878496a
commit
426c030c36
10 changed files with 402 additions and 173 deletions
@@ -1,4 +1,6 @@
 import subprocess
+from pathlib import Path
+import time


 def pdf_to_mmd(path_input: str):
@@ -11,6 +13,18 @@ def pdf_to_mmd(path_input: str):
     output_dir = "../documents/mmds"
     command = ['nougat', path_input, "-o", output_dir]
     subprocess.run(command)
+    time.sleep(1)
+
+    # Change the math delimiters to the common delimiters used in MMD
+    with open(f"{output_dir}/{Path(path_input).stem}.mmd", "r+") as doc:
+        content = doc.read()
+        print(content)
+
+        content = content.replace(r"\[", "$$").replace(r"\]", "$$")
+        content = content.replace(r"\(", "$").replace(r"\)", "$")
+        # Delete the file's previous content, then write the converted text
+        doc.seek(0)
+        doc.truncate()
+        doc.write(content)
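To illustrate the effect of the new replace calls, here is a minimal, self-contained sketch; the sample string is invented for illustration and is not part of the commit:

```python
# Convert nougat-style LaTeX delimiters to the $ / $$ forms used in MMD.
sample = r"inline \(e^{i\pi}+1=0\) and display \[e^{i\pi}+1=0.\]"

converted = sample.replace(r"\[", "$$").replace(r"\]", "$$")
converted = converted.replace(r"\(", "$").replace(r"\)", "$")

print(converted)  # inline $e^{i\pi}+1=0$ and display $$e^{i\pi}+1=0.$$
```

Note that this plain string replacement also rewrites any escaped or literal occurrences of these delimiters, which is acceptable for nougat output but worth keeping in mind.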
@@ -1,170 +0,0 @@

**Corrected exercises**

**Linear algebra 1**

## 1 Statements
**Exercise 1**: Recall that \((E,+,\cdot)\) is a \(\mathbb{K}\)-vector space if

1. \((E,+)\) is a commutative group;
2. \(\forall x,y\in E,\,\forall\alpha\in\mathbb{K},\,\alpha\cdot(x+y)=\alpha\cdot x+\alpha\cdot y\);
3. \(\forall x\in E,\,\forall\alpha,\beta\in\mathbb{K},\,(\alpha+\beta)\cdot x=\alpha\cdot x+\beta\cdot x\);
4. \(\forall x\in E,\,\forall\alpha,\beta\in\mathbb{K},\,\alpha\cdot(\beta\cdot x)=(\alpha\beta)\cdot x\);
5. \(1\cdot x=x\).

Let \((E,+,\cdot)\) be a \(\mathbb{K}\)-vector space. We write \(0_{E}\) for the identity element of \((E,+)\) (also called the origin of \((E,+,\cdot)\)) and \(0_{\mathbb{K}}\) for the number zero (in \(\mathbb{K}\)). For every \(x\) in \(E\), the additive inverse of \(x\) is written \(-x\).

1. Show that, for every \(x\in E\), \(x+x=2\cdot x\).
2. Show that, for every \(x\in E\), \(0_{\mathbb{K}}\cdot x=0_{E}\).
3. Show that, for every \(x\in E\), \((-1)\cdot x=-x\).
**Exercise 2**: Let \(F_{1},\ldots,F_{m}\) be subspaces of an \(\mathbb{R}\)-vector space \((E,+,\cdot)\). Show that \(F:=F_{1}\cap\ldots\cap F_{m}\) is a subspace of \(E\).

**Exercise 3**: Let \((E,+,\cdot)\) be an \(\mathbb{R}\)-vector space and \(\{x_{1},\ldots,x_{m}\}\) a family of vectors of \(E\). Show that \(F:=\operatorname{vect}\{x_{1},\ldots,x_{m}\}\) is a subspace of \(E\).

**Exercise 4**: Let \((E,+,\cdot)\) be an \(\mathbb{R}\)-vector space, \(F\) a subspace of \(E\), and \(A,B\) two subsets of \(E\).

1. Show that, if \(A\subset B\), then \(\operatorname{vect}A\subset\operatorname{vect}B\).
2. Show that \(A\) is a subspace of \(E\) if and only if \(\operatorname{vect}A=A\).
3. Show that, if \(A\subset B\subset F\) and \(A\) spans \(F\), then \(B\) spans \(F\).
**Exercise 5**: Consider the following vectors of \(\mathbb{R}^{4}\):

\[\mathbf{e}_{1}=\left(\begin{array}{c}1\\ 1\\ 1\\ 1\end{array}\right),\quad\mathbf{e}_{2}=\left(\begin{array}{c}0\\ 1\\ 2\\ -1\end{array}\right),\quad\mathbf{e}_{3}=\left(\begin{array}{c}1\\ 0\\ -2\\ 3\end{array}\right),\quad\mathbf{e}_{4}=\left(\begin{array}{c}2\\ 1\\ 0\\ -1\end{array}\right).\]

Is the family \(\{\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3},\mathbf{e}_{4}\}\) linearly independent? Is it a basis of \(\mathbb{R}^{4}\)?
**Exercise 6**: Consider the following vectors of \(\mathbb{R}^{4}\):

\[\mathbf{e}_{1}=\left(\begin{array}{c}1\\ 1\\ 1\\ 1\end{array}\right),\quad\mathbf{e}_{2}=\left(\begin{array}{c}0\\ 1\\ 2\\ 1\end{array}\right),\quad\mathbf{e}_{3}=\left(\begin{array}{c}1\\ 0\\ -2\\ 3\end{array}\right),\quad\mathbf{e}_{4}=\left(\begin{array}{c}1\\ 1\\ 2\\ -2\end{array}\right).\]

1. Is the family \(\{\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3},\mathbf{e}_{4}\}\) linearly independent?
2. What is the rank of the family \(\{\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3},\mathbf{e}_{4}\}\)?
3. Find a relation between the real numbers \(\alpha\) and \(\beta\) such that the vector \(\mathbf{u}=(1,1,\alpha,\beta)^{t}\) belongs to the subspace spanned by the family \(\{\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3},\mathbf{e}_{4}\}\).
**Exercise 7**: Let \(E=\mathbb{R}^{\mathbb{R}}\) be the space of functions from \(\mathbb{R}\) to \(\mathbb{R}\).

1. Let \(c\) and \(s\) be the functions defined by \[\forall x\in\mathbb{R},\quad c(x)=\cos x\quad\text{and}\quad s(x)=\sin x.\] Show that \(\{c,s\}\) is a linearly independent family of \(E\). What is the dimension of the subspace \(T\) spanned by the family \(\{c,s\}\)?
2. Let \(\alpha,\beta,\gamma\) be three fixed reals, and let \(f,g,h\) be the functions defined by \[\forall x\in\mathbb{R},\quad f(x)=\cos(x+\alpha),\quad g(x)=\cos(x+\beta)\quad\text{and}\quad h(x)=\cos(x+\gamma).\] Show that \(f,g,h\) belong to \(T\), and give their coordinates in the basis \(\{c,s\}\) of \(T\). Is the family \(\{f,g,h\}\) linearly independent? What is its rank?
3. Let \(a_{1},a_{2},a_{3}\) be three distinct reals. For every integer \(k\in\{1,2,3\}\), let \(f_{k}\) be the function defined on \(\mathbb{R}\) by \[\forall x\in\mathbb{R},\quad f_{k}(x)=\left|x-a_{k}\right|.\] Show that \(\{f_{1},f_{2},f_{3}\}\) is a linearly independent family of \(E\).
**Exercise 8**:

1. Recall that \(\mathcal{C}_{0}(\mathbb{R})\) denotes the space of continuous functions from \(\mathbb{R}\) to \(\mathbb{R}\). Show that \(\mathcal{A}:=\{f\in\mathcal{C}_{0}(\mathbb{R})\,|\,\forall x\in\mathbb{R},\;f(x)=f(-x)\}\) and \(\mathcal{B}:=\{f\in\mathcal{C}_{0}(\mathbb{R})\,|\,\forall x\in\mathbb{R},\;f(x)=-f(-x)\}\) are subspaces of \(\mathcal{C}_{0}(\mathbb{R})\). Are they in direct sum?
2. Show that \(A:=\{(x,y,z)\in\mathbb{R}^{3}\,|\,x+y+z=0\}\) and \(B:=\{(x,y,z)\in\mathbb{R}^{3}\,|\,x-y+z=0\}\) are subspaces of \(\mathbb{R}^{3}\). Are they in direct sum?
**Exercise 9**:

1. Let \(F:=\{(x,x,x)\in\mathbb{R}^{3}\,|\,x\in\mathbb{R}\}\) and \(G:=\{(0,y,z)\in\mathbb{R}^{3}\,|\,y,z\in\mathbb{R}\}\). Show that \(F\) and \(G\) are two subspaces of \(\mathbb{R}^{3}\). Give their bases and dimensions. Are they in direct sum?
2. Let \(H:=\{(x,y,z,t)\in\mathbb{R}^{4}\,|\,x=2y-z,\;t=x+y+z\}\). Check that \(H\) is a subspace of \(\mathbb{R}^{4}\). Give a basis of \(H\) and its dimension.
**Exercise 10**: Let \((E,+,\cdot)\) be an \(\mathbb{R}\)-vector space and \(A,B,C\) three subspaces of \(E\).

1. Show that \((A\cap C)+(B\cap C)\subset(A+B)\cap C\). Give an example in \(\mathbb{R}^{2}\) for which the inclusion is strict.
2. Show that, if \(A+B=A+C\), \(A\cap B=A\cap C\) and \(B\subset C\), then \(B=C\).
**Exercise 11**: Consider the map given by

\[\varphi\colon \mathbb{R}^{3} \longrightarrow \mathbb{R}^{3}\] \[\left(\begin{array}{c}x\\ y\\ z\end{array}\right) \longmapsto \left(\begin{array}{c}-x+2y+2z\\ -8x+7y+4z\\ -13x+5y+8z\end{array}\right).\]

1. Show that \(\varphi\) is a linear map. Find the images under \(\varphi\) of the vectors of the canonical basis \(\{\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3}\}\) of \(\mathbb{R}^{3}\). Compute \(\varphi(2\mathbf{e}_{1}+\mathbf{e}_{2}-\mathbf{e}_{3})\).
2. Find the kernel of \(\varphi\). Give a basis of it and state its dimension.
3. Is the map \(\varphi\) injective? surjective? bijective?
4. Let \(\psi\) be the linear map given by \[\psi\colon \mathbb{R}^{2} \longrightarrow \mathbb{R}^{3}\] \[\left(\begin{array}{c}x\\ y\end{array}\right) \longmapsto \left(\begin{array}{c}x-y\\ x+y\\ x+2y\end{array}\right).\] Determine \(\varphi\circ\psi\).
**Exercise 12**: Consider the map given by

\[\varphi\colon \mathbb{R}^{3} \longrightarrow \mathbb{R}^{2}\] \[\left(\begin{array}{c}x\\ y\\ z\end{array}\right) \longmapsto \left(\begin{array}{c}y+z\\ x\end{array}\right)\]

as well as the vectors \(\mathbf{u}:=(1,2,3)^{t}\) and \(\mathbf{v}:=(1,1,1)^{t}\).

1. Show that \(\varphi\) is linear. Compute \(\varphi(\mathbf{u})\), \(\varphi(\mathbf{v})\) and \(\varphi(\mathbf{u}-2\mathbf{v})\).
2. Find the kernel of \(\varphi\). Give a basis of it and state its dimension.
3. Find the image of \(\varphi\). Give a basis of it and state its dimension.
**Exercise 13**: Let \(E\) and \(F\) be two \(\mathbb{R}\)-vector spaces and \(\varphi\) a linear map from \(E\) to \(F\). Let \(\mathcal{A}:=\{x_{1},\ldots,x_{m}\}\) be a family of vectors of \(E\).

1. Show that, if \(\mathcal{A}\) is linearly dependent, then \(\varphi(\mathcal{A})=\{\varphi(x_{1}),\ldots,\varphi(x_{m})\}\) is linearly dependent.
2. Show that, if \(\varphi(\mathcal{A})\) is linearly independent, then \(\mathcal{A}\) is linearly independent.
3. Show that, if \(\mathcal{A}\) is linearly independent and \(\varphi\) is injective, then \(\varphi(\mathcal{A})\) is linearly independent.
## 2 Solutions

**Solution of Exercise 1**:

1. For every \(x\in E\), \(2\cdot x=(1+1)\cdot x=1\cdot x+1\cdot x=x+x\), where we used axioms (II-2) and (II-4) successively.
2. We have: \[\begin{array}{rcll}0_{\mathbb{K}}\cdot x&=&(0_{\mathbb{K}}\cdot 2)\cdot x&\\ &=&0_{\mathbb{K}}\cdot(2\cdot x)&\text{[by axiom (II-3)]}\\ &=&0_{\mathbb{K}}\cdot(x+x)&\text{[by question (1)]}\\ &=&0_{\mathbb{K}}\cdot x+0_{\mathbb{K}}\cdot x.&\end{array}\] Simplifying (that is, adding \(-(0_{\mathbb{K}}\cdot x)\) to both sides), we obtain the equality \(0_{E}=0_{\mathbb{K}}\cdot x\).
3. By question (2), \(0_{E}=0_{\mathbb{K}}\cdot x=(1+(-1))\cdot x=(1\cdot x)+((-1)\cdot x)=x+((-1)\cdot x)\), where the third equality follows from axiom (II-2) and the last from axiom (II-4). We deduce that \((-1)\cdot x\) is the additive inverse of \(x\), that is, \(-x\).
**Solution of Exercise 2**: We must show that for all \(x,y\in F\) and every \(\alpha\in\mathbb{R}\), \(x+\alpha y\in F\). So let \(x,y\in F\) and \(\alpha\in\mathbb{R}\) be arbitrary. By definition of the intersection, for every \(k\in\{1,\ldots,m\}\), \(x,y\in F_{k}\). Since \(F_{k}\) is a subspace of \(E\), we deduce that

\[x+\alpha y\in F_{k},\]

and this for every \(k\in\{1,\ldots,m\}\). Hence \(x+\alpha y\) belongs to the intersection of the \(F_{k}\), that is, to \(F\).

**Solution of Exercise 3**: First note that \(F\) is nonempty, since

\[0_{E}=0\cdot x_{1}+\cdots+0\cdot x_{m}\in F.\]

Let \(x,y\in F\) and \(\alpha\in\mathbb{R}\) be arbitrary. Then \(x\) and \(y\) can be written

\[x=\alpha_{1}x_{1}+\cdots+\alpha_{m}x_{m}\quad\text{and}\quad y=\beta_{1}x_{1}+\cdots+\beta_{m}x_{m},\]

with \(\alpha_{1},\ldots,\alpha_{m},\beta_{1},\ldots,\beta_{m}\in\mathbb{R}\). Hence,

\[x+\alpha y=(\alpha_{1}x_{1}+\cdots+\alpha_{m}x_{m})+\alpha(\beta_{1}x_{1}+\cdots+\beta_{m}x_{m})=(\alpha_{1}+\alpha\beta_{1})x_{1}+\cdots+(\alpha_{m}+\alpha\beta_{m})x_{m}.\]

Consequently, \(x+\alpha y\) is a linear combination of the vectors \(x_{1},\ldots,x_{m}\), that is, an element of \(F\).

**Solution of Exercise 4**:

1. Suppose that \(A\subset B\), and let us show that every element of \(\operatorname{vect}A\) belongs to \(\operatorname{vect}B\). If \(A=\emptyset\), then \(\operatorname{vect}A=\{0\}\), so any \(x\in\operatorname{vect}A\) is necessarily the zero vector; since \(\operatorname{vect}B\) is a subspace, \(0\in\operatorname{vect}B\), and we indeed have \(\operatorname{vect}A\subset\operatorname{vect}B\). If \(A\) is nonempty, let \(x\in\operatorname{vect}A\); then \[\exists p\in\mathbb{N}^{*},\ \exists x_{1},\ldots,x_{p}\in A,\ \exists\alpha_{1},\ldots,\alpha_{p}\in\mathbb{R}\colon\quad x=\alpha_{1}x_{1}+\cdots+\alpha_{p}x_{p}.\] Since \(A\subset B\), the \(x_{k}\) also lie in \(B\), so that \(x\) is a linear combination of vectors of \(B\), that is, an element of \(\operatorname{vect}B\). Again \(\operatorname{vect}A\subset\operatorname{vect}B\).
2. Suppose that \(A=\operatorname{vect}A\). Since \(\operatorname{vect}A\) is a subspace, so is \(A\). Conversely, suppose that \(A\) is a subspace, and let us show that \(A=\operatorname{vect}A\). Every element of \(A\) is a particular linear combination of elements of \(A\) (take \(p=1\), \(\alpha_{1}=1\) and \(x_{1}=x\)), so clearly \(A\subset\operatorname{vect}A\). Moreover, since \(A\) is a subspace, \(A\) is nonempty. Now let \(x\in\operatorname{vect}A\): \[\exists p\in\mathbb{N}^{*},\ \exists x_{1},\ldots,x_{p}\in A,\ \exists\alpha_{1},\ldots,\alpha_{p}\in\mathbb{R}\colon\quad x=\alpha_{1}x_{1}+\cdots+\alpha_{p}x_{p}.\] Since \(A\) is closed under linear combinations, \(x\in A\). Hence also \(\operatorname{vect}A\subset A\).
3. By point (1), \(\operatorname{vect}A\subset\operatorname{vect}B\subset\operatorname{vect}F\). Now, \(\operatorname{vect}F=F\) since \(F\) is a subspace, and \(\operatorname{vect}A=F\) since \(A\) spans \(F\). Finally, \[F\subset\operatorname{vect}B\subset F,\] which shows that \(\operatorname{vect}B=F\). In other words, \(B\) spans \(F\).

**Solution of Exercise 5**: We solve the vector equation \(\alpha\mathbf{e}_{1}+\beta\mathbf{e}_{2}+\gamma\mathbf{e}_{3}+\delta\mathbf{e}_{4}=\mathbf{0}\). This amounts to solving the linear system

\[\left\{\begin{array}{rcl}0&=&\alpha+\gamma+2\delta,\\ 0&=&\alpha+\beta+\delta,\\ 0&=&\alpha+2\beta-2\gamma,\\ 0&=&\alpha-\beta+3\gamma-\delta.\end{array}\right.\]

We find that the only possible solution is \(\alpha=\beta=\gamma=\delta=0\). Hence the family \(\{\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3},\mathbf{e}_{4}\}\) is linearly independent, and since its cardinality equals the dimension of \(\mathbb{R}^{4}\), it is a basis of \(\mathbb{R}^{4}\).
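The independence claim above is easy to sanity-check numerically; a minimal sketch (numpy is an assumption on my part, not part of the original exercise set):

```python
import numpy as np

# Columns are e1..e4 from Exercise 5.
E = np.array([[1, 0, 1, 2],
              [1, 1, 0, 1],
              [1, 2, -2, 0],
              [1, -1, 3, -1]])

# Full column rank <=> the only solution of E @ x = 0 is x = 0,
# i.e. the family is linearly independent, hence a basis of R^4.
print(np.linalg.matrix_rank(E))  # 4
```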

**Solution of Exercise 6**:

1. We solve the vector equation \(\alpha\mathbf{e}_{1}+\beta\mathbf{e}_{2}+\gamma\mathbf{e}_{3}+\delta\mathbf{e}_{4}=\mathbf{0}\). This amounts to solving the linear system \[\left\{\begin{array}{rcl}0&=&\alpha+\gamma+\delta,\\ 0&=&\alpha+\beta+\delta,\\ 0&=&\alpha+2\beta-2\gamma+2\delta,\\ 0&=&\alpha+\beta+3\gamma-2\delta.\end{array}\right.\] We find that this system is equivalent to \[\left\{\begin{array}{rcl}0&=&\alpha+\gamma+\delta,\\ 0&=&\beta-\gamma,\\ 0&=&\gamma-\delta.\end{array}\right.\] This system admits solutions other than the zero solution. We deduce that \(\{\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3},\mathbf{e}_{4}\}\) is not linearly independent.
2. By the above, the rank of the family \(\{\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3},\mathbf{e}_{4}\}\) is at most 3. Consider then the family \(\{\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3}\}\). One easily checks that it is linearly independent, so the rank we seek is in fact equal to 3.
3. For \(\mathbf{u}\) to belong to the subspace spanned by \(\{\mathbf{e}_{1},\mathbf{e}_{2},\mathbf{e}_{3},\mathbf{e}_{4}\}\), the vector equation \[\mathbf{u}=\alpha\mathbf{e}_{1}+\beta\mathbf{e}_{2}+\gamma\mathbf{e}_{3}+\delta\mathbf{e}_{4}\] must admit at least one solution. Writing \(a\) and \(b\) for the exercise's \(\alpha\) and \(\beta\) (the third and fourth components of \(\mathbf{u}\)), to avoid a clash with the coefficients, we therefore solve the linear system \[\left\{\begin{array}{rcl}1&=&\alpha+\gamma+\delta,\\ 1&=&\alpha+\beta+\delta,\\ a&=&\alpha+2\beta-2\gamma+2\delta,\\ b&=&\alpha+\beta+3\gamma-2\delta.\end{array}\right.\] One checks that this system is equivalent to \[\left\{\begin{array}{rcl}1&=&\alpha+\gamma+\delta,\\ 0&=&\beta-\gamma,\\ a-1&=&-\gamma+\delta,\\ b-1&=&3\gamma-3\delta.\end{array}\right.\] Looking at the last two equations, we see that the system has a solution only if \(b-1=-3(a-1)\), that is, if \(b+3a=4\).

**Solution of Exercise 7**:

1. Consider the equation \(\alpha c+\beta s=0\) in \(\mathbb{R}^{\mathbb{R}}\). This equation is equivalent to \[\forall x\in\mathbb{R},\quad\alpha\cos x+\beta\sin x=0.\] The choices \(x=0\) and \(x=\pi/2\) give respectively \(\alpha=0\) and \(\beta=0\). The family \(\{c,s\}\) is therefore linearly independent, and the dimension of \(T\) is equal to 2.
2. Since \(\cos(x+\alpha)=\cos x\cos\alpha-\sin x\sin\alpha\), we see that \[f=\cos\alpha\cdot c-\sin\alpha\cdot s\in T\] and that the coordinates of \(f\) in the basis \(\{c,s\}\) of \(T\) are given by the pair \((\cos\alpha,-\sin\alpha)\). Likewise, \[g=\cos\beta\cdot c-\sin\beta\cdot s\in T\quad\text{and}\quad h=\cos\gamma\cdot c-\sin\gamma\cdot s\in T;\] the coordinates of \(g\) and \(h\) in the basis \(\{c,s\}\) of \(T\) are given respectively by the pairs \((\cos\beta,-\sin\beta)\) and \((\cos\gamma,-\sin\gamma)\). The family \(\{f,g,h\}\) cannot be linearly independent, since its cardinality is 3 while the dimension of the vector space \(T\) is 2. Its rank is at most 2 (since \(\dim T=2\)) and at least 1 (since the functions \(f,g,h\) are nonzero). The rank equals 1 when \(f,g,h\) are collinear, that is, when there exist \(a\) and \(b\) in \(\mathbb{R}\) such that \(f=ag=bh\) or, equivalently, when \[\left(\begin{array}{c}\cos\alpha\\ -\sin\alpha\end{array}\right)=a\left(\begin{array}{c}\cos\beta\\ -\sin\beta\end{array}\right)=b\left(\begin{array}{c}\cos\gamma\\ -\sin\gamma\end{array}\right).\] From the equations \(\cos\alpha=a\cos\beta\) and \(\sin\alpha=a\sin\beta\) we get, by squaring and summing them, that \(a^{2}=1\), that is, \(a\in\{-1,1\}\). If \(a=1\), then \(\beta=\alpha+2k\pi\), and if \(a=-1\), then \(\beta=\alpha+\pi+2k\pi\). In summary, \(f\) and \(g\) are collinear if and only if \(\beta\in\{\alpha\}+\pi\mathbb{Z}\). Likewise, \(f\) and \(h\) are collinear if and only if \(\gamma\in\{\alpha\}+\pi\mathbb{Z}\). The family \(\{f,g,h\}\) is therefore of rank 1 when \(\alpha\), \(\beta\) and \(\gamma\) differ by integer multiples of \(\pi\); it is of rank 2 otherwise.
3. Consider the equation \(\alpha f_{1}+\beta f_{2}+\gamma f_{3}=0\) in \(\mathbb{R}^{\mathbb{R}}\), which is equivalent to the condition \[\forall x\in\mathbb{R},\quad\alpha f_{1}(x)+\beta f_{2}(x)+\gamma f_{3}(x)=0.\] The choices \(x=a_{1}\), \(x=a_{2}\) and \(x=a_{3}\) give respectively the equations \[\beta\left|a_{1}-a_{2}\right|+\gamma\left|a_{1}-a_{3}\right|=0,\] \[\alpha\left|a_{2}-a_{1}\right|+\gamma\left|a_{2}-a_{3}\right|=0,\] \[\alpha\left|a_{3}-a_{1}\right|+\beta\left|a_{3}-a_{2}\right|=0.\] Set \(a:=\left|a_{3}-a_{1}\right|\), \(b:=\left|a_{3}-a_{2}\right|\) and \(c:=\left|a_{1}-a_{2}\right|\). The previous system of equations reads \[\left\{\begin{array}{ccl}0&=&a\alpha+b\beta,\\ 0&=&c\alpha+b\gamma,\\ 0&=&c\beta+a\gamma.\end{array}\right.\] Solving this linear system, and taking into account that \(a\), \(b\) and \(c\) are nonzero, we see that the only possible solution is \(\alpha=\beta=\gamma=0\). One can also write the system in matrix form and note, to reach the same conclusion, that the matrix \[\left[\begin{array}{ccc}a&b&0\\ c&0&b\\ 0&c&a\end{array}\right]\] has determinant the nonzero real \(-2abc\).
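The determinant formula used above can be spot-checked numerically; the sample values for a, b, c below are arbitrary nonzero choices, not from the exercise:

```python
# 3x3 determinant by cofactor expansion along the first row.
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

a, b, c = 2.0, 3.0, 5.0  # arbitrary nonzero sample values
M = [[a, b, 0.0],
     [c, 0.0, b],
     [0.0, c, a]]

print(det3(M), -2 * a * b * c)  # the two values agree: -60.0 -60.0
```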

**Solution of Exercise 8**:

1. The zero function \(\nu\) (defined by \(\nu(x)=0\) for every \(x\in\mathbb{R}\)) belongs to \(\mathcal{A}\) and to \(\mathcal{B}\), so \(\mathcal{A}\) and \(\mathcal{B}\) are nonempty. Moreover, for all functions \(f,g\in\mathcal{A}\) and every real \(\alpha\), the function \(f+\alpha g\) satisfies \[\forall x\in\mathbb{R},\quad(f+\alpha g)(x)=f(x)+\alpha g(x)=f(-x)+\alpha g(-x)=(f+\alpha g)(-x).\] Consequently, \(f+\alpha g\in\mathcal{A}\), and \(\mathcal{A}\) is a subspace of \(\mathcal{C}_{0}(\mathbb{R})\). Likewise, for all functions \(f,g\in\mathcal{B}\) and every real \(\alpha\), the function \(f+\alpha g\) satisfies \[\forall x\in\mathbb{R},\quad(f+\alpha g)(x)=f(x)+\alpha g(x)=-f(-x)-\alpha g(-x)=-(f+\alpha g)(-x).\] Consequently, \(f+\alpha g\in\mathcal{B}\), and \(\mathcal{B}\) is a subspace of \(\mathcal{C}_{0}(\mathbb{R})\). Now let \(f\) be a function in \(\mathcal{A}\cap\mathcal{B}\). Then, for every \(x\in\mathbb{R}\), \[f(x)=f(-x)\quad\text{and}\quad f(x)=-f(-x),\] which shows that \(f(x)=0\). Hence \(f=\nu\). We deduce that \(\mathcal{A}\cap\mathcal{B}=\{\nu\}=\{0_{\mathcal{C}_{0}(\mathbb{R})}\}\), and that \(\mathcal{A}\) and \(\mathcal{B}\) are in direct sum.
2. It is easy to see that \(A\) and \(B\) contain the zero vector \((0,0,0)\). Moreover, if \((x,y,z)\) and \((x^{\prime},y^{\prime},z^{\prime})\) belong to \(A\) and \(\alpha\in\mathbb{R}\), then \((x,y,z)+\alpha(x^{\prime},y^{\prime},z^{\prime})=(x+\alpha x^{\prime},y+\alpha y^{\prime},z+\alpha z^{\prime})\) satisfies \[(x+\alpha x^{\prime})+(y+\alpha y^{\prime})+(z+\alpha z^{\prime})=(x+y+z)+\alpha(x^{\prime}+y^{\prime}+z^{\prime})=0.\] Hence \((x,y,z)+\alpha(x^{\prime},y^{\prime},z^{\prime})\in A\), and \(A\) is a subspace of \(\mathbb{R}^{3}\). Likewise, if \((x,y,z)\) and \((x^{\prime},y^{\prime},z^{\prime})\) belong to \(B\) and \(\alpha\in\mathbb{R}\), then \((x,y,z)+\alpha(x^{\prime},y^{\prime},z^{\prime})\) satisfies \[(x+\alpha x^{\prime})-(y+\alpha y^{\prime})+(z+\alpha z^{\prime})=(x-y+z)+\alpha(x^{\prime}-y^{\prime}+z^{\prime})=0.\] Hence \((x,y,z)+\alpha(x^{\prime},y^{\prime},z^{\prime})\in B\), and \(B\) is a subspace of \(\mathbb{R}^{3}\). Now let \((x,y,z)\) be a vector of \(A\cap B\). Then \[x+y+z=0\quad\text{and}\quad x-y+z=0.\] The vector \((1,0,-1)\) satisfies both equations above, so \(A\cap B\) is not reduced to \(\{(0,0,0)\}\). The subspaces \(A\) and \(B\) are not in direct sum.

**Solution of Exercise 9**:

1. It is easy to see that the vector \((0,0,0)\) belongs to \(F\) and to \(G\), so \(F\) and \(G\) are nonempty. Let \((x,x,x),(y,y,y)\in F\) and \(\alpha\in\mathbb{R}\). Then \[(x,x,x)+\alpha(y,y,y)=(x+\alpha y,x+\alpha y,x+\alpha y)\in F.\] Hence \(F\) is a subspace of \(\mathbb{R}^{3}\). Let \((0,y,z),(0,y^{\prime},z^{\prime})\in G\) and \(\alpha\in\mathbb{R}\). Then \[(0,y,z)+\alpha(0,y^{\prime},z^{\prime})=(0,y+\alpha y^{\prime},z+\alpha z^{\prime})\in G.\] Hence \(G\) is a subspace of \(\mathbb{R}^{3}\). We see that \[F=\{x(1,1,1)\,|\,x\in\mathbb{R}\}=\operatorname{vect}\{(1,1,1)\},\] \[G=\{y(0,1,0)+z(0,0,1)\,|\,y,z\in\mathbb{R}\}=\operatorname{vect}\{(0,1,0),(0,0,1)\}.\] Moreover, one easily checks that the families \(\{(1,1,1)\}\) and \(\{(0,1,0),(0,0,1)\}\) are linearly independent. They therefore form bases of \(F\) and \(G\) respectively, so \(\dim F=1\) and \(\dim G=2\). Finally, if \((x,y,z)\in F\cap G\), then \(x=y=z\) and \(x=0\). Hence \(F\cap G=\{(0,0,0)\}\), and \(F\) and \(G\) are in direct sum.
2. One easily checks that \((0,0,0,0)\in H\), so that \(H\neq\emptyset\). Let \((x,y,z,t),(x^{\prime},y^{\prime},z^{\prime},t^{\prime})\in H\) and \(\alpha\in\mathbb{R}\). Then \((x,y,z,t)+\alpha(x^{\prime},y^{\prime},z^{\prime},t^{\prime})=(x+\alpha x^{\prime},y+\alpha y^{\prime},z+\alpha z^{\prime},t+\alpha t^{\prime})\) satisfies \[x+\alpha x^{\prime}=2y-z+\alpha(2y^{\prime}-z^{\prime})=2(y+\alpha y^{\prime})-(z+\alpha z^{\prime}),\] \[t+\alpha t^{\prime}=x+y+z+\alpha(x^{\prime}+y^{\prime}+z^{\prime})=(x+\alpha x^{\prime})+(y+\alpha y^{\prime})+(z+\alpha z^{\prime}),\] which shows that \((x,y,z,t)+\alpha(x^{\prime},y^{\prime},z^{\prime},t^{\prime})\in H\). Hence \(H\) is a subspace of \(\mathbb{R}^{4}\). Moreover, since \(x=2y-z\) forces \(t=x+y+z=3y\), \[H=\{(2y-z,\,y,\,z,\,3y)\,|\,y,z\in\mathbb{R}\}=\{y(2,1,0,3)+z(-1,0,1,0)\,|\,y,z\in\mathbb{R}\}=\operatorname{vect}\{(2,1,0,3),(-1,0,1,0)\}.\] The vector equation \(\beta(2,1,0,3)+\gamma(-1,0,1,0)=(0,0,0,0)\) gives \(\beta=0\) (second component) and \(\gamma=0\) (third component), so the family \(\{(2,1,0,3),(-1,0,1,0)\}\) is linearly independent. It is therefore a basis of \(H\), and \(\dim H=2\).

**Solution of Exercise 10**:

1. Let \(x\in(A\cap C)+(B\cap C)\). Then \(x=a+b\) with \(a\in A\cap C\) and \(b\in B\cap C\). Since \(a\in C\), \(b\in C\) and \(C\) is a subspace, \(a+b\in C\). Hence \(x\) belongs to \(A+B\) and to \(C\). In \(\mathbb{R}^{2}\), if we take \(A=\operatorname{vect}\{e_{1}\}\), \(B=\operatorname{vect}\{e_{2}\}\) and \(C=\operatorname{vect}\{e_{1}+e_{2}\}\), where \(\{e_{1},e_{2}\}\) is the canonical basis, then \[(A\cap C)+(B\cap C)=\{0\}+\{0\}=\{0\}\quad\text{and}\quad(A+B)\cap C=\mathbb{R}^{2}\cap C=C.\]
2. Since \(B\subset C\), it suffices to show that \(C\subset B\). So let \(x\in C\). Since \(0_{E}\in A\), \(x=0_{E}+x\in A+C\). Since \(A+C=A+B\), we can write \(x=a+b\) with \(a\in A\) and \(b\in B\). Now \(a=x-b\), where \(x\in C\) and \(b\in B\subset C\), and since \(C\) is a subspace, \(a\in C\). Hence \(a\in A\cap C=A\cap B\), so \(a\in B\). Finally, \(x=a+b\) with \(a\in B\) and \(b\in B\), and since \(B\) is a subspace, \(x\in B\).

**Solution of Exercise 11**:

1. Let us check that \(\varphi\) is linear: \[\varphi\left(\alpha\left(\begin{array}{c}x\\ y\\ z\end{array}\right)+\beta\left(\begin{array}{c}x^{\prime}\\ y^{\prime}\\ z^{\prime}\end{array}\right)\right)=\varphi\left(\begin{array}{c}\alpha x+\beta x^{\prime}\\ \alpha y+\beta y^{\prime}\\ \alpha z+\beta z^{\prime}\end{array}\right)=\left(\begin{array}{c}-(\alpha x+\beta x^{\prime})+2(\alpha y+\beta y^{\prime})+2(\alpha z+\beta z^{\prime})\\ -8(\alpha x+\beta x^{\prime})+7(\alpha y+\beta y^{\prime})+4(\alpha z+\beta z^{\prime})\\ -13(\alpha x+\beta x^{\prime})+5(\alpha y+\beta y^{\prime})+8(\alpha z+\beta z^{\prime})\end{array}\right)=\alpha\left(\begin{array}{c}-x+2y+2z\\ -8x+7y+4z\\ -13x+5y+8z\end{array}\right)+\beta\left(\begin{array}{c}-x^{\prime}+2y^{\prime}+2z^{\prime}\\ -8x^{\prime}+7y^{\prime}+4z^{\prime}\\ -13x^{\prime}+5y^{\prime}+8z^{\prime}\end{array}\right)=\alpha\varphi\left(\begin{array}{c}x\\ y\\ z\end{array}\right)+\beta\varphi\left(\begin{array}{c}x^{\prime}\\ y^{\prime}\\ z^{\prime}\end{array}\right).\] Next, \[\varphi(\mathbf{e}_{1})=\left(\begin{array}{c}-1\\ -8\\ -13\end{array}\right),\quad\varphi(\mathbf{e}_{2})=\left(\begin{array}{c}2\\ 7\\ 5\end{array}\right),\quad\varphi(\mathbf{e}_{3})=\left(\begin{array}{c}2\\ 4\\ 8\end{array}\right).\] Finally, \(2\mathbf{e}_{1}+\mathbf{e}_{2}-\mathbf{e}_{3}=(2,1,-1)^{t}\), so that \[\varphi(2\mathbf{e}_{1}+\mathbf{e}_{2}-\mathbf{e}_{3})=2\varphi(\mathbf{e}_{1})+\varphi(\mathbf{e}_{2})-\varphi(\mathbf{e}_{3})=2\left(\begin{array}{c}-1\\ -8\\ -13\end{array}\right)+\left(\begin{array}{c}2\\ 7\\ 5\end{array}\right)-\left(\begin{array}{c}2\\ 4\\ 8\end{array}\right)=\left(\begin{array}{c}-2\\ -13\\ -29\end{array}\right).\]
2. We look for the solutions of the vector equation \(\varphi(\mathbf{x})=\mathbf{0}\). Writing \(\mathbf{x}=(x,y,z)^{t}\), we obtain the system \[\left\{\begin{array}{rcl}0&=&-x+2y+2z,\\ 0&=&-8x+7y+4z,\\ 0&=&-13x+5y+8z.\end{array}\right.\] The only solution of this system is the zero vector, which can also be seen by computing the determinant of the matrix \[\left[\begin{array}{rrr}-1&2&2\\ -8&7&4\\ -13&5&8\end{array}\right].\] Hence \(\ker\varphi=\{\mathbf{0}\}\), the unique basis of \(\ker\varphi\) is \(\emptyset\), and its dimension is zero.
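The kernel computation can be double-checked numerically; a minimal sketch (numpy is an assumption on my part, not part of the original solution):

```python
import numpy as np

# Matrix of phi from Exercise 11 (rows are the three component forms).
A = np.array([[-1.0, 2.0, 2.0],
              [-8.0, 7.0, 4.0],
              [-13.0, 5.0, 8.0]])

# A nonzero determinant means A @ x = 0 has only the trivial solution,
# i.e. ker(phi) = {0} and phi is bijective.
d = np.linalg.det(A)
print(round(d))  # 90, nonzero, so the kernel is trivial
```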
3. Since \(\ker\varphi=\{\mathbf{0}\}\), the map \(\varphi\) is injective. Since the dimensions of the domain and the codomain are both equal to 3, \(\varphi\) is also surjective, and hence bijective.
4. Writing \(\mathbf{x}=(x,y)^{t}\), we have: \[(\varphi\circ\psi)(\mathbf{x})=\varphi(\psi(\mathbf{x}))=\varphi\left(\begin{array}{c}x-y\\ x+y\\ x+2y\end{array}\right)=\left(\begin{array}{c}-(x-y)+2(x+y)+2(x+2y)\\ -8(x-y)+7(x+y)+4(x+2y)\\ -13(x-y)+5(x+y)+8(x+2y)\end{array}\right)=\left(\begin{array}{c}3x+7y\\ 3x+23y\\ 34y\end{array}\right).\]

**Solution of Exercise 12**:

1. Let us check that \(\varphi\) is linear: \[\varphi\left(\alpha\left(\begin{array}{c}x\\ y\\ z\end{array}\right)+\beta\left(\begin{array}{c}x^{\prime}\\ y^{\prime}\\ z^{\prime}\end{array}\right)\right)=\varphi\left(\begin{array}{c}\alpha x+\beta x^{\prime}\\ \alpha y+\beta y^{\prime}\\ \alpha z+\beta z^{\prime}\end{array}\right)=\left(\begin{array}{c}(\alpha y+\beta y^{\prime})+(\alpha z+\beta z^{\prime})\\ \alpha x+\beta x^{\prime}\end{array}\right)=\alpha\left(\begin{array}{c}y+z\\ x\end{array}\right)+\beta\left(\begin{array}{c}y^{\prime}+z^{\prime}\\ x^{\prime}\end{array}\right)=\alpha\varphi\left(\begin{array}{c}x\\ y\\ z\end{array}\right)+\beta\varphi\left(\begin{array}{c}x^{\prime}\\ y^{\prime}\\ z^{\prime}\end{array}\right).\] Next, \(\varphi(\mathbf{u})=(5,1)^{t}\), \(\varphi(\mathbf{v})=(2,1)^{t}\) and \[\varphi(\mathbf{u}-2\mathbf{v})=\left(\begin{array}{c}5\\ 1\end{array}\right)-2\left(\begin{array}{c}2\\ 1\end{array}\right)=\left(\begin{array}{c}1\\ -1\end{array}\right).\]
2. The vector \((x,y,z)^{t}\) belongs to \(\ker\varphi\) if and only if \(y+z=0\) and \(x=0\). It is therefore the set of vectors of the form \((0,y,-y)^{t}\) with \(y\in\mathbb{R}\): \[\ker\varphi=\left\{\left(\begin{array}{c}0\\ y\\ -y\end{array}\right)\ \Big|\ y\in\mathbb{R}\right\}=\operatorname{vect}\left\{\left(\begin{array}{c}0\\ 1\\ -1\end{array}\right)\right\}.\] The subspace \(\ker\varphi\) is therefore of dimension \(1\), and admits the singleton \(\{(0,1,-1)^{t}\}\) as a basis.
3. By the rank-nullity theorem, \(\dim\mathbb{R}^{3}=\operatorname{rg}\varphi+\dim\ker\varphi\), which implies that \(\operatorname{rg}\varphi=2\). We deduce that \(\operatorname{im}\varphi=\mathbb{R}^{2}\) and that any basis of \(\mathbb{R}^{2}\), for example the canonical basis, is a basis of \(\operatorname{im}\varphi\).
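Rank-nullity for this map is easy to verify numerically; a minimal sketch (numpy is an assumption on my part, not part of the original solution):

```python
import numpy as np

# Matrix of phi from Exercise 12: (x, y, z) -> (y + z, x).
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0]])

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank  # rank-nullity: dim ker = n - rank

print(rank, nullity)  # 2 1
```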
|
|
||||||
|
|
||||||
**Solution de l'exercice 13**
|
|
||||||
|
|
||||||
1. Si \(\mathcal{A}\) est liee, il existe \(\alpha_{1},\ldots,\alpha_{m}\) non tous nuls tels que \(\alpha_{m}x_{m}+\cdots+\alpha_{m}x_{m}=0\). Mais alors \[\alpha_{1}\varphi(x_{1})+\cdots+\alpha_{m}\varphi(x_{m})=\varphi(\alpha_{1}x_ {1}+\cdots+\alpha_{m}x_{m})=0,\] et puisqu'au moins un des \(\alpha_{j}\) est non nul, non voyons que \(\{\varphi(x_{1}),\ldots,\varphi(x_{m})\}\) est liee.
|
|
||||||
2. Ce point se deduit du precedent par contre-apposition.
|
|
||||||
3. Supposons \(\mathcal{A}\) libre et \(\varphi\) injective, et considerons l'equation \(\alpha_{1}\varphi(x_{1})+\cdots+\alpha_{m}\varphi(x_{m})=0\). Le membre de gauche n'est autre que \(\varphi(\alpha_{1}x_{1}+\cdots+\alpha_{m}x_{m})\), et puisque \(\varphi\) injective, on a necessairement \(\alpha_{1}x_{1}+\cdots+\alpha_{m}x_{m}=0\). Puisque \(\mathcal{A}\) est libre, on deduit de cette derniere equation que \(\alpha_{1}=\ldots=\alpha_{m}=0\). Donc \(\varphi(\mathcal{A})\) est libre.
|
|
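As a quick numerical cross-check of the kernel computation above (my own sketch, not part of the exercise): the map \(\varphi(x,y,z)=(y+z,\,x)\) has matrix \(\begin{pmatrix}0&1&1\\ 1&0&0\end{pmatrix}\), and its null space can be read off the right singular vectors whose singular values vanish.

```python
import numpy as np

# Matrix of the linear map phi(x, y, z) = (y + z, x)
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0]])

# Rows of Vt beyond the rank span ker(phi)
U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))   # rank 2, so dim ker(phi) = 3 - 2 = 1
kernel = Vt[rank:]              # one unit row, proportional to (0, 1, -1)

assert rank == 2
assert np.allclose(A @ kernel[0], 0)
# |<kernel basis, (0,1,-1)>| = sqrt(2) since the basis vector is (0,1,-1)/sqrt(2)
assert np.isclose(abs(kernel[0] @ np.array([0.0, 1.0, -1.0])), np.sqrt(2))
```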
381
documents/mmds/test_linear_algebra.mmd
Normal file
@ -0,0 +1,381 @@
Orthogonal Matrices and the Singular Value Decomposition

Carlo Tomasi

The first Section below extends to $m\times n$ matrices the results on orthogonality and projection we have previously seen for vectors. The Sections thereafter use these concepts to introduce the Singular Value Decomposition (SVD) of a matrix, the pseudo-inverse, and its use for the solution of linear systems.

## 1 Orthogonal Matrices

Let ${\cal S}$ be an $n$-dimensional subspace of ${\bf R}^{m}$ (so that we necessarily have $n\leq m$), and let ${\bf v}_{1},\ldots,{\bf v}_{n}$ be an orthonormal basis for ${\cal S}$. Consider a point $P$ in ${\cal S}$. If the coordinates of $P$ in ${\bf R}^{m}$ are collected in an $m$-dimensional vector

$${\bf p}=\left[\begin{array}{c}p_{1}\\ \vdots\\ p_{m}\end{array}\right]\,$$

and since $P$ is in ${\cal S}$, it must be possible to write ${\bf p}$ as a linear combination of the ${\bf v}_{j}$s. In other words, there must exist coefficients

$${\bf q}=\left[\begin{array}{c}q_{1}\\ \vdots\\ q_{n}\end{array}\right]$$

such that

$${\bf p}=q_{1}{\bf v}_{1}+\ldots+q_{n}{\bf v}_{n}=V{\bf q}$$

where

$$V=\left[\begin{array}{ccc}{\bf v}_{1}&\cdots&{\bf v}_{n}\end{array}\right]$$

is an $m\times n$ matrix that collects the basis for ${\cal S}$ as its columns. Then for any $i=1,\ldots,n$ we have

$${\bf v}_{i}^{T}{\bf p}={\bf v}_{i}^{T}\sum_{j=1}^{n}q_{j}{\bf v}_{j}=\sum_{j=1}^{n}q_{j}{\bf v}_{i}^{T}{\bf v}_{j}=q_{i}\,$$

since the ${\bf v}_{j}$ are orthonormal. This is important, and may need emphasis:

_If_

$${\bf p}=\sum_{j=1}^{n}q_{j}{\bf v}_{j}$$

_and the vectors of the basis ${\bf v}_{1},\ldots,{\bf v}_{n}$ are orthonormal, then the coefficients $q_{j}$ are the signed magnitudes of the projections of ${\bf p}$ onto the basis vectors:_

$$q_{j}={\bf v}_{j}^{T}{\bf p}. \tag{1}$$

In matrix form,

$${\bf q}=V^{T}{\bf p}\;. \tag{2}$$
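Equations (1) and (2) can be sketched numerically (my own illustration, not from the text): build an orthonormal basis for a 2-dimensional subspace of ${\bf R}^{3}$ with a QR factorization, then recover a point's coefficients by multiplying by $V^{T}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthonormal basis for a 2-D subspace of R^3: the Q factor of a random 3x2 matrix
V = np.linalg.qr(rng.standard_normal((3, 2)))[0]   # 3 x 2, orthonormal columns

q = np.array([2.0, -1.0])   # coordinates in the basis
p = V @ q                   # the corresponding point of R^3

assert np.allclose(V.T @ V, np.eye(2))   # columns are orthonormal
assert np.allclose(V.T @ p, q)           # equation (2): q = V^T p
```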
Also, we can collect the $n^{2}$ equations

$${\bf v}_{i}^{T}{\bf v}_{j}=\left\{\begin{array}{ll}1&\mbox{ if }i=j\\ 0&\mbox{ otherwise}\end{array}\right.$$

into the following matrix equation:

$$V^{T}V=I \tag{3}$$

where $I$ is the $n\times n$ identity matrix. A matrix $V$ that satisfies equation (3) is said to be _orthogonal_. Thus, a matrix is orthogonal if its columns are orthonormal. Since the _left inverse_ of a matrix $V$ is defined as the matrix $L$ such that

$$LV=I\;, \tag{4}$$

comparison with equation (3) shows that the left inverse of an orthogonal matrix $V$ exists, and is equal to the transpose of $V$.

Of course, this argument requires $V$ to be full rank, so that the solution $L$ to equation (4) is unique. However, $V$ is certainly full rank, because it is made of orthonormal columns.

Notice that $VR=I$ cannot possibly have a solution when $m>n$, because the $m\times m$ identity matrix has $m$ linearly independent$^{1}$ columns, while the columns of $VR$ are linear combinations of the $n$ columns of $V$, so $VR$ can have at most $n$ linearly independent columns.
Footnote 1: Nay, orthonormal.

Of course, this result is still valid when $V$ is $m\times m$ and has orthonormal columns, since equation (3) still holds. However, for square, full-rank matrices ($r=m=n$), the distinction between left and right inverse vanishes, as we saw in class. Since the matrix $VV^{T}$ contains the inner products between the _rows_ of $V$ (just as $V^{T}V$ is formed by the inner products of its _columns_), the argument above shows that the rows of a _square_ orthogonal matrix are orthonormal as well. We can summarize this discussion as follows:

**Theorem 1.1**: _The left inverse of an orthogonal $m\times n$ matrix $V$ with $m\geq n$ exists and is equal to the transpose of $V$:_

$$V^{T}V=I\;.$$

_In particular, if $m=n$, the matrix $V^{-1}=V^{T}$ is also the right inverse of $V$:_

$$V\mbox{ square}\quad\Rightarrow\;\;V^{-1}V=V^{T}V=VV^{-1}=VV^{T}=I\;.$$

Sometimes, when $m=n$, the geometric interpretation of equation (2) causes confusion, because two interpretations of it are possible. In the interpretation given above, the point $P$ remains the same, and the underlying reference frame is changed from the elementary vectors ${\bf e}_{j}$ (that is, from the columns of $I$) to the vectors ${\bf v}_{j}$ (that is, to the columns of $V$). Alternatively, equation (2) can be seen as a transformation, in a fixed reference system, of point $P$ with coordinates ${\bf p}$ into a different point $Q$ with coordinates ${\bf q}$. This, however, is relativity, and should not be surprising: If you spin clockwise on your feet, or if you stand still and the whole universe spins counterclockwise around you, the result is the same.2

Footnote 2: At least geometrically. One solution may be more efficient than the other in other ways.

Consistently with either of these geometric interpretations, we have the following result:

**Theorem 1.2**: _The norm of a vector ${\bf x}$ is not changed by multiplication by an orthogonal matrix $V$:_

$$\|V{\bf x}\|=\|{\bf x}\|\;.$$

**Proof.**

$$\|V{\bf x}\|^{2}={\bf x}^{T}V^{T}V{\bf x}={\bf x}^{T}{\bf x}=\|{\bf x}\|^{2}\;.$$

$\Delta$
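Theorem 1.2 is easy to check numerically (a sketch with an arbitrary orthogonal matrix, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# A random orthogonal matrix: the Q factor of a random square matrix
V = np.linalg.qr(rng.standard_normal((4, 4)))[0]
x = rng.standard_normal(4)

# Multiplication by V preserves the Euclidean norm
assert np.isclose(np.linalg.norm(V @ x), np.linalg.norm(x))
```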
We conclude this section with an obvious but useful consequence of orthogonality. In an earlier note, we defined the projection ${\bf p}$ of a vector ${\bf b}$ onto another vector ${\bf c}$ as the point on the line through ${\bf c}$ that is closest to ${\bf b}$. This notion of projection can be extended from lines to vector spaces by the following definition: The _projection_ ${\bf p}$ of a point ${\bf b}\in{\bf R}^{n}$ _onto a subspace_ $C$ is the point in $C$ that is closest to ${\bf b}$.

Also, for _unit_ vectors ${\bf c}$, the projection matrix is ${\bf c}{\bf c}^{T}$, and the vector ${\bf b}-{\bf p}$ is orthogonal to ${\bf c}$. An analogous result holds for subspace projection, as the following theorem shows.

**Theorem 1.3**: _Let $U$ be an orthogonal matrix. Then the matrix $UU^{T}$ projects any vector ${\bf b}$ onto ${\rm range}(U)$. Furthermore, the difference vector between ${\bf b}$ and its projection ${\bf p}$ onto ${\rm range}(U)$ is orthogonal to ${\rm range}(U)$:_

$$U^{T}({\bf b}-{\bf p})={\bf 0}\;.$$

**Proof.** A point ${\bf p}$ in ${\rm range}(U)$ is a linear combination of the columns of $U$:

$${\bf p}=U{\bf x}$$

where ${\bf x}$ is the vector of coefficients (as many coefficients as there are columns in $U$). The squared distance between ${\bf b}$ and ${\bf p}$ is

$$\|{\bf b}-{\bf p}\|^{2}=({\bf b}-{\bf p})^{T}({\bf b}-{\bf p})={\bf b}^{T}{\bf b}+{\bf p}^{T}{\bf p}-2{\bf b}^{T}{\bf p}={\bf b}^{T}{\bf b}+{\bf x}^{T}U^{T}U{\bf x}-2{\bf b}^{T}U{\bf x}\;.$$

Because of orthogonality, $U^{T}U$ is the identity matrix, so

$$\|{\bf b}-{\bf p}\|^{2}={\bf b}^{T}{\bf b}+{\bf x}^{T}{\bf x}-2{\bf b}^{T}U{\bf x}\;.$$

The derivative of this squared distance with respect to ${\bf x}$ is the vector

$$2{\bf x}-2U^{T}{\bf b}$$

which is zero iff

$${\bf x}=U^{T}{\bf b}\,$$

that is, when

$${\bf p}=U{\bf x}=UU^{T}{\bf b}$$

as promised.

For this value of ${\bf p}$ the difference vector ${\bf b}-{\bf p}$ is orthogonal to ${\rm range}(U)$, in the sense that

$$U^{T}({\bf b}-{\bf p})=U^{T}({\bf b}-UU^{T}{\bf b})=U^{T}{\bf b}-U^{T}{\bf b}={\bf 0}\;.$$
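Theorem 1.3 can be sketched numerically as follows (an illustration of mine, not from the text): take an orthogonal $3\times 2$ matrix $U$, project a vector ${\bf b}$ onto ${\rm range}(U)$ with $UU^{T}$, and check that the residual is orthogonal to the range and that no other point of the range is closer.

```python
import numpy as np

rng = np.random.default_rng(2)

# Orthonormal-column basis for a 2-D subspace of R^3
U = np.linalg.qr(rng.standard_normal((3, 2)))[0]
b = rng.standard_normal(3)

p = U @ U.T @ b                       # projection of b onto range(U)

assert np.allclose(U.T @ (b - p), 0)  # residual orthogonal to range(U)
# p is the closest point: any other point of range(U) is at least as far
for _ in range(100):
    q = U @ rng.standard_normal(2)
    assert np.linalg.norm(b - p) <= np.linalg.norm(b - q) + 1e-12
```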
## 2 The Singular Value Decomposition

Here is the main intuition captured by the Singular Value Decomposition (SVD) of a matrix:

An $m\times n$ matrix $A$ of rank $r$ maps the $r$-dimensional unit hypersphere in $\text{rowspace}(A)$ into an $r$-dimensional hyperellipse in $\text{range}(A)$.

Thus, a hypersphere is stretched or compressed into a hyperellipse, which is a quadratic hypersurface that generalizes the two-dimensional notion of ellipse to an arbitrary number of dimensions. In three dimensions, the hyperellipse is an ellipsoid, in one dimension it is a pair of points. In all cases, the hyperellipse in question is centered at the origin.

For instance, the rank-2 matrix

$$A=\frac{1}{\sqrt{2}}\left[\begin{array}{cc}\sqrt{3}&\sqrt{3}\\ -3&3\\ 1&1\end{array}\right] \tag{5}$$

transforms the unit circle on the plane into an ellipse embedded in three-dimensional space. Figure 1 shows the map

$$\mathbf{b}=A\mathbf{x}\;.$$

Two diametrically opposite points on the unit circle are mapped into the two endpoints of the major axis of the ellipse, and two other diametrically opposite points on the unit circle are mapped into the two endpoints of the minor axis of the ellipse. The lines through these two pairs of points on the unit circle are always orthogonal. This result can be generalized to any $m\times n$ matrix.

Simple and fundamental as this geometric fact may be, its proof by geometric means is cumbersome. Instead, we will prove it algebraically by first introducing the existence of the SVD and then using the latter to prove that matrices map hyperspheres into hyperellipses.

Figure 1: The matrix in equation (5) maps a circle on the plane into an ellipse in space. The two small boxes are corresponding points.
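For the matrix in equation (5), the semi-axis lengths of the ellipse in Figure 1 are the singular values, which work out to exactly 3 and 2 (a numerical check of mine, not from the text):

```python
import numpy as np

# The matrix of equation (5)
A = np.array([[np.sqrt(3), np.sqrt(3)],
              [-3.0,       3.0],
              [ 1.0,       1.0]]) / np.sqrt(2)

# Singular values = semi-axes of the image of the unit circle
s = np.linalg.svd(A, compute_uv=False)
assert np.allclose(s, [3.0, 2.0])   # major and minor semi-axes
```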
**Theorem 2.1**: _If $A$ is a real $m\times n$ matrix then there exist orthogonal matrices_

$$\begin{array}{rcl}U&=&\left[\begin{array}{ccc}{\bf u}_{1}&\cdots&{\bf u}_{m}\end{array}\right]\in{\bf R}^{m\times m}\\ V&=&\left[\begin{array}{ccc}{\bf v}_{1}&\cdots&{\bf v}_{n}\end{array}\right]\in{\bf R}^{n\times n}\end{array}$$

_such that_

$$U^{T}AV=\Sigma={\rm diag}(\sigma_{1},\ldots,\sigma_{p})\in{\bf R}^{m\times n}$$

_where $p=\min(m,n)$ and $\sigma_{1}\geq\ldots\geq\sigma_{p}\geq 0$. Equivalently,_

$$A=U\Sigma V^{T}\;.$$

The columns of $V$ are the _right singular vectors_ of $A$, and those of $U$ are its _left singular vectors_. The diagonal entries of $\Sigma$ are the _singular values_ of $A$. The ratio

$$\kappa(A)=\sigma_{1}/\sigma_{p} \tag{6}$$

is the _condition number_ of $A$, and is possibly infinite.
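Theorem 2.1 and the condition number (6) in NumPy terms (a sketch of mine; `np.linalg.svd` returns $U$, the singular values, and $V^{T}$):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A)               # full SVD: U is 4x4, Vt is 3x3
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)                # rectangular diagonal matrix

assert np.allclose(A, U @ Sigma @ Vt)     # A = U Sigma V^T
assert np.all(s >= 0) and np.all(np.diff(s) <= 0)  # nonnegative, non-increasing
kappa = s[0] / s[-1]                      # condition number sigma_1 / sigma_p
assert np.isclose(kappa, np.linalg.cond(A))
```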
**Proof.** Let ${\bf x}$ and ${\bf y}$ be unit vectors in ${\bf R}^{n}$ and ${\bf R}^{m}$, respectively, and consider the bilinear form

$$z={\bf y}^{T}A{\bf x}\;.$$

The set

$${\cal S}\,=\,\{{\bf x},\,{\bf y}\,\,|\,\,{\bf x}\in{\bf R}^{n},\,\,{\bf y}\in{\bf R}^{m},\,\,\|{\bf x}\|=\|{\bf y}\|=1\}$$

is compact, so that the scalar function $z({\bf x},{\bf y})$ must achieve a maximum value on ${\cal S}$, possibly at more than one point$^{3}$. Let ${\bf u}_{1},\,{\bf v}_{1}$ be two unit vectors in ${\bf R}^{m}$ and ${\bf R}^{n}$ respectively where this maximum is achieved, and let $\sigma_{1}$ be the corresponding value of $z$:

Footnote 3: Actually, at least at two points: if ${\bf u}_{1}^{T}A{\bf v}_{1}$ is a maximum, so is $(-{\bf u}_{1})^{T}A(-{\bf v}_{1})$.

$$\max_{\|{\bf x}\|=\|{\bf y}\|=1}{\bf y}^{T}A{\bf x}={\bf u}_{1}^{T}A{\bf v}_{1}=\sigma_{1}\;.$$

It is easy to see that ${\bf u}_{1}$ is parallel to the vector $A{\bf v}_{1}$. If this were not the case, their inner product ${\bf u}_{1}^{T}A{\bf v}_{1}$ could be increased by rotating ${\bf u}_{1}$ towards the direction of $A{\bf v}_{1}$, thereby contradicting the fact that ${\bf u}_{1}^{T}A{\bf v}_{1}$ is a maximum. Similarly, by noticing that

$${\bf u}_{1}^{T}A{\bf v}_{1}={\bf v}_{1}^{T}A^{T}{\bf u}_{1}$$

and repeating the argument above, we see that ${\bf v}_{1}$ is parallel to $A^{T}{\bf u}_{1}$.

The vectors ${\bf u}_{1}$ and ${\bf v}_{1}$ can be extended into orthonormal bases for ${\bf R}^{m}$ and ${\bf R}^{n}$, respectively. Collect these orthonormal basis vectors into orthogonal matrices $U_{1}$ and $V_{1}$. Then

$$U_{1}^{T}AV_{1}=S_{1}=\left[\begin{array}{cc}\sigma_{1}&{\bf 0}^{T}\\ {\bf 0}&A_{1}\end{array}\right]\;.$$

In fact, the first column of $AV_{1}$ is $A{\bf v}_{1}=\sigma_{1}{\bf u}_{1}$, so the first entry of $U_{1}^{T}AV_{1}$ is ${\bf u}_{1}^{T}\sigma_{1}{\bf u}_{1}=\sigma_{1}$, and its other entries are ${\bf u}_{j}^{T}A{\bf v}_{1}=0$ because $A{\bf v}_{1}$ is parallel to ${\bf u}_{1}$ and therefore orthogonal, by construction, to $\mathbf{u}_{2},\ldots,\mathbf{u}_{m}$. A similar argument shows that the entries after the first in the first row of $S_{1}$ are zero: the row vector $\mathbf{u}_{1}^{T}A$ is parallel to $\mathbf{v}_{1}^{T}$, and therefore orthogonal to $\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$, so that $\mathbf{u}_{1}^{T}A\mathbf{v}_{2}=\ldots=\mathbf{u}_{1}^{T}A\mathbf{v}_{n}=0$.
The matrix $A_{1}$ has one fewer row and column than $A$. We can repeat the same construction on $A_{1}$ and write

$$U_{2}^{T}A_{1}V_{2}=S_{2}=\left[\begin{array}{cc}\sigma_{2}&\mathbf{0}^{T}\\ \mathbf{0}&A_{2}\end{array}\right]$$

so that

$$\left[\begin{array}{cc}1&\mathbf{0}^{T}\\ \mathbf{0}&U_{2}^{T}\end{array}\right]U_{1}^{T}AV_{1}\left[\begin{array}{cc}1&\mathbf{0}^{T}\\ \mathbf{0}&V_{2}\end{array}\right]=\left[\begin{array}{ccc}\sigma_{1}&0&\mathbf{0}^{T}\\ 0&\sigma_{2}&\mathbf{0}^{T}\\ \mathbf{0}&\mathbf{0}&A_{2}\end{array}\right]\;.$$

This procedure can be repeated until $A_{k}$ vanishes (zero rows or zero columns) to obtain

$$U^{T}AV=\Sigma$$

where $U^{T}$ and $V$ are orthogonal matrices obtained by multiplying together all the orthogonal matrices used in the procedure, and

$$\Sigma=\mathrm{diag}(\sigma_{1},\ldots,\sigma_{p})\;.$$

Since matrices $U$ and $V$ are orthogonal, we can premultiply the matrix product in the theorem by $U$ and postmultiply it by $V^{T}$ to obtain

$$A=U\Sigma V^{T}\;,$$

which is the desired result.

It only remains to show that the elements on the diagonal of $\Sigma$ are nonnegative and arranged in non-increasing order. To see that $\sigma_{1}\geq\ldots\geq\sigma_{p}$ (where $p=\min(m,n)$), we can observe that the successive maximization problems that yield $\sigma_{1},\ldots,\sigma_{p}$ are performed on a sequence of sets each of which contains the next. To show this, we just need to show that $\sigma_{2}\leq\sigma_{1}$, and induction will do the rest. We have

$$\begin{array}{rcl}\sigma_{2}&=&\max_{\|\hat{\mathbf{x}}\|=\|\hat{\mathbf{y}}\|=1}\hat{\mathbf{y}}^{T}A_{1}\hat{\mathbf{x}}=\max_{\|\hat{\mathbf{x}}\|=\|\hat{\mathbf{y}}\|=1}\left[\begin{array}{c}0\\ \hat{\mathbf{y}}\end{array}\right]^{T}S_{1}\left[\begin{array}{c}0\\ \hat{\mathbf{x}}\end{array}\right]\\ &=&\max_{\|\hat{\mathbf{x}}\|=\|\hat{\mathbf{y}}\|=1}\left[\begin{array}{c}0\\ \hat{\mathbf{y}}\end{array}\right]^{T}U_{1}^{T}AV_{1}\left[\begin{array}{c}0\\ \hat{\mathbf{x}}\end{array}\right]=\max_{\substack{\|\mathbf{x}\|=\|\mathbf{y}\|=1\\ \mathbf{x}^{T}\mathbf{v}_{1}=\mathbf{y}^{T}\mathbf{u}_{1}=0}}\mathbf{y}^{T}A\mathbf{x}\leq\sigma_{1}\;.\end{array}$$

To explain the last equality above, consider the vectors

$$\mathbf{x}=V_{1}\left[\begin{array}{c}0\\ \hat{\mathbf{x}}\end{array}\right]\quad\mbox{and}\quad\mathbf{y}=U_{1}\left[\begin{array}{c}0\\ \hat{\mathbf{y}}\end{array}\right]\;.$$

The vector $\mathbf{x}$ is equal to the unit vector $[0\ \hat{\mathbf{x}}]^{T}$ transformed by the orthogonal matrix $V_{1}$, and is therefore itself a unit vector. In addition, it is a linear combination of $\mathbf{v}_{2},\ldots,\mathbf{v}_{n}$, and is therefore orthogonal to $\mathbf{v}_{1}$. A similar argument shows that $\mathbf{y}$ is a unit vector orthogonal to $\mathbf{u}_{1}$. Because $\mathbf{x}$ and $\mathbf{y}$ thus defined belong to subsets (actually sub-spheres) of the unit spheres in $\mathbf{R}^{n}$ and $\mathbf{R}^{m}$, we conclude that $\sigma_{2}\leq\sigma_{1}$.

The $\sigma_{i}$ are nonnegative because all these maximizations are performed on unit hyper-spheres. The $\sigma_{i}$s are maxima of the function $z(\mathbf{x},\mathbf{y})$ which always assumes both positive and negative values on any hyper-sphere: If $z(\mathbf{x},\mathbf{y})$ is negative, then $z(-\mathbf{x},\mathbf{y})$ is positive, and if $\mathbf{x}$ is on a hyper-sphere, so is $-\mathbf{x}$. $\Delta$

[MISSING_PAGE_FAIL:7]
Finally, both the 2-norm and the Frobenius norm

$$\|A\|_{F}=\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^{2}}$$

and

$$\|A\|_{2}=\sup_{{\bf x}\neq 0}\frac{\|A{\bf x}\|}{\|{\bf x}\|}$$

are neatly characterized in terms of the SVD:

$$\|A\|_{F}^{2}=\sigma_{1}^{2}+\ldots+\sigma_{p}^{2}$$

$$\|A\|_{2}=\sigma_{1}\;.$$
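Both identities are easy to verify numerically (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)

# Frobenius norm squared = sum of squared singular values
assert np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(s ** 2))
# 2-norm = largest singular value
assert np.isclose(np.linalg.norm(A, 2), s[0])
```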
In the next few sections we introduce fundamental results and applications that testify to the importance of the SVD.

## 3 The Pseudoinverse

One of the most important applications of the SVD is the solution of linear systems in the least squares sense. A linear system of the form

$$A{\bf x}={\bf b} \tag{7}$$

arising from a real-life application may or may not admit a solution, that is, a vector ${\bf x}$ that satisfies this equation exactly. Often more measurements are available than strictly necessary, because measurements are unreliable. This leads to more equations than unknowns (the number $m$ of rows in $A$ is greater than the number $n$ of columns), and equations are often mutually incompatible because they come from inexact measurements. Even when $m\leq n$ the equations can be incompatible, because of errors in the measurements that produce the entries of $A$. In these cases, it makes more sense to find a vector ${\bf x}$ that minimizes the norm

$$\|A{\bf x}-{\bf b}\|$$

of the _residual_ vector

$${\bf r}=A{\bf x}-{\bf b}\;,$$

where the double bars henceforth refer to the Euclidean norm. Thus, ${\bf x}$ cannot exactly satisfy any of the $m$ equations in the system, but it tries to satisfy all of them as closely as possible, as measured by the sum of the squares of the discrepancies between left- and right-hand sides of the equations.
In other circumstances, not enough measurements are available. Then, the linear system (7) is under-determined, in the sense that it has fewer independent equations than unknowns (its rank $r$ is less than $n$).

Incompatibility and under-determinacy can occur together: the system admits no solution, and the least-squares solution is not unique. For instance, the system

$$x_{1}+x_{2} = 1$$ $$x_{1}+x_{2} = 3$$ $$x_{3} = 2$$

has three unknowns, but rank 2, and its first two equations are incompatible: $x_{1}+x_{2}$ cannot be equal to both 1 and 3. A least-squares solution turns out to be ${\bf x}=[1\ 1\ 2]^{T}$ with residual ${\bf r}=A{\bf x}-{\bf b}=[1\ -1\ 0]$, which has norm $\sqrt{2}$ (admittedly, this is a rather high residual, but this is the best we can do for this problem, in the least-squares sense). However, any other vector of the form

$${\bf x}^{\prime}=\left[\begin{array}{c}1\\ 1\\ 2\end{array}\right]+\alpha\left[\begin{array}{c}-1\\ 1\\ 0\end{array}\right]$$

is as good as ${\bf x}$. For instance, ${\bf x}^{\prime}=[0\ 2\ 2]$, obtained for $\alpha=1$, yields exactly the same residual as ${\bf x}$ (check this).

In summary, an exact solution to the system (7) may not exist, or may not be unique. An approximate solution, in the least-squares sense, always exists, but may fail to be unique.

If there are several least-squares solutions, all equally good (or bad), then one of them turns out to be shorter than all the others, that is, its norm $\|{\bf x}\|$ is smallest. One can therefore redefine what it means to "solve" a linear system so that there is always exactly one solution. This minimum norm solution is the subject of the following theorem, which both proves uniqueness and provides a recipe for the computation of the solution.
**Theorem 3.1**: _The minimum-norm least squares solution to a linear system $A{\bf x}={\bf b}$, that is, the shortest vector ${\bf x}$ that achieves the_

$$\min_{{\bf x}}\|A{\bf x}-{\bf b}\|\,$$

_is unique, and is given by_

$$\hat{{\bf x}}=V\Sigma^{\dagger}U^{T}{\bf b} \tag{8}$$

_where_

$$\Sigma^{\dagger}=\mathrm{diag}\!\left(\frac{1}{\sigma_{1}},\ldots,\frac{1}{\sigma_{r}},0,\ldots,0\right)$$

_is an $n\times m$ diagonal matrix._

The matrix

$$A^{\dagger}=V\Sigma^{\dagger}U^{T}$$

is called the _pseudoinverse_ of $A$.
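Applying equation (8) to the incompatible, rank-deficient example above ($x_{1}+x_{2}=1$, $x_{1}+x_{2}=3$, $x_{3}=2$) recovers the minimum-norm least-squares solution $[1\ 1\ 2]^{T}$ with residual norm $\sqrt{2}$ (a numerical sketch of mine, compared against NumPy's built-in `pinv`):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

# Pseudoinverse built from the SVD, as in the theorem
U, s, Vt = np.linalg.svd(A)
s_dag = np.array([1.0 / si if si > 1e-12 else 0.0 for si in s])
Sigma_dag = np.zeros((A.shape[1], A.shape[0]))
np.fill_diagonal(Sigma_dag, s_dag)
x = Vt.T @ Sigma_dag @ U.T @ b

assert np.allclose(x, [1.0, 1.0, 2.0])                    # minimum-norm solution
assert np.isclose(np.linalg.norm(A @ x - b), np.sqrt(2))  # residual from the text
assert np.allclose(x, np.linalg.pinv(A) @ b)              # matches np.linalg.pinv
```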
**Proof.** The minimum-norm least-squares solution to

$$A{\bf x}={\bf b}$$

is the shortest vector ${\bf x}$ that minimizes

$$\|A{\bf x}-{\bf b}\|$$

that is,

$$\|U\Sigma V^{T}{\bf x}-{\bf b}\|\;.$$

[MISSING_PAGE_FAIL:10]

as promised. The residual, that is, the norm of $\|A{\bf x}-{\bf b}\|$ when ${\bf x}$ is the solution vector, is the norm of $\Sigma{\bf y}-{\bf c}$, since this vector is related to $A{\bf x}-{\bf b}$ by an orthogonal transformation (see equation (9)). In conclusion, the square of the residual is

$$\|A{\bf x}-{\bf b}\|^{2}=\|\Sigma{\bf y}-{\bf c}\|^{2}=\sum_{i=r+1}^{m}c_{i}^{2}=\sum_{i=r+1}^{m}({\bf u}_{i}^{T}{\bf b})^{2}$$

which is the squared norm of the projection of the right-hand side vector ${\bf b}$ onto the orthogonal complement of the range of $A$. $\Delta$

## 4 Least-Squares Solution of Homogeneous Linear Systems
Theorem 3.1 works regardless of the value of the right-hand side vector ${\bf b}$. When ${\bf b}={\bf 0}$, that is, when the system is _homogeneous_, the solution is trivial: the minimum-norm solution to

$$A{\bf x}={\bf 0} \tag{10}$$

is

$${\bf x}={\bf 0}\;,$$

which happens to be an exact solution. Of course it is not necessarily the only one (any vector in the null space of $A$ is also a solution, by definition), but it is obviously the one with the smallest norm.

Thus, ${\bf x}={\bf 0}$ is the minimum-norm solution to any homogeneous linear system. Although correct, this solution is not too interesting. In many applications, what is desired is a _nonzero_ vector ${\bf x}$ that satisfies the system (10) as well as possible. Without any constraints on ${\bf x}$, we would fall back to ${\bf x}={\bf 0}$ again. For homogeneous linear systems, the meaning of a least-squares solution is therefore usually modified, once more, by imposing the constraint

$$\|{\bf x}\|=1$$

on the solution. Unfortunately, the resulting constrained minimization problem does not necessarily admit a _unique_ solution. The following theorem provides a recipe for finding this solution, and shows that there is in general a whole hypersphere of solutions.

**Theorem 4.1**: _Let_

$$A=U\Sigma V^{T}$$

_be the singular value decomposition of $A$. Furthermore, let ${\bf v}_{n-k+1},\ldots,{\bf v}_{n}$ be the $k$ columns of $V$ whose corresponding singular values are equal to the last singular value $\sigma_{n}$, that is, let $k$ be the largest integer such that_

$$\sigma_{n-k+1}=\ldots=\sigma_{n}\;.$$

_Then, all vectors of the form_

$${\bf x}=\alpha_{1}{\bf v}_{n-k+1}+\ldots+\alpha_{k}{\bf v}_{n} \tag{11}$$

_with_

$$\alpha_{1}^{2}+\ldots+\alpha_{k}^{2}=1 \tag{12}$$

_are unit-norm least squares solutions to the homogeneous linear system_

$$A{\bf x}={\bf 0},$$

_that is, they achieve the_

$$\min_{\|{\bf x}\|=1}\|A{\bf x}\|\;.$$

Note: When $\sigma_{n}$ is greater than zero, the most common case is $k=1$, since it is very unlikely that different singular values have _exactly_ the same numerical value. When $A$ is rank deficient, on the other hand, it may often have more than one singular value equal to zero. In any event, if $k=1$, then the minimum-norm solution is unique, ${\bf x}={\bf v}_{n}$. If $k>1$, the theorem above shows how to express _all_ solutions as a linear combination of the last $k$ columns of $V$.
**Proof.** The reasoning is very similar to that for the previous theorem. The unit-norm least-squares solution to

$$A{\bf x}={\bf 0}$$

is the vector ${\bf x}$ with $\|{\bf x}\|=1$ that minimizes

$$\|A{\bf x}\|$$

that is,

$$\|U\Sigma V^{T}{\bf x}\|\;.$$

Since orthogonal matrices do not change the norm of vectors they are applied to (theorem 1.2), this norm is the same as

$$\|\Sigma V^{T}{\bf x}\|$$

or, with ${\bf y}=V^{T}{\bf x}$,

$$\|\Sigma{\bf y}\|\;.$$

Since $V$ is orthogonal, $\|{\bf x}\|=1$ translates to $\|{\bf y}\|=1$. We thus look for the unit-norm vector ${\bf y}$ that minimizes the norm (squared) of $\Sigma{\bf y}$, that is,

$$\sigma_{1}^{2}y_{1}^{2}+\ldots+\sigma_{n}^{2}y_{n}^{2}\;.$$

This is obviously achieved by concentrating all the (unit) mass of ${\bf y}$ where the $\sigma$s are smallest, that is by letting

$$y_{1}=\ldots=y_{n-k}=0. \tag{13}$$

From ${\bf y}=V^{T}{\bf x}$ we obtain ${\bf x}=V{\bf y}=y_{1}{\bf v}_{1}+\ldots+y_{n}{\bf v}_{n}$, so that equation (13) is equivalent to equation (11) with $\alpha_{1}=y_{n-k+1},\ldots,\alpha_{k}=y_{n}$, and the unit-norm constraint on ${\bf y}$ yields equation (12). $\Delta$
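Theorem 4.1 in practice (my own sketch): the unit-norm minimizer of $\|A{\bf x}\|$ is the last right singular vector ${\bf v}_{n}$, and the minimum value achieved is $\sigma_{n}$.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A)
x = Vt[-1]                      # v_n: last right singular vector, unit norm

assert np.isclose(np.linalg.norm(x), 1.0)
assert np.isclose(np.linalg.norm(A @ x), s[-1])   # achieves sigma_n
# Spot check: no random unit vector does better
for _ in range(200):
    y = rng.standard_normal(4)
    y /= np.linalg.norm(y)
    assert np.linalg.norm(A @ y) >= np.linalg.norm(A @ x) - 1e-9
```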
BIN
documents/pdfs/test_linear_algebra.pdf
Normal file
Binary file not shown.
@ -123,12 +123,16 @@ with gr.Blocks() as main_tab:
     with gr.Row():
         with gr.Column(scale=12):
             file_input = gr.File(label="Select a PDF file")
-            math_checkbox = gr.Checkbox(label="Enable math mode (your pdf file will be converted to some latex-like format for the chatbot to understand it better)")
+            math_checkbox = gr.Checkbox(label="Interpret as LaTeX (a latex version will be created then given to "
+                                              "the chatbot, the conversion take some time)")
         with gr.Column():
             with gr.Group():
-                chatbot = gr.Chatbot(scale=2)
-                msg = gr.Textbox(scale=2)
+                chatbot = gr.Chatbot(scale=2,
+                                     latex_delimiters=[{"left": "$$", "right": "$$", "display": True},
+                                                       {"left": "$", "right": "$", "display": False}])
+                msg = gr.Textbox(label="User message", scale=2)

             msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
                 bot, chatbot, chatbot
             )