Linear Algebra Review
Notes on Andrew Ng’s Machine Learning: (2) Linear Algebra Review
Matrices and Vectors
- Matrices are 2-dimensional arrays:
$$
\left[\begin{array}{ccc}
a & b & c \\
d & e & f \\
g & h & i \\
j & k & l
\end{array}\right]
$$
The above matrix has four rows and three columns, so it is a 4 x 3 matrix.
- Vectors are matrices with one column and many rows:
$$
\left[\begin{array}{c}
w \\
x \\
y \\
z
\end{array}\right]
$$
The above vector is a 4 x 1 matrix.
Notation and terms
- $A_{ij}$ refers to the element in the ith row and jth column of matrix A.
- A vector with $n$ rows is referred to as an $n$-dimensional vector.
- $v_i$ refers to the element in the ith row of the vector.
- In general, all our vectors and matrices will be 1-indexed, meaning that indices start from 1. Note that this differs from many programming languages, which start from 0.
- Matrices are usually denoted by uppercase names, while vectors are lowercase.
- Scalar means that an object is a single value, not a vector or matrix.
- $\mathbb{R}$ refers to the set of scalar real numbers.
- $\mathbb{R}^n$ refers to the set of n-dimensional vectors of real numbers.
In Octave/Matlab:
% The ; denotes we are going back to a new row.
Output:
A =
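The notation above can also be tried out in plain Python. This is a minimal sketch rather than the Octave used in these notes: it assumes a matrix is stored as a list of rows, and the helper names `shape` and `element` as well as the numeric values are made up for illustration. It mainly shows how the 1-indexed $A_{ij}$ maps onto a 0-indexed language.

```python
# A 4 x 3 matrix stored as a list of rows; the values are made up for illustration.
A = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]

# A 4-dimensional vector, i.e. a 4 x 1 matrix, stored as a flat list.
v = [10, 20, 30, 40]


def shape(matrix):
    """Return (rows, columns) of a rectangular list-of-rows matrix."""
    return len(matrix), len(matrix[0])


def element(matrix, i, j):
    """Return A_ij using the 1-indexed convention of these notes."""
    return matrix[i - 1][j - 1]


print(shape(A))          # (4, 3)
print(element(A, 3, 2))  # 8: third row, second column
print(v[1 - 1])          # 10: v_1 lives at Python index 0
```

The `i - 1` and `j - 1` shifts are the only place where the two indexing conventions meet.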
Addition and Scalar Multiplication
Addition
Addition and subtraction are element-wise, so you simply add or subtract each corresponding element:
$$
\left[\begin{array}{cc}
a & b \\
c & d
\end{array}\right]
+
\left[\begin{array}{cc}
w & x \\
y & z
\end{array}\right]
=
\left[\begin{array}{cc}
a+w & b+x \\
c+y & d+z
\end{array}\right]
$$
To add or subtract two matrices, their dimensions must be the same.
Scalar multiplication
In scalar multiplication, we simply multiply every element by the scalar value:
$$
\left[\begin{array}{cc}
a & b \\
c & d
\end{array}\right]
*x
=
\left[\begin{array}{cc}
ax & bx \\
cx & dx
\end{array}\right]
$$
In Octave/Matlab:
% Initialize matrix A and B
Output:
A =
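As a rough illustration of both operations, here is a plain-Python sketch (not the Octave of these notes). The function names `mat_add` and `scalar_mul` and the sample values are assumptions chosen for the example; matrices are lists of rows.

```python
def mat_add(A, B):
    """Element-wise sum of two matrices with identical dimensions."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same dimensions")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mul(A, x):
    """Multiply every element of A by the scalar x."""
    return [[a * x for a in row] for row in A]


A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]

print(mat_add(A, B))     # [[11, 22], [33, 44]]
print(scalar_mul(A, 3))  # [[3, 6], [9, 12]]
```

The dimension check mirrors the rule that only matrices of the same size can be added or subtracted.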
Matrix-Vector Multiplication
We map the column of the vector onto each row of the matrix, multiplying each element and summing the result.
$$
\left[\begin{array}{cc}
a & b \\
c & d \\
e & f
\end{array}\right]
*
\left[\begin{array}{c}
x \\
y
\end{array}\right]
=
\left[\begin{array}{c}
ax + by \\
cx + dy \\
ex + fy
\end{array}\right]
$$
The result is a vector. The number of columns of the matrix must equal the number of rows of the vector.
An m x n matrix multiplied by an n x 1 vector results in an m x 1 vector.
In Octave/Matlab:
% Initialize matrix A
Output:
A =
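The row-by-row rule can be sketched in plain Python as follows. This is only an illustration under the assumption that a matrix is a list of rows and a vector is a flat list; the name `mat_vec` and the sample values are hypothetical.

```python
def mat_vec(A, v):
    """Multiply an m x n matrix by an n-dimensional vector.

    Each row of A is paired with v, the pairs are multiplied and summed,
    and the m sums form the resulting m-dimensional vector.
    """
    if any(len(row) != len(v) for row in A):
        raise ValueError("columns of the matrix must equal rows of the vector")
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[1, 2], [3, 4], [5, 6]]  # 3 x 2
v = [2, 1]                    # 2 x 1

print(mat_vec(A, v))  # [4, 10, 16], a 3 x 1 result
```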
Neat Trick
Say we have a set of four house sizes and a hypothesis for predicting the price of a house. We are going to compute $h_\theta(x)$ for each of our 4 houses:
House sizes:
$$
\begin{array}{c}
2104 \\
1416 \\
1534 \\
852
\end{array}
$$
Hypothesis:
$$
h_\theta(x)=-40+0.25x
$$
It turns out there’s a neat way of posing this: applying the hypothesis to all of the houses at the same time via a matrix-vector multiplication.
- Construct a DataMatrix:
$$
\textrm{DataMatrix}=
\left[\begin{array}{cc}
1 & 2104 \\
1 & 1416 \\
1 & 1534 \\
1 & 852
\end{array}\right]
$$
- Put the Parameters into a vector:
$$
\textrm{Parameters}=
\left[\begin{array}{c}
-40 \\
0.25
\end{array}\right]
$$
- Then the Predictions are obtained by computing a matrix-vector multiplication:
$$
\begin{array}{ccccc}
\textrm{Predictions} & = & \textrm{DataMatrix} & * & \textrm{Parameters} \\
& = & \left[\begin{array}{cc}
1 & 2104 \\
1 & 1416 \\
1 & 1534 \\
1 & 852
\end{array}\right] & * & \left[\begin{array}{c}
-40 \\
0.25
\end{array}\right]
\end{array}
$$
The result will be something like this:
$$
\textrm{Predictions}=
\left[\begin{array}{c}
-40 \times 1 + 0.25 \times 2104 \\
-40 \times 1 + 0.25 \times 1416 \\
\vdots
\end{array}\right]
$$
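Carrying the arithmetic through for all four houses gives the following values (a worked check using only the sizes and parameters given above):
$$
\textrm{Predictions}=
\left[\begin{array}{c}
-40 + 0.25 \times 2104 \\
-40 + 0.25 \times 1416 \\
-40 + 0.25 \times 1534 \\
-40 + 0.25 \times 852
\end{array}\right]
=
\left[\begin{array}{c}
486 \\
314 \\
343.5 \\
173
\end{array}\right]
$$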
This is equivalent to the following code:
for (i = 0; i < X.size(); i++) {
However, the new trick simplifies the code and makes it more readable, and in most programming languages it also runs faster. We just construct the data matrix and the parameter vector and do one multiplication:
DataMatrix = [...]
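To make the comparison concrete, here is a small plain-Python sketch (not the Octave of these notes) that computes the predictions both ways. The variable names are assumptions for illustration, and with plain lists the second form is not faster; the speed benefit depends on an optimized linear algebra library.

```python
sizes = [2104, 1416, 1534, 852]
theta = [-40, 0.25]  # parameters of h(x) = -40 + 0.25x

# Loop version: apply the hypothesis to one house at a time.
loop_predictions = []
for x in sizes:
    loop_predictions.append(theta[0] + theta[1] * x)

# Matrix-vector version: build the data matrix with a leading column of 1s,
# then take the product of each row with the parameter vector.
data_matrix = [[1, x] for x in sizes]
predictions = [sum(a * t for a, t in zip(row, theta)) for row in data_matrix]

print(predictions)                      # [486.0, 314.0, 343.5, 173.0]
print(predictions == loop_predictions)  # True
```

The column of 1s is what lets the intercept term $-40$ take part in the same multiplication as the slope.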
Matrix-Matrix Multiplication
We multiply two matrices by breaking the second matrix into its column vectors, multiplying the first matrix by each of them, and concatenating the results.
$$
\left[\begin{array}{cc}
a & b \\
c & d \\
e & f
\end{array}\right]
*
\left[\begin{array}{cc}
w & x \\
y & z
\end{array}\right]
=
\left[\begin{array}{cc}
aw+by & ax+bz \\
cw+dy & cx+dz \\
ew+fy & ex+fz
\end{array}\right]
$$
An m x n matrix multiplied by an n x o matrix results in an m x o matrix ($[m \times n]*[n \times o]=[m \times o]$). In the above example, a 3 x 2 matrix times a 2 x 2 matrix resulted in a 3 x 2 matrix.
To multiply two matrices, the number of columns of the first matrix must equal the number of rows of the second matrix.
In Octave/Matlab:
A = [1, 2; 3, 4; 5, 6]
Output:
A =
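The column-by-column view of matrix multiplication can be sketched in plain Python as below. This is an illustrative sketch, not the Octave of these notes: `mat_mul` is a hypothetical helper, `A` matches the 3 x 2 matrix above, and `B` is a made-up 2 x 2 matrix.

```python
def mat_mul(A, B):
    """Multiply an m x n matrix by an n x o matrix, one column of B at a time."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    # Split B into its column vectors.
    columns = [[B[k][j] for k in range(n)] for j in range(len(B[0]))]
    # Each matrix-vector product A * column yields one column of the result.
    result_columns = [
        [sum(a * b for a, b in zip(row, col)) for row in A] for col in columns
    ]
    # Concatenate the result columns side by side to get the rows of the product.
    return [list(row) for row in zip(*result_columns)]


A = [[1, 2], [3, 4], [5, 6]]  # 3 x 2
B = [[1, 2], [3, 4]]          # 2 x 2, made-up values

print(mat_mul(A, B))  # [[7, 10], [15, 22], [23, 34]], a 3 x 2 result
```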
Neat Trick
Let’s say, as before, that we have four houses and we want to predict their prices. Only now, we have three competing hypotheses, and we want to apply all three of them to all four values of x. It turns out we can do that very efficiently using a matrix-matrix multiplication.
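A plain-Python sketch of this idea follows. The first hypothesis is the one from the earlier example, $h_\theta(x)=-40+0.25x$; the other two sets of parameters are assumed values chosen only for illustration, and `mat_mul` is a hypothetical helper.

```python
def mat_mul(A, B):
    """Multiply two list-of-rows matrices (columns of A must equal rows of B)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


sizes = [2104, 1416, 1534, 852]
data_matrix = [[1, x] for x in sizes]  # 4 x 2

# Each column holds the parameters of one hypothesis.
# Column 1 is h(x) = -40 + 0.25x; columns 2 and 3 are assumed values.
parameters = [
    [-40, 200, -150],
    [0.25, 0.1, 0.4],
]  # 2 x 3

# 4 x 3 result: row i is house i, column j is the prediction of hypothesis j.
predictions = mat_mul(data_matrix, parameters)

for row in predictions:
    print([round(p, 1) for p in row])
```

A single 4 x 2 by 2 x 3 multiplication yields all twelve predictions at once.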
Matrix Multiplication Properties
Non-commutative
Matrix multiplication is not commutative in general:
$$
A \times B \neq B \times A
$$
Associative
Matrix multiplication is associative:
$$
(A \times B) \times C = A \times (B \times C)
$$
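Both properties can be checked on a small example. The sketch below uses plain Python with made-up 2 x 2 integer matrices and a hypothetical `mat_mul` helper. It demonstrates the properties for these particular values and is not a proof.

```python
def mat_mul(A, B):
    """Multiply two list-of-rows matrices (columns of A must equal rows of B)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]

# Non-commutative: swapping the order changes the result.
print(mat_mul(A, B))  # [[2, 1], [4, 3]]
print(mat_mul(B, A))  # [[3, 4], [1, 2]]

# Associative: the grouping does not matter.
print(mat_mul(mat_mul(A, B), C) == mat_mul(A, mat_mul(B, C)))  # True
```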
Identity matrix
Identity matrix: a matrix that simply has 1’s on the diagonal (upper left to lower right) and 0’s elsewhere.
$$
I=\left[\begin{array}{ccc}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{array}\right]
$$
The identity matrix, when multiplied by any matrix of compatible dimensions, results in the original matrix. It’s just like multiplying numbers by 1.
$$
A \times I = I \times A = A
$$
Notice that when computing A*I, the size of I should match the number of columns of A, and when computing I*A, the size of I should match the number of rows of A:
$$
A_{m \times n} \times I_{n \times n}=I_{m \times m} \times A_{m \times n} = A_{m \times n}
$$
In Octave/Matlab:
% Initialize random matrices A and B
Output:
A =
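The size rule for the identity matrix can be illustrated with a plain-Python sketch. The helpers `identity` and `mat_mul` and the values of `A` are assumptions for the example, not the Octave code of these notes.

```python
def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Multiply two list-of-rows matrices (columns of A must equal rows of B)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2], [3, 4], [5, 6]]  # 3 x 2

# A * I needs the 2 x 2 identity (matches the columns of A).
print(mat_mul(A, identity(2)) == A)  # True
# I * A needs the 3 x 3 identity (matches the rows of A).
print(mat_mul(identity(3), A) == A)  # True
```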
Inverse and Transpose
Inverse
The inverse of a matrix $A$ is denoted $A^{-1}$. Multiplying by the inverse results in the identity matrix:
$$
A_{m \times m} \times A^{-1}_{m \times m}=A^{-1}_{m \times m} \times A_{m \times m} = I_{m \times m}
$$
A non-square matrix does not have an inverse. In Octave and Matlab, inv(A) computes the inverse of a square matrix and pinv(A) computes the pseudoinverse. Matrices that don’t have an inverse are called singular or degenerate.
In practice, when using the normal equation in Octave, there are two functions for inverting a matrix: pinv and inv. Because pinv(A) computes the pseudoinverse, it still returns a usable result even if A is non-invertible.
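To make the definition concrete, here is a plain-Python sketch restricted to the 2 x 2 case. It uses the standard closed-form 2 x 2 inverse based on the determinant $ad-bc$, which these notes do not cover, so treat it as a side illustration. It is not how inv or pinv are implemented, and the helper names and sample matrices are made up.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Invert a 2 x 2 matrix with the closed-form formula, using exact fractions."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: no inverse exists")
    det = Fraction(det)
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_mul(A, B):
    """Multiply two list-of-rows matrices (columns of A must equal rows of B)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[4, 7], [2, 6]]
A_inv = inverse_2x2(A)

# Multiplying in either order gives the 2 x 2 identity matrix.
print(mat_mul(A, A_inv) == [[1, 0], [0, 1]])  # True
print(mat_mul(A_inv, A) == [[1, 0], [0, 1]])  # True

# A singular (degenerate) matrix: the second row is twice the first.
try:
    inverse_2x2([[1, 2], [2, 4]])
except ValueError as err:
    print(err)  # matrix is singular: no inverse exists
```

Exact fractions are used so that the comparison with the identity matrix is not affected by floating-point rounding.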
Transpose
Transposing a matrix is like rotating it 90° clockwise and then reversing each row (mirroring it left to right).
In other words: Let $A$ be an $m \times n$ matrix, and let $B=A^T$. Then $B$ is an $n \times m$ matrix, and $B_{ij}=A_{ji}$.
$$
A=
\left[\begin{array}{cc}
a & b \\
c & d \\
e & f
\end{array}\right]
\qquad
A^T=
\left[\begin{array}{ccc}
a & c & e \\
b & d & f
\end{array}\right]
$$
We can compute the transpose of a matrix in Matlab with the transpose(A) function or with A' (for real-valued matrices the two give the same result).
In Octave/Matlab:
% Initialize matrix A
Output:
A =
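The definition $B_{ij}=A_{ji}$ and the rotate-then-reverse picture can both be checked with a short plain-Python sketch. The helper names and the values of `A` are assumptions for illustration, and matrices are stored as lists of rows.

```python
def transpose(A):
    """Return the transpose: the columns of A become the rows of the result."""
    return [list(col) for col in zip(*A)]


def rotate_clockwise(A):
    """Rotate a matrix 90 degrees clockwise."""
    return [list(row) for row in zip(*A[::-1])]


A = [[1, 2], [3, 4], [5, 6]]  # 3 x 2
B = transpose(A)              # 2 x 3

print(B)  # [[1, 3, 5], [2, 4, 6]]

# B_ij equals A_ji for every valid pair of indices.
print(all(B[i][j] == A[j][i] for i in range(2) for j in range(3)))  # True

# Rotating clockwise and then reversing each row gives the same matrix.
print([row[::-1] for row in rotate_clockwise(A)] == B)  # True
```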