Notes of Advanced Machine Learning(I)

1. Probabilistic Model
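There are two different ways to represent a distribution over several random variables:
- product of conditional probabilities: \(p(x_1,x_2,x_3,x_4)=p(x_4)\,p(x_3|x_4)\,p(x_2|x_3,x_4)\,p(x_1|x_2,x_3,x_4)\)
- global energy function:
\[p(x_1,x_2,x_3,x_4)=\frac{1}{Z}e^{-E(x_1,x_2,x_3,x_4)},\]

where \(Z=\sum_{x_1,\dots,x_4}e^{-E(x_1,x_2,x_3,x_4)}\) is the partition function, which normalizes the distribution.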
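To see that the two representations can describe the same distribution, here is a minimal NumPy sketch for four binary variables. The energy function `energy` and its weights are hypothetical, chosen only for illustration. The sketch builds the distribution from the energy function, computing \(Z\) by enumerating all \(2^4\) states, and then rebuilds every probability as the product \(p(x_4)\,p(x_3|x_4)\,p(x_2|x_3,x_4)\,p(x_1|x_2,x_3,x_4)\), with each conditional computed as a ratio of marginals.

```python
import itertools

import numpy as np

# Hypothetical energy over four binary variables x = (x1, x2, x3, x4);
# the weights are arbitrary and only serve as an illustration.
def energy(x):
    x1, x2, x3, x4 = x
    return -(1.0 * x1 * x2 + 0.5 * x2 * x3 - 0.7 * x3 * x4 + 0.2 * x1)

states = list(itertools.product([0, 1], repeat=4))

# (2) Global energy function: p(x) = exp(-E(x)) / Z, with Z summed over all states.
Z = sum(np.exp(-energy(x)) for x in states)
joint = {x: np.exp(-energy(x)) / Z for x in states}

def suffix_marginal(tail):
    """Marginal probability that the last len(tail) variables equal `tail`."""
    k = len(tail)
    return sum(p for x, p in joint.items() if x[4 - k:] == tail)

# (1) Product of conditionals p(x4) p(x3|x4) p(x2|x3,x4) p(x1|x2,x3,x4),
# where p(x_i | x_{i+1}, ..., x4) = p(x_i, ..., x4) / p(x_{i+1}, ..., x4).
def chain_rule(x):
    prob = 1.0
    for k in range(1, 5):  # k = number of trailing variables in the numerator
        # For k = 1 the denominator is the empty tail, whose marginal is 1.
        prob *= suffix_marginal(x[4 - k:]) / suffix_marginal(x[5 - k:])
    return prob

print(sum(joint.values()))  # should be 1 up to rounding
print(all(np.isclose(chain_rule(x), joint[x]) for x in states))
```

Brute-force enumeration is only feasible here because there are 16 states; the number of terms in \(Z\) doubles with every extra binary variable.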

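Directed graphical models (Bayesian networks) use conditional probabilities, while undirected graphical models (Markov random fields, Boltzmann machines) use energy functions that are a sum of several terms. A deep belief net (DBN) is a hybrid model: its top two layers form an undirected associative memory, while the connections down to the lower layers are directed.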

Directed Graphs

Directed graphs are useful for expressing causal relationships between random variables.
- The joint distribution defined by the graph is given by the product of a conditional distribution for each node conditioned on its parents.
- For example, the joint distribution over \(x_1,\dots,x_7\) factorizes as
\[p(\mathbf{x})=p(x_1)\,p(x_2)\,p(x_3)\,p(x_4|x_1,x_2,x_3)\,p(x_5|x_1,x_3)\,p(x_6|x_4)\,p(x_7|x_4,x_5)\]
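A minimal NumPy sketch of this factorization, assuming all seven variables are binary and using randomly generated conditional probability tables (the names `parents`, `cpt`, `joint_prob` and `ancestral_sample` are illustrative, not from the source). Because every factor is a normalized conditional distribution, the product sums to 1 over all \(2^7\) states without any partition function, and the graph can be sampled node by node once the parents of each node are known.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

# Parents of each node in the example factorization (nodes 1..7).
parents = {1: (), 2: (), 3: (), 4: (1, 2, 3), 5: (1, 3), 6: (4,), 7: (4, 5)}

# Hypothetical conditional probability tables for binary variables:
# cpt[i][parent_values] = p(x_i = 1 | parents of x_i).
cpt = {i: {pa: rng.uniform(0.05, 0.95)
           for pa in itertools.product([0, 1], repeat=len(ps))}
       for i, ps in parents.items()}

def joint_prob(x):
    """p(x) as the product of one conditional per node; x maps node -> 0/1."""
    prob = 1.0
    for i, ps in parents.items():
        p1 = cpt[i][tuple(x[j] for j in ps)]
        prob *= p1 if x[i] == 1 else 1.0 - p1
    return prob

# Each factor is normalized, so the joint sums to 1 with no partition function.
total = sum(joint_prob(dict(zip(range(1, 8), values)))
            for values in itertools.product([0, 1], repeat=7))
print(total)

def ancestral_sample():
    """Sample each node after its parents; 1..7 is a topological order here."""
    x = {}
    for i in range(1, 8):
        p1 = cpt[i][tuple(x[j] for j in parents[i])]
        x[i] = int(rng.random() < p1)
    return x

print(ancestral_sample())
```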

Markov Random Fields

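\[p(\mathbf{x})=\frac{1}{Z}\prod_{c}\psi_c(\mathbf{x}_c),\qquad Z=\sum_{\mathbf{x}}\prod_{c}\psi_c(\mathbf{x}_c),\]
where \(c\) ranges over the cliques of the graph and \(\psi_c\) is the potential function of clique \(c\).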
- Each potential function is a mapping from joint configurations of random variables in a clique to non-negative real numbers.
- The choice of potential functions is not restricted to having specific probabilistic interpretations.
- Potential functions are often represented as exponentials:
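\[p(\mathbf{x})=\frac{1}{Z}\prod_{c}\psi_c(\mathbf{x}_c)=\frac{1}{Z}\exp\Big(-\sum_{c}E(\mathbf{x}_c)\Big)=\frac{1}{Z}\exp\big(-E(\mathbf{x})\big)\]
(Boltzmann distribution), where \(\psi_c(\mathbf{x}_c)=\exp(-E(\mathbf{x}_c))\) and \(E(\mathbf{x})=\sum_c E(\mathbf{x}_c)\).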
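- Computing \(Z\) is very hard in general, because it sums over every joint configuration of the variables (exponentially many in the number of variables); this is a major limitation of undirected models.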
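The sketch below assumes a hypothetical pairwise MRF: a 3×3 grid of ±1 variables with one potential per grid edge and an arbitrary coupling `J`. It checks that the product of potentials and the Boltzmann form give the same probabilities, and computes \(Z\) by brute force. Enumeration needs \(2^9=512\) terms for this tiny grid, which is why exact computation of \(Z\) quickly stops being an option as the model grows.

```python
import itertools

import numpy as np

# Hypothetical pairwise MRF: 3x3 grid of +/-1 variables, one clique per grid edge.
rows, cols = 3, 3
nodes = [(r, c) for r in range(rows) for c in range(cols)]
edges = ([((r, c), (r, c + 1)) for r in range(rows) for c in range(cols - 1)]
         + [((r, c), (r + 1, c)) for r in range(rows - 1) for c in range(cols)])
J = 0.5  # arbitrary coupling strength

def clique_energy(xi, xj):
    """E(x_c) for an edge clique: lower energy when neighbours agree (J > 0)."""
    return -J * xi * xj

def potential(xi, xj):
    """psi_c(x_c) = exp(-E(x_c)), always positive."""
    return np.exp(-clique_energy(xi, xj))

def product_of_potentials(x):
    return np.prod([potential(x[a], x[b]) for a, b in edges])

def total_energy(x):
    return sum(clique_energy(x[a], x[b]) for a, b in edges)

configs = [dict(zip(nodes, values))
           for values in itertools.product([-1, 1], repeat=len(nodes))]

# Partition function by brute force: one term per joint configuration.
Z = sum(product_of_potentials(x) for x in configs)

p_product = np.array([product_of_potentials(x) / Z for x in configs])
p_boltzmann = np.array([np.exp(-total_energy(x)) / Z for x in configs])
print(len(configs), np.allclose(p_product, p_boltzmann), np.isclose(p_product.sum(), 1.0))
```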
2. Singular Value Decomposition
Singular Value Decomposition (SVD) is a factorization of a real or complex matrix. Formally, the singular value decomposition of an \(m\times n\) matrix \(M\) is a factorization of the form

\[M=U\Sigma V^*\]

where \(U\) is an \(m\times m\) unitary matrix, \(\Sigma\) is an \(m\times n\) rectangular diagonal matrix with non-negative real numbers on the diagonal, and \(V^*\) is an \(n\times n\) unitary matrix. \(V^*\) denotes the conjugate transpose of \(V\), \((V^*)_{ij}=\overline{V_{ji}}\); for a real matrix it is simply the transpose.

A complex square matrix \(U\) is unitary if \(U^*U=UU^*=I\).

The diagonal entries \(\sigma_i=\Sigma_{ii}\) of \(\Sigma\) are known as the singular values of \(M\); their squares \(\sigma_i^2\) are the eigenvalues of \(M^*M\) when \(m\ge n\), or of \(MM^*\) when \(m\le n\). The \(m\) columns of \(U\) and the \(n\) columns of \(V\) are called the left-singular vectors and right-singular vectors of \(M\), respectively.
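These definitions can be checked numerically. The sketch below assumes NumPy's `numpy.linalg.svd`, which returns \(U\), the singular values as a 1-D array `s` (sorted in descending order), and \(V^*\) directly (named `Vh`); the random complex \(4\times 3\) test matrix is arbitrary. It rebuilds the rectangular \(\Sigma\), checks \(M=U\Sigma V^*\) and that \(U\) and \(V\) are unitary, and, since \(m>n\) here, compares the singular values with the square roots of the eigenvalues of \(M^*M\).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
# Arbitrary complex m x n test matrix.
M = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# full_matrices=True gives U (m x m) and Vh = V* (n x n); s holds the min(m, n) singular values.
U, s, Vh = np.linalg.svd(M, full_matrices=True)
V = Vh.conj().T

# Rebuild the m x n rectangular diagonal matrix Sigma.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(M, U @ Sigma @ Vh))          # M = U Sigma V*
print(np.allclose(U.conj().T @ U, np.eye(m)))  # U*U = I
print(np.allclose(V.conj().T @ V, np.eye(n)))  # V*V = I
print(np.all(s >= 0))                          # singular values are non-negative

# Here m > n, so the eigenvalues of M*M (n x n) are exactly s**2.
eigs = np.linalg.eigvalsh(M.conj().T @ M)[::-1]  # eigvalsh sorts ascending; reverse it
print(np.allclose(s, np.sqrt(eigs)))
```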

The SVD and the eigendecomposition are closely related:
- The left-singular vectors of \(M\) (columns of \(U\)) are eigenvectors of \(MM^*\).
- The right-singular vectors of \(M\) (columns of \(V\)) are eigenvectors of \(M^*M\).
- The non-zero singular values of \(M\) (diagonal entries of \(\Sigma\)) are the square roots of the non-zero eigenvalues of both \(M^*M\) and \(MM^*\).
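A similar NumPy sketch for the three relations above, assuming an arbitrary random real \(5\times 3\) matrix (so \(M^*=M^T\)). Each column of \(V\) satisfies \((M^TM)v_i=\sigma_i^2v_i\); each column of \(U\) satisfies \((MM^T)u_i=\lambda_iu_i\), where \(\lambda\) is \(\sigma^2\) padded with \(m-n\) zeros; and the non-zero eigenvalues of the two matrices coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
M = rng.standard_normal((m, n))  # real example, so M* is just M.T

U, s, Vh = np.linalg.svd(M)      # U: 5x5, s: 3 values, Vh: 3x3
V = Vh.T

# Right-singular vectors: (M^T M) v_i = s_i^2 v_i for every column v_i of V.
print(np.allclose(M.T @ M @ V, V * s**2))

# Left-singular vectors: (M M^T) u_i = lam_i u_i, where lam = s^2 padded with
# m - n zeros (the extra columns of U span the null space of M^T).
lam = np.concatenate([s**2, np.zeros(m - n)])
print(np.allclose(M @ M.T @ U, U * lam))

# Non-zero eigenvalues of M^T M and M M^T coincide and equal s^2.
eig_small = np.linalg.eigvalsh(M.T @ M)[::-1]
eig_large = np.linalg.eigvalsh(M @ M.T)[::-1]
print(np.allclose(eig_small, s**2), np.allclose(eig_large[:n], s**2))
```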

References:

U Toronto CSC2535: http://www.cs.toronto.edu/~hinton/csc2535/lectures.html
