author Yuchen Pei <me@ypei.me> 2019-02-18 10:12:40 +0100
committer Yuchen Pei <me@ypei.me> 2019-02-18 10:12:40 +0100
commit 30936eea34d55dcd6ce09770dae9693f6759bb9a
tree 4c6d784af475eed2fa9298f905145b1ede8fda14 /posts/2019-02-14-raise-your-elbo.md
parent b779a077058fd7fb45d57a1fd091fdb538f40128
fixed some typos
Diffstat (limited to 'posts/2019-02-14-raise-your-elbo.md')
-rw-r--r-- posts/2019-02-14-raise-your-elbo.md | 11
1 files changed, 6 insertions, 5 deletions
diff --git a/posts/2019-02-14-raise-your-elbo.md b/posts/2019-02-14-raise-your-elbo.md
index 5b789aa..0a3e6ed 100644
--- a/posts/2019-02-14-raise-your-elbo.md
+++ b/posts/2019-02-14-raise-your-elbo.md
@@ -38,7 +38,7 @@ slides](https://www.cs.tau.ac.il/~rshamir/algmb/presentations/EM-BW-Ron-16%20.pd
(clear explanations of the connection between EM and Baum-Welch),
Chapter 10 of [Bishop\'s
book](https://www.springer.com/us/book/9780387310732) (brilliant
-introduction to variational GMM) and Section 2.5 of [Sudderth\'s
+introduction to variational GMM), Section 2.5 of [Sudderth\'s
thesis](http://cs.brown.edu/~sudderth/papers/sudderthPhD.pdf) and
[metacademy](https://metacademy.org). Also thanks to Josef Lindman
Hörnlund for discussions. The research was done while working at KTH
@@ -77,7 +77,7 @@ To this end, we can simply discard $D(q || p)$ in (1) and obtain:
$$\log Z \ge L(w, q) \qquad (1.3)$$
-and keep in mind that the inequality becomes equality when
+and keep in mind that the inequality becomes an equality when
$q = {w \over Z}$.
It is time to define the task of variational inference (VI), also known
@@ -692,8 +692,9 @@ complicated, and we do not consider it this way here.
Plugging in (9.1) we obtain the updates at E-step
-$$r_{\ell i k} \propto \exp(\psi(\phi^{\pi_\ell}_k) + \psi(\phi^{\eta_k}_{x_{\ell i}}) - \psi(\sum_w \phi^{\eta_k}_w)). \qquad (10)$$
+$$r_{\ell i k} \propto \exp(\psi(\phi^{\pi_\ell}_k) + \psi(\phi^{\eta_k}_{x_{\ell i}}) - \psi(\sum_w \phi^{\eta_k}_w)), \qquad (10)$$
+where $\psi$ is the digamma function.
Similarly, plugging in (9.3)(9.7)(9.9), at M-step, we update the
posterior of $\pi$ and $\eta$:
@@ -747,7 +748,7 @@ Both terms are infinite series:
$$L(p, q) = \sum_{k = 1 : \infty} \mathbb E_{q(\theta_k)} \log {p(\theta_k) \over q(\theta_k)} + \sum_{i = 1 : m} \sum_{k = 1 : \infty} q(z_i = k) \mathbb E_{q(\theta)} \log {p(x_i, z_i = k | \theta) \over q(z_i = k)}.$$
-There are several solutions to deal with the infinities. One is to set
+There are several ways to deal with the infinities. One is to fix some level $T > 0$ and set
$v_T = 1$ almost surely (Blei-Jordan 2006). This effectively turns the
model into a finite one, and both terms become finite sums over
$k = 1 : T$.
@@ -911,7 +912,7 @@ As an example, here\'s SVI applied to LDA:
$(\phi^{\eta_k}_w)_{k = 1 : n_z, w = 1 : n_x}$:
$$\phi^{\eta_k}_w = (1 - \rho_t) \phi^{\eta_k}_w + \rho_t \tilde \phi^{\eta_k}_w$$
-6. Increment $t$ and go back to Step 1.
+6. Increment $t$ and go back to Step 2.
In the original paper, $\rho_t$ needs to satisfy some conditions that
guarantees convergence of the global parameters: