From e9795c6b134eed858ddb73c036ff5c941d7e9838 Mon Sep 17 00:00:00 2001
From: Yuchen Pei
Date: Fri, 18 Jun 2021 17:47:12 +1000
Subject: Updated.

---
 posts/2019-02-14-raise-your-elbo.org | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

(limited to 'posts/2019-02-14-raise-your-elbo.org')

diff --git a/posts/2019-02-14-raise-your-elbo.org b/posts/2019-02-14-raise-your-elbo.org
index 9e15552..f0de7d1 100644
--- a/posts/2019-02-14-raise-your-elbo.org
+++ b/posts/2019-02-14-raise-your-elbo.org
@@ -47,6 +47,7 @@ under CC BY-SA and GNU FDL./
 ** KL divergence and ELBO
 :PROPERTIES:
 :CUSTOM_ID: kl-divergence-and-elbo
+:ID: 2bb0d405-f6b4-483f-9f2d-c0e945faa3ac
 :END:
 Let $p$ and $q$ be two probability measures. The Kullback-Leibler (KL)
 divergence is defined as
@@ -120,6 +121,7 @@ Bayesian version.
 ** EM
 :PROPERTIES:
 :CUSTOM_ID: em
+:ID: 6d694b38-56c2-4e10-8a1f-1f82e309073f
 :END:
 To illustrate the EM algorithms, we first define the mixture model.
@@ -198,6 +200,7 @@ model is:
 *** GMM
 :PROPERTIES:
 :CUSTOM_ID: gmm
+:ID: 5d5265f6-c2b9-42f1-a4a1-0d87417f0b02
 :END:
 Gaussian mixture model (GMM) is an example of mixture models.
@@ -240,6 +243,7 @@ $\epsilon I$ is called elliptical k-means algorithm.
 *** SMM
 :PROPERTIES:
 :CUSTOM_ID: smm
+:ID: f4b3a462-8ae7-44f2-813c-58b007eaa047
 :END:
 As a transition to the next models to study, let us consider a simpler
 mixture model obtained by making one modification to GMM: change
@@ -275,6 +279,7 @@ Dirichlet allocation (LDA), not to be confused with the other LDA
 *** pLSA
 :PROPERTIES:
 :CUSTOM_ID: plsa
+:ID: d4f58158-dcb6-4ba1-a9e2-bf53bff6012e
 :END:
 The pLSA model (Hoffman 2000) is a mixture model, where the dataset is
 now pairs $(d_i, x_i)_{i = 1 : m}$. In natural language processing, $x$
@@ -294,6 +299,7 @@ corresponds to type 2.
 **** pLSA1
 :PROPERTIES:
 :CUSTOM_ID: plsa1
+:ID: 969f470e-5bbe-464e-a3b7-f996c8f04de3
 :END:
 The pLSA1 model (Hoffman 2000) is basically SMM with $x_i$ substituted
 with $(d_i, x_i)$, which conditioned on $z_i$ are independently
@@ -340,6 +346,7 @@ dimensional embeddings $D_{u, \cdot}$ and $X_{w, \cdot}$.
 **** pLSA2
 :PROPERTIES:
 :CUSTOM_ID: plsa2
+:ID: eef3249a-c45d-4a07-876f-68b2a2e957e5
 :END:
 Let us turn to pLSA2 (Hoffman 2004), corresponding to (2.92). We
 rewrite it as
@@ -392,6 +399,7 @@ $$\begin{aligned}
 *** HMM
 :PROPERTIES:
 :CUSTOM_ID: hmm
+:ID: 16d00eda-7136-49f5-8427-c775c7a91317
 :END:
 The hidden markov model (HMM) is a sequential version of SMM, in the
 same sense that recurrent neural networks are sequential versions of
@@ -518,6 +526,7 @@ as ${(7) \over (8)}$ and ${(9) \over (8)}$ respectively.
 ** Fully Bayesian EM / MFA
 :PROPERTIES:
 :CUSTOM_ID: fully-bayesian-em-mfa
+:ID: 77f1d7ae-3785-45d4-b88f-18478e41f3b9
 :END:
 Let us now venture into the realm of full Bayesian.
@@ -567,6 +576,7 @@ e.g. Section 10.1 of Bishop 2006.
 *** Application to mixture models
 :PROPERTIES:
 :CUSTOM_ID: application-to-mixture-models
+:ID: 52bf6025-1180-44dc-8272-e6af6e228bf3
 :END:
 *Definition (Fully Bayesian mixture model)*. The relations between
 $\pi$, $\eta$, $x$, $z$ are the same as in the definition of mixture
@@ -658,6 +668,7 @@ until convergence.
 *** Fully Bayesian GMM
 :PROPERTIES:
 :CUSTOM_ID: fully-bayesian-gmm
+:ID: 814289c0-2527-42a0-914b-d64ad62ecd05
 :END:
 A typical example of fully Bayesian mixture models is the fully Bayesian
 Gaussian mixture model (Attias 2000, also called variational GMM in the
@@ -684,6 +695,7 @@ Chapter 10.2 of Bishop 2006 or Attias 2000.
 *** LDA
 :PROPERTIES:
 :CUSTOM_ID: lda
+:ID: 7d752891-ef33-4b58-9dc3-d6a61325bfa6
 :END:
 As the second example of fully Bayesian mixture models, Latent Dirichlet
 allocation (LDA) (Blei-Ng-Jordan 2003) is the fully Bayesian version of
@@ -747,6 +759,7 @@ So the algorithm iterates over (10) and (11)(12) until convergence.
 *** DPMM
 :PROPERTIES:
 :CUSTOM_ID: dpmm
+:ID: 187cb168-b3f8-428e-962a-80ad5966f844
 :END:
 The Dirichlet process mixture model (DPMM) is like the fully Bayesian
 mixture model except $n_z = \infty$, i.e. $z$ can take any positive
@@ -900,6 +913,7 @@ $$\begin{aligned}
 ** SVI
 :PROPERTIES:
 :CUSTOM_ID: svi
+:ID: 47efee6c-67ac-44eb-92fb-4d576ae2ec99
 :END:
 In variational inference, the computation of some parameters are more
 expensive than others.
@@ -969,6 +983,7 @@ for some $\kappa \in (.5, 1]$ and $\tau \ge 0$.
 ** AEVB
 :PROPERTIES:
 :CUSTOM_ID: aevb
+:ID: a196df8f-1574-4390-83a4-dd22d8fcecaf
 :END:
 SVI adds to variational inference stochastic updates similar to
 stochastic gradient descent. Why not just use neural networks with
@@ -1048,6 +1063,7 @@ approximation of $U(x, \phi, \theta)$ itself can be done similarly.
 *** VAE
 :PROPERTIES:
 :CUSTOM_ID: vae
+:ID: 59e07ae5-a4d3-4b95-949f-0b4348f2b70b
 :END:
 As an example of AEVB, the paper introduces variational autoencoder
 (VAE), with the following instantiations:
@@ -1069,6 +1085,7 @@ With this, one can use backprop to maximise the ELBO.
 *** Fully Bayesian AEVB
 :PROPERTIES:
 :CUSTOM_ID: fully-bayesian-aevb
+:ID: 0fb4f75b-4b62-440f-adc7-996b2d7f718a
 :END:
 Let us turn to fully Bayesian version of AEVB. Again, we first recall
 the ELBO of the fully Bayesian mixture models:
@@ -1117,6 +1134,7 @@ Again, one may use Monte-Carlo to approximate this expetation.
 ** References
 :PROPERTIES:
 :CUSTOM_ID: references
+:ID: df1567c9-b0e1-499f-a9d1-c0c915b2b98d
 :END:
 - Attias, Hagai. "A variational baysian framework for graphical models."
--
cgit v1.2.3