author     Yuchen Pei <me@ypei.me>    2021-06-18 17:47:12 +1000
committer  Yuchen Pei <me@ypei.me>    2021-06-18 17:47:12 +1000
commit     e9795c6b134eed858ddb73c036ff5c941d7e9838 (patch)
tree       8749a5460dc81658b6016f8d08c0a129e60cef0e /posts
parent     0663174364fef45d3985019b4f98375b4195bb0f (diff)
Updated.
Diffstat (limited to 'posts')
-rw-r--r--  posts/2017-08-07-mathematical_bazaar.org                 4
-rw-r--r--  posts/2018-04-10-update-open-research.org                3
-rw-r--r--  posts/2018-12-02-lime-shapley.org                        6
-rw-r--r--  posts/2019-01-03-discriminant-analysis.org               9
-rw-r--r--  posts/2019-02-14-raise-your-elbo.org                    18
-rw-r--r--  posts/2019-03-13-a-tail-of-two-densities.org             8
-rw-r--r--  posts/2019-03-14-great-but-manageable-expectations.org  10
7 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/posts/2017-08-07-mathematical_bazaar.org b/posts/2017-08-07-mathematical_bazaar.org
index 64bf335..11aa727 100644
--- a/posts/2017-08-07-mathematical_bazaar.org
+++ b/posts/2017-08-07-mathematical_bazaar.org
@@ -23,6 +23,7 @@ Before I start I should point out that
** problems of academia
:PROPERTIES:
:CUSTOM_ID: problems-of-academia
+ :ID: 2d61ed4d-abb3-415a-b21a-e051efe18499
:END:
Open source projects are characterised by publicly available source
codes as well as open invitations for public collaborations, whereas
@@ -110,6 +111,7 @@ source collaboration works on Github.
** open source collaborations on Github
:PROPERTIES:
:CUSTOM_ID: open-source-collaborations-on-github
+ :ID: 7d634bb9-9e64-4e71-a84f-f09b18a6fa1d
:END:
On [[https://github.com][Github]], every project is publicly available
in a repository (we do not consider private repos). The owner can update
@@ -153,6 +155,7 @@ mention them by =@someone=.
** open research in mathematics
:PROPERTIES:
:CUSTOM_ID: open-research-in-mathematics
+ :ID: 70f64ecd-f261-4ce0-9e1d-39713cd789d6
:END:
All this points to a promising direction of open research. A maths
project may have a wiki / collection of notes, the paper being written,
@@ -200,6 +203,7 @@ advantages:
** related readings
:PROPERTIES:
:CUSTOM_ID: related-readings
+ :ID: d2e4c2b9-a325-42a8-898c-bdf2588a6b0f
:END:
- [[http://www.catb.org/esr/writings/cathedral-bazaar/][The Cathedral
diff --git a/posts/2018-04-10-update-open-research.org b/posts/2018-04-10-update-open-research.org
index 4b078d5..7cc1781 100644
--- a/posts/2018-04-10-update-open-research.org
+++ b/posts/2018-04-10-update-open-research.org
@@ -45,6 +45,7 @@ research.
*** Freedom and community
:PROPERTIES:
:CUSTOM_ID: freedom-and-community
+ :ID: f0ec8170-e86e-49c6-bd34-3904f31317eb
:END:
Ideals matter. Stallman's struggles stemmed from the frustration of
denied request of source code (a frustration I shared in academia except
@@ -130,6 +131,7 @@ community:
*** Tools for open research
:PROPERTIES:
:CUSTOM_ID: tools-for-open-research
+ :ID: 07f852e1-c95d-407f-972b-8395ac7903a4
:END:
The open research workshop revolved around how to lead academia towards
a more open culture. There were discussions on open research tools,
@@ -174,6 +176,7 @@ afterwards I was also made aware by more of them, like the following:
*** An anecdote from the workshop
:PROPERTIES:
:CUSTOM_ID: an-anecdote-from-the-workshop
+ :ID: 857899dd-f3f8-4eac-a14a-779604066da4
:END:
In a conversation during the workshop, one of the participants called
open science "normal science", because reproducibility, open access,
diff --git a/posts/2018-12-02-lime-shapley.org b/posts/2018-12-02-lime-shapley.org
index 05ef4ee..0eaacf2 100644
--- a/posts/2018-12-02-lime-shapley.org
+++ b/posts/2018-12-02-lime-shapley.org
@@ -17,6 +17,7 @@ under CC BY-SA and GNU FDL./
** Shapley values
:PROPERTIES:
:CUSTOM_ID: shapley-values
+ :ID: 1f4de00d-669e-421a-bd48-f4f1da8400f3
:END:
A coalitional game $(v, N)$ of $n$ players involves
@@ -69,6 +70,7 @@ together with $v(\emptyset) = 0$ defines the Shapley values.
** LIME
:PROPERTIES:
:CUSTOM_ID: lime
+ :ID: b34aabbc-2f50-4969-bf7b-87678fd577e6
:END:
LIME (Ribeiro et. al. 2016) is a model that offers a way to explain
feature contributions of supervised learning models locally.
@@ -139,6 +141,7 @@ of dimension $n$.
** Shapley values and LIME
:PROPERTIES:
:CUSTOM_ID: shapley-values-and-lime
+ :ID: bb960a6a-be79-44ff-968f-000990673738
:END:
The connection between the Shapley values and LIME is noted in
Lundberg-Lee (2017), but the underlying connection goes back to 1988
@@ -243,6 +246,7 @@ Plugging this back into (6) we get the desired result. $\square$
** SHAP
:PROPERTIES:
:CUSTOM_ID: shap
+ :ID: 11af7fd2-a4f9-42da-a275-5d15cde0ca48
:END:
The paper that coined the term "SHAP values" (Lundberg-Lee 2017) is not
clear in its definition of the "SHAP values" and its relation to LIME,
@@ -304,6 +308,7 @@ general, how do we evaluate models of interpretation?
** Evaluating SHAP
:PROPERTIES:
:CUSTOM_ID: evaluating-shap
+ :ID: a7249300-814f-499e-b808-c66a17a4d32e
:END:
The quest of the SHAP paper can be decoupled into two independent
components: showing the niceties of Shapley values and choosing the
@@ -332,6 +337,7 @@ justifications of such choices.
** References
:PROPERTIES:
:CUSTOM_ID: references
+ :ID: 7ba9546e-2485-43e4-be48-16762fc447b2
:END:
- Charnes, A., B. Golany, M. Keane, and J. Rousseau. "Extremal Principle
diff --git a/posts/2019-01-03-discriminant-analysis.org b/posts/2019-01-03-discriminant-analysis.org
index 34c16bf..a0ada73 100644
--- a/posts/2019-01-03-discriminant-analysis.org
+++ b/posts/2019-01-03-discriminant-analysis.org
@@ -23,6 +23,7 @@ under CC BY-SA and GNU FDL./
** Theory
:PROPERTIES:
:CUSTOM_ID: theory
+ :ID: 69be3baf-7f60-42f2-9184-ee8840eea554
:END:
Quadratic discriminant analysis (QDA) is a classical classification
algorithm. It assumes that the data is generated by Gaussian
@@ -69,6 +70,7 @@ be independent.
*** QDA
:PROPERTIES:
:CUSTOM_ID: qda
+ :ID: f6e95892-01cf-4569-b01e-22ed238d0577
:END:
We look at QDA.
@@ -94,6 +96,7 @@ sample for each class.
*** Vanilla LDA
:PROPERTIES:
:CUSTOM_ID: vanilla-lda
+ :ID: 5a6ca0ca-f385-4054-9b19-9cac69b1a59a
:END:
Now let us look at LDA.
@@ -127,6 +130,7 @@ nearest neighbour classifier.
*** Nearest neighbour classifier
:PROPERTIES:
:CUSTOM_ID: nearest-neighbour-classifier
+ :ID: 8880764c-6fbe-4023-97dd-9711c7c50ea9
:END:
More specifically, we want to transform the first term of (0) to a norm
to get a classifier based on nearest neighbour modulo $\log \pi_i$:
@@ -160,6 +164,7 @@ $A \mu_i$ (again, modulo $\log \pi_i$) and label the input with $i$.
*** Dimensionality reduction
:PROPERTIES:
:CUSTOM_ID: dimensionality-reduction
+ :ID: 70e1afc1-9c45-4a35-a842-48573e077b36
:END:
We can further simplify the prediction by dimensionality reduction.
Assume $n_c \le n$. Then the centroid spans an affine space of dimension
@@ -195,6 +200,7 @@ words, the prediction does not change regardless of =n_components=.
*** Fisher discriminant analysis
:PROPERTIES:
:CUSTOM_ID: fisher-discriminant-analysis
+ :ID: 05ff25da-8c52-4f20-a0ac-4422f19e10ce
:END:
The Fisher discriminant analysis involves finding an $n$-dimensional
vector $a$ that maximises between-class covariance with respect to
@@ -232,6 +238,7 @@ $a = c V_x D_x^{-1} V_m$ with $p = 1$.
*** Linear model
:PROPERTIES:
:CUSTOM_ID: linear-model
+ :ID: feb827b6-0064-4192-b96b-86a942c8839e
:END:
The model is called linear discriminant analysis because it is a linear
model. To see this, let $B = V_m^T D_x^{-1} V_x^T$ be the matrix of
@@ -256,6 +263,7 @@ This is how scikit-learn implements LDA, by inheriting from
** Implementation
:PROPERTIES:
:CUSTOM_ID: implementation
+ :ID: b567283c-20ee-41a8-8216-7392066a5ac5
:END:
This is where things get interesting. How do I validate my understanding
of the theory? By implementing and testing the algorithm.
@@ -279,6 +287,7 @@ The result is
*** Fun facts about LDA
:PROPERTIES:
:CUSTOM_ID: fun-facts-about-lda
+ :ID: f1d47f43-27f6-49dd-bd0d-2e685c38e241
:END:
One property that can be used to test the LDA implementation is the fact
that the scatter matrix $B(X - \bar x)^T (X - \bar X) B^T$ of the
diff --git a/posts/2019-02-14-raise-your-elbo.org b/posts/2019-02-14-raise-your-elbo.org
index 9e15552..f0de7d1 100644
--- a/posts/2019-02-14-raise-your-elbo.org
+++ b/posts/2019-02-14-raise-your-elbo.org
@@ -47,6 +47,7 @@ under CC BY-SA and GNU FDL./
** KL divergence and ELBO
:PROPERTIES:
:CUSTOM_ID: kl-divergence-and-elbo
+ :ID: 2bb0d405-f6b4-483f-9f2d-c0e945faa3ac
:END:
Let $p$ and $q$ be two probability measures. The Kullback-Leibler (KL)
divergence is defined as
@@ -120,6 +121,7 @@ Bayesian version.
** EM
:PROPERTIES:
:CUSTOM_ID: em
+ :ID: 6d694b38-56c2-4e10-8a1f-1f82e309073f
:END:
To illustrate the EM algorithms, we first define the mixture model.
@@ -198,6 +200,7 @@ model is:
*** GMM
:PROPERTIES:
:CUSTOM_ID: gmm
+ :ID: 5d5265f6-c2b9-42f1-a4a1-0d87417f0b02
:END:
Gaussian mixture model (GMM) is an example of mixture models.
@@ -240,6 +243,7 @@ $\epsilon I$ is called elliptical k-means algorithm.
*** SMM
:PROPERTIES:
:CUSTOM_ID: smm
+ :ID: f4b3a462-8ae7-44f2-813c-58b007eaa047
:END:
As a transition to the next models to study, let us consider a simpler
mixture model obtained by making one modification to GMM: change
@@ -275,6 +279,7 @@ Dirichlet allocation (LDA), not to be confused with the other LDA
*** pLSA
:PROPERTIES:
:CUSTOM_ID: plsa
+ :ID: d4f58158-dcb6-4ba1-a9e2-bf53bff6012e
:END:
The pLSA model (Hoffman 2000) is a mixture model, where the dataset is
now pairs $(d_i, x_i)_{i = 1 : m}$. In natural language processing, $x$
@@ -294,6 +299,7 @@ corresponds to type 2.
**** pLSA1
:PROPERTIES:
:CUSTOM_ID: plsa1
+ :ID: 969f470e-5bbe-464e-a3b7-f996c8f04de3
:END:
The pLSA1 model (Hoffman 2000) is basically SMM with $x_i$ substituted
with $(d_i, x_i)$, which conditioned on $z_i$ are independently
@@ -340,6 +346,7 @@ dimensional embeddings $D_{u, \cdot}$ and $X_{w, \cdot}$.
**** pLSA2
:PROPERTIES:
:CUSTOM_ID: plsa2
+ :ID: eef3249a-c45d-4a07-876f-68b2a2e957e5
:END:
Let us turn to pLSA2 (Hoffman 2004), corresponding to (2.92). We rewrite
it as
@@ -392,6 +399,7 @@ $$\begin{aligned}
*** HMM
:PROPERTIES:
:CUSTOM_ID: hmm
+ :ID: 16d00eda-7136-49f5-8427-c775c7a91317
:END:
The hidden markov model (HMM) is a sequential version of SMM, in the
same sense that recurrent neural networks are sequential versions of
@@ -518,6 +526,7 @@ as ${(7) \over (8)}$ and ${(9) \over (8)}$ respectively.
** Fully Bayesian EM / MFA
:PROPERTIES:
:CUSTOM_ID: fully-bayesian-em-mfa
+ :ID: 77f1d7ae-3785-45d4-b88f-18478e41f3b9
:END:
Let us now venture into the realm of full Bayesian.
@@ -567,6 +576,7 @@ e.g. Section 10.1 of Bishop 2006.
*** Application to mixture models
:PROPERTIES:
:CUSTOM_ID: application-to-mixture-models
+ :ID: 52bf6025-1180-44dc-8272-e6af6e228bf3
:END:
*Definition (Fully Bayesian mixture model)*. The relations between
$\pi$, $\eta$, $x$, $z$ are the same as in the definition of mixture
@@ -658,6 +668,7 @@ until convergence.
*** Fully Bayesian GMM
:PROPERTIES:
:CUSTOM_ID: fully-bayesian-gmm
+ :ID: 814289c0-2527-42a0-914b-d64ad62ecd05
:END:
A typical example of fully Bayesian mixture models is the fully Bayesian
Gaussian mixture model (Attias 2000, also called variational GMM in the
@@ -684,6 +695,7 @@ Chapter 10.2 of Bishop 2006 or Attias 2000.
*** LDA
:PROPERTIES:
:CUSTOM_ID: lda
+ :ID: 7d752891-ef33-4b58-9dc3-d6a61325bfa6
:END:
As the second example of fully Bayesian mixture models, Latent Dirichlet
allocation (LDA) (Blei-Ng-Jordan 2003) is the fully Bayesian version of
@@ -747,6 +759,7 @@ So the algorithm iterates over (10) and (11)(12) until convergence.
*** DPMM
:PROPERTIES:
:CUSTOM_ID: dpmm
+ :ID: 187cb168-b3f8-428e-962a-80ad5966f844
:END:
The Dirichlet process mixture model (DPMM) is like the fully Bayesian
mixture model except $n_z = \infty$, i.e. $z$ can take any positive
@@ -900,6 +913,7 @@ $$\begin{aligned}
** SVI
:PROPERTIES:
:CUSTOM_ID: svi
+ :ID: 47efee6c-67ac-44eb-92fb-4d576ae2ec99
:END:
In variational inference, the computation of some parameters are more
expensive than others.
@@ -969,6 +983,7 @@ for some $\kappa \in (.5, 1]$ and $\tau \ge 0$.
** AEVB
:PROPERTIES:
:CUSTOM_ID: aevb
+ :ID: a196df8f-1574-4390-83a4-dd22d8fcecaf
:END:
SVI adds to variational inference stochastic updates similar to
stochastic gradient descent. Why not just use neural networks with
@@ -1048,6 +1063,7 @@ approximation of $U(x, \phi, \theta)$ itself can be done similarly.
*** VAE
:PROPERTIES:
:CUSTOM_ID: vae
+ :ID: 59e07ae5-a4d3-4b95-949f-0b4348f2b70b
:END:
As an example of AEVB, the paper introduces variational autoencoder
(VAE), with the following instantiations:
@@ -1069,6 +1085,7 @@ With this, one can use backprop to maximise the ELBO.
*** Fully Bayesian AEVB
:PROPERTIES:
:CUSTOM_ID: fully-bayesian-aevb
+ :ID: 0fb4f75b-4b62-440f-adc7-996b2d7f718a
:END:
Let us turn to fully Bayesian version of AEVB. Again, we first recall
the ELBO of the fully Bayesian mixture models:
@@ -1117,6 +1134,7 @@ Again, one may use Monte-Carlo to approximate this expetation.
** References
:PROPERTIES:
:CUSTOM_ID: references
+ :ID: df1567c9-b0e1-499f-a9d1-c0c915b2b98d
:END:
- Attias, Hagai. "A variational baysian framework for graphical models."
diff --git a/posts/2019-03-13-a-tail-of-two-densities.org b/posts/2019-03-13-a-tail-of-two-densities.org
index 783e0c5..f1b6b15 100644
--- a/posts/2019-03-13-a-tail-of-two-densities.org
+++ b/posts/2019-03-13-a-tail-of-two-densities.org
@@ -80,6 +80,7 @@ BY-SA]] and [[https://www.gnu.org/licenses/fdl.html][GNU FDL]]./
** The gist of differential privacy
:PROPERTIES:
:CUSTOM_ID: the-gist-of-differential-privacy
+ :ID: 91bf2eb5-8509-4180-b471-939280dc1438
:END:
If you only have one minute, here is what differential privacy is about:
@@ -118,6 +119,7 @@ Now if you have an hour...
** \(\epsilon\)-dp
:PROPERTIES:
:CUSTOM_ID: epsilon-dp
+ :ID: d29da3db-8b9a-4bad-811e-4af1cd9f856d
:END:
*Definition (Mechanisms)*. Let \(X\) be a space with a metric
\(d: X \times X \to \mathbb N\). A /mechanism/ \(M\) is a function that
@@ -188,6 +190,7 @@ where in the last step we use the condition (1.5). \(\square\)
** Approximate differential privacy
:PROPERTIES:
:CUSTOM_ID: approximate-differential-privacy
+ :ID: c48c68f8-d749-47f8-b5de-c92cc53f8cea
:END:
Unfortunately, \(\epsilon\)-dp does not apply to the most commonly used
noise, the Gaussian noise. To fix this, we need to relax the definition
@@ -205,6 +208,7 @@ if \(\delta < 1\).
*** Indistinguishability
:PROPERTIES:
:CUSTOM_ID: indistinguishability
+ :ID: 7875ad81-326b-4eaa-a3ae-9e09df96ea1b
:END:
To understand \((\epsilon, \delta)\)-dp, it is helpful to study
\((\epsilon, \delta)\)-indistinguishability.
@@ -535,6 +539,7 @@ The rest of the proof is almost the same as the proof of Claim 4.
*** Back to approximate differential privacy
:PROPERTIES:
:CUSTOM_ID: back-to-approximate-differential-privacy
+ :ID: 706c037d-ea44-4ade-8007-7f1f41d394e8
:END:
By Claim 0 and 1 we have
@@ -741,6 +746,7 @@ proof of Theorem A.1.
** Composition theorems
:PROPERTIES:
:CUSTOM_ID: composition-theorems
+ :ID: b672a060-d886-4f07-92d2-1d92f5f4c0c8
:END:
So far we have seen how a mechanism made of a single query plus a noise
can be proved to be differentially private. But we need to understand
@@ -1108,6 +1114,7 @@ The rest is the same as in the proof of Claim 17. \(\square\)
** Subsampling
:PROPERTIES:
:CUSTOM_ID: subsampling
+ :ID: eeda51d4-9370-49c6-9710-9c9ab88f91e2
:END:
Stochastic gradient descent is like gradient descent, but with random
subsampling.
@@ -1257,6 +1264,7 @@ guarantee for DP-SGD, among other things.
** References
:PROPERTIES:
:CUSTOM_ID: references
+ :ID: 65686625-6bd1-4e42-b00d-5f1744945884
:END:
- Abadi, Martín, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya
diff --git a/posts/2019-03-14-great-but-manageable-expectations.org b/posts/2019-03-14-great-but-manageable-expectations.org
index 6438090..68e757a 100644
--- a/posts/2019-03-14-great-but-manageable-expectations.org
+++ b/posts/2019-03-14-great-but-manageable-expectations.org
@@ -26,11 +26,12 @@ privacy guarantees for composed subsampling queries in general, and for
DP-SGD in particular. I also compare these privacy guarantees.
/If you are confused by any notations, ask me or try
-[[/notations.html][this]]./
+[[file:/notations.html][this]]./
** Rényi divergence and differential privacy
:PROPERTIES:
:CUSTOM_ID: rényi-divergence-and-differential-privacy
+ :ID: d1763dea-5e8f-4393-8f14-1d781147dcb5
:END:
Recall in the proof of Gaussian mechanism privacy guarantee (Claim 8) we
used the Chernoff bound for the Gaussian noise. Why not use the Chernoff
@@ -161,6 +162,7 @@ considering Rényi dp.
*** Moment Composition
:PROPERTIES:
:CUSTOM_ID: moment-composition
+ :ID: d5e94e5a-236d-4c41-96a4-4a93341f249a
:END:
*Claim 22 (Moment Composition Theorem)*. Let \(M\) be the adaptive
composition of \(M_{1 : k}\). Suppose for any \(y_{< i}\), \(M_i(y_{< i})\) is
@@ -228,6 +230,7 @@ the Advanced Composition Theorem (Claim 18).
*** Subsampling
:PROPERTIES:
:CUSTOM_ID: subsampling
+ :ID: 25cd27ac-fcb6-462f-9a3b-da861124d7b2
:END:
We also have a subsampling theorem for the Rényi dp.
@@ -330,6 +333,7 @@ assumptions.
** ACGMMTZ16
:PROPERTIES:
:CUSTOM_ID: acgmmtz16
+ :ID: 8b85cce3-01ad-4404-80c0-b73076d183a9
:END:
What follows is my understanding of this result. I call it a conjecture
because there is a gap which I am not able to reproduce their proof or
@@ -597,6 +601,7 @@ true, for the following reasons:
** Tensorflow implementation
:PROPERTIES:
:CUSTOM_ID: tensorflow-implementation
+ :ID: f856ad67-4f78-46b4-8c98-fda07a0dc670
:END:
The DP-SGD is implemented in
[[https://github.com/tensorflow/privacy][TensorFlow Privacy]]. In the
@@ -650,6 +655,7 @@ automatically computed given a DP-SGD instance.
** Comparison among different methods
:PROPERTIES:
:CUSTOM_ID: comparison-among-different-methods
+ :ID: 30502f53-d9ba-48ea-868a-dd4db995a6d4
:END:
So far we have seen three routes to compute the privacy guarantees for
DP-SGD with the Gaussian mechanism:
@@ -795,6 +801,7 @@ achieve the result in Route 3.
** Further questions
:PROPERTIES:
:CUSTOM_ID: further-questions
+ :ID: 277e8a8c-cc34-4ba9-84fb-d8950f6dc9de
:END:
Here is a list of what I think may be interesting topics or potential
problems to look at, with no guarantee that they are all awesome
@@ -816,6 +823,7 @@ untouched research problems:
** References
:PROPERTIES:
:CUSTOM_ID: references
+ :ID: 708aa715-dc2c-49ac-b7bb-f85ac168d8b3
:END:
- Abadi, Martín, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya