Diffstat (limited to 'posts')
-rw-r--r--  posts/2017-08-07-mathematical_bazaar.org                |  4
-rw-r--r--  posts/2018-04-10-update-open-research.org               |  3
-rw-r--r--  posts/2018-12-02-lime-shapley.org                       |  6
-rw-r--r--  posts/2019-01-03-discriminant-analysis.org              |  9
-rw-r--r--  posts/2019-02-14-raise-your-elbo.org                    | 18
-rw-r--r--  posts/2019-03-13-a-tail-of-two-densities.org            |  8
-rw-r--r--  posts/2019-03-14-great-but-manageable-expectations.org  | 10
7 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/posts/2017-08-07-mathematical_bazaar.org b/posts/2017-08-07-mathematical_bazaar.org
index 64bf335..11aa727 100644
--- a/posts/2017-08-07-mathematical_bazaar.org
+++ b/posts/2017-08-07-mathematical_bazaar.org
@@ -23,6 +23,7 @@ Before I start I should point out that
** problems of academia
:PROPERTIES:
:CUSTOM_ID: problems-of-academia
+ :ID: 2d61ed4d-abb3-415a-b21a-e051efe18499
:END:
Open source projects are characterised by publicly available source
code as well as open invitations for public collaboration, whereas
@@ -110,6 +111,7 @@ source collaboration works on Github.
** open source collaborations on Github
:PROPERTIES:
:CUSTOM_ID: open-source-collaborations-on-github
+ :ID: 7d634bb9-9e64-4e71-a84f-f09b18a6fa1d
:END:
On [[https://github.com][Github]], every project is publicly available
in a repository (we do not consider private repos). The owner can update
@@ -153,6 +155,7 @@ mention them by =@someone=.
** open research in mathematics
:PROPERTIES:
:CUSTOM_ID: open-research-in-mathematics
+ :ID: 70f64ecd-f261-4ce0-9e1d-39713cd789d6
:END:
All this points to a promising direction of open research. A maths
project may have a wiki / collection of notes, the paper being written,
@@ -200,6 +203,7 @@ advantages:
** related readings
:PROPERTIES:
:CUSTOM_ID: related-readings
+ :ID: d2e4c2b9-a325-42a8-898c-bdf2588a6b0f
:END:
- [[http://www.catb.org/esr/writings/cathedral-bazaar/][The Cathedral
diff --git a/posts/2018-04-10-update-open-research.org b/posts/2018-04-10-update-open-research.org
index 4b078d5..7cc1781 100644
--- a/posts/2018-04-10-update-open-research.org
+++ b/posts/2018-04-10-update-open-research.org
@@ -45,6 +45,7 @@ research.
*** Freedom and community
:PROPERTIES:
:CUSTOM_ID: freedom-and-community
+ :ID: f0ec8170-e86e-49c6-bd34-3904f31317eb
:END:
Ideals matter. Stallman's struggles stemmed from the frustration of a
denied request for source code (a frustration I shared in academia except
@@ -130,6 +131,7 @@ community:
*** Tools for open research
:PROPERTIES:
:CUSTOM_ID: tools-for-open-research
+ :ID: 07f852e1-c95d-407f-972b-8395ac7903a4
:END:
The open research workshop revolved around how to lead academia towards
a more open culture. There were discussions on open research tools,
@@ -174,6 +176,7 @@ afterwards I was also made aware of more of them, like the following:
*** An anecdote from the workshop
:PROPERTIES:
:CUSTOM_ID: an-anecdote-from-the-workshop
+ :ID: 857899dd-f3f8-4eac-a14a-779604066da4
:END:
In a conversation during the workshop, one of the participants called
open science "normal science", because reproducibility, open access,
diff --git a/posts/2018-12-02-lime-shapley.org b/posts/2018-12-02-lime-shapley.org
index 05ef4ee..0eaacf2 100644
--- a/posts/2018-12-02-lime-shapley.org
+++ b/posts/2018-12-02-lime-shapley.org
@@ -17,6 +17,7 @@ under CC BY-SA and GNU FDL./
** Shapley values
:PROPERTIES:
:CUSTOM_ID: shapley-values
+ :ID: 1f4de00d-669e-421a-bd48-f4f1da8400f3
:END:
A coalitional game $(v, N)$ of $n$ players involves
@@ -69,6 +70,7 @@ together with $v(\emptyset) = 0$ defines the Shapley values.
** LIME
:PROPERTIES:
:CUSTOM_ID: lime
+ :ID: b34aabbc-2f50-4969-bf7b-87678fd577e6
:END:
LIME (Ribeiro et al. 2016) is a model that offers a way to explain
feature contributions of supervised learning models locally.
@@ -139,6 +141,7 @@ of dimension $n$.
** Shapley values and LIME
:PROPERTIES:
:CUSTOM_ID: shapley-values-and-lime
+ :ID: bb960a6a-be79-44ff-968f-000990673738
:END:
The connection between the Shapley values and LIME is noted in
Lundberg-Lee (2017), but the underlying connection goes back to 1988
@@ -243,6 +246,7 @@ Plugging this back into (6) we get the desired result. $\square$
** SHAP
:PROPERTIES:
:CUSTOM_ID: shap
+ :ID: 11af7fd2-a4f9-42da-a275-5d15cde0ca48
:END:
The paper that coined the term "SHAP values" (Lundberg-Lee 2017) is not
clear in its definition of the "SHAP values" and its relation to LIME,
@@ -304,6 +308,7 @@ general, how do we evaluate models of interpretation?
** Evaluating SHAP
:PROPERTIES:
:CUSTOM_ID: evaluating-shap
+ :ID: a7249300-814f-499e-b808-c66a17a4d32e
:END:
The quest of the SHAP paper can be decoupled into two independent
components: showing the niceties of Shapley values and choosing the
@@ -332,6 +337,7 @@ justifications of such choices.
** References
:PROPERTIES:
:CUSTOM_ID: references
+ :ID: 7ba9546e-2485-43e4-be48-16762fc447b2
:END:
- Charnes, A., B. Golany, M. Keane, and J. Rousseau. "Extremal Principle
diff --git a/posts/2019-01-03-discriminant-analysis.org b/posts/2019-01-03-discriminant-analysis.org
index 34c16bf..a0ada73 100644
--- a/posts/2019-01-03-discriminant-analysis.org
+++ b/posts/2019-01-03-discriminant-analysis.org
@@ -23,6 +23,7 @@ under CC BY-SA and GNU FDL./
** Theory
:PROPERTIES:
:CUSTOM_ID: theory
+ :ID: 69be3baf-7f60-42f2-9184-ee8840eea554
:END:
Quadratic discriminant analysis (QDA) is a classical classification
algorithm. It assumes that the data is generated by Gaussian
@@ -69,6 +70,7 @@ be independent.
*** QDA
:PROPERTIES:
:CUSTOM_ID: qda
+ :ID: f6e95892-01cf-4569-b01e-22ed238d0577
:END:
We look at QDA.
@@ -94,6 +96,7 @@ sample for each class.
*** Vanilla LDA
:PROPERTIES:
:CUSTOM_ID: vanilla-lda
+ :ID: 5a6ca0ca-f385-4054-9b19-9cac69b1a59a
:END:
Now let us look at LDA.
@@ -127,6 +130,7 @@ nearest neighbour classifier.
*** Nearest neighbour classifier
:PROPERTIES:
:CUSTOM_ID: nearest-neighbour-classifier
+ :ID: 8880764c-6fbe-4023-97dd-9711c7c50ea9
:END:
More specifically, we want to transform the first term of (0) to a norm
to get a classifier based on nearest neighbour modulo $\log \pi_i$:
@@ -160,6 +164,7 @@ $A \mu_i$ (again, modulo $\log \pi_i$) and label the input with $i$.
*** Dimensionality reduction
:PROPERTIES:
:CUSTOM_ID: dimensionality-reduction
+ :ID: 70e1afc1-9c45-4a35-a842-48573e077b36
:END:
We can further simplify the prediction by dimensionality reduction.
Assume $n_c \le n$. Then the centroids span an affine space of dimension
@@ -195,6 +200,7 @@ words, the prediction does not change regardless of =n_components=.
*** Fisher discriminant analysis
:PROPERTIES:
:CUSTOM_ID: fisher-discriminant-analysis
+ :ID: 05ff25da-8c52-4f20-a0ac-4422f19e10ce
:END:
The Fisher discriminant analysis involves finding an $n$-dimensional
vector $a$ that maximises between-class covariance with respect to
@@ -232,6 +238,7 @@ $a = c V_x D_x^{-1} V_m$ with $p = 1$.
*** Linear model
:PROPERTIES:
:CUSTOM_ID: linear-model
+ :ID: feb827b6-0064-4192-b96b-86a942c8839e
:END:
The model is called linear discriminant analysis because it is a linear
model. To see this, let $B = V_m^T D_x^{-1} V_x^T$ be the matrix of
@@ -256,6 +263,7 @@ This is how scikit-learn implements LDA, by inheriting from
** Implementation
:PROPERTIES:
:CUSTOM_ID: implementation
+ :ID: b567283c-20ee-41a8-8216-7392066a5ac5
:END:
This is where things get interesting. How do I validate my understanding
of the theory? By implementing and testing the algorithm.
@@ -279,6 +287,7 @@ The result is
*** Fun facts about LDA
:PROPERTIES:
:CUSTOM_ID: fun-facts-about-lda
+ :ID: f1d47f43-27f6-49dd-bd0d-2e685c38e241
:END:
One property that can be used to test the LDA implementation is the fact
that the scatter matrix $B(X - \bar x)^T (X - \bar x) B^T$ of the
diff --git a/posts/2019-02-14-raise-your-elbo.org b/posts/2019-02-14-raise-your-elbo.org
index 9e15552..f0de7d1 100644
--- a/posts/2019-02-14-raise-your-elbo.org
+++ b/posts/2019-02-14-raise-your-elbo.org
@@ -47,6 +47,7 @@ under CC BY-SA and GNU FDL./
** KL divergence and ELBO
:PROPERTIES:
:CUSTOM_ID: kl-divergence-and-elbo
+ :ID: 2bb0d405-f6b4-483f-9f2d-c0e945faa3ac
:END:
Let $p$ and $q$ be two probability measures. The Kullback-Leibler (KL)
divergence is defined as
@@ -120,6 +121,7 @@ Bayesian version.
** EM
:PROPERTIES:
:CUSTOM_ID: em
+ :ID: 6d694b38-56c2-4e10-8a1f-1f82e309073f
:END:
To illustrate the EM algorithms, we first define the mixture model.
@@ -198,6 +200,7 @@ model is:
*** GMM
:PROPERTIES:
:CUSTOM_ID: gmm
+ :ID: 5d5265f6-c2b9-42f1-a4a1-0d87417f0b02
:END:
The Gaussian mixture model (GMM) is an example of a mixture model.
@@ -240,6 +243,7 @@ $\epsilon I$ is called elliptical k-means algorithm.
*** SMM
:PROPERTIES:
:CUSTOM_ID: smm
+ :ID: f4b3a462-8ae7-44f2-813c-58b007eaa047
:END:
As a transition to the next models to study, let us consider a simpler
mixture model obtained by making one modification to GMM: change
@@ -275,6 +279,7 @@ Dirichlet allocation (LDA), not to be confused with the other LDA
*** pLSA
:PROPERTIES:
:CUSTOM_ID: plsa
+ :ID: d4f58158-dcb6-4ba1-a9e2-bf53bff6012e
:END:
The pLSA model (Hoffman 2000) is a mixture model, where the dataset is
now pairs $(d_i, x_i)_{i = 1 : m}$. In natural language processing, $x$
@@ -294,6 +299,7 @@ corresponds to type 2.
**** pLSA1
:PROPERTIES:
:CUSTOM_ID: plsa1
+ :ID: 969f470e-5bbe-464e-a3b7-f996c8f04de3
:END:
The pLSA1 model (Hoffman 2000) is basically SMM with $x_i$ substituted
with $(d_i, x_i)$, which conditioned on $z_i$ are independently
@@ -340,6 +346,7 @@ dimensional embeddings $D_{u, \cdot}$ and $X_{w, \cdot}$.
**** pLSA2
:PROPERTIES:
:CUSTOM_ID: plsa2
+ :ID: eef3249a-c45d-4a07-876f-68b2a2e957e5
:END:
Let us turn to pLSA2 (Hoffman 2004), corresponding to (2.92). We rewrite
it as
@@ -392,6 +399,7 @@ $$\begin{aligned}
*** HMM
:PROPERTIES:
:CUSTOM_ID: hmm
+ :ID: 16d00eda-7136-49f5-8427-c775c7a91317
:END:
The hidden Markov model (HMM) is a sequential version of SMM, in the
same sense that recurrent neural networks are sequential versions of
@@ -518,6 +526,7 @@ as ${(7) \over (8)}$ and ${(9) \over (8)}$ respectively.
** Fully Bayesian EM / MFA
:PROPERTIES:
:CUSTOM_ID: fully-bayesian-em-mfa
+ :ID: 77f1d7ae-3785-45d4-b88f-18478e41f3b9
:END:
Let us now venture into the realm of fully Bayesian inference.
@@ -567,6 +576,7 @@ e.g. Section 10.1 of Bishop 2006.
*** Application to mixture models
:PROPERTIES:
:CUSTOM_ID: application-to-mixture-models
+ :ID: 52bf6025-1180-44dc-8272-e6af6e228bf3
:END:
*Definition (Fully Bayesian mixture model)*. The relations between
$\pi$, $\eta$, $x$, $z$ are the same as in the definition of mixture
@@ -658,6 +668,7 @@ until convergence.
*** Fully Bayesian GMM
:PROPERTIES:
:CUSTOM_ID: fully-bayesian-gmm
+ :ID: 814289c0-2527-42a0-914b-d64ad62ecd05
:END:
A typical example of fully Bayesian mixture models is the fully Bayesian
Gaussian mixture model (Attias 2000, also called variational GMM in the
@@ -684,6 +695,7 @@ Chapter 10.2 of Bishop 2006 or Attias 2000.
*** LDA
:PROPERTIES:
:CUSTOM_ID: lda
+ :ID: 7d752891-ef33-4b58-9dc3-d6a61325bfa6
:END:
As the second example of fully Bayesian mixture models, Latent Dirichlet
allocation (LDA) (Blei-Ng-Jordan 2003) is the fully Bayesian version of
@@ -747,6 +759,7 @@ So the algorithm iterates over (10) and (11)(12) until convergence.
*** DPMM
:PROPERTIES:
:CUSTOM_ID: dpmm
+ :ID: 187cb168-b3f8-428e-962a-80ad5966f844
:END:
The Dirichlet process mixture model (DPMM) is like the fully Bayesian
mixture model except $n_z = \infty$, i.e. $z$ can take any positive
@@ -900,6 +913,7 @@ $$\begin{aligned}
** SVI
:PROPERTIES:
:CUSTOM_ID: svi
+ :ID: 47efee6c-67ac-44eb-92fb-4d576ae2ec99
:END:
In variational inference, the computation of some parameters is more
expensive than that of others.
@@ -969,6 +983,7 @@ for some $\kappa \in (.5, 1]$ and $\tau \ge 0$.
** AEVB
:PROPERTIES:
:CUSTOM_ID: aevb
+ :ID: a196df8f-1574-4390-83a4-dd22d8fcecaf
:END:
SVI adds to variational inference stochastic updates similar to
stochastic gradient descent. Why not just use neural networks with
@@ -1048,6 +1063,7 @@ approximation of $U(x, \phi, \theta)$ itself can be done similarly.
*** VAE
:PROPERTIES:
:CUSTOM_ID: vae
+ :ID: 59e07ae5-a4d3-4b95-949f-0b4348f2b70b
:END:
As an example of AEVB, the paper introduces variational autoencoder
(VAE), with the following instantiations:
@@ -1069,6 +1085,7 @@ With this, one can use backprop to maximise the ELBO.
*** Fully Bayesian AEVB
:PROPERTIES:
:CUSTOM_ID: fully-bayesian-aevb
+ :ID: 0fb4f75b-4b62-440f-adc7-996b2d7f718a
:END:
Let us turn to the fully Bayesian version of AEVB. Again, we first recall
the ELBO of the fully Bayesian mixture models:
@@ -1117,6 +1134,7 @@ Again, one may use Monte-Carlo to approximate this expectation.
** References
:PROPERTIES:
:CUSTOM_ID: references
+ :ID: df1567c9-b0e1-499f-a9d1-c0c915b2b98d
:END:
- Attias, Hagai. "A variational baysian framework for graphical models."
diff --git a/posts/2019-03-13-a-tail-of-two-densities.org b/posts/2019-03-13-a-tail-of-two-densities.org
index 783e0c5..f1b6b15 100644
--- a/posts/2019-03-13-a-tail-of-two-densities.org
+++ b/posts/2019-03-13-a-tail-of-two-densities.org
@@ -80,6 +80,7 @@ BY-SA]] and [[https://www.gnu.org/licenses/fdl.html][GNU FDL]]./
** The gist of differential privacy
:PROPERTIES:
:CUSTOM_ID: the-gist-of-differential-privacy
+ :ID: 91bf2eb5-8509-4180-b471-939280dc1438
:END:
If you only have one minute, here is what differential privacy is about:
@@ -118,6 +119,7 @@ Now if you have an hour...
** \(\epsilon\)-dp
:PROPERTIES:
:CUSTOM_ID: epsilon-dp
+ :ID: d29da3db-8b9a-4bad-811e-4af1cd9f856d
:END:
*Definition (Mechanisms)*. Let \(X\) be a space with a metric
\(d: X \times X \to \mathbb N\). A /mechanism/ \(M\) is a function that
@@ -188,6 +190,7 @@ where in the last step we use the condition (1.5). \(\square\)
** Approximate differential privacy
:PROPERTIES:
:CUSTOM_ID: approximate-differential-privacy
+ :ID: c48c68f8-d749-47f8-b5de-c92cc53f8cea
:END:
Unfortunately, \(\epsilon\)-dp does not apply to the most commonly used
noise, the Gaussian noise. To fix this, we need to relax the definition
@@ -205,6 +208,7 @@ if \(\delta < 1\).
*** Indistinguishability
:PROPERTIES:
:CUSTOM_ID: indistinguishability
+ :ID: 7875ad81-326b-4eaa-a3ae-9e09df96ea1b
:END:
To understand \((\epsilon, \delta)\)-dp, it is helpful to study
\((\epsilon, \delta)\)-indistinguishability.
@@ -535,6 +539,7 @@ The rest of the proof is almost the same as the proof of Claim 4.
*** Back to approximate differential privacy
:PROPERTIES:
:CUSTOM_ID: back-to-approximate-differential-privacy
+ :ID: 706c037d-ea44-4ade-8007-7f1f41d394e8
:END:
By Claims 0 and 1 we have
@@ -741,6 +746,7 @@ proof of Theorem A.1.
** Composition theorems
:PROPERTIES:
:CUSTOM_ID: composition-theorems
+ :ID: b672a060-d886-4f07-92d2-1d92f5f4c0c8
:END:
So far we have seen how a mechanism made of a single query plus a noise
can be proved to be differentially private. But we need to understand
@@ -1108,6 +1114,7 @@ The rest is the same as in the proof of Claim 17. \(\square\)
** Subsampling
:PROPERTIES:
:CUSTOM_ID: subsampling
+ :ID: eeda51d4-9370-49c6-9710-9c9ab88f91e2
:END:
Stochastic gradient descent is like gradient descent, but with random
subsampling.
@@ -1257,6 +1264,7 @@ guarantee for DP-SGD, among other things.
** References
:PROPERTIES:
:CUSTOM_ID: references
+ :ID: 65686625-6bd1-4e42-b00d-5f1744945884
:END:
- Abadi, Martín, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya
diff --git a/posts/2019-03-14-great-but-manageable-expectations.org b/posts/2019-03-14-great-but-manageable-expectations.org
index 6438090..68e757a 100644
--- a/posts/2019-03-14-great-but-manageable-expectations.org
+++ b/posts/2019-03-14-great-but-manageable-expectations.org
@@ -26,11 +26,12 @@ privacy guarantees for composed subsampling queries in general, and for
DP-SGD in particular. I also compare these privacy guarantees.
/If you are confused by any notations, ask me or try
-[[/notations.html][this]]./
+[[file:/notations.html][this]]./
** Rényi divergence and differential privacy
:PROPERTIES:
:CUSTOM_ID: rényi-divergence-and-differential-privacy
+ :ID: d1763dea-5e8f-4393-8f14-1d781147dcb5
:END:
Recall in the proof of Gaussian mechanism privacy guarantee (Claim 8) we
used the Chernoff bound for the Gaussian noise. Why not use the Chernoff
@@ -161,6 +162,7 @@ considering Rényi dp.
*** Moment Composition
:PROPERTIES:
:CUSTOM_ID: moment-composition
+ :ID: d5e94e5a-236d-4c41-96a4-4a93341f249a
:END:
*Claim 22 (Moment Composition Theorem)*. Let \(M\) be the adaptive
composition of \(M_{1 : k}\). Suppose for any \(y_{< i}\), \(M_i(y_{< i})\) is
@@ -228,6 +230,7 @@ the Advanced Composition Theorem (Claim 18).
*** Subsampling
:PROPERTIES:
:CUSTOM_ID: subsampling
+ :ID: 25cd27ac-fcb6-462f-9a3b-da861124d7b2
:END:
We also have a subsampling theorem for the Rényi dp.
@@ -330,6 +333,7 @@ assumptions.
** ACGMMTZ16
:PROPERTIES:
:CUSTOM_ID: acgmmtz16
+ :ID: 8b85cce3-01ad-4404-80c0-b73076d183a9
:END:
What follows is my understanding of this result. I call it a conjecture
because there is a gap: I am not able to reproduce their proof or
@@ -597,6 +601,7 @@ true, for the following reasons:
** Tensorflow implementation
:PROPERTIES:
:CUSTOM_ID: tensorflow-implementation
+ :ID: f856ad67-4f78-46b4-8c98-fda07a0dc670
:END:
The DP-SGD is implemented in
[[https://github.com/tensorflow/privacy][TensorFlow Privacy]]. In the
@@ -650,6 +655,7 @@ automatically computed given a DP-SGD instance.
** Comparison among different methods
:PROPERTIES:
:CUSTOM_ID: comparison-among-different-methods
+ :ID: 30502f53-d9ba-48ea-868a-dd4db995a6d4
:END:
So far we have seen three routes to compute the privacy guarantees for
DP-SGD with the Gaussian mechanism:
@@ -795,6 +801,7 @@ achieve the result in Route 3.
** Further questions
:PROPERTIES:
:CUSTOM_ID: further-questions
+ :ID: 277e8a8c-cc34-4ba9-84fb-d8950f6dc9de
:END:
Here is a list of what I think may be interesting topics or potential
problems to look at, with no guarantee that they are all awesome
@@ -816,6 +823,7 @@ untouched research problems:
** References
:PROPERTIES:
:CUSTOM_ID: references
+ :ID: 708aa715-dc2c-49ac-b7bb-f85ac168d8b3
:END:
- Abadi, Martín, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya