From c09000791216e57a3431b0c161725910b0bd9072 Mon Sep 17 00:00:00 2001
From: Yuchen Pei
Date: Thu, 21 Mar 2019 14:30:38 +0100
Subject: reddit peer review

---
 posts/2019-03-13-a-tail-of-two-densities.md | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/posts/2019-03-13-a-tail-of-two-densities.md b/posts/2019-03-13-a-tail-of-two-densities.md
index 08979fb..fd61440 100644
--- a/posts/2019-03-13-a-tail-of-two-densities.md
+++ b/posts/2019-03-13-a-tail-of-two-densities.md
@@ -6,10 +6,16 @@ comments: true
 ---
 
 This is Part 1 of a two-part post where I give an introduction to
-differential privacy, a study of [tail bounds](https://en.wikipedia.org/wiki/Concentration_inequality)
+the mathematics of differential privacy.
+Practically speaking, [differential privacy](https://en.wikipedia.org/wiki/Differential_privacy)
+is a technique for perturbing database queries so that query results do not leak information.
+
+This post, however, focuses on the mathematical aspects of differential privacy, which is
+a study of [tail bounds](https://en.wikipedia.org/wiki/Concentration_inequality)
 of the divergence between two probability measures, with the end goal of
 applying it to [stochastic gradient
 descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent).
+It should be suitable for anyone familiar with probability theory.
 
 I start with the definition of $\epsilon$-differential privacy
 (corresponding to max divergence), followed by
@@ -45,12 +51,6 @@ Finally I use the results from both Part 1 and Part 2 to obtain
 some privacy guarantees for composed subsampling queries in general, and for
 DP-SGD in particular. I also compare these privacy guarantees.
 
-This post focuses on the mathematics of differential privacy, and should be
-suitable for anyone familiar with probability theory.
-For how the subject discussed in this post is related to privacy,
-check out the [Wikipedia entry](https://en.wikipedia.org/wiki/Differential_privacy)
-or [Dwork-Roth 2013](https://www.cis.upenn.edu/~aaroth/privacybook.html).
-
 **Acknowledgement**. I would like to thank [Stockholm AI](http://stockholm.ai)
 for introducing me to the subject of differential privacy. Thanks to (in
 chronological order) Reynaldo
@@ -101,11 +101,11 @@ by adding noise to the gradients.
 
 Now if you have an hour\...
 
 [^dv]: For those who have read about differential privacy and never heard
-of the term "divergence variable", it is closely related to the notion of "privacy loss",
+of the term \"divergence variable\", it is closely related to the notion of \"privacy loss\";
 see the paragraph under Claim 6 in [Back to approximate differential privacy](#back-to-approximate-differential-privacy).
 I defined the term this way so that we can focus on the more general stuff:
 compared to the privacy loss $L(M(x) || M(x'))$, the term $L(p || q)$ removes
-the "distracting information" that $p$ and $q$ are related to databases,
+the \"distracting information\" that $p$ and $q$ are related to databases,
 queries, mechanisms etc., but merely probability laws. By removing the distraction, we simplify the analysis.
 And once we are done with the analysis of $L(p || q)$, we can apply the results obtained in the general setting to the special case
@@ -150,6 +150,10 @@ $S \subset \mathbb R^n$,
 $$\mathbb P(M(x) \in S) \le e^\epsilon \mathbb P(M(x') \in S).
 \qquad (1)$$
 
+Practically speaking, this means that given the results of a perturbed query on
+two known databases that differ by one row, it is hard to determine
+which result is from which database.
+
 An example of $\epsilon$-dp mechanism is the Laplace mechanism.
 
 **Definition**. The *Laplace distribution* over $\mathbb R$
--
cgit v1.2.3
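
To tie inequality (1) and the Laplace mechanism named in the last hunk together, here is a minimal Python sketch (my own illustration, not code from the post; the function name and toy databases are invented, and it assumes a real-valued query whose answer moves by at most `sensitivity` when one row changes):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Perturb a real-valued query answer with Laplace(sensitivity/epsilon) noise.

    If the query's answer changes by at most `sensitivity` when one
    database row changes (its L1-sensitivity), the output satisfies
    epsilon-differential privacy in the sense of inequality (1).
    """
    rng = rng or np.random.default_rng()
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query has sensitivity 1: adding or removing one row
# changes the count by at most 1.
x  = [1, 0, 1, 1, 0, 1]   # toy database
x_ = [1, 0, 1, 1, 0]      # neighbouring database: one row removed
print(laplace_mechanism(sum(x), sensitivity=1.0, epsilon=0.5))
print(laplace_mechanism(sum(x_), sensitivity=1.0, epsilon=0.5))
```

The two printed values are draws from distributions whose densities differ pointwise by a factor of at most $e^\epsilon$, which is the practical reading added in the last hunk: it is hard to tell which output came from which database.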