From c09000791216e57a3431b0c161725910b0bd9072 Mon Sep 17 00:00:00 2001
From: Yuchen Pei
Date: Thu, 21 Mar 2019 14:30:38 +0100
Subject: reddit peer review

---
 posts/2019-03-13-a-tail-of-two-densities.md | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/posts/2019-03-13-a-tail-of-two-densities.md b/posts/2019-03-13-a-tail-of-two-densities.md
index 08979fb..fd61440 100644
--- a/posts/2019-03-13-a-tail-of-two-densities.md
+++ b/posts/2019-03-13-a-tail-of-two-densities.md
@@ -6,10 +6,16 @@ comments: true
 ---
 
 This is Part 1 of a two-part post where I give an introduction to
-differential privacy, a study of [tail bounds](https://en.wikipedia.org/wiki/Concentration_inequality)
+the mathematics of differential privacy.
+Practically speaking, [differential privacy](https://en.wikipedia.org/wiki/Differential_privacy)
+is a technique for perturbing database queries so that query results do not leak information.
+
+This post, however, focuses on the mathematical aspects of differential privacy, which is
+a study of [tail bounds](https://en.wikipedia.org/wiki/Concentration_inequality)
 of the divergence between two probability measures, with the end goal of
 applying it to [stochastic gradient
 descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent).
+It should be suitable for anyone familiar with probability theory.
 
 I start with the definition of $\epsilon$-differential privacy
 (corresponding to max divergence), followed by
@@ -45,12 +51,6 @@ Finally I use the results from both Part 1 and Part 2 to obtain
 some privacy guarantees for composed subsampling queries in general, and for
 DP-SGD in particular. I also compare these privacy guarantees.
 
-This post focuses on the mathematics of differential privacy, and should be
-suitable for anyone familiar with probability theory.
-For how the subject discussed in this post is related to privacy,
-check out the [Wikipedia entry](https://en.wikipedia.org/wiki/Differential_privacy)
-or [Dwork-Roth 2013](https://www.cis.upenn.edu/~aaroth/privacybook.html).
-
 **Acknowledgement**. I would like to thank [Stockholm AI](http://stockholm.ai)
 for introducing me to the subject of differential privacy. Thanks to (in
 chronological order) Reynaldo
@@ -101,11 +101,11 @@ by adding noise to the gradients.
 
 Now if you have an hour\...
 
 [^dv]: For those who have read about differential privacy and never heard
-of the term "divergence variable", it is closely related to the notion of "privacy loss",
+of the term \"divergence variable\", it is closely related to the notion of \"privacy loss\";
 see the paragraph under Claim 6 in [Back to approximate differential privacy](#back-to-approximate-differential-privacy).
 I defined the term this way so that we can focus on the more general stuff:
 compared to the privacy loss $L(M(x) || M(x'))$, the term $L(p || q)$ removes
-the "distracting information" that $p$ and $q$ are related to databases,
+the \"distracting information\" that $p$ and $q$ are related to databases,
 queries, mechanisms etc., but merely probability laws. By removing the distraction, we simplify the analysis.
 And once we are done with the analysis of $L(p || q)$, we can apply the results obtained in the general setting to the special case
@@ -150,6 +150,10 @@ $S \subset \mathbb R^n$,
 $$\mathbb P(M(x) \in S) \le e^\epsilon \mathbb P(M(x') \in S).
 \qquad (1)$$
 
+Practically speaking, this means that given the results of a perturbed query on
+two known databases that differ by one row, it is hard to determine
+which result is from which database.
+
 An example of $\epsilon$-dp mechanism is the Laplace mechanism.
 
 **Definition**. The *Laplace distribution* over $\mathbb R$
--
cgit v1.2.3
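
To tie inequality (1) and the Laplace mechanism named in the last hunk together, here is a minimal Python sketch (my own illustration, not code from the post; the function name and toy databases are invented, and it assumes a real-valued query whose answer moves by at most `sensitivity` when one row changes):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Perturb a real-valued query answer with Laplace(sensitivity/epsilon) noise.

    If the query's answer changes by at most `sensitivity` when one
    database row changes (its L1-sensitivity), the output satisfies
    epsilon-differential privacy in the sense of inequality (1).
    """
    rng = rng or np.random.default_rng()
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query has sensitivity 1: adding or removing one row
# changes the count by at most 1.
x  = [1, 0, 1, 1, 0, 1]   # toy database
x_ = [1, 0, 1, 1, 0]      # neighbouring database: one row removed
print(laplace_mechanism(sum(x), sensitivity=1.0, epsilon=0.5))
print(laplace_mechanism(sum(x_), sensitivity=1.0, epsilon=0.5))
```

The two printed values are draws from distributions whose densities differ pointwise by a factor of at most $e^\epsilon$, which is the practical reading added in the last hunk: it is hard to tell which output came from which database.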