 posts/2019-01-03-discriminant-analysis.md | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/posts/2019-01-03-discriminant-analysis.md b/posts/2019-01-03-discriminant-analysis.md
index f019454..2aa3ba6 100644
--- a/posts/2019-01-03-discriminant-analysis.md
+++ b/posts/2019-01-03-discriminant-analysis.md
@@ -5,11 +5,11 @@ template: post
comments: true
---
-In this post I will talk about theory and implementation of linear and
+In this post I talk about the theory and implementation of linear and
quadratic discriminant analysis, classical methods in statistical
learning.
-[]{#Acknowledgement}**Acknowledgement**. Various sources were of great
+**Acknowledgement**. Various sources were of great
help to my understanding of the subject, including Chapter 4 of [The
Elements of Statistical
Learning](https://web.stanford.edu/~hastie/ElemStatLearn/), [Stanford
@@ -17,7 +17,9 @@ CS229 Lecture notes](http://cs229.stanford.edu/notes/cs229-notes2.pdf),
and [the scikit-learn
code](https://github.com/scikit-learn/scikit-learn/blob/7389dba/sklearn/discriminant_analysis.py).
-Theory {#Theory}
+This post is licensed under CC BY-SA and GNU FDL.
+
+Theory
------
Quadratic discriminant analysis (QDA) is a classical classification
@@ -62,7 +64,7 @@ Gaussian Naive Bayes is a different specialisation of QDA: it assumes
that all $\Sigma_i$ are diagonal, since all the features are assumed to
be independent.
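
To make the diagonal-covariance reading concrete, here is a minimal numpy
sketch (synthetic data and variable names of my own choosing, not the post's
code): score each class with a per-class mean, per-feature variance and a log
prior, and check the labels against scikit-learn's `GaussianNB`.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (60, 3)), rng.normal(2, 1, (60, 3))])
y = np.repeat([0, 1], 60)

# Diagonal-covariance QDA: per-class mean, per-class per-feature variance.
classes = np.unique(y)
log_joint = np.empty((len(X), len(classes)))
for i, c in enumerate(classes):
    Xc = X[y == c]
    mu, var = Xc.mean(axis=0), Xc.var(axis=0)
    prior = len(Xc) / len(X)
    log_joint[:, i] = (np.log(prior)
                       - 0.5 * np.sum(np.log(2 * np.pi * var))
                       - 0.5 * np.sum((X - mu) ** 2 / var, axis=1))

pred = classes[log_joint.argmax(axis=1)]
# GaussianNB makes the same independence assumption; its default variance
# smoothing is tiny, so the predictions should coincide on this data.
print((pred == GaussianNB().fit(X, y).predict(X)).mean())
```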
-### QDA {#QDA}
+### QDA
We look at QDA.
@@ -85,7 +87,7 @@ throw an exception.
This won't be a problem for LDA, though, unless there is only one
sample for each class.
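
As a sanity check on that claim, here is a toy snippet (made-up data and
names): a class with a single sample gives a rank-zero covariance estimate,
which is what breaks the per-class (QDA) estimate, while the pooled estimate
used by LDA can still have full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
y = np.array([0, 1, 1, 1, 1])           # class 0 has a single sample

# Per-class covariance of class 0: one observation, so it is identically zero.
cov0 = np.cov(X[y == 0], rowvar=False, bias=True)
print(np.linalg.matrix_rank(cov0))      # 0 -- singular, QDA cannot invert it

# The pooled within-class covariance uses all samples and has full rank here.
pooled = sum((y == c).sum() * np.cov(X[y == c], rowvar=False, bias=True)
             for c in np.unique(y)) / len(X)
print(np.linalg.matrix_rank(pooled))    # 2 -- fine for LDA
```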
-### Vanilla LDA {#Vanilla LDA}
+### Vanilla LDA
Now let us look at LDA.
@@ -116,7 +118,7 @@ This can be seen as applying a linear transformation to $X$ to turn its
covariance matrix into the identity. Thus the model becomes a sort of
nearest neighbour classifier.
-### Nearest neighbour classifier {#Nearest neighbour classifier}
+### Nearest neighbour classifier
More specifically, we want to transform the first term of (0) to a norm
to get a classifier based on nearest neighbour modulo $\log \pi_i$:
@@ -147,7 +149,7 @@ So we just need to make $A = D_x^{-1} V_x^T$. When it comes to
prediction, just transform $x$ with $A$, and find the nearest centroid
$A \mu_i$ (again, modulo $\log \pi_i$) and label the input with $i$.
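
A rough numpy sketch of this prediction rule follows. The way $V_x$ and
$D_x$ are built here (an SVD of the class-centred data, so that
$A = D_x^{-1} V_x^T$ whitens the pooled covariance) is my reading of the
notation above, not necessarily the exact scaling used elsewhere in the post.

```python
import numpy as np

def fit_lda_sphere(X, y):
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    Xc = X - means[np.searchsorted(classes, y)]    # centre each sample by its class mean
    # SVD of the centred data: pooled covariance is V_x diag(D_x**2) V_x^T.
    _, D_x, V_xt = np.linalg.svd(Xc / np.sqrt(len(y)), full_matrices=False)
    A = V_xt / D_x[:, None]                        # A = D_x^{-1} V_x^T
    return classes, priors, means, A

def predict(X, classes, priors, means, A):
    Z = X @ A.T                                    # transform inputs with A
    C = means @ A.T                                # transformed centroids A mu_i
    # Nearest centroid "modulo log pi_i": argmin ||Ax - A mu_i||^2 / 2 - log pi_i.
    d = 0.5 * ((Z[:, None, :] - C[None, :, :]) ** 2).sum(-1) - np.log(priors)
    return classes[d.argmin(axis=1)]
```

If the sketch is right, its predictions should agree with a reference LDA
implementation on non-degenerate data, which is a convenient way to test it.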
-### Dimensionality reduction {#Dimensionality reduction}
+### Dimensionality reduction
We can further simplify the prediction by dimensionality reduction.
Assume $n_c \le n$. Then the centroids span an affine space of dimension
@@ -174,7 +176,7 @@ will result in a lossy compression / regularisation equivalent to doing
analysis](https://en.wikipedia.org/wiki/Principal_component_analysis) on
$(M - \bar x) V_x D_x^{-1}$.
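
For what it's worth, here is how I would sketch that reduction in numpy:
sphere the centroids, take the top $p$ right singular vectors as the PCA
directions, and fold everything into one projection matrix. The shapes, the
extra centring of the sphered centroids, and the class weighting are my
assumptions, so treat this as an illustration of the idea rather than the
post's implementation.

```python
import numpy as np

def reduced_projection(M, xbar, V_x, D_x, p):
    """Projection sending x - xbar to p discriminant coordinates.

    M    : (n_classes, n_features) matrix of class centroids
    xbar : (n_features,) overall mean
    V_x  : (n_features, k) right singular vectors used for sphering
    D_x  : (k,) corresponding singular values
    """
    Mstar = (M - xbar) @ V_x / D_x        # sphered centroids (M - xbar) V_x D_x^{-1}
    Mstar = Mstar - Mstar.mean(axis=0)    # centre before the PCA step (my assumption)
    _, _, V_mt = np.linalg.svd(Mstar, full_matrices=False)
    return (V_x / D_x) @ V_mt[:p].T       # (x - xbar) @ P gives the p coordinates

# Usage sketch: with P = reduced_projection(...), classify by the nearest
# projected centroid, i.e. compare (x - xbar) @ P against (M - xbar) @ P.
```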
-### Fisher discriminant analysis {#Fisher discriminant analysis}
+### Fisher discriminant analysis
Fisher discriminant analysis involves finding an $n$-dimensional
vector $a$ that maximises between-class covariance with respect to
@@ -209,7 +211,7 @@ column of $V_m$.
Therefore, the solution to Fisher discriminant analysis is
$a = c V_x D_x^{-1} V_m$ with $p = 1$.
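
Since this closed form is easy to get wrong, here is a small numerical check
on synthetic data: compute $a = V_x D_x^{-1} v_m$ (first column of $V_m$)
from the two SVDs and compare its direction with the top generalised
eigenvector of the usual Fisher criterion $S_b a = \lambda S_w a$. The
scatter-matrix normalisations below are choices of mine (they only change
$a$ by the constant $c$), so this is a sketch, not the post's code.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (50, 3)),
                    rng.normal(1, 1, (50, 3)),
                    rng.normal(2, 1, (50, 3))])
y = np.repeat([0, 1, 2], 50)

classes = np.unique(y)
xbar = X.mean(axis=0)
M = np.stack([X[y == c].mean(axis=0) for c in classes])
Xc = X - M[y]                                     # centre each sample by its class mean
S_w = Xc.T @ Xc / len(X)                          # pooled within-class covariance
S_b = (M - xbar).T @ (M - xbar) / len(classes)    # between-class scatter (unweighted)

# Closed form from the two SVDs: a = V_x D_x^{-1} v_m.
_, D_x, V_xt = np.linalg.svd(Xc / np.sqrt(len(X)), full_matrices=False)
Mstar = (M - xbar) @ V_xt.T / D_x                 # sphered centroids
_, _, V_mt = np.linalg.svd(Mstar, full_matrices=False)
a_closed = (V_xt.T / D_x) @ V_mt[0]

# Reference: top generalised eigenvector of S_b a = lambda S_w a.
_, V = eigh(S_b, S_w)                             # eigenvalues in ascending order
a_eig = V[:, -1]

cos = np.dot(a_closed, a_eig) / (np.linalg.norm(a_closed) * np.linalg.norm(a_eig))
print(abs(cos))                                   # should be 1 up to numerical noise
```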
-### Linear model {#Linear model}
+### Linear model
The model is called linear discriminant analysis because it is a linear
model. To see this, let $B = V_m^T D_x^{-1} V_x^T$ be the matrix of
@@ -231,7 +233,7 @@ thus the decision boundaries are linear.
This is how scikit-learn implements LDA, by inheriting from
`LinearClassifierMixin` and redirecting the classification there.
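
To see the linearity concretely, a sketch like the following (placeholder
names; `B`, `means` and `priors` stand for whatever the fitting step
produced, so this is my own illustration, not scikit-learn's code) expands
the nearest-centroid score $-\tfrac12\lVert Bx - B\mu_i\rVert^2 + \log\pi_i$
and drops the $-\tfrac12\lVert Bx\rVert^2$ term common to all classes,
leaving per-class coefficients and intercepts, i.e. exactly the kind of
decision function a `LinearClassifierMixin`-style classifier works with.

```python
import numpy as np

def linear_form(B, means, priors):
    """Return (coef, intercept) such that scores = X @ coef.T + intercept."""
    Bm = means @ B.T                               # B mu_i, one row per class
    coef = Bm @ B                                  # row i is (B^T B mu_i)^T
    intercept = -0.5 * (Bm ** 2).sum(axis=1) + np.log(priors)
    return coef, intercept

def decision_function(X, coef, intercept):
    return X @ coef.T + intercept                  # linear in x

def predict(X, coef, intercept, classes):
    return classes[decision_function(X, coef, intercept).argmax(axis=1)]
```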
-Implementation {#Implementation}
+Implementation
--------------
This is where things get interesting. How do I validate my understanding
@@ -253,7 +255,7 @@ well-written though.
The result is
[here](https://github.com/ycpei/machine-learning/tree/master/discriminant-analysis).
-### Fun facts about LDA {#Fun facts about LDA}
+### Fun facts about LDA
One property that can be used to test the LDA implementation is the fact
that the scatter matrix $B(X - \bar x)^T (X - \bar x) B^T$ of the