-rw-r--r-- | posts/2019-01-03-discriminant-analysis.md | 24 |
1 files changed, 13 insertions, 11 deletions
diff --git a/posts/2019-01-03-discriminant-analysis.md b/posts/2019-01-03-discriminant-analysis.md
index f019454..2aa3ba6 100644
--- a/posts/2019-01-03-discriminant-analysis.md
+++ b/posts/2019-01-03-discriminant-analysis.md
@@ -5,11 +5,11 @@ template: post
 comments: true
 ---
 
-In this post I will talk about theory and implementation of linear and
+In this post I talk about theory and implementation of linear and
 quadratic discriminant analysis, classical methods in statistical
 learning.
 
-[]{#Acknowledgement}**Acknowledgement**. Various sources were of great
+**Acknowledgement**. Various sources were of great
 help to my understanding of the subject, including Chapter 4 of
 [The Elements of Statistical Learning](https://web.stanford.edu/~hastie/ElemStatLearn/),
 [Stanford
@@ -17,7 +17,9 @@ CS229 Lecture
 notes](http://cs229.stanford.edu/notes/cs229-notes2.pdf), and
 [the scikit-learn code](https://github.com/scikit-learn/scikit-learn/blob/7389dba/sklearn/discriminant_analysis.py).
 
-Theory {#Theory}
+This post is licensed under CC BY-SA and GNU FDL.
+
+Theory
 ------
 
 Quadratic discriminant analysis (QDA) is a classical classification
@@ -62,7 +64,7 @@ Guassian Naive Bayes is a different specialisation of QDA: it assumes
 that all $\Sigma_i$ are diagonal, since all the features are assumed
 to be independent.
 
-### QDA {#QDA}
+### QDA
 
 We look at QDA.
 
@@ -85,7 +87,7 @@ throw an exception. This won\'t be a problem of the LDA, though,
 unless there is only one sample for each class.
 
-### Vanilla LDA {#Vanilla LDA}
+### Vanilla LDA
 
 Now let us look at LDA.
 
@@ -116,7 +118,7 @@ This can be seen as applying a linear transformation to $X$ to turn
 its covariance matrix to identity. And thus the model becomes a sort
 of a nearest neighbour classifier.
 
-### Nearest neighbour classifier {#Nearest neighbour classifier}
+### Nearest neighbour classifier
 
 More specifically, we want to transform the first term of (0) to a
 norm to get a classifier based on nearest neighbour modulo $\log \pi_i$:
@@ -147,7 +149,7 @@ So we just need to make $A = D_x^{-1} V_x^T$. When it comes to
 prediction, just transform $x$ with $A$, and find the nearest centroid
 $A \mu_i$ (again, modulo $\log \pi_i$) and label the input with $i$.
 
-### Dimensionality reduction {#Dimensionality reduction}
+### Dimensionality reduction
 
 We can further simplify the prediction by dimensionality reduction.
 Assume $n_c \le n$. Then the centroid spans an affine space of dimension
@@ -174,7 +176,7 @@ will result in a lossy compression / regularisation equivalent to doing
 analysis](https://en.wikipedia.org/wiki/Principal_component_analysis)
 on $(M - \bar x) V_x D_x^{-1}$.
 
-### Fisher discriminant analysis {#Fisher discriminant analysis}
+### Fisher discriminant analysis
 
 The Fisher discriminant analysis involves finding an $n$-dimensional
 vector $a$ that maximises between-class covariance with respect to
@@ -209,7 +211,7 @@ column of $V_m$. Therefore, the solution to Fisher discriminant
 analysis is $a = c V_x D_x^{-1} V_m$ with $p = 1$.
 
-### Linear model {#Linear model}
+### Linear model
 
 The model is called linear discriminant analysis because it is a
 linear model. To see this, let $B = V_m^T D_x^{-1} V_x^T$ be the matrix of
@@ -231,7 +233,7 @@ thus the decision boundaries are linear. This is how scikit-learn
 implements LDA, by inheriting from `LinearClassifierMixin` and
 redirecting the classification there.
 
-Implementation {#Implementation}
+Implementation
 --------------
 
 This is where things get interesting. How do I validate my understanding
@@ -253,7 +255,7 @@ well-written though. The result is
 [here](https://github.com/ycpei/machine-learning/tree/master/discriminant-analysis).
 
-### Fun facts about LDA {#Fun facts about LDA}
+### Fun facts about LDA
 
 One property that can be used to test the LDA implementation is the
 fact that the scatter matrix $B(X - \bar x)^T (X - \bar X) B^T$ of the
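
The diff above quotes the post only in fragments, but the LDA prediction rule visible in the hunks (transform an input $x$ with $A = D_x^{-1} V_x^T$, then pick the class whose transformed centroid $A \mu_i$ is nearest, modulo $\log \pi_i$) is easy to illustrate. Below is a minimal numpy sketch written for this page; it is not the code from the linked repository, the names `fit_lda` and `predict_lda` and the toy data are invented for illustration, and $D_x$ is taken here as the square root of the eigenvalue matrix of the pooled within-class covariance, which may differ from the post's exact convention.

```python
# Hypothetical sketch of LDA as a nearest-centroid rule after whitening.
# Not the post's implementation; names and data are made up for illustration.
import numpy as np

def fit_lda(X, y):
    """Estimate class means, priors and the whitening matrix A = D_x^{-1} V_x^T."""
    classes = np.unique(y)
    n, _ = X.shape
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    # Pooled within-class covariance: the shared Sigma assumed by LDA.
    centred = X - means[np.searchsorted(classes, y)]
    sigma = centred.T @ centred / (n - len(classes))
    # Eigendecomposition Sigma = V_x D_x^2 V_x^T with D_x = diag(sqrt(eigenvalues)),
    # so A = D_x^{-1} V_x^T maps the covariance to the identity.
    # (Assumes sigma is nonsingular, echoing the caveat in the hunks above.)
    eigvals, V_x = np.linalg.eigh(sigma)
    A = np.diag(1.0 / np.sqrt(eigvals)) @ V_x.T
    return classes, means, priors, A

def predict_lda(X, classes, means, priors, A):
    """Label each input with the nearest transformed centroid A mu_i, modulo log pi_i."""
    Z = X @ A.T          # each row is A x
    M = means @ A.T      # transformed centroids A mu_i
    scores = -0.5 * ((Z[:, None, :] - M[None, :, :]) ** 2).sum(axis=-1) + np.log(priors)
    return classes[scores.argmax(axis=1)]

# Toy data: two Gaussian classes sharing the identity covariance.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([2.0, 1.0], 1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)
classes, means, priors, A = fit_lda(X, y)
print((predict_lda(X, classes, means, priors, A) == y).mean())  # training accuracy
```

As a quick sanity check, the predicted labels can be compared against `sklearn.discriminant_analysis.LinearDiscriminantAnalysis` fitted on the same data, which, as one of the hunks notes, routes classification through `LinearClassifierMixin`.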