 posts/2019-01-03-discriminant-analysis.md | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/posts/2019-01-03-discriminant-analysis.md b/posts/2019-01-03-discriminant-analysis.md
index f019454..2aa3ba6 100644
--- a/posts/2019-01-03-discriminant-analysis.md
+++ b/posts/2019-01-03-discriminant-analysis.md
@@ -5,11 +5,11 @@ template: post
 comments: true
 ---
 
-In this post I will talk about theory and implementation of linear and
+In this post I talk about theory and implementation of linear and
 quadratic discriminant analysis, classical methods in statistical
 learning.
 
-[]{#Acknowledgement}**Acknowledgement**. Various sources were of great
+**Acknowledgement**. Various sources were of great
 help to my understanding of the subject, including Chapter 4 of [The
 Elements of Statistical
 Learning](https://web.stanford.edu/~hastie/ElemStatLearn/), [Stanford
@@ -17,7 +17,9 @@ CS229 Lecture notes](http://cs229.stanford.edu/notes/cs229-notes2.pdf),
 and [the scikit-learn
 code](https://github.com/scikit-learn/scikit-learn/blob/7389dba/sklearn/discriminant_analysis.py).
 
-Theory {#Theory}
+This post is licensed under CC BY-SA and GNU FDL.
+
+Theory
 ------
 
 Quadratic discriminant analysis (QDA) is a classical classification
@@ -62,7 +64,7 @@ Guassian Naive Bayes is a different specialisation of QDA: it assumes
 that all $\Sigma_i$ are diagonal, since all the features are assumed to
 be independent.
 
-### QDA {#QDA}
+### QDA
 
 We look at QDA.
 
@@ -85,7 +87,7 @@ throw an exception.
 This won\'t be a problem of the LDA, though, unless there is only one
 sample for each class.
 
-### Vanilla LDA {#Vanilla LDA}
+### Vanilla LDA
 
 Now let us look at LDA.
 
@@ -116,7 +118,7 @@ This can be seen as applying a linear transformation to $X$ to turn its
 covariance matrix to identity. And thus the model becomes a sort of a
 nearest neighbour classifier.
 
-### Nearest neighbour classifier {#Nearest neighbour classifier}
+### Nearest neighbour classifier
 
 More specifically, we want to transform the first term of (0) to a norm
 to get a classifier based on nearest neighbour modulo $\log \pi_i$:
@@ -147,7 +149,7 @@ So we just need to make $A = D_x^{-1} V_x^T$. When it comes to
 prediction, just transform $x$ with $A$, and find the nearest centroid
 $A \mu_i$ (again, modulo $\log \pi_i$) and label the input with $i$.
 
-### Dimensionality reduction {#Dimensionality reduction}
+### Dimensionality reduction
 
 We can further simplify the prediction by dimensionality reduction.
 Assume $n_c \le n$. Then the centroid spans an affine space of dimension
@@ -174,7 +176,7 @@ will result in a lossy compression / regularisation equivalent to doing
 analysis](https://en.wikipedia.org/wiki/Principal_component_analysis) on
 $(M - \bar x) V_x D_x^{-1}$.
 
-### Fisher discriminant analysis {#Fisher discriminant analysis}
+### Fisher discriminant analysis
 
 The Fisher discriminant analysis involves finding an $n$-dimensional
 vector $a$ that maximises between-class covariance with respect to
@@ -209,7 +211,7 @@ column of $V_m$.
 Therefore, the solution to Fisher discriminant analysis is
 $a = c V_x D_x^{-1} V_m$ with $p = 1$.
 
-### Linear model {#Linear model}
+### Linear model
 
 The model is called linear discriminant analysis because it is a linear
 model. To see this, let $B = V_m^T D_x^{-1} V_x^T$ be the matrix of
@@ -231,7 +233,7 @@ thus the decision boundaries are linear.
 This is how scikit-learn implements LDA, by inheriting from
 `LinearClassifierMixin` and redirecting the classification there.
 
-Implementation {#Implementation}
+Implementation
 --------------
 
 This is where things get interesting. How do I validate my understanding
@@ -253,7 +255,7 @@ well-written though.
 The result is
 [here](https://github.com/ycpei/machine-learning/tree/master/discriminant-analysis).
 
-### Fun facts about LDA {#Fun facts about LDA}
+### Fun facts about LDA
 
 One property that can be used to test the LDA implementation is the fact
 that the scatter matrix $B(X - \bar x)^T (X - \bar X) B^T$ of the
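Below is a minimal sketch, separate from the commit above, of the nearest-centroid formulation the hunks refer to: whiten with $A = D_x^{-1} V_x^T$, then assign each point to the nearest transformed centroid $A \mu_i$ modulo $\log \pi_i$. It assumes the pooled within-class covariance factors as $V_x D_x^2 V_x^T$, consistent with the remark that $A$ turns the covariance matrix into the identity. The helper name `lda_predict`, the synthetic data, and the agreement check against scikit-learn's `LinearDiscriminantAnalysis` (echoing the post's strategy of validating against scikit-learn) are illustrative assumptions, not the post's actual code, which lives at the repository linked in the last hunk.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_predict(X, y, X_new):
    """Nearest-centroid LDA sketch: whiten with A = D_x^{-1} V_x^T, then pick
    the class whose whitened centroid is closest, modulo log pi_i."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance, assumed to factor as V_x D_x^2 V_x^T.
    cov = sum(n * np.cov(X[y == c].T, bias=True) for c, n in zip(classes, counts)) / len(y)
    d2, V_x = np.linalg.eigh(cov)              # d2 holds the squared entries of D_x
    A = np.diag(1.0 / np.sqrt(d2)) @ V_x.T     # A = D_x^{-1} V_x^T
    Z, M = X_new @ A.T, means @ A.T            # whitened inputs and centroids
    scores = -0.5 * ((Z[:, None, :] - M[None, :, :]) ** 2).sum(-1) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

# Compare with scikit-learn on synthetic Gaussian classes; the two rules should
# agree on (nearly) every test point, up to small estimator differences.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 4))
y = rng.integers(0, 3, size=300)
X = centers[y] + rng.normal(size=(300, 4))
X_new = centers[rng.integers(0, 3, size=30)] + rng.normal(size=(30, 4))
sk = LinearDiscriminantAnalysis(solver="svd").fit(X, y).predict(X_new)
print((lda_predict(X, y, X_new) == sk).mean())
```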
