Diffstat (limited to 'posts/2019-01-03-discriminant-analysis.org')
-rw-r--r--  posts/2019-01-03-discriminant-analysis.org | 9 +++++++++
1 file changed, 9 insertions(+), 0 deletions(-)
diff --git a/posts/2019-01-03-discriminant-analysis.org b/posts/2019-01-03-discriminant-analysis.org
index 34c16bf..a0ada73 100644
--- a/posts/2019-01-03-discriminant-analysis.org
+++ b/posts/2019-01-03-discriminant-analysis.org
@@ -23,6 +23,7 @@ under CC BY-SA and GNU FDL./
** Theory
:PROPERTIES:
:CUSTOM_ID: theory
+ :ID: 69be3baf-7f60-42f2-9184-ee8840eea554
:END:
Quadratic discriminant analysis (QDA) is a classical classification
algorithm. It assumes that the data is generated by Gaussian
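The hunk cuts off mid-sentence, but the generative assumption it names admits a short illustration. A minimal sketch of the QDA decision rule under that assumption; the names =mus=, =sigmas= and =pis= (per-class means, covariances and priors) are placeholders of my choosing, not taken from the post:

#+BEGIN_SRC python
import numpy as np
from scipy.stats import multivariate_normal

def qda_predict(x, mus, sigmas, pis):
    """Label x with the class i maximising log N(x; mu_i, Sigma_i) + log pi_i."""
    scores = [multivariate_normal.logpdf(x, mean=mu, cov=sigma) + np.log(pi)
              for mu, sigma, pi in zip(mus, sigmas, pis)]
    return int(np.argmax(scores))
#+END_SRC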
@@ -69,6 +70,7 @@ be independent.
*** QDA
:PROPERTIES:
:CUSTOM_ID: qda
+ :ID: f6e95892-01cf-4569-b01e-22ed238d0577
:END:
We look at QDA.
@@ -94,6 +96,7 @@ sample for each class.
*** Vanilla LDA
:PROPERTIES:
:CUSTOM_ID: vanilla-lda
+ :ID: 5a6ca0ca-f385-4054-9b19-9cac69b1a59a
:END:
Now let us look at LDA.
@@ -127,6 +130,7 @@ nearest neighbour classifier.
*** Nearest neighbour classifier
:PROPERTIES:
:CUSTOM_ID: nearest-neighbour-classifier
+ :ID: 8880764c-6fbe-4023-97dd-9711c7c50ea9
:END:
More specifically, we want to transform the first term of (0) to a norm
to get a classifier based on nearest neighbour modulo $\log \pi_i$:
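The transformed rule can be sketched directly. Assuming the whitening map $A$ and the class centroids $\mu_i$ from the derivation (passed in as =A=, =mus= and =log_pis=, names introduced here), prediction picks the nearest transformed centroid after correcting by the log prior:

#+BEGIN_SRC python
import numpy as np

def predict_nearest(x, A, mus, log_pis):
    # squared distance to each transformed centroid, offset by the log prior
    d = [0.5 * np.sum((A @ x - A @ mu) ** 2) - lp
         for mu, lp in zip(mus, log_pis)]
    return int(np.argmin(d))
#+END_SRC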
@@ -160,6 +164,7 @@ $A \mu_i$ (again, modulo $\log \pi_i$) and label the input with $i$.
*** Dimensionality reduction
:PROPERTIES:
:CUSTOM_ID: dimensionality-reduction
+ :ID: 70e1afc1-9c45-4a35-a842-48573e077b36
:END:
We can further simplify the prediction by dimensionality reduction.
Assume $n_c \le n$. Then the centroids span an affine space of dimension
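One way to realise this reduction, sketched under the assumption that the transformed centroids are stacked as rows of =Amus= (a name introduced here): the component of a point orthogonal to the centroids' span contributes the same constant to every distance, so projecting onto the span cannot change the argmin:

#+BEGIN_SRC python
import numpy as np

def centroid_subspace(Amus):
    """Orthonormal basis for the span of the centred transformed centroids."""
    centred = Amus - Amus.mean(axis=0)
    # right singular vectors with nonzero singular value span the subspace
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[s > 1e-10]
#+END_SRC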
@@ -195,6 +200,7 @@ words, the prediction does not change regardless of =n_components=.
*** Fisher discriminant analysis
:PROPERTIES:
:CUSTOM_ID: fisher-discriminant-analysis
+ :ID: 05ff25da-8c52-4f20-a0ac-4422f19e10ce
:END:
Fisher discriminant analysis involves finding an $n$-dimensional
vector $a$ that maximises between-class covariance with respect to
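Assuming the objective is the usual Fisher criterion (the Rayleigh quotient of between-class to within-class covariance, which the truncated sentence appears to set up), the maximiser can be sketched as a generalised eigenproblem; =S_b= and =S_w= are placeholder names for those two matrices:

#+BEGIN_SRC python
import numpy as np
from scipy.linalg import eigh

def fisher_direction(S_b, S_w):
    """Vector a maximising a^T S_b a / a^T S_w a, via S_b a = lambda S_w a."""
    vals, vecs = eigh(S_b, S_w)       # ascending generalised eigenvalues
    return vecs[:, np.argmax(vals)]
#+END_SRC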
@@ -232,6 +238,7 @@ $a = c V_x D_x^{-1} V_m$ with $p = 1$.
*** Linear model
:PROPERTIES:
:CUSTOM_ID: linear-model
+ :ID: feb827b6-0064-4192-b96b-86a942c8839e
:END:
The model is called linear discriminant analysis because it is a linear
model. To see this, let $B = V_m^T D_x^{-1} V_x^T$ be the matrix of
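The linearity can be made concrete. Expanding the nearest-centroid score from the earlier hunks with $B$ in place of $A$, the term quadratic in $x$ is shared by all classes and drops out of the comparison, leaving a score linear in $x$; a sketch assuming =B=, =mus= and =log_pis= as above:

#+BEGIN_SRC python
import numpy as np

def linear_form(B, mus, log_pis):
    """Per-class weights and biases making the class score linear in x."""
    G = B.T @ B
    coef = np.stack([G @ mu for mu in mus])              # one row per class
    intercept = np.array([lp - 0.5 * mu @ G @ mu
                          for mu, lp in zip(mus, log_pis)])
    return coef, intercept   # predict: argmax of x @ coef.T + intercept
#+END_SRC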
@@ -256,6 +263,7 @@ This is how scikit-learn implements LDA, by inheriting from
** Implementation
:PROPERTIES:
:CUSTOM_ID: implementation
+ :ID: b567283c-20ee-41a8-8216-7392066a5ac5
:END:
This is where things get interesting. How do I validate my understanding
of the theory? By implementing and testing the algorithm.
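A sketch of such a test, comparing a hand-rolled estimator against scikit-learn's reference on synthetic data; =MyLDA= is a hypothetical stand-in for the implementation under test, not a name from the post:

#+BEGIN_SRC python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
reference = LinearDiscriminantAnalysis().fit(X, y)
mine = MyLDA().fit(X, y)                 # hypothetical implementation
agreement = np.mean(mine.predict(X) == reference.predict(X))
print(f"prediction agreement: {agreement:.3f}")
#+END_SRC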
@@ -279,6 +287,7 @@ The result is
*** Fun facts about LDA
:PROPERTIES:
:CUSTOM_ID: fun-facts-about-lda
+ :ID: f1d47f43-27f6-49dd-bd0d-2e685c38e241
:END:
One property that can be used to test the LDA implementation is the fact
that the scatter matrix $B(X - \bar x)^T (X - \bar x) B^T$ of the
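The sentence is truncated here, but the named matrix is cheap to compute for such a test; a sketch assuming =B= and a sample matrix =X= with observations as rows, as in the post:

#+BEGIN_SRC python
import numpy as np

def transformed_scatter(B, X):
    """Scatter matrix B (X - xbar)^T (X - xbar) B^T of the transformed,
    centred sample."""
    centred = X - X.mean(axis=0)
    return B @ centred.T @ centred @ B.T
#+END_SRC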