Diffstat (limited to 'microposts/random-forests.org')
-rw-r--r--  microposts/random-forests.org  24
1 file changed, 24 insertions, 0 deletions
diff --git a/microposts/random-forests.org b/microposts/random-forests.org
new file mode 100644
index 0000000..f52c176
--- /dev/null
+++ b/microposts/random-forests.org
@@ -0,0 +1,24 @@
+#+title: random-forests
+
+#+date: <2018-05-15>
+
+[[https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/info][Stanford
+Lagunita's statistical learning course]] has some excellent lectures on
+random forests. It starts with explanations of decision trees, followed
+by bagged trees and random forests, and ends with boosting. From these
+lectures it seems that:
+
+1. The term "predictors" in statistical learning = "features" in machine
+ learning.
+2. The main idea of random forests, dropping predictors for individual
+   trees and aggregating by majority vote or averaging, is the same as
+   the idea of dropout in neural networks, where a proportion of the
+   neurons in the hidden layers are dropped temporarily during
+   different minibatches of training, effectively averaging over an
+   ensemble of subnetworks. Both tricks are used for regularisation,
+   i.e. to reduce variance. The main difference is that in random
+   forests all but roughly the square root of the total number of
+   predictors are dropped at each split, whereas the dropout ratio in
+   neural networks is usually a half (see the sketch below).
+
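+As a back-of-the-envelope illustration (my own sketch, not from the
+course; the predictor count of 16 and the 0.5 keep probability are
+assumptions), the two kinds of dropping can be written in a few lines
+of numpy:
+
+#+begin_src python
+import numpy as np
+
+rng = np.random.default_rng(0)
+p = 16  # total number of predictors / hidden units
+
+# Random-forest-style dropping: a split considers only ~sqrt(p)
+# randomly chosen predictors and ignores the rest.
+m = int(np.sqrt(p))
+split_predictors = rng.choice(p, size=m, replace=False)
+
+# Dropout-style dropping: each hidden unit is kept with probability
+# 0.5 for the current minibatch.
+keep_prob = 0.5
+dropout_mask = rng.random(p) < keep_prob
+
+print("predictors considered at this split:", np.sort(split_predictors))
+print("hidden units kept this minibatch:", np.flatnonzero(dropout_mask))
+#+end_src
+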
+By the way, here's a comparison between statistical learning and machine
+learning from the slides of the Statistical Learning course: