Diffstat (limited to 'microposts/random-forests.org')
-rw-r--r--  microposts/random-forests.org  24
1 file changed, 24 insertions, 0 deletions
diff --git a/microposts/random-forests.org b/microposts/random-forests.org
new file mode 100644
index 0000000..f52c176
--- /dev/null
+++ b/microposts/random-forests.org
@@ -0,0 +1,24 @@
+#+title: random-forests
+
+#+date: <2018-05-15>
+
+[[https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/info][Stanford
+Lagunita's statistical learning course]] has some excellent lectures on
+random forests. It starts with explanations of decision trees, followed
+by bagged trees and random forests, and ends with boosting. From these
+lectures it seems that:
+
+1. The term "predictors" in statistical learning = "features" in machine
+ learning.
+2. The main idea of random forests, dropping predictors for individual
+   trees and aggregating by majority vote or averaging, is the same as
+   the idea of dropout in neural networks, where a proportion of the
+   neurons in the hidden layers are dropped temporarily during
+   different minibatches of training, effectively averaging over an
+   ensemble of subnetworks. Both tricks are used for regularisation,
+   i.e. to reduce variance. The main difference is that in random
+   forests all but roughly the square root of the total number of
+   predictors are dropped at each split, whereas the dropout ratio in
+   neural networks is usually a half (see the sketch below).
+
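+As a back-of-the-envelope illustration (my own sketch, not from the
+course; the predictor count of 16 and the 0.5 keep probability are
+assumptions), the two kinds of dropping can be written in a few lines
+of numpy:
+
+#+begin_src python
+import numpy as np
+
+rng = np.random.default_rng(0)
+p = 16  # total number of predictors / hidden units
+
+# Random-forest-style dropping: a split considers only ~sqrt(p)
+# randomly chosen predictors and ignores the rest.
+m = int(np.sqrt(p))
+split_predictors = rng.choice(p, size=m, replace=False)
+
+# Dropout-style dropping: each hidden unit is kept with probability
+# 0.5 for the current minibatch.
+keep_prob = 0.5
+dropout_mask = rng.random(p) < keep_prob
+
+print("predictors considered at this split:", np.sort(split_predictors))
+print("hidden units kept this minibatch:", np.flatnonzero(dropout_mask))
+#+end_src
+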
+By the way, here's a comparison between statistical learning and machine
+learning from the slides of the Statistical Learning course: