author     Yuchen Pei <me@ypei.me>   2018-05-09 14:16:03 +0200
committer  Yuchen Pei <me@ypei.me>   2018-05-09 14:16:03 +0200
commit     ff0ab387f61ea0d35a73d599356794a41d694abb (patch)
tree       c86dba3d01df80a07445b515ce651738184db0d9 /microposts
parent     9b15030d9e410a94382616334bfce3db302ec76a (diff)
added a micropost
Diffstat (limited to 'microposts')
-rw-r--r--   microposts/neural-nets-activation.md   6
1 file changed, 6 insertions, 0 deletions
diff --git a/microposts/neural-nets-activation.md b/microposts/neural-nets-activation.md
new file mode 100644
index 0000000..a0d7a20
--- /dev/null
+++ b/microposts/neural-nets-activation.md
@@ -0,0 +1,6 @@
+---
+date: 2018-05-09
+---
+> What makes the rectified linear activation function better than the sigmoid or tanh functions? At present, we have a poor understanding of the answer to this question. Indeed, rectified linear units have only begun to be widely used in the past few years. The reason for that recent adoption is empirical: a few people tried rectified linear units, often on the basis of hunches or heuristic arguments. They got good results classifying benchmark data sets, and the practice has spread. In an ideal world we'd have a theory telling us which activation function to pick for which application. But at present we're a long way from such a world. I should not be at all surprised if further major improvements can be obtained by an even better choice of activation function. And I also expect that in coming decades a powerful theory of activation functions will be developed. Today, we still have to rely on poorly understood rules of thumb and experience.
+
+Michael Nielsen, [Neural networks and deep learning](http://neuralnetworksanddeeplearning.com/chap6.html#convolutional_neural_networks_in_practice)
\ No newline at end of file
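
For concreteness, here is a minimal sketch (in Python with NumPy; not part of the quoted book or of the commit above) of the three activation functions Nielsen compares:

```python
import numpy as np

# Illustrative definitions only: the activations mentioned in the quote.

def sigmoid(z):
    # Smooth squashing function mapping any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Hyperbolic tangent: like sigmoid but zero-centred, with range (-1, 1).
    return np.tanh(z)

def relu(z):
    # Rectified linear unit: identity for positive inputs, zero otherwise.
    return np.maximum(0.0, z)

if __name__ == "__main__":
    z = np.linspace(-3.0, 3.0, 7)
    print("z      :", z)
    print("sigmoid:", sigmoid(z))
    print("tanh   :", tanh(z))
    print("relu   :", relu(z))
```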