---
date: 2018-05-09
---
> What makes the rectified linear activation function better than the sigmoid or tanh functions? At present, we have a poor understanding of the answer to this question. Indeed, rectified linear units have only begun to be widely used in the past few years. The reason for that recent adoption is empirical: a few people tried rectified linear units, often on the basis of hunches or heuristic arguments. They got good results classifying benchmark data sets, and the practice has spread. In an ideal world we'd have a theory telling us which activation function to pick for which application. But at present we're a long way from such a world. I should not be at all surprised if further major improvements can be obtained by an even better choice of activation function. And I also expect that in coming decades a powerful theory of activation functions will be developed. Today, we still have to rely on poorly understood rules of thumb and experience.

Michael Nielsen, [Neural networks and deep learning](http://neuralnetworksanddeeplearning.com/chap6.html#convolutional_neural_networks_in_practice)