#+title: neural-nets-activation

#+date: <2018-05-09>

#+begin_quote
  What makes the rectified linear activation function better than the
  sigmoid or tanh functions? At present, we have a poor understanding of
  the answer to this question. Indeed, rectified linear units have only
  begun to be widely used in the past few years. The reason for that
  recent adoption is empirical: a few people tried rectified linear
  units, often on the basis of hunches or heuristic arguments. They got
  good results classifying benchmark data sets, and the practice has
  spread. In an ideal world we'd have a theory telling us which
  activation function to pick for which application. But at present
  we're a long way from such a world. I should not be at all surprised
  if further major improvements can be obtained by an even better choice
  of activation function. And I also expect that in coming decades a
  powerful theory of activation functions will be developed. Today, we
  still have to rely on poorly understood rules of thumb and experience.
#+end_quote

Michael Nielsen,
[[http://neuralnetworksanddeeplearning.com/chap6.html#convolutional_neural_networks_in_practice][Neural
Networks and Deep Learning]]