#+title: rnn-fsm

#+date: <2018-05-11>

Related to [[neural-turing-machine][a previous micropost]].

[[http://www.cs.toronto.edu/~rgrosse/csc321/lec9.pdf][These slides from
Toronto]] are a nice introduction to RNNs (recurrent neural networks)
from a computational point of view. They state that an RNN can simulate
any FSM (finite state machine, a.k.a. finite automaton, abbr. FA),
illustrated with a toy example computing the parity of a binary string.
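That parity construction can be written down in a few lines. The sketch below is my own hand-wired version (the weights are chosen by hand, not learned, and the exact unit layout may differ from the one in the slides): the hidden state is a single bit holding the parity so far, and each step XORs the next input bit into it using two threshold units, since XOR alone is not linearly separable.

```python
def step(z):
    # Heaviside threshold activation
    return 1 if z > 0 else 0

def parity_rnn(bits):
    """Hand-wired RNN whose one-bit hidden state tracks parity."""
    h = 0  # hidden state: parity of the bits seen so far (0 = even)
    for x in bits:
        # XOR(x, h) built from two threshold units, because XOR is
        # not linearly separable and one unit cannot compute it:
        u_or = step(x + h - 0.5)    # fires if x OR h
        u_and = step(x + h - 1.5)   # fires if x AND h
        h = step(u_or - u_and - 0.5)  # (x OR h) AND NOT (x AND h) = XOR
    return h

parity_rnn([1, 0, 1, 1])  # three ones, so odd parity: 1
```

The point of the exercise is that a fixed, finite set of weights implements the FSM's transition function exactly, for inputs of any length.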

[[http://www.deeplearningbook.org/contents/rnn.html][Goodfellow et
al.'s book]] (see pages 372 and 374) goes one step further, stating
that an RNN with hidden-to-hidden connections can simulate Turing
machines, and not only that, but also the /universal/ Turing machine,
abbr. UTM (the book references
[[https://www.sciencedirect.com/science/article/pii/S0022000085710136][Siegelmann-Sontag]]),
a property not shared by the weaker network where the hidden-to-hidden
connections are replaced by output-to-hidden connections (page 376).

By the way, the RNN with hidden-to-hidden connections has the same
architecture as the so-called linear dynamical system mentioned in
[[https://www.coursera.org/learn/neural-networks/lecture/Fpa7y/modeling-sequences-a-brief-overview][Hinton's
video]], except that the RNN applies a nonlinearity to the state
update.
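A minimal sketch of that shared architecture, with arbitrary illustrative weights of my own choosing: both models update the hidden state as h_t = f(A h_{t-1} + B x_t), where the linear dynamical system takes f to be the identity and the RNN a nonlinearity such as tanh. With the identity, the map from input sequence to final state is linear, so superposition holds; the tanh breaks it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((3, 3))  # hidden-to-hidden weights
B = 0.5 * rng.standard_normal(3)       # input-to-hidden weights

def run(xs, nonlinear):
    """h_t = f(A h_{t-1} + B x_t); f = tanh for the RNN, identity for the LDS."""
    h = np.zeros(3)
    for x in xs:
        pre = A @ h + B * x
        h = np.tanh(pre) if nonlinear else pre
    return h

# Linear dynamical system: final state is linear in the inputs,
# e.g. run([1,0]) + run([0,1]) equals run([1,1]).
```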

From what I have learned, the universality of RNNs and that of
feedforward networks therefore rest on different arguments: the former
comes from simulating Turing machines, the latter from an analytical
view of approximation by step functions.
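For contrast, the feedforward side of that argument can be sketched directly. The construction below is a hypothetical illustration of my own (the target function and interval are arbitrary): a one-hidden-layer network of step units approximates a continuous function on an interval by summing the function's increments over a partition, with each hidden unit switching on at one partition point.

```python
import math

def step(z):
    return 1.0 if z > 0 else 0.0

def step_net(f, a, b, n):
    """One-hidden-layer network of n step units approximating f on [a, b].

    Unit i turns on at knot xs[i] and contributes the increment of f
    across the bin [xs[i], xs[i+1]]; the contributions telescope to a
    staircase that tracks f.
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    weights = [(xs[i], f(xs[i + 1]) - f(xs[i])) for i in range(n)]
    bias = f(a)

    def net(x):
        return bias + sum(w * step(x - t) for t, w in weights)

    return net

g = step_net(math.sin, 0.0, math.pi, 100)
# With 100 units the error is bounded by max|f'| * (b - a) / n, about 0.03 here.
```

Refining the partition shrinks the error, which is the step-function core of the universal approximation argument.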