<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Yuchen's Microblog</title>
<link rel="stylesheet" href="../assets/css/default.css" />
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>
<script src="../assets/js/analytics.js" type="text/javascript"></script>
</head>
<body>
<header>
<span class="logo">
<a href="microblog.html">Yuchen's Microblog</a>
</span>
<nav>
<a href="index.html">About</a><a href="microblog-feed.xml">Feed</a>
</nav>
</header>
<div class="main">
<div class="bodyitem">
<span id=rnn-fsm><p><a href="#rnn-fsm">2018-05-11</a></p></span>
<h3 id="some-notes-on-rnn-fsm-fa-tm-and-utm">Some notes on RNN, FSM / FA, TM and UTM</h3>
<p>Related to <a href="#neural-turing-machine">a previous micropost</a>.</p>
<p><a href="http://www.cs.toronto.edu/~rgrosse/csc321/lec9.pdf">These slides from Toronto</a> are a nice introduction to RNNs (recurrent neural networks) from a computational point of view. They state that an RNN can simulate any FSM (finite state machine, a.k.a. finite automaton, abbr. FA), illustrated with a toy example computing the parity of a binary string.</p>
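<p>To make the parity example concrete, here is a minimal sketch of a hand-wired recurrent network with hard-threshold units. The weights and the names <code>step</code> and <code>parity_rnn</code> are my own illustrative choices and are not taken from the slides; the state fed back at each time step is the parity of the bits seen so far.</p>

```python
def step(z):
    # Hard-threshold activation: 1 if the pre-activation is positive, else 0.
    return 1 if z > 0 else 0


def parity_rnn(bits):
    # y is the recurrent state: the parity of the bits consumed so far.
    y = 0
    for x in bits:
        h_or = step(x + y - 0.5)       # fires if at least one of x, y is 1
        h_and = step(x + y - 1.5)      # fires only if both x and y are 1
        y = step(h_or - h_and - 0.5)   # OR and not AND, i.e. XOR(x, y)
    return y


print(parity_rnn([1, 0, 1, 1]))  # prints 1 (three ones, odd parity)
```

<p>The two states of the parity automaton correspond to the two values of <code>y</code>, and the fixed weights implement its transition table, which is the sense in which the network simulates the FSM.</p>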
<p><a href="http://www.deeplearningbook.org/contents/rnn.html">Goodfellow et al.’s book</a> (see pages 372 and 374) goes one step further, stating that an RNN with a hidden-to-hidden layer can simulate Turing machines, and not only that, but also the <em>universal</em> Turing machine, abbr. UTM (the book references <a href="https://www.sciencedirect.com/science/article/pii/S0022000085710136">Siegelmann-Sontag</a>), a property not shared by the weaker network where the hidden-to-hidden layer is replaced by an output-to-hidden layer (page 376).</p>
<p>By the way, the RNN with a hidden-to-hidden layer has the same architecture as the so-called linear dynamical system mentioned in <a href="https://www.coursera.org/learn/neural-networks/lecture/Fpa7y/modeling-sequences-a-brief-overview">Hinton’s video</a>.</p>
<p>From what I have learned, the universality of RNNs and that of feedforward networks are therefore due to different arguments, the former coming from Turing machines and the latter from an analytical view of approximation by step functions.</p>
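<p>The step-function argument for feedforward networks can be sketched as follows. This is only an illustration under my own assumptions (one input, hard-threshold hidden units, and a hypothetical helper <code>step_network</code>), not a construction taken from any of the references above: each hidden unit switches on at a knot, and its outgoing weight is the jump of the target function there, so the output is a staircase that tracks the function.</p>

```python
import math


def step_network(f, a, b, n):
    # One hidden layer of n hard-threshold units on the interval [a, b].
    # Unit i switches on at knot t_i and carries the weight f(t_i) - f(t_{i-1}),
    # so the weighted sum is a piecewise-constant staircase following f.
    knots = [a + (b - a) * i / n for i in range(n + 1)]
    jumps = [f(knots[i]) - f(knots[i - 1]) for i in range(1, n + 1)]
    bias = f(knots[0])

    def net(x):
        hidden = [1.0 if x >= t else 0.0 for t in knots[1:]]
        return bias + sum(w * h for w, h in zip(jumps, hidden))

    return net


# Usage: approximate sin on [0, pi] with 200 hidden units.
net = step_network(math.sin, 0.0, math.pi, 200)
print(abs(net(1.0) - math.sin(1.0)))
```

<p>Increasing <code>n</code> shrinks the steps, and for a continuous target the error on the interval can be made as small as desired, which is the flavour of the approximation argument.</p>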
</div>
<div class="bodyitem">
<span id=math-writing-decoupling><p><a href="#math-writing-decoupling">2018-05-10</a></p></span>
<h3 id="writing-readable-mathematics-like-writing-an-operating-system">Writing readable mathematics like writing an operating system</h3>
<p>One way to write readable mathematics is to decouple concepts. One idea is the following template. First write a toy example with all the important components present in it, then analyse each component individually and elaborate on how (perhaps more complex) variations of the component can extend the toy example and induce more complex or powerful versions of it. Through such incremental development, one should be able to arrive at any result in cutting-edge research after a pleasant journey.</p>
<p>It’s a bit like the UNIX philosophy, where you have a basic system of modules like IO, memory management, graphics etc, and modify / improve each module individually (H/t <a href="http://nand2tetris.org/">NAND2Tetris</a>).</p>
<p>The book <a href="http://neuralnetworksanddeeplearning.com/">Neural networks and deep learning</a> by Michael Nielsen is an example of such an approach. It begins the journey with a very simple neural net with one hidden layer, no regularisation, and sigmoid activations. It then analyses each component individually, including cost functions, the back propagation algorithm, the activation functions, regularisation and the overall architecture (from fully connected to CNN), and improves the toy example incrementally. Over the course of the book, the accuracy of the example on MNIST grows incrementally from 95.42% to 99.67%.</p>
</div>
<div class="bodyitem">
<span id=neural-nets-activation><p><a href="#neural-nets-activation">2018-05-09</a></p></span>
<blockquote>
<p>What makes the rectified linear activation function better than the sigmoid or tanh functions? At present, we have a poor understanding of the answer to this question. Indeed, rectified linear units have only begun to be widely used in the past few years. The reason for that recent adoption is empirical: a few people tried rectified linear units, often on the basis of hunches or heuristic arguments. They got good results classifying benchmark data sets, and the practice has spread. In an ideal world we’d have a theory telling us which activation function to pick for which application. But at present we’re a long way from such a world. I should not be at all surprised if further major improvements can be obtained by an even better choice of activation function. And I also expect that in coming decades a powerful theory of activation functions will be developed. Today, we still have to rely on poorly understood rules of thumb and experience.</p>
</blockquote>
<p>Michael Nielsen, <a href="http://neuralnetworksanddeeplearning.com/chap6.html#convolutional_neural_networks_in_practice">Neural networks and deep learning</a></p>
</div>
<div class="bodyitem">
<span id=neural-turing-machine><p><a href="#neural-turing-machine">2018-05-09</a></p></span>
<blockquote>
<p>One way RNNs are currently being used is to connect neural networks more closely to traditional ways of thinking about algorithms, ways of thinking based on concepts such as Turing machines and (conventional) programming languages. <a href="https://arxiv.org/abs/1410.4615">A 2014 paper</a> developed an RNN which could take as input a character-by-character description of a (very, very simple!) Python program, and use that description to predict the output. Informally, the network is learning to “understand” certain Python programs. <a href="https://arxiv.org/abs/1410.5401">A second paper, also from 2014</a>, used RNNs as a starting point to develop what they called a neural Turing machine (NTM). This is a universal computer whose entire structure can be trained using gradient descent. They trained their NTM to infer algorithms for several simple problems, such as sorting and copying.</p>
<p>As it stands, these are extremely simple toy models. Learning to execute the Python program <code>print(398345+42598)</code> doesn’t make a network into a full-fledged Python interpreter! It’s not clear how much further it will be possible to push the ideas. Still, the results are intriguing. Historically, neural networks have done well at pattern recognition problems where conventional algorithmic approaches have trouble. Vice versa, conventional algorithmic approaches are good at solving problems that neural nets aren’t so good at. No-one today implements a web server or a database program using a neural network! It’d be great to develop unified models that integrate the strengths of both neural networks and more traditional approaches to algorithms. RNNs and ideas inspired by RNNs may help us do that.</p>
</blockquote>
<p>Michael Nielsen, <a href="http://neuralnetworksanddeeplearning.com/chap6.html#other_approaches_to_deep_neural_nets">Neural networks and deep learning</a></p>
</div>
<div class="bodyitem">
<span id=nlp-arxiv><p><a href="#nlp-arxiv">2018-05-08</a></p></span>
<p>Primer Science is a tool by a startup called Primer that uses NLP to summarize content (but not single papers, yet) on arXiv. A developer of this tool predicts in <a href="https://twimlai.com/twiml-talk-136-taming-arxiv-w-natural-language-processing-with-john-bohannon/#">an interview</a> that progress in AI’s ability to extract meaning from AI research papers will be the biggest accelerant of AI research.</p>
</div>
<div class="bodyitem">
<span id=neural-nets-regularization><p><a href="#neural-nets-regularization">2018-05-08</a></p></span>
<blockquote>
<p>no-one has yet developed an entirely convincing theoretical explanation for why regularization helps networks generalize. Indeed, researchers continue to write papers where they try different approaches to regularization, compare them to see which works better, and attempt to understand why different approaches work better or worse. And so you can view regularization as something of a kludge. While it often helps, we don’t have an entirely satisfactory systematic understanding of what’s going on, merely incomplete heuristics and rules of thumb.</p>
<p>There’s a deeper set of issues here, issues which go to the heart of science. It’s the question of how we generalize. Regularization may give us a computational magic wand that helps our networks generalize better, but it doesn’t give us a principled understanding of how generalization works, nor of what the best approach is.</p>
</blockquote>
<p>Michael Nielsen, <a href="http://neuralnetworksanddeeplearning.com/chap3.html#why_does_regularization_help_reduce_overfitting">Neural networks and deep learning</a></p>
</div>
<div class="bodyitem">
<span id=sql-injection-video><p><a href="#sql-injection-video">2018-05-08</a></p></span>
<p>Computerphile has some brilliant educational videos on computer science, like <a href="https://www.youtube.com/watch?v=ciNHn38EyRc">a demo of SQL injection</a>, <a href="https://www.youtube.com/watch?v=eis11j_iGMs">a toy example of the lambda calculus</a>, and <a href="https://www.youtube.com/watch?v=9T8A89jgeTI">explaining the Y combinator</a>.</p>
</div>
<div class="bodyitem">
<span id=learning-knowledge-graph-reddit-journal-club><p><a href="#learning-knowledge-graph-reddit-journal-club">2018-05-07</a></p></span>
<h3 id="learning-via-knowledge-graph-and-reddit-journal-clubs">Learning via knowledge graph and reddit journal clubs</h3>
<p>It is a natural idea to look for ways to learn things like going through a skill tree in a computer RPG.</p>
<p>For example I made a <a href="https://ypei.me/posts/2015-04-02-juggling-skill-tree.html">DAG for juggling</a>.</p>
<p>Websites like <a href="https://knowen.org">Knowen</a> and <a href="https://metacademy.org">Metacademy</a> explore this idea with added flavour of open collaboration.</p>
<p>The design of Metacademy looks quite promising. It also has a nice tagline: “your package manager for knowledge”.</p>
<p>There are so so many tools to assist learning / research / knowledge sharing today, and we should keep experimenting, in the hope that eventually one of them will scale.</p>
<p>On another note, I often complain about the lack of a place to discuss math research online, but today I found on Reddit some journal clubs on machine learning: <a href="https://www.reddit.com/r/MachineLearning/comments/8aluhs/d_machine_learning_wayr_what_are_you_reading_week/">1</a>, <a href="https://www.reddit.com/r/MachineLearning/comments/8elmd8/d_anyone_having_trouble_reading_a_particular/">2</a>. If only we had this for maths. On the other hand r/math does have some interesting recurring threads as well: <a href="https://www.reddit.com/r/math/wiki/everythingaboutx">Everything about X</a> and <a href="https://www.reddit.com/r/math/search?q=what+are+you+working+on?+author:automoderator+&sort=new&restrict_sr=on&t=all">What Are You Working On?</a>. Hopefully these threads can last for years to come.</p>
</div>
<div class="bodyitem">
<span id=simple-solution-lack-of-math-rendering><p><a href="#simple-solution-lack-of-math-rendering">2018-05-02</a></p></span>
<h3 id="pastebin-for-the-win">Pastebin for the win</h3>
<p>The lack of maths rendering in major online communication platforms like instant messaging, email or GitHub has been a minor obsession of mine for quite a while, as I saw it as a big factor preventing people from talking more maths online. But today I realised this is totally a non-issue. Just do what people on IRC have been doing since the inception of the universe: use a (LaTeX) pastebin.</p>
</div>
<div class="bodyitem">
<span id=neural-networks-programming-paradigm><p><a href="#neural-networks-programming-paradigm">2018-05-01</a></p></span>
<blockquote>
<p>Neural networks are one of the most beautiful programming paradigms ever invented. In the conventional approach to programming, we tell the computer what to do, breaking big problems up into many small, precisely defined tasks that the computer can easily perform. By contrast, in a neural network we don’t tell the computer how to solve our problem. Instead, it learns from observational data, figuring out its own solution to the problem at hand.</p>
</blockquote>
<p>Michael Nielsen - <a href="http://neuralnetworksanddeeplearning.com/about.html">What this book (Neural Networks and Deep Learning) is about</a></p>
<p>Unrelated to the quote, note that Nielsen’s book is licensed under <a href="https://creativecommons.org/licenses/by-nc/3.0/deed.en_GB">CC BY-NC</a>, so one can build on it and redistribute non-commercially.</p>
</div>
<div class="bodyitem">
<span id=google-search-not-ai><p><a href="#google-search-not-ai">2018-04-30</a></p></span>
<blockquote>
<p>But, users have learned to accommodate to Google not the other way around. We know what kinds of things we can type into Google and what we can’t and we keep our searches to things that Google is likely to help with. We know we are looking for texts and not answers to start a conversation with an entity that knows what we really need to talk about. People learn from conversation and Google can’t have one. It can pretend to have one using Siri but really those conversations tend to get tiresome when you are past asking about where to eat.</p>
</blockquote>
<p>Roger Schank - <a href="http://www.rogerschank.com/fraudulent-claims-made-by-IBM-about-Watson-and-AI">Fraudulent claims made by IBM about Watson and AI</a></p>
</div>
<div class="bodyitem">
<span id=hacker-ethics><p><a href="#hacker-ethics">2018-04-06</a></p></span>
<blockquote>
<ul>
<li>Access to computers—and anything that might teach you something about the way the world works—should be unlimited and total. Always yield to the Hands-On Imperative!</li>
<li>All information should be free.</li>
<li>Mistrust Authority—Promote Decentralization.</li>
<li>Hackers should be judged by their hacking, not bogus criteria such as degrees, age, race, or position.</li>
<li>You can create art and beauty on a computer.</li>
<li>Computers can change your life for the better.</li>
</ul>
</blockquote>
<p><a href="https://en.wikipedia.org/wiki/Hacker_ethic">The Hacker Ethic</a>, <a href="https://en.wikipedia.org/wiki/Hackers:_Heroes_of_the_Computer_Revolution">Hackers: Heroes of the Computer Revolution</a>, by Steven Levy</p>
</div>
<div class="bodyitem">
<span id=static-site-generator><p><a href="#static-site-generator">2018-03-23</a></p></span>
<blockquote>
<p>“Static site generators seem like music databases, in that everyone eventually writes their own crappy one that just barely scratches the itch they had (and I’m no exception).”</p>
</blockquote>
<p><a href="https://news.ycombinator.com/item?id=7747651">__david__@hackernews</a></p>
<p>So did I.</p>
</div>
</div>
</body>
</html>