#+title: learning-undecidable

#+date: <2019-01-27>

My take on the
[[https://www.nature.com/articles/s42256-018-0002-3][Nature paper
/Learning can be undecidable/]]:

Fantastic article, very clearly written.

So it reduces a kind of learnability called estimating the maximum
(EMX) to a question about the cardinality of subsets of the real
numbers (the continuum hypothesis), which is independent of the
standard axioms of set theory and hence undecidable.

When it comes to the relation between EMX and the rest of the machine
learning framework, the article mentions that EMX belongs to "extensions
of PAC learnability include Vapnik's statistical learning setting and
the equivalent general learning setting by Shalev-Shwartz and
colleagues" (I have no idea what these two things are), but it does not
say whether EMX is representative of or reduces to common learning
tasks. So it is not clear whether its undecidability applies to ML at
large.

Another condition of the main theorem is the union-bounded closure
assumption. It seems a reasonable property of a family of sets, but then
again I wonder how it translates to learning.

The article says "By now, we know of quite a few independence [from
mathematical axioms] results, mostly for set theoretic questions like
the continuum hypothesis, but also for results in algebra, analysis,
infinite combinatorics and more. Machine learning, so far, has escaped
this fate." but the description of the EMX learnability makes it more
like a classical mathematical / theoretical computer science problem
rather than machine learning.

An insightful conclusion: "How come learnability can neither be proved
nor refuted? A closer look reveals that the source of the problem is in
defining learnability as the existence of a learning function rather
than the existence of a learning algorithm. In contrast with the
existence of algorithms, the existence of functions over infinite
domains is a (logically) subtle issue."

In relation to practical problems, it uses an example of ad targeting.
However, a lot is lost in translation from the main theorem to this ad
example.

The EMX problem states: given a domain X, an unknown distribution P over
X, some samples drawn from P, and a family F of subsets of X, find an A
in F that approximately maximises P(A).
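
In my own notation (a paraphrase; the paper also quantifies over a
confidence parameter, which I omit), the goal is to output, from the
samples alone, some A in F with

\[
P(A) \ge \sup_{A' \in F} P(A') - \epsilon
\]

for a given tolerance \(\epsilon\).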

The undecidability rests on X being the continuous [0, 1] interval, and
from the insight quoted above, we know the problem comes from the
cardinality of subsets of the [0, 1] interval, which is "logically
subtle".

In the ad problem, the domain X is all potential visitors, which is
finite because there is a finite number of people in the world. In this
case P is a categorical distribution over {1..n}, where n is the
population of the world. One can get a good estimate of the parameters
of a categorical distribution by taking a sufficiently large number of
samples and computing the empirical distribution. Let's call the
estimated distribution Q. One can then choose from F (also finite) the
set A that maximises Q(A), which will be a solution to EMX.
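
To make this concrete, here is a minimal sketch of that argument in
Python, with a made-up population size and a hand-picked family F (both
hypothetical, purely for illustration):

#+begin_src python
import random
from collections import Counter

# Toy finite EMX instance: the domain is {0, ..., n-1} and F is a small
# hand-picked family of subsets (both made up for illustration).
n = 10
F = [{0, 1, 2}, {3, 4}, set(range(5, 10))]

# The "unknown" categorical distribution P over the domain.
weights = [random.random() for _ in range(n)]
P = [w / sum(weights) for w in weights]

# Draw samples from P and compute the empirical distribution Q.
k = 10_000
counts = Counter(random.choices(range(n), weights=P, k=k))
Q = [counts[x] / k for x in range(n)]

# Choose the set in F with the largest empirical mass Q(A); for large k
# this approximately maximises the true mass P(A), solving the instance.
best = max(F, key=lambda A: sum(Q[x] for x in A))
print("chosen set:    ", sorted(best))
print("empirical mass:", sum(Q[x] for x in best))
print("true mass:     ", sum(P[x] for x in best))
#+end_src

With enough samples, Q(A) concentrates around P(A) for every A in the
finite family F, so the argmax over Q is an approximate argmax over P.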

In other words, the theorem states that EMX is undecidable because not
all EMX instances are decidable: there are some nasty ones due to
infinities. That does not mean no EMX instance is decidable, and I think
the ad instance is decidable. Is there a learning task that actually
corresponds to an undecidable EMX instance? I don't know, but I will not
believe the result of this paper is useful until I see one.

h/t Reynaldo Boulogne