
Uniform convergence (combinatorics)
For a class of predicates H defined on a set X and a set of samples x = (x_1, x_2, \dots, x_m), where x_i \in X, the empirical frequency of h \in H on x is \widehat{Q}_x(h) = \frac{1}{m}\,|\{i : 1 \le i \le m,\ h(x_i) = 1\}|. The Uniform Convergence Theorem states, roughly, that if H is "simple" and we draw samples independently (with replacement) from X according to a distribution P, then with high probability every empirical frequency is close to its expectation, where the expectation is given by Q_P(h) = P\{y \in X : h(y) = 1\}. Here "simple" means that the Vapnik–Chervonenkis dimension of the class H is small relative to the size of the sample.
In other words, a sufficiently simple collection of functions behaves roughly the same on a small random sample as it does on the distribution as a whole.
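As a quick illustration of these two quantities, the following sketch (our own minimal example, assuming X = [0, 1) under the uniform distribution and a single predicate h(y) = 1 iff y < 0.25, so that Q_P(h) = 0.25) compares the empirical frequency on a random sample with its expectation:

import random

def empirical_frequency(h, sample):
    # \widehat{Q}_x(h): the fraction of sample points on which the predicate h equals 1.
    return sum(h(x_i) for x_i in sample) / len(sample)

h = lambda y: y < 0.25                              # Q_P(h) = 0.25 under the uniform distribution
for m in (10, 100, 10_000):
    sample = [random.random() for _ in range(m)]    # m i.i.d. draws from P
    print(m, empirical_frequency(h, sample))

As m grows, the printed empirical frequencies concentrate around 0.25; the theorem below quantifies how this happens simultaneously for every predicate in a class of small VC dimension.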
Uniform convergence theorem statement
If H is a set of \{0,1\}-valued functions defined on a set X and P is a probability distribution on X, then for \varepsilon > 0 and a positive integer m we have
- P^m\{\,|Q_P(h) - \widehat{Q}_x(h)| \ge \varepsilon \text{ for some } h \in H\,\} \le 4\,\Pi_H(2m)\,e^{-\varepsilon^2 m/8},
where, for any x \in X^m,
- \widehat{Q}_x(h) = \frac{1}{m}\,|\{i : 1 \le i \le m,\ h(x_i) = 1\}|,
- Q_P(h) = P\{y \in X : h(y) = 1\},
and |x| = m. P^m indicates that the probability is taken over x consisting of m i.i.d. draws from the distribution P.
Here \Pi_H is defined as follows: for any class H of \{0,1\}-valued functions over X and any D \subseteq X (identifying each h with the subset h^{-1}(1) of X),
- \Pi_H(D) = \{\,h \cap D : h \in H\,\}.
And for any natural number m, the shattering number \Pi_H(m) is defined as
- \Pi_H(m) = \max_{D \subseteq X,\ |D| = m} |\Pi_H(D)|.
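To get a feel for how the right-hand side of the theorem behaves, the short sketch below evaluates it numerically; it substitutes the (em/d)^d estimate from Sauer's lemma (stated in the next section) for \Pi_H(2m), and the function name vc_bound and the chosen parameters are ours, purely for illustration:

import math

def vc_bound(m, eps, d):
    # Right-hand side 4 * Pi_H(2m) * exp(-eps^2 * m / 8), with Pi_H(2m)
    # upper-bounded by (2em/d)^d via Sauer's lemma (valid once 2m >= d).
    growth = (2 * math.e * m / d) ** d
    return 4 * growth * math.exp(-eps ** 2 * m / 8)

for m in (10_000, 50_000, 100_000):
    print(m, vc_bound(m, eps=0.1, d=3))

The polynomial growth of \Pi_H(2m) is eventually overwhelmed by the exponentially decaying factor, so the bound becomes nontrivial (smaller than 1) once m is sufficiently large relative to d/\varepsilon^2.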
From the point of view of learning theory one can consider H to be the concept (or hypothesis) class defined over the instance set X. Before getting into the details of the proof of the theorem, we state Sauer's lemma, which we will need in the proof.
Sauer's lemma
Sauer's lemma relates the shattering number \Pi_H(m) to the VC dimension.
Lemma: \Pi_H(m) \le \sum_{i=0}^{d} \binom{m}{i} \le \left(\frac{em}{d}\right)^d (the second inequality for m \ge d), where d is the VC dimension of the concept class H.
Corollary: \Pi_H(m) = O(m^d); in particular, the shattering number grows only polynomially in m whenever the VC dimension is finite.
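As a small numerical check of the lemma (our own example, assuming the class of closed intervals [a, b] on the real line, whose VC dimension is d = 2), one can enumerate the labelings the class induces on a finite point set and compare the count with the bounds:

import math

def interval_labelings(points):
    # Distinct {0,1}-labelings induced on `points` by closed intervals [a, b]:
    # this is |Pi_H(D)| for D = set(points) and H = intervals on the real line.
    pts = sorted(points)
    labelings = {tuple(0 for _ in pts)}          # the all-zero labeling (empty intersection)
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            a, b = pts[i], pts[j]
            labelings.add(tuple(int(a <= p <= b) for p in pts))
    return labelings

points = [0.5, 1.3, 2.7, 4.1, 5.9, 7.2]
m, d = len(points), 2
print(len(interval_labelings(points)))                # 22 induced labelings
print(sum(math.comb(m, i) for i in range(d + 1)))     # Sauer's bound: 1 + 6 + 15 = 22
print((math.e * m / d) ** d)                          # the weaker (em/d)^d form, about 66.5

Here the binomial-sum bound is attained exactly; intervals are an example of a class for which Sauer's lemma is tight.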

Proof of uniform convergence theorem
Before we get into the details of the proof of the Uniform Convergence Theorem, we present a high-level overview of the proof.
- Symmetrization: We transform the problem of analyzing |Q_P(h) - \widehat{Q}_x(h)| \ge \varepsilon into the problem of analyzing |\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2, where r and s are i.i.d. samples of size m drawn according to the distribution P. One can view r as the original randomly drawn sample of length m, while s may be thought of as a testing sample which is used to estimate Q_P(h).
- Permutation: Since r and s are picked identically and independently, swapping elements between them does not change the probability distribution on r and s. So, we will try to bound the probability of |\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2 for some h \in H by considering the effect of a specific collection of permutations of the joint sample x = r||s. Specifically, we consider permutations \sigma(x) which swap x_i and x_{m+i} for i in some subset of \{1, 2, \dots, m\}. The symbol r||s means the concatenation of r and s.
- Reduction to a finite class: We can now restrict the function class H to the fixed joint sample, and hence, if H has finite VC dimension, the problem reduces to one involving a finite function class.
We present the technical details of the proof.
Symmetrization
Lemma: Let V = \{x \in X^m : |Q_P(h) - \widehat{Q}_x(h)| \ge \varepsilon \text{ for some } h \in H\} and
- R = \{(r, s) \in X^m \times X^m : |\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2 \text{ for some } h \in H\}.
Then for m \ge \frac{2}{\varepsilon^2}, we have P^m(V) \le 2\,P^{2m}(R).
Proof:
By the triangle inequality, if |Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon and |Q_P(h) - \widehat{Q}_s(h)| \le \varepsilon/2 then |\widehat{Q}_r(h) - \widehat{Q}_s(h)| \ge \varepsilon/2.
Therefore,
- P^{2m}(R) \ge P^{2m}\{\exists h \in H :\ |Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon \text{ and } |Q_P(h) - \widehat{Q}_s(h)| \le \varepsilon/2\}
- \qquad \ge \int_V P^m\{s :\ \exists h \in H,\ |Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon \text{ and } |Q_P(h) - \widehat{Q}_s(h)| \le \varepsilon/2\}\, dP^m(r)
[since r and s are independent].
Now for r \in V fix an h \in H such that |Q_P(h) - \widehat{Q}_r(h)| \ge \varepsilon. For this h, we shall show that
- P^m\{|Q_P(h) - \widehat{Q}_s(h)| \le \varepsilon/2\} \ge \frac{1}{2},
so that the integrand above is at least \frac{1}{2} for every r \in V. Thus P^{2m}(R) \ge \frac{1}{2}\,P^m(V), and hence we perform the first step of our high-level idea.
Notice that m\,\widehat{Q}_s(h) is a binomial random variable with expectation m\,Q_P(h) and variance m\,Q_P(h)(1 - Q_P(h)). By Chebyshev's inequality we get
- P^m\{|Q_P(h) - \widehat{Q}_s(h)| > \varepsilon/2\} \le \frac{m\,Q_P(h)(1 - Q_P(h))}{(m\varepsilon/2)^2} \le \frac{1}{\varepsilon^2 m} \le \frac{1}{2}
for the mentioned bound on m. Here we use the fact that x(1 - x) \le \frac{1}{4} for any x.
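The Chebyshev step can be checked numerically. In the sketch below (our own illustration; the helper name chebyshev_step and the parameter choices are assumptions, with h modelled as a single fixed predicate of true frequency p), the simulated deviation probability is printed next to the 1/(\varepsilon^2 m) bound used above:

import math, random

def chebyshev_step(p, eps, m, trials=20_000):
    # Estimate P^m{ |Q_P(h) - Q_hat_s(h)| > eps/2 } for a fixed predicate h with
    # Q_P(h) = p, and return it together with the Chebyshev bound 1/(eps^2 * m).
    bad = 0
    for _ in range(trials):
        q_hat = sum(random.random() < p for _ in range(m)) / m
        if abs(p - q_hat) > eps / 2:
            bad += 1
    return bad / trials, 1 / (eps ** 2 * m)

eps = 0.1
m = math.ceil(2 / eps ** 2)     # the smallest sample size covered by the lemma
print(chebyshev_step(p=0.3, eps=eps, m=m))

For m \ge 2/\varepsilon^2 the bound on the right is at most 1/2, which is exactly what the symmetrization argument needs.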
Permutations
Let \Gamma_m be the set of all permutations of \{1, 2, \dots, 2m\} that swap i and m + i, for all i in some subset of \{1, 2, \dots, m\}.
Lemma: Let R be any subset of X^{2m} and P any probability distribution on X. Then,
- P^{2m}(R) = E\big[\Pr[\sigma(x) \in R]\big] \le \max_{x \in X^{2m}} \Pr[\sigma(x) \in R],
where the expectation is over x chosen according to P^{2m}, and the probability is over \sigma chosen uniformly from \Gamma_m.
Proof:
For any \sigma \in \Gamma_m,
- P^{2m}(R) = P^{2m}\{x : \sigma(x) \in R\}
[since coordinate permutations preserve the product distribution P^{2m}]. Therefore,
- P^{2m}(R) = \frac{1}{|\Gamma_m|} \sum_{\sigma \in \Gamma_m} P^{2m}\{x : \sigma(x) \in R\}
- \qquad = \int_{X^{2m}} \frac{1}{|\Gamma_m|} \sum_{\sigma \in \Gamma_m} \mathbf{1}[\sigma(x) \in R]\, dP^{2m}(x) \quad [\text{because } \Gamma_m \text{ is finite}]
- \qquad = E\big[\Pr[\sigma(x) \in R]\big] \quad [\text{the expectation over } x]
- \qquad \le \max_{x \in X^{2m}} \Pr[\sigma(x) \in R].
The maximum is guaranteed to exist since there is only a finite set of values that probability under a random permutation can take.
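The invariance used in the first line of this proof can also be seen empirically. The sketch below (our own illustration; for simplicity it works directly with the 0/1 values h(x_i) of a single predicate, drawn i.i.d. Bernoulli(p)) draws \sigma uniformly from \Gamma_m and checks that a statistic of the joint sample has the same average with and without the permutation:

import random

def random_swap_permutation(m):
    # A uniform element of Gamma_m: each index i < m is swapped with m + i
    # independently with probability 1/2.
    perm = list(range(2 * m))
    for i in range(m):
        if random.random() < 0.5:
            perm[i], perm[m + i] = perm[m + i], perm[i]
    return perm

def gap(x, m):
    # |Q_hat_r(h) - Q_hat_s(h)| when x is the concatenation r || s of the h-values.
    return abs(sum(x[:m]) / m - sum(x[m:]) / m)

m, p, trials = 50, 0.4, 5_000
plain = permuted = 0.0
for _ in range(trials):
    x = [int(random.random() < p) for _ in range(2 * m)]   # 2m i.i.d. draws
    sigma = random_swap_permutation(m)
    plain += gap(x, m)
    permuted += gap([x[j] for j in sigma], m)
print(plain / trials, permuted / trials)   # the two averages agree up to simulation noise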
Reduction to a finite class
Lemma: Basing on the previous lemma,
- \max_{x \in X^{2m}} \Pr[\sigma(x) \in R] \le 2\,\Pi_H(2m)\,e^{-\varepsilon^2 m/8}.
Proof:
Let us define x = (x_1, x_2, \dots, x_{2m}) and let t = |H|_x|, which is at most \Pi_H(2m). This means there are functions h_1, h_2, \dots, h_t \in H such that for any h \in H there is a j between 1 and t with h_j(x_k) = h(x_k) for 1 \le k \le 2m.
We see that \sigma(x) \in R iff some h in H satisfies
- \left|\frac{1}{m}\Big(|\{1 \le i \le m : h(x_{\sigma(i)}) = 1\}| - |\{m + 1 \le i \le 2m : h(x_{\sigma(i)}) = 1\}|\Big)\right| \ge \frac{\varepsilon}{2}.
Hence, define w_i^j = 1 if h_j(x_i) = 1 and w_i^j = 0 otherwise, for 1 \le i \le 2m and 1 \le j \le t. Then \sigma(x) \in R iff some j in \{1, \dots, t\} satisfies
- \left|\frac{1}{m}\Big(\sum_{i=1}^{m} w_{\sigma(i)}^j - \sum_{i=1}^{m} w_{\sigma(m+i)}^j\Big)\right| \ge \frac{\varepsilon}{2}.
By the union bound we get
- \Pr[\sigma(x) \in R] \le t \cdot \max_{j} \Pr\left[\left|\frac{1}{m}\Big(\sum_{i=1}^{m} w_{\sigma(i)}^j - \sum_{i=1}^{m} w_{\sigma(m+i)}^j\Big)\right| \ge \frac{\varepsilon}{2}\right] \le \Pi_H(2m) \cdot \max_{j} \Pr\left[\left|\frac{1}{m}\Big(\sum_{i=1}^{m} w_{\sigma(i)}^j - \sum_{i=1}^{m} w_{\sigma(m+i)}^j\Big)\right| \ge \frac{\varepsilon}{2}\right].
Since the distribution over the permutations \sigma is uniform, for each i the difference w_{\sigma(i)}^j - w_{\sigma(m+i)}^j equals \pm|w_i^j - w_{m+i}^j|, with both signs equally likely.
Thus,
- \Pr\left[\left|\frac{1}{m}\sum_{i=1}^{m}\big(w_{\sigma(i)}^j - w_{\sigma(m+i)}^j\big)\right| \ge \frac{\varepsilon}{2}\right] = \Pr\left[\left|\frac{1}{m}\sum_{i=1}^{m}|w_i^j - w_{m+i}^j|\,\beta_i\right| \ge \frac{\varepsilon}{2}\right],
where the probability on the right is over the signs \beta_i chosen independently and uniformly from \{-1, +1\}, and both possibilities are equally likely. By Hoeffding's inequality, this is at most 2e^{-\varepsilon^2 m/8}.
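The final Hoeffding step can be illustrated numerically. In the sketch below (our own example; the name signed_sum_tail is an assumption, and the worst case a_i = |w_i^j - w_{m+i}^j| = 1 for every i is chosen deliberately), the simulated tail probability is printed together with the bound 2e^{-\varepsilon^2 m/8}:

import math, random

def signed_sum_tail(a, eps, trials=20_000):
    # Estimate Pr[ |(1/m) * sum_i a_i * beta_i| >= eps/2 ] over independent uniform
    # signs beta_i in {-1, +1}, and return it with Hoeffding's bound 2*exp(-m*eps^2/8).
    m = len(a)
    hits = 0
    for _ in range(trials):
        s = sum(a_i * random.choice((-1, 1)) for a_i in a)
        if abs(s / m) >= eps / 2:
            hits += 1
    return hits / trials, 2 * math.exp(-m * eps ** 2 / 8)

print(signed_sum_tail([1] * 400, eps=0.3))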

Finally, combining all three parts of the proof, we get the Uniform Convergence Theorem.
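Explicitly, with the constants as stated above, the three lemmas chain together as
- P^m(V) \le 2\,P^{2m}(R) \le 2 \max_{x \in X^{2m}} \Pr[\sigma(x) \in R] \le 2 \cdot 2\,\Pi_H(2m)\,e^{-\varepsilon^2 m/8} = 4\,\Pi_H(2m)\,e^{-\varepsilon^2 m/8}
for m \ge 2/\varepsilon^2; for smaller m the right-hand side already exceeds 1 (since \Pi_H(2m) \ge 1 and e^{-\varepsilon^2 m/8} \ge e^{-1/4}), so the stated bound holds trivially.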