Borel's paradox
In probability theory, the Borel–Kolmogorov paradox (sometimes known as Borel's paradox) is a paradox relating to conditional probability with respect to an event of probability zero (also known as a null set). It is named after Émile Borel and Andrey Kolmogorov.
The paradox lies in the fact that a conditional distribution with respect to such an event is ambiguous unless the event is viewed as an observation of a continuous random variable. Furthermore, the resulting distribution depends on how this random variable is defined.
A great circle puzzle
Suppose that a random variable has a uniform distribution on a sphere. What is its conditional distribution on a great circle? Because of the symmetry of the sphere, one might expect that the distribution is uniform and independent of the choice of coordinates. However, two analyses give contradictory results:
1. If the coordinates are chosen so that the great circle is an equator (latitude θ = 0), the conditional density for a longitude Φ defined on the interval (–π, π) is f(Φ | θ = 0) = 1/(2π).
2. If the great circle is a line of longitude with Φ = 0, the conditional density for θ on the interval (–π/2, π/2) is f(θ | Φ = 0) = (1/2) cos θ.
One distribution is uniform, the other is not. Yet both seem to be referring to the same great circle in different coordinate systems.
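The two analyses can be checked empirically. The following is a minimal Monte Carlo sketch, not part of the original article: it draws points uniformly on the unit sphere and approximates each conditioning event by a thin region of half-width `eps` (a ring around the equator, a wedge around the meridian Φ = 0). The function names, `eps`, the sample size, and the seed are all illustrative assumptions. It reports how much of the conditional mass falls in the middle half of each coordinate range, which would be 1/2 for a uniform conditional distribution.

```python
import math
import random


def sample_sphere(rng):
    """Return (latitude, longitude) of a point drawn uniformly from the unit sphere."""
    # For a uniform point on the sphere, z = sin(latitude) is uniform on [-1, 1].
    lat = math.asin(rng.uniform(-1.0, 1.0))
    lon = rng.uniform(-math.pi, math.pi)
    return lat, lon


def middle_half_fractions(n=400_000, eps=0.05, seed=0):
    """Estimate the conditional mass in the middle half of each great circle.

    Ring: keep points with |lat| < eps, report the fraction with |lon| < pi/2.
    Wedge: keep points with |lon| < eps, report the fraction with |lat| < pi/4.
    (n, eps and seed are illustrative choices, not values from the article.)
    """
    rng = random.Random(seed)
    ring_total = ring_hits = 0
    wedge_total = wedge_hits = 0
    for _ in range(n):
        lat, lon = sample_sphere(rng)
        if abs(lat) < eps:            # thin horizontal ring around theta = 0
            ring_total += 1
            if abs(lon) < math.pi / 2:
                ring_hits += 1
        if abs(lon) < eps:            # thin vertical wedge around Phi = 0
            wedge_total += 1
            if abs(lat) < math.pi / 4:
                wedge_hits += 1
    return ring_hits / ring_total, wedge_hits / wedge_total


if __name__ == "__main__":
    ring_frac, wedge_frac = middle_half_fractions()
    print(f"ring around theta = 0: {ring_frac:.3f}")
    print(f"wedge around Phi = 0:  {wedge_frac:.3f}")
```

Under these assumptions the ring fraction should come out close to 1/2, while the wedge fraction should be close to sin(π/4) ≈ 0.707, the value implied by a latitude density proportional to cos θ. The same sphere and the same sampler give two different answers depending on how the great circle is approximated.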
Explanation and implications
In case (1) above, the conditional probability that the longitude Φ lies in a set E given that θ = 0 can be written P(Φ ∈ E | θ = 0). Elementary probability theory suggests this can be computed as P(Φ ∈ E and θ = 0)/P(θ = 0), but that expression is not well-defined since P(θ = 0) = 0. Measure theory provides a way to define a conditional probability, using the family of events R_ab = {θ : a < θ < b}, which are horizontal rings consisting of all points with latitude between a and b. The events R_ab can be used to construct a function f_E(t) = P(Φ ∈ E | θ = t), which can then be evaluated at t = 0 to give P(Φ ∈ E | θ = 0). See conditional expectation for more information.
The resolution of the paradox is to notice that in case (2), P(θ ∈ F | Φ = 0) is defined using the events L_ab = {Φ : a < Φ < b}, which are vertical wedges (more precisely lunes), consisting of all points whose longitude lies between a and b. So although P(Φ | θ = 0) and P(θ | Φ = 0) each provide a probability distribution on a great circle, one of them is defined using rings, and the other using lunes. A thin ring has nearly constant width all the way around the equator, whereas a thin lune is widest at the equator and narrows to a point at each pole. Thus it is not surprising after all that P(Φ | θ = 0) and P(θ | Φ = 0) have different distributions.
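The role of the rings and lunes can be made concrete with exact areas. The sketch below is an illustration under stated assumptions rather than the article's own computation: it uses the standard area element cos θ dθ dΦ on the unit sphere, and the function names and the chosen sets E = (–π/2, π/2) and F = (–π/4, π/4) are hypothetical. It evaluates the ordinary conditional probability given a ring or a lune of shrinking half-width `eps`.

```python
import math


def prob_lon_given_ring(e_lo, e_hi, a, b):
    """P(e_lo < Phi < e_hi | a < theta < b) for a uniform point on the sphere."""
    # With area element cos(theta) dtheta dPhi, the area of a latitude band
    # times a longitude interval is (sin(b) - sin(a)) * (longitude width).
    ring_area = (math.sin(b) - math.sin(a)) * (2.0 * math.pi)
    joint_area = (math.sin(b) - math.sin(a)) * (e_hi - e_lo)
    return joint_area / ring_area


def prob_lat_given_lune(f_lo, f_hi, a, b):
    """P(f_lo < theta < f_hi | a < Phi < b) for a uniform point on the sphere."""
    # The full latitude range contributes sin(pi/2) - sin(-pi/2) = 2.
    lune_area = (math.sin(math.pi / 2) - math.sin(-math.pi / 2)) * (b - a)
    joint_area = (math.sin(f_hi) - math.sin(f_lo)) * (b - a)
    return joint_area / lune_area


if __name__ == "__main__":
    for eps in (0.5, 0.1, 0.001):
        ring = prob_lon_given_ring(-math.pi / 2, math.pi / 2, -eps, eps)
        lune = prob_lat_given_lune(-math.pi / 4, math.pi / 4, -eps, eps)
        print(f"eps={eps}: ring={ring:.4f}, lune={lune:.4f}")
```

Because the width of the conditioning region cancels in each ratio, the values do not change as `eps` shrinks, so the limits are the same numbers: a uniform share for the ring and a cos-weighted share for the lune.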
Further example
An implication is that conditional density functions are not invariant under coordinate transformation of the conditioning variable. Consider two continuous random variables (U, V) with joint density p_UV. Now, let W = V / g(U) for some positive-valued, continuous function g. By change of variables, the joint density of (U, W) is:
p_UW(u, w) = p_UV(u, w g(u)) g(u).
Note that W = 0 if and only if V = 0, so it would appear that the conditional distribution of U should be the same under each of these events. However:
p(u | V = 0) = p_UV(u, 0) / ∫ p_UV(u', 0) du',
whereas
p(u | W = 0) = p_UV(u, 0) g(u) / ∫ p_UV(u', 0) g(u') du',
which are not equal unless g is constant.
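A small simulation can illustrate this. The sketch below makes assumptions that are not in the article: U is uniform on (0, 1), V is independent and uniform on (–1, 1), and g(u) = 1 + u is a hypothetical choice of positive, non-constant function. The events V = 0 and W = 0 are approximated by |V| < `eps` and |W| < `eps`. Since the change of variables contributes a factor g(u), conditioning on W near 0 should weight u in proportion to 1 + u, giving a conditional mean of 5/9 instead of 1/2.

```python
import random


def g(u):
    """Hypothetical positive, continuous, non-constant scaling function."""
    return 1.0 + u


def conditional_means(n=400_000, eps=0.02, seed=1):
    """Estimate E[U | |V| < eps] and E[U | |W| < eps], where W = V / g(U).

    Assumes U ~ Uniform(0, 1) and V ~ Uniform(-1, 1), independent
    (illustrative choices, not taken from the article).
    """
    rng = random.Random(seed)
    sum_v, count_v = 0.0, 0
    sum_w, count_w = 0.0, 0
    for _ in range(n):
        u = rng.uniform(0.0, 1.0)
        v = rng.uniform(-1.0, 1.0)
        w = v / g(u)
        if abs(v) < eps:      # thin slab around V = 0
            sum_v += u
            count_v += 1
        if abs(w) < eps:      # thin slab around W = 0
            sum_w += u
            count_w += 1
    return sum_v / count_v, sum_w / count_w


if __name__ == "__main__":
    mean_given_v, mean_given_w = conditional_means()
    print(f"E[U | V near 0] = {mean_given_v:.3f}")
    print(f"E[U | W near 0] = {mean_given_w:.3f}")
```

Although V = 0 and W = 0 describe the same set of outcomes, the slab |W| < eps is wider in V where g(U) is large, so it collects more mass at large values of U.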