
Probability notes

\(\sigma\)-field

A \(\sigma\)-field, \(\Sigma\), over a set \(\Omega\) is a subset of the power set of \(\Omega\) (ie it is a set of subsets of \(\Omega\)) with three properties:

  1. \(\Sigma\) contains \(\Omega\),
  2. \(\Sigma\) is closed under complements, and
  3. \(\Sigma\) is closed under countable unions.
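For example, for a single coin flip with \(\Omega = \{H, T\}\), both the trivial \(\sigma\)-field \(\{\emptyset, \Omega\}\) and the full power set \(\{\emptyset, \{H\}, \{T\}, \Omega\}\) satisfy these properties.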

Probability measure

Given a set \(\Omega\) and a \(\sigma\)-field \(\Sigma\) over it, a real-valued function \(\mu\) on \(\Sigma\) is a measure if

  1. \(\mu\) is non-negative,
  2. \(\mu(\emptyset) = 0\), and
  3. \(\mu\) is countably additive, ie the measure of a countable union of pairwise disjoint sets is the sum of their measures.

A probability measure is a measure where \(\mu(\Omega) = 1\).
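For example, with \(\Omega = \{H, T\}\) and \(\Sigma\) its power set, setting \(\mu(\{H\}) = \mu(\{T\}) = 1/2\) (and extending by additivity) defines a probability measure describing a fair coin.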

Conditional probability

For a probability measure \(P\), if \(P(B) > 0\), then the probability of \(A\) given \(B\) is defined by

\[ P(A\mid B) = \frac{P(A \cap B)}{P(B)} \]
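For example, rolling a fair die with \(A = \{6\}\) and \(B\) the event that the roll is even,

\[ P(A \mid B) = \frac{1/6}{1/2} = \frac{1}{3}. \]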

Poisson process

This is a continuous-time process which counts arrivals whose inter-arrival times are independent and exponentially distributed with rate \(\lambda\).

Properties

Theorem For a Poisson process with rate \(\lambda \geq 0\), the number of events in an interval of duration \(t\) is a Poisson random variable with mean \(\lambda t\).

Probability generating function

The master equations for this process are

\[ \frac{dp_0}{dt} = - \lambda p_0 \]

and for \(i>0\)

\[ \frac{dp_i}{dt} = \lambda p_{i-1} - \lambda p_{i} \]

Note that the rate at which the population increases is independent of the size of the population. We can multiply through by \(z^{i}\) and sum over \(i\) to get the following differential equation for the PGF, \(g(z,t) = \sum_{i} p_{i}(t) z^{i}\)

\[ \frac{\partial g}{\partial t} = \lambda (z - 1) g \]

which, considering the initial condition \(g(z,0)=1\) gives us the PGF

\[ g(z,t) = \exp\{ \lambda t (z - 1) \} \]

which is the PGF of a Poisson distribution with mean \(\lambda t\).
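As with the processes below, this is easy to check in Maxima (a small sketch, writing l for \(\lambda\)):

define(g(z,t), exp(l * t * (z - 1)));

/* the PGF satisfies dg/dt = l (z - 1) g with g(z,0) = 1 */
is(equal(diff(g(z,t), t), l * (z - 1) * g(z,t)));
is(equal(g(z,0), 1));

Both checks should return true.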

Probability generating function (PGF)

For a discrete random variable \(X\) taking values from the non-negative integers, the probability generating function is defined as

\[ G(z) = \mathbb{E}(z^{X}) = \sum_{x=0}^{\infty} f_X(x) z^{x} \]

where \(f_X\) is the probability mass function (PMF).
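Differentiating and evaluating at \(z = 1\) recovers the moments; in particular

\[ G'(1) = \sum_{x=0}^{\infty} x f_X(x) = \mathbb{E}X, \]

which is how the expected values are read off from the PGFs in these notes.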

Birth process

Consider a process where individuals give birth with rate \(\lambda\). In this case the rate at which the population increases in size depends on the current population size. The master equation for this is

\[ \frac{dp_i}{dt} = \lambda (i - 1) p_{i-1} - \lambda i p_i \]

for \(i\geq n\) where the process starts with a population of size \(n\): \(p_{n}(0) = 1\). Multiplying through by \(z^i\) and summing over \(i\) gives the following PDE for the PGF, \(g(z,t)\)

\[ \frac{\partial g}{\partial t} = \lambda (z^2 - z) \frac{\partial g}{\partial z}. \]

Given the initial condition, the boundary condition for the PDE is \(g(z,0) = z^n\). The method of characteristics looks suitable for solving this; the function, \(g\), will be constant along the curves satisfying

\[ \frac{dz}{dt} = \lambda (z - z^2). \]

(Hopefully you remember how to do partial fractions!) This leads us to conclude that the function is constant upon curves where

\[ C = \frac{z \exp\{ -\lambda t\}}{1 - z (1 - \exp\{ -\lambda t\})}. \]

So we know that \(g\) must be a function of the RHS of the previous equation. With the initial condition \(n=1\) (which is usually sufficient to consider, since by independence the solution for general \(n\) is the \(n\)th power of this one) that function is the identity, so

\[ g(z,t) = \frac{z \exp\{ -\lambda t\}}{1 - z (1 - \exp\{ -\lambda t\})}. \]

From this we can then see that

\[ \lim _{z\to 1} g_{z}(z,t) = \exp\{\lambda t\} \]

which agrees with the mean-field approximation. To check that this is actually a solution to the PDE, we can use the following little bit of Maxima

define(g(z,t), z * exp(-l * t) / (1 - z * (1 - exp(-l * t))));

/* check the PDE dg/dt = l z (z - 1) dg/dz and the initial condition g(z,0) = z */
is(equal(l * z * (z - 1) * diff(g(z,t), z), diff(g(z,t), t)));
is(equal(g(z,0), z));

Birth-death process

Consider a process where individuals give birth at a rate \(\lambda\) and die at a rate \(\mu\). The master equation for this is

\[ \frac{dp_i}{dt} = \lambda (i - 1) p_{i-1} - \lambda i p_{i} + \mu (i + 1) p_{i+1} - \mu i p_{i} \]

with the condition that \(p_i=0\) for \(i<0\). Multiplying through by \(z^i\) and summing over \(i\) gives the following PDE for the PGF, \(g(z,t)\)

\[ \frac{\partial g}{\partial t} = (1 - z) ( \mu - \lambda z) \frac{\partial g}{\partial z}. \]

Given the initial condition, the boundary condition for the PDE is \(g(z,0) = z^n\). The method of characteristics looks suitable for solving this; the function, \(g\), will be constant along the curves satisfying

\[ \frac{dz}{dt} = (1 - z) ( \mu - \lambda z). \]

(Hopefully you remember how to do partial fractions!) This leads us to conclude that the function is constant upon curves where

\[ C = \frac{(\lambda z - \mu) \exp\{ - (\lambda - \mu) t \} - \mu (z-1)}{(\lambda z - \mu) \exp\{ - (\lambda - \mu) t \} - \lambda (z-1)}. \]

So we know that the solution can only be a function of the RHS of the previous equation. Since this simplifies to \(z\) at \(t=0\), assuming the initial condition \(n=1\) we get the solution

\[ g(z,t) = \frac{(\lambda z - \mu) \exp\{ - (\lambda - \mu) t \} - \mu (z-1)}{(\lambda z - \mu) \exp\{ - (\lambda - \mu) t \} - \lambda (z-1)}. \]

We can then test this in Maxima with the following

/* the birth-death PGF, writing b for lambda (birth) and a for mu (death) */
define(g(z,t), ((b * z - a) * exp(- (b - a) * t) - a * (z - 1)) / ((b * z - a) * exp(- (b - a) * t) - b * (z - 1)));

/* check the PDE (1 - z)(b z - a) dg/dz = -dg/dt and the initial condition g(z,0) = z */
is(equal((1 - z) * (b * z - a) * diff(g(z,t), z), -diff(g(z,t), t)));
is(equal(g(z,0), z));

Taking the derivative and the limit, we get that the expected population size is \(\exp\{(\lambda - \mu)t\}\), which again agrees with the mean-field approximation.
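A minimal check of this in Maxima, reusing the g defined in the previous block:

/* the expected population size is the derivative of the PGF at z = 1 */
ratsimp(subst(1, z, diff(g(z,t), z)));

which should simplify to \(\exp\{(b - a)t\}\), ie \(\exp\{(\lambda - \mu)t\}\) in the notation above.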

Moment closure

For notes on moment closure, see my statistics notes.

Change/Transformation of variable

This is a particularly nifty trick when dealing with tricky MCMC problems, see my statistics notes.

Random walk

Polya's Random Walk Theorem

Theorem The simple random walk on \(\mathbb{Z}^{d}\) is recurrent in dimensions \(d=1,2\) and transient in dimensions \(d\geq 3\).

Jonathan Novak has a particularly clear proof of this result.

Geometric series

\[ \sum_{k=0}^{\infty} r^{k} = \frac{1}{1 - r} \]

for \(|r| < 1\).

Squeeze theorem

Let \(\{a_n\}\) and \(\{c_n\}\) be two sequences which converge to \(l\), and let \(\{b_n\}\) be a third sequence. If there is an \(N\) such that for all \(n\geq N\) we have \(a_n \leq b_n \leq c_n\), then \(\{b_n\}\) also converges to \(l\).

Miscellaneous

Distributions

Dirichlet process

  • The Dirichlet process is not the same thing as the Dirichlet distribution.
  • There are three main ways to think about the DP:
    • the Chinese restaurant process,
    • the stick-breaking process (sketched below),
    • and the Polya urn scheme.
  • The DP has "rich-get-richer" behaviour.
  • The DP is defined by a base distribution, \(H\), and a scaling parameter, \(\alpha\).
    • For larger values of \(\alpha\) a draw from the DP looks more like \(H\); for smaller values it looks like a resample from \(H\) in which previously sampled values become more likely to be drawn again.
  • The DP has found substantial use in Bayesian nonparametric inference, where it is used as a prior.
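A minimal sketch of the stick-breaking construction in Maxima, assuming the distrib package's random_beta; stick_weights is a hypothetical helper returning the first n weights of a draw from a DP with scaling parameter alpha (the corresponding locations would be iid draws from \(H\)):

load(distrib);

/* first n stick-breaking weights: w[i] = beta[i] * prod(1 - beta[j], j < i)
   where the beta[i] are iid Beta(1, alpha) draws */
stick_weights(alpha, n) := block([rest : 1, b, ws : []],
    for i : 1 thru n do (
        b : random_beta(1, alpha),
        ws : endcons(rest * b, ws),
        rest : rest * (1 - b)),
    ws);

stick_weights(2.0, 5);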

Log-normal

Notation

\[ X \sim \text{Lognormal}(\mu,\sigma^2) \]

where \(\mu\) can be any real number and \(\sigma>0\).

Properties

  • Density

    \[ f(x) = \frac{1}{x \sigma \sqrt{2 \pi}} \exp\left\{ - \frac{{(\log(x) - \mu)}^{2}}{2\sigma^{2}} \right\} \]

    If \(g(x)\) is the density of \(N(\mu, \sigma^{2})\), then

    \[ f(x) = g(\log x) / x. \]

    load(distrib);
    
    /* the log-normal density is the normal density of log(x), scaled by 1/x */
    print(is(equal(pdf_lognormal(x, m, s), unit_step(x) * pdf_normal(log(x), m, s) / x)));
    

    which outputs the message

    : true
    

Negative binomial

Notation

\[ X \sim \text{NB}(r,p) \]

where \(r>0\) and \(p\in[0,1]\).

Properties

Note that sometimes the distribution is parameterised in terms of \(1-p\) which will change the following expressions.

  • Mean and variance

    \[ \mathbb{E}X = \frac{pr}{1-p} \]

    \[ \mathbb{V}X = \frac{pr}{(1-p)^2} \]

    and so

    \[ p = 1 - \frac{m}{v}, \quad r = \frac{m^2}{v - m} \]

  • Probability generating function

    \[ G(z, r, p) = \left(\frac{1-p}{1-pz}\right)^r \]

    which has some useful properties:

    • \(G(\alpha z, r, p) = \left(\frac{1 - p}{1 - \alpha p}\right)^r G(z, r, \alpha p)\)
    • \(\partial_{z}^{n} G(z, r, p) = r^{\bar{n}} \left( \frac{p}{1-p} \right)^{n} G(z, r + n, p)\) where \(x^{\bar{n}} = x (x + 1) \dots (x + n - 1)\) (ie the Pochhammer symbol1).
    • \(z G(z, r, p) = G(z, r, p) / p - (1 - p) G(z, r - 1, p) / p\)

    A Maxima script can be used to show these identities hold; a small sketch is given after this list.

  • Moment generating function

    \[ M_{X}(t) = \left(\frac{1-p}{1-p e^{t}}\right)^r \]
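A sketch of such a check of the PGF identities in Maxima (the \(n\)-th derivative identity is checked for \(n = 1\); radcan is used because the exponent \(r\) is symbolic). Each expression should simplify to zero:

G(z, r, p) := ((1 - p) / (1 - p * z))^r;

/* the scaled-argument, derivative, and recurrence identities above */
radcan(G(a * z, r, p) - ((1 - p) / (1 - a * p))^r * G(z, r, a * p));
radcan(diff(G(z, r, p), z) - r * (p / (1 - p)) * G(z, r + 1, p));
radcan(z * G(z, r, p) - (G(z, r, p) / p - (1 - p) * G(z, r - 1, p) / p));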

Poisson

Notation

\[ X \sim \text{Pois}(\lambda) \]

where \(\lambda > 0\).

Properties

\[ \mathbb{E}X = \lambda \]

\[ \mathbb{V}X = \lambda \]

The moment generating function is

\[ M_{X}(t) = \exp[ \lambda (e^{t} - 1) ] \]

Bivariate discrete distributions

Trivariate reduction

Suppose that \(X = X_1 + X_3\) and \(Y = X_2 + X_3\) have the desired marginal distributions where the \(X_i\) are independent. This method of constructing bivariate distributions is called trivariate reduction. For example, a bivariate Poisson distribution can be constructed in this way.
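Since the \(X_i\) are independent, the covariance of such a pair comes entirely from the shared component:

\[ \mathrm{Cov}(X, Y) = \mathrm{Cov}(X_1 + X_3, X_2 + X_3) = \mathbb{V}X_3 \geq 0, \]

so distributions built this way cannot have negative correlation.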

Multiplicative method

For marginal PMFs \(f_X\) and \(f_Y\) the multiplicative method produces a bivariate distribution with PMF

\[ f_X(x)f_Y(y)(1 + \lambda (g_1(x) - c_1) (g_2(y) - c_2)) \]

where the \(g_i\) are bounded functions, \(c_1 = \mathbb{E}g_1(X)\), and \(c_2 = \mathbb{E}g_2(Y)\), with the expectations taken over the corresponding marginal. This construction allows for both positive and negative correlations, but the constraints on \(\lambda\) bound the strength of the correlation. See Lakshminarayana et al (1999) for an example using a Poisson distribution and Famoye (2010) for a bivariate negative binomial distribution.
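Summing this PMF over \(x\) and \(y\) confirms it is properly normalised, since the correction term has mean zero in each margin:

\[ \sum_{x,y} f_X(x) f_Y(y) \left( 1 + \lambda (g_1(x) - c_1)(g_2(y) - c_2) \right) = 1 + \lambda \, \mathbb{E}[g_1(X) - c_1] \, \mathbb{E}[g_2(Y) - c_2] = 1. \]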

Bivariate Poisson (by trivariate reduction)

This distribution is constrained to have a non-negative correlation. See Karlis and Ntzoufras (2003) for an example.

Notation

\[ (X,Y) \sim \text{BP}(\lambda_{1},\lambda_{2},\lambda_{3}) \]

when \(X = X_{1} + X_{3}\) and \(Y = X_{2} + X_{3}\), and the \(X_{i} \sim \text{Pois}(\lambda_{i})\) (independently).

Bivariate negative binomial (by multiplicative method)

See Famoye (2010) for a bivariate negative binomial distribution constructed using the multiplicative method described above.

Bivariate negative binomial (by generating function)

This distribution comes from Edwards and Gurland (1961), who call it the compound correlated bivariate Poisson distribution. The PGF of this distribution is

\[ g_{X_1, X_2}(z_1, z_2) = \left( A + B_1 z_1 +B_2 z_2 + B_{12} z_1 z_2 \right)^{-k} \]

where \(A = 1 - B_1 - B_2 - B_{12}\). The PGF can then be used to get the PGFs of the marginal distributions and from this we can get the MGF and hence the moments.
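For example, setting \(z_2 = 1\) gives the PGF of the first marginal,

\[ g_{X_1}(z_1) = g(z_1, 1) = \left( 1 + (B_1 + B_{12})(z_1 - 1) \right)^{-k}, \]

and substituting \(z_1 = e^{t}\) turns this into the MGF used in the snippets below.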

Although Edwards and Gurland (1961) provide expressions for the PMF, they suggest that the recurrence relations they give are a computationally more convenient approach.

Method of moments parameters

The following snippets demonstrate how to compute the method of moments estimators given the first two moments of the first marginal, the covariance, and the mean of the second marginal. We can define the PGF in Maxima and then compute the mean and variance of the marginal distributions and check that these match up with the values given by Edwards and Gurland (1961).

g(z1, z2) := ((1 - B1 - B2 - B12) + B1 * z1 + B2 * z2 + B12 * z1 * z2) ^ (-k);

m1(t) := ratsimp(g(exp(t), 1));

m11 : subst(0, t, diff(m1(t), t, 1));
m12 : subst(0, t, diff(m1(t), t, 2));

var1 : collectterms(expand(m12 - m11^2), k);

/* check that the variance is correct */
display(is(equal(var1, -k * (B1 + B12) * (1 - B1 - B12))));

which outputs the message

:             is(equal(var1, (- k) (B1 + B12) (1 - B1 - B12))) = true

It is useful to define \(c = B_1 + B_{12}\). Then we can solve for \(k\) and \(c\) based on the mean and variance of the first marginal distribution.

/* use c = B1 + B12 and solve for c and k based on the marginal distribution */
display(solve([ex1 = -k * c, vx1 = ex1 * (1 - c)], [k, c])[1]);

which outputs the message

: solve([ex1 = - c k, vx1 = (1 - c) ex1], [k, c])(1) =
:                                                          2
:                                                       ex1            vx1 - ex1
:                                                [k = ---------, c = - ---------]
:                                                     vx1 - ex1           ex1

The next step is to check that the expression given for the covariance is correct (and get an equivalent expression in terms of \(c\)).

g(z1, z2) := ((1 - B1 - B2 - B12) + B1 * z1 + B2 * z2 + B12 * z1 * z2) ^ (-k);
m1(t) := ratsimp(g(exp(t), 1));
m11 : subst(0, t, diff(m1(t), t, 1));

m2(t) := ratsimp(g(1, exp(t)));
m21 : subst(0, t, diff(m2(t), t));
eXY : subst(0, t2, subst(0, t1, diff(diff(g(exp(t1), exp(t2)), t1), t2)));

/* check the covariance is correct */
cov_in_c : k * ((c - B1) * (c + B2 - 1) + B1 * B2);
display(is(
    equal(
      eXY - m11 * m21,
      subst(B1 + B12, c, cov_in_c)
      )
    ));

which outputs the message

:          is(equal(eXY - m11 m21, subst(B1 + B12, c, cov_in_c))) = true

Finally, we can use the covariance and the mean of the second marginal distribution to solve for \(B_1\) and \(B_2\), which then gives us \(B_{12}\) from the solution for \(c\).

g(z1, z2) := ((1 - B1 - B2 - B12) + B1 * z1 + B2 * z2 + B12 * z1 * z2) ^ (-k);
m1(t) := ratsimp(g(exp(t), 1));
m11 : subst(0, t, diff(m1(t), t, 1));

m2(t) := ratsimp(g(1, exp(t)));
m21 : subst(0, t, diff(m2(t), t));
eXY : subst(0, t2, subst(0, t1, diff(diff(g(exp(t1), exp(t2)), t1), t2)));

/* check the covariance is correct */
cov_in_c : k * ((c - B1) * (c + B2 - 1) + B1 * B2);

/* solve for B1 and B2 based on the covariance and mean of the second marginal */
display(solve([ex2 = subst(c - B1, B12, m21), cov12 = cov_in_c], [B1, B2])[1]);

which outputs the message

: solve([ex2 = - (c + B2 - B1) k, cov12 = ((c - B1) (c + B2 - 1) + B1 B2) k],
:                                  c k + c ex2 + cov12       (c - 1) ex2 + cov12
:             [B1, B2])(1) = [B1 = -------------------, B2 = -------------------]
:                                           k                         k

Bibliography

edwards1961class

@article{edwards1961class,
  author =       {Carol B. Edwards and John Gurland},
  title =        {{A} {C}lass of {D}istributions {A}pplicable to {A}ccidents},
  journal =      {Journal of the American Statistical Association},
  volume =       56,
  number =       295,
  pages =        {503--517},
  year =         1961,
  publisher =    {Taylor & Francis},
  doi =          {10.1080/01621459.1961.10480641},
  url =          {https://amstat.tandfonline.com/doi/abs/10.1080/01621459.1961.10480641}
}

famoye2010bivariate

@article{famoye2010bivariate,
  journal =      {Journal of Applied Statistics},
  pages =        {969--981},
  volume =       37,
  publisher =    {Taylor & Francis},
  number =       6,
  year =         2010,
  title =        {On the bivariate negative binomial regression model},
  author =       {Famoye, Felix},
  url =          {http://www.tandfonline.com/doi/abs/10.1080/02664760902984618}
}

karlis2003analysis

@article{karlis2003analysis,
  journal =      {Journal of the Royal Statistical Society: Series D (The Statistician)},
  pages =        {381--393},
  volume =       52,
  publisher =    {Blackwell Publishers Ltd},
  year =         2003,
  title =        {{A}nalysis of sports data by using bivariate {P}oisson models},
  author =       {Karlis, D. and Ntzoufras, I.}
}

lakshminarayana1999bivariate

@article{lakshminarayana1999bivariate,
  journal =      {Communications in Statistics - Theory and Methods},
  pages =        {267--276},
  volume =       28,
  publisher =    {Marcel Dekker, Inc},
  number =       2,
  year =         1999,
  title =        {{O}n a {B}ivariate {P}oisson {D}istribution},
  author =       {Lakshminarayana, J and Pandit, S.N.N and Srinivasa Rao, K},
  url =          {http://www.tandfonline.com/doi/abs/10.1080/03610929908832297}
}

Footnotes:

1

See these notes for a definition of the Pochhammer symbol.

Author: Alexander E. Zarebski

Created: 2023-05-30 Tue 10:19
