probability-aez-notes
Probability notes
\(\sigma\)-field
A \(\sigma\)-field, \(\Sigma\), over a set \(\Omega\) is a subset of the power set of \(\Omega\) (ie it is a set of subsets of \(\Omega\)) with three properties:
- \(\Sigma\) contains \(\Omega\),
- \(\Sigma\) is closed under complements, and
- \(\Sigma\) is closed under countable unions.
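For example, for any \(\Omega\) both \(\{\emptyset, \Omega\}\) and the full power set are \(\sigma\)-fields; for a die roll with \(\Omega = \{1,\dots,6\}\), the collection \(\{\emptyset, \{1,3,5\}, \{2,4,6\}, \Omega\}\) is a \(\sigma\)-field which only distinguishes odd from even outcomes.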
Probability measure
Given a set \(\Omega\) and a \(\sigma\)-field, \(\Sigma\), over it, a real valued function \(\mu\) on \(\Sigma\) is a measure if
- \(\mu\) is non-negative,
- \(\mu(\emptyset) = 0\), and
- \(\mu\) has countable additivity, ie the measure of a countable union of pairwise disjoint sets is the sum of their measures.
A probability measure is a measure where \(\mu(\Omega) = 1\).
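For example, a fair die is modelled by taking \(\Omega = \{1,\dots,6\}\), \(\Sigma\) the power set of \(\Omega\), and \(\mu(A) = |A| / 6\).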
Conditional probability
For a probability measure \(P\), if \(P(B) > 0\), then the probability of \(A\) given \(B\) is defined by
\[ P(A\mid B) = \frac{P(A \cap B)}{P(B)} \]
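For example, with a fair die the probability of a six given an even roll is \(P(\{6\} \mid \{2,4,6\}) = (1/6) / (1/2) = 1/3\).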
Poisson process
This is a continuous-time process which counts arrivals, where the times between consecutive arrivals are independent and exponentially distributed with rate \(\lambda\).
Properties
Theorem For a Poisson process with rate \(\lambda \geq 0\), the number of events in an interval of duration \(t\) is a Poisson random variable with mean \(\lambda t\).
Probability generating function
The master equations for this process are
\[ \frac{dp_0}{dt} = - \lambda p_0 \]
and for \(i>0\)
\[ \frac{dp_i}{dt} = \lambda p_{i-1} - \lambda p_{i} \]
Note that the rate at which the population increases is independent of the size of the population. We can multiply through by \(z^{i}\) and sum over \(i\) to get the following PDE for the PGF, \(g(z,t)\)
\[ \frac{\partial g}{\partial t} = \lambda (z - 1) g \]
which, considering the initial condition \(g(z,0)=1\) gives us the PGF
\[ g(z,t) = \exp\{ \lambda t (z - 1) \} \]
which is the PGF for a Poisson distribution with mean \(\lambda t\).
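As with the processes below, we can check that this satisfies the differential equation and the initial condition with a little bit of Maxima

define(g(z,t), exp(l * t * (z - 1)));
is(equal(diff(g(z,t), t), l * (z - 1) * g(z,t)));
is(equal(g(z,0), 1));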
Probability generating function (PGF)
For a discrete random variable \(X\) taking values from the non-negative integers, the probability generating function is defined as
\[ G(z) = \mathbb{E}(z^{X}) = \sum_{x=0}^{\infty} f_X(x) z^{x} \]
where \(f_X\) is the probability mass function (PMF).
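For example, for \(X \sim \text{Pois}(\lambda)\),
\[ G(z) = \sum_{x=0}^{\infty} e^{-\lambda} \frac{\lambda^{x}}{x!} z^{x} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda z)^{x}}{x!} = \exp\{ \lambda (z - 1) \} \]
which is exactly the PGF obtained for the Poisson process above.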
Birth process
Consider a process where individuals give birth with rate \(\lambda\). In this case the rate at which the population increases in size depends on the current population size. The master equation for this is
\[ \frac{dp_i}{dt} = \lambda (i - 1) p_{i-1} - \lambda i p_i \]
for \(i\geq n\) where the process starts with a population of size \(n\): \(p_{n}(0) = 1\). Multiplying through by \(z^i\) and summing over \(i\) gives the following PDE for the PGF, \(g(z,t)\)
\[ \frac{\partial g}{\partial t} = \lambda (z^2 - z) \frac{\partial g}{\partial z}. \]
Given the initial condition, the boundary condition for the PDE is \(g(z,0) = z^n\). The method of characteristics looks suitable for solving this; the function, \(g\), will be constant along the curves satisfying
\[ \frac{dz}{dt} = \lambda (z - z^2). \]
(Hopefully you remember how to do partial fractions!) This leads us to conclude that the function is constant upon curves where
\[ C = \frac{z \exp\{ -\lambda t\}}{1 - z (1 - \exp\{ -\lambda t\})}. \]
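To spell out the partial fractions step: writing \(\frac{1}{z(1-z)} = \frac{1}{z} + \frac{1}{1-z}\) and separating variables gives
\[ \log \frac{z}{1-z} = \lambda t + \text{const} \]
so \(u = \frac{z}{1-z} e^{-\lambda t}\) is constant along the characteristics. The expression for \(C\) above is just \(u / (1 + u)\), hence also constant, and it has been chosen so that \(C = z\) at \(t = 0\).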
So we know that \(g\) must be a function of this conserved quantity. Since the \(n\) initial individuals evolve independently, the solution for general \(n\) is the \(n=1\) solution raised to the \(n\)-th power, so it is usually sufficient to consider \(n=1\); in that case the function is the identity, so
\[ g(z,t) = \frac{z \exp\{ -\lambda t\}}{1 - z (1 - \exp\{ -\lambda t\})}. \]
Differentiating and taking the limit, we can then see that the expected population size is
\[ \lim _{z\to 1} g_{z}(z,t) = \exp\{\lambda t\} \]
which agrees with the mean-field approximation. To check that this is actually a solution to the PDE, we can use the following little bit of Maxima
define(g(z,t), z * exp(-l * t) / (1 - z * (1 - exp(-l * t))));
is(equal(l * z * (z - 1) * diff(g(z,t), z), diff(g(z,t), t)));
is(equal(g(z,0), z));
Birth-death process
Consider a process where individuals give birth at a rate \(\lambda\) and die at a rate \(\mu\). The master equation for this is
\[ \frac{dp_i}{dt} = \lambda (i - 1) p_{i-1} - \lambda i p_{i} + \mu (i + 1) p_{i+1} - \mu i p_{i} \]
with the condition that \(p_i=0\) for \(i<0\). Multiplying through by \(z^i\) and summing over \(i\) gives the following PDE for the PGF, \(g(z,t)\)
\[ \frac{\partial g}{\partial t} = (1 - z) ( \mu - \lambda z) \frac{\partial g}{\partial z}. \]
Given the initial condition, the boundary condition for the PDE is \(g(z,0) = z^n\). The method of characteristics looks suitable for solving this; the function, \(g\), will be constant along the curves satisfying
\[ \frac{dz}{dt} = (1 - z) ( \mu - \lambda z). \]
(Hopefully you remember how to do partial fractions!) This leads us to conclude that the function is constant upon curves where
\[ C = \frac{(\lambda z - \mu) \exp\{ - (\lambda - \mu) t \} - \mu (z-1)}{(\lambda z - \mu) \exp\{ - (\lambda - \mu) t \} - \lambda (z-1)}. \]
So we know that the solution can only be a function of the RHS of the previous equation. Since this simplifies to \(z\) at \(t=0\) and assuming the initial condition \(n=1\) we get the solution
\[ g(z,t) = \frac{(\lambda z - \mu) \exp\{ - (\lambda - \mu) t \} - \mu (z-1)}{(\lambda z - \mu) \exp\{ - (\lambda - \mu) t \} - \lambda (z-1)}. \]
We can then test this in Maxima with the following
define(g(z,t), ((b * z - a) * exp(-(b - a) * t) - a * (z - 1)) / ((b * z - a) * exp(-(b - a) * t) - b * (z - 1)));
is(equal((1 - z) * (b * z - a) * diff(g(z,t), z), -diff(g(z,t), t)));
is(equal(g(z,0), z));
Taking the derivative and the limit, we get that the expected population size is \(\exp\{(\lambda-\mu)t\}\), which again agrees with the mean-field approximation.
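We can check the mean in Maxima too (again with b standing for \(\lambda\) and a for \(\mu\)); this is a minimal sketch reusing the PGF defined above.

define(g(z,t), ((b * z - a) * exp(-(b - a) * t) - a * (z - 1)) / ((b * z - a) * exp(-(b - a) * t) - b * (z - 1)));
/* the mean is the derivative of the PGF evaluated at z = 1 */
is(equal(subst(1, z, diff(g(z,t), z)), exp((b - a) * t)));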
Moment closure
For notes on moment closure, see my statistics notes.
Change/Transformation of variable
This is a particularly nifty trick when dealing with tricky MCMC problems, see my statistics notes.
Random walk
Polya's Random Walk Theorem
Theorem The simple random walk on \(\mathbb{Z}^{d}\) is recurrent in dimensions \(d=1,2\) and transient in dimensions \(d\geq 3\).
Jonathan Novak has a particularly clear proof of this result.
Geometric series
\[ \sum_{k=0}^{\infty} r^{k} = \frac{1}{1 - r} \]
for \(|r| < 1\).
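This follows from the partial sums: \((1 - r) \sum_{k=0}^{n} r^{k} = 1 - r^{n+1}\), and \(r^{n+1} \to 0\) as \(n \to \infty\) when \(|r| < 1\).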
Squeeze theorem
Let \(\{a_n\}\), \(\{b_n\}\) and \(\{c_n\}\) be sequences, with \(\{a_n\}\) and \(\{c_n\}\) converging to \(l\). If there is an \(N\) such that for all \(n\geq N\) we have \(a_n \leq b_n \leq c_n\), then \(\{b_n\}\) also converges to \(l\).
Miscellaneous
Distributions
Dirichlet process
- The Dirichlet process is not from the Dirichlet distribution.
- There are three main ways to think about the DP:
- the Chinese restaurant process,
- the stick-breaking process (sketched below),
- and the Polya urn scheme
- The DP has "rich-get-richer" behaviour.
- The DP is defined by a base distribution, \(H\), and a scaling parameter, \(\alpha\).
- For larger values of \(\alpha\) the distribution looks more like \(H\); for smaller values, previously sampled values become more likely to be sampled again, so draws concentrate on a few repeated atoms.
- This has found substantial use in Bayesian nonparametric inference, where it is used as a prior.
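As a rough illustration of the stick-breaking view, the following sketch draws the first few stick-breaking weights (each atom would be an independent draw from \(H\)); it assumes the random_beta function from Maxima's distrib package, and the name stick_weights is just for illustration.

load(distrib);
/* the first n stick-breaking weights for a DP with scaling parameter alpha */
stick_weights(alpha, n) := block([rem : 1, b, ws : []],
  for i : 1 thru n do (
    b : random_beta(1, alpha),
    ws : endcons(rem * b, ws),
    rem : rem * (1 - b)),
  ws);
stick_weights(2.0, 5);

Smaller values of alpha make the early weights larger, which is the rich-get-richer behaviour mentioned above.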
Log-normal
Notation
\[ X \sim \text{Lognormal}(\mu,\sigma^2) \]
where \(\mu\) can be any real number and \(\sigma>0\).
Properties
- Density
\[ f(x) = \frac{1}{x \sigma \sqrt{2 \pi}} \exp\left\{ - \frac{{(\log(x) - \mu)}^{2}}{2\sigma^{2}} \right\} \]
If \(g(x)\) is the density of \(N(\mu, \sigma^{2})\), then
\[ f(x) = g(\log x) / x. \]
load(distrib);
print(is(equal(pdf_lognormal(x, m, s), unit_step(x) * pdf_normal(log(x), m, s) / x)));
which outputs the message
: true
Negative binomial
Notation
\[ X \sim \text{NB}(r,p) \]
where \(r>0\) and \(p\in[0,1]\).
Properties
Note that sometimes the distribution is parameterised in terms of \(1-p\) which will change the following expressions.
- Mean and variance
\[ \mathbb{E}X = \frac{pr}{1-p} \]
\[ \mathbb{V}X = \frac{pr}{(1-p)^2} \]
and so
\[ p = 1 - \frac{m}{v}, \quad r = \frac{m^2}{v - m} \]
- Probability generating function
\[ G(z, r, p) = \left(\frac{1-p}{1-pz}\right)^r \]
which has some useful properties:
- \(G(\alpha z, r, p) = \left(\frac{1 - p}{1 - \alpha p}\right)^r G(z, r, \alpha p)\)
- \(\partial_{z}^{n} G(z, r, p) = r^{\bar{n}} \left( \frac{p}{1-p} \right)^{n} G(z, r + n, p)\) where \(x^{\bar{n}} = x (x + 1) \dots (x + n - 1)\) (ie the Pochhammer symbol [1]).
- \(z G(z, r, p) = G(z, r, p) / p - (1 - p) G(z, r - 1, p) / p\)
There is a Maxima script for showing these are true here; a small sketch in the same spirit is given after this list.
- Moment generating function
\[ M_{X}(t) = \left(\frac{1-p}{1-p e^{t}}\right)^r \]
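As a quick sanity check of the moment expressions and PGF identities above, here is a small Maxima sketch; the identities are checked at the concrete value \(r = 3\), since the symbolic-\(r\) simplification is awkward to automate.

/* recover (p, r) from the mean m and variance v (one returned solution is degenerate) */
solve([m = p * r / (1 - p), v = p * r / (1 - p)^2], [p, r]);
/* the PGF identities, checked at r = 3 */
G(z, r, p) := ((1 - p) / (1 - p * z))^r;
is(equal(G(a * z, 3, p), ((1 - p) / (1 - a * p))^3 * G(z, 3, a * p)));
is(equal(diff(G(z, 3, p), z), 3 * (p / (1 - p)) * G(z, 4, p)));
is(equal(z * G(z, 3, p), G(z, 3, p) / p - (1 - p) * G(z, 2, p) / p));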
Poisson
Notation
\[ X \sim \text{Pois}(\lambda) \]
where \(\lambda > 0\).
Properties
\[ \mathbb{E}X = \lambda \]
\[ \mathbb{V}X = \lambda \]
The moment generating function is
\[ M_{X}(t) = \exp\{ \lambda (e^{t} - 1) \} \]
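The mean and variance can be checked against the mean_poisson and var_poisson functions from Maxima's distrib package

load(distrib);
is(equal(mean_poisson(l), l));
is(equal(var_poisson(l), l));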
Bivariate discrete distributions
Trivariate reduction
Suppose that \(X = X_1 + X_3\) and \(Y = X_2 + X_3\) have the desired marginal distributions where the \(X_i\) are independent. This method of constructing bivariate distributions is called trivariate reduction. For example, a bivariate Poisson distribution can be constructed in this way.
Multiplicative method
For marginal PMFs \(f_X\) and \(f_Y\) the multiplicative method produces a bivariate distribution with PMF
\[ f_X(x)f_Y(y)(1 + \lambda (g_1(x) - c_1) (g_2(y) - c_2)) \]
where the \(g_i\) are bounded functions and \(c_1 = \mathbb{E}g_1(X)\) and \(c_2 = \mathbb{E}g_2(Y)\), with the expectations taken over the given marginals. This construction allows for both positive and negative correlations, but the constraints on \(\lambda\) can limit the strength of the correlation. See Lakshminarayana et al (1999) for an example using a Poisson distribution and Famoye (2010) for a bivariate negative binomial distribution.
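Note that the marginal distributions are preserved: summing the joint PMF over \(y\), the correction term vanishes because \(\mathbb{E}(g_2(Y) - c_2) = 0\), leaving \(f_X(x)\) (and symmetrically for \(Y\)).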
Bivariate Poisson (by trivariate reduction)
This distribution is constrained to have a non-negative correlation. See Karlis and Ntzoufras (2003) for an example.
Notation
\[ (X,Y) \sim \text{BP}(\lambda_{1},\lambda_{2},\lambda_{3}) \]
when \(X = X_{1} + X_{3}\) and \(Y = X_{2} + X_{3}\), and the \(X_{i} \sim \text{Pois}(\lambda_{i})\) (independently).
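Since \(X\) and \(Y\) share the common component \(X_{3}\), we have \(\text{Cov}(X, Y) = \mathbb{V}X_{3} = \lambda_{3} \geq 0\), which is why the correlation cannot be negative.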
Bivariate negative binomial (by multiplicative method)
See Famoye (2010) for a bivariate negative binomial distribution constructed using the multiplicative method described above.
Bivariate negative binomial (by generating function)
This distribution comes from Edwards and Gurland (1961), who call it the compound correlated bivariate Poisson distribution. The PGF of this distribution is
\[ g_{X_1, X_2}(z_1, z_2) = \left( A + B_1 z_1 +B_2 z_2 + B_{12} z_1 z_2 \right)^{-k} \]
where \(A = 1 - B_1 - B_2 - B_{12}\). The PGF can then be used to get the PGFs of the marginal distributions and from this we can get the MGF and hence the moments.
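For example, setting \(z_2 = 1\) gives the marginal PGF
\[ g_{X_1}(z_1) = \left( 1 + (B_1 + B_{12}) (z_1 - 1) \right)^{-k} \]
since \(A = 1 - B_1 - B_2 - B_{12}\); for suitable parameter values this is the PGF of a negative binomial distribution.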
Although Edwards and Gurland (1961) provide expressions for the PMF, they suggest that the recurrence relations they give are a computationally more convenient approach.
Method of moments parameters
The following snippets demonstrate how to compute the method of moments estimators given the first two moments of the first marginal, the covariance, and the mean of the second marginal. We can define the PGF in Maxima and then compute the mean and variance of the marginal distributions and check that these match the values given by Edwards and Gurland (1961).
g(z1, z2) := ((1 - B1 - B2 - B12) + B1 * z1 + B2 * z2 + B12 * z1 * z2) ^ (-k);
m1(t) := ratsimp(g(exp(t), 1));
m11 : subst(0, t, diff(m1(t), t, 1));
m12 : subst(0, t, diff(m1(t), t, 2));
var1 : collectterms(expand(m12 - m11^2), k);
/* check that the variance is correct */
display(is(equal(var1, -k * (B1 + B12) * (1 - B1 - B12))));
which outputs the message
: is(equal(var1, (- k) (B1 + B12) (1 - B1 - B12))) = true
It is useful to define \(c = B_1 + B_{12}\). Then we can solve for \(k\) and \(c\) based on the mean and variance of the first marginal distribution.
/* use c = B1 + B12 and solve for c and k based on the marginal distribution */
display(solve([ex1 = -k * c, vx1 = ex1 * (1 - c)], [k, c])[1]);
which outputs the message
: solve([ex1 = - c k, vx1 = (1 - c) ex1], [k, c])(1) =
:           2
:        ex1            vx1 - ex1
: [k = ---------, c = - ---------]
:      vx1 - ex1           ex1
The next step is to check that the expression given for the covariance is correct (and get an equivalent expression in terms of \(c\)).
g(z1, z2) := ((1 - B1 - B2 - B12) + B1 * z1 + B2 * z2 + B12 * z1 * z2) ^ (-k);
m1(t) := ratsimp(g(exp(t), 1));
m11 : subst(0, t, diff(m1(t), t, 1));
m2(t) := ratsimp(g(1, exp(t)));
m21 : subst(0, t, diff(m2(t), t));
eXY : subst(0, t2, subst(0, t1, diff(diff(g(exp(t1), exp(t2)), t1), t2)));
/* check the covariance is correct */
cov_in_c : k * ((c - B1) * (c + B2 - 1) + B1 * B2);
display(is(equal(eXY - m11 * m21, subst(B1 + B12, c, cov_in_c))));
which outputs the message
: is(equal(eXY - m11 m21, subst(B1 + B12, c, cov_in_c))) = true
Finally, we can use the covariance and the mean of the second marginal distribution to solve for \(B_1\) and \(B_2\), which then gives us \(B_{12}\) from the solution for \(c\).
g(z1, z2) := ((1 - B1 - B2 - B12) + B1 * z1 + B2 * z2 + B12 * z1 * z2) ^ (-k);
m1(t) := ratsimp(g(exp(t), 1));
m11 : subst(0, t, diff(m1(t), t, 1));
m2(t) := ratsimp(g(1, exp(t)));
m21 : subst(0, t, diff(m2(t), t));
eXY : subst(0, t2, subst(0, t1, diff(diff(g(exp(t1), exp(t2)), t1), t2)));
/* check the covariance is correct */
cov_in_c : k * ((c - B1) * (c + B2 - 1) + B1 * B2);
/* solve for B1 and B2 based on the covariance and mean of the second marginal */
display(solve([ex2 = subst(c - B1, B12, m21), cov12 = cov_in_c], [B1, B2])[1]);
which outputs the message
: solve([ex2 = - (c + B2 - B1) k, cov12 = ((c - B1) (c + B2 - 1) + B1 B2) k],
:                      c k + c ex2 + cov12       (c - 1) ex2 + cov12
: [B1, B2])(1) = [B1 = -------------------, B2 = -------------------]
:                              k                          k
Bibliography
edwards1961class
@article{edwards1961class,
  author    = {Carol B. Edwards and John Gurland},
  title     = {A Class of Distributions Applicable to Accidents},
  journal   = {Journal of the American Statistical Association},
  volume    = {56},
  number    = {295},
  pages     = {503--517},
  year      = {1961},
  publisher = {Taylor & Francis},
  doi       = {10.1080/01621459.1961.10480641},
  url       = {https://amstat.tandfonline.com/doi/abs/10.1080/01621459.1961.10480641}
}
famoye2010bivariate
@article{famoye2010bivariate,
  author    = {Famoye, Felix},
  title     = {On the bivariate negative binomial regression model},
  journal   = {Journal of Applied Statistics},
  volume    = {37},
  number    = {6},
  pages     = {969--981},
  year      = {2010},
  publisher = {Taylor & Francis},
  url       = {http://www.tandfonline.com/doi/abs/10.1080/02664760902984618}
}
karlis2003analysis
@article{karlis2003analysis,
  author    = {Karlis, D. and Ntzoufras, I.},
  title     = {Analysis of sports data by using bivariate {P}oisson models},
  journal   = {Journal of the Royal Statistical Society Series D (The Statistician)},
  volume    = {52},
  pages     = {381--393},
  year      = {2003},
  publisher = {Blackwell Publishing}
}
lakshminarayana1999bivariate
@article{lakshminarayana1999bivariate,
  author    = {Lakshminarayana, J. and Pandit, S. N. N. and Srinivasa Rao, K.},
  title     = {On a Bivariate Poisson Distribution},
  journal   = {Communications in Statistics - Theory and Methods},
  volume    = {28},
  number    = {2},
  pages     = {267--276},
  year      = {1999},
  publisher = {Marcel Dekker, Inc.},
  url       = {http://www.tandfonline.com/doi/abs/10.1080/03610929908832297}
}
Footnotes:
[1] See these notes for a definition of the Pochhammer symbol.