Sampling from common distributions#

From the ground up!

  • toc: true

  • badges: true

  • comments: true

  • author: Nipun Batra

  • categories: [ML]

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

PRNG#

Uniform distribution#

I am pasting the code from vega-vis

export default function(seed) {
  // Random numbers using a Linear Congruential Generator with seed value
  // Uses glibc values from https://en.wikipedia.org/wiki/Linear_congruential_generator
  return function() {
    seed = (1103515245 * seed + 12345) % 2147483647;
    return seed / 2147483647;
  };
}
def random_gen(seed, num):
    out = np.zeros(num)
    out[0] =  (1103515245 * seed + 12345) % 2147483647
    for i in range(1, num):
        out[i] = (1103515245 * out[i-1] + 12345) % 2147483647
    return out/2147483647
        
        
plt.hist(random_gen(0, 5000), density=True, alpha=0.5, label='Our implementation');
../../_images/0a399320b00369dfb4298ab291129c13859cafa9c8f9543b1bfc2a4d4a49d361.png
plt.hist(random_gen(0, 5000), density=True, alpha=0.5, label='Our implementation');
plt.hist(np.random.random(5000), density=True, alpha=0.4, label='Numpy implementation');
plt.legend()
<matplotlib.legend.Legend at 0x7f51d3cbd290>
../../_images/56a6d8cab346ed993fa6b9837b891cf250125466ab68820c95180f9104042c68.png

Inverse transform sampling#

Exponential distribution#

Borrowing from Wikipedia.

The probability density function (pdf) of an exponential distribution is $\( f(x ; \lambda)=\left\{\begin{array}{ll} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & x \leq 0 \end{array}\right. \)$

The exponential distribution is sometimes parametrized in terms of the scale parameter \(\beta=1 / \lambda:\) $\( f(x ; \beta)=\left\{\begin{array}{ll} \frac{1}{\beta} e^{-x / \beta} & x \geq 0 \\ 0 & x<0 \end{array}\right. \)$

The cumulative distribution function is given by $\( F(x ; \lambda)=\left\{\begin{array}{ll} 1-e^{-\lambda x} & x \geq 0 \\ 0 & x<0 \end{array}\right. \)$

from scipy.stats import expon
rvs = [expon(scale=s) for s in [1/1., 1/2., 1/3.]]
x = np.arange(0, 10, 0.1)
for i, lambda_val in enumerate([1, 2, 3]):
    plt.plot(x, rvs[i].pdf(x), lw=2, label=r'$\lambda=%s$' %lambda_val)
plt.legend()
<matplotlib.legend.Legend at 0x7f51da6d4e10>
../../_images/8f0709c47df488650252febe8de7de5f78e905b106bf2124ab35c06fd5695a05.png

For the purposes of this notebook, I will be looking only at the standard exponential or set the scale to 1.

Let us now view the CDF of the standard exponential.

fig, ax = plt.subplots(nrows=2, sharex=True)
ax[0].plot(x, expon().pdf(x), lw=2)
ax[0].set_title("PDF")
ax[1].set_title("CDF")
ax[1].plot(x, expon().cdf(x), lw=2,)
[<matplotlib.lines.Line2D at 0x7f51d8985a90>]
../../_images/74da23f4b96ced167e16b732f960cc8655db68dfc21f9ef4a4c71fbf52c71162.png
r = expon.rvs(size=1000)
plt.hist(r, normed=True, bins=100)
plt.plot(x, expon().pdf(x), lw=2)
/home/nipunbatra-pc/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  """Entry point for launching an IPython kernel.
[<matplotlib.lines.Line2D at 0x7f51d86b5ad0>]
../../_images/d4944d88677cc3496bbfbc63c79a5efcaea8e933325a3e7606875ef63b7126eb.png

Inverse of the CDF of exponential

The cumulative distribution function is given by $\( F(x ; \lambda)=\left\{\begin{array}{ll} 1-e^{-\lambda x} & x \geq 0 \\ 0 & x<0 \end{array}\right. \)$

Let us consider only \(x \geq 0\).

Let \(u = F^{-1}\) be the inverse of the CDF of \(F\).

\[\begin{split} u = 1-e^{-\lambda x} \\ 1- u = e^{-\lambda x} \\ \log(1-u) = -\lambda x\\ x = -\frac{\log(1-u)}{\lambda} \end{split}\]
def inverse_transform(lambda_val, num_samples):
    u = np.random.random(num_samples)
    x = -np.log(1-u)/lambda_val
    return x
plt.hist(inverse_transform(1, 5000), bins=100, normed=True,label='Generated using our function');
plt.plot(x, expon().pdf(x), lw=4, label='Generated using scipy')
plt.legend()
/home/nipunbatra-pc/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  """Entry point for launching an IPython kernel.
<matplotlib.legend.Legend at 0x7f51d5f95410>
../../_images/3d5acc831ce28c05c7d750e641a947563a2d726d8eaea6859ffe7f4c12772ad7.png

Drawing samples from Laplace distribution#

A random variable has a Laplace \((\mu, b)\) distribution if its probability density function is $\( f(x | \mu, b)=\frac{1}{2 b} \exp \left(-\frac{|x-\mu|}{b}\right) \)$

\[F^{-1}(u)=\mu-b \operatorname{sgn}(u-0.5) \ln (1-2|u-0.5|)\]
def inverse_transform_laplace(b, mu, num_samples):
    u = np.random.random(num_samples)
    x = mu-b*np.sign(u-0.5)*np.log(1-2*np.abs(u-0.5))
    return x
from scipy.stats import laplace
plt.hist(inverse_transform_laplace(1, 0, 5000),bins=100, density=True, label='Generated using \nour function');
x_n = np.linspace(-10, 10, 100)
plt.plot(x_n, laplace().pdf(x_n), lw=4, label='Generated from\n scipy')
plt.legend()
<matplotlib.legend.Legend at 0x7f51d4a99750>
../../_images/d1cdd2fb7d73ebacf3f0b27602d8936d8bfd220b4c669df547fc67b83d08a374.png

Box-Muller transform#