02: Fetch week 2

2022-09-15 10:31:34 +02:00
parent 9ffbf26dc7
commit c2cc34c270

@@ -0,0 +1,717 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "4c431265",
"metadata": {},
"source": [
"# Exercise sheet\n",
"\n",
"Some general remarks about the exercises:\n",
"* For your convenience functions from the lecture are included below. Feel free to reuse them without copying to the exercise solution box.\n",
"* For each part of the exercise a solution box has been added, but you may insert additional boxes. Do not hesitate to add Markdown boxes for textual or LaTeX answers (via `Cell > Cell Type > Markdown`). But make sure to replace any part that says `YOUR CODE HERE` or `YOUR ANSWER HERE` and remove the `raise NotImplementedError()`.\n",
"* Please make your code readable by humans (and not just by the Python interpreter): choose informative function and variable names and use consistent formatting. Feel free to check the [PEP 8 Style Guide for Python](https://www.python.org/dev/peps/pep-0008/) for the widely adopted coding conventions or [this guide for explanation](https://realpython.com/python-pep8/).\n",
"* Make sure that the full notebook runs without errors before submitting your work. This you can do by selecting `Kernel > Restart & Run All` in the jupyter menu.\n",
"* For some exercises test cases have been provided in a separate cell in the form of `assert` statements. When run, a successful test will give no output, whereas a failed test will display an error message.\n",
"* Each sheet has 100 points worth of exercises. Note that only the grades of sheets number 2, 4, 6, 8 count towards the course examination. Submitting sheets 1, 3, 5, 7 & 9 is voluntary and their grades are just for feedback.\n",
"\n",
"Please fill in your name here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "026433a4",
"metadata": {},
"outputs": [],
"source": [
"NAME = \"\"\n",
"NAMES_OF_COLLABORATORS = \"\""
]
},
{
"cell_type": "markdown",
"id": "3b1bff64",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "41d26cde",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "de05c5cadee95d63f1acb0ab3f82894f",
"grade": false,
"grade_id": "cell-f29a87a28188c3d0",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"__Exercise sheet 2__\n",
"\n",
"Code from the lecture:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb41d2a1",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "5435cd2800cbe70e733a364b79e86c9b",
"grade": false,
"grade_id": "cell-a6520f459483332d",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pylab as plt\n",
"from scipy.integrate import quad\n",
"\n",
"rng = np.random.default_rng()\n",
"%matplotlib inline\n",
"\n",
"def inversion_sample(f_inverse):\n",
" '''Obtain an inversion sample based on the inverse-CDF f_inverse.'''\n",
" return f_inverse(rng.random())\n",
"\n",
"def compare_plot(samples,pdf,xmin,xmax,bins):\n",
" '''Draw a plot comparing the histogram of the samples to the expectation coming from the pdf.'''\n",
" xval = np.linspace(xmin,xmax,bins+1)\n",
" binsize = (xmax-xmin)/bins\n",
" # Calculate the expected numbers by numerical integration of the pdf over the bins\n",
" expected = np.array([quad(pdf,xval[i],xval[i+1])[0] for i in range(bins)])/binsize\n",
" measured = np.histogram(samples,bins,(xmin,xmax))[0]/(len(samples)*binsize)\n",
" plt.plot(xval,np.append(expected,expected[-1]),\"-k\",drawstyle=\"steps-post\")\n",
" plt.bar((xval[:-1]+xval[1:])/2,measured,width=binsize)\n",
" plt.xlim(xmin,xmax)\n",
" plt.legend([\"expected\",\"histogram\"])\n",
" plt.show()\n",
" \n",
"def gaussian(x):\n",
" return np.exp(-x*x/2)/np.sqrt(2*np.pi)"
]
},
{
"cell_type": "markdown",
"id": "3317e002",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "d2c3d8374cf18fd1a12c91353f28dbcf",
"grade": false,
"grade_id": "cell-e6c28b1e3e8371c3",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"## Sampling random variables via the inversion method \n",
"__(35 Points)__\n",
"\n",
"Recall from the lecture that for any real random variable $X$ we can construct an explicit random variable via the inversion method that is identically distributed. This random variable is given by $F_X^{-1}(U)$ where $F_X$ is the CDF of $X$ and $U$ is a uniform random variable on $(0,1)$ and \n",
"\n",
"$$\n",
"F_X^{-1}(p) := \\inf\\{ x\\in\\mathbb{R} : F_X(x) \\geq p\\}.\n",
"$$\n",
"\n",
"This gives a very general way of sampling $X$ in a computer program, as you will find out in this exercise.\n",
"\n",
"__(a)__ Let $X$ be an **exponential random variable** with **rate** $\\lambda$, i.e. a continuous random variable with probability density function $f_X(x) = \\lambda e^{-\\lambda x}$ for $x > 0$. Write a function `f_inverse_exponential` that computes $F_X^{-1}(p)$. Illustrate the corresponding sampler with the help of the function `compare_plot` above. __(10 pts)__"
]
},
{
"cell_type": "markdown",
"id": "6f2c475a",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "4292b1a356454d496a93ef6555f0a7ae",
"grade": true,
"grade_id": "cell-311fd25e116f5066",
"locked": false,
"points": 5,
"schema_version": 3,
"solution": true,
"task": false
}
},
"source": [
"YOUR ANSWER HERE"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e6b6428c",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "90de5b60de4e43881ab85442cdff704a",
"grade": false,
"grade_id": "cell-06ef7d054d38f5c6",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"def f_inv_exponential(lam,p):\n",
" # YOUR CODE HERE\n",
" raise NotImplementedError()\n",
" \n",
"# plotting\n",
"# YOUR CODE HERE\n",
"raise NotImplementedError()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "804aedbf",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "bce45fa412ba32138080832767338e9d",
"grade": true,
"grade_id": "cell-2022e00546cf1bb0",
"locked": true,
"points": 5,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"from nose.tools import assert_almost_equal\n",
"assert_almost_equal(f_inv_exponential(1.0,0.6),0.916,delta=0.001)\n",
"assert_almost_equal(f_inv_exponential(0.3,0.2),0.743,delta=0.001)"
]
},
{
"cell_type": "markdown",
"id": "d590b09d",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "08fdb1c6ca42806566800f06d7ffb22b",
"grade": false,
"grade_id": "cell-f7e0d9b58c948be5",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"__(b)__ Let now $X$ have the **Pareto distribution** of **shape** $\\alpha > 0$ on $(b,\\infty)$, which has probability density function $f_X(x) = \\alpha b^{\\alpha} x^{-\\alpha-1}$ for $x > b$. Write a function `f_inv_pareto` that computes $F_X^{-1}(p)$. Compare a histogram with a plot of $f_X(x)$ to verify your function numerically. __(10 pts)__"
]
},
{
"cell_type": "markdown",
"id": "47c7a42f",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "1d1fc6a16462f0d238005fdb33a99857",
"grade": true,
"grade_id": "cell-199713328dcd510d",
"locked": false,
"points": 5,
"schema_version": 3,
"solution": true,
"task": false
}
},
"source": [
"YOUR ANSWER HERE"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e177f32d",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "eb07f40a935275cf5883204fc817beaa",
"grade": false,
"grade_id": "cell-074f6a1fd6375c22",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"### Solution\n",
"def f_inv_pareto(alpha,b,p):\n",
" # YOUR CODE HERE\n",
" raise NotImplementedError()\n",
"\n",
"# plotting\n",
"# YOUR CODE HERE\n",
"raise NotImplementedError()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0e1426f",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "62920089752d067b0945eb1d6d98135f",
"grade": true,
"grade_id": "cell-726b321246679d28",
"locked": true,
"points": 5,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"from nose.tools import assert_almost_equal\n",
"assert_almost_equal(f_inv_pareto(1.0,1.5,0.6),3.75,delta=0.0001)\n",
"assert_almost_equal(f_inv_pareto(2.0,2.25,0.3),2.689,delta=0.001)"
]
},
{
"cell_type": "markdown",
"id": "66d91446",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "0f3c9abbe9fe756c5cf4bdd6a8a37ac2",
"grade": false,
"grade_id": "cell-50306550727804ca",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"__(c)__ Let $X$ be a discrete random variable taking values in $\\{1,2,\\ldots,n\\}$. Write a Python function `f_inv_discrete` that takes the probability mass function $p_X$ as a list `prob_list` given by $[p_X(1),\\ldots,p_X(n)]$ and returns a random sample with the distribution of $X$ using the inversion method. Verify the working of your function numerically on an example. __(15 pts)__"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "210f1302",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "93d51c9c889dd5ba3490e0ee298d4240",
"grade": false,
"grade_id": "cell-694eb1261c2dc217",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"def f_inv_discrete(prob_list,p):\n",
" # YOUR CODE HERE\n",
" raise NotImplementedError()\n",
"\n",
"# plotting\n",
"# YOUR CODE HERE\n",
"raise NotImplementedError()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c691f0a",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "b11d87e414ba9dfe2741d73dd95a2f12",
"grade": true,
"grade_id": "cell-140af6b31464fbef",
"locked": true,
"points": 15,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"assert f_inv_discrete([0.5,0.5],0.4)==1\n",
"assert f_inv_discrete([0.5,0.5],0.8)==2\n",
"assert f_inv_discrete([0,0,1],0.1)==3"
]
},
{
"cell_type": "markdown",
"id": "47546d37",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "32dd38f0f963c6132fcbe3ef1f5b9682",
"grade": false,
"grade_id": "cell-49fd13dc534dfa28",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"## Central limit theorem? \n",
"__(35 Points)__\n",
"\n",
"In this exercise we will have a closer look at central limits of the Pareto distribution, for which you implemented a random sampler in the previous exercise. By performing the appropriate integrals it is straightforward to show that \n",
"\n",
"$$ \n",
"\\mathbb{E}[X] = \\begin{cases} \\infty & \\text{for }\\alpha \\leq 1 \\\\ \\frac{\\alpha b}{\\alpha - 1} & \\text{for }\\alpha > 1 \\end{cases}, \\qquad \\operatorname{Var}(X) = \\begin{cases} \\infty & \\text{for }\\alpha \\leq 2 \\\\ \\frac{\\alpha b^2}{(\\alpha - 1)^2(\\alpha-2)} & \\text{for }\\alpha > 2 \\end{cases}.\n",
"$$\n",
"\n",
"This shows in particular that the distribution is **heavy tailed**, in the sense that some moments $\\mathbb{E}[X^k]$ diverge."
]
},
{
"cell_type": "markdown",
"id": "ccae582d",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "e6d5659ef88eccfb693b35a088d0d50f",
"grade": false,
"grade_id": "cell-a05e255c144ef6c5",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"__(a)__ Write a function `sample_Zn` that produces a random sample for $Z_n= \\frac{\\sqrt{n}}{\\sigma_X}(\\bar{X}_n - \\mathbb{E}[X])$ given $\\alpha>2$, $b>0$ and $n\\geq 1$. Visually verify the central limit theorem for $\\alpha = 4$, $b=1$ and $n=1000$ by comparing a histogram of $Z_n$ to the standard normal distribution (you may use `compare_plot`). __(10 pts)__"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "82fe6efd",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "177917ec75361799067d6c23a28569cd",
"grade": false,
"grade_id": "cell-b7186322b09717f8",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"def sample_Zn(alpha,b,n):\n",
" # YOUR CODE HERE\n",
" raise NotImplementedError()\n",
"\n",
"# Plotting\n",
"# YOUR CODE HERE\n",
"raise NotImplementedError()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5360d77",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "e50b33644ddd6bce391b36cefcc2e308",
"grade": true,
"grade_id": "cell-5d16b014bef9d86f",
"locked": true,
"points": 10,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"assert_almost_equal(np.mean([sample_Zn(3.5,2.1,100) for _ in range(100)]),0,delta=0.3)\n",
"assert_almost_equal(np.std([sample_Zn(3.5,2.1,100) for _ in range(100)]),1,delta=0.3)"
]
},
{
"cell_type": "markdown",
"id": "6192f05d",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "08ece68d59de21d798d9a955f59be690",
"grade": false,
"grade_id": "cell-3e7a23657e9b8374",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"__(b)__ Now take $\\alpha = 3/2$ and $b=1$. \n",
"With some work (which you do not have to do) one can show that the characteristic function of $X$ admits the following expansion around $t=0$,\n",
"\n",
"$$\n",
"\\varphi_X(t) = 1 + 3 i t - (|t|+i t)\\,\\sqrt{2\\pi|t|} + O(t^{2}).\n",
"$$\n",
"\n",
"Based on this, prove the **generalized CLT** for this particular distribution $X$ which states that $Z_n = c\\, n^{1/3} (\\bar{X}_n - \\mathbb{E}[X])$ in the limit $n\\rightarrow\\infty$ converges in distribution, with a to-be-determined choice of overall constant $c$, to a limiting random variable $\\mathcal{S}$ with characteristic function \n",
"\n",
"$$\n",
"\\varphi_{\\mathcal{S}}(t) = \\exp\\big(-(|t|+it)\\sqrt{|t|}\\big).\n",
"$$\n",
"\n",
"__(15 pts)__"
]
},
{
"cell_type": "markdown",
"id": "9735cd88",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "dfd8683eea5663baa81f138a2809722b",
"grade": true,
"grade_id": "cell-b25551eca32c4807",
"locked": false,
"points": 15,
"schema_version": 3,
"solution": true,
"task": false
}
},
"source": [
"YOUR ANSWER HERE"
]
},
{
"cell_type": "markdown",
"id": "5b1d9f54",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "342020128f929d47eabfdf9c075ff20c",
"grade": false,
"grade_id": "cell-d1701433c3c77172",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"__(c)__ The random variable $\\mathcal{S}$ has a [stable Lévy distribution](https://en.wikipedia.org/wiki/Stable_distribution) with index $\\alpha = 3/2$ and skewness $\\beta = 1$. Its probability density function $f_{\\mathcal{S}}(x)$ does not admit a simple expression, but can be accessed numerically using SciPy's `scipy.stats.levy_stable.pdf(x,1.5,1.0)`. Verify numerically that the generalized CLT of part (b) holds by comparing an appropriate histogram to this PDF. __(10 pts)__"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b06896e5",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "c6fe081427f342c354ee8a9b3b3331e7",
"grade": true,
"grade_id": "cell-e08d054985cfa762",
"locked": false,
"points": 10,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"from scipy.stats import levy_stable\n",
"\n",
"# YOUR CODE HERE\n",
"raise NotImplementedError()"
]
},
{
"cell_type": "markdown",
"id": "f49856d8",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "d8c57e5a527eaad8318e7d31dba01694",
"grade": false,
"grade_id": "cell-bc80caacda124bf9",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"## Joint probability density functions and sampling the normal distribution \n",
"__(30 Points)__\n",
"\n",
"Let $\\Phi$ be a uniform random variable on $(0,2\\pi)$ and $R$ an independent continuous random variable with probability density function $f_R(r) = r\\,e^{-r^2/2}$ for $r>0$. Set $X = R \\cos \\Phi$ and $Y = R \\sin \\Phi$. This is called the **Box-Muller transform**.\n",
"\n",
"__(a)__ Since $\\Phi$ and $R$ are independent, the joint probability density of $\\Phi$ and $R$ is $f_{\\Phi,R}(\\phi,r) = f_\\Phi(\\phi)f_R(r) = \\frac{1}{2\\pi}\\, r\\,e^{-r^2/2}$. Show by change of variables that $X$ and $Y$ are also independent and both distributed as a standard normal distribution $\\mathcal{N}$. __(15 pts)__"
]
},
{
"cell_type": "markdown",
"id": "aa3821de",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "2514e6664aeb4e24a9e881522a8f3a0f",
"grade": true,
"grade_id": "cell-4f20e3b730ba0d23",
"locked": false,
"points": 15,
"schema_version": 3,
"solution": true,
"task": false
}
},
"source": [
"YOUR ANSWER HERE"
]
},
{
"cell_type": "markdown",
"id": "5d064cef",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "1af73334332fe512ef7d0edb5803a58d",
"grade": false,
"grade_id": "cell-2f07fdb2a906bb71",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"__(b)__ Write a function to sample a pair of independent normal random variables using the Box-Muller transform. Hint: to sample $R$ you can use the inversion method of the first exercise. Produce a histogram to check the distribution of your normal variables. __(15 pts)__"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e4023f99",
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "86173970c865da7b0cb8ab78ec4a87b6",
"grade": true,
"grade_id": "cell-9bf8873cce1d179c",
"locked": false,
"points": 15,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"def random_normal_pair():\n",
" '''Return two independent normal random variables.'''\n",
" # YOUR CODE HERE\n",
" raise NotImplementedError()\n",
" return x, y\n",
"\n",
"# Plotting\n",
"# YOUR CODE HERE\n",
"raise NotImplementedError()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}