From c2cc34c270644f73ebda93e994372a3743f2bceb Mon Sep 17 00:00:00 2001 From: Kees van Kempen Date: Thu, 15 Sep 2022 10:31:34 +0200 Subject: [PATCH] 02: Fetch week 2 --- Exercise sheet 2/exercise_sheet_02.ipynb | 717 +++++++++++++++++++++++ 1 file changed, 717 insertions(+) create mode 100644 Exercise sheet 2/exercise_sheet_02.ipynb diff --git a/Exercise sheet 2/exercise_sheet_02.ipynb b/Exercise sheet 2/exercise_sheet_02.ipynb new file mode 100644 index 0000000..2dc4de8 --- /dev/null +++ b/Exercise sheet 2/exercise_sheet_02.ipynb @@ -0,0 +1,717 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4c431265", + "metadata": {}, + "source": [ + "# Exercise sheet\n", + "\n", + "Some general remarks about the exercises:\n", + "* For your convenience functions from the lecture are included below. Feel free to reuse them without copying to the exercise solution box.\n", + "* For each part of the exercise a solution box has been added, but you may insert additional boxes. Do not hesitate to add Markdown boxes for textual or LaTeX answers (via `Cell > Cell Type > Markdown`). But make sure to replace any part that says `YOUR CODE HERE` or `YOUR ANSWER HERE` and remove the `raise NotImplementedError()`.\n", + "* Please make your code readable by humans (and not just by the Python interpreter): choose informative function and variable names and use consistent formatting. Feel free to check the [PEP 8 Style Guide for Python](https://www.python.org/dev/peps/pep-0008/) for the widely adopted coding conventions or [this guide for explanation](https://realpython.com/python-pep8/).\n", + "* Make sure that the full notebook runs without errors before submitting your work. You can do this by selecting `Kernel > Restart & Run All` in the Jupyter menu.\n", + "* For some exercises test cases have been provided in a separate cell in the form of `assert` statements. 
When run, a successful test will give no output, whereas a failed test will display an error message.\n", + "* Each sheet has 100 points worth of exercises. Note that only the grades of sheets number 2, 4, 6, 8 count towards the course examination. Submitting sheets 1, 3, 5, 7 & 9 is voluntary and their grades are just for feedback.\n", + "\n", + "Please fill in your name here:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "026433a4", + "metadata": {}, + "outputs": [], + "source": [ + "NAME = \"\"\n", + "NAMES_OF_COLLABORATORS = \"\"" + ] + }, + { + "cell_type": "markdown", + "id": "3b1bff64", + "metadata": {}, + "source": [ + "---" + ] + }, + { + "cell_type": "markdown", + "id": "41d26cde", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "de05c5cadee95d63f1acb0ab3f82894f", + "grade": false, + "grade_id": "cell-f29a87a28188c3d0", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "source": [ + "__Exercise sheet 2__\n", + "\n", + "Code from the lecture:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cb41d2a1", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "5435cd2800cbe70e733a364b79e86c9b", + "grade": false, + "grade_id": "cell-a6520f459483332d", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pylab as plt\n", + "from scipy.integrate import quad\n", + "\n", + "rng = np.random.default_rng()\n", + "%matplotlib inline\n", + "\n", + "def inversion_sample(f_inverse):\n", + " '''Obtain an inversion sample based on the inverse-CDF f_inverse.'''\n", + " return f_inverse(rng.random())\n", + "\n", + "def compare_plot(samples,pdf,xmin,xmax,bins):\n", + " '''Draw a plot comparing the histogram of the samples to the expectation coming from the 
pdf.'''\n", + " xval = np.linspace(xmin,xmax,bins+1)\n", + " binsize = (xmax-xmin)/bins\n", + " # Calculate the expected numbers by numerical integration of the pdf over the bins\n", + " expected = np.array([quad(pdf,xval[i],xval[i+1])[0] for i in range(bins)])/binsize\n", + " measured = np.histogram(samples,bins,(xmin,xmax))[0]/(len(samples)*binsize)\n", + " plt.plot(xval,np.append(expected,expected[-1]),\"-k\",drawstyle=\"steps-post\")\n", + " plt.bar((xval[:-1]+xval[1:])/2,measured,width=binsize)\n", + " plt.xlim(xmin,xmax)\n", + " plt.legend([\"expected\",\"histogram\"])\n", + " plt.show()\n", + " \n", + "def gaussian(x):\n", + " return np.exp(-x*x/2)/np.sqrt(2*np.pi)" + ] + }, + { + "cell_type": "markdown", + "id": "3317e002", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "d2c3d8374cf18fd1a12c91353f28dbcf", + "grade": false, + "grade_id": "cell-e6c28b1e3e8371c3", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "source": [ + "## Sampling random variables via the inversion method \n", + "__(35 Points)__\n", + "\n", + "Recall from the lecture that for any real random variable $X$ we can construct an explicit random variable via the inversion method that is identically distributed. This random variable is given by $F_X^{-1}(U)$ where $F_X$ is the CDF of $X$ and $U$ is a uniform random variable on $(0,1)$ and \n", + "\n", + "$$\n", + "F_X^{-1}(p) := \\inf\\{ x\\in\\mathbb{R} : F_X(x) \\geq p\\}.\n", + "$$\n", + "\n", + "This gives a very general way of sampling $X$ in a computer program, as you will find out in this exercise.\n", + "\n", + "__(a)__ Let $X$ be an **exponential random variable** with **rate** $\\lambda$, i.e. a continuous random variable with probability density function $f_X(x) = \\lambda e^{-\\lambda x}$ for $x > 0$. Write a function `f_inv_exponential` that computes $F_X^{-1}(p)$. 
Illustrate the corresponding sampler with the help of the function `compare_plot` above. __(10 pts)__" + ] + }, + { + "cell_type": "markdown", + "id": "6f2c475a", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "4292b1a356454d496a93ef6555f0a7ae", + "grade": true, + "grade_id": "cell-311fd25e116f5066", + "locked": false, + "points": 5, + "schema_version": 3, + "solution": true, + "task": false + } + }, + "source": [ + "YOUR ANSWER HERE" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6b6428c", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "90de5b60de4e43881ab85442cdff704a", + "grade": false, + "grade_id": "cell-06ef7d054d38f5c6", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + } + }, + "outputs": [], + "source": [ + "def f_inv_exponential(lam,p):\n", + " # YOUR CODE HERE\n", + " raise NotImplementedError()\n", + " \n", + "# plotting\n", + "# YOUR CODE HERE\n", + "raise NotImplementedError()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "804aedbf", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "bce45fa412ba32138080832767338e9d", + "grade": true, + "grade_id": "cell-2022e00546cf1bb0", + "locked": true, + "points": 5, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "outputs": [], + "source": [ + "from nose.tools import assert_almost_equal\n", + "assert_almost_equal(f_inv_exponential(1.0,0.6),0.916,delta=0.001)\n", + "assert_almost_equal(f_inv_exponential(0.3,0.2),0.743,delta=0.001)" + ] + }, + { + "cell_type": "markdown", + "id": "d590b09d", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "08fdb1c6ca42806566800f06d7ffb22b", + "grade": false, + "grade_id": "cell-f7e0d9b58c948be5", + "locked": true, + "schema_version": 3, + "solution": false, + 
"task": false + } + }, + "source": [ + "__(b)__ Let now $X$ have the **Pareto distribution** of **shape** $\\alpha > 0$ on $(b,\\infty)$, which has probability density function $f_X(x) = \\alpha b^{\\alpha} x^{-\\alpha-1}$ for $x > b$. Write a function `f_inv_pareto` that computes $F_X^{-1}(p)$. Compare a histogram with a plot of $f_X(x)$ to verify your function numerically. __(10 pts)__" + ] + }, + { + "cell_type": "markdown", + "id": "47c7a42f", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "1d1fc6a16462f0d238005fdb33a99857", + "grade": true, + "grade_id": "cell-199713328dcd510d", + "locked": false, + "points": 5, + "schema_version": 3, + "solution": true, + "task": false + } + }, + "source": [ + "YOUR ANSWER HERE" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e177f32d", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "eb07f40a935275cf5883204fc817beaa", + "grade": false, + "grade_id": "cell-074f6a1fd6375c22", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + } + }, + "outputs": [], + "source": [ + "### Solution\n", + "def f_inv_pareto(alpha,b,p):\n", + " # YOUR CODE HERE\n", + " raise NotImplementedError()\n", + "\n", + "# plotting\n", + "# YOUR CODE HERE\n", + "raise NotImplementedError()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c0e1426f", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "62920089752d067b0945eb1d6d98135f", + "grade": true, + "grade_id": "cell-726b321246679d28", + "locked": true, + "points": 5, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "outputs": [], + "source": [ + "from nose.tools import assert_almost_equal\n", + "assert_almost_equal(f_inv_pareto(1.0,1.5,0.6),3.75,delta=0.0001)\n", + "assert_almost_equal(f_inv_pareto(2.0,2.25,0.3),2.689,delta=0.001)" + ] + }, + { + "cell_type": 
"markdown", + "id": "66d91446", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "0f3c9abbe9fe756c5cf4bdd6a8a37ac2", + "grade": false, + "grade_id": "cell-50306550727804ca", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "source": [ + "__(c)__ Let $X$ be a discrete random variable taking values in $\\{1,2,\\ldots,n\\}$. Write a Python function `f_inv_discrete` that takes the probability mass function $p_X$ as a list `prob_list` given by $[p_X(1),\\ldots,p_X(n)]$ and returns a random sample with the distribution of $X$ using the inversion method. Verify the working of your function numerically on an example. __(15 pts)__" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "210f1302", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "93d51c9c889dd5ba3490e0ee298d4240", + "grade": false, + "grade_id": "cell-694eb1261c2dc217", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + } + }, + "outputs": [], + "source": [ + "def f_inv_discrete(prob_list,p):\n", + " # YOUR CODE HERE\n", + " raise NotImplementedError()\n", + "\n", + "# plotting\n", + "# YOUR CODE HERE\n", + "raise NotImplementedError()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3c691f0a", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "b11d87e414ba9dfe2741d73dd95a2f12", + "grade": true, + "grade_id": "cell-140af6b31464fbef", + "locked": true, + "points": 15, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "outputs": [], + "source": [ + "assert f_inv_discrete([0.5,0.5],0.4)==1\n", + "assert f_inv_discrete([0.5,0.5],0.8)==2\n", + "assert f_inv_discrete([0,0,1],0.1)==3" + ] + }, + { + "cell_type": "markdown", + "id": "47546d37", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + 
"cell_type": "markdown", + "checksum": "32dd38f0f963c6132fcbe3ef1f5b9682", + "grade": false, + "grade_id": "cell-49fd13dc534dfa28", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "source": [ + "## Central limit theorem? \n", + "__(35 Points)__\n", + "\n", + "In this exercise we will have a closer look at central limits of the Pareto distribution, for which you implemented a random sampler in the previous exercise. By performing the appropriate integrals it is straightforward to show that \n", + "\n", + "$$ \n", + "\\mathbb{E}[X] = \\begin{cases} \\infty & \\text{for }\\alpha \\leq 1 \\\\ \\frac{\\alpha b}{\\alpha - 1} & \\text{for }\\alpha > 1 \\end{cases}, \\qquad \\operatorname{Var}(X) = \\begin{cases} \\infty & \\text{for }\\alpha \\leq 2 \\\\ \\frac{\\alpha b^2}{(\\alpha - 1)^2(\\alpha-2)} & \\text{for }\\alpha > 2 \\end{cases}.\n", + "$$\n", + "\n", + "This shows in particular that the distribution is **heavy tailed**, in the sense that some moments $\\mathbb{E}[X^k]$ diverge." + ] + }, + { + "cell_type": "markdown", + "id": "ccae582d", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "e6d5659ef88eccfb693b35a088d0d50f", + "grade": false, + "grade_id": "cell-a05e255c144ef6c5", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "source": [ + "__(a)__ Write a function `sample_Zn` that produces a random sample for $Z_n= \\frac{\\sqrt{n}}{\\sigma_X}(\\bar{X}_n - \\mathbb{E}[X])$ given $\\alpha>2$, $b>0$ and $n\\geq 1$. Visually verify the central limit theorem for $\\alpha = 4$, $b=1$ and $n=1000$ by comparing a histogram of $Z_n$ to the standard normal distribution (you may use `compare_plot`). 
__(10 pts)__" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "82fe6efd", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "177917ec75361799067d6c23a28569cd", + "grade": false, + "grade_id": "cell-b7186322b09717f8", + "locked": false, + "schema_version": 3, + "solution": true, + "task": false + } + }, + "outputs": [], + "source": [ + "def sample_Zn(alpha,b,n):\n", + " # YOUR CODE HERE\n", + " raise NotImplementedError()\n", + "\n", + "# Plotting\n", + "# YOUR CODE HERE\n", + "raise NotImplementedError()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5360d77", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "e50b33644ddd6bce391b36cefcc2e308", + "grade": true, + "grade_id": "cell-5d16b014bef9d86f", + "locked": true, + "points": 10, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "outputs": [], + "source": [ + "assert_almost_equal(np.mean([sample_Zn(3.5,2.1,100) for _ in range(100)]),0,delta=0.3)\n", + "assert_almost_equal(np.std([sample_Zn(3.5,2.1,100) for _ in range(100)]),1,delta=0.3)" + ] + }, + { + "cell_type": "markdown", + "id": "6192f05d", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "08ece68d59de21d798d9a955f59be690", + "grade": false, + "grade_id": "cell-3e7a23657e9b8374", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "source": [ + "__(b)__ Now take $\\alpha = 3/2$ and $b=1$. 
\n", + "With some work (which you do not have to do) one can show that the characteristic function of $X$ admits the following expansion around $t=0$,\n", + "\n", + "$$\n", + "\\varphi_X(t) = 1 + 3 i t - (|t|+i t)\\,\\sqrt{2\\pi|t|} + O(t^{2}).\n", + "$$\n", + "\n", + "Based on this, prove the **generalized CLT** for this particular distribution $X$ which states that $Z_n = c\\, n^{1/3} (\\bar{X}_n - \\mathbb{E}[X])$ in the limit $n\\rightarrow\\infty$ converges in distribution, with a to-be-determined choice of overall constant $c$, to a limiting random variable $\\mathcal{S}$ with characteristic function \n", + "\n", + "$$\n", + "\\varphi_{\\mathcal{S}}(t) = \\exp\\big(-(|t|+it)\\sqrt{|t|}\\big).\n", + "$$\n", + "\n", + "__(15 pts)__" + ] + }, + { + "cell_type": "markdown", + "id": "9735cd88", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "dfd8683eea5663baa81f138a2809722b", + "grade": true, + "grade_id": "cell-b25551eca32c4807", + "locked": false, + "points": 15, + "schema_version": 3, + "solution": true, + "task": false + } + }, + "source": [ + "YOUR ANSWER HERE" + ] + }, + { + "cell_type": "markdown", + "id": "5b1d9f54", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "342020128f929d47eabfdf9c075ff20c", + "grade": false, + "grade_id": "cell-d1701433c3c77172", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "source": [ + "__(c)__ The random variable $\\mathcal{S}$ has a [stable Lévy distribution](https://en.wikipedia.org/wiki/Stable_distribution) with index $\\alpha = 3/2$ and skewness $\\beta = 1$. Its probability density function $f_{\\mathcal{S}}(x)$ does not admit a simple expression, but can be accessed numerically using SciPy's `scipy.stats.levy_stable.pdf(x,1.5,1.0)`. Verify numerically that the generalized CLT of part (b) holds by comparing an appropriate histogram to this PDF. 
__(10 pts)__" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b06896e5", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "c6fe081427f342c354ee8a9b3b3331e7", + "grade": true, + "grade_id": "cell-e08d054985cfa762", + "locked": false, + "points": 10, + "schema_version": 3, + "solution": true, + "task": false + } + }, + "outputs": [], + "source": [ + "from scipy.stats import levy_stable\n", + "\n", + "# YOUR CODE HERE\n", + "raise NotImplementedError()" + ] + }, + { + "cell_type": "markdown", + "id": "f49856d8", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "d8c57e5a527eaad8318e7d31dba01694", + "grade": false, + "grade_id": "cell-bc80caacda124bf9", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "source": [ + "## Joint probability density functions and sampling the normal distribution \n", + "__(30 Points)__\n", + "\n", + "Let $\\Phi$ be a uniform random variable on $(0,2\\pi)$ and $R$ an independent continuous random variable with probability density function $f_R(r) = r\\,e^{-r^2/2}$ for $r>0$. Set $X = R \\cos \\Phi$ and $Y = R \\sin \\Phi$. This is called the **Box-Muller transform**.\n", + "\n", + "__(a)__ Since $\\Phi$ and $R$ are independent, the joint probability density of $\\Phi$ and $R$ is $f_{\\Phi,R}(\\phi,r) = f_\\Phi(\\phi)f_R(r) = \\frac{1}{2\\pi}\\, r\\,e^{-r^2/2}$. Show by change of variables that $X$ and $Y$ are also independent and both distributed as a standard normal distribution $\\mathcal{N}$. 
__(15 pts)__" + ] + }, + { + "cell_type": "markdown", + "id": "aa3821de", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "2514e6664aeb4e24a9e881522a8f3a0f", + "grade": true, + "grade_id": "cell-4f20e3b730ba0d23", + "locked": false, + "points": 15, + "schema_version": 3, + "solution": true, + "task": false + } + }, + "source": [ + "YOUR ANSWER HERE" + ] + }, + { + "cell_type": "markdown", + "id": "5d064cef", + "metadata": { + "deletable": false, + "editable": false, + "nbgrader": { + "cell_type": "markdown", + "checksum": "1af73334332fe512ef7d0edb5803a58d", + "grade": false, + "grade_id": "cell-2f07fdb2a906bb71", + "locked": true, + "schema_version": 3, + "solution": false, + "task": false + } + }, + "source": [ + "__(b)__ Write a function to sample a pair of independent normal random variables using the Box-Muller transform. Hint: to sample $R$ you can use the inversion method of the first exercise. Produce a histogram to check the distribution of your normal variables. 
__(15 pts)__" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e4023f99", + "metadata": { + "deletable": false, + "nbgrader": { + "cell_type": "code", + "checksum": "86173970c865da7b0cb8ab78ec4a87b6", + "grade": true, + "grade_id": "cell-9bf8873cce1d179c", + "locked": false, + "points": 15, + "schema_version": 3, + "solution": true, + "task": false + } + }, + "outputs": [], + "source": [ + "def random_normal_pair():\n", + " '''Return two independent normal random variables.'''\n", + " # YOUR CODE HERE\n", + " raise NotImplementedError()\n", + " return x, y\n", + "\n", + "# Plotting\n", + "# YOUR CODE HERE\n", + "raise NotImplementedError()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}