03: Fetch week 3

2022-09-22 10:09:04 +02:00
parent 29f654868e
commit a4b83bd580
1 changed files with 603 additions and 0 deletions
--- a/3/exercise_sheet_03.ipynb
+++ b/3/exercise_sheet_03.ipynb
@ -0,0 +1,603 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ac27651a",
+   "metadata": {},
+   "source": [
+    "# Exercise sheet\n",
+    "\n",
+    "Some general remarks about the exercises:\n",
+    "* For your convenience functions from the lecture are included below. Feel free to reuse them without copying to the exercise solution box.\n",
+    "* For each part of the exercise a solution box has been added, but you may insert additional boxes. Do not hesitate to add Markdown boxes for textual or LaTeX answers (via `Cell > Cell Type > Markdown`). But make sure to replace any part that says `YOUR CODE HERE` or `YOUR ANSWER HERE` and remove the `raise NotImplementedError()`.\n",
+    "* Please make your code readable by humans (and not just by the Python interpreter): choose informative function and variable names and use consistent formatting. Feel free to check the [PEP 8 Style Guide for Python](https://www.python.org/dev/peps/pep-0008/) for the widely adopted coding conventions or [this guide for explanation](https://realpython.com/python-pep8/).\n",
+    "* Make sure that the full notebook runs without errors before submitting your work. This you can do by selecting `Kernel > Restart & Run All` in the jupyter menu.\n",
+    "* For some exercises test cases have been provided in a separate cell in the form of `assert` statements. When run, a successful test will give no output, whereas a failed test will display an error message.\n",
+    "* Each sheet has 100 points worth of exercises. Note that only the grades of sheets number 2, 4, 6, 8 count towards the course examination. Submitting sheets 1, 3, 5, 7 & 9 is voluntary and their grades are just for feedback.\n",
+    "\n",
+    "Please fill in your name here:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d8b13e80",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "NAME = \"\"\n",
+    "NAMES_OF_COLLABORATORS = \"\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e21d4bc9",
+   "metadata": {},
+   "source": [
+    "---"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "97ec4790",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "cell_type": "markdown",
+     "checksum": "ba0b514985ed8d79af63814bda6ee328",
+     "grade": false,
+     "grade_id": "cell-70094cd1c9529a28",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "**Exercise sheet 3**\n",
+    "\n",
+    "Code from the lectures:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "948639fc",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "cell_type": "code",
+     "checksum": "1e25d190936687e34fe3ac01d4dfce44",
+     "grade": false,
+     "grade_id": "cell-9a6bf70e64f84715",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import matplotlib.pylab as plt\n",
+    "\n",
+    "rng = np.random.default_rng()\n",
+    "%matplotlib inline\n",
+    "\n",
+    "def sample_acceptance_rejection(sample_z,accept_probability):\n",
+    "    while True:\n",
+    "        x = sample_z()\n",
+    "        if rng.random() < accept_probability(x):\n",
+    "            return x     \n",
+    "        \n",
+    "def estimate_expectation(sampler,n):\n",
+    "    '''Compute beste estimate of mean and 1-sigma error with n samples.'''\n",
+    "    samples = [sampler() for _ in range(n)]\n",
+    "    return np.mean(samples), np.std(samples)/np.sqrt(n-1)\n",
+    "\n",
+    "def estimate_expectation_one_pass(sampler,n):\n",
+    "    sample_mean = sample_square_dev = 0.0\n",
+    "    for k in range(1,n+1):\n",
+    "        delta = sampler() - sample_mean\n",
+    "        sample_mean += delta / k\n",
+    "        sample_square_dev += (k-1)*delta*delta/k  \n",
+    "    return sample_mean, np.sqrt(sample_square_dev / (n*(n-1)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bce418c0",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "cell_type": "markdown",
+     "checksum": "90c9636c93064db3b008a2e9ec210117",
+     "grade": false,
+     "grade_id": "cell-8763c7e3277ef0ce",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "## Acceptance-rejection sampling\n",
+    "\n",
+    "**(35 points)**\n",
+    "\n",
+    "The goal of this exercise is to develop a fast sampling algorithm of the discrete random variable $X$ with probability mass function $$p_X(k) = \\frac{6}{\\pi^2} k^{-2}, \\qquad k=1,2,\\ldots$$\n",
+    "\n",
+    "__(a)__ Let $Z$ be the discrete random variable with $p_Z(k) = \\frac{1}{k} - \\frac{1}{k+1}$ for $k=1,2,\\ldots$. Write a function to compute the inverse CDF $F_Z^{-1}(u)$, such that you can use the inversion method to sample $Z$ efficiently. **(15 pts)**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "564c7f00",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "cell_type": "code",
+     "checksum": "da96fc27e757b00d6e55befc89f5afda",
+     "grade": false,
+     "grade_id": "cell-b0b3fcf6d9bf35a0",
+     "locked": false,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "def f_inverse_Z(u):\n",
+    "    '''Compute the inverse CDF of Z, i.e. F_Z^{-1}(u) for 0 <= u <= 1.'''\n",
+    "    # YOUR CODE HERE\n",
+    "    raise NotImplementedError()\n",
+    "\n",
+    "def random_Z():\n",
+    "    return int(f_inverse_Z(rng.random())) # make sure to return an integer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2ca9d9ca",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "cell_type": "code",
+     "checksum": "3eeabc3f6572ed217a002adefda34876",
+     "grade": true,
+     "grade_id": "cell-a28cb07f90510b94",
+     "locked": true,
+     "points": 15,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "assert f_inverse_Z(0.2)==1\n",
+    "assert f_inverse_Z(0.51)==2\n",
+    "assert f_inverse_Z(0.76)==4\n",
+    "assert f_inverse_Z(0.991)==111"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9a1b3dcd",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "cell_type": "markdown",
+     "checksum": "78f978d46d5f742623cf174e88323ec9",
+     "grade": false,
+     "grade_id": "cell-576d3991eef6bf11",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "__(b)__ Implement a sampler for $X$ using acceptance-rejection based on the sampler of $Z$. For this you need to first determine a $c$ such that $p_X(k) \\leq c\\,p_Z(k)$ for all $k=1,2,\\ldots$, and then consider an acceptance probability $p_X(k) / (c p_Z(k))$. Verify the validity of your sampler numerically (e.g. for $k=1,\\ldots,10$). **(20 pts)**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8e201507",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "cell_type": "code",
+     "checksum": "862ecc31c9ee75e459dafe5262ddc776",
+     "grade": false,
+     "grade_id": "cell-20fd55e7a4c4c531",
+     "locked": false,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "def accept_probability_X(k):\n",
+    "    '''Return the appropriate acceptance probability on the event Z=k.'''\n",
+    "    # YOUR CODE HERE\n",
+    "    raise NotImplementedError()\n",
+    "    \n",
+    "def random_X():\n",
+    "    return sample_acceptance_rejection(random_Z,accept_probability_X)\n",
+    "\n",
+    "# Verify numerically\n",
+    "# YOUR CODE HERE\n",
+    "raise NotImplementedError()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1b0e1d47",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "cell_type": "code",
+     "checksum": "641adac2c6462be641ae2d45a4d80e7d",
+     "grade": true,
+     "grade_id": "cell-04bad69f7cc2ad18",
+     "locked": true,
+     "points": 20,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "from nose.tools import assert_almost_equal\n",
+    "assert min([random_X() for _ in range(10000)]) >= 1\n",
+    "assert_almost_equal([random_X() for _ in range(10000)].count(1),6079,delta=400)\n",
+    "assert_almost_equal([random_X() for _ in range(10000)].count(3),675,delta=75)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d010b3c8",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "cell_type": "markdown",
+     "checksum": "ec5e4c478f45388c302b82613e45a0c0",
+     "grade": false,
+     "grade_id": "cell-f831d5b44932cf2f",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "## Monte Carlo integration & Importance sampling\n",
+    "\n",
+    "**(30 Points)**\n",
+    "\n",
+    "Consider the integral \n",
+    "\n",
+    "$$\n",
+    "I = \\int_0^1 \\sin(\\pi x(1-x))\\mathrm{d}x = \\mathbb{E}[X], \\quad X=g(U), \\quad g(U)=\\sin(\\pi U(1-U)),\n",
+    "$$ \n",
+    "\n",
+    "where $U$ is a uniform random variable in $(0,1)$.\n",
+    "\n",
+    "__(a)__ Use Monte Carlo integration based on sampling $U$ to estimate $I$ with $1\\sigma$ error at most $0.001$. How many samples do you need? (It is not necessary to automate this: trial and error is sufficient.) **(10 pts)**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bbf5c843",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "cell_type": "code",
+     "checksum": "febbb705202776d2af26bd95c4c08625",
+     "grade": true,
+     "grade_id": "cell-47eab0537d21690e",
+     "locked": false,
+     "points": 10,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# YOUR CODE HERE\n",
+    "raise NotImplementedError()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "26b8ff78",
+   "metadata": {},
+   "source": [
+    "__(b)__ Choose a random variable $Z$ on $(0,1)$ whose density resembles the integrand of $I$ and which you know how to sample efficiently (by inversion method, acceptance-rejection, or a built-in Python function). Estimate $I$ again using importance sampling, i.e. $I = \\mathbb{E}[X']$ where $X' = g(Z) f_U(Z)/f_Z(Z)$, with an error of at most 0.001. How many samples did you need this time? **(20 pts)**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5793420a",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "cell_type": "code",
+     "checksum": "d546b8c253155efa96a399ec15c24fb8",
+     "grade": true,
+     "grade_id": "cell-467d7240beab5b29",
+     "locked": false,
+     "points": 20,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "def sample_nice_Z():\n",
+    "    '''Sample from the nice distribution Z'''\n",
+    "    # YOUR CODE HERE\n",
+    "    raise NotImplementedError()\n",
+    "    \n",
+    "def sample_X_prime():\n",
+    "    '''Sample from X'.'''\n",
+    "    # YOUR CODE HERE\n",
+    "    raise NotImplementedError()\n",
+    "    \n",
+    "# YOUR CODE HERE\n",
+    "raise NotImplementedError()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e2a5b934",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "cell_type": "markdown",
+     "checksum": "8fd70da242e4db50dd7a7db9851e839a",
+     "grade": false,
+     "grade_id": "cell-59ae6e6f3f476d2a",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "## Direct sampling of Dyck paths\n",
+    "\n",
+    "**(35 points)**\n",
+    "\n",
+    "Direct sampling of random variables in high dimensions requires some luck and/or ingenuity. Here is an example of a probability distribution on $\\mathbb{Z}^{2n+1}$ that features prominently in the combinatorial literature and  can be sampled directly in an efficient manner. A sequence $\\mathbf{x}\\equiv(x_0,x_1,\\ldots,x_{2n})\\in\\mathbb{Z}^{2n+1}$ is said to be a **Dyck path** if $x_0=x_{2n}=0$, $x_i \\geq 0$ and $|x_{i}-x_{i-1}|=1$ for all $i=1,\\ldots,2n$. Dyck paths are counted by the Catalan numbers $C(n) = \\frac{1}{n+1}\\binom{2n}{n}$. Let $\\mathbf{X}=(X_0,\\ldots,X_n)$ be a **uniform Dyck path**, i.e. a random variable with probability mass function $p_{\\mathbf{X}}(\\mathbf{x}) = 1/C(n)$ for every Dyck path $\\mathbf{x}$. Here is one way to sample $\\mathbf{X}$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d11a0889",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "cell_type": "code",
+     "checksum": "0ea1008043f2711810b9da3c2cf96a02",
+     "grade": false,
+     "grade_id": "cell-3598f5c8aabad8b9",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "def random_dyck_path(n):\n",
+    "    '''Returns a uniform Dyck path of length 2n as an array [x_0, x_1, ..., x_{2n}] of length 2n.'''\n",
+    "    # produce a (2n+1)-step random walk from 0 to -1\n",
+    "    increments = [1]*n +[-1]*(n+1)\n",
+    "    rng.shuffle(increments)\n",
+    "    unconstrained_walk = np.cumsum(increments)\n",
+    "    # determine the first time it reaches its minimum\n",
+    "    argmin = np.argmin(unconstrained_walk)\n",
+    "    # cyclically permute the increments to ensure walk stays non-negative until last step\n",
+    "    rotated_increments = np.roll(increments,-argmin)\n",
+    "    # turn off the superfluous -1 step\n",
+    "    rotated_increments[0] = 0\n",
+    "    # produce dyck path from increments\n",
+    "    walk = np.cumsum(rotated_increments)\n",
+    "    return walk\n",
+    "\n",
+    "\n",
+    "plt.plot(random_dyck_path(50))\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3e6ad6ad",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "cell_type": "markdown",
+     "checksum": "e88c23c1f0a6e2d23e0cd8ab4e2a20fd",
+     "grade": false,
+     "grade_id": "cell-8c2933e84038cd5e",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "__(a)__ Let $H$ be the (maximal) height of $X$, i.e. $H=\\max_i X_i$. Estimate the expected height $\\mathbb{E}[H]$ for $n = 2^5, 2^6, \\ldots, 2^{11}$ (including error bars). Determine the growth $\\mathbb{E}[H] \\approx a\\,n^\\beta$ via an appropriate fit. *Hint*: use the `scipy.optimize.curve_fit` function with the option `sigma = ...` to incorporate the standard errors on $\\mathbb{E}[H]$ in the fit. Note that when you supply the errors appropriately, fitting on linear or logarithmic scale should result in the same answer. **(25 pts)**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2b79f35a",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "cell_type": "code",
+     "checksum": "54439c27bcec7884b6168c5bf92c27fd",
+     "grade": true,
+     "grade_id": "cell-310664cb8eaaa831",
+     "locked": false,
+     "points": 10,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Collect height estimates\n",
+    "n_values = [2**k for k in range(5,11+1)]\n",
+    "# YOUR CODE HERE\n",
+    "raise NotImplementedError()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "77d3723b",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "cell_type": "code",
+     "checksum": "960be7ade957300785f4b69233a8f462",
+     "grade": true,
+     "grade_id": "cell-0ca47ced5547d67c",
+     "locked": false,
+     "points": 10,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "from scipy.optimize import curve_fit\n",
+    "\n",
+    "# Fitting\n",
+    "# YOUR CODE HERE\n",
+    "raise NotImplementedError()\n",
+    "print(\"Fit parameters: a = {}, beta = {}\".format(a_fit,beta_fit))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "edd467e4",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "cell_type": "code",
+     "checksum": "421cc39b43f6f9ebc46886d7ca0ab1e4",
+     "grade": true,
+     "grade_id": "cell-7ebce3629edd06a2",
+     "locked": false,
+     "points": 5,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Plotting\n",
+    "# YOUR CODE HERE\n",
+    "raise NotImplementedError()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "142aa5cb",
+   "metadata": {
+    "deletable": false,
+    "editable": false,
+    "nbgrader": {
+     "cell_type": "markdown",
+     "checksum": "1df181e550da9f9c1edff06f45d51984",
+     "grade": false,
+     "grade_id": "cell-2dcc8f8e438aad86",
+     "locked": true,
+     "schema_version": 3,
+     "solution": false,
+     "task": false
+    }
+   },
+   "source": [
+    "__(b)__ Produce a histogram of the height $H / \\sqrt{n}$ for $n = 2^5, 2^6, \\ldots, 2^{11}$ and $3000$ samples each and demonstrate with a plot that it appears to converge in distribution as $n\\to\\infty$. *Hint*: you could call `plt.hist(...,density=True,histtype='step')` for each $n$ to plot the densities on top of each other. **(10 pts)**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "eed4339c",
+   "metadata": {
+    "deletable": false,
+    "nbgrader": {
+     "cell_type": "code",
+     "checksum": "cb48ed2a17571debd5f778f87f34a2ab",
+     "grade": true,
+     "grade_id": "cell-b2bb42d9970a7271",
+     "locked": false,
+     "points": 10,
+     "schema_version": 3,
+     "solution": true,
+     "task": false
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# YOUR CODE HERE\n",
+    "raise NotImplementedError()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}