{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example - Reading COGs in Parallel\n",
    "\n",
    "Cloud Optimized Geotiffs (COGs) can be internally chunked, which makes it possible to read them in parallel from multiple threads. However, the libraries `rioxarray` builds on, `rasterio` and `GDAL`, require some care to be used safely from multiple threads within a single process. By default, [rioxarray.open_rasterio](../rioxarray.rst#rioxarray-open-rasterio) will acquire a per-process lock when reading a chunk of a COG.\n",
    "\n",
    "If you're using `rioxarray` with [Dask](http://docs.dask.org/) through the `chunks` keyword, you can also specify the `lock=False` keyword to ensure that reading *and* operating on your data happen in parallel.\n",
    "\n",
    "Note: Also see [Reading and Writing with Dask](dask_read_write.ipynb)\n",
    "\n",
    "## Scheduler Choice\n",
    "\n",
    "Dask has [several schedulers](https://docs.dask.org/en/latest/scheduling.html) which run computations in parallel. Which scheduler is best depends on a variety of factors, including whether your computation holds Python's Global Interpreter Lock, whether how much data needs to be moved around, and whether you need more than one machine's computational power. This section about read-locks only applies if you have more than one thread in a process. This will happen with Dask's [local threaded scheduler](https://docs.dask.org/en/latest/scheduling.html#local-threads) and its [distributed scheduler](https://distributed.dask.org/en/latest/) when configured to use more than one thread per worker.\n",
    "\n",
    "By default, `xarray` objects will use the local `threaded` scheduler."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading without Locks\n",
    "\n",
    "To read a COG without any locks, you'd specify `lock=False`. This tells `rioxarray` to open a new `rasterio.DatasetReader` in each thread, rather than trying to share one amongst multiple threads."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import rioxarray\n",
    "\n",
    "url = (\n",
    "    \"https://naipeuwest.blob.core.windows.net/naip/v002/md/2013/md_100cm_2013/\"\n",
    "    \"39076/m_3907617_ne_18_1_20130924.tif\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2.4 s, sys: 361 ms, total: 2.76 s\n",
      "Wall time: 3.32 s\n"
     ]
    }
   ],
   "source": [
    "ds = rioxarray.open_rasterio(url, lock=False, chunks=(4, \"auto\", -1))\n",
    "%time _ = ds.mean().compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: these timings are from a VM in the same Azure data center that's hosting the COG. Running this locally will give different times.\n",
    "\n",
    "## Chunking\n",
    "\n",
    "For maximum read performance, the chunking pattern you request should align with the internal chunking of the COG. Typically this means reading the data in a \"row major\" format: your chunks should be as wide as possible along the columns. We did that above with the chunks of `(4, \"auto\", -1)`. The `-1` says \"include all the columns\", and the `\"auto\"` will make the chunking along the rows as large as possible while staying in a reasonable limit (specified in `dask.config.get(\"array.chunk-size\")`).\n",
    "\n",
    "If we flipped that, and instead read as much of the rows as possible, we'll see slower performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 8.58 s, sys: 1.08 s, total: 9.66 s\n",
      "Wall time: 11.2 s\n"
     ]
    }
   ],
   "source": [
    "ds = rioxarray.open_rasterio(url, lock=False, chunks=(1, -1, \"auto\"))\n",
    "%time _ = ds.mean().compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That said, reading is typically just the first step in a larger computation. You'd want to consider what chunking is best for your whole computation. See https://docs.dask.org/en/latest/array-chunks.html for more on choosing chunks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Caching Considerations\n",
    "\n",
    "Specifying `lock=False` will disable some internal caching done by xarray or rasterio. For example, the first and second reads here are roughly the same, since nothing is cached."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2.49 s, sys: 392 ms, total: 2.88 s\n",
      "Wall time: 3.25 s\n"
     ]
    }
   ],
   "source": [
    "ds = rioxarray.open_rasterio(url, lock=False, chunks=(4, \"auto\", -1))\n",
    "%time _ = ds.mean().compute()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2.48 s, sys: 292 ms, total: 2.78 s\n",
      "Wall time: 2.97 s\n"
     ]
    }
   ],
   "source": [
    "%time _ = ds.mean().compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default and when a lock is passed in, the initial read is slower (since some threads are waiting around for a lock)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2.15 s, sys: 284 ms, total: 2.44 s\n",
      "Wall time: 5.03 s\n"
     ]
    }
   ],
   "source": [
    "ds = rioxarray.open_rasterio(url, chunks=(4, \"auto\", -1))  # use the default locking\n",
    "%time _ = ds.mean().compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But thanks to caching, subsequent reads are much faster."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 223 ms, sys: 64.9 ms, total: 288 ms\n",
      "Wall time: 200 ms\n"
     ]
    }
   ],
   "source": [
    "%time _ = ds.mean().compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you're repeatedly reading subsets of the data, using the default lock or `lock=some_lock_object` to benefit from the caching."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}