Example - Reading COGs in Parallel
Cloud Optimized GeoTIFFs (COGs) can be internally chunked, which makes it possible to read them in parallel from multiple threads. However, the libraries rioxarray builds on, rasterio and GDAL, require some care to be used safely from multiple threads within a single process. By default, rioxarray.open_rasterio will acquire a per-process lock when reading a chunk of a COG.
If you’re using rioxarray with Dask through the chunks keyword, you can also specify the lock=False keyword to ensure that reading and operating on your data happen in parallel.
Note: Also see Reading and Writing with Dask
Scheduler Choice
Dask has several schedulers which run computations in parallel. Which scheduler is best depends on a variety of factors, including whether your computation holds Python’s Global Interpreter Lock, how much data needs to be moved around, and whether you need more than one machine’s computational power. This section about read-locks only applies if you have more than one thread in a process. This will happen with Dask’s local threaded scheduler, and with its distributed scheduler when it is configured to use more than one thread per worker.
By default, xarray objects will use the local threaded scheduler.
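To make the scheduler choice explicit, here is a minimal sketch (not part of the original example) showing the local threaded scheduler and a local distributed cluster; with the distributed scheduler, threads_per_worker controls whether more than one thread reads within each worker process:

import dask
from dask.distributed import Client

# Local threaded scheduler (the default used by dask-backed xarray objects).
dask.config.set(scheduler="threads")

# Or a local distributed cluster; threads_per_worker > 1 means each worker
# process has multiple threads, so the read-lock discussion above applies.
client = Client(n_workers=4, threads_per_worker=2)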
Reading without Locks
To read a COG without any locks, you’d specify lock=False. This tells rioxarray to open a new rasterio.DatasetReader in each thread, rather than trying to share one amongst multiple threads.
[1]:
import rioxarray
url = (
"https://naipeuwest.blob.core.windows.net/naip/v002/md/2013/md_100cm_2013/"
"39076/m_3907617_ne_18_1_20130924.tif"
)
[2]:
ds = rioxarray.open_rasterio(url, lock=False, chunks=(4, "auto", -1))
%time _ = ds.mean().compute()
CPU times: user 2.4 s, sys: 361 ms, total: 2.76 s
Wall time: 3.32 s
Note: these timings are from a VM in the same Azure data center that’s hosting the COG. Running this locally will give different times.
Chunking
For maximum read performance, the chunking pattern you request should align with the internal chunking of the COG. Typically this means reading the data in a “row major” format: your chunks should be as wide as possible along the columns. We did that above with the chunks of (4, "auto", -1). The -1 says “include all the columns”, and the "auto" will make the chunking along the rows as large as possible while staying within a reasonable limit (specified in dask.config.get("array.chunk-size")).
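If you want to confirm a COG’s internal tiling before choosing chunks, you can inspect it directly with rasterio. This is a small sketch assuming the same url as above; the actual block sizes depend on how the file was written:

import rasterio

with rasterio.open(url) as src:
    # The internal (rows, cols) tile size for each band, e.g. [(512, 512), ...].
    print(src.block_shapes)
    # Equivalent information from the creation profile, when the file is tiled.
    print(src.profile.get("blockysize"), src.profile.get("blockxsize"))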
If we flip that, so that each chunk spans all the rows but only a narrow slice of the columns, we’ll see slower performance.
[2]:
ds = rioxarray.open_rasterio(url, lock=False, chunks=(1, -1, "auto"))
%time _ = ds.mean().compute()
CPU times: user 8.58 s, sys: 1.08 s, total: 9.66 s
Wall time: 11.2 s
That said, reading is typically just the first step in a larger computation. You’d want to consider what chunking is best for your whole computation. See https://docs.dask.org/en/latest/array-chunks.html for more on choosing chunks.
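If the chunking that reads fastest isn’t what the rest of your computation wants, one option (an illustrative sketch, with made-up chunk sizes) is to rechunk after opening:

ds = rioxarray.open_rasterio(url, lock=False, chunks=(4, "auto", -1))
# Rechunk for a downstream step that prefers square-ish, single-band blocks.
ds = ds.chunk({"band": 1, "y": 2048, "x": 2048})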
Caching Considerations
Specifying lock=False will disable some internal caching done by xarray or rasterio. For example, the first and second reads here take roughly the same amount of time, since nothing is cached.
[2]:
ds = rioxarray.open_rasterio(url, lock=False, chunks=(4, "auto", -1))
%time _ = ds.mean().compute()
CPU times: user 2.49 s, sys: 392 ms, total: 2.88 s
Wall time: 3.25 s
[3]:
%time _ = ds.mean().compute()
CPU times: user 2.48 s, sys: 292 ms, total: 2.78 s
Wall time: 2.97 s
With the default lock, or when an explicit lock is passed in, the initial read is slower, since some threads spend time waiting for the lock.
[2]:
ds = rioxarray.open_rasterio(url, chunks=(4, "auto", -1)) # use the default locking
%time _ = ds.mean().compute()
CPU times: user 2.15 s, sys: 284 ms, total: 2.44 s
Wall time: 5.03 s
But thanks to caching, subsequent reads are much faster.
[3]:
%time _ = ds.mean().compute()
CPU times: user 223 ms, sys: 64.9 ms, total: 288 ms
Wall time: 200 ms
If you’re repeatedly reading subsets of the data, use the default lock or pass lock=some_lock_object to benefit from the caching.
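For example, you could create one lock and share it across reads. This is a sketch: the variable name is illustrative, and if you’re using the distributed scheduler a serializable lock (e.g. dask.utils.SerializableLock) is the safer choice:

import threading

# A single lock shared by all read tasks in this process, so the underlying
# rasterio dataset (and its cached blocks) is reused across computations.
some_lock_object = threading.Lock()
ds = rioxarray.open_rasterio(url, lock=some_lock_object, chunks=(4, "auto", -1))
subset_mean = ds[:, :2048, :2048].mean().compute()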