Example - Reading COGs in Parallel

Cloud Optimized Geotiffs (COGs) can be internally chunked, which makes it possible to read them in parallel from multiple threads. However, the libraries rioxarray builds on, rasterio and GDAL, require some care to be used safely from multiple threads within a single process. By default, rioxarray.open_rasterio will acquire a per-process lock when reading a chunk of a COG.

If you’re using rioxarray with Dask through the chunks keyword, you can also specify the lock=False keyword to ensure that reading and operating on your data happen in parallel.
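
For illustration, here is a minimal sketch of the two call patterns; the path below is a placeholder rather than a real file, and the chunk sizes are arbitrary:

[ ]:
import rioxarray

# Placeholder path; substitute any internally tiled COG you have access to.
cog_path = "https://example.com/some_cog.tif"

# Default behavior: chunked reads share a per-process lock.
locked = rioxarray.open_rasterio(cog_path, chunks=(4, "auto", -1))

# lock=False: each Dask thread opens its own rasterio.DatasetReader,
# so chunk reads can run in parallel.
unlocked = rioxarray.open_rasterio(cog_path, lock=False, chunks=(4, "auto", -1))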

Note: Also see Reading and Writing with Dask

Scheduler Choice

Dask has several schedulers which run computations in parallel. Which scheduler is best depends on a variety of factors, including whether your computation holds Python’s Global Interpreter Lock, how much data needs to be moved around, and whether you need more than one machine’s computational power. This section about read-locks only applies if you have more than one thread in a process. This will happen with Dask’s local threaded scheduler, and with its distributed scheduler when configured to use more than one thread per worker.

By default, xarray objects will use the local threaded scheduler.
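
If you want to be explicit about the scheduler, here is a minimal sketch; the worker and thread counts are arbitrary examples:

[ ]:
import dask

# Use the local threaded scheduler explicitly (this is already the default
# for Dask-backed xarray objects).
dask.config.set(scheduler="threads")

# Or use the distributed scheduler; with threads_per_worker > 1, the
# read-lock discussion in this section applies within each worker process.
# from dask.distributed import Client
# client = Client(n_workers=2, threads_per_worker=4)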

Reading without Locks

To read a COG without any locks, you’d specify lock=False. This tells rioxarray to open a new rasterio.DatasetReader in each thread, rather than trying to share one amongst multiple threads.

[1]:
import rioxarray

url = (
    "https://naipeuwest.blob.core.windows.net/naip/v002/md/2013/md_100cm_2013/"
    "39076/m_3907617_ne_18_1_20130924.tif"
)
[2]:
ds = rioxarray.open_rasterio(url, lock=False, chunks=(4, "auto", -1))
%time _ = ds.mean().compute()
CPU times: user 2.4 s, sys: 361 ms, total: 2.76 s
Wall time: 3.32 s

Note: these timings are from a VM in the same Azure data center that’s hosting the COG. Running this locally will give different times.

Chunking

For maximum read performance, the chunking pattern you request should align with the internal chunking of the COG. Typically this means reading the data in a “row major” format: your chunks should be as wide as possible along the columns. We did that above with chunks of (4, "auto", -1). The -1 says “include all the columns”, and the "auto" makes the chunking along the rows as large as possible while staying within a reasonable limit (specified in dask.config.get("array.chunk-size")).
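
If you don’t know a file’s internal chunking, you can inspect it with rasterio before picking chunk sizes. A minimal sketch, using the block_shapes attribute, which reports the internal (rows, columns) tile size for each band:

[ ]:
import rasterio

with rasterio.open(url) as src:
    # One (rows, columns) tuple per band, e.g. [(512, 512)] for a tiled COG.
    print(src.block_shapes)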

If we flip that, and instead make each chunk span all the rows but only some of the columns, we’ll see slower performance.

[3]:
ds = rioxarray.open_rasterio(url, lock=False, chunks=(1, -1, "auto"))
%time _ = ds.mean().compute()
CPU times: user 8.58 s, sys: 1.08 s, total: 9.66 s
Wall time: 11.2 s

That said, reading is typically just the first step in a larger computation. You’d want to consider what chunking is best for your whole computation. See https://docs.dask.org/en/latest/array-chunks.html for more on choosing chunks.
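
One option is to read with chunks aligned to the file and then rechunk for the rest of the computation. A sketch, with example sizes only:

[ ]:
# Read with row-major chunks for fast I/O, then rechunk for whatever
# access pattern the downstream computation prefers.
ds = rioxarray.open_rasterio(url, lock=False, chunks=(4, "auto", -1))
ds = ds.chunk({"band": -1, "y": 1024, "x": 1024})  # example sizes only

Rechunking adds tasks to the graph, so it is worth benchmarking whether reading directly in the downstream-friendly chunking is faster for your workload.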

Caching Considerations

Specifying lock=False will disable some internal caching done by xarray or rasterio. For example, the first and second reads here take roughly the same amount of time, since nothing is cached.

[4]:
ds = rioxarray.open_rasterio(url, lock=False, chunks=(4, "auto", -1))
%time _ = ds.mean().compute()
CPU times: user 2.49 s, sys: 392 ms, total: 2.88 s
Wall time: 3.25 s
[5]:
%time _ = ds.mean().compute()
CPU times: user 2.48 s, sys: 292 ms, total: 2.78 s
Wall time: 2.97 s

By default, or when a lock object is passed in, the initial read is slower, since some threads wait around for the lock.

[6]:
ds = rioxarray.open_rasterio(url, chunks=(4, "auto", -1))  # use the default locking
%time _ = ds.mean().compute()
CPU times: user 2.15 s, sys: 284 ms, total: 2.44 s
Wall time: 5.03 s

But thanks to caching, subsequent reads are much faster.

[7]:
%time _ = ds.mean().compute()
CPU times: user 223 ms, sys: 64.9 ms, total: 288 ms
Wall time: 200 ms

If you’re repeatedly reading subsets of the data, use the default lock or pass lock=some_lock_object to benefit from the caching.
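
As a sketch, an explicit lock can be passed like this; a plain threading.Lock is assumed here for the threaded scheduler, while the distributed scheduler would typically use dask.distributed.Lock instead:

[ ]:
import threading

read_lock = threading.Lock()
ds = rioxarray.open_rasterio(url, lock=read_lock, chunks=(4, "auto", -1))

# Repeated reads of a subset can then benefit from the caching.
subset = ds.isel(x=slice(0, 2048), y=slice(0, 2048))
_ = subset.mean().compute()
_ = subset.mean().compute()  # subsequent reads are faster thanks to the cache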