Example - Reading and Writing with Dask
[1]:
import multiprocessing
# Linux/OSX:
import multiprocessing.popen_spawn_posix
# Windows:
# import multiprocessing.popen_spawn_win32
import threading
from dask.distributed import Client, LocalCluster, Lock
import rioxarray
Tips for using dask locks:
Be careful about which lock you use for your process. Each worker is required to have a lock, so the more fine-grained the lock, the better.
The reading and writing processes need the same type of lock. They don’t have to share the same lock, but they do need a lock of the same type.
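As a rough illustration of the second tip, the sketch below uses only the standard library (no rioxarray or dask) to show two distinct locks of the same type, one guarding reads and one guarding writes. The names `read_lock`, `write_lock`, `source`, `written`, and `copy_block` are hypothetical and exist only for this example; the list operations stand in for file access.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Two separate lock objects of the same type: one for reads, one for writes.
read_lock = threading.Lock()
write_lock = threading.Lock()

source = list(range(8))  # stand-in for the data being read
written = []             # stand-in for the output being written


def copy_block(index):
    """Read one value under the read lock, then record it under the write lock."""
    with read_lock:
        value = source[index]
    with write_lock:
        written.append((index, value))


# Run the copies on several threads; list() forces completion and surfaces errors.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(copy_block, range(len(source))))

same_type = type(read_lock) is type(write_lock)
distinct_objects = read_lock is not write_lock
```

Here the readers and the writers never contend for the same lock object, yet both sides use `threading.Lock`, which is the pairing the tip describes for a multithreaded setup.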
See docs for:
DataArray: rio.to_raster()
Dataset: rio.to_raster()
No distributed computing example
Note: Without a lock provided, to_raster does not use dask to write to disk.
[2]:
xds = rioxarray.open_rasterio(
    "../../test/test_data/compare/small_dem_3m_merged.tif",
    chunks=True,
)
xds.rio.to_raster("simple_write.tif", tiled=True)
Multithreaded example
[3]:
xds = rioxarray.open_rasterio(
    "../../test/test_data/compare/small_dem_3m_merged.tif",
    chunks=True,
    lock=False,
    # lock=threading.Lock(), # when too many file handles open
)
xds.rio.to_raster(
    "dask_thread.tif", tiled=True, lock=threading.Lock(),
)
Multiple worker example
[4]:
with LocalCluster() as cluster, Client(cluster) as client:
    xds = rioxarray.open_rasterio(
        "../../test/test_data/compare/small_dem_3m_merged.tif",
        chunks=True,
        lock=False,
        # lock=Lock("rio-read", client=client), # when too many file handles open
    )
    xds.rio.to_raster(
        "dask_multiworker.tif",
        tiled=True,
        lock=Lock("rio", client=client),
    )