Switching from rasterio

Reasons to switch from rasterio to rioxarray

Usually, switching from rasterio to rioxarray means that you are working with rasters and need to adapt your code to xarray.

xarray is a powerful abstraction of both the raster dataset and the raster array. Uniting these two notions in a single object has many advantages. In particular, functions become simpler to use because they read attributes stored on the object (CRS, transform, nodata…) instead of requiring them as arguments.

xarray also comes with many useful built-in functions and can leverage several array backends in place of numpy where numpy is limiting (out-of-memory computation, running code on clusters or on GPUs…). Dask is one of the best known. rioxarray handles some basic dask features in I/O (see the Dask I/O example) but is not designed to support dask in more advanced functions such as reproject.

Beware: xarray also comes with gotchas! Some of them are described in the dedicated section.

Note

rasterio Dataset and xarray Dataset are two completely different things! Please be careful with these overlapping names.

Equivalences between rasterio and rioxarray

To ease the switch from rasterio to rioxarray, the tables below list the usual parameters and functions with their rioxarray equivalents.

ds stands for a rasterio Dataset.

Profile

Here are the parameters that you can derive from a rasterio Dataset's profile:

==========================  ==================================
rasterio (from ds.profile)  rioxarray (from DataArray)
==========================  ==================================
blockxsize                  encoding["preferred_chunks"]["x"]
blockysize                  encoding["preferred_chunks"]["y"]
compress                    Unused in rioxarray
count                       rio.count
crs                         rio.crs
driver                      Unused in rioxarray
dtype                       encoding["rasterio_dtype"]
height                      rio.height
interleave                  Unused in rioxarray
nodata                      rio.nodata (or rio.encoded_nodata)
tiled                       Unused in rioxarray
transform                   rio.transform()
width                       rio.width
==========================  ==================================

The values unused in rioxarray stem from xarray's abstraction of the dataset: a dataset is no longer tied to a file on disk, even when it was read from one. The driver and other file-related options (compression, interleaving, tiling) are therefore meaningless in this context.

Other dataset parameters

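==========================  ==========================
rasterio (from ds)          rioxarray (from DataArray)
==========================  ==========================
gcps or get_gcps()          rio.get_gcps()
rpcs                        rio.get_rpcs()
bounds                      rio.bounds()
==========================  ==========================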

Functions

========================  ========================================
rasterio                  rioxarray
========================  ========================================
rasterio.open()           rioxarray.open_rasterio() or
                          xarray.open_dataset(..., engine="rasterio", decode_coords="all")
read()                    compute() (loads the data into memory)
ds.read(..., window=...)  rio.isel_window()
write()                   rio.to_raster()
mask(..., crop=False)     rio.clip(..., drop=False)
mask(..., crop=True)      rio.clip(..., drop=True)
reproject()               rio.reproject()
merge()                   rioxarray.merge.merge_arrays()
fillnodata()              rio.interpolate_na()
========================  ========================================

By default, xarray opens data lazily: nothing is loaded into memory until it is needed. This is why compute() is the equivalent of read().

Going back to rasterio

rioxarray 0.21+ can recreate a rasterio Dataset from a DataArray. This is useful while translating your code from rasterio to rioxarray, but it is sub-optimal: under the hood, the array is loaded and written in memory. Prefer rioxarray's native functions whenever they exist.

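import rioxarray
dataarray = rioxarray.open_rasterio("example.tif")  # hypothetical input file
with dataarray.rio.to_rasterio_dataset() as rio_ds:
    ...  # use rio_ds as you would a rasterio Dataset, e.g. rio_ds.profile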