Configure GDAL environment

rioxarray relies on rasterio, so the GDAL environment is configured the same way. See rasterio’s documentation for details.

Setting up the GDAL environment is especially useful when working with cloud-stored data. You can find Development Seed’s recommended TiTiler environment settings here.
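As a minimal sketch of what this looks like in practice, the snippet below collects a few GDAL configuration options in a dictionary and applies them through `rasterio.Env` while opening a file with rioxarray. The option values, the `CLOUD_GDAL_OPTIONS` name and the `open_with_gdal_env` helper are assumptions made for illustration; they are not copied from the TiTiler recommendations and should be tuned to your data.

```python
# Illustrative GDAL configuration options for reading cloud-hosted rasters.
# The values are assumptions for this sketch, not an official recommendation.
CLOUD_GDAL_OPTIONS = {
    "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",  # do not list sibling files
    "GDAL_HTTP_MERGE_CONSECUTIVE_RANGES": "YES",  # merge adjacent range requests
    "CPL_VSIL_CURL_ALLOWED_EXTENSIONS": ".tif,.vrt",  # only probe these extensions
    "VSI_CACHE": "TRUE",  # cache remote reads in memory
}


def open_with_gdal_env(path, options=CLOUD_GDAL_OPTIONS):
    """Hypothetical helper: open a raster inside a rasterio.Env context."""
    # Imported lazily so the options dictionary can be reused without rasterio.
    import rasterio
    import rioxarray

    with rasterio.Env(**options):
        # Load inside the block so the reads happen while the options are active.
        return rioxarray.open_rasterio(path).load()
```

The options only apply to GDAL calls made inside the `with` block, which is why the sketch loads the data before leaving it. For large rasters you would probably keep the lazy array and do the reads inside your own `rasterio.Env` block instead.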

With Dask clusters

When setting up a Dask cluster, be sure to pass the GDAL environment (and AWS session) to every worker. First create a function that sets up the environment, then run it on every worker.

import os
from dask.distributed import Client


def set_env():
    # Run this on the Dask workers so that processes=True works with cloud storage
    os.environ["AWS_S3_ENDPOINT"] = os.getenv("AWS_S3_ENDPOINT")
    os.environ["AWS_S3_AWS_ACCESS_KEY_ID"] = os.getenv("AWS_S3_AWS_ACCESS_KEY_ID")
    os.environ["AWS_S3_AWS_SECRET_ACCESS_KEY"] = os.getenv(
        "AWS_S3_AWS_SECRET_ACCESS_KEY"
    )
    os.environ["CPL_VSIL_CURL_ALLOWED_EXTENSIONS"] = ".vrt"


# Run the client
with Client(processes=True) as client:

    # Propagate the env variables
    client.run(set_env)
    ...
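Two details of the example above are worth keeping in mind: `os.getenv` returns `None` for an unset variable, and assigning `None` to `os.environ` raises a `TypeError`; the values are also read on the worker, not on the client. The following is a hedged variant that reads the variables on the client and forwards them explicitly. The helper names (`collect_env`, `apply_env`, `propagate_env`) are hypothetical, and the variable names are simply reused from the example above.

```python
import os

# Names taken from the example above; adjust to the variables you rely on.
ENV_VAR_NAMES = (
    "AWS_S3_ENDPOINT",
    "AWS_S3_AWS_ACCESS_KEY_ID",
    "AWS_S3_AWS_SECRET_ACCESS_KEY",
)


def collect_env(names=ENV_VAR_NAMES):
    """Read the variables on the client, skipping any that are unset."""
    return {name: os.environ[name] for name in names if name in os.environ}


def apply_env(env_vars):
    """Runs on each worker: copy the client's values into os.environ."""
    os.environ.update(env_vars)
    os.environ["CPL_VSIL_CURL_ALLOWED_EXTENSIONS"] = ".vrt"


def propagate_env(client):
    """Send the client's values to every worker currently connected."""
    # Client.run forwards extra positional arguments to the function.
    return client.run(apply_env, collect_env())
```

Inside the `with Client(processes=True) as client:` block, `propagate_env(client)` would take the place of `client.run(set_env)`. Like `client.run` itself, it only reaches workers that are connected at the time of the call.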

Note

There are gotchas with the environment and Dask workers; see this discussion for more details.