Nodata Management

If you have opened a dataset and the nodata value can be determined, you can access it via the rio.nodata or rio.encoded_nodata accessors.

If your dataset’s nodata value cannot be determined, you can set it with the rio.write_nodata method.

Search order for nodata (DataArray only):

  1. Check if the DataArray values are masked. If they are, return NaN; the original nodata value can then be retrieved from rio.encoded_nodata.

  2. Look in the attributes (attrs) of your DataArray for _FillValue, then missing_value, then fill_value, and finally nodata.

  3. Look in the nodatavals attribute. This is for backwards compatibility with xarray.open_rasterio. We recommend using rioxarray.open_rasterio instead.
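The search order above can be sketched in plain Python. This is only an illustration of the lookup logic, not rioxarray's implementation: find_nodata and its is_masked flag are hypothetical names, attrs stands in for DataArray.attrs, and taking the first entry of nodatavals is an assumption made for this sketch.

```python
import math

# Attribute names checked in step 2, in priority order (from the list above).
ATTR_SEARCH_ORDER = ("_FillValue", "missing_value", "fill_value", "nodata")


def find_nodata(attrs, is_masked=False):
    """Hypothetical helper that mirrors the documented search order."""
    # Step 1: masked data reports NaN as its nodata value.
    if is_masked:
        return math.nan
    # Step 2: the first matching attribute wins.
    for name in ATTR_SEARCH_ORDER:
        if name in attrs:
            return attrs[name]
    # Step 3: legacy per-band tuple written by xarray.open_rasterio.
    # Assumption for this sketch: use the first band's value.
    nodatavals = attrs.get("nodatavals")
    if nodatavals:
        return nodatavals[0]
    # Nothing found: the nodata value cannot be determined.
    return None


print(find_nodata({"missing_value": -9999, "_FillValue": 32767}))
print(find_nodata({"nodatavals": (0.0,)}))
print(find_nodata({"_FillValue": 32767}, is_masked=True))
```

When none of the steps finds a value, the sketch returns None, which corresponds to the case where the value has to be set explicitly with rio.write_nodata.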

API Documentation

[1]:
import rioxarray
import xarray

file_path = "../../test/test_data/input/tmmx_20190121.nc"

Example of loading unmasked data

In this case, the nodata value is in the attributes.

[2]:
xds = xarray.open_dataset(file_path, mask_and_scale=False) # disable masking; xarray performs mask_and_scale by default
rds = rioxarray.open_rasterio(file_path)
[3]:
print("nodata:")
print(f"- xarray.open_dataset: {xds.air_temperature.rio.nodata}")
print(f"- rioxarray.open_rasterio: {rds.air_temperature.rio.nodata}")
print("\nencoded_nodata:")
print(f"- xarray.open_dataset: {xds.air_temperature.rio.encoded_nodata}")
print(f"- rioxarray.open_rasterio: {rds.air_temperature.rio.encoded_nodata}")
nodata:
- xarray.open_dataset: 32767
- rioxarray.open_rasterio: 32767

encoded_nodata:
- xarray.open_dataset: None
- rioxarray.open_rasterio: None
[4]:
print("attributes:")
print(f"\n- xarray.open_dataset:\n    {xds.air_temperature.attrs}")
print(f"\n- rioxarray.open_rasterio:\n    {rds.air_temperature.attrs}")
attributes:

- xarray.open_dataset:
    {'_FillValue': 32767, 'units': 'K', 'description': 'Daily Maximum Temperature', 'long_name': 'tmmx', 'standard_name': 'tmmx', 'missing_value': 32767, 'dimensions': 'lon lat time', 'grid_mapping': 'crs', 'coordinate_system': 'WGS84,EPSG:4326', 'scale_factor': 0.1, 'add_offset': 220.0, '_Unsigned': 'true'}

- rioxarray.open_rasterio:
    {'add_offset': 220.0, 'coordinates': 'day', 'coordinate_system': 'WGS84,EPSG:4326', 'description': 'Daily Maximum Temperature', 'dimensions': 'lon lat time', 'long_name': 'tmmx', 'missing_value': 32767, 'scale_factor': 0.1, 'standard_name': 'tmmx', 'units': 'K', '_FillValue': 32767.0, '_Unsigned': 'true'}

Example of data loaded in with mask_and_scale=True

When the dataset is opened with mask_and_scale=True using rioxarray.open_rasterio or xarray.open_dataset, the nodata metadata is moved from the attributes to the encoding attribute. Then, when the dataset is written using to_netcdf or rio.to_raster, the data is encoded again and the original nodata value is written to the raster.

When this happens, rio.nodata returns numpy.nan and rio.encoded_nodata contains the original value.
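As a rough illustration of what masking and scaling mean for individual values, the sketch below applies the usual decode formula (stored value times scale_factor plus add_offset, with the fill value mapped to NaN) and its inverse in plain Python. The constants are taken from the example file's metadata; decode and encode are hypothetical helpers, not xarray or rioxarray functions, and the _Unsigned handling is ignored.

```python
import math

# Values taken from the example file's metadata.
SCALE_FACTOR = 0.1
ADD_OFFSET = 220.0
FILL_VALUE = 32767


def decode(raw):
    """Stored integer -> physical value; the fill value becomes NaN."""
    if raw == FILL_VALUE:
        return math.nan
    return raw * SCALE_FACTOR + ADD_OFFSET


def encode(value):
    """Physical value -> stored integer; NaN becomes the fill value again."""
    if math.isnan(value):
        return FILL_VALUE
    return round((value - ADD_OFFSET) / SCALE_FACTOR)


raw_values = [500, 731, FILL_VALUE]
decoded = [decode(v) for v in raw_values]   # what you see after loading
restored = [encode(v) for v in decoded]     # what is written back to disk
print(decoded)
print(restored)
```

The round trip shows why the original nodata value has to be kept in the encoding: once the fill value has been replaced by NaN, it can only be restored on write if it was recorded somewhere.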

[5]:
xds = xarray.open_dataset(file_path) # performs mask_and_scale by default
rds = rioxarray.open_rasterio(file_path, mask_and_scale=True)
/home/snowal/miniconda/envs/midas/lib/python3.10/site-packages/rioxarray/_io.py:618: SerializationWarning: variable 'air_temperature' has _Unsigned attribute but is not of integer type. Ignoring attribute.
  rioda = open_rasterio(
[6]:
print("nodata:")
print(f"- xarray.open_dataset: {xds.air_temperature.rio.nodata}")
print(f"- rioxarray.open_rasterio: {rds.air_temperature.rio.nodata}")
print("\nencoded_nodata:")
print(f"- xarray.open_dataset: {xds.air_temperature.rio.encoded_nodata}")
print(f"- rioxarray.open_rasterio: {rds.air_temperature.rio.encoded_nodata}")
nodata:
- xarray.open_dataset: nan
- rioxarray.open_rasterio: nan

encoded_nodata:
- xarray.open_dataset: 32767.0
- rioxarray.open_rasterio: 32767.0
[7]:
print("attributes:")
print(f"\n- xarray.open_dataset:\n    {xds.air_temperature.attrs}")
print(f"\n- rioxarray.open_rasterio:\n    {rds.air_temperature.attrs}")
attributes:

- xarray.open_dataset:
    {'units': 'K', 'description': 'Daily Maximum Temperature', 'long_name': 'tmmx', 'standard_name': 'tmmx', 'dimensions': 'lon lat time', 'grid_mapping': 'crs', 'coordinate_system': 'WGS84,EPSG:4326'}

- rioxarray.open_rasterio:
    {'coordinates': 'day', 'coordinate_system': 'WGS84,EPSG:4326', 'description': 'Daily Maximum Temperature', 'dimensions': 'lon lat time', 'long_name': 'tmmx', 'standard_name': 'tmmx', 'units': 'K'}
[8]:
print("encoding:")
print(f"\n- xarray.open_dataset:\n    {xds.air_temperature.encoding}")
print(f"\n- rioxarray.open_rasterio:\n    {rds.air_temperature.encoding}")
encoding:

- xarray.open_dataset:
    {'zlib': True, 'shuffle': True, 'complevel': 5, 'fletcher32': False, 'contiguous': False, 'chunksizes': (585, 1386), 'source': '/home/snowal/scripts/rioxarray/test/test_data/input/tmmx_20190121.nc', 'original_shape': (585, 1386), 'dtype': dtype('uint16'), '_Unsigned': 'true', 'missing_value': 32767, '_FillValue': 32767, 'scale_factor': 0.1, 'add_offset': 220.0, 'coordinates': 'day'}

- rioxarray.open_rasterio:
    {'_Unsigned': 'true', 'dtype': 'uint16', 'grid_mapping': 'crs', 'scale_factor': 0.1, 'add_offset': 220.0, '_FillValue': 32767.0, 'missing_value': 32767, 'source': 'netcdf:../../test/test_data/input/tmmx_20190121.nc:air_temperature', 'rasterio_dtype': 'uint16'}

Manually masking data

If you use xarray.where to mask your data, you need to ensure that the attributes stored on the DataArray still reflect the correct values. rio.write_nodata() can help ensure that the nodata attributes are written correctly.

[9]:
xds = xarray.open_dataset(file_path, mask_and_scale=False) # disable masking; xarray performs mask_and_scale by default
raster = xds.air_temperature
raster = raster.where(raster != raster.rio.nodata)
# nodata does not yet reflect that the data has been masked
print(f"nodata: {raster.rio.nodata}")
print(f"encoded_nodata: {raster.rio.encoded_nodata}")
nodata: 32767.0
encoded_nodata: None
[10]:
# update nodata value to show the data has been masked
raster.rio.write_nodata(raster.rio.nodata, encoded=True, inplace=True)
print(f"nodata: {raster.rio.nodata}")
print(f"encoded_nodata: {raster.rio.encoded_nodata}")
nodata: nan
encoded_nodata: 32767.0
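The bookkeeping behind the two cells above can be modelled with a toy example in plain Python. This is a sketch under stated assumptions, not rioxarray's implementation: the raster dict and the mark_as_masked helper are hypothetical, the sample values are made up for illustration, and storing the encoded value under _FillValue in the encoding is assumed from the encoding output shown in the mask_and_scale=True example.

```python
import math

NODATA = 32767  # sentinel value from the example file

# Toy stand-in for a DataArray: values plus two metadata dicts.
raster = {
    "values": [2731.0, 32767.0, 2805.0],  # made-up sample values
    "attrs": {"_FillValue": NODATA},
    "encoding": {},
}

# Analogue of raster.where(raster != nodata): sentinel cells become NaN,
# but the metadata is untouched and still advertises the sentinel.
raster["values"] = [math.nan if v == NODATA else v for v in raster["values"]]
print(raster["attrs"])


def mark_as_masked(obj, nodata):
    """Hypothetical helper: record the sentinel as the encoded nodata value."""
    # Assumption: the encoded value lives under "_FillValue" in the encoding.
    obj["attrs"].pop("_FillValue", None)
    obj["encoding"]["_FillValue"] = nodata
    return obj


mark_as_masked(raster, NODATA)
print(raster["attrs"], raster["encoding"])
```

After the metadata update, the in-memory values use NaN for missing cells while the sentinel is preserved for writing, which matches the nodata and encoded_nodata values printed above.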