Nodata Management
If you have opened a dataset and the nodata value can be determined, you can access it via the rio.nodata or rio.encoded_nodata properties.
If your dataset's nodata value cannot be determined, you can set it with the rio.write_nodata method.
Search order for nodata (DataArray only):
1. Check whether the DataArray values are masked. If they are, return NaN; the original nodata value can then be retrieved from rio.encoded_nodata.
2. Look in the attributes (attrs) of your DataArray for _FillValue, then missing_value, then fill_value, and finally nodata.
3. Look in the nodatavals attribute. This is for backwards compatibility with xarray.open_rasterio. We recommend using rioxarray.open_rasterio instead.
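The search order above can be sketched in plain Python. find_nodata is a hypothetical helper that works on plain attrs and encoding dictionaries, not rioxarray's own code. Treating "masked" as "the original fill value has moved into the encoding" is an assumption based on the encoding output shown later in this notebook.

```python
import math


def find_nodata(attrs, encoding):
    """Hypothetical helper mirroring the documented nodata search order.

    Illustration only, using plain dicts; not rioxarray's implementation.
    """
    # Step 1 (assumption): masked data keeps its original fill value in the
    # encoding, so the in-memory nodata value is NaN.
    if any(encoding.get(key) is not None for key in ("_FillValue", "missing_value")):
        return math.nan
    # Step 2: attribute names, checked in priority order
    for key in ("_FillValue", "missing_value", "fill_value", "nodata"):
        if attrs.get(key) is not None:
            return attrs[key]
    # Step 3: legacy per-band tuple written by xarray.open_rasterio
    nodatavals = attrs.get("nodatavals")
    if nodatavals:
        return nodatavals[0]
    return None


print(find_nodata({"_FillValue": 32767, "missing_value": 32767}, {}))  # 32767
print(find_nodata({}, {"_FillValue": 32767.0}))  # nan
print(find_nodata({"nodatavals": (-9999.0,)}, {}))  # -9999.0
```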
API Documentation
import rioxarray
import xarray
file_path = "../../test/test_data/input/tmmx_20190121.nc"
Example of loading unmasked data
In this case, the nodata value is in the attributes.
xds = xarray.open_dataset(file_path, mask_and_scale=False)  # disable the default mask_and_scale
rds = rioxarray.open_rasterio(file_path)  # mask_and_scale=False is the default here
print("nodata:")
print(f"- xarray.open_dataset: {xds.air_temperature.rio.nodata}")
print(f"- rioxarray.open_rasterio: {rds.air_temperature.rio.nodata}")
print("\nencoded_nodata:")
print(f"- xarray.open_dataset: {xds.air_temperature.rio.encoded_nodata}")
print(f"- rioxarray.open_rasterio: {rds.air_temperature.rio.encoded_nodata}")
nodata:
- xarray.open_dataset: 32767
- rioxarray.open_rasterio: 32767
encoded_nodata:
- xarray.open_dataset: None
- rioxarray.open_rasterio: None
print("attributes:")
print(f"\n- xarray.open_dataset:\n {xds.air_temperature.attrs}")
print(f"\n- rioxarray.open_rasterio:\n {rds.air_temperature.attrs}")
attributes:
- xarray.open_dataset:
{'_FillValue': 32767, 'units': 'K', 'description': 'Daily Maximum Temperature', 'long_name': 'tmmx', 'standard_name': 'tmmx', 'missing_value': 32767, 'dimensions': 'lon lat time', 'grid_mapping': 'crs', 'coordinate_system': 'WGS84,EPSG:4326', 'scale_factor': 0.1, 'add_offset': 220.0, '_Unsigned': 'true'}
- rioxarray.open_rasterio:
{'add_offset': 220.0, 'coordinates': 'day', 'coordinate_system': 'WGS84,EPSG:4326', 'description': 'Daily Maximum Temperature', 'dimensions': 'lon lat time', 'long_name': 'tmmx', 'missing_value': 32767, 'scale_factor': 0.1, 'standard_name': 'tmmx', 'units': 'K', '_FillValue': 32767.0, '_Unsigned': 'true'}
Example of data loaded in with mask_and_scale=True
When the dataset is opened with mask_and_scale=True using rioxarray.open_rasterio or xarray.open_dataset, the nodata metadata is moved from the attributes to the encoding attribute. Then, when the dataset is written using to_netcdf or rio.to_raster, the data is encoded again and the original nodata value is written to the output.
When the data is masked this way, rio.nodata returns numpy.nan and rio.encoded_nodata contains the original value.
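To see why rio.nodata becomes NaN while rio.encoded_nodata keeps 32767, here is a minimal NumPy sketch of CF-style mask-and-scale decoding and re-encoding. It reuses the _FillValue, scale_factor and add_offset values from the attributes printed above; the raw pixel values are made up. The decode and encode helpers are hypothetical and only approximate what xarray does (for example, they ignore the _Unsigned attribute).

```python
import numpy

# Values taken from the tmmx_20190121.nc attributes printed earlier
FILL_VALUE = 32767
SCALE_FACTOR = 0.1
ADD_OFFSET = 220.0

packed = numpy.array([500, 32767, 1000], dtype="uint16")  # made-up raw values


def decode(raw):
    """Hypothetical mask_and_scale: mask the fill value, then unpack."""
    values = raw.astype("float64")
    values[raw == FILL_VALUE] = numpy.nan  # compare against the raw integers
    return values * SCALE_FACTOR + ADD_OFFSET


def encode(decoded):
    """Hypothetical reverse step: pack values and turn NaN back into the fill value."""
    raw = numpy.round((decoded - ADD_OFFSET) / SCALE_FACTOR)
    raw[numpy.isnan(decoded)] = FILL_VALUE
    return raw.astype("uint16")


decoded = decode(packed)
print(decoded)  # the fill value is now NaN, i.e. what rio.nodata reports
print(encode(decoded))  # NaN is written back as 32767, i.e. rio.encoded_nodata
```

Keeping the fill value in the encoding is what lets the round trip restore the original on-disk value.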
xds = xarray.open_dataset(file_path) # performs mask_and_scale by default
rds = rioxarray.open_rasterio(file_path, mask_and_scale=True)
/home/snowal/miniconda/envs/midas/lib/python3.10/site-packages/rioxarray/_io.py:618: SerializationWarning: variable 'air_temperature' has _Unsigned attribute but is not of integer type. Ignoring attribute.
rioda = open_rasterio(
print("nodata:")
print(f"- xarray.open_dataset: {xds.air_temperature.rio.nodata}")
print(f"- rioxarray.open_rasterio: {rds.air_temperature.rio.nodata}")
print("\nencoded_nodata:")
print(f"- xarray.open_dataset: {xds.air_temperature.rio.encoded_nodata}")
print(f"- rioxarray.open_rasterio: {rds.air_temperature.rio.encoded_nodata}")
nodata:
- xarray.open_dataset: nan
- rioxarray.open_rasterio: nan
encoded_nodata:
- xarray.open_dataset: 32767.0
- rioxarray.open_rasterio: 32767.0
print("attributes:")
print(f"\n- xarray.open_dataset:\n {xds.air_temperature.attrs}")
print(f"\n- rioxarray.open_rasterio:\n {rds.air_temperature.attrs}")
attributes:
- xarray.open_dataset:
{'units': 'K', 'description': 'Daily Maximum Temperature', 'long_name': 'tmmx', 'standard_name': 'tmmx', 'dimensions': 'lon lat time', 'grid_mapping': 'crs', 'coordinate_system': 'WGS84,EPSG:4326'}
- rioxarray.open_rasterio:
{'coordinates': 'day', 'coordinate_system': 'WGS84,EPSG:4326', 'description': 'Daily Maximum Temperature', 'dimensions': 'lon lat time', 'long_name': 'tmmx', 'standard_name': 'tmmx', 'units': 'K'}
print("encoding:")
print(f"\n- xarray.open_dataset:\n {xds.air_temperature.encoding}")
print(f"\n- rioxarray.open_rasterio:\n {rds.air_temperature.encoding}")
encoding:
- xarray.open_dataset:
{'zlib': True, 'shuffle': True, 'complevel': 5, 'fletcher32': False, 'contiguous': False, 'chunksizes': (585, 1386), 'source': '/home/snowal/scripts/rioxarray/test/test_data/input/tmmx_20190121.nc', 'original_shape': (585, 1386), 'dtype': dtype('uint16'), '_Unsigned': 'true', 'missing_value': 32767, '_FillValue': 32767, 'scale_factor': 0.1, 'add_offset': 220.0, 'coordinates': 'day'}
- rioxarray.open_rasterio:
{'_Unsigned': 'true', 'dtype': 'uint16', 'grid_mapping': 'crs', 'scale_factor': 0.1, 'add_offset': 220.0, '_FillValue': 32767.0, 'missing_value': 32767, 'source': 'netcdf:../../test/test_data/input/tmmx_20190121.nc:air_temperature', 'rasterio_dtype': 'uint16'}
Manually masking data
If you use xarray.where to mask your data, you need to ensure that the attributes stored on the DataArray reflect the correct values. rio.write_nodata() can help ensure that the nodata attributes are written correctly; with encoded=True, the original nodata value is stored as the encoded value and rio.nodata reports NaN.
xds = xarray.open_dataset(file_path, mask_and_scale=False)  # disable the default mask_and_scale
raster = xds.air_temperature
raster = raster.where(raster != raster.rio.nodata)
# nodata does not reflect that the data has been masked
print(f"nodata: {raster.rio.nodata}")
print(f"encoded_nodata: {raster.rio.encoded_nodata}")
nodata: 32767.0
encoded_nodata: None
# update nodata value to show the data has been masked
raster.rio.write_nodata(raster.rio.nodata, encoded=True, inplace=True)
print(f"nodata: {raster.rio.nodata}")
print(f"encoded_nodata: {raster.rio.encoded_nodata}")
nodata: nan
encoded_nodata: 32767.0