{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Nodata Management" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have opened a dataset and the nodata value can be determined, you can access it via the `rio.nodata` or `rio.encoded_nodata` accessors.\n", "\n", "If your dataset's nodata value cannot be determined, you can use the `rio.write_nodata` method.\n", "\n", "### Search order for nodata (DataArray only):\n", "1. Check if DataArray values are masked. If they are masked, return `NaN`. If the DataArray is masked, the original nodata value can be retreived from `rio.encoded_nodata`.\n", "2. Look in attributes (`attrs`) of your data array for the `_FillValue` then `missing_value` then `fill_value` and finally `nodata`.\n", "3. Look in the `nodatavals` attribute. This is for backwards compatibility with `xarray.open_rasterio`. We recommend using `rioxarray.open_rasterio` instead.\n", "\n", "### API Documentation\n", "\n", "- [rio.write_nodata()](../rioxarray.rst#rioxarray.raster_array.RasterArray.write_nodata)\n", "- [rio.nodata](../rioxarray.rst#rioxarray.raster_array.RasterArray.nodata)\n", "- [rio.encoded_nodata](../rioxarray.rst#rioxarray.raster_array.RasterArray.encoded_nodata)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import rioxarray\n", "import xarray\n", "\n", "file_path = \"../../test/test_data/input/tmmx_20190121.nc\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example of loading unmaksed data\n", "\n", "In this case, the nodata value is in the attributes." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "xds = xarray.open_dataset(file_path, mask_and_scale=False) # performs mask_and_scale by default\n", "rds = rioxarray.open_rasterio(file_path)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nodata:\n", "- xarray.open_dataset: 32767\n", "- rioxarray.open_rasterio: 32767\n", "\n", "encoded_nodata:\n", "- xarray.open_dataset: None\n", "- rioxarray.open_rasterio: None\n" ] } ], "source": [ "print(\"nodata:\")\n", "print(f\"- xarray.open_dataset: {xds.air_temperature.rio.nodata}\")\n", "print(f\"- rioxarray.open_rasterio: {rds.air_temperature.rio.nodata}\")\n", "print(\"\\nencoded_nodata:\")\n", "print(f\"- xarray.open_dataset: {xds.air_temperature.rio.encoded_nodata}\")\n", "print(f\"- rioxarray.open_rasterio: {rds.air_temperature.rio.encoded_nodata}\")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "attributes:\n", "\n", "- xarray.open_dataset:\n", " {'_FillValue': 32767, 'units': 'K', 'description': 'Daily Maximum Temperature', 'long_name': 'tmmx', 'standard_name': 'tmmx', 'missing_value': 32767, 'dimensions': 'lon lat time', 'grid_mapping': 'crs', 'coordinate_system': 'WGS84,EPSG:4326', 'scale_factor': 0.1, 'add_offset': 220.0, '_Unsigned': 'true'}\n", "\n", "- rioxarray.open_rasterio:\n", " {'add_offset': 220.0, 'coordinates': 'day', 'coordinate_system': 'WGS84,EPSG:4326', 'description': 'Daily Maximum Temperature', 'dimensions': 'lon lat time', 'long_name': 'tmmx', 'missing_value': 32767, 'scale_factor': 0.1, 'standard_name': 'tmmx', 'units': 'K', '_FillValue': 32767.0, '_Unsigned': 'true'}\n" ] } ], "source": [ "print(\"attributes:\")\n", "print(f\"\\n- xarray.open_dataset:\\n {xds.air_temperature.attrs}\")\n", "print(f\"\\n- rioxarray.open_rasterio:\\n {rds.air_temperature.attrs}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example of data loaded in with mask_and_scale=True\n", "\n", "When the dataset is opened with `mask_and_scale=True` with `rioxarray.open_rasterio` or `xarray.open_dataset`, the\n", "nodata metadata is written to the encoding attribute. Then, when the dataset is written using\n", "`to_netcdf` or `rio.to_raster` the data is decoded and it writes the original nodata value to the raster.\n", "\n", "When this happens, `rio.nodata` returns `numpy.nan` and `rio.encoded_nodata` contains the original value." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/snowal/miniconda/envs/midas/lib/python3.10/site-packages/rioxarray/_io.py:618: SerializationWarning: variable 'air_temperature' has _Unsigned attribute but is not of integer type. Ignoring attribute.\n", " rioda = open_rasterio(\n" ] } ], "source": [ "xds = xarray.open_dataset(file_path) # performs mask_and_scale by default\n", "rds = rioxarray.open_rasterio(file_path, mask_and_scale=True)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nodata:\n", "- xarray.open_dataset: nan\n", "- rioxarray.open_rasterio: nan\n", "\n", "encoded_nodata:\n", "- xarray.open_dataset: 32767.0\n", "- rioxarray.open_rasterio: 32767.0\n" ] } ], "source": [ "print(\"nodata:\")\n", "print(f\"- xarray.open_dataset: {xds.air_temperature.rio.nodata}\")\n", "print(f\"- rioxarray.open_rasterio: {rds.air_temperature.rio.nodata}\")\n", "print(\"\\nencoded_nodata:\")\n", "print(f\"- xarray.open_dataset: {xds.air_temperature.rio.encoded_nodata}\")\n", "print(f\"- rioxarray.open_rasterio: {rds.air_temperature.rio.encoded_nodata}\")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "attributes:\n", "\n", "- xarray.open_dataset:\n", " {'units': 'K', 'description': 'Daily Maximum Temperature', 'long_name': 'tmmx', 'standard_name': 'tmmx', 'dimensions': 'lon lat time', 'grid_mapping': 'crs', 'coordinate_system': 'WGS84,EPSG:4326'}\n", "\n", "- rioxarray.open_rasterio:\n", " {'coordinates': 'day', 'coordinate_system': 'WGS84,EPSG:4326', 'description': 'Daily Maximum Temperature', 'dimensions': 'lon lat time', 'long_name': 'tmmx', 'standard_name': 'tmmx', 'units': 'K'}\n" ] } ], "source": [ "print(\"attributes:\")\n", "print(f\"\\n- xarray.open_dataset:\\n {xds.air_temperature.attrs}\")\n", "print(f\"\\n- rioxarray.open_rasterio:\\n {rds.air_temperature.attrs}\")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "encoding:\n", "\n", "- xarray.open_dataset:\n", " {'zlib': True, 'shuffle': True, 'complevel': 5, 'fletcher32': False, 'contiguous': False, 'chunksizes': (585, 1386), 'source': '/home/snowal/scripts/rioxarray/test/test_data/input/tmmx_20190121.nc', 'original_shape': (585, 1386), 'dtype': dtype('uint16'), '_Unsigned': 'true', 'missing_value': 32767, '_FillValue': 32767, 'scale_factor': 0.1, 'add_offset': 220.0, 'coordinates': 'day'}\n", "\n", "- rioxarray.open_rasterio:\n", " {'_Unsigned': 'true', 'dtype': 'uint16', 'grid_mapping': 'crs', 'scale_factor': 0.1, 'add_offset': 220.0, '_FillValue': 32767.0, 'missing_value': 32767, 'source': 'netcdf:../../test/test_data/input/tmmx_20190121.nc:air_temperature', 'rasterio_dtype': 'uint16'}\n" ] } ], "source": [ "print(\"encoding:\")\n", "print(f\"\\n- xarray.open_dataset:\\n {xds.air_temperature.encoding}\")\n", "print(f\"\\n- rioxarray.open_rasterio:\\n {rds.air_temperature.encoding}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Manually masking data\n", "\n", "If you use `xarray.where` to mask you data, then you need to ensure that the\n", "attributes stored on the DataArray reflect the correct values.\n", "[rio.write_nodata()](../rioxarray.rst#rioxarray.raster_array.RasterArray.write_nodata) can help ensure that the nodata attributes are written correctly." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nodata: 32767.0\n", "encoded_nodata: None\n" ] } ], "source": [ "xds = xarray.open_dataset(file_path, mask_and_scale=False) # performs mask_and_scale by default\n", "raster = xds.air_temperature \n", "raster = raster.where(raster != raster.rio.nodata)\n", "# nodata does not reflect the data has been masked\n", "print(f\"nodata: {raster.rio.nodata}\")\n", "print(f\"encoded_nodata: {raster.rio.encoded_nodata}\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nodata: nan\n", "encoded_nodata: 32767.0\n" ] } ], "source": [ "# update nodata value to show the data has been masked\n", "raster.rio.write_nodata(raster.rio.nodata, encoded=True, inplace=True)\n", "print(f\"nodata: {raster.rio.nodata}\")\n", "print(f\"encoded_nodata: {raster.rio.encoded_nodata}\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" } }, "nbformat": 4, "nbformat_minor": 4 }