Example - Zonal Statistics

This is useful in the case where you want to get regional statistics for a raster.

[1]:
import geopandas
import numpy
import rioxarray
import xarray

from geocube.api.core import make_geocube

%matplotlib inline

Create the data mask by rasterizing the unique ID of the vector data

See docs for make_geocube

[2]:
# This assumes you are running this example from a clone of
# https://github.com/corteva/geocube/
# You could also use the full path:
# https://raw.githubusercontent.com/corteva/geocube/master/test/test_data/input/soil_data_group.geojson
ssurgo_data = geopandas.read_file("../../test/test_data/input/soil_data_group.geojson")
ssurgo_data = ssurgo_data.loc[ssurgo_data.hzdept_r==0]
# convert the key to group to the vector data to an integer as that is one of the
# best data types for this type of mapping. If your data is not integer,
# then consider using a mapping of your data to an integer with something
# like a categorical dtype.
ssurgo_data["mukey"] = ssurgo_data.mukey.astype(int)
[3]:
# load in source elevation data subset relevant for the vector data
elevation = rioxarray.open_rasterio(
    "https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/current/n42w091/USGS_13_n42w091.tif", masked=True
).rio.clip(
    ssurgo_data.geometry.values, ssurgo_data.crs, from_disk=True
).sel(band=1).drop("band")
elevation.name = "elevation"
[4]:
out_grid = make_geocube(
    vector_data=ssurgo_data,
    measurements=["mukey"],
    like=elevation, # ensure the data are on the same grid
)
[5]:
# merge the two together
out_grid["elevation"] = (elevation.dims, elevation.values, elevation.attrs, elevation.encoding)
out_grid
[5]:
<xarray.Dataset>
Dimensions:      (y: 178, x: 178)
Coordinates:
  * y            (y) float64 41.5 41.5 41.5 41.5 ... 41.48 41.48 41.48 41.48
  * x            (x) float64 -90.6 -90.6 -90.6 -90.6 ... -90.58 -90.58 -90.58
    spatial_ref  int64 0
Data variables:
    mukey        (y, x) float64 1.988e+05 1.988e+05 ... 1.987e+05 1.987e+05
    elevation    (y, x) float64 173.7 172.0 171.1 170.6 ... 181.0 181.2 181.4
[6]:
out_grid.mukey.plot.imshow()
[6]:
<matplotlib.image.AxesImage at 0x7f2dd5398550>
../_images/examples_zonal_statistics_7_1.png
[7]:
out_grid.elevation.plot()
[7]:
<matplotlib.collections.QuadMesh at 0x7f2dd5282ed0>
../_images/examples_zonal_statistics_8_1.png

Get the elevation statistics of each region using the mask

[8]:
grouped_elevation = out_grid.drop("spatial_ref").groupby(out_grid.mukey)
grid_mean = grouped_elevation.mean().rename({"elevation": "elevation_mean"})
grid_min = grouped_elevation.min().rename({"elevation": "elevation_min"})
grid_max = grouped_elevation.max().rename({"elevation": "elevation_max"})
grid_std = grouped_elevation.std().rename({"elevation": "elevation_std"})
[9]:
zonal_stats = xarray.merge([grid_mean, grid_min, grid_max, grid_std]).to_dataframe()
zonal_stats
[9]:
elevation_mean elevation_min elevation_max elevation_std
mukey
198692.0 173.130925 169.394562 188.442505 3.307044
198714.0 175.045866 170.214157 179.716675 2.150987
198724.0 179.931131 178.237244 181.490387 0.630699
198750.0 176.187118 167.951233 190.138763 3.815206
198754.0 171.633084 167.610321 181.611298 2.997591
271425.0 167.974433 167.951233 168.653015 0.079769
271431.0 176.718101 170.258133 180.460220 2.731732
[10]:
ssurgo_data = ssurgo_data.merge(zonal_stats, on="mukey")
[11]:
ssurgo_data.plot(column="elevation_mean", legend=True)
[11]:
<AxesSubplot:>
../_images/examples_zonal_statistics_13_1.png
[12]:
ssurgo_data.plot(column="elevation_max", legend=True)
[12]:
<AxesSubplot:>
../_images/examples_zonal_statistics_14_1.png