Example - Mapping Grid Data to Vector Data

This is useful in the case where you want to get statistics for a specific raster over a certain region. In this example, the vector data is a random region with soil information using SSURGO data.

[1]:
import json

import geopandas

from geocube.api.core import make_geocube

%matplotlib inline
[2]:
ssurgo_data = geopandas.read_file("../../test/test_data/input/soil_data_group.geojson")
[3]:
ssurgo_data.head()
[3]:
cokey mukey drclassdcd hzdept_r chkey hzdepb_r claytotal_r sandtotal_r silttotal_r geometry
0 12577452 271425 Somewhat poorly drained 0.0 100034090 5.0 23.067675 9.978338 66.953987 MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ...
1 12577452 271425 Somewhat poorly drained 5.0 100034090 15.0 23.067675 9.978338 66.953987 MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ...
2 12577452 271425 Somewhat poorly drained 15.0 100034091 30.0 23.067675 9.978338 66.953987 MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ...
3 12577452 271425 Somewhat poorly drained 30.0 100034092 45.0 23.067675 9.978338 66.953987 MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ...
4 12577452 271425 Somewhat poorly drained 45.0 100034093 60.0 23.231643 9.961941 66.806416 MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ...
[4]:
# convert the key to group to the vector data to an integer as that is one of the
# best data types for this type of mapping. If your data is not integer,
# then consider using a mapping of your data to an integer with something
# like a categorical dtype.
ssurgo_data["mukey"] = ssurgo_data.mukey.astype(int)

Convert data to grid

See docs for make_geocube

[5]:
out_grid = make_geocube(
    vector_data=ssurgo_data,
    group_by='hzdept_r',
    resolution=(-0.0001, 0.0001)
)
out_grid
[5]:
<xarray.Dataset>
Dimensions:      (hzdept_r: 11, x: 165, y: 165)
Coordinates:
  * y            (y) float64 41.5 41.5 41.5 41.5 ... 41.48 41.48 41.48 41.48
  * x            (x) float64 -90.6 -90.6 -90.6 -90.6 ... -90.58 -90.58 -90.58
  * hzdept_r     (hzdept_r) float64 0.0 5.0 15.0 30.0 ... 90.0 105.0 120.0 150.0
    spatial_ref  int64 0
Data variables:
    mukey        (hzdept_r, y, x) float64 1.988e+05 1.988e+05 ... 1.987e+05
    hzdepb_r     (hzdept_r, y, x) float64 5.0 5.0 5.0 5.0 ... 180.0 180.0 180.0
    claytotal_r  (hzdept_r, y, x) float64 26.0 26.0 26.0 26.0 ... 21.0 21.0 21.0
    sandtotal_r  (hzdept_r, y, x) float64 38.0 38.0 38.0 38.0 ... 10.0 10.0 10.0
    silttotal_r  (hzdept_r, y, x) float64 36.0 36.0 36.0 36.0 ... 69.0 69.0 69.0
Attributes:
    grid_mapping:  spatial_ref

Get the mean/median of each region using the unique ID

[6]:
grid_mean = out_grid.sel(hzdept_r=15).groupby(out_grid.mukey.sel(hzdept_r=15)).mean()
grid_mean.to_dataframe()
[6]:
hzdept_r spatial_ref hzdepb_r claytotal_r sandtotal_r silttotal_r
mukey
198692.0 15.0 0 30.0 23.000000 7.000000 70.000000
198714.0 15.0 0 30.0 5.000000 87.000000 8.000000
198724.0 15.0 0 30.0 21.000000 10.000000 69.000000
198750.0 15.0 0 30.0 12.000000 63.000000 25.000000
198754.0 15.0 0 30.0 26.000000 38.000000 36.000000
271425.0 15.0 0 30.0 23.067675 9.978338 66.953987
271431.0 15.0 0 30.0 14.000000 55.000000 31.000000
[7]:
grid_median = out_grid.sel(hzdept_r=75).groupby(out_grid.mukey.sel(hzdept_r=75)).median()
grid_median.to_dataframe()
[7]:
hzdept_r spatial_ref hzdepb_r claytotal_r sandtotal_r silttotal_r
mukey
198692.0 75.0 0 90.0 23.000000 7.000000 70.000000
198714.0 75.0 0 90.0 7.800000 86.466667 5.733333
198724.0 75.0 0 90.0 21.000000 10.000000 69.000000
198750.0 75.0 0 90.0 12.000000 63.000000 25.000000
198754.0 75.0 0 90.0 26.000000 38.000000 36.000000
271425.0 75.0 0 90.0 24.564966 10.120497 65.314537
271431.0 75.0 0 90.0 8.333333 74.666667 17.000000