Example - Mapping Grid Data to Vector Data

This is useful in the case where you want to get statistics for a specific raster over a certain region. In this example, the vector data is a random region with soil information using SSURGO data.

[1]:

import json

import geopandas

from geocube.api.core import make_geocube

%matplotlib inline

[2]:

ssurgo_data = geopandas.read_file("../../test/test_data/input/soil_data_group.geojson")

[3]:

ssurgo_data.head()

[3]:

	cokey	mukey	drclassdcd	hzdept_r	chkey	hzdepb_r	claytotal_r	sandtotal_r	silttotal_r	geometry
0	12577452	271425	Somewhat poorly drained	0.0	100034090	5.0	23.067675	9.978338	66.953987	MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ...
1	12577452	271425	Somewhat poorly drained	5.0	100034090	15.0	23.067675	9.978338	66.953987	MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ...
2	12577452	271425	Somewhat poorly drained	15.0	100034091	30.0	23.067675	9.978338	66.953987	MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ...
3	12577452	271425	Somewhat poorly drained	30.0	100034092	45.0	23.067675	9.978338	66.953987	MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ...
4	12577452	271425	Somewhat poorly drained	45.0	100034093	60.0	23.231643	9.961941	66.806416	MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ...

[4]:

# convert the key to group to the vector data to an integer as that is one of the
# best data types for this type of mapping. If your data is not integer,
# then consider using a mapping of your data to an integer with something
# like a categorical dtype.
ssurgo_data["mukey"] = ssurgo_data.mukey.astype(int)

Convert data to grid

See docs for make_geocube

[5]:

out_grid = make_geocube(
    vector_data=ssurgo_data,
    group_by='hzdept_r',
    resolution=(-0.0001, 0.0001)
)
out_grid

[5]:

<xarray.Dataset>
Dimensions:      (hzdept_r: 11, x: 165, y: 165)
Coordinates:
  * y            (y) float64 41.5 41.5 41.5 41.5 ... 41.48 41.48 41.48 41.48
  * x            (x) float64 -90.6 -90.6 -90.6 -90.6 ... -90.58 -90.58 -90.58
  * hzdept_r     (hzdept_r) float64 0.0 5.0 15.0 30.0 ... 90.0 105.0 120.0 150.0
    spatial_ref  int64 0
Data variables:
    mukey        (hzdept_r, y, x) float64 1.988e+05 1.988e+05 ... 1.987e+05
    hzdepb_r     (hzdept_r, y, x) float64 5.0 5.0 5.0 5.0 ... 180.0 180.0 180.0
    claytotal_r  (hzdept_r, y, x) float64 26.0 26.0 26.0 26.0 ... 21.0 21.0 21.0
    sandtotal_r  (hzdept_r, y, x) float64 38.0 38.0 38.0 38.0 ... 10.0 10.0 10.0
    silttotal_r  (hzdept_r, y, x) float64 36.0 36.0 36.0 36.0 ... 69.0 69.0 69.0
Attributes:
    grid_mapping:  spatial_ref

Get the mean/median of each region using the unique ID

[6]:

grid_mean = out_grid.sel(hzdept_r=15).groupby(out_grid.mukey.sel(hzdept_r=15)).mean()
grid_mean.to_dataframe()

[6]:

	hzdept_r	spatial_ref	hzdepb_r	claytotal_r	sandtotal_r	silttotal_r
mukey
198692.0	15.0	0	30.0	23.000000	7.000000	70.000000
198714.0	15.0	0	30.0	5.000000	87.000000	8.000000
198724.0	15.0	0	30.0	21.000000	10.000000	69.000000
198750.0	15.0	0	30.0	12.000000	63.000000	25.000000
198754.0	15.0	0	30.0	26.000000	38.000000	36.000000
271425.0	15.0	0	30.0	23.067675	9.978338	66.953987
271431.0	15.0	0	30.0	14.000000	55.000000	31.000000

[7]:

grid_median = out_grid.sel(hzdept_r=75).groupby(out_grid.mukey.sel(hzdept_r=75)).median()
grid_median.to_dataframe()

[7]:

	hzdept_r	spatial_ref	hzdepb_r	claytotal_r	sandtotal_r	silttotal_r
mukey
198692.0	75.0	0	90.0	23.000000	7.000000	70.000000
198714.0	75.0	0	90.0	7.800000	86.466667	5.733333
198724.0	75.0	0	90.0	21.000000	10.000000	69.000000
198750.0	75.0	0	90.0	12.000000	63.000000	25.000000
198754.0	75.0	0	90.0	26.000000	38.000000	36.000000
271425.0	75.0	0	90.0	24.564966	10.120497	65.314537
271431.0	75.0	0	90.0	8.333333	74.666667	17.000000