Example - Mapping Grid Data to Vector Data
This is useful in the case where you want to get statistics for a specific raster over a certain region. In this example, the vector data is a random region with soil information using SSURGO data.
[1]:
import json
import geopandas
from geocube.api.core import make_geocube
%matplotlib inline
[2]:
ssurgo_data = geopandas.read_file("../../test/test_data/input/soil_data_group.geojson")
[3]:
ssurgo_data.head()
[3]:
cokey | mukey | drclassdcd | hzdept_r | chkey | hzdepb_r | claytotal_r | sandtotal_r | silttotal_r | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 12577452 | 271425 | Somewhat poorly drained | 0.0 | 100034090 | 5.0 | 23.067675 | 9.978338 | 66.953987 | MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ... |
1 | 12577452 | 271425 | Somewhat poorly drained | 5.0 | 100034090 | 15.0 | 23.067675 | 9.978338 | 66.953987 | MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ... |
2 | 12577452 | 271425 | Somewhat poorly drained | 15.0 | 100034091 | 30.0 | 23.067675 | 9.978338 | 66.953987 | MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ... |
3 | 12577452 | 271425 | Somewhat poorly drained | 30.0 | 100034092 | 45.0 | 23.067675 | 9.978338 | 66.953987 | MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ... |
4 | 12577452 | 271425 | Somewhat poorly drained | 45.0 | 100034093 | 60.0 | 23.231643 | 9.961941 | 66.806416 | MULTIPOLYGON (((-90.59735 41.49255, -90.59730 ... |
[4]:
# convert the key to group to the vector data to an integer as that is one of the
# best data types for this type of mapping. If your data is not integer,
# then consider using a mapping of your data to an integer with something
# like a categorical dtype.
ssurgo_data["mukey"] = ssurgo_data.mukey.astype(int)
Convert data to grid
See docs for make_geocube
[5]:
out_grid = make_geocube(
vector_data=ssurgo_data,
group_by='hzdept_r',
resolution=(-0.0001, 0.0001)
)
out_grid
[5]:
<xarray.Dataset>
Dimensions: (hzdept_r: 11, x: 165, y: 165)
Coordinates:
* y (y) float64 41.5 41.5 41.5 41.5 ... 41.48 41.48 41.48 41.48
* x (x) float64 -90.6 -90.6 -90.6 -90.6 ... -90.58 -90.58 -90.58
* hzdept_r (hzdept_r) float64 0.0 5.0 15.0 30.0 ... 90.0 105.0 120.0 150.0
spatial_ref int64 0
Data variables:
mukey (hzdept_r, y, x) float64 1.988e+05 1.988e+05 ... 1.987e+05
hzdepb_r (hzdept_r, y, x) float64 5.0 5.0 5.0 5.0 ... 180.0 180.0 180.0
claytotal_r (hzdept_r, y, x) float64 26.0 26.0 26.0 26.0 ... 21.0 21.0 21.0
sandtotal_r (hzdept_r, y, x) float64 38.0 38.0 38.0 38.0 ... 10.0 10.0 10.0
silttotal_r (hzdept_r, y, x) float64 36.0 36.0 36.0 36.0 ... 69.0 69.0 69.0
Attributes:
grid_mapping: spatial_ref
Get the mean/median of each region using the unique ID
[6]:
grid_mean = out_grid.sel(hzdept_r=15).groupby(out_grid.mukey.sel(hzdept_r=15)).mean()
grid_mean.to_dataframe()
[6]:
hzdept_r | spatial_ref | hzdepb_r | claytotal_r | sandtotal_r | silttotal_r | |
---|---|---|---|---|---|---|
mukey | ||||||
198692.0 | 15.0 | 0 | 30.0 | 23.000000 | 7.000000 | 70.000000 |
198714.0 | 15.0 | 0 | 30.0 | 5.000000 | 87.000000 | 8.000000 |
198724.0 | 15.0 | 0 | 30.0 | 21.000000 | 10.000000 | 69.000000 |
198750.0 | 15.0 | 0 | 30.0 | 12.000000 | 63.000000 | 25.000000 |
198754.0 | 15.0 | 0 | 30.0 | 26.000000 | 38.000000 | 36.000000 |
271425.0 | 15.0 | 0 | 30.0 | 23.067675 | 9.978338 | 66.953987 |
271431.0 | 15.0 | 0 | 30.0 | 14.000000 | 55.000000 | 31.000000 |
[7]:
grid_median = out_grid.sel(hzdept_r=75).groupby(out_grid.mukey.sel(hzdept_r=75)).median()
grid_median.to_dataframe()
[7]:
hzdept_r | spatial_ref | hzdepb_r | claytotal_r | sandtotal_r | silttotal_r | |
---|---|---|---|---|---|---|
mukey | ||||||
198692.0 | 75.0 | 0 | 90.0 | 23.000000 | 7.000000 | 70.000000 |
198714.0 | 75.0 | 0 | 90.0 | 7.800000 | 86.466667 | 5.733333 |
198724.0 | 75.0 | 0 | 90.0 | 21.000000 | 10.000000 | 69.000000 |
198750.0 | 75.0 | 0 | 90.0 | 12.000000 | 63.000000 | 25.000000 |
198754.0 | 75.0 | 0 | 90.0 | 26.000000 | 38.000000 | 36.000000 |
271425.0 | 75.0 | 0 | 90.0 | 24.564966 | 10.120497 | 65.314537 |
271431.0 | 75.0 | 0 | 90.0 | 8.333333 | 74.666667 | 17.000000 |