Add and use custom metadata in python_metadata

Here we show how to use the custom metadata reader class to add additional variables to the python metadata stored with the ISMN time series.

Data setup

Here we use one of the test data samples provided in this package (stored in the test_data folder). This archive contains 2 sensors at 2 stations in the COSMOS network and 2 sensors at the fraye station of the FR_Aqui network. The goal is to assign additional metadata variables to the sensors at the ‘fraye’ station. The data is taken from the VODCA archive (https://zenodo.org/record/2575599) and describes vegetation density on Jan 1st 2010. We store the values in a semicolon-separated csv file (vod.csv, in the same directory as this notebook) structured as shown here. Our example contains a line for only one station; normally we would add a line for as many ISMN stations as possible:

network;station;vod_k;vod_x
FR_Aqui;fraye;0.64922965;0.39021793
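
A file with this layout can also be generated from Python. The following is a minimal sketch using only the standard library. It writes to a temporary directory so that it does not overwrite anything; the variable names (rows, csv_path, content) are chosen for illustration only.

```python
import csv
import os
import tempfile

# Values from the example above; add one dict per ISMN station you have data for.
rows = [
    {"network": "FR_Aqui", "station": "fraye", "vod_k": 0.64922965, "vod_x": 0.39021793},
]

# Written to a temporary directory here; for the notebook the target would be 'vod.csv'.
csv_path = os.path.join(tempfile.mkdtemp(), "vod.csv")

with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["network", "station", "vod_k", "vod_x"],
        delimiter=";",  # the example file is semicolon-separated
    )
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to check its content.
with open(csv_path, newline="") as f:
    content = f.read()
print(content)
```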

Set metadata reader

Then we set up the metadata reader. Here we use one of the predefined readers, but you can (and usually have to) write your own. A custom reader must inherit from the abstract class ismn.custom.CustomMetaReader and implement a function read_metadata. This function uses the previously loaded metadata for a station to find the matching entries in the provided data, and returns either an ismn.meta.MetaData object or a dictionary of metadata variables and their values. Matching is normally done via the station latitude and longitude, sometimes also the sensor depth, or even the station name.
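
To illustrate the kind of matching logic such a read_metadata function might contain, here is a minimal, library-independent sketch that looks up extra variables by station coordinates. It is not part of the ismn package: the function name match_by_location, the table EXTRA_ROWS, the tolerance max_dist_deg and the choice to return an empty dictionary when nothing matches are all assumptions made for this example. In a real reader this logic would live inside a subclass of ismn.custom.CustomMetaReader, with the coordinates taken from the metadata passed to read_metadata.

```python
import math

# Hypothetical lookup table: one entry per location, holding the extra variables.
# The coordinates are those of the 'fraye' station used in this example.
EXTRA_ROWS = [
    {"latitude": 44.467, "longitude": -0.7269, "vod_k": 0.64922965, "vod_x": 0.39021793},
]


def match_by_location(lat, lon, rows, max_dist_deg=0.01):
    """Return the extra variables of the row closest to (lat, lon).

    Distance is measured naively in degrees, which is crude but sufficient
    for a sketch. An empty dict is returned if no row lies within
    max_dist_deg (an assumption of this example).
    """
    best, best_dist = None, math.inf
    for row in rows:
        dist = math.hypot(row["latitude"] - lat, row["longitude"] - lon)
        if dist < best_dist:
            best, best_dist = row, dist
    if best is None or best_dist > max_dist_deg:
        return {}
    # Drop the coordinates; only the new metadata variables are returned.
    return {k: v for k, v in best.items() if k not in ("latitude", "longitude")}


print(match_by_location(44.467, -0.7269, EXTRA_ROWS))
print(match_by_location(0.0, 0.0, EXTRA_ROWS))
```

Returning a plain dictionary keeps the sketch simple; the text above notes that a reader may return either a dictionary or an ismn.meta.MetaData object.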

[2]:
from ismn.interface import ISMN_Interface
import shutil
import tempfile
from ismn.custom import CustomStationMetadataCsv
[3]:
my_meta_reader = CustomStationMetadataCsv('vod.csv')

This custom metadata reader is now passed to the ISMN Interface (you can also pass more than one). Upon collecting metadata for all sensors, it will compare the station and network name with the ones provided in the csv file, and add the new metadata variable to the python_metadata when a matching case is found. If the python_metadata folder already exists, it must be deleted before the collection can happen.

[4]:
with tempfile.TemporaryDirectory() as meta_path:
    ds = ISMN_Interface('../../tests/test_data/Data_seperate_files_20170810_20180809', custom_meta_reader=(my_meta_reader,), meta_path=meta_path)
Processing metadata for all ismn stations into folder ../../tests/test_data/Data_seperate_files_20170810_20180809.
This may take a few minutes, but is only done once...
Hint: Use `parallel=True` to speed up metadata generation for large datasets
Files Processed: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  4.97it/s]
Metadata generation finished after 0 Seconds.
Metadata and Log stored in /tmp/tmp8omp4vpr
Found existing ismn metadata in /tmp/tmp8omp4vpr/Data_seperate_files_20170810_20180809.csv.
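
In the cell above the metadata is written to a fresh temporary directory, so nothing has to be deleted. If metadata from an earlier run is stored next to the data instead, the old folder can be removed first. The following is a minimal sketch using shutil; the helper name remove_python_metadata is hypothetical, and it assumes the folder is called python_metadata and sits directly inside the data root.

```python
import os
import shutil
import tempfile
from pathlib import Path


def remove_python_metadata(data_path):
    """Delete <data_path>/python_metadata if it exists.

    Returns True if a folder was removed and False otherwise. The folder
    name and its location inside the data root are assumptions of this sketch.
    """
    meta_dir = Path(data_path) / "python_metadata"
    if meta_dir.is_dir():
        shutil.rmtree(meta_dir)
        return True
    return False


# Demonstration on a throwaway directory instead of real ISMN data.
with tempfile.TemporaryDirectory() as data_root:
    os.makedirs(os.path.join(data_root, "python_metadata"))
    removed_first = remove_python_metadata(data_root)   # folder existed
    removed_second = remove_python_metadata(data_root)  # already gone
print(removed_first, removed_second)
```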

The newly added values are now found in the metadata for the ‘fraye’ station.

[5]:
ds['FR_Aqui']['fraye'].metadata[['vod_k', 'vod_x']]

[5]:
MetaData([
  MetaVar([vod_k, 0.64922965, None]),
  MetaVar([vod_x, 0.39021793, None])
])

Other stations have no matching entry in the csv file, so the new variables are empty for them (in this case pandas automatically assigns NaN to the variable).

[13]:
ds['COSMOS']['ARM-1'][0].metadata[['vod_k', 'vod_x']]
[13]:
MetaData([
  MetaVar([vod_k, nan, None]),
  MetaVar([vod_x, nan, None])
])

We can now use them as any other metadata variable, e.g. to find the station with a specific value.

[24]:
ids = ds.get_dataset_ids(variable='soil_moisture', filter_meta_dict={'vod_k': 0.64922965})
data, meta = ds.read(ids, return_meta=True)
meta
[24]:
                              2
variable        key
clay_fraction   val           4.0
                depth_from    0.0
                depth_to      0.3
climate_KG      val           Cfb
climate_insitu  val           unknown
elevation       val           52.42
instrument      val           ThetaProbe-ML2X
                depth_from    0.05
                depth_to      0.05
latitude        val           44.467
lc_2000         val           70
lc_2005         val           70
lc_2010         val           70
lc_insitu       val           unknown
longitude       val           -0.7269
network         val           FR_Aqui
organic_carbon  val           2.18
                depth_from    0.0
                depth_to      0.3
sand_fraction   val           87.0
                depth_from    0.0
                depth_to      0.3
saturation      val           0.49
                depth_from    0.0
                depth_to      0.3
silt_fraction   val           9.0
                depth_from    0.0
                depth_to      0.3
station         val           fraye
timerange_from  val           2013-08-13 10:00:00
timerange_to    val           2020-01-01 00:00:00
variable        val           soil_moisture
                depth_from    0.05
                depth_to      0.05
vod_k           val           0.64923
vod_x           val           0.390218