Vocabulary Mapping
| Author | Peter Konopatzky |
|---|---|
| Technical Contacts | Peter Konopatzky, Andreas Walter |
| Version | 0.1 |
This document formerly lived on AWI Confluence and is only relevant for data products based on the (deprecated) GeoCSV exchange format (.sdi.tab + .sdi.meta.json).
In the context of MareHUB and the Viewer on Marine Data, data from different sources is processed and provided as OGC Web Services. Mapping incoming data to unified names (the target vocabulary) is part of this process. This page specifies the technical file formats used for that mapping. It does not describe the process itself; where process information is included in special cases, it is marked as such. For general information on processes, see the page on Standard Operating Procedures.
The most important related reading is probably the page about Data Harmonization and the Mapping Principle.
Overview
There are two types of files: those that introduce the target vocabulary that data can be mapped to, and those that add mapping rules. Mapping rules do not work without the corresponding target vocabulary. Both are tab-separated files that differ only in their column names.
Base specs as follows:
- tab-separated text file
- UTF-8 encoding
- file extension: .sdi.mapping.tab
- order of columns matters
- columns without values (e.g. sphere_name) can be dropped
- custom columns can be appended but will be ignored (might be useful for future verification, for example)
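As a rough illustration of these base specs, the following Python sketch parses tab-separated text and keeps only the columns a file type defines. The function name `read_sdi_table`, the `note` column and the sample content are assumptions made for this example; they are not part of the actual processing code. The sketch addresses columns by header name, which is only one possible reading of the rules above.

```python
import csv
import io

def read_sdi_table(text, known_columns):
    """Parse tab-separated text and keep only the known columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        # Dropped or empty columns become "", custom columns are ignored.
        rows.append({name: (record.get(name) or "").strip() for name in known_columns})
    return rows

# Hypothetical parameter mapping content: sphere_name is dropped,
# and a custom "note" column is appended.
sample = (
    "parameter_string\tparameter_group\tnote\n"
    "AirTemperature\tTemperature\tcustom column\n"
)
rows = read_sdi_table(sample, ["parameter_string", "parameter_group", "sphere_name"])
print(rows)
```

For a real file, the text would be read with UTF-8 encoding, for example via `open(path, encoding="utf-8", newline="").read()`.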
Target Vocabulary Files
Units
| column order | column header | value is mandatory | description |
|---|---|---|---|
| 1 | unit_name | yes | name of unit |
| 2 | unit_symbol | yes | symbol of unit |
| 3+ | whatever | no | can be used for comments, notes or reminders – will get technically ignored |
Example
| unit_name | unit_symbol |
|---|---|
| meter | m |
| square meter | m² |
| degree | ° |
| degree Celsius | °C |
| meter per second | m/s |
| centimeter per second | cm/s |
Parameters
The column header for the parameter vocabulary is parameter_group because it is currently used for rough grouping rather than precise mapping.
| column order | column header | value is mandatory | description |
|---|---|---|---|
| 1 | parameter_group | yes | name of parameter or parameter group |
| 2 | parameter_sdn | no | SDN of parameter |
| 3 | parameter_nerc_uri | no | NERC URI of parameter |
| 4+ | whatever | no | can be used for comments, notes or reminders – will get technically ignored |
Example
| parameter_group |
|---|
| Chlorophyll |
| Salinity |
| Sample ID |
| Temperature |
Methods
The column header for the method vocabulary is method_group because it is currently used for rough grouping rather than precise mapping.
| column order | column header | value is mandatory | description |
|---|---|---|---|
| 1 | method_group | yes | name of method or method group |
| 2+ | whatever | no | can be used for comments, notes or reminders – will get technically ignored |
Example
| method_group |
|---|
| ungrouped |
| unspecified |
| direct |
| indirect |
| count |
| electric |
Spheres
Process information: Status Quo is NERC spheres. Please consult AG Seafloor/Ocean Obs (Norbert Anselm) and AG Portal/Viewer (Peter Konopatzky) before introducing other sphere vocabulary.
| column order | column header | value is mandatory | description |
|---|---|---|---|
| 1 | sphere_name | yes | name of sphere |
| 2 | sphere_sdn | no | SDN of sphere |
| 3 | sphere_nerc_uri | no | NERC URI of sphere |
| 4+ | whatever | no | can be used for comments, notes or reminders – will get technically ignored |
Status Quo
| sphere_name | sphere_sdn | sphere_nerc_uri |
|---|---|---|
| atmosphere | SDN:S21::S21S001 | http://vocab.nerc.ac.uk/collection/S21/current/S21S001/1/ |
| water body | SDN:S21::S21S027 | http://vocab.nerc.ac.uk/collection/S21/current/S21S027/1/ |
| surface ice | SDN:S21::S21S009 | http://vocab.nerc.ac.uk/collection/S21/current/S21S009/1/ |
| rock | SDN:S21::S21S038 | http://vocab.nerc.ac.uk/collection/S21/current/S21S038/1/ |
| biota | SDN:S21::S21S037 | http://vocab.nerc.ac.uk/collection/S21/current/S21S037/2/ |
| not applicable | SDN:S21::S21S017 | http://vocab.nerc.ac.uk/collection/S21/current/S21S017/1/ |
| Earth | SDN:S21::S21S006 | http://vocab.nerc.ac.uk/collection/S21/current/S21S006/1/ |
| bed | SDN:S21::S21S003 | http://vocab.nerc.ac.uk/collection/S21/current/S21S003/1/ |
| cave atmosphere | SDN:S21::S21S033 | http://vocab.nerc.ac.uk/collection/S21/current/S21S033/1/ |
| experiment water sample | SDN:S21::S21S011 | http://vocab.nerc.ac.uk/collection/S21/current/S21S011/2/ |
| geological sample | SDN:S21::S21S039 | http://vocab.nerc.ac.uk/collection/S21/current/S21S039/1/ |
| groundwater | SDN:S21::S21S005 | http://vocab.nerc.ac.uk/collection/S21/current/S21S005/1/ |
| peat | SDN:S21::S21S019 | http://vocab.nerc.ac.uk/collection/S21/current/S21S019/1/ |
| rainwater | SDN:S21::S21S020 | http://vocab.nerc.ac.uk/collection/S21/current/S21S020/1/ |
| sediment | SDN:S21::S21S022 | http://vocab.nerc.ac.uk/collection/S21/current/S21S022/2/ |
| sediment pore water | SDN:S21::S21S023 | http://vocab.nerc.ac.uk/collection/S21/current/S21S023/1/ |
| snow | SDN:S21::S21S024 | http://vocab.nerc.ac.uk/collection/S21/current/S21S024/1/ |
| stalactite | SDN:S21::S21S034 | http://vocab.nerc.ac.uk/collection/S21/current/S21S034/2/ |
| stalagmite | SDN:S21::S21S025 | http://vocab.nerc.ac.uk/collection/S21/current/S21S025/1/ |
| suspended particulate material | SDN:S21::S21S026 | http://vocab.nerc.ac.uk/collection/S21/current/S21S026/1/ |
| water body plus atmosphere | SDN:S21::S21S028 | http://vocab.nerc.ac.uk/collection/S21/current/S21S028/1/ |
| wet sediment | SDN:S21::S21S031 | http://vocab.nerc.ac.uk/collection/S21/current/S21S031/1/ |
Mapping Rules Files
Output columns must contain values from the target vocabulary established via target vocabulary files.
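This dependency between mapping rules and target vocabulary can be checked mechanically. The sketch below is one possible check, not the check used in the actual pipeline. The function name `find_unknown_outputs` and the `kn`/`knot` rule are hypothetical and exist only to show a failing case.

```python
def find_unknown_outputs(mapping_rows, output_column, vocabulary):
    """Return mapping outputs that are missing from the target vocabulary."""
    known = set(vocabulary)
    return sorted({
        row[output_column]
        for row in mapping_rows
        if row.get(output_column) and row[output_column] not in known
    })

# Target vocabulary: unit names taken from the unit vocabulary example.
unit_vocabulary = ["meter", "degree Celsius", "centimeter per second"]

unit_mapping = [
    {"unit_string": "degC", "unit_name": "degree Celsius"},
    {"unit_string": "cm/s", "unit_name": "centimeter per second"},
    {"unit_string": "kn", "unit_name": "knot"},  # hypothetical rule without vocabulary entry
]

unknown = find_unknown_outputs(unit_mapping, "unit_name", unit_vocabulary)
print(unknown)  # ['knot']
```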
Unit Mapping
| column order | column header | value is mandatory | description |
|---|---|---|---|
| 1 | unit_string | yes | mapping input: any string that should get mapped |
| 2 | unit_name | yes | mapping output: unit name (see Unit Vocabulary) |
| 3+ | whatever | no | can be used for comments, notes or reminders – will get technically ignored |
Example
| unit_string | unit_name | comment |
|---|---|---|
| °C | degree Celsius | |
| ◦C | degree Celsius | weird alternative degree character, found in GLODAP |
| ?C | degree Celsius | broken encoding, found in COSYNA SOS |
| degC | degree Celsius | |
| cm/s | centimeter per second | |
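To show how a unit rule and the unit vocabulary work together, here is a minimal sketch that resolves a source string to its unit name and symbol. It assumes exact string matching and in-memory dictionaries. The function name `resolve_unit` is hypothetical, and the real processing may behave differently.

```python
# unit_name -> unit_symbol (from the unit vocabulary example)
unit_vocabulary = {
    "degree Celsius": "°C",
    "centimeter per second": "cm/s",
}

# unit_string -> unit_name (from the unit mapping example)
unit_mapping = {
    "°C": "degree Celsius",
    "◦C": "degree Celsius",
    "?C": "degree Celsius",
    "degC": "degree Celsius",
    "cm/s": "centimeter per second",
}

def resolve_unit(unit_string):
    """Return (unit_name, unit_symbol), or None if no rule matches exactly."""
    unit_name = unit_mapping.get(unit_string)
    if unit_name is None:
        return None
    return unit_name, unit_vocabulary[unit_name]

print(resolve_unit("degC"))  # ('degree Celsius', '°C')
print(resolve_unit("K"))     # None, because no rule exists for this string
```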
Parameter Mapping
| column order | column header | value is mandatory | description |
|---|---|---|---|
| 1 | parameter_string | yes | mapping input: any string that should get mapped |
| 2 | parameter_group | yes | mapping output: any known parameter name/group (see Parameter Vocabulary) |
| 3 | sphere_name | no | mapping output: any known sphere name (see Sphere Vocabulary) |
| 4+ | whatever | no | can be used for comments, notes or reminders – will get technically ignored |
Example
| parameter_string | parameter_group | sphere_name |
|---|---|---|
| AirTemperature | Temperature | atmosphere |
| SeaSurfaceTemperature | Temperature | |
| TEMP_13.0 | Temperature | |
| Temperature | Temperature | |
Method Mapping
| column order | column header | value is mandatory | description |
|---|---|---|---|
| 1 | parameter_group | yes | mapping input: any known parameter name (see Parameter Vocabulary) |
| 2 | method_string | yes | mapping input: any string that should get mapped |
| 3 | method_group | yes | mapping output: any known method name/group (see Method Vocabulary) |
| 4 | sphere_name | no | mapping output: any known sphere name (see Sphere Vocabulary) |
| 5+ | whatever | no | can be used for comments, notes or reminders – will get technically ignored |
Example
| parameter_group | method_string | method_group |
|---|---|---|
| Chlorophyll | High Performance Liquid Chromatography | direct |
| Chlorophyll | Fluorometry | indirect |
| Chlorophyll | Acetone extraction (Turner Designs) | indirect |
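Unlike unit and parameter rules, a method rule has two input columns. One way to model this is a lookup keyed on the pair of parameter group and method string, as in the sketch below. The function name `map_method` is hypothetical, and returning `None` for unmatched input is an assumption rather than documented behaviour.

```python
# (parameter_group, method_string) -> method_group, from the example above
method_rules = {
    ("Chlorophyll", "High Performance Liquid Chromatography"): "direct",
    ("Chlorophyll", "Fluorometry"): "indirect",
    ("Chlorophyll", "Acetone extraction (Turner Designs)"): "indirect",
}

def map_method(parameter_group, method_string, default=None):
    """Look up the method group for a method string within a parameter group."""
    return method_rules.get((parameter_group, method_string), default)

print(map_method("Chlorophyll", "Fluorometry"))  # indirect
print(map_method("Salinity", "Fluorometry"))     # None, the rule only covers Chlorophyll
```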
Combined Example
Imagine you have the following data that you want integrated into our SDI. It already comes in the handy O2A GeoCSV format (note from 2026: deprecated), including metadata files. There are two data files (and two metadata files) with comparable data using different vocabulary, and almost none of it is the vocabulary you want.
```json
{
  "version": "2.0",
  "events": [
    {
      "name": "Kono's Trip"
    }
  ],
  "parameters": [
    {
      "name": "Caffeine Level",
      "unit": "clicks/minute"
    },
    {
      "name": "Blutalkoholkonzentration",
      "unit": "Promille",
      "method": "ACE Breathalyser AF - 33"
    }
  ]
}
```
```tsv
date_time_start	event_name	Caffeine Level [clicks/minute]	Blutalkoholkonzentration [Promille]	geometry
1982-12-29T11:02:00	Kono's Trip	200	0.20	POINT(-4.3 49.6)
1982-12-29T11:45:00	Kono's Trip	121	1.10	POINT(-4.3 49.6)
1982-12-29T13:21:00	Kono's Trip	84	0.40	POINT(-4.3 49.6)
```
```json
{
  "version": "2.0",
  "events": [
    {
      "name": "Andreas' Adventure"
    }
  ],
  "parameters": [
    {
      "name": "caffeine level",
      "unit": "clicks/min"
    },
    {
      "name": "alcohol concentration",
      "unit": "‰",
      "method": "YOMA Alcohol Tester"
    }
  ]
}
```
```tsv
date_time_start	event_name	caffeine level [clicks/min]	alcohol concentration [‰]	geometry
1982-12-30T11:02:00	Andreas' Adventure	156	0.40	POINT(-1.3 50.6)
1982-12-30T11:45:00	Andreas' Adventure	144	1.00	POINT(-1.3 50.6)
1982-12-30T13:21:00	Andreas' Adventure	112	0.50	POINT(-1.3 50.6)
```
The following mapping files would be a good solution to properly add the above data to our SDI and have it integrated into VEF-based viewers. The most important part is the parameter mapping: without it, data cannot be integrated into our parameter measurement layers. Unit and method mapping are recommended for proper filtering but can be left out. In any case, both the source strings/names and the mapping results will be shown and accessible in viewers.
```tsv
unit_name	unit_symbol
permille	‰
clicks per minute	cpm
```
```tsv
parameter_group
caffeine level
blood alcohol content
```
```tsv
method_group
breathalyzer
```
```tsv
unit_string	unit_name
clicks/minute	clicks per minute
clicks/min	clicks per minute
Promille	permille
‰	permille
```
```tsv
parameter_string	parameter_group
Caffeine Level	caffeine level
caffeine level	caffeine level
Blutalkoholkonzentration	blood alcohol content
alcohol concentration	blood alcohol content
```
```tsv
parameter_group	method_string	method_group
blood alcohol content	ACE Breathalyser AF - 33	breathalyzer
blood alcohol content	YOMA Alcohol Tester	breathalyzer
```
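The following sketch walks the parameters of the second metadata file through the three kinds of mapping rules from this example. It is only an illustration of the principle, assuming exact string matching. The function name `harmonize` and the result layout are hypothetical and do not reflect the actual SDI implementation. Source strings are kept next to the mapping results, in line with the note that both remain visible in viewers.

```python
import json

# Excerpt of the metadata for "Andreas' Adventure"
metadata = json.loads("""
{
  "parameters": [
    {"name": "caffeine level", "unit": "clicks/min"},
    {"name": "alcohol concentration", "unit": "‰", "method": "YOMA Alcohol Tester"}
  ]
}
""")

parameter_rules = {
    "Caffeine Level": "caffeine level",
    "caffeine level": "caffeine level",
    "Blutalkoholkonzentration": "blood alcohol content",
    "alcohol concentration": "blood alcohol content",
}
unit_rules = {
    "clicks/minute": "clicks per minute",
    "clicks/min": "clicks per minute",
    "Promille": "permille",
    "‰": "permille",
}
method_rules = {
    ("blood alcohol content", "ACE Breathalyser AF - 33"): "breathalyzer",
    ("blood alcohol content", "YOMA Alcohol Tester"): "breathalyzer",
}

def harmonize(parameter):
    """Map one metadata parameter entry; unmapped fields stay None."""
    group = parameter_rules.get(parameter["name"])
    return {
        "parameter_string": parameter["name"],
        "parameter_group": group,
        "unit_string": parameter.get("unit"),
        "unit_name": unit_rules.get(parameter.get("unit")),
        "method_string": parameter.get("method"),
        "method_group": method_rules.get((group, parameter.get("method"))),
    }

harmonized = [harmonize(p) for p in metadata["parameters"]]
for entry in harmonized:
    print(entry["parameter_string"], "->", entry["parameter_group"])
```

The caffeine parameter has no method in its metadata, so its `method_group` stays `None`, matching the statement that method mapping is optional.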