O2A Documentation

streams and dataset IDs – the hidden champions

The O2A STREAMS provide near real-time (NRT) data, either for monitoring purposes in O2A DASHBOARDS or for use in downstream applications such as follow-polarstern or the sea ice portal.

With the release of streams, users are able to post "their" data to the NRT database themselves. The only constraints are:

get metadata

python
import requests
import json
import pandas as pd
from io import StringIO

api_url = "https://ingest.o2a-data.de/rest/"
urn1 = "vessel:meteor:tsg_meteor:tsg_stb_meteor:salinity"
time1 = "2025-08-02T00:00:00"
time2 = "2025-08-02T23:59:59"

resp = requests.get(api_url +
                    "datasets?where=streams.code=IN=(" +
                    urn1 +
                    ");datetimeMax<='" +
                    time2 +
                    "';datetimeMin>='" +
                    time1 +
                    "'"
                    )

json.loads(resp.content)

The resulting json output is:

json
{
  "offset": 0,
  "hits": 1,
  "totalHits": 1,
  "records": [
    {
      "id": 4839394,
      "name": "",
      "datetime": "2025-08-04T02:35:51.587745",
      "datetimeMin": "2025-08-02T00:00:00",
      "datetimeMax": "2025-08-02T23:59:58",
      "values": 699686,
      "username": ""
    }
  ],
  "duration": 41
}

In this case only one record matches the request.

  • id is the dataset ID, a unique identifier
  • datetime is the time when the dataset entered the database
  • datetimeMin is the earliest timestamp of the data itself
  • datetimeMax is the latest timestamp of the data itself
  • values is the count of all values in the dataset, including the datetime elements
  • username identifies who was responsible for the ingestion; if it is empty ('') or i.ngest@awi.de, the dataset was ingested centrally
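As a small illustration of these fields, the ingestion latency of a record can be derived from datetime (time of ingestion) and datetimeMax (last measurement). A sketch using the example record above:

```python
from datetime import datetime

# example record as returned by the datasets endpoint above
record = {
    "id": 4839394,
    "datetime": "2025-08-04T02:35:51.587745",
    "datetimeMin": "2025-08-02T00:00:00",
    "datetimeMax": "2025-08-02T23:59:58",
    "values": 699686,
    "username": "",
}

# time between the last measurement and the ingestion into the database
ingested = datetime.fromisoformat(record["datetime"])
last_obs = datetime.fromisoformat(record["datetimeMax"])
latency = ingested - last_obs

print(latency)  # 1 day, 2:35:53.587745 for this record
```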

Be aware that rather unspecific requests might result in huge server-side responses. It is therefore strongly recommended to apply filtering on the server side via RSQL (some hints can be found here). Larger responses need to be paginated.
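A pagination loop could be sketched as below. Note that the offset and limit query parameters are assumptions here, inferred from the offset field in the response and common REST conventions; verify the exact parameter names against the API reference before relying on them:

```python
api_url = "https://ingest.o2a-data.de/rest/"

def paged_urls(query, limit, total_hits):
    """Build one request URL per page of results.

    offset/limit are assumed query parameters (the response carries an
    'offset' field, but the exact names are not confirmed by this page).
    """
    return [
        f"{api_url}datasets?where={query}&offset={offset}&limit={limit}"
        for offset in range(0, total_hits, limit)
    ]

# e.g. 250 total hits fetched in pages of 100 -> offsets 0, 100, 200
urls = paged_urls("streams.code=IN=(vessel:meteor:headt)", 100, 250)
```

In practice one would first issue a single request, read totalHits from the response, and then fetch the remaining pages.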

If you know a specific dataset ID its context info can be retrieved like this:

python
resp = requests.get(api_url +
                    "datasets/" +
                    str(4839394)
                    )

json.loads(resp.content)

The output looks familiar:

json
{
  "id": 4839394,
  "name": "",
  "datetime": "2025-08-04T02:35:51.587745",
  "datetimeMin": "2025-08-02T00:00:00",
  "datetimeMax": "2025-08-02T23:59:58",
  "values": 699686,
  "username": ""
}

There is more to discover -- the streams themselves. Each stream represents one parameter URN and the corresponding data. We read the streams for dataset ID 4839394 and print the first three items of the output (a list).

python
resp = requests.get(api_url +
                    "datasets/" +
                    str(4839394) +
                    "/streams"
                    )

json.loads(resp.content)[0:3]

json
[
  {
    "itemId": 6072,
    "itemUuid": "0cc456ca-517b-475e-ad41-f0a42b3ac36c",
    "itemUrl": "https://registry.o2a-data.de/items/6072",
    "code": "vessel:meteor:course",
    "id": 4291,
    "unit": ""
  },
  {
    "itemId": 6072,
    "itemUuid": "0cc456ca-517b-475e-ad41-f0a42b3ac36c",
    "itemUrl": "https://registry.o2a-data.de/items/6072",
    "code": "vessel:meteor:headt",
    "id": 7792,
    "unit": "deg"
  },
  {
    "itemId": 6072,
    "itemUuid": "0cc456ca-517b-475e-ad41-f0a42b3ac36c",
    "itemUrl": "https://registry.o2a-data.de/items/6072",
    "code": "vessel:meteor:poslat",
    "id": 4287,
    "unit": ""
  }
]

  • itemId integer, the numeric id of the item as in REGISTRY
  • itemUuid string, the UUID of the item as in REGISTRY
  • itemUrl the link leading to the item in REGISTRY
  • code string, the parameter code as in REGISTRY
  • id integer, the numeric id of the data stream
  • unit string, the unit in which the data is measured

As can be seen, course and poslat are not properly defined in REGISTRY, since their units are empty (''). Nonetheless the respective data streams contain numeric content.
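Such incompletely defined entries can also be spotted programmatically by filtering the stream list for empty units. A minimal sketch using (abridged) items from the output above:

```python
# abridged stream entries as returned by the /streams endpoint above
streams = [
    {"code": "vessel:meteor:course", "id": 4291, "unit": ""},
    {"code": "vessel:meteor:headt", "id": 7792, "unit": "deg"},
    {"code": "vessel:meteor:poslat", "id": 4287, "unit": ""},
]

# collect parameter codes whose unit is not defined in REGISTRY
missing_units = [s["code"] for s in streams if not s["unit"]]

print(missing_units)  # ['vessel:meteor:course', 'vessel:meteor:poslat']
```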

get data

With a little extension of the call the data itself can be downloaded as well. In this case the content is retrieved as JSON, which provides some extra structure: asking for the keys shows what is available.

python
resp = requests.get(api_url +
                    "datasets/" +
                    str(4839394) +
                    "/data" +
                    "?format=application/json",
                    headers = {"accept": "application/json"},
                    )

a=json.loads(resp.content)

a.keys()
dict_keys(['datetimeMin', 'datetimeMax', 'withQualityFlags', 'sensors', 'data'])

a['sensors']
['vessel:meteor:course', 'vessel:meteor:headt', 'vessel:meteor:poslat', 'vessel:meteor:poslon', 'vessel:meteor:sound_velocity', 'vessel:meteor:speed_over_ground', 'vessel:meteor:tsg_meteor:tsg_bb_meteor:conductivity', 'vessel:meteor:tsg_meteor:tsg_bb_meteor:sound_velocity_external', 'vessel:meteor:tsg_meteor:tsg_bb_meteor:density', 'vessel:meteor:tsg_meteor:tsg_bb_meteor:salinity', 'vessel:meteor:tsg_meteor:tsg_bb_meteor:sound_velocity_internal', 'vessel:meteor:tsg_meteor:tsg_bb_meteor:water_temperature_sbe45', 'vessel:meteor:tsg_meteor:tsg_bb_meteor:water_temperature_sbe38', 'vessel:meteor:tsg_meteor:tsg_stb_meteor:conductivity', 'vessel:meteor:tsg_meteor:tsg_stb_meteor:sound_velocity_external', 'vessel:meteor:tsg_meteor:tsg_stb_meteor:density', 'vessel:meteor:tsg_meteor:tsg_stb_meteor:salinity', 'vessel:meteor:tsg_meteor:tsg_stb_meteor:sound_velocity_internal', 'vessel:meteor:tsg_meteor:tsg_stb_meteor:water_temperature_sbe45', 'vessel:meteor:tsg_meteor:tsg_stb_meteor:water_temperature_sbe38']

a['data'][0:2]
[['2025-08-02T00:00:00.000', 309.1, 309.2, 43.927850416666665, -35.98171421666667, 1525.6, 10.5, None, None, None, None, None, None, None, None, None, None, None, None, None, None], ['2025-08-02T00:00:01.000', 308.5, 309.1, 43.92788051666667, -35.981766633333336, 1525.6, 10.4, None, None, None, None, None, None, None, None, None, None, None, None, None, None]]
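Since each data row is a plain list whose first element is the timestamp and whose remaining elements follow the order of the sensors list, a row can be mapped back to its parameter URNs by zipping the two. A sketch with a shortened sensor list:

```python
# shortened versions of a['sensors'] and a['data'] from above
sensors = ["vessel:meteor:course", "vessel:meteor:headt", "vessel:meteor:poslat"]
rows = [
    ["2025-08-02T00:00:00.000", 309.1, 309.2, 43.927850416666665],
    ["2025-08-02T00:00:01.000", 308.5, 309.1, 43.92788051666667],
]

# the first element of each row is the timestamp, the rest follows 'sensors'
records = [dict(zip(["datetime"] + sensors, row)) for row in rows]

print(records[0]["vessel:meteor:headt"])  # 309.2
```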

A little less structured, but much more performant (from a database perspective), is the retrieval of tab-separated values. The tabular output is converted to a pandas DataFrame.

python
resp = requests.get(api_url +
                    "datasets/" +
                    str(4839394) +
                    "/data" +
                    "?format=text/tab-separated-values"
                    )

a = pd.read_csv(StringIO(resp.text), sep="\t")

a.columns
Index(['datetime', 'vessel:meteor:course []', 'vessel:meteor:headt [deg]',
       'vessel:meteor:poslat []', 'vessel:meteor:poslon []',
       'vessel:meteor:sound_velocity [m/s]',
       'vessel:meteor:speed_over_ground [knot]',
       'vessel:meteor:tsg_meteor:tsg_bb_meteor:conductivity [S/m]',
       'vessel:meteor:tsg_meteor:tsg_bb_meteor:sound_velocity_external [m/s]',
       'vessel:meteor:tsg_meteor:tsg_bb_meteor:density [kg/m3]',
       'vessel:meteor:tsg_meteor:tsg_bb_meteor:salinity [PSU]',
       'vessel:meteor:tsg_meteor:tsg_bb_meteor:sound_velocity_internal [m/s]',
       'vessel:meteor:tsg_meteor:tsg_bb_meteor:water_temperature_sbe45 [°C]',
       'vessel:meteor:tsg_meteor:tsg_bb_meteor:water_temperature_sbe38 [°C]',
       'vessel:meteor:tsg_meteor:tsg_stb_meteor:conductivity [S/m]',
       'vessel:meteor:tsg_meteor:tsg_stb_meteor:sound_velocity_external [m/s]',
       'vessel:meteor:tsg_meteor:tsg_stb_meteor:density [kg/m3]',
       'vessel:meteor:tsg_meteor:tsg_stb_meteor:salinity [PSU]',
       'vessel:meteor:tsg_meteor:tsg_stb_meteor:sound_velocity_internal [m/s]',
       'vessel:meteor:tsg_meteor:tsg_stb_meteor:water_temperature_sbe45 [°C]',
       'vessel:meteor:tsg_meteor:tsg_stb_meteor:water_temperature_sbe38 [°C]'],
      dtype='object')

a.head()
                  datetime  vessel:meteor:course []  ...  vessel:meteor:tsg_meteor:tsg_stb_meteor:water_temperature_sbe45 [°C]  vessel:meteor:tsg_meteor:tsg_stb_meteor:water_temperature_sbe38 [°C]
0  2025-08-02T00:00:00.000                    309.1  ...                                                NaN                                                                   NaN
1  2025-08-02T00:00:01.000                    308.5  ...                                                NaN                                                                   NaN
2  2025-08-02T00:00:02.000                    308.2  ...                                                NaN                                                                   NaN
3  2025-08-02T00:00:03.000                    308.4  ...                                                NaN                                                                   NaN
4  2025-08-02T00:00:04.000                    308.8  ...                                                NaN                                                                   NaN

[5 rows x 21 columns]
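For further work in pandas, a typical next step is to parse the timestamp column into a proper index and to drop sensors that carried no data in the requested window. This is a post-processing sketch, not part of the API; a tiny stand-in frame replaces the full download here:

```python
import pandas as pd

# tiny stand-in for the DataFrame read above
a = pd.DataFrame({
    "datetime": ["2025-08-02T00:00:00.000", "2025-08-02T00:00:01.000"],
    "vessel:meteor:course []": [309.1, 308.5],
    "vessel:meteor:tsg_meteor:tsg_stb_meteor:salinity [PSU]": [None, None],
})

# parse timestamps and use them as the index
a["datetime"] = pd.to_datetime(a["datetime"])
a = a.set_index("datetime")

# remove columns that are entirely NaN in this time window
a = a.dropna(axis=1, how="all")

print(a.columns.tolist())  # ['vessel:meteor:course []']
```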