IGSN Catalogs at WDC-Terra /igsnoaip/oai Endpoint

Analysis of all IGSN catalogs at WDC-Terra, hosted by GFZ Potsdam.

Note

The following has little to do with mDIS in particular, but it is useful background for anyone who intends to work with IGSNs (International Generic Sample Numbers).
mDIS supports generation (or rather pre-allocation) of IGSNs as a built-in feature, but it does not centrally register IGSNs on the public internet.

How to query WDC-Terra for IGSNs. Parse XML responses.

This article shows how to turn IGSN XML into useful information.

Use command-line tools to query the WDC-Terra IGSN endpoint.

The base URL is always https://doidb.wdc-terra.org/igsnoaip/oai? (the trailing ? starts the query string).

  1. Send HTTP GET requests with curl to WDC-Terra at GFZ Potsdam, Germany.
  2. Use the URL above every time, varying only the query string parameters:
    • verb, set to one of Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords;
    • other common parameters, e.g. for filtering and pagination.
  3. Parse the XML with xmllint and xidel, and format the results with csvlook, jq, and perl.
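
The steps above can be sketched as a small URL-building helper. This is only a minimal sketch: the function name oai_url and the variable OAI_BASE are hypothetical, and parameter values are assumed to need no URL-encoding. The helper prints the URL without sending anything.

```shell
# Base URL of the endpoint, as given in this article.
OAI_BASE="https://doidb.wdc-terra.org/igsnoaip/oai"

# oai_url VERB [key=value ...]   (hypothetical helper, not part of any tool)
# Prints the request URL for the given verb and optional parameters.
oai_url() {
  _url="${OAI_BASE}?verb=$1"
  shift
  for _param in "$@"; do
    _url="${_url}&${_param}"
  done
  printf '%s\n' "$_url"
}

oai_url ListMetadataFormats
oai_url ListIdentifiers set=GFZ from=2022-01-01 metadataPrefix=oai_dc
```

The printed URL can then be passed to curl, for example curl -sL "$(oai_url ListSets)".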

All queries below are made against the WDC-Terra IGSN endpoint:

https://doidb.wdc-terra.org/igsnoaip/oai

The following code snippets are Bash shell scripts. They demonstrate a drill-down into WDC-Terra in order to learn more about IGSNs.

Exploratory queries

A search GUI exists for all IGSNs stored at GFZ/WDC-Terra: the GFZ Dataservices page at https://dataservices.gfz-potsdam.de/igsn-new/.
This graphical user interface (GUI) is designed for interactive use by non-programmers and has very limited capabilities.

For developers, however, API access to the sample metadata is also possible.

The API is not RESTful. It is an OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) interface, which is also quite limited. The standard is described in the OAI-PMH specification, developed in 2002-2004 by the Open Archives Initiative (OAI).
OAI-PMH offers only 6 verbs (Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords), which must be passed as a query string parameter, for example verb=ListMetadataFormats. See "verbType" in its XML schema.
OAI-PMH predates the widespread adoption of REST-style APIs. For simple read-only access, this old, limited standard is still usable.
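
Because only six verbs exist, a script can check a verb before sending a request. The following is a minimal sketch; the function name is_oai_verb is hypothetical, and the check is case-sensitive, matching the spelling used in this article.

```shell
# is_oai_verb VERB   (hypothetical helper)
# Succeeds only for the six verbs listed above.
is_oai_verb() {
  case "$1" in
    Identify|ListMetadataFormats|ListSets|GetRecord|ListIdentifiers|ListRecords)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

for verb in ListSets listsets Search; do
  if is_oai_verb "$verb"; then
    echo "${verb}: valid"
  else
    echo "${verb}: not an OAI-PMH verb"
  fi
done
```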

Examples

Metadata Output Formats

# 1) find all metadata Formats in WDC-Terra -IGSN-Endpoint
url="https://doidb.wdc-terra.org/igsnoaip/oai?"
url_verb="verb=ListMetadataFormats"
curl -ksL "${url}${url_verb}" \
  | xmllint --format -  \
  | xidel -s -e "[//metadataPrefix, //metadataFormat]"

This returns essentially the following (the # comments describe key characteristics of responses in each format):

igsn         # returns XML according to IGSN registration schema
oai_dc       # many elements in "dc:" namespace (Dublin Core)

So there are 2 Metadata formats that this endpoint is able to return: igsn, and oai_dc.
Clients can gather detailed records about repository datasets in each of these formats, or "XML dialects". See below for details.

Wait, that was all? No, these were wrapped in some XML. Let's take a closer look at the response.

What were the element names in the returned XML documents?

# 2) list the distinct element names in the ListMetadataFormats response
url="https://doidb.wdc-terra.org/igsnoaip/oai?"
url_verb="verb=ListMetadataFormats"
curl -ksL "${url}${url_verb}" |  xidel -s -e "distinct-values(//*/name())"

This command pipeline will return the following XML element names (< ... > Brackets removed for readability):

OAI-PMH
responseDate
request
ListMetadataFormats
metadataFormat
metadataPrefix         # <==== this was the interesting element here
schema
metadataNamespace

How do the metadata formats igsn and oai_dc differ?

To answer this question, we need a single identifier from any catalog (see below), and then use that to get at least one record from the catalog.


# set some variables, will be used in the following examples / codeblocks
url_endpoint="https://doidb.wdc-terra.org/igsnoaip/oai?"
url_verb="verb=ListIdentifiers"
url_params="&set=GFZ&from=2022-01-01"

url_mdfmt_oai_dc="&metadataPrefix=oai_dc"
url_mdfmt_igsn="&metadataPrefix=igsn"


Let's use the GFZ catalog. (For the list of catalogs and how to get it, see below.)


curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_oai_dc}" \
  |  xmllint --format - \
  |  xidel -s -e "(//header)[1]/identifier"

will return oai:registry.igsn.org:10432760

Two metadata formats are supported.

Full XML instance documents are not shown here, for brevity.

Instead, we get high-level information about this record in each of the two metadata formats, igsn and oai_dc, by counting the number of lines in each response:

url_verb="verb=GetRecord"
url_params="&identifier=oai:registry.igsn.org:10432760"
curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_oai_dc}" \
  | xmllint --format - | wc -l # 23
curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_igsn}" \
  | xmllint --format - | wc -l # 27

These are the element names in the returned XML documents:

curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_igsn}"   \
  | xmllint --format - \
  |  xidel -s -e "distinct-values(//*/name())" 

Elements

The elements in the two metadata formats, oai_dc and igsn, are:

#   oai_dc         igsn
1   OAI-PMH        OAI-PMH
2   responseDate   responseDate
3   request        request
4   GetRecord      GetRecord
5   record         record
6   header         header
7   identifier     identifier
8   datestamp      datestamp
9   setSpec        setSpec
10  metadata       metadata
11  oai_dc:dc      sample
12  dc:creator     sampleNumber
13  dc:identifier  registrant
14  (none)         registrantName
15  (none)         log
16  (none)         logElement

oai_dc Response

There are two similarly named identifier elements in the oai_dc schema: identifier and dc:identifier.

What do their values look like?

curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_oai_dc}"  \
  | xmllint --format - \
  | xidel  -s -e "[//header/identifier, //dc:identifier]"

Result:

["oai:registry.igsn.org:10432760",
["http://hdl.handle.net/10273/GFOTN0016", "igsn:10273/GFOTN0016"]]

The dc:identifier element occurs twice and carries two equivalent representations of the IGSN: a resolvable handle URL and an igsn:-prefixed handle.

Following the link target in the response, http://hdl.handle.net/10273/GFOTN0016, leads to a landing page.

Ultimately, the IGSN resolves to
https://dataservices.gfz-potsdam.de/igsn/esg/index.php?igsn=GFOTN0016.

That is not an OAI-PMH search interface but a simple HTML landing page.
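
To confirm that the two dc:identifier values name the same sample, both can be reduced to the bare handle. This is a minimal sketch; the function name igsn_handle is hypothetical, and it assumes that only the two prefixes seen in the result above occur.

```shell
# igsn_handle VALUE   (hypothetical helper)
# Strips the resolver URL or the igsn: prefix and prints the bare handle.
igsn_handle() {
  case "$1" in
    http://hdl.handle.net/*) printf '%s\n' "${1#http://hdl.handle.net/}" ;;
    igsn:*)                  printf '%s\n' "${1#igsn:}" ;;
    *)                       printf '%s\n' "$1" ;;
  esac
}

a=$(igsn_handle "http://hdl.handle.net/10273/GFOTN0016")
b=$(igsn_handle "igsn:10273/GFOTN0016")
if [ "$a" = "$b" ]; then
  echo "same IGSN: $a"
fi
```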

igsn Schema Response

2 Identifiers

There are two identifier elements in the igsn schema: identifier and sampleNumber.

What do their values look like?

curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_igsn}"  \
  | xmllint --format - \
  | xidel -s -e "{ 'identifier': //header/identifier, 'sampleNumber': //metadata//sampleNumber }"  
  

Result:

{
"identifier": "oai:registry.igsn.org:10432760",
"sampleNumber": "10273/GFOTN0016"
}

Note that

  1. In identifier, the number 10432760 appears to be an auto-incremented value, i.e. an id column internal to the catalog.
  2. The processed response contains no http or https link for sampleNumber. You have to build that link yourself from the IGSN fragment.
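
Building the links from the two values can be sketched with plain shell parameter expansion. The URL patterns are taken from the responses and output shown in this article (hdl.handle.net with the full handle, igsn.org with the part after the slash); the variable names are illustrative only.

```shell
sample_number="10273/GFOTN0016"          # text of <sampleNumber>
oai_id="oai:registry.igsn.org:10432760"  # text of <identifier>

# Handle URL: prefix the full sampleNumber with the handle resolver.
handle_url="http://hdl.handle.net/${sample_number}"

# Short form: keep only the part after the last slash.
igsn_suffix="${sample_number##*/}"
igsn_url="http://igsn.org/${igsn_suffix}"

# Catalog-internal number: the part after the last colon.
record_no="${oai_id##*:}"

echo "$handle_url"
echo "$igsn_url"
echo "$record_no"
```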

Detailed exploratory queries

All Sets

List all sets / catalogs

Find "Sets" at the IGSN-Endpoint of WDC-Terra. This returns the high-level structure of datasets (or catalogs) in the repository.

url="https://doidb.wdc-terra.org/igsnoaip/oai?"
url_verb2="verb=ListSets"
# show Element names in the resultset
curl -ksL "${url}${url_verb2}" |   xidel -s -e "distinct-values(//*/name())"

XML Element Names:

OAI-PMH
responseDate
request
ListSets          # returns a list of sets
set               # container element, not a "setter" command
setSpec           # short name of the set - why not in uppercase?
setName           # long name of the set - why not uppercase?
resumptionToken
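
The resumptionToken element in this list is how OAI-PMH paginates long list responses: a non-empty token means more pages follow, and the follow-up request carries only the verb and the token. The sketch below works on a hypothetical fragment of a ListRecords response (the token text and the cursor attribute are made up) and uses sed rather than an XML parser, so it assumes the element sits on a single line. Real tokens may need URL-encoding.

```shell
# Hypothetical fragment: the token text and the cursor attribute are made up;
# completeListSize is the attribute this article reads for record counts.
fragment='<resumptionToken completeListSize="35639" cursor="0">EXAMPLE-TOKEN-123</resumptionToken>'

token=$(printf '%s\n' "$fragment" \
  | sed -n 's/.*<resumptionToken[^>]*>\([^<]*\)<\/resumptionToken>.*/\1/p')
list_size=$(printf '%s\n' "$fragment" \
  | sed -n 's/.*completeListSize="\([0-9]*\)".*/\1/p')

echo "total records: ${list_size}"
if [ -n "$token" ]; then
  # Follow-up request: same verb, resumptionToken as the only other argument.
  echo "next page: https://doidb.wdc-terra.org/igsnoaip/oai?verb=ListRecords&resumptionToken=${token}"
else
  echo "last page reached"
fi
```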

Format ListSets resultset as table:

curl -ksL "${url}${url_verb2}" \
  | xmllint --format -  \
  | xidel -s -e "[//setSpec, //setName]" \
  | jq -r '. | transpose| .[] | @tsv' \
  | grep -v "reference quality" \
  | csvlook -H -l 

These are the "Sets" in the IGSN Endpoint in WDC-Terra. The table below shows the text-content of the <setName> and the <setSpec> elements, parsed from the XML output of the query.

# setSpec setName
1 REFQUALITY Reference quality citations only.
2 ANDS Australian National Data Service
3 ANDS.AUSCOPE AuScope
4 CNRS Centre national de la recherche scientifique
5 CNRS.CNRS Centre national de la recherche scientifique
6 CSIRO CSIRO
7 CSIRO.CSIRO CSIRO
8 GEOAUS Geoscience Australia
9 GEOAUS.AU Geoscience Australia
10 GFZ GFZ Allocator
11 GFZ.GFZ Deutsches GeoForschungsZentrum GFZ
12 IEDA Integrated Earth Data Applications
13 IEDA.SESAR System for Earth Sample Registration
14 IFREMER Institut français de recherche pour l'exploitation de la mer
15 IFREMER.IGSN Institut français de recherche pour l'exploitation de la mer
16 KIGAM Korea Institute of Geoscience and Mineral Resources
17 KIGAM.DC Korea Institute of Geoscience and Mineral Resources
18 LITHODAT Lithodat Pty Ltd.
19 LITHODAT.AG AuScope Geochemistry Network
20 LITHODAT.LD LITHODAT PTY LTD
21 MARUM MARUM Center for Marine Environmental Studies at University of Bremen
22 MARUM.HB MARUM Center for Marine Environmental Sciences
23 UKI Universität Kiel
24 UKI.BOT Botanisches Institut
25 UKI.GEOMAR GEOMAR
26 UKI.RZ Universität Kiel - Rechenzentrum

A single setSpec

Let's take a closer look at one of these "sets", GFZ.GFZ, which is the setSpec for geosamples submitted by GFZ employees:

# GFZ.GFZ is the setSpec for samples submitted by GFZ employees
url_setspec="https://doidb.wdc-terra.org/igsnoaip/oai?verb=ListIdentifiers"
url_params="&metadataPrefix=igsn&set=GFZ.GFZ&from=2022-02-01"

# expected Element names in the resultset:
# OAI-PMH responseDate request ListIdentifiers header identifier datestamp setSpec resumptionToken


curl -ksL "${url_setspec}${url_params}"   \
  | xmllint --format -    \
  | xidel -s -e "//header"   \
  | grep -v GFZ \
  | cat -s   \
  | perl -00 -nE "my @l = split(/\n/sg, \$_); \$l[1] =~ s/:+\d\dZ// ;say(qq(\$l[1] \$l[0]))" \
  | perl -pE "s/ +/ /g" \
  | csvcut -d" " -c 2,3 \
  | csvlook -H -l

This returns a (slightly edited) list of registration dates and OAI identifiers of datasets.

The first 11 rows of the output are:

# Registration Date Searchable Identifier
1 2022-02-17 11:29:00 oai:registry.igsn.org:10456328
2 2022-02-22 15:09:00 oai:registry.igsn.org:10456656
3 2022-02-22 15:10:00 oai:registry.igsn.org:10456657
4 2022-02-22 15:10:00 oai:registry.igsn.org:10456658
5 2022-02-22 15:10:00 oai:registry.igsn.org:10456659
6 2022-02-22 15:10:00 oai:registry.igsn.org:10456660
7 2022-02-22 15:10:00 oai:registry.igsn.org:10456661
8 2022-02-22 15:10:00 oai:registry.igsn.org:10456662
9 2022-02-22 15:10:00 oai:registry.igsn.org:10456663
10 2022-02-22 15:10:00 oai:registry.igsn.org:10456664
11 2022-02-22 15:10:00 oai:registry.igsn.org:10456665
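
The row layout of this table can be reproduced for a single header with plain shell parameter expansion. This is a swapped-in illustration, not the perl step used in the pipeline above; the function name format_row is hypothetical, and the input values are taken from row 1.

```shell
# format_row IDENTIFIER DATESTAMP   (hypothetical helper)
# Turns an ISO datestamp such as 2022-02-17T11:29:00Z into "date time"
# and prints it in front of the identifier.
format_row() {
  _date="${2%%T*}"    # part before the T
  _time="${2#*T}"     # part after the T, still ending in Z
  _time="${_time%Z}"  # drop the trailing Z
  printf '%s %s %s\n' "$_date" "$_time" "$1"
}

format_row "oai:registry.igsn.org:10456328" "2022-02-17T11:29:00Z"
```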

TODO: What are these identifiers exactly?


ListRecords Verb

Get recent Datasets in WDC-Terra with ListRecords verb:

# List of all registered datasets on WDC-Terra, IGSN Endpoint, 
# GFZ.GFZ Set since last 6 months
last_6_months=$(date +"%Y-%m-%d" -d "6 months ago")
url="https://doidb.wdc-terra.org/igsnoaip/oai?"
url_verb3="verb=ListRecords"
urlparams="&from=${last_6_months}&metadataPrefix=igsn&set=GFZ.GFZ"
# (1) return raw XML, 1 single line 
# (also check for <error></error>)
response=$(curl -ksL  "${url}${url_verb3}${urlparams}")
printf '%s\n' "$response"

if [[ $response == *"<error code=\"noRecordsMatch\">"* ]]; then
  last_known="2023-03-28"
  urlparams="&from=${last_known}&metadataPrefix=igsn&set=GFZ.GFZ"
fi

# 4 similar queries (2)-(5):



# (2) return raw XML, pretty-printed, colorized
curl -ksL "${url}${url_verb3}${urlparams}" | xmllint --format - | bat -l  xml -p  -

# (3) return XML, extract+show text only (good for human reading)
curl -ksL "${url}${url_verb3}${urlparams}" | xmllint --format - \
  | xidel -s -e "//record[1]" \
  | perl -pE "s/^\s+$//s"

# (4) transform to json, csvlook with --no-header and line-numbering
# Nr, Date, IGSN
curl -ksL  "${url}${url_verb3}${urlparams}" \
  | xidel -s -e "[//datestamp, //sampleNumber]" \
  | jq -r '. | transpose| .[] | @csv'    \
  | csvlook -H -l \
  | perl -plE "s/ \d\d:\d\d:\d\d\+\d+\d+:\d\d//"

# (5) format output as html fragment (1 link per line)
curl -ksL  "${url}${url_verb3}${urlparams}" \
  | xidel -s -e "[//dc:identifier[0], //sampleNumber]" \
  | jq -r '. | transpose| .[] | @tsv' \
  | perl -nl -E '@F = split /\t/;$i=$F[1];$hdl=qq(http://hdl.handle.net/$F[1]); $i=~s#.+/##;$i=qq(http://igsn.org/$i);say qq(<a href="$hdl"><code>$hdl</code></a> == <a href="$i"><code>$i</code></a> <br/>)'   \
  | tail -5 


Queries (1) and (2) return unformatted XML and pretty-printed XML, respectively.

These are very lengthy documents, and not shown here. The following queries (3), (4), and (5) return about the same info, in more readable form:

Query (3) returned this today, in 2023:

Textcontent of a single item from ListRecords output.

Output:

  oai:registry.igsn.org:10834375
  2023-03-28T20:33:57Z
  GFZ
  GFZ.GFZ
    10273/GFBNO7002EHG0001
      GFZ Data Services

So the text content is generally not very long.
The date shown is the publishing date, not the submission date. The submission timestamp is hidden in an attribute (timeStamp on <logElement>).

To get those, you would need this command, for example:

# We are using a different verb and identifier here:
#  /igsnoaip/oai?verb=GetRecord&identifier=oai:registry.igsn.org:10432760&metadataPrefix=igsn
# from 2022
url_endpoint="https://doidb.wdc-terra.org/igsnoaip/oai?"
url_verb="verb=GetRecord"
url_params="&identifier=oai:registry.igsn.org:10432760"
url_mdfmt_igsn="&metadataPrefix=igsn"
curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_igsn}"    \
  | xmllint --format -   \
  | xidel  -s -e "[//header/identifier, //metadata//sampleNumber,
                   //metadata//logElement/@timeStamp, //header/datestamp]"

Result:

["oai:registry.igsn.org:10432760", "10273/GFOTN0016",
"2017-05-30T00:00:33.237+02:00", "2022-01-20T12:43:59Z"]

Note that almost five years lie between the submission date (2017) and the publishing date (2022). (Why? Perhaps the record was updated or extended years after creation, and the datestamp is the date of that update.)
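
The gap between the two dates in the result above can be computed directly. This sketch assumes GNU date (the same -d option used elsewhere in this article) and uses only the date part of each timestamp, interpreted as UTC midnight.

```shell
submitted="2017-05-30"   # date part of logElement/@timeStamp
published="2022-01-20"   # date part of header/datestamp

# GNU date: convert each date to seconds since the epoch (UTC).
s1=$(date -u -d "$submitted" +%s)
s2=$(date -u -d "$published" +%s)

days=$(( (s2 - s1) / 86400 ))
echo "${days} days between submission and publication"   # 1696 days
```

1696 days is roughly 4.6 years.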

Query 4 returned this:

(shortened and edited for prettiness)

# Date Resolvable URL
1 2021-11-26 http://hdl.handle.net/10273/GFKW10007
2 2021-11-26 http://hdl.handle.net/10273/GFKW1000F
3 2021-11-26 http://hdl.handle.net/10273/GFKW1000C
4 2021-11-26 http://hdl.handle.net/10273/GFKW10006
5 2021-11-26 http://hdl.handle.net/10273/GFKW1000E
6 2021-11-26 http://hdl.handle.net/10273/GFKW1000B
7 2021-11-26 http://hdl.handle.net/10273/GFKW10005

Query 5 returned this (as of today):

Unlike in Query 4, URLs were formatted as clickable HTML links.
Click on any URL to see the full metadata associated with the IGSN.

http://hdl.handle.net/10273/GFFJH00HS == http://igsn.org/GFFJH00HS
http://hdl.handle.net/10273/GFFJH009V == http://igsn.org/GFFJH009V
http://hdl.handle.net/10273/GFFJH0093 == http://igsn.org/GFFJH0093
http://hdl.handle.net/10273/GFFJH00HQ == http://igsn.org/GFFJH00HQ
http://hdl.handle.net/10273/GFFJH009T == http://igsn.org/GFFJH009T
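
The noRecordsMatch test in the ListRecords script above can be generalized to report whatever error code the endpoint returns. The following is a minimal sketch: the function name oai_error_code is hypothetical, the two sample responses are shortened and made up, and sed is used instead of an XML parser.

```shell
# oai_error_code XML   (hypothetical helper)
# Prints the code attribute of the first <error> element, or nothing.
oai_error_code() {
  printf '%s\n' "$1" \
    | sed -n 's/.*<error code="\([^"]*\)".*/\1/p' \
    | head -n 1
}

# Hypothetical, shortened responses for illustration.
empty_result='<OAI-PMH><error code="noRecordsMatch">No matching records</error></OAI-PMH>'
good_result='<OAI-PMH><ListRecords><record/></ListRecords></OAI-PMH>'

for xml in "$empty_result" "$good_result"; do
  code=$(oai_error_code "$xml")
  if [ -n "$code" ]; then
    echo "endpoint reported an error: ${code}"
  else
    echo "no error element found"
  fi
done
```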


GetRecord: Focused Queries

# Return a complete record for a single dataset, 
# selected from the above results (s. setSpec query)
url2="https://doidb.wdc-terra.org/igsnoaip/oai?verb=GetRecord"
url2params="&metadataPrefix=oai_dc&identifier=oai:registry.igsn.org:10153979"
curl -ksL  "${url2}${url2params}" \
  | xmllint --format - \
  | bat -p -l xml -
  #| xidel -s -e "/" --xml # same as: bat -p -l xml -

Returns a <GetRecord> XML:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="xsl/oaitohtml.xsl"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2022-05-19T18:19:18Z</responseDate>
  <request verb="GetRecord" metadataPrefix="oai_dc" identifier="oai:registry.igsn.org:10153979">http://doidb.wdc-terra.org/igsnoaip/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:registry.igsn.org:10153979</identifier>
        <datestamp>2021-11-26T16:16:07Z</datestamp>
        <setSpec>GFZ</setSpec>
        <setSpec>GFZ.GFZ</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:creator>GFZ Data Services</dc:creator>
          <dc:identifier>http://hdl.handle.net/10273/GFKW10007</dc:identifier>
          <dc:identifier>igsn:10273/GFKW10007</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>

Query a single record, return in various formats:

The query string parameter metadataPrefix determines the response format.

This resembles HTTP content negotiation, but instead of sending an
Accept: application/json header, the client must pass everything as HTTP query string parameters, such as:

  • verb=GetRecord&metadataPrefix=oai_dc&identifier=...
  • verb=GetRecord&metadataPrefix=igsn&identifier=...

url_mdfmt="https://doidb.wdc-terra.org/igsnoaip/oai?verb=GetRecord&identifier=oai:registry.igsn.org:10153979"
url_mdfmt_oai="&metadataPrefix=oai_dc"
url_mdfmt_igsn="&metadataPrefix=igsn"
curl -ksL "${url_mdfmt}${url_mdfmt_oai}"  | xmllint --format - | wc -l # 23 lines
curl -ksL "${url_mdfmt}${url_mdfmt_igsn}" | xmllint --format - | wc -l # 27 lines

This returns XML responses that are 23 and 27 lines long, respectively. Both contain almost the same information. One useful extra in the igsn-formatted document is a <log> element, which is not present in oai_dc-formatted documents:

<log>
  <logElement event="submitted" timeStamp="2017-05-30T00:00:33.237+02:00"/>
</log>

<log> seems to hold the record submission date. The timestamp carries a UTC offset (+02:00 here), but it is unclear whether that reflects the time zone of the sample owner (registrant) or that of the allocator server.
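
The attributes of the logElement shown above can be pulled apart with sed for a closer look at the timestamp. This is a minimal sketch on the literal line from the response; the variable names are illustrative, and a real script would rather use xidel as in the other examples.

```shell
log_element='<logElement event="submitted" timeStamp="2017-05-30T00:00:33.237+02:00"/>'

event=$(printf '%s\n' "$log_element" | sed -n 's/.*event="\([^"]*\)".*/\1/p')
timestamp=$(printf '%s\n' "$log_element" | sed -n 's/.*timeStamp="\([^"]*\)".*/\1/p')

# Trailing +hh:mm or -hh:mm, if present (empty for timestamps ending in Z).
utc_offset=$(printf '%s\n' "$timestamp" | sed -n 's/.*\([+-][0-9][0-9]:[0-9][0-9]\)$/\1/p')

echo "event:      $event"
echo "timestamp:  $timestamp"
echo "UTC offset: $utc_offset"
```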

TO DO

TODO: Add a real command-line query here that returns more IGSN info, all valid types of links, and DOI links. The examples above were just the most recent datasets at the time of writing.

All setSpecs and total number of records:

This obtains the counts of IGSNs registered, by institution.

# Get list of all sets, 
# Then for each set get number of records 
# from @completeListSize attribute
url_sets="https://doidb.wdc-terra.org/igsnoaip/oai?verb=ListSets"
last_20_years=$(date +"%Y-%m-%d" -d "20 years ago")
urlparams="&from=${last_20_years}&metadataPrefix=igsn&set="
url_records="https://doidb.wdc-terra.org/igsnoaip/oai?verb=ListRecords"
# return list of all catalogs/sets
curl -ksL "$url_sets"   \
  | xmllint --format -    \
  | xidel -s -e "//setSpec" 

# return list of all catalogs/sets, and count the number of records in each.
# => remove "head -2" to get all results.
# this can take a while! 
curl -ksL "$url_sets"   \
  | xmllint --format -    \
  | xidel -s -e "//setSpec" \
  | head -2 \
  | xargs -I{} bash -c "curl -ksL  \"${url_records}${urlparams}{}\" | xidel -s -e \"concat('{}: ', //resumptionToken/@completeListSize)\"" \
  | grep -v REFQUALITY

#(result not shown here)

Results of the same query, with the total record count for each set, run twice (July 2022 and October 2023):

# SetSpec IGSN-Count 2022 IGSN-Count 2023
1 ANDS 3202 3202
2 ANDS.AUSCOPE 3202 3202
3 CNRS 9493 14104
4 CNRS.CNRS 9493 14104
5 CSIRO 33248 33248
6 CSIRO.CSIRO 33248 33248
7 GEOAUS 5267441 5478954
8 GEOAUS.AU 5267441 5478954
9 GFZ 11315 35639
10 GFZ.GFZ 11315 35639
11 IEDA 4737180 4795828
12 IEDA.SESAR 4737180 4795828
13 IFREMER 37480 41929
14 IFREMER.IGSN 37480 41929
15 KIGAM 2884 2954
16 KIGAM.DC 2884 2954
17 LITHODAT 42 210
18 LITHODAT.AG 37 192
19 LITHODAT.LD 5 18
20 MARUM 136587 160912
21 MARUM.HB 136587 160912
22 UKI 412 196461
23 UKI.BOT 1 196000
24 UKI.GEOMAR 411 461

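Why are duplicate rows returned?
Probably because each registrant, such as UKI, is listed both as a top-level set and as one or more sub-catalogs, e.g. UKI.BOT and UKI.GEOMAR. A set with a single sub-catalog therefore shows the same count twice. Perhaps the hierarchy allows for future expansion?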
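
The parent/sub-catalog relationship can be checked on the counts themselves. The sketch below uses the 2023 values of the LITHODAT and UKI rows from the table above; it first keeps only the top-level sets (names without a dot) and then compares each parent count with the sum of its sub-catalogs. Variable names are illustrative.

```shell
# 2023 counts copied from the table above (LITHODAT and UKI rows only).
counts='LITHODAT 210
LITHODAT.AG 192
LITHODAT.LD 18
UKI 196461
UKI.BOT 196000
UKI.GEOMAR 461'

# Keep only top-level sets: names without a dot.
top_level=$(printf '%s\n' "$counts" | grep -v '\.')
printf '%s\n' "$top_level"

# Sum the sub-catalogs per parent and compare with the parent's own count.
check=$(printf '%s\n' "$counts" | awk '
  index($1, ".") == 0 { parent[$1] = $2; next }      # top-level set
  { split($1, part, "."); children[part[1]] += $2 }  # sub-catalog
  END {
    for (name in parent)
      printf "%s parent=%d children=%d\n", name, parent[name], children[name]
  }' | sort)
printf '%s\n' "$check"
```

For both registrants the parent count equals the sum of its sub-catalogs, so counting only top-level sets avoids double counting.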

What a difference a year makes! (Percentages are truncated to whole numbers.)

id SetSpec IGSN-Count 2022 IGSN-Count 2023 Difference Percentage Change
7 GEOAUS 5267441 5478954 211513 +4%
22 UKI 412 196461 196049 +47584%
11 IEDA 4737180 4795828 58648 +1%
20 MARUM 136587 160912 24325 +17%
9 GFZ 11315 35639 24324 +214%
3 CNRS 9493 14104 4611 +48%
13 IFREMER 37480 41929 4449 +11%
17 LITHODAT 42 210 168 +400%
15 KIGAM 2884 2954 70 +2%
1 ANDS 3202 3202 0 0%
5 CSIRO 33248 33248 0 0%
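
The Difference and Percentage Change columns follow from simple integer arithmetic. This is a minimal sketch; the function name pct_change is hypothetical, and it assumes the old count is not zero. Integer division truncates, which matches the whole-number percentages in the table.

```shell
# pct_change OLD NEW   (hypothetical helper)
# Prints the absolute difference and the truncated percentage change.
pct_change() {
  _diff=$(( $2 - $1 ))
  # Integer division truncates, e.g. 214.97 becomes 214.
  _pct=$(( _diff * 100 / $1 ))
  printf '%d %+d%%\n' "$_diff" "$_pct"
}

pct_change 11315 35639     # GFZ: 24324 +214%
pct_change 412 196461      # UKI: 196049 +47584%
```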

IGSN Registration over time

Activity of all allocators combined, only until 2022.

Figure: Timeseries of total number of IGSNs allocated per year.

Is the trend declining or ascending? Not very conclusive.

Closer look at allocators individually

Activity of the largest allocators, GEOAUS and IEDA (SESAR).

Figure: Timeseries of GEOAUS and SESAR: IGSNs allocated per year.

These two allocators alone contributed the largest share, in three batches (2015, 2020, and 2021).

Activity of all other allocators, including GFZ:

Figure: Timeseries of GFZ and other allocation agents: IGSNs allocated per year.

TO DO

TODO

Describe the relationships or differences

  • technical diffs between the DataCite Metadata Store, and GFZ Registries
  • process diffs between DOI minting and IGSN minting

How to query a publicly available database for IGSNs:

GFZ DOI Catalogs, analyzed at GFZ's WDC-Terra
DOI to IGSNs - From a single DOI to multiple IGSNs at GFZ's WDC-Terra
mDIS IGSNs for System Administrators
official documentation page (external, quite old)

Notes

(External) Some examples use the powerful xidel, a little-known command-line tool for XML querying. It is free software.

Unfinished

Some info about DOIs seems to be hidden in XML attributes. (Is it relevant?)

# attribute nodes, show only their unique names
url_verb3="verb=ListRecords"
urlparams="&from=2002-07-29&metadataPrefix=igsn&set="
curl -ksL "${url}${url_verb3}${urlparams}" |   xidel -s -e 'distinct-values(//@*/name())'

# get the text content of all nodes that carry attributes
curl -ksL "${url}${url_verb3}${urlparams}" \
  | xidel -s -e "distinct-values(//node()[attribute::*])/text()" \
  | perl -ple 's/\s+$//msg' \
  | perl -ple 's/^\s+//msg' \
  | cat -s -n \
  | head -20
  1 2022-07-15T06:45:33Zhttp://doidb.wdc-terra.org/igsnoaip/oai
  2 oai:registry.igsn.org:63002042017-04-05T20:00:23ZGFZGFZ.GFZ
  3 oai:registry.igsn.org:64467112017-05-30T22:30:22ZGFZGFZ.GFZ
  4  10273/GFFB1003G
  6    GFZ Data Services
...
538 10273/GFFB1001J
539
540   10273/GFFB1001He
541
542      GFZ Data Services

(TBC)