IGSN Catalogs at WDC-Terra /igsnoaip/oai Endpoint
Analysis of all IGSN catalogs at WDC-Terra, hosted by GFZ Potsdam.
Note
The following has little to do with mDIS in particular. It is still useful background knowledge for anyone who intends to work with IGSNs (International Generic Sample Numbers).
mDIS supports generation (or rather pre-allocation) of IGSNs as a built-in feature, but mDIS does not centrally register IGSNs on the public internet.
How to query WDC-Terra for IGSNs. Parse XML responses.
This article shows how to turn IGSN XML into useful information.
Use command-line tools to query the WDC-Terra IGSN endpoint.
The URL is always: https://doidb.wdc-terra.org/igsnoaip/oai?
- Send HTTP GET requests to WDC-Terra, GFZ Potsdam, Germany, with curl.
- The URL is always the one above, but with a different verb in the query string: Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords.
- Add other common parameters, e.g. for filtering and pagination.
- Parse the XML with xmllint and xidel, and format the output with csvlook, jq, and perl.
All queries below are made against the WDC-Terra IGSN endpoint:
https://doidb.wdc-terra.org/igsnoaip/oai
The following code snippets are Bash shell scripts. They demonstrate a drill-down into WDC-Terra, in order to learn more about IGSNs.
Exploratory queries
There is a search GUI for all IGSNs stored at GFZ/WDC-Terra: the GFZ Data Services page at https://dataservices.gfz-potsdam.de/igsn-new/.
This page is a graphical user interface (GUI) to the data services. It is designed for interactive use by non-programmers and has very limited capabilities.
For developers, API access to the sample metadata is also possible.
The API is not RESTful. Rather, it is a metadata-harvesting interface based on OAI-PMH, which is also quite limited. The standard is described in the OAI-PMH specification, which was developed in 2002-2004 by the Open Archives Initiative (OAI).
OAI-PMH offers only 6 verbs (Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords), which must be used as query string parameters: verb=ListMetadataFormats, for example. See "verbType" in its XML schema.
OAI-PMH predates the widespread adoption of RESTful APIs. For simple read-only access, this old and limited standard is still usable.
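Because every request is just the endpoint URL plus a verb and a few key=value arguments, the URLs used throughout this article can be assembled by a small helper. The following is a minimal sketch; the function name `oai_url` is made up for illustration, and argument values are assumed not to need URL-encoding.

```bash
# Base URL of the WDC-Terra IGSN endpoint (taken from this article).
OAI_BASE="https://doidb.wdc-terra.org/igsnoaip/oai"

# oai_url VERB [key=value ...]
# Hypothetical helper: joins the verb and any extra arguments into one
# request URL. Values are appended as-is (no URL-encoding).
oai_url() {
  verb="$1"; shift
  url="${OAI_BASE}?verb=${verb}"
  for arg in "$@"; do
    url="${url}&${arg}"
  done
  printf '%s\n' "$url"
}

oai_url ListMetadataFormats
oai_url ListIdentifiers set=GFZ from=2022-01-01 metadataPrefix=oai_dc
```

The printed URL can be passed directly to curl, e.g. `curl -ksL "$(oai_url ListSets)"`.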
Examples
Metadata Output Formats
# 1) find all metadata Formats in WDC-Terra -IGSN-Endpoint
url="https://doidb.wdc-terra.org/igsnoaip/oai?"
url_verb="verb=ListMetadataFormats"
curl -ksL "${url}${url_verb}" \
| xmllint --format - \
| xidel -s -e "[//metadataPrefix, //metadataFormat]"
This will return basically the following (the # comments illustrate key characteristics of the responses in that format):
igsn    # returns XML according to IGSN registration schema
oai_dc  # many elements in "dc:" namespace (Dublin Core)
So there are 2 metadata formats that this endpoint is able to return: igsn and oai_dc.
Clients can gather detailed records about repository datasets in each of these formats, or "XML dialects".
See below for details.
Wait, that was all? No, these were wrapped in some XML. Let's take a closer look at the response.
What were the element names in the returned XML documents?
# 1) find all metadata Formats in WDC-Terra -IGSN-Endpoint
url="https://doidb.wdc-terra.org/igsnoaip/oai?"
url_verb="verb=ListMetadataFormats"
curl -ksL "${url}${url_verb}" | xidel -s -e "distinct-values(//*/name())"
This command pipeline will return the following XML element names (< ... > brackets removed for readability):
OAI-PMH
responseDate
request
ListMetadataFormats
metadataFormat
metadataPrefix # <==== this was the interesting element here
schema
metadataNamespace
How do metadata formats igsn and oai_dc differ?
To answer this question, we need a single identifier from any catalog (see below), and then use that to get at least one record from the catalog.
# set some variables, will be used in the following examples / codeblocks
url_endpoint="https://doidb.wdc-terra.org/igsnoaip/oai?"
url_verb="verb=ListIdentifiers"
url_params="&set=GFZ&from=2022-01-01"
url_mdfmt_oai_dc="&metadataPrefix=oai_dc"
url_mdfmt_igsn="&metadataPrefix=igsn"
Let's use the GFZ catalog. (For the list of catalogs and how to get it, see below.)
curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_oai_dc}" \
| xmllint --format - \
| xidel -s -e "(//header)[1]/identifier"
This returns oai:registry.igsn.org:10432760.
Two metadata formats are supported.
Full XML instance documents are not shown here, for brevity.
Instead, we look at high-level information about this record in each of the two metadata formats, igsn and oai_dc.
Just count the number of lines in the responses:
url_verb="verb=GetRecord"
url_params="&identifier=oai:registry.igsn.org:10432760"
curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_oai_dc}" \
| xmllint --format - | wc -l # 23
curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_igsn}" \
| xmllint --format - | wc -l # 27
These are the element names in the returned XML documents:
curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_igsn}" \
| xmllint --format - \
| xidel -s -e "distinct-values(//*/name())"
Elements
The elements in the two metadata formats oai_dc and igsn are:
# | oai_dc | igsn |
---|---|---|
1 | OAI-PMH | OAI-PMH |
2 | responseDate | responseDate |
3 | request | request |
4 | GetRecord | GetRecord |
5 | record | record |
6 | header | header |
7 | identifier | identifier |
8 | datestamp | datestamp |
9 | setSpec | setSpec |
10 | metadata | metadata |
11 | oai_dc:dc | sample |
12 | dc:creator | sampleNumber |
13 | dc:identifier | registrant |
14 | | registrantName |
15 | | log |
16 | | logElement |
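The comparison in the table can also be done mechanically with `comm`, which compares two sorted lists. The sketch below hard-codes the element names from the table; in practice each list would come from the `distinct-values(//*/name())` pipeline shown earlier. The temporary file names are arbitrary.

```bash
export LC_ALL=C   # sort and comm must agree on the collation order

# Envelope elements that occur in both formats (rows 1-10 of the table).
shared="OAI-PMH responseDate request GetRecord record header identifier datestamp setSpec metadata"

tmpdir=$(mktemp -d)
# $shared is left unquoted on purpose so that it splits into one name per argument.
printf '%s\n' $shared oai_dc:dc dc:creator dc:identifier | sort > "$tmpdir/oai_dc.txt"
printf '%s\n' $shared sample sampleNumber registrant registrantName log logElement | sort > "$tmpdir/igsn.txt"

# comm -23: lines only in the first file; comm -13: lines only in the second file.
only_in_oai_dc=$(comm -23 "$tmpdir/oai_dc.txt" "$tmpdir/igsn.txt")
only_in_igsn=$(comm -13 "$tmpdir/oai_dc.txt" "$tmpdir/igsn.txt")
rm -r "$tmpdir"

echo "Only in oai_dc:"; printf '  %s\n' $only_in_oai_dc
echo "Only in igsn:";   printf '  %s\n' $only_in_igsn
```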
oai_dc Response
There are two similarly named identifier elements in an oai_dc response: identifier (in the OAI-PMH header) and dc:identifier (in the metadata).
What do their values look like?
curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_oai_dc}" \
| xmllint --format - \
| xidel -s -e "[//header/identifier, //dc:identifier]"
Result:
["oai:registry.igsn.org:10432760",
["http://hdl.handle.net/10273/GFOTN0016", "igsn:10273/GFOTN0016"]]
The dc:identifier element occurs twice, with 2 equivalent IGSN representations: as a URL and as an IGSN handle.
The URL in the response, http://hdl.handle.net/10273/GFOTN0016, ultimately resolves to https://dataservices.gfz-potsdam.de/igsn/esg/index.php?igsn=GFOTN0016.
That is not an OAI-PMH search interface but a simple HTML landing page.
igsn Schema Response
2 Identifiers
There are two identifier elements in an igsn response: identifier and sampleNumber.
What do their values look like?
curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_igsn}" \
| xmllint --format - \
| xidel -s -e "{ 'identifier': //header/identifier, 'sampleNumber': //metadata//sampleNumber }"
Result:
{ "identifier": "oai:registry.igsn.org:10432760", "sampleNumber": "10273/GFOTN0016" }
Note that
- For identifier, the 10432760 seems to be an auto-incremented value: an id column internal to the catalog.
- There is no http or https link given for the sampleNumber property of the processed response. You have to create that link yourself, from the IGSN fragment.
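Building that link is plain string concatenation. The sketch below derives the two forms seen in the oai_dc response (handle URL and igsn: form) from a sampleNumber, and cuts the numeric part out of an OAI identifier. Both function names are made up for illustration.

```bash
# igsn_links SAMPLE_NUMBER
# Hypothetical helper: prints the handle URL and the "igsn:" form
# for a sampleNumber such as 10273/GFOTN0016.
igsn_links() {
  sample_number="$1"
  printf 'http://hdl.handle.net/%s\n' "$sample_number"
  printf 'igsn:%s\n' "$sample_number"
}

# oai_local_id OAI_IDENTIFIER
# Hypothetical helper: strips everything up to the last colon,
# leaving the (apparently auto-incremented) numeric part.
oai_local_id() {
  printf '%s\n' "${1##*:}"
}

igsn_links "10273/GFOTN0016"
# http://hdl.handle.net/10273/GFOTN0016
# igsn:10273/GFOTN0016

oai_local_id "oai:registry.igsn.org:10432760"
# 10432760
```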
Detailed exploratory queries
All Sets
List all sets / catalogs
Find "Sets" at the IGSN-Endpoint of WDC-Terra. This returns the high-level structure of datasets (or catalogs) in the repository.
url="https://doidb.wdc-terra.org/igsnoaip/oai?"
url_verb2="verb=ListSets"
# show Element names in the resultset
curl -ksL "${url}${url_verb2}" | xidel -s -e "distinct-values(//*/name())"
XML Element Names:
OAI-PMH
responseDate # request
ListSets # returns a list of sets
set # container element, not a "setter" command
setSpec # short name of the set - why not in uppercase?
setName # long name of the set - why not uppercase?
resumptionToken
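The resumptionToken element is how OAI-PMH paginates: when a list is incomplete, the response carries a token, and the follow-up request sends only the verb and that token. An empty token element marks the last page. The sketch below extracts the token with sed and builds the next URL; the function names and the sample fragment (including its token value and list size) are made up for illustration, and a real token may need URL-encoding.

```bash
OAI_BASE="https://doidb.wdc-terra.org/igsnoaip/oai"

# resumption_token: reads an OAI-PMH response on stdin and prints the
# text of <resumptionToken>, or nothing if the element is empty or absent.
resumption_token() {
  tr -d '\n' | sed -n 's:.*<resumptionToken[^>]*>\([^<]*\)</resumptionToken>.*:\1:p'
}

# next_page_url VERB TOKEN: the follow-up request carries only verb and token.
next_page_url() {
  printf '%s?verb=%s&resumptionToken=%s\n' "$OAI_BASE" "$1" "$2"
}

# Made-up response fragment; real tokens are opaque strings chosen by the server.
sample='<ListSets><set><setSpec>GFZ</setSpec></set><resumptionToken completeListSize="26">EXAMPLE-TOKEN</resumptionToken></ListSets>'

token=$(printf '%s' "$sample" | resumption_token)
if [ -n "$token" ]; then
  next_page_url ListSets "$token"
fi
```

In a harvesting loop, the same two steps would repeat with curl until `resumption_token` prints nothing.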
Format the ListSets result set as a table:
curl -ksL "${url}${url_verb2}" \
| xmllint --format - \
| xidel -s -e "[//setSpec, //setName]" \
| jq -r '. | transpose| .[] | @tsv' \
| grep -v "reference quality" \
| csvlook -H -l
These are the "Sets" in the IGSN endpoint in WDC-Terra. The table below shows the text content of the <setSpec> and <setName> elements, parsed from the XML output of the query.
# | setSpec | setName |
---|---|---|
1 | REFQUALITY | Reference quality citations only. |
2 | ANDS | Australian National Data Service |
3 | ANDS.AUSCOPE | AuScope |
4 | CNRS | Centre national de la recherche scientifique |
5 | CNRS.CNRS | Centre national de la recherche scientifique |
6 | CSIRO | CSIRO |
7 | CSIRO.CSIRO | CSIRO |
8 | GEOAUS | Geoscience Australia |
9 | GEOAUS.AU | Geoscience Australia |
10 | GFZ | GFZ Allocator |
11 | GFZ.GFZ | Deutsches GeoForschungsZentrum GFZ |
12 | IEDA | Integrated Earth Data Applications |
13 | IEDA.SESAR | System for Earth Sample Registration |
14 | IFREMER | Institut français de recherche pour l'exploitation de la mer |
15 | IFREMER.IGSN | Institut français de recherche pour l'exploitation de la mer |
16 | KIGAM | Korea Institute of Geoscience and Mineral Resources |
17 | KIGAM.DC | Korea Institute of Geoscience and Mineral Resources |
18 | LITHODAT | Lithodat Pty Ltd. |
19 | LITHODAT.AG | AuScope Geochemistry Network |
20 | LITHODAT.LD | LITHODAT PTY LTD |
21 | MARUM | MARUM Center for Marine Environmental Studies at University of Bremen |
22 | MARUM.HB | MARUM Center for Marine Environmental Sciences |
23 | UKI | Universität Kiel |
24 | UKI.BOT | Botanisches Institut |
25 | UKI.GEOMAR | GEOMAR |
26 | UKI.RZ | Universität Kiel - Rechenzentrum |
A single setSpec
Taking a closer look at one of these "sets", GFZ.GFZ, which is the setSpec for geosamples submitted by GFZ employees:
# GFZ.GFZ is the setSpec for samples submitted by GFZ employees
url_setspec="https://doidb.wdc-terra.org/igsnoaip/oai?verb=ListIdentifiers"
url_params="&metadataPrefix=igsn&set=GFZ.GFZ&from=2022-02-01"
# expected Element names in the resultset:
# OAI-PMH responseDate request ListIdentifiers header identifier datestamp setSpec resumptionToken
curl -ksL "${url_setspec}${url_params}" \
| xmllint --format - \
| xidel -s -e "//header" \
| grep -v GFZ \
| cat -s \
| perl -00 -nE "my @l = split(/\n/sg, \$_); \$l[1] =~ s/:+\d\dZ// ;say(qq(\$l[1] \$l[0]))" \
| perl -pE "s/ +/ /g" \
| csvcut -d" " -c 2,3 \
| csvlook -H -l
This returned a (slightly edited) list of registration dates and OAI identifiers for datasets.
The first (or last?) 11 rows are:
# | Registration Date | Searchable Identifier |
---|---|---|
1 | 2022-02-17 11:29:00 | oai:registry.igsn.org:10456328 |
2 | 2022-02-22 15:09:00 | oai:registry.igsn.org:10456656 |
3 | 2022-02-22 15:10:00 | oai:registry.igsn.org:10456657 |
4 | 2022-02-22 15:10:00 | oai:registry.igsn.org:10456658 |
5 | 2022-02-22 15:10:00 | oai:registry.igsn.org:10456659 |
6 | 2022-02-22 15:10:00 | oai:registry.igsn.org:10456660 |
7 | 2022-02-22 15:10:00 | oai:registry.igsn.org:10456661 |
8 | 2022-02-22 15:10:00 | oai:registry.igsn.org:10456662 |
9 | 2022-02-22 15:10:00 | oai:registry.igsn.org:10456663 |
10 | 2022-02-22 15:10:00 | oai:registry.igsn.org:10456664 |
11 | 2022-02-22 15:10:00 | oai:registry.igsn.org:10456665 |
TODO: What are these identifiers exactly?
ListRecords Verb
Get recent datasets in WDC-Terra with the ListRecords verb:
# List of all registered datasets on WDC-Terra, IGSN Endpoint,
# GFZ.GFZ Set since last 6 months
last_6_months=$(date +"%Y-%m-%d" -d "6 months ago")
url="https://doidb.wdc-terra.org/igsnoaip/oai?"
url_verb3="verb=ListRecords"
urlparams="&from=${last_6_months}&metadataPrefix=igsn&set=GFZ.GFZ"
# (1) return raw XML, 1 single line
# (also check for <error></error>)
curl -ksL "${url}${url_verb3}${urlparams}"
response=$(curl -ksL "${url}${url_verb3}${urlparams}")
if [[ $response == *"<error code=\"noRecordsMatch\">"* ]]; then
last_known="2023-03-28"
urlparams="&from=${last_known}&metadataPrefix=igsn&set=GFZ.GFZ"
fi
# 4 similar queries (2)-(5):
# (2) return raw XML, pretty-printed, colorized
curl -ksL "${url}${url_verb3}${urlparams}" | xmllint --format - | bat -l xml -p -
# (3) return XML, extract+show text only (good for human reading)
curl -ksL "${url}${url_verb3}${urlparams}" | xmllint --format - \
| xidel -s -e "//record[1]" \
| perl -pE "s/^\s+$//s"
# (4) transform to json, csvlook with --no-header and line-numbering
# Nr, Date, IGSN
curl -ksL "${url}${url_verb3}${urlparams}" \
| xidel -s -e "[//datestamp, //sampleNumber]" \
| jq -r '. | transpose| .[] | @csv' \
| csvlook -H -l \
| perl -plE "s/ \d\d:\d\d:\d\d\+\d+\d+:\d\d//"
# (5) format output as html fragment (1 link per line)
curl -ksL "${url}${url_verb3}${urlparams}" \
| xidel -s -e "[//dc:identifier[0], //sampleNumber]" \
| jq -r '. | transpose| .[] | @tsv' \
| perl -nl -E '@F = split /\t/;$i=$F[1];$hdl=qq(http://hdl.handle.net/$F[1]); $i=~s#.+/##;$i=qq(http://igsn.org/$i);say qq(<a href="$hdl"><code>$hdl</code></a> == <a href="$i"><code>$i</code></a> <br/>)' \
| tail -5
Queries (1) and (2) return unformatted and pretty-printed XML, respectively.
These are very lengthy documents and are not shown here. Queries (3), (4), and (5) return about the same information in more readable form.
Query (3) returned this in 2023 (the text content of a single item from the ListRecords output):
oai:registry.igsn.org:10834375 2023-03-28T20:33:57Z GFZ GFZ.GFZ 10273/GFBNO7002EHG0001 GFZ Data Services
So the text content is generally not very long.
The date is the publishing date, not the submission date. The submission timestamp is hidden in an attribute.
To get it, you would need this command, for example:
# We are using a different verb and identifier here (a record from 2022):
# /igsnoaip/oai?verb=GetRecord&identifier=oai:registry.igsn.org:10432760&metadataPrefix=igsn
url_verb="verb=GetRecord"
url_params="&identifier=oai:registry.igsn.org:10432760"
curl -ksL "${url_endpoint}${url_verb}${url_params}${url_mdfmt_igsn}" \
| xmllint --format - \
| xidel -s -e "[//header/identifier, //metadata//sampleNumber,
//metadata//logElement/@timeStamp, //header/datestamp]"
Result:
["oai:registry.igsn.org:10432760", "10273/GFOTN0016", "2017-05-30T00:00:33.237+02:00", "2022-01-20T12:43:59Z"]
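Note that almost 5 years lie between the submission date and the publishing date. (Why? In OAI-PMH the header datestamp is the date of creation, modification or deletion of the record, so perhaps the record was updated or extended 5 years after creation and this is the date of that update.)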
Query 4 returned this:
(shortened and edited for prettiness)
# | Date | Resolvable URL |
---|---|---|
1 | 2021-11-26 | http://hdl.handle.net/10273/GFKW10007 |
2 | 2021-11-26 | http://hdl.handle.net/10273/GFKW1000F |
3 | 2021-11-26 | http://hdl.handle.net/10273/GFKW1000C |
4 | 2021-11-26 | http://hdl.handle.net/10273/GFKW10006 |
5 | 2021-11-26 | http://hdl.handle.net/10273/GFKW1000E |
6 | 2021-11-26 | http://hdl.handle.net/10273/GFKW1000B |
7 | 2021-11-26 | http://hdl.handle.net/10273/GFKW10005 |
Query 5 returned this (as of today):
Unlike in Query 4, URLs were formatted as clickable HTML links.
Click on any URL to see the full metadata associated with the IGSN.
http://hdl.handle.net/10273/GFFJH00HS == http://igsn.org/GFFJH00HS
http://hdl.handle.net/10273/GFFJH009V == http://igsn.org/GFFJH009V
http://hdl.handle.net/10273/GFFJH0093 == http://igsn.org/GFFJH0093
http://hdl.handle.net/10273/GFFJH00HQ == http://igsn.org/GFFJH00HQ
http://hdl.handle.net/10273/GFFJH009T == http://igsn.org/GFFJH009T
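The ListRecords script above checks the response for a noRecordsMatch error by string matching. OAI-PMH reports all problems the same way, as an error element with a code attribute, so the check can be generalised. The sketch below is one way to do it; the function name and the sample fragment are made up for illustration.

```bash
# oai_error_code: reads an OAI-PMH response on stdin and prints the value
# of the code attribute of <error>, or nothing if there is no error.
oai_error_code() {
  tr -d '\n' | sed -n 's:.*<error code="\([^"]*\)".*:\1:p'
}

# Made-up fragment modelled on the noRecordsMatch case handled above.
sample='<OAI-PMH><error code="noRecordsMatch">no records</error></OAI-PMH>'

code=$(printf '%s' "$sample" | oai_error_code)
case "$code" in
  "")             echo "ok" ;;
  noRecordsMatch) echo "empty result, widen the date range" ;;
  *)              echo "request failed: $code" ;;
esac
```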
GetRecord: Focused Queries
# Return a complete record for a single dataset,
# selected from the above results (s. setSpec query)
url2="https://doidb.wdc-terra.org/igsnoaip/oai?verb=GetRecord"
url2params="&metadataPrefix=oai_dc&identifier=oai:registry.igsn.org:10153979"
curl -ksL "${url2}${url2params}" \
| xmllint --format - \
| bat -p -l xml -
#| xidel -s -e "/" --xml # same as: bat -p -l xml -
This returns a <GetRecord> XML document:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="xsl/oaitohtml.xsl"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2022-05-19T18:19:18Z</responseDate>
<request verb="GetRecord" metadataPrefix="oai_dc" identifier="oai:registry.igsn.org:10153979">http://doidb.wdc-terra.org/igsnoaip/oai</request>
<GetRecord>
<record>
<header>
<identifier>oai:registry.igsn.org:10153979</identifier>
<datestamp>2021-11-26T16:16:07Z</datestamp>
<setSpec>GFZ</setSpec>
<setSpec>GFZ.GFZ</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:creator>GFZ Data Services</dc:creator>
<dc:identifier>http://hdl.handle.net/10273/GFKW10007</dc:identifier>
<dc:identifier>igsn:10273/GFKW10007</dc:identifier>
</oai_dc:dc>
</metadata>
</record>
</GetRecord>
</OAI-PMH>
Query a single record, return in various formats:
The query string parameter metadataPrefix determines the response format.
This is similar to HTTP content negotiation, but instead of sending an Accept: application/json header, the client must send all information in query string parameters, such as:
verb=GetRecord&metadataPrefix=oai_dc&identifier=...
verb=GetRecord&metadataPrefix=igsn&identifier=...
url_mdfmt="https://doidb.wdc-terra.org/igsnoaip/oai?verb=GetRecord&identifier=oai:registry.igsn.org:10153979"
url_mdfmt_oai="&metadataPrefix=oai_dc"
url_mdfmt_igsn="&metadataPrefix=igsn"
curl -ksL "${url_mdfmt}${url_mdfmt_oai}" | xmllint --format - | wc -l # 23 lines
curl -ksL "${url_mdfmt}${url_mdfmt_igsn}" | xmllint --format - | wc -l # 27 lines
This returns XML responses that are 23 and 27 lines long, respectively. However, the documents contain almost the same information.
A useful extra in the igsn-formatted XML document is a <log> element, which is not present in oai_dc-formatted documents:
<log>
<logElement event="submitted" timeStamp="2017-05-30T00:00:33.237+02:00"/>
</log>
<log> seems to hold the record submission date. The timestamp carries a UTC offset (+02:00) but no named time zone, and it is unclear whether the offset is that of the sample owner (registrant) or that of the allocator server.
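The event, the date and the UTC offset can be separated with sed and shell parameter expansion alone. The sketch below works on the logElement line shown above and assumes the timestamp ends in an offset of the form +hh:mm or -hh:mm (not "Z"). It is line-based text matching, not real XML parsing.

```bash
# The <logElement> line from the igsn response shown above.
log_line='<logElement event="submitted" timeStamp="2017-05-30T00:00:33.237+02:00"/>'

# Pull the two attribute values out of the line.
event=$(printf '%s\n' "$log_line" | sed -n 's/.*event="\([^"]*\)".*/\1/p')
stamp=$(printf '%s\n' "$log_line" | sed -n 's/.*timeStamp="\([^"]*\)".*/\1/p')

day="${stamp%%T*}"          # everything before the "T"
head="${stamp%??????}"      # timestamp without its last 6 characters
offset="${stamp#"$head"}"   # the last 6 characters, i.e. the UTC offset

printf '%s on %s, UTC offset %s\n' "$event" "$day" "$offset"
# submitted on 2017-05-30, UTC offset +02:00
```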
TO DO
TODO: Add a real command-line query here that returns more IGSN info, all valid types of links, and DOI links. The examples above were just the most recent datasets at the time of writing.
All setSpecs and total number of records:
This obtains the counts of IGSNs registered, by institution.
# Get list of all sets,
# Then for each set get number of records
# from @completeListSize attribute
url_sets="https://doidb.wdc-terra.org/igsnoaip/oai?verb=ListSets"
last_20_years=$(date +"%Y-%m-%d" -d "20 years ago")
urlparams="&from=${last_20_years}&metadataPrefix=igsn&set="
url_records="https://doidb.wdc-terra.org/igsnoaip/oai?verb=ListRecords"
# return list of all catalogs/sets
curl -ksL "$url_sets" \
| xmllint --format - \
| xidel -s -e "//setSpec"
# return list of all catalogs/sets, and count the number of records in each.
# => remove "head -2" to get all results.
# this can take a while!
curl -ksL "$url_sets" \
| xmllint --format - \
| xidel -s -e "//setSpec" \
| head -2 \
| xargs -I{} bash -c "curl -ksL \"${url_records}${urlparams}{}\" | xidel -s -e \"concat('{}: ', //resumptionToken/@completeListSize)\"" \
| grep -v REFQUALITY
#(result not shown here)
The same query, with the total record count for each set, run twice (in July 2022 and October 2023):
# | SetSpec | IGSN-Count 2022 | IGSN-Count 2023 |
---|---|---|---|
1 | ANDS | 3202 | 3202 |
2 | ANDS.AUSCOPE | 3202 | 3202 |
3 | CNRS | 9493 | 14104 |
4 | CNRS.CNRS | 9493 | 14104 |
5 | CSIRO | 33248 | 33248 |
6 | CSIRO.CSIRO | 33248 | 33248 |
7 | GEOAUS | 5267441 | 5478954 |
8 | GEOAUS.AU | 5267441 | 5478954 |
9 | GFZ | 11315 | 35639 |
10 | GFZ.GFZ | 11315 | 35639 |
11 | IEDA | 4737180 | 4795828 |
12 | IEDA.SESAR | 4737180 | 4795828 |
13 | IFREMER | 37480 | 41929 |
14 | IFREMER.IGSN | 37480 | 41929 |
15 | KIGAM | 2884 | 2954 |
16 | KIGAM.DC | 2884 | 2954 |
17 | LITHODAT | 42 | 210 |
18 | LITHODAT.AG | 37 | 192 |
19 | LITHODAT.LD | 5 | 18 |
20 | MARUM | 136587 | 160912 |
21 | MARUM.HB | 136587 | 160912 |
22 | UKI | 412 | 196461 |
23 | UKI.BOT | 1 | 196000 |
24 | UKI.GEOMAR | 411 | 461 |
Why are duplicate rows returned?
Probably because allocators such as UKI comprise 2 sub-catalogs, e.g. UKI.BOT and UKI.GEOMAR, while most others have only one, so that the allocator and its single sub-catalog show the same count. For future expansion?
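This reading can be checked against the numbers: summing the sub-catalog counts per allocator should reproduce the allocator rows. The sketch below does that with awk for a few 2023 counts copied from the table, taking the part of the setSpec before the first dot as the allocator.

```bash
# 2023 counts of some sub-catalogs, copied from the table above (setSpec count).
counts='UKI.BOT 196000
UKI.GEOMAR 461
LITHODAT.AG 192
LITHODAT.LD 18
GFZ.GFZ 35639'

# Sum the counts per allocator (the setSpec part before the first dot).
totals=$(printf '%s\n' "$counts" \
  | awk '{ split($1, part, "."); sum[part[1]] += $2 }
         END { for (a in sum) print a, sum[a] }' \
  | sort)

printf '%s\n' "$totals"
# GFZ 35639
# LITHODAT 210
# UKI 196461
```

The sums match the GFZ, LITHODAT and UKI rows of the table.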
What a difference a year makes!
id | SetSpec | IGSN-Count 2022 | IGSN-Count 2023 | Difference | Percentage Change |
---|---|---|---|---|---|
7 | GEOAUS | 5267441 | 5478954 | 211513 | +4% |
22 | UKI | 412 | 196461 | 196049 | +47584% |
11 | IEDA | 4737180 | 4795828 | 58648 | +1% |
20 | MARUM | 136587 | 160912 | 24325 | +17% |
9 | GFZ | 11315 | 35639 | 24324 | +214% |
3 | CNRS | 9493 | 14104 | 4611 | +48% |
13 | IFREMER | 37480 | 41929 | 4449 | +11% |
17 | LITHODAT | 42 | 210 | 168 | +400% |
15 | KIGAM | 2884 | 2954 | 70 | +2% |
1 | ANDS | 3202 | 3202 | 0 | 0% |
5 | CSIRO | 33248 | 33248 | 0 | 0% |
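The Difference and Percentage Change columns can be recomputed with shell integer arithmetic. The percentages in the table appear to be truncated to whole numbers rather than rounded, which is what integer division does. The sketch below assumes a non-zero 2022 count; the function name is made up for illustration.

```bash
# change OLD NEW
# Hypothetical helper: prints the difference and the percentage change,
# truncated to a whole percent. OLD must not be 0.
change() {
  old="$1"; new="$2"
  diff=$(( new - old ))
  pct=$(( diff * 100 / old ))
  if [ "$pct" -gt 0 ]; then pct="+${pct}"; fi
  printf '%s %s%%\n' "$diff" "$pct"
}

change 11315 35639    # GFZ:  24324 +214%
change 412 196461     # UKI:  196049 +47584%
change 3202 3202      # ANDS: 0 0%
```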
IGSN Registration over time
Activity of all allocators combined, only until 2022.
Is the trend declining or ascending? Not very conclusive.
Closer look at allocators individually
Activity of the largest allocators, GEOAUS and IEDA (SESAR).
These two allocators alone have contributed the largest share, in 3 batches in 2015, 2020, and 2021.
Activity of all other allocators, including GFZ:
TO DO
TODO
Describe the relationships or differences
- technical diffs between the DataCite Metadata Store, and GFZ Registries
- process diffs between DOI minting and IGSN minting
Links
How to query a publicly available database for IGSNs:
- OAI-PMH Examples for IGSN Endpoint
- GFZ dataservices IGSN old and IGSN new
- Poster: "Reusing the DataCite Metadata Store as DOI registration proxy and IGSN registry" by J. Klump and D. Ulbricht. Poster for AGU Fall Meeting 2012.
- Querying the GFZ "DOI Endpoint" (for DOIs, not for IGSNs): Similar Page
- Datacite DOI Endpoint at GFZ's WDC-Terra (under constr.)
Related pages
GFZ DOI Catalogs, analyzed at GFZ's WDC-Terra
DOI to IGSNs - From a single DOI to multiple IGSNs at GFZ's WDC-Terra
mDIS IGSNs for System Administrators
official documentation page (external, quite old)
Notes
(External) Some examples use the powerful xidel, a little-known command-line tool for XML querying. It is free software.
Unfinished
Some info about DOIs seems to be hidden in XML attributes. (Is it relevant?)
# attribute nodes, show only their unique names
url_verb3="verb=ListRecords"
urlparams="&from=2002-07-29&metadataPrefix=igsn&set="
curl -ksL "${url}${url_verb3}${urlparams}" | xidel -s -e 'distinct-values(//@*/name())'
# get attribute values
curl -ksL "${url}${url_verb3}${urlparams}" \
| xidel -s -e "distinct-values(//node()[attribute::*])/text()" \
| perl -ple 's/\s+$//msg' \
| perl -ple 's/^\s+//msg' \
| cat -s -n \
| head -20
1 2022-07-15T06:45:33Zhttp://doidb.wdc-terra.org/igsnoaip/oai
2 oai:registry.igsn.org:63002042017-04-05T20:00:23ZGFZGFZ.GFZ
3 oai:registry.igsn.org:64467112017-05-30T22:30:22ZGFZGFZ.GFZ
4 10273/GFFB1003G
6 GFZ Data Services
...
538 10273/GFFB1001J
539
540 10273/GFFB1001He
541
542 GFZ Data Services
(TBC)