From a single DOI to multiple IGSNs
Note
The following has little to do with mDIS in particular, but it is useful background knowledge for anyone who intends to work with IGSNs (International Generic Sample Numbers).
mDIS supports the generation of IGSNs as a built-in feature. An IGSN links a sample to its metadata.
mDIS is supposed to use and contribute to GFZ's DOI infrastructure by registering DOIs and IGSNs for geosamples.
From DOIs to IGSNs - in general, at GFZ's WDC-Terra
This article explores the DOI infrastructure.
- Resolve a https://doi.org/... URL from a GFZ publication,
- Get text/JSON responses by passing different Accept: text/... HTTP headers,
- Grab the HTML response,
- Filter lots of IGSNs from the JSON,
- Turn these URLs into clickable links.
Query a DOI URL with various parameters
We will study this record:
Dataset on Biogeochemical cycling of Mg and Li isotopes in the Black Forest, Germany.
https://doi.org/10.5880/GFZ.3.3.2021.005
This DOI, 10.5880/GFZ.3.3.2021.005, returns a dataset whose response contains 151 IGSN http-URLs, of which 70 are unique IGSNs.
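The difference between the number of IGSN URLs in a response and the number of distinct IGSNs can be illustrated offline on a small sample. One plausible reason for the gap (an assumption, not verified against the real page) is that the same IGSN URL occurs several times, for example once as a link target and once as link text. The sketch below uses made-up IGSN values, a hypothetical helper named igsn_urls, and assumes that the identifier part of an IGSN URL is purely alphanumeric.

```shell
#!/bin/sh
# Hypothetical sample text; the IGSN values are made up for illustration.
sample='See http://igsn.org/XXX0001 and http://igsn.org/XXX0002.
Again http://igsn.org/XXX0001, plus https://doi.org/10.5880/GFZ.3.3.2021.005
<a href="http://igsn.org/XXX0003">http://igsn.org/XXX0003</a>'

# Print every IGSN URL found on stdin, one per line (duplicates are kept).
# Assumption: the identifier after the host consists of letters and digits only.
igsn_urls() {
    grep -oE 'http://igsn\.org/[A-Za-z0-9]+'
}

# Count all matches, then count them again after removing duplicates.
total=$(printf '%s\n' "$sample" | igsn_urls | wc -l | tr -d ' ')
unique=$(printf '%s\n' "$sample" | igsn_urls | sort -u | wc -l | tr -d ' ')

echo "total IGSN URLs:  $total"
echo "unique IGSN URLs: $unique"
```

With this sample the script reports 5 IGSN URLs in total and 3 unique ones. Against a live page, the sample variable would be replaced by the output of curl.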
DOIs and OAI-PMH Endpoint at GFZ
Resolve a DOI URL to a full https: URL, and count the IGSN URLs in the response dataset.
Use the shell commands curl and jq, and the Perl script urifind. All of these can be installed from the standard Linux software repositories.
# Dataset on Biogeochemical cycling of Mg and Li isotopes in the Black Forest, Germany
uri="https://doi.org/10.5880/GFZ.3.3.2021.005"
# Similar queries:
# (1) return text, citation, ready for cut+paste into draft paper. 2-3 lines
curl -ksL "${uri}" -H "Accept: text/x-bibliography; style=apa"
# (2a) return JSON, pretty-printed, with a bit more metadata
curl -ksL "${uri}" -H "Accept: application/vnd.citationstyles.csl+json"
# (2b) same JSON request, but list only the keys of the JSON object
curl -ksL "${uri}" -H "Accept: application/vnd.citationstyles.csl+json" | jq keys_unsorted
# These are the keys: top-level metadata of the DOI record
# [
# "type", "id", "categories",
# "author", "contributor", "issued",
# "abstract", "DOI", "publisher",
# "title", "URL", "copyright"
# ]
# (3) return human-readable HTML+lots of JavaScript, complete page
curl -ksL "${uri}"
# install urifind from https://metacpan.org/pod/URI::Find
# cpanm URI::Find
# (4a) ... and extract all URLs with urifind
curl -ksL "${uri}" | urifind
# (4b) ... and extract all unique DOI URLs with grep
curl -ksL "${uri}" \
| urifind \
| grep -o -P 'https://doi.org/.+' \
| sort -u
# (5) ... and extract unique IGSN URLs
curl -ksL "${uri}" \
| urifind -u \
| grep 'http://igsn.org/' \
| perl -nl -E 'say qq(<a href="$_"><code>$_</code></a>)' \
| csvlook -H -l -t
Command (5) above returns 70 unique IGSN URLs, so this dataset references 70 IGSNs.
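The commands above follow the DOI redirect silently, so the "full https: URL" that the DOI resolves to is never printed. A minimal sketch for making that step visible, assuming curl's --write-out variable url_effective is sufficient for the purpose; the function name resolve_url and the script name in the comment are hypothetical. Unlike the commands above, it does not pass -k, so TLS certificates are verified.

```shell
#!/bin/sh
# Print the final URL that a DOI link redirects to.
#   -s  silent, -L  follow redirects, -o /dev/null  discard the body,
#   -w '%{url_effective}'  print the last URL that curl fetched.
resolve_url() {
    curl -sL -o /dev/null -w '%{url_effective}\n' "$1"
}

# Only contact the network when a URL is given on the command line, e.g.
#   sh resolve_url.sh https://doi.org/10.5880/GFZ.3.3.2021.005
if [ "$#" -gt 0 ]; then
    resolve_url "$1"
fi
```

The printed URL is the landing page that commands (3) to (5) download and scan for links.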
Next steps could be:
- Resolve all IGSNs above, check what is returned. Is it consistent?
- Check out parent IGSN
- Study other IGSN catalogs (OAI-PMH Sets)
  - Find the most recent record in each of the sets. Some sets might have been abandoned, though.
Some examples above were just the most recent datasets at the time of writing (2022). Results might have changed since then.
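The first of these next steps, resolving every IGSN and checking whether the answers are consistent, could start from a sketch like the following. It is only an outline under stated assumptions: the function names status_of, check_urls and summarise are hypothetical, the 20-second timeout is arbitrary, and "consistent" is reduced here to "same HTTP status code", which says nothing about the content returned.

```shell
#!/bin/sh
# Print the HTTP status code a URL finally answers with (after redirects).
# curl prints 000 when no response was received at all.
status_of() {
    curl -sL -o /dev/null --max-time 20 -w '%{http_code}' "$1"
}

# Read URLs from stdin, one per line, and print "<status><TAB><url>" for each.
check_urls() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue          # skip empty lines
        printf '%s\t%s\n' "$(status_of "$url")" "$url"
    done
}

# Reduce a "<status><TAB><url>" listing to a count per status code.
summarise() {
    cut -f1 | sort | uniq -c
}

# Only contact the network when a file with URLs is given, e.g.
#   sh check_igsns.sh igsn-urls.txt
if [ "$#" -gt 0 ]; then
    check_urls < "$1" | summarise
fi
```

The input file could be produced with command (5) above, stopping after the grep stage and redirecting the output to a file. A single line in the summary would mean that all IGSNs answered with the same status code.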
Links
Documentation provided by GFZ:
How to query a publicly available "OAI-PMH" database for IGSNs: OAI-PMH Examples
GFZ dataservices
GFZ Poster: "Reusing the DataCite Metadata Store as DOI registration proxy and IGSN registry" by J. Klump and D. Ulbricht. Poster for AGU Fall Meeting 2012. PDF
Related pages
IGSN in mDIS - general introduction
IGSN Endpoint - send queries to GFZ's /igsnoaip endpoint for read-only access. Returns IGSN metadata in XML format.
WDC Terra DOI-based metadata catalogs - Exploring WDC Terra's metadata formats
mDIS IGSNs for System Administrators (under construction)