From a single DOI to multiple IGSNs

Note

The following is not specific to mDIS. It is still useful background for anyone who intends to work with IGSNs (International Generic Sample Numbers).
mDIS supports generating IGSNs as a built-in feature, which links samples to their metadata.
mDIS is intended to use and contribute to GFZ's DOI infrastructure by registering DOIs and IGSNs for geosamples.

From DOIs to IGSNs - in general, at GFZ's WDC-Terra

This article explores the DOI infrastructure in five steps:

  1. Resolve an https://doi.org/... URL from a GFZ publication,
  2. Get text/JSON responses by passing different Accept: HTTP headers,
  3. Grab the HTML response,
  4. Extract the many IGSN URLs from the response,
  5. Turn these URLs into clickable links.

Query a DOI URL with various parameters

We will study this record:

Dataset on Biogeochemical cycling of Mg and Li isotopes in Black Forest, Germany.

https://doi.org/10.5880/GFZ.3.3.2021.005

The DOI 10.5880/GFZ.3.3.2021.005 returns a dataset whose response contains 151 IGSN http URLs, 70 of which are unique.

DOIs and OAI-PMH Endpoint at GFZ

Resolve a DOI URL to its full https: URL, and count the IGSN URLs in the response dataset.
Use the shell commands curl and jq and the Perl script urifind; the last command also uses csvlook, which is part of csvkit. All of these can be installed from the standard Linux software repositories.
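Before running the queries, it can help to confirm that the tools are on the PATH. The following is a minimal sketch, assuming a POSIX shell; the function name missing_tools is made up for this illustration and is not part of mDIS or of the tools themselves.

```shell
#!/bin/sh
# missing_tools: print the name of each given command that is NOT found on the PATH.
# (Hypothetical helper, for illustration only.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The tools used in this article. No output means everything is installed.
missing_tools curl jq urifind csvlook
```

Any name that is printed still has to be installed before the commands below will work.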

# Dataset on Biogeochemical cycling of Mg and Li isotopes in Black Forest, Germany
uri="https://doi.org/10.5880/GFZ.3.3.2021.005"

# Similar queries. curl flags: -s silent, -L follow redirects,
# -k skip TLS certificate verification (omit it where certificates validate).
# (1) return a plain-text citation, ready to cut and paste into a draft paper (2-3 lines)
curl -ksL "${uri}" -H "Accept: text/x-bibliography; style=apa"

# (2) return JSON with a bit more metadata (pipe through `jq .` to pretty-print)
curl -ksL "${uri}" -H "Accept: application/vnd.citationstyles.csl+json"

# (2, continued) same JSON, but list only the top-level keys of the object
curl -ksL "${uri}" -H "Accept: application/vnd.citationstyles.csl+json" | jq keys_unsorted
# These are the keys (top-level metadata):
# [
#   "type",     "id",           "categories",
#   "author",   "contributor",  "issued",
#   "abstract", "DOI",          "publisher",
#   "title",    "URL",          "copyright"
# ]

# (3) return the complete human-readable HTML page, including lots of JavaScript
curl -ksL "${uri}"

# (4) ... and extract all URLs with urifind
# urifind ships with the Perl module URI::Find (https://metacpan.org/pod/URI::Find);
# install it with: cpanm URI::Find
curl -ksL "${uri}" | urifind

# (4) ... and extract all doi URLs with grep
curl -ksL "${uri}" \
  | urifind \
  | grep -o -P 'https://doi.org/.+' \
  | sort -u 
  
# (5) ... and extract unique IGSN URLs
curl -ksL "${uri}" \
  | urifind -u \
  | grep 'http://igsn.org/' \
  | perl -nl -E 'say qq(<a href="$_"><code>$_</code></a>)' \
  | csvlook -H -l -t

Result of command (5) above: (70 unique IGSNs)
# Link
1 http://igsn.org/
2 http://igsn.org/GFDUH00LU
3 http://igsn.org/GFDUH00LY
4 http://igsn.org/GFDUH00M1
5 http://igsn.org/GFDUH00M5
6 http://igsn.org/GFDUH00MB
7 http://igsn.org/GFDUH00MD
8 http://igsn.org/GFDUH00MF
9 http://igsn.org/GFDUH00N5
10 http://igsn.org/GFDUH00N8
11 http://igsn.org/GFDUH00N9
12 http://igsn.org/GFDUH00NB
13 http://igsn.org/GFDUH00NC
14 http://igsn.org/GFDUH00ND
15 http://igsn.org/GFDUH00HJ
16 http://igsn.org/GFDUH00HN
17 http://igsn.org/GFDUH00HR
18 http://igsn.org/GFDUH00HT
19 http://igsn.org/GFJUB0065
20 http://igsn.org/GFJUB0066
21 http://igsn.org/GFJUB0067
22 http://igsn.org/GFJUB0068
23 http://igsn.org/GFJUB0069
24 http://igsn.org/GFJUB006A
25 http://igsn.org/GFJUB006B
26 http://igsn.org/GFJUB006C
27 http://igsn.org/GFJUB006D
28 http://igsn.org/GFJUB006E
29 http://igsn.org/GFJUB006F
30 http://igsn.org/GFJUB006G
31 http://igsn.org/GFJUB006H
32 http://igsn.org/GFJUB006J
33 http://igsn.org/GFDIC0006
34 http://igsn.org/GFDIC0007
35 http://igsn.org/GFDIC0005
36 http://igsn.org/GFDIC0004
37 http://igsn.org/GFDIC0003
38 http://igsn.org/GFDIC0002
39 http://igsn.org/GFDIC0001
40 http://igsn.org/GFDUH00NH
41 http://igsn.org/GFJUB006K
42 http://igsn.org/GFJUB006L
43 http://igsn.org/GFJUB006M
44 http://igsn.org/GFJUB006N
45 http://igsn.org/GFJUB006P
46 http://igsn.org/GFJUB006Q
47 http://igsn.org/GFJUB006R
48 http://igsn.org/GFJUB006S
49 http://igsn.org/GFJUB006T
50 http://igsn.org/GFJUB006U
51 http://igsn.org/GFJUB006V
52 http://igsn.org/GFJUB006W
53 http://igsn.org/GFJUB006X
54 http://igsn.org/GFJUB006Y
55 http://igsn.org/GFJUB006Z
56 http://igsn.org/GFDIC0009
57 http://igsn.org/GFDIC0008
58 http://igsn.org/GFDUH00HE
59 http://igsn.org/GFDUH00J1
60 http://igsn.org/GFDUH00J2
61 http://igsn.org/GFDUH00HH
62 http://igsn.org/GFDUH00HK
63 http://igsn.org/GFDUH00HL
64 http://igsn.org/GFDUH00HM
65 http://igsn.org/GFDUH00HP
66 http://igsn.org/GFDUH00JA
67 http://igsn.org/GFDUH00HS
68 http://igsn.org/GFDUH00MC
69 http://igsn.org/GFDUH00ME
70 http://igsn.org/GFDUH00MG

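So this document contains 70 unique igsn.org URLs. Row 1 is the bare base URL http://igsn.org/, which leaves 69 URLs that identify samples.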
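The difference between the total number of IGSN URLs and the number of unique ones can be checked with standard tools. The sketch below assumes that every IGSN URL has the form http://igsn.org/<code> with an alphanumeric code, as in the list above; the function names igsn_urls and count_igsns are invented for this example, and the HTML input is made up.

```shell
#!/bin/sh
# igsn_urls: read text on stdin and print every http://igsn.org/<code> URL found,
# one per line. The bare base URL http://igsn.org/ is skipped, because the
# pattern requires at least one alphanumeric character after the slash.
igsn_urls() {
  grep -o 'http://igsn\.org/[A-Za-z0-9]\{1,\}'
}

# count_igsns: print "<total> <unique>" for the IGSN URLs found on stdin.
count_igsns() {
  igsn_urls | awk '{ total++; if (!seen[$0]++) unique++ } END { print total+0, unique+0 }'
}

# Offline demonstration on made-up HTML; the two codes are taken from the list above.
count_igsns <<'EOF'
<a href="http://igsn.org/">IGSN</a>
<a href="http://igsn.org/GFDUH00LU">GFDUH00LU</a>
<a href="http://igsn.org/GFDUH00LU">again</a>
<a href="http://igsn.org/GFJUB0065">GFJUB0065</a>
EOF
# prints: 3 2
```

To apply it to the live page, pipe the HTML response into the function, for example curl -sL "${uri}" | count_igsns.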

Next steps could be:

  • Resolve all IGSNs above and check what is returned. Is it consistent?
  • Check out the parent IGSNs.
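Resolving the IGSNs in bulk could look like the following sketch. It only reports the final HTTP status code per URL, which is a rough first consistency check and says nothing about the content returned. The function names http_status and check_igsns are hypothetical.

```shell
#!/bin/sh
# http_status: print the final HTTP status code for one URL.
# -s silences progress output, -L follows redirects, -o /dev/null discards
# the body, and -w '%{http_code}' prints only the status code.
http_status() {
  curl -sL -o /dev/null -w '%{http_code}' "$1" </dev/null
}

# check_igsns: read one URL per line on stdin and print "<status> <url>" for each.
# Empty lines are skipped.
check_igsns() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue
    printf '%s %s\n' "$(http_status "$url")" "$url"
  done
}
```

A possible call, using two URLs from the list above, is printf '%s\n' http://igsn.org/GFDUH00LU http://igsn.org/GFJUB0065 | check_igsns. Sorting the output and piping it through uniq -c on the first column would then show how many URLs share each status code.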

Study other IGSN catalogs (OAI-PMH Sets)

  • Find the most recent record in each of the sets. Some sets might have been abandoned, though.
  • Some examples above were simply the most recent datasets at the time of writing (2022).
    Results might have changed since then.
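OAI-PMH requests are plain HTTP GET requests with a verb parameter, so exploring the sets can start from a small URL builder. This is a sketch under assumptions: the host in the base URL is a placeholder (only the /igsnoaip path is named in this article), SOME_SET stands for a set name returned by ListSets, and oai_url is a made-up helper. The verbs ListSets and ListRecords and the metadata prefix oai_dc come from the OAI-PMH standard.

```shell
#!/bin/sh
# oai_url: build an OAI-PMH request URL from a base URL, a verb, and optional
# "key=value" arguments. No URL-encoding is done, so values must already be safe.
oai_url() {
  oai_base=$1
  oai_verb=$2
  shift 2
  oai_request="${oai_base}?verb=${oai_verb}"
  for arg in "$@"; do
    oai_request="${oai_request}&${arg}"
  done
  printf '%s\n' "$oai_request"
}

# PLACEHOLDER base URL: replace the host with the real location of the endpoint.
base="https://example.org/igsnoaip"

# List all sets, then all records of one set in Dublin Core format.
oai_url "$base" ListSets
oai_url "$base" ListRecords metadataPrefix=oai_dc set=SOME_SET
```

The printed URLs can be passed to curl -sL to fetch the XML. OAI-PMH also accepts from= and until= date arguments on ListRecords, which helps when looking for the most recent record in a set.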

Documentation provided by GFZ:

IGSN in mDIS - general introduction
IGSN Endpoint - send queries to GFZ's /igsnoaip endpoint for read-only access. Returns IGSN metadata in XML format.
WDC Terra DOI-based metadata catalogs - exploring WDC Terra's metadata formats
mDIS IGSNs for System Administrators (under construction)