IGSN - the International Geo Sample Number

Also known as the "International Generic Sample Number", as it was recently renamed.

Basics vs. Advanced

This article covers the basics of IGSNs and how to use them.
For more advanced material about IGSNs in mDIS, see below, or read the more technical article about their specific implementation in mDIS.
For detailed information, see the About IGSN page on the IGSN GitHub site.
IGSN also has a Wikipedia entry.

IGSN Basics

IGSN is a globally unique and persistent identifier for materials and physical samples.

It is used to link a sample to its metadata. The IGSN identifies a physical sample at the instance level, similar to a Vehicle Identification Number on a car, or any serial number.

An IGSN looks like this: <IGSN> = <Namespace><SampleCode>.
The <Namespace> is a short prefix such as ICDP or GF, and the <SampleCode> is a unique code for the sample. This structure is only a recommendation.

Yes, that is right: an IGSN has no real restrictions on its length, on the separator character between namespace and sample code, or on its format. Check the official list of namespace prefixes.

Therefore, these are all valid IGSNs:

Pro tip: click the URLs and check out the landing pages.
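
The recommended <Namespace><SampleCode> pattern described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_igsn` and the default prefix list are assumptions (the two prefixes are the ones mentioned in this article, not the official list), and the sketch does not handle separator characters.

```python
def split_igsn(igsn, known_namespaces=("ICDP", "GF")):
    """Split an IGSN into (namespace, sample_code) by longest matching prefix.

    known_namespaces is a hypothetical, hand-maintained list; a real
    application would take it from the official list of namespace prefixes.
    """
    candidate = igsn.strip()
    # Try longer prefixes first so that a longer namespace wins over a shorter one.
    for namespace in sorted(known_namespaces, key=len, reverse=True):
        if candidate.startswith(namespace) and len(candidate) > len(namespace):
            return namespace, candidate[len(namespace):]
    raise ValueError(f"no known namespace prefix in {igsn!r}")


print(split_igsn("GFDUH00N5"))  # ('GF', 'DUH00N5')
```

Because the format is only a recommendation, a split like this is a convenience for display or sorting, not a validity check.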

Simple to use

If you have a "registered" IGSN, you can append it to http://igsn.org/ to resolve it. After resolution, you will see a landing page with more information about the sample.

Example: If your IGSN is GFDUH00N5, your link will be
http://igsn.org/GFDUH00N5

And that's about it. The IGSN is an identifier for a landing page. The descriptive schema (see below) specifies what content should be there, but this too is only a "guideline". There is also no specification of how the items should be made searchable, e.g., to find related samples.

Another side benefit of IGSNs is that they can serve as a URL shortener for the landing page. However, this was never a design goal.

Extra Information

Most of the following section has already been said. Here is more detail about the schema and about "IGSN resolution". (A schema file is the specification: a ruleset in XML syntax.)

Registration Schema

The IGSN metadata (IGSN value, registrant, submission date...) should match the IGSN specification. XML instance documents generated from the Registration Schema are tiny. Such a document has the same function as the envelope of a letter.

Description Schema

Continuing the analogy: whereas the Registration Schema XML document is "the envelope", the Description Schema XML document contains "the letter".
The Description Schema is larger than the Registration Schema and specifies how the actual sample information is structured.
In IGSN jargon, a sample is not called a sample but a resource.
The term "physical sample" was abstracted to "resource", perhaps because the term is ambiguous (one person's "sample" may be another person's "sampleSeries"), or perhaps to maintain similarity with other internet standards like RDF, the Resource Description Framework.
The IGSN Description Schema is a bit more complex, but still has no normative power whatsoever.

Schemas in General

These IGSN metadata specifications are sets of rules for what the metadata associated with an IGSN should look like.
These rulesets, or "schemas", were specified by the members of a non-profit organization called the IGSN e.V., and their network of allocation agents.

The IGSN specification also describes how the IGSN identifier is used for data publication.

(Does it, really?)

The Bigger Picture - again

Just like mDIS Combined IDs, IGSNs are a generic way to identify samples. However, IGSN identifiers gain their full potential when they are registered and thus resolvable. That means the IGSN is associated with a link to a public page on the internet, showing metadata about the sample.

Example, using igsn.org link: http://igsn.org/GFFJH00HS
Example, using handle.net link: http://hdl.handle.net/10273/GFFJH00HS
See "Recent Developments" below for more examples!

Technically, hdl.handle.net is the resolver, whereas igsn.org is just a relay, an intermediary that does little except prepend 10273/ to the IGSN character sequence. The igsn.org host then forwards the modified request to hdl.handle.net.
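
The relay step can be mimicked locally with a small Python sketch. It only rewrites URLs in the way described above and makes no network request. The function names `handle_url` and `relay` are made up for this illustration; the hosts and the 10273 prefix are the ones given in this article.

```python
from urllib.parse import urlsplit

HANDLE_RESOLVER = "http://hdl.handle.net"
IGSN_HANDLE_PREFIX = "10273"  # prefix that igsn.org prepends, as described above


def handle_url(igsn):
    """Build the handle.net URL for a bare IGSN string."""
    return f"{HANDLE_RESOLVER}/{IGSN_HANDLE_PREFIX}/{igsn}"


def relay(igsn_org_url):
    """Rewrite an igsn.org link into the equivalent handle.net link."""
    parts = urlsplit(igsn_org_url)
    if parts.netloc.lower() != "igsn.org":
        raise ValueError(f"not an igsn.org URL: {igsn_org_url!r}")
    igsn = parts.path.lstrip("/")
    if not igsn:
        raise ValueError("URL contains no IGSN")
    return handle_url(igsn)


print(relay("http://igsn.org/GFFJH00HS"))  # http://hdl.handle.net/10273/GFFJH00HS
```

The output matches the handle.net example link given earlier in this section.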

This "publishing to the world" aspect is the value added by IGSNs and the IGSN registry.

IGSN advantages

IGSN's advantages are

  • persistence
  • uniqueness
  • generality
    • applicable to physical samples, pictures, boreholes, etc.
  • quality-assurance features
    • validation against a ruleset/schema during data entry and later
    • prefilling Selection Boxes and text fields with ranges of allowed values

The quality-assurance features are a design goal, but their actual implementation is up to the editors and data curators.

IGSN disadvantages

The IGSN infrastructure also has many disadvantages, e.g.

  • The IGSN string is unfamiliar to users on first encounter

    • a cryptic, unintuitive, unrecognizable format
    • has little built-in semantics
    • unlike any scientist's preferred naming scheme
    • unlike "mDIS Combined IDs", which are hierarchically structured and do have built-in semantics
    • the infix for the identifier subtype (e.g. "E" for expedition, "X" for sample) at the 7th position is hard to find and read
  • Clunky search interface

    • cannot easily find "comparable" samples, e.g. samples from the same batch, or samples from other registrants with the same sample characteristics
    • cannot retrieve a list of IGSNs in a single HTTP request (you have to make one HTTP request per IGSN)
  • Registration info is too basic. Metadata attributes in the Registration Schema are too few (n=4).

  • Descriptive Schema is in an unclear state

  • The procedure for updating (correcting) or extending (adding more information to) published metadata records is also unclear

  • outdated tech stack (XML schema for data modeling; OAI-PMH for web services)

  • No standardization of contents

    • the target of IGSN resolution is arbitrary (even the use of a landing page is optional)
    • Metadata authors can write whatever they want on the sample's landing page;
    • Metadata on landing pages often is basic, redundant, and not very useful.
  • Complicated registration workflow with at least 4 parties: sample owner to registrant to allocator to datacite.org

  • Lack of privacy. If anyone resolves an IGSN link, most likely four external parties will take notice: the owners of igsn.org and handle.net, the IGSN allocator, and datacite.org.

  • Long delays between preallocation by mDIS and actual allocation by IGSN registry. This can take years because the process is not automated.

  • IGSN registration costs money. Financial details are not disclosed.

  • Some items are not assignable. Some items you would like to assign IGSNs to cannot actually be given them. For example, high-level locations (sites, outcrops, ponds...) cannot be given IGSNs. The highest-level unit of allocation is "hole".

  • "Parent IGSN" mechanism is poorly documented

  • The request-response pattern of a typical IGSN registration (transaction steps, XML payload details...) seems to be poorly documented.

  • Registrant activity, IGSN acceptance and yearly growth of registration rates have stalled or seem to be in decline.

  • Notion of "Controlled Vocabulary" is not clear. Is it a simple value-list that someone defined in an ad-hoc manner, or is it a resource on the internet with a lot of "machinery" attached (namespace allocation, mathematical description-logic, peer-review...)?

  • Support for times and dates is unclear (IGSN does not use ISO 8601:2004)

In the use-cases related to mDIS, the IGSN has some shortcomings, e.g.

  • It solves the wrong problems.
    • Very few users are concerned by the problem of global uniqueness of the sample identifier. They are more concerned about local uniqueness (which can easily be accomplished by carefully designing a naming scheme and, on the physical level, with a unique index, for example).
    • Very few users need their sample identifiers to be of uniform length. Often they would prefer a short, human-readable identifier.
  • The landing page is usually under the control of the catalog owner, not the sample owner.
  • Registration Metadata and Description Metadata are not linked.
  • mDIS would have to maintain 4 identifiers in parallel: the ID, the Combined ID, the IGSN, and the custom number preferred by the sample owner.
  • There are no repositories that use the IGSN as primary identifier. Every repository uses its own identifier, and the IGSN is only used as an alternate identifier, with little quality control. This is a problem for mDIS: it cannot adopt the IGSN as its primary identifier either, and therefore has to maintain several identifiers in parallel.
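
The remark above about local uniqueness "with a unique index" can be illustrated with a minimal sketch using SQLite from the Python standard library. The table name, column name and sample name are invented for this example and do not reflect the actual mDIS database schema.

```python
import sqlite3

# In-memory database; table and column names are hypothetical, not the mDIS schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, sample_name TEXT NOT NULL)")
# The unique index is what enforces local uniqueness of the sample name.
conn.execute("CREATE UNIQUE INDEX idx_sample_name ON sample (sample_name)")


def add_sample(name):
    """Insert a sample; return False if the name is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO sample (sample_name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


first = add_sample("5054_1_A_1_1")   # made-up sample name
second = add_sample("5054_1_A_1_1")  # same name again, rejected by the index
print(first, second)  # True False
```

The database itself rejects the duplicate, so no global registry is needed to keep names unique within one installation.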

IGSN in mDIS

Viewing IGSNs

mDIS users can search for IGSNs, and pre-allocate IGSNs, in the following objects:

  • Core
  • ProjectHole
  • CurationSectionSplit
  • CurationSample
  • CurationCuttings

IGSNs appear in the mDIS user interface as shown in the screenshots below.

TODO: add screenshot, even if it is fake: form and table of Core objects, with an "IGSN" column (and maybe some other columns)

The (not yet existing) screenshot above shows the IGSN in the Core object.

In the table above, a regular mDIS user does not have write permissions for the IGSN column. In the form, the IGSN-field is read-only.

An end user can only see the IGSN in the respective object and, if the sample is registered, click on the link to the IGSN page. That IGSN page is a public page with metadata.

That is about all a regular user can do with IGSNs in mDIS.

IGSN metadata export

However, an end user can also export IGSN data to XML reports.

Explain the xml-export feature here? Or in the admin-docs?

IGSN in mDIS - advanced

mDIS has built-in features for managing IGSNs:

The main tasks are Pre-Allocation, Export and Registration.

  • Pre-allocate, i.e. "automatically calculate and pre-assign", IGSNs for new samples inside mDIS.
    • mDIS does this with a built-in feature. The code is called when a new record is created. It works similarly to the Combined ID generation. The mDIS user does not have to do anything.
  • Export IGSN data for samples, cores, etc. into XML text files
    • in a file format suitable for data interchange. These intermediate files are called IGSN reports.
      • They are "XML Instance Documents" and should be saved to disk with the filename extension .xml.
    • The mDIS exporter tries its best to check that these XML documents conform to the XML Schema (.xsd) files published by the IGSN consortium, e.g. for Samples.
      • The accuracy of the check is not guaranteed.
      • The documents may need some manual editing before they can be sent to the IGSN allocator.
    • These documents are the basis for manual IGSN registration.
    • Manual registration is an alternative to mDIS's built-in registration feature (see below). It can be done
      • with a different app than mDIS
      • with command-line tools
  • Register IGSNs with allocation agents.
    • send XML data to an IGSN allocation agent
    • get some "receipt" from an allocation agent (?)
    • wait until the IGSNs are registered by the allocator and become resolvable.
    • THIS IS NOT IMPLEMENTED YET.

Typically this is done by mDIS administrators. For details, see the IGSNs for Sysadmins page in the mDIS documentation. Or read on.
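
To make the export step more concrete, here is a minimal sketch of writing an XML report and running a basic check on it. It is not the mDIS exporter. The element names (`samples`, `sample`, `igsn`, `registrant`, `submissionDate`) are assumptions derived from the metadata items named in this article, not taken from the official IGSN .xsd files, and the registrant and date values are made up. The check only tests well-formedness and non-empty fields; it is not XML Schema validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names taken from the prose, NOT from the official IGSN XSD.
REQUIRED_FIELDS = ("igsn", "registrant", "submissionDate")


def build_report(records):
    """Serialize a list of dicts into one XML text document."""
    root = ET.Element("samples")
    for record in records:
        sample = ET.SubElement(root, "sample")
        for field in REQUIRED_FIELDS:
            ET.SubElement(sample, field).text = record.get(field, "")
    return ET.tostring(root, encoding="unicode")


def check_report(xml_text):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    for position, sample in enumerate(root.findall("sample"), start=1):
        for field in REQUIRED_FIELDS:
            element = sample.find(field)
            if element is None or not (element.text or "").strip():
                problems.append(f"sample {position}: missing {field}")
    return problems


records = [
    {"igsn": "GFFJH00HS", "registrant": "Example Registrant", "submissionDate": "2022-01-01"},
    {"igsn": "GFDUH00N5", "registrant": ""},  # deliberately incomplete record
]
report = build_report(records)
print(check_report(report))
```

A report that passes such a check may still fail validation against the published schema, which is why manual editing can be necessary before sending it to an allocator.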

IGSNs for mDIS administrators

So how are [the DOIs for] IGSNs registered ("minted")? How does mDIS know which DOI-IGSN to assign to a sample?

This is done by mDIS administrators. An mDIS administrator typically starts registering IGSNs after the data entry phase is complete and the project data have reached a certain level of quality. For each registered IGSN, a DOI is allocated.

The mDIS administrator then mints IGSNs by clicking on the "Mint IGSN" button in the mDIS user interface.

THIS WOULD BE A FAKE SCREENSHOT - TODO: add real screenshot

For an end user the "Mint IGSN" button is not visible.

For details, see the IGSNs for Sysadmins page in the mDIS documentation. Or read on.

Recent Developments

  • IGSN Syntax: There exist only guidelines, no formal specification. At least not according to the "Descriptive Schema". (The Descriptive Schema is work in progress and in an unclear state.)
  • The abbreviation IGSN stands for International Geo Sample Number, but it is also known as the International Generic Sample Number.
    "Geo" was renamed to "Generic" in the late 2010s, but the change needs some time to "trickle through" the installed user base and the communities using IGSNs.
  • It is not clear to me which identifiers can be used for IGSNs: yes, "Classic IGSNs" will still be resolvable, but IGSNs shall also receive a DOI, because DOIs are easier to resolve (?). TODO: recent (2021/2022) development, clarify this.
  • Persistence and uniqueness were design goals, but in practice some IGSNs are neither persistent (they have vanished from the internet) nor unique: duplicate IGSNs exist (why? due to bad updates?).
  • There are also Parent IGSNs. What these are, you ask? It's complicated.
  • Datacite:
    • Existing IGSN ID handles will be registered as IGSN ID DOIs, and the handles will be aliased to the DOIs to ensure that they continue to resolve.
      (TBC)

    • IGSN Resolution Example: coming in late 2022? doi:IGSN-123456789 # WRONG? see ...

    • IGSN Resolution Example: GFZ search page for GFZ IGSNs.
      General Example: https://dataservices.gfz-potsdam.de/igsn/esg/index.php?igsn=...

    • You must replace the dots ... with a real IGSN in the ?igsn=... HTTP query string parameter.
      Note that this search page is a custom solution, not part of any standard. You can only get HTML from this page (not XML, JSON or other formats), and you can browse only selected catalogs on the GFZ dataservices page.
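
Filling in the ?igsn=... parameter can be sketched in Python as follows. The function name `gfz_search_url` is made up for this example, and the IGSN used is the one from the resolver examples in this article; whether it is listed in the GFZ catalogs is not checked here, since the sketch only builds the URL and does not fetch it.

```python
from urllib.parse import urlencode

GFZ_SEARCH_PAGE = "https://dataservices.gfz-potsdam.de/igsn/esg/index.php"


def gfz_search_url(igsn):
    """Build the search-page URL with the IGSN as the 'igsn' query parameter."""
    return f"{GFZ_SEARCH_PAGE}?{urlencode({'igsn': igsn})}"


print(gfz_search_url("GFFJH00HS"))
# https://dataservices.gfz-potsdam.de/igsn/esg/index.php?igsn=GFFJH00HS
```

Using `urlencode` keeps the link valid even if an IGSN contains characters that need escaping in a query string.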