Skip to main content

Review of the mDIS default database schema


Review of the mDIS default database schema

Contribution Jan. 2021
by H. Lorenz, Univ. Uppsala

Draft, ICDP-Internal

For now, this is an ICDP-internal document, for discussion and review. It is somewhat internal. It should not be used for reference by the general public.

Slightly edited and formatted for the Web by Knut Behrends, GFZ.
Edits entail highlighting, setting headings, and some Markdown-specific tweaks.

Review of mDIS DB-Schema

Comments on the data model of the GitLab master branch, 2021-01-25.

Table project_expedition

An expedition can have several moratoria. Data in a moratorium come from boreholes. Thus, I suggest moving the moratorium start/end information to project_hole. The only possible conflict with this solution would be if a borehole with a moratorium is deepened later and the data are subject to a new moratorium.

All location information should be given under project_site, because an expedition can have several locations, including land and offshore sites (fields project_location => site_location, type_of_drilling, country, state, county, city, location_description; location names could also be implemented via a gazetteer).

rock_classification and geological_age: is this JET specific? If not, move to project_site. This information can vary from site to site.

Funding agencies can vary from borehole to borehole; move funding_agency to project_hole.

mDIS data model (expedition.png File)

expedition annotated

Table project_site

Much of the information under project_site is actually borehole-specific and does not need to be common for all boreholes on a site.
Move the following information to project_hole:
drilling_method, drilling_method_details, platform_type, platform_name, platform_description, platform_operator, bit_sizes

mDIS data model (site.png File)

site annotated

Table project_hole

ukbgs_* fields shouldn’t be part of the master branch.

Position of the borehole:

  • coordinate_system: use EPSG codes to make the entry unambiguous and the data machine-readable.

  • latitude and longitude: with the option of specifying a coordinate system other than lat/lon, the fields should be called northing and easting.

  • What is core_depth_ccsf, core_depth_csf, drilling_depth_dsf, drilling_depth_drf?
    Even with the comments in the database, I don’t understand. Is this JET specific or part of the basic mDIS? I would stick with the driller’s depth and possibly a corrected depth. There is no need for double fields for below surface and below sea bottom, respectively, because this information is given higher in the hierarchy (type_of_drilling (land, sea, lake), which should be on site level).

  • direction and inclination need a better explanation – what is it? The direction of directional drilling at the surface? Of the line between the surface and the endpoint of the borehole, calculated from borehole survey data? Something else?

  • methods_in_hole is obsolete; implement the respective tables from the legacy DIS: campaign, run, files (already done for COSC, can be imported into the data model).

mDIS data model (hole.png File)

hole annotated

Table core_core

  • IGSN is missing NOT in core_core. - Maybe it is missing in project_site?

  • core_type should be explained as the method of coring, not the method of drilling.

  • drillers_depth and top_depth are redundant. Use top_depth and drop drillers_depth.

  • bottom_depth can be calculated as top_depth + drilled_length. Drop bottom_depth.

  • core_recovery_pc can be calculated as core_recovery/drilled_length*100. Drop core_recovery_pc.

  • What is continuity?
    The explanation for the column doesn’t help me. Is it JET only?
    Is it the condition of the drill core (e.g. like good = no significant disturbance/fracturing, partially fractured = numerous fractures occur, the drill core is still coherent; heavily fractured = the coherence of the drill core cannot be preserved)?
    If yes, move to core_section.

  • rqd_* and igsn_ukbgs are project-specific and shouldn’t be on GitLab master branch.

  • MCD => see separate discussion at the bottom of the page.

mDIS data model (core.png File)

`core_core` annotated

Table core_section

  • top_depth and bottom_depth can be calculated from core_core.top_depth and section_length of sections in the same core. Drop top_depth and bottom_depth.

  • ukbgs_* fields shouldn’t be part of the master branch.

  • MCD => see separate discussion.

mDIS data model (section.png File)

`core_section` annotated

Table curation_section_split

  • Duplicates a lot of information from core_section. Drop all duplicated fields.

Move mscl, xrf, radiographs, photographs, etc. to a separate table (e.g. curation_measurements).
It can then be linked to split sections or directly to the parent section if the core is not split at all or in case the measurements were made before the split.

  • ukbgs_* fields shouldn’t be part of the master branch.

mDIS data model (split.png File)

`curation_section_split` annotated

Table curation_sample

  • bottom = top + interval, drop bottom.

  • Any MCD information (including section_top_mbsf, sample_top_mbsf, sample_bottom_mbsf) is directly derived from MCD information in higher levels and thus can be referenced. Drop all MCD related fields.

  • What is amount? Rename to sample_size? This requires explanation.

  • What is volume? Why is it required in addition to amount?

  • The column scientist duplicates information from sample_request. Drop scientist.

  • approve* is then part of curation_sample_request.

  • ukbgs_* fields shouldn’t be part of the master branch.

mDIS data model (sample.png File)

`curation_sample` annotated

Table sample_request

  • Should this reside under curation, like curation_sample? i.e. curation_sample_request? Else change curation_sample to sample_sample (it’s both, operation and curation).

  • sample_volume and sample_amount, what is it? – same as the question in curation_sample above.

mDIS data model (request.png File)

`sample_request` annotated

Table geology_lithology etc.

These are project-specific and shouldn’t be on GitLab master branch.
We could establish examples according to best practices (use of published vocabularies, etc.) in the template manager but not generate it.

mDIS data model (lithology.png File)

`lithology_geology` - annotated

To-Dos; other Tables

  • Improvements to archive_file (and the process of assigning images to database objects).

  • Decide how to implement core boxes as objects that can be handled by the curation part of the mDIS (maybe clear from AWI order to IG?).

  • Implement controlled vocabularies for all list items.

MCD

`MCD`

What is MCD and how to handle it?

MCD - “metres corrected depth”

If I remember correctly, MCD simply stands for “metres corrected depth”.

Values can be specified at several levels in the hierarchy (core, section, sample) and in different ways (absolute, relative to top_depth).
Thus, all doors are widely open for database inconsistency due to redundant data (changed here but not there) and for scientific mistakes due to different ideas about how to interpret the data (does mcd_offset on section level add to or override mcd_offset on core level?).

mDIS data model (mcd.png File)

`mcd_*` and `curated_*` fields - annotated
We need to answer the following questions:

What is "MCD" in practice?

Where does the value come from? (Or “What was the original idea with it?”)

Problem: MCD and mcd_offset are redundant

(MCD = top_depth + mcd_offset). Which one do we want to keep?

  • Is there a reason for the introduction of MCD at core and section level?
    If yes, what is the reason for it? And what is the original intention to handle this in case there is both a core and a section value for mcd_offset?

  • Can we make the system of MCD corrections simpler and unambiguous?

MCD: My answers and suggestions

  1. In COSC, the drill core depth is corrected to the depth master downhole log (GRmaster). Televiewer imagery was correlated to GRmaster and then, in Corelyzer, the core scans were correlated to the televiewer images. The resulting depth was extracted from Corelyzer and imported to the mDIS as MCD.
    In the Corelyzer file format, the depth imported from mDIS is called MCD, and the present “working-depth” (as drill core is displayed on the screen, possibly after depth shifts by the user) as “depth,” hinting that the original workflows included some other depth correction procedure for MCD and thus, already provided Corelyzer with an MCD from the legacy DIS.

  2. It is good practice to provide absolute values only once (top_depth) and all other values relative to it, i.e. use mcd_offset and drop MCD.

  3. I don’t know, but I could imagine that it was only about unambiguous absolute values in the beginning and the mcd_offset was added later to easily allow users to evaluate the difference between MCD and driller’s depth. If all fields are filled and kept up-to-date at all times, this works. Else it is a mess.

  4. I would reduce MCD to mcd_offset at section level. An offset that affects the entire core run will have the same value for all sections in this core run (no offset, i.e. 0, by default). Offsets of individual sections are added to it. Any depth values on a section (top) can be calculated to driller’s depth (top_depth + top) or MCD (top_depth + mcd_offset + top).

I suggest changing the data model according to my answer to point 4.