mDIS REST API access with R

Developer page

Build your own reports and data analyses. Using the mDIS REST API, you can get mDIS data independently from any predefined forms and reports. All you need is a current login to mDIS.

This tutorial demonstrates how to get data from mDIS with R, and how to analyze some fake data from the ICDP JET Project. This dataset is not currently available for download.

To load a similar dataset, you need to run the seed/example-dump Yii migration. This loads data from the ICDP DSEIS Project drilled in South Africa in 2017. You can also work with your own mDIS data, but then some URLs, URL parameters, axis labels and plot titles must be changed accordingly.

This tutorial does not demonstrate how to edit mDIS datasets (create, delete, duplicate, etc.). Such API calls are available, but this tutorial focuses on the simple case of getting data in read-only mode.
If you need write access to the mDIS REST API, study the JS code in the dis-data-gen repository.

What the R code does

  • log in to mDIS
  • fetch some data
  • convert it to a table (an R data frame)
  • run a simple exploratory analysis

TIP

Download the RMarkdown File directly, for usage in RStudio for example.

Necessary R packages

library(tidyverse)    # data processing + plotting helpers
library(jsonlite)     # read/write JSON
library(httr)         # give R web browsing capabilities
library(kableExtra)   # nicer tables
theme_set(theme_bw()) # ggplot layout theme

Login to mDIS

Collect the credentials and the URL information that says where to get the data from, and assign them to the logindata list.

We read it in as a JSON structure, because the mDIS REST API will return some more such JSON structures later. If you struggle with the following code block, later blocks will be difficult to deal with, too.
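As a warm-up, here is a minimal sketch of how jsonlite turns JSON text into an R list. The host name and the settings object are placeholders invented for illustration; only the two path strings mirror the ones used in this tutorial.

```r
library(jsonlite)

# Hypothetical settings; the host is a placeholder, not a real mDIS server
settings <- fromJSON(txt = '{
  "urlhost": "https://mdis.example.org",
  "urlpaths": ["/api/v1/auth/login", "/api/v1/form"]
}')

str(settings)            # a named list: one string and one character vector
settings$urlpaths[[2]]   # pick the second path by position
```

A JSON object becomes a named list, and a JSON array of strings becomes a character vector. The same access pattern (`$name`, `[[i]]`) is used on the API responses later on.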

# helper: format one "key=value" pair of the query string
my_c <- as_mapper(function(x, y) sprintf("%s=%s", x, y))
expedition <- c("Prees-2" = 4)
site <- c("Prees-2" = 5)
hole <- c("Prees-2" = 2)
# unname() keeps c() from appending ".Prees-2" to the parameter names
qsp <- c(name = "core", "per-page" = 5000, "page" = 0,
         sort = "id",
         "filter[expedition_id]" = unname(expedition), # JET
         "filter[site_id]" = unname(site),             # Prees
         "filter[hole_id]" = unname(hole))             # main hole
qsparams <- map2_chr(names(qsp), qsp, my_c) %>% str_c(collapse = "&")

logindata <- jsonlite::fromJSON(txt = sprintf('{
  "username": "knb",
  "password": "knbpassword",
  "urlhost": "https://jet.rundis.com",
  "urlpaths": [
    "/api/v1/auth/login",
    "/api/v1/form?%s"
  ]
}', qsparams))

#logindata$password = Sys.getenv("MDIS_PW")
#jsonlite::toJSON(logindata)

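You can insert your own password in the "password" field of the JSON template, or you can set it from an environment variable with the logindata$password line near the end of the code block (which is commented out at this time).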
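Reading the password from the environment keeps it out of the script. The following is a minimal sketch: get_credential() is a hypothetical helper, MDIS_USER is an assumed variable name, and only MDIS_PW appears in the tutorial code. The Sys.setenv() call stands in for a variable you would normally set outside of R, for example in ~/.Renviron.

```r
# Hypothetical helper: return an environment variable, or a default if it is unset/empty
get_credential <- function(var, default = NA_character_) {
  value <- Sys.getenv(var, unset = "")
  if (nzchar(value)) value else default
}

# For demonstration only: normally this is set outside the R session
Sys.setenv(MDIS_PW = "example-password")

creds <- list(
  username = get_credential("MDIS_USER", default = "knb"), # assumed variable name
  password = get_credential("MDIS_PW")
)

# Fail early if no password was found
if (is.na(creds$password)) stop("Set MDIS_PW before running this script", call. = FALSE)
```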

To actually connect and fetch data with the logindata, we need to create some helper functions first.

Create Functions

Make an mdis_api function - a single function for login and data fetching. This is suboptimal, but will suffice for now.

If the user is not logged in, the logindata list will not contain a token element. Thus we need to log in first, with the data provided in logindata. We will receive a security token ("Bearer token"), which we will use as a 32-character "temporary password" from now on.

If the user is logged in, logindata$token will have a valid value, e.g. 9nxFYoXoBA7Eb6n1hc-FbwcTqeU5vf3a. Then we can perform HTTP GET requests, sending the token as an extra HTTP header.
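The two cases can be sketched without any network access. needs_login() and bearer_header() are hypothetical helper names used only here, and the token is the example value quoted above. Its length of 32 characters is taken from the text, not from a documented guarantee of the API.

```r
# Example token from the text; real tokens come from the login call
token <- "9nxFYoXoBA7Eb6n1hc-FbwcTqeU5vf3a"

# Hypothetical helper: a login is needed while the list has no token element
needs_login <- function(logindata) is.null(logindata$token)

# Hypothetical helper: named vector holding the extra HTTP header
bearer_header <- function(token) c(Authorization = sprintf("Bearer %s", token))

before <- list(username = "knb")          # state before logging in
after  <- c(before, list(token = token))  # state after storing the token

needs_login(before)   # TRUE
needs_login(after)    # FALSE
bearer_header(token)
```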

# function according to the httr vignette
# returns simple s3 object for easier debugging
# used for login and HTTP-GETting data
mdis_api <- function(logindata, path) {
  url <- modify_url(logindata$urlhost, path = path)
  if(is.null(logindata$token)){
    # pass the user agent unnamed: httr only combines unnamed arguments into the request config
    resp <- POST(url, logindata$ua,
            body = list(username = logindata$username,
                        password = logindata$password))
  } else {
    resp <- GET(url,
                logindata$ua,
                add_headers(
                  c(
                  Authorization = sprintf("Bearer %s", logindata$token))))
  }
  if (http_type(resp) != "application/json") {
    stop(str_c("API did not return JSON!",
               "Invalid mDIS filter setting (= wrong query string params)?",
               sprintf("[%s] <%s>", status_code(resp), resp$url),
               sep = "\n"),
         call. = FALSE)
  }

  parsed <- jsonlite::fromJSON(content(resp, "text"), simplifyVector = FALSE)
  
  if (http_error(resp)) {
    stop(
      sprintf(
        "mDIS API request failed [%s]\n%s\n<%s>",
        status_code(resp),
        parsed[[1]],
        resp$url
      ),
      call. = FALSE
    )
  }
  
  # return simple S3 object, see comment below
  structure(
    list(
      content = parsed,
      path = path,
      response = resp
    ),
    class = "mdis_api"
  )  
}

# Rather than simply returning the response as a list,
# I think it’s a good practice to make a simple S3 object.
# That way you can return the response and parsed object,
# and provide a nice print method.
# This will make debugging later on much much much more pleasant.

print.mdis_api <- function(x, ...) {
  cat("<mDIS ", x$path, ">\n", sep = "")
  str(x$content)
  invisible(x)
}
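
To see what the S3 class and its print method give you, the following sketch builds a response object by hand instead of calling the API. The fake_login object and its token are made up for illustration. The print method is repeated so that the block runs on its own.

```r
# Same print method as in the tutorial, repeated so this sketch is self-contained
print.mdis_api <- function(x, ...) {
  cat("<mDIS ", x$path, ">\n", sep = "")
  str(x$content)
  invisible(x)
}

# Hypothetical object mimicking the return value of a login call
fake_login <- structure(
  list(
    content = list(token = "9nxFYoXoBA7Eb6n1hc-FbwcTqeU5vf3a"),
    path = "/api/v1/auth/login",
    response = NULL
  ),
  class = "mdis_api"
)

print(fake_login)                             # dispatches to print.mdis_api
printed <- capture.output(print(fake_login))  # same output, captured as text
```

The first printed line shows the request path, followed by the structure of the parsed content, which is usually all you need when a request returns something unexpected.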

Actually log in: get the token and add it to the logindata list.

# good practice: to identify yourself as user agent, ua
logindata$ua = user_agent("http://github.com/knbknb")

loggedin <- mdis_api(logindata, logindata$urlpaths[[1]])

logindata$token <- loggedin$content$token

Fetch data

Make an API request to /api/v1/form with the query parameters name=core, per-page=5000, page=0, sort=id, filter[expedition_id]=4, filter[site_id]=5 and filter[hole_id]=2 to get all cores of the JET drillhole "Prees-2" (site 5, hole 2).
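For reference, a base R sketch of how such a request path can be assembled from a named vector. The variable names query, request_path and nested are chosen for this example only.

```r
# Query parameters as a named vector; c() coerces the numbers to character
qsp <- c(name = "core", "per-page" = 5000, page = 0, sort = "id",
         "filter[expedition_id]" = 4,
         "filter[site_id]" = 5,
         "filter[hole_id]" = 2)

# Join each name with its value, then join the pairs with "&"
query <- paste(names(qsp), qsp, sep = "=", collapse = "&")
request_path <- paste0("/api/v1/form?", query)
request_path

# Caveat: if a value is itself a named vector, c() glues both names together
nested <- c("filter[hole_id]" = c("Prees-2" = 2))
names(nested)   # "filter[hole_id].Prees-2"
```

Wrapping such values in unname() avoids the extra suffix in the parameter name.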

parsed <- mdis_api(logindata, logindata$urlpaths[[2]])

JSON often contains JavaScript null values. In R, the jsonlite package converts these to R's built-in NULL. However in R, we prefer NA values instead. NA values are easier to work with than NULL objects.
Hence, replace NULLs with NAs in list parsed from JSON, and put items in an R data frame, cores_df.
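The idea can be tried on a tiny hand-made list first. This is a base R sketch with made-up records; null_to_na() is a hypothetical helper name, and the records only imitate the shape of the items returned by the API.

```r
# Two made-up records, as they might look after parsing JSON without simplification
items <- list(
  list(id = 1L, comments = NULL),
  list(id = 2L, comments = "check drillers depth")
)

# Hypothetical helper: replace every NULL field of one record with NA
null_to_na <- function(record) {
  lapply(record, function(value) if (is.null(value)) NA else value)
}

cleaned <- lapply(items, null_to_na)

# Every record now has a value for every field, so fields can be collected into vectors
comments <- vapply(cleaned, function(r) as.character(r$comments), character(1))
comments
```

Unlike NULL, an NA keeps its slot in a vector, which is why the cleaned records can be stacked into the rows of a data frame.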

cores <- as.list(parsed)$content$items %>%
  map(function(x) map(x, function(y) ifelse(is.null(y), NA, y)))

cores_df <- map_df(cores, as_tibble, .name_repair = "minimal")

cores_df <- cores_df %>%
  mutate(core_ondeck = as.POSIXct(core_ondeck))

The data

What does the table/data frame look like?

We got a 50x35 table from the mDIS REST API.

Column names, data types, some example values:

#skimr::skim(cores_df)
glimpse(cores_df)
## Rows: 50
## Columns: 35
## $ core_type        <chr> "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R…
## $ mcd_offset       <dbl> 1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.…
## $ rqd_abundance    <chr> "0", "<25%", "abundant", "abundant", "abundant", "abundant", "abundant", "abundant", "abunda…
## $ id               <int> 2, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 30, 31, 33, 35, 36, 37, 39, 40, 4…
## $ hole_id          <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ core             <int> 1, 2, 3, 5, 6, 4, 13, 8, 14, 11, 9, 12, 10, 7, 16, 30, 32, 33, 35, 37, 38, 39, 40, 41, 42, 4…
## $ combined_id      <chr> "5065_1_A_1", "5065_1_A_2", "5065_1_A_3", "5065_1_A_5", "5065_1_A_6", "5065_1_A_4", "5065_1_…
## $ analyst          <chr> "KH", "KH", "CK", "KH", "KH", "KB", "KB", "KB", "KH", "CK", "KH", "KH", "KB", "KH", "CK", "C…
## $ core_ondeck      <dttm> 2020-09-04 10:30:00, 2020-06-22 05:13:00, 2020-06-22 06:18:00, 2020-06-22 07:41:00, 2020-06…
## $ top_depth        <dbl> 0.00, 3.00, 3.32, 5.05, 7.28, 4.74, 16.65, 9.82, 17.09, 13.47, 10.42, 15.31, 13.03, 8.47, 18…
## $ drilled_length   <dbl> 3.00, 0.32, 1.42, 2.23, 1.19, 0.31, 0.44, 0.60, 0.53, 1.84, 2.61, 1.34, 0.44, 1.35, 1.69, 0.…
## $ bottom_depth     <dbl> 3.00, 3.32, 4.74, 7.28, 8.47, 5.05, 17.09, 10.42, 17.62, 15.31, 13.03, 16.65, 13.47, 9.82, 2…
## $ core_recovery    <dbl> 3.0000, 0.3136, 1.3206, 2.0500, 0.8000, 0.2900, 0.4100, 0.5900, 0.0700, 0.8600, 1.6400, 1.29…
## $ core_recovery_pc <int> 100, 98, 93, 92, 67, 94, 93, 98, 13, 47, 63, 96, 91, 73, 91, 0, 0, 98, 98, 84, 91, 93, 83, 9…
## $ continuity       <chr> "continuous", "fractures", "continuous", "continuous", "rubble", "fractures", "rubble", "bro…
## $ last_section     <int> 3, 1, 3, 2, 4, 1, 1, 2, 4, 3, 4, 4, 1, 4, 2, 1, 2, 1, 2, 2, 4, 4, 2, 2, 2, 3, 2, 2, 3, 2, 2,…
## $ core_catcher     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ core_diameter    <chr> "HQ", "HQ", "AQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "H…
## $ oriented         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ core_loss_reason <chr> NA, "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "unknown", "unknown", "-", "-", "-", "…
## $ rqd_intensity    <chr> NA, "0", "1", "2", "intense", "intense", "intense", "intense", "intense", "intense", "intens…
## $ comments         <chr> NA, "Elissa Kerluke PhD", "Electa Hodkiewicz", "Electa Hodkiewicz has \"borrowed\" it", "Ala…
## $ igsn             <chr> "ICDP5065EC20001", "ICDP5065ECE0001", "ICDP5065ECF0001", "ICDP5065ECG0001", "ICDP5065ECH0001…
## $ fluid_type       <chr> NA, "fresh", "salty", "fresh", "fresh", "fresh", "fresh", "fresh", "fresh", "fresh", "fresh"…
## $ bit_type         <chr> NA, "used", "used", "used", "used", "used", "used", "used", "used", "used", "used", "used", …
## $ barrel_length    <int> 6, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ drillers_depth   <dbl> NA, 3.314, 4.642, 7.200, 8.080, 5.030, 17.060, 10.410, 17.160, 14.330, 12.060, 16.600, 13.43…
## $ comments_2       <chr> NA, "check drillers depth", "check drillers depth", "check drillers depth?", "check drillers…
## $ mcd_top_depth    <int> 1, 3, 3, 5, 7, 4, 16, 9, 17, 13, 10, 15, 13, 8, 18, 21, 25, 28, 31, 35, 38, 44, 44, 44, 44, …
## $ methods_core     <list> ["MSCL", "MSCL", "MSCL", "MSCL", "MSCL", "MSCL", "MSCL", "Core Section Scan", "MSCL", "MSCL…
## $ igsn_ukbgs       <chr> "UK.BGS.SJ53SE52.1", "UK.BGS.SJ53SE52.002", "UK.BGS.SJ53SE52.003", NA, NA, NA, NA, NA, NA, N…
## $ site_id          <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
## $ expedition_id    <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ program_id       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ archive_files    <list> [[4, 5, 2, 2], [4, 5, 2, 14], [4, 5, 2, 15], [4, 5, 2, 16], [4, 5, 2, 17], [4, 5, 2, 18], […

A table view would be very, very wide so we list only the column definitions above, with some sample values.

Analysis

Do some descriptive statistics with the drillcore data.

Core Diameter Counts

We are studying drillhole 2, site 5 (ICDP JET project).
How "deep" did they drill?
(Remember, these are fake data)

subtitle <- str_c("(fake data:) JET Core diameters (mm), Hole ", hole["Prees-2"], ", site ", site["Prees-2"], collapse = " ")
cores_df %>%
  group_by(core_diameter) %>%
  summarize(`max_depth (m)`= max(bottom_depth), n_Cores = n(), .groups = "keep")  %>%
  arrange(desc(core_diameter)) %>%
  ungroup() %>% 
  knitr::kable(caption = subtitle, col.names = names(.)) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped"))
(fake data:) JET Core diameters (mm), Hole 2, site 5

core_diameter   max_depth (m)   n_Cores
TEMP            117.36          1
PQ              111.36          2
HQ              104.36          46
AQ              4.74            1

Who are the most active analysts/geologists?

cores_df %>%
  count(analyst, sort =TRUE) %>%
  knitr::kable(caption = "Initials of active curators in the JET project", col.names = names(.)) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped"))
Initials of active curators in the JET project

analyst   n
KH        20
KB        15
CK        12
TG        2
SPH       1

Core loss

Tabulating core loss

table(as.Date(cores_df$core_ondeck), cores_df$core_loss_reason) %>%
  kable(caption = "When were JET cores lost? How many, and why?") %>%
  kableExtra::kable_styling(bootstrap_options = c("striped"))
When were JET cores lost? How many, and why?

date         -    fallback   unknown
2020-06-22   10   0          2
2020-06-23   2    0          0
2020-07-13   5    1          1
2020-07-14   17   2          3
2020-07-15   3    0          0
2020-09-04   0    0          0
2020-09-28   0    0          0
2020-09-29   0    0          0

The second column ("-") contains the number of cores recovered on that day.
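Rows that contain only zeros can appear when the loss reason of a core is NA: by default, table() keeps the date as a row but does not count the NA. Whether this explains the September rows above is an assumption, although the first record in the glimpse() output (on deck 2020-09-04, loss reason NA) fits it. A small sketch with made-up values:

```r
# Three made-up cores: two with a loss reason, one with a missing reason
ondeck <- as.Date(c("2020-06-22", "2020-06-22", "2020-09-04"))
reason <- c("-", "unknown", NA)

# Default: NA reasons are dropped, so 2020-09-04 becomes an all-zero row
tab_default <- table(ondeck, reason)
tab_default

# useNA = "ifany" adds an NA column, so the third core is counted
tab_with_na <- table(ondeck, reason, useNA = "ifany")
tab_with_na
```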

Core recovery in percent

See figure.

cores_df %>% ggplot(aes(core_ondeck, core_recovery_pc/100, color=analyst)) +
  geom_jitter(alpha = 0.5) +
  geom_hline(yintercept = 1, color = "brown", linetype = 2, alpha = 0.5) +
  labs(title = "Fake Data (!): Prees-2 Core Recovery in Percent",
       subtitle = subtitle,
       y = "% Core Recovered",
       x = "Day of Recovery (2020)") +
  scale_y_continuous(labels = scales::percent_format(accuracy = NULL),  breaks = seq(0, 1.2, 0.1))

Values larger than 100% are possibly due to solid core material that broke off slightly below the core barrel: sometimes cores are retrieved with a bit of extra rock sticking out of the pipe.
Alternatively, it might be elastic material (mud) decompressing and expanding in a plastic liner.
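Such cores are easy to flag. The sketch below assumes that core_recovery_pc is the rounded ratio of core_recovery to drilled_length. The tutorial does not state this, but the first five rows of the glimpse() output are consistent with it. The sixth core is a hypothetical value added to show a recovery above 100%.

```r
# First five cores from the glimpse() output, plus one hypothetical over-recovered core
drilled_length <- c(3.00, 0.32, 1.42, 2.23, 1.19, 1.00)
core_recovery  <- c(3.0000, 0.3136, 1.3206, 2.0500, 0.8000, 1.05)

# Assumed formula: recovered length as a percentage of the drilled length
recovery_pc <- round(100 * core_recovery / drilled_length)
recovery_pc

# Positions of cores with more material recovered than was drilled
over_recovered <- which(recovery_pc > 100)
over_recovered
```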

Many more analyses are now possible. These were just illustrations.

REST API

Documentation

See REST API page for extensive documentation.
