mDIS REST API access with R
Build your own reports and data analyses. Using the mDIS REST API, you can get mDIS data independently of any predefined forms and reports. All you need is a valid mDIS login.
This tutorial demonstrates how to get data from mDIS with R, and how to analyze some fake data from the ICDP JET Project. This dataset cannot be downloaded at this time.
To load a similar dataset, run the seed/example-dump Yii migration. This loads data from the ICDP DSEIS Project, drilled in South Africa in 2017. You can also work with your own mDIS data, but then some URLs, URL parameters, axis labels, and plot titles must be changed accordingly.
This tutorial does not demonstrate how to edit mDIS datasets (create, delete, duplicate, etc.). Such API calls are available, but this tutorial focuses on the simple case of getting data in a read-only mode.
If you want to learn how to get write access to the mDIS REST API, study the JS code in the dis-data-gen repository.
What the R code does
- log in to mDIS
- fetch some data
- convert it to a table (an R data frame)
- run a simple exploratory analysis
Tips
Download the R Markdown file directly, e.g. for use in RStudio.
Necessary R packages
library(tidyverse) # data processing + plotting helpers
library(jsonlite) # read/write JSON
library(httr) # HTTP requests (GET, POST) from R
library(kableExtra) # nicer tables
theme_set(theme_bw()) # ggplot layout theme
Login to mDIS
Put the credentials and the URL information (where to get the data from) into the logindata list.
We define it as a JSON structure because the mDIS REST API will return more JSON structures later. If you struggle with the following code block, later blocks will be difficult to deal with, too.
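# helper: turn a name/value pair into a "name=value" string
my_c <- as_mapper(function(x, y) sprintf("%s=%s", x, y))
expedition <- c("Prees-2" = 4)
site <- c("Prees-2" = 5)
hole <- c("Prees-2" = 2)
# unname() stops c() from appending the vector name to the
# parameter name (which would give "filter[expedition_id].Prees-2")
qsp <- c(name = "core", "per-page" = 5000, "page" = 0,
sort = "id",
"filter[expedition_id]" = unname(expedition), # JET
"filter[site_id]" = unname(site), # Prees
"filter[hole_id]" = unname(hole)) # main hole
qsparams <- map2_chr(names(qsp), qsp, my_c) %>% str_c(collapse = "&")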
logindata <- jsonlite::fromJSON(txt = sprintf('{
"username": "knb",
"password": "knbpassword",
"urlhost": "https://jet.rundis.com",
"urlpaths": [
"/api/v1/auth/login",
"/api/v1/form?%s"
]
}', qsparams))
#logindata$password = Sys.getenv("MDIS_PW")
#jsonlite::toJSON(logindata)
You can insert your own password in the "password" field of the JSON string above, or read it from the MDIS_PW environment variable by uncommenting the Sys.getenv() line (commented out at this time).
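Rather than typing the password into the R Markdown file, you can keep both credentials in environment variables, for example in ~/.Renviron. This is a minimal sketch. MDIS_PW is the variable name used in the commented-out line above. MDIS_USER is an assumed name for the user name, and the helper mdis_credentials() is hypothetical.

```r
# Hypothetical helper: read mDIS credentials from environment variables,
# so the password never has to appear in the R Markdown file.
# MDIS_PW is the name used in the tutorial; MDIS_USER is an assumed name.
mdis_credentials <- function(user_var = "MDIS_USER", pw_var = "MDIS_PW") {
  user <- Sys.getenv(user_var)  # returns "" if the variable is unset
  pw <- Sys.getenv(pw_var)
  if (!nzchar(user) || !nzchar(pw)) {
    stop(sprintf("Please set %s and %s, e.g. in ~/.Renviron", user_var, pw_var),
         call. = FALSE)
  }
  list(username = user, password = pw)
}

# Usage: overwrite the placeholder credentials in logindata
# logindata <- modifyList(logindata, mdis_credentials())
```

Failing loudly when a variable is missing avoids a confusing login error from the server later on.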
To actually connect and fetch data with the logindata list, we need to create some helper functions first.
Create Functions
Make an mdis_api function: a single function for login and data fetching. This is suboptimal, but it will suffice for now.
If the user is not logged in, the logindata list does not contain a token element. Thus we need to log in first, with the credentials provided in logindata. We then receive a security token ("Bearer token"), a 32-character string that we use as a "temporary password" from then on.
If the user is logged in, logindata$token has a valid value, e.g. 9nxFYoXoBA7Eb6n1hc-FbwcTqeU5vf3a. Then we can perform HTTP GET requests, sending the token in an extra HTTP header (Authorization: Bearer followed by the token).
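The per-request settings described above (user agent plus bearer token) can be bundled into one httr config object. This is a minimal sketch. mdis_auth_config() is a hypothetical helper, not part of httr or mDIS. Note that httr expects config objects such as user_agent() and add_headers() as unnamed arguments of GET() and POST().

```r
library(httr)

# Hypothetical helper: combine the user agent and the bearer token
# into a single httr request config (c() merges httr config objects).
mdis_auth_config <- function(token, ua = user_agent("http://github.com/knbknb")) {
  c(ua, add_headers(Authorization = sprintf("Bearer %s", token)))
}

# Example token from the text above
cfg <- mdis_auth_config("9nxFYoXoBA7Eb6n1hc-FbwcTqeU5vf3a")
cfg$headers            # the Authorization header that will be sent
cfg$options$useragent  # the user agent string

# Usage (needs a real token and server):
# resp <- GET(url, mdis_auth_config(logindata$token, logindata$ua))
```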
# function according to the httr vignette
# returns simple s3 object for easier debugging
# used for login and HTTP-GETting data
mdis_api <- function(logindata, path) {
url <- modify_url(logindata$urlhost, path = path)
if (is.null(logindata$token)) {
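# config objects such as the user agent must be passed unnamed;
# a named "ua =" argument would not be applied as a request setting
resp <- POST(url, logindata$ua,
body = list(username = logindata$username,
password = logindata$password))
} else {
resp <- GET(url,
logindata$ua,
add_headers(
Authorization = sprintf("Bearer %s", logindata$token)))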
}
if (http_type(resp) != "application/json") {
stop(str_c("API did not return JSON!",
"Invalid mDIS filter setting (= wrong query string params)?",
resp$url,
sep = "\n"),
call. = FALSE)
}
parsed <- jsonlite::fromJSON(content(resp, "text"), simplifyVector = FALSE)
if (http_error(resp)) {
stop(
sprintf(
"mDIS API request failed [%s]\n%s\n<%s>",
status_code(resp),
parsed[[1]],
resp$url
),
call. = FALSE
)
}
# return simple S3 object, see comment below
structure(
list(
content = parsed,
path = path,
response = resp
),
class = "mdis_api"
)
}
# Rather than simply returning the response as a list,
# I think it’s a good practice to make a simple S3 object.
# That way you can return the response and parsed object,
# and provide a nice print method.
# This will make debugging later on much, much more pleasant.
print.mdis_api <- function(x, ...) {
cat("<mDIS ", x$path, ">\n", sep = "")
str(x$content)
invisible(x)
}
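To see what the print method shows without contacting a server, you can build an mdis_api object by hand. In this sketch the content is a placeholder login response holding the example token from above. A real mDIS response may contain more fields.

```r
# Same print method as above, repeated so this block runs on its own
print.mdis_api <- function(x, ...) {
  cat("<mDIS ", x$path, ">\n", sep = "")
  str(x$content)
  invisible(x)
}

# Hand-made object with the same structure that mdis_api() returns;
# the content is a placeholder, not a real API response
fake <- structure(
  list(
    content = list(token = "9nxFYoXoBA7Eb6n1hc-FbwcTqeU5vf3a"),
    path = "/api/v1/auth/login",
    response = NULL
  ),
  class = "mdis_api"
)
print(fake)  # first line: <mDIS /api/v1/auth/login>
```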
Now actually log in: get the token and add it to the logindata list.
# good practice: identify yourself with a user agent (ua)
logindata$ua = user_agent("http://github.com/knbknb")
loggedin <- mdis_api(logindata, logindata$urlpaths[[1]])
logindata$token <- loggedin$content$token
Fetch data
Make an API request to /api/v1/form with the query string qsparams built above (name=core, per-page=5000, page=0, sort=id, plus filters for expedition, site, and hole). This gets all cores of the JET drillhole "Prees-2" (site 5, hole 2).
parsed <- mdis_api(logindata, logindata$urlpaths[[2]])
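The query string used here is assembled by hand with map2_chr() and pasted into the path. As an alternative sketch, httr::modify_url() (the same function mdis_api() uses) can build it from a named list and percent-encode the square brackets in the filter names. This assumes the mDIS server decodes the encoded brackets like a standard PHP query parser. The IDs are those of the Prees-2 hole used above.

```r
library(httr)

# Build the /api/v1/form URL from a named list instead of a hand-made string.
# Values are the JET / Prees-2 IDs used above (expedition 4, site 5, hole 2).
form_url <- modify_url(
  "https://jet.rundis.com",
  path = "/api/v1/form",
  query = list(
    name = "core", "per-page" = 5000, page = 0, sort = "id",
    "filter[expedition_id]" = 4,
    "filter[site_id]" = 5,
    "filter[hole_id]" = 2
  )
)
form_url
# The brackets come out percent-encoded, e.g. filter%5Bexpedition_id%5D=4

# Usage (needs a valid token in logindata):
# resp <- GET(form_url, logindata$ua,
#             add_headers(Authorization = sprintf("Bearer %s", logindata$token)))
```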
JSON often contains JavaScript null values. With simplifyVector = FALSE, the jsonlite package converts these to R's built-in NULL. In R, however, we prefer NA values, which are easier to work with than NULL objects: a NULL has length zero and silently disappears from vectors built with c(), whereas an NA keeps its place as a missing value.
Hence, replace NULLs with NAs in the list parsed from JSON, and put the items into an R data frame, cores_df.
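A tiny example shows the null-to-NA step in isolation. The JSON string is made up for illustration (field names borrowed from the cores table); it only mimics the shape of the items array returned by the API.

```r
library(jsonlite)
library(purrr)
library(dplyr)
library(tibble)

# Made-up JSON with the same shape as the API's "items" array
json <- '{"items": [{"id": 2, "core_loss_reason": null},
                    {"id": 14, "core_loss_reason": "unknown"}]}'
items <- fromJSON(json, simplifyVector = FALSE)$items
is.null(items[[1]]$core_loss_reason)  # JSON null became R NULL

# Replace NULL by NA in every record, then bind the records into a tibble
items_na <- map(items, function(x) map(x, function(y) if (is.null(y)) NA else y))
df <- bind_rows(map(items_na, as_tibble))
df
```

The NA from the first record is combined with the character value of the second, so core_loss_reason ends up as a character column with one missing value.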
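# replace NULL (JSON null) with NA; wrap multi-valued fields in list()
# so that every core stays a single row in the data frame
null_to_na <- function(y) if (is.null(y)) NA else if (length(y) > 1) list(y) else y
cores <- as.list(parsed)$content$items %>%
map(function(x) map(x, null_to_na))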
cores_df <- map_df(cores, as_tibble, .name_repair = "minimal")
cores_df <- cores_df %>%
mutate(core_ondeck = as.POSIXct(core_ondeck))
The data
What does the table/data frame look like?
We got a 50x35 table from the mDIS REST API.
Column names, data types, some example values:
#skimr::skim(cores_df)
glimpse(cores_df)
## Rows: 50
## Columns: 35
## $ core_type <chr> "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R", "R…
## $ mcd_offset <dbl> 1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0…
## $ rqd_abundance <chr> "0", "<25%", "abundant", "abundant", "abundant", "abundant", "abundant", "abundant", "abunda…
## $ id <int> 2, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 30, 31, 33, 35, 36, 37, 39, 40, 4…
## $ hole_id <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ core <int> 1, 2, 3, 5, 6, 4, 13, 8, 14, 11, 9, 12, 10, 7, 16, 30, 32, 33, 35, 37, 38, 39, 40, 41, 42, 4…
## $ combined_id <chr> "5065_1_A_1", "5065_1_A_2", "5065_1_A_3", "5065_1_A_5", "5065_1_A_6", "5065_1_A_4", "5065_1_…
## $ analyst <chr> "KH", "KH", "CK", "KH", "KH", "KB", "KB", "KB", "KH", "CK", "KH", "KH", "KB", "KH", "CK", "C…
## $ core_ondeck <dttm> 2020-09-04 10:30:00, 2020-06-22 05:13:00, 2020-06-22 06:18:00, 2020-06-22 07:41:00, 2020-06…
## $ top_depth <dbl> 0.00, 3.00, 3.32, 5.05, 7.28, 4.74, 16.65, 9.82, 17.09, 13.47, 10.42, 15.31, 13.03, 8.47, 18…
## $ drilled_length <dbl> 3.00, 0.32, 1.42, 2.23, 1.19, 0.31, 0.44, 0.60, 0.53, 1.84, 2.61, 1.34, 0.44, 1.35, 1.69, 0.…
## $ bottom_depth <dbl> 3.00, 3.32, 4.74, 7.28, 8.47, 5.05, 17.09, 10.42, 17.62, 15.31, 13.03, 16.65, 13.47, 9.82, 2…
## $ core_recovery <dbl> 3.0000, 0.3136, 1.3206, 2.0500, 0.8000, 0.2900, 0.4100, 0.5900, 0.0700, 0.8600, 1.6400, 1.29…
## $ core_recovery_pc <int> 100, 98, 93, 92, 67, 94, 93, 98, 13, 47, 63, 96, 91, 73, 91, 0, 0, 98, 98, 84, 91, 93, 83, 9…
## $ continuity <chr> "continuous", "fractures", "continuous", "continuous", "rubble", "fractures", "rubble", "bro…
## $ last_section <int> 3, 1, 3, 2, 4, 1, 1, 2, 4, 3, 4, 4, 1, 4, 2, 1, 2, 1, 2, 2, 4, 4, 2, 2, 2, 3, 2, 2, 3, 2, 2,…
## $ core_catcher <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ core_diameter <chr> "HQ", "HQ", "AQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "HQ", "H…
## $ oriented <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ core_loss_reason <chr> NA, "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "unknown", "unknown", "-", "-", "-", "…
## $ rqd_intensity <chr> NA, "0", "1", "2", "intense", "intense", "intense", "intense", "intense", "intense", "intens…
## $ comments <chr> NA, "Elissa Kerluke PhD", "Electa Hodkiewicz", "Electa Hodkiewicz has \"borrowed\" it", "Ala…
## $ igsn <chr> "ICDP5065EC20001", "ICDP5065ECE0001", "ICDP5065ECF0001", "ICDP5065ECG0001", "ICDP5065ECH0001…
## $ fluid_type <chr> NA, "fresh", "salty", "fresh", "fresh", "fresh", "fresh", "fresh", "fresh", "fresh", "fresh"…
## $ bit_type <chr> NA, "used", "used", "used", "used", "used", "used", "used", "used", "used", "used", "used", …
## $ barrel_length <int> 6, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ drillers_depth <dbl> NA, 3.314, 4.642, 7.200, 8.080, 5.030, 17.060, 10.410, 17.160, 14.330, 12.060, 16.600, 13.43…
## $ comments_2 <chr> NA, "check drillers depth", "check drillers depth", "check drillers depth?", "check drillers…
## $ mcd_top_depth <int> 1, 3, 3, 5, 7, 4, 16, 9, 17, 13, 10, 15, 13, 8, 18, 21, 25, 28, 31, 35, 38, 44, 44, 44, 44, …
## $ methods_core <list> ["MSCL", "MSCL", "MSCL", "MSCL", "MSCL", "MSCL", "MSCL", "Core Section Scan", "MSCL", "MSCL…
## $ igsn_ukbgs <chr> "UK.BGS.SJ53SE52.1", "UK.BGS.SJ53SE52.002", "UK.BGS.SJ53SE52.003", NA, NA, NA, NA, NA, NA, N…
## $ site_id <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
## $ expedition_id <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ program_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ archive_files <list> [[4, 5, 2, 2], [4, 5, 2, 14], [4, 5, 2, 15], [4, 5, 2, 16], [4, 5, 2, 17], [4, 5, 2, 18], […
A table view would be very wide, so above we list only the column names and data types, with some sample values.
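The request asked for up to 5000 items per page (per-page = 5000, page = 0) and 50 came back, so this hole fits on one page. For larger queries, a quick check can warn when a page is full and more items may be waiting on the next page. The helper check_page_full() is hypothetical, and how mDIS numbers the pages after page 0 is not covered here.

```r
# Hypothetical helper: warn if the number of returned items reaches the
# page size, which suggests that further pages may exist.
check_page_full <- function(items, per_page) {
  n <- length(items)
  if (n >= per_page) {
    warning(sprintf("Got %d items with per-page = %d; there may be more pages.",
                    n, per_page), call. = FALSE)
  }
  invisible(n)
}

# Usage with the response from above:
# check_page_full(parsed$content$items, per_page = 5000)
```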
Analysis
Compute some descriptive statistics from the drill core data.
Core Diameter Counts
We are studying drillhole 2, site 5 (ICDP JET project).
How "deep" did they drill, and with which core diameters?
(Remember, these are fake data.)
subtitle <- str_c("(fake data) JET core diameters, hole ", hole[["Prees-2"]], ", site ", site[["Prees-2"]])
cores_df %>%
group_by(core_diameter) %>%
summarize(`max_depth (m)` = max(bottom_depth), n_Cores = n(), .groups = "keep") %>%
arrange(desc(core_diameter)) %>%
ungroup() %>%
knitr::kable(caption = subtitle, col.names = names(.)) %>%
kableExtra::kable_styling(bootstrap_options = c("striped"))
core_diameter | max_depth (m) | n_Cores |
---|---|---|
TEMP | 117.36 | 1 |
PQ | 111.36 | 2 |
HQ | 104.36 | 46 |
AQ | 4.74 | 1 |
Who are the most active analysts/geologists?
cores_df %>%
count(analyst, sort = TRUE) %>%
knitr::kable(caption = "Initials of active curators in the JET project", col.names = names(.)) %>%
kableExtra::kable_styling(bootstrap_options = c("striped"))
analyst | n |
---|---|
KH | 20 |
KB | 15 |
CK | 12 |
TG | 2 |
SPH | 1 |
Core loss
Tabulating core loss
table(as.Date(cores_df$core_ondeck), cores_df$core_loss_reason) %>%
knitr::kable(caption = "When were JET cores lost? How many, and why?") %>%
kableExtra::kable_styling(bootstrap_options = c("striped"))
date | - | fallback | unknown |
---|---|---|---|
2020-06-22 | 10 | 0 | 2 |
2020-06-23 | 2 | 0 | 0 |
2020-07-13 | 5 | 1 | 1 |
2020-07-14 | 17 | 2 | 3 |
2020-07-15 | 3 | 0 | 0 |
2020-09-04 | 0 | 0 | 0 |
2020-09-28 | 0 | 0 | 0 |
2020-09-29 | 0 | 0 | 0 |
The "-" column (no core-loss reason recorded) contains the number of cores recovered on each day.
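Note that table() silently drops NA values, so cores whose core_loss_reason is missing (such as the first core in the glimpse output above) do not appear in the table. A minimal sketch with three cores copied from that output shows the difference useNA = "ifany" makes. The time zone is set to UTC here, which is an assumption, so that the dates do not depend on the local time zone.

```r
# Three cores taken from the glimpse() output above;
# tz = "UTC" is assumed so as.Date() gives the same day everywhere
ondeck <- as.POSIXct(c("2020-09-04 10:30:00", "2020-06-22 05:13:00",
                       "2020-06-22 06:18:00"), tz = "UTC")
reason <- c(NA, "-", "-")

# Default: the core with a missing reason is dropped
tab_default <- table(day = as.Date(ondeck), reason = reason)
# useNA = "ifany" adds an <NA> column, so every core is counted
tab_all <- table(day = as.Date(ondeck), reason = reason, useNA = "ifany")
sum(tab_default)  # 2
sum(tab_all)      # 3
```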
Core recovery in percent
See figure.
cores_df %>% ggplot(aes(core_ondeck, core_recovery_pc/100, color=analyst)) +
geom_jitter(alpha = 0.5) +
geom_hline(yintercept = 1, color = "brown", linetype = 2, alpha = 0.5) +
labs(title = "Fake Data (!): Prees-2 Core Recovery in Percent",
subtitle = subtitle,
y = "% Core Recovered",
x = "Day of Recovery (2020)") +
scale_y_continuous(labels = scales::percent_format(accuracy = NULL), breaks = seq(0, 1.2, 0.1))
Values larger than 100% may be due to solid core material that broke off slightly below the core barrel: sometimes cores are retrieved with a bit of extra rock sticking out of the pipe.
Alternatively, elastic material (mud) may decompress and expand in the plastic liner.
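Judging from the glimpse() output above, the derived columns seem to follow two simple relations:

- bottom_depth = top_depth + drilled_length
- core_recovery_pc = round(100 * core_recovery / drilled_length)

These relations are inferred from the sample values, not taken from mDIS documentation. Under that assumption, a quick check can flag inconsistent rows and recovery above 100%. The function check_cores() is hypothetical, and rows with missing values yield NA flags.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: recompute derived columns and flag mismatches.
# The relations are inferred from the sample values, not from mDIS docs.
check_cores <- function(df, tol = 1e-6) {
  df %>%
    mutate(
      depth_ok = abs(top_depth + drilled_length - bottom_depth) < tol,
      pc_ok = round(100 * core_recovery / drilled_length) == core_recovery_pc,
      over_100 = core_recovery > drilled_length  # recovery above 100%
    )
}

# First three cores from the glimpse() output above
sample_cores <- tibble(
  top_depth        = c(0.00, 3.00, 3.32),
  drilled_length   = c(3.00, 0.32, 1.42),
  bottom_depth     = c(3.00, 3.32, 4.74),
  core_recovery    = c(3.0000, 0.3136, 1.3206),
  core_recovery_pc = c(100L, 98L, 93L)
)
check_cores(sample_cores)

# With the full data:
# check_cores(cores_df) %>% filter(!depth_ok | !pc_ok | over_100)
```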
Many more analyses are now possible. These were just illustrations.
REST API
Documentation
See the REST API page for extensive documentation.