(Detailed document rendering options not shown)
Get metadata formats
Metadata_formats are: dif, oai_datacite, iso19139, oai_dc, datacite.
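The code that produced this list is not shown. As a minimal sketch, assuming the list comes from an OAI-PMH `verb=ListMetadataFormats` request, the prefixes could be extracted from the XML response as below. The sample response is hypothetical and shortened, and the function name `metadata_prefixes` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, shortened response body (a real one would be fetched from
# the repository endpoint with ?verb=ListMetadataFormats).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>datacite</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
""".strip()


def metadata_prefixes(xml_text):
    """Return all metadataPrefix values found in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:metadataPrefix", OAI_NS)]


print("Metadata formats are:", ", ".join(metadata_prefixes(SAMPLE_RESPONSE)))
```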
Calculate a data frame of year intervals
# | row | year | doy_first | doy_last |
---|---|---|---|---|
1 | 1 | 2009 | 2009-01-01 | 2009-12-31 |
2 | 2 | 2010 | 2010-01-01 | 2010-12-31 |
3 | 3 | 2011 | 2011-01-01 | 2011-12-31 |
4 | 12 | 2020 | 2020-01-01 | 2020-12-31 |
5 | 13 | 2021 | 2021-01-01 | 2021-12-31 |
6 | 14 | 2022 | 2022-01-01 | 2022-12-31 |
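A table like this one can be generated with a few lines of code. The following is a sketch in Python, not the original implementation; the function name `year_intervals` is hypothetical, and the column names follow the table.

```python
from datetime import date


def year_intervals(first_year, last_year):
    """Return one row per calendar year with its first and last day as ISO strings."""
    return [
        {
            "year": year,
            "doy_first": date(year, 1, 1).isoformat(),
            "doy_last": date(year, 12, 31).isoformat(),
        }
        for year in range(first_year, last_year + 1)
    ]


intervals = year_intervals(2009, 2022)
print(len(intervals), intervals[0], intervals[-1])
```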
Goal: Run one HTTP GET request for each catalog-year combination.
Step 1/3: Prepare a data frame of HTTP calls to the verb/method `verb=ListSets`.
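The request URLs listed in the table further below follow a fixed pattern: one `ListRecords` call per catalog (set) and year. A possible way to assemble them is sketched here in Python. The helper names `list_records_url` and `build_requests` are hypothetical. The endpoint, the query parameters and the two example catalogs are taken from the table.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the request URLs of the table.
BASE_URL = "https://doidb.wdc-terra.org/oaip/oai"


def list_records_url(spec, year, prefix="oai_dc"):
    """Build the ListRecords URL for one set (catalog) and one calendar year."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": prefix,
        "from": f"{year}-01-01",
        "until": f"{year}-12-31",
        "set": spec,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def build_requests(catalogs, years):
    """Cross all years with all catalogs; years vary slowest, as in the table."""
    return [
        {"name": name, "spec": spec, "year": year, "req": list_records_url(spec, year)}
        for year in years
        for name, spec in catalogs
    ]


catalogs = [("ArboDat 2016", "DOIDB.ARBODAT"), ("TERENO", "DOIDB.TERENO")]
requests_table = build_requests(catalogs, range(2002, 2004))
print(requests_table[0]["req"])
```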
Step 2/3: Now compose a getter function for fetching XML data via HTTP.
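The getter itself is not shown, so the following is only a sketch of how such a function might look in Python. It rests on two assumptions: that the count is the number of `<record>` elements in the response, and that an OAI-PMH `<error>` response (for example `noRecordsMatch`) maps to a missing value. The names `fetch_xml` and `count_records` and both sample responses are hypothetical. Large result sets are paginated with a `resumptionToken` in OAI-PMH; this sketch only counts the first page.

```python
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def fetch_xml(url, timeout=30):
    """Fetch a URL via HTTP GET and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def count_records(xml_text):
    """Count <record> elements; return None if the repository reports an error."""
    root = ET.fromstring(xml_text)
    if root.find("oai:error", OAI_NS) is not None:
        return None
    return len(root.findall("oai:ListRecords/oai:record", OAI_NS))


# Hypothetical, shortened responses for illustration (no network access needed).
SAMPLE_OK = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords><record/><record/></ListRecords>"
    "</OAI-PMH>"
)
SAMPLE_EMPTY = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<error code="noRecordsMatch">No records match</error>'
    "</OAI-PMH>"
)

print(count_records(SAMPLE_OK), count_records(SAMPLE_EMPTY))
```

In a real run, `count_records(fetch_xml(url))` would be applied to each request URL in turn.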
Step 3/3: Use the XML getter function to extend the data frame of URLs with the count of DOIs assigned per year.
This will perform 322 HTTP requests (14 years × 23 catalogs).
# | row | name | spec | year | doy_first | doy_last | req | cnt |
---|---|---|---|---|---|---|---|---|
1 | 1 | ArboDat 2016 | DOIDB.ARBODAT | 2002 | 2002-01-01 | 2002-12-31 | https://doidb.wdc-terra.org/oaip/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2002-01-01&until=2002-12-31&set=DOIDB.ARBODAT | NA |
2 | 2 | CRC1211DB CRC 1211 Database | DOIDB.CRC1211 | 2002 | 2002-01-01 | 2002-12-31 | https://doidb.wdc-terra.org/oaip/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2002-01-01&until=2002-12-31&set=DOIDB.CRC1211 | NA |
3 | 3 | DEKORP - German Continental Seismic Reflection Program | DOIDB.DEKORP | 2002 | 2002-01-01 | 2002-12-31 | https://doidb.wdc-terra.org/oaip/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2002-01-01&until=2002-12-31&set=DOIDB.DEKORP | NA |
4 | 479 | SFB806 and CRC806-Database | DOIDB.SFB806 | 2022 | 2022-01-01 | 2022-12-31 | https://doidb.wdc-terra.org/oaip/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2022-01-01&until=2022-12-31&set=DOIDB.SFB806 | 2 |
5 | 480 | TERENO | DOIDB.TERENO | 2022 | 2022-01-01 | 2022-12-31 | https://doidb.wdc-terra.org/oaip/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2022-01-01&until=2022-12-31&set=DOIDB.TERENO | 103 |
6 | 481 | TR32DB CRC/Transregio 32 Database | DOIDB.TR32DB | 2022 | 2022-01-01 | 2022-12-31 | https://doidb.wdc-terra.org/oaip/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2022-01-01&until=2022-12-31&set=DOIDB.TR32DB | 33 |
7 | 482 | TRR228DB CRC/Transregio 228 Database | DOIDB.TRR228 | 2022 | 2022-01-01 | 2022-12-31 | https://doidb.wdc-terra.org/oaip/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2022-01-01&until=2022-12-31&set=DOIDB.TRR228 | 3 |
8 | 483 | WDS World Stress Map | DOIDB.WSM | 2022 | 2022-01-01 | 2022-12-31 | https://doidb.wdc-terra.org/oaip/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2022-01-01&until=2022-12-31&set=DOIDB.WSM | 5 |
Add a few columns to make calculations and plotting easier.
Checking data quality.
The sum of records in the catalog "DOIDB" should equal the sum of records in all other data centers.
Currently the two sums are equal, so the check is TRUE.
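The consistency check described here can be written as a small function. This Python sketch assumes that the overall catalog has the set spec `DOIDB` and that missing counts are ignored. The function name `totals_match` is hypothetical, and the count for `DOIDB` in the toy data is invented for illustration (the other two counts are taken from the table above).

```python
def totals_match(rows, total_spec="DOIDB"):
    """Compare the record sum of the overall catalog with the sum of all other sets.

    Rows with a missing count (None) are skipped.
    """
    total = sum(
        r["cnt"] for r in rows if r["spec"] == total_spec and r["cnt"] is not None
    )
    others = sum(
        r["cnt"] for r in rows if r["spec"] != total_spec and r["cnt"] is not None
    )
    return total, others, total == others


# Toy data: the DOIDB count is hypothetical.
rows = [
    {"spec": "DOIDB", "year": 2022, "cnt": 8},
    {"spec": "DOIDB.TRR228", "year": 2022, "cnt": 3},
    {"spec": "DOIDB.WSM", "year": 2022, "cnt": 5},
    {"spec": "DOIDB.WSM", "year": 2002, "cnt": None},
]

print(totals_match(rows))
```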
Same plot as before, zoomed in, subplots sorted by total number of records, with individual Y-axis scales.
The End.
Check out the analysis of IGSN catalogs at GFZ.