Life on Earth is sustained by complex interactions between organisms
and their environment. These biotic interactions can be captured in
datasets and published digitally. We present a review and archiving
process for such an openly accessible digital interactions dataset of
known origin and discuss its outcome. The dataset under review, named
globalbioticinteractions/msb-para, has fingerprint
hash://md5/4599b40f93c7e00b35a2b1c1442a7151, is 4.22GiB in size and
contains 57,433 interactions with 4 unique types of associations (e.g.,
parasiteOf) between 970 primary taxa (e.g., Acari) and 21,956 associated
taxa (e.g., Alopex lagopus). This report includes detailed summaries of
interaction data, a taxonomic review from multiple catalogs, and an
archived version of the dataset from which the reviews are derived.
Introduction
Data Review and Archive
Data review and archiving can be a time-consuming process, especially
when done manually. This review report aims to help facilitate both
activities. It automates the archiving of datasets, including Darwin
Core archives, and is a citable backup of a version of the dataset.
Additionally, an automatic review of species interaction claims made in
the dataset is generated and registered with Global Biotic Interactions
(J. H. Poelen, Simons, and Mungall 2014).
This review includes summary statistics and observations about the
dataset under review.
For additional metadata related to this dataset, please visit https://github.com/globalbioticinteractions/msb-para
and inspect associated metadata files including, but not limited to,
README.md, eml.xml, and/or globi.json.
Methods
The review is performed through programmatic scripts that leverage
tools like Preston (Elliott et al. 2025), Elton (Kuhn, Poelen, and
Leinweber 2025), Nomer (Salim and Poelen 2025), and globinizer
(J. Poelen, Seltmann, and Mietchen 2024), combined with third-party
tools like grep, mlr, tail and head.
The review process can be described in the form of the script below.
# get versioned copy of the dataset (size approx. 4.22GiB) under review
elton pull globalbioticinteractions/msb-para
# generate review notes
elton review globalbioticinteractions/msb-para \
  > review.tsv
# export indexed interaction records
elton interactions globalbioticinteractions/msb-para \
  > interactions.tsv
# export names and align them with the Catalogue of Life using Nomer
elton names globalbioticinteractions/msb-para \
  | nomer append col \
  > name-alignment.tsv
The review includes the following resources:
- network diagram showing the taxon family to taxon family interaction claims in the dataset under review as interpreted by the Catalogue of Life via the Nomer Corpus of Taxonomic Resources (J. H. Poelen 2024)
- network diagram showing the taxon kingdom to taxon kingdom interaction claims in the dataset under review as interpreted by the Catalogue of Life via the Nomer Corpus of Taxonomic Resources (J. H. Poelen 2024)
- taxonomic names found in the dataset under review aligned with the Catalogue of Life as accessed through the Nomer Corpus of Taxonomic Resources (J. H. Poelen 2024), in gzipped comma-separated values, html, and tab-separated values formats
- taxonomic names found in the dataset under review aligned with the Discover Life bee species checklist as accessed through the Nomer Corpus of Taxonomic Resources (J. H. Poelen 2024), in gzipped comma-separated values, html, and tab-separated values formats
- taxonomic names found in the dataset under review aligned with the GBIF Backbone Taxonomy as accessed through the Nomer Corpus of Taxonomic Resources (J. H. Poelen 2024), in gzipped comma-separated values, html, and tab-separated values formats
- taxonomic names found in the dataset under review aligned with the Integrated Taxonomic Information System (ITIS) as accessed through the Nomer Corpus of Taxonomic Resources (J. H. Poelen 2024), in gzipped comma-separated values, html, and tab-separated values formats
- taxonomic names found in the dataset under review aligned with the Mammal Diversity Database as accessed through the Nomer Corpus of Taxonomic Resources (J. H. Poelen 2024), in gzipped comma-separated values, html, and tab-separated values formats
- taxonomic names found in the dataset under review aligned with the NCBI Taxonomy as accessed through the Nomer Corpus of Taxonomic Resources (J. H. Poelen 2024), in gzipped comma-separated values, html, and tab-separated values formats
- taxonomic names found in the dataset under review aligned with the Paleobiology Database as accessed through the Nomer Corpus of Taxonomic Resources (J. H. Poelen 2024), in gzipped comma-separated values, html, and tab-separated values formats
- taxonomic names found in the dataset under review aligned with the Terrestrial Parasite Tracker (TPT) Taxonomic Resource as accessed through the Nomer Corpus of Taxonomic Resources (J. H. Poelen 2024), in gzipped comma-separated values, html, and tab-separated values formats
- taxonomic names found in the dataset under review aligned with the World Flora Online as accessed through the Nomer Corpus of Taxonomic Resources (J. H. Poelen 2024), in gzipped comma-separated values, html, and tab-separated values formats
- taxonomic names found in the dataset under review aligned with the World Register of Marine Species (WoRMS) as accessed through the Nomer Corpus of Taxonomic Resources (J. H. Poelen 2024), in gzipped comma-separated values, html, and tab-separated values formats
- metadata of this review expressed in Zenodo record metadata
Archived Dataset
Note that the data.zip file in this
archive contains the complete, unmodified archived dataset under
review.
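A downloaded copy of data.zip can be checked against a previously recorded content hash written in the same hash://md5/ notation as the fingerprint above. The sketch below assumes GNU md5sum is available. The helper name md5_uri is hypothetical, and the hash of data.zip is not necessarily the dataset fingerprint reported in this review, since that depends on what Preston hashed.

```shell
# md5_uri: hypothetical helper that prints the md5 content hash of stdin
# in hash://md5/... notation (assumes GNU coreutils md5sum).
md5_uri() {
  printf 'hash://md5/%s\n' "$(md5sum | cut -d' ' -f1)"
}

# Usage with a local copy of the archive (the path is an assumption):
#   md5_uri < data.zip
printf 'hello' | md5_uri
```

Comparing the printed identifier with a recorded one is a simple way to confirm that two copies are byte-for-byte identical.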
Biotic Interactions
Biotic Interaction Data Model
In this review, biotic interactions (or biotic associations) are
modeled as a primary (aka subject, source) organism interacting with an
associate (aka object, target) organism. The dataset under review
classified the primary/associate organisms with specific taxa. The kind
of interaction is documented as an interaction type.
The dataset under review, named globalbioticinteractions/msb-para,
has fingerprint hash://md5/4599b40f93c7e00b35a2b1c1442a7151, is 4.22GiB
in size and contains 57,433 interactions with 4 unique types of
associations (e.g., parasiteOf) between 970 primary taxa (e.g., Acari)
and 21,956 associated taxa (e.g., Alopex lagopus).
An exhaustive list of indexed interaction claims can be found in
gzipped csv and tsv archives. To facilitate discovery, a preview of
the claims is available in the gzipped html page at
indexed-interactions.html.gz.
The exhaustive list was used to create the data summaries below.
Most Frequently Mentioned Interaction Types (up to 20 most frequent)

| interactionTypeName | count |
| --- | --- |
| parasiteOf | 53419 |
| coOccursWith | 3939 |
| interactsWith | 48 |
| hasParasite | 27 |
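As a quick consistency check, the four counts above should add up to the 57,433 interactions reported for the dataset. A minimal sketch using awk, with the table rows copied in as tab-separated text; the function name sum_counts is hypothetical.

```shell
# sum_counts: add up the second (count) column of tab-separated rows on stdin.
sum_counts() {
  awk -F '\t' '{ total += $2 } END { print total }'
}

# Rows copied from the interaction type table above.
printf '%s\t%s\n' \
  parasiteOf 53419 \
  coOccursWith 3939 \
  interactsWith 48 \
  hasParasite 27 |
  sum_counts   # prints 57433
```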
Most Frequently Mentioned Primary Taxa (up to 20 most frequent)

| sourceTaxonName | count |
| --- | --- |
| Acari | 6971 |
| Siphonaptera | 5926 |
| Cestoda | 2814 |
| Nematoda | 1869 |
| Taenia | 1869 |
| Toxascaris | 1750 |
| Phthiraptera | 1735 |
| Ixodida | 1680 |
| Echinococcus multilocularis | 1532 |
| Polyplax borealis | 1485 |
| Echinococcus | 857 |
| Arthropoda | 844 |
| Mastophorus dipodomis | 787 |
| Heteromyoxyuris deserti | 726 |
| Hoplopleura acanthopus | 704 |
| Uncinaria | 685 |
| Hoplopleura arboricola | 665 |
| Hoplopleura | 657 |
| Trematoda | 566 |
Most Frequently Mentioned Associate Taxa (up to 20 most frequent)

| targetTaxonName | count |
| --- | --- |
| Alopex lagopus | 4326 |
| Myodes rutilus | 1866 |
| Microtus oeconomus | 1683 |
| Aythya affinis | 1200 |
| Vulpes lagopus | 722 |
| Sorex cinereus | 667 |
| Canis lupus familiaris | 638 |
| Larus glaucescens | 387 |
| Ondatra zibethicus | 355 |
| Fratercula cirrhata | 351 |
| Canis lupus | 330 |
| Microtus pennsylvanicus | 310 |
| Microtus | 252 |
| Tamiasciurus hudsonicus | 226 |
| Stagnicola | 226 |
| Microtus abbreviatus | 216 |
| Physa | 214 |
| Microtus miurus | 211 |
| Trachemys scripta | 211 |
Most Frequent Interactions between Primary and Associate Taxa (up to 20 most frequent)

| sourceTaxonName | interactionTypeName | targetTaxonName | count |
| --- | --- | --- | --- |
| Taenia | parasiteOf | Alopex lagopus | 1303 |
| Toxascaris | parasiteOf | Alopex lagopus | 1048 |
| Echinococcus multilocularis | parasiteOf | Alopex lagopus | 573 |
| Polyplax borealis | parasiteOf | Myodes rutilus | 513 |
| Echinococcus multilocularis | parasiteOf | Microtus oeconomus | 495 |
| Echinococcus | parasiteOf | Alopex lagopus | 471 |
| Acari | parasiteOf | Sorex cinereus | 417 |
| Uncinaria | parasiteOf | Alopex lagopus | 415 |
| Ascarididae | parasiteOf | Alopex lagopus | 269 |
| Echinococcus | parasiteOf | Microtus oeconomus | 266 |
| Taenia | parasiteOf | Vulpes lagopus | 227 |
| Toxascaris | parasiteOf | Vulpes lagopus | 217 |
| Amalaraeus dissimilis | parasiteOf | Myodes rutilus | 207 |
| Nematoda | parasiteOf | Fratercula cirrhata | 158 |
| Trichinella | parasiteOf | Microtus oeconomus | 156 |
| Echinococcus multilocularis | parasiteOf | Vulpes lagopus | 150 |
| Nematospiroides | parasiteOf | Microtus abbreviatus | 139 |
| Quinqueserialis | parasiteOf | Ondatra zibethicus | 132 |
| Toxascaris | parasiteOf | Canis lupus familiaris | 125 |
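A summary like the one above can be derived from an exported interactions.tsv by tallying distinct combinations of source taxon, interaction type and target taxon. The sketch below assumes a tab-separated file whose header row contains the column names sourceTaxonName, interactionTypeName and targetTaxonName, as in the table header above. The function name top_interactions and the sample rows are illustrative only, and this is not necessarily how the review itself computes the table.

```shell
# top_interactions: read a TSV with a header row on stdin and print
# "count<TAB>source<TAB>type<TAB>target", most frequent first (at most 20 rows).
top_interactions() {
  awk -F '\t' -v OFS='\t' '
    NR == 1 {                        # locate the three columns by header name
      for (i = 1; i <= NF; i++) {
        if ($i == "sourceTaxonName") s = i
        if ($i == "interactionTypeName") t = i
        if ($i == "targetTaxonName") o = i
      }
      next
    }
    { n[$s OFS $t OFS $o]++ }        # count each distinct triple
    END { for (k in n) print n[k], k }
  ' | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2 | head -n 20
}

# Illustrative input; real usage would be: top_interactions < interactions.tsv
printf '%s\t%s\t%s\n' \
  sourceTaxonName interactionTypeName targetTaxonName \
  Taenia parasiteOf 'Alopex lagopus' \
  Toxascaris parasiteOf 'Alopex lagopus' \
  Taenia parasiteOf 'Alopex lagopus' |
  top_interactions
```

Looking up columns by header name rather than by position keeps the sketch usable even if the export contains additional columns.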
Interaction Networks
The figures below provide a graph view on the dataset under review.
The first shows a summary network on the kingdom level, and the second
shows interactions on the family level. It is important to note that
both network graphs were first aligned taxonomically using the Catalogue
of Life. Please refer to the original (or verbatim) taxonomic names for
an unaltered view of the interaction data.
Interactions on the taxonomic kingdom rank as interpreted by the Catalogue of Life (download svg).
Interactions on the taxonomic family rank as interpreted by the Catalogue of Life (download svg).
Another way to discover the dataset under review is by searching for
it on the GloBI
website.
Taxonomic Alignment
As part of the review, all names are aligned against various name
catalogs (e.g., col, ncbi, discoverlife, gbif, itis, wfo, mdd, tpt,
pbdb, and worms). These alignments can help review name usage or aid in
selecting a suitable taxonomic name resource.
Sample of Name Alignments

| providedName | relationName | resolvedCatalogName | resolvedName |
| --- | --- | --- | --- |
| A25KN | NONE | col | A25KN |
| A25KT | NONE | col | A25KT |
| A25KU | NONE | col | A25KU |
| A25KW | NONE | col | A25KW |
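Names with relation NONE, such as the four in the sample above, are the ones a catalog did not recognize. The sketch below lists them for one catalog. It assumes a tab-separated alignment file with exactly the four columns shown above, in that order; an actual name-alignment.tsv produced by Nomer may contain additional columns, in which case the field numbers would need adjusting. The function name unresolved_names is hypothetical.

```shell
# unresolved_names CATALOG: print provided names that CATALOG did not
# recognize (relationName == NONE), one per line, without duplicates.
# Assumed columns: providedName, relationName, resolvedCatalogName, resolvedName.
unresolved_names() {
  awk -F '\t' -v catalog="$1" '
    NR > 1 && $2 == "NONE" && $3 == catalog { print $1 }
  ' | LC_ALL=C sort -u
}

# Example with rows taken from the sample table above.
printf '%s\t%s\t%s\t%s\n' \
  providedName relationName resolvedCatalogName resolvedName \
  A25KN NONE col A25KN \
  A25KT NONE col A25KT |
  unresolved_names col
```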
Distribution of Taxonomic Ranks of Aligned Names by Catalog.
Names that were not aligned with a catalog are counted as NAs, so the
total number of unaligned names for a catalog is listed in its NA
row.
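
| resolvedCatalogName | resolvedRank | count |
| --- | --- | --- |
| col | NA | 1102 |
| col | class | 5 |
| col | family | 66 |
| col | genus | 266 |
| col | infraorder | 1 |
| col | nanorder | 2 |
| col | order | 18 |
| col | parvorder | 1 |
| col | phylum | 5 |
| col | species | 906 |
| col | subclass | 1 |
| col | subfamily | 2 |
| col | subgenus | 9 |
| col | suborder | 1 |
| col | subspecies | 64 |
| col | superfamily | 9 |
| col | unranked | 1 |
| discoverlife | NA | 2445 |
| discoverlife | species | 1 |
| gbif | NA | 917 |
| gbif | class | 5 |
| gbif | family | 75 |
| gbif | genus | 328 |
| gbif | order | 14 |
| gbif | phylum | 5 |
| gbif | species | 1025 |
| gbif | subspecies | 82 |
| itis | NA | 1386 |
| itis | class | 5 |
| itis | family | 60 |
| itis | genus | 224 |
| itis | order | 23 |
| itis | phylum | 6 |
| itis | species | 681 |
| itis | subclass | 5 |
| itis | subfamily | 3 |
| itis | subgenus | 1 |
| itis | suborder | 3 |
| itis | subspecies | 51 |
| itis | superfamily | 3 |
| mdd | NA | 2445 |
| ncbi | NA | 1092 |
| ncbi | class | 4 |
| ncbi | family | 72 |
| ncbi | genus | 295 |
| ncbi | infraorder | 2 |
| ncbi | order | 18 |
| ncbi | phylum | 6 |
| ncbi | species | 914 |
| ncbi | subclass | 5 |
| ncbi | subfamily | 2 |
| ncbi | subgenus | 2 |
| ncbi | suborder | 4 |
| ncbi | subspecies | 25 |
| ncbi | superfamily | 9 |
| pbdb | NA | 1872 |
| pbdb | class | 5 |
| pbdb | family | 23 |
| pbdb | genus | 89 |
| pbdb | order | 13 |
| pbdb | phylum | 5 |
| pbdb | species | 426 |
| pbdb | subclass | 1 |
| pbdb | subfamily | 1 |
| pbdb | suborder | 5 |
| pbdb | subspecies | 5 |
| pbdb | subtribe | 1 |
| pbdb | superfamily | 3 |
| pbdb | unranked clade | 4 |
| tpt | NA | 1845 |
| tpt | family | 5 |
| tpt | genus | 54 |
| tpt | order | 1 |
| tpt | species | 540 |
| wfo | NA | 2435 |
| wfo | genus | 10 |
| worms | NA | 1408 |
| worms | class | 4 |
| worms | family | 65 |
| worms | genus | 254 |
| worms | infraorder | 1 |
| worms | order | 19 |
| worms | phylum | 5 |
| worms | species | 665 |
| worms | subclass | 4 |
| worms | subfamily | 1 |
| worms | subgenus | 1 |
| worms | suborder | 3 |
| worms | subphylum | 1 |
| worms | subspecies | 6 |
| worms | superfamily | 10 |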
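Because unaligned names are counted in the NA row, the share of aligned entries per catalog follows from the table as (total - NA) / total. The sketch below computes that percentage from rows of catalog, rank and count, using the tpt and wfo rows above as tab-separated input. The function name aligned_share is hypothetical, and the percentage is an illustration rather than a metric reported by the review.

```shell
# aligned_share: read "catalog<TAB>rank<TAB>count" rows on stdin and print
# "catalog<TAB>percentage of entries with a rank other than NA".
aligned_share() {
  awk -F '\t' '
    { total[$1] += $3 }                  # all entries per catalog
    $2 == "NA" { unaligned[$1] += $3 }   # entries without an aligned rank
    END {
      for (c in total)
        printf "%s\t%.1f\n", c, 100 * (total[c] - unaligned[c]) / total[c]
    }
  ' | LC_ALL=C sort
}

# Rows copied from the rank distribution table above.
printf '%s\t%s\t%s\n' \
  tpt NA 1845 tpt family 5 tpt genus 54 tpt order 1 tpt species 540 \
  wfo NA 2435 wfo genus 10 |
  aligned_share
```

For these rows the totals are 2445 per catalog, with 600 aligned entries for tpt and 10 for wfo.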
Name relationship types per catalog. Name relationship type
“NONE” means that a name was not recognized by the associated catalog.
“SAME_AS” indicates either a “HAS_ACCEPTED_NAME” or “SYNONYM_OF” name
relationship type. We recognize that “SYNONYM_OF” encompasses many types
of nomenclatural synonymies (ICZN 1999) (e.g., junior synonym, senior
synonym).

| resolvedCatalogName | relationName | count |
| --- | --- | --- |
| col | NONE | 19532 |
| col | HAS_ACCEPTED_NAME | 1257 |
| col | SYNONYM_OF | 228 |
| discoverlife | NONE | 22757 |
| discoverlife | HAS_ACCEPTED_NAME | 1 |
| gbif | NONE | 19350 |
| gbif | HAS_ACCEPTED_NAME | 1514 |
| gbif | SYNONYM_OF | 304 |
| itis | NONE | 19829 |
| itis | HAS_ACCEPTED_NAME | 979 |
| itis | SYNONYM_OF | 142 |
| mdd | NONE | 20752 |
| mdd | HAS_ACCEPTED_NAME | 157 |
| ncbi | NONE | 21371 |
| ncbi | SAME_AS | 1294 |
| ncbi | SYNONYM_OF | 103 |
| pbdb | NONE | 20321 |
| pbdb | HAS_ACCEPTED_NAME | 542 |
| pbdb | SYNONYM_OF | 67 |
| tpt | NONE | 20293 |
| tpt | HAS_ACCEPTED_NAME | 629 |
| tpt | SYNONYM_OF | 36 |
| wfo | NONE | 20899 |
| wfo | SYNONYM_OF | 4 |
| wfo | HAS_UNCHECKED_NAME | 4 |
| wfo | HAS_ACCEPTED_NAME | 4 |
| worms | NONE | 19844 |
| worms | HAS_ACCEPTED_NAME | 967 |
| worms | SYNONYM_OF | 131 |
List of Available Name Alignment Reports

| catalog name | alignment results |
| --- | --- |
| col | associated names alignments report (in gzipped html, csv, and tsv) |
| ncbi | associated names alignments report (in gzipped html, csv, and tsv) |
| discoverlife | associated names alignments report (in gzipped html, csv, and tsv) |
| gbif | associated names alignments report (in gzipped html, csv, and tsv) |
| itis | associated names alignments report (in gzipped html, csv, and tsv) |
| wfo | associated names alignments report (in gzipped html, csv, and tsv) |
| mdd | associated names alignments report (in gzipped html, csv, and tsv) |
| tpt | associated names alignments report (in gzipped html, csv, and tsv) |
| pbdb | associated names alignments report (in gzipped html, csv, and tsv) |
| worms | associated names alignments report (in gzipped html, csv, and tsv) |
Additional Reviews
Elton, Nomer, and other tools may have difficulties interpreting
existing species interaction datasets, or they may misbehave or
otherwise show unexpected behavior. As part of the review process,
detailed review notes are kept that document possibly misbehaving, or
confused, review bots. A sample of review notes associated with this
review can be found below.
First few lines in the review notes.

| reviewDate | reviewCommentType | reviewComment |
| --- | --- | --- |
| 2025-04-11T18:51:08Z | note | found unresolved reference [060980] |
| 2025-04-11T18:51:08Z | note | found unresolved reference [070456] |
| 2025-04-11T18:51:08Z | note | found unresolved reference [077368] |
| 2025-04-11T18:51:08Z | note | found unresolved reference [077679] |
In addition, you can find the most frequently occurring notes in the
table below.
Most frequently occurring review notes, if any.

| reviewComment | count |
| --- | --- |
| found unresolved reference [060980] | 1 |
| found unresolved reference [070456] | 1 |
| found unresolved reference [077368] | 1 |
| found unresolved reference [077679] | 1 |
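Each note above occurs once because the comment embeds a distinct reference identifier. Replacing the bracketed identifier with a placeholder before counting groups such notes by kind. A minimal sketch, assuming one review comment per input line (extracting the reviewComment column from review.tsv is left out); the function name note_kinds is hypothetical and this grouping is not part of the review itself.

```shell
# note_kinds: read one review comment per line on stdin, mask bracketed
# identifiers such as [060980], and print "count<TAB>masked comment",
# most frequent first.
note_kinds() {
  sed 's/\[[^]]*\]/[...]/g' |
    LC_ALL=C sort | uniq -c |
    awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c "\t" $0 }' |
    sort -k1,1nr
}

# The four comments from the table above collapse into a single kind.
printf '%s\n' \
  'found unresolved reference [060980]' \
  'found unresolved reference [070456]' \
  'found unresolved reference [077368]' \
  'found unresolved reference [077679]' |
  note_kinds
```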
For additional information on review notes, please have a look at the
first 500 Review Notes in html format
or download the full gzipped csv or tsv archives.
GloBI Review Badge
As part of the review, a review badge is generated. This review badge
can be included in webpages to indicate the review status of the dataset
under review.
Note that if the badge is green, no review notes were generated. If
the badge is yellow, the review bots may need some help with
interpreting the species interaction data.
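The badge rule described above can be sketched as a small function. It assumes one review note per input line with no header row; the function name badge_color is hypothetical, and the actual badge is generated by GloBI's own tooling, not by this snippet.

```shell
# badge_color: print "green" when stdin holds no review notes, else "yellow".
badge_color() {
  if [ "$(wc -l | tr -d ' ')" -eq 0 ]; then
    echo green
  else
    echo yellow
  fi
}

printf 'found unresolved reference [060980]\n' | badge_color
```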
GloBI Index Badge
If the dataset under review has been registered with
GloBI, and has been successfully indexed by GloBI, the GloBI Index
Status Badge will turn green. This means that the dataset under review
was indexed by GloBI and is available through GloBI services and derived
data products.
If you’d like to keep track of reviews or index status of the dataset
under review, please visit GloBI’s dataset index for
badge examples.
Discussion
This review and archive provides a means of creating citable versions
of datasets that change frequently. This may be useful for dataset
managers, including natural history collection data managers, as a
backup archive of a shared Darwin Core archive. It also serves as a
means of creating a trackable citation for the dataset in an automated
way, while also including some information about the contents of the
dataset.
This review aims to provide a perspective on the dataset to aid in
understanding the species interaction claims discovered in it. However, it is
important to note that this review does not assess the quality
of the dataset. Instead, it serves as an indication of the openness and FAIRness (Wilkinson et
al. 2016; Trekels et al. 2023) of the dataset: to perform this
review, the data was likely openly available, Findable,
Accessible, Interoperable and
Reusable. The current Open-FAIR assessment is
qualitative, and a more quantitative approach can be implemented with
specified measurement units.
This report also showcases the reuse of machine-actionable
(meta)data, something highly recommended by the FAIR Data Principles
(Wilkinson et al. 2016). Making (meta)data machine-actionable enables more precise
processing by computers, enabling even naive review bots like Nomer and
Elton to interpret the data effectively. This capability is crucial not
just for automating the generation of reports, but also for facilitating
seamless data exchanges and promoting interoperability.
Acknowledgements
We thank the many humans that created us and those who created and
maintained the data, software and other intellectual resources that were
used for producing this review. In addition, we are grateful for the
natural resources providing the basis for these human and bot
activities. Also, thanks to https://github.com/zygoballus for helping
improve the layout of the review tables.
Author contributions
Nomer was responsible for name alignments. Elton carried out dataset
extraction and generated the review notes. Preston tracked, versioned,
and packaged the dataset under review.
References
Kuhn, Tobias, and Michel Dumontier. 2014. “Trusty URIs:
Verifiable, Immutable, and Permanent Digital Artifacts for Linked
Data.” In The Semantic Web: Trends and Challenges,
edited by Valentina Presutti, Claudia d’Amato, Fabien Gandon, Mathieu
d’Aquin, Steffen Staab, and Anna Tordai, 395–410. Cham: Springer
International Publishing.
Poelen, Jorrit H. (ed.). 2024. “Nomer Corpus of Taxonomic
Resources hash://sha256/b60c0d25a16ae77b24305782017b1a270b79b5d1746f832650f2027ba536e276
hash://md5/17f1363a277ee0e4ecaf1b91c665e47e.” Zenodo. https://doi.org/10.5281/zenodo.12695629.
Poelen, Jorrit H., James D. Simons, and Chris J. Mungall. 2014.
“Global Biotic Interactions: An Open Infrastructure to Share and
Analyze Species-Interaction Datasets.” Ecological
Informatics 24 (November): 148–59. https://doi.org/10.1016/j.ecoinf.2014.08.005.
Trekels, Maarten, Debora Pignatari Drucker, José Augusto Salim, Jeff
Ollerton, Jorrit Poelen, Filipi Miranda Soares, Max Rünzel, Muo Kasina,
Quentin Groom, and Mariano Devoto. 2023. “WorldFAIR Project (D10.1) Agriculture-related pollinator
data standards use cases report.” Zenodo. https://doi.org/10.5281/zenodo.8176978.
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg,
Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al.
2016. “The FAIR Guiding Principles for Scientific
Data Management and Stewardship.” Scientific Data 3 (1).
https://doi.org/10.1038/sdata.2016.18.