Movebank Data Repository Preservation Policy

Policy Statement

The Movebank Data Repository is dedicated to long-term archiving and preservation of the data it publishes.


Intended Audience

Movebank Data Repository staff, data depositors, and data users


Summary

This document describes the Movebank Data Repository’s approach to achieving and guaranteeing long-term access to its contents. It describes the scope, mission and goals of the policy, the content of the repository, and the requirements for long-term preservation from archival and technical perspectives. It also describes the roles and responsibilities of the repository staff, content coverage, how the repository ensures integrity and security, and how it is financed.


1. Scope and Goals of this policy

1.1. Scope

This Preservation Policy is valid for the Movebank Data Repository and all its published data packages, files, and metadata. This does not include the preservation of related websites, documents and other materials connected to the service.


1.2. Mission and Goals

The Movebank Data Repository (hereafter “repository”) is a service that provides publication and long-term archiving of animal movement and animal-borne sensor data stored in the Movebank database[1], in line with Movebank’s goal to “archive animal movement data for future use”. The repository focuses on data underlying peer-reviewed articles and other published reports describing data. Data and metadata are stored and managed in consistent formats following publicly documented standards. This policy is driven by guidelines including “Rules of good scientific practice”[2] from the Max Planck Society and the “Guidelines for Safeguarding Good Research Practice” of the Deutsche Forschungsgemeinschaft (German Research Foundation, DFG)[3]. The preservation strategy follows principles such as the OAIS reference model[4], the FAIR principles[5] and CoreTrustSeal[6]. This policy formalizes and communicates long-standing processes and workflows to ensure the long-term preservation of animal movement data.

The Movebank Data Repository started its service in 2012 as an extension of the Movebank platform. It is hosted and run by the Communication, Information, Media Centre (KIM)[7] of the University of Konstanz in close cooperation with the Max Planck Institute of Animal Behavior[8]. KIM unites the library and the computing centre of the University of Konstanz. Its predecessor organisation, the former university library, started its work with the founding of the University in 1966. As a state-funded memory institution, one main purpose of the KIM is to enable and guarantee long-term access to scientific information in general. The Max Planck Institute of Animal Behavior (MPIAB) is one of the leading research institutions in the field of animal ecology and animal tracking. Together, Movebank and the Movebank Data Repository support researchers and practitioners in this field globally by making high-quality data available for reuse. The repository was developed and started its service as a DFG-funded project (MoveVRE) from 2010 to 2012. KIM has since taken over responsibility for the service in close cooperation with MPIAB.

The goal of this preservation policy is to

  • provide long-term access to high quality data to researchers
  • maintain accessibility and quality of published data
  • ensure authenticity, integrity, and security of published data
  • communicate the trustworthiness of the repository to its users

2. Content and Community of the repository

    2.1. Characterisation of the content

    Sensors attached to living animals provide measurements that describe their movements, behavior, external environment and internal conditions. Movebank is a global platform for such data, with its core platform allowing users to flexibly organize and share data, including in-progress data collection and both public and controlled-access sharing. The repository is an optional service providing persistent, public archiving of curated datasets as static files exported from the Movebank platform and uploaded and distributed through the repository.

    The content of the repository consists of selected datasets submitted for publication by the data owner and stored in the Movebank database. Submission through the Movebank platform ensures that published data are harmonized to documented formats and provides tools for data assessment, quality control, and edits during the review process. Such data can be archived in the repository if they are described in a written publication, such as a peer-reviewed journal article. Repository curation staff review submitted data, editing and enriching data and related documentation in coordination with the data depositor. When the review is complete, curation staff extract data from the Movebank database, create a readme file, and upload files to the repository. The date of data release is decided in coordination with the data depositor to align with publication of related peer-reviewed articles or based on other relevant considerations. As of spring 2021, the repository has published 234 data packages, including over 100 million animal location occurrences, that underlie research published in 99 journals.

    Data packages consist of the following information and components:

  • Animal tracking data packages are structured as CSV files and follow Movebank’s published vocabulary[9] and data model[10]. The data contain geographic positions, other sensor measurements, and information about the animal species tracked, sensors used, and study methods.
  • Additional data files relevant to a dataset or analysis can be included, to be decided on a case-by-case basis. Additional files must be in an open format (e.g., .txt, .csv, .png; see https://en.wikipedia.org/wiki/List_of_open_formats for possible options), and the depositor must provide sufficient description of the file contents to reasonably expect that the information could be understood and reused by others.
  • Each data file is enriched with a readme document as a text file, which contains information about the files contained in the data package, related citations, terms of use and the license under which the data set is published.
  • Each data package is assigned a persistent identifier to support data citation and discovery. The repository uses DOIs (digital object identifiers)[11] for this task.
  • Data and metadata are published under a CC0[12] license and can be downloaded by the public, allowing data access and reuse.
  • Metadata associated with each published data package and file are published and maintained on DataCite[13].

    The publication of harmonized data using open file types lowers the barrier for using the data, as no proprietary software is needed, and increases the possibility of reuse for the data in the future, as published files are more likely to be supported by different stakeholders and applications. Updates to published data packages can be submitted and published as new data packages. Storage of data packages in the repository is guaranteed for a minimum of 10 years in accordance with DFG guidelines on the handling of research data[14]. After the expiration of this term the repository will not remove the data from its archive. To ensure future access to and reusability of the data, migration to other file formats will be evaluated.
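
    The open-format requirement for additional files could be pre-screened by file extension before a curator reviews a submission. The following is a minimal sketch, not the repository's actual workflow: the allowlist contains only the three extensions named in this policy, and the function names are hypothetical.

```python
from pathlib import Path

# Assumption: only the three extensions named in the policy text are listed.
# A real curation workflow would maintain its own, longer allowlist.
OPEN_FORMAT_EXTENSIONS = {".txt", ".csv", ".png"}


def is_open_format(filename: str) -> bool:
    """Return True if the file extension is on the allowlist (case-insensitive)."""
    return Path(filename).suffix.lower() in OPEN_FORMAT_EXTENSIONS


def split_by_format(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Split filenames into (accepted, needs_review) based on their extension."""
    accepted = [name for name in filenames if is_open_format(name)]
    needs_review = [name for name in filenames if not is_open_format(name)]
    return accepted, needs_review
```

    An extension check only flags candidates for manual review; whether a file is acceptable remains a case-by-case decision, as stated above.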

    A planned future extension of the repository will add a new type of data package to the repository, publishing analysis workflow source code and metadata from MoveApps[15], a service extending the analysis capabilities of Movebank.

    2.2. Designated Community of the repository

    The target community of the repository includes researchers, government agencies, conservation organizations and other groups that collect and/or use animal tracking and other animal-borne sensor data. This includes the fields of movement ecology, wildlife management and conservation, and biodiversity informatics. The repository’s data curation staff provide editorial and review services to prepare data submissions, maintain metadata, and ensure relevance of the repository within the designated community. The data curator role functions as a connection between this community and the technical infrastructure of the repository.

    3. Requirements

    3.1. Archival requirements

    The repository follows requirements that need to be fulfilled to ensure long-term preservation of published data:

  • data published in the repository are accompanied by adequate documentation to enable their use and re-use;
  • data are checked, validated, and curated following predefined workflows;
  • data are described and enriched with metadata following standards and best practices;
  • data packages and metadata are stored and preserved for the long-term; and
  • the authenticity, integrity and reliability of datasets preserved for future use are retained.

    The repository maintains a further commitment to the FAIR Principles to make data findable, accessible, interoperable and reusable, as described in our Mission Statement.

    3.2. Technical requirements

    Technical hosting and maintenance of the repository are ensured by the IT department of the KIM. The department makes technical decisions with respect to state-of-the-art technology and prefers open-source software with a focus on sustainability. The University is also an active member of the DSpace Konsortium Deutschland[16] community, which fosters exchange with other expert groups and repository developers. To identify features needed by the designated community (see 2.2), repository staff are in constant exchange with its members. This collaboration is fostered by the close cooperation with the MPIAB as a highly renowned research institution in this field. Requirements gathered through communication with users may lead to the implementation of new features.

    4. Roles and responsibilities

    All staff working on the repository (both from KIM and MPIAB) assist in fulfilling the requirements needed to ensure continuity of data access and of the service. Staff members will be guided to respect this policy. The product owner of the service is responsible for maintaining this policy.

    5. Integrity and security

    All workflows in the repository follow defined processes that are described in an internal manual and are planned to be visualized and formalized in the process portal of the University of Konstanz.

    The repository is committed to taking all necessary precautions to ensure the physical safety and security of the data it preserves. This includes communication with the IT security staff of the KIM on a regular basis and the development and monitoring of an information security concept. The virtual machines the repository runs on, including the repository software and the data collection, are backed up regularly in different locations on campus. Metadata published with every data package and file follow the DataCite standard. Checksums are generated so that data integrity can be checked after storage. Data files are only accepted in open formats.
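
    Checksum-based integrity checking can be illustrated with a short sketch. This is not the repository software's implementation: the policy does not name a hash algorithm, so SHA-256 is an assumption here, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Compute a hex digest of a file, reading it in chunks to limit memory use.

    Assumption: SHA-256 is used as the default; the policy does not specify
    which algorithm the repository software applies.
    """
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected: str, algorithm: str = "sha256") -> bool:
    """Return True if the file's current digest matches the stored one."""
    return file_checksum(path, algorithm) == expected.lower()
```

    The digest recorded at storage time serves as the reference value; a later mismatch indicates that the stored file has changed.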

    6. Sustainability plans and funding

    To guarantee long-term support, the repository receives long-term funding from the KIM and the MPIAB, an institute of the Max Planck Society. The University of Konstanz and Max Planck Society are financed primarily through public funds of the German government and therefore offer stable long-term funding and hosting for the repository. Should the repository cease to exist due to unforeseen circumstances, all published data will be offered to the University archive of the University of Konstanz for further preservation.


    This preservation policy was originally approved by Petra Hätscher, Director of the Communication, Information, Media Centre (KIM) of the University of Konstanz, on 25.03.2021. Changes to the policy were approved on 02.09.2021.



    References

    [1] https://www.movebank.org

    [2] https://www.mpg.de/197494/rulesScientificPractice.pdf

    [3] https://doi.org/10.5281/zenodo.3923602

    [4] http://www.oais.info/

    [5] https://www.go-fair.org/fair-principles/

    [6] https://www.coretrustseal.org/

    [7] https://www.kim.uni-konstanz.de/

    [8] https://www.ab.mpg.de/

    [9] http://vocab.nerc.ac.uk/collection/MVB/current/

    [10] https://www.movebank.org/cms/movebank-content/mb-data-model

    [11] https://www.doi.org/

    [12] https://creativecommons.org/publicdomain/zero/1.0/

    [13] https://doi.datacite.org/

    [14] https://www.dfg.de/en/research_funding/proposal_review_decision/applicants/research_data/index.html

    [15] https://moveapps.org/

    [16] https://wiki.lyrasis.org/display/DSPACE/DSpace-Konsortium+Deutschland



    Mapping to the OAIS reference model

    The Movebank Data Repository is a digital archive and as such aims for compliance with the OAIS reference model. This high-level mapping describes the way the Movebank Data Repository implements the functional entities of the OAIS reference model.


    Pre-Ingest

    Depositors contact Movebank Data Repository staff to initialize the submission process. This includes an explanation of the submission process, communicating the terms of use of the repository and giving depositors the possibility to ask questions in case of uncertainty. In this step the depositor grants the curator access to the content in the Movebank database to create a submission information package (SIP). Once the curator can access the data, the ingest phase of the submission process begins.


    Ingest

    The data curator retrieves the content information for the SIP from Movebank.org and executes quality assurance/quality control following predefined workflows. This includes formal evaluation as well as comparing the content information to related scientific outputs. Movebank as a homogeneous submission source guarantees that all submissions are structured following Movebank’s published vocabulary.

    Before ingest to the repository the data curator enriches the content information with preservation description information. This includes:

  • provenance information such as timestamps
  • context information referencing a scientific paper that builds on the Content Information
  • reference information, including a DOI as a persistent identifier to globally and uniquely identify the data set
  • access rights, describing the license under which the data set can be accessed and used
  • fixity information, which is added to the data set at a later point

    This information is stored either as metadata in the repository or in a readme that the data curator creates for every file in a data set. An Archival Information Package (AIP), consisting of the content information from the SIP plus the additional metadata mentioned above, is ingested into the repository system.
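
    The relationship between the SIP content, the preservation description information and the resulting AIP can be sketched as a small data structure. This is an illustration only: the class names, field names and example values are hypothetical and do not reflect the repository software's internal schema.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class PreservationDescription:
    """Preservation description information added by the curator (hypothetical fields)."""
    provenance: dict       # e.g. timestamps
    context: str           # citation of the related scientific paper
    reference: str         # persistent identifier (DOI)
    access_rights: str     # license under which the data set is published
    fixity: dict = field(default_factory=dict)  # checksums, added later at storage


@dataclass(frozen=True)
class ArchivalInformationPackage:
    """Content information from the SIP plus its preservation description."""
    content_files: tuple[str, ...]
    pdi: PreservationDescription


def build_aip(sip_files: list[str], pdi: PreservationDescription) -> ArchivalInformationPackage:
    """Combine SIP content with its description; a persistent identifier is required."""
    if not pdi.reference:
        raise ValueError("an AIP needs a persistent identifier")
    return ArchivalInformationPackage(tuple(sip_files), pdi)


def with_fixity(aip: ArchivalInformationPackage, checksums: dict) -> ArchivalInformationPackage:
    """Return a copy of the AIP with fixity information attached (the storage step)."""
    return replace(aip, pdi=replace(aip.pdi, fixity=dict(checksums)))
```

    Keeping fixity as a separate, later step mirrors the order described above: the curator supplies provenance, context, reference and access rights at ingest, while checksums are attached once the package reaches archival storage.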

    Storage

    Via the repository software the AIP is submitted to the archival storage. During this step fixity information is generated and added to the AIP so that the integrity of the AIP can be checked. The archival infrastructure is part of the university data centre infrastructure. As a result, routine workflows back up the storage in multiple locations and monitor hardware deterioration.

    Data Management

    Metadata of data sets can be updated if needed and the changes will be recorded. If the Content Information of the original changes and a publication of the new version is intended, this leads to a new submission, resulting in a new AIP with a new persistent identifier. The new version will be referenced in the metadata of the old version. Removal of published data is rare. If removal is necessary due to grave circumstances, data sets can be removed from the archive while their metadata remain available.
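
    The versioning and removal rules above can be sketched as operations on a simple record. This is a hedged illustration under stated assumptions: the record fields (`superseded_by`, `withdrawn`) and function names are hypothetical, and identifiers in any usage are placeholders rather than real DOIs.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PackageRecord:
    """Minimal, hypothetical metadata record for a published data package."""
    identifier: str                      # persistent identifier
    title: str
    files: tuple[str, ...]
    superseded_by: Optional[str] = None  # identifier of a newer version, if any
    withdrawn: bool = False


def publish_new_version(
    old: PackageRecord, new_identifier: str, new_files: list[str]
) -> tuple[PackageRecord, PackageRecord]:
    """Create a new record with its own identifier and reference it from the old one."""
    if new_identifier == old.identifier:
        raise ValueError("a new version needs a new persistent identifier")
    new = PackageRecord(new_identifier, old.title, tuple(new_files))
    return replace(old, superseded_by=new_identifier), new


def withdraw(record: PackageRecord) -> PackageRecord:
    """Remove the data files but keep the metadata record available."""
    return replace(record, files=(), withdrawn=True)
```

    The old record is never overwritten: it keeps its identifier and files and only gains a pointer to its successor, so existing citations continue to resolve.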

    Access

    The access function enables users to access the data sets of the repository. This is made possible by the repository software, which includes a graphical web interface and search functionality. All content of the repository is licensed under a CC0 license, making a role-management system and user authentication unnecessary. To enhance findability of the data sets, metadata are distributed to different meta search services via an OAI-PMH interface. When a user queries the repository for data, the data are made available as a Dissemination Information Package (DIP) for access and download.
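
    As an illustration of how a harvesting service addresses an OAI-PMH interface, the sketch below builds request URLs from the protocol's six verbs. The base URL is a placeholder and not the repository's actual endpoint; `oai_dc` is the metadata format that every OAI-PMH repository is required to support.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_PMH_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_pmh_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL; rejects verbs the protocol does not define."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    return f"{base_url}?{urlencode({'verb': verb, **arguments})}"


# Placeholder endpoint (assumption), used for illustration only.
BASE_URL = "https://repository.example.org/oai/request"
list_records = oai_pmh_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

    Here `list_records` is a URL that asks the endpoint for all records in Dublin Core; a harvester would fetch it over HTTP and parse the XML response.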

    Preservation Planning

    Data sets in the repository have a guaranteed storage duration of at least 10 years. To ensure long-term access, repository staff evaluate metadata, file formats and repository features according to the needs of the designated user community using surveys, contact via email and other means of communication. Integrity of the data sets is checked via the repository software using the generated checksums. The status of hardware components is monitored through workflows of the data centre of the Communication, Information, Media Centre (KIM) of the University of Konstanz to guard against storage deterioration.

    Administration

    Administration features support staff in the day-to-day work with the repository. This includes features to support depositor communication, control of the repository content, and reporting of the results. For this purpose the repository uses an external database that collects additional information during submissions and offers improved querying capabilities for reporting.