Step 5: Data archiving
Introduction
When data is shared and published, the question of how long it is stored and retained initially plays a secondary role. According to the DFG’s guidelines for good scientific practice, research data should be retained for “an appropriate period of time,” which is generally taken to mean 10 years. [1]
A backup is not an archive: reliable long-term preservation of digital information is only possible in a “real” digital archive.
Preserving, storing, and maintaining data for more than 10 years is called long-term archiving; in the case of digital data, this is referred to as digital long-term archiving (DLTA).
For digital (long-term) archiving, research data must meet certain requirements so that it remains reusable over a long period of time, i.e., so that it can be read and interpreted in the long term without loss. [2] The major challenge for long-term archiving is therefore to preserve data, metadata, and documents in such a way that both the readability of the files and the interpretability of their contents are guaranteed. Because long-term archiving is subject to constant technical and socio-cultural change, the use of data formats that are as open and non-proprietary as possible is recommended.
Long-term archiving has three key areas: 1. documenting research data, 2. storing it for the long term, and 3. providing access to the data. [3]
Why should data be archived? [4]
- Verifying, reproducing, and building on research results
- Ensuring the integrity, transparency, and traceability of research
- Preserving data for the scientific community in the long term
- Keeping data usable (readable) and interpretable (understandable) in the long term
What are the challenges of long-term archiving?
- Technological change means that data carriers, file formats, software, and storage locations quickly become inaccessible and unusable.
- Regular checks that keep data usable must be ensured despite staff turnover and project deadlines.
- Provision of technical infrastructure and organizational measures.
- Establishment of workflows and standards (legal issues, quality assurance).
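One way to automate part of the regular checks named in the list above is a fixity check: checksums are recorded once and later recomputed to detect missing or altered files. The following is a minimal sketch, not a substitute for a real digital archive. The function names (`sha256_of`, `build_manifest`, `verify_manifest`) and the idea of keeping a manifest per data directory are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose content changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = data_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

In practice the manifest would be stored separately from the data (for example as a JSON file) and `verify_manifest` would be run on a schedule, so that the check does not depend on any one person remembering it.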
How can these challenges be solved?
- Comprehensive documentation and description using metadata, use of metadata standards
- Use of file formats that are compatible and suitable for long-term archiving and enable loss-free conversion to alternative formats
- Use of open file formats (openly documented, traceable specifications, manufacturer-independent, usable with different programs)
- Long-term archiving carried out by renowned data institutes/research data centers, so that researchers do not have to ensure the above requirements themselves
- Use of trustworthy, preferably certified archives
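The recommendation to use open file formats can be supported by a simple pre-archiving check that lists files whose extension is not on an agreed allow-list. The sketch below assumes a small, purely illustrative allow-list (`OPEN_FORMAT_EXTENSIONS`) and a hypothetical helper `flag_unlisted_formats`; a real project would take its list from a format catalog or from the requirements of the chosen archive.

```python
from pathlib import Path

# Assumption: an illustrative allow-list, not an official catalog of
# archival formats. Adapt it to the requirements of the target archive.
OPEN_FORMAT_EXTENSIONS = frozenset(
    {".csv", ".txt", ".xml", ".json", ".png", ".tif", ".tiff"}
)


def flag_unlisted_formats(
    data_dir: Path, allowed: frozenset[str] = OPEN_FORMAT_EXTENSIONS
) -> list[str]:
    """Return relative paths of files whose extension is not in `allowed`.

    The comparison is case-insensitive. A file extension is only a hint:
    it does not prove that the content is valid for that format.
    """
    flagged = []
    for p in sorted(data_dir.rglob("*")):
        if p.is_file() and p.suffix.lower() not in allowed:
            flagged.append(p.relative_to(data_dir).as_posix())
    return flagged
```

Files that are flagged are candidates for conversion to an open format before archiving, or for a documented exception if no suitable open format exists.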
Legal aspects & ethics
Various legal aspects must be taken into account during the data archiving phase (see the excursus on legal aspects in RDM). Copyright law plays a particularly important role. In the digital space, any reproduction that may affect the rights of authors, owners, publishers, etc. is relevant to copyright law. It is therefore essential to clarify copyrights and obtain the appropriate permissions before archiving data. When personal data is archived, the consent of the persons concerned is required. It must also be determined after what period of time this data will be deleted in order to comply with data protection requirements. Anonymization is mandatory in order to protect the privacy of those affected and to comply with legal requirements.
Contract law regulates how long data may or must be stored. It is important to follow these contractual requirements precisely in order to avoid legal conflicts and ensure the integrity of the archived data.
[1] Deutsche Forschungsgemeinschaft (2022). Leitlinien zur Sicherung guter wissenschaftlicher Praxis. Kodex. https://doi.org/10.5281/zenodo.6472827, Guideline 17: Archiving
[2] https://www.tu-braunschweig.de/forschung/forschungsdaten-transparenz/forschungsdaten/grundkurs-forschungsdatenmanagement/archivieren-publizieren-und-teilen-von-forschungsdaten/archivieren
[3] https://blog.rwth-aachen.de/forschungsdaten/2023/08/24/forschungsdaten-archivieren-publizieren
[4] https://www.uni-kassel.de/forschung/forschungsdatenmanagement/daten-managen/daten-archivieren-und-publiziere
Further information
Digital long-term archiving service at TIB: Archived objects and offerings
TIB Technical Information Library: Digital long-term archiving at the TIB. https://www.tib.eu/de/publizieren-archivieren/digitale-langzeitarchivierung
Excursus: Finding and selecting repositories
Information on suitable file formats for long-term archiving
Long-term archiving: Nestor Competence Network for Digital Long-term Archiving
nestor is a registered association whose members come from various fields and work on the topic of “digital long-term archiving.” Professionals collaborate in various working groups, e.g., on topics such as AV media, digital preservation, or research data.
The nestor wiki offers a basic introduction to long-term archiving and information on standardization.
Upon successful review, nestor awards the nestor seal for trustworthy digital long-term archives.
Long-term archiving: Overview of archival file formats
KOST Coordination Office for the Permanent Archiving of Electronic Documents: Catalog of archival file formats: https://kost-ceco.ch/cms/kad_main_de.html
Standards for long-term archiving by GESIS (Social Science Data Archive)
GESIS Leibniz Institute for the Social Sciences: Long-term archiving.
Standards and discipline-specific solutions for long-term archiving
Altenhöner, R. and Oellers, C. (2012): Long-term archiving of research data. Standards and discipline-specific solutions. Scivero Verlag, ISBN 978-3-944417-00-4. https://www.konsortswd.de/publikation/langzeitarchivierung-von-forschungsdaten
