Research Data Management

inSileco & ArcticNet

2025-09-03

Webinar Structure

  1. Welcome (5 min)
  2. Why (10 min)
  3. What (10 min)
  4. When & Who (10 min)
  5. How-to Guide (45 min)
  6. Future (10 min)
  7. Q&A (10 min)

Welcome

About us

inSileco & ArcticNet (since 2023)

  • Develop criteria for project Data Management
  • Review and provide feedback on project Data Management Plans
  • Support researchers with RDM practices and tools
  • Maintain and expand ArcticNet’s long-term data archive
  • Deliver training and capacity building (e.g. this webinar)

Why

Data & Open Science

10 min

More than papers

  • More than papers ➡️ Crossref Event Data
  • Researchers produce a diversity of artefacts
    • 🫙 protocols
    • 💻 code
    • 💾 datasets (observations, measurements, etc.)
    • 📢 public debates
  • Datasets are
    • getting bigger
    • getting more diversified
    • now published standalone

What is “data”?

  • Broad definition: recorded information supporting research findings
    • could include various artefacts, e.g., 💻 code
  • Data can be qualitative or quantitative
  • Data = information in a form that can be processed
    • often this means a “structured collection of bits”
    • the structure = format
    • spreadsheet, images, etc.
  • Useful lens: the 5 V’s of Big Data
    • Volume, Variety, Velocity, Veracity, and Value

Data explosion (Volume)

GBIF: Global Biodiversity Information Facility

  • New questions & horizons ➡️ more data
  • Powerful technologies/tools enabling unprecedented data collection
  • Ex: Large Hadron Collider ➡️ ~40 Zettabytes in 2017
  • Ex: GBIF
    • 125 million records in 2007
    • 1.6 billion in 2020
    • 1,150% increase in just 13 years

Data heterogeneity (Variety)

  • Different objects, storage formats, technologies
  • Data vary widely across and within disciplines
  • Lack of standards hinders integration and reuse
  • Legacy practices (e.g., local storage) limit access and preservation
  • Especially prevalent in large Interdisciplinary Research Programs

Benefits (Value)

  • We need reliable data to better understand and predict
    • anticipate/mitigate future changes
    • Ex: good assessment of temperature and precipitation change
  • Some data are hard to collect
    • Arctic Data are good examples
  • Data are precious for future generations
    • We cannot collect past data

Open Science

  • Open Science
    • research ➡️ transparent and accessible
  • Open Access
    • publishing ➡️ results freely available
  • Reproducible research
    • practices ➡️ trust & verification
  • Increasingly drives journal & funder expectations
  • Sets the stage for today’s policies and compliance frameworks
  • Requires robust data stewardship

Why this matters for you

IRP: Interdisciplinary Research Program

  • Data is now a primary research output
  • Reduces risk of data loss or inaccessibility
  • Proper management increases visibility & citations
  • Strengthens compliance with funders & journals
  • Builds a foundation for collaboration in IRPs
    • data may be reused in unexpected ways by colleagues

Key takeaways

  • Research outputs go beyond papers ➡️ data is central
  • The explosion & heterogeneity of data offer new horizons and create new technical challenges
  • The Open Science movement drives funder and journal expectations
  • For IRPs, effective data management enables collaboration, compliance, and visibility
  • Data management is not just compliance — it is a path to better science

What

Definition, Policies & Benefits

10 min

What is RDM?

RDM: Research Data Management

  • Active management of research data across its lifecycle
  • Includes planning, documentation, storage, preservation, sharing
  • Ensures data are usable, accessible, trustworthy
  • Encompasses both technical practices and governance

Benefits of RDM

  • Greater visibility & citations for datasets
  • Reduced risk of loss (backups, repositories)
  • Stronger collaboration & integration across teams and institutions
  • Improved efficiency via organized workflows
  • Network-level governance fosters new synergies
  • Enhanced credibility & compliance with funders

RDM in IRPs

  • Scale & complexity: multiple projects, teams, and disciplines
  • Heterogeneity: diverse data types, methods, and formats
  • Collaboration: shared datasets across institutions & regions
  • Continuity: long program lifespans require robust preservation
  • Accountability: funder compliance + community expectations
  • Opportunities: well-managed data fosters reuse, integration, and new insights

Tri-Agency RDM Policy (2021)

  • Applies across NSERC, SSHRC, CIHR
  • Institutions must develop and publish institutional RDM strategies
  • Researchers are expected to:
    • Prepare and maintain Data Management Plans
    • Deposit data in trusted repositories when appropriate
  • Ensures Canadian research aligns with international open science practices
  • Compliance increasingly linked to funding requirements

ArcticNet’s Policy (2025)

More details available in the How-to Guide

Objectives:

  • Apply best practices in data stewardship (national & international standards)
  • Maximize value through accessibility, reuse, and transparency
  • Encourage collaboration and responsible data sharing
  • Provide guidance for sensitive data
  • Respect Indigenous data sovereignty

Key takeaways

  • RDM = active management of research data across its lifecycle
  • Benefits are real: visibility, efficiency, compliance, and collaboration
  • IRPs face special challenges: scale, heterogeneity, collaboration, continuity
  • Policies matter: Tri-Agency RDM Policy (national) and ArcticNet ADMP (network) set the expectations

When & Who

Timeline, Roles & Responsibilities

10 min

Why timing matters in IRPs

  • IRPs = distributed ecosystems ➡️ diverse goals, methods, and data practices
  • Effective RDM must start early and continue throughout the program
  • Early planning unlocks future reuse & collaboration
  • Data as an infrastructure for collaboration

Timeline and dual responsibilities

Network: balance autonomy & coordination

  • Standards & templates
  • Tools for metadata & discovery
  • Review, feedback & training
  • Synthesize & report

Timeline and dual responsibilities

Researchers: manage and document project data responsibly

  • Proposal & tentative data management plan
  • Develop and maintain project-level data management plan
  • Collect & document
  • Analyze
  • Archive


Benefits of shared responsibility

  • For Researchers
    • Reduced administrative burden (network reviews, templates, tools)
    • Increased visibility and citations for datasets
    • Easier compliance with funder requirements
    • Improved data reuse and discovery
    • Unexpected collaborations & new insights
  • For the Network & IRP
    • Better integration of diverse datasets
    • Ability to track collaboration & impact
    • Stronger collective legacy beyond the program
    • Data infrastructure that supports future research

Key takeaways

  • Researcher responsibilities are central but part of a bigger system
  • Timing matters: plan early, update often, archive at the end
  • Network-level support transforms isolated datasets into drivers of collaboration
  • By speaking a common metadata language, IRPs ensure outputs remain visible, connected, and reusable

How-to Guide

Building your Data Management Plan

45 min

What is a DMP?

DMP: Data Management Plan

A Data Management Plan (DMP) is a formal document, typically 1-2 pages long, that outlines how data will be handled during and after a project.

What is a DMP?

DMP: Data Management Plan

Benefits:

  • Required by many funders, including Tri-Agency and ArcticNet.
  • Ensures feasibility of research proposals
  • Demonstrates responsible stewardship of public funds
  • Sets expectations for storage, sharing, and preservation
  • Foundation for good collaboration and reuse
  • Easier compliance with certain journals
  • Improved visibility and citations for datasets

Why DMPs matter in IRPs

  • IRPs = distributed ecosystems ➡️ diverse goals, data, practices
  • Collective DMPs give visibility into expected outputs
  • Enable early coordination of standards and tools
  • Reveal overlaps, synergies, and cost-sharing opportunities
  • Reduce duplication and improve program coherence

ArcticNet Principles

In other words: what is expected of you as a researcher

  • ArcticNet funded data = a public good ➡️ as open as possible, as closed as necessary
  • Researchers must ensure:
    • Timely sharing ➡️ data made publicly available quickly, unless restricted
    • Publish metadata ➡️ publish and share your metadata (e.g. Polar Data Catalog)
    • Respect for Indigenous rights ➡️ uphold Inuit, First Nations, and Métis ownership, access, and control (CARE, OCAP®, NISR)
    • Citable & preserved ➡️ data should be publishable, citable, and preserved when appropriate
    • Interoperability & connectivity ➡️ link with Canadian & international Arctic data systems, avoid duplication
    • Best practices ➡️ follow ethical, legal, cultural, and funder requirements; use existing infrastructure where possible
    • Support & guidance ➡️ researchers engage with training, outreach, and resources provided

Key sections of a DMP

Answer these questions with substance and you will have a complete DMP:

  1. Data collection ➡️ What data, formats, volume, protocols?
  2. Documentation & metadata ➡️ How will data be described? Which standards?
  3. Storage & protection ➡️ Where will working data live, and how is it protected?
  4. Data Analysis ➡️ How will the data be analyzed?
  5. Preservation & archiving ➡️ Which repository, which formats for long-term?
  6. Sharing & reuse ➡️ Who can access it, when, under what license?
  7. Legal & ethics ➡️ How are legal, privacy, consent, Indigenous data rights addressed?
  8. Roles & resources ➡️ Who is responsible and what infrastructure is needed?

Tools and templates

Good practices

Tips to DMP by

  • Start early ➡️ draft DMP in the proposal stage
  • Treat it as a living document ➡️ update as project evolves
  • Reuse existing metadata forms / standards where possible (more on this later)
  • Keep it concise but actionable
  • Align with FAIR, CARE & TRUST principles

FAIR Principles

  • (F) Findable
  • (A) Accessible
  • (I) Interoperable
  • (R) Reusable

Goals:

  • Make data easy to discover through rich metadata
  • Ensure data can be accessed under clear conditions
  • Promote interoperability across disciplines & tools
  • Enable reuse through licenses & clear documentation

CARE Principles

  • (C) Collective Benefit
  • (A) Authority to Control
  • (R) Responsibility
  • (E) Ethics

Goals:

  • People- and purpose-oriented
  • Indigenous data rights and governance
  • Inspired by OCAP®
  • Complement FAIR Principles

TRUST Principles

  • (T) Transparency
  • (R) Responsibility
  • (U) User Focus
  • (S) Sustainability
  • (T) Technology

Goals:

  • Build confidence in digital repositories
  • Ensure authenticity, integrity, and reliability of data
  • Prioritize the needs of user communities
  • Guarantee long-term preservation and accessibility
  • Provide secure, persistent, and interoperable infrastructure

Practical Guide

DMP Checklist

Goal

  • Equip researchers with concrete steps to manage data responsibly, efficiently, and in line with network & funder expectations.
  • At the end, you should know what steps to undertake to prepare and update an adequate Data Management Plan

DMP Checklist

Data collection

Documentation & metadata

Storage & protection

Data Analysis

Preservation & archiving

Sharing & reuse

Legal & ethics

Roles & resources

Data collection

Data collection

Guiding Questions

  • What kinds of data will I collect?
  • In which formats? Open or proprietary?
  • How much data do I expect to generate?
  • Which instruments, sensors, or methods will I use?
  • How will I ensure quality control before, during, and after collection?
  • How will I organize and label files?

Data collection

Core Elements

  • Types of data ➡️ observational, experimental, computational, derived
  • Formats & standards ➡️ e.g., CSV, NetCDF, TIFF, Darwin Core, ISO 19115
  • Volume & scale ➡️ estimated GB/TB, number of samples/records
  • Collection methods & instruments ➡️ field protocols, sensors, lab assays, software pipelines
  • Quality assurance / quality control ➡️ calibration, duplicate samples, error-checking
  • Organization & naming ➡️ consistent file/folder naming, controlled vocabularies
  • Documentation ➡️ field sheets, lab notebooks, e-logs, linking samples to metadata

Data collection

Some notes on file formats

Avoid Proprietary & Unsuitable Formats

Not all formats are sustainable for long-term research data. Avoid using:

  • Proprietary formats: require specific software that may become unavailable (ex. .xlsx, .shp, .sav, .psd, .docx with macros)
  • Formats with strong version-dependence: older/newer versions may be unreadable without exact software (ex. ArcGIS-only file types)
  • Compressed / lossy formats: reduce data quality and limit reuse (ex. .jpg, .mp3)
  • Encrypted or password-protected files: block discovery, reuse, and preservation workflows

Rule of thumb: if a file requires special software, or might lose information when saved, it’s not a good archival format.

Preferred Open Formats by Data Type

  • Tabular data ➡️ CSV, Parquet

  • Spatial data ➡️ GeoPackage, GeoTIFF, NetCDF

  • Images ➡️ TIFF (uncompressed), PNG

  • Audio / Video ➡️ WAV, MP4 (H.264 codec)

  • Text / Documents ➡️ TXT, PDF/A, XML, JSON

  • Metadata ➡️ XML, JSON, standardized schemas (e.g., ISO 19115, Darwin Core)

  • Choose formats that are:

    • Open & non-proprietary
    • Well-documented & widely supported
    • Sustainable for long-term preservation
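
For example, a minimal sketch of converting a proprietary spreadsheet into an open CSV copy (assuming pandas is installed and a hypothetical file survey.xlsx; reading .xlsx also requires the openpyxl package):

import pandas as pd

# Hypothetical example: export a proprietary .xlsx sheet to an open, plain-text CSV
df = pd.read_excel("survey.xlsx", sheet_name=0)          # first sheet of the workbook
df.to_csv("survey.csv", index=False, encoding="utf-8")   # open format, widely readable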

Data collection

What are data standards?

  • Define how data is structured and formatted
  • Ensure datasets are consistent and interoperable across projects
  • Often discipline-specific ➡️ adopt community norms
  • Examples:
    • Darwin Core ➡️ biodiversity occurrence records
    • NetCDF ➡️ climate & oceanographic data
    • FASTQ ➡️ DNA sequencing reads
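
As an illustration, a minimal sketch of a Darwin Core-style occurrence table built with pandas (the column names are standard Darwin Core terms; the record values are hypothetical):

import pandas as pd

# One hypothetical occurrence record using standard Darwin Core column names
occurrences = pd.DataFrame([{
    "occurrenceID": "lakeC-2025-0001",
    "scientificName": "Salvelinus alpinus",
    "eventDate": "2025-03-01",
    "decimalLatitude": 68.35,
    "decimalLongitude": -133.72,
    "basisOfRecord": "HumanObservation",
}])
occurrences.to_csv("lakeC_occurrences_2025-03-01_v1.csv", index=False)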

Data standards vs Metadata standards

Data standards are closely related to metadata standards, but they serve different purposes.

See next section for Metadata standards.

Data collection

Quality Assurance / Quality Control (QA/QC)

  • Before collection ➡️ instrument calibration, standardized protocols
  • During collection ➡️ duplicate/triplicate samples, control samples, field blanks
  • After collection ➡️ validation checks, error detection, version tracking
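
For the after-collection step, a minimal sketch of automated validation checks (assuming pandas; the file and column names are hypothetical examples):

import pandas as pd

df = pd.read_csv("lakeC_temperature_2025-03-01_v1.csv")

n_missing = df["temperature_c"].isna().sum()                          # missing measurements
n_duplicates = df.duplicated(subset=["station", "datetime"]).sum()    # repeated records
out_of_range = df[~df["temperature_c"].between(-40, 40)]              # implausible values

print(f"missing: {n_missing}, duplicates: {n_duplicates}, out of range: {len(out_of_range)}")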

Data collection

Organization & naming

  • Use consistent, descriptive file & folder names
  • Avoid spaces/special characters ➡️ use _ or -
  • Include versioning & dates (e.g., projectA_samples_2025-03-01_v1.csv)
  • Organize folders by project/study/site/date rather than by researcher’s preference
  • Use controlled vocabularies / ontologies where available ➡️ interoperability


Do & Don’t

  • Do: lakeC_fieldnotes_2025-03-01_v2.csv
  • Don’t: data latest & updated.xlsx
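
A minimal sketch of composing such a file name programmatically (the project name, description, and version below are hypothetical):

from datetime import date

def make_filename(project: str, description: str, version: int, ext: str = "csv") -> str:
    # ISO 8601 dates sort chronologically; underscores avoid spaces and special characters
    today = date.today().isoformat()
    return f"{project}_{description}_{today}_v{version}.{ext}".replace(" ", "_")

print(make_filename("lakeC", "fieldnotes", 2))   # e.g. lakeC_fieldnotes_2025-03-01_v2.csv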

Documentation & metadata

Documentation & metadata

Guiding Questions

  • How will I document my data so that others (or my future self) can understand it?
  • Which metadata standard(s) will I use?
  • Where and how will metadata be created and stored?
  • When will metadata be created and updated?
  • How will I ensure metadata uses controlled vocabularies and persistent IDs?

Documentation & metadata

Core Elements

  • Documentation practices ➡️ lab/field notebooks, data dictionaries, README files, protocols
  • Metadata standards ➡️ Dublin Core, ISO 19115, Darwin Core, DataCite Schema
  • Tools & platforms ➡️ Polar Data Catalogue, FRDR, ISA tools
  • Timing ➡️ start at project onset, update regularly, finalize at archiving
  • Interoperability ➡️ controlled vocabularies, ontologies, persistent identifiers (DOI, ORCID, ROR)

Documentation & metadata

What are metadata & metadata standards?

  • Define how datasets are described (the context, not the content)
  • Ensure data are findable, interpretable, and reusable
  • Provide consistent fields for who, what, where, when, how
  • Examples:
    • Dublin Core ➡️ general-purpose descriptors
    • ISO 19115 ➡️ geospatial metadata
    • Darwin Core ➡️ biodiversity metadata
    • DataCite Schema ➡️ dataset metadata for DOIs

Metadata standards vs Data standards

Metadata standards describe the data itself (context & discovery), while data standards define how the data is structured. Together, they ensure interoperability and reuse.

Documentation & metadata

Dublin Core: What is it?

  • A generic metadata standard used across disciplines
  • Provides a basic set of 15 elements to describe digital objects
  • Focused on: who, what, where, when
  • Works across repositories, making datasets findable and shareable

Core elements (examples):
- title, creator, subject, date, format, identifier

💡 Often extended with qualifiers to add more precision

Documentation & metadata

Dublin Core: The Grammar

  • Based on element–value pairs
    • Element = the property being described
    • Value = the information recorded
  • Syntax is machine-readable (XML, JSON) but also human-readable
  • Flexible: can be embedded in repositories, DOIs, web pages

Example pattern:
- dc:title ➡️ “ArcticNet Water Sampling Data 2025”
- dc:creator ➡️ “Smith, J.”
- dc:date ➡️ “2025-04-15”

Documentation & metadata

Dublin Core: Example Record

<record>
  <dc:title>ArcticNet Water Sampling Data 2025</dc:title>
  <dc:creator>Smith, J.</dc:creator>
  <dc:subject>Oceanography</dc:subject>
  <dc:date>2025-04-15</dc:date>
  <dc:format>CSV</dc:format>
  <dc:identifier>doi:10.12345/abcd</dc:identifier>
</record>
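
A minimal sketch of building the same record programmatically with the Python standard library (the dc: namespace URI is the standard DCMI one; the values simply mirror the example above):

import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"          # Dublin Core element set namespace
ET.register_namespace("dc", DC)

record = ET.Element("record")
for element, value in [
    ("title", "ArcticNet Water Sampling Data 2025"),
    ("creator", "Smith, J."),
    ("subject", "Oceanography"),
    ("date", "2025-04-15"),
    ("format", "CSV"),
    ("identifier", "doi:10.12345/abcd"),
]:
    ET.SubElement(record, f"{{{DC}}}{element}").text = value

print(ET.tostring(record, encoding="unicode"))   # serializes to the XML shown above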

Documentation & metadata

ArcticNet’s requirements

  • Starting year 2, researchers must provide links to metadata records in recognized repositories
  • Metadata must be openly accessible
  • Funding will be withheld if metadata records are missing or inaccessible
  • For Indigenous-owned data ➡️ researchers must identify the organization responsible for storing and managing it
    • Metadata publication is still required

ArcticNet’s commitment

Role:

  • Define metadata standards for projects
  • Support researchers in preparing metadata
  • Provide tools/templates to ease metadata submission

Initiatives

  • Working with Polar Data Catalogue (PDC) to host project metadata
  • Providing a PDC metadata template
  • Offering support for preparation and submission

Storage & protection

Storage & protection

Guiding Questions

  • Where will data be stored during the project?
  • How will it be protected (backups, encryption, access controls)?
  • Who is responsible for managing storage & protection?
  • How long must the data be kept?

These answers ensure your DMP covers both security and reliability.

Storage & protection

Core Elements

  • Storage location ➡️ institutional servers, certified cloud storage, external media
  • Backup strategy ➡️ frequency, number of copies, locations (3-2-1 rule)
  • Access & security ➡️ permissions, authentication, encryption
  • File integrity ➡️ checksums, error detection
  • Scalability ➡️ projected storage needs (GB/TB)
  • Costs & resources ➡️ who pays and what infrastructure is provided
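
To support backups and file integrity, a minimal sketch that writes a SHA-256 checksum manifest for a data folder (the folder name is a hypothetical example):

import hashlib
from pathlib import Path

def sha256sum(path: Path) -> str:
    # Read in chunks so large files do not have to fit in memory
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Record one checksum line per file; later copies can be verified against this manifest
with open("checksums.txt", "w") as manifest:
    for file in sorted(Path("project_data").rglob("*")):
        if file.is_file():
            manifest.write(f"{sha256sum(file)}  {file}\n")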

Storage & protection

Good Practices

  • Prefer institutional or certified storage over personal laptops/USBs
  • Use encrypted storage for sensitive data
  • Automate backups whenever possible
  • Document storage practices clearly in the DMP
  • Plan ahead for long-term preservation (more on this soon)

Special Considerations

  • Sensitive / Indigenous data ➡️ use community-approved safeguards, respect sovereignty
  • Large volumes / “big data” ➡️ address infrastructure, costs, specialized servers
  • Fieldwork constraints ➡️ describe temporary solutions (field laptops, portable drives) and how data will be secured until upload

Data analysis

Data analysis

Guiding questions

  • What software, tools, or pipelines will be used?
  • How will analysis steps be documented?
  • How will you ensure reproducibility?

Short but important: show how your analysis is transparent and trustworthy.

Data analysis

Core elements

  • Software & tools ➡️ R, Python, MATLAB, ArcGIS, QGIS (note open vs proprietary)
  • Workflow documentation ➡️ scripts, Jupyter notebooks, R Markdown, Quarto
  • Reproducibility ➡️ version control (GitHub, GitLab), containers (Docker)
  • Data transformations ➡️ raw vs processed datasets, cleaning methods, derived outputs
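
For reproducibility, a minimal sketch that records the Python and package versions used in an analysis (the package list is a hypothetical example; requirements files or containers achieve the same goal more completely):

import sys
from importlib.metadata import version

packages = ["pandas", "numpy"]                       # packages used in the analysis
with open("environment.txt", "w") as f:
    f.write(f"python {sys.version.split()[0]}\n")    # interpreter version
    for pkg in packages:
        f.write(f"{pkg} {version(pkg)}\n")           # installed package versions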

Data analysis

Good Practices

  • Prefer open-source tools when feasible
  • Share analysis scripts with your datasets
  • Keep raw and processed data separate
  • Document assumptions, parameters, and software versions

Builds trust, efficiency, and long-term usability of results

A note on reproducibility

Preservation & archiving

Preservation & archiving

Guiding Questions

  • Where will data be deposited for long-term preservation?
  • Which repository/platforms (institutional, national, disciplinary)?
  • How will datasets be cited and versioned?
  • What is the expected retention period?
  • How will sensitive/Indigenous data be preserved responsibly?

Goal: ensure your data remain usable and accessible well beyond the project

Preservation & archiving

  • Trusted repositories ➡️ Polar Data Catalogue, FRDR, Nordicana-D, Dryad, Zenodo, GBIF, OBIS
  • Preservation formats ➡️ CSV, NetCDF, GeoTIFF, JSON (avoid lossy formats like JPEG, MP3)
  • Metadata ➡️ complete, standardized (ISO 19115, Dublin Core, Darwin Core)
  • Licensing ➡️ CC-BY, CC0, or custom terms
  • Persistent identifiers ➡️ DOIs, ARKs for datasets
  • Retention period ➡️ typically ≥ 5–10 years, ideally indefinite

Preservation & archiving

Good Practices

  • Deposit data at publication time, not years later
  • Archive raw and processed data, link to analysis scripts
  • Use repository versioning features instead of manual file names
  • Assign clear licenses and access conditions (open, embargoed, restricted)
  • Ensure alignment with FAIR & CARE principles

Special Considerations

  • Sensitive data ➡️ anonymization, restricted access, secure long-term storage
  • Indigenous data sovereignty ➡️ respect CARE, OCAP®, NISR, community protocols
  • Large volumes ➡️ consider specialized repositories, HPC, or cloud archives

Preservation & archiving

Persistent Identifiers (PIDs)

  • What they are: unique, permanent digital references for research objects, people, and institutions.

  • Examples:

    • DOI ➡️ datasets, publications
    • ORCID ➡️ researchers
    • ROR ➡️ institutions
    • ARK / Handle ➡️ digital objects
  • Why important in DMPs & RDM?

    • Ensure long-term findability and access
    • Enable unambiguous attribution (linking people, projects, data)
    • Facilitate interoperability across repositories and systems
    • Support impact tracking and reuse metrics

Think of PIDs as the “barcodes” of research
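
As an illustration of how PIDs support reuse, a minimal sketch that resolves a DOI through content negotiation at doi.org to retrieve machine-readable metadata (assuming the requests package; the DOI below is the placeholder from the Dublin Core example, not a real dataset):

import requests

doi = "10.12345/abcd"                                               # placeholder DOI
headers = {"Accept": "application/vnd.citationstyles.csl+json"}     # ask for CSL JSON metadata
response = requests.get(f"https://doi.org/{doi}", headers=headers, timeout=30)
if response.ok:
    print(response.json().get("title"))                             # dataset title from metadata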

Preservation & archiving

Data repositories

  • Institutional ➡️ university libraries, research data services
  • National ➡️ Federated Research Data Repository (FRDR), Nordicana-D
  • Disciplinary ➡️ Polar Data Catalogue, GBIF, OBIS, GenBank, ICPSR
  • General-purpose ➡️ Zenodo, Dryad, Figshare, Dataverse

Choose a repository that is:

  • Trusted (certified, long-term sustainability)
  • FAIR-aligned (metadata standards, PIDs)
  • Appropriate for your data type & community

Preservation & archiving

ArcticNet’s requirements

  • No centralized ArcticNet repository ➡️ projects choose suitable long-term repository
  • Prefer certified, open-access options (PDC, Nordicana-D, GBIF, OBIS, FRDR)
  • Deposit all data and metadata supporting results
  • Plan early, use non-proprietary formats (CSV, TIFF, NetCDF)
  • Retain data as long as required by stakeholders and funders
  • State in DMP what will be preserved and any restrictions

ArcticNet’s guidance

  • Researchers decide the most appropriate repository for their discipline and data type
  • Focus on repository sustainability, DOIs, and open access
  • Preservation can include raw, processed, and derived data when valuable
  • Sensitive or Indigenous data may need restricted access or safeguards
  • Rationale for retention and preservation must be clear in the DMP

Sharing & reuse

Sharing & reuse

Guiding Questions

  • Who can access the data, and when?
  • How will data be cited?
  • What license will govern reuse?
  • Are there ethical, legal, or cultural restrictions?

Goal: make data available in a way that is clear, usable, and responsible

Sharing & reuse

Core Elements

  • Access conditions ➡️ open, embargoed, or restricted
  • Licensing ➡️ CC-BY, CC0, or custom terms
  • Citation practices ➡️ DOIs or other PIDs, recommended citation text
  • Documentation ➡️ metadata and README ensure others can reuse data
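
A minimal sketch of assembling a recommended citation from basic dataset metadata (all values are hypothetical placeholders):

dataset = {
    "creator": "Smith, J.",
    "year": 2025,
    "title": "ArcticNet Water Sampling Data 2025",
    "repository": "Polar Data Catalogue",
    "doi": "10.12345/abcd",
}

# Creator (Year). Title [Data set]. Repository. DOI link
citation = (f"{dataset['creator']} ({dataset['year']}). {dataset['title']} [Data set]. "
            f"{dataset['repository']}. https://doi.org/{dataset['doi']}")
print(citation)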

Sharing & reuse

Licensing Your Data

  • A license tells others how they can use your data
  • Common choices:
    • CC-BY ➡️ use with attribution
    • CC0 ➡️ no restrictions (public domain)
    • Custom agreements ➡️ for sensitive, Indigenous, or commercial data

Clearly state the license in your metadata, README, or repository record

Sharing & reuse

Good Practices

  • Use repositories that support DOIs and licensing
  • Publish data papers or cite dataset DOIs in articles
  • Link data to publications, code, and related datasets
  • Be transparent about conditions of reuse

Special Considerations

  • Sensitive or Indigenous data ➡️ respect CARE, OCAP®, and community protocols
  • Commercially sensitive data ➡️ embargoes or restricted access
  • Collaborations ➡️ phased sharing (internal first, open later)

Sharing & reuse

ArcticNet’s requirements

  • Data must be findable, accessible, interoperable, and reusable (FAIR)
  • Metadata published early in a recognized catalogue (e.g., re3data, PDC, FRDR)
  • Deposit data in a trusted repository with persistent identifiers (DOIs)
  • Users must cite and acknowledge data creators
  • Any restrictions (sensitive, Indigenous, security) must be justified in the DMP

ArcticNet’s guidance

  • Make data available as openly and quickly as possible, with minimal delay
  • “As open as possible, as closed as necessary” (ethical and legal considerations)
  • Indigenous and sensitive data require safeguards, informed consent, and respect for sovereignty (CARE, OCAP®, NISR)
  • Embargoes or restricted access may apply, but must be transparent and time-limited
  • Data access requests should not be unreasonably denied

Roles & resources

Roles & resources

Guiding Questions

  • Who is responsible at each stage of the data lifecycle?
  • What infrastructure and support are available?
  • How do project, network, and institutional roles align?

Roles & resources

Core Elements

  • Responsibility map ➡️ who does what at each lifecycle stage (RACI if helpful)
  • Infrastructure & tools ➡️ storage/backup, repositories, metadata tools, PID services, access provisioning
  • Standards & policies ➡️ data/metadata standards, licensing, sensitive/Indigenous protocols
  • SOPs & workflows ➡️ collection, QA/QC, versioning, handoff to archive
  • Training & support ➡️ what’s offered by ArcticNet/institution, when to use it

Roles & resources

Core Elements

  • Monitoring & reporting ➡️ DMP update cadence, progress reports, breach procedures
  • Budget & time ➡️ storage/archival costs, curation effort
  • Documentation & continuity ➡️ READMEs, onboarding, turnover plan

Roles & resources

Good Practices

  • Assign clear roles (PI, data steward, team members) early
  • Use project checklists to track responsibilities
  • Build DMP updates into project milestones
  • Communicate regularly with ArcticNet’s RDM support team

Future

Emerging Trends & Opportunities

10 min

The Future of RDM in IRPs

  • Large IRPs = diverse teams, methods, data cultures
  • Fully centralized governance is often impractical
  • Future opportunity lies in metadata interoperability
  • Structured, machine-readable metadata enriched with PIDs
  • Foundation for all downstream capabilities

Metadata interoperability first

  • Describe data consistently across projects
  • Use structured fields + persistent identifiers (PIDs)
    • ORCID (people)
    • ROR (institutions)
    • DOI (datasets, publications)
  • Unlocks discovery, integration, and reuse at scale

Modular, not monolithic

  • Strategy: lightweight, interoperable tools
  • Preserve project autonomy while enabling network-wide coordination
  • Shared metadata practices > centralized platforms
  • Example: visualization-led collaboration in other programs

Machine-actionable DMPs (maDMPs)

  • Evolve traditional DMPs into dynamic metadata registries
  • Articulate practices using structured fields & PIDs
  • Program-wide maDMPs can:
    • Feed repositories & dashboards
    • Build PID graphs linking data–people–institutions–funding
    • Expose synergies & track reuse
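
A minimal sketch of what such a machine-actionable fragment could look like, loosely inspired by the RDA DMP Common Standard (field names are simplified and all values are hypothetical):

import json

madmp = {
    "dmp": {
        "title": "Lake C water sampling DMP",
        "contact": {
            "name": "Smith, J.",
            "contact_id": {"identifier": "0000-0000-0000-0000", "type": "orcid"},  # placeholder ORCID
        },
        "dataset": [{
            "title": "ArcticNet Water Sampling Data 2025",
            "dataset_id": {"identifier": "10.12345/abcd", "type": "doi"},          # placeholder DOI
            "distribution": [{"format": "text/csv", "host": {"title": "Polar Data Catalogue"}}],
        }],
    }
}

print(json.dumps(madmp, indent=2))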

New capabilities from interoperability

  • Automated validation ➡️ check datasets archived as planned
  • Semantic search ➡️ discover data by concepts, not just keywords
  • Dashboards ➡️ visualize activity, reuse, collaboration networks
  • Reproducible workflows ➡️ connect metadata with pipelines (RO-Crate, Snakemake, Targets)

AI opportunities for IRPs

  • Structured metadata is the prerequisite for AI at scale
  • Enables automated knowledge graphs of people–data–projects
  • Supports cross-project synthesis and trend discovery
  • Moves RDM from compliance ➡️ intelligence infrastructure

Toward FAIR-by-design IRPs

  • Embed metadata interoperability from project planning onward
  • Ensure data is discoverable, trackable, reusable by default
  • Strengthen attribution and impact reporting
  • Transform IRPs into knowledge hubs, not just funding umbrellas

Key takeaways

  • Metadata interoperability is the foundation
  • Enables discovery, dashboards, reproducibility, AI-driven synthesis
  • Modular, interoperable tools preserve autonomy + foster coordination
  • FAIR-by-design = long-term visibility, collaboration, and impact

Q&A

10 min

Platforms & infrastructure

  • What do they do, and how?
  • Zenodo (https://www.openaire.eu/zenodo-relaunch)
  • Open Science Framework
  • figshare
  • dataverse
  • Siku
  • PDC

Zenodo

  • What is it? How does it work?
  • https://www.openaire.eu/zenodo-relaunch

Dataverse / Borealis

  • What is it? How does it work?
  • https://dataverse.org/

Siku

  • What is it? How does it work?

PDC

  • What is it? How does it work?
  • https://polardata.ca/

Workflow

Sharing and Reuse

  • Who will access the data and when (open, embargoed, restricted)
  • How data will be cited (DOIs or other persistent IDs)
  • Licensing terms (e.g., CC-BY, custom agreements)