Version control - data
Content
In the following you'll find the objectives and materials for each of the topics we'll discuss during this session.
Motivation
Research data management is a core component of good scientific practice. It can make your work not only more reproducible and transparent but also easier, and version control for data can be one part of it. This session introduces DataLad, a data management and data publication tool that builds on version control systems.
If you answer "yes" to any of the following questions, this session will be interesting for you:
Have you ever worked through such a directory?
Is this metaphor fitting to a paper of yours?
Have you ever looked like this while trying to figure out how a colleague's script (or an old script of your own) is supposed to work?
Do you find yourself wondering how to share or publish the data and results of your recent project?
Objectives
Understand why we should not only version control code and other small files, but also data or software
Understand the advantages of distributed version control for data
Get first-hand usage experience with DataLad, and master the following DataLad concepts:
Create and consume datasets
Perform version control on arbitrarily sized digital objects
Link components of a data analysis (code, data, software) together
Run and rerun computationally reproducible data analyses
Questions you should be able to answer based on this lecture
Why should you version control data?
Data changes and evolves just like code or other text-based files. Version controlling data not only structures your projects transparently, it also provides the basis for reproducibility, because it lets you identify data in its precise version. And just as code and manuscripts are often collaborative endeavours that benefit from the features of distributed version control tools, analysing and publishing data are collaborative projects that become easier with streamlined processes for collaboration.
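To make "identifying data in its precise version" more concrete, here is a minimal sketch that uses only the Python standard library. It is an illustration of the general idea, not DataLad's implementation: a checksum of a file's content acts as an identifier that changes whenever the data changes and returns when the data is restored. The file name `measurements.csv`, its contents, and the helpers `content_id` and `demo` are hypothetical and chosen for this example only.

```python
import hashlib
import tempfile
from pathlib import Path


def content_id(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Chunked reading keeps memory use flat, even for large data files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo() -> tuple[str, str, str]:
    """Write a small made-up data file, modify it, then restore it."""
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "measurements.csv"  # hypothetical file name

        data_file.write_text("subject,score\n01,0.5\n")
        original = content_id(data_file)

        # A single changed value yields a different identifier.
        data_file.write_text("subject,score\n01,0.6\n")
        changed = content_id(data_file)

        # Restoring the exact content yields the original identifier again.
        data_file.write_text("subject,score\n01,0.5\n")
        restored = content_id(data_file)

    return original, changed, restored


if __name__ == "__main__":
    original, changed, restored = demo()
    print("changed differs from original:", original != changed)
    print("restored equals original:", original == restored)
```

A version control tool for data adds much more on top of such an identifier, for example history, provenance, and transport of file content, but the sketch shows why a recorded checksum is enough to tell whether two copies of a dataset file are the same version.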
Optional reading/further materials
If you want to learn more about DataLad or research data management in general, there are several major resources:
The DataLad Handbook
Technical forum
Community chat
Additionally, you can find an overview of recorded workshops and past tutorials at github.com/datalad/tutorials.