1M.1.FULLSTACKS: Common IDs and Minimal Metadata Agreement

Coordinated by Argon


This document outlines how the Data Commons will build and expand our cross-cut metadata model using BDBags and MinIDs, so that data owners and others can begin ‘harmonizing’ the Phase I data. It describes the methods we will use in Phase I to record what a given variable means in a given dataset and how that variable relates to variables in other datasets.

What they achieved

This is a ‘living document’ and is subject to updates. In this iteration, the consortium settled on the DAta Tag Suite (DATS) as its initial metadata model, which supports extensive tracking of high-level, dataset-level metadata.

Why is this valuable?

One of the core requirements for searching across multiple datasets and repositories is knowing how variables map to one another. If ‘weight’ means adult weight in kilograms (kg) in one repository, adult weight in pounds (lbs) in another, and birthweight in a third, a researcher who treats these as the same variable will bias their analysis. A clear metadata model tells the computer which variables can be pooled across repositories and what transformations, if any, the data need before pooling.
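To make the ‘weight’ example concrete, here is a minimal Python sketch, assuming each repository annotates its variables with a shared concept label and a unit. The `VariableMeta` class, the concept names (`adult_body_weight`, `birth_weight`) and the conversion table are hypothetical illustrations, not part of DATS or of the consortium's model. Variables are treated as poolable only when their concepts match, and values are converted to a canonical unit (kilograms here) before being combined.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical per-repository annotation; field names are illustrative, not DATS.
@dataclass(frozen=True)
class VariableMeta:
    repository: str  # where the column lives
    name: str        # local column name, e.g. "weight"
    concept: str     # shared concept the column measures
    unit: str        # unit the values are recorded in


# Assumed factors converting (concept, unit) into the canonical unit (kg).
TO_CANONICAL = {
    ("adult_body_weight", "kg"): 1.0,
    ("adult_body_weight", "lb"): 0.45359237,  # exact definition of the pound
    ("birth_weight", "kg"): 1.0,
    ("birth_weight", "g"): 0.001,
}


def can_pool(a: VariableMeta, b: VariableMeta) -> bool:
    """Two variables may be pooled only if they measure the same concept."""
    return a.concept == b.concept


def to_canonical(meta: VariableMeta, values: list[float]) -> list[float]:
    """Convert raw values into the canonical unit for the variable's concept."""
    factor = TO_CANONICAL[(meta.concept, meta.unit)]
    return [v * factor for v in values]


def pool(columns: list[tuple[VariableMeta, list[float]]]) -> list[float]:
    """Combine columns from several repositories, refusing mismatched concepts."""
    first = columns[0][0]
    for meta, _ in columns[1:]:
        if not can_pool(first, meta):
            raise ValueError(
                f"{meta.repository}.{meta.name} measures {meta.concept}, "
                f"not {first.concept}"
            )
    pooled: list[float] = []
    for meta, values in columns:
        pooled.extend(to_canonical(meta, values))
    return pooled
```

With this sketch, adult weights recorded in kilograms and in pounds pool cleanly after conversion, while pooling either with a birthweight column raises an error instead of silently mixing the two concepts under the shared name ‘weight’.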