Glossary of Terms used in the Data Commons Pilot Phase Consortium

  • AGR - Alliance of Genome Resources. One of the primary data sets the DCPPC is working with.

  • API - Application Programming Interface. APIs serve as software-based intermediaries for exchanging data between applications.

  • AWS - Amazon Web Services. A provider of cloud services available on-demand.

  • C4 - Commons Consortium Coordinating Committee, points of contact for DCPPC phase 1 activities.

  • Cloud Computing - Internet-based computing, wherein computing power, networking, storage or applications running on computers outside an organization are presented to that organization in a secure, services-oriented way.

  • Commons Working Group (WG) Co-Chairs - a group of NIH Institute and Center Directors who provide executive-level guidance and direction to the program.

  • Containers - (for example, Docker) a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.

  • COTs - Commons Operations Team. This is made up of NIH employees and contractors who are internally responsible for the Data Commons Pilot Phase program.

  • CWL - Common Workflow Language. An open specification for describing computational workflows that perform sequential operations on data.

  • Data Steward - Members of the TOPMed, GTEx, and AGR communities who are working in the Consortium.

  • DATS - DatA Tag Suite, a model that describes the metadata and the structure for datasets, originally developed to underpin the NIH BD2K Data Discovery Index prototype.

  • DCPPC - Data Commons Pilot Phase Consortium. Executive Summary; Onboarding.

  • Deliverables - Demos and products.

  • Docker - Software for running containers, packaged, portable units of code and dependencies that can be run in the same way across many computers. See also Containers.

  • DGWG - Design Guidelines Working Group

  • DOI - Digital Object Identifier; a code used to permanently and stably identify (usually digital) objects. DOIs provide a standard mechanism for retrieval of metadata about the object, and generally a means to access the data object itself.

  • EPC - External Panel of Consultants. A group of experts who provide guidance and direction to NIH about the program.

  • FAIR - Findable Accessible Interoperable Reusable.

  • FS / Full Stack - one of four teams (Argon, Calcium, Helium, Xenon) responsible for reproducing a full "stack" of software for the Data Commons (see Stack, below).

  • FSWG - Full Stack Working Group.

  • FTP - File Transfer Protocol. FTP is a standard network protocol used for the transfer of computer files between a client and server on a computer network.

  • GA4GH - The Global Alliance for Genomics and Health. International standards organization devoted to enabling genomic data sharing for the benefit of human health.

  • GitHub - An online hub for storing and sharing computer programs and other plain text files. We use it for storage, hosting websites, communication and project management.

  • GTEx - Genotype-Tissue Expression (GTEx) Program. One of the primary data sets the DCPPC is working with.

  • GUID - Globally Unique IDentifier.

  • GWAS - Genome-Wide Association Study, also known as a whole genome association study (WGA study, or WGAS). An observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait.

  • Harmonizing/Harmonization - The process of determining which variables in one study can be treated as the same as variables in another study, especially if the variables have different names and/or units. Sets of datasets where these relationships have been recorded are 'harmonized'.

  • Interoperability - The ability of data or tools from multiple resources to effectively integrate data, or operate processes, across all systems with a moderate degree of effort.

  • JSON - JavaScript Object Notation, or JSON, is a lightweight data-interchange format that is easy for humans to read and write, and easy for machines to parse and generate.

  • Jupyter Notebooks - a web-based interactive environment for organizing data, performing computation, and visualizing output.

  • KC - Key Capability or Key Component. Originally referred to eight targeted development areas for implementing the Data Commons. Each team defined milestones in their proposals designed to achieve these capabilities. The term ‘KC’ now applies to working groups that are collaboratively implementing their milestones.

  • Milestones - Required to implement activities such as Demos and Products. Milestones are tied to teams.

  • MODs - Model Organism Databases. One of the Data Stewards.

  • MVP - Minimum Viable Product. This term is being deprecated in favor of Demos and Products.

  • OAuth - An open standard for authorization for web resources. Used to permit access to resources without requiring passwords to be stored or distributed.

  • OIDC - OpenID Connect, an open standard for user authentication built on top of the OAuth 2.0 standard, used to verify the identity of a user and provide details about the user.

  • OTA - Other Transaction Award, a method by which the NIH is funding the NIH Data Commons - see Other Transaction Award Policy Guide for the NIH Precision Medicine Initiative® Research Programs

  • Products - Resources resulting from the DCPPC that are considered to be deliverables; include standards and conventions, APIs, data resources, websites, repositories, documentation, and training/outreach materials.

  • RFC - Request for Comments - see RFC Process.

  • Sprints - Term of art used in software development, referring to short, iterative cycles of development, with continuous review of code through daily builds and end-of-sprint demos.

  • Stack - Term of art referring to a suite of services that run in the cloud and enable ubiquitous, convenient, on-demand access to a shared pool of configurable computing resources.

  • Team - Groups of people led by a Principal Investigator (PI), or PIs, who will complete milestones and produce deliverables. Each group has been assigned a name taken from an element of the periodic table.

  • TOPMed - Trans-Omics for Precision Medicine, one of the primary data sets the DCPPC is working with.

  • User epic - a small chunk of a user narrative, targeting a small range of user-focused activities.

  • User narrative - a path through several user stories that proposes a particular order of events and their relationships, targeted at a particular end-user.

  • User story - a description of a software feature from a technical/process-oriented perspective.

  • VCF - Variant Call Format. A standard text file format used in bioinformatics for storing gene sequence variations.

  • Virtual Private Cloud - an on-demand configurable pool of shared computing resources allocated within a public cloud environment, providing a certain level of isolation between the different organizations using the resources.

  • Whitelist - A security measure to permit only an approved list of entities (the “whitelist”) to access a resource.

  • Workflow - the sequence of processes, usually computational in this context, through which a user may computationally analyze data.
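
As an illustration of the Harmonizing/Harmonization and JSON entries above, here is a minimal Python sketch of what recording variable relationships between two studies might look like. Everything in it is an assumption made for illustration: the study names, the variable names, the conversion table, and the `harmonize` helper are hypothetical and do not come from TOPMed, GTEx, AGR, or any DCPPC product.

```python
import json

# Hypothetical harmonization map: (study, variable) -> (common name, factor to common unit).
# None of these names come from the DCPPC data sets; they are illustrative only.
HARMONIZATION_MAP = {
    ("study_a", "height_cm"): ("height_m", 0.01),
    ("study_b", "ht_inches"): ("height_m", 0.0254),
    ("study_a", "weight_kg"): ("weight_kg", 1.0),
    ("study_b", "wt_lb"): ("weight_kg", 0.45359237),
}


def harmonize(study, record):
    """Rename and rescale one record's variables; unmapped variables are dropped."""
    harmonized = {}
    for name, value in record.items():
        mapping = HARMONIZATION_MAP.get((study, name))
        if mapping is None:
            # No relationship has been recorded for this variable, so it
            # cannot be treated as the same as a variable in another study.
            continue
        common_name, factor = mapping
        harmonized[common_name] = round(value * factor, 3)
    return harmonized


if __name__ == "__main__":
    record_a = harmonize("study_a", {"height_cm": 180, "weight_kg": 75})
    record_b = harmonize("study_b", {"ht_inches": 70, "wt_lb": 165, "shoe_size": 10})
    # Serialize the harmonized records as JSON for exchange between tools.
    print(json.dumps([record_a, record_b], indent=2))
```

Once both records use the same variable names and units, they can be compared or pooled directly. A real harmonization effort would also need to record provenance and handle variables that only partially correspond, which this sketch ignores.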