
6M.1.FULLSTACKS: Refined User Flows across full stacks

Description

This large workflow processes hundreds of genomic files and demonstrates how the Full Stacks can share and use data. It is an example of how a real end user might interact with the Data Commons.

What they achieved

In this demo:

1.   The Full Stacks all have access to the same GTEx whole genome sequencing data and GUIDs for files.

2.   The Full Stacks divide up the GTEx whole genome sequencing data so that each Stack
     processes roughly a quarter of the data through the alignment
     workflow.

3.   Full Stacks implement a way of sharing data between
     stacks.

4.   Full Stacks share auth tokens so that each stack can access the
     other three stacks.

5.   Full Stacks run their subsets of GTEx CRAMs
     through the TOPMed alignment workflow, then share the outputs
     with Team Calcium.

6.  Team Calcium runs joint variant calling on all re-aligned
     CRAMs, then shares the results back to Helium, Argon, and
     Xenon.

7.   All four Full Stacks perform different downstream analyses.
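
The split in step 2 can be sketched as follows. This is only a minimal illustration, not the method the teams actually used: the round-robin scheme, the function name `partition_guids`, and the placeholder GUIDs are all assumptions; only the four team names come from the demo description.

```python
# Hypothetical sketch: divide a shared list of file GUIDs across four stacks.
# Team names come from the demo description; everything else is assumed.
STACKS = ["Calcium", "Helium", "Argon", "Xenon"]


def partition_guids(guids, stacks=STACKS):
    """Assign each GUID to exactly one stack, round-robin.

    The GUIDs are sorted first so that every stack, starting from the same
    shared list, computes the same assignment independently. Share sizes
    differ by at most one file.
    """
    assignment = {stack: [] for stack in stacks}
    for index, guid in enumerate(sorted(guids)):
        assignment[stacks[index % len(stacks)]].append(guid)
    return assignment


if __name__ == "__main__":
    # Placeholder identifiers, not real GTEx GUIDs.
    example_guids = [f"guid-{n:03d}" for n in range(10)]
    for stack, share in partition_guids(example_guids).items():
        print(stack, len(share), share)
```

Sorting before assigning is a design choice for this sketch: it lets each stack derive its own subset without a coordinator, provided all stacks see the same GUID list, as step 1 describes.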

Why is this valuable?

This workflow shows how a user might interact with the Data Commons. One very large dataset, consisting of ~500 whole genome sequences, is quickly aligned to a reference genome by splitting the work across the Full Stacks. Because the Full Stacks have previously demonstrated that they can produce identical results, these alignments can be recombined and used for joint variant calling on a single stack. Those variants can then be used in four different analyses, each run on the Full Stack that is specialized for that type of analysis.