6M.1.FULLSTACKS: Refined User Flows across full stacks
This large workflow processes hundreds of genomic files and demonstrates how the Full Stacks can share and use data. It is an example of how a real end user might interact with the Data Commons.
What they achieved
In this demo:
1. The Full Stacks all have access to the same GTEx whole genome sequencing data and GUIDs for files.
2. The Full Stacks divide up the GTEx whole genome sequencing data such that each Stack ends up processing roughly a quarter of the data through the alignment workflow.
3. Full Stacks implement a way of sharing data between stacks.
4. Full Stacks share auth tokens so that each can access the other three stacks.
5. Full Stacks run their subsets of GTEx CRAMs through the TOPMed alignment workflow, then share the outputs with Team Calcium.
6. Team Calcium runs joint variant calling on all re-aligned CRAMs, then shares the results back to Helium, Argon, and Xenon.
7. All four Full Stacks perform different downstream analyses.
Why is this valuable?
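The division of work in step 2 could be sketched as follows. This is a minimal illustration, not the method the teams used: the `partition_guids` helper, the round-robin strategy, and the placeholder GUID strings are all assumptions made for the example. Only the four stack names come from the demo description.

```python
from typing import Dict, List

# Stack names are taken from the demo description; everything else is hypothetical.
STACKS = ["Calcium", "Helium", "Argon", "Xenon"]


def partition_guids(guids: List[str], stacks: List[str]) -> Dict[str, List[str]]:
    """Assign GUIDs to stacks round-robin.

    Sorting first makes the split deterministic, so each stack could compute
    the same assignment independently from the shared list of GUIDs.
    """
    assignment: Dict[str, List[str]] = {stack: [] for stack in stacks}
    for index, guid in enumerate(sorted(guids)):
        assignment[stacks[index % len(stacks)]].append(guid)
    return assignment


# Placeholder identifiers standing in for the real file GUIDs (assumed count: 500).
example_guids = [f"guid-{n:04d}" for n in range(500)]
shares = partition_guids(example_guids, STACKS)

for stack, subset in shares.items():
    print(stack, len(subset))  # each stack receives 125 of the 500 placeholders
```

A deterministic split like this means no coordinator is needed to hand out work, provided every stack starts from the same list of GUIDs, as step 1 describes.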
This workflow shows how a user might interact with the Data Commons. One very large dataset, consisting of ~500 whole genome sequences, is quickly re-aligned to a reference genome by splitting the work across the Full Stacks. Because the Full Stacks have previously demonstrated that they can provide identical results, these alignments can be recombined and used for joint variant calling on a single stack. Those variants can then be used for four different analyses, each run on the Full Stack that specializes in that type of analysis.