4M.4.FULLSTACKS: Cross-stack Compute
The Full Stacks demonstrated that data and/or workflows are portable across stacks, and that the Full Stacks can work together to perform a cross-stack compute. The figure above shows a representative example of the workflow.
What they achieved
The Full Stacks worked together to convert the TOPMed Alignment and Variant Calling workflows into portable, cloud-compatible versions, which they then registered on Dockstore. The teams then showed that the same workflow could run on each stack and produce the same results. They verified that the results matched by using a 'checker workflow' that compared the md5sums of the outputs. The workflows were run on up to 25 genomes on each stack.
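The checker-workflow idea of comparing md5sums can be illustrated with a minimal Python sketch. This is not the checker workflow the teams actually used; the function names and the assumption that each stack's outputs sit in their own directory under matching file names are hypothetical, chosen only to show the comparison logic.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks so large outputs need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_outputs(stack_a_dir, stack_b_dir):
    """Hypothetical checker step: compare same-named output files from two stacks.

    Returns the sorted names of files that exist on only one stack
    or whose MD5 checksums differ between the stacks.
    """
    dir_a, dir_b = Path(stack_a_dir), Path(stack_b_dir)
    names_a = {p.name for p in dir_a.iterdir() if p.is_file()}
    names_b = {p.name for p in dir_b.iterdir() if p.is_file()}
    # Files produced by only one stack count as mismatches.
    mismatches = set(names_a ^ names_b)
    # Files produced by both stacks must be byte-identical.
    for name in names_a & names_b:
        if md5sum(dir_a / name) != md5sum(dir_b / name):
            mismatches.add(name)
    return sorted(mismatches)
```

An empty list from `mismatched_outputs` means the two stacks produced byte-identical outputs. Checksum comparison is a strict test: any difference in output bytes, even a changed timestamp embedded in a file header, registers as a mismatch.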
Why is this valuable?
One of the core goals of the Data Commons is to build multiple robust, sustainable software stacks. These stacks are not exact copies of one another and may have varying capabilities; however, their work should be replicable. That is, given the same starting data and the same workflow, they should give identical answers. This matters not only for verifying results but also for sharing workflows across stacks: if a researcher runs half of a workflow on one stack and the other half on another, the result should be exactly the same as if the entire workflow had run on a single stack. Showing that the same workflow can run on multiple stacks is therefore an important first step toward demonstrating cooperation and compatibility, and toward supporting cross-stack computing.