
6M.3.DEMO: STRETCH: Data Analysis: Multi-cloud (2 stacks) compute with consortia data and novel data by user permission



Demonstrate that the fullstacks can perform an analysis on consortia data together with user data made accessible by direct ingestion on multiple cloud providers. Show that the fullstacks can access data externally, and thus that data can be shared between tools and is not tied to a particular cloud architecture.

What they achieved

Helium: Provided a slide deck illustrating that Helium can run a workflow through cross-stack compute and access the data needed for tertiary analysis, using both its own toolset and tools used by other researchers. The analyses shown use workflows run with CWL and Toil as well as UIs built with applications such as R Shiny and Jupyter notebooks.

Xenon: The demo was meant to give researchers the flexibility to choose cloud providers based on functionality, price, and performance rather than being tied to a single provider. Xenon provided a 6-minute video slideshow demonstrating multi-cloud compute with the FAIR4CURES platform. The video highlights the transparency of costs associated with running a workflow on AWS or Google Cloud via the FAIR4CURES platform. It shows detailed reports for one alignment workflow on AWS, but provides no other compute cost examples for comparison. Xenon has developed a way to keep one copy of data in the cloud and compute on it using Amazon or Google (with automated data copies when necessary), which is a valuable contribution to making data more accessible, interoperable, and reusable.

Why is this valuable?

It is very important that researchers and data scientists are not tied to a single cloud provider and can switch between providers based on price, data location, or performance while keeping their code largely the same.
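
To make the trade-off concrete, the following is a minimal sketch of how a provider might be chosen per workflow run. It is not taken from either team's demo: the `ProviderQuote` fields, the `choose_provider` helper, and all dollar figures are hypothetical and purely illustrative. It assumes the only inputs are an estimated compute cost and, when the data does not already sit with that provider, an estimated cost for copying it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderQuote:
    """Hypothetical cost estimate for running one workflow on one provider."""
    name: str                # illustrative label, e.g. "aws" or "gcp"
    compute_cost_usd: float  # estimated compute cost for the workflow
    data_is_local: bool      # True if the input data already lives with this provider
    copy_cost_usd: float     # estimated cost of copying the data there if it does not


def total_cost(quote: ProviderQuote) -> float:
    """Compute cost plus the data copy cost, charged only when a copy is needed."""
    copy = 0.0 if quote.data_is_local else quote.copy_cost_usd
    return quote.compute_cost_usd + copy


def choose_provider(quotes: list[ProviderQuote]) -> ProviderQuote:
    """Pick the provider with the lowest estimated total cost."""
    if not quotes:
        raise ValueError("at least one provider quote is required")
    return min(quotes, key=total_cost)


# Made-up numbers: cheaper compute on one provider can be outweighed
# by the cost of copying the data to it.
quotes = [
    ProviderQuote("aws", compute_cost_usd=12.0, data_is_local=True, copy_cost_usd=0.0),
    ProviderQuote("gcp", compute_cost_usd=10.0, data_is_local=False, copy_cost_usd=3.5),
]
chosen = choose_provider(quotes)
```

The workflow definition itself does not appear in the selection logic, which is the point of the paragraph above: only the cost and data-location inputs change between providers. A fuller version would also weigh performance and functionality, which this sketch leaves out.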