October 20, 2016
Talks, Houston, Texas, Houston, USA
CyVerse (formerly iPlant Collaborative) is a life sciences cyberinfrastructure funded by the National Science Foundation (NSF). The infrastructure’s purpose is to scale science, domain expertise, and knowledge by providing a variety of computational tools, services, and platforms for storing, sharing, and analyzing large and diverse biological datasets. The Discovery Environment (DE) in CyVerse provides a modern web interface for running powerful computing, data, and analysis applications. By providing a consistent user interface for accessing tools and computing resources needed for specialized scientific analyses, the DE facilitates data exploration and scientific discovery. DE merges the “science gateway” functionality and the bioinformatics “work bench” with high-performance data management to allow seamless access to reusable computational workflows that can run at very large scales. It is common in bioinformatics to build new analysis methods utilizing multiple programs, libraries, and modules. However, each analysis that uses these tools requires specific versions of the operating system and underlying software. Docker is a container virtualization technology that wraps software of interest (e.g., a bioinformatics tool) together with all its software dependencies so it can run in a reproducible manner regardless of the environment. CyVerse has adopted Docker for integrating software apps that run in the DE’s Compute Cluster. The user creates a Dockerfile, which is sent to CyVerse and used to build the Docker image containing the tool. After the image has been deployed on the DE’s compute cluster, the user can build an web app in the DE to enable other researches easily use the tool.