WQ-Maker: A Flexible and Scalable Genome Annotation Pipeline on Jetstream Cloud

Date:

National Science Foundation (NSF) funded Jetstream is a self-provisioned, scalable science and engineering cloud environment which allows researchers to analyze their data on customized virtual machines (VMs) in a cloud-based environment. Jetstream is freely available to US based researchers. MAKER is a flexible and scalable genome annotation pipeline used for de novo annotation of newly sequenced genomes, for updating existing genome annotations, or just to combine annotations, evidence, and quality control statistics. Installing and using MAKER on multiuser HPC systems comes with challenges associated with software version dependencies. Utilizing cloud-based systems for large-scale annotations using MAKER provides more flexibility in configuration, but have limitations such as no shared file system and need to balance work between multiple instances. WQ-MAKER, a customized version of MAKER with Work queue based distributed computing framework is designed to run on multiple VMs in the cloud making it feasible to readily scale annotation tasks that overcomes the limitations of shared file system requirement. WQ-MAKER framework also leverages MPI capability of MAKER, making full use of available cores on each cloud instance. We have created a Jetstream image of WQ-MAKER and is freely available to community members to annotate their genomes. WQ-MAKER efficiently runs MAKER simultaneously on multiple Jetstream instances, greatly speeding up the annotation run-time.

Here is the link to the presentation