A Hybrid Approach to Assemble and Annotate the Brassica rapa Transcriptome in the Cloud through the iPlant Collaborative and XSEDE

Date:

Currently there are two different approaches for producing transcriptome assembly, de novo and reference-based. Each of these methods was successfully employed to assemble transcripts by aligning reads generated using RNA-Seq technologies. Both methods have advantages and disadvantages. De novo methods can define novel transcripts, as well as non-collinear and trans-spliced transcripts that result from chromosomal rearrangements. However they perform poorly on low-expressed genes, can produce chimeras and misassemblies, and are computationally intensive. In contrast, reference-based methods are computationally less demanding, tolerate sequencing errors, and detect repeats through alignment. However reference-based methods are dependent on a reference genome, assume that transcripts are collinear with the genome, and mismatched genome alignment or genome assembly errors lead to errors in transcriptome prediction. In this study we report a hybrid approach that combines the transcripts generated from de novo and reference-based strategies to generate a transcriptome assembly and subsequently annotating them. In addition to generating a transcriptome assembly, RNA-Seq was also used to improve the existing genome annotation of B. rapa using PASA software. Both transcriptome assembly and genome annotation are often rate-limiting steps requiring complex workflows, specialized software and access to high performance computing (HPC) facilities. We show how scalable cloud-computing infrastructures such as iPlant and XSEDE (distributed computing) can enable high performance bioinformatics analyses of very large next generation transcriptome sequence data. Specifically, we use iPlant for: (i) uploading, storing (iRODS) and controlled sharing of data and results, (ii) testing and development of bioinformatics pipelines and (iii) high performance computer resources provided such as XSEDE. In future we plan to deploy the hybrid transcriptome assembly and annotation pipeline as virtual machine (VM) in iPlant’s Atmosphere Cloud Service and link to XSEDE for added processing

Here is the link to the presentation