Posts by Collection

portfolio

publications

Comparative transcriptomics reveals patterns of selection in domesticated and wild tomato

Published in PNAS, 2013

Recommended citation: Daniel Koenig, José M. Jiménez-Gómez, Seisuke Kimura, Daniel Fulop, Daniel H. Chitwood, Lauren R. Headland, Ravi Kumar, Michael F. Covington, Upendra Kumar Devisetty, An V. Tat, Takayuki Tohge, Anthony Bolger, Korbinian Schneeberger, Stephan Ossowski, Christa Lanz, Guangyan Xiong, Mallorie Taylor-Teeples, Siobhan M. Brady, Markus Pauly, Detlef Weigel, Björn Usadel, Alisdair R. Fernie, Jie Peng, Neelima R. Sinha, and Julin N. Maloof (2013). "Comparative transcriptomics reveals patterns of selection in domesticated and wild tomato" PNAS. 1(3). http://upendrak.github.io/files/paper3.pdf

Modeling leaf development enables quantitative trait mapping of novel loci and reveals independent genetic modules for leaf size and shape in Brassica rapa

Published in New Phytologist, 2015

Recommended citation: Robert L. Baker, Wen Fung Leong, Marcus T. Brock, Robert C. Markelz, Mike Covington, Upendra K. Devisetty, Julin Maloof, Stephen Welch, and Cynthia Weinig (2015). "Modeling leaf development enables quantitative trait mapping of novel loci and reveals independent genetic modules for leaf size and shape in Brassica rapa" New Phytologist. 1(6). http://upendrak.github.io/files/paper6.pdf

Genetic architecture, biochemical underpinnings, and ecological impact of floral UV patterning

Published in Molecular Ecology, 2016

Recommended citation: Marcus T. Brock, Lauren K. Lucas, Nicholas A. Anderson, Matthew J. Rubin, R. J. Cody Markelz, Michael F. Covington, Upendra K. Devisetty, Clint Chapple, Julin N. Maloof, and Cynthia Weinig (2016). "Genetic architecture, biochemical underpinnings, and ecological impact of floral UV patterning" Molecular Ecology. 1(7). http://upendrak.github.io/files/paper7.pdf

Evolinc: a comparative transcriptomics and genomics pipeline for quickly identifying sequence conserved lincRNAs for functional analysis

Published in Frontiers in Genetics, 2017

* These authors contributed equally to this manuscript

Recommended citation: Andrew D. Nelson*, Upendra K. Devisetty*, Kyle Palos, Asher K. Haug-Baltzell, Eric Lyons, Mark A. Beilstein (2017). "Evolinc: a comparative transcriptomics and genomics pipeline for quickly identifying sequence conserved lincRNAs for functional analysis" Frontiers in Genetics. 1(10). http://upendrak.github.io/files/paper10.pdf

Using RNA-seq for genomic scaffold placement, correcting assemblies, and genetic map creation in a common Brassica rapa mapping population

Published in G3, 2017

Recommended citation: RJ Cody Markelz, Michael F Covington, Marcus T Brock, Upendra K Devisetty, Daniel J Kliebenstein, Cynthia Weinig, Julin N Maloof (2017). "Using RNA-seq for genomic scaffold placement, correcting assemblies, and genetic map creation in a common Brassica rapa mapping population" G3. 1(13). http://upendrak.github.io/files/paper13.pdf

research

WQ-MAKER (Work Queue Maker)

Published in CyVerse, 2017

MAKER is one of the most popular bioinformatics pipelines used to annotate genomic information (Cantarel et al. 2008). MAKER uses standard bioinformatics programs to customize the processing and preparation of the raw data. This includes processes to identify repeats, align ESTs and proteins to a target genome, predict genes, and quantify the quality of the results based on the provided evidence. MAKER focuses on automating the entire annotation process to create an easy and consistent initial annotation. MAKER is still under active development and is used in many areas of organism modeling.

talks

Development and Application of Genomic Resources in Brassica rapa at Brassicas workshop

Published in Plant and Animal Genome (PAG) conference, 2015

Brassica rapa is an economically important vegetable and oilseed crop, and serves as an excellent model for evolutionary research studies. Even though the whole genome sequence of B. rapa is available, only a few genome-based resources are currently available. The advent of high-throughput next-generation sequencing technologies that allow whole-transcriptome sequencing (RNA-Seq), along with the development of novel computational approaches, provides the opportunity to address this problem efficiently. Here, we report the deep sequencing of the B. rapa transcriptome in order to provide a more comprehensive set of genomic resources for functional studies. As a proof of concept, we used the developed genomic resources for a variety of applications including genome annotation, polymorphism detection, gene-based genetic marker detection, genotyping of a mapping population, genetic map construction, and QTL and eQTL mapping. We hope that the large-scale RNA sequencing effort described here, along with the development and application of the resulting resources, will significantly help researchers in the mapping and functional analysis of quantitative traits in Brassica rapa.

A Hybrid Approach to Assemble and Annotate the Brassica rapa Transcriptome in the Cloud through the iPlant Collaborative and XSEDE

Published in Plant and Animal Genome (PAG) conference, 2015

Currently there are two different approaches for producing a transcriptome assembly: de novo and reference-based. Each of these methods has been successfully employed to assemble transcripts from reads generated using RNA-Seq technologies. Both methods have advantages and disadvantages. De novo methods can define novel transcripts, as well as non-collinear and trans-spliced transcripts that result from chromosomal rearrangements. However, they perform poorly on low-expressed genes, can produce chimeras and misassemblies, and are computationally intensive. In contrast, reference-based methods are computationally less demanding, tolerate sequencing errors, and detect repeats through alignment. However, reference-based methods depend on a reference genome, assume that transcripts are collinear with the genome, and propagate mismatched genome alignments or genome assembly errors into the transcriptome prediction. In this study we report a hybrid approach that combines the transcripts generated from de novo and reference-based strategies to generate a transcriptome assembly, which is subsequently annotated. In addition to generating a transcriptome assembly, RNA-Seq was also used to improve the existing genome annotation of B. rapa using PASA software. Both transcriptome assembly and genome annotation are often rate-limiting steps requiring complex workflows, specialized software, and access to high-performance computing (HPC) facilities. We show how scalable cloud-computing infrastructures such as iPlant and XSEDE (distributed computing) can enable high-performance bioinformatics analyses of very large next-generation transcriptome sequence data. Specifically, we use iPlant for: (i) uploading, storing (iRODS), and controlled sharing of data and results, (ii) testing and development of bioinformatics pipelines, and (iii) access to high-performance computing resources such as those provided by XSEDE.
In the future, we plan to deploy the hybrid transcriptome assembly and annotation pipeline as a virtual machine (VM) in iPlant's Atmosphere Cloud Service and link it to XSEDE for added processing.
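The abstract above does not say how the de novo and reference-based transcript sets were combined. As a purely illustrative sketch, the simplest possible merge step pools both FASTA sets and drops exact duplicates, treating a sequence and its reverse complement as the same transcript. The function names (`parse_fasta`, `merge_assemblies`) and the `denovo|` / `ref|` name prefixes are hypothetical, and a real pipeline would more likely use a dedicated clustering or assembly-merging tool than exact matching.

```python
from typing import Dict, Iterator, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def merge_assemblies(de_novo: str, ref_based: str) -> Dict[str, str]:
    """Pool two transcript sets, keeping the first copy of each sequence.

    Assumption for illustration only: two transcripts are "the same" when
    they are identical or exact reverse complements of each other.
    """
    merged: Dict[str, str] = {}
    seen = set()
    for source, text in (("denovo", de_novo), ("ref", ref_based)):
        for name, seq in parse_fasta(text):
            seq = seq.upper()
            # Strand-independent key: the smaller of the two orientations.
            key = min(seq, reverse_complement(seq))
            if key in seen:
                continue
            seen.add(key)
            merged[f"{source}|{name}"] = seq
    return merged
```

Because de novo transcripts are read first, they win ties here; that ordering is an arbitrary choice of this sketch, not something stated in the abstract.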

Bringing your bioinformatics tools to CyVerse's Discovery Environment using Docker

Published in Houston, Texas, 2016

CyVerse (formerly iPlant Collaborative) is a life sciences cyberinfrastructure funded by the National Science Foundation (NSF). The infrastructure's purpose is to scale science, domain expertise, and knowledge by providing a variety of computational tools, services, and platforms for storing, sharing, and analyzing large and diverse biological datasets. The Discovery Environment (DE) in CyVerse provides a modern web interface for running powerful computing, data, and analysis applications. By providing a consistent user interface for accessing tools and computing resources needed for specialized scientific analyses, the DE facilitates data exploration and scientific discovery. The DE merges the "science gateway" functionality and the bioinformatics "work bench" with high-performance data management to allow seamless access to reusable computational workflows that can run at very large scales. It is common in bioinformatics to build new analysis methods using multiple programs, libraries, and modules. However, each analysis that uses these tools requires specific versions of the operating system and underlying software. Docker is a container virtualization technology that wraps software of interest (e.g., a bioinformatics tool) together with all its software dependencies so it can run in a reproducible manner regardless of the environment. CyVerse has adopted Docker for integrating software apps that run in the DE's compute cluster. The user creates a Dockerfile, which is sent to CyVerse and used to build the Docker image containing the tool. After the image has been deployed on the DE's compute cluster, the user can build a web app in the DE so that other researchers can easily use the tool.

WQ-Maker: A Flexible and Scalable Genome Annotation Pipeline on Jetstream Cloud

Published in Plant and Animal Genome (PAG) conference, 2017

Jetstream, funded by the National Science Foundation (NSF), is a self-provisioned, scalable science and engineering cloud environment that allows researchers to analyze their data on customized virtual machines (VMs). Jetstream is freely available to US-based researchers. MAKER is a flexible and scalable genome annotation pipeline used for de novo annotation of newly sequenced genomes, for updating existing genome annotations, or just to combine annotations, evidence, and quality control statistics. Installing and using MAKER on multiuser HPC systems comes with challenges associated with software version dependencies. Using cloud-based systems for large-scale annotations with MAKER provides more flexibility in configuration, but has limitations such as the lack of a shared file system and the need to balance work between multiple instances. WQ-MAKER, a customized version of MAKER with a Work Queue-based distributed computing framework, is designed to run on multiple VMs in the cloud, making it feasible to readily scale annotation tasks without requiring a shared file system. The WQ-MAKER framework also leverages the MPI capability of MAKER, making full use of the available cores on each cloud instance. We have created a Jetstream image of WQ-MAKER that is freely available to community members to annotate their genomes. WQ-MAKER efficiently runs MAKER simultaneously on multiple Jetstream instances, greatly reducing the annotation run time.
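The work distribution described above follows a manager/worker pattern: the genome is cut into independent batches, each batch is handed to whichever worker is free, and the partial results are collected. The sketch below illustrates only that pattern using the Python standard library. It does not use the Work Queue API or MAKER; threads stand in for Jetstream VMs, and `annotate_batch` is a hypothetical placeholder that merely reports contig lengths.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def split_contigs(contigs: Dict[str, str], n_tasks: int) -> List[Dict[str, str]]:
    """Deal contigs round-robin into at most n_tasks independent batches."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    n_batches = max(1, min(n_tasks, len(contigs)))
    batches: List[Dict[str, str]] = [{} for _ in range(n_batches)]
    for i, (name, seq) in enumerate(contigs.items()):
        batches[i % n_batches][name] = seq
    return batches


def annotate_batch(batch: Dict[str, str]) -> Dict[str, int]:
    """Placeholder for an annotation run on one batch (hypothetical).

    A real worker would run the annotation pipeline on its own copy of
    the batch; here we only return each contig's length.
    """
    return {name: len(seq) for name, seq in batch.items()}


def run_queue(contigs: Dict[str, str], n_workers: int = 2, n_tasks: int = 4) -> Dict[str, int]:
    """Send batches to a pool of workers and merge their partial results."""
    results: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each task carries all the data it needs, so the workers do not
        # rely on any shared state (the analogue of no shared file system).
        for partial in pool.map(annotate_batch, split_contigs(contigs, n_tasks)):
            results.update(partial)
    return results
```

Making each batch self-contained is what lets this kind of design scale across machines that share no storage: adding workers changes throughput but not the results.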

teaching

Evolinc – Identification and Evolutionary Analysis of lncRNA

Published in CyVerse, 2016

In this webinar, Andrew Nelson and I presented Evolinc, a two-part set of apps in the CyVerse Discovery Environment (DE). Evolinc-I is designed to make long non-coding RNA (lncRNA) identification easy and reproducible, regardless of the system. Evolinc-II compares such lncRNAs to determine whether they are conserved at the genomic or transcriptomic level in various species. This information is helpful in curating lncRNA populations and identifying promising candidates for functional analysis. The tutorial for running Evolinc can be found here and the paper describing Evolinc can be found here.

WQ-MAKER: A Flexible, scalable genome annotation pipeline on Jetstream cloud

Published in CyVerse, 2017

In this webinar, I presented WQ-MAKER, a customized version of MAKER with a Work Queue-based distributed computing framework designed to run MAKER on multiple virtual machines on the Jetstream cloud. We show how to run WQ-MAKER on a test dataset, starting from setting up a Jetstream account, along with some of the accessory scripts (Ansible playbooks and custom scripts) and a few apps developed to manage the computation and track its progress. Performance numbers for various genomes annotated using WQ-MAKER are also discussed. The tutorial for this webinar is online, and a publication describing WQ-MAKER is here.

CyVerse Container Camp: Container Technology for Scientific Research

Published in CyVerse, 2018

CyVerse Container Camp is an intense three-day hands-on workshop to learn how to create, use, and deploy containers across a variety of compute systems (your computer, local HPC, cloud compute environments, and national resources). We will use a blend of practical theory and hands-on exercises where small groups deploy tools and workflows they bring to the workshop. Outcomes: theory and application of container technology, how to containerize an application, how to use other containerized applications, how to build and deploy containerized workflows, and how to scale out your computation from laptop to cloud to HPC. As part of this workshop, my role as one of the lead instructors is to teach the basic concepts of reproducible research using software containers here.

CyVerse AstroContainers Workshop

Published in CyVerse, 2018

Container technologies such as Docker and Singularity let scientists easily share, reuse, and scale all types of computational analyses. The CyVerse AstroContainers Workshop series consists of two-day hands-on workshops designed for astronomers to learn how to create, use, and deploy containers across a variety of compute systems (your computer, CyVerse, local HPC, etc.). Our inaugural workshop will focus on Docker and Singularity. We will use a blend of practical theory and hands-on exercises where small groups deploy tools and workflows they bring to the workshop. As part of this workshop, my role as one of the lead instructors is to teach the basic concepts of reproducible research using software containers here.

Cyber Carpentry: Data Life-Cycle Training with the Datanet Federation Consortium

Published in Chapel Hill, North Carolina, 2018

This two-week workshop provided doctoral students and post-doctoral researchers with an overview of best data management practices, data science tools, and concrete steps and methods for performing end-to-end data-intensive computing and data life-cycle management. The training prepares participants to facilitate and promote reproducible science and data reuse. As part of this workshop, my role as one of the lead instructors is to teach the basic concepts of reproducible research using software containers. The tutorial for running the Cyber Carpentry workshop's containers can be found here.