GreenLight Biosciences, Durham, NC
- Duties: Establishing RNA-Seq and analysis on the Illumina platform to characterize mRNAs produced for vaccine and diagnostic development. Identifying high-quality novel targets for RNAi of insects and fungi. Developing machine learning models to predict sequence-to-sequence bias and yield of dsRNA produced on the in-house dsRNA production platform. Setting up genomics and bioinformatics infrastructure, including storage and compute on the AWS cloud, for analyses of large-scale next-generation sequencing data, thereby simplifying and securely scaling genomic analyses in the cloud for the Data Science Team.
2016-2019: Science Informatician
CyVerse, University of Arizona, AZ
- Duties: Interact scientifically with biologists, bioinformaticians, programming teams and other members of the CyVerse team; coordinate development across projects; and facilitate integration and cross-communication. Translate community input into proof-of-concept prototypes and formal software requirements, and participate in the design and implementation of next-generation systems.
- Achievements: Optimized the process of building bioinformatics and data science software into CyVerse using Docker. Developed custom bioinformatics and computational workflows (MAKER) in the Jetstream cloud for processing complex biological data in a distributed processing environment using Work Queue and Pegasus. Developed novel bioinformatics pipelines (e.g. RMTA) for mining 100+ TB of publicly available RNA-Seq data.
- Supervisor: Dr. Eric Lyons
2019-2020: Technical Consultant
Insight Data Science, Seattle, WA
- Achievements: Developed and delivered workshops for Data Engineering fellows, including Introduction to AWS for Data Scientists, Big Data processing platforms (Hadoop and Spark), Flask web development, and ML model deployment using Heroku.
2019: Data Science Fellow
Insight Data Science, Seattle, WA
- Achievements: Built PlantMD, an image-based plant disease detection web app that rapidly diagnoses plant diseases, achieving 99% validation accuracy and an ROC-AUC score of 0.92. Trained and validated AlexNet and VGG16 CNN architectures on about 100K images (500 GB) of diseased and healthy plant leaves using Keras on Google Colab GPU nodes. Used Docker, GitHub and Docker Hub to automate building and deploying PlantMD on AWS.
2019-Current: Data Science Instructor
DataCamp
- Achievements: Designed and developed course content for Big Data Fundamentals with PySpark, covering PySpark and its components (RDD, DataFrames, Spark SQL and MLlib). The course has had over 20,000 students to date.
2015-2016: Post Doctoral Research Associate
Department of Forest Ecosystems and Society, Oregon State University, OR
- Duties: Developing cheap and high-throughput DNA extraction and DNA library construction protocols for Populus and implementing the TASSEL GBS pipeline for Populus.
- Achievements: Developed cheap and high-throughput DNA extraction and DNA library construction protocols for structural polymorphism discovery in Populus, implemented the TASSEL GBS pipeline for Populus, and constructed GBS libraries and analysed GBS data for understanding Aspen phylogeography.
- Supervisor: Dr. Steve Strauss
2010-2015: Post Doctoral Researcher
Department of Plant Biology, University of California Davis, CA
- Duties: Carrying out quality control, primary analysis and interpretation of RNA-Seq data; detecting molecular genetic markers; constructing a novel comprehensive transcriptome assembly pipeline using deep RNA-Seq data; and developing pipelines and novel tools for assembly validation and assembler comparison in Brassica rapa.
- Achievements: Detected molecular genetic markers and constructed a novel comprehensive transcriptome assembly pipeline using deep RNA-Seq data of Brassica rapa; developed pipelines and novel tools for assembly validation and assembler comparison in Brassica rapa; optimized the high-throughput RNA-Seq protocol for making ~1000 and ~2000 libraries from mapping populations of Brassica rapa and Arabidopsis thaliana, respectively; determined the genotype of a mapping population of Brassica rapa using RNA-Seq data and constructed the first-ever genetic map using coding genetic markers; conducted RNA-Seq expression analysis on phytochrome mutants of Brassica rapa; conducted research to uncover novel differentially expressed genes and pathways associated with the shade avoidance response in Brassica rapa; and provided bioinformatics research support to other projects in the lab, including Arabidopsis RNA-Seq, tomato RNA-Seq, and bacterial genome assembly and annotation.
- Supervisor: Dr. Julin Maloof
2005-2009: Ph.D
Plant and Crop Sciences Division, University of Nottingham, Nottingham, UK
- Investigated meiotic recombination in wheat using molecular biology, molecular genetics and field-based crop analysis.
- Cloned the RAD51 and DMC1 meiotic homoeologous genes of wheat and investigated their role in the meiotic recombination pathway in Arabidopsis using a variety of functional genomic approaches.
- Supervisor: Dr. Sean Mayes
Education
- B.Sc (Ag), A.N.G.R.A.U (India), 1996-2000
- M.Sc (Ag), G.B.P.U.A.T (India), 2001-2003
- Ph.D (Ag), University of Nottingham (U.K), 2005-2010
My full-length C.V can be found here