RNASeq quickstart - from sequences to alignments to visualization in IGB

under construction

Introduction

High-throughput sequencing of cDNA, also called RNA-Seq, can provide many kinds of information about gene expression, including evidence for previously unannotated genes, expression of pseudogenes, differential expression both within and across samples, and alternative splicing.

Making the libraries and sending them out for sequencing is only the first step in an RNA-Seq experiment. Most people find that processing, analyzing, and interpreting the data can be just as time-consuming.

Take heart. These data sets are so rich that you may never exhaust their potential. However, there are a few first steps you'll want to perform right away, as well as some quality control steps that will help you assess how well your experiment worked.

What follows is a description of RNA-Seq processing steps we do fairly routinely in the Loraine lab, along with links to software programs we've developed for in-house use. Please be aware that these programs are very much works in progress and so may not always work as advertised. If you find bugs or inconsistencies, please let us know - contact Ann (aloraine@uncc.edu) with feedback, suggestions, and bug reports.

Methods

The following protocol describes processing data from the Illumina HiSeq pipeline. These steps may vary depending on your data, its age, and so on. For example, when you run TopHat, you may need to adjust parameters to accommodate shorter reads if you are processing data from pre-HiSeq instruments.

Align your sequences. For RNA-Seq data sets, we have mainly used TopHat, from the University of Maryland. Many versions of TopHat have been released over the past couple of years, and each behaves slightly differently. However, a few things seem to remain stable. First, TopHat will typically report multiple alignments for some reads. This is expected, because some reads match more than one location in the genome equally well. Depending on your experimental goals, however, you may want to focus on the reads that map exactly once onto the genome. Second, you should determine the minimum and maximum intron sizes for your genome and provide these as parameters to TopHat. For details on running TopHat, see the TopHat Manual.
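One way to estimate the intron size range is to derive it from an existing gene annotation. The sketch below is a minimal, hypothetical helper (intron_size_range is not one of our lab's programs), assuming your annotation is in GTF format with 1-based inclusive coordinates and a transcript_id attribute on each exon line. It groups exons by transcript, measures the gap between consecutive exons, and reports the smallest and largest gap. Treat the result as a starting point for TopHat's minimum and maximum intron-length settings (see the TopHat Manual for the exact options in your version), not as a definitive answer.

```python
from collections import defaultdict


def intron_size_range(gtf_lines):
    """Return (min, max) intron length implied by exon records in GTF lines,
    or None if no transcript has more than one exon.

    Assumes GTF conventions: tab-separated columns, 1-based inclusive
    coordinates, and a transcript_id "..." attribute in column 9.
    """
    exons = defaultdict(list)  # transcript_id -> list of (start, end)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        start, end = int(fields[3]), int(fields[4])
        # Pull the transcript_id value out of the attribute column.
        tx_id = None
        for attr in fields[8].split(";"):
            key, _, value = attr.strip().partition(" ")
            if key == "transcript_id":
                tx_id = value.strip().strip('"')
        if tx_id is not None:
            exons[tx_id].append((start, end))

    lengths = []
    for coords in exons.values():
        coords.sort()
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            # Number of bases strictly between two consecutive exons.
            lengths.append(next_start - prev_end - 1)
    if not lengths:
        return None
    return min(lengths), max(lengths)


# Toy annotation: one transcript with three exons (made-up coordinates).
example = [
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tdemo\texon\t301\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tdemo\texon\t1401\t1500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
print(intron_size_range(example))  # prints (100, 1000)
```

With a real annotation you would pass the lines of the GTF file (for example, an open file object) instead of the toy list. Annotated introns may not cover every intron present in your samples, so you may want to allow some margin around the values this reports.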

An example invocation of TopHat, fine-tuned for Arabidopsis thaliana:

tophat
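After alignment, you may want to keep only reads that map exactly once, as described above. The sketch below is a minimal illustration, not one of our lab's tools: it assumes the alignments are available as SAM text lines (for example, the output of `samtools view -h` on TopHat's BAM file) and that each aligned record carries the standard SAM NH tag, which gives the number of reported alignments for that read. Whether and how your TopHat version writes NH is worth checking before relying on this. The function names is_unique and unique_alignments are hypothetical.

```python
def is_unique(sam_line):
    """True if a SAM alignment record is mapped and has NH:i:1."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # 0x4 = read is unmapped
        return False
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    return False  # no NH tag: cannot tell, so be conservative


def unique_alignments(lines):
    """Yield header lines plus alignment records for uniquely mapped reads."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep the SAM header unchanged
        elif line.strip() and is_unique(line):
            yield line


# Toy SAM records: r1 maps once, r2 maps twice, r3 is unmapped.
sam = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t50\t36M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t500\t3\t36M\t*\t0\t0\t*\t*\tNH:i:2",
    "r2\t256\tchr2\t900\t3\t36M\t*\t0\t0\t*\t*\tNH:i:2",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
for line in unique_alignments(sam):
    print(line.split("\t")[0])  # prints "@HD", then "r1"
```

Filtering on NH rather than on mapping quality keeps the sketch independent of how a particular aligner version scales its MAPQ values.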
