Introduction

The Arabidopsis and other plant genomes include many genes predicted to encode multiple splice variants.

However, it isn't always clear which of these splicing variants are the most highly expressed in specific cell types, organ structures, or conditions.

One way to assess which isoform is most abundant is to visualize aligned reads from an RNA-Seq experiment.

If the read lengths are sufficiently long (75 or more bases) and the gene of interest has sufficient coverage, you can usually tell which variant is the most abundant.

And if you have data from experiments that investigate different treatments, you can sometimes also find out if the ratio of isoforms has changed as a result of the treatment. This is called condition-dependent alternative splicing, which just means: a treatment has changed the pattern of splicing at a locus.

In this tutorial, you'll learn how to use IGB to investigate splicing using RNA-Seq data. Using data from a cold-stress experiment in Arabidopsis thaliana, you'll learn how to:

Identify the dominant isoform for a gene
Observe condition-dependent alternative splicing

This tutorial will feature SR45/AT1G16610 from Arabidopsis thaliana, a putative splicing regulator that is itself alternatively spliced.

Setting up

Follow these instructions to configure IGB and start looking at cold stress RNA-Seq data from Arabidopsis thaliana:

Start IGB
- Download and launch IGB using Java Web Start from http://www.bioviz.org/igb/download.shtml
- Or download igb.zip from http://sourceforge.net/projects/genoviz/files/.
  - To launch IGB from igb.zip, double-click igb.zip to unpack it, open folder igb, and double-click run_igb.bat (Windows) or run_igb.command (Mac)
Choose species Arabidopsis thaliana, Genome Version A_thaliana_Jun_2009

Note that the gene models should automatically load.

Tip: Color the gene track black (foreground) and white (background) to make attractive images for slides and printing. Just click the color swatches (squares) in the Data Management table to change colors.

IGB 6.7 showing protein-coding (TAIR10 mRNA) chr1 gene models, with gene models changed to black and background changed to white.

Open read alignment files

Now start opening read alignment files from chilled and unchilled samples.

Under the Data Access tabbed panel, open
- Data Source IGB Quickload > RNASeq > Loraine Lab > Mixed Cold > SM > Reads
Click the checkboxes next to data sets named:
- Control, alignments
- Cold, alignments

Please Note: When you click the checkboxes, IGB will access the files to make sure they are readable, but it won't immediately start loading the data. The reason for this is that many genomic data sets, especially next-generation sequencing data, are very large. To save memory and time, IGB gives you a chance to zoom and pan to a region you are most interested in viewing before loading the data into the viewer.

If it was able to access the files, IGB will add the data sets to the Data Management Table under the Data Access Tab and will also create new (empty) tracks for the data sets. Once the data are loaded, they will appear in these new tracks.

Customize the tracks

If you don't like the default color scheme, take a moment to customize your track colors.

Foreground and background color schemes are set by the IGBQuickLoad server, but you can change them in IGB if you like. IGB will remember your color choices between sessions.

Click the color squares in the Data Management Table (FG is foreground, BG is background) or choose File > Preferences > Tracks to open a new window that lets you customize several aspects of the display at once. Click each row in the table and give each row a white background. Choose colors for individual rows by clicking on them and then selecting options.

Tip: Customize multiple tracks at once by choosing File > Preferences > Tracks

Tip: Simplify the track name and increase track name label size to make the Track Labels easier to read.

Analyze Splicing

The remainder of the tutorial will focus on one gene - SR45, also known as AT1G16610.

Background on SR45, a plant-specific, putative splicing regulator

SR45 encodes a putative splicing regulator resembling members of the so-called SR protein family, a group of proteins that contain one or more RNA-recognition motifs (RRM) and regions enriched with serine-arginine (SR) or arginine-serine (RS) amino acid repeats.

The function of the RS/SR repeats is not well understood, but one idea is that they help stabilize interactions between pre-mRNA transcripts and proteins during splicing or spliceosome assembly.

Read about SR45 at the Arabidopsis Information Resource
Read about SR45 in Entrez Gene at the National Center for Biotechnology Information (USA)

A note on nomenclature - AGI locus codes

Like most Arabidopsis genes, SR45 has a second name (AT1G16610) called an AGI locus code that was first assigned by the Arabidopsis Genome Initiative, which sequenced and annotated early versions of the Arabidopsis genome. Use the AGI code to locate the gene in Integrated Genome Browser. (More on this later.)

SR45 mutant phenotype

One mutant line (called sr45-1) carrying a disruption in the SR45 gene has been characterized. These plants exhibit multiple defects, including misshapen petals and reduced root growth. SR45 expresses at least three known splicing variants.

An article titled Two alternatively spliced isoforms of the Arabidopsis SR45 protein have distinct roles during normal plant development investigated how Variants 1 and 2 differ with respect to SR45 function.

The authors found that transforming the sr45-1 plants with splice variant 1 corrected the petal phenotype, while transforming with variant 2 corrected the root phenotype. (Variant 3 was not tested.)

The difference between the variants was not large, but clearly the two different splice forms affect development in different ways.

Another article titled The plant-specific SR45 protein negatively regulates glucose and ABA signaling during early seedling development in Arabidopsis reported that SR45 plays a role in sugar sensing and ABA pathways during early seedling development. (ABA is a plant hormone that helps keep seeds from sprouting prematurely.) This article represents one of the first discoveries of a link between a splicing regulator and a hormone response pathway in plants.

View SR45 in IGB

Use the search tab to search for AT1G16610 (SR45 AGI locus code)

To view the SR45 gene in IGB, click the Search tab and enter the AGI locus code for SR45: AT1G16610.

Three rows of results will appear. Double-click one to navigate to the SR45 region.

You'll see three variants. Note that transcription proceeds from right to left - the promoter is on the right in this case because the gene resides on the minus strand of the chromosome.

Let's customize the display even further - change the "max stack depth" and show all gene models in one track:

Right-click (or control-click on Mac) the TAIR10 track label
- Choose Change > Adjust Max Stack Depth and enter 3 to remove extra white space above the gene models. (Note that if you zoom out, gene models from genes with more than three gene models may be drawn on top of each other.)
- Choose Change > Show 1 track (+/-) to combine plus and minus strand gene models into the same track

When you're done, you'll see something like the image below.

Note that one exon is selected in variant AT1G16610.2. This is the exon that differs between variants 1 and 2. In variant 2, the exon has a different five prime end, and the variant encodes a slightly shorter protein than variant 1.

View genomic sequence and protein translations for SR45 splice variants

Right-click (or control-click on Mac) variant 1 - AT1G16610.1.
Choose View Genomic Sequence in Sequence Viewer
- When the viewer opens, click Show cDNA
- Click Show > +2 Translation to view the amino acids encoded by variant 1.

Note how the SR45 protein contains both C-terminal and N-terminal RS/SR motifs. This is unusual for an SR protein. Some researchers believe that this unusual structure makes SR45 a plant-specific SR protein, while others argue that it should not properly be considered an SR protein.

Read this review article on SR proteins: Implementing a rational and consistent nomenclature for serine/arginine-rich protein splicing factors (SR proteins) in plants.

Genomic sequence viewer with amino acid translation of splice variant AT1G16610.1:

Load RNA-Seq reads from cold stress experiment

If you have already accessed the read alignment files, you should see two empty tracks in the IGB main view (as in the images above.) To load the reads mapping to the SR45 region, click the Load Data button.

Click Load Data to load reads for the region
Right-click (or control-click) the Control and Cold Treatment track labels
- Choose Change > Adjust Max Stack Depth and enter "0" to ensure that all reads will be shown
- Use the vertical slider to stretch the display vertically so the reads and gene models are easier to see

About Max Stack Depth: The Max Stack Depth setting takes some getting used to, but as you learn to use IGB, you'll find it gives you extra power to configure the IGB display and fine-tune images you create for publication, slides, and reports.

The Max Stack Depth setting allows you to control the number of items (reads, gene models, or other annotations) that can be stacked on top of each other in a given track.

This is a useful setting when you're working with BAM files (read alignments) because for some genes, there will be tens, hundreds, or even thousands of aligned reads that overlap the same region of the genome.

If you set the Max Stack Depth to a number other than 0, then IGB will never build "stacks" of overlapping reads taller than that number. Any extra reads that would make the stack too tall (and exceed the Max Stack Depth setting) will be drawn on top of each other in a special overlap row at the very top of the track (for plus strand reads) or at the bottom of the track (for minus strand reads.)

This special "slop" row of extra reads or annotations will be drawn in a slightly darker color than the individually stacked items in the rest of the track.

You can interact with both the individually stacked and "slop" reads or annotations; you can click on them, click-drag across them to select them, and so on, which is useful when you need to count the number of reads overlapping a gene or an intron. (More on this later.)
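To make the stacking rule concrete, here is a minimal Python sketch of how a track might pack overlapping items into rows under a Max Stack Depth limit. It is an illustration only, not IGB's actual code: the function name stack_items, the half-open (start, end) intervals, and the greedy first-fit row assignment are all assumptions made for this example, and strand is ignored.

```python
def stack_items(items, max_depth=0):
    """Pack (start, end) intervals into rows of non-overlapping items.

    Hypothetical helper, not part of IGB. Intervals are half-open.
    max_depth == 0 means "no limit"; otherwise items that would need a
    row beyond max_depth go into a single shared overflow ("slop") list.
    """
    rows = []      # each row holds intervals that do not overlap each other
    row_ends = []  # end coordinate of the last interval placed in each row
    slop = []      # overflow items that did not fit in the allowed rows

    for start, end in sorted(items):
        for i, row_end in enumerate(row_ends):
            if start >= row_end:
                # The item fits to the right of the last item in this row.
                rows[i].append((start, end))
                row_ends[i] = end
                break
        else:
            # No existing row has room: open a new row if allowed.
            if max_depth == 0 or len(rows) < max_depth:
                rows.append([(start, end)])
                row_ends.append(end)
            else:
                slop.append((start, end))
    return rows, slop


# Toy read coordinates, invented for this example.
reads = [(0, 10), (2, 12), (4, 14), (6, 16), (11, 20)]

rows, slop = stack_items(reads, max_depth=2)
all_rows, no_slop = stack_items(reads, max_depth=0)
print(len(rows), len(slop))         # rows used and overflow items at depth 2
print(len(all_rows), len(no_slop))  # rows used and overflow items with no limit
```

With a limit of 2, the reads that cannot be placed in either row end up in the overflow list, which corresponds to the "slop" row described above. With a limit of 0, every read gets a row of its own when it needs one, which is why entering 0 ensures that all reads are shown.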

Assess relative abundance of SR45 splicing isoforms

The SR45 gene is annotated as producing three possible variants. Which of these is the most highly expressed in the cold treatment and control samples?

To answer this question, use the aligned reads from the two RNA-Seq samples.

Focus on the differentially spliced region:

Double-click the alternative exon in variant 1 (sixth from the left) to zoom in on it. Note it has two possible boundaries and that the intron to its immediate right (5-prime side relative to transcription) is an alternatively spliced intron.
Scroll up through the treatment and control read alignments. Observe how many reads support the V1 versus the V2 version of the alternative intron. Which form is more common? Is the same form more common in both the treatment and the control sample?

To answer this question, we have to look at reads that align across the intron immediately to the right of the alternative exon.

The intron has two alternative 3-prime boundaries, also called "acceptor sites." Remember that the gene is transcribed from right to left (see arrows in IGB) because it is located on the minus strand of the chromosome.

With respect to splicing, the most informative reads are spliced reads, reads that align across the alternative intron. These spliced reads can support one or the other choice, but not both.
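The same idea can be sketched in a few lines of Python. In SAM/BAM alignments, a spliced read carries an N operation in its CIGAR string that marks the skipped intron, so each spliced read can be assigned to the acceptor site it uses. This is a minimal sketch, not how IGB works internally: the function names, the (position, CIGAR) input format with a 0-based start position, and all coordinates are invented for illustration and are not the real SR45 coordinates.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def introns_from_cigar(pos, cigar):
    """Return half-open (start, end) reference intervals for each N gap.

    pos is the 0-based leftmost aligned reference position of the read.
    """
    introns = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length))
        if op in REF_CONSUMING:
            ref += length
    return introns


def count_acceptor_support(reads, donor, acceptors):
    """Count spliced reads supporting each acceptor of a minus-strand intron.

    On the minus strand the donor (5-prime) site is the intron's right end
    and the acceptor (3-prime) site is its left end. acceptors maps a left
    boundary coordinate to a variant label.
    """
    counts = Counter()
    for pos, cigar in reads:
        for start, end in introns_from_cigar(pos, cigar):
            if end == donor and start in acceptors:
                counts[acceptors[start]] += 1
    return counts


# Made-up coordinates and reads: one shared donor, two alternative acceptors.
toy_reads = [
    (850, "50M100N25M"),  # intron 900-1000, uses the first acceptor
    (880, "41M79N34M"),   # intron 921-1000, uses the second acceptor
    (1000, "75M"),        # unspliced read, not informative
    (860, "40M100N35M"),  # intron 900-1000, uses the first acceptor
]
counts = count_acceptor_support(toy_reads, donor=1000,
                                acceptors={900: "V1", 921: "V2"})
print(counts["V1"], counts["V2"])
```

The unspliced read contributes nothing to either count, which mirrors the point above: only reads that cross the alternative intron distinguish one variant from the other.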

You can count all the reads that overlap the intron simply by selecting them. When you click-drag over a region using the arrow (selection cursor), then anything you capture in the drag will be added to the list of currently selected items.

Select reads by click-dragging over a region
To add an item to the selection list, SHIFT-CLICK it
To remove an item from the selection list, CONTROL-SHIFT-CLICK it.

Use selection to count the number of reads that support V1 or V2.

IGB with 56 reads selected. Selected reads include just the reads that were spliced across the alternative intron - selections include reads supporting either V1 or V2.

IGB with just the V2 supporting reads selected:
Based on this analysis, it appears that the V1 form (AT1G16610.1) is more abundantly expressed in the control sample.

That is, out of 56 spliced reads, only 16 (around 29%) supported V2 and the rest supported V1.
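The proportion can be checked with a little arithmetic. The short Python sketch below uses the two counts reported for the control sample (56 spliced reads in total, 16 supporting V2); the helper name percent_supporting is made up for this example.

```python
def percent_supporting(variant_reads, total_spliced_reads):
    """Percentage of spliced reads that support one variant."""
    if total_spliced_reads == 0:
        raise ValueError("no spliced reads overlap the intron")
    return 100.0 * variant_reads / total_spliced_reads


total = 56        # spliced reads across the alternative intron (control)
v2 = 16           # reads supporting V2
v1 = total - v2   # the remaining reads support V1

v2_pct = percent_supporting(v2, total)
v1_pct = percent_supporting(v1, total)
print(f"V2: {v2_pct:.1f}%  V1: {v1_pct:.1f}%")  # V2: 28.6%  V1: 71.4%
```

You can reuse the same function with the counts you collect from the Cold-treated sample.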

Repeat the process for the Cold-treated sample.
- Were there fewer or more reads overlapping the alternative intron in the cold-treated sample?
- What percentage of reads supported V2?

What preliminary conclusions can you draw about cold stress and splicing in SR45?
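One way to judge whether the V1:V2 ratio differs between the two samples is a Fisher's exact test on the 2x2 table of read counts. This test is not part of the tutorial or of IGB; it is offered here as one possible approach. The sketch below implements the two-sided test with the Python standard library only. The control counts (40 and 16) come from the analysis above, but the cold counts are PLACEHOLDERS, not results: replace them with the numbers you count yourself.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


control_v1, control_v2 = 40, 16  # from the control sample counts above
cold_v1, cold_v2 = 30, 30        # PLACEHOLDER values, replace with your counts

p_value = fisher_exact_two_sided(control_v1, control_v2, cold_v1, cold_v2)
print(f"two-sided p-value: {p_value:.4f}")
```

Keep in mind that with one sample per condition, a small p-value is only suggestive. It reflects sampling of reads, not biological variation between plants, so any conclusion should stay preliminary.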

Finishing up - export an image showing what you found

If you would like to make images like the ones you see above, choose File > Export Image. A new window will appear giving you the option to create image files in various formats, set image size and resolution, and save either the whole IGB window, the main view, or the main view with labels.

Use File > Export Image to create a "souvenir" image from your session with IGB.
