Introduction

The FindJunctions program is a Java program that uses spliced alignments to identify exon-exon junctions in RNA-Seq data. When given a BAM file, it produces a BED file that summarizes every junction supported by spliced alignments in that BAM file. If it is also given a reference genomic sequence file (in .2bit format), it attempts to determine the strand of origin for each junction by looking for canonical splice-site sequences at the intron boundaries.
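
As an illustration of the strand check, the following Java sketch classifies an intron by its first and last two bases. It assumes the intron sequence, read on the + strand of the reference, has already been retrieved from the .2bit file. The class and method names are hypothetical, and the sketch considers only the canonical GT-AG motif and its reverse complement (CT-AC); the exact set of motifs FindJunctions checks may differ.

```java
import java.util.Locale;

/**
 * Hypothetical sketch, not FindJunctions' own code: infer intron strand from
 * the dinucleotides at each end of the intron, as read on the + strand.
 */
final class SpliceStrand {

    /** Returns '+', '-', or '.' when the ends are not a canonical GT-AG pair. */
    static char inferStrand(String intronSeq) {
        if (intronSeq == null || intronSeq.length() < 4) {
            return '.'; // too short to hold a donor and an acceptor site
        }
        String s = intronSeq.toUpperCase(Locale.ROOT);
        String donor = s.substring(0, 2);              // first two intron bases
        String acceptor = s.substring(s.length() - 2); // last two intron bases
        if (donor.equals("GT") && acceptor.equals("AG")) {
            return '+'; // GT...AG read directly on the + strand
        }
        if (donor.equals("CT") && acceptor.equals("AC")) {
            return '-'; // reverse complement of GT...AG, so the gene is on the - strand
        }
        return '.';     // non-canonical ends: strand left undetermined
    }

    public static void main(String[] args) {
        System.out.println(inferStrand("GTAAGTCCTTTTAG")); // +
        System.out.println(inferStrand("CTGAAAAACTTAC"));  // -
        System.out.println(inferStrand("ATAAAAAAAAAAC"));  // .
    }
}
```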

How to get FindJunctions

To obtain a copy of FindJunctions, you can either download the compiled "jar" file or check out the source code and compile the program yourself.

To download a compiled copy, select Tools > Attachments (top right of this Web page). The attachment notes give the Subversion repository revision number from which the "jar" file was compiled.

To build from source, check out the code from the Genoviz Subversion repository and compile it with Ant:

$ svn co http://svn.code.sf.net/p/genoviz/code/trunk/tools/FindJunctions
$ cd FindJunctions
$ ant release

This will create a new file named dist/FindJunction_exe.jar. Use this file to run the program.

Using the jar file to run FindJunctions

To run the program from the command line, use a command like this:

java -Xmx1g -jar FindJunction_exe.jar -u -n 5 -b Genome.2bit -o FJ.bed sample1.bam,sample2.bam

In this example, the Java option -Xmx1g allows the program to use up to 1 GB of memory (RAM), and -jar tells Java to run the code in FindJunction_exe.jar. The remaining arguments are FindJunctions options. The -u (unique) option restricts junction building to uniquely mapping spliced reads, that is, reads whose NH tag equals 1. The -n option sets the minimum number of bases that must align on each side of a putative intron for a spliced read to be used to create or support a junction feature. The -b option gives the full path to the .2bit genomic sequence file used to determine junction strand. The -o (output) option gives the name of the BED file to write (here, FJ.bed). The final argument is a comma-separated list of the BAM files containing spliced alignments.
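
To show how -u and -n can act on a single alignment, the Java sketch below walks a CIGAR string, records each intron (N operation), and keeps it only if the read passes both filters. This is an illustration under stated assumptions, not FindJunctions' implementation: the names are hypothetical, the anchor on each side is taken to be the matched bases (M, =, X) in the aligned segment directly adjacent to the intron, and coordinates are 0-based (SAM POS minus 1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of -u / -n style filtering; anchor definition is an assumption. */
final class JunctionExtractor {

    /** An intron (CIGAR N operation) in 0-based, half-open reference coordinates. */
    static final class Intron {
        final int start;
        final int end;
        Intron(int start, int end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + ", " + end + ")"; }
    }

    private static final Pattern CIGAR_OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    static List<Intron> extract(int alignStart, String cigar, int nh, boolean uniqueOnly, int minAnchor) {
        List<Intron> kept = new ArrayList<>();
        if (uniqueOnly && nh != 1) {
            return kept; // -u: ignore reads that map to more than one location
        }
        List<Intron> introns = new ArrayList<>();
        List<Integer> segmentMatches = new ArrayList<>(); // matched bases between introns
        int refPos = alignStart;
        int matched = 0;
        int consumed = 0;
        Matcher m = CIGAR_OP.matcher(cigar);
        while (m.find()) {
            if (m.start() != consumed) throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
            consumed = m.end();
            int len = Integer.parseInt(m.group(1));
            switch (m.group(2).charAt(0)) {
                case 'M': case '=': case 'X':
                    refPos += len;
                    matched += len;
                    break;
                case 'D':
                    refPos += len; // consumes reference but adds no anchor bases
                    break;
                case 'N':
                    introns.add(new Intron(refPos, refPos + len));
                    segmentMatches.add(matched); // anchor to the left of this intron
                    matched = 0;
                    refPos += len;
                    break;
                default:
                    break; // I, S, H, P do not consume reference
            }
        }
        if (consumed != cigar.length()) throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
        segmentMatches.add(matched); // anchor to the right of the last intron
        for (int i = 0; i < introns.size(); i++) {
            // -n: both flanking segments must reach the minimum anchor length
            if (segmentMatches.get(i) >= minAnchor && segmentMatches.get(i + 1) >= minAnchor) {
                kept.add(introns.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(extract(100, "10M200N15M", 1, true, 5)); // [[110, 310)]
        System.out.println(extract(100, "3M200N15M", 1, true, 5));  // []
        System.out.println(extract(100, "10M200N15M", 2, true, 5)); // []
    }
}
```

Measuring the anchor only on the adjacent segment is the stricter of the plausible readings for reads that span several introns; summing all aligned bases on each side would be an equally simple alternative.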

The output file (FJ.bed) is in BED12 format. The name field contains a name constructed from the junction's location, and the score field contains the number of spliced alignments used to create the junction feature.
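
The Java sketch below illustrates how a junction could be written as a BED12 line: supporting alignments are counted per intron, and the feature is drawn as two blocks flanking the intron. The exact name format and flank widths FindJunctions uses are not documented here, so the "chrom:start-end" name and the fixed flank parameter are hypothetical choices made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of BED12 junction output; name format and flank width are assumptions. */
final class JunctionBedWriter {

    /** Counts how many spliced alignments support each intron key (here "chrom:start-end"). */
    static Map<String, Integer> countSupport(Iterable<String> intronKeys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : intronKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** Formats one BED12 line: two blocks of up to `flank` bases around a 0-based intron. */
    static String toBed12(String chrom, int intronStart, int intronEnd, char strand, int count, int flank) {
        int chromStart = Math.max(0, intronStart - flank);
        int chromEnd = intronEnd + flank;
        int leftBlock = intronStart - chromStart;
        int rightBlock = chromEnd - intronEnd;
        String name = chrom + ":" + intronStart + "-" + intronEnd; // hypothetical naming scheme
        return String.join("\t",
                chrom,
                Integer.toString(chromStart),
                Integer.toString(chromEnd),
                name,
                Integer.toString(count),          // score: number of supporting alignments
                String.valueOf(strand),
                Integer.toString(chromStart),     // thickStart
                Integer.toString(chromEnd),       // thickEnd
                "0",                              // itemRgb
                "2",                              // blockCount: one block on each side
                leftBlock + "," + rightBlock,     // blockSizes
                "0," + (intronEnd - chromStart)); // blockStarts, relative to chromStart
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countSupport(List.of("chr1:110-310", "chr1:110-310", "chr1:500-900"));
        System.out.println(counts); // {chr1:110-310=2, chr1:500-900=1}
        System.out.println(toBed12("chr1", 110, 310, '+', counts.get("chr1:110-310"), 25));
    }
}
```

Because the second block starts at the intron's end, the gap between the two blocks is exactly the intron, which is how genome browsers render a junction feature from BED12.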