The FindJunctions program is a Java program that uses spliced alignments to identify exon-exon junctions in RNA-Seq data. When given a BAM file, it produces a BED file that summarizes every spliced alignment identified in the BAM file. If also given a reference genomic sequence file (in .2bit format), it attempts to identify the strand of origin for each junction by looking for canonical intron splice junction sequences.
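As a rough illustration of strand inference from splice site sequences, the sketch below classifies an intron by its first and last two bases as read on the forward genomic strand. It is not FindJunctions code: the function name `infer_strand` is hypothetical, and the sketch assumes only the GT-AG motif is checked, which may differ from what the program actually does.

```python
def infer_strand(intron_seq):
    """Guess the strand of an intron from its terminal dinucleotides.

    intron_seq is the intron sequence as read on the forward (plus)
    genomic strand. Assumption: only the canonical GT-AG motif is
    considered; other motifs are reported as unknown (".").
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        return "."
    first_two, last_two = seq[:2], seq[-2:]
    if (first_two, last_two) == ("GT", "AG"):
        # Donor GT and acceptor AG read directly: plus-strand transcript.
        return "+"
    if (first_two, last_two) == ("CT", "AC"):
        # Reverse complement of GT...AG: minus-strand transcript.
        return "-"
    return "."
```

An intron on the minus strand appears as CT...AC in the forward sequence, which is why the second test mirrors the first.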
To obtain a copy of FindJunctions, you can either download the compiled "jar" file or get the source code and compile the program yourself.
To download a compiled copy, select Tools > Attachments (top right of this Web page). See the attachment notes for the subversion repository revision number of the compiled "jar" file.
To get a copy of the source code, check it out from the Genoviz repository and then use ant to compile it, as follows:
$ svn co http://svn.code.sf.net/p/genoviz/code/trunk/tools/FindJunctions
$ cd FindJunctions
$ ant release
This will create a new file named dist/FindJunction_exe.jar. Use this file to run the program.
To run the program from the command line, use a command like the following:
java -Xmx1g -jar FindJunction_exe.jar -u -n 5 -b Genome.2bit -o FJ.bed sample1.bam,sample2.bam
In this example, the -Xmx1g option lets the Java virtual machine use up to 1 GB of memory (RAM), and the -jar option names the jar file (FindJunction_exe.jar) that contains the program code. The -u option (for unique) indicates that only uniquely mapping spliced reads, those with NH tag equal to 1, will be used to construct junctions. The -n option gives the number of bases that must map on each side of a putative intron for a spliced read to be used to create or support a junction feature. The -b option gives the full path to the .2bit genomic sequence file that will be used to identify junction strand. The -o (output) option gives the name of the BED file that will be written (FJ.bed in this example). The final argument is a comma-separated list of the BAM files containing spliced alignments.
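To make the -u and -n filters concrete, here is a minimal sketch of how such checks could be applied to a single alignment, given its CIGAR string and NH tag value. This is an illustration under assumptions, not the FindJunctions implementation: the function names are hypothetical, and the sketch assumes the flanking count is the number of aligned bases (CIGAR operations M, = and X) in the blocks immediately adjacent to each intron gap (operation N).

```python
import re

# Matches one CIGAR operation, e.g. "20M" or "100N".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def flanking_bases(cigar):
    """Return (left, right) aligned-base counts for each N gap in a CIGAR."""
    blocks = [0]  # aligned bases per block; blocks are separated by N gaps
    for length, op in CIGAR_OP.findall(cigar):
        if op == "N":
            blocks.append(0)
        elif op in "M=X":
            blocks[-1] += int(length)
    return [(blocks[i], blocks[i + 1]) for i in range(len(blocks) - 1)]


def supported_gaps(cigar, nh, min_flank=5, unique_only=True):
    """Return indexes of intron gaps that pass both filters.

    min_flank plays the role of -n and unique_only the role of -u.
    Assumption: a gap passes when both adjacent blocks have at least
    min_flank aligned bases.
    """
    if unique_only and nh != 1:
        return []
    return [
        i
        for i, (left, right) in enumerate(flanking_bases(cigar))
        if left >= min_flank and right >= min_flank
    ]
```

With these assumptions, an alignment with CIGAR `10M50N20M60N4M` would support only its first gap when the threshold is 5, because just 4 bases map after the second gap.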
The output file (FJ.bed) is in BED12 format. The name field contains a name constructed from the location of the junction, and the score field contains the number of spliced alignments used to create the junction feature.
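For downstream use, the output can be read with a few lines of Python. The sketch below parses one BED12 line and recovers the intron coordinates and the supporting alignment count from the score field. It assumes each junction feature has exactly two blocks, one on each side of the intron, which is a common convention for junction files but is not confirmed here for FindJunctions. The `Junction` type, the function name, and the example line (including its name field) are hypothetical.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    chrom: str
    intron_start: int  # 0-based, first base of the intron
    intron_end: int    # 0-based, exclusive
    name: str
    support: int       # BED score: number of spliced alignments
    strand: str


def parse_junction(line):
    """Parse one BED12 line into a Junction (assumes a two-block feature)."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("expected 12 tab-separated BED fields")
    chrom_start = int(f[1])
    # Block lists may carry a trailing comma, so drop empty items.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    if int(f[9]) != 2 or len(sizes) != 2 or len(starts) != 2:
        raise ValueError("expected exactly two blocks")
    intron_start = chrom_start + starts[0] + sizes[0]
    intron_end = chrom_start + starts[1]
    return Junction(f[0], intron_start, intron_end, f[3], int(f[4]), f[5])


# Hypothetical example line; the name is a placeholder, not the real format.
example = "chr1\t100\t300\texample_junction\t7\t+\t100\t300\t0,0,0\t2\t20,50\t0,150"
junction = parse_junction(example)
```

A list of parsed junctions can then be filtered by support, for example `[j for j in junctions if j.support >= 3]`.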