...

Next, I changed into the new directory and then downloaded this 2bit file using wget in the usual way:

$ wget http://hgdownload.cse.ucsc.edu/goldenPath/xenTro3/bigZips/xenTro3.2bit

Once the file had downloaded, I changed the name and checked the file size:

$ mv xenTro3.2bit X_tropicalis_Nov_2009.2bit
$ ls -lh *.2bit
-rw-r--r--  1 pi  staff   373M Sep  9  2011 X_tropicalis_Nov_2009.2bit

I changed the name because IGB expects the genome sequence file name to match the genome version name. That is, when a user requests the sequence to be loaded into IGB, IGB will look for a file named X_tropicalis_Nov_2009.2bit and then use that file to retrieve the sequence data.
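A quick check along these lines can catch a naming mismatch before IGB does. This is only a sketch: it assumes the genome directory is itself named after the genome version (for example X_tropicalis_Nov_2009), and has_matching_2bit is a hypothetical helper name, not part of IGB.

```shell
# Hypothetical helper: succeed only if DIR contains <name of DIR>.2bit.
# Assumes the directory is named after the genome version.
has_matching_2bit() {
  dir=${1%/}                      # drop a trailing slash, if any
  genome=$(basename "$dir")       # e.g. X_tropicalis_Nov_2009
  [ -f "$dir/$genome.2bit" ]
}

# Demo in a scratch directory so nothing real is touched.
scratch=$(mktemp -d)
mkdir "$scratch/X_tropicalis_Nov_2009"
touch "$scratch/X_tropicalis_Nov_2009/xenTro3.2bit"

if has_matching_2bit "$scratch/X_tropicalis_Nov_2009"; then
  before=match
else
  before=mismatch
fi

# Rename the file as in the mv step above, then check again.
mv "$scratch/X_tropicalis_Nov_2009/xenTro3.2bit" \
   "$scratch/X_tropicalis_Nov_2009/X_tropicalis_Nov_2009.2bit"

if has_matching_2bit "$scratch/X_tropicalis_Nov_2009"; then
  after=match
else
  after=mismatch
fi

echo "before rename: $before, after rename: $after"
```

In a real repository the same function could be pointed at the genome directory itself instead of the scratch copy.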

...

Next, I used twoBitInfo to make a "genome.txt" file reporting the names of the assembled chromosomes and contigs and their sizes:

$ twoBitInfo X_tropicalis_Nov_2009.2bit genome.txt

This creates the genome.txt file IGB needs to display contig and chromosome names and their sizes in the Current Genome tab.
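twoBitInfo writes one line per sequence: the sequence name, a tab, and the size in bases. A small sanity check of that layout might look like the sketch below. The helper name summarize_genome and the toy sequence names and sizes are made up for illustration; they are not real X. tropicalis values.

```shell
# Hypothetical helper: report how many sequences a genome.txt-style file
# lists and their combined size; fail if any line is not
# "<name><TAB><non-negative integer>".
summarize_genome() {
  awk -F'\t' '
    NF != 2 || $2 !~ /^[0-9]+$/ {
      printf "bad line %d: %s\n", NR, $0 > "/dev/stderr"
      bad = 1
    }
    { total += $2 }
    END {
      if (bad) exit 1
      printf "%d sequences, %.0f bases\n", NR, total
    }
  ' "$1"
}

# Toy input standing in for a real genome.txt (values are invented).
demo_genome=$(mktemp)
printf 'scaffold_1\t1000\nscaffold_2\t250\n' > "$demo_genome"

summary=$(summarize_genome "$demo_genome")
echo "$summary"
```

Run against the real file, this would be called as summarize_genome genome.txt.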

...

To process the EST data set, I used the following command to decompress the file, drop any line containing "bin" (such as a header line), and strip off the first column:

$ gunzip -c X_tropicalis_Nov_2009_all_est.gz | grep -v bin | cut -f2- > X_tropicalis_Nov_2009_all_est.psl

Next, I sorted the file, compressed it with bgzip, and created an index with tabix:

$ sort -k14,14 -k16,16n X_tropicalis_Nov_2009_all_est.psl > sorted.psl

$ mv sorted.psl X_tropicalis_Nov_2009_all_est.psl

$ bgzip X_tropicalis_Nov_2009_all_est.psl

$ tabix -s 14 -b 16 -0 X_tropicalis_Nov_2009_all_est.psl.gz

The sort command sorts first on field 14 alone (-k14,14) and then numerically on field 16 alone (-k16,16n). Field 14 holds the target sequence name and field 16 holds the start position of each alignment, so the file ends up ordered by target sequence and, within each sequence, by alignment start. After sorting, bgzip block-compresses the file, replacing it with X_tropicalis_Nov_2009_all_est.psl.gz.
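To see why the numeric flag on the second key matters, here is a toy run of the same sort keys. It is a sketch only: the rows are placeholders with 21 tab-separated columns, and only column 14 (target name) and column 16 (target start) carry values, all of them invented.

```shell
# Emit one placeholder row: 21 tab-separated columns, with the target
# name in column 14 and the target start in column 16.
row() {
  printf '.\t.\t.\t.\t.\t.\t.\t.\t.\t.\t.\t.\t.\t%s\t.\t%s\t.\t.\t.\t.\t.\n' "$1" "$2"
}

toy_psl=$(mktemp)
{
  row scaffold_2 500
  row scaffold_1 1000
  row scaffold_1 90
} > "$toy_psl"

# Same keys as the real command; show only the two key columns.
sorted_keys=$(sort -k14,14 -k16,16n "$toy_psl" | cut -f14,16)
printf '%s\n' "$sorted_keys"
```

With the n modifier, start 90 sorts before start 1000 within scaffold_1; a plain text comparison on that field would put 1000 first.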

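The last command (tabix) creates a tabix index (.tbi) file that IGB knows to look for when it encounters files with extension .gz. The options tell tabix that sequence names are in column 14 (-s 14), that start positions are in column 16 (-b 16), and that positions are zero-based (-0).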
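Since the index sits next to the compressed file as <name>.gz.tbi, a forgotten tabix step is easy to spot. The sketch below lists .gz files that lack an index; missing_tbi is a hypothetical helper name and the file names in the demo are invented.

```shell
# Hypothetical helper: print the name of every .gz file in DIR that has
# no .tbi index next to it.
missing_tbi() {
  for gz in "$1"/*.gz; do
    [ -e "$gz" ] || continue            # the glob matched nothing
    [ -f "$gz.tbi" ] || basename "$gz"
  done
  return 0
}

# Demo with empty placeholder files in a scratch directory.
index_demo=$(mktemp -d)
touch "$index_demo/a.psl.gz" "$index_demo/a.psl.gz.tbi"   # indexed
touch "$index_demo/b.psl.gz"                              # index missing

missing=$(missing_tbi "$index_demo")
echo "missing index: $missing"
```

With tabix installed, a region query such as tabix X_tropicalis_Nov_2009_all_est.psl.gz scaffold_1:1-100000 is another way to confirm that the index works; scaffold_1 here is only a stand-in for a sequence name that appears in genome.txt.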

...

Being a bit lazy, I usually just copy an annots.xml file from another part of the repository when setting up a new genome. I use svn for the copy, however, so that the new file is scheduled for addition to the repository:

$ svn cp ../V_vinifera_Mar_2010/annots.xml .
A         annots.xml

I then opened a simple text editor (like TextEdit) and edited the file, like so:


Recently, we added the capability to specify various styles for annotation files delivered via QuickLoad. Most of these (e.g., foreground and background) are self-explanatory, but a few are not so obvious as they pertain to specialized aspects of how IGB presents data.

...