For example, the following sequence of commands downloads the software, moves it to a directory named "bin" in the home directory, and then makes it executable using the chmod command.

Code Block
$ wget http://hgdownload.cse.ucsc.edu/admin/exe/macOSX.i386/twoBitInfo
$ mv twoBitInfo ~/bin
$ chmod a+x ~/bin/twoBitInfo
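The same install pattern can be tried without touching the network. The sketch below is illustrative only: it uses a throwaway directory and a stand-in script named demoTool (both hypothetical) in place of ~/bin and a real UCSC binary. It shows how chmod a+x plus a PATH entry makes a program runnable by name.

```shell
# Stand-in for ~/bin: a throwaway directory (hypothetical, for illustration).
demo_bin=$(mktemp -d)

# Stand-in for a downloaded binary such as twoBitInfo.
printf '#!/bin/sh\necho "demoTool ran"\n' > "$demo_bin/demoTool"

# Make it executable for all users, as in the chmod step above.
chmod a+x "$demo_bin/demoTool"

# The shell only finds programs in directories listed in PATH.
PATH="$demo_bin:$PATH"
export PATH

# command -v prints the full path when the shell can find the program.
found_path=$(command -v demoTool)
tool_output=$(demoTool)
echo "$found_path"
echo "$tool_output"
```

If a tool placed in ~/bin is not found, the likely cause is that ~/bin is missing from PATH; adding a line such as export PATH="$HOME/bin:$PATH" to your shell startup file is one common fix.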

Step-by-step guide to adding a new UCSC genome to IGBQuickLoad

The following instructions explain how to

set up your local copy of the public QuickLoad data repository
set up your environment to run QuickLoad scripts
get annotation files and sequence files from UCSC Genome Bioinformatics
convert files to random access, indexed file formats that enable partial data loading in IGB
update meta-data files IGB requires to update its interface and allow users to access the newly added genome
update HEADER.html and other files describing the new genome
commit your new files and updates to the repo
submit a Jira ticket requesting that the main site and mirror sites be updated

If you have questions, don't hesitate to ask Ann.

Get the QuickLoad data repo

...

To set up a src directory for checked-out code:

Code Block
$ cd
$ mkdir src
$ cd src

Then, use svn to get a copy of the QuickLoad source code from the repo and save it to a directory named quickload_src:

...

Use svn mkdir to create a new genome directory. The name of the new directory should be identical to the IGB QuickLoad genome version name.

Code Block
$ svn mkdir G_species_Mmm_YYYY

...

Change directories into the newly created genome version directory:

Code Block
$ cd G_species_Mmm_YYYY

Download the sequence data

...

Is the 2bit file available?

For most of the more recent genomes, UCSC is using the 2bit format to distribute sequence data. However, some older versions may not make this available.

If yes, download it using wget.

...

Return to your terminal UNIX shell, type wget, and paste the URL into the shell.

For example,

Code Block
$ wget http://hgdownload.soe.ucsc.edu/goldenPath/danRer7/bigZips/danRer7.2bit

...

The prefix (file name part) of the 2bit file for a genome should be the genome version identifier and suffix (file extension) should be 2bit.

For example,

Code Block
$ mv danRer7.2bit D_rerio_Jul_2010.2bit
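As a rough sketch of the naming rule, the snippet below builds the target file name from a genome version identifier and renames a placeholder file in a scratch directory. The helper quickload_2bit_name is hypothetical and not part of any UCSC or IGB tool.

```shell
# Build the QuickLoad 2bit file name from a genome version identifier.
# quickload_2bit_name is a hypothetical helper written for this example.
quickload_2bit_name() {
    printf '%s.2bit\n' "$1"
}

# Scratch directory with an empty placeholder instead of a real download.
work_dir=$(mktemp -d)
: > "$work_dir/danRer7.2bit"

# Rename the UCSC-named file to the QuickLoad genome version name.
target=$(quickload_2bit_name D_rerio_Jul_2010)
mv "$work_dir/danRer7.2bit" "$work_dir/$target"
ls "$work_dir"
```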

...

Return to your terminal UNIX shell, type wget, and paste the URL into the shell.

For example,

Code Block
$ wget http://hgdownload.soe.ucsc.edu/goldenPath/dm3/bigZips/chromFa.tar.gz

...

Use tar with options xvf to uncompress the file while also extracting its contents. An ".fa" file for each chromosome will appear when completed.

Code Block
$ tar xvf chromFa.tar.gz
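It can help to list an archive before extracting it, since some archives may unpack into subdirectories rather than the current directory. The following sketch builds a tiny stand-in archive (made-up sequences, scratch directory) so it can be run anywhere. The option letters are standard tar options.

```shell
# Build a tiny stand-in archive (hypothetical contents) in a scratch directory.
scratch=$(mktemp -d)
mkdir "$scratch/src" "$scratch/out"
printf '>chrA\nACGT\n' > "$scratch/src/chrA.fa"
printf '>chrB\nGGCC\n' > "$scratch/src/chrB.fa"
tar czf "$scratch/chromFa.tar.gz" -C "$scratch/src" chrA.fa chrB.fa

# List the contents first: t = list, z = gunzip, f = archive file.
listing=$(tar tzf "$scratch/chromFa.tar.gz")
echo "$listing"

# Extract into a separate directory: x = extract, v = verbose, -C = target directory.
tar xvzf "$scratch/chromFa.tar.gz" -C "$scratch/out"

# Count the extracted .fa files.
fa_count=$(ls "$scratch/out"/*.fa | wc -l | tr -d ' ')
echo "$fa_count"
```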

Note
Once you've created the 2bit file for the genome assembly, you'll delete the .fa and the .gz files.

Create 2bit file using faToTwoBit

Install faToTwoBit. In the following example, the macOS version is downloaded:

Code Block
$ wget http://hgdownload.cse.ucsc.edu/admin/exe/macOSX.i386/faToTwoBit
$ mv faToTwoBit ~/bin
$ chmod a+x ~/bin/faToTwoBit

Convert the fa files to 2bit

faToTwoBit will read one or more fasta files and convert them to a single 2bit file. To understand how to run it, type the name of the program. If you run it without any arguments, it will print a usage message.

Code Block

$ faToTwoBit
faToTwoBit - Convert DNA from fasta to 2bit format
usage:
   faToTwoBit in.fa [in2.fa in3.fa ...] out.2bit
options:
   -noMask       - Ignore lower-case masking in fa file.
   -stripVersion - Strip off version number after . for genbank accessions.
   -ignoreDups   - only convert first sequence if there are duplicates

The prefix (file name part) of the 2bit file you create for this genome version should be the genome version identifier and suffix (file extension) should be "2bit."

Code Block
$ faToTwoBit *.fa G_species_Mmm_YYYY.2bit

Delete .fa files and the chromFa.tar.gz file

Code Block
$ rm *.fa
$ rm chromFa.tar.gz
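Because the deletion is only safe once the 2bit file exists, one cautious variant is to guard the rm with a file test. This is a minimal sketch using placeholder files in a scratch directory; the file contents are made up and the 2bit file here is not a real 2bit file.

```shell
# Scratch directory with placeholder files (hypothetical contents).
demo_dir=$(mktemp -d)
printf '>chrA\nACGT\n' > "$demo_dir/chrA.fa"
: > "$demo_dir/chromFa.tar.gz"
printf 'placeholder' > "$demo_dir/G_species_Mmm_YYYY.2bit"

# Delete the inputs only if the 2bit file exists and is not empty (-s).
if [ -s "$demo_dir/G_species_Mmm_YYYY.2bit" ]; then
    rm "$demo_dir"/*.fa "$demo_dir/chromFa.tar.gz"
    cleanup_status="removed"
else
    cleanup_status="kept"
fi
echo "$cleanup_status"
```

The -s test only confirms that the file is present and non-empty; it does not validate the 2bit contents.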

Note
You can always re-create the .fa files using twoBitToFa, another UCSC tool.

Note
The 2bit file is typically smaller than the compressed chromFa.tar.gz file it replaced. Unlike the .tar.gz file, it supports random access, allowing IGB to support partial loading of sequence from the IGBQuickLoad site into the coordinates track.

Make genome.txt file

Use twoBitInfo to create a genome.txt file for the genome. This file lists sequences and their sizes for the genome.

Code Block
$ twoBitInfo G_species_Mmm_YYYY.2bit genome.txt
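A quick sanity check on the resulting file can catch problems early. The sketch below assumes genome.txt holds one sequence per line as a name, a tab, and an integer size. It runs on a stand-in file with made-up names and sizes, not on real twoBitInfo output.

```shell
# Stand-in genome.txt with made-up sequence names and sizes (tab-separated).
genome_file=$(mktemp)
printf 'chrB\t500\nchrA\t1200\nchrM\t16\n' > "$genome_file"

# Count lines that are not "name<TAB>integer size".
bad_lines=$(awk -F '\t' 'NF != 2 || $2 !~ /^[0-9]+$/ { n++ } END { printf "%d\n", n }' "$genome_file")

# Sum the size column to get the total assembly length.
# printf with %.0f avoids exponent notation for large totals in some awk versions.
total_bases=$(awk -F '\t' '{ sum += $2 } END { printf "%.0f\n", sum }' "$genome_file")

echo "malformed lines: $bad_lines"
echo "total bases: $total_bases"
```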

Note
You can use twoBitInfo to make BED files marking the location of N's in the genome or calculate the amount of non-N sequence in an assembly.

Sort the genome.txt file by sequence size

...

To ensure that the chromosomes are listed with the largest ones first, sort the file by size in descending order:

Code Block
$ sort -k2,2nr genome.txt > tmp
$ mv tmp genome.txt
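To see what the sort options do, here is a small run on a stand-in file with made-up sequence names and sizes. The intermediate file is needed because redirecting sort output straight onto its own input would truncate the file before sort reads it.

```shell
# Stand-in genome.txt, deliberately out of order (made-up names and sizes).
sorted_file=$(mktemp)
printf 'chrM\t16\nchrB\t500\nchrA\t1200\n' > "$sorted_file"

# -k2,2 sorts on column 2 only; n = numeric, r = reverse (largest first).
sort -k2,2nr "$sorted_file" > "$sorted_file.tmp"
mv "$sorted_file.tmp" "$sorted_file"

# The largest sequence should now be first and the smallest last.
first_seq=$(head -n 1 "$sorted_file" | cut -f 1)
last_seq=$(tail -n 1 "$sorted_file" | cut -f 1)
cat "$sorted_file"
```

An alternative that avoids the temporary file is sort -k2,2nr -o genome.txt genome.txt, since the -o option permits the output file to be one of the inputs.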

Add genome.txt to the repo

...

Warning
Note that this file is tab-separated, so when you edit the file, be sure to use a tab character to separate the genome version and genome title fields.
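Spaces and tabs look alike in most editors, so a mechanical check may be worthwhile after editing. The sketch below flags lines that do not split into exactly two tab-separated fields. The file is a temporary stand-in and both titles are made up for illustration.

```shell
# Stand-in for the two-column file; titles are made up for illustration.
tsv_file=$(mktemp)
# Line 1 uses a real tab; line 2 uses spaces by mistake.
printf 'G_species_Mmm_YYYY\tGenus species title\n' > "$tsv_file"
printf 'H_species_Mmm_YYYY    Other title\n' >> "$tsv_file"

# Collect the line numbers that do not have exactly two tab-separated fields.
untabbed_lines=$(awk -F '\t' 'NF != 2 { printf "%s%d", sep, NR; sep = "," }' "$tsv_file")
echo "lines without a single tab separator: $untabbed_lines"
```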

Test the new genome

...

Testing is absolutely critical, as it is easy to make an error along the way. Plan to spend at least as much time testing as you spent building the site.

Test under the released version of IGB

Download the latest release of IGB from http://www.bioviz.org

...

Configure data sources under Data Sources tab

...
