...

The following describes how to add a new genome version to the IGB QuickLoad repository and update the IGB synonyms.txt and species.txt files.

Command-line utilities you'll need

...

This will update everything in the current working directory and all the directories beneath it.

Questions about using Subversion? See Version Control with Subversion.

Get the data from UCSC

Open a Web browser and find the genome you would like to add in the UCSC Table Browser.

...

For example, the assembly menu for zebrafish reads Jul. 2010 (Zv9/danRer7). The terms in parentheses are genome version synonyms for this assembly. The one on the right (danRer7) is what UCSC calls the "database" for this assembly, and it is used as the identifier of the genome in the UCSC DAS1 data source. The term on the left (Zv9) is usually another commonly used name, sometimes assigned by the sequencing consortium that generated the assembly or the original sequence. Sometimes, however, this term is not unique. For example, some genome versions are reported with the term "Broad," which is an organization, not an assembly.

Name the genome for

...

genus, species, strain (optional), release month and year

Choose an IGB genome assembly version identifier to represent the UCSC genome.
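The parts listed above map onto the G_species_Mmm_YYYY pattern used for the directory and file names on this page. The helper below is a minimal sketch of that mapping; igb_genome_name is a hypothetical name, not an IGB or UCSC tool, and placing an optional strain between the species and the month is an assumption based only on the order listed above, so check existing QuickLoad directories before relying on it.

```shell
# Hypothetical helper (not part of IGB or the UCSC tools): build a genome
# version name following the G_species_Mmm_YYYY pattern.
# Usage: igb_genome_name GENUS SPECIES MONTH YEAR [STRAIN]
# Assumption: an optional strain goes between the species and the month.
igb_genome_name() {
  local genus=$1 species=$2 month=$3 year=$4 strain=${5:-}
  local initial
  # The genus is reduced to its upper-case initial
  initial=$(printf '%s' "$genus" | cut -c1 | tr '[:lower:]' '[:upper:]')
  # The species is written in lower case
  species=$(printf '%s' "$species" | tr '[:upper:]' '[:lower:]')
  if [ -n "$strain" ]; then
    printf '%s_%s_%s_%s_%s\n' "$initial" "$species" "$strain" "$month" "$year"
  else
    printf '%s_%s_%s_%s\n' "$initial" "$species" "$month" "$year"
  fi
}
```

For the zebrafish assembly released in July 2010, igb_genome_name Danio rerio Jul 2010 prints D_rerio_Jul_2010, which could then be passed to svn mkdir.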

...

Use svn mkdir to create a new genome directory. The name of the new directory should be identical to the IGB QuickLoad genome version name.

Code Block
$ svn mkdir G_species_Mmm_YYYY

...

This will take you to a directory where you can download files. Typically, the address of the directory is

http://hgdownload.soe.ucsc.edu/goldenPath/UCSCNAME/bigZips/

For example, UCSC's danRer7 genome is in http://hgdownload.soe.ucsc.edu/goldenPath/danRer7/bigZips/.
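Because the download directory follows the pattern above, the URL can be built from the UCSC database name alone. This is a small sketch assuming the usual layout holds for the assembly you are adding; ucsc_bigzips_url is a hypothetical helper name, and it is worth opening the printed URL in a browser to confirm the directory exists.

```shell
# Hypothetical helper: print the usual bigZips directory URL for a UCSC
# database name such as danRer7 or dm3 (the UCSCNAME placeholder above).
ucsc_bigzips_url() {
  if [ -z "$1" ]; then
    echo "usage: ucsc_bigzips_url UCSC_DATABASE_NAME" >&2
    return 1
  fi
  printf 'http://hgdownload.soe.ucsc.edu/goldenPath/%s/bigZips/\n' "$1"
}
```

For example, wget "$(ucsc_bigzips_url dm3)chromFa.tar.gz" requests the same file as the dm3 wget command on this page.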

...

Code Block
$ wget http://hgdownload.soe.ucsc.edu/goldenPath/dm3/bigZips/chromFa.tar.gz

...

Unpack the file

...

Use tar with the options xvf to decompress the file and extract its contents in one step. When it finishes, there will be one ".fa" file for each chromosome.
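The unpacking step can be wrapped as shown below. This is a sketch, and unpack_chromfa is a hypothetical name. It adds the z option to make the gzip step explicit; recent versions of GNU tar and bsdtar generally detect gzip on their own when extracting with xvf, so either form should work with those.

```shell
# Hypothetical helper: unpack a chromFa tarball into the current directory.
# x = extract, z = gunzip first, v = list each file, f = read this archive.
unpack_chromfa() {
  local archive=${1:-chromFa.tar.gz}
  if [ ! -f "$archive" ]; then
    echo "unpack_chromfa: $archive not found" >&2
    return 1
  fi
  tar -xzvf "$archive"
}
```

Running unpack_chromfa with no argument unpacks chromFa.tar.gz from the current directory.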

...

Note

Once you've created the 2bit file for the genome assembly, you'll delete the .fa and the .gz files.

...

Create 2bit file using faToTwoBit

...

Convert the .fa files to 2bit format using faToTwoBit. This UCSC program reads one or more FASTA files and converts them into a single 2bit file. To see how to run it, type the program's name without any arguments; it will print a usage message.

...

Code Block
$ faToTwoBit *.fa G_species_Mmm_YYYY.2bit
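If you script this step, it can help to stop early when faToTwoBit is not installed or when the directory holds no .fa files. The wrapper below is a hypothetical sketch (make_twobit is not a UCSC tool); it relies only on the argument order shown above, with the FASTA inputs first and the output file last.

```shell
# Hypothetical wrapper around faToTwoBit: check the tool and the inputs,
# then write NAME.2bit from every .fa file in the current directory.
make_twobit() {
  local name=$1
  if ! command -v faToTwoBit >/dev/null 2>&1; then
    echo "make_twobit: faToTwoBit not found on PATH" >&2
    return 1
  fi
  # Replace the positional parameters with the matching .fa files;
  # if nothing matches, the literal pattern "./*.fa" is left instead
  set -- ./*.fa
  if [ ! -e "$1" ]; then
    echo "make_twobit: no .fa files in $(pwd)" >&2
    return 1
  fi
  # One or more FASTA inputs, followed by the output 2bit file
  faToTwoBit "$@" "$name.2bit"
}
```

For example, make_twobit G_species_Mmm_YYYY runs the same conversion as the command above.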

...

Delete the .fa files and the chromFa.tar.gz file

...

Code Block
$ rm *.fa
$ rm chromFa.tar.gz
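Because the FASTA files should only be deleted once the 2bit file has been created, the cleanup can be guarded by a check that the 2bit file exists and is not empty. This is a minimal sketch; cleanup_fasta is a hypothetical name, and the check does not validate the 2bit contents.

```shell
# Hypothetical guard for the cleanup step: remove the .fa files and the
# tarball only if the given 2bit file exists and is non-empty.
cleanup_fasta() {
  local twobit=$1
  if [ ! -s "$twobit" ]; then
    echo "cleanup_fasta: $twobit missing or empty; keeping FASTA files" >&2
    return 1
  fi
  rm -f ./*.fa chromFa.tar.gz
}
```

For example, cleanup_fasta G_species_Mmm_YYYY.2bit performs the two rm commands above only after the conversion has produced output.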
Note

You can always re-create the .fa files using twoBitToFa, another UCSC tool.

Note

The 2bit file is typically smaller than the .fa files combined, and also typically smaller than the compressed chromFa.tar.gz file it replaced. Unlike the tar.gz file, it supports random access, which allows IGB to load part of a sequence from the IGB QuickLoad site instead of the whole file.
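Random access works because a 2bit file begins with an index. Following UCSC's description of the format, a 16-byte header (signature 0x1A412743, version, sequence count, reserved) is followed by one entry per sequence: a 1-byte name length, the name, and a 4-byte offset to that sequence's record. A reader can look up a chromosome in the index and seek straight to its offset. The sketch below lists those entries; twobit_index and twobit_u32 are hypothetical names, and it assumes the file was written in the same byte order as the machine reading it.

```shell
# Read a 4-byte unsigned integer from file $1 at byte offset $2,
# in the byte order of the machine running this script.
twobit_u32() {
  od -An -tu4 -j "$2" -N4 "$1" | tr -d ' \n'
}

# Hypothetical helper: print "name<TAB>offset" for each index entry.
twobit_index() {
  local f=$1 count pos i len name offset
  if [ "$(twobit_u32 "$f" 0)" != "$((0x1A412743))" ]; then
    echo "twobit_index: bad signature or byte order in $f" >&2
    return 1
  fi
  count=$(twobit_u32 "$f" 8)   # sequence count follows signature and version
  pos=16                       # the index starts right after the header
  i=0
  while [ "$i" -lt "$count" ]; do
    len=$(od -An -tu1 -j "$pos" -N1 "$f" | tr -d ' \n')
    name=$(dd if="$f" bs=1 skip=$((pos + 1)) count="$len" 2>/dev/null)
    offset=$(twobit_u32 "$f" $((pos + 1 + len)))
    printf '%s\t%s\n' "$name" "$offset"
    pos=$((pos + 1 + len + 4))
    i=$((i + 1))
  done
}
```

Running twobit_index G_species_Mmm_YYYY.2bit prints one line per chromosome with the byte offset where its sequence record starts.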

...