Introduction
UCSC genome bioinformatics supports mammalian, insect, fish, avian, and some fungal genomes.
IGBQuickLoad contains genome directories with sequence and annotation data for some (not all) genome assemblies supported at UCSC.
The following describes how to add a new genome version to the IGB QuickLoad repository and update IGB species.txt.
Command-line utilities you'll need
- faToTwoBit and twoBitInfo programs from the UCSC Jim Kent tools, available from http://hgdownload.cse.ucsc.edu/admin/exe/
- UNIX wget (not installed by default on Mac but available on most other UNIX systems)
- UNIX sort (should be pre-installed on any UNIX system, including Mac)
- A version of git for your platform.
- A version of subversion (svn) for your platform.
- IGBQuickLoad scripts in https://bitbucket.org/lorainelab/genomesource (details on how to get the code are in the next sections)
Get the compiled programs from UCSC, put them in a directory in your PATH, and make sure they are executable on your system.
If you're doing this on a Mac desktop or laptop computer, create a directory called "bin" in your home directory and save all compiled binaries there. Edit your .bash_profile file to include a line like the following to ensure that the shell can find the programs.
Code Block |
---|
export PATH=.:$HOME/bin:$PATH
|
For example, the following sequence of commands downloads the software, moves it to a directory named "bin" in the home directory, and then makes it executable using the chmod command.
Code Block |
---|
wget http://hgdownload.cse.ucsc.edu/admin/exe/macOSX.i386/twoBitInfo
mv twoBitInfo ~/bin
chmod a+x ~/bin/twoBitInfo
|
Step-by-step guide to adding a new UCSC genome to IGBQuickLoad
The following instructions explain how to
- set up your local copy of the public QuickLoad data repository
- set up your environment to run QuickLoad scripts
- get annotation files and sequence files from UCSC Genome Bioinformatics
- convert files to random access, indexed file formats that enable partial data loading in IGB
- update meta-data files IGB requires to update its interface and allow users to access the newly added genome
- update HEADER.html and other files describing the new genome
- commit your new files and updates to the repo
- submit a Jira ticket requesting that the main site and mirror sites be updated
If you have questions, don't hesitate to ask Ann.
Get the QuickLoad data repo
Check out or update a copy of IGB QuickLoad data directories.
Open a terminal (UNIX shell) and change into the directory where you want your checked-out copy of the genomes repository to reside.
For example, you might do this:
Code Block |
---|
cd # change into your home directory
svn co https://svn.bioviz.org/repos/genomes/quickload/
|
To ensure you will be able to upload changes, request a user id and password from Dr. Loraine. Alternatively, use user name "guest" and password "guest" for read-only access.
Or, if you already have a copy, just update using svn up. Change into your local copy and run:
Code Block |
---|
svn up
|
This will update everything in the current working directory and all the directories beneath it. To avoid conflicts with other people's committed changes, be sure to update your local copy of IGB QuickLoad repo before starting work.
Questions about using subversion? See: Version Control with Subversion.
Configure your Apache web server to serve the IGB QuickLoad data directories via http [Optional]
You'll need this only if you want to test how the new genome directory looks when visited in a Web browser. How you do this will depend on your computer.
For this, you need to install Apache and then configure it to use the checked-out copy of the QuickLoad content as the "DocumentRoot".
Check out or update a copy of IGB QuickLoad source code (src) directory.
As before, open a terminal and change into the directory where you want your checked-out copy of the genomesource code to reside. A good place for checked-out code is a directory named src in your home directory.
To set up a src directory for checked-out code:
Code Block |
---|
cd
mkdir src
cd src
|
Then, use git to get a copy of the QuickLoad source code:
Code Block |
---|
# clone into a directory named quickload_src so that it matches the PATH settings used in this guide
git clone https://bitbucket.org/lorainelab/genomesource quickload_src
|
To ensure that you'll be able to run the code, add the new directory to your PATH and your PYTHONPATH environment variables by editing your .bash_profile startup script:
Code Block |
---|
export PATH=$HOME/src/quickload_src:$PATH
export PYTHONPATH=$HOME/src/quickload_src:$PYTHONPATH
|
Test that it worked by opening a new terminal and typing sample.py at the prompt. If your path is correctly configured, the script will run without error.
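Before moving on, it can help to confirm that every command-line utility listed at the start of this guide can be found by your shell. The following is a minimal sketch, not part of the QuickLoad scripts; the function name missing_tools and the REQUIRED_TOOLS list are illustrative assumptions based on the utilities named above.

```python
import shutil

# Tool names taken from the list of required utilities in this guide.
REQUIRED_TOOLS = ["faToTwoBit", "twoBitInfo", "wget", "sort", "git", "svn"]


def missing_tools(names):
    """Return the tool names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If a tool is reported as missing, re-check the export lines in your .bash_profile and open a new terminal before trying again.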
Get the data from UCSC
Open a Web browser and find the genome you would like to add in the Table Browser at UCSC
Go to http://genome.ucsc.edu/cgi-bin/hgTables
Use the genome version menu to determine the month and year of the genome release you want.
Make note of the genome version synonyms UCSC is using. These usually appear in parentheses next to the month and year of the release. They will need to be included in IGB's curated list of genome version synonyms to ensure compatibility with Galaxy, UCSC DAS, or other external resources.
For example, the assembly menu for zebrafish reads Jul. 2010 (Zv9/danRer7).
The terms in parentheses are genome version synonyms for this assembly. The one on the right (danRer7) is what UCSC calls the "database" for this assembly and is used as the identifier of the genome in the UCSC DAS1 data source. The term on the left (Zv9) is usually another commonly-used term, sometimes assigned by the sequencing consortium that generated the assembly or the original sequence. Sometimes, however, this term is not unique. For example, some genome versions are reported with the term "Broad," which is an organization, not an assembly.
Name the genome for genus, species, strain (optional), release month and year
Choose an IGB genome assembly version identifier to represent the UCSC genome. IGB uses genus, species, strain (optionally), and release month and year to identify genome assembly versions for a species, individual, strain, or cultivar whose genome was sequenced. IGB genome assembly version identifiers look like:
G_species_strain_Mmm_YYYY
where
- G is the first letter (upper-case) of the genus name
- species is the species name (lower-case)
- strain is the cultivar, strain, or individual whose genome was sequenced (this is optional and not usually needed for UCSC-managed genomes)
- Mmm is the three-letter English abbreviation of the month of the release (first letter is upper-case)
- YYYY is the year of the release
Examples:
- P_troglodytes_Oct_2010: genome assembly for chimp released Oct 2010
- Z_mays_B73_Mar_2010: genome assembly for maize plant cultivar B73 released March 2010
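The naming convention above can be sketched in a few lines of Python. This is only an illustration of the G_species_strain_Mmm_YYYY pattern; the function igb_version_id is a hypothetical helper and is not part of the QuickLoad scripts.

```python
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def igb_version_id(genus, species, month, year, strain=None):
    """Build an identifier of the form G_species_strain_Mmm_YYYY.

    month may be a full English month name or its three-letter abbreviation.
    strain is optional and is left out of the identifier when not given.
    """
    mmm = month[:3].capitalize()
    if mmm not in MONTHS:
        raise ValueError("unrecognized month: " + month)
    parts = [genus[0].upper(), species.lower()]
    if strain:
        parts.append(strain)
    parts.extend([mmm, "%04d" % int(year)])
    return "_".join(parts)


if __name__ == "__main__":
    print(igb_version_id("Pan", "troglodytes", "October", 2010))
    print(igb_version_id("Zea", "mays", "Mar", 2010, strain="B73"))
```

Building the identifier in one place helps keep the directory name, the 2bit file name, and the entries in the meta-data files identical.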
Use svn mkdir to create a new genome version directory for the genomes repo
Use svn mkdir to create a new genome directory. The name of the new directory should be identical to the IGB QuickLoad genome version name.
Code Block |
---|
svn mkdir G_species_Mmm_YYYY
|
Warning |
---|
Don't use the UNIX mkdir command to create the directory and then use svn add to add it to the repo later. If you do this and are not careful, you will accidentally add large sequence data files to the repo. |
Change directories into the newly created genome version directory
Code Block |
---|
cd G_species_Mmm_YYYY
|
Download the sequence data
Go to UCSC Genome Bioinformatics and click Downloads > Genome Data. Click the link for your species and then click the link labeled Full data set.
This will take you to a directory where you can download files. Typically, the address of the directory is
http://hgdownload.soe.ucsc.edu/goldenPath/UCSCNAME/bigZips/
For example, UCSC's danRer7 genome is in http://hgdownload.soe.ucsc.edu/goldenPath/danRer7/bigZips/.
Is the 2bit file available?
For most of the more recent genomes, UCSC is using the 2bit format to distribute sequence data. However, some older versions may not make this available.
If yes, download it using wget.
Right-click the link in your browser and select "Copy Link Location." Return to your terminal UNIX shell, type wget, and paste the URL into the shell. For example,
Code Block |
---|
wget http://hgdownload.soe.ucsc.edu/goldenPath/danRer7/bigZips/danRer7.2bit
|
Rename the file to G_species_Mmm_YYYY.2bit.
The prefix (file name part) of the 2bit file for a genome should be the genome version identifier and the suffix (file extension) should be 2bit. For example,
Code Block |
---|
mv danRer7.2bit D_rerio_Jul_2010.2bit
|
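The download and rename steps above follow a simple pattern that can be written down as a sketch. It assumes that the 2bit file for a UCSC database is named after the database and sits in its bigZips directory, as in the danRer7 example; this may not hold for every genome, so check the directory listing first. The function names are hypothetical.

```python
UCSC_DOWNLOAD_ROOT = "http://hgdownload.soe.ucsc.edu/goldenPath"


def twobit_url(ucsc_db):
    """Return the assumed download URL of the 2bit file for a UCSC database name."""
    return "%s/%s/bigZips/%s.2bit" % (UCSC_DOWNLOAD_ROOT, ucsc_db, ucsc_db)


def local_twobit_name(igb_version_id):
    """Return the file name the 2bit file should have in the QuickLoad directory."""
    return igb_version_id + ".2bit"


if __name__ == "__main__":
    print(twobit_url("danRer7"))
    print(local_twobit_name("D_rerio_Jul_2010"))
```

The first value is what you would pass to wget and the second is the target name for mv.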
If not, download the sequence data in fasta format.
For older genomes, UCSC provides sequence data in a so-called "bigZip" file which contains each assembled sequence (typically one per physical chromosome) in a separate fasta file. For example, as of this writing, the dm3 (fruit fly) genome is provided in this format.
Right-click the link in your browser and select "Copy Link Location." Return to your terminal UNIX shell, type wget, and paste the URL into the shell. For example,
Code Block |
---|
wget http://hgdownload.soe.ucsc.edu/goldenPath/dm3/bigZips/chromFa.tar.gz
|
Unpack the file
Use tar with options xvf to uncompress the file while also extracting its contents (if your version of tar does not detect gzip compression automatically, use xzvf instead). An ".fa" file for each chromosome will appear when completed.
Code Block |
---|
tar xvf chromFa.tar.gz
|
Note |
---|
Once you've created the 2bit file for the genome assembly, you'll delete the .fa and the .gz files. |
Create 2bit file using faToTwoBit
Get faToTwoBit installed. In the following example, the MacOS version is downloaded:
Code Block |
---|
wget http://hgdownload.cse.ucsc.edu/admin/exe/macOSX.i386/faToTwoBit
mv faToTwoBit ~/bin
chmod a+x ~/bin/faToTwoBit
|
Convert the fa files to twoBit
faToTwoBit will read one or more fasta files and convert them to a single 2bit file. To understand how to run it, type the name of the program. If you run it without any arguments, it will print a usage message.
Code Block |
---|
faToTwoBit
faToTwoBit - Convert DNA from fasta to 2bit format
usage:
faToTwoBit in.fa [in2.fa in3.fa ...] out.2bit
options:
-noMask - Ignore lower-case masking in fa file.
-stripVersion - Strip off version number after . for genbank accessions.
-ignoreDups - only convert first sequence if there are duplicates
|
The prefix (file name part) of the 2bit file you create for this genome version should be the genome version identifier and the suffix (file extension) should be "2bit."
Code Block |
---|
faToTwoBit *.fa G_species_Mmm_YYYY.2bit
|
Delete .fa files and the chromFa.tar.gz file
Code Block |
---|
rm *.fa
rm chromFa.tar.gz
|
Note |
---|
You can always re-create the .fa files using twoBitToFa, another UCSC tool. |
Note |
---|
The 2bit file is typically smaller than the compressed chromFa.tar.gz file it replaced. Unlike the .tar.gz file, it supports random access, allowing IGB to support partial loading of sequence from the IGBQuickLoad site into the coordinates track. |
Make genome.txt file
Use twoBitInfo to create a genome.txt file for the genome. This file lists sequences and their sizes for the genome.
Code Block |
---|
twoBitInfo G_species_Mmm_YYYY.2bit genome.txt
|
Note |
---|
You can use twoBitInfo to make BED files marking the location of N's in the genome or calculate the amount of non-N sequence in an assembly. |
Sort the genome.txt file by sequence size
The order of sequences listed in the genome.txt is how they will appear in the IGB Current Sequence tab when users visit the genome version in IGB.
Check the genome.txt file. Are the chromosomes listed in a reasonable order?
If you created the 2bit file from fasta files, then they will probably be listed in alphabetical order.
Depending on the state of the genome assembly, i.e., how close to complete it is, it's usually much better to list the chromosomes by size, with larger sequences appearing first in the list.
To ensure that the chromosomes are listed with the largest ones first, sort the file:
Code Block |
---|
sort -k2,2nr genome.txt > tmp
mv tmp genome.txt
|
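To make the effect of the sort step concrete, here is a small Python sketch that performs a comparable reordering. It assumes each genome.txt line holds a sequence name and a size separated by a tab, which is what twoBitInfo writes; the function sort_by_size is a hypothetical helper, not one of the QuickLoad scripts, and sequences of equal size keep their original relative order.

```python
def sort_by_size(lines):
    """Sort 'name<TAB>size' lines so the largest sequences come first."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, size = line.split("\t")
        rows.append((name, int(size)))
    # Sort numerically on the size column, largest first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return ["%s\t%d" % (name, size) for name, size in rows]


if __name__ == "__main__":
    example = ["chrM\t16571\n", "chr1\t249250621\n", "chr2\t243199373\n"]
    for row in sort_by_size(example):
        print(row)
```

Comparing the first few lines of the sorted file with the Current Sequence tab in IGB is a quick way to confirm the order users will see.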
Add genome.txt to the repo
Add the genome.txt file to the repo
Code Block |
---|
svn add genome.txt
|
Edit contents.txt
Add the new genome to the contents.txt in the top level of your checked-out quickload directory. The contents.txt file is a tab-separated file with two columns:
- Column 1 - genome directory
- Column 2 - genome title
The genome title is what IGB will display in the title bar when users visit the new genome. To create the title, follow the same conventions as for the other UCSC genomes. The title begins with genus and species, followed by the date of the release (in parentheses), followed by the common name for the species, followed by the UCSC genome name (in parentheses). For example,
Code Block |
---|
Cavia porcellus (Feb 2008) guinea pig (cavPor3)
|
Warning |
---|
Note that this file is tab-separated, so when you edit the file, be sure to use a tab character to separate the genome version and genome title fields. |
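Because a space typed in place of the tab is hard to spot by eye, a quick check of contents.txt can be useful. The sketch below is an illustration only; contents_problems is a hypothetical helper, and the directory name C_porcellus_Feb_2008 in the usage example is an assumption derived from the naming convention described earlier.

```python
def contents_problems(text):
    """Return (line_number, message) pairs for malformed contents.txt lines."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            message = "expected 2 tab-separated fields, found %d" % len(fields)
            problems.append((number, message))
        elif not fields[0].strip() or not fields[1].strip():
            problems.append((number, "empty genome directory or title"))
    return problems


if __name__ == "__main__":
    good = "C_porcellus_Feb_2008\tCavia porcellus (Feb 2008) guinea pig (cavPor3)\n"
    bad = "C_porcellus_Feb_2008 Cavia porcellus (Feb 2008) guinea pig (cavPor3)\n"
    print(contents_problems(good))
    print(contents_problems(bad))
```

An empty list means every non-blank line has exactly one tab separating a directory name from a title.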
Edit files to make the Web site look nicer.
The main job of a QuickLoad site is to enable users to load data into IGB. However, a QuickLoad site is also a Web site, and so it's important to provide some additional files for users who visit the site in their Web browser. When you add a new set of files to the main IGBQuickLoad site, you need to also:
- provide appropriate HEADER.md files for the genome version directory
- edit the .htaccess file in the QuickLoad root directory (one level above the genome version directories)
Add a new HEADER.md file to the genome directory.
For this, use the script named writeQuickLoadHeaderUCSC.py. To run it, make sure you are in the top level of the QuickLoad directory and enter:
Code Block |
---|
writeQuickLoadHeaderUCSC.py G_species_Mmm_YYYY > G_species_Mmm_YYYY/HEADER.md
|
The script will read the contents.txt file, look for the human-readable title of the genome (column 2) for the directory G_species_Mmm_YYYY, and then write text for HEADER.md to stdout.
Edit the .htaccess file
When users open a location in the IGB QuickLoad site using their Web browser, the browser displays a list of files and some descriptive text next to each file. This happens because the IGB QuickLoad site has a file named .htaccess that causes Apache to display this information. So when you add a new file type or a new genome to the QuickLoad site, you also need to add a new description to the .htaccess file.
Open the file in a text editor and add one new line for the genome, using the same text you added to the contents.txt file, but with the order of the columns reversed. Use the same text to ensure consistency between what IGB shows in the window title bar, what the HEADER.html title displays, and what appears in the directory description when users view the Web site in their Web browser. For example:
Code Block |
---|
AddDescription "Gallus gallus (Nov 2011) chicken (galGal4/ICGC Gallus-gallus-4.0)" G_gallus_Nov_2011
|
Note: Add the new text to the end of the file.
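Since the .htaccess entry reuses the contents.txt text with the columns reversed, it can be generated rather than retyped. The following sketch assumes a contents.txt row of the form directory, tab, title, and a title that contains no double quotes; add_description_line is a hypothetical helper.

```python
def add_description_line(contents_row):
    """Turn one contents.txt row into an Apache AddDescription line."""
    directory, title = contents_row.rstrip("\n").split("\t")
    if '"' in title:
        raise ValueError("title must not contain double quotes")
    # The title comes first (quoted), followed by the genome directory.
    return 'AddDescription "%s" %s' % (title, directory)


if __name__ == "__main__":
    row = "G_gallus_Nov_2011\tGallus gallus (Nov 2011) chicken (galGal4/ICGC Gallus-gallus-4.0)"
    print(add_description_line(row))
```

Generating the line from contents.txt keeps the title in IGB and the directory description on the Web site identical.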
Test the new genome
Testing is absolutely critical, as it is easy to make an error along the way. Plan to spend at least as much time testing as you spent on building the site!
Test under the released version of IGB
Download the latest release of IGB from http://www.bioviz.org.
Configure data sources under Data Sources tab
Click the configure link.
- Add your local copy of the IGB QuickLoad site to your list of data sources
- Remove all data sources EXCEPT the UCSC DAS1 server and your local QL directory
Check that the new genome version appears
In the species menu, you should see the organism's genus, species, and strain listed. When you hover the mouse over its menu item, you should also see a tooltip reporting the species' colloquial name (in English).
In the genome version menu, you should see the genome version name listed.
If not, edit species.txt
If you don't see the new genome version, this only means that IGB doesn't recognize it yet. To ensure IGB recognizes the genome, add it to the file species.txt that resides at the top level of the IGBQuickLoad directory, the same level as contents.txt.
species.txt is a tab-separated file in which each line represents synonyms for a genome. The first column should list the full Linnean name for the species; this is what will appear in the species menu. It can include spaces. The second column should list the common (colloquial) name for the species, in English. The next column should contain the IGB genome version minus the date and also minus any numbers at the end of the name. The next column should contain the UCSC genome version, minus the number. The final column should contain the genus and species name joined by an underscore.
For example:
Code Block |
---|
Pan troglodytes	Chimp	P_troglodytes	panTro	Pan_troglodytes
|
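The column rules for species.txt can be sketched as follows. This is one interpretation of the description above, not the logic IGB itself uses: it strips a trailing _Mmm_YYYY from the IGB identifier and trailing digits from the UCSC database name. The function species_row is hypothetical, and the database name panTro3 in the usage example is only illustrative.

```python
import re


def species_row(genus, species, common_name, igb_version_id, ucsc_db):
    """Build one tab-separated species.txt row from the pieces described above."""
    # IGB identifier without the release date, e.g. P_troglodytes_Oct_2010 -> P_troglodytes
    igb_prefix = re.sub(r"_[A-Z][a-z]{2}_\d{4}$", "", igb_version_id)
    # UCSC database name without its version number, e.g. panTro3 -> panTro
    ucsc_prefix = re.sub(r"\d+$", "", ucsc_db)
    columns = [
        "%s %s" % (genus, species),   # full Linnean name, may contain spaces
        common_name,                  # colloquial English name
        igb_prefix,
        ucsc_prefix,
        "%s_%s" % (genus, species),   # genus and species joined by an underscore
    ]
    return "\t".join(columns)


if __name__ == "__main__":
    print(species_row("Pan", "troglodytes", "Chimp", "P_troglodytes_Oct_2010", "panTro3"))
```

Compare the generated row with existing lines in species.txt before adding it, since a species that already has an entry does not need a second one.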
Restart IGB and re-test
When you're done, restart IGB and check that the species and version are displayed correctly.
Use the Species and Genome menus (under Current Sequence tab) to select your new genome
Choose your genome version. The chromosomes from your genome.txt file should then appear in the Current Sequence tab. You should also see the corresponding UCSC DAS1 data source appear in the Data Sets/Data Sources tree on the left. If this step fails, there may be a problem with your genome.txt file.
Test that you can load sequence data.
Check that IGB can load sequence data from your local 2bit file. Visit a chromosome, zoom in, and click Load Sequence. Zoom in to check that the sequence is visible. Note that depending on which sequence file you used, some letters will be lower-case. This is how UCSC masks repetitive or low-complexity sequence.
Repeat the above tests using the latest (trunk) version of IGB
To get a copy of the trunk, check it out using svn from SourceForge or download it from the test deployment site for IGB (http://test.bioviz.org/igb/). On the test deployment site, follow the Download links to get the trunk.
Check the new site looks OK in a Web browser
Open the site (http://localhost if it's on your local computer) in a Web browser and check that
- there are no typos or errors in the descriptive text next to the new genome directory
- the genome directory has a header and all the files are listed
- each file has a description
- all the links in the HEADER file still work (report any broken links as a Jira issue and notify Ann)
If all tests pass on both versions, use svn to check in your changes.
These include:
- edits to contents.txt
- addition of the new genome directory
- addition of the genome.txt file for the genome
- edits to species.txt (possibly)
When you're finished, submit a ticket to IGB Jira to request testing of the newly added genome on a test site. If you downloaded the 2bit file from UCSC, include a link to the file. If you created it from a fasta file, include a link to the fasta file instead.
Add annotations
Now, the basic structure - sequence and meta-data about contigs and chromosomes - is available. Now you need to add the annotations. For this, see:
Updating RefGene UCSC data set for an existing genome in IGB QuickLoad