Introduction

UCSC genome bioinformatics supports mammalian, insect, fish, avian, and some fungal genomes.

IGBQuickLoad contains genome directories with sequence and annotations data for some (not all) genome assemblies supported at UCSC.

The following describes how to add a new genome version to the IGB QuickLoad repository and update IGB species.txt.

Command-line utilities you'll need

faToTwoBit from UCSC (needed if there is no 2Bit file available)
twoBitInfo from UCSC (needed to generate the genome.txt file)
UNIX wget (not installed by default on Mac but available on most other UNIX systems)
UNIX sort (should be pre-installed on any UNIX system, including Mac)
IGBQuickLoad scripts in genomes/pub/src subversion repo.

Compiled UCSC software tools are available from http://hgdownload.cse.ucsc.edu/admin/exe/.

Put the compiled programs in a directory in your PATH. Make sure they are executable on your system. If you're doing this on a Mac desktop or laptop computer, create a directory called "bin" in your home directory and save all compiled binaries there. Edit your .bash_profile file to include a line like the following to ensure that the shell can find the programs.

Code Block

export PATH=.:$HOME/bin:$PATH

For example, the following sequence of commands downloads the software, moves it to a directory named "bin" in the home directory, and then makes it executable using the chmod command.

Code Block

wget http://hgdownload.cse.ucsc.edu/admin/exe/macOSX.i386/twoBitInfo
mv twoBitInfo ~/bin
chmod a+x ~/bin/twoBitInfo
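
Before going further, it can help to confirm that the shell will actually find the utilities listed above. The following is a minimal sketch in Python and is not part of the IGBQuickLoad scripts; the function name missing_tools and the REQUIRED_TOOLS list are made up for illustration and simply repeat the tools this guide mentions.

```python
import shutil

# Tools named in this guide; adjust the list to match what you need.
REQUIRED_TOOLS = ["faToTwoBit", "twoBitInfo", "wget", "sort", "svn"]


def missing_tools(tools):
    """Return the tools that cannot be found as executables on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```

If a tool is reported as missing, re-check the PATH line in your .bash_profile and the permissions set with chmod.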

Step-by-step guide to adding a new UCSC genome to IGBQuickLoad

The following instructions explain how to

set up your local copy of the public QuickLoad data repository
set up your environment to run QuickLoad scripts
get annotation files and sequence files from UCSC Genome Bioinformatics
convert files to random access, indexed file formats that enable partial data loading in IGB
update meta-data files IGB requires to update its interface and allow users to access the newly added genome
update HEADER.html and other files describing the new genome
commit your new files and updates to the repo
submit a Jira ticket requesting that the main site and mirror sites be updated

If you have questions, don't hesitate to ask Ann.

Get the QuickLoad data repo

Check out or update a copy of IGB QuickLoad data directories.

Open a terminal (UNIX shell) and change into the directory where you want your checked-out copy of the genomes repository to reside.

For example, you might do this:

Code Block

cd # change into your home directory
svn co https://svn.transvar.org/repos/genomes/trunk/pub/quickload

Or, if you already have a copy, just update using svn up.

Code Block

svn up

This will update everything in the current working directory and all the directories beneath it. To avoid conflicts with other people's committed changes, be sure to update your local copy of the IGB QuickLoad repo before starting work.

Questions about using subversion? See: ...

Configure your Apache web server to serve the IGB QuickLoad data directories via http

You'll need this to test that the new genome directory looks OK when visited in a Web browser. How you do this will depend on your computer. The following instructions explain how to do this on a Mac.

Use locate to find your local copy of httpd.conf, the Apache configuration file. Probably it's located at /private/etc/apache2/httpd.conf, depending on your system.

Open a terminal window and change into the same directory as the configuration file. Make a backup copy of the file:

Code Block

cp httpd.conf httpd.conf.bak

Use sudo to open the file in a text editor like pico or emacs and enter your password. *Note* this only works if you have admin privileges on your computer. If you can't edit this file, then you'll need to get help before proceeding.

Code Block

sudo pico httpd.conf

Find the place in the file that says DocumentRoot. Comment the current DocumentRoot and substitute the full path to your checked-out copy of the QuickLoad repository:

Code Block

#DocumentRoot "/Library/WebServer/Documents"
DocumentRoot "/Users/username/quickload"

and

Code Block

#<Directory "/Library/WebServer/Documents">
<Directory "/Users/username/quickload">

where quickload is your copy of the checked-out repo.
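
The two manual edits to httpd.conf can also be expressed as a small text transformation, which makes it easier to see exactly what changes. This is only a sketch under the assumption that the old paths appear in the file exactly as quoted in this guide; the function name retarget_document_root is hypothetical, and the code works on a string, so it never touches the real configuration file.

```python
def retarget_document_root(conf_text, old_root, new_root):
    """Comment out lines that name old_root and add copies pointing at new_root."""
    targets = (
        'DocumentRoot "%s"' % old_root,
        '<Directory "%s">' % old_root,
    )
    result = []
    for line in conf_text.splitlines():
        if line.strip() in targets:
            result.append("#" + line)                        # keep the old setting, commented out
            result.append(line.replace(old_root, new_root))  # add the new setting
        else:
            result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    sample = (
        'DocumentRoot "/Library/WebServer/Documents"\n'
        '<Directory "/Library/WebServer/Documents">\n'
        "</Directory>\n"
    )
    print(retarget_document_root(sample,
                                 "/Library/WebServer/Documents",
                                 "/Users/username/quickload"))
```

Lines that are already commented out are left alone, so running the transformation a second time on its own output still changes the active lines only.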

Restart Apache. To restart Apache on a Mac, open Apple > System Preferences > Sharing and select Web Sharing. If it is already selected, that means Apache is already running. Unselect it to stop Apache and then select it again to restart Apache.

Open a Web browser and enter the URL http://localhost. (You may need to refresh your browser.) You should now see something that looks exactly like the public IGB QuickLoad site.

Note

Now you can configure IGB to access your local copy of IGB QuickLoad using either the URL http://localhost or the file chooser, because IGB supports QuickLoad access via the Web (http) or from local files.

Check out or update a copy of IGB QuickLoad source code (src) directory.

As before, open a terminal and change into the directory where you want your checked-out copy of the genomes src code to reside. A good place for checked-out code is a directory named src in your home directory. To set up a src directory for checked-out code:

Code Block

cd
mkdir src
cd src

Then, use svn to get a copy of the QuickLoad source code from the repo and save it to a directory named quickload_src:

Code Block

svn co https://svn.transvar.org/repos/genomes/trunk/pub/src quickload_src

To ensure that you'll be able to run the code, add the new directory to your PATH and your PYTHONPATH environment variables by editing your .bash_profile startup script:

Code Block

export PATH=$HOME/src/quickload_src:$PATH
export PYTHONPATH=$HOME/src/quickload_src:$PYTHONPATH

Test that it worked by opening a new terminal and typing sample.py at the prompt. If your path is correctly configured, the script will run without error.

Get the data from UCSC

Open a Web browser and find the genome you would like to add in the Table Browser at UCSC

Go to http://genome.ucsc.edu/cgi-bin/hgTables

Use the genome version menu to determine the month and year of the genome release you want.

Make note of the genome version synonyms UCSC is using. These are usually in parentheses next to the month and year of the release. They will need to be included in IGB's curated list of genome version synonyms to ensure compatibility with Galaxy, UCSC DAS, or other external resources.

For example, the assembly menu for zebrafish reads Jul. 2010 (Zv9/danRer7). The terms in parentheses are genome version synonyms for this assembly. The one on the right (danRer7) is what UCSC calls the "database" for this assembly and is used as the identifier of the genome in the UCSC DAS1 data source. The term on the left (Zv9) is usually another commonly-used term, sometimes assigned by the sequencing consortium that generated the assembly or the original sequence. Sometimes, however, this term is not unique. For example, some genome versions are reported with the term "Broad," which is an organization, not an assembly.
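
To make the structure of an assembly menu entry concrete, here is a small Python sketch that splits a label into its release month, release year, and the synonyms in parentheses. It assumes every label follows the zebrafish pattern shown above (month, optional period, year, then slash-separated terms in parentheses); the function name parse_assembly_label is made up for illustration, and labels in other layouts are rejected rather than guessed at.

```python
import re

# Month abbreviation, optional period, four-digit year, then terms in parentheses.
LABEL_PATTERN = re.compile(r"^\s*([A-Z][a-z]{2})\.?\s+(\d{4})\s+\(([^)]*)\)\s*$")


def parse_assembly_label(label):
    """Split a label like 'Jul. 2010 (Zv9/danRer7)' into (month, year, synonyms)."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError("unrecognized assembly label: %r" % label)
    month, year, inside = match.groups()
    synonyms = [term.strip() for term in inside.split("/") if term.strip()]
    return month, int(year), synonyms


if __name__ == "__main__":
    print(parse_assembly_label("Jul. 2010 (Zv9/danRer7)"))
```

For the zebrafish label, the last synonym in the returned list is the UCSC database name and the first is the consortium's term.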

Name the genome for genus, species, strain (optional), release month and year

Choose an IGB genome assembly version identifier to represent the UCSC genome. IGB uses genus, species, strain (optionally), and release month and year to identify genome assembly versions for a species, individual, strain, or cultivar whose genome was sequenced. IGB genome assembly version identifiers look like:

G_species_strain_Mmm_YYYY

where

G is the first letter (upper-case) of the genus name
species is the species name (lower-case)
strain is the cultivar, strain, or individual whose genome was sequenced (this is optional and not usually needed for UCSC-managed genomes)
Mmm is the three-letter English abbreviation of the month of the release (first letter is upper-case)
YYYY is the year of the release

Examples:

P_troglodytes_Oct_2010 - genome assembly for chimp released Oct 2010
Z_mays_B73_Mar_2010 - genome assembly for maize plant cultivar B73 released March 2010
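
The naming convention can be written down as a short function, which is a convenient way to avoid typos in directory and file names later on. This is a sketch in Python, not one of the QuickLoad scripts; the function name igb_genome_version is hypothetical, and the list of month abbreviations (for example "Sep" rather than "Sept") is an assumption.

```python
import re

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def igb_genome_version(genus, species, month, year, strain=None):
    """Build an identifier such as P_troglodytes_Oct_2010 or Z_mays_B73_Mar_2010."""
    if month not in MONTHS:
        raise ValueError("month must be a three-letter abbreviation like 'Oct'")
    if not re.fullmatch(r"\d{4}", str(year)):
        raise ValueError("year must have four digits")
    parts = [genus[0].upper(), species.lower()]
    if strain:
        parts.append(strain)  # optional cultivar, strain, or individual
    parts.extend([month, str(year)])
    return "_".join(parts)


if __name__ == "__main__":
    print(igb_genome_version("Pan", "troglodytes", "Oct", 2010))
    print(igb_genome_version("Zea", "mays", "Mar", 2010, strain="B73"))
```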

Use svn mkdir to create a new genome version directory for the genomes repo

Use svn mkdir to create a new genome directory. The name of the new directory should be identical to the IGB QuickLoad genome version name.

Code Block

svn mkdir G_species_Mmm_YYYY

Warning

Don't use the UNIX mkdir command to create the directory and then use svn add to add it to the repo later. If you do this and are not careful, you will accidentally add large sequence data files to the repo.

Change directories into the newly created genome version directory

Code Block

cd G_species_Mmm_YYYY

Download the sequence data

Go to UCSC Genome Bioinformatics and click Downloads > Genome Data. Click the link for your species and then click the link labeled Full data set.

This will take you to a directory where you can download files. Typically, the address of the directory is

http://hgdownload.soe.ucsc.edu/goldenPath/UCSCNAME/bigZips/

For example, UCSC's danRer7 genome is in http://hgdownload.soe.ucsc.edu/goldenPath/danRer7/bigZips/.
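
Because the download directory follows a fixed pattern, its address can be derived from the UCSC database name. The sketch below shows this in Python; the function names are hypothetical, and twobit_url additionally assumes that the 2bit file is named after the database (as with danRer7.2bit), which may not hold for every genome.

```python
BIGZIPS_TEMPLATE = "http://hgdownload.soe.ucsc.edu/goldenPath/{db}/bigZips/"


def bigzips_url(ucsc_db):
    """Return the bigZips directory URL for a UCSC database name such as danRer7."""
    return BIGZIPS_TEMPLATE.format(db=ucsc_db)


def twobit_url(ucsc_db):
    """Return the expected 2bit URL, assuming the file is named <database>.2bit."""
    return bigzips_url(ucsc_db) + ucsc_db + ".2bit"


if __name__ == "__main__":
    print(bigzips_url("danRer7"))
    print(twobit_url("danRer7"))
```

Always confirm in a Web browser that the directory and file exist before relying on a generated address.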

Is the 2bit file available?

For most of the more recent genomes, UCSC is using the 2bit format to distribute sequence data. However, some older versions may not make this available.

If yes, download it using wget.

Right-click the link in your browser and select "Copy Link Location." Return to your terminal UNIX shell, type wget, and paste the URL into the shell. For example,

Code Block

wget http://hgdownload.soe.ucsc.edu/goldenPath/danRer7/bigZips/danRer7.2bit

Rename the file to G_species_Mmm_YYYY.2bit.

The prefix (file name part) of the 2bit file for a genome should be the genome version identifier and the suffix (file extension) should be 2bit. For example,

Code Block

mv danRer7.2bit D_rerio_Jul_2010.2bit

If not, download the sequence data in fasta format.

For older genomes, UCSC provides sequence data in a so-called "bigZip" file which contains each assembled sequence (typically one per physical chromosome) in a separate fasta file. For example, as of this writing, the dm3 (fruit fly) genome is provided in this format.

Right-click the link in your browser and select "Copy Link Location." Return to your terminal UNIX shell, type wget, and paste the URL into the shell. For example,

Code Block

wget http://hgdownload.soe.ucsc.edu/goldenPath/dm3/bigZips/chromFa.tar.gz

Unpack the file

Use tar with options xvf to uncompress the file while also extracting its contents. A ".fa" file for each chromosome will appear when completed.

Code Block

tar xvf chromFa.tar.gz

Note

Once you've created the 2bit file for the genome assembly, you'll delete the .fa and the .gz files.

Create 2bit file using faToTwoBit

Get faToTwoBit installed. In the following example, the MacOS version is downloaded:

Code Block

wget http://hgdownload.cse.ucsc.edu/admin/exe/macOSX.i386/faToTwoBit
mv faToTwoBit ~/bin
chmod a+x ~/bin/faToTwoBit

Convert the fa files to twoBit

faToTwoBit will read one or more fasta files and convert them to a single 2bit file. To understand how to run it, type the name of the program. If you run it without any arguments, it will print a usage message.

Code Block

faToTwoBit
faToTwoBit - Convert DNA from fasta to 2bit format
usage:
   faToTwoBit in.fa [in2.fa in3.fa ...] out.2bit
options:
   -noMask       - Ignore lower-case masking in fa file.
   -stripVersion - Strip off version number after . for genbank accessions.
   -ignoreDups   - only convert first sequence if there are duplicates

The prefix (file name part) of the 2bit file you create for this genome version should be the genome version identifier and the suffix (file extension) should be "2bit."

Code Block

faToTwoBit *.fa G_species_Mmm_YYYY.2bit
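
If you prefer to drive the conversion from a script, the command line can be assembled from the usage message shown above (input fasta files first, output 2bit file last). The following Python sketch does that; the function names are hypothetical, and convert_directory assumes faToTwoBit is on your PATH and that the fasta files sit in the current directory.

```python
import glob
import subprocess


def fa_to_twobit_command(fasta_files, genome_version):
    """Build the argument list: faToTwoBit in.fa [in2.fa ...] out.2bit."""
    if not fasta_files:
        raise ValueError("no fasta files given")
    return ["faToTwoBit"] + list(fasta_files) + [genome_version + ".2bit"]


def convert_directory(genome_version, pattern="*.fa"):
    """Convert every file matching pattern in the current directory to one 2bit file."""
    command = fa_to_twobit_command(sorted(glob.glob(pattern)), genome_version)
    subprocess.run(command, check=True)  # raises CalledProcessError if faToTwoBit fails
```

Building the argument list separately from running it makes it easy to print and inspect the command before any files are written.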

Delete .fa files and the chromFa.tar.gz file

Code Block

rm *.fa
rm chromFa.tar.gz

Note

You can always re-create the .fa files using twoBitToFa, another UCSC tool.

Note

The 2bit file is typically smaller than the compressed chromFa.tar.gz file it replaced. Unlike the .tar.gz file, it supports random access, allowing IGB to support partial loading of sequence from the IGBQuickLoad site into the coordinates track.

Make genome.txt file

Use twoBitInfo to create a genome.txt file for the genome. This file lists sequences and their sizes for the genome.

Code Block

twoBitInfo G_species_Mmm_YYYY.2bit genome.txt
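
A quick sanity check of the resulting file can catch problems before IGB ever reads it. The Python sketch below parses genome.txt content and rejects malformed lines; it assumes that each line holds a sequence name and a size separated by a single tab, and the function names are made up for illustration.

```python
def parse_genome_txt(text):
    """Return a list of (sequence_name, size) tuples from genome.txt content."""
    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2 or not fields[1].strip().isdigit():
            raise ValueError("line %d is not 'name<TAB>size': %r" % (number, line))
        records.append((fields[0], int(fields[1])))
    return records


def total_size(records):
    """Sum the sizes of all sequences."""
    return sum(size for _name, size in records)


if __name__ == "__main__":
    with open("genome.txt") as handle:
        records = parse_genome_txt(handle.read())
    print("%d sequences, %d bases in total" % (len(records), total_size(records)))
```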

Note
You can use twoBitInfo to make BED files marking the location of N's in the genome or calculate the amount of non-N sequence in an assembly.

Sort the genome.txt file by sequence size

The order of sequences listed in the genome.txt is how they will appear in the IGB Current Sequence tab when users visit the genome version in IGB.

Check the genome.txt file. Are the chromosomes listed in a reasonable order?

If you created the 2bit file from fasta files, then probably they will be listed in alphabetical order.

Depending on the state of the genome assembly, i.e., how close to complete it is, it's usually much better to list the chromosomes by size, with larger sequences appearing first in the list.

To ensure that the chromosomes are listed with largest ones first, sort the file:

Code Block

sort -k2,2nr genome.txt > tmp
mv tmp genome.txt
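
The same ordering can be produced in Python, which may be handy if the sort is part of a larger script. This is a sketch rather than a replacement for the command above: the function name sort_by_size is hypothetical, it assumes tab-separated name and size columns, and sequences of equal size keep their original order here, which may differ from how the sort utility breaks ties.

```python
def sort_by_size(text):
    """Return genome.txt content with the largest sequences first."""
    lines = [line for line in text.splitlines() if line.strip()]
    # sorted() is stable, so sequences of equal size keep their original order.
    lines = sorted(lines, key=lambda line: int(line.split("\t")[1]), reverse=True)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("genome.txt") as handle:
        ordered = sort_by_size(handle.read())
    with open("genome.txt", "w") as handle:
        handle.write(ordered)
```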

Add genome.txt to the repo

Add the genome.txt file to the repo

Code Block

svn add genome.txt

Edit contents.txt

Add the new genome to the contents.txt in the top level of your checked-out quickload directory. The contents.txt file is a tab-separated file with two columns:

Column 1 - genome directory
Column 2 - genome title

The genome title is what IGB will display in the title bar when users visit the new genome. To create the title, follow the same conventions as for the other UCSC genomes. The title begins with genus and species, followed by the date of the release (in parentheses), followed by the common name for the species, followed by the UCSC genome name (in parentheses). For example,

Code Block

Cavia porcellus (Feb 2008) guinea pig (cavPor3)

Warning

Note that this file is tab-separated, so when you edit the file, be sure to use a tab character to separate the genome version and genome title fields.
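
Since a stray space in place of the tab is easy to miss in a text editor, one option is to generate the contents.txt row programmatically. The Python sketch below composes a title in the convention described for contents.txt and joins it to the directory name with a tab; both function names are hypothetical, and the directory name C_porcellus_Feb_2008 in the usage example is an illustrative value derived from the naming convention, not taken from the repo.

```python
def genome_title(genus_species, release, common_name, ucsc_name):
    """Compose a title such as 'Cavia porcellus (Feb 2008) guinea pig (cavPor3)'."""
    return "%s (%s) %s (%s)" % (genus_species, release, common_name, ucsc_name)


def contents_line(genome_directory, title):
    """Return one contents.txt row: directory, a tab character, then the title."""
    if "\t" in genome_directory or "\t" in title:
        raise ValueError("fields must not contain tab characters")
    return genome_directory + "\t" + title + "\n"


if __name__ == "__main__":
    title = genome_title("Cavia porcellus", "Feb 2008", "guinea pig", "cavPor3")
    # repr() makes the tab visible as \t so it can be checked by eye.
    print(repr(contents_line("C_porcellus_Feb_2008", title)))
```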

Edit files to make the Web site look nicer.

The main job of a QuickLoad site is to enable users to load data into IGB. However, a QuickLoad site is also a Web site, and so it's important to provide some additional files for users who visit the site in their Web browser. When you add a new set of files to the main IGBQuickLoad site, you need to also:

provide appropriate HEADER.html files for the genome version directory
edit the .htaccess file in the QuickLoad root directory (one level above the genome version directories)

Add a new HEADER.html file to the genome directory.

For this, use the script named writeQuickLoadHeaderUCSC.py. To run it, make sure you are in the top level of the QuickLoad directory and enter:

Code Block

writeQuickLoadHeaderUCSC.py G_species_Mmm_YYYY > G_species_Mmm_YYYY/HEADER.html

The script will read the contents.txt file, look for the human-readable title of the genome (column 2) for the directory G_species_Mmm_YYYY, and then print text for HEADER.html to stdout.

Edit the .htaccess file

When users open a location in the IGB QuickLoad site using their Web browser, the browser displays a list of files and some descriptive text next to each file. This happens because the IGB QuickLoad site has a file named .htaccess that causes Apache to display this information. So when you add a new file type or a new genome to the QuickLoad site, you also need to add a new Description to the .htaccess file.

Open the file in a text editor and add one new line for the genome, the same text you added to the contents.txt file, but with the order of the columns reversed. Use the same text to ensure consistency between what IGB shows in the window title bar, what the HEADER.html title displays, and what appears in the directory description when users view the Web site in their Web browser. For example:

Code Block

AddDescription "Gallus gallus (Nov 2011) chicken (galGal4/ICGC Gallus-gallus-4.0)" G_gallus_Nov_2011

Note: Add the new text to the end of the file.
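
Because the .htaccess entry is just the contents.txt row with its columns swapped and the title quoted, it can be derived mechanically, which keeps the two files consistent. This Python sketch shows the conversion; the function name add_description is hypothetical, and it assumes the title itself contains no double quotes.

```python
def add_description(contents_row):
    """Turn 'directory<TAB>title' into 'AddDescription "title" directory'."""
    directory, title = contents_row.rstrip("\n").split("\t")
    if '"' in title:
        raise ValueError("title must not contain double quotes")
    return 'AddDescription "%s" %s' % (title, directory)


if __name__ == "__main__":
    row = "G_gallus_Nov_2011\tGallus gallus (Nov 2011) chicken (galGal4/ICGC Gallus-gallus-4.0)\n"
    print(add_description(row))
```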

Test the new genome

Testing is absolutely critical as it is easy to make an error along the way. Plan to spend at least as much time testing as you spent on building the site!

Test under the released version of IGB

Download the latest release of IGB from http://www.bioviz.org.

Configure data sources under Data Sources tab

Click the configure link.
Add your local copy of the IGB QuickLoad site to your list of data sources
Remove all data sources EXCEPT the UCSC DAS1 server and your local QL directory

Check that the new genome version appears

In the species menu, you should see the organism's genus, species, and strain listed. When you hover the mouse over its menu item, you should also see a tooltip reporting the species' colloquial name (in English).

In the genome version menu, you should see the genome version name listed.

If not, edit species.txt

If you don't, this only means that IGB doesn't recognize it. To ensure IGB recognizes the genome, add it to the file species.txt that resides at the top level of the IGBQuickLoad directory, the same level as contents.txt.

species.txt is a tab-separated file in which each line represents synonyms for a genome. The first column should list the full Linnean name for the species; this is what will appear in the species menu. It can include spaces. The second column should list the common (colloquial) name for the species, in English. The next column should contain the IGB genome version minus the date and also minus any numbers at the end of the name. The next column should contain the UCSC genome version, minus the number. The final column should contain the genus and species name joined by an underscore.

For example:

Code Block

Pan troglodytes	Chimp	P_troglodytes	panTro	Pan_troglodytes
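
The column rules for species.txt can be sketched as a small Python function that derives the row from the names you already chose. The function name species_row is hypothetical, the UCSC name panTro3 in the usage example is an illustrative input, and the treatment of identifiers that include a strain is not specified in this guide, so check such rows by hand.

```python
import re


def species_row(linnean_name, common_name, igb_version, ucsc_version):
    """Return the five tab-separated species.txt columns for one genome."""
    # Drop the trailing _Mmm_YYYY date, then any digits left at the end.
    igb_prefix = re.sub(r"_[A-Z][a-z]{2}_\d{4}$", "", igb_version)
    igb_prefix = re.sub(r"\d+$", "", igb_prefix)
    # Drop the version number from the UCSC name.
    ucsc_prefix = re.sub(r"\d+$", "", ucsc_version)
    # Join genus and species with an underscore.
    underscored = "_".join(linnean_name.split())
    return "\t".join([linnean_name, common_name, igb_prefix, ucsc_prefix, underscored])


if __name__ == "__main__":
    # repr() shows the tab characters as \t.
    print(repr(species_row("Pan troglodytes", "Chimp", "P_troglodytes_Oct_2010", "panTro3")))
```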

Restart IGB and re-test

When you're done, restart IGB and check that the species and version are displayed correctly.

Use the Species and Genome menus (under Current Sequence tab) to select your new genome

Choose your genome version. The chromosomes from your genome.txt file should then appear in the Current Sequence tab. You should also see the corresponding UCSC DAS1 data source appear in the Data Sets/Data Sources tree on the left. If this step fails, there may be a problem with your genome.txt file.

Test that you can load sequence data.

Check that IGB can load sequence data from your local 2bit file. Visit a chromosome, zoom in, and click Load Sequence. Zoom in to check that the sequence is visible. Note that depending on which sequence file you used, some letters will be lower-case. This is how UCSC masks repetitive or low-complexity sequence.

Repeat the above tests using the latest (trunk) version of IGB

To get a copy of the trunk, check it out using svn from SourceForge or download it from the test deployment site for IGB (http://test.bioviz.org/igb/). To get the trunk, follow the Download links.

Check the new site looks OK in a Web browser

Open the site (http://localhost if it's on your local computer) in a Web browser and check that

there are no typos or errors in the descriptive text next to the new genome directory
the genome directory has a header and all the files are listed
each file has a description
all the links in the HEADER file still work (report any broken links as a Jira issue and notify Ann)

If all tests pass on both versions, use svn to check in your changes.

These include:

edits to contents.txt
addition of the new genome directory
addition of the genome.txt file for the genome
edits to species.txt (possibly)

When you're finished, submit a ticket to IGB Jira to request testing of the newly added genome on a test site. If you downloaded the 2bit file from UCSC, include a link to the file. If you created it from a fasta file, include a link to the fasta file instead.
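
Before checking in, it can be useful to confirm that your local copy contains every file this guide creates or edits. The Python sketch below lists expected files that are absent; the function name missing_genome_files is hypothetical, the file list is an assumption based on the steps in this guide, and the check looks only at your local directory, not at what svn will commit.

```python
import os


def missing_genome_files(quickload_root, genome_version):
    """List expected files that are absent from a local QuickLoad checkout."""
    expected = [
        "contents.txt",
        "species.txt",
        os.path.join(genome_version, "genome.txt"),
        os.path.join(genome_version, "HEADER.html"),
        os.path.join(genome_version, genome_version + ".2bit"),
    ]
    return [path for path in expected
            if not os.path.isfile(os.path.join(quickload_root, path))]


if __name__ == "__main__":
    for path in missing_genome_files(".", "G_species_Mmm_YYYY"):
        print("missing: " + path)
```

Run it from the top level of your checked-out quickload directory, replacing G_species_Mmm_YYYY with your genome version name.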
