Step One: Create a QuickLoad root directory.

Create a QuickLoad root directory (folder) on your local computer or on a Web server. This folder will contain your genome directories and a "meta-data" file called contents.txt.

Tip

If you are hosting a QuickLoad site using the Apache Web server, you can require a user name and password from anyone who accesses the data by adding an .htaccess file to the top-level directory.
See: http://httpd.apache.org/docs/2.0/howto/auth.html.

Step Two: Create one or more genome directories.

Create genome directories corresponding to the genome versions you want to make available via your QuickLoad site.

First, find out whether the main QuickLoad site already supports your genome of interest.

If you're setting up a QuickLoad site for a new (unsupported) genome

If your genome is not listed in http://igbquickload.org/quickload/contents.txt, you'll need to create your own IGB-friendly name. Doing this ensures that your genome version is displayed correctly in the Current Genome tabbed panel menus. The names of these directories should follow the IGB format, with the parts separated by underscores:

  • (first letter of the genus, capitalized)_(species, all lower case)_(3 letters indicating month of release, first letter capitalized)_(4 digits, the year)

For example, A_thaliana_Jun_2009 is the TAIR10 Arabidopsis thaliana genome assembly that was released in June 2009.

You can also indicate subspecies or cultivated varieties by including additional suffixes.

For example, there are two Oryza sativa (rice) subspecies in wide cultivation: japonica and indica. The first rice sequence published was from the Nipponbare variety, a member of the japonica subspecies. Because an indica genome sequence is now available, we need names that distinguish them. We name the japonica assemblies using the prefix O_sativa_japonica followed by the month and year of release, e.g. O_sativa_japonica_Oct_2011. Annotations and sequence data for this assembly reside in the genome version folder O_sativa_japonica_Oct_2011 on the main IGB QuickLoad site; you can view its contents at http://igbquickload.org/quickload/O_sativa_japonica_Oct_2011
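The naming rule above can be checked mechanically before you create a directory. The Python sketch below assumes that any suffix (such as a subspecies name) consists of lower-case letters placed between the species and the month, as in O_sativa_japonica_Oct_2011; the helper names make_version_name and is_valid_version_name are hypothetical and not part of IGB.

```python
import re

# Three-letter month abbreviations with the first letter capitalized.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Genus initial, species, optional lower-case suffixes (assumed format),
# month, and four-digit year, all joined by underscores.
VERSION_RE = re.compile(
    r"[A-Z]_[a-z]+(?:_[a-z]+)*_(?:" + "|".join(MONTHS) + r")_\d{4}"
)


def make_version_name(genus, species, month, year, *suffixes):
    """Build a name such as A_thaliana_Jun_2009 (hypothetical helper)."""
    parts = [genus[0].upper(), species.lower()]
    parts += [s.lower() for s in suffixes]            # e.g. "japonica"
    parts += [month[:3].capitalize(), f"{int(year):04d}"]
    return "_".join(parts)


def is_valid_version_name(name):
    """Return True if name follows the pattern described above."""
    return VERSION_RE.fullmatch(name) is not None


print(make_version_name("Oryza", "sativa", "October", 2011, "japonica"))
# prints O_sativa_japonica_Oct_2011
```

A regular expression is used here only as a convenient check; it does not replace comparing your name against the existing entries in the main site's contents.txt.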

Tip: If you set up your QuickLoad site in the Web directories of an Apache server, you can fine-tune how files are listed by adding directives to the .htaccess file in that directory. For example, you can configure Apache to let users browse the contents of your QuickLoad directory and see a short description of each file or data set:

Code Block

Options Indexes
IndexOptions FancyIndexing IgnoreCase FoldersFirst DescriptionWidth=*
AddDescription "Arabidopsis thaliana TAIR9 genome release" A_thaliana_Jun_2009
AddDescription "Arabidopsis thaliana TAIR8 genome release" A_thaliana_Apr_2008
AddDescription "Arabidopsis thaliana TIGRv5/TAIR7 genome release" A_thaliana_Jan_2004
AddDescription "TAIR9 annotation file" TAIR9_*.gz
AddDescription "Annotation file list" annots.txt annots.xml
AddDescription "Chromosome lengths and assembly information" genome.txt
AddDescription "Available genome assemblies" contents.txt
AddDescription "Mitochondrial Genome (Fasta format)" [Cc]hrM.fa*
AddDescription "Chloroplast Genome (Fasta format)" [Cc]hrC.fa*

Step Three: Create a contents.txt file listing the available genome versions for your QuickLoad site.

Create a simple, plain text file called contents.txt and add it to the top-level (QuickLoad root) directory.

This file should be plain text, tab-delimited; all columns must be separated by a tab character. If you create this file by hand, be sure to use an editor that inserts tab characters when you type a tab.

The first column should list names of the genome directories you have created and want to share.

The second column is optional - it contains text IGB will display on the title bar when you open IGB.

Note: You can include other directories and files in your QuickLoad site folder; IGB simply ignores anything that isn't listed in your contents.txt file. Also, if you rename a genome directory, you must update its entry in contents.txt, or IGB will not recognize it.
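If you maintain several genome versions, a small script can keep contents.txt and the directory layout in sync. This Python sketch writes the tab-delimited file and compares it with the directories on disk; the function names format_contents, listed_directories, and compare_with_disk are hypothetical.

```python
from pathlib import Path


def format_contents(genomes):
    """Return contents.txt text, one tab-delimited line per genome directory.

    genomes is a list of (directory_name, title) pairs; title may be None
    because the second column is optional.
    """
    lines = [d if title is None else f"{d}\t{title}" for d, title in genomes]
    return "\n".join(lines) + "\n"


def listed_directories(text):
    """Return the genome directory names (first column) in contents.txt text."""
    return [line.split("\t")[0] for line in text.splitlines() if line.strip()]


def compare_with_disk(root):
    """Compare contents.txt with the subdirectories of the QuickLoad root.

    Returns (listed_but_missing, present_but_unlisted). Entries in the first
    list point at directories that do not exist; directories in the second
    list are simply ignored by IGB.
    """
    root = Path(root)
    text = (root / "contents.txt").read_text(encoding="utf-8")
    listed = set(listed_directories(text))
    present = {p.name for p in root.iterdir() if p.is_dir()}
    return sorted(listed - present), sorted(present - listed)
```

To write the file, pass the result of format_contents to Path("contents.txt").write_text(...) in your QuickLoad root; building the text with an f-string guarantees a real tab character between the columns.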

Optional: Create a synonyms.txt file.

This is a list of synonyms for genomes. This list allows you to match names across different QuickLoad sites and also the Galaxy workflow site. For example, if a QuickLoad site at another location uses different names to refer to the same genome version, you can specify these in synonyms.txt. Each line contains any number of synonyms for one genome, separated by tabs.

Here is an example: http://igbquickload.org/quickload/synonyms.txt
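Because each line is just a tab-separated group of equivalent names, the file is easy to read programmatically. This Python sketch builds a lookup table from it; load_synonyms and same_genome are hypothetical names, and the sample lines used in the usage note are illustrative, not copied from the main site's synonyms.txt.

```python
def load_synonyms(text):
    """Map each name in synonyms.txt to the set of names on its line.

    Each non-blank line lists tab-separated synonyms for one genome. If a
    name appears on more than one line, the later line wins.
    """
    lookup = {}
    for line in text.splitlines():
        names = [n.strip() for n in line.split("\t") if n.strip()]
        group = frozenset(names)
        for name in names:
            lookup[name] = group
    return lookup


def same_genome(a, b, lookup):
    """True if a and b are identical or listed on the same line."""
    return a == b or b in lookup.get(a, frozenset())
```

For example, after lookup = load_synonyms("H_sapiens_Feb_2009\thg19\tGRCh37\n"), same_genome("hg19", "H_sapiens_Feb_2009", lookup) returns True.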

Step Four: Create genome.txt files for each genome represented in your QuickLoad site.

For each genome directory, create a genome.txt file that lists all the chromosomes in your genome, together with their sizes.

As with the contents.txt file, it should be tab delimited.

The first column lists the chromosome names and the second column lists their sizes. IGB uses this file to create the sequence selection table under the Data Access Panel.

Here is an example: http://igbquickload.org/quickload/A_thaliana_Jun_2009/genome.txt

If your genome version is already part of the main IGB QuickLoad site, then you can get the genome.txt file from there. Just download it and save it into the corresponding genome directory of your QuickLoad site.

If not, you can easily create your genome.txt file from a 2bit sequence file (see below) using twoBitInfo, available from http://hgdownload.cse.ucsc.edu/admin/exe/. See Step Seven: Add sequence data.
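If twoBitInfo is not convenient, the same two columns can also be computed directly from a FASTA file. This Python sketch is a swapped-in alternative to the twoBitInfo route, not part of the IGB tool chain; it assumes the sequence name is the first whitespace-delimited word of each header, which you should confirm matches the names in your annotation files. The function names are hypothetical.

```python
def chromosome_sizes(fasta_lines):
    """Return (name, length) pairs for each record in FASTA text lines."""
    sizes = []
    name, length = None, 0
    for raw in fasta_lines:
        line = raw.strip()
        if line.startswith(">"):
            if name is not None:
                sizes.append((name, length))      # close the previous record
            words = line[1:].split()
            name, length = (words[0] if words else ""), 0
        elif line:
            length += len(line)                   # residues on this line
    if name is not None:
        sizes.append((name, length))
    return sizes


def format_genome_txt(sizes):
    """Format sizes as tab-delimited genome.txt lines: name, then length."""
    return "".join(f"{name}\t{length}\n" for name, length in sizes)
```

To use it, open your FASTA file, pass the open file object to chromosome_sizes, and write the result of format_genome_txt to genome.txt in the genome version directory.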

Step Five: Create annots.xml files.

Each genome version subdirectory needs an annots.xml file that is either empty or lists the annotations you want to appear in the Available Data Sets section of IGB's Data Access tabbed panel.

Here is a simple example:

Code Block
<files>
  <file name="TAIR10.bed.gz"
   title="TAIR10 ALL"
   description="All TAIR10 genome models"
   label_field="id"
   background="000000"
   foreground="00FFFF"
   max_depth="10"
   name_size="12"/>
</files>

Note that the name attribute of the file tag indicates the location of the file relative to the directory where the annots.xml file resides. You can also use a URL to indicate the location of the file, which lets you reference resources hosted on other machines. For example, instead of the file name "TAIR10.bed.gz", you could enter "http://igbquickload.org/quickload/A_thaliana_Jun_2009/TAIR10.bed.gz", the internet address of the file.

Also note that if you are hosting BAM files, the BAM index (".bai") file needs to reside in the same directory as the BAM file unless you specify its location with the index attribute described below. Otherwise, IGB will display an error message when users try to access the data.

When loaded into IGB, each file will create one or more tracks. (Some file types, such as BED and GFF3, can specify multiple tracks.)
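For sites with many data sets, generating annots.xml from a script avoids hand-editing XML. This Python sketch uses the standard library's xml.etree.ElementTree; it checks only two rules stated on this page (every file tag has a name attribute, and colors are six hexadecimal digits without a leading "#"), and the function name build_annots is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Six hex digits, no leading "#", as the color attributes require.
HEX_COLOR = re.compile(r"[0-9A-Fa-f]{6}")
COLOR_ATTRS = ("background", "foreground",
               "positive_strand_color", "negative_strand_color")


def build_annots(entries):
    """Return annots.xml text with one <file> tag per attribute dict."""
    files = ET.Element("files")
    for attrs in entries:
        if "name" not in attrs:
            raise ValueError("every file tag needs a name attribute")
        for key in COLOR_ATTRS:
            if key in attrs and not HEX_COLOR.fullmatch(str(attrs[key])):
                raise ValueError(f"{key} must be six hex digits without '#'")
        # ElementTree escapes quotes, ampersands and angle brackets for us.
        ET.SubElement(files, "file", {k: str(v) for k, v in attrs.items()})
    return ET.tostring(files, encoding="unicode")
```

Calling build_annots with a dict holding the attributes from the example above produces an equivalent annots.xml (without the line breaks); write the returned string to annots.xml in the genome version directory.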

Use annots.xml to specify track color, annotation style, and more

An annots.xml file resides in an IGB QuickLoad genome directory. It lists all data files available for that genome.

The file contains one or more file tags, enclosed in a files element. The file tag has attributes that control how the data looks when loaded as a new track into IGB. Many options correspond to options you can configure using the Annotation or Graph tabbed panels in IGB.


attribute

optional

Description

name

Required

The path to the file relative to the genome version directory where the annots.xml file resides, or an absolute URL. The URL can point to any file, anywhere, provided it's accessible via the internet.

ex) file path

name="bamfiles/treatment.bam" - a directory named "bamfiles" must reside in the genome version directory, alongside the annots.xml file for that genome version

ex) absolute URL

name="http://www.example.com/bamfiles/treatment.bam"

If the file is a reference file, such as 2bit, the reference option must be set to "true".

index

Optional

If the file is a BAM, CRAM, or tabix-indexed file, IGB assumes the index file has the same name as the target file with a standard extension appended. BAM file indexes have extension ".bai", CRAM file indexes have extension ".crai", and tabix indexes have extension ".tbi".

However, if the index is in a different location or has a non-standard name, you can specify its location and file name using the index attribute. 

ex) file path

index="indexes/myindex.tbi"

ex) absolute URL

index="http://www.example.com/indexfile.bai"


title

Required

User-friendly text IGB will show in the Data Access Panel as the title of the data set.

If you don't provide a title, IGB will display the name attribute instead.

If you would like IGB to display data set titles in nested folders in the Available Data Sets section of the Data Access panel, include forward slashes ("/", the Unix folder separator) in the title.

For example, if a file title is RNA-Seq/Treatment/Sample1, then IGB displays the data set named Sample1 inside a folder named Treatment inside another folder named RNA-Seq.

description

Optional

IGB displays this text in the tooltip that appears when the mouse hovers over the data set title in the Data Access tab.

If this value is present, then IGB displays an "info" icon next to the file name.

url

Optional

Use this attribute to specify the location of an external Web page describing the data set. If provided, IGB will display an "info" icon next to the data set title.

When users click the icon, their Web browser will open showing the contents of the URL you provide.

You can specify this location using either an absolute URL or a relative URL.

An absolute URL looks like this: "http://www.example.com"

Relative in this case means relative to the QuickLoad root directory. (Note that this is different from how the name attribute works!)

For example, to show "About.html" residing inside a genome version directory named E_unicornis_Jun_2009, use relative url "E_unicornis_Jun_2009/About.html"

load_hint

Optional

If used, its value should be "Whole Sequence". Using this tag will force IGB to load the entire file when users select the genome version, which is usually appropriate only for reference gene model annotations or other equally small data sets.

Note: IGB stores local copies of files with load_hint "Whole Sequence" for faster access. See Cache tab under Settings.

label_field

Optional

Use this field to indicate the annotation property (e.g., "score" or "id") that IGB should use to label individual annotations within a track. For gene models, "id" is best. To show no label, use the value none.

To find out what annotation properties are available, view the Selection Info for an annotation loaded from the file. To view annotation properties:

  • Open the file and load some data.
  • Click an annotation to select it.
  • View its properties by clicking the "i" button (top left) or by opening the Selection Info tab, which shows them in tabular form.

Annotation properties are listed in the first column.

reference

Optional

Set to "true" if the file is a 2bit reference file.

background

Optional

Use this field to set a track's background color.

Set colors using six-digit hexadecimal triplets, but do not include a leading "#" character. For example, to set a white background, use background="FFFFFF".

foreground

Optional

Use this field to define annotation or graph colors (e.g. foreground="00FFFF"), using the same six-digit hexadecimal format as background.

max_depth

Optional

Use this field to define the default maximum stack height for an annotation track: the number of annotations that can be shown individually in a stack (e.g. max_depth="10"). Any annotations exceeding this number will be drawn on top of each other in a single row at the top of the track, called the "slop" row.

If the track is configured to show plus and minus strand features separately, then the slop row for minus strand features appears in the bottom row.

To allow IGB to show all available annotations, use max_depth="0" (unlimited).

name_size

Optional

Use this field to define the default track label name font size (e.g. name_size="12")

connected

Optional

Use this field to define the default boolean value for the connected field (e.g. connected="true" or connected="false"). The default is "true". This tells IGB to draw lines linking the blocks belonging to compound features assembled from smaller ones.

show2tracks

Optional

Use this field to define the default boolean value for the show2Tracks field (e.g. show2tracks="true" or show2tracks="false"). If you would like minus and plus strand features to be displayed in the same track, set this to "false". The default is "true".

direction_type

Optional

Use this field to dictate whether annotations or alignments will be shown using arrows and/or color to indicate direction (e.g. direction_type="arrow", direction_type="color", direction_type="both", direction_type="none"). The default is "none".

positive_strand_color

Optional

Use this field to define the default positive strand color (e.g. positive_strand_color="CCFFFF") for when the direction_type setting is "color."

negative_strand_color

Optional

Use this field to define the default negative strand color (e.g. negative_strand_color="33FFFF") for when the direction_type setting is "color."
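The name, index, and url attributes resolve relative paths against different base directories, which is easy to get wrong. This Python sketch shows one way to compute absolute locations with urllib.parse.urljoin. It assumes that a relative index path, like name, is resolved against the genome version directory, and that every ".gz" file is tabix-indexed; both are assumptions, as are the hypothetical names resolve_file_tag and DEFAULT_INDEX_EXT.

```python
from urllib.parse import urljoin

# Default index extensions from the index row above. Treating every ".gz"
# file as tabix-indexed is an assumption made for this sketch.
DEFAULT_INDEX_EXT = {".bam": ".bai", ".cram": ".crai", ".gz": ".tbi"}


def _as_dir(url):
    """Ensure a base ends with '/' so urljoin appends rather than replaces."""
    return url if url.endswith("/") else url + "/"


def resolve_file_tag(root_url, genome_dir, attrs):
    """Return absolute name, index, and url locations for one file tag.

    root_url:   URL of the QuickLoad root directory
    genome_dir: genome version directory name, e.g. "E_unicornis_Jun_2009"
    attrs:      the file tag's attributes as a dict
    """
    genome_base = urljoin(_as_dir(root_url), _as_dir(genome_dir))
    # name (and, by assumption, index) is relative to the genome directory...
    data = urljoin(genome_base, attrs["name"])
    index = attrs.get("index")
    if index is not None:
        index = urljoin(genome_base, index)
    else:
        for suffix, ext in DEFAULT_INDEX_EXT.items():
            if data.lower().endswith(suffix):
                index = data + ext   # standard extension appended to full name
                break
    # ...but url is relative to the QuickLoad root directory.
    info = attrs.get("url")
    if info is not None:
        info = urljoin(_as_dir(root_url), info)
    return {"name": data, "index": index, "url": info}
```

Absolute URLs pass through urljoin unchanged, so the same function handles both the relative and absolute forms described in the table.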

Step Six: Add your annotation files.

Place your annotation, BAM, or graph (bedgraph or bigwig) files into the appropriate genome sub-directories. You can use any format IGB supports.

Starting with IGB 6.6, IGB QuickLoad can support BED, bedgraph, and GFF files that have been sorted, compressed, and indexed using the bgzip and tabix utilities. Doing this helps speed up data loading in IGB and allows loading data by region. For an example of how we used tabix to distribute coverage and junction files from an RNA-Seq data set, see Creating a new genome release for IGB QuickLoad from the IGB Developer's Guide.
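Tabix needs the records sorted by sequence name and then by start position before compression. This Python sketch performs only that sorting step for a BED file; compression and indexing are still done with the real tools, e.g. bgzip genes.sorted.bed followed by tabix -p bed genes.sorted.bed.gz. The function name sort_bed_lines is hypothetical, and how header lines are handled is an assumption you should check against your tabix version.

```python
def sort_bed_lines(lines):
    """Sort BED records by sequence name, then numeric start position.

    Comment-style lines ("#", "track", "browser") are kept at the top and
    blank lines are dropped. Check whether your tabix setup accepts track
    and browser lines; you may need to remove them before indexing.
    """
    header, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith(("#", "track", "browser")):
            header.append(line)
        else:
            records.append(line)
    # Group by sequence name (text order), then by integer start coordinate.
    records.sort(key=lambda rec: (rec.split("\t")[0], int(rec.split("\t")[1])))
    return header + records
```

Sorting names as text (chr10 before chr2) is fine for this purpose: tabix only requires that each sequence's records be contiguous and ordered by start position.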

Step Seven: Add sequence data (if working with a new genome.)

You may not need to do this if you are working with a genome that other QuickLoad sites support. IGB may be able to retrieve sequence data for your genome from other sites if it can find another site that supports the same genome version and contains sequence data. However, if you are working with a newly sequenced genome or don't want to use other groups' servers, you can support IGB's sequence visualization functionality by setting up your own sequences to distribute.

To do this, you will need to obtain sequence files and convert them to a binary format (twobit) that allows IGB to quickly obtain all or part of the sequence data when users press the Get Sequence button, run a blast search, or open the Sequence Viewer window.

The twobit format was first developed for the blat cDNA-to-genome alignment tool. IGB uses 2bit files to support efficient retrieval of sequence data. To convert a fasta format file to twobit, use faToTwoBit, available from http://hgdownload.cse.ucsc.edu/admin/exe/.

To convert a fasta file to 2bit, do something like this:

Code Block
$ faToTwoBit A_thaliana_Jun_2009.fa A_thaliana_Jun_2009.2bit

Note: The 2bit sequence file should have the same name as your genome version directory and should have file extension "2bit," e.g., A_thaliana_Jun_2009.2bit.

However, before you create the sequence files, check that the names of sequences in your fasta file match the names used in your annotation files. For example, if your annotation file lists sequence name chr1, then the fasta file should contain a sequence record with fasta header ">chr1" and nothing else. If they don't match, you'll need to either edit your fasta headers or create a sequence synonyms file.
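A quick script can catch mismatched names before you run faToTwoBit. This Python sketch uses BED as the example annotation format (sequence name in the first tab-separated column); other formats store the name differently, and the function name check_sequence_names is hypothetical.

```python
def check_sequence_names(fasta_lines, bed_lines):
    """Compare FASTA header names with the sequence names used in a BED file.

    Returns (headers_with_extra_text, bed_names_missing_from_fasta).
    """
    headers = [line[1:].strip() for line in fasta_lines if line.startswith(">")]
    # A header should be the bare name, e.g. ">chr1", with nothing after it.
    extra = [h for h in headers if len(h.split()) != 1]
    fasta_names = {h.split()[0] for h in headers if h}
    bed_names = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        bed_names.add(line.split("\t")[0])
    return extra, sorted(bed_names - fasta_names)
```

An empty result for both lists means every BED sequence name has a matching, bare fasta header; anything else points at headers to edit or names to add to a synonyms file.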

Note: You can make your genome.txt file (Step Four above) using twoBitInfo, another program available from http://hgdownload.cse.ucsc.edu/admin/exe/. To create a genome.txt file from a 2bit sequence file, do something like this:

Code Block
$ twoBitInfo A_thaliana_Jun_2009.2bit genome.txt

Step Eight: Add your new QuickLoad site to IGB.

Tell IGB to use your new QuickLoad site.

...