Introduction
IGB QuickLoad (QL) is a simple file-based system you can use to share annotation, alignment, or sequence data files on-line, using a Web server or your Dropbox "Public" folders. Using QuickLoad, you can configure data sets to appear in the IGB Data Access Panel, making it easy for lab members, collaborators, and the public to browse and view your data.
Any file that IGB can open (using the File menu) can be accessed from a QuickLoad site.
QuickLoad sites can reside on your computer hard drive, on-line, or in the cloud. Typically, you'll create a QuickLoad site on your local computer, test it, and then copy it to a Web site or Dropbox.
A QuickLoad site contains
- simple meta-data files describing the available genome versions and annotations
- data files, including sequence data, alignments, genome graphs, and annotations
You can set up a QuickLoad site on your local computer for your own personal use, or on a Web server or in a public Dropbox folder if you want to share data with others.
To view an example, see the main IGB QuickLoad site the IGB team uses to distribute reference gene annotations for human and model organism genomes.
...
How to Set Up a QuickLoad Site
Step One: Create a QuickLoad root directory.
Create a QuickLoad root directory (folder) on your local computer or on a Web server. This folder will contain your genome directories and a "meta-data" file called contents.txt.
...
For example, there are two Oryza sativa (rice) subspecies in wide cultivation: japonica and indica. The first rice sequence published was from the Nipponbare variety, which is a member of the japonica subspecies. Because an indica genome sequence is now available, we need names that distinguish them. We distinguish the japonica genome sequence assemblies using the prefix O_sativa_japonica and the month and year of release: O_sativa_japonica_Jun_2009. Annotations and sequence data reside in the genome version folder O_sativa_japonica_Jun_2009 on the main IGB QuickLoad site; view its contents by visiting http://igbquickload.org/quickload/O_sativa_japonica_Jun_2009
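If you want to check your own folder names against this naming style, a small shell function along these lines may help. The pattern (genus initial, species, optional subspecies, three-letter month, four-digit year) is an assumption inferred from the examples on this page, and the function name is made up for illustration. IGB itself does not run any such check.

```shell
# Returns 0 when a name looks like G_species[_subspecies]_Mon_YYYY.
# The pattern is an assumption inferred from the examples on this page.
is_genome_version_name() {
  printf '%s\n' "$1" |
    grep -Eq '^[A-Z]_[a-z]+(_[a-z]+)?_(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)_[0-9]{4}$'
}

# rice_v7 is a deliberately non-conforming, made-up name.
for name in O_sativa_japonica_Jun_2009 A_thaliana_Jun_2009 rice_v7; do
  if is_genome_version_name "$name"; then
    echo "ok:    $name"
  else
    echo "check: $name"
  fi
done
```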
...
This file should be plain text and tab-delimited; all columns must be separated by a tab character. If you create this file by hand, be sure to use a text editor that inserts tab characters when you type a tab.
...
Note: You can include other directories and other files in your QuickLoad site folder; IGB will simply ignore anything that isn't listed in your contents.txt file. Also, if you rename a genome directory, you must update its name in the contents.txt file or IGB will not recognize it.
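If you would rather not depend on your editor's tab handling, you can write contents.txt from the shell. The sketch below is one way to do it. It assumes a site folder named quickload and a two-column layout (genome directory name, then a free-text description); both are assumptions to adjust for your own site.

```shell
# Hypothetical site location; change to suit your setup.
QL_ROOT="${QL_ROOT:-quickload}"

# One directory per genome version, named exactly as listed in contents.txt.
mkdir -p "$QL_ROOT/O_sativa_japonica_Jun_2009" "$QL_ROOT/A_thaliana_Jun_2009"

# printf's \t always emits a real tab character.
# Assumed layout: genome directory name, then a description.
printf '%s\t%s\n' \
  'O_sativa_japonica_Jun_2009' 'Oryza sativa japonica (Jun 2009)' \
  'A_thaliana_Jun_2009' 'Arabidopsis thaliana (Jun 2009)' \
  > "$QL_ROOT/contents.txt"

# Report any genome listed in contents.txt that has no matching directory.
cut -f1 "$QL_ROOT/contents.txt" | while IFS= read -r genome; do
  [ -d "$QL_ROOT/$genome" ] || echo "missing directory: $genome" >&2
done
```

The final loop catches the renamed-directory problem described in the note above before IGB does.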
Optional
...
: Create a synonyms.txt file.
This is a list of synonyms for genomes. This list allows you to match names across different QuickLoad sites and also the Galaxy workflow site. For example, if a QuickLoad site at another location uses different names to refer to the same genome version, you can specify these in the synonyms.txt. Each line contains any number of synonyms for a genome, separated by tabs.
Here is an example: http://igbquickload.org/quickload/synonyms.txt
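As a rough illustration of the one-line-per-genome, tab-separated layout, the following sketch writes a two-line synonyms.txt. The alias names are placeholders, not real synonyms, and the quickload folder name is an assumption.

```shell
QL_ROOT="${QL_ROOT:-quickload}"
mkdir -p "$QL_ROOT"

# Each line: every name that refers to one genome version, tab-separated.
# The alias names below are placeholders, not real synonyms.
printf '%s\t%s\t%s\n' \
  'A_thaliana_Jun_2009' 'example_alias_1' 'example_alias_2' \
  > "$QL_ROOT/synonyms.txt"
printf '%s\t%s\n' \
  'O_sativa_japonica_Jun_2009' 'example_alias_3' \
  >> "$QL_ROOT/synonyms.txt"

# Show how many names each line carries, as a quick sanity check.
awk -F '\t' '{ print NF " names: " $1 }' "$QL_ROOT/synonyms.txt"
```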
...
Step Four: Create genome.txt files for each genome represented in your QuickLoad site.
For each genome directory, create a genome.txt (formerly, mod_chromInfo.txt) file that lists all the chromosomes in your genome, together with their sizes.
As with the contents.txt file, it should be tab-delimited.
The first column lists the chromosome names and the second column lists their sizes. IGB uses this file to create the sequence selection table under the Data Access Panel.
Here is an example: http://igbquickload.org/quickload/A_thaliana_Jun_2009/genome.txt
If your genome version is already part of the main IGB QuickLoad site, then you can get the genome.txt file from there. Just download it and save it into your local QuickLoad site.
If not, you can easily create your genome.txt file from a 2bit sequence file (see below) using twoBitInfo, available from http://hgdownload.cse.ucsc.edu/admin/exe/. See Step Seven: Add sequence data.
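Whichever way you obtain genome.txt, a quick format check can save some debugging. The function below is a minimal sketch, with a made-up name, that verifies each line has exactly two tab-separated columns and a positive whole-number size. The chromosome sizes in the demo are invented.

```shell
# Check that every line of a genome.txt file has exactly two tab-separated
# columns and that the second column is a positive whole number.
check_genome_txt() {
  awk -F '\t' '
    NF != 2 || $2 !~ /^[0-9]+$/ || $2 + 0 <= 0 {
      printf "line %d looks wrong: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Demo on a temporary file with invented chromosome sizes.
demo="$(mktemp)"
printf 'chr1\t1000\nchr2\t2500\n' > "$demo"
check_genome_txt "$demo" && echo "genome.txt looks fine"
```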
...
This describes how contigs are assembled into chromosomes.
Each line contains: CONTIG_START tab CONTIG_NAME tab CONTIG_LENGTH tab CHROMOSOME_NAME tab CHROMOSOME_LENGTH
Note: This is a relatively old feature in IGB and we are not entirely sure it still works.
...
Step Five: Create annots.xml files.
...
Here is a simple example:
```xml
<files>
  <file name="TAIR10.bed.gz"
        title="TAIR10 ALL"
        description="All TAIR10 genome models"
        label_field="id"
        background="000000"
        foreground="00FFFF"
        max_depth="10"
        name_size="12"/>
</files>
```
Note that the name attribute of the file tag indicates the physical location of the file relative to the directory where the annots.xml file resides. However, you can also use a URL to indicate the location of the file. This means that you can reference resources hosted on other machines. For example, instead of the file name "TAIR10.bed.gz", you could enter "http://igbquickload.org/quickload/A_thaliana_Jun_2009/TAIR10.bed.gz", which is the internet address of the file.
Also note that if you are hosting BAM files, the BAM index (".bai") file needs to reside in the same directory as the BAM file. Otherwise, IGB will display an error message when users try to access the data.
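To catch missing BAM indexes before your users do, you could run a check like the sketch below over each genome directory. It assumes the index is named by appending ".bai" to the BAM file name, which is what samtools index produces by default. The function name is made up for illustration.

```shell
# List BAM files in a directory that have no index beside them.
# Assumes the index is named <file>.bam.bai (the samtools default).
find_unindexed_bams() {
  for bam in "$1"/*.bam; do
    [ -e "$bam" ] || continue        # directory holds no BAM files
    [ -e "$bam.bai" ] || echo "$bam"
  done
}

# Usage (directory name is an example):
#   find_unindexed_bams quickload/A_thaliana_Jun_2009
# To build a missing index, if samtools is installed:
#   samtools index reads.bam
```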
...
An annots.xml file contains one or more file tags, enclosed in a files element. The file tag has attributes that will dictate how the data will look when displayed in IGB. Most of these options correspond to options users configure using the Tracks tab in the IGB Preferences window. (See File > Preferences > Tracks.)
Attribute | Required or Optional | Description |
---|---|---|
name | Required | The name of the file on your file system or a URL. The URL can point to any file, anywhere, provided it's accessible via the internet. |
title | Optional | User-friendly text IGB will show in the Data Access Panel as the title of the data set. If you don't provide a title, IGB will display the name attribute instead. |
description | Optional | IGB will display this text as a tooltip when users hover the mouse over the data set title in the Data Access tab. |
url | Optional | Use this attribute to specify the location of a Web page describing the data set. If provided, IGB will display an "info" icon next to the data set title. When users click the icon, their Web browser will open showing the contents of the URL you provide. |
load_hint | Optional | If used, its value should be "Whole Sequence". Using this attribute will force IGB to load the entire file when users select the genome version, which is usually appropriate only for reference gene model annotations or other equally small data sets. |
label_field | Optional | Use this field to indicate the annotation property (e.g., "score" or "id") that should be used in IGB to label individual annotations. For gene models, "id" is best. |
background | Optional | Use this field to define track background color (e.g. background="000000"). Note you must use a six-digit hexadecimal color code, without a leading "#" character. For example, to specify white background, use background="FFFFFF". |
foreground | Optional | Use this field to define annotation or graph colors (e.g. foreground="00FFFF"). See above. |
max_depth | Optional | Use this field to define the default max depth value, the number of annotations that can be shown individually in a stack (e.g. max_depth="10", or max_depth="0" for unlimited). |
name_size | Optional | Use this field to define the default track label name font size (e.g. name_size="12") |
connected | Optional | Use this field to define the default boolean value for the connected field (e.g. connected="true" or connected="false"). The default is "true". This tells IGB to draw lines linking the blocks belonging to larger features. |
show2tracks | Optional | Use this field to define the default boolean value for the show2Tracks field (e.g. show2tracks="true" or show2tracks="false"). If you would like minus and plus strand features to be displayed in the same track, set this to "false." The default is "true." |
direction_type | Optional | Use this field to dictate whether annotations or alignments will be shown using arrows and/or color to indicate direction. (e.g. direction_type="arrow", direction_type="color", direction_type="both", direction_type="none"). The default is "none." |
positive_strand_color | Optional | Use this field to define the default positive strand color (e.g. positive_strand_color="CCFFFF") for when the direction_type setting is "color." |
negative_strand_color | Optional | Use this field to define the default negative strand color (e.g. negative_strand_color="33FFFF") for when the direction_type setting is "color." |
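To show how several of these attributes might combine, the sketch below writes an annots.xml with two entries from the shell. The BAM file name, the example.org URL, and the target directory are hypothetical; the attribute names and values come from the table above.

```shell
# Hypothetical genome directory; change to suit your site.
GENOME_DIR="${GENOME_DIR:-quickload/A_thaliana_Jun_2009}"
mkdir -p "$GENOME_DIR"

# Quoted 'EOF' keeps the shell from expanding anything inside the XML.
cat > "$GENOME_DIR/annots.xml" <<'EOF'
<files>
  <file name="TAIR10.bed.gz"
        title="TAIR10 ALL"
        description="All TAIR10 genome models"
        load_hint="Whole Sequence"
        label_field="id"
        direction_type="arrow"/>
  <file name="example_reads.bam"
        title="Example alignments"
        url="http://example.org/about_example_reads.html"
        show2tracks="false"
        direction_type="color"
        positive_strand_color="CCFFFF"
        negative_strand_color="33FFFF"/>
</files>
EOF

# Optional well-formedness check, if xmllint happens to be installed.
if command -v xmllint >/dev/null 2>&1; then
  xmllint --noout "$GENOME_DIR/annots.xml" && echo "annots.xml is well-formed"
fi
```

Here the gene models load in full when the genome is selected, while the alignments share one track and use color to indicate strand.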
Step Six: Add your annotation files.
...
Starting with IGB 6.6, IGB QuickLoad can support BED, bedgraph, and GFF files that have been sorted, compressed, and indexed using the bgzip and tabix utilities. Doing this helps speed up data loading in IGB and allows loading data by region. For an example of how we used tabix to distribute coverage and junction files from an RNA-Seq data set, see Creating a new genome release for IGB QuickLoad from the IGB Developer's Guide.
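A typical sort, compress, and index sequence for a BED file might look like the sketch below. It assumes bgzip and tabix are on your PATH and that the file has no header or track lines; the function name and the file name in the usage comment are made up.

```shell
# Sort, compress, and index a BED file so it can be loaded by region.
# Requires bgzip and tabix. Assumes the file has no header or track lines.
index_bed() {
  bed="$1"
  # tabix needs rows sorted by sequence name, then by start position.
  sort -k1,1 -k2,2n "$bed" | bgzip -c > "$bed.gz" &&
    tabix -p bed "$bed.gz"    # writes "$bed.gz.tbi" next to the data file
}

# Usage (file name is hypothetical):
#   index_bed coverage.bed
# This leaves coverage.bed.gz and coverage.bed.gz.tbi in the same directory.
```

For GFF files, the analogous commands sort on columns 1 and 4 and use tabix -p gff.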
Step Seven: Add sequence data (if working with a new genome).
You may not need to do this if you are working with a genome that other QuickLoad sites support. IGB may be able to retrieve sequence data for your genome from other sites if it can find another site that supports the same genome version and contains sequence data. However, if you are working with a newly sequenced genome or don't want to use other groups' servers, you can support IGB's sequence visualization functionality by setting up your own sequences to distribute.
...
To convert a fasta file to 2bit, do something like this:
```shell
$ faToTwoBit A_thaliana_Jun_2009.fa A_thaliana_Jun_2009.2bit
```
...
Note: You can make your genome.txt file (Step Four above) using twoBitInfo, another program available from http://hgdownload.cse.ucsc.edu/admin/exe/. To create a genome.txt file from a 2bit sequence file, do something like this:
```shell
$ twoBitInfo A_thaliana_Jun_2009.2bit genome.txt
```
...
Follow the directions in Adding and Managing Data Source Servers. To add the local server, click the "..." button and select the folder which contains the contents.txt file, e.g. "Quickload servers".