Introduction

IGB QuickLoad (QL) is a simple file-based system you can use to share data files on-line, using a Web server or your Dropbox "Public" folders. Using QuickLoad, you can configure data sets to appear in the IGB Data Access panel, making it easy for lab members, collaborators, and the public to browse and view your data.

Any file that IGB can open (using the File menu) can be accessed from a QuickLoad site.

QuickLoad sites can reside on your computer hard drive, on-line, or in the cloud. Typically, you'll create a QuickLoad site on your local computer, test it, and then copy it to a Web site or Dropbox.

A QuickLoad site contains

  • simple meta-data files describing the available genome versions and annotations (contents.txt, genome.txt, and annots.xml)
  • data files, including sequence data, alignments, genome graphs, and annotations

You can set up a QuickLoad site on your local computer for your own personal use or on a web server or in a public Dropbox folder if you want to share data with others.

To view an example, see the main IGB QuickLoad site, which the IGB team uses to distribute reference gene annotations for human and model organism genomes.


How to Set Up a QuickLoad Site

Step One: Create a QuickLoad root directory.

Create a QuickLoad root directory (folder) on your local computer or on a Web server. This folder will contain your genome directories and a "meta-data" file called contents.txt.

If you are hosting a QuickLoad site using the Apache Web server, you can configure Apache to demand a user name and password from anyone who accesses the data by adding an .htaccess file to the top-level directory.
See: http://httpd.apache.org/docs/2.0/howto/auth.html.


Step Two: Create one or more genome directories.

Create genome directories for each of the genome versions you want to make available via QuickLoad.

To ensure the genome will be displayed correctly in the Current Genome tabbed panel menus, the names of these directories should follow the IGB format of:

  • (First letter of the genus, capitalized)_(species, all lower case)_(3 letters indicating month of release, first letter capitalized)_(4 digits, the year), with the four parts joined by underscores.

For example, A_thaliana_Jun_2009 is the TAIR10 Arabidopsis thaliana genome assembly that was released in June 2009.

You can also indicate subspecies or cultivated varieties by including additional suffixes.

For example, there are two Oryza sativa (rice) subspecies in wide cultivation: japonica and indica. The first rice sequence published was from the Nipponbare variety, which is a member of the japonica subspecies. Because an indica genome sequence is now available, we need names that distinguish them. We distinguish the japonica assemblies using the prefix O_sativa_japonica followed by the month and year of release: O_sativa_japonica_Jun_2009. Annotations and sequence data reside in the genome version folder O_sativa_japonica_Jun_2009 on the main IGB QuickLoad site. To view its contents, visit http://igbquickload.org/quickload/O_sativa_japonica_Jun_2009
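The naming convention above can be checked mechanically before you publish a site. The following is a minimal sketch, not part of IGB: the function name is_valid_genome_dir is hypothetical, and the rule that optional subspecies or variety suffixes are lower-case words is an assumption drawn from the O_sativa_japonica example.

```python
import re

# Month abbreviations as used in IGB-style genome directory names.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Pattern: G_species[_suffix...]_Mon_YYYY
# Assumption: optional suffixes (subspecies, variety) are lower-case words.
GENOME_DIR_RE = re.compile(
    r"[A-Z]_[a-z]+(?:_[a-z0-9]+)*_(?:" + MONTHS + r")_\d{4}"
)


def is_valid_genome_dir(name):
    """Return True if name follows the G_species[_suffix]_Mon_YYYY pattern."""
    return GENOME_DIR_RE.fullmatch(name) is not None
```

Called on A_thaliana_Jun_2009 or O_sativa_japonica_Jun_2009, the function returns True; a name such as arabidopsis_2009 is rejected.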

Tip: If you set up your QuickLoad site in the Web directories of an Apache server, you can modify and fine-tune how files are displayed by adding directives to the .htaccess file in that directory. You can configure Apache to allow users to view the contents of your QuickLoad directory and include a short description of each file or data set. Here is an example:

Options Indexes
IndexOptions FancyIndexing IgnoreCase FoldersFirst DescriptionWidth=*
AddDescription "Arabidopsis thaliana TAIR9 genome release" A_thaliana_Jun_2009
AddDescription "Arabidopsis thaliana TAIR8 genome release" A_thaliana_Apr_2008
AddDescription "Arabidopsis thaliana TIGRv5/TAIR7 genome release" A_thaliana_Jan_2004
AddDescription "TAIR9 annotation file" TAIR9_*.gz
AddDescription "Annotation file list" annots.[txt|xml]
AddDescription "Chromosome lengths and assembly information" genome.txt
AddDescription "Available genome assemblies" contents.txt
AddDescription "Mitochondrial Genome (Fasta format)" [Cc]hrM.fa*
AddDescription "Chloroplast Genome (Fasta format)" [Cc]hrC.fa*


Step Three: Create contents.txt file.

Create a simple, plain text file called contents.txt and add it to the top-level directory.

This file should be plain text, tab-delimited; all columns must be separated by a tab character. If you create this file by hand, be sure to use an editor that inserts tab characters when you type a tab.

The first column should list names of the genome directories you have created and want to share.

The second column is optional - it contains text IGB will display on the title bar when you open IGB.

Note: You can include other directories and other files in your QL site folder; IGB will simply ignore anything that isn't listed in your contents.txt file. Also, if you rename a genome directory, you must make the same change in contents.txt or IGB will not recognize the directory.
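If you would rather not depend on an editor's tab handling, you can generate contents.txt with a short script. This is a minimal sketch under stated assumptions: the function names write_contents and read_contents are hypothetical, and the titles passed in are whatever text you want IGB to display.

```python
from pathlib import Path


def write_contents(root, genomes):
    """Write contents.txt under root from (directory, title) pairs."""
    path = Path(root) / "contents.txt"
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for directory, title in genomes:
            # The second column is optional, so leave it out when empty.
            fields = [directory, title] if title else [directory]
            handle.write("\t".join(fields) + "\n")
    return path


def read_contents(root):
    """Return {directory: title}; title is '' when the column is absent."""
    entries = {}
    text = (Path(root) / "contents.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        entries[fields[0].strip()] = fields[1].strip() if len(fields) > 1 else ""
    return entries
```

Reading the file back with read_contents is a quick way to confirm that the columns really are separated by tab characters.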

Optional: Create a synonyms.txt file.

This is a list of synonyms for genomes. This list allows you to match names across different QuickLoad sites and also the Galaxy workflow site. For example, if a QuickLoad site at another location uses different names to refer to the same genome version, you can specify these in the synonyms.txt. Each line contains any number of synonyms for a genome, separated by tabs.

Here is an example: http://igbquickload.org/quickload/synonyms.txt
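To illustrate how a tab-separated synonyms file groups names, here is a minimal sketch that reads such text into a lookup table. It is not IGB's own parser; the function names are hypothetical, and names that appear on different lines are not merged transitively.

```python
def parse_synonyms(text):
    """Map every name to the set of names sharing a line with it."""
    lookup = {}
    for line in text.splitlines():
        # Each line holds any number of synonyms separated by tabs.
        names = {name.strip() for name in line.split("\t") if name.strip()}
        for name in names:
            lookup.setdefault(name, set()).update(names)
    return lookup


def same_genome(lookup, first, second):
    """Return True if the two names are identical or listed as synonyms."""
    return first == second or second in lookup.get(first, set())
```

For instance, given a line containing A_thaliana_Jun_2009 and a second, made-up name separated by a tab, same_genome reports the two as equivalent.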


Step Four: Create genome.txt files for each genome represented in your QuickLoad site.

For each genome directory, create a genome.txt file that lists all the chromosomes in your genome, together with their sizes.

As with the contents.txt file, it should be tab delimited.

The first column lists the chromosome names and the second column lists their sizes. IGB uses this file to create the sequence selection table under the Data Access Panel.

Here is an example: http://igbquickload.org/quickload/A_thaliana_Jun_2009/genome.txt

If your genome version is already part of the main IGB QuickLoad site, then you can get the genome.txt file from there. Just download it and save into your local QuickLoad site.

If not, you can easily create your genome.txt file from a 2bit sequence file (see below) using twoBitInfo, available from http://hgdownload.cse.ucsc.edu/admin/exe/. See Step Seven: Add sequence data.
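If the UCSC tools are not at hand, the same two-column layout can also be produced directly from a FASTA file. The sketch below is an alternative to twoBitInfo, not the method described above; the function names are hypothetical, and taking the first word of each FASTA header as the chromosome name is an assumption.

```python
def fasta_lengths(lines):
    """Return [(name, length)] in file order from an iterable of FASTA lines."""
    lengths = []
    name, size = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                lengths.append((name, size))
            # Assumption: the first word of the header is the sequence name.
            fields = line[1:].split()
            name, size = (fields[0] if fields else ""), 0
        elif name is not None:
            size += len(line)
    if name is not None:
        lengths.append((name, size))
    return lengths


def genome_txt(lengths):
    """Format (name, length) pairs as tab-delimited genome.txt text."""
    return "".join(f"{name}\t{size}\n" for name, size in lengths)
```

Pass an open FASTA file to fasta_lengths, then write the string returned by genome_txt to genome.txt in the genome directory.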


Step Five: Create annots.xml files.

Each genome version subdirectory needs an annots.xml file that is either empty or contains a listing of annotations you want to appear in the Available Data Sets section of IGB's Data Access tabbed panel.

Here is a simple example:

<files>
  <file name="TAIR10.bed.gz"
   title="TAIR10 ALL"
   description="All TAIR10 genome models"
   label_field="id"
   background="000000"
   foreground="00FFFF"
   max_depth="10"
   name_size="12"/>
</files>

Note that the name attribute of the file tag indicates the physical location of the file relative to the directory where the annots.xml file resides. However, you can also use a URL to indicate the location of the file, which lets you reference resources hosted on other machines. For example, instead of the file name "TAIR10.bed.gz", you could enter "http://igbquickload.org/quickload/A_thaliana_Jun_2009/TAIR10.bed.gz", which is the internet address of the file.

Also note that if you are hosting BAM files, the BAM file index ".bai" file needs to reside in the same directory as the BAM file. Otherwise, IGB will display an error message when users try to access the data.

When loaded into IGB, each file will create one or more tracks. (Some file types, such as BED and GFF3, can specify multiple tracks.)
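Because local file names in annots.xml are resolved relative to the genome directory, and BAM files need their index alongside them, a quick check before publishing can catch broken entries. This is a minimal sketch, not an IGB tool: the function name missing_files is hypothetical, URLs are skipped rather than fetched, and the index is assumed to be named by appending ".bai" to the BAM file name.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_files(genome_dir):
    """List local files named in annots.xml that are absent from genome_dir."""
    genome_dir = Path(genome_dir)
    text = (genome_dir / "annots.xml").read_text(encoding="utf-8")
    if not text.strip():
        return []  # an empty annots.xml is allowed
    missing = []
    for entry in ET.fromstring(text).iter("file"):
        name = entry.get("name", "")
        if "://" in name:
            continue  # URLs are not checked in this sketch
        if not (genome_dir / name).exists():
            missing.append(name)
        # Assumption: the BAM index is named <file>.bam.bai.
        if name.lower().endswith(".bam") and not (genome_dir / (name + ".bai")).exists():
            missing.append(name + ".bai")
    return missing
```

An empty list means every local file referenced by annots.xml, and every expected BAM index, was found.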

Annots.xml options - specify track color, annotation style, and more

An annots.xml file contains one or more file tags, enclosed in a files element. The file tag has attributes that will dictate how the data will look when displayed in IGB. Most of these options correspond to options users configure using the Tracks tab in the IGB Preferences window. (See File > Preferences > Tracks.)

The attributes are listed below in the form: attribute (required or optional): description.

  • name (required): The name of the file on your file system or a URL. The URL can point to any file, anywhere, provided it's accessible via the internet.
  • title (optional): User-friendly text IGB will show in the Data Access Panel as the title of the data set. If you don't provide a title, IGB will display the name attribute instead.
  • description (optional): IGB will display this text as a tooltip when users hover the mouse over the data set title in the Data Access tab.
  • url (optional): Use this attribute to specify the location of a Web page describing the data set. If provided, IGB will display an "info" icon next to the data set title. When users click the icon, their Web browser will open showing the contents of the URL you provide.
  • load_hint (optional): If used, its value should be "Whole Sequence". Using this attribute will force IGB to load the entire file when users select the genome version, which is usually appropriate only for reference gene model annotations or other equally small data sets.
  • label_field (optional): Use this field to indicate the annotation property (e.g., "score" or "id") that should be used in IGB to label individual annotations. For gene models, "id" is best.
  • background (optional): Use this field to define track background color. Note you must use six-digit, hexadecimal triplets to specify color, but do not include a leading "#" character. For example, to specify white background, use background="FFFFFF".
  • foreground (optional): Use this field to define annotation or graph colors (e.g. foreground="00FFFF"). The color format is the same as for background.
  • max_depth (optional): Use this field to define the default max depth value, the number of annotations that can be shown individually in a stack (max_depth="10" or max_depth="0" for unlimited).
  • name_size (optional): Use this field to define the default track label name font size (e.g. name_size="12").
  • connected (optional): Use this field to define the default boolean value for the connected field (e.g. connected="true" or connected="false"). The default is "true". This tells IGB to draw lines linking the blocks belonging to larger features.
  • show2tracks (optional): Use this field to define the default boolean value for the show2tracks field (e.g. show2tracks="true" or show2tracks="false"). If you would like minus and plus strand features to be displayed in the same track, set this to "false". The default is "true".
  • direction_type (optional): Use this field to dictate whether annotations or alignments will be shown using arrows and/or color to indicate direction (e.g. direction_type="arrow", direction_type="color", direction_type="both", direction_type="none"). The default is "none".
  • positive_strand_color (optional): Use this field to define the default positive strand color (e.g. positive_strand_color="CCFFFF") for when the direction_type setting is "color".
  • negative_strand_color (optional): Use this field to define the default negative strand color (e.g. negative_strand_color="33FFFF") for when the direction_type setting is "color".

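When a site holds many data sets, writing annots.xml by hand becomes error-prone. The following minimal sketch builds the file from a list of attribute dictionaries and enforces two rules stated above: name is required, and colors are six hexadecimal digits with no leading "#". The function name build_annots is hypothetical, and the sketch does not validate any other attribute.

```python
import re
import xml.etree.ElementTree as ET

HEX_COLOR = re.compile(r"[0-9A-Fa-f]{6}")
COLOR_ATTRIBUTES = (
    "background",
    "foreground",
    "positive_strand_color",
    "negative_strand_color",
)


def build_annots(entries):
    """Return annots.xml text for a list of attribute dictionaries."""
    files = ET.Element("files")
    for attributes in entries:
        if "name" not in attributes:
            raise ValueError("each file entry needs a name attribute")
        for key in COLOR_ATTRIBUTES:
            if key in attributes and not HEX_COLOR.fullmatch(str(attributes[key])):
                raise ValueError(f"{key} must be six hex digits without '#'")
        ET.SubElement(files, "file", {k: str(v) for k, v in attributes.items()})
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(files)
    return ET.tostring(files, encoding="unicode")
```

Write the returned string to annots.xml in the genome directory. Passing background="#FFFFFF" raises a ValueError instead of producing a file with a malformed color value.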

Step Six: Add your annotation files.

Place your annotation, BAM, or graph (bedgraph or bigwig) files into the appropriate genome sub-directories. You can use any format IGB supports.

Starting with IGB 6.6, IGB QuickLoad can support BED, bedgraph, and GFF files that have been sorted, compressed, and indexed using the bgzip and tabix utilities. Doing this helps speed up data loading in IGB and allows loading data by region. For an example of how we used tabix to distribute coverage and junction files from an RNA-Seq data set, see Creating a new genome release for IGB QuickLoad from the IGB Developer's Guide.

Step Seven: Add sequence data (if working with a new genome).

You may not need to do this if you are working with a genome that other QuickLoad sites support. IGB may be able to retrieve sequence data for your genome from other sites if it can find another site that supports the same genome version and contains sequence data. However, if you are working with a newly sequenced genome or don't want to use other groups' servers, you can support IGB's sequence visualization functionality by setting up your own sequences to distribute.

To do this, you will need to obtain sequence files and convert them to a binary format (twobit) that allows IGB to quickly obtain all or part of the sequence data when users press the Get Sequence button, run a blast search, or open the Sequence Viewer window.

The twobit format was first developed for the blat cDNA-to-genome alignment tool. IGB uses 2bit to support efficient retrieval of sequence data. To convert a fasta format file to twobit, use faToTwoBit, available from http://hgdownload.cse.ucsc.edu/admin/exe/.

To convert a fasta file to 2bit, do something like this:

$ faToTwoBit A_thaliana_Jun_2009.fa A_thaliana_Jun_2009.2bit

Note: The 2bit sequence file should have the same name as your genome version directory and should have file extension "2bit," e.g., A_thaliana_Jun_2009.2bit.

However, before you create the sequence files, check that the names of sequences in your fasta file match the names used in your annotation files. For example, if your annotation file lists sequence name chr1, then the fasta file should contain a sequence record with fasta header ">chr1" and nothing else. If they don't match, you'll need to either edit your fasta headers or create a sequence synonyms file.
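The name check described above can be scripted. This is a minimal sketch, assuming the annotations are in BED format with the sequence name in the first column; the function names are hypothetical. The whole header text after ">" is used as the sequence name, so headers carrying extra description text show up as mismatches.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA headers (full text after '>')."""
    return {line[1:].strip() for line in lines if line.startswith(">")}


def bed_names(lines):
    """Collect sequence names from the first column of BED lines."""
    names = set()
    for line in lines:
        # Skip blank lines, comments, and track/browser header lines.
        if not line.strip() or line.startswith(("#", "track ", "browser ")):
            continue
        names.add(line.split()[0])
    return names


def unmatched(annotation_names, sequence_names):
    """Return annotation sequence names that have no FASTA record."""
    return sorted(annotation_names - sequence_names)
```

Any name returned by unmatched needs either a corrected fasta header or an entry in a sequence synonyms file.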

Note: You can make your genome.txt file (Step Four above) using twoBitInfo, another program available from http://hgdownload.cse.ucsc.edu/admin/exe/. To create a genome.txt file from a 2bit sequence file, do something like this:

$ twoBitInfo A_thaliana_Jun_2009.2bit genome.txt

Step Eight: Add your new QuickLoad site to IGB.

Tell IGB to use your new QuickLoad site.

Follow the directions in Adding and Managing Data Source Servers. To add the local server, click the "..." button and select the folder which contains the contents.txt file, e.g. "Quickload servers".
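Before pointing IGB at a local folder, it can help to confirm that the pieces from the earlier steps are in place. The sketch below is not part of IGB and the function name check_site is hypothetical; it only checks that contents.txt exists and that each genome directory it lists contains genome.txt and annots.xml.

```python
from pathlib import Path


def check_site(root):
    """Return a list of problems found in a local QuickLoad directory."""
    root = Path(root)
    contents = root / "contents.txt"
    if not contents.is_file():
        return ["contents.txt is missing from " + str(root)]
    problems = []
    for line in contents.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        genome = line.split("\t")[0].strip()  # first column: directory name
        genome_dir = root / genome
        if not genome_dir.is_dir():
            problems.append(f"{genome}: directory not found")
            continue
        for required in ("genome.txt", "annots.xml"):
            if not (genome_dir / required).is_file():
                problems.append(f"{genome}: {required} not found")
    return problems
```

An empty list suggests the folder has the basic layout IGB expects; it does not verify the contents of the individual files.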
