Introduction

IGB QuickLoad (QL) is a simple file-based system for users to access annotation, alignment, or sequence data.

Any file that IGB can open (using the File menu) can be accessed from a QL site.

QuickLoad sites can reside on your computer hard drive or on a remote Web (http) or FTP site.

A QL site contains

  • meta-data files describing the available genome versions and annotations (for example, contents.txt, genome.txt, and annots.xml)
  • data files, including sequence data, alignments, genome graphs, and annotations

You can set up a QL site on your local computer or on a web server.

To view an example, see the IGB QuickLoad site.

How to Set Up a QuickLoad Site

Step One: Create a QuickLoad root directory.

Create a folder on your local computer or on a Web server. This folder will contain your genome directories and a "meta-data" file called contents.txt.

If you are hosting a QuickLoad site using the Apache Web server, you can require a user name and password from anyone who accesses the data by adding an .htaccess file to the top-level directory.
See: http://httpd.apache.org/docs/2.0/howto/auth.html.

Step Two: Create one or more genome directories.

Create genome directories for each of the genome versions you want to make available via QuickLoad.

The names of these directories should follow the IGB format of:

  • (First letter of the genus, capitalized)_(species, all lower case)_(3 letters, typically the code for the month, first letter capitalized)_(4 digits, the year), with the parts joined by underscores.

For example, A_thaliana_Jun_2009 is the TAIR9 genome assembly, released in June 2009.

You can also indicate subspecies or cultivated varieties by including additional suffixes.

For example, two Oryza sativa (rice) subspecies are in wide cultivation: japonica and indica. The first rice sequence published was from the Nipponbare variety, a member of the japonica subspecies. Because an indica genome sequence is now also available, we need names that distinguish the two. We designate assemblies of the original genome sequence using the prefix O_sativa_japonica followed by the month and year of release, as in O_sativa_japonica_Jun_2009. Annotations and sequence data reside in the genome version folder O_sativa_japonica_Jun_2009 on the main IGB QuickLoad site. To view its contents, visit http://igbquickload.org/quickload/O_sativa_japonica_Jun_2009
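The naming convention above can be checked mechanically. The following is a minimal sketch, assuming the parts are joined by underscores as in the examples and that subspecies suffixes are lower case; the pattern and the function name is_valid_genome_dir are illustrative and are not part of IGB.

```python
import re

# Pattern inferred from the naming convention described above (an assumption,
# not an official IGB check).
GENOME_DIR = re.compile(
    r"^[A-Z]"            # first letter of the genus, capitalized
    r"_[a-z]+"           # species, all lower case
    r"(?:_[a-z0-9]+)*"   # optional subspecies or variety suffixes
    r"_[A-Z][a-z]{2}"    # three-letter month code, first letter capitalized
    r"_\d{4}$"           # four-digit year
)


def is_valid_genome_dir(name: str) -> bool:
    """Return True if the directory name follows the convention sketched above."""
    return GENOME_DIR.match(name) is not None
```

With this pattern, A_thaliana_Jun_2009 and O_sativa_japonica_Jun_2009 are accepted, while a_thaliana_jun_2009 is rejected.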

Tip: If you set up your QuickLoad site in the Web directories of an Apache server, you can modify and fine-tune how files are displayed by adding directives to the .htaccess file in that directory.  You can configure Apache to allow users to view the contents of your QuickLoad directory and include a short description of each file or data set. Here is an example:

Options Indexes
IndexOptions FancyIndexing IgnoreCase FoldersFirst DescriptionWidth=*
AddDescription "Arabidopsis thaliana TAIR9 genome release" A_thaliana_Jun_2009
AddDescription "Arabidopsis thaliana TAIR8 genome release" A_thaliana_Apr_2008
AddDescription "Arabidopsis thaliana TIGRv5/TAIR7 genome release" A_thaliana_Jan_2004
AddDescription "TAIR9 annotation file" TAIR9_*.gz
AddDescription "Annotation file list" annots.[txt|xml]
AddDescription "Chromosome lengths and assembly information" genome.txt
AddDescription "Available genome assemblies" contents.txt
AddDescription "Mitochondrial Genome (Fasta format)" [Cc]hrM.fa*
AddDescription "Chloroplast Genome (Fasta format)" [Cc]hrC.fa*

Step Three: Create contents.txt file.

Create a simple, plain text file called contents.txt and add it to the top-level directory.

This file should be plain text, tab-delimited; all columns must be separated by a tab character!

The first column should list names of the genome directories you have created and want to share.

The second column is optional - it contains text IGB will display on the title bar when you open IGB.

Note: You can include other directories and other files in your QL site folder;  IGB will simply ignore anything that isn't listed in your contents.txt file. Also, any changes in the name of the genome directory must be updated in the contents.txt file or IGB will not recognize it.
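As an illustration of the two-column, tab-delimited layout described above, here is a small Python sketch that writes and reads contents.txt text. The function names and the titles used in any example data are hypothetical; only the layout (directory name, optional title, separated by a tab) comes from the description above.

```python
def format_contents(entries):
    """Build contents.txt text from (directory, title) pairs.

    The title is optional; pass None or "" to leave the second column out.
    """
    lines = []
    for directory, title in entries:
        lines.append(f"{directory}\t{title}" if title else directory)
    return "\n".join(lines) + "\n"


def parse_contents(text):
    """Map each genome directory name to its title (None when absent)."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        columns = line.split("\t")
        has_title = len(columns) > 1 and columns[1]
        result[columns[0]] = columns[1] if has_title else None
    return result
```

Generating the file from a script rather than by hand is one way to avoid accidentally typing spaces where tabs are required.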

Optional Step: Create a synonyms.txt file.

This file lists synonyms for genome names, which lets IGB match names across different QuickLoad sites and other data sources. For example, if a DAS1 or DAS2 data source (such as those hosted at Ensembl or UCSC) uses a different name for the same genome version, you can list that name in synonyms.txt. Each line contains any number of synonyms for one genome, separated by tabs.

Here is an example: http://igbquickload.org/quickload/synonyms.txt
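To make the one-genome-per-line layout concrete, the sketch below parses synonyms text into a lookup table. It assumes only what is stated above (tab-separated names, one genome per line); the function name parse_synonyms is hypothetical, and this is not how IGB itself reads the file.

```python
def parse_synonyms(text):
    """Map every name to the full set of names that share its line."""
    lookup = {}
    for line in text.splitlines():
        names = [name.strip() for name in line.split("\t") if name.strip()]
        if not names:
            continue  # skip blank lines
        group = frozenset(names)
        for name in names:
            lookup[name] = group
    return lookup
```

Any name on a line can then be used to find all the others, which is the matching behavior the synonyms file is meant to support.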

Step Four: Create genome.txt files.

For each genome directory, create a genome.txt (formerly, mod_chromInfo.txt) file that lists all the chromosomes in your genome, together with their sizes.

As with the contents.txt file, it should be tab delimited:

The first column lists the chromosome names and the second column lists their sizes. IGB uses this file to create the sequence selection table under the Data Access Panel.

Here is an example: http://igbquickload.org/quickload/A_thaliana_Jun_2009/genome.txt

Note: You can create your genome.txt file from a sequence 2bit file (see below) using twoBitInfo, available from http://hgdownload.cse.ucsc.edu/admin/exe/. See Step Seven: Add sequence data.
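If twoBitInfo is not at hand, the same two columns can be computed directly from a FASTA file. The sketch below is an alternative to the twoBitInfo route, not a description of it; the function names are hypothetical, and it assumes the sequence name is the first word of each FASTA header.

```python
def fasta_lengths(lines):
    """Return (name, length) pairs, in file order, from FASTA lines."""
    sizes = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                sizes.append((name, length))
            fields = line[1:].split()
            # Assumption: the sequence name is the first word of the header.
            name, length = (fields[0] if fields else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes


def format_genome_txt(sizes):
    """Format (name, length) pairs as tab-delimited genome.txt text."""
    return "".join(f"{name}\t{length}\n" for name, length in sizes)
```

For a real genome, pass an open file object as lines so that the sequence is streamed rather than read into memory at once.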

Optional: Create liftall.lft file.

This describes how contigs are assembled into chromosomes.  

Each line contains: CONTIG_START tab CONTIG_NAME tab CONTIG_LENGTH tab CHROMOSOME_NAME tab CHROMOSOME_LENGTH
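The five fields can be read into a small record for sanity checking. This is a sketch only: the class and function names are hypothetical, and it assumes CONTIG_START is a zero-based offset within the chromosome, which the description above does not state.

```python
from typing import NamedTuple


class LiftEntry(NamedTuple):
    contig_start: int
    contig_name: str
    contig_length: int
    chromosome_name: str
    chromosome_length: int


def parse_lift_line(line):
    """Parse one tab-delimited liftall.lft line into a LiftEntry."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    start, contig, contig_len, chrom, chrom_len = fields
    return LiftEntry(int(start), contig, int(contig_len), chrom, int(chrom_len))


def fits_in_chromosome(entry):
    """Check that the contig lies within the chromosome (assumes 0-based start)."""
    end = entry.contig_start + entry.contig_length
    return entry.contig_start >= 0 and end <= entry.chromosome_length
```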

Step Five: Create annots.xml files.

If you have any annotations for your genomes, you can make it possible for IGB to display them in the Data Access Panel by listing them in the annots.xml file.

Here is an example:

<files>
<file name="TAIR10.bed.gz" title="TAIR10 ALL" description="All TAIR10 genome models" label_field="id" background="000000" foreground="00FFFF" max_depth="10" name_size="12" connected="true"/>
</files>

When loaded into IGB, each file will create one or more tracks. (Some file types, such as BED and GFF3, can specify multiple tracks.)
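When a genome directory holds many files, it can be convenient to generate annots.xml rather than type it. The following sketch uses Python's standard xml.etree.ElementTree module; the function name build_annots_xml is hypothetical, and the element and attribute names are taken from the example above.

```python
import xml.etree.ElementTree as ET


def build_annots_xml(files):
    """Build annots.xml text from a list of attribute dictionaries.

    Each dictionary becomes one <file> element; values must be strings.
    """
    root = ET.Element("files")
    for attributes in files:
        ET.SubElement(root, "file", dict(attributes))
    return ET.tostring(root, encoding="unicode")


annots_text = build_annots_xml([
    {"name": "TAIR10.bed.gz", "title": "TAIR10 ALL",
     "description": "All TAIR10 genome models", "label_field": "id"},
])
```

ElementTree escapes characters such as & and quotation marks in attribute values, which is easy to get wrong when writing the file by hand.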

Annots.xml options - specify track color, annotation style, and more

An annots.xml file contains one or more file tags, enclosed in a files element. The file tag has attributes that will dictate how the data will look when displayed in IGB. Most of these options correspond to options users configure using the Tracks tab in the IGB Preferences window. (See File > Preferences > Tracks.)

Attribute (required or optional): description

  • name (required): The name of the file on your file system.
  • title (optional): User-friendly text IGB will show in the Data Access Panel as the title of the data set. If you don't provide a title, IGB will display the name of the file instead.
  • description (optional): IGB will display this text as a tooltip when users hover the mouse over the data set title in the Data Access tab.
  • url (optional): Use this attribute to specify the location of a Web page describing the data set. If provided, IGB will display an "info" icon next to the data set title. When users click the icon, their Web browser will open showing the contents of the URL you provide.
  • load_hint (optional): If used, its value should be "Whole Sequence". Using this attribute will force IGB to load the entire file when users select the genome version, which is usually appropriate only for reference gene model annotations. Use this only if your site is the sole provider of a particular genome's reference genome annotations.
  • label_field (optional): Use this field to indicate the annotation property (e.g., "score" or "id") that should be used in IGB to label individual annotations. For gene models, "id" is best.
  • background (optional): Use this field to define track background color (e.g. background="000000").
  • foreground (optional): Use this field to define annotation color (e.g. foreground="00FFFF").
  • max_depth (optional): Use this field to define the default max depth value, the number of annotations that can be shown individually (max_depth="10" or max_depth="0" for unlimited).
  • name_size (optional): Use this field to define the default track label name font size (e.g. name_size="12").
  • connected (optional): Use this field to define the default boolean value for the connected field (e.g. connected="true" or connected="false").
  • show2tracks (optional): Use this field to define the default boolean value for the show2tracks field (e.g. show2tracks="true" or show2tracks="false").
  • direction_type (optional): Use this field to dictate whether annotations or alignments will be shown using arrows and/or color to indicate direction (e.g. direction_type="arrow", direction_type="color", direction_type="both", direction_type="none").
  • positive_strand_color (optional): Use this field to define the default positive strand color (e.g. positive_strand_color="CCFFFF").
  • negative_strand_color (optional): Use this field to define the default negative strand color (e.g. negative_strand_color="33FFFF").
  • view_mode (optional): Use this field to define the default view mode (e.g. view_mode="default", view_mode="depth").
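Before publishing an annots.xml file, it may help to check it against the attributes listed above. The sketch below is an assumption-laden helper, not an IGB feature: the function name check_annots is hypothetical, the attribute list is copied from this page and may not be exhaustive, and the six-digit hexadecimal color format is inferred from the examples.

```python
import re
import xml.etree.ElementTree as ET

# Attribute names copied from the list above; IGB may accept others.
KNOWN_ATTRIBUTES = {
    "name", "title", "description", "url", "load_hint", "label_field",
    "background", "foreground", "max_depth", "name_size", "connected",
    "show2tracks", "direction_type", "positive_strand_color",
    "negative_strand_color", "view_mode",
}
COLOR_ATTRIBUTES = {
    "background", "foreground", "positive_strand_color", "negative_strand_color",
}
# Assumption: colors are six hexadecimal digits, as in the examples.
HEX_COLOR = re.compile(r"^[0-9A-Fa-f]{6}$")


def check_annots(xml_text):
    """Return a list of human-readable problems found in annots.xml text."""
    problems = []
    root = ET.fromstring(xml_text)
    for index, element in enumerate(root.iter("file"), start=1):
        if not element.get("name"):
            problems.append(f"file #{index}: missing required 'name' attribute")
        for key, value in element.attrib.items():
            if key not in KNOWN_ATTRIBUTES:
                problems.append(f"file #{index}: unrecognized attribute '{key}'")
            elif key in COLOR_ATTRIBUTES and not HEX_COLOR.match(value):
                problems.append(f"file #{index}: '{key}' is not a 6-digit hex color")
    return problems
```

An empty list means no problems were found under these assumptions; it does not guarantee that IGB will display the tracks as intended.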

Step Six: Add your annotation files.

Place your annotations into the appropriate genome sub-directories. They can be "gzipped" or not depending on your preference. You should be able to use any of the many formats IGB supports.  For annotation files, BED format is the most commonly used.

Starting with IGB 6.6, IGB QuickLoad can support BED, bedgraph, and GFF files that have been sorted, compressed, and indexed using the bgzip and tabix utilities. Doing this helps speed up data loading in IGB. For an example of how we used tabix to distribute coverage and junction files from a maize RNA-Seq data set, see Creating a new genome release for IGB QuickLoad from the IGB Developer's Guide.
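The sorting step can be sketched in Python as follows. This illustrates only the ordering that tabix expects for BED data (records grouped by sequence name and ordered by start position); the function name sort_bed_lines is hypothetical, and the sketch assumes the only non-data lines are comments beginning with "#". After sorting, the file would typically be compressed with bgzip and indexed with tabix -p bed.

```python
def sort_bed_lines(lines):
    """Sort BED records by sequence name, then by numeric start position.

    Comment lines beginning with "#" are kept at the top in their
    original order; blank lines are dropped.
    """
    comments, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            comments.append(line)
        else:
            records.append(line)

    def sort_key(record):
        fields = record.split("\t")
        return (fields[0], int(fields[1]))

    return comments + sorted(records, key=sort_key)
```

The start column is compared as a number, so a record starting at 100 sorts after one starting at 20, which a plain text sort would get wrong.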

Step Seven: Add sequence data.

You may not need to do this if you are working with a genome that other QuickLoad or DAS2 sites support. That is, IGB may be able to retrieve sequence data for your genome from other sites if it can recognize that your annotations belong to the same genome version as the other data providers it gets data from.  However, if you are working with a newly sequenced genome or don't want to use other groups' servers, you can support IGB's sequence visualization functionality by setting up your own sequences to distribute.

To do this, you will need to obtain sequence files and convert them to one of two specialized compressed binary formats (bnib or 2bit) that allow IGB to quickly obtain all or part of the sequence data when users press Get Sequence buttons.

We strongly recommend you use the UCSC Genome Bioinformatics 2bit format for distribution of sequence files. The 2bit format was first developed for the blat cDNA-to-genome alignment tool; IGB uses it to support efficient retrieval of sequence data. To convert a fasta format file to 2bit, use a program called faToTwoBit available from http://hgdownload.cse.ucsc.edu/admin/exe/.

To convert a fasta file to 2bit, do something like this:

$ faToTwoBit A_thaliana_Jun_2009.fa A_thaliana_Jun_2009.2bit

Note: The 2bit sequence file should have the same name as your genome version directory and should have file extension "2bit," e.g., A_thaliana_Jun_2009.2bit.

However, before you create the sequence files, check that the names of sequences in your fasta file match the names used in your annotation files. For example, if your annotations file lists sequence name chr1, then the fasta file should contain a sequence record with fasta header ">chr1" and nothing else. If they don't match, you'll need to either edit your fasta headers or create a sequence synonyms file.
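The name check described above is easy to automate. The sketch below compares FASTA header names with the first column of a BED-style annotation file; the function names are hypothetical, and it assumes tab-delimited annotations with the sequence name in the first column. Following the rule above, the whole header after ">" is treated as the name, so a header such as ">chr1 description" would not match chr1.

```python
def fasta_names(fasta_lines):
    """Collect sequence names: the full header text after '>'."""
    return {line.strip()[1:] for line in fasta_lines if line.startswith(">")}


def annotation_names(annotation_lines):
    """Collect sequence names from the first tab-delimited column."""
    names = set()
    for line in annotation_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split("\t")[0])
    return names


def unmatched_names(fasta_lines, annotation_lines):
    """Return annotation sequence names that have no matching FASTA record."""
    return sorted(annotation_names(annotation_lines) - fasta_names(fasta_lines))
```

An empty result means every sequence name used in the annotations has a matching FASTA header.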

Note: You can make your genome.txt file (Step Four above) using twoBitInfo, another program available from http://hgdownload.cse.ucsc.edu/admin/exe/. To create a genome.txt file from a 2bit sequence file, do something like this:

$ twoBitInfo A_thaliana_Jun_2009.2bit genome.txt

IGB can also use sequence files in the legacy IGB BNIB format. For more information on the BNIB format, see Converting FASTA to BNIB. Be sure that you name your files using the same names you listed in your genome.txt file.

Step Eight: Add your new QuickLoad site to IGB.

Tell IGB to use your new QuickLoad site.

Follow the directions in Adding and Managing Data Source Servers. To add a local QuickLoad site, click the "..." button and select the folder that contains the contents.txt file, e.g. "Quickload servers".
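Before pointing IGB at a new site, a quick layout check can catch missing files. The following sketch verifies only what the steps above require: a contents.txt file in the root directory, and a genome.txt file in each genome directory it lists. The function name check_site is hypothetical and the check is not part of IGB.

```python
from pathlib import Path


def check_site(root):
    """Return a list of layout problems found under a QuickLoad root directory."""
    root = Path(root)
    contents = root / "contents.txt"
    if not contents.is_file():
        return ["missing contents.txt"]
    problems = []
    for line in contents.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        genome = line.split("\t")[0]  # first column: genome directory name
        genome_dir = root / genome
        if not genome_dir.is_dir():
            problems.append(f"{genome}: directory not found")
        elif not (genome_dir / "genome.txt").is_file():
            problems.append(f"{genome}: missing genome.txt")
    return problems
```

The check covers local folders only; for a site hosted on a Web server, the same files would need to be requested over HTTP instead.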
