
Genomes and chromosomes

All data viewed in IGB, regardless of its source, is organized into distinct genomes and chromosomes. 

In IGB, a chromosome refers to any single sequence. Often this is the sequence of an actual chromosome, but it may also be an assembled contig, a BAC, or any other DNA sequence. All chromosomes in IGB are assumed to be DNA rather than RNA sequences.

A genome refers to any group of these so-called chromosomes.  

For example, NCBI versions 35 and 36 of the human genome are considered to be two separate genomes. Each one contains multiple chromosome sequences, including the expected chromosomes 1 to 22, X, and Y. Other sequences, such as "chr22_random", are also considered distinct chromosomes for the purposes of display in IGB.

Each sequence in IGB is identified by the combination of its genome name and chromosome name, so that combination must be unique: no two genomes can have the same name, and no two chromosomes within one genome can have the same name. Chromosomes in different genomes, however, often do share a name.
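To make the identification rule concrete, here is a minimal Python sketch of a hypothetical registry (not IGB's actual data model) that keys each sequence by its (genome, chromosome) pair and rejects a duplicate chromosome name within one genome. The class name GenomeRegistry is illustrative only, and H_sapiens_Mar_2006 is used here simply as an example name for NCBI build 36.

```python
class GenomeRegistry:
    """Hypothetical registry: each sequence is identified by (genome, chromosome)."""

    def __init__(self) -> None:
        self._keys: set[tuple[str, str]] = set()

    def add(self, genome: str, chromosome: str) -> None:
        key = (genome, chromosome)
        # The same chromosome name may appear in different genomes,
        # but not twice within one genome.
        if key in self._keys:
            raise ValueError(f"{chromosome!r} already exists in genome {genome!r}")
        self._keys.add(key)

    def chromosomes(self, genome: str) -> list[str]:
        """Sorted chromosome names belonging to one genome."""
        return sorted(c for g, c in self._keys if g == genome)


registry = GenomeRegistry()
registry.add("H_sapiens_May_2004", "chr22")
registry.add("H_sapiens_May_2004", "chr22_random")
registry.add("H_sapiens_Mar_2006", "chr22")  # same name, different genome: allowed

print(registry.chromosomes("H_sapiens_May_2004"))  # ['chr22', 'chr22_random']
```

Adding ("H_sapiens_May_2004", "chr22") a second time would raise ValueError, mirroring the rule that chromosome names must be unique within a genome.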

Naming a genome

If you are building a genome for display in IGB, we recommend giving it a standard name that combines the genus and species with the month and year of release, following the pattern G_species_mon_yyyy, where G is the first letter of the genus, mon is the three-letter English abbreviation for the month the genome was released, and yyyy is the four-digit year of the release. For example:

  • A_thaliana_Jun_2009
  • A_mellifera_Jan_2005
  • H_sapiens_Feb_2009
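A short Python sketch can build names in this form and check whether an existing name follows it. The helper names genome_name and is_standard_name are hypothetical, and restricting the species part to lower-case letters a-z is an assumption made for this example, not a documented IGB rule.

```python
import re

# Three-letter English month abbreviations, in calendar order.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# G_species_mon_yyyy: upper-case genus initial, lower-case species,
# month abbreviation, four-digit year (species limited to a-z by assumption).
NAME_PATTERN = re.compile(r"([A-Z])_([a-z]+)_(" + "|".join(MONTHS) + r")_(\d{4})")


def genome_name(genus: str, species: str, month: int, year: int) -> str:
    """Build a standard genome name from genus, species, month (1-12), and year."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"{genus[0].upper()}_{species.lower()}_{MONTHS[month - 1]}_{year:04d}"


def is_standard_name(name: str) -> bool:
    """True if the whole name follows the G_species_mon_yyyy pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


print(genome_name("Arabidopsis", "thaliana", 6, 2009))  # A_thaliana_Jun_2009
print(is_standard_name("H_sapiens_Feb_2009"))           # True
print(is_standard_name("hg17"))                         # False
```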

Using this scheme ensures that IGB lists the latest genome first in the genome menu (under the Data Access tab) and, even better, lets users view data from multiple data providers together in the same view.
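Because the release month and year are encoded in the name, versions of a genome can be ordered by date mechanically. The Python below is a hypothetical sketch of such an ordering (IGB's own menu code may group and sort differently); release_date and newest_first are illustrative names, not IGB functions.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def release_date(name: str) -> date:
    """Parse the release month and year from a G_species_mon_yyyy genome name."""
    # Split from the right so only the last two fields are read as mon and yyyy.
    _prefix, mon, yyyy = name.rsplit("_", 2)
    return date(int(yyyy), MONTHS.index(mon) + 1, 1)


def newest_first(names: list[str]) -> list[str]:
    """Order genome names by release date, most recent first."""
    return sorted(names, key=release_date, reverse=True)


print(newest_first(["H_sapiens_May_2004", "H_sapiens_Feb_2009", "H_sapiens_Mar_2006"]))
# ['H_sapiens_Feb_2009', 'H_sapiens_Mar_2006', 'H_sapiens_May_2004']
```

Names that do not follow the pattern (for example "hg17") make release_date raise ValueError, which is one practical reason to stick to the standard form.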

Synonyms

Unfortunately, different groups often refer to the same genome or chromosome by different names. For example, NCBI human genome build 35 is also known as hg17 and ensembl1834, as well as H_sapiens_May_2004. When IGB is able to recognize that two names refer to the same genome or chromosome, it merges the data; otherwise it keeps the two data sets distinct. Currently, IGB uses a simple table of synonyms to store these associations. You can create your own set of synonyms to extend this table if needed.
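The merging behavior can be pictured as a lookup from every known name to one canonical name. The Python below is a minimal sketch under stated assumptions: the in-memory SYNONYM_ROWS table, the choice of the first name in each row as canonical, and the data set file names are all hypothetical, and IGB's actual synonym file format is not shown here.

```python
from collections import defaultdict

# Hypothetical synonym table: each row lists names for the same genome.
# The first entry in a row is treated as the canonical name (an assumption).
SYNONYM_ROWS = [
    ["H_sapiens_May_2004", "hg17", "ensembl1834"],
]


def build_lookup(rows: list[list[str]]) -> dict[str, str]:
    """Map every synonym (including the canonical name itself) to its canonical name."""
    lookup = {}
    for row in rows:
        canonical = row[0]
        for name in row:
            lookup[name] = canonical
    return lookup


def merge_by_genome(datasets: list[tuple[str, str]], lookup: dict[str, str]) -> dict[str, list[str]]:
    """Group (genome name, data set) pairs; unrecognized names stay distinct."""
    merged = defaultdict(list)
    for genome, data in datasets:
        merged[lookup.get(genome, genome)].append(data)
    return dict(merged)


lookup = build_lookup(SYNONYM_ROWS)
datasets = [("hg17", "genes.bed"), ("H_sapiens_May_2004", "tiling.bar"), ("hg18", "reads.bam")]
print(merge_by_genome(datasets, lookup))
# {'H_sapiens_May_2004': ['genes.bed', 'tiling.bar'], 'hg18': ['reads.bam']}
```

Here hg18 has no entry in the sketch's table, so its data stays separate; the same kind of lookup could be applied to chromosome names.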

Annotations, sequences, graphs, and alignments

IGB can work with four distinct types of data: annotations, alignments (typically from Illumina sequencing experiments), graphs, and genomic sequences. Some features of the program make sense only for certain of these data types.

Annotations indicate the known or suspected locations of genomic landmark features, such as genes, exons, promoter regions, pseudogenes, and so forth. Alignments of EST sequences, GeneChip probe sequences, and other sequences onto chromosomes are also sometimes represented as annotations, particularly when they don't include the sequence of the aligned entity. Annotation data can be loaded from files, QuickLoad, and DAS servers.
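As an illustration of what an annotation carries, the Python below sketches a hypothetical record (not IGB's internal class) for a feature with a location and child features such as exons, plus an overlap test of the kind a viewer needs to decide what falls inside the visible region. The 0-based, half-open coordinate convention and the feature names are assumptions for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical annotation record: a named feature on one chromosome."""
    name: str
    chromosome: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed 0-based, exclusive
    children: list["Annotation"] = field(default_factory=list)  # e.g. exons

    def overlaps(self, chromosome: str, start: int, end: int) -> bool:
        """True if this feature intersects the half-open region [start, end)."""
        return self.chromosome == chromosome and self.start < end and start < self.end


gene = Annotation("geneA", "chr1", 1000, 5000, children=[
    Annotation("geneA.exon1", "chr1", 1000, 1500),
    Annotation("geneA.exon2", "chr1", 4200, 5000),
])

# Which exons of geneA fall inside a view of chr1:2000-4500?
visible = [exon.name for exon in gene.children if exon.overlaps("chr1", 2000, 4500)]
print(visible)  # ['geneA.exon2']
```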

Sequences are the DNA residues (bases) that make up a chromosome. Sequences can be loaded from files, QuickLoad, and DAS servers. Because a whole chromosome can contain many millions of bases, it is a good idea to load sequence data for only small regions of the genome at a time.

Graphs indicate scores or other numeric values as a function of genomic position. Graphs are generally displayed as some form of plot (x-y plot, bar plot, etc.), and the results from tiling arrays are generally represented as graphs. There are two types of graph data: point-based graphs, in which each numeric value is associated with a single base position, and interval graphs, in which each value is associated with a range of genomic positions.
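The difference between the two graph types shows up in how a value is looked up. The Python below is a sketch using hypothetical toy data and helper names: a point-based graph is held as sorted positions with parallel scores, and an interval graph as sorted, non-overlapping (start, end, score) ranges with an exclusive end. These layouts are assumptions for illustration, not IGB's file formats.

```python
from bisect import bisect_left, bisect_right


def points_in_window(positions: list[int], scores: list[float], start: int, end: int):
    """Point-based graph: (position, score) pairs with start <= position < end.

    `positions` must be sorted ascending; scores[i] belongs to positions[i]."""
    lo = bisect_left(positions, start)
    hi = bisect_left(positions, end)
    return list(zip(positions[lo:hi], scores[lo:hi]))


def interval_value_at(intervals: list[tuple[int, int, float]], pos: int):
    """Interval graph: score of the range covering `pos`, or None if uncovered.

    `intervals` is sorted by start and non-overlapping; ends are exclusive."""
    starts = [s for s, _, _ in intervals]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and pos < intervals[i][1]:
        return intervals[i][2]
    return None


positions, scores = [100, 110, 120, 130], [0.5, 1.2, 0.8, 2.0]
print(points_in_window(positions, scores, 105, 125))  # [(110, 1.2), (120, 0.8)]

intervals = [(100, 150, 3.0), (150, 400, 0.4)]
print(interval_value_at(intervals, 149))  # 3.0
print(interval_value_at(intervals, 500))  # None
```

Binary search keeps both lookups fast on large, sorted data, which matters when a graph covers a whole chromosome.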

Alignments represent how sequences (such as short reads from an RNA-Seq experiment) align onto the reference genomic sequence. At low zoom, they look like regular annotations, but with marks representing mismatches whenever these data are available. At high zoom, they show the sequence of the aligned read and sometimes indicate scores and the degree of agreement with the reference sequence. These are typically loaded from BAM (Binary Alignment/Map) files.
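The mismatch marks and the degree of agreement come from comparing each read base with the reference base it aligns to. The Python below is a simplified sketch that assumes an ungapped alignment at a known 0-based offset; real BAM records describe insertions and deletions with a CIGAR string, which this example does not handle. The helper names are hypothetical.

```python
def mismatch_positions(reference: str, read: str, offset: int) -> list[int]:
    """Reference coordinates where an ungapped read disagrees with the reference.

    `offset` is the assumed 0-based reference position of the read's first base."""
    if offset < 0 or offset + len(read) > len(reference):
        raise ValueError("read does not fit within the reference")
    window = reference[offset:offset + len(read)]
    return [offset + i
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base.upper() != read_base.upper()]


def identity(reference: str, read: str, offset: int) -> float:
    """Fraction of read bases that agree with the reference."""
    if not read:
        raise ValueError("read is empty")
    return 1 - len(mismatch_positions(reference, read, offset)) / len(read)


reference = "ACGTACGTAC"
read = "GTTCG"
print(mismatch_positions(reference, read, 2))  # [4]
print(identity(reference, read, 2))            # 0.8
```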
