Introduction
This tutorial describes how to import sequence and annotation data into IGB when your genome of interest is not available from an IGB QuickLoad or REST data source.
In this tutorial, we will demonstrate importing a bacterial genome (E. coli) using files downloaded from NCBI. To view a custom genome and its annotations together in IGB, you will need to create a Synonym File.
Get genome data from NCBI
First, visit NCBI to retrieve the sequence data and annotations.
- Go to the NCBI GenBank record for E. coli K-12 substr. MG1655: http://www.ncbi.nlm.nih.gov/nuccore/NC_000913.3
Download annotations file
- Click Send to: (on the right side of the record)
- Select the following:
- Complete Record
- File
- GenBank(Full)
- Click Create File
Save the file to your computer; by default, it will be named "sequence.gb".
If you change the name, be sure to use the ".gb" file extension so that IGB can recognize the file and the file format.
Download sequence file
- Click FASTA (on the left side of the record)
- Click Send to:
- Select the following:
- Complete Record
- File
- FASTA
- Click Create File
Save the file to your computer; by default, it will be named "sequence.fasta".
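If you prefer to script the download, the same record can also be requested through NCBI's E-utilities efetch service. The following is a minimal sketch, not part of the IGB workflow itself: the function names efetch_url and download_record are hypothetical, and it assumes that the rettype values "gbwithparts" and "fasta" correspond to the GenBank(Full) and FASTA choices made above.

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype):
    """Build an efetch URL for a nucleotide record (hypothetical helper)."""
    query = urlencode({
        "db": "nuccore",      # the nucleotide database used in this tutorial
        "id": accession,      # e.g. "NC_000913.3"
        "rettype": rettype,   # assumed: "gbwithparts" or "fasta"
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


def download_record(accession, rettype, path):
    """Fetch the record over the network and save it to path."""
    urlretrieve(efetch_url(accession, rettype), path)
    return path
```

For example, download_record("NC_000913.3", "gbwithparts", "sequence.gb") and download_record("NC_000913.3", "fasta", "sequence.fasta") would save files under the same names used in this tutorial. The files are not guaranteed to be byte-identical to those created through the web page.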
Create Synonyms File
When IGB shows the "sequence.gb" file, it uses the 'LOCUS' name from that file. To show the "sequence.fasta" file, IGB uses the 'sequence name' that follows the '>'. Note that when files of different types (FASTA sequence files, annotation files, BAM/alignment files, etc.) use different labels for the exact same chromosome/sequence, IGB will treat each label as a separate chromosome, and the data will not be visualized together.
To overcome this, we enter 'synonyms'. IGB already has some internal synonyms; for example, "1" and "chr1" are equivalent. You will need to know the sequence names used in each of these files; if you are not sure, a quick way to find out is to drag all of the files into IGB at the same time.
Determine 'sequence names' in sequence and annotation files:
- Drag "sequence.fasta" and "sequence.gb" into IGB
- In the Current Genome tab, look under Sequence(s)
- Note that the sequence name from "sequence.fasta" is NC_000913.3 and the sequence name from "sequence.gb" is NC_000913
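The names can also be read directly from the files without opening IGB. The sketch below is an illustration with hypothetical function names; it assumes the name is the first whitespace-delimited token after '>' in a FASTA header and the second field of a GenBank LOCUS line, which matches the names listed above.

```python
def fasta_names(lines):
    """Return the name from each FASTA header line.

    Assumption: the name is the first token after '>'.
    """
    names = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names


def genbank_names(lines):
    """Return the name from each GenBank LOCUS line.

    Assumption: the name is the second whitespace-delimited field.
    """
    names = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "LOCUS":
            names.append(fields[1])
    return names
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example fasta_names(open("sequence.fasta")).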
Create synonym file
Now that we know the headers of the files, we will create a tab-delimited 'personal synonym' file:
- Open a Text Editor
- On the first line, type 1, then press tab
- Type chr1, then press tab
- 1 and chr1 are standard names for the first chromosome, and many files use these as their headers. We typically include both in a synonym file.
- Type in the header of the "sequence.gb" file, NC_000913, then press tab
- Type in the header of the "sequence.fasta" file, NC_000913.3
- Save this file with the name chromosome.txt
If you are making a synonym file for a multi-chromosomal organism, make a new line in the file for each chromosome and add all of the 'names' associated with that chromosome to its line (make sure that there is a 'tab' between each name!). If you later load a file that uses a new name, open chromosome.txt and add the name to the proper line.
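The same file can be generated with a short script, which helps avoid accidental spaces in place of tabs. This is a minimal sketch with hypothetical function names; it assumes only the layout described above: one line per chromosome, names separated by tabs.

```python
def synonym_line(names):
    """Join all names for one chromosome with tabs."""
    for name in names:
        if not name or "\t" in name or "\n" in name:
            raise ValueError(f"invalid synonym name: {name!r}")
    return "\t".join(names)


def render_synonyms(chromosomes):
    """Return the file text: one tab-delimited line per chromosome."""
    return "".join(synonym_line(names) + "\n" for names in chromosomes)


def write_synonyms(path, chromosomes):
    """Write the synonym text to path, e.g. "chromosome.txt"."""
    with open(path, "w", newline="\n") as handle:
        handle.write(render_synonyms(chromosomes))
```

For the E. coli example, write_synonyms("chromosome.txt", [["1", "chr1", "NC_000913", "NC_000913.3"]]) would produce the single line built by hand in the steps above. For a multi-chromosomal organism, pass one inner list per chromosome.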
Import Synonym File into IGB
In IGB:
- Click File > Preferences...
- Select the Data Sources tab
- Click the ellipsis (three dots) next to Chromosome Synonyms File
- Browse for the "chromosome.txt" file
- Restart IGB
Visualizing the Reference Sequence and Models
When you open the new instance of IGB, your synonyms will be loaded for you. At this point, we will open the files "sequence.fasta" and "sequence.gb" so you can begin analysis of your own data.
To open the sequence and annotation files:
- Click File > Open Genome from File...
- Browse for the "sequence.fasta" file
- Click OK
- Click File > Open File...
- Browse for the "sequence.gb" file
- Click the Load Data button
Both the reference sequence and gene models will load. You will be zoomed out, so the sequence will appear as a grey bar; as you zoom in, the colors and nucleotides will become visible.
Keep in mind that if the name of the chromosome(s) is different in your files, you will have to add that name to "chromosome.txt" and then reopen IGB so it can load in the new information.
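Before loading a new file, you can check whether its sequence names already appear in the synonym file. The sketch below uses hypothetical function names and assumes the tab-delimited, one-line-per-chromosome layout of "chromosome.txt" described earlier; it is a convenience check, not something IGB itself provides.

```python
def parse_synonyms(text):
    """Map each name to the set of names that share its line."""
    groups = {}
    for line in text.splitlines():
        names = [name.strip() for name in line.split("\t") if name.strip()]
        group = set(names)
        for name in names:
            groups[name] = group
    return groups


def missing_names(synonym_text, names):
    """Return the names that are not listed anywhere in the synonym text."""
    known = parse_synonyms(synonym_text)
    return [name for name in names if name not in known]
```

Any name returned by missing_names needs to be added to the proper line of "chromosome.txt" before you restart IGB.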