The following instructions assume that you are not creating a new genome version directory and that all you need to do is update the RefGen, mRNA, and EST files.
IGB QuickLoad is configured so that a set of foundation gene annotations (gene models) loads into IGB as soon as the user selects the corresponding genome version.
This is configured through the annots.xml file that resides in every genome version directory. Any data set with the attribute "load_model" set to "Whole Genome" will load into IGB automatically.
For genomes harvested from UCSC, these foundation gene annotations typically come from the UCSC refGene table. If a genome does not have a refGene table, we instead use ensGene or whichever other gene data set looks the most complete and the most useful.
Install the required programs (such as tabix, bgzip, and wget) in a directory in your PATH.
Add the following line to your .bash_profile file:
export PATH=.:$HOME/bin:$PATH
Then save all your downloaded or compiled programs (like tabix, bgzip, wget) to bin in your home directory.
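To confirm the PATH setup worked, a small check along these lines may help. This is only a sketch: the function name check_tools is hypothetical, and the list of tools passed to it is an assumption based on the programs named above.

```shell
# Report every program in the argument list that cannot be found on the PATH.
# check_tools is a hypothetical helper, not part of QuickLoad.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    # Exit status 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

After opening a new terminal, run it as: check_tools tabix bgzip wget svn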
$ svn co https://svn.transvar.org/repos/genomes/pub/quickload quickload
$ svn co https://svn.transvar.org/repos/genomes/pub/src quickload_src
Add quickload_src to your PATH, as it contains a Python script you'll use to create BED detail files from ordinary BED files.
If you already have a copy, just update using svn up.
Go to http://genome.ucsc.edu/cgi-bin/hgTables
Configure Table Browser with the following settings:
UCSC data set file names saved in IGB QuickLoad always consist of the IGB genome version name, followed by an underscore character, followed by the UCSC table name. The title field in the annots.xml file should always be the UCSC track name, because that is what users will recognize from having used the UCSC Genome Browser.
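The naming rule can be sketched as a one-line shell function. The function name quickload_file_name and the default bed.gz extension are assumptions made for illustration; only the genome_table pattern comes from the rule above.

```shell
# Build a QuickLoad file name: <genome version>_<UCSC table>.<extension>
# quickload_file_name is a hypothetical helper; the extension defaults to bed.gz.
quickload_file_name() {
    printf '%s_%s.%s\n' "$1" "$2" "${3:-bed.gz}"
}
```

For example, quickload_file_name G_species_MMM_YYYY refGene prints G_species_MMM_YYYY_refGene.bed.gz.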
$ wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
$ wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
For example, do something like this:
$ ucscToBedDetail.py ~/Downloads/G_species_MMM_YYYY_refGene.bed.gz G_species_MMM_YYYY_refGene.bed
This will create a new BED detail file (G_species_MMM_YYYY_refGene.bed) in the current working directory.
Note that for this to work, gene2accession.gz and gene_info.gz must be in your current working directory.
You can also provide these files using options -a and -g if you have saved them to a different location. See the script documentation for details.
$ sort -k1,1 -k2,2n G_species_MMM_YYYY_refGene.bed | bgzip > G_species_MMM_YYYY_refGene.bed.gz
$ tabix -p bed G_species_MMM_YYYY_refGene.bed.gz
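If tabix complains, it can help to confirm that the compressed file really is sorted by sequence name and start position. The following is a minimal sketch that reuses the same sort keys with the standard sort -c (check only) option. The function name is_sorted_bed is hypothetical, and the check should be run in the same locale that was used for sorting.

```shell
# Succeed only if the compressed BED file is already ordered by
# column 1 (sequence name) and then column 2 (start, numeric).
# is_sorted_bed is a hypothetical helper, not part of QuickLoad.
is_sorted_bed() {
    gunzip -c "$1" | sort -c -k1,1 -k2,2n 2>/dev/null
}
```

Usage: is_sorted_bed G_species_MMM_YYYY_refGene.bed.gz && echo sorted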
$ gunzip -c G_species_MMM_YYYY_refGene.bed.gz | wc -l
The wc command should print the number of lines in the file, which should equal the number of rows in the corresponding refGene table. To find out how many rows the refGene table contains, click "describe table schema" in the Table Browser.
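The comparison can also be scripted. In this sketch the expected row count is typed in by hand after reading it from the Table Browser; the function name check_row_count is hypothetical.

```shell
# Compare the line count of a compressed BED file with an expected row count.
# $1 = compressed file, $2 = row count read from the Table Browser (entered by hand).
# check_row_count is a hypothetical helper, not part of QuickLoad.
check_row_count() {
    # tr strips the padding that some wc implementations add.
    actual=$(gunzip -c "$1" | wc -l | tr -d ' ')
    if [ "$actual" -eq "$2" ]; then
        echo "OK: $actual rows"
    else
        echo "MISMATCH: file has $actual rows, expected $2"
        return 1
    fi
}
```

Usage: check_row_count G_species_MMM_YYYY_refGene.bed.gz N, where N is the row count reported for the table.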
Open the file in IGB and change the load mode to whole genome. Click the genome row in the Current Sequence table. You should see data above every chromosome.
If some chromosomes have no data, go back to the table browser and use the region text area to confirm there was no data for the given chromosome.
Make sure the number of chromosomes listed in the file does not exceed the number of chromosomes listed in the genome descriptor file genome.txt:
$ gunzip -c G_species_MMM_YYYY_refGene.bed.gz | cut -f1 | sort | uniq | wc -l
$ wc -l genome.txt
The first command counts the number of unique sequences appearing in the first column of the BED file. The second command counts the number of lines in the genome.txt file.
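If the counts look wrong, listing the offending names is more informative than comparing totals. The sketch below prints every sequence name that appears in the BED file but not in genome.txt. It assumes genome.txt is tab-delimited with the sequence name in its first column; the function name extra_sequences is hypothetical.

```shell
# Print sequence names found in the compressed BED file but absent from genome.txt.
# $1 = compressed BED file, $2 = genome.txt
# Assumption: genome.txt is tab-delimited with the sequence name in column 1.
# extra_sequences is a hypothetical helper, not part of QuickLoad.
extra_sequences() {
    bed_names=$(mktemp)
    genome_names=$(mktemp)
    gunzip -c "$1" | cut -f1 | sort -u > "$bed_names"
    cut -f1 "$2" | sort -u > "$genome_names"
    # comm -23 keeps only the lines unique to the first (sorted) file.
    comm -23 "$bed_names" "$genome_names"
    rm -f "$bed_names" "$genome_names"
}
```

Usage: extra_sequences G_species_MMM_YYYY_refGene.bed.gz genome.txt. No output means every sequence in the BED file is listed in genome.txt.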
The annots.xml file description for the refGene annotations contains the date the data were downloaded. Edit the file accordingly to reflect today's date.
Use svn status command to double-check which files you've changed:
$ svn st
It should print something like:
M G_species_MMM_YYYY_refGene.bed.gz
M G_species_MMM_YYYY_refGene.bed.gz.tbi
M annots.xml
M stands for "modified".
These are the only files that should have changed. If others are different, something has gone wrong.
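One way to spot unexpected changes is to filter the svn status output and show only the lines that do not refer to the three expected files. This is a sketch under two assumptions: the path is the last whitespace-separated field of each status line, and file names contain no spaces. The function name unexpected_changes is hypothetical.

```shell
# Read "svn st" output on standard input and print every line whose path
# is not one of the three files expected to change.
# $1 = file name prefix, e.g. G_species_MMM_YYYY_refGene
# unexpected_changes is a hypothetical helper, not part of QuickLoad.
unexpected_changes() {
    awk -v p="$1" '{
        path = $NF
        if (path != p ".bed.gz" && path != p ".bed.gz.tbi" && path != "annots.xml")
            print
    }'
}
```

Usage: svn st | unexpected_changes G_species_MMM_YYYY_refGene. Empty output means only the expected files were touched.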
Check in the files one-by-one:
$ svn ci G_species_MMM_YYYY_refGene.bed.gz -m "Enter a message here."
$ svn ci G_species_MMM_YYYY_refGene.bed.gz.tbi -m "Enter a message here."
$ svn ci annots.xml -m "Enter a message here."