Introduction
The following instructions assume you are not going to make a new genome version directory and that all you need to do is update refGene, all_mrna and all_est files.
...
If you're doing this on a Mac desktop or laptop computer, create a directory called "bin" in your home directory and save all compiled binaries there. Edit your .bash_profile file to include a line like the following to ensure that the shell can find the programs.
Code Block |
---|
export PATH=.:$HOME/bin:$PATH
|
...
Check out or update a copy of IGB QuickLoad data and source code directories.
Use git to obtain a copy of genomes_src:
Code Block |
---|
$ git clone https://bitbucket.org/lorainelab/genomes_src
|
If you already have a copy, update it. Change into your local copy and run:
Code Block |
---|
$ git pull origin master
|
Use svn to get a copy of the QuickLoad data repository:
Code Block |
---|
$ svn co https://svn.transvar.org/repos/genomes/trunk/pub/src quickload_src
|
If you already have a copy, just update using svn up. Change into your checked-out, local copy and run:
Code Block |
---|
$ svn up
|
Add quickload_src to your PATH (to run the python code there)
Add quickload_src to your PATH because it contains a python script you'll use to create BED detail files from ordinary BED files. Edit the .bash_profile file as above:
Code Block |
---|
export PATH=.:$HOME/quickload_src:$HOME/bin:$PATH
|
...
Note |
---|
UCSC data set file names saved in IGB QuickLoad should always include the IGB genome version name, followed by an underscore character, followed by the UCSC table name. The title field in the annots.xml file should always be the UCSC track name because that is what users will recognize from having used the UCSC genome browser. |
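The naming rule can be captured in a small shell helper. This is only a sketch: the function name `quickload_file_name` is hypothetical and is not part of the QuickLoad source code.

```shell
# Hypothetical helper: build a QuickLoad file name from the IGB genome
# version name, the UCSC table name, and a file extension.
quickload_file_name() {
    genome_version=$1   # e.g. G_species_MMM_YYYY
    ucsc_table=$2       # e.g. refGene
    extension=$3        # e.g. bed.gz
    printf '%s_%s.%s\n' "$genome_version" "$ucsc_table" "$extension"
}

quickload_file_name G_species_MMM_YYYY refGene bed.gz
# prints G_species_MMM_YYYY_refGene.bed.gz
```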
...
Get gene info and accession info files from NCBI ftp site
Code Block |
---|
$ wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
$ wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
|
Create BED detail file with gene information
Use ucscToBedDetail.py (from https://bitbucket.org/lorainelab/genomes_src) to create a new BED file with gene symbol and description
For example, do something like this:
Code Block |
---|
$ ucscToBedDetail.py ~/Downloads/G_species_MMM_YYYY_refGene.bed.gz G_species_MMM_YYYY_refGene.bed
|
...
Sort, compress, and index
Code Block |
---|
$ sort -k1,1 -k2,2n G_species_MMM_YYYY_refGene.bed | bgzip > G_species_MMM_YYYY_refGene.bed.gz
$ tabix -s 1 -b 2 -e 3 G_species_MMM_YYYY_refGene.bed.gz
|
...
Check that it has data
Code Block |
---|
$ gunzip -c G_species_MMM_YYYY_refGene.bed.gz | wc -l
|
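If you want the check to fail loudly instead of reading the line count by eye, something like the following may help. It is a minimal sketch: the function name `has_data` is hypothetical, and the demo files are throwaway stand-ins compressed with gzip rather than bgzip.

```shell
# Hypothetical helper: succeed only if the compressed file has at least one line.
has_data() {
    # tr strips the padding that some wc implementations add
    n=$(gunzip -c "$1" | wc -l | tr -d ' ')
    [ "$n" -gt 0 ]
}

# Demo with throwaway files standing in for real QuickLoad data.
tmp=$(mktemp -d)
printf 'chr1\t100\t200\tgeneA\n' | gzip -c > "$tmp/demo.bed.gz"
: | gzip -c > "$tmp/empty.bed.gz"

has_data "$tmp/demo.bed.gz" && echo "demo.bed.gz: has data"
has_data "$tmp/empty.bed.gz" || echo "empty.bed.gz: EMPTY"
```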
...
Make sure the number of chromosomes listed in the file does not exceed the number of chromosomes listed in the genome descriptor file genome.txt:
Code Block |
---|
$ gunzip -c G_species_MMM_YYYY_refGene.bed.gz | cut -f1 | sort | uniq | wc -l
$ wc -l genome.txt
|
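Comparing counts will not tell you which sequence names are the problem. The sketch below lists names that appear in the data file but not in genome.txt. It assumes genome.txt is tab-delimited with the sequence name in the first column; the function name `missing_seqs` and the demo data are hypothetical.

```shell
# Hypothetical helper: print sequence names found in a given column of a
# compressed, tab-delimited file that do not appear in genome.txt.
# $1 = compressed data file, $2 = column holding the sequence name,
# $3 = genome.txt (assumed: sequence name in the first column)
missing_seqs() {
    gunzip -c "$1" | cut -f"$2" | sort -u |
        awk -F'\t' 'FILENAME == ARGV[1] { known[$1] = 1; next }
                    !($1 in known)' "$3" -
}

# Demo with throwaway files standing in for real QuickLoad data.
tmp=$(mktemp -d)
printf 'chr1\t1000\nchr2\t800\n' > "$tmp/genome.txt"
printf 'chr1\t10\t50\tgeneA\nchr2\t5\t90\tgeneB\nchrUn\t1\t9\tgeneC\n' |
    gzip -c > "$tmp/demo.bed.gz"

missing_seqs "$tmp/demo.bed.gz" 1 "$tmp/genome.txt"
# prints chrUn
```

The column argument is there so the same helper can be pointed at a PSL file, where the target sequence name is in column 14 rather than column 1.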
...
Use svn status command to double-check which files you've changed:
Code Block |
---|
$ svn st
|
It should print something like:
Code Block |
---|
M G_species_MMM_YYYY_refGene.bed.gz
M G_species_MMM_YYYY_refGene.bed.gz.tbi
M annots.xml
|
...
Check in the files one-by-one:
Code Block |
---|
$ svn ci G_species_MMM_YYYY_refGene.bed.gz -m "Enter a message here."
$ svn ci G_species_MMM_YYYY_refGene.bed.gz.tbi -m "Enter a message here."
$ svn ci annots.xml -m "Enter a message here."
|
...
Use grep, sort, and bgzip to make a PSL file:
Code Block |
---|
$ gunzip -c ~/Downloads/G_species_MMM_YYYY_all_mrna.psl.gz | grep -v '^#' | sort -k14,14 -k16,16n | bgzip > G_species_MMM_YYYY_all_mrna.psl.gz
|
Use tabix to index the sorted, compressed PSL file
Code Block |
---|
$ tabix -s 14 -b 16 -e 17 G_species_MMM_YYYY_all_mrna.psl.gz
|
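tabix expects its input to be sorted by sequence name and start position, so it can be worth confirming the sort order if indexing fails. The following is a rough sketch using `sort -c` with the same keys as the sort step; the function names `is_psl_sorted` and `psl_line` and the demo records are hypothetical, and the demo files are compressed with gzip rather than bgzip.

```shell
# Hypothetical helper: succeed only if the compressed PSL file is ordered
# by target name (column 14) and then target start (column 16).
is_psl_sorted() {
    gunzip -c "$1" | grep -v '^#' | sort -c -k14,14 -k16,16n 2>/dev/null
}

# Build a 21-column PSL-like record.
# $1 = target name (column 14), $2 = target start (column 16)
psl_line() {
    printf '0\t0\t0\t0\t0\t0\t0\t0\t+\tq\t0\t0\t0\t%s\t0\t%s\t0\t0\t0,\t0,\t0,\n' "$1" "$2"
}

# Demo with throwaway files standing in for real QuickLoad data.
tmp=$(mktemp -d)
{ psl_line chr1 100; psl_line chr1 200; psl_line chr2 50; } | gzip -c > "$tmp/sorted.psl.gz"
{ psl_line chr2 50; psl_line chr1 100; } | gzip -c > "$tmp/unsorted.psl.gz"

is_psl_sorted "$tmp/sorted.psl.gz" && echo "sorted.psl.gz: ok"
is_psl_sorted "$tmp/unsorted.psl.gz" || echo "unsorted.psl.gz: NOT sorted"
```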
...
Check that it has data
Code Block |
---|
$ gunzip -c G_species_MMM_YYYY_all_mrna.psl.gz | wc -l
|
...
Make sure the number of chromosomes listed in the file does not exceed the number of chromosomes listed in the genome descriptor file genome.txt:
Code Block |
---|
$ gunzip -c G_species_MMM_YYYY_all_mrna.psl.gz | cut -f14 | sort | uniq | wc -l
$ wc -l genome.txt
|
...
Use svn status command to double-check which files you've changed:
Code Block |
---|
$ svn st
|
It should print something like:
Code Block |
---|
M G_species_MMM_YYYY_all_mrna.psl.gz
M G_species_MMM_YYYY_all_mrna.psl.gz.tbi
M annots.xml
|
...
Check in the files one-by-one:
Code Block |
---|
$ svn ci G_species_MMM_YYYY_all_mrna.psl.gz -m "Enter a message here."
$ svn ci G_species_MMM_YYYY_all_mrna.psl.gz.tbi -m "Enter a message here."
$ svn ci annots.xml -m "Enter a message here."
|
...