Introduction

The following instructions assume you are not going to make a new genome version directory and that all you need to do is update refGene, all_mrna and all_est files.

...

If you're doing this on a Mac desktop or laptop computer, create a directory called "bin" in your home directory and save all compiled binaries there. Edit your .bash_profile file to include a line like the following to ensure that the shell can find the programs.

Code Block

export PATH=.:$HOME/bin:$PATH

...

Check out or update a copy of IGB QuickLoad data and source code directories.

Use git to obtain a copy of genomes_src:

Code Block

$ git clone https://bitbucket.org/lorainelab/genomes_src

If you already have a copy, then update it. Change into your local copy and run:

Code Block
$ git pull origin master 

Use svn to get a copy of the QuickLoad data repository:

Code Block
$ svn co https://svn.transvar.org/repos/genomes/trunk/pub/src quickload_src

If you already have a copy, just update using svn up. Change into your checked-out, local copy and run:

Code Block
$ svn up 

Add quickload_src to your PATH (to run the python code there)

Add quickload_src to your PATH, since it contains a Python script you'll use to create BED detail files from ordinary BED files. Edit your .bash_profile file as above:

Code Block

export PATH=.:$HOME/quickload_src:$HOME/bin:$PATH
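After editing .bash_profile and opening a new shell, a quick check along the following lines can confirm that the shell finds the programs used on this page. This is only a sketch: the function name missing_tools is made up for illustration, and the list of programs should be adjusted to your setup.

```shell
# Hypothetical helper: report which of the named programs are not on PATH.
# Prints one missing program per line; prints nothing if all are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Programs used on this page; edit the list as needed.
missing_tools ucscToBedDetail.py bgzip tabix
```

If the command prints any names, revisit the PATH line in .bash_profile before continuing.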

...

Note

UCSC data set file names saved in IGB QuickLoad should always consist of the IGB genome version name, followed by an underscore character, followed by the UCSC table name. The title field in the annots.xml file should always be the UCSC track name, because that is what users will recognize from having used the UCSC genome browser.
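As an illustration of the naming rule, the file name can be assembled from its two parts in the shell. The variable names below are arbitrary, and the values are the same placeholders used elsewhere on this page.

```shell
# Placeholder values: substitute the real IGB genome version and UCSC table name.
genome_version="G_species_MMM_YYYY"
ucsc_table="refGene"

# Genome version, underscore, UCSC table name, then the file extension.
bed_file="${genome_version}_${ucsc_table}.bed.gz"
echo "$bed_file"
```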

...

Get gene info and accession info files from NCBI ftp site

Code Block

$ wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
$ wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz

Create BED detail file with gene information

Use ucscToBedDetail.py (from https://bitbucket.org/lorainelab/genomes_src) to create a new BED file with gene symbol and description

For example, do something like this:

Code Block

$ ucscToBedDetail.py ~/Downloads/G_species_MMM_YYYY_refGene.bed.gz G_species_MMM_YYYY_refGene.bed

...

Sort, compress, and index

Code Block

$ sort -k1,1 -k2,2n G_species_MMM_YYYY_refGene.bed | bgzip > G_species_MMM_YYYY_refGene.bed.gz
$ tabix -s 1 -b 2 -e 3 G_species_MMM_YYYY_refGene.bed.gz
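Because tabix expects its input sorted by sequence name and start position, it can help to confirm the sort order before indexing. The sketch below does this with sort -c on a small made-up BED file. It uses gzip as a stand-in for bgzip so that it runs without extra tools; for real data, keep using bgzip as shown above.

```shell
# Build a tiny, unsorted toy BED file in a temporary directory.
tmpdir=$(mktemp -d)
printf 'chr2\t5\t9\tb\nchr1\t30\t40\tc\nchr1\t4\t8\ta\n' > "$tmpdir/toy.bed"

# Sort by sequence name, then numerically by start, and compress.
sort -k1,1 -k2,2n "$tmpdir/toy.bed" | gzip > "$tmpdir/toy.bed.gz"

# sort -c exits with status 0 only if the input is already in order.
if gunzip -c "$tmpdir/toy.bed.gz" | sort -c -k1,1 -k2,2n; then
    sorted=yes
else
    sorted=no
fi
echo "sorted: $sorted"
```

The same sort -c test can be pointed at G_species_MMM_YYYY_refGene.bed.gz by replacing the toy file name.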

...

Check that it has data

Code Block

$ gunzip -c G_species_MMM_YYYY_refGene.bed.gz | wc -l

...

Make sure the number of chromosomes listed in the file does not exceed the number of chromosomes listed in the genome descriptor file genome.txt:

Code Block

$ gunzip -c G_species_MMM_YYYY_refGene.bed.gz | cut -f1 | sort | uniq | wc -l
$ wc -l genome.txt
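Comparing counts shows that something is wrong but not which sequence names are the problem. A stricter check, sketched below, lists the names that appear in the data file but not in genome.txt. The function name missing_chroms is hypothetical, and the sketch assumes genome.txt is tab-delimited with the sequence name in its first column. The toy files stand in for real data.

```shell
# Hypothetical helper: print sequence names found in a gzipped, tab-delimited
# data file (name column given by $3, default 1) that are absent from the
# genome descriptor file.
missing_chroms() {
    mc_data=$1
    mc_genome=$2
    mc_col=${3:-1}
    mc_work=$(mktemp -d)
    gunzip -c "$mc_data" | cut -f"$mc_col" | LC_ALL=C sort -u > "$mc_work/in_data"
    cut -f1 "$mc_genome" | LC_ALL=C sort -u > "$mc_work/in_genome"
    # comm -23 keeps lines that occur only in the first file.
    LC_ALL=C comm -23 "$mc_work/in_data" "$mc_work/in_genome"
    rm -r "$mc_work"
}

# Toy example: chr3 is in the data file but not in genome.txt.
tmpdir=$(mktemp -d)
printf 'chr1\t1000\nchr2\t2000\n' > "$tmpdir/genome.txt"
printf 'chr1\t10\t50\tgeneA\nchr3\t5\t20\tgeneB\n' | gzip > "$tmpdir/toy.bed.gz"
missing=$(missing_chroms "$tmpdir/toy.bed.gz" "$tmpdir/genome.txt")
echo "$missing"
```

For a PSL file sorted and indexed on column 14, pass 14 as the third argument so that the target sequence name is compared.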

...

Use the svn status command to double-check which files you've changed:

Code Block

$ svn st

It should print something like:

Code Block

M  G_species_MMM_YYYY_refGene.bed.gz
M  G_species_MMM_YYYY_refGene.bed.gz.tbi
M  annots.xml

...

Check in the files one-by-one:

Code Block

$ svn ci G_species_MMM_YYYY_refGene.bed.gz -m "Enter a message here."
$ svn ci G_species_MMM_YYYY_refGene.bed.gz.tbi -m "Enter a message here."
$ svn ci annots.xml -m "Enter a message here."

...

Use grep, sort, and bgzip to make a PSL file:

Code Block

$ gunzip -c ~/Downloads/G_species_MMM_YYYY_all_mrna.psl.gz | grep -v '^#' | sort -k14,14 -k16,16n | bgzip > G_species_MMM_YYYY_all_mrna.psl.gz

Use tabix to index the sorted, compressed PSL file

Code Block

$ tabix -s 14 -b 16 -e 17 G_species_MMM_YYYY_all_mrna.psl.gz

...

Check that it has data

Code Block

$ gunzip -c G_species_MMM_YYYY_all_mrna.psl.gz | wc -l

...

Make sure the number of chromosomes listed in the file does not exceed the number of chromosomes listed in the genome descriptor file genome.txt:

Code Block

$ gunzip -c G_species_MMM_YYYY_all_mrna.psl.gz | cut -f14 | sort | uniq | wc -l
$ wc -l genome.txt

...

Use the svn status command to double-check which files you've changed:

Code Block

$ svn st

It should print something like:

Code Block

M  G_species_MMM_YYYY_all_mrna.psl.gz
M  G_species_MMM_YYYY_all_mrna.psl.gz.tbi
M  annots.xml

...

Check in the files one-by-one:

Code Block

$ svn ci G_species_MMM_YYYY_all_mrna.psl.gz -m "Enter a message here."
$ svn ci G_species_MMM_YYYY_all_mrna.psl.gz.tbi -m "Enter a message here."
$ svn ci annots.xml -m "Enter a message here."

...