Introduction

The following instructions assume you are not going to make a new genome version directory and that all you need to do is update refGene, all_mrna and all_est files.

...

If you're doing this on a Mac desktop or laptop computer, create a directory called "bin" in your home directory and save all compiled binaries there. Edit your .bash_profile file to include a line like the following to ensure that the shell can find the programs.

Code Block
export PATH=.:$HOME/bin:$PATH
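
After you edit .bash_profile and open a new shell, you may want to confirm that the directory really is on your PATH. The following is a minimal sketch; on_path is a hypothetical helper name, not part of any tool mentioned here.

```shell
# on_path is a hypothetical helper (an assumption for illustration).
# Return success if directory $1 is one of the colon-separated PATH entries.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Warn when $HOME/bin is missing from PATH.
if ! on_path "$HOME/bin"; then
    echo "warning: $HOME/bin is not on PATH; check .bash_profile" >&2
fi
```

Wrapping PATH in colons lets the pattern match whole entries only, so a directory such as $HOME/bin2 does not count as $HOME/bin.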

...

Check out or update a copy of IGB QuickLoad data and source code directories.

Use git to obtain a copy of genomes_src:

Code Block
$ git clone https://bitbucket.org/lorainelab/genomes_src

If you already have a copy, then update it. Change into your local copy and run:

Code Block
$ git pull origin master

Use svn to get a copy of the QuickLoad data repository:

Code Block
$ svn co https://svn.transvar.org/repos/genomes/trunk/pub/src quickload_src

If you already have a copy, just update using svn up. Change into your checked-out, local copy and run:

Code Block
$ svn up

Add quickload_src to your PATH (to run the python code there)

Add quickload_src to your PATH because it contains a python script you'll use to create BED detail files from ordinary BED files. Edit the .bash_profile file as above:

Code Block
export PATH=.:$HOME/quickload_src:$HOME/bin:$PATH

...

Note
UCSC data set file names saved in IGB QuickLoad should always include the IGB genome version name followed by an underscore character followed by the UCSC table name. The title field in the annots.xml file should always be the UCSC track name because that is what users will recognize from having used the UCSC genome browser.
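
To illustrate the naming rule, here is a small sketch that builds a file name from its parts. The function quickload_name and the idea of passing the extension as a third argument are assumptions made for this example; they are not part of IGB or UCSC tooling.

```shell
# quickload_name is a hypothetical helper (an assumption for illustration).
# Build a QuickLoad file name: <genome version>_<UCSC table name>.<extension>
quickload_name() {
    local genome_version="$1" ucsc_table="$2" extension="$3"
    printf '%s_%s.%s\n' "$genome_version" "$ucsc_table" "$extension"
}

# Example: quickload_name G_species_MMM_YYYY refGene bed.gz
```

The example call in the comment yields the same file name pattern used throughout this page.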

...

Get gene info and accession info files from NCBI ftp site

Code Block
$ wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
$ wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
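
Large downloads occasionally arrive truncated, so it can help to test the compressed files before using them. This is a minimal sketch using gzip's built-in integrity test; check_gz is a hypothetical helper name.

```shell
# check_gz is a hypothetical helper (an assumption for illustration).
# Return success only if every named file exists and passes "gzip -t".
check_gz() {
    local f
    for f in "$@"; do
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt or missing: $f" >&2
            return 1
        fi
    done
}

# Example: check_gz gene2accession.gz gene_info.gz
```

If the function reports a problem, downloading the file again is usually the simplest fix.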

Create BED detail file with gene information

Use ucscToBedDetail.py (from https://bitbucket.org/lorainelab/genomes_src) to create a new BED file with gene symbol and description

For example, do something like this:

Code Block
$ ucscToBedDetail.py ~/Downloads/G_species_MMM_YYYY_refGene.bed.gz G_species_MMM_YYYY_refGene.bed

...

Sort, compress, and index

Code Block
$ sort -k1,1 -k2,2n G_species_MMM_YYYY_refGene.bed | bgzip > G_species_MMM_YYYY_refGene.bed.gz
$ tabix -p bed G_species_MMM_YYYY_refGene.bed.gz
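
Indexing expects the file to be sorted by chromosome and then by start position, so you may want to confirm the sort order of the compressed file. The sketch below relies on the -c (check) option of sort with the same keys as the sort step above; bed_is_sorted is a hypothetical helper name. Run it in the same locale you used for sorting.

```shell
# bed_is_sorted is a hypothetical helper (an assumption for illustration).
# Return success if the gzip- or bgzip-compressed BED file $1 is sorted by
# chromosome (column 1), then numerically by start position (column 2).
bed_is_sorted() {
    gunzip -c "$1" | sort -c -k1,1 -k2,2n 2>/dev/null
}

# Example: bed_is_sorted G_species_MMM_YYYY_refGene.bed.gz && echo "sorted"
```

Because bgzip output is gzip-compatible, gunzip can read it directly.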

...

Check that it has data

Code Block
$ gunzip -c G_species_MMM_YYYY_refGene.bed.gz | wc -l

...

Make sure the number of chromosomes listed in the file does not exceed the number of chromosomes listed in the genome descriptor file genome.txt:

Code Block
$ gunzip -c G_species_MMM_YYYY_refGene.bed.gz | cut -f1 | sort | uniq | wc -l
$ wc -l genome.txt
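
If you would rather not compare the two numbers by eye, the check above can be scripted. This is a sketch only; count_chroms and chrom_count_ok are hypothetical helper names, and the chromosome column is passed as an argument (1 for BED files).

```shell
# count_chroms and chrom_count_ok are hypothetical helpers
# (assumptions for illustration).

# Count distinct values in tab-delimited column $2 of gzip-compressed file $1.
count_chroms() {
    gunzip -c "$1" | cut -f"$2" | sort -u | wc -l | tr -d ' '
}

# Succeed if the annotation file names no more chromosomes than the genome
# descriptor file has lines.
# Arguments: annotation file, chromosome column, genome descriptor file.
chrom_count_ok() {
    local in_file in_genome
    in_file=$(count_chroms "$1" "$2")
    in_genome=$(wc -l < "$3" | tr -d ' ')
    [ "$in_file" -le "$in_genome" ]
}

# Example: chrom_count_ok G_species_MMM_YYYY_refGene.bed.gz 1 genome.txt
```

The tr call strips the padding that some versions of wc add around the count.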

...

Use svn status command to double-check which files you've changed:

Code Block
$ svn st

It should print something like:

Code Block
M G_species_MMM_YYYY_refGene.bed.gz
M G_species_MMM_YYYY_refGene.bed.gz.tbi
M annots.xml

...

Check in the files one-by-one:

Code Block
$ svn ci G_species_MMM_YYYY_refGene.bed.gz -m "Enter a message here."
$ svn ci G_species_MMM_YYYY_refGene.bed.gz.tbi -m "Enter a message here."
$ svn ci annots.xml -m "Enter a message here."

...

Use grep, sort, and bgzip to make a PSL file:

Code Block
$ gunzip -c ~/Downloads/G_species_MMM_YYYY_all_mrna.psl.gz | grep -v '^#' | sort -k14,14 -k16,16n | bgzip > G_species_MMM_YYYY_all_mrna.psl.gz

Use tabix to index the sorted, compressed PSL file

Code Block
$ tabix -s 14 -b 16 -e 17 G_species_MMM_YYYY_all_mrna.psl.gz
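
tabix writes its index next to the data file with a .tbi suffix. If you later regenerate the data file and forget to re-index, the index goes stale. The sketch below is a simple heuristic based on file modification times; index_is_fresh is a hypothetical helper name.

```shell
# index_is_fresh is a hypothetical helper (an assumption for illustration).
# Return success if the index file $1.tbi exists and is newer than data file $1.
index_is_fresh() {
    [ -f "$1.tbi" ] && [ "$1.tbi" -nt "$1" ]
}

# Example:
# index_is_fresh G_species_MMM_YYYY_all_mrna.psl.gz || echo "re-run tabix" >&2
```

Modification times are only a hint, so re-running tabix is the safe choice whenever you are unsure.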

...

Check that it has data

Code Block
$ gunzip -c G_species_MMM_YYYY_all_mrna.psl.gz | wc -l

...

Make sure the number of chromosomes listed in the file does not exceed the number of chromosomes listed in the genome descriptor file genome.txt:

Code Block
$ gunzip -c G_species_MMM_YYYY_all_mrna.psl.gz | cut -f14 | sort | uniq | wc -l
$ wc -l genome.txt
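
Comparing counts tells you that something is wrong but not which sequence names are the problem. The sketch below lists names that appear in the annotation file but not in genome.txt. It assumes that the first tab-delimited column of genome.txt holds the chromosome name; unknown_chroms is a hypothetical helper name. For PSL files the chromosome (target) name is in column 14.

```shell
# unknown_chroms is a hypothetical helper (an assumption for illustration).
# Print chromosome names found in column $2 of gzip-compressed file $1 that
# are absent from column 1 of genome descriptor file $3 (assumed layout).
unknown_chroms() {
    local tmp_file tmp_genome
    tmp_file=$(mktemp) && tmp_genome=$(mktemp) || return 1
    gunzip -c "$1" | cut -f"$2" | sort -u > "$tmp_file"
    cut -f1 "$3" | sort -u > "$tmp_genome"
    # comm -23 keeps lines that occur only in the first sorted input.
    comm -23 "$tmp_file" "$tmp_genome"
    rm -f "$tmp_file" "$tmp_genome"
}

# Example: unknown_chroms G_species_MMM_YYYY_all_mrna.psl.gz 14 genome.txt
```

No output means every name in the file is also listed in genome.txt.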

...

Use svn status command to double-check which files you've changed:

Code Block
$ svn st

It should print something like:

Code Block
M G_species_MMM_YYYY_all_mrna.psl.gz
M G_species_MMM_YYYY_all_mrna.psl.gz.tbi
M annots.xml

...

Check in the files one-by-one:

Code Block
$ svn ci G_species_MMM_YYYY_all_mrna.psl.gz -m "Enter a message here."
$ svn ci G_species_MMM_YYYY_all_mrna.psl.gz.tbi -m "Enter a message here."
$ svn ci annots.xml -m "Enter a message here."

...
