...
$ svn ci genome.txt -m "Output from twoBitInfo on current version of X_tropicalis_Nov_2009.2bit"
The next step is to create an annots.xml file IGB will use to get a listing of the annotations available for this genome as well as styling information, e.g., the background colors to use for gene annotations, whether to load all the annotations immediately, and so on.
Being a bit lazy, I usually just copy and paste another annots.xml file from another part of the repository when setting up a new genome. I use svn for this, however:
$ svn cp ../V_vinifera_Mar_2010/annots.xml .
A annots.xml
My plan (currently) is to get the RefSeq genes track from UCSC and deploy it on our QuickLoad site. I'll provide meta-data about the data set using the annots.xml file.
I then open a simple text editor (like TextEdit) and modify the file, like so:
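As an illustration only, a single-track entry for the RefSeq genes file could look roughly like the sketch below. The function name `write_annots_sketch` is hypothetical, and the attribute names (`name`, `title`, `description`, `load_hint`, `background`) are my assumption about the usual QuickLoad conventions rather than a copy of the actual file, so check them against the IGB documentation for your version.

```shell
# Sketch only: write a minimal annots.xml describing one track.
# Attribute names and values here are assumptions, not the real file's contents.
write_annots_sketch() {
    out="${1:-annots.sketch.xml}"   # default avoids clobbering a real annots.xml
    cat > "$out" <<'EOF'
<files>
  <file name="X_tropicalis_Nov_2009_refGene.bed.gz"
        title="RefSeq genes"
        description="RefSeq genes from the UCSC Table Browser"
        load_hint="Whole Sequence"
        background="FFFFFF"/>
</files>
EOF
}
```

Calling `write_annots_sketch` with no argument writes `annots.sketch.xml` in the current directory, which can then be compared against the copied annots.xml before editing it by hand.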
Next, I used the UCSC Table Browser to get the RefSeq genes for this species. Here are the settings I used for this:
Unfortunately, there don't appear to be a lot of RefSeq gene annotations available for this species:
$ gunzip -c X_tropicalis_Nov_2009_refGene.bed.gz | wc -l
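A plain `wc -l` also counts any header-style lines the Table Browser may have written. A slightly stricter count, as a sketch, skips lines that start with `#`, `track`, or `browser`. The function name `count_bed_rows` is made up for this example, and whether such header lines are present depends on the export settings.

```shell
# Count annotation rows in a gzipped BED file, ignoring comment,
# "track", and "browser" lines at the start of a line.
count_bed_rows() {
    gunzip -c "$1" | grep -v -c -E '^(#|track|browser)'
}
```

For example, `count_bed_rows X_tropicalis_Nov_2009_refGene.bed.gz` prints the number of remaining rows.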
It would probably be a good idea to look for another data set that provides a more complete view of the Xenopus expressed gene repertoire.
Using the Table Browser, I explored the available data sets for Xenopus. To do this, I just choose a table and then click the button "describe table schema," which takes me to a page reporting the number of annotations available in the selected table.
It looks like the table all_mrna may be the most complete; it contains slightly more than 20,00 000 rows. Users will probably want to see this data as well as the RefGene track, so I'll download this data set, add it to the repository, and add it to the annots.xml file.
While I was at it, I also checked the ESTs data set. There are around 1.5 million ESTs for Xenopus. I'll download that data set as well, but set it up so that users can access it on a region-by-region basis by sorting and indexing it with the tabix utility. I'll also need to massage the format a bit to turn it into PSL (blat output) format. For this data set, I'll use the pslx format, which will allow users to view the aligned sequence.
To process the EST data set, I used the following command to drop the header line and strip off the first column (the UCSC bin column):
$ gunzip -c X_tropicalis_Nov_2009_all_est.gz | grep -v bin | cut -f2- > X_tropicalis_Nov_2009_all_est.psl
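The sorting and indexing step mentioned earlier could then be sketched as follows. This assumes `bgzip` and `tabix` are installed, and it relies on the standard PSL column layout, where the target name is column 14 and the target start and end are columns 16 and 17. The function names `sort_psl` and `index_psl` are hypothetical.

```shell
# Sort PSL rows by target name (column 14), then numerically by
# target start (column 16), using tab as the field separator.
sort_psl() {
    sort -t "$(printf '\t')" -k14,14 -k16,16n "$1"
}

# Compress the sorted rows with bgzip and build a tabix index.
# -s/-b/-e name the sequence, start, and end columns; -0 tells tabix
# the coordinates are zero-based, half-open, as they are in PSL.
index_psl() {
    sort_psl "$1" | bgzip -c > "$1.gz" && tabix -0 -s 14 -b 16 -e 17 "$1.gz"
}
```

Running `index_psl X_tropicalis_Nov_2009_all_est.psl` would leave a `.psl.gz` file and a matching `.tbi` index next to the original.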
I've created a number of data files, and so my next step will be to try opening them in IGB. I also want to test whether IGB will be able to open and display the genome sequence.
To be continued.....