...

Using the Table Browser, I explored the available data sets for Xenopus. To do this, I just chose a table and then clicked the "describe table schema" button, which leads to a page reporting the number of annotations available in the selected table.

It looks like the table all_mrna may be the most complete; it contains slightly more than 20,000 rows. Users will probably want to see this data, as well as the RefGene track. I'll download this data set, add it to the repository, and add it to the annots.xml file (see below).

I also checked the EST data set. There are around 1.5 million ESTs for Xenopus. I'll download that data set as well, but set it up so that users can access it on a region-by-region basis by sorting it and indexing it with the tabix utility. I'll also need to massage the format a bit to get it into PSL (blat output) format.
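
The text does not spell out what the format "massaging" involves. A minimal sketch, assuming the Table Browser download carries a "#"-prefixed header line and a leading "bin" column in front of the 21 standard PSL columns (both are assumptions, as are the input file name and the made-up sample row):

```shell
# Hypothetical input name; the tutorial does not give the name of the raw download.
raw=all_est_download.txt
psl=X_tropicalis_Nov_2009_all_est.psl

# Made-up two-line sample (header + one alignment) so the sketch runs on its own.
printf '#bin\tmatches\tmisMatches\trepMatches\tnCount\tqNumInsert\tqBaseInsert\ttNumInsert\ttBaseInsert\tstrand\tqName\tqSize\tqStart\tqEnd\ttName\ttSize\ttStart\ttEnd\tblockCount\tblockSizes\tqStarts\ttStarts\n' > "$raw"
printf '585\t500\t2\t0\t0\t0\t0\t0\t0\t+\tEST0001\t502\t0\t502\tscaffold_1\t7817814\t1000\t1502\t1\t502,\t0,\t1000,\n' >> "$raw"

# Drop the header line, then drop the first (bin) column, leaving plain PSL rows.
grep -v '^#' "$raw" | cut -f2- > "$psl"

# Quick check: every remaining row should have 21 tab-separated columns.
awk -F'\t' 'NF != 21 { bad++ } END { print (bad ? bad " malformed rows" : "all rows have 21 columns") }' "$psl"
```

After this step, column 14 holds the target sequence name and column 16 the target start, which are the columns the sort and tabix commands rely on.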

...

sort -k14,14 -k16,16n X_tropicalis_Nov_2009_all_est.psl > sorted.psl

mv sorted.psl X_tropicalis_Nov_2009_all_est.psl

bgzip X_tropicalis_Nov_2009_all_est.psl

tabix -s 14 -b 16 -0 X_tropicalis_Nov_2009_all_est.psl.gz

The last command (tabix) creates a tabix index (.tbi) file that IGB knows to look for when it encounters files with extension .gz.
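
Because tabix expects its input to be sorted by sequence name and start position, it can be worth confirming the sort order before compressing. A minimal sketch using sort's check mode with the same keys as the sort command above; the sample file name and its three rows are made up for illustration:

```shell
# Helper that prints one made-up PSL row: query name, target name, target start, target end.
row() {
  printf '500\t2\t0\t0\t0\t0\t0\t0\t+\t%s\t502\t0\t502\t%s\t7817814\t%s\t%s\t1\t502,\t0,\t%s,\n' "$1" "$2" "$3" "$4" "$3"
}

# Hypothetical three-row sample, already ordered by target name and then target start.
{
  row EST0001 scaffold_1 1000 1502
  row EST0002 scaffold_1 20000 20502
  row EST0003 scaffold_2 300 802
} > sample_sorted.psl

# sort -c only checks order: it exits non-zero at the first out-of-order line.
if sort -c -k14,14 -k16,16n sample_sorted.psl; then
  echo "sorted: safe to bgzip and index"
else
  echo "not sorted: re-run the sort step first"
fi
```

On the real data, the same check would be run on the .psl file just before the bgzip step.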

Next, I added each of the data files to the IGB QuickLoad subversion repository: the sorted, bgzip-compressed EST data file, its index file, a gzip-compressed file for the RefGene data set, and a gzip-compressed file for the mRNA data set. Since the RefGene and mRNA data sets will both be loaded as soon as the user visits the genome version, I did not bother to make tabix'd versions of those files.

Deploying the files on QuickLoad

...

I then opened a simple text editor (like TextEdit) and edited the file, like so: [image]

Recently, we added the capability to specify various styles for annotation files delivered via QuickLoad. Most of these (e.g., foreground and background) are self-explanatory, but a few are less obvious because they pertain to specialized aspects of how IGB presents data.

The "max_depth" parameter refers to the maximum number of overlapping annotations that can appear in a stack within a track. The "name_size" parameter specifies the font size for the track labels that appear on the left-hand side of the main display. The "url" parameter specifies where IGB links to when the user clicks the info button (blue "i" icon) next to a data set. In this case, we link back to the main UCSC Web page, since this is where the data came from originally. The "name" parameter indicates the file name IGB should load, and the "description" parameter specifies the tooltip text that will appear when the user hovers the mouse over the data set in the Data Access Panel. The "title" parameter specifies the name of the data set as it will appear to the user.
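
To make these parameters concrete, here is a minimal sketch that writes an example entry to a scratch file. The attribute names come from the description above; the enclosing files/file element layout, the hex color format, and every value shown (title, description, depth, font size, colors) are assumptions for illustration and should be checked against an existing annots.xml on a working QuickLoad site.

```shell
# Write a hypothetical entry to a scratch file rather than touching the real annots.xml.
cat > annots.xml.example <<'EOF'
<files>
  <file name="X_tropicalis_Nov_2009_all_est.psl.gz"
        title="EST"
        description="ESTs downloaded from the UCSC Table Browser"
        url="http://genome.ucsc.edu"
        max_depth="10"
        name_size="12"
        foreground="000000"
        background="FFFFFF" />
</files>
EOF

# List the attributes that were set, one per line, as a quick visual check.
grep -o '[a-z_]*="[^"]*"' annots.xml.example
```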

Finally, I added my local QuickLoad site to IGB (using the Configure link in the Data Access panel) and loaded up the new genome.

Everything looked good, and after making a few changes to the color scheme (to maximize legibility), I committed my final changes to the repository.

My last step was to log into the main IGB QuickLoad site and run "svn up" to deploy the new genome and all its associated data files there, so that all IGB users can now access the data.

Conclusion

All in all, the entire process, subtracting breaks for lunch and meetings, took about four hours, including the time it took to write this tutorial.

Here is what the genome will look like for someone visiting this genome version for the first time: [image]