
How to create and add a new default data provider

The ucsc-rest-api-service module, which integrates the UCSC REST API into IGB as a default data provider, can be used as a reference.

  1. Create a new module for the data provider and add the following classes:
    • A factory class that implements DataProviderFactory - this interface declares methods that initialize the actual data provider class with different combinations of parameters.
    • Register the new DataProviderFactory class as a Reference in the DataProviderFactoryManager class.
    • A data provider class that extends BaseDataProvider - this class must implement the methods that initialize the provider parameters, return all supported genome versions, and return all available data sets for a single genome version.
    • The data provider class should also implement two interfaces, AssemblyProvider and ReferenceSequenceProvider - their methods are used to get chromosome and sequence information.
    • The data provider class must override the method below.

      CustomDataProvider.java
      @Override
      public Optional<String> getFactoryName() {
         return Optional.of(CustomDataProviderFactory.FACTORY_NAME);
      }
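
To make the link between the factory and the provider concrete, here is a minimal, self-contained sketch. The DataProviderFactory and BaseDataProvider types below are simplified, hypothetical stand-ins written only for this example; the real IGB types declare more methods and may use different signatures. The createDataProvider parameters, the "Custom" factory name, and the example URL are likewise assumptions.

```java
import java.util.Optional;

// Hypothetical stand-ins for IGB's DataProviderFactory and BaseDataProvider.
// The real types declare more methods and may use different signatures.
interface DataProviderFactory {
    String getFactoryName();
    BaseDataProvider createDataProvider(String url, String name, int loadPriority);
}

abstract class BaseDataProvider {
    protected final String url;
    protected final String name;
    protected final int loadPriority;

    protected BaseDataProvider(String url, String name, int loadPriority) {
        this.url = url;
        this.name = name;
        this.loadPriority = loadPriority;
    }

    // Stand-in default: no factory name unless a subclass overrides this.
    public Optional<String> getFactoryName() {
        return Optional.empty();
    }
}

class CustomDataProviderFactory implements DataProviderFactory {
    // Illustrative value; it has to match "factoryName" in igbDefaultPrefs.json.
    public static final String FACTORY_NAME = "Custom";

    @Override
    public String getFactoryName() {
        return FACTORY_NAME;
    }

    @Override
    public BaseDataProvider createDataProvider(String url, String name, int loadPriority) {
        return new CustomDataProvider(url, name, loadPriority);
    }
}

class CustomDataProvider extends BaseDataProvider {
    CustomDataProvider(String url, String name, int loadPriority) {
        super(url, name, loadPriority);
    }

    // The mandatory override: reports which factory created this provider.
    @Override
    public Optional<String> getFactoryName() {
        return Optional.of(CustomDataProviderFactory.FACTORY_NAME);
    }
}

class FactorySketchDemo {
    public static void main(String[] args) {
        DataProviderFactory factory = new CustomDataProviderFactory();
        BaseDataProvider provider =
                factory.createDataProvider("https://example.org/api", "Example", 1);
        // The provider and its factory agree on the factory name.
        System.out.println(
                provider.getFactoryName().equals(Optional.of(factory.getFactoryName())));
    }
}
```

Keeping the name in a single FACTORY_NAME constant means the factory, the provider, and the preferences entry can be checked against one value.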
  2. Methods that need to be implemented:
    • initialize() - initializes the data provider; usually an object that stores the available genomes is created and loaded with data here.
    • getSupportedGenomeVersionNames() - other classes use this method to get the genomes available from the provider.
    • getAvailableDataSets() - takes a data container parameter that holds the selected genome version, and returns the datasets available for that version. The datasets must be returned as a Set<DataSet> object, so once you have retrieved them, convert each one into a DataSet object and add it to a Set.
    • getAssemblyInfo() - this method is used to get the chromosome information. It should return a Map<String, Integer> object, where the key is the chromosome name and the value is the chromosome length.
    • getSequence() - this is used to get the sequence of the genome version for the requested chromosome and region.
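
The methods above can be sketched with an in-memory provider. This is only an illustration under stated assumptions: the class name, the hard-coded genome, chromosome, and dataset names, and the simplified signatures (plain strings instead of IGB's data container, genome version, and DataSet types) are all hypothetical. The sketch also assumes that the integer in the assembly map is the chromosome length and that regions are 0-based and end-exclusive.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory provider; real implementations would query a remote service.
class InMemoryProviderSketch {
    // genome version name -> (chromosome name -> sequence)
    private final Map<String, Map<String, String>> sequences = new LinkedHashMap<>();
    // genome version name -> dataset names (IGB itself expects Set<DataSet>)
    private final Map<String, Set<String>> dataSets = new LinkedHashMap<>();

    // Creates and loads the object that stores the available genomes.
    void initialize() {
        Map<String, String> chromosomes = new LinkedHashMap<>();
        chromosomes.put("chr1", "ACGTACGTAC");
        chromosomes.put("chr2", "TTGGCC");
        sequences.put("demo_genome_1", chromosomes);

        Set<String> tracks = new LinkedHashSet<>();
        tracks.add("genes");
        tracks.add("repeats");
        dataSets.put("demo_genome_1", tracks);
    }

    Set<String> getSupportedGenomeVersionNames() {
        return Collections.unmodifiableSet(sequences.keySet());
    }

    // Simplified: takes the genome version name instead of a data container.
    Set<String> getAvailableDataSets(String genomeVersionName) {
        return dataSets.getOrDefault(genomeVersionName, Collections.emptySet());
    }

    // Assumption: the map value is the chromosome length.
    Map<String, Integer> getAssemblyInfo(String genomeVersionName) {
        Map<String, Integer> info = new LinkedHashMap<>();
        Map<String, String> chromosomes =
                sequences.getOrDefault(genomeVersionName, Collections.emptyMap());
        for (Map.Entry<String, String> entry : chromosomes.entrySet()) {
            info.put(entry.getKey(), entry.getValue().length());
        }
        return info;
    }

    // Assumption: start is 0-based inclusive, end is exclusive.
    String getSequence(String genomeVersionName, String chromosome, int start, int end) {
        String sequence = sequences
                .getOrDefault(genomeVersionName, Collections.emptyMap())
                .get(chromosome);
        if (sequence == null) {
            throw new IllegalArgumentException("Unknown chromosome: " + chromosome);
        }
        return sequence.substring(start, end);
    }
}
```

For example, after initialize(), getAssemblyInfo("demo_genome_1") maps chr1 to 10 and getSequence("demo_genome_1", "chr1", 2, 6) returns "GTAC".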
  3. Once all the code is implemented, add the provider to igbDefaultPrefs.json using the format below:

    igbDefaultPrefs.json
    {
         "factoryName": "factory name", //factory name for the provider, this should be the same as the Factory name used in the created DataProviderFactory class
         "name": "provider name", //provider name
         "url": "provider_url",
         "loadPriority": "load_priority_int", //this is used to determine the order in which data providers are initialized
         "defaultDataProviderId": "IgbDefaultDataProviderId:16" //this is like a key in a map, so it should be unique from the rest of the providers
    }
  4. All default data providers must be declared in igbDefaultPrefs.json, because this file is used to initialize the data providers. If a provider is fully implemented but not declared here, it won't be initialized.
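
Because each defaultDataProviderId has to be unique, a quick duplicate check can catch a copy-and-paste mistake before IGB is started. The helper below is a hypothetical sketch, not part of IGB: it assumes the IDs have already been read out of igbDefaultPrefs.json into a list, and the class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: reports every ID that appears more than once.
class ProviderIdCheck {
    static Set<String> findDuplicateIds(List<String> ids) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (String id : ids) {
            // add() returns false when the ID was already seen.
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Illustrative IDs; only the "IgbDefaultDataProviderId:N" pattern comes from the example entry.
        List<String> ids = Arrays.asList(
                "IgbDefaultDataProviderId:15",
                "IgbDefaultDataProviderId:16",
                "IgbDefaultDataProviderId:16");
        System.out.println(findDuplicateIds(ids)); // prints [IgbDefaultDataProviderId:16]
    }
}
```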
  5. After creating the module, add it to the modules section and the dependency section of the igb-project pom.xml, as shown below:

    pom.xml (igb-project)
    <module>core/ucsc-rest-api-service</module>
    
    <dependency>
        <groupId>org.lorainelab.igb</groupId>
        <artifactId>ucsc-rest-api-service</artifactId>
        <version>${project.parent.version}</version>
        <scope>provided</scope>
    </dependency>
  6. Add it as a dependency in the main pom.xml as well, as shown below:

    pom.xml (main)
    <dependency>
        <groupId>org.lorainelab.igb</groupId>
        <artifactId>ucsc-rest-api-service</artifactId>
        <scope>runtime</scope>
    </dependency>