Driving ontology development

The Ontogrator could be used to help mature and improve the ontologies it relies upon. More precisely, it could implement a mechanism to provide feedback on terms that are either overrepresented in the data (and may need further specialization) or do not exist in the current hierarchy (e.g., a term clearing house could be provided for the submission of new terms to existing ontologies). Similarly, Ontogrator could be open to user-driven updates of the annotations and mappings of the Ontograted resources: for example, a user could indicate that a returned entry is not relevant to a particular query, and the software could learn from this, e.g., by removing the annotations and/or by re-training the mapping tools.
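The feedback idea can be illustrated with a minimal sketch. It is not part of the Ontogrator itself: the function name `term_feedback`, the `share_threshold` parameter, and the sample terms are all hypothetical, and "overrepresented" is assumed here to mean "accounts for more than a chosen share of all annotations".

```python
from collections import Counter


def term_feedback(annotations, ontology_terms, share_threshold=0.5):
    """Return (overrepresented, missing) terms for a list of annotation terms.

    Hypothetical helper: `annotations` is a list of term labels attached to
    records, `ontology_terms` is the set of labels known to the ontology.
    """
    counts = Counter(annotations)
    total = sum(counts.values())
    # Known terms whose share of all annotations exceeds the threshold are
    # candidates for further specialization.
    overrepresented = sorted(
        term
        for term, n in counts.items()
        if term in ontology_terms and n / total > share_threshold
    )
    # Terms used in the data but absent from the ontology are candidates
    # for submission as new terms.
    missing = sorted(term for term in counts if term not in ontology_terms)
    return overrepresented, missing


# Made-up example data, for illustration only.
sample_annotations = ["soil"] * 6 + ["marine"] * 2 + ["hot spring"] * 2
sample_ontology = {"soil", "marine"}
flagged, unknown = term_feedback(sample_annotations, sample_ontology)
```

With this sample data, `flagged` contains `soil` (6 of 10 annotations) and `unknown` contains `hot spring`. A real system would likely compare term usage against the depth of the hierarchy below each term instead of using a single fixed threshold.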

Conclusions

We argue that the combined approach of faceted browsing and resource aggregation is an effective solution for aligning and mining information across a collection of related databases. Furthermore, by combining search over information resources with ontologies, complex distributed data sets can be queried while leveraging the combined knowledge of expert communities.

Acknowledgements

The work on the Ontogrator Platform presented here was funded by the NERC Environmental Bioinformatics Centre (NEBC), UK.
The 16S rRNA gene sequence of the strain TK-6T (Z30214) shows the highest degree of sequence identity, 97%, to the type strain of H. hydrogenophilus [6].

Further analysis shows 96% 16S rRNA gene sequence identity with an uncultured Aquificales bacterium clone pKA (AF453505) from a near-neutral thermal spring in Kamchatka, Russia. The single genomic 16S rRNA sequence of H. thermophilus was compared with the most recent release of the Greengenes database [13] using NCBI BLAST under default values, and the relative frequencies of taxa and keywords, weighted by BLAST scores, were determined. The five most frequent genera were Hydrogenobacter (52.4%), Thermocrinis (18.8%), Aquifex (10.3%), Sulfurihydrogenibium (6.2%) and Hydrogenivirga (5.7%). Regarding hits to sequences from other members of the genus, the average identity within HSPs (high-scoring segment pairs) was 96.1%, whereas the average coverage by HSPs was 93.5%. The species yielding the highest score was H. hydrogenophilus. The five most frequent keywords within the labels of environmental samples which yielded hits were ‘hot’ (6.5%), ‘yellowstone’ (5.8%), ‘spring’ (5.6%), ‘national/park’ (5.4%) and ‘microbial’ (3.9%). These keywords corroborate what is known from the ecology and physiology of strain TK-6T [1,2].
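The score-weighted relative frequencies described above can be sketched as follows. This is an illustration of the general idea only, not the pipeline used for the analysis: the function name `weighted_frequencies` is hypothetical, the hits are assumed to be available as (label, score) pairs, and the example scores are made up.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return {label: percentage of the total score} for (label, score) pairs.

    Hypothetical helper: each hit contributes its BLAST score, not a count
    of 1, to the label (e.g. a genus name or a keyword) it belongs to.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * s / grand_total for label, s in totals.items()}


# Made-up scores, for illustration only.
example_hits = [
    ("Hydrogenobacter", 300.0),
    ("Thermocrinis", 100.0),
    ("Hydrogenobacter", 100.0),
]
genus_share = weighted_frequencies(example_hits)
```

Here `Hydrogenobacter` accumulates a score of 400 out of 500 and `Thermocrinis` 100 out of 500, so the shares are 80% and 20%. The same function would apply to keywords by passing (keyword, score) pairs extracted from the hit labels.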
