However, further exploration of this database revealed that sequences related to reproduction, the other major issue for turbot farming, were underrepresented. In order to obtain more sequences of genes related to sex phenotype and reproduction control, and to isolate EST-associated genetic markers, a 454 pyrosequencing run was performed on the brain-hypophysis-gonadal axis using tissues from 30 turbot individuals at different stages of sexual development. Table 2 summarizes the statistics of the turbot pyrosequencing normalized library. Raw data generated 2,762,845 sequences. These sequences were filtered using Roche's software with default settings. After filtration, 1,191,866 sequence reads were obtained with an average length of 286 bp. Sequences were assembled into 65,472 contigs with a mean length of 625.9 bp.
About half of these contigs were longer than 500 bp, and their distribution by length range was highest for the 200-499 bp class, followed by the 1-199 bp class and finally by the 500-999 bp class. The average coverage depth per contig was 4.6 sequences. Reads obtained in this high-throughput sequence analysis have been submitted to the NCBI Sequence Read Archive under accession number SRA056483. Table 3 shows the 20 longest contigs obtained from the 454 run with their annotation. They ranged from 3,550 bp to 5,012 bp, and their average coverage depth per nucleotide ranged between 4.3 and 33.2. Cytochrome c oxidase subunit 3 was the longest contig. Table 4 shows the 20 contigs with the deepest coverage.
Although a normalized library was used, most contigs with the deepest coverage corresponded to ribosomal protein genes. However, genes involved in the reproductive system, such as the histone deacetylase complex or the epididymal secretory protein, which is highly expressed on the surface of ejaculated spermatozoa, were also present. About half of the contigs obtained in the 454 run were successfully annotated and classified into Gene Ontology categories. More precisely, contigs exclusively obtained by the 454 run were functionally classified in the biological process (BP), cellular component (CC) and molecular function (MF) categories.

Creation of the Turbot 3 database

The sequencing strategies used, i.e. traditional Sanger and high-throughput 454, yielded a large number of transcriptomic sequences from both the immune and reproductive systems in turbot.
With all the information generated, a new Turbot 3 database was created and stored in a web-based portal for exploitation, first by the consortium participating in this project and then publicly once the project is finished at the end of 2013. CAP3 software was used to assemble the sequences from all Sanger-based libraries together with the contigs from 454 pyrosequencing, yielding 52,427 unique sequences and thus reducing redundancy among sequences.
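
The summary figures reported for the 454 run (the share of raw reads retained after filtering, the mean contig length and the contig counts per length class) are simple to recompute from a list of contig lengths. The following Python sketch illustrates one way to do so; it is not the pipeline used in this work. The function names, the toy contig lengths and the open-ended ">=1000" class are assumptions introduced here for illustration only; the read counts and the three bounded length classes are the ones quoted in the text.

```python
from collections import Counter

# Length classes (bp, inclusive bounds) quoted in the text.
LENGTH_CLASSES = [(1, 199), (200, 499), (500, 999)]


def retention_rate(raw_reads, filtered_reads):
    """Return the fraction of raw reads kept after quality filtering."""
    return filtered_reads / raw_reads


def length_class(length):
    """Map a contig length to its class label."""
    for low, high in LENGTH_CLASSES:
        if low <= length <= high:
            return f"{low}-{high}"
    # Assumption: everything longer falls into one open-ended class.
    return ">=1000"


def summarize_contigs(lengths):
    """Return (mean length, contig count per length class)."""
    counts = Counter(length_class(n) for n in lengths)
    mean_length = sum(lengths) / len(lengths)
    return mean_length, dict(counts)


# Read counts reported for the 454 run.
kept = retention_rate(2_762_845, 1_191_866)
print(f"Reads retained after filtering: {kept:.1%}")

# Hypothetical contig lengths, NOT data from the study.
toy_lengths = [150, 320, 480, 760, 1200, 5012]
mean_length, counts = summarize_contigs(toy_lengths)
print(f"Mean contig length: {mean_length:.1f} bp")
print(counts)
```

In practice the list of lengths would be read from the assembled contig file rather than typed in, and the same summary could be rerun on the merged CAP3 output to compare the two assemblies.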