*Result*: A memory-efficient algorithm to obtain splicing graphs and de novo expression estimates from de Bruijn graphs of RNA-Seq data.
*Further Information*
*Background: Recent advances in high-throughput sequencing make it feasible to study entire transcriptomes by applying de novo sequence assembly algorithms. A popular strategy is to first construct an intermediate de Bruijn graph that represents the transcriptome, but an additional step is then needed to construct predicted transcripts from the graph.
Results: Since the de Bruijn graph contains all branching possibilities, we develop a memory-efficient algorithm to recover alternative splicing information and library-specific expression information directly from the graph without prior genomic knowledge. We implement the algorithm as a postprocessing module of the Velvet assembler. We validate our algorithm by simulating the transcriptome assembly of Drosophila using its known genome, and by performing Drosophila transcriptome assembly using publicly available RNA-Seq libraries. Under a range of conditions, our algorithm recovers sequences and alternative splicing junctions with higher specificity than Oases or Trans-ABySS.
Conclusions: Because our postprocessing algorithm consumes less memory than Velvet and is less memory-intensive than Oases, it allows biologists to assemble large libraries with limited computational resources. The algorithm has been applied to the transcriptome assembly of the non-model blow fly Lucilia sericata, reported in a previous article. That work shows that the assembly is of high quality and that it facilitates comparison of the Lucilia sericata transcriptome with those of Drosophila and two mosquitoes, prediction and experimental validation of alternative splicing, investigation of differential expression among developmental stages, and identification of transposable elements.*
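
The abstract relies on the idea that a de Bruijn graph retains every branching possibility, so that alternative splicing shows up as nodes with more than one successor. The following is a minimal illustrative sketch of that idea only, not the authors' algorithm and not Velvet's data structure. The function names `build_de_bruijn` and `branching_nodes`, the toy sequences, and the choice of k = 5 are all assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k; each k-mer is an edge prefix -> suffix.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return nodes with more than one successor (candidate divergence points)."""
    return {node: sorted(succ) for node, succ in graph.items() if len(succ) > 1}


# Two hypothetical toy "isoforms" that share a prefix and then diverge.
isoform_a = "ATGGCGTACCTGA"
isoform_b = "ATGGCGTTTGACA"

graph = build_de_bruijn([isoform_a, isoform_b], k=5)
branches = branching_nodes(graph)
print(branches)
```

In this toy case the only branching node is the last (k-1)-mer of the shared prefix. Real RNA-Seq data would also produce branches from sequencing errors, repeats, and polymorphisms, which a practical method has to distinguish from genuine splice junctions.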