*Result*: Data structures and algorithms for analysis of alternative splicing with RNA-Seq data

Title:
Data structures and algorithms for analysis of alternative splicing with RNA-Seq data
Authors:
Publisher Information:
Freie Universität
Publication Year:
2010
Collection:
Max Planck Society: MPG.PuRe
Document Type:
*Dissertation/Thesis* doctoral or postdoctoral thesis
File Description:
application/pdf
Language:
English
Relation:
info:eu-repo/semantics/altIdentifier/urn/https://refubium.fu-berlin.de/handle/fub188/12643
Rights:
info:eu-repo/semantics/openAccess
Accession Number:
edsbas.C46D4CAE
Database:
BASE

*Further Information*

*Research in molecular biology was revolutionized by the invention of semi-automated Sanger sequencing for DNA in the early 1990s. It was the foundation for the sequencing of several genomes, including the human genome. In the last few years, a second revolution in DNA sequencing has occurred. Next-generation sequencing (NGS) approaches now make it possible to sequence millions of DNA fragments in less than a day, producing short sequencing reads. These NGS technologies are still in their infancy, and their further development will herald a new era in which DNA sequencing is inexpensive and easily manageable. This development has shifted most of the workload onto computational biologists, who have to cope with gigabases of sequence data, creating a bottleneck for scientific discovery. This thesis deals with the challenges of applying NGS technologies to the sequencing of expressed mRNAs (RNA-Seq) and to the detection of alternative exon events (AEEs), a term that covers alternative splicing, alternative promoter, and alternative polyadenylation events. There are three main contributions. First, methods are introduced that detect AEEs within a condition or between conditions, e.g. disease and normal, based on a given gene annotation and mapped RNA-Seq reads. All methods are based on a Poisson model that describes the random placement of reads along a transcript. The methods are applied to a dataset from a human embryonic kidney (HEK) cell line and a B cell line. Several thousand AEEs were predicted in these cell lines. The robustness and correctness of the predictions were assessed by simulations, bootstrapping, and RT-PCR validation experiments. In addition, a comparison of splicing predictions from RNA-Seq with predictions from exon arrays shows higher sensitivity and accuracy for the RNA-Seq-based predictions. 
Second, a new method for inferring isoform expression levels from RNA-Seq data is proposed, given annotated ...*
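The abstract states only that the AEE detection methods rest on a Poisson model of random read placement along a transcript. The sketch below illustrates what such a model can look like; it is not the thesis's actual method. It assumes uniform coverage, per-exon read counts, a per-position read rate estimated from the gene's other exons, and (for the between-condition case) a conditional binomial test that follows from two independent Poisson counts. All function names are hypothetical.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed in log space to avoid underflow."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0  # all probability mass sits at X = 0
    log_terms = [i * math.log(mu) - mu - math.lgamma(i + 1) for i in range(k + 1)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def exon_skipping_pvalue(exon_reads: int, exon_len: int,
                         other_reads: int, other_len: int) -> float:
    """Within one condition (hypothetical helper): one-sided p-value that the
    exon receives fewer reads than uniform placement along the gene predicts.

    The per-position read rate is estimated from the gene's other exons and the
    exon's count is modelled as Poisson(rate * exon_len). Assumption: the
    plug-in rate is treated as exact, ignoring its own estimation error."""
    rate = other_reads / other_len
    return poisson_cdf(exon_reads, rate * exon_len)


def _binom_logpmf(k: int, n: int, p: float) -> float:
    """log P(Y = k) for Y ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def differential_exon_pvalue(exon_a: int, exon_b: int,
                             gene_a: int, gene_b: int) -> float:
    """Between conditions A and B (hypothetical helper): two-sided p-value that
    the exon's share of the gene's reads differs between the conditions.

    If exon_a ~ Poisson(mu_a) and exon_b ~ Poisson(mu_b) independently, then
    given n = exon_a + exon_b, exon_a ~ Binomial(n, mu_a / (mu_a + mu_b)).
    Under the null hypothesis this proportion equals the gene-level proportion,
    estimated here from the gene's other-exon counts (plug-in assumption)."""
    if gene_a <= 0 or gene_b <= 0:
        raise ValueError("gene-level counts must be positive in both conditions")
    n = exon_a + exon_b
    p = gene_a / (gene_a + gene_b)
    observed = _binom_logpmf(exon_a, n, p)
    # Two-sided: add up every outcome that is no more likely than the observed one.
    total = 0.0
    for k in range(n + 1):
        lp = _binom_logpmf(k, n, p)
        if lp <= observed + 1e-9:
            total += math.exp(lp)
    return min(1.0, total)
```

For example, an exon of 100 positions covered by 2 reads, in a gene whose other exons carry about one read per position, gets a p-value far below 1e-6 from `exon_skipping_pvalue(2, 100, 1000, 1000)`, flagging a candidate skipping event. Because every annotated exon is tested, such p-values would in practice need a multiple-testing correction before calling AEEs.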