Please use this identifier to cite or link to this item: http://hdl.handle.net/2381/42494
Title: Exploiting Public Human Genome NGS Datasets to Characterize Repetitive DNA and Recover Assembly Gaps
Authors: Ogeh, Denye Nathaniel
Supervisors: Badge, Richard
Award date: 11-May-2018
Presented at: University of Leicester
Abstract: With the advent of Next Generation Sequencing (NGS), we have witnessed the generation of enormous volumes of short-read sequence data, cheaply and on short time scales. Nevertheless, the quality of genome assemblies generated using NGS technologies has been adversely affected by this innovation, compared to those generated using Sanger DNA sequencing. This is largely due to the inability of short-read sequence data alone to scaffold repetitive structures, creating gaps, inversions and rearrangements, and ultimately resulting in assemblies that are, at best, drafts (by draft we mean an assembly that is only a preliminary result, requiring further work to make it a more complete and accurate representation of the genome). Single molecule long-read sequencing (SMS) technologies, on the other hand, address this challenge by generating sequences with greatly increased read lengths, offering the prospect of better recovering these complex repetitive structures and concomitantly improving assembly quality. Following this development, we evaluate the ability of SMS data (specifically Pacific Biosciences SMRT data and Oxford Nanopore MinION data from human genomes) to recover poorly represented repetitive sequences (specifically, GC-rich human minisatellites), identify novel transposable element insertions and enable the closing of gapped regions. Our results show that, using single molecule long-read sequencing together with custom software scalable for the analysis of single molecule long reads (particularly Pacific Biosciences’ SMRT technology), poorly represented repetitive sequences (specifically, minisatellites and L1s) and other missing elements in published human genome assemblies can be characterized. The tool is cross-platform, giving computational and non-computational biologists a straightforward and less technical platform for local analysis of specific poorly characterized DNA sequences.
Type: Thesis
Level: Doctoral
Qualification: PhD
Rights: Copyright © the author. All rights reserved.
Appears in Collections: Leicester Theses; Theses, Dept. of Genetics

Files in This Item:
File                 Description    Size       Format
2018OGEHDNPhD.pdf    Thesis         3.68 MB    Adobe PDF

