
NanoBLASTer

Basic Local Alignment and Search Tool for Oxford Nanopore Long Sequences



Current Version: 0.16
Release Date: October 13, 2015
Platform: Linux x64 system

Background

Oxford Nanopore's new MinION instrument is a handheld single-molecule DNA sequencer that uses nanopore technology to produce long reads. However, the quality of the reads has, to date, been lower than that of other technologies, prompting great interest in developing new algorithms that can make use of the data. So far, alignment methods including LAST, BLAST, BWA-MEM and GraphMap have been used to analyze these sequences. However, each of these tools poses significant challenges with these data: LAST and BLAST require considerable processing time for high sensitivity, BWA-MEM has the smallest average alignment length, and GraphMap aligns many random strings with moderate accuracy.

Results

To address these challenges, we developed a new read aligner called NanoBLASTer, specifically designed for long nanopore reads. In experiments resequencing the well-studied S. cerevisiae (yeast) and Escherichia coli (E. coli) genomes, we show that our algorithm produces longer alignments with higher overall sensitivity than LAST, BLAST and BWA-MEM. We also show that NanoBLASTer runs faster than GraphMap, BLAST and BWA-MEM. Finally, we show that NanoBLASTer is the most specific of the aligners.

Conclusion

Using our improved aligner, we characterize the accuracy of nanopore reads, and present our insights into the biases and applications of the new sequencing technology.

Methods

NanoBLASTer uses fixed-size exact-matching seeds followed by dynamic programming (DP) based extension. However, because of the high error rate of nanopore sequencing instruments (approximately 10% to 40% base error rate), the seeds must be extremely short and therefore provide relatively little specificity. NanoBLASTer overcomes this challenge and improves the specificity of short seeds by clustering neighboring seeds into mapping regions, and then identifying highly similar segments, which we call ANCHORs, from the clustered seeds. The alignments are generated by extending the top-scoring candidate ANCHORs with a block-wise banded sequence alignment algorithm. In short, NanoBLASTer aligns long noisy reads in four stages: seeding, seed clustering, ANCHOR selection, and ANCHOR extension.
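The seeding and clustering idea described above can be illustrated with a minimal Python sketch. It is not NanoBLASTer's implementation: the function names, the `band` parameter, and the diagonal-binning rule are assumptions chosen for illustration, and ANCHOR selection and the block-wise banded extension are omitted.

```python
from collections import defaultdict


def index_reference(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_seeds(read, index, k):
    """Return (read_pos, ref_pos) pairs for every exact k-mer match."""
    seeds = []
    for read_pos in range(len(read) - k + 1):
        kmer = read[read_pos:read_pos + k]
        # .get() avoids adding empty entries to the defaultdict
        for ref_pos in index.get(kmer, ()):
            seeds.append((read_pos, ref_pos))
    return seeds


def cluster_seeds(seeds, band=20):
    """Group seeds by diagonal (ref_pos - read_pos), binned into `band`-wide bins.

    Hypothetical clustering rule for illustration only; returns the
    clusters sorted from largest to smallest.
    """
    clusters = defaultdict(list)
    for read_pos, ref_pos in seeds:
        clusters[(ref_pos - read_pos) // band].append((read_pos, ref_pos))
    return sorted(clusters.values(), key=len, reverse=True)
```

Seeds from a true alignment share nearly the same diagonal (reference position minus read position), so they accumulate in one bin, while spurious hits from short seeds scatter across many bins. A fixed bin boundary can split neighboring diagonals, which a real clustering step would need to handle.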

Nanopore sequencing data: Oxford Nanopore Data for Reference and Reads

Please contact moamin@cs.stonybrook.edu for a quick response on any bug report or feature request.

Installation

Follow these steps to install NanoBLASTer from source:

Clone NanoBLASTer source code: git clone https://github.com/ruhulsbu/NanoBLASTer.git
Go to the NanoBLASTer source directory: cd NanoBLASTer/nano_src 
Build the NanoBLASTer project: make

Input specifications

Use the following options to run NanoBLASTer:
-C: To specify one of the Parameters: -C10, -C25, or -C50
-r: To specify the name of Reference file (FASTA format)
-i: To specify the name of Reads file (FASTA format)
-o: To specify the prefix of Output file
-k: To specify the size of KMER
-a: To specify the size of ANCHOR
-l: To specify the min number of Clusters
-s: To run the program at higher sensitivity
-n: To specify the Number of reads to be aligned
-g: To specify the interval (or Gap) length between KMERs
-X: To configure NanoBLASTer for less memory using Single index
-h, or -?: To print this Help information.

Usage examples

Run NanoBLASTer in "fast" mode (KMER=13, ANCHOR=45 and CLUSTERS=10):
$ ./nanoblaster -C10 -r path/to/reference.fa -i path/to/reads.fa -o output

Run NanoBLASTer in "sensitive" mode (KMER=11, ANCHOR=40 and CLUSTERS=25):
$ ./nanoblaster -C25 -r path/to/reference.fa -i path/to/reads.fa -o output

Run NanoBLASTer in "highly sensitive" mode (KMER=11, ANCHOR=40, CLUSTERS=50 and SENSITIVITY=TRUE):
$ ./nanoblaster -C50 -r path/to/reference.fa -i path/to/reads.fa -o output

Run NanoBLASTer with default KMER=11, ANCHOR=40 and CLUSTERS=10:
$ ./nanoblaster -r path/to/reference.fa -i path/to/reads.fa -o output

Run NanoBLASTer at higher sensitivity with KMER=13, ANCHOR=45 and CLUSTERS=25 set explicitly, leaving the other parameters at their defaults:
$ ./nanoblaster -r path/to/reference.fa -i path/to/reads.fa -o output -k13 -a45 -l25 -s

Run NanoBLASTer in "fast" mode using "single index":
$ ./nanoblaster -C10 -X -r path/to/reference.fa -i path/to/reads.fa -o output

Run NanoBLASTer in "fast" mode using "interval=4":
$ ./nanoblaster -C10 -g 4 -r path/to/reference.fa -i path/to/reads.fa -o output
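
When running many samples, the command lines above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of NanoBLASTer. It uses only flags listed under Input specifications, written in the forms shown in the usage examples, and assumes the executable sits at `./nanoblaster`.

```python
def build_command(reference, reads, output, preset=None, kmer=None,
                  anchor=None, clusters=None, sensitive=False,
                  single_index=False, interval=None,
                  executable="./nanoblaster"):
    """Return the argument list for one NanoBLASTer run (hypothetical helper)."""
    cmd = [executable]
    if preset is not None:
        # The documented presets are -C10, -C25 and -C50.
        if preset not in (10, 25, 50):
            raise ValueError("preset must be 10, 25 or 50")
        cmd.append(f"-C{preset}")
    if single_index:
        cmd.append("-X")
    if interval is not None:
        cmd += ["-g", str(interval)]
    cmd += ["-r", reference, "-i", reads, "-o", output]
    if kmer is not None:
        cmd.append(f"-k{kmer}")
    if anchor is not None:
        cmd.append(f"-a{anchor}")
    if clusters is not None:
        cmd.append(f"-l{clusters}")
    if sensitive:
        cmd.append("-s")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the Python standard library, which avoids shell quoting problems with unusual file names.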

Optimize configurations

Edit the configuration in the constant.h file to tune NanoBLASTer alignment manually. Editing the following constants will have an overall impact on the alignment quality of NanoBLASTer:

Editing the following constants will have an impact on the sensitivity and runtime of NanoBLASTer. These constants are defined with the default values for NanoBLASTer. Some of these default values have been optimized for the "fast", "sensitive" and "highly sensitive" modes, so these three preset modes will not be affected even if the following constants are edited. It is recommended to define -k, -a and -l when the following constants are changed manually:

Finally, define -k, -a and -l when running the ./nanoblaster executable with these changed constants to get the expected performance.

Currently, the sensitivity modes (C10, C25 and C50) are defined to align data sets with very low accuracy (~65%). If the accuracy of the data set is ~85%, modify the KMER and/or ANCHOR size when specifying the sensitivity mode to get better runtime performance.
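
One way to reason about this trade-off is a simplified model in which each base is read correctly, independently of the others, with probability equal to the read accuracy. This is an assumption made for illustration; real nanopore errors are not independent. Under that model a k-mer is error-free with probability accuracy^k. The sketch below, with hypothetical function names, evaluates this for the KMER sizes and accuracies mentioned above.

```python
def exact_kmer_probability(accuracy, k):
    """Probability that k consecutive bases are all correct (independence assumed)."""
    return accuracy ** k


def expected_exact_kmers(read_length, accuracy, k):
    """Expected number of error-free k-mer windows in a read of the given length."""
    windows = max(read_length - k + 1, 0)
    return windows * exact_kmer_probability(accuracy, k)


if __name__ == "__main__":
    for accuracy, k in [(0.65, 11), (0.65, 13), (0.85, 11), (0.85, 13)]:
        p = exact_kmer_probability(accuracy, k)
        print(f"accuracy={accuracy:.2f} k={k}: P(exact k-mer)={p:.4f}")
```

Under this model, a 13-mer at ~85% accuracy survives intact more than ten times as often as an 11-mer at ~65% accuracy, which suggests why longer seeds can remain usable on more accurate data.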

Contact information

Please send your comments or bug reports to moamin@cs.stonybrook.edu