This article is a preprint. It may not have been peer reviewed.
Abstract
This study examines the query performance of the NBC++ (Incremental Naive Bayes Classifier) program under variations in canonicality, k-mer size, database, and input sample data size. NBC++ can successfully assess a wide range of superkingdoms using a small training database. We demonstrate that both NBC++ and Kraken2 are affected by database depth, with macro measures increasing with depth, but that the full diversity of life, especially viruses, remains a challenge for these classifiers. NBC++ spends less time training, but at the cost of long querying times. The major enhancements are support for canonical k-mer storage (with major storage savings); adaptable, optimized memory allocation that speeds up query analysis and allows the classifier to run on almost any system; and output of the log-likelihood values against each training genome, which provides users with valuable confidence information.
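The abstract names two mechanisms without giving formulas: canonical k-mer storage and per-genome log-likelihood output. The sketch below illustrates both under stated assumptions and is not NBC++'s code. It assumes a k-mer's canonical form is the lexicographically smaller of the k-mer and its reverse complement, and it scores a read with a multinomial naive Bayes model using add-one (Laplace) smoothing, which may differ from NBC++'s exact scoring. The function names (`canonical`, `kmers`, `vocab_size`, `train`, `log_likelihoods`) and the toy genomes are hypothetical.

```python
import math
from collections import Counter

# Assumption: canonical form = min(k-mer, reverse complement), a common convention.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int, use_canonical: bool = True):
    """Yield the k-mers of seq, skipping windows that contain non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield canonical(kmer) if use_canonical else kmer


def vocab_size(k: int, use_canonical: bool = True) -> int:
    """Number of possible distinct k-mers (canonical storage roughly halves it)."""
    if not use_canonical:
        return 4 ** k
    # Reverse-complement palindromes only exist for even k; they map to themselves.
    palindromes = 4 ** (k // 2) if k % 2 == 0 else 0
    return (4 ** k + palindromes) // 2


def train(genomes: dict, k: int, use_canonical: bool = True) -> dict:
    """Count k-mers for each training genome (a hypothetical in-memory database)."""
    return {name: Counter(kmers(seq, k, use_canonical)) for name, seq in genomes.items()}


def log_likelihoods(read: str, model: dict, k: int, use_canonical: bool = True) -> dict:
    """Log-likelihood of a read against every training genome, with add-one smoothing."""
    vocab = vocab_size(k, use_canonical)
    read_kmers = list(kmers(read, k, use_canonical))
    scores = {}
    for name, counts in model.items():
        total = sum(counts.values())
        # log P(read | genome) = sum over read k-mers of log((count + 1) / (total + V))
        scores[name] = sum(math.log((counts[km] + 1) / (total + vocab)) for km in read_kmers)
    return scores


# Usage on toy data: the read shares k-mers with g1 only.
genomes = {"g1": "ACGTACGTACGT", "g2": "GGGGCCCCGGGG"}
model = train(genomes, k=3)
scores = log_likelihoods("ACGTAC", model, k=3)
best = max(scores, key=scores.get)  # "g1"
```

Returning the full `scores` dictionary rather than only `best` mirrors the idea of reporting a log-likelihood against each training genome: the gap between the top score and the runner-up gives a rough sense of how confident the assignment is. On the toy genome `g1`, canonical storage keeps 2 distinct 3-mers instead of 4, which illustrates where the storage savings come from.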