Performing Selection on a Monotonic Function in Lieu of Sorting Using Layer-Ordered Heaps

Kyle Lucke; Jake Pennington; Patrick Kreitzberg; Lukas Käll; Oliver Serang

doi:10.1021/acs.jproteome.0c00711

Journal article

Peer reviewed

Journal of proteome research, Vol.20(4), pp.1849-1854

02/04/2021

DOI: https://doi.org/10.1021/acs.jproteome.0c00711

PMID: 33529032

Keywords: algorithms; peptide search; Percolator; layer-ordered heap; partition; performance; tandem mass spectrometry; nonparametric statistical test; sorting; false discovery rate

Abstract

Nonparametric statistical tests are an integral part of scientific experiments in a diverse range of fields. When performing such tests, it is standard to sort values; however, this requires Ω(n log(n)) time to sort n values. Thus, given enough data, sorting becomes the computational bottleneck, even with highly optimized implementations such as the C++ standard library routine, std::sort. Frequently, a nonparametric statistical test is only used to partition values above and below a threshold in the sorted ordering, where the threshold corresponds to a significant statistical result. Linear-time selection and partitioning algorithms cannot be directly used because the selection and partitioning are performed on the transformed statistical significance values rather than on the sorted statistics. Usually, those transformed statistical significance values (e.g., the p value when investigating the family-wise error rate and q values when investigating the false discovery rate (FDR)) can only be computed at a threshold. Because this threshold is unknown, this leads to sorting the data. Layer-ordered heaps, which can be constructed in O(n), only partially sort values and thus can be used to get around the slow runtime required to fully sort. Here we introduce a layer-ordering-based method for selection and partitioning on the transformed values (e.g., p values or q values). We demonstrate the use of this method to partition peptides using an FDR threshold. This approach is applied to speed up Percolator, a postprocessing algorithm used in mass-spectrometry-based proteomics to evaluate the quality of peptide-spectrum matches (PSMs), by >70% on data sets with 100 million PSMs.
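
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and not Percolator code. All names (`split_smallest`, `layer_ordered_heap`, `count_accepted`, `alpha`, `max_decoys`) are hypothetical. The sketch assumes each item is a tuple `(-score, is_decoy)`, so ascending order means best score first. It uses randomized quickselect, which gives expected rather than worst-case linear time, and it uses a cumulative decoy count as a simplified stand-in for a monotonic statistic such as a q value.

```python
import random


def split_smallest(items, k, rng):
    """Return (low, high): `low` holds the k smallest items, `high` the rest.

    Randomized quickselect with a three-way partition (expected linear time).
    """
    low, high = [], []
    pool = items
    while pool:
        need = k - len(low)
        if need <= 0:              # nothing more belongs in `low`
            high.extend(pool)
            break
        if need >= len(pool):      # everything left belongs in `low`
            low.extend(pool)
            break
        pivot = rng.choice(pool)
        less = [x for x in pool if x < pivot]
        equal = [x for x in pool if x == pivot]
        more = [x for x in pool if x > pivot]
        if need < len(less):
            high.extend(equal)
            high.extend(more)
            pool = less
        elif need <= len(less) + len(equal):
            cut = need - len(less)
            low.extend(less)
            low.extend(equal[:cut])
            high.extend(equal[cut:])
            high.extend(more)
            pool = []
        else:
            low.extend(less)
            low.extend(equal)
            pool = more
    return low, high


def layer_ordered_heap(values, alpha=2.0, rng=None):
    """Split `values` into layers with max(layer i) <= min(layer i + 1).

    Layer sizes grow roughly geometrically by `alpha` (assumed > 1);
    items inside a layer are left unsorted.
    """
    rng = rng or random.Random(0)
    rest = list(values)
    n = len(rest)
    sizes, total, size = [], 0, 1.0
    while total < n:
        s = min(int(size), n - total)
        sizes.append(s)
        total += s
        size *= alpha
    layers = []
    # Peel layers off from the largest values down to the smallest.
    for s in reversed(sizes):
        rest, top = split_smallest(rest, len(rest) - s, rng)
        layers.append(top)
    layers.reverse()
    return layers


def count_accepted(layers, max_decoys):
    """Largest k such that the k best items hold at most `max_decoys` decoys.

    The decoy count among the top k never decreases as k grows, so whole
    layers are accepted from per-layer counts alone; only the single layer
    where the limit is crossed is sorted.
    """
    accepted = decoys = 0
    for layer in layers:
        layer_decoys = sum(1 for _, is_decoy in layer if is_decoy)
        if decoys + layer_decoys <= max_decoys:
            accepted += len(layer)
            decoys += layer_decoys
            continue
        for _, is_decoy in sorted(layer):
            if is_decoy and decoys == max_decoys:
                return accepted
            decoys += is_decoy
            accepted += 1
    return accepted
```

Calling `count_accepted(layer_ordered_heap(items), max_decoys)` gives the same cutoff as scanning the fully sorted list, but sorts at most one layer. The real FDR and q-value computations in the article are more involved than a plain decoy count; the sketch only shows why a statistic that is monotonic in rank can be thresholded from layer boundaries.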

Details

Creators: Kyle Lucke (University of Montana); Jake Pennington (University of Montana); Patrick Kreitzberg (University of Montana); Lukas Käll (Science for Life Laboratory); Oliver Serang (University of Montana)
Identifiers: 991019559549907081
Language: English
Resource Type: Journal article