Output list
Preprint
Median of heaps: linear-time selection by recursively constructing binary heaps
Posted to a preprint site 24/04/2023
arXiv (Cornell University)
The first worst-case linear-time selection algorithm was discovered in 1973; however, linear-time binary heap construction was first published in 1964. Here we describe another worst-case linear-time selection algorithm, which is simple to implement and uses binary heap construction as its principal engine. The algorithm operates in place and is shown to perform similarly to in-place median of medians.
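For context, a minimal sketch of the linear-time, bottom-up (Floyd-style) binary heap construction that the abstract names as the algorithm's principal engine is shown below; it illustrates only that building block, not the median-of-heaps selection procedure itself.

    // Bottom-up max-heap construction: sift each internal node down,
    // starting from the last internal node. Total work is O(n).
    #include <cstddef>
    #include <utility>
    #include <vector>

    // Restore the max-heap property for the subtree rooted at index i,
    // where v[0..n) is stored as an implicit binary heap.
    template <typename T>
    void sift_down(std::vector<T>& v, std::size_t i, std::size_t n) {
      while (2 * i + 1 < n) {
        std::size_t child = 2 * i + 1;                  // left child
        if (child + 1 < n && v[child] < v[child + 1])   // prefer the larger child
          ++child;
        if (!(v[i] < v[child])) break;                  // heap property already holds
        std::swap(v[i], v[child]);
        i = child;
      }
    }

    // Build a max-heap in place in O(n) time.
    template <typename T>
    void make_heap_linear(std::vector<T>& v) {
      const std::size_t n = v.size();
      for (std::size_t i = n / 2; i-- > 0;) sift_down(v, i, n);
    }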
Preprint
Adversarial network training using higher-order moments in a modified Wasserstein distance
Posted to a preprint site 07/10/2022
arXiv (Cornell University)
Generative adversarial networks (GANs) have been used to produce data closely resembling example data in a compressed, latent space that is close to sufficient for reconstruction in the original vector space. The Wasserstein metric has been used as an alternative to binary cross-entropy, producing more numerically stable GANs with greater mode-covering behavior. Here, a generalization of the Wasserstein distance, using higher-order moments than the mean, is derived. Training a GAN with this higher-order Wasserstein metric is demonstrated to exhibit superior performance, even when adjusted for its slightly higher computational cost. This is illustrated by generating synthetic antibody sequences.
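For orientation, the standard WGAN critic objective is the Kantorovich–Rubinstein dual form of the Wasserstein-1 distance; a moment-generalized variant in the spirit of the abstract might compare the first $K$ moments of the critic's outputs rather than the means alone. The weights $\lambda_k$ and the exact combination below are illustrative assumptions, not the preprint's derivation.

$$W_1(P_r, P_g) \;=\; \sup_{\|f\|_L \le 1} \; \mathbb{E}_{x \sim P_r}[f(x)] \;-\; \mathbb{E}_{\tilde{x} \sim P_g}[f(\tilde{x})]$$

$$D_K(P_r, P_g) \;=\; \sup_{\|f\|_L \le 1} \; \sum_{k=1}^{K} \lambda_k \Big( \mathbb{E}_{x \sim P_r}\big[f(x)^k\big] \;-\; \mathbb{E}_{\tilde{x} \sim P_g}\big[f(\tilde{x})^k\big] \Big)$$

Taking $K = 1$ and $\lambda_1 = 1$ recovers the usual Wasserstein critic objective.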
Preprint
Optimally selecting the top K values from X+Y with layer-ordered heaps
Posted to a preprint site 30/01/2020
arXiv (Cornell University)
Selection and sorting the Cartesian sum, $X+Y$, are classic and important problems. Here, a new algorithm is presented, which generates the top $k$ values of the form $X_i+Y_j$. The algorithm relies only on median-of-medians and is simple to implement. Furthermore, it uses data structures contiguous in memory, and is fast in practice. The presented algorithm is demonstrated to be theoretically optimal.
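For comparison only, the sketch below shows a standard heap-based baseline for the same task (selecting the $k$ largest $X_i+Y_j$ after sorting both inputs); it is not the preprint's layer-ordered-heap algorithm, and its $O(n \log n + k \log k)$ cost is what the optimal $O(n + k)$ method improves upon.

    // Baseline: best-first search over index pairs with a max-heap.
    // Uses only the standard library; shown for contrast, not as the
    // preprint's algorithm.
    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <queue>
    #include <set>
    #include <vector>

    std::vector<double> top_k_pairwise_sums(std::vector<double> X,
                                            std::vector<double> Y,
                                            std::size_t k) {
      std::vector<double> result;
      if (X.empty() || Y.empty() || k == 0) return result;
      std::sort(X.begin(), X.end(), std::greater<double>());  // descending
      std::sort(Y.begin(), Y.end(), std::greater<double>());

      typedef std::pair<std::size_t, std::size_t> Index;
      typedef std::pair<double, Index> Entry;                 // (sum, (i, j))
      std::priority_queue<Entry> frontier;                    // max-heap on the sum
      std::set<Index> seen;

      frontier.push(Entry(X[0] + Y[0], Index(0, 0)));
      seen.insert(Index(0, 0));

      while (!frontier.empty() && result.size() < k) {
        Entry e = frontier.top();
        frontier.pop();
        result.push_back(e.first);
        std::size_t i = e.second.first, j = e.second.second;
        // A popped pair can only be followed by its right/down neighbors.
        if (i + 1 < X.size() && seen.insert(Index(i + 1, j)).second)
          frontier.push(Entry(X[i + 1] + Y[j], Index(i + 1, j)));
        if (j + 1 < Y.size() && seen.insert(Index(i, j + 1)).second)
          frontier.push(Entry(X[i] + Y[j + 1], Index(i, j + 1)));
      }
      return result;
    }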
Preprint
Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture
Posted to a preprint site 02/08/2017
arXiv (Cornell University)
The bit-reversed permutation is a famous task in signal processing and is key to efficient implementation of the fast Fourier transform. This paper presents optimized C++11 implementations of five extant methods for computing the bit-reversed permutation: Stockham auto-sort, naive bitwise swapping, swapping via a table of reversed bytes, local pairwise swapping of bits, and swapping via a cache-localized matrix buffer. Three new strategies for performing the bit-reversed permutation in C++11 are proposed: an inductive method using the bitwise XOR operation, a template-recursive closed form, and a cache-oblivious template-recursive approach, which reduces the bit-reversed permutation to smaller bit-reversed permutations and a square matrix transposition. These new methods are compared to the extant approaches in terms of theoretical runtime, empirical compile time, and empirical runtime. The template-recursive cache-oblivious method is shown to be competitive with the fastest known method; however, we demonstrate that the cache-oblivious method can more readily benefit from parallelization on multiple cores and on the GPU.
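As a point of reference, a minimal sketch of the naive bitwise-swapping strategy (one of the extant methods listed above, not one of the paper's new template-recursive approaches) is:

    #include <cstddef>
    #include <cstdint>
    #include <utility>
    #include <vector>

    // Reverse the lowest `bits` bits of x.
    inline std::uint64_t reverse_bits(std::uint64_t x, unsigned bits) {
      std::uint64_t r = 0;
      for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
      }
      return r;
    }

    // Apply the bit-reversed permutation in place; v.size() must equal 2^bits.
    template <typename T>
    void bit_reverse_permute(std::vector<T>& v, unsigned bits) {
      for (std::uint64_t i = 0; i < v.size(); ++i) {
        const std::uint64_t j = reverse_bits(i, bits);
        if (i < j) std::swap(v[i], v[j]);  // swap each pair exactly once
      }
    }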
Preprint
TRIOT: Faster tensor manipulation in C++11
Posted to a preprint site 31/03/2017
arXiv (Cornell University)
The Art, Science, and Engineering of Programming, 2017, Vol. 1, Issue 2, Article 6 (abstract abridged). Context: Multidimensional arrays are used by many different algorithms. As such, indexing and broadcasting complex operations over multidimensional arrays are ubiquitous tasks and can be performance limiting. Inquiry: Simultaneously indexing two or more multidimensional arrays with different shapes (e.g., copying data from one tensor to another larger, zero-padded tensor in anticipation of a convolution) is difficult to do efficiently: hard-coded nested for loops in C, Fortran, and Go cannot be applied when the dimension of a tensor is unknown at compile time. Likewise, boost::multi_array cannot be used unless the dimensions of the array are known at compile time, and the style of implementation restricts the user from using the index tuple inside a vectorized operation (as would be required to compute an expected value of a multidimensional distribution). On the other hand, iteration methods that do not require the dimensionality or shape to be known at compile time (e.g., incrementing and applying carry operations to index tuples or remapping integer indices in the flat array) can be substantially slower than hard-coded nested for loops. ... Importance: Manipulation of multidimensional arrays is a common task in software, especially in high-performance numerical methods. This paper proposes a novel way to leverage template recursion to iterate over and apply operations to multidimensional arrays, and then demonstrates the superior performance and flexibility of operations that can be achieved using this new approach.
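A minimal sketch of the underlying idea, template recursion that unrolls one loop per tensor axis at compile time, is shown below; this is not the TRIOT interface, and unlike TRIOT it assumes the dimensionality D is a compile-time constant.

    #include <array>
    #include <cstddef>

    // Recursive case: loop over axis AXIS, then recurse into the next axis.
    template <std::size_t D, std::size_t AXIS = 0>
    struct ForEachIndex {
      template <typename F>
      static void apply(const std::array<std::size_t, D>& shape,
                        std::array<std::size_t, D>& index, F& f) {
        for (index[AXIS] = 0; index[AXIS] < shape[AXIS]; ++index[AXIS])
          ForEachIndex<D, AXIS + 1>::apply(shape, index, f);
      }
    };

    // Base case: all axes are fixed, so invoke the operation on the index tuple.
    template <std::size_t D>
    struct ForEachIndex<D, D> {
      template <typename F>
      static void apply(const std::array<std::size_t, D>&,
                        std::array<std::size_t, D>& index, F& f) {
        f(index);
      }
    };

    // Apply f to every index tuple of a tensor with the given shape.
    template <std::size_t D, typename F>
    void for_each_index(const std::array<std::size_t, D>& shape, F f) {
      std::array<std::size_t, D> index{};  // zero-initialized index tuple
      ForEachIndex<D>::apply(shape, index, f);
    }

A call such as for_each_index<3>(shape, op) then behaves like three hard-coded nested for loops while keeping the full index tuple available inside op.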
Preprint
Detrended real total return factor is more predictive of future 10-year real returns than CAPE
Date created 04/03/2025
Cyclically adjusted price-to-earnings (CAPE) is famous for its predictive power in estimating long-term returns. Recent multiple expansions have produced CAPE values more extreme than at any time other than the dot-com bubble in the late 1990s. Here we show that detrending the real total return factor relative to a long-term exponential growth model is more predictive of future real returns than CAPE. This approach is oblivious to both earnings and the risk-free rate. We also show that the predictive power of CAPE is much decreased after removing this signal. The detrended real total return factor estimates valuations to be stretched relative to historical data, but slightly less so than suggested by CAPE; even so, both CAPE and the method presented here estimate that annualized future 10-year real returns beginning Feb 1, 2024 will be ∈ [2.656%, 2.973%] without conditioning on additional macro factors.
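One natural reading of the detrending step, with the fitting details assumed here rather than taken from the preprint, is to fit a long-term exponential growth model to the real total return factor $TR(t)$ by least squares on its logarithm and to treat the ratio of the observed factor to the fitted trend as the valuation signal:

$$\widehat{\log TR}(t) \;=\; a + b\,t \quad \text{(least-squares fit over the historical record)}, \qquad D(t) \;=\; \frac{TR(t)}{e^{\,a + b\,t}}$$

A larger $D(t)$ then corresponds to a more stretched valuation, playing the role that a high CAPE reading plays in the conventional approach.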