Abstract
The Art, Science, and Engineering of Programming, 2017, Vol. 1,
Issue 2, Article 6 [abridged] Context: Multidimensional arrays are used by many different
algorithms. As such, indexing and broadcasting complex operations over
multidimensional arrays are ubiquitous tasks and can be performance limiting.
Inquiry: Simultaneously indexing two or more multidimensional arrays with
different shapes (e.g., copying data from one tensor to another larger, zero
padded tensor in anticipation of a convolution) is difficult to do efficiently:
Hard-coded nested for loops in C, Fortran, and Go cannot be applied when the
dimension of a tensor is unknown at compile time. Likewise, boost::multi_array
cannot be used unless the dimensions of the array are known at compile time,
and the style of implementation restricts the user from using the index tuple
inside a vectorized operation (as would be required to compute an expected
value of a multidimensional distribution). On the other hand, iteration methods
that do not require the dimensionality or shape to be known at compile time
(e.g., incrementing and applying carry operations to index tuples or remapping
integer indices in the flat array), can be substantially slower than hard-coded
nested for loops. ... Importance: Manipulation of multidimensional arrays is a
common task in software, especially in high performance numerical methods. This
paper proposes a novel way to leverage template recursion to iterate over and
apply operations to multidimensional arrays, and then demonstrates the superior
performance and flexibility of operations that can be achieved using this new
approach.