Software implementations and algorithms that solve a particular problem are inherently not future-proof: an algorithm appropriate for the problem and execution environment has to be instantiated into software, which locks in many choices. However, the actual operation (the answer for a given input) remains invariant under changes in the execution environment and in the required fidelity. Our layered approach to application specification leverages this insight: we capture the systematic process by which the specification of a desired operation is turned into an algorithm and subsequently instantiated into an actual implementation.
We target a DARPA-relevant class of problems that is captured through regular computation over multi-dimensional data. Important signal and image processing operations such as synthetic aperture radar (SAR) and space-time adaptive processing (STAP) for target detection fall into this class and run on DARPA-relevant high performance embedded computing (HPEC) platforms. This class also includes simulations and computer-aided design/engineering (CAD/CAE) problems on traditional HPC platforms, as well as embedded application kernels. For these problems, we propose a layered approach to specifying 1) the desired operation, 2) the paradigm (environment) in which the operation is computed, and 3) the specific details of the paradigm, i.e., the ISA, cache sizes, and the type of multi-threading available. These three layers map to changes required in the operation, the algorithms, and the implementations, respectively. Separating the specification into these three distinct but inter-related layers yields a natural separation of concerns, so the same specification can absorb different changes to the environment. For example, porting the operation to an architecture with a different ISA affects only the regions of the code touched by the platform specification, whereas changing from a sequential to a multicore environment requires changes to the choice of algorithm.
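The separation of concerns described above can be illustrated with a minimal sketch. All class, field, and function names below are hypothetical and are not part of the proposed specification language. The sketch also assumes that a change in a higher layer forces the lower layers to be regenerated, which the text implies but does not state outright.

```python
from dataclasses import dataclass, replace

# Hypothetical three-layer specification; names are illustrative only.
@dataclass(frozen=True)
class Operation:          # layer 1: what is computed
    name: str
    size: int

@dataclass(frozen=True)
class Paradigm:           # layer 2: environment in which it is computed
    kind: str             # e.g. "sequential" or "multicore"

@dataclass(frozen=True)
class Platform:           # layer 3: specific details of the paradigm
    isa: str
    cache_kb: int
    threads: int

@dataclass(frozen=True)
class Spec:
    operation: Operation
    paradigm: Paradigm
    platform: Platform

def stages_to_redo(old: Spec, new: Spec) -> list:
    """Return which artifacts must be re-derived after a specification change.

    Assumption: a change in one layer invalidates everything below it.
    """
    if old.operation != new.operation:
        return ["operation", "algorithm", "implementation"]
    if old.paradigm != new.paradigm:
        return ["algorithm", "implementation"]
    if old.platform != new.platform:
        return ["implementation"]
    return []

base = Spec(Operation("dft", 1024), Paradigm("sequential"), Platform("x86", 256, 1))
new_isa = replace(base, platform=replace(base.platform, isa="arm"))
multicore = replace(base, paradigm=Paradigm("multicore"))
```

In this sketch, `stages_to_redo(base, new_isa)` reports only the implementation, while `stages_to_redo(base, multicore)` also reports the algorithm, mirroring the two porting examples in the text.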
The lowering of abstractions from operation specification to algorithms, and subsequently to implementation, is performed through a systematic process of algebraic term rewriting. This preserves the correctness of the operation specification while allowing different algorithm specifications to be formally derived from the original operation specification. In addition, these systematic rewrites also allow analysis to carry over from one algorithm to another. We provide hooks for BRASS TA2 performers to fully query and reconfigure applications specified and built from our specifications. Our enabling technologies are the SPIRAL system, which has demonstrated performance portability across a wide variety of platforms for the domain of spectral methods; a calculus of loop invariants that allows one to derive algorithms with desired performance characteristics; and language/compiler technology from the Polaris group at UIUC.
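As a toy illustration of correctness-preserving rewriting, the sketch below rewrites an operation term `("dft", n)` into a radix-2 Cooley-Tukey algorithm term and checks that both terms compute the same answer. The term encoding, the `ct2` tag, and the function names are assumptions made for this example; this is not SPIRAL's actual rule language or rewriting engine.

```python
import cmath

def dft_direct(x):
    """Operation specification: the DFT by its definition, O(n^2)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def rewrite(term):
    """Rewrite ("dft", n) into ("ct2", n, sub) while n is even; else keep it."""
    kind, n = term[0], term[1]
    if kind == "dft" and n > 1 and n % 2 == 0:
        return ("ct2", n, rewrite(("dft", n // 2)))
    return term

def evaluate(term, x):
    """Interpret a term on input x; both term kinds denote the same DFT."""
    kind, n = term[0], term[1]
    assert len(x) == n
    if kind == "dft":
        return dft_direct(x)
    # kind == "ct2": radix-2 decimation in time using the sub-term for n/2
    sub = term[2]
    even = evaluate(sub, x[0::2])
    odd = evaluate(sub, x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out

spec = ("dft", 8)
algorithm = rewrite(spec)
x = [complex(v) for v in range(1, 9)]
max_error = max(abs(a - b) for a, b in zip(evaluate(spec, x), evaluate(algorithm, x)))
```

Because the rewrite rule is an algebraic identity, `max_error` stays at the level of floating-point rounding: the derived algorithm term has a different cost but the same meaning as the original specification.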
Our approach centers on the idea that future-proof code requires a layered specification of semantics, quality/parameter relationships, performance/quality/resource trade-offs, quality measures, and fault tolerance, together with a powerful code synthesis engine that turns this specification into reconfigurable, extensible, high performance implementations specialized to their execution environment. We base our proposed system on the SPIRAL code generation and autotuning system and philosophy, augmented with lessons learned from FFTW, BLIS, ATLAS, HTA, PetaBricks, and modern as well as traditional optimizing compiler approaches. The key idea is to turn SPIRAL’s extreme cross-platform compile-time portability into an extensible post-deployment reconfiguration capability. SPIRAL becomes akin to an installation/reconfiguration-time JIT that regenerates an FFTW-like implementation for a given specification and execution target. However, the implementation is not used as an autotuning library but as a framework for BRASS TA2 systems to reconfigure the application based on changes in the execution environment and in user/mission requirements. Finally, we define a user-friendly specification language that borrows from HTA and functional programming ideas to allow users to fully express application intent and trade-offs as well as execution environments. To provide an even easier on-ramp for legacy code, a subset of the functionality will be accessible through a delayed-evaluation interface that uses BLAS, LAPACK, FFTW, and a subset of C as a domain-specific specification language.
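The delayed-evaluation idea can be sketched as follows: library-style calls record an expression instead of computing a result, so that a generator can later inspect the whole expression and choose how to execute it. The functions `matrix`, `gemm`, and `transpose` are hypothetical stand-ins written for this example; they are not the BLAS, LAPACK, or FFTW APIs, and the evaluator is a naive placeholder for generated code.

```python
class Deferred:
    """Node of a recorded expression; nothing is computed at construction."""
    def __init__(self, op, args):
        self.op = op
        self.args = args

def matrix(rows):
    """Wrap concrete data (a list of rows) as a leaf node."""
    return Deferred("const", (rows,))

def gemm(a, b):
    """Hypothetical BLAS-style call: records a matrix product, does not run it."""
    return Deferred("gemm", (a, b))

def transpose(a):
    """Records a transposition, does not run it."""
    return Deferred("transpose", (a,))

def describe(node):
    """Render the recorded expression, i.e. what a generator would see."""
    if node.op == "const":
        return "M"
    return node.op + "(" + ", ".join(describe(a) for a in node.args) + ")"

def evaluate(node):
    """Naive reference evaluator standing in for a generated implementation."""
    if node.op == "const":
        return node.args[0]
    vals = [evaluate(a) for a in node.args]
    if node.op == "transpose":
        return [list(col) for col in zip(*vals[0])]
    if node.op == "gemm":
        a, b = vals
        return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
                 for j in range(len(b[0]))] for i in range(len(a))]
    raise ValueError("unknown operation: " + node.op)

a = matrix([[1, 2], [3, 4]])
expr = gemm(a, transpose(a))   # recorded only; no arithmetic has happened yet
```

Here `describe(expr)` exposes the full expression `gemm(M, transpose(M))` before any arithmetic runs, which is the point at which a synthesis engine could substitute a specialized implementation for the generic `evaluate`.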
P. D’Alberto, F. Franchetti, P. A. Milder, A. Sandryhaila, J. C. Hoe, J. M. F. Moura, and M. Püschel. Generating FPGA Accelerated DFT Libraries.
Proceedings of Field-Programmable Custom Computing Machines (FCCM), 2007.
M. Püschel, J. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code Generation for DSP Transforms.
Proceedings of the IEEE, Special Issue on "Program Generation, Optimization, and Adaptation," Vol. 93, No. 2, 2005, pages 232-275.