|
DARPA BRASS: |
|
|
|
|
|
|
|
Generating Hyper-Portable Future-Proof Computational Kernels with SPIRAL
(FA8750-16-2-0033) |
|
|
|
Franz Franchetti (PI), Mike Franusich (SpiralGen, Inc, Co-PI), James C. Hoe (Co-PI), Tze Meng Low (Co-PI) José M. F. Moura (Co-PI), David Padua (UIUC, Co-PI)
PhD Students: Richard Veras (PhD student, advisor: Franchetti), Thom Popovici (PhD student, advisor: Franchetti)
Engineers: Brian Duff, Jason Larkin
|
|
|
|
Overview
Software implementations and algorithms that solve a particular problem are inherently not future-proof as an algorithm appropriate for the problem and execution environment has to be instantiated into software that locks in many choices. However, the actual operation—the answer or a given input—remains invariant through execution environment changes and changes in required fidelity. Our layered approach to application specification leverages this insight: we capture the systematic process by which the specification of a desired operation is turned into an algorithm, and subsequently instantiated into an actual implementation.
We target a DARPA relevant class of problems that is captured through regular computation over multi-dimensional data. Important signal and image processing operations like synthetic aperture radar (SAR), and space-time adaptive processing (STAP) for target detection fall into this class and run on DARPA relevant high performance embedded computing (HPEC) platforms. In this class are also simulations and computer-aided design/engineering (CAD/CAE) problems on traditional HPC platforms and embedded application kernels. For these problems, we propose a layered approach to specifying 1) the desired operation, 2) the paradigm (environment) in which the operation is computed, and 3) the specific details of the paradigm i.e. the ISA, cache sizes, and the type of multi-threading available. These three layers map to changes required in the operation, algorithms, and implementations respectively. An implication of separating the specification into these three distinct but inter-related layers is that there is a natural separation of concerns, allowing the same specification to handle different changes to the environment. For example, when porting the operation to an architecture with a different ISA, this only affects the regions of the code touched by the platform specification, whereas changing from a sequential to multicore environment requires changes to the choice of algorithm.
The lowering of abstractions from operation specification to algorithms, and subsequently, to implementation is performed through a systematic process of algebraic term rewriting. This preserves the correctness of the operation specification, while allowing different algorithm specifications to be formally derived from the original operation specification. In addition, these systematic rewrites also allows analysis to carry over from one algorithm to another. We provide hooks for BRASS TA2 performers to fully query and reconfigure applications specified and built from our specifications. Our enabling technologies are the SPIRAL system that has demonstrated performance portability across a wide variety of platforms for the domain of spectral methods, a calculus of loop invariants that allows one to derive algorithms with desired performance characteristics, and language/compiler technology from the Polaris group at UIUC.
|
|
|
|
Approach
Our approach centers on the ideas that future-proof code requires alayered specification of semantics, quality/parameter relationships, performance/quality/resource trade-offs, quality measures and fault tolerance, and a powerful code synthesis engine that turns this specification into reconfigurable, extensible high performance implementations specialized to their execution environment. We base our proposed system on the SPIRAL code generation and autotuning system and philosophy, augmented with lessons learned from FFTW, BLIS, ATLAS, HTA, PetaBricks, and modern as well as traditional optimizing compiler approaches. The key idea is to turn SPIRAL’s extreme cross-platform compile-time portability into an extensible post-deployment reconfiguration capability. SPIRAL becomes akin to an installation/reconfiguration-time JIT that regenerates an FFTW-like implementation for a given specification and execution target. However, the implementation is not used as autotuning library but as a framework for BRASS TA2 systems to reconfigure the application based on execution environment changes and changes in user/mission requirements. Finally, we define a user-friendly specification language that borrows from HTA, and functional programming ideas to allow users to fully express application intent and trade-offs as well as execution environments. To provide an even easier on-ramp for legacy code, a subset of the functionality will be accessible through a delayed-evaluation interface using BLAS, LAPACK, FFTW and a subset of C as domain-specific specification language. |
|
|
|
Selected SPIRAL Publications (Underlying Technology)
M. Püschel, F. Franchetti, Y. Voronenko
Spiral
Encyclopedia of Parallel Computing, D. A. Padua (Editor).
F. Franchetti, F. de Mesmay, Daniel McFarlin, and M. Püschel
Operator Language: A Program Generation Framework for Fast Kernels
Proceedings of IFIP Working Conference on Domain Specific Languages (DSL WC), 2009.
Best Paper Award.
D. S. McFarlin, V. Arbatov, F. Franchetti, M. Püschel
Automatic SIMD Vectorization of Fast Fourier Transforms for the Larrabee and AVX Instruction Sets
Proceedings of International Conference on Supercomputing (ICS), 2011.
P. A. Milder, F. Franchetti, J. C. Hoe, and M. Püschel
Computer Generation of Hardware for Linear Digital Signal Processing Transforms
ACM Transactions on Design Automation of Electronic Systems,17(2), Article 15, 2012.
ACM TODAES Best Paper Award 2014
S. Chellappa, F. Franchetti and M. Püschel
Computer Generation of Fast FFTs for the Cell Broadband Engine
Proceedings of International Conference on Supercomputing (ICS), 2009.
P. D’Alberto, F. Franchetti, P. A. Milder, A. Sandryhaila, J. C. Hoe, J. M. F. Moura, and M. Püschel
Generating FPGA Accelerated DFT Libraries
Proceedings of Field-Programmable Custom Computing Machines (FCCM) 2007.
F. Franchetti, Y. Voronenko, and M. Püschel
FFT Program Generation for Shared Memory: SMP and Multicore
Proceedings Supercomputing 2006.
F. Franchetti, M. Püschel
Short Vector Code Generation and Adaptation for DSP Algorithms
Proceedings of ICASSP2003 - International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages 537-540.
F. Franchetti, Y. Voronenko, and G. Almasi
Automatic Generation of the HPC Challenges Global FFT Benchmark for BlueGene/P
In Proceedings of High Performance Computing for Computational Science (VECPAR) 2012.
F. Franchetti, Y. Voronenko, S. Chellappa, J. M. F. Moura, and M. Püschel
Discrete Fourier Transform on Multicores: Algorithms and Automatic Implementation
IEEE Signal Processing Magazine, special issue on “Signal Processing on Platforms with Multiple Cores”, 2009.
M. Püschel, J. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, N. Rizzolo
SPIRAL: Code Generation for DSP Transforms
Proceedings of the IEEE Special Issue on "Program Generation, Optimization, and Adaptation," Vol. 93, No. 2, 2005, pages 232-275.
|
|