
DARPA PAPPA: 



SnowWhite: High Level Reasoning In Compilers
(DARPA HR00112090018)




F. Franchetti (PI), J. C. Hoe (CoPI), T.M. Low (CoPI), M. Franusich (CoPI)
PostDocs:
PhD Students: Guanglin Xu
Engineers:
Alumni: 



Overview
Since the inception of compiler research, the Holy Grail has been a system that provides high-level abstraction (programmers express their intent as concisely as in an algorithms textbook) and automatically translates these programs or specifications into executables that target an ever-evolving landscape of platforms and extract close-to-optimal performance on all of them. The original FORTRAN compiler came close to this goal (a necessity for its adoption) on the machines of its day and for relatively simple programs. Unfortunately, ever-increasing hardware complexity has swept away this achievement, and today we are farther from the vision than ever. The SnowWhite effort addresses this problem and aims to sketch a potential path to a long-term solution. SnowWhite shows that program understanding beyond classical compiler analysis is key and requires a novel AI approach.







The prototype SnowWhite system was developed in the PAPPA program. The system is available under a BSD-style permissive license on GitHub and is documented at https://github.com/spiralsoftware/pythonpackagesnowwhite. At its core, SnowWhite adds a new AI approach to compilers: it introduces high-level reasoning to orchestrate the complex components and enables the system to “understand” the computation much as a human expert would. Furthermore, SnowWhite uses a number of technologies that have proven essential: 1) domain-specific languages (DSLs), 2) the idea of telescoping languages (libraries as language components with known semantics), 3) just-in-time compilation (JIT), 4) automatic performance tuning (autotuning), and 5) program synthesis or program generation. The result is a feedback system that finds a close-to-optimal mapping of an entire application, built from components drawn from multiple domains, across a range of challenging target platforms.



System Description
The prototype system consists of the following components. 











Input Language
SnowWhite’s input programs are single-threaded, single-address-space Python/NumPy programs that follow an object-oriented paradigm and are implemented relative to a SnowWhite class library. They concisely and cleanly implement the user’s algorithm as a high-level program that acts as a specification. In fact, such a program is an executable specification whenever the mathematical semantics of the NumPy objects and functions it uses are known. SnowWhite defines the mathematical semantics of a subset of NumPy sufficient to cover the target application domains (real-time processing and physical simulation) in machine-readable form as semantics definition modules.
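To illustrate what an executable specification can look like, here is a minimal sketch in plain NumPy. It deliberately does not use the SnowWhite class library, whose API is not described here; the function names `circular_convolve_spec` and `circular_convolve_reference` are hypothetical and chosen only for this example.

```python
import numpy as np


def circular_convolve_spec(x, h):
    """Circular convolution written the way a textbook states it."""
    X = np.fft.fft(x)          # forward transform of the signal
    H = np.fft.fft(h)          # forward transform of the filter
    return np.fft.ifft(X * H)  # pointwise product, then inverse transform


def circular_convolve_reference(x, h):
    """Direct O(n^2) definition, used only to check the specification."""
    n = len(x)
    return np.array(
        [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]
    )


x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([0.5, 0.0, 0.0, 0.5])
y = circular_convolve_spec(x, h)
```

The program is short, single-threaded, and free of tuning artifacts, yet it fully determines the mathematical result. That is the property that lets a system with known NumPy semantics treat it as a specification rather than as a fixed implementation.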

Frontend
SnowWhite’s frontend is a Python-to-SPIRAL parser and analysis stage that converts Python program fragments based on the SnowWhite object library and supported NumPy components to the SPIRAL high-level input IR. The result is a SPIRAL script and expression that represents the Python program fragment and serves as the input to the formal reasoning system. The SnowWhite system then implements the expression as a native library for Python that leverages the target’s high-performance features such as GPUs and multiple nodes. The code that was originally implemented as a sequence of NumPy calls is replaced by a call into this native library. The SnowWhite analysis stage includes a sophisticated data flow analysis that enables interprocedural semantic analysis as well as cross-call and cross-library optimization via an algorithm detection and promotion framework.
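The first step of such a frontend, finding the NumPy calls in a program fragment, can be sketched with Python's standard `ast` module. This is an illustrative assumption about how a fragment might be inspected, not SnowWhite's actual parser; `NumpyCallCollector` and `dotted_name` are hypothetical names, and the sketch assumes NumPy is imported under the alias `np`.

```python
import ast

SOURCE = """
import numpy as np

def step(x, h):
    X = np.fft.fft(x)
    H = np.fft.fft(h)
    return np.fft.ifft(X * H)
"""


def dotted_name(node):
    """Return 'np.fft.fft' for an attribute chain, or None otherwise."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


class NumpyCallCollector(ast.NodeVisitor):
    """Record every call made through the NumPy alias."""

    def __init__(self, alias="np"):
        self.alias = alias
        self.calls = []

    def visit_Call(self, node):
        # Visit arguments first so inner calls are recorded before outer ones.
        self.generic_visit(node)
        name = dotted_name(node.func)
        if name is not None and name.startswith(self.alias + "."):
            self.calls.append(name)


collector = NumpyCallCollector()
collector.visit(ast.parse(SOURCE))
numpy_calls = collector.calls
```

A real frontend would go further and track which values flow between these calls; the list of calls is only the raw material for that data flow analysis.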








High-Level Reasoning System
SnowWhite introduces a new rule system for the core SPIRAL system. This component bootstraps the SPIRAL base for the target application domains from the semantics definition of the mathematical library (NumPy), adding the NumPy data structure and operation abstractions needed by the target applications as well as a general framework for adding further functionality as needed. In addition, a promotion rule system detects well-known patterns that require multiple NumPy library calls but are logically a single mathematical operation. The prime example is promoting the sequence FFT, pointwise multiplication, inverse FFT into a convolution operation that can then be re-expanded depending on the target platform. This in turn allows SnowWhite, via the SPIRAL rule system, to reason about NumPy-based programs across the range of SPIRAL-supported hardware platforms. The semantics definition of a set of NumPy components and the Python object-oriented library needed to streamline user programs were a key effort in our PAPPA project.
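The convolution promotion described above can be illustrated with a small syntactic rewrite. This is a sketch under stated assumptions, not the SPIRAL rule system: it works on the Python syntax tree rather than on SPIRAL's IR, it assumes the `np` alias, and the target name `circular_convolution` is a hypothetical placeholder for the promoted operation. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast

FFT, IFFT = "np.fft.fft", "np.fft.ifft"


def dotted_name(node):
    """Return 'np.fft.fft' for an attribute chain, or None otherwise."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def is_call_to(node, name):
    """True for a single-argument, keyword-free call to `name`."""
    return (
        isinstance(node, ast.Call)
        and dotted_name(node.func) == name
        and len(node.args) == 1
        and not node.keywords
    )


class PromoteConvolution(ast.NodeTransformer):
    """Rewrite ifft(fft(a) * fft(b)) into circular_convolution(a, b)."""

    def visit_Call(self, node):
        self.generic_visit(node)  # promote nested occurrences first
        if not is_call_to(node, IFFT):
            return node
        product = node.args[0]
        if not (isinstance(product, ast.BinOp) and isinstance(product.op, ast.Mult)):
            return node
        if not (is_call_to(product.left, FFT) and is_call_to(product.right, FFT)):
            return node
        promoted = ast.Call(
            func=ast.Name(id="circular_convolution", ctx=ast.Load()),
            args=[product.left.args[0], product.right.args[0]],
            keywords=[],
        )
        return ast.copy_location(promoted, node)


def promote(source):
    """Apply the promotion rule to a source string and return new source."""
    tree = PromoteConvolution().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


promoted_source = promote("y = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h))")
```

Once the three library calls are recognized as one operation, a code generator is free to choose a different expansion for each target instead of being bound to the FFT-based formulation the user happened to write. Note that this particular pattern corresponds to circular convolution of equal-length inputs, and that a purely syntactic match like this one misses the case where the intermediate transforms are stored in variables, which is where data flow analysis becomes necessary.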

Backend
The SnowWhite prototype system generates high-performance native code for the target platform. The main targets were a range of CPUs and GPUs in the context of massively parallel (distributed memory/MPI) systems. SnowWhite leverages and extends the SPIRAL multi-target backend to support Intel x86 (with SSE/AVX/AVX512) and IBM POWER9 (with VSX) CPUs as well as Nvidia (CUDA) and AMD (HIP/ROCm) GPUs. This requires managing multi-node and multi-address-space Python programs so that the user still sees a single-address-space, single-threaded abstraction, without incurring excessive overhead. While Python/NumPy was the input in PAPPA, this is one instance of the larger SnowWhite/SPIRAL infrastructure, which supports a range of input languages. In particular, the DOE ExaScale effort FFTX, the DPRIVE effort NTTX, and the GBTLX effort all use a C++ frontend similar to SnowWhite’s Python/NumPy frontend.







Example Library
As part of PAPPA and in collaboration with the FFTX DOE ExaScale Project, we developed a set of examples implemented as small Python programs. They are based on SnowWhite’s object library and demonstrate how NumPy should be employed to cleanly convey the semantics of the algorithm to the system. The key is the simplicity and cleanness of the algorithm and its implementation. The goal is not a performance-optimized, hard-to-understand implementation but the shortest and most concise implementation that captures the full complexity of the problem and algorithm without introducing optimization-related artifacts that pollute the code. The SnowWhite package provides a range of examples.


Documentation
The full SnowWhite system and all documentation are available via spiral.net. This includes the core system, examples, documentation, and scientific papers.
