FPGA Architecture for Computing

Field Programmable Gate Arrays (FPGAs) have been undergoing a dramatic transformation from a logic technology to a computing technology. Notwithstanding the rapid progress we have seen, there is still much untapped opportunity in FPGAs' full computing potential. FPGAs are “programmable” devices like processors, but we have not seen FPGAs deployed, even for compute applications, with dynamism anywhere near that of processors. Research is needed to fully develop and exploit FPGAs' programmability in a new usage modality divorced from their ASIC legacy. In this project, we ask the question: what should a future FPGA, intentionally architected as a computing substrate, look like? We are working toward answers that bring forth FPGAs' untapped capabilities in programmability and dynamism through investigations in new methodologies, usage modalities, and architecture.

Funding for this work has been provided, in part, by the National Science Foundation (CCF-1012851), Intel ISRA, and SRC/JUMP. We thank Altera, Xilinx and Bluespec for their donation of tools and hardware.

Please follow continuing developments at the Intel/VMware Crossroads 3D-FPGA Academic Research Center jointly supported by Intel and VMware.

Students

Efforts

Programmable and Dynamic Computing Deployment

area-time.jpg

We are developing a dynamic partial reconfiguration (DPR) runtime environment to expand the dynamism and shareability of an FPGA in the domain of realtime, interactive computer vision applications. DPR allows one region of the FPGA logic fabric to be reprogrammed without interfering with the operations of the remaining regions. Thus, one could partition and manage an FPGA logic fabric as multiple DPR partitions that can be independently reconfigured at runtime. Making use of DPR, our runtime environment can support multiple vision pipelines to dynamically share an FPGA logic fabric both spatially and temporally. This enables more efficient provisioning and utilization of FPGA logic resources in a realtime and interactive setting when the applications and their requirements cannot be anticipated ahead of time. We have demonstrated a time-shared usage mode where the FPGA logic fabric is time-multiplexed by multiple vision processing pipelines within the timescale of individual video frames (10s of msec) (Nguyen, FPL2018). We have conducted benchmarking to quantify the benefits of dynamic FPGA mapping with DPR over static FPGA mapping, in terms of area/device cost, power and energy (Nguyen, FPL2019). While our work so far has focused on realtime vision processing, we are working to extend this dynamic approach to other computation applications and patterns.
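As an illustrative sketch (not our runtime's actual API), the time-shared usage mode can be modeled in software as a scheduler that swaps partial bitstreams into one DPR partition so several pipelines process the same frame in turn. The `Partition` class, bitstream names, and kernels below are all hypothetical:

```python
# Hypothetical model of time-multiplexing one DPR partition among several
# vision pipelines within a single frame interval. All names are illustrative.

class Partition:
    """One independently reconfigurable region of the FPGA fabric."""
    def __init__(self):
        self.loaded = None  # which partial bitstream is currently resident

    def reconfigure(self, bitstream):
        # On real hardware this would drive the DPR controller; here we
        # simply record the newly resident partial bitstream.
        self.loaded = bitstream

def run_frame(partition, pipelines, frame):
    """Run each pipeline on the same frame, swapping bitstreams as needed."""
    results = {}
    for name, (bitstream, kernel) in pipelines.items():
        if partition.loaded != bitstream:
            partition.reconfigure(bitstream)   # pay the reconfiguration cost
        results[name] = kernel(frame)          # compute within the frame budget
    return results

pipelines = {
    "edge_detect": ("edge.pbs", lambda f: f"edges({f})"),
    "optical_flow": ("flow.pbs", lambda f: f"flow({f})"),
}
part = Partition()
print(run_frame(part, pipelines, "frame0"))
```

In the real system the per-swap reconfiguration latency must fit, together with all pipelines' compute, inside the tens-of-milliseconds frame interval.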

To fully exploit FPGAs' power of programmability and dynamism, FPGA designers need to adopt a new design perspective distinct from traditional ASIC thinking. DPR can be applied as a design optimization technique unique to FPGAs. FPGA designers have traditionally shared a similar design methodology with ASIC designers; most notably, at design time, they commit to a fixed allocation of logic resources to the modules in a design. At runtime, some of the occupied resources can be left under-utilized due to hard-to-avoid sources of inefficiency (e.g., operation dependencies, unbalanced pipelines). With DPR as an optimization, FPGA resources can be re-allocated over time to reduce under-utilization through better area-time scheduling. (Nguyen, FPL 2020)
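A back-of-the-envelope comparison illustrates the area-time argument. Suppose two modules never run concurrently: a static design must keep both resident, while a DPR design allocates one slot sized for the larger module and swaps bitstreams between phases. The areas, phase lengths, and reconfiguration latency below are made-up numbers, not measurements from our benchmarking:

```python
# Illustrative area-time comparison with hypothetical numbers: two modules
# whose active phases never overlap.

AREA_M1, AREA_M2 = 4000, 6000   # LUTs per module (hypothetical)
PHASE_MS = 20                   # each module is active 20 ms per cycle
RECONFIG_MS = 2                 # partial-reconfiguration latency per swap

static_area = AREA_M1 + AREA_M2   # both modules resident at all times
dpr_area = max(AREA_M1, AREA_M2)  # one slot sized for the larger module

# Area-time product per cycle (LUT * ms); the DPR case pays a swap
# before each of the two phases.
static_at = static_area * (2 * PHASE_MS)
dpr_at = dpr_area * (2 * (PHASE_MS + RECONFIG_MS))

print(static_at, dpr_at)  # 400000 264000
```

With these numbers DPR wins despite the swap overhead; the advantage shrinks as reconfiguration latency grows relative to the phase length.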

Service Oriented Memory Architecture

service.jpg

Prevailing memory abstractions and infrastructures that support FPGA application development continue to rely on the classic notions of loading and storing to memory address locations. Why should we limit our thinking to such a low-level, explicit paradigm when developing computing applications on an FPGA? Just as in software development, FPGA application developers should be supported by high-level abstractions that encapsulate in-memory data structures and meaningful operations on them. Under a Service-Oriented Memory Architecture, compute accelerator logic interacts with abstracted “memory objects” using a rich set of commands. The implementation complexities of the memory objects and operations, as well as the bare-metal-level memory interface, are hidden from the compute accelerator developers. The support for these memory objects and operations, realized as soft-logic datapath modules, can be specialized to an application domain and can be provided in a reusable and extensible library. We are currently working on a proof-of-concept prototype environment and application demonstrators. (Melber, FPL 2020)
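The flavor of the abstraction can be sketched in software with a hypothetical FIFO memory object: the accelerator issues high-level commands and receives status responses, never seeing addresses or the backing layout. The class and command names below are illustrative, not the interface of our prototype:

```python
# Hypothetical software model of a "memory object": the accelerator issues
# commands on a FIFO object instead of raw loads and stores.

from collections import deque

class FifoMemoryObject:
    """Encapsulates an in-memory FIFO; the backing layout is hidden."""
    def __init__(self, capacity):
        self._buf = deque()
        self._capacity = capacity

    def command(self, op, data=None):
        if op == "enqueue":
            if len(self._buf) >= self._capacity:
                return ("nack", None)       # back-pressure to the accelerator
            self._buf.append(data)
            return ("ack", None)
        if op == "dequeue":
            if not self._buf:
                return ("empty", None)
            return ("ack", self._buf.popleft())
        raise ValueError(f"unknown command {op}")

fifo = FifoMemoryObject(capacity=2)
print(fifo.command("enqueue", 0xAB))   # ('ack', None)
print(fifo.command("dequeue"))         # ('ack', 171)
```

In the actual architecture the object's logic would be a soft-logic datapath module between the accelerator and the memory interface, and the command set could be specialized per application domain.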

Network Function Acceleration

pigasus.jpg

We began our investigation of NF acceleration by studying FPGA acceleration of Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS). (Zhao, OSDI 2020) Today’s state-of-the-art software IDSes can process about 1 Gbps per high-end CPU core. This is an untenable starting point for an IPS, which must operate inline with today’s 100 Gbps links and respond in time to stop a malicious packet from propagating. In Project Pigasus, we are accelerating a 10K-rule IPS to 100 Gbps on one Stratix 10 MX FPGA. The effort required a very different design mindset from the software approach. We have been able to achieve 100 Gbps by making effective use of the Stratix 10 MX FPGA’s fast on-chip SRAM. For TCP reassembly, Pigasus uses dynamic allocation to compactly store packets in on-chip SRAM. Pigasus implements multi-string “fast pattern” matching using small SRAM hash tables combined with another lightweight filter. In the current system, only the (not-yet-accelerated) full matching stage is offloaded to CPU cores in a stateless fashion.
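The two-stage matching idea can be sketched as follows. A fixed-length prefix of each rule's pattern (its “fast pattern”) is hashed into a table; the scan stage slides over the payload and emits only candidate rule IDs, which a later full matcher must verify. This is a simplification for illustration (the real design uses parallel SRAM hash tables and a hardware filter), and the rules and lengths below are made up:

```python
# Simplified two-stage pattern matching in the spirit of a fast-pattern stage.

FP_LEN = 4  # fast-pattern length in bytes (illustrative)

def build_table(rules):
    """Map each rule's fixed-length fast pattern to the rule IDs using it."""
    table = {}
    for rule_id, pattern in rules.items():
        table.setdefault(pattern[:FP_LEN], []).append(rule_id)
    return table

def fast_match(table, payload):
    """Slide over the payload; report candidate rule IDs for full matching."""
    candidates = set()
    for i in range(len(payload) - FP_LEN + 1):
        window = payload[i:i + FP_LEN]
        for rule_id in table.get(window, ()):
            candidates.add(rule_id)   # verified later by the full matcher
    return candidates

rules = {1: b"attack-sig", 2: b"evil.exe"}
table = build_table(rules)
print(fast_match(table, b"GET /evil.exe HTTP/1.1"))  # {2}
```

The point of the split is throughput: the cheap fast-pattern stage filters the line-rate stream down to rare candidates, so the expensive full match can run off the fast path.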

We are looking to generalize the framework and components from the Pigasus IPS effort to other in-network compute opportunities. Future work is also to create a high-level domain-specific NF programming framework for use by networking experts who are not RTL experts. This is joint work with Justine Sherry and Vyas Sekar.

Complete open-sourced code is available on GitHub if you are interested in reproducing or reusing the Pigasus IPS.

CoRAM (Classic)

Our investigation into FPGA architecture for computing began in 2009 with the question: how should data-intensive FPGA compute kernels view and interact with external memory data? In response, we developed the original CoRAM FPGA computing abstraction. The goal of the CoRAM abstraction is to present the application developer with (1) a virtualized appearance of the FPGA’s resources (i.e., reconfigurable logic, external memory interfaces, and on-chip SRAMs) to hide low-level, non-portable platform-specific details, and (2) standardized, easy-to-use high-level interfaces for controlling data movements between the memory interfaces and the in-fabric computation kernels. Besides simplifying application development, the virtualization and standardization of the CoRAM abstraction also make possible portable and scalable application development.
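A behavioral sketch of the idea, with hypothetical names rather than CoRAM's actual interface: a control thread orchestrates tile-by-tile transfers from external memory into an on-chip buffer and signals the compute kernel when each tile is staged, so the kernel never touches the memory interface directly:

```python
# Behavioral sketch of a CoRAM-style control thread (names illustrative):
# the thread moves data between external memory and an on-chip buffer,
# notifying the kernel as each tile becomes ready.

def control_thread(mem, coram, base, count, tile):
    """Stream `count` words from external memory into `coram` in tiles."""
    for off in range(0, count, tile):
        n = min(tile, count - off)
        coram[0:n] = mem[base + off : base + off + n]  # block transfer
        yield ("data_ready", off, n)   # signal the kernel a tile is staged

mem = list(range(100, 116))            # external memory contents
coram = [0] * 4                        # on-chip SRAM buffer (one tile)
events = []
for ev in control_thread(mem, coram, base=0, count=8, tile=4):
    events.append((ev, list(coram)))   # the kernel would consume the tile here

print(events[-1])
```

Because the transfer logic lives entirely in the control thread, the same kernel can be retargeted to platforms with different memory interfaces, which is the portability argument above.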

coram.jpg

Publications

Demo and Downloads