by Jan van Lunteren and Christoph Hagleitner
Network intrusion detection systems and emerging analytics applications for business intelligence rely on fast pattern-matching functions to scan data in real time. This type of scan operation has become increasingly challenging because of the growing number of regular expressions that have to be supported (e.g. to detect new types of intrusions) in combination with the continuous increase in network link speeds, which currently are on the order of multiple tens of gigabits per second.
At IBM Research – Zurich, together with our colleagues in Haifa and the US, we have designed a new programmable pattern-matching accelerator called RegX that is capable of simultaneously scanning thousands of regular expressions against multiple data streams at wire-speed for state-of-the-art network links. Millions of active streams can be scanned in an interleaved fashion by storing and retrieving the complete scanner state when switching between streams. The accelerator supports dynamic and incremental modification of the internal data structures, enabling updates to the pattern set without interruption of the scan operation.
The RegX accelerator combines the simple processing model of deterministic finite automata (DFA) with non-deterministic automata (NFA)-like mechanisms to realize high scan rates and obtain excellent storage efficiency. The latter is particularly important for dealing with the well-known “state-explosion” problem that occurs when certain combinations of regular-expression patterns with specific overlap conditions are mapped on the same DFA, resulting in extremely large data structures. One of these mechanisms is a lane concept that allows a flexible allocation of available scanner resources (programmable state machines) to a time-interleaved processing of multiple streams, and enables combinations of complex patterns that can cause state explosion to be separated by distributing them over multiple engines (a lane in this context comprises all the resources that are assigned to the scanning of a single data stream). A second mechanism is based on the extension of state machines with dedicated processing elements that allow the splitting of (individual) complex patterns into smaller pieces that can be scanned independently.
The micro architecture of the accelerator provides a range of flexible dispatch and execution options, allowing instructions to be executed, for example, upon the occurrence of a specific character or class of characters (e.g., a digit) in the input stream, upon the detection of one or multiple (sub)patterns, or upon fulfillment of a certain condition (e.g., for testing the distance between two detected (sub)patterns). Instructions are typically based on simple hardware primitives that are executed in a single cycle, and are exploited in a RISC fashion by an intelligent compiler stack to obtain high scan performance. RegX employs a special cache concept that is adapted to the characteristics of pattern-matching workloads to obtain hit rates of 99% and higher. This concept involves an L1 cache that is partially managed in hardware in a conventional way and partially managed by a software application that exploits hardware-based profiling to optimize the placement of the cache lines.
RegX has been implemented as part of the IBM Power Edge of NetworkTM (PowerEn) system-on-a-chip (SOC) in 45-nm SOI technology, consuming 15mm2 out of the total chip area of 410mm2 and running at a clock frequency of 2.3 GHz. It has been integrated as a processor-bus-attached accelerator providing high-speed pattern-matching functions to applications executed on general-purpose cores as shown in Figure 1. A unique feature of RegX that sets it apart from other pattern-scanning engines is its unparalleled memory efficiency that allows it to handle much larger and more complex regular-expression sets at a constant scan rate that is independent of the input characteristics when operating out of dedicated memory (128 KB per scan lane, 512 KB in total). This renders RegX less susceptible to denial-of-service attacks than other scanning accelerators. The accelerator design achieves a peak scan rate of 9.2 Gbit/s for single streams, and a theoretical aggregate scan rate of 73.6 Gbit/s when processing 8 streams in parallel. The latter cannot be entirely reached on the PowerEnTM implementation due to limitations on the data transfer bandwidth to the RegX accelerator, which reduces the peak scan rate to about 50 Gbit/s.
The RegX accelerator supports pattern collections that exceed the size of the dedicated memory, by storing part of the data structures in main memory, which, however, involves a significant access penalty on the order of 400 cycles. The hardware/software managed cache concept guarantees in this case a graceful degradation of the scan performance. This is illustrated in Figure 2, which has been adapted from  and shows measured scan rates for configurations involving between one and four scan lanes and for pattern sets of varying sizes. The initial flat portions of the graphs correspond to the case that the (compiled) patterns fit into the dedicated RegX memory, resulting in a constant scan rate. The “graceful” declines in scan rates occur when the number of patterns exceed the size of the dedicated memory and a part of the data structure has to be stored in the main memory. By increasing the number of scan lanes that process a given input stream, the total amount of dedicated memory that is available for a scan operation can be increased, so that larger numbers of patterns can be scanned at a constant flat rate. The latter, however, comes at the cost of a reduced aggregate scan rate because fewer different data streams can be scanned in parallel when multiple scan lanes operate on a single stream. For typical network intrusion detection workloads, scan rates were measured on the PowerEnTM hardware in the range from 15 to 40 Gb/s.
 J. van Lunteren et al: “Designing a Programmable Wire-Speed Regular-Expression Matching Accelerator,” Micro-45, Vancouver, CA, December 2012, pp. 461-472
Jan van Lunteren, IBM Research GmbH, Switzerland