Accelerating Applications in Computational Finance

by John Barr

The computational power required to participate successfully in today's financial markets is growing exponentially, but the performance of commodity processors is not keeping pace. How can firms harness innovative technology to succeed? Two financial issues and two technology issues are combining to force a radical rethink of how financial algorithms are computed.

First, the trading landscape is changing. In Europe, the Markets in Financial Instruments Directive (MiFID) has encouraged the creation of a raft of new execution venues. Combined with the growth of automated and algorithmic trading, this means that the volume of market data to be processed, and the speed at which it must be processed to deliver profitable trades, demand high-performance, low-latency computation at every stage of the trade lifecycle.

Second, in the aftermath of the credit crunch, both trading firms and regulators require more sophisticated risk calculations. Overnight risk runs are no longer sufficient: intra-day runs are tending towards real-time calculation, and must cover multiple departments and asset classes as well as counterparty and liquidity risk.

On the technology side, processor clock speed, which for many years rose steadily alongside the transistor-density gains described by Moore's Law, has stalled at around 3 GHz because chips running faster than that consume over 150 watts. In addition, pressure to reduce our carbon footprint is forcing us to explore alternatives in many areas of life, particularly in transport and IT.
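The power wall can be made concrete with the standard first-order model of dynamic (switching) power in CMOS logic, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. Assuming, as a rough approximation only, that V must rise roughly in proportion to f near the top of a chip's frequency range, power grows with approximately the cube of clock speed:

```latex
\begin{aligned}
P_{\mathrm{dyn}} &\approx \alpha C V^{2} f, \qquad V \propto f \;\Rightarrow\; P_{\mathrm{dyn}} \propto f^{3},\\
\frac{P(3.6\,\mathrm{GHz})}{P(3.0\,\mathrm{GHz})} &\approx \left(\frac{3.6}{3.0}\right)^{3} = 1.2^{3} \approx 1.73 .
\end{aligned}
```

Under this illustrative assumption, a 20% faster clock costs about 73% more dynamic power, which is one reason extra transistors have gone into more cores rather than higher frequencies.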

A first approach to these issues is to exploit multicore processors by parallelizing applications. This is a good option, but it is non-trivial and does not address all of the issues raised above. Compilers can automatically parallelize only a very small class of applications; some applications will be unable to exploit multicore at all, while others will require a significant (possibly total) rewrite. Moreover, looking five or more years ahead, we must plan not just for dual or quad core, but for dozens of cores in heterogeneous architectures.
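Many risk and pricing workloads are embarrassingly parallel, which makes them the easy case for multicore. The Python sketch below (an illustration, not code from any product mentioned here) prices a European call by Monte Carlo under Black-Scholes dynamics, splitting the paths into independent chunks that a `concurrent.futures.ProcessPoolExecutor` can spread across cores. The function names, parameters and chunking scheme are hypothetical; giving each chunk its own seeded `random.Random` stream is an assumed design choice that keeps results reproducible regardless of how many workers run.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor


def price_chunk(args):
    """Hypothetical worker: discounted sum of call payoffs for one block of paths."""
    seed, n_paths, s0, k, r, sigma, t = args
    rng = random.Random(seed)  # independent, seeded stream per chunk
    drift = (r - 0.5 * sigma * sigma) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        # Terminal price under geometric Brownian motion.
        st = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(st - k, 0.0)
    return total * math.exp(-r * t)


def mc_call_price(s0, k, r, sigma, t, n_paths=200_000, n_chunks=8, workers=1):
    """European call price by Monte Carlo, split into n_chunks independent tasks."""
    per_chunk = n_paths // n_chunks
    tasks = [(seed, per_chunk, s0, k, r, sigma, t) for seed in range(n_chunks)]
    if workers == 1:
        sums = list(map(price_chunk, tasks))  # serial reference run
    else:
        # Each chunk runs in a separate process, so it can use a separate core.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            sums = list(pool.map(price_chunk, tasks))  # results come back in order
    return sum(sums) / (per_chunk * n_chunks)
```

Called from a script's `if __name__ == "__main__":` block, `mc_call_price(100.0, 100.0, 0.05, 0.2, 1.0, workers=4)` should return the same value as `workers=1`, because the chunks, seeds and summation order are unchanged; only wall-clock time differs. Pure Python is slow per path, so a production version would vectorize each chunk or move it into compiled code.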

An alternative is to deploy specialist architectures. There are many examples of appliances, driven by exotic processing technology, deployed to handle one or more specific functions such as latency monitoring and analysis, market data feed handling, ticker plant management, message acceleration or risk analysis. The processors used include Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs) and the IBM Cell Broadband Engine (a heterogeneous high-performance architecture used in the Sony PlayStation 3 games console).

FPGAs, whose function (the 'personality' of the chip) must be programmed by the developer, are less well suited to computationally intensive workloads but are well suited to processing streaming data. Examples of FPGA use in support of financial markets include:

Endace for latency monitoring
Celoxica for data feed handling
Exegy for ticker plant acceleration
ActivFinancial for market data processing
Solace, Tervela and Tibco for message acceleration
TS Associates and Vhayu for data compression.
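Streaming workloads like these share a dataflow shape: each message passes through a fixed sequence of stages and is never buffered as a whole data set. The Python generator chain below is only a software analogy of that shape, not FPGA code; the comma-separated message layout and the stage names are hypothetical. In Python the stages are interleaved rather than truly concurrent, whereas an FPGA implements each stage as separate logic working on different messages in the same clock cycle.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical message layout: (symbol, price, size).
Tick = Tuple[str, float, int]


def parse(lines: Iterable[str]) -> Iterator[Tick]:
    """Stage 1: decode raw 'SYMBOL,PRICE,SIZE' text messages one at a time."""
    for line in lines:
        sym, price, size = line.strip().split(",")
        yield sym, float(price), int(size)


def select(ticks: Iterable[Tick], symbol: str) -> Iterator[Tick]:
    """Stage 2: keep only the instrument of interest."""
    for tick in ticks:
        if tick[0] == symbol:
            yield tick


def running_vwap(ticks: Iterable[Tick]) -> Iterator[float]:
    """Stage 3: emit the volume-weighted average price after every trade."""
    notional = 0.0
    volume = 0
    for _, price, size in ticks:
        notional += price * size
        volume += size
        yield notional / volume
```

For example, `list(running_vwap(select(parse(["ABC,10.0,100", "XYZ,50.0,10", "ABC,10.5,300"]), "ABC")))` yields `[10.0, 10.375]`.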

FPGAs run at low clock speeds and deliver high performance through pipelining and parallelism. Because exploiting these is complex, their use by mainstream developers is very limited, but their value in appliances, where they can be programmed once and used broadly, is significant. A single FPGA-based appliance can replace a rack of commodity servers, significantly reducing up-front cost, power and cooling charges, and saving potentially very expensive rack space in colocation facilities shared with stock exchanges.
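A back-of-envelope calculation shows how a slow clock can still win. The figures here are hypothetical and do not describe any of the appliances listed above: a pipeline of depth D = 20 stages, clocked at f_clk = 200 MHz, replicated P = 8 times on one chip, with each pipeline accepting a new input every cycle once full.

```latex
\begin{aligned}
\text{latency} &= \frac{D}{f_{\mathrm{clk}}} = \frac{20}{200\times10^{6}\,\mathrm{Hz}} = 100\,\mathrm{ns},\\
\text{throughput} &= P \cdot f_{\mathrm{clk}} = 8 \times 200\times10^{6}\,\mathrm{s^{-1}} = 1.6\times10^{9}\ \text{results/s},\\
\text{cycles per result for a 3 GHz core} &= \frac{3\times10^{9}}{1.6\times10^{9}} \approx 1.9 .
\end{aligned}
```

On these assumed figures, a single 3 GHz core would have to finish each result in fewer than two clock cycles to keep up, which is unrealistic for any multi-step calculation.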

Both GPUs and the IBM Cell have been used successfully in performance experiments with financial applications. IBM Research has demonstrated excellent performance on computationally intensive risk analytics applications, with particularly good scaling as the number of Cell processors is increased. The InRush ticker plant appliance from Redline Trading uses the IBM Cell as an acceleration co-processor. Professor Mike Giles' group at Oxford University has developed a series of demonstrators for financial applications based on Nvidia CUDA GPUs with up to 240 cores, generally delivering speed-ups of between 50 and 100 times compared to a single Xeon core.
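The reported speed-ups can be put in perspective with simple arithmetic. Dividing the 50 to 100 times speed-up by the 240 GPU cores gives the throughput of each GPU core relative to one Xeon core; dividing instead by four gives the advantage over a quad-core Xeon, under the optimistic assumption (not part of the reported results) that the Xeon code scales perfectly across its cores:

```latex
\frac{50}{240} \approx 0.21, \quad \frac{100}{240} \approx 0.42
\qquad\qquad
\frac{50}{4} = 12.5, \quad \frac{100}{4} = 25
```

In other words, each GPU core is individually slower than a Xeon core; the gain comes from having many of them, and it remains large even against a fully used quad-core CPU.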

One conclusion is clear: numerically intensive compute performance for financial applications on future platforms will be delivered through parallelism and heterogeneity, with all the complexity they bring. For that performance to be accessible to a broad range of developers, it is crucial that appropriate programming paradigms and tools, together with portable libraries and simple APIs, are developed. One example is the Numerical Algorithms Group's C library of mathematical functions, which supports the IBM Cell processor (in addition to many others). Another is The Portland Group's Fortran and C compilers, which support multicore x86 processors as well as GPUs from AMD/ATI and Nvidia. For multicore and other, more exotic processor technologies to meet the performance needs of financial markets, new programming paradigms, languages, tools and libraries must follow.

Please contact:
John Barr
The 451 Group, UK
Tel: +44 7988 764900
E-mail: john.barr@the451group.com