by Ton Engbersen
The worldwide radio astronomy community envisions building a very large, highly sensitive radio telescope, partly in South Africa and partly in Australia, by 2020. The total effective area of this radio telescope should approach one square kilometer, hence its name: the Square Kilometre Array (SKA). The SKA instrument is expected to generate Exabytes of data per day, which must be processed and reduced so that approximately 1 Petabyte per day is left to be stored for later use by radio astronomers.
Current expectations for the SKA are that the low-frequency array (70 – 450 MHz) and the initial mid-frequency array (450 – 3000 MHz) will each comprise about 500,000 antenna elements, while the high-frequency array (3 – 10 GHz) will consist of approximately 3000 dishes. A quick calculation, assuming no beamforming before Nyquist sampling, results in 3.5 × 10^15 samples/s, or 300 ExaSamples per day (assuming 24-hour operation). Processing this is clearly beyond the capabilities of even the fastest supercomputers one can envision by 2020. Moreover, the streaming and real-time nature of the SKA makes it unlikely that supercomputers, as used in LOFAR, are ideally suited for this application. A significant research and development effort is therefore needed. For IBM, with our focus on future Big Data and Big Data analytics, this is a highly interesting field of research: it promises to make analytics low cost and energy efficient. We have named the project DOME after the protective astronomical telescope covering.
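The figure of roughly 3.5 × 10^15 samples/s can be reproduced with a short back-of-the-envelope script. The sketch below is a reconstruction under stated assumptions, not necessarily the calculation behind the number quoted above: it assumes one real-valued stream per antenna element or dish, sampled at twice the upper edge of its band, and it ignores polarisations and sample width. The names ARRAYS, nyquist_rate and total_sample_rate are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 24-hour operation, as in the text

# (number of receiving elements, upper band edge in Hz), taken from the text.
ARRAYS = {
    "low":  (500_000, 450e6),   # 70 - 450 MHz antenna elements
    "mid":  (500_000, 3000e6),  # 450 - 3000 MHz antenna elements
    "high": (3_000, 10e9),      # 3 - 10 GHz dishes
}


def nyquist_rate(f_max_hz: float) -> float:
    """Assumed sampling rate: twice the highest frequency in the band."""
    return 2.0 * f_max_hz


def total_sample_rate(arrays: dict) -> float:
    """Aggregate samples/s with no beamforming: every element is sampled."""
    return sum(count * nyquist_rate(f_max) for count, f_max in arrays.values())


rate = total_sample_rate(ARRAYS)          # samples per second
per_day = rate * SECONDS_PER_DAY          # samples per day

print(f"{rate:.2e} samples/s")
print(f"{per_day / 1e18:.0f} ExaSamples per day")
```

Under these assumptions the mid-frequency array dominates the total, contributing 3 × 10^15 of the samples per second, which suggests where early data reduction would matter most.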
A five-year, 33 million Euro project has been defined between IBM Research – Zürich and ASTRON, funded by the Dutch Ministry of Economic Affairs, Agriculture and Innovation and the Province of Drenthe, The Netherlands. The objective is to investigate novel exascale computing technologies and concepts, with a focus on energy-efficient data processing, data storage, and nano-photonics at a fundamental level. In addition, the DOME project will collaborate with Small and Medium Enterprises and other academic partners in the Netherlands to stimulate economic activity through supporting the development and testing of new high-performance computing applications.
In DOME, seven research tracks are defined:
1. Algorithms and Machines: The goal is to design a whole-system bounds framework enabling system-design space exploration in the early phases of the SKA implementation and thus guide the design decisions for platforms which will hold future exascale systems. A methodology already in development in the IBM Laboratory in Zürich forms the basis: analytical models and equations tie application properties, device technology and compute architecture trends together to arrive at predictions of performance, power and hardware cost.
2. Access Patterns: The SKA will generate approximately one Petabyte of data per day, which must be kept on storage media and made available for future analysis and distribution. New storage technologies as well as very low power storage technologies (magnetic tape) will be investigated. The hope is that the system can automatically learn the typical usage patterns of this radio astronomy data, so that it can autonomously decide on which storage tier the data is stored and move it when access is anticipated.
3. Nano-photonics: Transport of data will remain a major cost factor in the SKA system. A particular focus will be put on the processing of signals in the optical domain.
4. Micro servers: Through carefully selecting the appropriate computing hardware and energy-efficient peripheral hardware, this work-stream tries to pack as much computing power in as small an area as possible – under severe energy limitations.
5. Accelerators: This work-stream will address questions around what makes an architecture both energy-efficient and easily programmable.
6. Compressive Sampling: The capture and processing of analog signals is traditionally done in two steps: sampling and compression. Usually sampling is done at the Nyquist rate, followed by often lossy compression. Why sample at this high rate only to discard samples afterwards?
7. Real-Time Communications: The objective of this work-stream is to create a computing architecture that can handle high-bandwidth data motion and compute-intensive workloads in real time on an Exascale-class system.
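To make the idea behind the Access Patterns track (item 2) more concrete, here is a minimal sketch of usage-driven tier placement. It is purely illustrative and not the DOME design: the tier names "flash" and "disk", the decay factor and the thresholds are all assumptions (the text only names magnetic tape), and a decayed access count stands in for whatever learning method the project actually develops.

```python
from dataclasses import dataclass


@dataclass
class DatasetStats:
    """Running access statistics for one stored dataset (hypothetical)."""
    name: str
    score: float = 0.0  # exponentially decayed access count

    def record_day(self, accesses: int, decay: float = 0.5) -> None:
        """Fold one day's access count into the score; old activity fades."""
        self.score = decay * self.score + accesses


def choose_tier(score: float, hot: float = 10.0, warm: float = 1.0) -> str:
    """Map a score to a tier; thresholds are illustrative assumptions."""
    if score >= hot:
        return "flash"   # frequently used data stays on fast storage
    if score >= warm:
        return "disk"
    return "tape"        # rarely used data moves to low-power storage


# Example: a dataset that is popular right after observation, then goes cold.
obs = DatasetStats("observation-001")
placement = []
for daily_accesses in [20, 5, 0, 0, 0, 0]:
    obs.record_day(daily_accesses)
    placement.append(choose_tier(obs.score))

print(placement)
```

A real system would also need to predict upcoming accesses so that data can be moved back from tape before it is requested, which this backward-looking score does not attempt.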
These seven work-streams will be performed in close cooperation between ASTRON and IBM in the ASTRON & IBM Center for Exascale Technology, Dwingeloo, The Netherlands and the IBM Research Laboratory, Zürich, Switzerland. We expect to achieve exciting results in the area of exascale computing, applicable not only to SKA and radio astronomy but also to Big Data analytics. After all, isn’t the SKA the ultimate Big Data analytics challenge?
Figure 1: The Astronomical Data Deluge
M. de Vos, A. W. Gunst, R. Nijboer: “The LOFAR Telescope: System Architecture and Signal Processing”, in Proc. IEEE, vol. 97, issue 8, pp. 1431–1437, 2009, DOI 10.1109/JPROC.2009.2020509
P. Stanley-Marbell, V. Caparrós Cabezas, R. P. Luijten: “Pinned to the walls - impact of packaging and application properties on the memory and power walls”, in IEEE International Symposium on Low Power Electronics and Design (ISLPED’11), pp. 51–56, Fukuoka, Japan, 2011
J. W. Romein: “An Efficient Work-Distribution Strategy for Gridding Radio-Telescope Data on GPUs”, in ACM International Conference on Supercomputing (ICS’12), Venice, Italy, 2012
Ton Engbersen, IBM Research GmbH, Switzerland