Cover Story – Astronomical data challenge

When it's operating, the Square Kilometre Array will generate 123Tbit/s. How does the project plan to handle this volume of data?

From the earliest days of the human race, we have stared at the stars and wondered what it's all about. Pioneers such as Galileo have lifted some of the veils, but it's only recently that we have started to develop a deeper understanding of 'life, the universe and everything'.

But even though we know more about the universe, there are many questions still to be answered. And the Square Kilometre Array (SKA) is set to provide vast amounts of data to help scientists answer such questions as: is Einstein's Theory of Relativity still valid? What happened just after the Big Bang? What is dark matter? And are we alone?

The name of the project reflects its scale; the ambition is to create a system of distributed radio telescopes whose collecting area totals 1sq km, or 1million m². Achieving that will require, potentially, thousands of telescopes and these will be scattered primarily across Southern Africa and Australia. To say that transmitting and processing the data collected will be a challenge is something of an understatement. Dr Keith Grainge, who is leading the Signal and Data Transport Consortium within the SKA project, said it is anticipated the array will generate 123Tbit/s as a constant stream. "To put that into context," he said, "the entire internet in 2012 was believed to carry 80Tbit/s."

Construction of the SKA has yet to start. When ground is broken in 2018, if all goes well, it will be the next step in a process which began in the 1990s. SKA architect Tim Cornwell said: "I ran a conference in 1990 when we began to talk about the idea of the SKA. It was a gleam in someone's eye."

The SKA will, in fact, be three telescopes. Head of the SKA project Alastair McPherson said: "There will be an array of dishes in Southern Africa, another array in Australia and an array of low frequency dipoles, also in Australia. We are now in the pre-construction phase and heading through a preliminary design review, which should be completed by March 2015. We'll then move into detailed design and our plan is to produce construction proposals by the end of 2016." If the milestones are hit, the SKA should be ready to start operations in 2022.

Such ambition carries with it an equally large price tag. Phase 1 of the SKA project has a budget of €650million and McPherson implied that belts have already needed to be tightened. "There has been a huge amount of effort, but we have realised that costs needed to be under control and that we would have to 'rebaseline'.

"There will be a bit of pain attached, but we still don't know what the costs will be. However, we will be able to produce a telescope which will be transformational."

Part of the project will, inevitably, involve the development of new technology. McPherson noted: "We are trying to restrict technology development to the areas where it is necessary. We are trying not to make things too complicated, but there will be big steps forward in areas like correlators and data processing and those are areas where we're spending a lot of time."

Cornwell added: "We have to understand where vendors are headed and then to work according to their road maps. We need to know where companies like Intel and Nvidia are taking their technologies. While we're not expecting miracles from them, we are expecting improvements in areas such as cost and power consumption."

In the initial phase of the SKA, there will be 254 antennas spread over distances of up to 150km. Each antenna will be equipped with a very sensitive receiver and will digitise the signals it captures before sending the data to a central location.

Illustrating the complexity of the project, Cornwell said: "Radio telescopes do correlation; they compare one data stream with every other. This will result in a large number of cross correlated data streams and requires a special device to perform the operations. This will be FPGA based.

"It's a virtuous circle; every time you add an antenna, you get another 250 combinations of data."
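
To put a number on that growth (a quick illustrative calculation, not SKA software), the correlator has to form one output for every pair of antennas, so N antennas give N(N-1)/2 cross correlations and each additional antenna adds N new pairs:

    def cross_correlations(n_antennas):
        # number of antenna pairs the correlator must form
        return n_antennas * (n_antennas - 1) // 2

    n = 254                                                     # dishes planned for the first phase
    print(cross_correlations(n))                                # 32131 pairs
    print(cross_correlations(n + 1) - cross_correlations(n))   # one extra antenna adds 254 pairs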

Cornwell admitted that, even after all the planning which has been done, the SKA project is still trying to understand the scale of the problem. "Most of the data processing will be concerned with volume. There will be a large number of data channels and hundreds of thousands of frequencies. Each has to be processed independently. We're trying to understand how expensive that might be."

Previous experience with radio astronomy has shown that you can get more than you pay for. "We have learned to get a lot of value from processing. For example, we can improve the images through processing and that means we can use fewer antennas than we might have needed," Cornwell continued.

But when the array is constructed, scientists are likely to 'see' things beyond their imagination. "One of the requirements for the array," Cornwell noted, "is that we should be able to 'see' something 30million times fainter than an object a few arc seconds away from it. It will be the most difficult thing we do."

Depending on the frequency selected, the SKA will have the largest window on the universe ever opened – at frequencies of less than 1GHz, it's likely the field of view will be some 200 square degrees. Because of this wider field of view, the entire sky can be surveyed more quickly than would otherwise be possible – thousands of times faster, says the SKA.

McPherson said: "Manchester University is leading the data transmission and timing work. Its job is to design the system – and to make sure it works."

And that's where Dr Grainge comes in. "Our role is to create three logical networks," he said. "Data transport, monitoring and control, and timing. We are also working on three other issues: connectivity from receptors scattered over the desert back to the correlator; taking correlated visibilities to high performance computers in Cape Town or Perth and then to a science data processor; and distributing reduced data to astronomers."

According to Dr Grainge, the 254 receptors – antennas – in Southern Africa will each generate about 90Gbit/s. "That's our current estimate," he said. Transmitting that data is just one part of the job. "We also realised the network has to be built in such a way as to get rid of RF interference."
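
For a sense of scale, those two figures alone imply an aggregate of roughly 23Tbit/s from the Southern African dish array (a back-of-envelope sum using the estimates quoted above; the 123Tbit/s headline figure covers the instrument as a whole):

    receptors = 254
    per_receptor_gbps = 90                          # Dr Grainge's current estimate
    print(receptors * per_receptor_gbps / 1000)     # ~22.9 Tbit/s from the dish array alone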

Many of the antennas will be close to the correlator – a high speed computer that will synchronise the signals from the telescopes and then combine them. "Here, we can use standard switches," Dr Grainge explained, "sending data over 10km of single mode fibre. But we need a different scheme for antennas that are further out and some of the furthest may need some element of amplification."

He said that amplification would be used, rather than regeneration. "In recent tests, Australian researchers have run 100Gbit/s over 3000km using just amplification – and amplification is cheaper."

Data transmission is one challenge, synchronising it is another. "We have very big timing issues," Dr Grainge admitted, "and there are multiple requirements.

"Because the SKA will work as an interferometer, you have to ensure the receptors are phase coherent. It's equivalent to making sure you have a timing signal accurate to 1ps at each one.

"But we know that's possible," he continued, "because there's a system which already does this – the UK's E-MERLIN antenna array."

The SKA is thought to have similar timing constraints to E-MERLIN, but Dr Grainge is working on what he calls a 'subtly different' version which uses a 'there and back' approach to distribute phase.

"We send a carrier wave down a fibre, measure it and monitor the reflected signal. This allows us to see how the fibre changes with time; for example, when it heats up and expands. This will allow us to perform measurement and compensation in real time."

The plan is for each antenna to receive a 1GHz reference signal generated by a hydrogen maser clock. "We plan to do this by sending a 2GHz signal to the antenna, where a crystal generates a 1GHz signal and sends it back. Because the 2GHz signal is reflected, we will receive a 2GHz signal and a 1GHz signal and the phase change will be the same for both. We can then mix them and generate a 1GHz signal with all the fibre distortions removed."
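
The underlying round-trip principle can be sketched very simply: if the fibre's phase perturbation is the same in both directions, the central station sees twice the one-way error and can pre-correct the outgoing signal by half of what it measures. The snippet below is a minimal numerical illustration of that idea, not the actual SKA scheme, and the drift values are invented:

    import numpy as np

    rng = np.random.default_rng(0)
    phi_ref = 0.0                                         # phase of the maser reference (radians)
    fibre_drift = np.cumsum(rng.normal(0, 1e-3, 1000))    # slowly wandering one-way fibre phase

    measured_round_trip = 2 * fibre_drift                 # what the 'there and back' measurement sees
    phi_sent = phi_ref - measured_round_trip / 2          # pre-correct the outgoing signal
    phi_at_antenna = phi_sent + fibre_drift               # phase actually delivered to the antenna

    print(np.allclose(phi_at_antenna, phi_ref))           # True: the fibre drift cancels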

But there is also the need to distribute absolute time and the plan is to use CERN's White Rabbit technology. "White Rabbit is accurate to 1ns, so it's not accurate enough to keep the antennas synchronised," Dr Grainge explained, "but we still need to know about absolute time. White Rabbit can do this and we need it because white light fringes from correlation need to be aligned."

A white light fringe is a coherence length effect which depends on the bandwidth of the signal being carried, Dr Grainge explained. "It therefore gives a maximum tolerable delay error between the data streams being correlated."
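
A rough way to see where that limit comes from: the coherence time of a signal is about the reciprocal of its bandwidth, so the delay error between correlated streams must stay well inside 1/bandwidth. The channel bandwidth used below is a hypothetical figure for illustration, not an SKA specification:

    channel_bandwidth_hz = 1e6                            # assume 1MHz-wide frequency channels
    coherence_time_s = 1 / channel_bandwidth_hz           # ~1e-06 s of tolerable delay mismatch
    white_rabbit_accuracy_s = 1e-9
    print(white_rabbit_accuracy_s < coherence_time_s)     # True: 1ns is comfortably inside the limit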

The phase and time systems are tied to a clock ensemble at the SKA's core. "There are three hydrogen masers," Dr Grainge noted. "A primary, a secondary and a third which allows you to compare the first two. If primary and secondary differ, the third should tell you which to believe."
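
The logic of that three-maser arrangement is essentially a two-out-of-three vote. As a toy sketch (the offsets and the tolerance below are invented, purely to show the arbitration step):

    def suspect_maser(primary, secondary, third, tolerance=1e-12):
        # offsets are each maser's reading against a common comparison point, in seconds
        if abs(primary - secondary) <= tolerance:
            return None                                   # primary and secondary agree
        # distrust whichever of the two sits further from the third maser
        return "primary" if abs(primary - third) > abs(secondary - third) else "secondary"

    print(suspect_maser(0.1e-12, 5.0e-12, 0.2e-12))       # 'secondary' has wandered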

Cornwell took a different approach to the timing issues. "If you want to image sources, you have to know what the time is. When it comes to pulsar timing, we're talking about a timing drift of 10ns over 10 years. Monitoring pulsars will help scientists to detect gravitational waves – and that's Nobel Prize stuff."
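
Expressed as a fraction, that requirement is extraordinarily tight (simple arithmetic on the figure quoted, nothing more):

    seconds_in_ten_years = 10 * 365.25 * 24 * 3600
    print(10e-9 / seconds_in_ten_years)                   # ~3e-17 fractional timing stability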

So how does Dr Grainge anticipate transmitting such high volumes of data? "We've been talking with telecom companies about what their state of the art and cost effective data transmission systems will be in 2020; what's on the road map?

"From their responses, we think we're going to be using 400G transceivers, with 80 'colours' being sent down each fibre pair. When you multiply that, it comes out to 32Tbit/s down each pair, which means we can handle our 123Tbit/s need using four or five pairs."

The SKA will, in effect, be a huge laboratory. "Scientists will say how they want to do an experiment," Cornwell said, "we'll do it and put the results into a data archive. They will then access the data and do the science, but that's not just a matter of downloading data to a laptop," he concluded.