This week’s astronomy Twitter journal club (Thursday, 20:10 GMT) is tackling a pressing problem – how can astronomy avoid drowning in data? More is better, right? Bigger telescopes and bigger surveys are both undoubtedly good things, but to make the best use of these advances we need to be able to handle the corresponding increase in data flow, and the resulting pressure on the astronomical archives that will have to cope with it.
This ‘data tsunami’ is almost upon us, according to this week’s paper by G. Bruce Berriman and Steven Groom. The recent addition of large datasets from the Spitzer and WISE telescopes has massively increased queries to the online Infrared Science Archive (IRSA), and, unsurprisingly, slowed down the response time of the database. This is only going to get worse as the archive’s growth is expected to accelerate over the next few years.
The paper also points out that how astronomers use archives is going to change. At the moment, raw datasets are typically downloaded and then reduced on a user’s own computer. However, once data reach petabyte scales it’s likely that they’ll have to be handled in situ, if only to avoid breaking the internet.
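To get a feel for why downloading stops being practical, here is a rough back-of-the-envelope sketch. The link speeds are illustrative assumptions of mine, not figures from the paper, and the calculation assumes a decimal petabyte and a connection running flat out with no overheads.

```python
# Rough transfer-time estimate for a petabyte-scale dataset.
# The link speeds below are illustrative assumptions, not values from the paper.
PETABYTE_BYTES = 10**15  # decimal petabyte
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a link running at full speed."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    for label, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
        print(f"{label}: {transfer_days(PETABYTE_BYTES, bps):,.1f} days")
```

Under these assumptions a single petabyte takes about three months over a dedicated 1 Gbit/s link, which is the sort of number that makes working on the data where it lives look attractive.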
So what can be done? And, more importantly, can we do it as cheaply as possible? Firstly, we need better ways to search multiple online datasets efficiently – the excellent Virtual Observatory is already developing techniques to help here.
Next, we need to explore new technologies like cloud computing. The Square Kilometre Array (which will generate 10 gigabytes per second) will have theSkyNet, the (worryingly named) community-based cloud that will harness the power of volunteers’ computers to process its data.
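For a sense of scale, the 10 gigabytes per second quoted above can be turned into a daily and yearly volume. This is only a sketch: it assumes the rate is sustained around the clock and uses decimal units, neither of which is stated in the paper.

```python
# Convert a sustained data rate into accumulated volume.
# Assumes the rate holds 24 hours a day and uses decimal units (1 TB = 10**12 bytes).
SECONDS_PER_DAY = 86_400


def daily_volume_tb(rate_gb_per_second: float) -> float:
    """Terabytes accumulated in one day at the given rate in gigabytes per second."""
    total_bytes = rate_gb_per_second * 1e9 * SECONDS_PER_DAY
    return total_bytes / 1e12


def yearly_volume_pb(rate_gb_per_second: float, days: int = 365) -> float:
    """Petabytes accumulated over the given number of days."""
    return daily_volume_tb(rate_gb_per_second) * days / 1000


if __name__ == "__main__":
    rate = 10  # gigabytes per second, the figure quoted for the SKA
    print(f"{daily_volume_tb(rate):.0f} TB per day")
    print(f"{yearly_volume_pb(rate):.1f} PB per year")
```

On those assumptions the array would produce roughly 864 terabytes a day, or a little over 300 petabytes a year.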
Finally, we need to talk more, especially to IT experts in computer infrastructure, and then share what we’ve learned in the authors’ proposed new journal dedicated to information technology in astronomy. We also need to properly reward the effort people put into this area, and to give young astronomers a grounding in software engineering to better prepare them for this data-heavy future.
Do you agree? Do we need to do all this to rise above the coming flood, or are we going to cope just fine as we are? Join in on Thursday and let’s talk about it.