Cambridge University opens new horizons with software-defined telescope, bioinformatics research

One of the world’s oldest universities is looking to the future with OpenStack.

The alma mater of Isaac Newton and Francis Bacon is not resting on its laurels. Since the mid-20th century, the University of Cambridge has pioneered the use of computing in fields like radio astronomy and biology. More recently, the University has driven the use of commodity x86-64 HPC clusters to support a wide research base across many scientific disciplines.

The research university, founded in 1209, now has its sights set on continuing this endeavor on the frontiers of space and translational medicine. OpenStack has an opportunity to play a big role in these potentially life-changing discoveries.

Jonathan Bryce and I recently had a chance to visit the campus and meet Dr. Paul Calleja, deputy director, research and institutional services, whose team provides research computing services for the university while also supporting research projects with high performance computing (HPC) and big data solutions. The team currently has 665 active users across nearly 200 research projects, growing at a rate of 40 percent per year. Its approach is to design, implement and support its own solutions rather than getting locked into inflexible, purpose-built hardware. OpenStack will be tapped for a variety of use cases, including research computing and storage infrastructure as a service, application development, HPC as a service and HPDA as a service (such as Hadoop and machine learning infrastructure).

Calleja’s team was attracted to OpenStack from both an operational and a research perspective. OpenStack and cloud computing bring the promise of more flexible architectures and more accessible data, meaning greater collaboration and faster time to results. They can also support a wider variety of use cases with a common platform and commodity hardware, helping with cost management and giving them access to a larger talent pool.

Stig Telfer and John Taylor work for a consulting firm, StackHPC, that has been working with the University of Cambridge to document use cases and requirements, and to work within the OpenStack community to drive development forward. Telfer co-chairs the Scientific Working Group, established in April at the Austin Summit, along with Blair Bethwaite, senior HPC consultant at Monash University.

The Cambridge team briefed us on two major projects that they hope the OpenStack community will help support, and that will push the bounds of scalability and HPC requirements over the next eight to 10 years. At the OpenStack Summit Barcelona, the University of Cambridge will share more about their projects in the Tuesday morning keynote, “Smashing particles, revolutionizing medicine and exploring origins of the galaxy.”

Harnessing big data for health

Dr. Lydia Drumright, university lecturer in clinical informatics, also spoke to us about her vision for correlating gene sequencing insights with clinical patient data to deliver real-time recommendations to doctors. For example, in Iowa, Dr. John Cromwell has reduced post-operative infections by 58 percent by adjusting treatment in real time, using models built from patient data to predict infection.

Interestingly, the cost of gene sequencing has dropped dramatically over the last few years. While the cost of computing has also come down, the lower cost of genome sequencing has produced so much genomic data that it has become prohibitive to analyze and correlate it all. And while gene sequencing data on its own delivers insights, it’s even more powerful when you can annotate the sequenced genes (essentially add metadata), which can be correlated with real clinical patient data for real-time intervention into patient care.

Ignacio Medina, head of the computational biology lab, shared some of the tools he’s developed to analyze this data and deliver dashboards to researchers and clinicians. He started an open source project at OpenCB.org, which currently has 15 contributors, and he is looking to OpenStack to support his computational needs. There are also significant ethical considerations with this research, and Drumright has spent considerable time working on an ethical governance framework for managing patient data in the OpenStack cloud. This recently launched project will continue to expand: Drumright said she could be tracking up to one million data points per patient.

Deep space diving

Looking a bit farther into the future (and more pertinently the past), Cambridge physics professor Paul Alexander is helping drive forward the Square Kilometer Array (SKA) project, a software-defined telescope which will allow us to see deeper into space than ever before, exploring the origins of galaxy formation and dark matter, the nature of black holes and the fundamental building blocks of life from extraterrestrial radio signals. The physical receivers will be located in Western Australia and South Africa, with a large cloud for analyzing data in Regional Science Centres (RSC) across the globe. It’s backed by numerous countries with a €650 million budget for the initial construction phase, expected to be completed and online in 2023.

The initiative presents a huge big data and computing challenge. The initial 110,000 individual antennas and 2,700 dishes across 3,000 square kilometers are expected to produce 5,000 PB per day. Because power is so expensive in the desert, the goal is to whittle down the raw data to 50 PB per day and then transfer it as quickly as possible over fiber to the Science Data Processor facilities in Perth and Cape Town. All in all, they will need to build a 250-petaFLOP system to analyze and store the data, and are looking to OpenStack as a framework to support this computing power locally, as well as potentially supporting RSCs too.
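
To put those volumes in perspective, here is a rough back-of-the-envelope sketch in Python. It only restates the figures quoted above as a reduction factor and an average transfer rate. The decimal units (1 PB = 10^15 bytes) and the assumption that data flows evenly over a 24-hour day are ours, not the SKA team's, and the function name is purely illustrative.

```python
# Back-of-the-envelope arithmetic for the SKA data volumes quoted above.
# Assumptions (not from the SKA team): decimal units (1 PB = 10**15 bytes)
# and data spread evenly across a 24-hour day.

BYTES_PER_PB = 10**15
BYTES_PER_GB = 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_gb_per_s(pb_per_day: float) -> float:
    """Average throughput in GB/s needed to move pb_per_day petabytes daily."""
    return pb_per_day * BYTES_PER_PB / SECONDS_PER_DAY / BYTES_PER_GB


raw_pb_per_day = 5_000      # expected raw output of the receivers
reduced_pb_per_day = 50     # target volume after on-site reduction

reduction_factor = raw_pb_per_day / reduced_pb_per_day
raw_rate = sustained_rate_gb_per_s(raw_pb_per_day)
reduced_rate = sustained_rate_gb_per_s(reduced_pb_per_day)
reduced_rate_tbit = reduced_rate * 8 / 1000  # GB/s -> Tbit/s

print(f"Reduction factor: {reduction_factor:.0f}x")
print(f"Raw data rate: {raw_rate:,.0f} GB/s")
print(f"Reduced data rate: {reduced_rate:,.1f} GB/s ({reduced_rate_tbit:.2f} Tbit/s)")
```

Under those assumptions, the on-site processing has to shrink the data a hundredfold, and even the reduced stream works out to several terabits per second of sustained traffic over the fiber links.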

The SKA team is also examining other open source technologies, such as big data frameworks like Spark, which offers us a big opportunity to collaborate with those communities around a common goal. Ultimately, for economics, flexibility and speed, the SKA team wants to rely on a distributed system on commodity gear, not a converged appliance. OpenStack has a huge opportunity to power this research, but the community will need to continue to incorporate the recommendations of the Scientific Working Group and others as we plan the development road map over the next few years.

Isaac…err OpenStack Newton

Left to right: Lauren Sell, Stig Telfer, Jonathan Bryce, Dr. Paul Calleja, John Taylor standing with the first edition of Isaac Newton’s “Principia.” With thanks to the Master and Fellows of Trinity College, Cambridge. No resale rights.

To wrap up the visit, the team at Cambridge brought us to the library at Trinity College, which houses several of Isaac Newton’s belongings, including Newton’s first edition of “Philosophiæ Naturalis Principia Mathematica,” published in 1687, with his hand-written notes and corrections. We had the opportunity to take photos with the book and tour the College, a chance to be inspired by the rich history of academia and research and to consider how OpenStack can play a part in future discoveries. The SKA and bioinformatics projects are massive in scope, but they represent only a small portion of the innovations and possibilities, and they challenge us to help achieve these ambitious goals by building the cloud platform that enables them.
