Naturalis catalogs 37 million artifacts on a private cloud to reduce costs, boost productivity.

Camiel Doorenweerd has a mystery to solve.

A PhD candidate and molecular biologist at Naturalis Biodiversity Center in the Netherlands, Doorenweerd doesn’t think of moths as pests that burrow in sweaters, but as an overlooked key to understanding biodiversity and helping promote global sustainability. To decipher the evolutionary history of thousands of species of tiny, leaf-burrowing moths, Doorenweerd and fellow scientists in Finland, France, Great Britain, Australia, the U.S., Taiwan and Japan rely on high-performance compute resources from Naturalis’ OpenStack private cloud.

Scientists at Naturalis (100 researchers on staff and over 200 guest scientists) had been conducting research on large desktop computers and handling much of their own systems administration, but developments in DNA sequencing, 3D imaging and geospatial technologies had supersized the scientists’ data sets and created a plethora of computing headaches for the Center. IT staff were frequently called to troubleshoot issues and mitigate risks, and the time scientists spent provisioning and maintaining these systems took away from research.

The Center has more than 37 million objects in its catalog, one of the largest collections in the world. Poisonous spiders, ancient lizards, petrified rocks and rainforest foliage live in a 60-meter (roughly 197-foot) tower that punctuates the skyline of Leiden, home to the Center. But as research methodologies changed and international collaboration became necessary to solve these incredible mysteries, the Center’s catalog desperately needed to be digitized to support its researchers.

Some digitization efforts had begun, but the images of plants, animals, stones and fossils were quickly approaching one petabyte of storage. Naturalis was also receiving a growing number of requests for access to its information and needed a system that could efficiently scan, store and then process these data requests.

“Scientists relied on high-performance desktop systems that were unevenly distributed and expensive to maintain,” says Marc de Hart, ICT manager. “To facilitate world-class research, we needed to centralize and democratize resources.”

In early 2014, the Naturalis ICT department began exploring how it could democratize compute access and digitize the extraordinary collection. Both De Hart and his colleague Atze de Vries, a DevOps and systems administrator, were attracted to open source solutions. “In the recent past, we experienced massive scalability issues with proprietary hardware with inbuilt functionality; once scale was an issue, so was the investment needed. We decided to eliminate the vendor lock-in risk and replace it with a vendor lock-out principle,” De Hart says. The team downloaded and installed OpenStack and, satisfied with its functionality, built a small prototype using a Mirantis-based reference architecture. The team then expanded the cloud to support multiple workloads, including the high-performance compute resources needed by research scientists, search and storage for the digitized catalog and web servers for meeting the data-sharing requests. The team adopted OpenStack Fuel to deploy and manage cloud components and Puppet configuration management to make system builds, tests and deployments easier and more reliable.

By 2015, the OpenStack deployment had expanded to 30 nodes with five terabytes of RAM and had been moved to commercial data centers in Delft and Amsterdam in high-availability configurations. Ceph storage was scaled to accommodate the size of the digital catalog, and LAMP-based web servers running on OpenStack were added to the mix.

Today, Naturalis’ OpenStack deployment exceeds 70 nodes and enables research activities such as 3D scanning and DNA sequencing, which have allowed researchers to make discoveries including those about the species lineage of dragonflies and the global impact of those species. DevOps manager De Vries emphasizes that OpenStack’s source code control has significantly helped define and maintain the consistency of Naturalis’ scientific research methods. The Center’s scientists now have faster and more uniform access to high-performance research systems, and the freedom to quickly scale, return and rebuild shared private cloud services. The uniformity in access has helped standardize lab methodologies, which has led to increased productivity across the Center.

Web servers for research collaboration can be provisioned in under 10 minutes using OpenStack Fuel. With increased resource flexibility, DNA sequencing and genetic referencing tasks that previously took weeks are now completed in days or even hours. “Adding machines to our OpenStack cloud to scale compute, memory and storage is really easy,” says De Hart. The organization has saved time, energy and money in the process. Dozens of crash-prone workstations have been replaced and daily operational activities of the 70-node private cloud are managed by a single admin. Expanded Ceph storage provides the entire organization 60-day file server backup, which is easier to use and cheaper than the previous service.

Naturalis’ great OpenStack success has led the IT team to build a highly available private cloud cluster to run the organization’s existing Microsoft-based financial, human resources and collaboration applications. The increased ease of application management and monitoring has motivated the IT team to look at putting additional back-office software on their OpenStack cloud.

Cost savings and productivity haven’t been the only sea change at Naturalis. The transition to an open-source solution inspired a cultural shift at the Center. Researchers, developers and managers have adopted GitHub for creating and sharing code. Molecular biologist Doorenweerd quickly gained an appreciation for OpenStack’s capabilities and developed a program that automatically deploys specialized phylogenetic software, which aids researchers across the globe. His work, along with that of others from the Center, is offered as a repository on GitHub. The team has brainstormed how they can leverage OpenStack to enhance species correlation search engines and move legacy biomedical research applications to OpenStack, assisting scientists around the world with their fight for sustainability and against species extinction.

At the OpenStack Summit Barcelona, De Vries shared how a single admin runs 70 OpenStack nodes.

This article first appeared in the print edition of Superuser magazine, distributed at the Barcelona Summit. If you’d like to contribute to the next one, get in touch: [email protected]

Cover photo of Trix the T-rex provided by Naturalis, moth photo courtesy Camiel Doorenweerd.