OpenStack clouds for HPC environments are becoming an increasingly popular choice across the globe because of the obvious advantages promised by the framework, says Petar Forai of the Vienna Biocenter.
Forai, along with colleagues Erich Birngruber and Ümit Seren, recently gave a look into “CLIP” (CLoud Infrastructure Project). The main goal of the project is to consolidate multiple independent computing environments and HPC infrastructures into one common platform suitable for a wide variety of academic computing use cases.
Forai is the deputy head of IT for three research institutes at the Vienna Biocenter: the Institute of Molecular Pathology (IMP), the Institute of Molecular Biotechnology (IMBA) and the Gregor Mendel Institute (GMI). The cloud platform engineering team of 14 is tasked with delivery and operations of IT infrastructure for 40 research groups, or about 500 scientists. The IT department delivers a full stack of services, including workstations, networking, application hosting and development, and HPC for the campus, among other things.
The current infrastructure is a “nightmare to manage,” Forai says: an expanse of siloed islands of infrastructure that can’t talk to each other, with no way to automate across them. The future? A Slurm Workload Manager that adjusts to demand, with a tightly connected grid of virtual machines and an OpenStack private cloud presiding over compute nodes. It took the team about two months to build a proof-of-concept, about eight to analyze how best to use OpenStack and about two months to move into production, though some of that work is still ongoing.
Of the lessons learned, Forai says “OpenStack is not a product. It is a framework.” He also advises running two or three OpenStack environments (development, staging and production in their case) to practice and understand upgrades and updates. A final consideration, from Forai’s point of view, is that the out-of-box experience and scalability of certain OpenStack subcomponents are “not optimal” and that these subcomponents should be considered more like a reference implementation.
The team details the deployment choices and offers an outline of the system architecture.
Catch the hour-long session here.