OpenStack is, without doubt, an exciting project and the leading open source infrastructure-as-a-service platform. In the last couple of years, I’ve had the privilege to architect and deploy dozens of OpenStack clouds for multiple customers and use cases. Over the last year, I’ve been working on high-performance computing (HPC) use cases on OpenStack.
In this post, I’ll offer some considerations for hosting high-performance and high-throughput workloads.
First, let’s start with the three types of architectures that can be used when hosting HPC workloads on OpenStack:
- Virtualized HPC on OpenStack
In this architecture, all components of the HPC cluster are virtualized in OpenStack.
- Bare-metal HPC on OpenStack
All components of the HPC cluster are deployed on bare-metal servers using OpenStack Ironic.
- Virtualized head node and bare-metal compute nodes
The head node (scheduler, master and login node) is virtualized in OpenStack and the compute nodes are deployed on bare-metal servers using OpenStack Ironic.
Now that you have an overview of the three types of architecture that can host HPC software in OpenStack, I’m going to discuss a few OpenStack best practices for these types of workloads.
For the networking aspect of OpenStack, there are two recommended configuration options:
- Provider networks: The OpenStack administrator creates these networks and maps them directly to existing physical (L2) networks in the data center. Because provider networks attach directly to the L2 switching infrastructure, they don’t need the OpenStack control plane to route L3 traffic; the L3 gateway should already exist in the data center network topology.
- SR-IOV: SR-IOV (single root input/output virtualization) is recommended for HPC workloads with demanding performance requirements. SR-IOV lets OpenStack expose the physical NIC’s capabilities directly to the instance through the NIC’s available virtual functions (VFs). In addition, support for IEEE 802.1BR allows virtual NICs to integrate with, and be managed by, the physical switch.
- Tests conducted by various vendors show that SR-IOV can achieve near line-rate performance at a low CPU overhead cost per virtual machine/instance.
- When implementing SR-IOV, take two essential limitations into account: instances using VF devices can’t be live migrated, and their traffic bypasses OpenStack’s security groups.
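The two networking options above map to a handful of attributes in the Neutron (OpenStack Networking) API. As a minimal sketch, the Python below only builds the JSON request bodies for a VLAN provider network and for a port that requests an SR-IOV virtual function; it doesn’t contact a cloud. The network name, the `physnet1` label, VLAN ID 100 and the placeholder network ID are assumptions for illustration, and the physical network label has to match whatever your Neutron agents are configured with.

```python
import json


def provider_network_body(name, physical_network, vlan_id):
    """Build the body for POST /v2.0/networks (normally an admin-only call)."""
    return {
        "network": {
            "name": name,
            "provider:network_type": "vlan",
            "provider:physical_network": physical_network,  # assumed label
            "provider:segmentation_id": vlan_id,
        }
    }


def sriov_port_body(network_id, name):
    """Build the body for POST /v2.0/ports requesting an SR-IOV VF."""
    return {
        "port": {
            "network_id": network_id,
            "name": name,
            # "direct" asks Neutron to bind the port to an SR-IOV virtual function
            "binding:vnic_type": "direct",
        }
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    net = provider_network_body("hpc-provider", "physnet1", 100)
    port = sriov_port_body("NETWORK_ID_PLACEHOLDER", "hpc-node-0-vf")
    print(json.dumps(net, indent=2))
    print(json.dumps(port, indent=2))
```

The instance is then booted with that port attached, which is also why security groups and live migration don’t apply to it in the usual way.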
For an HPC architecture, there are two major storage categories to consider:
- OpenStack storage: image (Glance), ephemeral (Nova), and volume (Cinder)
- HPC cluster file-based data storage: Used by the HPC cluster to store data
Based on both categories, here are a couple of recommendations to consider while designing your cluster:
- Glance and Nova: For the Glance and Nova (ephemeral) storage, I recommend Ceph. One of the significant advantages of Ceph (besides its tight integration with OpenStack) is the performance benefit at instance creation time that image copy-on-write offers with this back end. Another advantage for ephemeral workloads (not using SR-IOV in this case) is the ability to live migrate instances between the members of the compute cluster.
- Cinder: For the Cinder back end in this HPC use case, I recommend Ceph (the same benefits from the previous point apply) and NFS/iSCSI back ends such as NetApp, EMC VNX and similar systems with supported Cinder drivers.
HPC cluster file-based data storage:
Commonly used parallel file systems in HPC, such as Lustre, GPFS and OrangeFS, should be accessed from dedicated SR-IOV/provider networks. Another recommended back end is Ceph, also accessed directly from the SR-IOV/provider networks for better performance.
In general, Ceph offers a very flexible back end. A well-architected Ceph cluster can benefit multiple types of workloads in different configurations/architectures, for example:
- Ethernet-based connectivity can improve performance through higher-throughput NICs for front-end and back-end storage traffic (10/25/40/50/100 Gbps), plus LACP configurations that could double the amount of bandwidth available.
- Storage server components can be a combination of NVMe, SSD, SAS and SATA drives, tailored to provide the required I/O performance.
- The distributed nature of the technology provides a flexible and resilient platform.
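To put rough numbers on the bandwidth point above, here is a small back-of-the-envelope sketch. It assumes ideal conditions (traffic spread evenly across the bond members, no protocol overhead), and the function name is purely illustrative. Keep in mind that link aggregation balances traffic per flow, so a single flow is generally still limited to the speed of one member link.

```python
def bond_bandwidth_gbps(link_speed_gbps, links):
    """Return (aggregate, per_flow_cap) in Gbps for an idealized LACP bond."""
    if links < 1:
        raise ValueError("a bond needs at least one member link")
    aggregate = link_speed_gbps * links   # best case across many flows
    per_flow_cap = link_speed_gbps        # one flow stays on one member link
    return aggregate, per_flow_cap


if __name__ == "__main__":
    for speed in (10, 25, 40, 50, 100):
        aggregate, per_flow = bond_bandwidth_gbps(speed, links=2)
        print(f"2 x {speed} Gbps: up to {aggregate} Gbps aggregate, "
              f"{per_flow} Gbps per flow")
```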
The next thing to consider is automating the deployment of your HPC application on OpenStack. Multiple tools can be used for that: Heat, Ansible, or API calls from an orchestrator system.
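As one possible starting point with Heat, the sketch below generates a small HOT template from Python: one SR-IOV port and one compute node instance attached to it. It emits JSON, which is also valid YAML. The node, image, flavor and network names are placeholders I made up for illustration, and a real cluster would add a head node, storage mounts and scheduler configuration.

```python
import json


def hpc_node_template(node_name, image, flavor, network):
    """Return a minimal HOT template (as a dict) for one SR-IOV-attached node."""
    return {
        "heat_template_version": "2016-10-14",
        "description": "Single HPC compute node with an SR-IOV port (sketch)",
        "resources": {
            "node_port": {
                "type": "OS::Neutron::Port",
                "properties": {
                    "network": network,
                    # request an SR-IOV virtual function for this port
                    "binding:vnic_type": "direct",
                },
            },
            "node": {
                "type": "OS::Nova::Server",
                "properties": {
                    "name": node_name,
                    "image": image,
                    "flavor": flavor,
                    # attach the server to the port defined above
                    "networks": [{"port": {"get_resource": "node_port"}}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names; replace with values from your own cloud.
    template = hpc_node_template(
        "hpc-compute-0", "hpc-image", "hpc.large", "hpc-provider"
    )
    print(json.dumps(template, indent=2))
```

The printed template could be saved to a file and passed to Heat when creating a stack; the same structure could equally be driven from Ansible or from direct API calls.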
Happy HPC on OpenStack hacking!
About the author
Julio Villarreal Pelegrino is an enterprise architect and systems engineer with over 15 years of experience in the software and IT industry. He’s currently a chief architect in the emerging technology practice at Red Hat. This post first appeared on his blog.