The latest progress in the Arm software ecosystem, the challenges of deploying Arm-based servers, and lessons learned on production readiness.

In recent years, Arm servers have become widely used, and the software ecosystem above them has matured considerably. In particular, some public cloud vendors have begun to offer cloud computing products based on the Arm architecture. Companies and organizations such as Huawei, EasyStack, and Linaro have invested substantial resources to speed up the adoption of the Arm architecture in cloud computing.

Next, we'll share the latest progress of the Arm software ecosystem in cloud computing, the challenges of deploying Arm-based servers, and our experience with production readiness.

Full-Stack Enablement of Open Source Software

As the picture above shows, the infrastructure layer is only a small part of the whole software stack. To make this layer functional and performant on the Arm platform, developers have also done a lot of work in lower-layer projects such as the OS, drivers, and libraries. End users, for their part, also care about the software that runs on top of the infrastructure: whether it can run on the Arm platform is one of the key considerations for anyone thinking about using Arm as infrastructure. In the next part, let us look at the enablement of the Arm platform from a broader perspective.

In the open source world, the development pipeline is particularly important for upstream developers. Currently, most open source projects only have an x86-based development pipeline, so the development process and its output are not very friendly to Arm users, and extra work may be needed before the software can run on the Arm platform.

To make the Arm platform a first-class citizen in the open source world, the first step is to provide open source projects with an Arm development pipeline, so that developers have the resources to develop and test on the Arm platform and the resulting product is Arm-native.
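One small thing a multi-architecture pipeline has to handle is that the same CPU family is reported under different names (for example `aarch64` on Linux and `arm64` in many container tools). The following is a minimal Python sketch of that normalization step. The helper names `normalize_arch` and `host_arch` and the alias table are hypothetical, introduced here only for illustration, and are not taken from any of the projects mentioned in this article.

```python
import platform

# Assumed alias table: raw machine strings mapped to the labels
# commonly used for build artifacts and container images.
_ARCH_ALIASES = {
    "aarch64": "arm64",
    "arm64": "arm64",
    "x86_64": "amd64",
    "amd64": "amd64",
}


def normalize_arch(machine: str) -> str:
    """Map a raw machine string to a normalized architecture label."""
    key = machine.strip().lower()
    if key not in _ARCH_ALIASES:
        raise ValueError(f"unsupported architecture: {machine!r}")
    return _ARCH_ALIASES[key]


def host_arch() -> str:
    """Return the normalized architecture of the machine running this code."""
    return normalize_arch(platform.machine())


if __name__ == "__main__":
    # A CI job could use this label to pick the matching test matrix entry.
    print(host_arch())
```

A pipeline could use the returned label to select which artifacts to build or which test jobs to schedule, so that the same job definition runs on both x86 and Arm workers.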

With this idea in mind, Huawei has newly enabled Arm CI in over 50 top open source communities, covering six major fields: Cloud & SDS, Big Data, Database, Web, Libs & Middleware, and AI. Quite a lot of open source projects already support the Arm platform, so users now have multiple choices in those fields. Readers can refer to the Arm CI landscape for more details (https://kunpengcompute.github.io/arm-landscape/).

Besides enablement, Huawei has also done a lot to improve the performance of open source projects on the Arm platform and to close the feature gaps with the x86 platform. For example, we have enabled CPU info observation and host CPU comparison on the Arm platform in Libvirt, and we are currently working on using them to improve the migration experience in OpenStack.

With these enablements and improvements, the Arm platform has become more competitive and attractive for both users and upstream developers.

Linaro's Work on the ARM64 Cloud Computing Ecosystem

Linaro is an open source organization focused on the ARM64 ecosystem. It mainly works on upstream development and maintenance, such as the Linaro Kernel, toolchains, and Android, and on some specific areas such as the data center.

Linaro has participated deeply in the Open Infrastructure community by contributing to ARM64 OpenStack enablement and deployment (maintaining the ARM64 Kolla images) and by maintaining the ARM64 OpenDev CI resources. At the same time, the Linaro Developer Cloud has been set up entirely on upstream OpenStack and Ceph work. It can be used not only to test members' hardware in cloud computing but also to help developers use ARM64 resources.

As the chart above shows, Linaro Developer Cloud is completely based on OpenStack and Ceph. It now offers VM/BM services based on Nova/Ironic as well as an ARM64 Kubernetes service based on Magnum. The production-level OpenStack cluster is deployed with Kolla container images, which makes operation and upgrades more flexible. A lot of upstream work went into supporting this:

  • Nova/Ironic/disk-image-builder enablement and bug fixes on ARM64
  • DevStack enablement for ARM64 OpenStack
  • Kolla image builds, Kolla-ansible deployment support, and bug fixes for OpenStack version upgrades
  • Magnum multi-arch support and K8s cloud provider support on ARM64

The Linaro Developer Cloud K8s service has now been online for about one year, and K8s service versions v1.17 and v1.18 have been certified by the CNCF conformance test. It is a good example showing that ARM64 is a viable platform in open source cloud computing.

Hardware automation is another hot topic in recent years. Quite a lot of workloads need to run on bare metal (e.g. cloud-native K8s on bare metal, HPC), either to achieve better network and storage performance or to work around virtualization limits (nested virtualization is not supported on ARM64). Quick hardware provisioning and a standard hardware management framework are therefore essential on ARM64.

To meet these needs, Linaro has proposed a diskless boot solution on ARM64, which leverages OpenStack Ironic for BM management and uses Ceph iSCSI to provide boot-from-volume support. Using a Ceph volume as the disk can greatly reduce BM provisioning time and improve RootFS security, and data reliability is guaranteed by Ceph.

As the picture above shows, the control path of the workflow relies on Ironic PXE iSCSI boot from volume, the Cinder Ceph iSCSI driver, and the Ceph iSCSI gateway. Linaro has contributed several features to make diskless boot on ARM64 happen:

  • Ironic boot from PXE iSCSI support
  • Cinder Ceph iSCSI driver support
  • Ceph iSCSI client bug fixes and stability enhancements

We believe that the hardware automation solution will benefit ARM64 development and CI systems. We will soon bring BM service support online in Linaro Developer Cloud and offer more resources to external applications.

Best Practices of Running Cloud on Arm Servers

EasyStack, a company focused on cloud computing, has served more than 1000 customers since its establishment. The company started to support Arm servers in April 2019, released a technology preview version in February 2020, and released the formal GA version on January 31, 2021, which reached product-level support in function coverage and test intensity. So far, nearly 30 customers have deployed the Arm version of its cloud computing products in production environments.

In many people's minds, the Arm ecosystem is not as extensive as that of x86, and many unknown problems remain to be solved in hardware, firmware, operating systems, software, and so on. This impression lasted for quite a long time, but things have changed rapidly during the last 5 years as the Arm ecosystem has made great progress. The number of CPU cores is increasing, and the processing capacity of a single core is growing. We have seen the performance of distributed storage on Arm-based servers exceed that on the x86 architecture.

The ecosystem has also developed greatly. More and more operating systems support Arm. Open source software in cloud computing also has better support for the Arm architecture, for example by using Arm CI as a verification step in the normal development process. The Arm architecture has an advantage in energy efficiency, which is one of the driving forces behind public cloud vendors designing their own Arm CPUs to provide cloud computing services. In short, the gap between Arm and the x86 architecture in cloud computing has narrowed greatly.

During the whole adaptation stage, EasyStack's engineers adapted more than 10 types of servers. These servers differed in BIOS/firmware, so the same operating system could run on some servers but not on others. The engineers analyzed the differences and made the corresponding fixes, and in the end the operating system could run stably on all the Arm servers. The choice of upper-layer software versions is also particularly important. EasyStack's products use Kubernetes as the base, and other services, including OpenStack components, run as pods. As a result, the Arm version of the product uses the same software stack as the x86 version, which makes it easy to maintain. In this process, CPU-specific bugs in libvirt and stability issues in Open vSwitch and MariaDB were solved. In short, Kubernetes, OpenStack, and other components can run perfectly on the Arm platform.

After the adaptation work, we began strict product-level testing. All the functions supported on x86 have been fully tested. This testing is integrated into the daily CI/CD process to ensure that no code change breaks functionality on either x86 or Arm. Several issues were found along the way. In network bandwidth tests on the Arm platform, the bandwidth did not reach the expected value, which was solved by binding specific interrupts to the corresponding CPUs. The VNC console had no output in some guest OSes, which was mainly related to the tty0 setting in the guest OS GRUB boot parameters. Some Arm servers do not support the hardware watchdog very well, and it sometimes fails to work normally.

We have completed full testing, and compared with the x86 architecture products there are only four differences:

  • Windows guest OSes are not supported.
  • GPU driver and SDK support on the Arm architecture is not good enough, so GPUs cannot be supported for the time being.
  • Because of a limitation of the Arm architecture, each virtual machine can support fewer disks and network cards than on x86.
  • Because of a limitation of IPMI, some monitoring indicators, such as disk speed, cannot be obtained.

These are all the differences between the Arm and x86 architecture products.
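The interrupt binding mentioned above is commonly done on Linux by writing a CPU bitmask to `/proc/irq/<N>/smp_affinity`. The article does not say how EasyStack applied the binding, so the following Python sketch only illustrates that general mechanism. The function names `affinity_mask` and `bind_irq` are hypothetical, and the IRQ number in the usage note is a placeholder, not a value from the article.

```python
from typing import Iterable


def affinity_mask(cpus: Iterable[int]) -> str:
    """Build the hex bitmask string used by /proc/irq/<N>/smp_affinity.

    The mask is split into comma-separated 32-bit groups, most significant
    group first, which matters on many-core Arm servers with more than
    32 CPUs.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU index: {cpu}")
        mask |= 1 << cpu
    if mask == 0:
        raise ValueError("at least one CPU is required")

    groups = []
    while mask:
        groups.append(f"{mask & 0xFFFFFFFF:08x}")
        mask >>= 32
    return ",".join(reversed(groups))


def bind_irq(irq: int, cpus: Iterable[int], proc_root: str = "/proc") -> None:
    """Pin one IRQ to the given CPUs (requires root on a real system)."""
    path = f"{proc_root}/irq/{irq}/smp_affinity"
    with open(path, "w") as f:
        f.write(affinity_mask(cpus))
```

For example, `bind_irq(123, [0, 1, 2, 3])` would restrict the hypothetical IRQ 123 to the first four CPUs. Which IRQs belong to a network card's queues, and which CPUs sit closest to the device, depends on the specific server and has to be checked on the machine itself.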

More and more customers are using Arm servers in production. To support them, we need to complete tasks such as migrating x86 applications to the Arm architecture and converting resource specifications from the x86 platform to the Arm platform. For EasyStack, nearly 30 customers in total run more than 500 Arm-based cloud computing physical nodes in production, and these run well. The longest-running customers have been in production for more than one year. Production readiness, together with strong support from companies across the community, is the basis for large-scale adoption. We are now seeing the dawn. We believe that in the near future, more enterprises will adopt the Arm architecture as the basis of cloud computing to provide cloud services.

Get Involved

This article is a summary of the Open Infrastructure Summit session, the Progress for Cloud Computing on Arm Architecture.

Watch more Summit session videos like this on the Open Infrastructure Foundation YouTube channel. Don’t forget to join the global Open Infrastructure community, and share your own personal open source stories using the hashtag, #WeAreOpenInfra, on Twitter and Facebook.