OpenInfra Live is a weekly hour-long interactive show streaming to the OpenInfra YouTube channel every Thursday at 14:00 UTC (9:00 AM CT). Episodes feature OpenInfra release updates, user stories, community meetings, and more open infrastructure stories.
Networking is complex, and Neutron is one of the most difficult parts of OpenStack to scale. In this episode of the Large Scale OpenStack segment, we explored early architectural choices you can make, recommended drivers, and features to avoid if your ultimate goal is to scale to a very large deployment. Watch as OpenStack developers and operators share their Neutron scaling best practices.
Enjoyed this week’s episode and want to hear more about OpenInfra Live? Let us know what other topics or conversations you want to hear from the OpenInfra community this year, and help us to program OpenInfra Live! If you are running OpenStack at scale or helping your customers overcome the challenges discussed in this episode, join the OpenInfra Foundation to help guide OpenStack software development and to support the global community.
By popular demand, the OpenStack Large Scale SIG is back with another OpenInfra Live episode, bringing together several operators of large-scale OpenStack deployments to discuss different approaches to common operational challenges. Thierry Carrez, Vice President of Engineering at the OpenInfra Foundation and one of the chairs of the OpenStack Large Scale SIG, kicked off this episode about Neutron scaling best practices.
The speakers in today’s panel include:
- Ibrahim Derraz, site reliability engineer at Exaion, who drove the discussion
- David Comay, senior cloud engineer at Bloomberg
- Lajos Katona, master developer at Ericsson and current OpenStack Neutron Project Team Lead (PTL)
- Sławomir Kapłoński, principal software engineer at Red Hat and previous Neutron PTL
- Michal Nasiadka, senior technical lead at StackHPC, Kolla Ansible PTL
- Mohammed Naser, CEO at VEXXHOST and OpenStack Technical Committee member
Derraz asked the panelists a series of questions about the early architectural choices they made, the drivers they recommend, and the features to avoid if a user's ultimate goal is to scale to a very large deployment:
Architecture Choices:
- What drivers would you recommend for users to reach a large scale deployment?
- How do users choose, more specifically, between OVS (Open vSwitch) and OVN (Open Virtual Network) based on their features?
- In terms of network architecture, what would be the most resilient choice to make sure that no services will be impacted if something goes down?
- Is there any feature to enable or stay away from at scale?
Hardware and Performance Considerations:
- Do you have any feedback or recommendations for hardware offloading for new users?
- Do you have any advice on how to size network nodes (CPU, RAM, number of nodes)?
Neutron in Production:
- What are the common downsides/failures of Neutron in production? What are critical metrics to monitor on network nodes?
- For Bloomberg and VEXXHOST: do you have a dedicated network team for your production environment?
After the lively discussion on architecture choices, hardware and performance considerations, and Neutron in production, we received a few questions from the live audience:
- What has your experience been with centralized routing in Neutron (where all traffic goes to a Neutron server) in large scale OpenStack environments?
- The time to restart the Open vSwitch agent becomes longer with Distributed Virtual Router (DVR) and hundreds of hypervisors. The agent looks up a lot of ports from the database while restarting. Is there any way to solve this?
If you are interested in hearing more about how large scale users are solving operational challenges, we encourage you to join the OpenStack Large Scale SIG.
For people who want to meet more large scale OpenStack operators and developers and learn how they upgrade or use OpenStack to build supercomputers in research and HPC, here are the past OpenInfra Live episodes you can check out:
- Upgrades in Large Scale OpenStack Infrastructure
- Experts Discuss Tradeoffs, Frequency, and more around Upgrades of Large Scale OpenStack Deployments
- Large Scale OpenStack: Discussing Software-Defined Supercomputers
Next Episode on #OpenInfraLive
Join us on Thursday, October 28 at 14:00 UTC (9:00 AM CT) to watch the next OpenInfra Live episode: Tackling Sustainability with eco-friendly, green hardware. We will discuss how chipmakers, hardware vendors, and data center architects are all working in concert to tackle this issue.