DreamHost’s Jonathan LaCour on the benefits of this “more open, less magic” approach

Four years ago, when our company, DreamHost, got into infrastructure-as-a-service, offering cloud storage and cloud computing, it was because we believed the world needed a viable open alternative for these services.

We were the creators of the Ceph open source cloud storage platform, as well as one of the early contributors to OpenStack, both of which bolstered our belief that we could create a capable offering to give customers a choice when it came to cloud computing.

We introduced DreamCompute as a public cloud compute service built on OpenStack and Ceph. Core networking requirements for DreamCompute included all tenants having L2 isolation from each other (which customers love for the security, among other reasons), built-in preparedness for IPv6, and 10G+ speed everywhere in the architecture.

In the first-generation build of DreamCompute we made certain design choices. We selected Nicira NVP (prior to Nicira's acquisition by VMware) for L2 isolation. At the time Nicira did not yet offer an L3 solution. Needing one, we spoke with the software routing vendors on the market and looked at their virtual routing appliances. Unfortunately, none of these vendors understood the cloud at the time, or the concepts of flexibly spinning servers up and down and expanding and contracting with demand. They would ask, “how many servers do you need?” – entirely the wrong question when you understand the cloud’s way of dynamically adjusting capacity. That is when we began wondering about building our own software router and network appliance, and that’s how the Astara project was born.

Astara started off as an OpenBSD-based appliance using the PF packet filter to provide the L3 services. We were seeing some issues with our approach, however, and in the meantime Nicira/VMware was adding L3 support. The moment came when we decided it was time for a bake-off to see how we compared. Honestly, we expected Astara to lose this challenge. Instead, Astara came out the clear winner, offering a significantly better experience and more reliability.

Even so, we saw room to improve performance, and so in DreamCompute’s second generation we moved from OpenBSD and PF to Linux and iptables, which we were more familiar with and which better fit enterprise needs and a virtualized environment. We also made significant optimizations to the network orchestration platform. With these changes DreamCompute was able to scale to over 1000 customers and thousands of virtual machines. At this point, however, we ran into some serious scale issues. We discovered that, for our use case, VMware NSX maxed out at around 1250 tenants. Beyond that we were seeing ports going inactive, connections dropping, and throughput going through the floor. We also had serious issues with OVS, finding it very slow and unstable. Most challenging was the fact that, as a closed source product, VMware NSX was just too “magic” and unknowable, making it extremely difficult for us to debug and operate.

To solve these issues, we wanted to move to a platform that was more open and less magic – and also a lot simpler. DreamCompute’s third generation is built upon Cumulus Linux at the physical layer, and provides L2 isolation through hardware-accelerated VXLAN in switches and hypervisors. In this iteration of DreamCompute, we moved all L3-L7 OpenStack networking capability to Astara’s open platform. At this time we also spun out Akanda as a business to support the development of Astara, which is now an official OpenStack project.
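For readers less familiar with VXLAN, here is a minimal sketch of why it suits per-tenant L2 isolation at scale. It is not DreamCompute or Astara code: the `VniAllocator` class and `same_segment` helper are hypothetical names invented for illustration. The only facts it relies on are the field widths, a 12-bit VLAN ID versus a 24-bit VXLAN Network Identifier (VNI), and the idea that traffic tagged with different VNIs lands in different L2 segments.

```python
# Illustrative sketch only; names here are hypothetical, not an Astara API.
VLAN_ID_BITS = 12     # 802.1Q VLAN ID field width
VXLAN_VNI_BITS = 24   # VXLAN Network Identifier field width

MAX_VLAN_SEGMENTS = 2 ** VLAN_ID_BITS     # 4096 (a few values are reserved)
MAX_VXLAN_SEGMENTS = 2 ** VXLAN_VNI_BITS  # 16777216


class VniAllocator:
    """Hands each tenant its own VNI, so each tenant gets its own L2 segment."""

    def __init__(self, first_vni=1, last_vni=MAX_VXLAN_SEGMENTS - 1):
        self._next = first_vni
        self._last = last_vni
        self._by_tenant = {}

    def vni_for(self, tenant_id):
        """Return the tenant's VNI, allocating one on first use."""
        if tenant_id not in self._by_tenant:
            if self._next > self._last:
                raise RuntimeError("VNI space exhausted")
            self._by_tenant[tenant_id] = self._next
            self._next += 1
        return self._by_tenant[tenant_id]


def same_segment(allocator, tenant_a, tenant_b):
    """Two tenants share an L2 segment only if they share a VNI."""
    return allocator.vni_for(tenant_a) == allocator.vni_for(tenant_b)


if __name__ == "__main__":
    alloc = VniAllocator()
    print(alloc.vni_for("tenant-a"), alloc.vni_for("tenant-b"))
    print(same_segment(alloc, "tenant-a", "tenant-b"))
```

In a real deployment the mapping from tenant network to VNI would be managed by the OpenStack networking layer rather than by a toy allocator like this one; the point is only that the identifier space is far larger than the tenant counts discussed here.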

So what are the benefits of this open approach? First of all, DreamCompute’s simple, open architecture means we have an easier time operating it. We can use proven open technology that our engineers understand, such as VXLAN, iptables, and the Linux networking stack. Managing switches and managing network functions deployed within virtual machines are very similar tasks: we can use the same familiar tools we use everywhere else. Secondly, Astara simplifies our Neutron deployment. We’re able to run fewer agents while generally having a much easier time of it (Neutron is notoriously difficult to deploy). As for performance and scale, DreamCompute is breaking through the limits we hit with VMware NSX. This is largely due to reductions in complexity, thanks to management and automation through OpenStack and Astara. The Astara model of virtual network appliances scales easily, giving us the option to deploy not just routing inside those virtual machines, but also other network functions such as load balancing and firewalls. We can bring our own network functions and have them orchestrated automatically by Astara, which has been greatly beneficial.

Because it was originally envisioned from a deployer’s point of view, Astara provides a simplicity of use and functionality that enhances everything else it offers. By providing L3-L7 network service orchestration, meeting needs for scaling and high availability, taking the challenge out of working with OpenStack Neutron, and eliminating the need to rely on magical, cumbersomely secretive closed solutions, Astara’s open solution has made rapid growth and success possible for our cloud product and others.

Jonathan LaCour is VP of cloud and development at DreamHost, a global web hosting, domain registrar and cloud services provider whose offerings include the cloud storage service DreamObjects and cloud computing service DreamCompute.

Superuser is always interested in opinion pieces and how-tos. Please get in touch: [email protected]

Cover Photo // CC BY NC