Since the OpenStack Summit in Tokyo last November, we have realized the impact that containers will have on the global community.
There has been a lot of talk about using containers and Kubernetes instead of standard virtual machines (VMs). There are a couple of reasons for the buzz: containers are lightweight, easy and fast to deploy, and developers love that they can easily develop, maintain, scale and roll-update their applications. At tcp cloud, because we focus on building private cloud solutions based on open-source technologies, we wanted to dive into Kubernetes to see whether it can really be used in a production setup alongside or within OpenStack-powered virtualization.
Kubernetes brings a new way to manage container-based workloads and provides features similar to those OpenStack provides for VMs. Once you start using Kubernetes, you quickly realize that you can easily deploy it in AWS, GCE or Vagrant, but what about your on-premise bare-metal deployment? How can you integrate it into your current OpenStack or virtualized infrastructure? Many blog posts and manuals document small clusters running in VMs with sample web applications, but none of them show real scenarios for bare-metal or enterprise performance workloads integrated into an existing network design. As with OpenStack, the most difficult part of the architecture is designing the networking properly. So we defined the following networking requirements:
- Multi-tenancy - separation of container workloads is a basic requirement of every security policy standard. For example, default Flannel networking only provides a flat network architecture.
- Multi-cloud support - not every workload is suitable for containers, and you still need to put heavy loads like databases in VMs or even on bare metal. For this reason, a single control plane for the SDN is the best option.
- Overlay - related to multi-tenancy. Almost every OpenStack Neutron deployment uses some kind of overlay (VXLAN, GRE, MPLSoverGRE, MPLSoverUDP), and we have to be able to interconnect them.
- Distributed routing engine - East-West and North-South traffic cannot go through one central software service. Network traffic has to go directly between OpenStack compute nodes and Kubernetes nodes. Providing routing on routers instead of proprietary gateway appliances is optimal.
Based on these requirements, we decided to start with OpenContrail SDN. Our mission was to integrate OpenStack workloads with Kubernetes and then find a suitable application stack for actual load testing.
OpenContrail is an open-source SDN and NFV solution that has had tight ties to OpenStack since Havana. It was one of the first production-ready Neutron plugins, along with Nicira (now VMware NSX-VH), and the last summit's survey showed it is the second most deployed solution after Open vSwitch and the first among vendor-based solutions. OpenContrail has integrations with OpenStack, VMware, Docker and Kubernetes.
The Kubernetes network plugin, kube-network-manager, has been under development since the OpenStack Summit in Vancouver last year and was first announced at the end of that year.
The kube-network-manager process uses the Kubernetes controller framework to listen for changes to objects defined in the API and to add annotations to some of these objects. It then creates a network solution for the application using the OpenContrail API, which defines objects such as virtual networks, network interfaces and access control policies. More information is available at this blog.
We started testing with two independent Contrail deployments and then set up a BGP federation between them. The reason for the federation is authentication: when contrail-neutron-plugin is enabled, the Contrail API uses Keystone authentication, which is not yet implemented in the Kubernetes plugin (kube-network-manager). Contrail federation is described in more detail later in this post.
The following diagram shows the high-level architecture, with the OpenStack cluster on the left and the Kubernetes cluster on the right. OpenStack and OpenContrail are deployed in a fully highly available (HA) best-practice design, which can scale to hundreds of compute nodes.
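The watch-and-annotate pattern described above can be sketched in plain Python. This is not kube-network-manager's code: the `ToyNetworkManager` class, the `HYPOTHETICAL_ANNOTATION` key, the 10.150.0.0/16 pool and the one-/24-per-network allocator are all illustrative assumptions standing in for the Kubernetes watch API and the OpenContrail API calls. Only the event types (`ADDED`, `DELETED`) are real Kubernetes watch event names.

```python
import ipaddress
from dataclasses import dataclass, field

# Hypothetical annotation key; the real plugin's annotation names may differ.
HYPOTHETICAL_ANNOTATION = "example.org/network"


@dataclass
class VirtualNetwork:
    """Stand-in for an OpenContrail virtual-network object with simple IPAM."""
    name: str
    subnet: ipaddress.IPv4Network
    allocated: dict = field(default_factory=dict)  # pod name -> IP address

    def allocate(self, pod: str) -> ipaddress.IPv4Address:
        """Give the pod the first host address not already in use."""
        used = set(self.allocated.values())
        for ip in self.subnet.hosts():
            if ip not in used:
                self.allocated[pod] = ip
                return ip
        raise RuntimeError(f"subnet {self.subnet} is exhausted")


class ToyNetworkManager:
    """Turns pod watch events into virtual networks, IPs and annotations."""

    def __init__(self, pool: str = "10.150.0.0/16"):
        self.networks = {}
        # Assumption for the sketch: carve one /24 per virtual network.
        self._subnets = ipaddress.ip_network(pool).subnets(new_prefix=24)

    def handle(self, event: str, pod: dict) -> None:
        meta = pod["metadata"]
        vn_name = meta["labels"]["name"]  # the "name" label selects the network
        if event == "ADDED":
            vn = self.networks.get(vn_name)
            if vn is None:
                vn = VirtualNetwork(vn_name, next(self._subnets))
                self.networks[vn_name] = vn
            ip = vn.allocated.get(meta["name"])
            if ip is None:  # keep handling idempotent for repeated events
                ip = vn.allocate(meta["name"])
            # Write the result back onto the object, as the plugin does with annotations.
            meta.setdefault("annotations", {})[HYPOTHETICAL_ANNOTATION] = f"{vn_name}:{ip}"
        elif event == "DELETED" and vn_name in self.networks:
            self.networks[vn_name].allocated.pop(meta["name"], None)


if __name__ == "__main__":
    mgr = ToyNetworkManager()
    pod = {"metadata": {"name": "leonardo-369ob",
                        "labels": {"app": "leonardo", "name": "leonardo"}}}
    mgr.handle("ADDED", pod)
    print(pod["metadata"]["annotations"])
```

Keeping the handler idempotent matters in a real controller, because the same object can be delivered more than once after a reconnect or resync.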
OpenStack integration with Kubernetes
The following figure shows the federation of two Contrail clusters. In general, this feature lets Contrail controllers connect different sites of a multi-site DC without requiring a physical gateway. The control nodes at each site peer with the other sites using BGP, which makes it possible to stretch both L2 and L3 networks across multiple DCs.
This design is usually used for two independent OpenStack clouds or two OpenStack regions. All Contrail components, including the vRouter, are exactly the same; kube-network-manager and neutron-contrail-plugin just translate API requests from their respective platforms. The core functionality of the networking solution remains unchanged, which brings not only a robust networking engine but also analytics.
OpenContrail control plane
Application Stack Overview
Let's have a look at a typical scenario. Our developers gave us a Docker compose.yml, which they use for development and local tests on their laptops. This makes things easier, because our developers already know Docker and the application workload is already Docker-ready. The application stack contains the following components:
- Database - PostgreSQL or MySQL database cluster.
- Memcached - content caching.
- Django app Leonardo - Django CMS Leonardo was used for application stack testing.
- Nginx - web proxy.
- Load balancer - HAProxy load balancer for scaling containers.
When we want to move the stack into production, we can transform everything into Kubernetes replication controllers with services, but, as we mentioned at the beginning, not everything is suitable for containers. So we moved the database cluster to OpenStack VMs and rewrote the rest as Kubernetes manifests.
Application deployment
This section describes the workflow for application provisioning on OpenStack and Kubernetes.
OpenStack side
In the first step, we launched a Heat database stack on OpenStack. This created three VMs running PostgreSQL and a database network, which is a private, tenant-isolated network.
# nova list
+--------------------------------------+--------------+--------+------------+-------------+-----------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks              |
+--------------------------------------+--------------+--------+------------+-------------+-----------------------+
| d02632b7-9ee8-486f-8222-b0cc1229143c | PostgreSQL-1 | ACTIVE | -          | Running     | leonardodb=10.0.100.3 |
| b5ff88f8-0b81-4427-a796-31f3577333b5 | PostgreSQL-2 | ACTIVE | -          | Running     | leonardodb=10.0.100.4 |
| 7681678e-6e75-49f7-a874-2b1bb0a120bd | PostgreSQL-3 | ACTIVE | -          | Running     | leonardodb=10.0.100.5 |
+--------------------------------------+--------------+--------+------------+-------------+-----------------------+
Kubernetes side
On the Kubernetes side, we have to launch the manifests for the Leonardo and Nginx services. All of them can be found there.
To make them run successfully with network isolation, note the following parts of each manifest.
- leonardo-rc.yaml - replication controller for the Leonardo app with three replicas and virtual network leonardo
apiVersion: v1
kind: ReplicationController
...
  template:
    metadata:
      labels:
        app: leonardo
        name: leonardo # label name defines and creates new virtual network in contrail
...
- leonardo-svc.yaml - the leonardo service exposes the application pods with a virtual IP from the cluster network on port 8000.
apiVersion: v1
kind: Service
metadata:
  labels:
    name: ftleonardo
  name: ftleonardo
spec:
  ports:
  - port: 8000
  selector:
    name: leonardo # selector/name matches label/name in replication controller to receive traffic for this service
...
- nginx-rc.yaml - NGINX replication controller with three replicas, virtual network nginx, and a policy allowing traffic to the leonardo-svc network. This sample does not use SSL.
apiVersion: v1
kind: ReplicationController
...
  template:
    metadata:
      labels:
        app: nginx
        uses: ftleonardo # uses creates policy to allow traffic between leonardo service and nginx pods.
        name: nginx # creates virtual network nginx with policy ftleonardo
...
- nginx-svc.yaml - creates a service with a cluster VIP and a public virtual IP to access the application from the Internet.
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
    name: nginx
...
  selector:
    app: nginx # selector/name matches label/name in RC to receive traffic for the svc
  type: LoadBalancer # creates new floating IPs from the external virtual network and associates them with the service VIP
...
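The manifests encode networking intent in labels (`name` selects the virtual network, `uses` requests a policy toward a service) and depend on each Service selector matching an RC pod template; a mismatch silently leaves a service without endpoints. The Python sketch below reads both conventions from trimmed copies of the manifests. The `Policy` tuple and the helper names are hypothetical and do not mirror the plugin's data model, and loading the YAML files into dictionaries (for example with a YAML parser) is left out.

```python
from typing import NamedTuple, Optional


class Policy(NamedTuple):
    """Hypothetical allow rule: pods in src_network may reach dst_service."""
    src_network: str
    dst_service: str


def network_intent(labels: dict) -> tuple:
    """Return (virtual network, Policy or None) implied by pod template labels."""
    network = labels["name"]  # "name" defines/creates the virtual network
    uses = labels.get("uses")  # "uses" requests access to a service
    policy: Optional[Policy] = Policy(network, uses) if uses else None
    return network, policy


def selector_matches(selector: dict, labels: dict) -> bool:
    """Equality-based match: every selector key/value must be present in labels."""
    return all(labels.get(k) == v for k, v in selector.items())


def services_without_backends(services: list, controllers: list) -> list:
    """Names of services whose selector matches no RC pod template."""
    templates = [rc["spec"]["template"]["metadata"]["labels"] for rc in controllers]
    orphans = []
    for svc in services:
        selector = svc["spec"].get("selector")
        if not selector:
            continue  # selector-less services have manually managed endpoints
        if not any(selector_matches(selector, t) for t in templates):
            orphans.append(svc["metadata"]["name"])
    return orphans


# Trimmed versions of the manifests, keeping only the fields used here.
leonardo_rc = {"spec": {"template": {"metadata": {"labels": {"app": "leonardo", "name": "leonardo"}}}}}
nginx_rc = {"spec": {"template": {"metadata": {"labels": {"app": "nginx", "uses": "ftleonardo", "name": "nginx"}}}}}
ftleonardo_svc = {"metadata": {"name": "ftleonardo"}, "spec": {"selector": {"name": "leonardo"}}}
nginx_svc = {"metadata": {"name": "nginx"}, "spec": {"selector": {"app": "nginx"}}}

if __name__ == "__main__":
    for rc in (leonardo_rc, nginx_rc):
        print(network_intent(rc["spec"]["template"]["metadata"]["labels"]))
    print(services_without_backends([ftleonardo_svc, nginx_svc], [leonardo_rc, nginx_rc]))
```

Running such a check before `kubectl create` catches a typo in a selector or label early, before it shows up as an unreachable service.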
Let's create everything by running kubectl on the manifest directory.
kubectl create -f /directory_with_manifests/
This creates the following pods and services in Kubernetes.
# kubectl get pods
NAME             READY     STATUS    RESTARTS   AGE
leonardo-369ob   1/1       Running   0          35m
leonardo-3xmdt   1/1       Running   0          35m
leonardo-q9kt3   1/1       Running   0          35m
nginx-jaimw      1/1       Running   0          35m
nginx-ocnx2      1/1       Running   0          35m
nginx-ykje9      1/1       Running   0          35m
# kubectl get service
NAME         CLUSTER_IP      EXTERNAL_IP     PORT(S)    SELECTOR        AGE
ftleonardo   10.254.98.15    <none>          8000/TCP   name=leonardo   35m
kubernetes   10.254.0.1      <none>          443/TCP    <none>          35m
nginx        10.254.225.19   18.104.22.168   80/TCP     app=nginx       35m
Only the nginx service has a public IP, 22.214.171.124, which is a floating IP configured through the LoadBalancer service type. All traffic is now balanced by ECMP on the Juniper MX routers.
To get the cluster fully working, you must set up routing between the leonardo virtual network in the Kubernetes Contrail and the database virtual network in the OpenStack Contrail. Go into both Contrail UIs and set the same route target on both networks. This can also be automated through Contrail Heat resources.
The following figure shows how the final production application stack should look. At the top, there are two Juniper MXs with a public VRF, where the floating IPs are propagated. Traffic is balanced through ECMP over MPLSoverGRE tunnels to the three nginx pods. Nginx proxies requests to the Leonardo application servers, which store sessions and content in the PostgreSQL database cluster running on OpenStack VMs. The connection between pods and VMs is direct, without any central routing point. The Juniper MXs are used only for outgoing connections to the Internet. Because application sessions are stored in the database (normally they would be kept in memcached or Redis), we don't need a dedicated L7 load balancer, and ECMP works seamlessly.
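Why a shared route target is enough can be illustrated with a toy model of the BGP/MPLS VPN import rule: a VRF installs any route whose export route targets intersect its own import list. The model below is not Contrail's implementation; the `Vrf` class and the route-target values `target:64512:10000` and `target:64512:20000` are made up, and the BGP federation between the two control planes is collapsed into a single list of VRFs. The prefixes are addresses from the outputs in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Vrf:
    """Toy VRF: local routes plus export/import route-target sets."""
    name: str
    local_routes: set
    export_rts: set
    import_rts: set
    imported: set = field(default_factory=set)


def exchange(vrfs: list) -> None:
    """Import another VRF's local routes when its export RTs match our import RTs."""
    for dst in vrfs:
        for src in vrfs:
            if src is not dst and src.export_rts & dst.import_rts:
                dst.imported |= src.local_routes


if __name__ == "__main__":
    shared = "target:64512:10000"  # hypothetical RT set on both networks
    leonardo = Vrf("leonardo", {"10.150.255.245/32"}, {shared}, {shared})
    database = Vrf("leonardodb", {"10.0.100.3/32"}, {shared}, {shared})
    nginx = Vrf("nginx", {"10.150.255.243/32"},
                {"target:64512:20000"}, {"target:64512:20000"})
    exchange([leonardo, database, nginx])
    print(leonardo.imported, database.imported, nginx.imported)
```

With the shared route target, the leonardo and database networks learn each other's routes, while the nginx network, which carries a different route target, learns neither.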
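ECMP keeps each flow on one path by hashing packet header fields, typically the 5-tuple, and using the hash to pick a next hop, while different flows spread across all next hops. The sketch below illustrates that idea with the three nginx endpoints from `kubectl describe svc/nginx`. The CRC32 hash and modulo selection are illustrative choices, not the hash that Juniper MX or the vRouter actually uses, and the client and VIP addresses are placeholders from documentation ranges.

```python
import zlib
from collections import Counter

# Pod endpoints of the nginx service, taken from `kubectl describe svc/nginx`.
NGINX_ENDPOINTS = ["10.150.255.243", "10.150.255.248", "10.150.255.250"]


def ecmp_next_hop(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                  proto: str, next_hops: list) -> str:
    """Pick one next hop per flow by hashing the 5-tuple (CRC32 is illustrative)."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


if __name__ == "__main__":
    # Hypothetical clients (198.51.100.0/24) hitting a placeholder VIP (203.0.113.10).
    vip = "203.0.113.10"
    flows = [(f"198.51.100.{i % 250 + 1}", 30000 + i) for i in range(3000)]
    counts = Counter(ecmp_next_hop(ip, port, vip, 80, "tcp", NGINX_ENDPOINTS)
                     for ip, port in flows)
    print(counts)  # per-pod flow counts
```

Because a flow always hashes to the same pod but a returning user may open a new flow that lands elsewhere, keeping sessions in the database is what lets this stateless selection replace an L7 load balancer.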
Connecting Kubernetes pods with OpenStack VMs
This section shows other interesting outputs from the application stack. The description of the nginx service of type LoadBalancer shows the floating IP and the private cluster IP, followed by the IP addresses of the three nginx pods. Traffic is distributed through vRouter ECMP.
# kubectl describe svc/nginx
Name:                   nginx
Namespace:              default
Labels:                 app=nginx,name=nginx
Selector:               app=nginx
Type:                   LoadBalancer
IP:                     10.254.225.19
LoadBalancer Ingress:   126.96.36.199
Port:                   http    80/TCP
NodePort:               http    30024/TCP
Endpoints:              10.150.255.243:80,10.150.255.248:80,10.150.255.250:80
Session Affinity:       None
The nginx routing table shows internal routes between pods and the route 10.254.98.15/32, which points to the leonardo service.
The route 10.254.98.15/32 matches the cluster IP in the description of the leonardo service:
# kubectl describe svc/ftleonardo
Name:                   ftleonardo
Namespace:              default
Labels:                 name=ftleonardo
Selector:               name=leonardo
Type:                   ClusterIP
IP:                     10.254.98.15
Port:                   <unnamed>    8000/TCP
Endpoints:              10.150.255.245:8000,10.150.255.247:8000,10.150.255.252:8000
The routing table for leonardo looks similar to the nginx one, except for the 10.0.100.X/32 routes, which point to the OpenStack VMs in the other Contrail cluster.
The last output is from the Juniper MX VRF, showing multiple routes to the nginx pods.
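Reading such a table comes down to ordinary longest-prefix matching: the host /32 routes for the pods and the OpenStack VMs win over any broader route. The table in this sketch is hypothetical, built only from addresses listed in this post (the leonardo endpoints and the PostgreSQL VM addresses), and the next-hop descriptions and the default route are illustrative rather than copied from a real vRouter.

```python
import ipaddress

# Hypothetical routing table of a leonardo pod, built from addresses in this post.
ROUTES = {
    "10.150.255.245/32": "leonardo pod",
    "10.150.255.247/32": "leonardo pod",
    "10.150.255.252/32": "leonardo pod",
    "10.0.100.3/32": "PostgreSQL-1 VM (OpenStack Contrail)",
    "10.0.100.4/32": "PostgreSQL-2 VM (OpenStack Contrail)",
    "10.0.100.5/32": "PostgreSQL-3 VM (OpenStack Contrail)",
    "0.0.0.0/0": "default route (illustrative)",
}


def lookup(dst: str, routes: dict = ROUTES) -> str:
    """Longest-prefix match: return the entry of the most specific covering prefix."""
    addr = ipaddress.ip_address(dst)
    candidates = [ipaddress.ip_network(p) for p in routes]
    best = max((n for n in candidates if addr in n), key=lambda n: n.prefixlen)
    return routes[str(best)]


if __name__ == "__main__":
    for dst in ("10.0.100.4", "10.150.255.247", "192.0.2.1"):
        print(dst, "->", lookup(dst))
```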
188.8.131.52/32    @[BGP/170] 00:53:48, localpref 200, from 10.0.170.71
                      AS path: ?, validation-state: unverified
                    > via gr-0/0/0.32782, Push 20
                    [BGP/170] 00:53:31, localpref 200, from 10.0.170.71
                      AS path: ?, validation-state: unverified
                    > via gr-0/0/0.32778, Push 36
                    [BGP/170] 00:53:48, localpref 200, from 10.0.170.72
                      AS path: ?, validation-state: unverified
                    > via gr-0/0/0.32782, Push 20
                    [BGP/170] 00:53:31, localpref 200, from 10.0.170.72
                      AS path: ?, validation-state: unverified
                    > via gr-0/0/0.32778, Push 36
                   #[Multipath/255] 00:53:48, metric2 0
                    > via gr-0/0/0.32782, Push 20
                      via gr-0/0/0.32778, Push 36
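In this output, the BGP paths are learned from two peers, 10.0.170.71 and 10.0.170.72 (presumably the Contrail control nodes), and the MX combines them into one `#[Multipath/255]` entry whose next hops are GRE tunnel interfaces (`gr-0/0/0.x`), each with an MPLS label to push. The small parser below extracts those ECMP next hops from the multipath entry; the `multipath_next_hops` helper is hypothetical, and it relies only on the `via <interface>, Push <label>` line shape visible in this particular output, not on any documented Junos format guarantee.

```python
import re

# The multipath part of the MX route entry, copied from the output in this post.
SAMPLE = """\
                   #[Multipath/255] 00:53:48, metric2 0
                    > via gr-0/0/0.32782, Push 20
                      via gr-0/0/0.32778, Push 36
"""

NEXT_HOP = re.compile(r"via (\S+), Push (\d+)")


def multipath_next_hops(output: str) -> list:
    """Return (GRE interface, MPLS label) pairs listed under #[Multipath/...]."""
    hops, in_multipath = [], False
    for line in output.splitlines():
        if "[Multipath/" in line:
            in_multipath = True
            continue
        if in_multipath:
            match = NEXT_HOP.search(line)
            if match is None:
                break  # the multipath block ends at the first non-next-hop line
            hops.append((match.group(1), int(match.group(2))))
    return hops


if __name__ == "__main__":
    for interface, label in multipath_next_hops(SAMPLE):
        print(f"{interface}: push MPLS label {label}")
```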
We have shown that a single SDN solution can serve OpenStack, Kubernetes, bare metal and VMware vCenter. Most importantly, this use case can actually be used in production environments.
Currently, we are working on requirements for Kubernetes networking stacks and on a detailed comparison of Kubernetes network plugins such as Weave, Calico, Open vSwitch, Flannel and Contrail at a scale of 250 bare-metal servers.
We are also working on OpenStack Magnum with a Kubernetes backend to give developers a self-service portal for simple testing and development. They will then be able to prepare application manifests inside OpenStack VMs, push the final production definitions into git, and then use them in production.
Special thanks to Pedro Marques from Juniper for his support and contribution during testing, and a special thanks goes to Lachlan Evenson and his colleagues from Lithium for collaboration on this post and providing real use cases.