Here’s a first look at how real bare-metal Kubernetes clusters can be used for application workloads, from a networking point of view.


Since the OpenStack Summit in Tokyo last November, we have realized the impact that containers will have on the global community.

There has been a lot of talk about using containers and Kubernetes instead of standard virtual machines (VMs). There are a couple of reasons for the buzz: they are lightweight, easy and fast to deploy, and developers love that they can easily develop, maintain, scale and roll-update their applications. At tcp cloud, we focus on building private cloud solutions based on open-source technologies, so we wanted to dive into Kubernetes to see whether it can really be used in a production setup, alongside or within OpenStack-powered virtualization.

Kubernetes brings a new way to manage container-based workloads and offers features for containers similar to what OpenStack provides for VMs. If you start using Kubernetes, you quickly realize that you can easily deploy it on AWS, GCE or Vagrant, but what about your on-premise bare-metal deployment? How can you integrate it into your current OpenStack or virtualized infrastructure? Many blog posts and manuals document small clusters running in VMs with sample web applications, but none of them show real scenarios for bare-metal or enterprise-grade workloads integrated into an existing network design. As with OpenStack, the most difficult part of the architecture is designing the networking properly, so we defined the following networking requirements:

  • Multi-tenancy – separation of container workloads is a basic requirement of every security policy standard. The default Flannel networking, for example, only provides a flat network architecture.
  • Multi-cloud support – not every workload is suitable for containers, and you still need to run heavy loads such as databases in VMs or even on bare metal. For this reason, a single control plane for the SDN is the best option.
  • Overlays – related to multi-tenancy. Almost every OpenStack Neutron deployment uses some kind of overlay (VXLAN, GRE, MPLSoverGRE, MPLSoverUDP), and we have to be able to interconnect them.
  • Distributed routing engine – East-West and North-South traffic cannot go through a single central software service. Network traffic has to go directly between OpenStack compute nodes and Kubernetes nodes. Providing routing on routers rather than on proprietary gateway appliances is optimal.

Based on these requirements, we decided to start with OpenContrail SDN. Our mission was to integrate OpenStack workloads with Kubernetes and then find a suitable application stack for the actual load testing.

OpenContrail overview

OpenContrail is an open-source SDN and NFV solution that has had tight ties to OpenStack since the Havana release. It was one of the first production-ready Neutron plugins, along with Nicira (now VMware NSX-MH), and the latest summit user survey showed it is the second most deployed solution after Open vSwitch, and the first among vendor-based solutions. OpenContrail has integrations with OpenStack, VMware, Docker and Kubernetes.

The Kubernetes network plugin, kube-network-manager, has been under development since the OpenStack Summit in Vancouver last year, and the first announcement was published at the end of the year.

The kube-network-manager process uses the Kubernetes controller framework to listen for changes to objects defined in the API and adds annotations to some of these objects. It then creates a network solution for the application using the OpenContrail API, which defines objects such as virtual networks, network interfaces and access-control policies. More information is available in this blog post.
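
A quick way to see what the plugin does (a generic kubectl check, not from the original walkthrough; <pod-name> is a placeholder for any running pod, and the exact annotation keys depend on the plugin version) is to dump a pod's metadata and look at the annotations section:

# kubectl get pod <pod-name> -o yaml | grep -A 5 annotations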

Architecture

We started testing with two independent Contrail deployments and then set up a BGP federation between them. The reason for the federation is the Keystone authentication used by kube-network-manager: when contrail-neutron-plugin is enabled, the Contrail API uses Keystone authentication, and this feature is not yet implemented in the Kubernetes plugin. The Contrail federation is described in more detail later in this post.
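
As a rough sketch of the peering step (our assumption of how it is typically done, not a verbatim excerpt from our deployment), each cluster registers the other cluster's control nodes as BGP peers, for example with the provision_control.py helper shipped with Contrail. The script path, flags and angle-bracket values below are placeholders and vary by release:

# python /opt/contrail/utils/provision_control.py --api_server_ip <local_contrail_api_vip> \
    --api_server_port 8082 --host_name <remote_control_hostname> --host_ip <remote_control_ip> \
    --router_asn 64512 --oper add \
    --admin_user admin --admin_password <password> --admin_tenant_name admin

Running the equivalent command on both sides, with a matching ASN, establishes the BGP peering between the two control planes.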

The following schema shows the high-level architecture, with the OpenStack cluster on the left and the Kubernetes cluster on the right. OpenStack and OpenContrail are deployed in a fully highly available (HA), best-practice design, which can be scaled up to hundreds of compute nodes.

[Image: OpenStack integration with Kubernetes]

The following figure shows the federation of two Contrail clusters. In general, this feature enables Contrail controllers at different sites of a multi-site data center to connect to each other without requiring a physical gateway. The control nodes at each site peer with the other sites over BGP. It is possible to stretch both L2 and L3 networks across multiple data centers this way.

This design is usually used for two independent OpenStack clouds or two OpenStack regions. All Contrail components, including the vRouter, are exactly the same; kube-network-manager and neutron-contrail-plugin just translate API requests from the different platforms. The core functionality of the networking solution remains unchanged, which brings not only a robust networking engine but analytics too.

[Image: OpenContrail control plane]

Application Stack Overview

Let’s have a look at a typical scenario. Our developers gave us a Docker compose.yml file, which they use for development and local tests on their laptops. This makes our situation easier, because the developers already know Docker and the application workload is Docker-ready. The application stack contains the following components:

  • Database – PostgreSQL or MySQL database cluster.
  • Memcached – for content caching.
  • Django app Leonardo – Django CMS Leonardo was used for application stack testing.
  • Nginx – web proxy.
  • Load balancer – HAProxy load balancer for container scaling.

When we want to take it into production, we can transform everything into Kubernetes replication controllers with services, but as we mentioned at the beginning, not everything is suitable for containers. So we separate the database cluster out to OpenStack VMs and rewrite the rest into Kubernetes manifests.

Application deployment
This section describes the workflow for application provisioning on OpenStack and Kubernetes.

OpenStack side
As the first step, we launched a Heat database stack on OpenStack. This created three VMs with PostgreSQL and a database network. The database network is a private, tenant-isolated network.
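
For reference, a minimal sketch of the launch command, assuming a Heat template named leonardo-db.hot with a matching environment file (both hypothetical names; the actual template is not included in this post):

# heat stack-create -f leonardo-db.hot -e leonardo-db.env leonardo-db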

# nova list
+--------------------------------------+--------------+--------+------------+-------------+-----------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks              |
+--------------------------------------+--------------+--------+------------+-------------+-----------------------+
| d02632b7-9ee8-486f-8222-b0cc1229143c | PostgreSQL-1 | ACTIVE | -          | Running     | leonardodb=10.0.100.3 |
| b5ff88f8-0b81-4427-a796-31f3577333b5 | PostgreSQL-2 | ACTIVE | -          | Running     | leonardodb=10.0.100.4 |
| 7681678e-6e75-49f7-a874-2b1bb0a120bd | PostgreSQL-3 | ACTIVE | -          | Running     | leonardodb=10.0.100.5 |
+--------------------------------------+--------------+--------+------------+-------------+-----------------------+

Kubernetes side
On the Kubernetes side, we have to launch the manifests with the Leonardo and Nginx services. The complete set of manifests can be found there.

To make it run successfully with networking isolation, look at the following parts of the manifests.

  • leonardo-rc.yaml – replication controller for the Leonardo app with three replicas and the virtual network leonardo.
apiVersion: v1
kind: ReplicationController
...
  template:
    metadata:
      labels:
        app: leonardo
        name: leonardo # the name label defines and creates a new virtual network in Contrail
...
  • leonardo-svc.yaml – the leonardo service exposes the application pods with a virtual IP from the cluster network on port 8000.
apiVersion: v1
kind: Service
metadata:
  labels:
    name: ftleonardo
  name: ftleonardo
spec:
  ports:
    - port: 8000
  selector:
    name: leonardo # selector/name matches label/name in replication controller to receive traffic for this service
...
  • nginx-rc.yaml – nginx replication controller with three replicas, the virtual network nginx and a policy allowing traffic to the leonardo-svc network. This sample does not use SSL.
apiVersion: v1
kind: ReplicationController
...
  template:
    metadata:
      labels:
        app: nginx
        uses: ftleonardo # uses creates policy to allow traffic between leonardo service and nginx pods.
        name: nginx # creates virtual network nginx with policy ftleonardo
...
  • nginx-svc.yaml – creates a service with a cluster VIP and a public virtual IP to make the application accessible from the Internet.
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
    name: nginx
...
  selector:
    app: nginx # selector matches the app label in the RC so the service receives the pods' traffic
  type: LoadBalancer # allocates a new floating IP from the external virtual network and associates it with the service VIP
...

Let’s run all the manifests by calling kubectl:

kubectl create -f /directory_with_manifests/

This creates the following pods and services in Kubernetes.

# kubectl get pods
NAME             READY     STATUS    RESTARTS   AGE
leonardo-369ob   1/1       Running   0          35m
leonardo-3xmdt   1/1       Running   0          35m
leonardo-q9kt3   1/1       Running   0          35m
nginx-jaimw      1/1       Running   0          35m
nginx-ocnx2      1/1       Running   0          35m
nginx-ykje9      1/1       Running   0          35m
# kubectl get service
NAME         CLUSTER_IP      EXTERNAL_IP     PORT(S)    SELECTOR        AGE
ftleonardo   10.254.98.15    <none>          8000/TCP   name=leonardo   35m
kubernetes   10.254.0.1      <none>          443/TCP    <none>          35m
nginx        10.254.225.19   185.22.97.188   80/TCP     app=nginx       35m

Only the nginx service has the public IP 185.22.97.188, which is a floating IP configured through the LoadBalancer type. All traffic is now balanced by ECMP on the Juniper MXs.
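
As a quick end-to-end check (our addition, not one of the original outputs), the public floating IP can be probed from any machine with Internet access; each request lands on one of the nginx pods chosen by ECMP:

# curl -I http://185.22.97.188/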

To get the cluster fully working, you must set up routing between the leonardo virtual network in the Kubernetes Contrail and the database virtual network in the OpenStack Contrail. Go into both Contrail UIs and set the same route target on both networks. This can be automated, too, through Contrail Heat resources.
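
If your Contrail release ships the add_route_target.py utility, the same route target can also be attached from the command line. Treat the following purely as an illustrative sketch: the script path, flags, project name and all angle-bracket values are assumptions and will differ per environment. The equivalent has to be run against both clusters.

# python /opt/contrail/utils/add_route_target.py \
    --routing_instance_name default-domain:<project>:leonardo:leonardo \
    --route_target_number <rt_number> --router_asn 64512 \
    --api_server_ip <contrail_api_ip> --api_server_port 8082 \
    --admin_user admin --admin_password <password> --admin_tenant_name admin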

[Image: setting the same route target on both networks in the Contrail UI]

The following figure shows how the final production application stack looks. At the top there are two Juniper MXs with a public VRF, where the floating IPs are propagated. Traffic is balanced through ECMP over MPLSoverGRE tunnels to the three nginx pods. Nginx proxies requests to the Leonardo application servers, which store sessions and content in the PostgreSQL database cluster running on OpenStack VMs. The connection between pods and VMs is direct, without any central routing point; the Juniper MXs are used only for outgoing connections to the Internet. Because application sessions are stored in the database (normally this would be memcached or redis), we don't need a specific L7 load balancer and ECMP works seamlessly.

[Image: Connecting Kubernetes pods with OpenStack VMs]

Other Outputs

This section shows other interesting outputs from the application stack. The nginx service description with the LoadBalancer type shows the floating IP and the private cluster IP, followed by the three IP addresses of the nginx pods. Traffic is distributed through vRouter ECMP.

# kubectl describe svc/nginx
Name:                   nginx
Namespace:              default
Labels:                 app=nginx,name=nginx
Selector:               app=nginx
Type:                   LoadBalancer
IP:                     10.254.225.19
LoadBalancer Ingress:   185.22.97.188
Port:                   http    80/TCP
NodePort:               http    30024/TCP
Endpoints:              10.150.255.243:80,10.150.255.248:80,10.150.255.250:80
Session Affinity:       None

The nginx routing table shows the internal routes between pods and the route 10.254.98.15/32, which points to the leonardo service.

[Image: nginx virtual network routing table]

The previous route, 10.254.98.15/32, is the cluster IP shown in the description of the leonardo service.

# kubectl describe svc/ftleonardo
Name:                   ftleonardo
Namespace:              default
Labels:                 name=ftleonardo
Selector:               name=leonardo
Type:                   ClusterIP
IP:                     10.254.98.15
Port:                   <unnamed>       8000/TCP
Endpoints:              10.150.255.245:8000,10.150.255.247:8000,10.150.255.252:8000

The routing table for leonardo looks similar to the nginx one, except for the routes 10.0.100.X/32, which point to the OpenStack VMs in the other Contrail cluster.

[Image: leonardo virtual network routing table]

The last output comes from the Juniper MX VRF and shows the multiple ECMP routes to the nginx pods.

185.22.97.188/32   @[BGP/170] 00:53:48, localpref 200, from 10.0.170.71
                      AS path: ?, validation-state: unverified
                    > via gr-0/0/0.32782, Push 20
                    [BGP/170] 00:53:31, localpref 200, from 10.0.170.71
                      AS path: ?, validation-state: unverified
                    > via gr-0/0/0.32778, Push 36
                    [BGP/170] 00:53:48, localpref 200, from 10.0.170.72
                      AS path: ?, validation-state: unverified
                    > via gr-0/0/0.32782, Push 20
                    [BGP/170] 00:53:31, localpref 200, from 10.0.170.72
                      AS path: ?, validation-state: unverified
                    > via gr-0/0/0.32778, Push 36
                   #[Multipath/255] 00:53:48, metric2 0
                    > via gr-0/0/0.32782, Push 20
                      via gr-0/0/0.32778, Push 36

Conclusion

We have proved that a single SDN solution can be used for OpenStack, Kubernetes, bare metal and VMware vCenter. Most importantly, this use case can actually be used in production environments.

Currently, we are working on requirements for Kubernetes networking stacks and on a detailed comparison of different Kubernetes network plugins, such as Weave, Calico, Open vSwitch, Flannel and Contrail, at a scale of 250 bare-metal servers.

We are also working on OpenStack Magnum with a Kubernetes backend to give developers a self-service portal for simple testing and development. They will then be able to prepare application manifests inside OpenStack VMs, push the final production definitions to git, and use them in production.

Special thanks to Pedro Marques from Juniper for his support and contribution during testing, and special thanks also go to Lachlan Evenson and his colleagues from Lithium for collaborating on this post and providing real use cases.

This blog was originally posted on the tcp cloud blog. Superuser is always looking for how-tos and other contributions. Please get in touch: [email protected]

Cover Photo // CC BY NC