One of China’s largest tech companies has been running OpenStack in production for 4 years with availability of over 99.99%.


Update: April 24, 2018 Tencent became the newest OSF Platinum Member. More on that here.

Hear more from their team at the Vancouver Summit.

Shenzhen-based Tencent is one of the largest tech companies in Asia. The company, whose motto is “connecting people for a greater future,” will soon be bringing hit games to the United States and Europe. They’re using OpenStack on a pretty grand scale, too: 14 clusters in seven data centers in four regions and they’ve been running it in production for four years with availability of 99.99 percent.

Superuser interviewed a team at Tencent (big thanks to Xiaodong Pan, Leon Hu, Carlos Luo, Xiabing Yao) over email to find out more about their private cloud, TStack, and plans for the future.

You can hear more from Tencent about OpenStack product development and operations at Open Infra Days China.

Who uses OpenStack at your company?

In 2013, Tencent began building a private cloud called TStack to provide cloud computing resources for our internal IT department. TStack provides infrastructure-as-a-service (IaaS) based on OpenStack to OA system operators, platform application developers, QA developers and AI researchers in Tencent.

Tencent’s internal IT systems, functional department business systems and most of its development and QA systems are built mainly on TStack. Tencent has also brought TStack and its management experience to many Chinese government departments and public enterprises, including police departments and electric power companies.

Why did Tencent choose OpenStack?

In its early stage, TStack was built on infrastructure designed by Tencent that successfully managed over 6,000 Xen virtual machines. However, the initial TStack was not a cloud management platform that supported heterogeneous virtualization. Many resources couldn’t be managed and fully utilized, including heterogeneous virtual machines, thousands of physical servers and many third-party storage devices.

In 2014, OpenStack started to grow rapidly and had created a great ecosystem. Dozens of industry-leading companies from all over the world were involved in OpenStack and had deployed many projects with it. OpenStack became a strong force in cloud computing and the first choice for open-source cloud computing platforms. As the most active software-defined infrastructure solution, OpenStack has many advantages, such as being open source and having an advanced design. Based on evaluation and testing results from our internal IT operation team, Tencent decided to introduce OpenStack as the infrastructure for TStack and expected it to provide better services.

TStack is a cloud computing management solution that is extensible, highly available and based on OpenStack. It provides service interfaces for managing computing resources, storage, networks, images, authentication and measurement, and it is compatible with heterogeneous virtualization, servers, storage devices and network devices. It’s also suitable for distributed computing and storage.

Tell us about what workloads you’re running on OpenStack and the value and impact of this workload to Tencent. 

TStack is designed for environments on a huge scale. It manages more than 10,000 OS instances, 40 percent of which are deployed for over 300 internal IT services, including OA authentication, the WeChat gateway, RTX, the mail system, video surveillance, internal security, function management and ERP. These services have 24/7 uptime requirements. TStack also manages development and testing services for various Tencent products, such as WeChat, QQ, the browser and games.

At the same time, Tencent works with the Chinese government to build e-government systems that manage various aspects of public services, including public transportation, taxes, social insurance and health care. TStack helps to standardize the interface between geographically distributed data centers and to schedule heterogeneous resources, reducing server costs by 30 percent and operation costs by 55 percent.

Can you give us some details on your deployment?

Right now, TStack uses OpenStack as IaaS. It’s based on the OpenStack Kilo release and has three products: an infrastructure cloud, a monitoring cloud and a self-service cloud. Its key features are heterogeneous cloud management and hybrid cloud management. The infrastructure cloud manages physical resources. The self-service cloud provides a resource application management dashboard, workflow management and services management, as well as cloud services such as PaaS and SaaS. The monitoring cloud monitors cloud hosts and resource utilization and also implements ITIL processes.

TStack uses the following OpenStack components: Keystone, Nova, Neutron, Cinder, Glance, Ironic, Heat, Swift, Manila and Horizon. To meet business needs, an upgrade to the latest OpenStack release is under evaluation. So far, TStack has been deployed to 14 clusters in seven data centers in four regions: Shanghai, Chengdu, Tianjin and Shanwei. It manages more than 10,000 OS instances and hosts internal IT systems, functional department business systems and most of the development and testing systems within Tencent. It has been running in production for four years with availability over 99.99 percent.
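
To put the availability figure in perspective, the short calculation below converts an availability percentage into a yearly downtime budget. It is only an illustrative sketch: the function name and the choice of a 365-day year are assumptions made here, not anything from TStack. Because the interview says availability is over 99.99 percent, the result is an upper bound of a little under an hour of downtime per year.

```python
# Illustrative arithmetic only; the helper name and the 365-day year
# are assumptions, not part of TStack or OpenStack.
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year


def downtime_budget_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Return the maximum downtime (in minutes) allowed by an availability target."""
    return period_minutes * (100.0 - availability_percent) / 100.0


budget = downtime_budget_minutes(99.99)
print(f"{budget:.2f} minutes of downtime per year at most")
```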

Based on the disaster tolerance requirements of application services, TStack uses multi-region, two-location, three-center deployments. The self-service cloud is used for scheduling and management across regions. Within a region, three control nodes provide high availability with Keepalived + HAProxy, and Galera Cluster for MariaDB serves as the database. VLAN is used in most regions because of its simplicity and efficiency, while VXLAN is used in some. With OpenStack, more than 1,000 computing nodes can be managed within a single region.

What challenges did you face with OpenStack, and how did you overcome them?

Tencent’s journey to running OpenStack has not been smooth. The TStack team has encountered many technical problems since it began to use TStack four years ago. Fortunately, the team has managed to overcome these difficulties while always providing steady services to users.

Here are some representative examples:

  • Nova cannot apply resource quotas to virtual machines that are already running. The TStack team therefore developed an online resource quota feature based on Nova’s open API, so that quotas can be applied without restarting virtual machines, which is especially useful when resources are limited.
  • Nova’s native resize feature reboots the virtual machine. The TStack team extended it with “nova live-resize,” which can resize cloud instances online.
  • Although OpenStack provides quite a few scheduling policies based on CPU, memory, disk and node status, they cannot meet Tencent’s complicated business requirements. TStack therefore customizes scheduling policies for different businesses and achieves high availability by moving VMs across host machines.
  • Nova block migration copies disk volumes, so for VMs with large volumes it frequently times out. The TStack team implemented dynamic adaptive compression during migration, which dramatically saves bandwidth and cuts migration time by an average of 50 percent.
  • Heat cannot orchestrate a VM that was created before orchestration. The TStack team combines Tencent’s internal BlueKING platform with Heat to manage the entire lifecycle of all VMs.
  • The TStack team developed a Neutron plugin to manage SDN controllers from various vendors.
  • The TStack team fine-tuned RabbitMQ kernel configuration so that it can handle messaging among more than 1,000 computing nodes within one region.
  • Physical machines, Xen VMs, KVM VMs and storage devices are all managed and scheduled by TStack to increase server utilization and reduce IT cost.
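
The scheduling point in the list above follows the general filter-then-weigh pattern that the Nova scheduler uses: drop hosts that cannot take the request, then rank the rest. The sketch below illustrates that pattern in plain Python. It does not use Nova’s real classes, and the `Host`, `Request`, `passes_filters`, `weigh` and `pick_host` names, as well as the “most free RAM” policy, are hypothetical stand-ins rather than TStack’s actual customizations.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical view of a compute node's free capacity."""
    name: str
    free_vcpus: int
    free_ram_mb: int
    free_disk_gb: int
    is_up: bool = True


@dataclass
class Request:
    """Hypothetical resource request for one VM."""
    vcpus: int
    ram_mb: int
    disk_gb: int


def passes_filters(host, req):
    """Filter step: keep hosts that are up and have enough CPU, memory and disk."""
    return (
        host.is_up
        and host.free_vcpus >= req.vcpus
        and host.free_ram_mb >= req.ram_mb
        and host.free_disk_gb >= req.disk_gb
    )


def weigh(host, req):
    """Weigh step (assumed policy): prefer the host with the most RAM left over."""
    return host.free_ram_mb - req.ram_mb


def pick_host(hosts, req):
    """Return the best-ranked host that passes all filters, or None."""
    candidates = [h for h in hosts if passes_filters(h, req)]
    if not candidates:
        return None
    return max(candidates, key=lambda h: weigh(h, req))


hosts = [
    Host("node-a", free_vcpus=4, free_ram_mb=8192, free_disk_gb=100),
    Host("node-b", free_vcpus=16, free_ram_mb=65536, free_disk_gb=500, is_up=False),
    Host("node-c", free_vcpus=8, free_ram_mb=32768, free_disk_gb=200),
]
chosen = pick_host(hosts, Request(vcpus=4, ram_mb=4096, disk_gb=50))
print(chosen.name if chosen else "no valid host")
```

A business-specific policy, in this framing, would amount to swapping in a different `weigh` function or adding conditions to `passes_filters` for a given class of workload.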

What’s your experience been with the OpenStack community and other open-source communities?

Tencent actively participates in community activities, sharing its experience and benefiting a lot in return. At the Global Cloud Computing Open Source Conference in China, as a participant and sponsor, Tencent gave a number of technical presentations and took part in, and passed, an OpenStack interoperability test. Tencent also took part in 2017 OpenStack Days China to share our experience of product development and operations.

The TStack team is also preparing for OpenStack Summit Sydney to share our experience of using OpenStack, and at the same time, the team hopes to learn from industry leaders from all over the world. In May 2017, Tencent joined the CNCF and Linux Foundation to contribute to container services and KVM virtualization. In June, Tencent joined the MariaDB Foundation to share its experience with its CDB database.

What’s next?

OpenStack has brought many benefits to Tencent and paved the way for Tencent to continue to optimize its application of OpenStack.

In the future, Tencent will strengthen its support for OpenStack in the following aspects:

  • Apart from improving its capacity to handle business growth, TStack will also increase its investment in the community, starting this year. Tencent has top-tier researchers and developers in China and will bring fresh blood to the OpenStack community, providing high-quality code, sharing development and optimization experience and hoping to lead new projects. Tencent will contribute its operational optimizations and internal modules to the community to help improve OpenStack. Tencent will also focus on projects like Magnum, Ironic and Kuryr, exploring practical use cases and sharing opinions. The team believes this new energy and knowledge sharing will lead OpenStack and the cloud computing industry to a brighter future.
  • A lack of operational skills is a common problem for OpenStack in enterprises. Powerful technology requires powerful operational skills, so Tencent will invest more in building operation management platforms and toolchains and in sharing lessons learned and best practices, so that others can also manage clouds with OpenStack better.
  • Tencent will integrate a variety of basic services such as CMEM, CDB, CDN, message queuing and load balancing into TStack. Customers can access these basic services through an API or GUI.
  • Tencent will also cooperate more closely with partners to build an open and integrated cloud computing ecosystem to reduce the cost for cloud users.

Tencent is committed to the cloud computing market. The team will use the OpenStack-based private cloud TStack along with our proprietary public cloud to build a complete hybrid cloud service ecosystem for the global market.

Tencent hopes to grow with OpenStack, contribute to promoting it and, overall, bring prosperity to the OpenStack ecosystem.

Cover image: Shenzhen’s Tencent building, courtesy Tencent.