The first in a series on Trove and on automating the management of failures and failovers.

This article originally appeared on Tesora’s Blog. Amrith is co-founder of Tesora and has more than two decades of experience in enterprise storage applications, fault-tolerant high-performance systems and massively parallel databases. You should follow him on Twitter.

As applications migrate to the cloud, the complexity of operating databases in this new environment has become apparent. Operating a significant database infrastructure is hard even with the luxury of a controlled data center and dedicated hardware. The cloud adds performance variability and virtualization overhead, and it gives the end user much less control over the underlying hardware. In the public cloud, an individual virtual machine instance is considerably less reliable than a dedicated machine in a data center, and across a large fleet of servers failures are observed much more frequently. All of this makes operating a database in the cloud much more challenging.

Database-as-a-Service (DBaaS) simplifies the use of databases in the cloud by relieving the administrator of much of the burden of operating the infrastructure. By being closely tied to the underlying infrastructure and automating many common operations, DBaaS considerably simplifies these activities. Failures, however, can interrupt the service, so it is essential that the DBaaS platform account for them and handle them in a manner that makes them transparent to the end user.

Trove accomplishes this in several ways. First, Trove is tightly integrated with the underlying OpenStack infrastructure, including Nova, Neutron, Swift, Cinder and Keystone. It automates much of the configuration and setup required to launch a new server, much as tools like Puppet, Chef and Ansible do. It also allows a site administrator to establish standard configurations and reliably launch servers with those configurations.

One area where this configuration support is especially important is clustering and replication. Without Trove, a user would have to configure these features manually and manage failures and failover themselves. Trove promises to automate these capabilities, and the functionality is being implemented in phases.

The initial implementation of replication in Trove will be for MySQL data stores using the built-in MySQL replication feature. Subsequent phases will extend this capability to include clustering and replication for all data stores that Trove supports. In the first release of this feature, users will be able to create a single MySQL instance and then create a slave of that instance. The act of creating the slave will establish a new instance, which will be the replication peer of the initial instance.
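
To give a sense of what is being automated, the sketch below prints the kind of SQL an administrator would otherwise run by hand to pair two MySQL 5.5 servers using the built-in replication feature. It is an illustration only, not a description of what Trove executes internally; the function names, the replication account, the host address and the binary log coordinates are all hypothetical placeholders.

```shell
#!/bin/sh
# Illustration only: the manual steps that built-in MySQL replication requires.
# All names and values are hypothetical; nothing is sent to a server.

# SQL to run on the master: create an account the slave can replicate with,
# then report the current binary log file and position.
master_setup_sql() {
    repl_user=$1
    repl_password=$2
    cat <<EOF
CREATE USER '${repl_user}'@'%' IDENTIFIED BY '${repl_password}';
GRANT REPLICATION SLAVE ON *.* TO '${repl_user}'@'%';
SHOW MASTER STATUS;
EOF
}

# SQL to run on the slave: point it at the master's binary log and start
# replicating from the given coordinates.
slave_setup_sql() {
    master_host=$1
    repl_user=$2
    repl_password=$3
    log_file=$4   # "File" column from SHOW MASTER STATUS
    log_pos=$5    # "Position" column from SHOW MASTER STATUS
    cat <<EOF
CHANGE MASTER TO
  MASTER_HOST='${master_host}',
  MASTER_USER='${repl_user}',
  MASTER_PASSWORD='${repl_password}',
  MASTER_LOG_FILE='${log_file}',
  MASTER_LOG_POS=${log_pos};
START SLAVE;
EOF
}
```

Piping each function's output to the mysql client on the corresponding server would apply it. The master also needs binary logging enabled, and each server needs a distinct server-id in its configuration, before any of this works.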

The following commands illustrate how a user would do this. Consider first the following running Trove instance with MySQL version 5.5:

$ trove list
ID                                    Name  Datastore  Datastore Version  Status  Flavor ID  Size
d2bd91ef-3d7c-43ae-97a9-f0726c91d322  m1    mysql      5.5                ACTIVE  7          2

One would now create a second (slave) instance referencing the master provided above, as follows.

$ trove create s1 7 --size 2 --slave_of d2bd91ef-3d7c-43ae-97a9-f0726c91d322
Property           Value
created            2014-06-13T14:33:27
datastore          mysql
datastore_version  5.5
flavor             7
id                 9ffc7b3a-9205-412a-9cd2-521f95755c43
name               s1
slaveof            d2bd91ef-3d7c-43ae-97a9-f0726c91d322
status             BUILD
updated            2014-06-13T14:33:27
volume             2

The user can now look at the state of the replicated pair as shown below.

$ trove show 9ffc7b3a-9205-412a-9cd2-521f95755c43
Property           Value
created            2014-06-13T14:33:27
datastore          mysql
datastore_version  5.5
flavor             7
id                 9ffc7b3a-9205-412a-9cd2-521f95755c43
name               s1
slaveof            d2bd91ef-3d7c-43ae-97a9-f0726c91d322
status             ACTIVE
updated            2014-06-13T14:33:27
volume             2
$ trove show d2bd91ef-3d7c-43ae-97a9-f0726c91d322
Property           Value
created            2014-06-13T14:33:27
datastore          mysql
datastore_version  5.5
flavor             7
id                 d2bd91ef-3d7c-43ae-97a9-f0726c91d322
name               m1
slaves             9ffc7b3a-9205-412a-9cd2-521f95755c43
status             ACTIVE
updated            2014-06-13T14:33:27
volume             2
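
The slave is reported as BUILD immediately after creation and as ACTIVE once it is ready. A script that needs to wait for that transition could poll trove show, along the lines of the sketch below. The function names, the retry count and the sleep interval are assumptions made for illustration, and the parsing assumes the status appears on a line of the form "status ACTIVE", with or without table borders.

```shell
#!/bin/sh
# Sketch: wait for a Trove instance to become ACTIVE by polling `trove show`.
# Function names and defaults are hypothetical, not part of the Trove client.

# Read `trove show` output on stdin and print the value of the status property.
# Pipe characters are stripped first so bordered table output parses the same way.
parse_status() {
    tr -d '|' | awk '$1 == "status" { print $2; exit }'
}

# Poll until the instance reports ACTIVE.
# Returns 0 on success and 1 if the attempts run out first.
wait_for_active() {
    instance_id=$1
    attempts=${2:-60}    # how many times to poll (arbitrary default)
    interval=${3:-10}    # seconds between polls (arbitrary default)
    i=0
    while [ "$i" -lt "$attempts" ]; do
        status=$(trove show "$instance_id" | parse_status)
        if [ "$status" = "ACTIVE" ]; then
            return 0
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    echo "instance $instance_id not ACTIVE after $attempts attempts (last status: $status)" >&2
    return 1
}
```

For the slave created above, the call would be wait_for_active 9ffc7b3a-9205-412a-9cd2-521f95755c43. A fuller version would probably also stop early on a failure status rather than waiting for the attempts to run out.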

To disconnect a slave from a master, the user would do this:

$ trove detach_replication <slave instance>

Now that you know the basic mechanics of Trove’s replication feature, the next post will describe the implementation of the Client and the Task Manager in detail.

Photo by James Tworo // CC BY