Sometimes a small team needs a do-it-yourself solution rather than a massive implementation that involves specialists or a budget-draining, off-the-rack, one-size-fits-all product.
Cody Herriges, principal system operations engineer at Puppet, and his team needed a way to create and manage thousands of virtual machines for a research project.
These developer-minded automation enthusiasts wanted to enable a more effective work environment that reflected their own core values: rapid improvement, user value and collaboration.
“At the end of the day,” says Herriges, “[you need to] be proud of the systems you’re running.”
The team wanted a solution that was open source and platform agnostic, and that could be built, not bought.
They turned to OpenStack, an open-source cloud operating system that runs on standardized hardware. Originally founded by Rackspace and NASA, the project has a robust user community of thousands of people around the globe.
Herriges took time at the recent OpenStack Summit in Austin, Texas, to tell the story about his team’s journey to its own DIY implementation of OpenStack.
They named their OpenStack project SLICE, a “backronym” for Self-Service Low Impedance Compute Environment.
“We needed a safe, flexible environment that people could try and experiment with new things,” says Herriges. “It had to be fast, close, and with near-no latency.”
Herriges’ team had only one full-time person and two floaters. Ultimately, it took 20 months to produce the 4,000 lines of freshly written automation and 10 community modules, with 37 rebuilt, built, or back-ported.
Issues in DIY implementation
The team faced a common challenge when adopting a new software solution: a lack of knowledge about OpenStack itself.
“We had the tools to automate it and the ability to deploy the entire stack without knowing what was underneath,” says Herriges, “but that really goes against what we believe in.”
Instead, the team took the extra time needed to get up to speed on OpenStack without its built-in automation so that when issues occurred, the team could make intelligent fixes instead of relying on random band-aid approaches.
OpenStack does have some known issues, including an as-yet unfixed SSL bug, for which Herriges documented his team’s fix at herrig.es/os-ssl-issue. Other workarounds the team came across can be found there as well.
Finally, the team needed to make sure that OpenStack met their needs as a high-availability (HA) application. OpenStack Rally helps the team manage things at a much higher level than ever before, even as Puppet has become an even more integral part of their virtual infrastructure.
“It’s made us all very confident in the platform that we’ve delivered to people,” says Herriges.
Tips for teams implementing OpenStack
For teams considering following in Puppet’s virtual footsteps, Herriges has a number of tips.
First, he says, go for simple; the more complexity, the more points of failure will creep into the project. “You’ve got to remove complexity as much as possible,” he says.
Agnosticism is also key: solutions and techniques should be chosen for their ability to solve specific problems, not just because they fit a preconceived understanding of how things “should be” done. The Puppet team chose CentOS over Debian for its package-building system because it made things easier to integrate in the long run, not because of any preference for running enterprise Linux.
“The more dogmatic you are about the implementation details,” he says, “the more complication you’re going to add to the deployment.”
Learn from the ecosystem itself, says Herriges. The team had originally designed an architecture that met all their internal requirements as a highly available and load-balanced stack, but realized that such an approach would cause management headaches for whoever came in after Puppet moved on to another project.
The team decided to reuse assets from their own Puppet website code repository that worked as well as any unique code they could have created to do the same thing, resulting in less of a barrier to long-term management. Using publicly available modules, like MySQL, helps the team remain productive and allows them to rely on other developers in the ecosystem when trouble arrives.
Herriges also urges small teams looking to implement OpenStack to deploy only the modules that they actually need.
“Don’t deploy all these different services because you think they might be super cool in the future,” he says.
You can always add things later, says Herriges; anything that adds no value to your specific goals should stay out.
An OpenStack implementation team will also need a working knowledge of Python, if not an expert-level understanding, as well as confidence in the packaging system they choose.
“You do not want to be post-patching code after deployment,” says Herriges. Doing so makes version tracking and the upgrade process extremely difficult.
Teams will also need to be able to trace packets and work with packet dumps, and to be reasonably comfortable with SQL. These skills help with error cleanup and with finding issues when they happen.
“Not everything is available via the public API,” says Herriges.
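As a rough illustration of the kind of SQL comfort Herriges describes, the sketch below looks for records that appear stuck. It is a minimal example built on assumptions: the in-memory SQLite database, the simplified instances table, the sample rows, and the find_stuck helper are all hypothetical stand-ins, not the Puppet team’s tooling or a real OpenStack schema.

```python
import sqlite3

# In-memory stand-in for a service database. The table and columns are
# simplified assumptions for illustration, not a real OpenStack schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances "
    "(uuid TEXT, vm_state TEXT, task_state TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO instances VALUES (?, ?, ?, ?)",
    [
        ("a1", "active", None, "2016-04-25 10:00:00"),
        ("b2", "building", "spawning", "2016-04-20 09:00:00"),
        ("c3", "error", None, "2016-04-22 14:30:00"),
        ("d4", "active", "deleting", "2016-04-21 08:15:00"),
    ],
)


def find_stuck(conn, cutoff):
    """Return rows in an error state, or with a task pending since before cutoff."""
    query = """
        SELECT uuid, vm_state, task_state
        FROM instances
        WHERE vm_state = 'error'
           OR (task_state IS NOT NULL AND updated_at < ?)
        ORDER BY updated_at
    """
    return conn.execute(query, (cutoff,)).fetchall()


# Hypothetical cutoff: anything with a task pending since before this is suspect.
stuck = find_stuck(conn, "2016-04-23 00:00:00")
for row in stuck:
    print(row)
```

A read-only query like this is a reasonable first step; any cleanup that modifies rows directly deserves far more caution, since it bypasses whatever checks the API would normally apply.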
Herriges also recommends teams focus on an active/active HA configuration, which makes the OpenStack implementation much easier to maintain as well as less prone to complete failure. The other option, active/passive, will need more automation to be built as well as manual intervention, possibly in the middle of the night.
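The difference between the two HA modes can be sketched with a toy model. This is only an illustration under stated assumptions: the Cluster class, its method names, and the node names are hypothetical, and a real deployment would rely on load balancers and cluster managers rather than anything this simple.

```python
class Cluster:
    """Toy model of one service replicated across named nodes."""

    def __init__(self, nodes, mode):
        self.mode = mode                      # "active/active" or "active/passive"
        self.healthy = {n: True for n in nodes}
        self.primary = nodes[0]               # only meaningful in active/passive
        self.interventions = 0                # failover actions someone had to take

    def fail(self, node):
        """Mark a node as down."""
        self.healthy[node] = False

    def serving_nodes(self):
        """Nodes currently able to answer requests."""
        if self.mode == "active/active":
            # Every healthy node serves traffic, so one failure only shrinks the pool.
            return [n for n, ok in self.healthy.items() if ok]
        # Only the primary serves; if it is down, nothing serves until failover.
        return [self.primary] if self.healthy[self.primary] else []

    def promote_standby(self):
        """Failover step that active/passive needs (automated or by hand)."""
        for node, ok in self.healthy.items():
            if ok:
                self.primary = node
                self.interventions += 1
                return node
        raise RuntimeError("no healthy node left to promote")


aa = Cluster(["n1", "n2", "n3"], "active/active")
aa.fail("n1")
print(aa.serving_nodes())      # remaining nodes keep serving with no action taken

ap = Cluster(["n1", "n2"], "active/passive")
ap.fail("n1")
print(ap.serving_nodes())      # empty until someone or something promotes n2
ap.promote_standby()
print(ap.serving_nodes())
```

In the active/passive case, the promote_standby step is exactly the piece that has to be automated or performed by whoever is on call.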
Choose mature automation solutions as well. Newer might seem cooler at first, but with this type of implementation, you’re better off with an automation system that is robust and well documented. This is the one area in which Herriges recommends teams don’t try to roll their own solution.
“Don’t go out and grab the new fancy thing that demos well,” he cautions. “Find a tool that’s proven in the marketplace that’s driving real workloads.”
Finally, use early adopters to slowly roll out OpenStack implementations, based on a deep understanding of OpenStack itself.
Of course, Herriges recommends that teams consider the Puppet OpenStack system, as it comes with all of his team’s successes and solutions. It has a ridiculously robust command line interface (CLI) and is used commercially by Mirantis and Red Hat. It’s also used by large and active clients, including Time Warner Cable, Puppet itself, Hewlett Packard Enterprise (HPE), and many more.
You can catch his 33-minute talk on the Summit video page.