Seeding the future of Extended Maintenance in OpenStack

I started working on OpenStack over six years ago during the Grizzly release. When I started, my primary responsibility was running a continuous integration and RPM build system for a distribution product. I was mostly involved in the Quality Assurance (QA) program because we used Tempest.

Due to corporate software product policy, we supported several years of OpenStack releases, so I was also heavily involved in stable branch maintenance because we had an “upstream first” policy. Since then, I’ve been the first stable maintenance team Project Team Lead (PTL) and was also the Nova PTL for four releases. As long as I’ve worked on OpenStack, I have heard a need for longer support cycles for upstream releases.

Over the years, there have been several lengthy discussions about extending the upstream support for stable branches, particularly around the time we would end of life (EOL) a branch. The primary reason for requesting that upstream support last longer has to do with the difficulty of upgrading OpenStack. For the most part, upgrades are not trivial, and “war stories” are a favorite topic during events like the Summit. There are lots of projects in OpenStack with varying levels of maturity when it comes to ease of upgrade. All of those projects have lots of changes, which can also mean lots of release notes to digest, so it is easy to miss something, assuming the developers even wrote a release note or documentation.

Releases come every six months and a stable branch has historically had an 18-month lifespan, so operators are under increased pressure to keep up with the upstream releases and stay within some support window, all while trying to run and scale out a production cloud to support their business. Since upgrades can be difficult, people upgrade less often, which leads to falling behind, which leads to internal patch debt that in turn causes more upgrade pain later because of merge conflict resolution when synchronizing with the community code, and the vicious cycle repeats itself.

The historical stable branch policy was also unclear at times about what was deemed an appropriate fix based on the age of the branch. This is because the risk tolerance goes down as the branch age goes up. In other words, there is high risk in backporting changes to the oldest branch since it is next to be EOL: if a backported change causes a regression and the branch then goes EOL, there is no way to fix that regression upstream on that branch. This is why the oldest supported branch was restricted, for the most part, to only critical and security fixes.

Looking back, there have been a few milestones in the conversation around longer upstream release support. I led a session at the Newton summit in Austin which was an early attempt at extending the lifespan of the oldest supported branch upstream. At that time, there wasn’t enough critical mass to make the proposed changes. We were just starting to turn a corner on stabilizing the daily maintenance of the stable branches due to innovations like the upper-constraints system for requirements management, so the proposal was maybe too early.

Fast forward to a session at the Queens summit in Sydney about 18 months later and attitudes had changed dramatically. Between the time of the Newton session and the Queens session, there had been several discussions on the mailing list and at in-person events around Long-Term Support (LTS) branches modeled on how Ubuntu LTS works.

The LTS proposal in OpenStack is quite difficult because it requires changes to governance, infrastructure tooling, stable branch policy, release models and marketing. Several OpenStack vendor distributions also have their own types of LTS models, leading to questions about whether or not those needed to align. The compromise made at the Queens session about LTS was that upstream would turn over support for the oldest branch to a new team rather than EOL the branch, but with limited upstream support for testing. Unfortunately, the failure of the LTS movement always had to do with people, in that there were not enough people united to make it happen, or even a single champion to push the idea forward.

Another session was held at the Project Team Gathering (PTG) for the Rocky release in Dublin. A funny thing happened between the time of the Queens summit and the Rocky PTG. The oldest branch at the time, Newton, was scheduled to be EOL on October 25, 2017. However, the Nova team, of which I was PTL at the time, was working on a security fix which applied to Newton and we wanted to get that fix released before EOL. Without going into details, Newton was not EOL until February 1, 2018, over three months past the scheduled EOL date, but this was not due to an inability to land fixes on the stable/newton branch. This made it clear, at least to me, that upstream teams could support the oldest branch with minimal effort.

The morning after the PTG session in Dublin, I was waiting in the hotel lobby to meet some people for breakfast when a conversation started about planning the next LTS session at the next Summit in Vancouver. To put it lightly, I was not thrilled at the prospect of continuing to talk ad nauseam at every in-person event about this topic. Based on the experience with Newton EOL, I realized we had another option, which eventually became the Extended Maintenance resolution.

To summarize the Extended Maintenance (EM) resolution, it’s essentially (1) no more EOL until fixes cannot merge on the branch, (2) no releases from EM branches, (3) the same stable branch core teams, (4) the same testing as long as reasonably possible, and (5) the same standard for appropriate fixes. The stable branch maintenance guidelines have been updated to reflect the resolution.

The resolution benefits several groups of stakeholders:

Developers now have less risk in backporting fixes to older branches since the risk of introducing a regression on a soon-to-be-EOL branch is greatly reduced. They will have more time to fix issues once deployments finally upgrade. They will also get continued upgrade testing on older branches.

Vendors will get a common process and code repository for fixes. The obvious need for a common code repository for fixes can be seen in the invention of the driverfixes branch employed by the Cinder team, which is a branch that extends beyond normal EOL and serves as a place for block storage driver vendors to put their fixes.

Operators will get some breathing room to upgrade since there will be less pressure to always be on a supported upstream release. The Fast-Forward Upgrade (FFU) effort for operators will still be critical to easing the upgrade pain, though; EM will not be a panacea since there is no getting around the fact that some fixes simply cannot be backported to very old branches. Like vendors, operators will also benefit from a common process and code repository, so they do not have to carry (as many) internal patches. There should also be improved communication between operators and developers when operators do upgrade and find issues which are still a problem on the master branch. For me personally, I’m more motivated to prioritize working on a fix when I know someone just hit it and is willing to help debug and verify the fix.

Get involved

Here’s where to get started:

A Forum session is planned about EM at the Rocky Summit in Vancouver.
Check out this session from the Austin Summit: “Learn what it means to maintain stable branches upstream.”
If you have questions, ask them on the openstack-dev mailing list and/or #openstack-stable freenode IRC channel.
Propose and review backports to stable branches.
If you’re a manager, send your people to in-person events to (1) gain visibility and (2) be part of the discussion; if you don’t speak up and take part in the process, decisions will be made for you.

I hope to see you in Vancouver.

About the author
Matt Riedemann works for Huawei in the cloud unit. He’s been involved with OpenStack for over six years primarily doing upstream development. He was the Nova PTL for the Newton, Ocata, Pike and Queens releases and was the first stable branch maintenance team PTL. He continues to be active on the Nova and stable maintenance core teams. You can find him on freenode IRC as mriedem.