Ancestry's DevOps Strategy to Control Its CI/CD Pipeline

In its quest to develop and deploy software updates more smoothly, genealogy company Ancestry found that emulating startups was exactly the right fit for its operations.

The IT team needed to further evolve Ancestry’s approach to continuous integration/continuous delivery of software for its website, where millions of DNA test kits and billions of records are processed and cross-referenced.

Kenneth Angell, software architect with Ancestry, spoke to InformationWeek about the DevOps strategy his company implemented through the Harness software delivery platform. He says using Harness also helped solve governance matters with the many different stakeholders within Ancestry, from operations to information security and quality assurance, to make deployment consistent.

What approach had your team taken to software development in the past and how did you improve the processes?

We had this DevOps culture of, “You own the code, so you own everything about deploying the code.” It was very much kind of like a startup mentality in terms of how we dealt with teams and DevOps. We had a large, centralized team that handled operations before that. As part of our technological transformation, we went from this large centralized operations team, where you throw your code over the wall and let them deploy it, to “You own your deploys.”

In that process, we ended up basically not giving teams a whole lot of direction. Jenkins is a good solution -- stand up your own Jenkins server and start deploying stuff. Operational support was minimal. We’ll get you the rules that you’ll need but the process is up to you.

Teams started to share best practices; some teams would adopt other teams’ best practices, but in that kind of ecosystem there are a lot of divergent paths you can take in how you deploy your code. That’s exactly what happened to us. We had a very fragmented ecosystem of processes. We started to have a lot of issues with that, which in turn led us to start to create policies, but the policies weren’t very enforceable because we didn’t have any insight into how they were being applied in each team’s ecosystem.

What is the scope and pace of development that you are trying to achieve?

The number of teams that we have has continued to grow. I think we’re in the neighborhood of 70 to 80 teams that are deploying code. We’ve got teams all across the world now. We’re dealing with probably around 200 to 300 deployments a day. That to me is 200 to 300 opportunities for failure, for a customer problem to pop up. At that scale, the probabilities increase dramatically. If there’s only a 5% chance of failure on any given deployment but you’re doing several hundred of them a day, the chances of at least one failure approach 100%.
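The arithmetic behind that remark can be sketched in a few lines of Python. This is an illustration only, not a calculation from Ancestry: it assumes every deployment fails independently with the same fixed probability, and the function name `prob_at_least_one_failure` is hypothetical. The 5% rate is the figure quoted above.

```python
def prob_at_least_one_failure(p: float, n: int) -> float:
    """Chance that at least one of n deployments fails.

    Assumption (not from the interview): failures are independent and
    every deployment has the same failure probability p.
    """
    # All n deployments succeed with probability (1 - p) ** n,
    # so at least one fails with the complement of that.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # 5% is the per-deployment figure quoted in the interview;
    # 200 and 300 match the quoted daily deployment volume.
    for n in (1, 10, 50, 200, 300):
        chance = prob_at_least_one_failure(0.05, n)
        print(f"{n:>3} deployments/day -> {chance:.4%}")
```

Under those assumptions, the chance of at least one failure already passes 40% at 10 deployments a day and is above 99.99% at 200, which is why a per-deployment rate that sounds small becomes a near-certainty at this volume.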

We weren’t seeing quite that level of problem, but we were definitely starting to see a lot of problems pop up during deployments. We were tracking how many of these deploy-caused outages we had across the site. For any given quarter, we were looking at several hundred minutes of downtime for different parts of the site.

Once we actually converted all of our stacks to a standardized deployment process, we went from several hundred minutes of code deploy-related outages to the first quarter after we centralized all of it having zero minutes of deploy-related outages. That was a huge win for us.

What goes on under the hood at Ancestry’s website?

Our website is probably more complex than most websites out there. We deal with family history, search, user content, and communication features. We’ve got teams working on all these different aspects of the customer experience. In order to deliver features at that scale, we really need the teams to be able to move independently and be able to deliver on all these different areas of our website for the millions of customers we have visiting our website every single day.

We’ve created a culture where teams are accountable for their responsibilities. To enable them to deliver on those customer experiences, we’ve made it so that they can deploy their code independently. We have different parts of the website updating throughout the day, depending on the release cycle that particular team is in.

Every team has the button they can push themselves -- they can deploy independently. That really speeds teams up in terms of being able to deliver on their timeframes rather than trying to coordinate rollouts.

We used to do that. Back 10 years ago, we used to try to coordinate a rollout. Everybody would get on a call, then watch to make sure everything looked good. Teams didn’t like that much because we did it around midnight; it was a very cumbersome process. We’ve come a long way since then. Having systems that are independently deployable really makes a lot of sense when you’re trying to deliver features to the customer quickly.

Were there options or services that might make sense in the startup space that had to be changed to make it the scale your team deals with?

One of the huge benefits of Harness is the ability to scale DevOps. For every hour of effort that my team puts into developing a DevOps-related feature, whether it’s code quality checks or post-deployment automated verification of a service or CDN asset deployment, I get a 50 to 300 times return on hours the teams don’t need to put in to get the value of that feature. That’s helped us scale tremendously because now I have a laundry list of features, and I can just decide which ones are going to give us the most value in terms of DevOps. The teams don’t need to put in any effort to adopt those features because I can roll those out with Harness. Everybody gets the benefit of those features all at once.

Were there any other lessons learned along the way?

We started with this naïve approach that we were going to create a simple pipeline that everybody was going to adopt. When we started the actual adoption effort, the scale of differences between teams was so much more than what we expected. We’re still dealing with that in some regards because we focused on migrating the apps that were the most consistent first. We’ve got a bit of a long tail that we’re working on -- those are more in our data science areas where there was more autonomy. The more autonomy a team had, the more varied their processes were. That was a huge eye-opener for us. At this scale, the longer you take to rein in those differences, the more there will be.
