Building an Automated Database Deployment Pipeline

Data is at the core of most businesses. At the core of data storage and retrieval, you'll often find some type of database. While a lot of work has been done in recent years supporting automation in application deployments, deploying databases is largely a manual affair. In today's environment, stepping out of the old-school approach of opening a series of scripts, reviewing them, and running them by hand, has become necessary to begin to move database deployment to a more modern footing.

Because of the centrality of data and databases to modern business, databases themselves lie at the heart of Application Lifecycle Management (ALM). ALM is broken into three basic processes:

Governance
Development
Operations

Within each of these basic processes, you're going to find aspects of data and databases. From the requirements, auditing, and security of governance, to the design, building, and testing of development, on through to deployment and monitoring within operations, databases play a major factor. Yet, data and databases are historically ignored or given a low priority within the processes of ALM and DevOps. This situation needs to change and the primary point of the changes is within the database professional part of the team.

Historical Friction

It's a long running joke that database administrators (DBAs) and other data professionals don't get along with developers. And the reverse side of the coin is true as well, developers have a long history of friction with the operations side of the business. This friction between the two factions can even lead to outright conflict at times. The core problems are caused by the natural inclinations of each of the disciplines that puts them at odds with the other.

Developers and Speed

The focus of development and developers is to get as much functional code out the door as possible, and as fast as possible. They want to go fast. Most of the various mechanisms and methodologies built in support of development over the last 10 to 15 years reflect this speed focus. Take the iterative processes such as Agile or Scrum—they are designed to smooth the development process so that well-tested, high-quality code, flows from the development team as fast as possible.

This focus on speed makes complete sense. Development teams, working closely with the business, are responsible for quickly changing direction in support of the business in order to deliver what's needed now, not what was needed last week. Because of this, anything that slows down development becomes a problem.

Operations and Protection

At the same time, operations personnel have traditionally been focused on the production servers. More specifically, they're focused on protection of those production servers. This is also very much in keeping with the needs of the business and the requirements placed on operations. Businesses expect to be able to get their data any time they need it. Further, they expect it to be accurate. Because of these business needs, operations and DBAs have built mechanisms around data management, monitoring, and data integrity, as well as high availability and a whole slew of other technologies. But, the focus here is absolutely not on speed. The opposite is true. If anything, the focus is on the elimination of speed, or indeed, any movement at all. The most stable system is the one with the least number of changes.

These two views of what is needed by the business, both correct in isolation, are what creates friction between developers and operations. Databases cause even more issues because of their nature.

Databases as a Bottleneck

Even if the DBA and operations team were inclined to move at the speed of development, databases tend to act as a bottleneck—a choke point—to slow down development. This bottleneck comes from several sources, all of which are difficult to overcome. First, the languages used within database environments, are not the same as the languages that most developers use to put their code together. Learning and properly applying T-SQL, MDX, or X-Query—even if you were to use an Object-Relational Mapping (ORM) tool—causes some slow down on development and deployment. ORM tools in particular suffer from 1.1 releases because they're oriented to work like code works, dropping the things no longer in use in order to replace them.

Databases second bottleneck is the fact that the data can't be dropped, which also slows down deployment. It would be a very simple matter to make databases deploy exactly like code, dropping the existing database and replacing it with a new one. However, that means the loss of all the data. Businesses, especially modern businesses, become very attached to their data. In fact, for most organizations, the data far outlives the application that initially collected it.

Finally, because of the nature of the job, getting called by servers at 3 a.m. when there's a problem, DBAs and other operations people can be quite paranoid about the systems under their care. This, again, is not a bad response to the responsibilities DBAs have, but it does tend to act as a brake on the speed of development and deployment.

The natural frictions between the need for speed within development and the requirements of protection within operations can cause things to slow down. When you take into account the unique problems of databases, especially the need for data retention, the process of deployment looks like an incredibly difficult task, even before you start discussing automation. Yet, we need to define mechanisms that will enable us to deploy software faster while still retaining the protection necessary for our production systems. That's where continuous delivery comes into play.

Continuous Delivery

There are three terms used a lot within the industry that need to be clearly defined in support of automating a database deployment pipeline:

Continuous Integration (CI)
Continuous Delivery (CD)
Continuous Deployment

Figure 1, below, illustrates how these terms relate to your environments and your deployment pipeline. Continuous Integration (CI) is a part of the development process, leading into testing, and is not meant as a mechanism for deployments beyond development. Instead, CI is meant as a mechanism to quickly test and validate code as it gets developed. This helps to ensure the code that gets released is of higher quality. But, let me repeat, CI is not meant as a mechanism for deployment to production.

Next comes Continuous Delivery (CD). This is a mechanism of constantly building your code in an automated fashion in preparation for deployment to production, but not necessarily to bring it there. But again, CD is not meant to go through the production environment. There should be an approval gate where you can ensure that what is getting deployed is accurate. This gate should not be the old-school approach of a line-by-line review of code, but could be.

Finally, Continuous Deployment automates deployments through the production environment. This may be a useful method for some systems, but it carries with it an inherent risk to production, so I'm going to stop with CD within this article.

A quick primer on database continuous deliver — Figure 1: Automating a deployment pipeline with continuous delivery

The goal of continuous delivery is to get faster feedback on changes made in order to create a repeatable process that can successfully deploy to production as needed. To do this, you need to continuously integrate your team changes. You'll have to find a way to automate as much testing as possible, and you're going to rehearse production deployments in some type of testing environment before deploying to production.

To arrive at this point, you'll need to go through four stages. First, you'll need to get your database into source control alongside your application. There are a number of tools available today that make this very simple. Second, once you're in source control, you can begin using a continuous integration server to automate your change testing. Again, there are a number of tools that make this very simple. Third, once you have CI running, you'll need to start building your test library. This is probably the hardest part of the process, as well as the most time consuming. Approach it as you would eating an elephant, one bite at a time. Fourth, you can start building automated deployments.

The point is, you need to build out a mechanism that lets you practice deployments over and over before you ever get anywhere near your production environment. In this way, you can be sure that the script you're going to run on production is good. Your approval gate then becomes as simple as confirming that a script passed all the tests you've run against it.

Changes to the Team

In order to support these approaches, a number of changes have to be made to the teams in your organization. You're going to have to take a DevOps approach of mixing the job responsibilities more. You need to empower your development team by building an environment that allows them to work at speed within development. You'll need to take part in the development process. Instead of trying to stop bad deployments, stop bad development. You'll need to automate all your processes.

This means that DBAs must work with developers. It's acknowledged that you must protect the data for the business. But, that has to be tempered with helping delivery functionality from the development team.

Conversely, the developers have to work with the DBAs. It's understood that the development team may have more knowledge of business requirements, but rather than isolate the DBA team from that information, educate them so that they can better support you.

Finally, and most importantly, you must have management buy-in to this process. Without that, the process will fail. Management needs to agree that you're going to empower the teams to change what needs changing in order to arrive at a repeatable, automatable process. The biggest impacts to this will be around project management and design philosophies. Deployment has to become a part of development. It must be taken into account from the start. This means that releases of the code past the 1.0 release must be accounted for. Building software that supports release 1.1 and release 2.0, as processes, is not premature optimization. And finally, data retention has to be taken into account as part of development and all your automation processes. At no point can you just say "Well, that's someone else's problem."

Conclusion

The entire process of building an automated database deployment pipeline has to be focused around eliminating or mitigating the natural friction that occurs between development and operations. You'll have to make changes to the philosophy of development and deployment and operations within the organization. You'll have to adopt new practices and implement a continuous integration approach, along with source control. But these changes to the workplace can result in a higher quality product being released into production faster while still retaining necessary protection of the data there.

***************************************

Grant Fritchey is a Microsoft SQL Server MVP , a 2014 Dunn & Bradstreet MVP, and was recently elected a member of the PASS Board of Directors for 2015. He has 20-plus years of experience in IT spent in technical support, development, and database administration. Fritchey works as a consultant and for Red Gate Software as a Product Evangelist.

Fritchey has written articles for SQL Server Central and Simple-Talk, and is author or co-author of several books, including SQL Server Execution Plans, SQL Server 2012 Query Performance Tuning, SQL Server MVP Deep Dives 2, Beginning SQL Server 2012 Administration, and Pro SQL Server 2012 Practices. He is one of the founding officers of the Southern New England SQL Server Users Group and presents at conferences and user groups around the world.

Comments

Plain text