A large database-driven software-development project is a complex entity with many developers working simultaneously. Within it, you'd probably find a development environment, a testing environment, and a production environment. But you need a way to keep track of which version of a software component is the most up-to-date: Which version exists in development, which in testing, and which in production?
In large projects, managing the development process—keeping track of what has been tested, what has been promoted to production, and what needs rework—can be a challenge. This is especially true when the database is part of a multi-layered application. Changes made in the various layers need to be synchronized with each other to avoid breaking the application. For example, a change in a Visual Basic (VB) component might require a corresponding change to a database object. Version control, though rarely discussed, is an important subject. To see how to develop your own version-control process, let's look at a system that we used to maintain version control of database objects and synchronize changes to those objects with changes to components in other application layers.
Our project called for us to develop a database-driven Web site that used the latest (pre-.NET) Microsoft technologies: Active Server Pages (ASP), Visual Basic (VB), Microsoft Site Server, Microsoft Transaction Server (MTS), SQL Server 7.0, and Windows 2000. (SQL Server 2000 was still in the early stages of implementation within our enterprise.) We were already using Microsoft Visual SourceSafe (VSS) 6.0 for version control of VB and ASP code, so we decided to try it for managing database versions as well. VSS lets you keep track of a component's various development stages and lets you go back to any previous version you need to reference.
The way you design a version-management process depends on the specific project requirements. In our case, the requirements were to incorporate the newest technologies and gain the ability to
- regenerate from scripts the entire database, without production data, exactly as it was at any given point in the development life cycle
- apply or roll back individual changes regardless of the number and type of objects involved in the change (i.e., treat each change like an all-or-nothing transaction)
Any version-management effort starts with defining one version as the base version that you'll track all future changes against. As the base version, we chose the testing environment database at a specific point in time and gave it a version number of 1.00. The testing environment was under tighter control than the development environment (with restricted access and reporting by email of changes made), so selecting that as the base made sense. The number before the decimal gave the major version number. The number after the decimal gave the release number—referring to small changes to a major version that didn't require a change in version number. Build numbers were version numbers that we submitted for user-acceptance testing—we didn't submit all intermediate versions for this testing. The major version number changed (e.g., from 1.00 to 2.00) when either the total number of minor changes to the current major version exceeded a predetermined number, such as 10, or a major change happened, such as an addition in functionality or a significant change in the application's business logic. A minor change might be having a stored procedure return an additional column; a major change would be adding a whole set of stored procedures to do something the application didn't do before.
Before we look at the details of the database change-management process, let's examine the "ingredients" of a database that's ready to start receiving production data. In a production-ready database, you can have a database definition (the CREATE DATABASE statement); logins, users, roles, and permissions; tables, defaults, rules, user-defined data types (UDTs), triggers, and constraints; views; stored procedures; and preloaded data (for example, in certain lookup tables).
To track these items, we created some project folders in VSS, then created corresponding work folders by the same names in the test server's file system to classify and track the changes. VSS essentially has an internal database that a VSS administrator sets up. After the administrator gives permissions to other users for this database, the authorized users can create new folders for their requirements. The developer checks the required files into VSS from the working folder the first time. Whenever developers need to work on a checked-in file, they can check it out of VSS into the working folder. After working on the file, they must check it back into the appropriate project folder from the working folder. Here are the folders that we used.
DatabaseCreation. This project folder contained the creation script for a database in a SQL Server installation.
DatabaseSchema. This project folder contained the script to generate all the database's tables, defaults, rules, UDTs, logins, database users and roles, object-level permissions, triggers, and constraints. To generate this script, we used Enterprise Manager's scripting facility, selecting the options that Figure 1 shows.
We decided to use separate table, view, and stored-procedure scripts primarily because of the nature of objects involved and also based on the relative change frequency of each type of object. Note in Figure 2 that to keep production tables from being dropped, we didn't generate DROP <object> commands in the Formatting tab. Because we would be transferring changes into production, we had to be careful that the table-generation script didn't have the DROP TABLE option. Stored procedure changes were the most frequent, so we made them a separate category—both individually and as a group. When we had more than 150 stored procedures, dropping and recreating all stored procedures when just one of them had changed was a big pain, so we chose to create one file per stored procedure. However, when we were setting up a new environment, choosing to put all stored procedures in one file (as Figure 3 shows) made sense. We could run just that file to create all the stored procedures.
DatabaseSchemaChanges. This project folder contained scripts of changes made to objects in the base (1.00) version of the database schema. This folder contained ALTER TABLE commands for existing tables and CREATE TABLE commands for new tables. The ALTER TABLE statement was required because changes in schema would be promoted to the production environment, where live data exists, whenever the table structure changed due to client requests. We couldn't afford to drop and recreate production tables.
The DatabaseSchemaChanges folder also contained CREATE <object> commands for all the other types of objects we created in the base schema (e.g., defaults, indexes), with DROP <object> commands preceding the CREATE statement. These objects existed only in the metadata and any change to them was a complete replacement, not an alteration. After changes were captured as a script in this folder, recreation of the entire schema required running not only the scripts in the DatabaseCreation and DatabaseSchema folders but also the script in the DatabaseSchemaChanges folder.
DatabaseAllViews. This project folder contained one script for creating all the views in the database. To generate this script, we used Enterprise Manager's scripting facility, selecting the options as Figures 4, 5, and 6 show. Note that in the Formatting tab, you drop, then recreate views and stored procedures. You can't drop tables because they might contain live data.
DatabaseIndividualViews. This project folder contained one script file per view. This set of scripts was useful in transferring incremental changes. We created the script by selecting the same options on the General and Formatting tabs as in Figures 4 and 5 and selecting Create one file per object on the Options tab.
DatabaseAllStoredProcs. This project folder contained one script for all the stored procedures in the database. To generate this script, we again used Enterprise Manager's scripting ability. This script is useful either when you have to set up an environment from scratch or when you have a large number of changes to transfer from one environment to another. Being able to drop all stored procedures and recreate them by running just one script—rather than running a script for each stored procedure that was modified or added—is convenient.
DatabaseIndividualStoredProcs. This project folder contained one script file per stored procedure. Having one script per stored procedure is useful in transferring incremental changes from one environment to another. Handling stored procedures is much like handling views, except that instead of selecting All Views in the General tab of the Generate SQL Scripts dialog box, we selected All Stored Procedures.
DatabaseDataLoadingScripts. This project folder contained all the scripts, in the form of INSERT statements, for loading initial values into the database's lookup tables. You can generate these statements from an existing table by using database life-cycle-management tools such as Embarcadero Technologies' DBArtisan. You can maintain lookup values in an Excel spreadsheet and use Data Transformation Services (DTS) to transfer them to the database.
Besides these eight project folders, we created three others. The DatabaseDocumentation project folder contained the versions of all documentation pertaining to the database. In the DatabaseBackup project folder, we maintained a snapshot of the database, in the form of a compressed database backup file, for each milestone in the development life cycle. Finally, we moved all the scripts and files that became obsolete during the course of the project to the DatabaseObsolete project folder.
This file structure let us track the versions of every component of the database that we were interested in, along with the reasons for each change (which were annotated in the VSS file properties) for each version. The reason for a version change could be to correct a defect, add functionality, or fine-tune the system. We recorded the dates of these changes along with who made them in a separate database we discuss later.
Figure 7 shows in the VSS Explorer the project folders we created. In VSS terminology, each container of versioned objects is a project—each project can contain sub-projects in a tree-like structure. This naming system was somewhat annoying at first because we wanted to call our overall effort "the project" and the containers for different types of objects "folders" belonging to the project. However, after we got past this peculiar naming convention, VSS was straightforward and consistent to use.
The Tracking Database
To complete the change-tracking system, we needed a tracking database that would help us keep track of where these changes had been applied, by whom, and when—and where they hadn't. VSS can do this for an individual piece of code but can't keep track of, say, all changes made to a given server. We built a small Microsoft Access tracking database containing one table, named DatabaseChanges. Table 1 shows the columns we used to track the changes.
Because we were dealing with only three servers, we denormalized into three columns—\[DateAppliedToServer1\], \[DateAppliedToServer2\], and \[DateAppliedToServer3\]—the set of dates that indicated when a change was applied to each server. If you have more environments than the basic three (say, separate environments for performance testing, staging, or training) or if they're more complex than we've discussed, we recommend applying the database-design practices that Michelle Poolet describes in Solutions by Design, "Real-Life Database Design," April 2001, InstantDoc ID 19948. This article describes how to correctly model even simple databases such as our Access database.
How It Worked
Once we had the version-control system set up, we put it to work. Whenever the client requested a change, a developer would make the necessary modifications and pass the script (as generated by Enterprise Manager's scripting facility, wherever applicable) by email to the person serving as the version controller. The version controller, after reviewing the revised script for compliance with naming and coding conventions, would check it into VSS along with the necessary version number and comments. Then, the version controller would make an entry in the Access table. When the change had to be applied to more than one server, the version controller entered the change date into the appropriate \[DateAppliedToServer\] column for that server and change. Generally, the version controller first applied the changes to the test environment; only after an OK from the quality-assurance team were the changes shifted to the production environment. This process enabled us to track, among other things:
- All changes to a given object
- All changes by a particular person
- All changes made between two points in time
- Any changes applied to Server1 that hadn't been applied (either intentionally or accidentally) to Server2 or Server3
- All changes that had been incorporated into a build
- Whether the appropriate people had acted on a particular change request and whether the correction had been made to the database
The database-layer version controller had to coordinate with his or her counterpart for the other application layers (e.g., presentation layer, business logic layer) before shifting a database change into a test or production system. This coordination ensured that all related components of an application-level change request were included in the build.
This system gave us the flexibility of backtracking to any point in time or recreating the database from scratch. For example, to recreate the database, we could follow six easy steps:
- Run the script in the DatabaseCreation folder
- Run the latest version of the DatabaseSchema script
- Run the latest version of the DatabaseSchemaChanges script
- Run the latest version of the DatabaseAllViews script
- Run the latest version of the DatabaseAllStoredProcs script
- Run the latest version of the DatabaseDataLoadingScripts script
We also occasionally used the database backup, which we created through SQL Server's database backup and restore facility, to restore the database to an earlier point in time and then apply all changes from that point forward. This was basically a shortcut whenever there were too many changes to apply individually, because it restored all tables, views, and stored procedures simultaneously rather than separately as with scripts. However, because database-backup schedules rarely coincided nicely with database-development change schedules, this method proved unreliable for version management.
This version-management system evolved to meet our various needs. You can handle this part of a database-development process in many ways; you just have to work out one that best suits your situation. You might improve on our method by integrating version control across all application layers (i.e., having a coordinated version-management system); this also brings out the interrelationships among components across application layers. Alternatively, you could use tools such as Embarcadero Technologies' DBArtisan Change Manager or Quest Software's Schema Manager for finding and tracking differences between two versions of the same database or two databases at different stages of the development life cycle and synchronizing the two environments.
Corrections to this Article:
- (Correction: Figure 7 is incorrectly referenced as Figure 8 in the text and caption.)