During the past 6 months, I've guided you through the process of modeling a database. I've shown you how to identify and gather database requirements and how to create data flow, entity, and logical models of your database, as well as the physical design. This month, I show you how to implement the physical design to achieve a working database.
The physical implementation of a database is different from the physical model I showed you in "The Physical Design," August 2000. The physical model describes the database in a specific working environment that includes a specific database product, a specific hardware and network configuration, and a specific level of data update and retrieval activity. The physical implementation makes this specification real. The implemented database contains objects (e.g., tables, views, indexes) that correspond to the objects in your physical model.
Success in Seven Steps
I generally organize the physical implementation into the following seven steps, which you can use as a checklist to make sure you've performed all the tasks necessary to implement the physical design.
Step 1. Select a server. The Data Use Analysis and Data Volume Analysis models, which I explained last month, define the guidelines for choosing a server with adequate CPU power and enough hard disk capacity to see you through the first few years of operation. The Data Use Analysis model lets you visually map the important processes that run on a database. You then calculate average and maximum read and write operations. From this analysis, you can see the kind of processing power you need, then translate that into the CPU model and the RAM you need for prime performance. The Data Volume Analysis model, which you can view at /Files/09/9006/Figure_02.gif, is a modified entity relationship diagram (ERD) that helps you visualize and calculate the amount of hard disk storage that the database needs.
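The volume calculation itself is simple arithmetic: rows times average row size, plus an allowance for indexes and growth. Here's a sketch of that math in Transact-SQL (the table's row count, row size, and the 50 percent allowance are illustrative assumptions, not figures from the models in this series):

```sql
-- Hypothetical sizing estimate for one table: rows x average row bytes,
-- converted to megabytes, plus a rough allowance for indexes and growth.
SELECT (500000 * 120) / 1048576.0       AS DataMB,    -- ~57MB of raw data
       (500000 * 120) / 1048576.0 * 1.5 AS PlannedMB  -- +50% for indexes/growth
```

Summing this estimate across every entity in the Data Volume Analysis model gives you the hard disk figure you carry into Step 2.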
Step 2. Create a database. You can use the Data Volume Analysis model again to guide you in sizing the user data and transaction log file. This model gives you a rough idea of space requirements, which you can translate into initial database file sizes.
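In Transact-SQL, the sizing from the Data Volume Analysis model translates directly into explicit file sizes in the CREATE DATABASE statement. A minimal sketch, in which the file paths, sizes, and growth increments are illustrative assumptions:

```sql
-- Create a database with separate data and log files, sized from the
-- Data Volume Analysis estimates (all values here are illustrative).
CREATE DATABASE Publications
ON PRIMARY
  (NAME = Publications_Data,
   FILENAME = 'C:\MSSQL7\DATA\Publications_Data.mdf',
   SIZE = 100MB,
   FILEGROWTH = 10MB)
LOG ON
  (NAME = Publications_Log,
   FILENAME = 'C:\MSSQL7\DATA\Publications_Log.ldf',
   SIZE = 25MB,
   FILEGROWTH = 5MB)
```

Setting realistic initial sizes up front avoids the performance hit of repeated automatic file growth during the data load in Step 4.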
Step 3. Create the database objects. You have two options for creating the tables, indexes, views, constraints, stored procedures, and triggers that make up an operational database. First, you can work from the physical data model yourself, either by writing SQL scripts that you later execute or by creating the objects directly through Enterprise Manager's graphical and programming interfaces. Second, if you've used CASE software such as Visio 2000 to help with the modeling, you can let the CASE software generate the scripts for you. Figure 1 shows the opening screen of Visio 2000's Generate Wizard, which lets you generate a text file of the Data Definition Language (DDL) script, generate a new database, or store the current database image in the model. If you plan to implement your database on a platform such as Microsoft Access, your CASE software might be able to directly create these objects and the entire Access database for you.
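Whether you write it by hand or let the CASE tool generate it, the DDL script follows the same pattern. A minimal sketch (the table, column, and object names are assumed for illustration, not taken from the Publications model):

```sql
-- A typical fragment of a DDL script: table, constraint, index, and view.
CREATE TABLE Publisher
  (PublisherID   int         NOT NULL,
   PublisherName varchar(50) NOT NULL,
   City          varchar(30) NULL,
   CONSTRAINT PK_Publisher PRIMARY KEY (PublisherID))
GO
CREATE INDEX IX_Publisher_Name ON Publisher (PublisherName)
GO
CREATE VIEW PublisherCity AS
  SELECT PublisherName, City FROM Publisher
GO
```

Keeping the generated script under version control also gives you a repeatable way to rebuild the database on a test server.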
Step 4. Load the data. How should you approach loading data into the database? The answer depends on where the data is coming from (the source) and how much data you need to load. If you don't have any data to load when you first create the database (a highly unusual situation), you need to concentrate only on the data-capture schemes you plan to implement, such as data entry forms or automated capture programs like those used in monitored environments such as manufacturing sites and hospital intensive care units. Most likely, you'll have to import data from comma-delimited flat files or transfer data from other systems into your database. If you plan to import delimited files, the bulk copy program (bcp) might be your best option. Bcp creates minimal overhead and can quickly load data because it minimizes transaction logging and, by default, bypasses constraint checking and trigger execution. But if you need to import or transform (reorganize, restructure) data from other database or nondatabase systems, you should use SQL Server's Data Transformation Services (DTS). DTS also lets you load data quickly—as fast as bcp—if no transformation is involved. In addition, DTS lets you save your work and use the same DTS framework later for more complex data loading and transformation. Of course, you can also use a third-party product for data manipulation and loading. The best advice for data loading is to use the software tool that first, you're most familiar with; second, loads data the fastest; and third, optimizes available resources (e.g., CPU, memory, bandwidth, hard disk).
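For a comma-delimited flat file, a typical load looks like the following sketch. The server name, login, table, and file paths are placeholders, and the bcp invocation runs from the command line, not from a query window:

```sql
-- Command-line bcp load of a comma-delimited file (placeholders throughout):
--   bcp Publications..Publisher in publisher.csv -c -t, -SMYSERVER -Usa -P
-- (-c = character mode, -t, = comma field terminator, -S/-U/-P = server and login)
--
-- The equivalent server-side load in Transact-SQL is BULK INSERT:
BULK INSERT Publications..Publisher
FROM 'C:\load\publisher.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
```

Either way, load the data before you create nonclustered indexes when you can; building indexes after the load is usually faster than maintaining them row by row during it.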
Step 5. Create and run security scripts. Creating security scripts is, unfortunately, a task that you have to perform manually. You can use the security matrix from "The Security Matrix," March 2000, as a guide to building your SQL Server security scripts. You can also set up security through Enterprise Manager, then have Enterprise Manager generate the scripts.
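Translated into a script, one row of the security matrix becomes just a few statements. A minimal sketch, in which the role, table, and user names are illustrative assumptions:

```sql
-- Create a role, grant it the permissions the security matrix calls for,
-- and add a user to the role (all names here are illustrative).
EXEC sp_addrole 'DataEntry'
GO
GRANT SELECT, INSERT, UPDATE ON Publisher TO DataEntry
GO
EXEC sp_addrolemember 'DataEntry', 'JaneDoe'
GO
```

Granting permissions to roles rather than to individual users keeps the script short and makes the security matrix easy to maintain as people come and go.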
Step 6. Create backup and other utility tasks. Now that you've created your database and loaded some data, you need to implement your disaster-avoidance plan. Last month, I talked about planning backup and recovery schemes and evaluating RAID and failover techniques. This month, you bring those plans to life. You can use the SQL Server 7.0 Database Maintenance Plan Wizard to help set up scheduled backups for all user and system databases. You can also use the Maintenance Plan wizard to help set up tasks to reorganize data and index pages, update statistics, recover unused space from database files, check for database integrity, and generate reports about utility job activity.
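If you prefer scripts to the wizard, the same maintenance tasks come down to a few statements that you can schedule as SQL Server Agent jobs. A sketch, with illustrative backup file paths:

```sql
-- Full database backup, transaction log backup, and an integrity check,
-- suitable for scheduling as SQL Server Agent jobs (paths are illustrative).
BACKUP DATABASE Publications
  TO DISK = 'C:\MSSQL7\BACKUP\Publications_Full.bak'
GO
BACKUP LOG Publications
  TO DISK = 'C:\MSSQL7\BACKUP\Publications_Log.bak'
GO
DBCC CHECKDB ('Publications')  -- the integrity check the wizard also schedules
GO
```

Be sure to test a restore from these backups on another server; an unverified backup is only half a disaster-avoidance plan.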
Step 7. Perform miscellaneous tasks. If you aren't implementing replication, you can take the evening off. However, if you're going to use replication, you need to decide which type of replication to implement: snapshot, transactional, or merge. (For more information, see Ted Daley and Bob Pfeiff, "Merge Zone," November 1999.) Snapshot replication copies data to the target database on a regular interval that you set up. Transactional replication copies data as it changes on the source database. And merge replication lets target databases update local data, then roll up changed data to the source database. Which is best for your particular environment? Snapshot replication is best when you have relatively static data sets that need to be refreshed only occasionally, or for road warriors who connect only periodically to the central database with their laptops and download copies of data files. Transactional replication is best when you need near-real-time changes to the target database and you can set up a persistent connection between the source and the target database. Merge replication is best when you have, say, a roving sales force that changes data from the road, or when you've horizontally partitioned data, perhaps by geography: East Coast customers, for example, reside on the New York replicate, West Coast customers reside on the San Diego replicate, and the master customer list resides at company headquarters in Denver.
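If you script replication rather than configuring it through Enterprise Manager, the type you've chosen shows up in the publication definition. A heavily hedged sketch: the publication name is made up, and a working setup also requires distributor configuration, article definitions, and subscriptions, none of which is shown here:

```sql
-- Define a snapshot publication on the Publications database
-- (publication name is illustrative; distributor setup, articles,
-- and subscriptions are omitted from this sketch).
EXEC sp_addpublication
     @publication = 'PublisherList',
     @repl_freq   = 'snapshot',
     @status      = 'active'
```

Swapping @repl_freq to 'continuous' is the script-level expression of choosing transactional over snapshot delivery.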
From Beginning to End
Congratulations! You've survived the past 6 months of transforming an idea for a database into reality. Just remember that no two database projects are the same. Each has its own challenges and opportunities, which you have to address individually. To recreate the database I used in this data-modeling series, go to http://www.sqlmag.com, enter InstantDoc ID 9146, and download the Visio-generated code I used to build the sample Publications database. You can also download a fully functioning sample physical model.