When Do You Use Web Services?
Let's say that your company uses a supply-chain application that stores your customers' orders in a SQL Server database and keeps track of each order's status. Currently, when customers want to know which of their orders are pending, they contact your customer-service representative, who queries the database for that information. Customers then update their ordering systems. But suppose a customer wants to streamline the process by using an application to request order status directly from your system. To enable this type of access to your system, you and the customer need to agree on the interface the customer will use to make the request and the format in which you will return the requested data.
Related: 38 Extra Development Tips
This scenario is an ideal application for Web services because you can use SOAP to build a single standards-based interface that works for many different customers with varying needs, regardless of the software applications and computing platform their enterprises use. Additionally, SOAP lets you build a loosely coupled interface that incorporates XML as the data format. (A loosely coupled application lets you reconfigure, redeploy, or relocate the implementation without affecting dependent applications.) By using XML, you gain extensibility that lets you expand the scope of the data you can provide to your customers in the future. Simply put, supplying a Web service lets you leverage the full value of XML's standard format, extensibility, and platform independence.
Use error handling in your applications to prevent errors such as reaching the end or beginning of the recordset. You can use the EOF or BOF properties of the recordset to detect this error. For example, to check for the end of recordset when the user clicks the button, you can change the cmdNext_Click event to the following code:
If Not envNorthWind.rsCustomers.EOF\] Then envNorthWind.rsCustomers.MoveNext End If
For a better approach, you can detect the end or beginning of the recordset and disable any navigation controls so that users can't click them. This method provides better feedback than letting the users click the button, then find out that they're at the end of the recordset.
Use Member Properties to Store Information About Dimension Members
Storing additional dimension-member information is the most basic use of member properties. If you've developed your own front-end application, you can display this associated information in fly-over hints (e.g., a tool tip) or in a property panel that appears whenever a user selects a dimension member. For example, if you have an OLAP cube that contains your product information, you could use member properties to store and display such product information as the product manager's name, the original product-release date, the current version number, the associated product family, and even a longer product description.
Setting Up English Query
English Query works best with normalized databases; however, your circumstances might mandate a structure that isn't fully normalized. In this case, you can use views to solve problems that non-normalized databases cause. The English Query domain editor doesn't automatically import views. To add a view as an entity, select Table from the Insert menu and enter the name of the view. The English Query Help file provides examples of how to use views with non-normalized data.
Another tip is to define a primary key for each table in your English Query application. English Query requires primary keys to perform joins between tables to satisfy user requests. If you haven't defined keys in your database, you need to define them in the domain editor. English Query can't build your application correctly without primary keys.
When you develop English Query applications, remember that case is significant. For example, English Query knows that you're asking about a proper noun because you capitalize the words in the query. Finally, if you're running Internet Information Server (IIS) 4.0 with the Windows Scripting Host (WSH), the fastest way to build and deploy a test application is to run the setupasp.vbs macro from C:\programfiles\microsoftenglishquery\samples\asp2. This macro automatically installs and configures your data, so you can start testing immediately.
One of ADO's best assets as a great database interface is its flexibility. However, ADO's flexibility gives you more ways to shoot yourself in the foot while you're writing applications. Here's a list of the seven most common errors that I see in ADO coding.
Sin 7: Ignoring the native OLE DB Provider. You can use ADO with both the native OLE DB providers and with the OLE DB Provider for ODBC, but this choice makes continuing to use the old ODBC Data Source Names (DSNs) and drivers a bit too easy. The native OLE DB Provider for SQL Server is more efficient than these old options and provides access to features such as the SQL Server IRowsetFast interface and LinkedServer rowset.
Sin 6: Using Recordsets instead of stored procedures. Although ADO lets you easily perform queries and updates by using Recordsets, SQL Server stored procedures provide a big performance advantage because stored procedures cache execution plans. Stored procedures also can reduce network traffic by executing a batch of SQL statements in one invocation.
Sin 5: Letting ADO determine stored procedure parameters. ADO can dynamically discover the parameters of a stored procedure by using the Refresh method of the Parameters collection. This capability can easily tempt you into not explicitly creating a Parameters object in your code. This feature is useful during development, but avoid it in your production code because it will cause unnecessary trips to the server.
Sin 4: Using Recordset objects for all updates. Although using a Recordset object to update the database is easy and convenient, this method requires an open cursor and therefore has relatively high resource requirements. Updating the database by using direct SQL statements—or, better yet, stored procedures—is more efficient, especially for multiple insert, update, and delete operations.
Sin 3: Using independent connections for the Command and Recordset objects.2> Another easily misused feature of ADO is the ability of the Recordset and Command objects to create their own Connection objects. With typical network applications, repeatedly connecting and disconnecting is much less efficient than reusing an existing Connection object. The primary exception to this rule is during Web development, when you often want to isolate the connections for each Web page.
Sin 2: Using resource-intensive cursors when you don't need them. ADO supports four types of cursors: forward-only, static, keyset, and dynamic. Of these, the forward- only cursor (aka the firehose cursor) has the least overhead and best performance. You'll often be tempted to use more resource-intensive cursors, such as the keyset or dynamic, to get basic scrolling capability, but you don't need to.
Sin 1: Using the default ADO Recordset cache size. ADO's Recordset object Cache-Size property controls how many rows SQL Server retrieves when ADO issues an sp_cursorfetch against a server-side cursor. Using the CacheSize property default value of 1 can be inefficient and might result in many round-trips to the server.
The Reusable Cursor
I frequently want to perform operations such as refreshing views, checking table data, creating SQL statements, rebuilding indexes, or changing the table structure for many objects at once. To perform such bulk operations, I keep an SQL cursor on hand as a template and change it as necessary.
The script in Listing 1 opens a cursor on the table specified in the SELECT statement—in this case, the sysobjects table—then loops through each record in the cursor and assigns the contents of the current name field to the @ObjectName variable. The script uses the @ObjectName variable to build an EXEC statement to perform the desired operation. This example refreshes all views that db_owner owns.
By changing the SELECT statement in the cursor, you can perform operations against any database objects you want, use a WHERE clause to restrict the result set, and even add an ORDER BY clause. You can also edit the cursor's EXEC portion to perform a wide range of operations against the objects the SELECT statement returns. For example, Listing 2 shows how you can change the cursor to add a Timestamp field to each table in the database. If you wanted to add the field to only 30 tables, you would simply change the SELECT statement to return only the tables you want to work with. And by replacing EXEC with PRINT, you can easily generate a T-SQL script, which you can save and execute later. With a little imagination, you can use this cursor technique to perform numerous time-saving operations.
The Fine Points of DSNs
User and System Data Source Names (DSNs) reside on the machines you create them on. A User DSN is for one user only; its settings follow the same storage route for that user as persistent settings for Microsoft Word or any other Windows application. Visual InterDev stores System DSNs on the system and by system, not by user. Thus, any user on the system can reuse any System DSN.
System DSNs have another advantage. For example, let's say you have a development server, a local system, and a production server. Furthermore, the development server and the production server each have a SQL Server database that the Web application you're building uses. The development server is running a test copy of the database while the production server runs the live copy.
How do you set up your system so you can develop and test your application on your local system and development system, then deploy it to the production server without changing any code for the data connection? The secret to this juggling act is to create a System DSN on each system with the same name. Point the System DSNs on both the development server and the local workstation to the development server database. Point the System DSN on the production server to the production database. All three DSNs must have the same name. In your Active Server Pages (ASP) application, point the application's data connection to the System DSN name. Now you can move the application to the local workstation, the development server, or the production server without changing any connection information. You can use test data to work on the application and use production data to deploy the application.
SQL Tips for Developers
For more than a decade, I've been focused on data-access interfaces, so my tips address developers in that narrow area of functionality. Many developers need to be reminded of these things from time to time to keep their applications safe and running efficiently.
- DBAs should never give away the sa password, so don't try to talk them into giving it away—and don't look for it written in grease pencil on the DBA's monitor.
- Good DBAs lock down the database and don't grant anyone access to the base tables. So you developers will have to ask the DBA to create the tables and the stored procedures to access them.
- Don't develop with the sa account once you figure out the password, and never, never use a blank password—just because you see sa/no password in old examples doesn't make it right.
- Don't build your queries on the fly with the and (&) operator; this opens the door to SQL injection attacks. Create parameter-based queries—or better yet, use the stored procedures created specifically for your application. Never accept values from users or other untrustworthy sources. A user doesn't have to be malicious to enter something destructive.
- Plan on creating stored procedures to execute all your queries. The DBA might insist on this anyway, but for better performance and far better security, build your design around focused, protected stored procedures. Focused stored procedures that have fewer parameters are easier for SQL Server to compile and create efficient query plans that are valid for a wider variety of parameter values.
- Don't return more rows than your user can deal with right away. Don't fetch more columns than you need, and avoid SELECT *. If you specify the column names, you know precisely which columns SQL Server will return and the order of those columns. Be sure to add aliases to the columns so that you can more easily use bound controls that name the columns.
- Avoid client-side update processing when you can perform the same tasks in server-side procedures. This eliminates the transport costs of getting the data to and from the client.
- Don't use ADO or ADO.NET to do bulk operations—data-access query interfaces make terrible bulk-copy programs. Figure out how to use Data Transformation Services (DTS) or bulk copy program (BCP). Using SQL-DMO to drive a BCP copy operation will give you far better performance than any sexy ADO technique you devise.
- Don't store binary large objects (BLOBs) in the database. Keep the path to the BLOB and its attributes, MIME type, and other critical information—but not the text or image—in the database. Storing and retrieving BLOBs takes far longer than saving and fetching the same data from a file. And BLOBs flush the data cache, throwing out other useful procedures and data.
- Consider that data-access interfaces (e.g., DB-Library, DAO, RDO, ODBC, OLE DB, ADO, and ADO.NET) all wait at the same speed. You can't solve performance problems by asking the question faster. All too often, performance is dragged down by the nature of a query or the way SQL Server is programmed to deal with the query. Proper indexing can solve more problems than choosing a DataReader instead of a DataAdapter.
Profiles in Performance
Many people overlook SQL Server Profiler in SQL Server 2000 and 7.0 as a Web development tool. Because one of Profiler's functions is to monitor and report the various performance levels of your SQL Server database, you can use it to look at how your Web applications interact with the server. Furthermore, Profiler can help you analyze your code's efficiency. For example, you can use Profiler to determine whether your Web application would perform better if you used Query Analyzer to optimize the code.
To generate performance numbers for Profiler to use in analyzing your Web application's communications with SQL Server, start by testing a few pages. Choose a testing tool that can place a real-world load on your application, then run Profiler to watch this load. You can download the latest Web Application Stress Tool, for example, from Microsoft's Downloads page. Microsoft Application Center also includes an updated version of this tool.
When you run traces with Profiler, save the results under easily identifiable names. After you have conducted several tests, you can then return and compare the results at each step. You need to launch a new trace after each run and save the trace every time. You can also save traces and replay the load later to simulate the real load on the database. Keep in mind that a load you replay with Profiler or Query Analyzer only simulates a load, which is different from applying the load through your Web application.
Use ADO to Enforce Strong Typing
Strong data typing is a valuable weapon in the fight against SQL injection attacks. Using the ADO Command object can help you ensure not only that you're passing the right data but that SQL Server can tell which part of a query is SQL and which is data. Let's look at two ways you might invoke a stored procedure. The following procedure is a call directly from an ASP page:
<% Set Conn = Server.CreateObject ("ADODB.Connection") Conn.open application ("connection_string") Set RS = Conn.Execute("exec sp_checkloginrights " & param1 & "," & param2 ) %>
This code is a poor example of invoking a stored procedure from Visual Basic (VB). The code uses a "string-building" technique that lends itself to mischief because of poor input validation.
The second call, which Listing 3 shows, is a sample of VB code that you can use to invoke the same stored procedure from a COM DLL. How does the second procedure call provide an added level of security? The answer lies in the fact that you're no longer "string building" as you are in the earlier ASP code. In string-building SQL calls, you construct a text string and pass it to SQL Server in one call. In the COM DLL example, however, you use the Command object to enforce your data types and let ADO build the string for you. The primary advantage of the VB-COM approach is that it catches any mistakes caused by front-end programmers not properly validating input.
Note that you need to handle errors in a way that doesn't give a potential attacker any clues about your table structure. Any database-related errors should return a generic error message and log the details for developers to analyze. Displaying ADO error messages gives the end user too much information and could be dangerous in the wrong hands. Also, never assume the people on the other tier are doing good validation: They're probably assuming the same thing about you. If you apply good validation at every tier, a neglected piece of code is less likely to expose your entire application.
Using the SqlCommand Object to Import BLOBs
SQL Server Magazine technical editor Bob Pfeiff of Microsoft suggested an alternative method for using the DataSet object to import binary large objects (BLOBs)—you can use the ADO.NET Command object with a stored procedure. This method doesn't require that you use a DataSet on the client, but it does require a preexisting stored procedure on the SQL Server system. When you're using stored procedure parameters to access BLOBs, remember that the BLOB parameter must always be the last parameter in the call to the stored procedure.
The following code shows a simple stored procedure named p_insertimage that accepts one image data type parameter. The code inserts the parameter into the BLOBTable table. (You can find the schema for this table at InstantDoc ID 39867.)
CREATE PROCEDURE dbo.p_insertimage ( @image image) AS SET NOCOUNT ON INSERT INTO BLOBTable (blob_object) VALUES(@image) RETURN GO
Listing 4 shows the C# code that calls this stored procedure. The code uses an OleDbConnection object to open a connection to SQL Server. Then, the code creates a new FileStream object named fs that opens the file homer.jpg and reads the contents of that file into the byte array named imagefile. Next, the code creates an OleDbCommand object named cmd that will execute the p_insert-image stored procedure, passing the BLOB data to p_insertimage as a parameter.
BLOB access isn't limited to just the System.Data.SqlClient namespace. You can also access BLOBs by using the System.Data.OleDb namespace, as I show here, as well as the System.Data.OracleClient and the System.Data.Odbc namespaces.
XML updategrams contain special Sync, Before, and After tags that tell IIS how to process the XML document's contents. One updategram can send multiple database actions to the server. The Sync tag identifies a transaction; SQL Server considers all actions within a Sync section as one transaction. You use Before and After tags to specify the different database actions you want SQL Server to perform. Before tags specify what the data looks like before a change, whereas After tags specify what the data looks like after a change. To insert a row, you use only the After tag. To delete a row, you use only the Before tag. And to update a row, you use the Before tag to identify the data that needs updating, then the After tag to identify the new values.
Listing 5 shows an updategram called insertdept.xml, which inserts a row into the Department table that you created earlier. In this example, the first updg:sync tag marks the beginning of the transaction. The updategram uses only updg:after tags, telling SQL Server to insert the after image of the data. The element between updg:after tags identifies the table you want to update and the columns to which you want to add the row. In this example, the updategram instructs SQL Server to insert the new row into the Department table's Dep_ID and Dep_Name columns. Although this example illustrates inserting a single row, you could use multiple department elements with the updg:after tags to insert multiple rows. The closing updg:sync tag marks the end of the transaction. You can also use a URL to execute this updategram, as the following example shows:
Connection Pooling in ADO.NET
Designed expressly for n-tiered Web applications, the .NET Framework Data Provider for SQL Server automatically performs connection pooling without requiring any special setup. When an application opens a new connection to the server, the SQL Server .NET Data Provider creates a connection pool based on the values used in the ConnectionString property of the SqlConnection object. Each connection pool is associated with a unique connection string. Every time the application opens a new connection, the SqlConnection object checks whether the value in the ConnectionString property matches the connection string of an existing pool. If the string matches, the Data Provider adds the new connection to the existing pool. If the string doesn't match, the Provider creates a new pool. The SqlConnection object doesn't destroy a connection pool until the application ends. The Visual Basic .NET example that Listing 6 shows illustrates adding two connections to the same connection pool by using identical connection strings. You can use System Monitor to monitor ADO.NET connection pooling. Table 1 lists the counters that are available for tracking connection pool usage.
Using XML Templates to Improve Security
Microsoft intended URL access of databases for Web developers. With URL access, Web developers can easily access information from databases without using ADO. However, using a URL that has an SQL query can be dangerous because you're exposing all the information in the database. To prevent security problems, you can use a template. A template is an XML document that contains the SQL query. For example, Listing 7 shows the template Query1.xml for the <Titles> query. If you use a template, you expose only the template's name and not the name of your database and its columns.
Before SQL Server 2000 can execute the template, you need to make an additional IIS configuration setting. Open the IIS Virtual Directory Management for SQL Server utility, then select the Virtual Directory configured for XML access and view its properties. In the XML Query Properties dialog box, select the Virtual Names tab and click New. In the Virtual Name Configuration dialog box that appears, you need to map the Template virtual directory to a physical path. In the Virtual name text box, type Templates. In the Type drop-down list, select template. In the Path text box, enter the path to the location where you want to store the template files, then click Save. Next, place the Query1.xml template in the folder that you specified. To access the database through the template, type the following URL into your browser:
The template in Listing 7 performs a simple query, but you can create templates that perform more complex queries. For example, you can create a template that searches for titles that match a specific string. By specifying the search string in the URL and not the template, you can use the same template for different queries. The search string you specify in the query string is passed to the template as a parameter.
Designed for Development
The disconnected DataSet object in ADO.NET helps you build scalable Web applications by keeping locking to a minimum and freeing you from having to maintain state. However, these same features open up application-design concerns. Most important, the DataSet object's in-memory XML data store, with its own set of programmed relationships and referential integrity, is a potential black hole. If you don't implement good database programming practices, your DataSet objects could be enormous. Even today, you can trace most performance problems to an application-coding problem—typically caused by either a misunderstanding of the database design or inefficient data-access methods. ADO.NET will likely exacerbate these kinds of problems because programmers—not database designers—will now be responsible for constructing constraints and relationships. In addition, because the DataSet object handles all the application data, you must take special care when programming against large databases. DataSet shifts much of the data management to the client, which might not be able to efficiently handle vast amounts of data.
How can you eliminate these potential loopholes in ADO.NET? First, become intimately involved in the application-design process. If you're a DBA, help the database developers understand and programmatically implement the underlying database design for the application. Become familiar with application-development languages and tools so that you can understand how different ADO.NET data-access mechanisms work against the database. And review the SQL code that you use to populate ADO.NET DataSet objects to verify that applications aren't retrieving or replicating extraneous data. ADO.NET is the new kid on the data-access block, and like all new technologies, it offers some great benefits and some potential liabilities. Tying up the potential loopholes in ADO.NET could make a world of difference in the scalability and performance of your database applications.
Retrieving Identity Values with ADO.NET
SQL Server's Identity columns provide a great way of letting SQL Server automatically assign unique values to rows that an application adds to a table. When adding a row to a table, the application doesn't need to know anything about the unique Identity number that SQL Server assigned. SQL Server handles that part automatically. However, in some cases, the application or the user needs to know the value assigned to the Identity column as soon as it's added to the database. For example, an order-entry application that uses an Identity column for the Order ID might need to provide a confirming order number to the client immediately. To make this information instantly available, you need a way to retrieve the value of the Identity column and display it in your application. You can set this up in ADO.NET, using an output parameter for a stored procedure that performs the insert action, as Listing 8 shows. This stored procedure inserts an employee record into the Northwind database's Employees table and returns the Identity field EmployeeID as an output parameter. The important point to notice about this stored procedure is the use of the SCOPE_IDENTITY() function that retrieves the Identity value that SQL Server assigned and copies that value into the @Identity variable that the stored procedure returned.
After creating the stored procedure, you can assign it to the InsertCommand property of a SqlDataAdapter. Then, anytime an application uses the SqlDataAdapter to insert a row, the stored procedure returns the value to the current row in the DataTable.
Custom Aggregate Product
SQL Server 2000 doesn't support developing new aggregate functions; it lets you use only the built-in ones (e.g., SUM, MIN, MAX, AVG, COUNT). If you want to provide a custom aggregation such as the aggregate product of elements within a group, you need to use some tricky code to come up with a solution. To demonstrate one such unconventional technique, first run the code in Listing 9 to create and populate the table T1. You need to provide an aggregate product of the values within the value column for each group that appears in the groupid column. Figure 1 shows the desired result.
Often, you can get help in finding a T-SQL solution from the mathematics field. In this case, you're looking for a way to use the functions that T-SQL provides to produce the equivalent of a function that T-SQL doesn't provide. The following equation from the world of logarithms is the key to the solution:
LOGn(value1 * value2 * ... * valueN) = LOGn(value1) + LOGn(value2) + ... + LOGn(valueN)
The T-SQL expression SUM(LOG10(value)) is equivalent to the right side of the above equation. So, to calculate the product of the value elements, you need to raise 10 to the power of the right side of the equation: POWER(10., SUM(LOG10(value))).
At this point, you're close to solving the problem. The remaining obstacle is that LOG doesn't accept a zero or a negative value as an argument. To account for zeros, you can use a CASE expression that returns a zero if the group has at least one value of zero. To account for negative values, you can calculate the LOG function with the absolute value as input and multiply the result by -1 if an odd number of negative values exist. Listing 10 shows the complete solution. See T-SQL Black Belt, "Adding Performance," May 2001 for more explanation of this technique.