Linked servers are often-untapped resources that let you use distributed queries to query any server as if it were local. Distributed queries let you keep your inventory data in a DB2 database, for example, and your accounting data in a SQL Server database and query both sets of data as if they were in the same database, eliminating the cost and hassle of merging the two systems. SQL Server lets you execute such queries on any OLE DB-compliant data source. This article shows you how to set up, query, and gather meta data about linked servers.
About Linked Servers
Linked servers are descendants of remote servers, which you can use to execute replication stored procedures in SQL Server. However, Microsoft recommends that you use linked servers instead of remote servers in SQL Server 2000 or SQL Server 7.0 because linked servers have much more functionality, such as the ability to run ad hoc queries. With linked servers, you begin by establishing a connection in SQL Server to a compatible remote OLE DB provider. SQL Server clients connect to the linked server, then SQL Server connects to the remote provider on the client's behalf, as Figure 1 shows. The linked server acts as a middleman, taking a request from the consumer, passing it to the data source, then passing the results back to the consumer.
Linked servers are especially useful when configuring the OLE DB data source on each client computer is too time-consuming. In addition, stabilizing connections to other types of servers, such as DB2, can be tricky; linked servers minimize this complexity because you need to configure the connection only once. Clients need to connect only to the standard SQL Server provider; they don't need to have an OLE DB provider for DB2, for example, on their workstations.
Linked servers are also the core technology in SQL Server 2000 distributed partitioned views. In a distributed partitioned view, you can make several uniformly distributed tables appear as one table and distribute the load of large queries among many servers. (For more information about distributed partitioned views, see Kalen Delaney and Itzik Ben-Gan, "Distributed Partitioned Views.") To make this new SQL Server 2000 feature work, however, you need to inform each node of the other nodes' existence by adding a linked server for each participating node in each SQL Server system.
Setting Up a Linked Server
Most of the following examples show you how to link one SQL Server machine to another. I also explain briefly how to connect to DB2 and Oracle. Linked servers were introduced in SQL Server 7.0, so install the latest service pack to avoid known bugs. If you haven't already done so, I recommend that you install Service Pack 2 (SP2), which contains quite a few fixes for linked servers. The most dangerous of the bugs that it fixes are access violation (AV) errors that occur when you use linked servers.
You can add a linked server through Enterprise Manager. In the Security folder, right-click the Linked Servers icon and select New Linked Server. Or you can script the process in T-SQL by using the sp_addlinkedserver stored procedure:
sp_addlinkedserver [@server =] 'logical name of server' [, [@srvproduct =] 'product_name'] [, [@provider =] 'provider_name'] [, [@datasrc =] 'data_source'] [, [@location =] 'location'] [, [@provstr =] 'provider_string'] [, [@catalog =] 'catalog']
Table 1 describes each of these parameters, which also exist in the Enterprise Manager screens. The following sp_addlinkedserver example adds a linked server named LINKEDSERVER, which connects to a SQL Server named BKNIGHT:
EXEC sp_addlinkedserver @server = 'LINKEDSERVER', @srvproduct = 'SQLServer OLEDB Provider', @provider = 'SQLOLEDB', @datasrc = 'BKNIGHT'
The @provider parameter is the name of the OLE DB provider that you want to use. SQL Server's OLE DB name is SQLOLEDB. Table 2 lists the core OLE DB providers' names.
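Linking to a non-SQL Server source follows the same pattern with a different provider name. As a minimal sketch, assuming an Oracle database that the SQL Server machine can reach through an Oracle network alias, the call might look like the following. The linked server name ORACLESVR and the alias OracleAlias are hypothetical placeholders; MSDAORA is the name of the Microsoft OLE DB Provider for Oracle, which requires the Oracle client software on the SQL Server machine.

```sql
-- ORACLESVR and OracleAlias are placeholder names for illustration only.
EXEC sp_addlinkedserver
    @server     = 'ORACLESVR',    -- logical name used in queries
    @srvproduct = 'Oracle',
    @provider   = 'MSDAORA',      -- Microsoft OLE DB Provider for Oracle
    @datasrc    = 'OracleAlias'   -- alias defined in the Oracle client configuration
```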
After you've added the linked server, you need to set the security method it will use to connect to the remote data source. You set the security method by using the sp_addlinkedsrvlogin stored procedure:
sp_addlinkedsrvlogin [@rmtsrvname =] 'rmtsrvname' [, [@useself =] 'useself'] [, [@locallogin =] 'locallogin'] [, [@rmtuser =] 'rmtuser'] [, [@rmtpassword =] 'rmtpassword']
Table 3 describes sp_addlinkedsrvlogin's parameters.
Windows NT integrated security becomes especially useful here. If you configure the security correctly, users can log in to the SQL Server machine with their NT logins, and the linked server will pass the remote server a mapped standard SQL Server login with the appropriate permissions. You can't pass NT credentials from the client workstation to the linked server and then on to the remote server; this two-stage handoff is called a double hop. To work around the barrier, map an NT account on the local server to a standard account on the remote server. Failure to map the account this way will result in the following error:
Server: Msg 18456, Level 14, State 1, Line 1 Login failed for user '\'
For more information about double hopping and this error, see the Microsoft article "PRB: Message 18456 from a Distributed Query" (http://support.microsoft.com/support/kb/articles/q238/4/77.asp).
The following example sets the SQL Server security for the linked server, using an account named useraccount to connect to the remote SQL Server:
EXEC sp_addlinkedsrvlogin @rmtsrvname='LINKEDSERVER', @useself='false', @rmtuser='useraccount', @rmtpassword='userspassword'
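Because the call above omits @locallogin, the mapping applies to every local login. To map one particular NT account instead, as the double-hop workaround describes, you can name that account in @locallogin. The following is a minimal sketch; the account MYDOMAIN\jsmith is a hypothetical example, and the remote account and password are the placeholders used above.

```sql
-- MYDOMAIN\jsmith is a hypothetical NT login on the local server.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname  = 'LINKEDSERVER',
    @useself     = 'false',            -- don't try to pass the caller's own credentials
    @locallogin  = 'MYDOMAIN\jsmith',  -- local login that this mapping applies to
    @rmtuser     = 'useraccount',      -- standard SQL Server login on the remote server
    @rmtpassword = 'userspassword'
```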
You can use the sp_serveroption stored procedure to set additional access and optimization options. The most valuable of these options is the collation compatible option. When you select this option (by using the sp_serveroption stored procedure or by selecting the appropriate check box in Enterprise Manager in the linked server's configuration dialog box), SQL Server assumes that the source and destination servers both use the same collation (a particular combination of character set and sort order). The collation compatible option tells SQL Server not to pull the data back and evaluate the ORDER BY clause locally, which would substantially slow performance, but to pass the query to the provider and let the provider execute it. The data access option lets you access the data on the linked server. Finally, the RPC and RPC out options allow remote procedure calls from and to your linked server, respectively.
You can use T-SQL to set sp_serveroption options. The following code sets the collation compatible option:
EXEC sp_serveroption 'LINKEDSERVER', 'collation compatible', 'true'
You can modify this code to set the data access option and the RPC and RPC out options.
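For instance, the same call pattern with the other option names might look like the following sketch. The final statement assumes that the remote server hosts the Northwind sample database, whose CustOrderHist stored procedure takes a customer ID.

```sql
-- Allow distributed queries against the linked server.
EXEC sp_serveroption 'LINKEDSERVER', 'data access', 'true'

-- Allow remote procedure calls from and to the linked server.
EXEC sp_serveroption 'LINKEDSERVER', 'rpc', 'true'
EXEC sp_serveroption 'LINKEDSERVER', 'rpc out', 'true'

-- With rpc out enabled, a four-part name can call a remote stored procedure.
EXEC LINKEDSERVER.Northwind.dbo.CustOrderHist 'ALFKI'
```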
A frustrating thing about linked servers is that you can't easily change their connection settings after you create them. You can, however, change the server options and the security configuration. For anything else, the easiest way to update your linked server is to drop it and recreate it. Therefore, you should always save the scripts that you use to create linked servers so that you can quickly recreate them by running the scripts in Query Analyzer.
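A saved script of this kind might look like the following sketch, which reuses the server name, account, and options from the earlier examples. It drops the existing definition with sp_dropserver, whose droplogins argument also removes the associated login mappings, then rebuilds everything from scratch.

```sql
-- Remove the existing linked server and its login mappings.
EXEC sp_dropserver 'LINKEDSERVER', 'droplogins'

-- Recreate the linked server with the current connection settings.
EXEC sp_addlinkedserver
    @server     = 'LINKEDSERVER',
    @srvproduct = 'SQLServer OLEDB Provider',
    @provider   = 'SQLOLEDB',
    @datasrc    = 'BKNIGHT'

-- Restore the security mapping.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname  = 'LINKEDSERVER',
    @useself     = 'false',
    @rmtuser     = 'useraccount',
    @rmtpassword = 'userspassword'

-- Restore the server options.
EXEC sp_serveroption 'LINKEDSERVER', 'collation compatible', 'true'
```

Keeping every step in one script means a change to the data source or provider is a one-line edit followed by a rerun.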
Querying a Linked Server
After you've added a linked server, you can query it using one of three methods. The easiest method, which works well with other SQL Server systems, is to use a four-part qualifier in your queries:
SELECT * FROM LINKEDSERVER.NORTHWIND.DBO.CATEGORIES
Be aware, however, that this method of querying your linked server doesn't work well with some heterogeneous data sources, such as DB2. Queries of most heterogeneous data sources perform better with the OPENQUERY command, which we'll explore in a moment.
With a linked server, you can join tables the same way you can on a local server. The following example supposes that you have customer information on the local server and an order system on another:
SELECT lnw.CompanyName, rnw.OrderID, rnw.OrderDate, rnw.Freight FROM LINKEDSERVER.Northwind.dbo.Orders rnw, Northwind..Customers lnw WHERE lnw.CustomerID = rnw.CustomerID AND rnw.ShipCountry = 'USA' AND rnw.OrderDate > '04/22/1998'
When Query Optimizer parses these queries, it reads the capabilities of the OLE DB provider before executing.
The UNION ALL operator lets you use linked server technology to make multiple databases look like one database. With UNION ALL, you can create a distributed partitioned view. The following query gathers all categories from two SQL Servers and merges the returned rows into one result set:
SELECT * FROM LINKEDSERVER.NORTHWIND.DBO.CATEGORIES UNION ALL SELECT * FROM Northwind..Categories
This approach is useful when you have data that is partitioned on separate servers by date, for example, and you want to generate a unified report of all servers.
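To reuse such a query, you can wrap it in a view. The following is a minimal sketch; the view name AllCategories is hypothetical, and the column list assumes the standard Northwind Categories table. It is an ordinary read-only view rather than a full SQL Server 2000 distributed partitioned view, which also requires CHECK constraints on the partitioning column of each member table.

```sql
-- AllCategories is a hypothetical view name.
CREATE VIEW dbo.AllCategories
AS
SELECT CategoryID, CategoryName, Description
FROM LINKEDSERVER.Northwind.dbo.Categories   -- rows from the remote server
UNION ALL
SELECT CategoryID, CategoryName, Description
FROM Northwind.dbo.Categories                -- rows from the local server
```

Clients can then run SELECT * FROM dbo.AllCategories without knowing that the rows come from two servers.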
The second way to query a linked server is to use the OPENQUERY command. This command is perfect for heterogeneous databases because it executes the requested query on the remote system, not on the SQL Server system that hosts the linked server. I've had many queries that wouldn't work with the four-part qualifier but work fine with the OPENQUERY command. When you use the OPENQUERY command, you pass the query as a quoted string, and SQL Server returns the rows that the remote system produces for that query:
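SELECT * FROM OPENQUERY(LINKEDSERVER, 'SELECT * FROM northwind..Categories WHERE CategoryName LIKE ''Sea%''')
The first required piece of syntax is the linked server name, followed by the query as a string. Notice that because the query is itself enclosed in single quotes, you need to double the single quotes around any string value inside it, such as the pattern in the LIKE condition. (Enclosing the query in double quotes instead works only when the QUOTED_IDENTIFIER setting is off.)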
The third way to query a remote provider is to use the OPENROWSET command, which doesn't require that you have a linked server set up beforehand. This command uses linked server technology but creates the link at runtime, letting you dynamically set up the server you want to connect to. The OPENROWSET command operates the same as the OPENQUERY command: It executes all queries on the remote server. The example query in Listing 1 uses the OPENROWSET command. Listing 1's example code also shows that you can use the OPENROWSET (or OPENQUERY) command to perform joins.
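As a rough sketch of the syntax (not a reproduction of Listing 1), the following query joins a local table to an OPENROWSET result. It assumes the BKNIGHT server, account, and password from the earlier examples and the Northwind sample database on both servers.

```sql
-- The connection details are supplied inline; no linked server is required.
SELECT c.CompanyName, o.OrderID, o.OrderDate
FROM Northwind.dbo.Customers AS c
JOIN OPENROWSET('SQLOLEDB', 'BKNIGHT'; 'useraccount'; 'userspassword',
        'SELECT OrderID, CustomerID, OrderDate
         FROM Northwind.dbo.Orders
         WHERE ShipCountry = ''USA''') AS o   -- this query runs on BKNIGHT
    ON o.CustomerID = c.CustomerID
```

Note that the server name and password are embedded in the query text, which is one reason to prefer a configured linked server for anything other than occasional use.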
Although OPENROWSET gives you additional flexibility and power to dynamically create connections, I don't recommend using this command unless you can absolutely justify a need. Having a centralized location where you can configure your linked server is much more convenient than having to recompile your code every time you make a connection change.
Gathering Meta Data
The stored procedure I use most for gathering linked server data is sp_linkedservers. This stored procedure tells you how many linked servers are configured and gives you information about them. The procedure is handy if you have obscure names for your linked servers because remembering such names can be difficult. If you haven't established a linked server, you'll see only one result: the local server's information.
You can also request a list of databases that are on your linked servers by using the sp_catalogs stored procedure. This stored procedure takes the @server_name parameter, which is the name of the linked server you're querying:
EXEC sp_catalogs @server_name = 'LINKEDSERVER'
You can request a list of tables in a database by using the sp_tables_ex stored procedure:
EXEC sp_tables_ex @table_server = 'LINKEDSERVER', @table_catalog='northwind', @table_schema='dbo', @table_name='Suppliers'
Using some of the nonrequired parameters, such as @table_catalog and @table_name, is important when you want a list of only a subset of the database's tables. If you don't use the @table_name parameter, for example, the stored procedure will return every table in the database. If you don't use the @table_catalog parameter, the stored procedure will return the default database's tables only.
You can use several other stored procedures to obtain meta data about your linked server's tables and indexes. For a complete list of other stored procedures, see SQL Server Books Online (BOL).
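As a brief sketch, three of these procedures (sp_columns_ex, sp_indexes, and sp_primarykeys) take parameters that mirror those of sp_tables_ex. The calls below reuse the linked server and the Suppliers table from the earlier example.

```sql
-- Column definitions for the remote table.
EXEC sp_columns_ex @table_server = 'LINKEDSERVER', @table_catalog = 'northwind',
    @table_schema = 'dbo', @table_name = 'Suppliers'

-- Indexes on the remote table.
EXEC sp_indexes @table_server = 'LINKEDSERVER', @table_catalog = 'northwind',
    @table_schema = 'dbo', @table_name = 'Suppliers'

-- Primary key columns of the remote table.
EXEC sp_primarykeys @table_server = 'LINKEDSERVER', @table_catalog = 'northwind',
    @table_schema = 'dbo', @table_name = 'Suppliers'
```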
Many companies can't or don't plan to upgrade their non-SQL Server infrastructures to SQL Server either because of the cost or because of the complexity. In these cases, linked servers let you build applications that take advantage of SQL Server's features but still access your legacy data. Linked servers also let you partition your data geographically or logically, dividing the load among the various servers.