Successful e-commerce relies heavily on IP routing, which delivers e-business information from one company to another. Without a redundant and fault-tolerant IP routing mechanism, a company's e-business can't survive disasters (e.g., hurricanes, floods, power outages, Internet-connection disruptions, equipment failures) because the company has no alternative path to the Internet. To gain and maintain a competitive advantage in e-business, you must incorporate redundant IP routing into your Internet infrastructure.
A common way to implement redundant IP routing is to use redundant routers and redundant Internet connections. Multiple routers on the same subnet ensure that your Internet servers still have a gateway when the default gateway fails. Multiple connections to one or more ISPs (i.e., multihoming) provide alternative routes to the Internet when one Internet link or router is down.
Before e-commerce became popular, the Internet Engineering Task Force (IETF) defined several Internet protocols as building blocks for redundant IP routing. The protocols are the Fault Isolation and Recovery Protocol, which detects dead gateways and supports multiple default gateways; Internet Router Discovery Protocol (IRDP); Virtual Router Redundancy Protocol (VRRP); and Border Gateway Protocol (BGP). You can use the Fault Isolation and Recovery Protocol, IRDP, and VRRP to build local-router redundancy, and you can use BGP for Internet-router redundancy.
Windows 2000 and Windows NT 4.0 support the Fault Isolation and Recovery Protocol and IRDP, and many internetworking and routing products support IRDP, VRRP, and BGP. To include redundant IP routing functionality in your e-commerce infrastructure, you must have a basic understanding of how these protocols work in redundant IP routing configurations.
Multiple Default Gateways
A computer sending information to the Internet usually delivers that information to a local router or Layer-3 switch in the computer's local subnet, which in turn forwards the information to another router, then to the Internet. The local router is usually the computer's default gateway. If only one router is on the subnet and that router fails, the computer can't talk to other network subnets or the Internet. To provide fault tolerance, you need two or more routers on each subnet. However, this type of configuration requires the computer to support multiple default gateways (i.e., the computer must be able to detect the availability of the default gateway). If the default router fails, the computer must fail over to an available router. IETF Request for Comments (RFC) 816 describes how the host detects a dead gateway and switches to another gateway.
Win2K and NT 4.0 Service Pack 4 (SP4) and later support multiple default gateways in their TCP/IP implementations. (NT 4.0 SP3 and earlier versions don't properly switch to alternative default gateways. For more information about multiple gateway support in NT 4.0 SP4 and later, see the Microsoft article "TCP/IP Dead Gateway Detection Algorithm Updated for Windows NT" at http://support.microsoft.com/support/kb/articles/q171/5/64.asp.) If your Win2K or NT system uses a static IP address, you can include multiple router IP addresses in the system's TCP/IP default gateway setting. If the system uses a dynamic IP address, you can include multiple router IP addresses in the default gateway option of the DHCP server's subnet scope. You list router addresses in order of preference. When you boot the system, it uses the first address. A TCP connection that gets no response through the first default gateway moves to the second default gateway after it has retransmitted a segment more than half the number of times that the TcpMaxDataRetransmissions value under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters Registry key specifies. If 25 percent or more of the system's TCP connections have moved to the second default gateway, the system uses that gateway for all communications until it fails or you restart the system.
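The failover rule can be modeled in a few lines. The following Python sketch is a simplified simulation, not Microsoft's code: it assumes that a connection moves to the next gateway once it has retransmitted more than half the TcpMaxDataRetransmissions value (modeled by a hypothetical `max_data_retransmissions` field, assumed to be 5), and that the host switches all traffic once 25 percent or more of its connections have moved. The class name and gateway addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DeadGatewayHost:
    """Simplified model of a host configured with several default gateways."""
    gateways: list                     # default gateways, in preference order
    max_data_retransmissions: int = 5  # stands in for TcpMaxDataRetransmissions (assumed 5)
    current: int = 0                   # index of the gateway used for new traffic
    conn_gateway: dict = field(default_factory=dict)  # connection id -> gateway index

    @property
    def default_gateway(self) -> str:
        return self.gateways[self.current]

    def open_connection(self, conn_id: str) -> None:
        # New connections start on the host's current default gateway.
        self.conn_gateway[conn_id] = self.current

    def retransmitted(self, conn_id: str, count: int) -> None:
        """Report that a connection has retransmitted a segment `count` times."""
        gw = self.conn_gateway[conn_id]
        # Move just this connection once it exceeds half the retransmission limit.
        if count * 2 > self.max_data_retransmissions and gw + 1 < len(self.gateways):
            self.conn_gateway[conn_id] = gw + 1
            self._maybe_switch_all()

    def _maybe_switch_all(self) -> None:
        moved = sum(1 for g in self.conn_gateway.values() if g > self.current)
        # Once 25 percent or more of the connections have moved, switch the whole host.
        if moved * 4 >= len(self.conn_gateway):
            self.current += 1
            for conn_id, g in self.conn_gateway.items():
                self.conn_gateway[conn_id] = max(g, self.current)


host = DeadGatewayHost(["10.0.0.1", "10.0.0.2"])
for i in range(8):
    host.open_connection(f"c{i}")
host.retransmitted("c0", 3)   # 1 of 8 connections moved (12.5 percent)
print(host.default_gateway)   # 10.0.0.1
host.retransmitted("c1", 3)   # 2 of 8 moved (25 percent): the whole host switches
print(host.default_gateway)   # 10.0.0.2
```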
Multiple default gateways also let you load-balance multiple routers. For example, if two routers, Router 1 and Router 2, are on the same subnet, you can set half the computers on the subnet to try Router 1 first and half to try Router 2 first. This setup is easy with static IP addresses but difficult with NT's DHCP server, which can't have multiple scopes on the same subnet. Win2K's DHCP server, however, supports vendor- and user-class options, which let you assign different default gateway settings to different groups of computers on the same subnet.
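One way to split a subnet between routers is to rotate the order of the gateway list from host to host. The Python sketch below assumes a simple round-robin rule; the `gateway_orders` helper and the host names are hypothetical, and you'd still enter the resulting lists as static TCP/IP settings or DHCP options.

```python
def gateway_orders(hosts, routers):
    """Give every host all routers, rotating which router it tries first."""
    orders = {}
    for i, host in enumerate(hosts):
        start = i % len(routers)
        orders[host] = routers[start:] + routers[:start]
    return orders


orders = gateway_orders(["host1", "host2", "host3", "host4"], ["Router 1", "Router 2"])
for host, order in orders.items():
    print(host, order)
# host1 ['Router 1', 'Router 2']
# host2 ['Router 2', 'Router 1']
# host3 ['Router 1', 'Router 2']
# host4 ['Router 2', 'Router 1']
```

Every host still lists both routers, so each one keeps a fallback gateway while the first-choice load is split evenly.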
Multiple default gateways work well for TCP communications but not for UDP communications, because dead gateway detection relies on TCP retransmissions. A process that uses UDP (e.g., the Netlogon process) can't trigger a switch to another gateway. (For more information about this shortcoming, see the Microsoft article "Dead Gateway Detection Is Not Triggered During Logon" at http://support.microsoft.com/support/kb/articles/q183/9/02.asp.) Thus, if the first default gateway is down, you can't log on to an NT domain unless a domain controller is on the local subnet. In this case, IRDP comes to the rescue.
Multiple default gateways require you to maintain the default gateway settings on computers or DHCP scopes. In contrast, IRDP lets a router advertise its availability. A computer can then dynamically discover the best available gateway on the subnet and automatically switch to the next best gateway if the current one fails. IETF proposed IRDP in RFC 1256.
At set intervals, an IRDP-enabled router multicasts an advertisement on the local subnet. The advertisement includes the router's interface address, a preference level, and a lifetime (which denotes how long a computer can consider the advertised address valid if it receives no further advertisements from the router). An IRDP-enabled computer selects as its default gateway the router that has the highest preference level (the higher the value, the more preferred the router). An IRDP-enabled computer can also multicast or broadcast a solicitation message that asks all routers to send an advertisement; it does so when you boot the system, when its default gateway's lifetime expires, or at a predefined interval.
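The host-side selection rule can be sketched as follows. This Python sketch follows RFC 1256, in which higher preference levels are more preferred, the value 0x80000000 means "don't use as a default router," and an entry expires when its lifetime runs out. The addresses, times, and function name are hypothetical, and a real host also handles solicitations and per-router bookkeeping.

```python
from dataclasses import dataclass
from typing import Optional

NOT_A_DEFAULT = -2**31  # RFC 1256: preference 0x80000000 means "never use as default"


@dataclass(frozen=True)
class Advertisement:
    router: str         # advertised router interface address
    preference: int     # signed preference level; higher is more preferred
    lifetime: int       # seconds the entry stays valid without a fresh advertisement
    received_at: float  # when the host last heard this advertisement


def select_default_gateway(ads, now) -> Optional[str]:
    """Pick the most preferred router whose advertisement hasn't expired.

    Assumes `ads` holds only the latest advertisement from each router.
    """
    usable = [a for a in ads
              if now < a.received_at + a.lifetime and a.preference != NOT_A_DEFAULT]
    if not usable:
        return None
    return max(usable, key=lambda a: a.preference).router


ads = [Advertisement("10.0.0.1", 0, 1800, 0.0),
       Advertisement("10.0.0.2", 10, 1800, 0.0)]
print(select_default_gateway(ads, now=100.0))   # 10.0.0.2
# 10.0.0.2 stops advertising while 10.0.0.1 keeps refreshing its entry.
ads = [Advertisement("10.0.0.1", 0, 1800, 1200.0),
       Advertisement("10.0.0.2", 10, 1800, 0.0)]
print(select_default_gateway(ads, now=1900.0))  # 10.0.0.1
```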
Enabling IRDP in an IRDP-capable router is easy. For example, in a Cisco Systems router, you use the command
to enable the protocol and set its preference and other advertisement interval parameters.
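To see what such a router actually sends, the following Python sketch builds an ICMP Router Advertisement (ICMP type 9) using the message layout in RFC 1256: a header that carries the number of addresses, an address entry size of 2 (32-bit words per entry), and the lifetime, followed by one address and preference level per advertised interface. The function names and sample address are hypothetical, and the sketch only builds the bytes; it doesn't send anything.

```python
import socket
import struct


def icmp_checksum(data: bytes) -> int:
    """Internet checksum: ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def router_advertisement(entries, lifetime: int) -> bytes:
    """entries: list of (router_address, preference_level) pairs."""
    body = b"".join(socket.inet_aton(addr) + struct.pack("!i", pref)
                    for addr, pref in entries)
    # Type 9, code 0, checksum placeholder, num addrs, entry size 2, lifetime.
    header = struct.pack("!BBHBBH", 9, 0, 0, len(entries), 2, lifetime)
    packet = header + body
    checksum = icmp_checksum(packet)
    return packet[:2] + struct.pack("!H", checksum) + packet[4:]


packet = router_advertisement([("10.0.0.1", 0)], lifetime=1800)
print(len(packet))            # 16
print(icmp_checksum(packet))  # 0 means the checksum verifies
```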
Win2K and NT 4.0 SP5 and later include host support for IRDP. However, you need to add two values under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Adaptername\Parameters\Tcpip Registry key (where Adaptername is the name of the network adapter). Add a REG_DWORD value named PerformRouterDiscovery and set it to 1, and add a REG_DWORD value named SolicitationAddressBcast and set it to 0 (for multicast router solicitation) or 1 (for broadcast).
Virtual Router Redundancy
Multiple default gateway support and IRDP require computers intelligent enough to discover an available router on the subnet. Another potential problem is that IRDP advertisements and solicitation generate extra traffic on the subnet. VRRP, which the IETF outlines in RFC 2338, is a more efficient router-redundancy protocol that doesn't require computers' involvement in router discovery.
As its name implies, VRRP provides a virtual router to achieve redundancy. A virtual router is identified by a virtual router ID (VRID) and represented on the subnet by a virtual router IP (VRIP) address. A virtual router consists of two or more physical routers: a master (i.e., active) router and one or more backup routers. The master router provides primary routing for the corresponding VRIP address. The backup routers monitor the status of the master router and take over if the master router fails. The master router multicasts advertisements at a set interval to let the backup routers know it's active. If a backup router doesn't receive an advertisement within about three times that interval (RFC 2338 adds a short skew time that's smaller for higher-priority backup routers), it assumes the master router is down. For example, if you set the master router to advertise every 3 seconds, a backup router will take over after about 9 seconds.
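RFC 2338 defines the backup router's timeout precisely: Master_Down_Interval equals three times the advertisement interval plus a skew time of (256 - priority)/256 seconds. The Python sketch below simply evaluates that formula; the function names are hypothetical.

```python
def skew_time(priority: int) -> float:
    """RFC 2338 skew: higher-priority backups wait slightly less."""
    return (256 - priority) / 256


def master_down_interval(advertisement_interval: float, priority: int) -> float:
    """Seconds a backup waits without hearing the master before taking over."""
    if not 1 <= priority <= 254:
        raise ValueError("backup routers use priorities 1 through 254")
    return 3 * advertisement_interval + skew_time(priority)


print(master_down_interval(3, 100))  # 9.609375
print(master_down_interval(3, 200))  # 9.21875
```

Because the skew shrinks as priority rises, the highest-priority backup router's timer expires first, so it is the one that takes over.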
If you have more than one backup router, the backup router with the highest priority becomes the active router. When the original master router comes back online, it becomes the active router again (by default, VRRP lets a higher-priority router preempt a lower-priority master) and the backup router returns to a standby state. Priority values range from 0 to 255; the higher the value, the higher the priority. Backup routers use values from 1 through 254, and a VRRP router uses priority 100 by default. Priority 255 is reserved for the router that owns the VRIP address, so if you use the IP address of the master router's interface as the VRIP address, the master router's priority must be 255. Priority 0 is also reserved: the master router sends it to tell backup routers that it has stopped participating in the virtual router.
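Assuming preemption is enabled (the RFC 2338 default), the steady-state outcome of the election can be computed directly. In this Python sketch, `elect_master` is a hypothetical helper that takes the live routers' primary addresses and priorities; it ignores priority 0 (a master that is resigning) and breaks ties the way RFC 2338 does, in favor of the higher primary IP address.

```python
import ipaddress


def elect_master(priorities):
    """priorities: primary IP address -> VRRP priority, for routers that are up."""
    candidates = {ip: p for ip, p in priorities.items() if 1 <= p <= 255}
    if not candidates:
        return None
    # Highest priority wins; a tie goes to the higher primary IP address.
    return max(candidates, key=lambda ip: (candidates[ip], ipaddress.ip_address(ip)))


print(elect_master({"10.0.0.2": 110, "10.0.0.3": 100}))  # 10.0.0.2
print(elect_master({"10.0.0.2": 100, "10.0.0.3": 100}))  # 10.0.0.3 (tie on priority)
print(elect_master({"10.0.0.2": 0, "10.0.0.3": 100}))    # 10.0.0.3 (master resigned)
```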
Figure 1 shows a VRRP configuration in which a virtual router comprising two physical routers has a VRIP address of 18.104.22.168. Router 1 is the master router for VRID 1 (it has a priority of 110), and Router 2 is a backup router (it has a priority of 100). A virtual router uses a virtual Media Access Control (MAC) address formed by appending the VRID to the IANA-reserved prefix 00005E0001. For example, the MAC address of the virtual router in Figure 1 is 00005E000101 because its VRID is 01. The computers in the subnet in Figure 1 use VRIP address 18.104.22.168 as their default gateway. When a computer sends information to the gateway, it first sends an Address Resolution Protocol (ARP) request for the gateway's MAC address. The virtual router's active router responds with the virtual MAC address rather than its own physical MAC address. Therefore, the computers can reach an available router without knowing which physical router they're using.
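The following Python sketch shows the virtual MAC rule and how only the current master answers ARP requests for a VRIP address. The helper names and the 10.0.0.254 address are hypothetical; the 00-00-5E-00-01 prefix and the 1-255 VRID range come from RFC 2338.

```python
from typing import Dict, Optional, Set


def vrrp_virtual_mac(vrid: int) -> str:
    """VRRP virtual MAC: IANA prefix 00-00-5E-00-01 followed by the VRID."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be 1 through 255")
    return f"00-00-5E-00-01-{vrid:02X}"


def answer_arp(vrid_by_vrip: Dict[str, int], master_for: Set[int],
               target_ip: str) -> Optional[str]:
    """Return the MAC this router puts in its ARP reply, or None to stay silent."""
    vrid = vrid_by_vrip.get(target_ip)
    if vrid is None or vrid not in master_for:
        return None  # not a VRIP we serve, or we're only the backup for it
    return vrrp_virtual_mac(vrid)


vrips = {"10.0.0.254": 1}
print(answer_arp(vrips, master_for={1}, target_ip="10.0.0.254"))    # 00-00-5E-00-01-01
print(answer_arp(vrips, master_for=set(), target_ip="10.0.0.254"))  # None
```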
The VRRP configuration in Figure 1 provides fault tolerance but wastes router resources because the backup router is idle. Fortunately, you can set up a VRRP configuration in which both routers are active. A VRRP router can serve more than one VRID and VRIP address on the same interface. For example, as Figure 2 shows, you can define Router 2 as the master router for VRID 02 and VRIP address 126.96.36.199 and Router 1 as the backup router for VRID 02, while Router 1 remains the master router for VRID 01. You can then configure half the computers on the subnet to use VRID 01's VRIP address as their default gateway and the other half to use VRID 02's VRIP address. This configuration is load balanced as well as fault tolerant.
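The active-active arrangement can be checked with a small model. In this Python sketch, the `CONFIG` table uses the Figure 1 priorities for VRID 1 and assumes mirrored priorities for VRID 2 (110 for Router 2, 100 for Router 1); the hypothetical `forwarders` helper reports which physical router handles each VRID for a given set of working routers, assuming preemption is enabled.

```python
# VRID -> {physical router: priority}; the VRID 2 priorities are assumed.
CONFIG = {
    1: {"Router 1": 110, "Router 2": 100},
    2: {"Router 1": 100, "Router 2": 110},
}


def forwarders(up_routers):
    """Return VRID -> physical router that currently forwards for it."""
    result = {}
    for vrid, priorities in CONFIG.items():
        live = {r: p for r, p in priorities.items() if r in up_routers}
        result[vrid] = max(live, key=live.get) if live else None
    return result


print(forwarders({"Router 1", "Router 2"}))  # {1: 'Router 1', 2: 'Router 2'}
print(forwarders({"Router 2"}))              # {1: 'Router 2', 2: 'Router 2'}
```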
Major vendors have implemented VRRP in their routers and routing switches. Cisco's Hot Standby Router Protocol (HSRP) is a proprietary protocol similar to VRRP. Alteon and ArrowPoint use VRRP to provide redundancy for server load balancers. These vendors call their redundancy configurations active-backup and active-active, which correspond to the configurations in Figure 1 and Figure 2, respectively. (For more information about Web server load balancers, see "Web Server Load Balancers," April 2000.)
Routers often use a routing protocol to exchange routing information and dynamically update their routing tables when network topology changes (e.g., when a router or link fails). A network under one administrative domain, such as an organization's intranet, is known as an autonomous system (AS). A routing protocol used within an AS is an interior routing protocol. The Routing Information Protocol (RIP) and Open Shortest Path First (OSPF) protocol are two popular interior routing protocols. Different ASs generally use an exterior routing protocol (aka an interdomain routing protocol) to exchange routing information. The Internet exterior routing protocol is BGP, which the IETF defined in RFC 1771. Each AS needs a unique AS number from InterNIC to run BGP on the Internet.
BGP typically runs in routers on an AS's border (e.g., your Internet routers, ISPs' routers that connect to their customers and to other ISPs). BGP routers that directly exchange BGP routing information are peers. For example, in Figure 3, Router 1 in AS1 and Router 4 in AS4 are peers. In addition, Router 2 and Router 3; Router 2 and Router 5; Router 3 and Router 6; and Router 4, Router 5, and Router 6 are peers. Two ASs that use BGP to connect are also peers (e.g., AS2 and AS3).
When two BGP peer routers have established a TCP connection, they use BGP update messages to exchange, or advertise, routing information. Each BGP router advertises to its peers the destinations that it can reach. This information includes Internet routes the router has learned from other BGP routers and intranet routes it has learned from an interior routing protocol or static routing configuration. BGP uses an aggregated or Classless Inter-Domain Routing (CIDR) IP address (aka a prefix), such as 220.127.116.11/16, to represent the route to an AS. A BGP router also associates an AS-PATH attribute with each route. This attribute lists the ASs that the route passes through, from the advertising router's AS to the AS that owns the prefix. For example, AS3 in Figure 3 has the network address 18.104.22.168/16. AS1, a direct peer of AS3, advertises that one possible route to this prefix has the AS-PATH attribute 1 3. AS4, a direct peer of AS1, receives this information and can use it as a factor in its calculation of the best route from AS4 to AS3.
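The AS-PATH mechanics can be sketched in Python. This is a simplified model, not a BGP implementation: `advertise` prepends the sender's AS number when a route crosses into a neighboring AS, `accept` applies BGP's loop check (a router rejects a route whose AS-PATH already contains its own AS), and `best_by_path_length` compares only AS-PATH length, which is just one of several factors in BGP's real decision process. The helper names and the prefix are hypothetical.

```python
def advertise(route, sender_as):
    """Send a route to a neighboring AS, prepending the sender's AS number."""
    prefix, as_path = route
    return prefix, (sender_as,) + as_path


def accept(route, receiver_as):
    """Loop prevention: reject routes that already passed through our AS."""
    _, as_path = route
    return receiver_as not in as_path


def best_by_path_length(routes):
    return min(routes, key=lambda r: len(r[1]))


origin = ("10.3.0.0/16", ())            # AS3's own (hypothetical) prefix
at_as1 = advertise(origin, 3)           # AS3 -> AS1: path (3,)
at_as4 = advertise(at_as1, 1)           # AS1 -> AS4: path (1, 3)
print(at_as4)                           # ('10.3.0.0/16', (1, 3))
print(accept(at_as4, 4))                # True
print(accept(advertise(at_as4, 4), 3))  # False: AS3 sees itself in (4, 1, 3)
longer = ("10.3.0.0/16", (5, 2, 3))     # hypothetical alternative path
print(best_by_path_length([at_as4, longer]))  # ('10.3.0.0/16', (1, 3))
```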
In a BGP router, you can define a policy that filters which routes the router accepts from a peer and which routes it advertises. To optimize routing and implement redundancy, you can attach attributes, such as preferences and metrics, to received and advertised routes. Peer routers use KeepAlive messages to check each other's availability. If a router doesn't receive a KeepAlive (or other) message from a peer within a predefined interval (the hold time), the router drops the BGP session, removes the unreachable peer's routes from its BGP routing table, and sends update messages about the change to its other peers.
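The hold-time behavior can be sketched as follows. In this simplified Python model, `Peer` and `expire_dead_peers` are hypothetical names, any message from a peer resets its timer, the 90-second hold times are sample values, and the routing table is a plain dictionary that maps each prefix to the peer it was learned from. The returned prefixes are what the router would withdraw in update messages to its remaining peers; a real router would first look for alternative paths.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    hold_time: float           # seconds allowed between messages from this peer
    last_heard: float = 0.0
    established: bool = True

    def heard_from(self, now: float) -> None:
        self.last_heard = now  # a KeepAlive or update message resets the timer


def expire_dead_peers(peers, rib, now):
    """Drop sessions whose hold time has passed; return the prefixes to withdraw."""
    withdrawn = []
    for peer in peers:
        if peer.established and now - peer.last_heard > peer.hold_time:
            peer.established = False
            lost = [prefix for prefix, source in rib.items() if source == peer.name]
            for prefix in lost:
                del rib[prefix]
            withdrawn.extend(lost)
    return withdrawn


a, b = Peer("Router 1", 90), Peer("Router 5", 90)
b.heard_from(50)
rib = {"10.1.0.0/16": "Router 1", "10.2.0.0/16": "Router 5", "10.3.0.0/16": "Router 1"}
print(expire_dead_peers([a, b], rib, now=100))  # ['10.1.0.0/16', '10.3.0.0/16']
print(rib)                                      # {'10.2.0.0/16': 'Router 5'}
```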
BGP running between two ASs is known as external BGP (EBGP). BGP running between routers within the same AS is known as internal BGP (IBGP). All IBGP routers in an AS must peer with one another (i.e., form a full mesh), because an IBGP router doesn't re-advertise routes it has learned from one IBGP peer to another IBGP peer. You use IBGP rather than a conventional interior routing protocol (e.g., OSPF) because IBGP can take advantage of BGP's routing policy features and can natively carry learned BGP routes and their associated AS-PATH attributes among the AS's border routers. Many ISPs and companies that have multiple Internet connections use IBGP in their border routers. IBGP routers don't need direct physical connections to one another as long as they can reach one another through an interior routing protocol or static routing configuration. For example, in Figure 3, IBGP logically connects Router 4, Router 5, and Router 6 in AS4. Thus, Router 4 in Los Angeles can advertise the routes it has learned from Router 1 of AS1 to Router 5 in Chicago and Router 6 in New York.
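Two consequences of the IBGP rules are easy to compute. In this Python sketch (hypothetical helper names), `full_mesh_sessions` lists the IBGP sessions that n border routers need, a number that grows as n(n-1)/2, and `may_readvertise` encodes the rule that a route learned from one IBGP peer isn't passed on to another IBGP peer, which is why the mesh must be full.

```python
from itertools import combinations


def full_mesh_sessions(routers):
    """Every pair of IBGP routers in the AS needs its own session."""
    return list(combinations(sorted(routers), 2))


def may_readvertise(learned_via: str, to_peer: str) -> bool:
    """learned_via and to_peer are each 'ebgp' or 'ibgp'."""
    return not (learned_via == "ibgp" and to_peer == "ibgp")


sessions = full_mesh_sessions(["Router 4", "Router 5", "Router 6"])
print(len(sessions))                    # 3
print(may_readvertise("ebgp", "ibgp"))  # True: Router 4 passes AS1's routes to 5 and 6
print(may_readvertise("ibgp", "ibgp"))  # False: Router 5 doesn't relay them to Router 6
```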
The simplest Internet-connection scenario is a company with one Internet connection between its network and an ISP. Unfortunately, this setup doesn't offer redundancy or fault tolerance. For redundancy, you need a multihomed configuration—that is, you must configure multiple Internet connections to one or more ISPs. The two major categories of multihomed configurations are multiple connections to one ISP and multiple connections to multiple ISPs.
If you want to multihome to one ISP, two configurations are popular. You can connect your single Internet router to two or more routers at different Points of Presence (POPs) at an ISP, as Figure 4 shows. Alternatively, you can connect two or more routers at your company to two or more routers at different POPs at an ISP, as Figure 5 shows. Although the first configuration provides redundant Internet connections, the single router at your location creates a single point of failure. The second configuration offers better redundancy: If your Internet routers are in different sites, a disaster in one location of your company won't prevent the remaining sites from accessing the Internet. If you've implemented global server load balancing for your Web servers, your customers will still be able to reach an available site.
If you want to multihome to multiple ISPs, you connect your single or multiple Internet routers to routers at two or more ISPs, as Figure 6 shows. This configuration adds more reliability to your Internet connections because if one ISP experiences a major network outage, other healthy ISPs will provide Internet access.
Fault-Tolerant Multihomed Configurations
You can set up a fault-tolerant multihomed configuration so that one link is the primary link and the other links are backup links. If the primary link is down, traffic fails over to the backup links. For example, in Figure 4, the link from Company A's Router 3 to ISP1's Router 1 in Los Angeles is the primary link and the link from Router 3 to ISP1's Router 2 in New York is the backup link. To make the Router 1 link primary and the Router 2 link the backup, Router 3's administrator can configure two static default routes: a lower-cost route through Router 1 and a higher-cost route through Router 2. Router 3 then sends its outbound Internet traffic over the lower-cost route and uses the higher-cost route only when the primary link is unavailable.
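The primary/backup choice among static default routes can be sketched as picking the lowest-cost route whose link is up. In this Python sketch, the costs (1 and 250) and the `pick_default_route` helper are hypothetical; on real routers the equivalent setting is typically a per-route metric or administrative distance, and a higher-cost backup route of this kind is often called a floating static route.

```python
def pick_default_route(routes, link_up):
    """routes: list of (next_hop, cost); the lowest cost among working links wins."""
    live = [(hop, cost) for hop, cost in routes if link_up.get(hop, False)]
    if not live:
        return None
    return min(live, key=lambda route: route[1])[0]


routes = [("Router 1", 1), ("Router 2", 250)]
print(pick_default_route(routes, {"Router 1": True, "Router 2": True}))   # Router 1
print(pick_default_route(routes, {"Router 1": False, "Router 2": True}))  # Router 2
```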
Alternatively, Router 3 can accept the advertised default routes from Router 1 and Router 2 and associate a BGP local preference (LOCAL-PREF) attribute value with each route to denote the preferred router. The greater the value, the higher the preference. For example, Router 3's administrator can set Router 1's default route LOCAL-PREF attribute to 200 and Router 2's default route LOCAL-PREF attribute to 100 to make the Los Angeles link the primary link for outbound traffic.
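The LOCAL-PREF approach amounts to an import policy followed by a highest-value-wins comparison. This Python sketch assumes a hypothetical policy table keyed by neighbor name, with the 200/100 values from the example, and a fallback value of 100 (a common vendor default, assumed here) for routes the table doesn't mention.

```python
IMPORT_POLICY = {"Router 1": 200, "Router 2": 100}  # neighbor -> LOCAL-PREF (hypothetical)
DEFAULT_LOCAL_PREF = 100                            # assumed fallback value


def apply_import_policy(routes):
    """Tag each received route with the LOCAL-PREF the policy assigns."""
    return [dict(route, local_pref=IMPORT_POLICY.get(route["from"], DEFAULT_LOCAL_PREF))
            for route in routes]


def preferred_exit(routes):
    tagged = apply_import_policy(routes)
    return max(tagged, key=lambda r: r["local_pref"])["from"] if tagged else None


received = [{"prefix": "0.0.0.0/0", "from": "Router 1"},
            {"prefix": "0.0.0.0/0", "from": "Router 2"}]
print(preferred_exit(received))      # Router 1 (Los Angeles)
print(preferred_exit(received[1:]))  # Router 2, once the Los Angeles session is gone
```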
To use the Los Angeles link as the primary link for inbound traffic, Router 3's administrator can apply BGP's multi-exit discriminator (MED) attribute to Router 3's advertised route (126.96.36.199/16). When a neighboring AS has multiple links into your network, the MED attribute tells it to prefer the link with the lowest MED value. For example, if Router 3 advertised this route with a MED value of 100 to Router 1 and a MED value of 200 to Router 2, ISP1 would use the Los Angeles link as the primary link and the New York link as the backup link to Router 3 for inbound traffic. However, ISP1 could assign its own LOCAL-PREF values to the route, which would override Router 3's MED values, because BGP compares LOCAL-PREF before it considers MED when making a routing decision. To avoid problems, ask your ISP to honor your MED values.
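The interaction between LOCAL-PREF and MED is easiest to see in a simplified version of BGP's route comparison. This Python sketch checks only three steps in order (highest LOCAL-PREF, then shortest AS-PATH, then lowest MED among routes from the same neighboring AS); real BGP implementations apply more tie-breakers. The `Route` fields and the AS number 65001 for Company A are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Route:
    link: str
    neighbor_as: int
    as_path: tuple
    med: int = 0
    local_pref: int = 100


def best_route(routes):
    top = max(r.local_pref for r in routes)
    routes = [r for r in routes if r.local_pref == top]         # 1. LOCAL-PREF
    shortest = min(len(r.as_path) for r in routes)
    routes = [r for r in routes if len(r.as_path) == shortest]  # 2. AS-PATH length
    if len({r.neighbor_as for r in routes}) == 1:               # 3. MED, same neighbor AS only
        return min(routes, key=lambda r: r.med)
    return routes[0]  # real BGP would continue with further tie-breakers


# ISP1's view of Company A's route (Company A assumed to be AS 65001).
la = Route("Los Angeles", 65001, (65001,), med=100)
ny = Route("New York", 65001, (65001,), med=200)
print(best_route([la, ny]).link)  # Los Angeles: lower MED wins

ny.local_pref = 200               # ISP1 sets its own LOCAL-PREF on the New York route
print(best_route([la, ny]).link)  # New York: LOCAL-PREF is compared before MED
```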
Load-Balanced Multihomed Configurations
You can create a load-balanced multihomed configuration by controlling which routes each router advertises and receives, and with which attributes. For example, in Figure 5, Company A's network has two address blocks: 184.108.40.206/16 is closest to Router 3, and 220.127.116.11/16 is closest to Router 4. Company A's network administrator might therefore configure Router 3 to prefer the Los Angeles link for inbound traffic by attaching a lower MED value to the route that Router 3 advertises to Router 1 in Los Angeles and a higher MED value to the route that Router 3 advertises to Router 2 in New York. The administrator might also attach a lower MED value to the route that Router 4 advertises to Router 2 in New York and a higher MED value to the route that Router 4 advertises to Router 1 in Los Angeles. As a result, for inbound traffic, the Los Angeles link is the primary link for 184.108.40.206/16 and the backup link for 220.127.116.11/16, and the New York link is the primary link for 220.127.116.11/16 and the backup link for 184.108.40.206/16.
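The resulting inbound plan can be derived mechanically from the MED values each router advertises. In this Python sketch, the MED numbers (100 and 200), the block labels, and the `inbound_plan` helper are hypothetical, and ISP1 is assumed to honor the MEDs; for each address block, the links are ranked by MED, lowest first, skipping any link that is down.

```python
# Address block -> {link: MED advertised over that link} (hypothetical values)
MEDS = {
    "Router 3 block": {"Los Angeles": 100, "New York": 200},
    "Router 4 block": {"Los Angeles": 200, "New York": 100},
}


def inbound_plan(meds, links_up):
    """Return block -> links in the order ISP1 would use them (primary first)."""
    plan = {}
    for block, by_link in meds.items():
        ranked = sorted((med, link) for link, med in by_link.items() if link in links_up)
        plan[block] = [link for _, link in ranked]
    return plan


print(inbound_plan(MEDS, {"Los Angeles", "New York"}))
# {'Router 3 block': ['Los Angeles', 'New York'], 'Router 4 block': ['New York', 'Los Angeles']}
print(inbound_plan(MEDS, {"New York"}))
# {'Router 3 block': ['New York'], 'Router 4 block': ['New York']}
```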
If your Internet routers accept specific routes that your ISP advertises, you can load-balance these routes for outbound traffic. For example, in Figure 5, Company A has an e-business partner with a short route (route 184.108.40.206/16) to ISP1's Los Angeles POP and another partner with a short route (route 220.127.116.11/8) to ISP1's New York POP. On Router 3, Company A's administrator can assign a higher LOCAL-PREF value to the first partner's /16 route and a lower LOCAL-PREF value to the second partner's /8 route, making ISP1's Los Angeles link the primary link for the /16 route and the backup link for the /8 route. To make the New York link the primary link for the /8 route and the backup link for the /16 route, the administrator reverses these settings for the two routes that Router 4 receives. In addition, Company A's administrator can define the Los Angeles link as the primary link for the default route (i.e., all other Internet routes) and the New York link as the backup link.
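Outbound selection combines per-prefix LOCAL-PREF values with the router's longest-match lookup, so the default route catches everything the partner routes don't. This Python sketch uses hypothetical prefixes (192.168.0.0/16 for the first partner and 10.0.0.0/8 for the second), hypothetical LOCAL-PREF values of 200 and 100, and a hypothetical `exit_link` helper.

```python
import ipaddress

# (prefix, exit link, LOCAL-PREF); prefixes and values are hypothetical.
RIB = [
    ("192.168.0.0/16", "Los Angeles", 200),  # first partner, near the LA POP
    ("192.168.0.0/16", "New York", 100),
    ("10.0.0.0/8", "Los Angeles", 100),      # second partner, near the NY POP
    ("10.0.0.0/8", "New York", 200),
    ("0.0.0.0/0", "Los Angeles", 200),       # default route: LA primary
    ("0.0.0.0/0", "New York", 100),
]


def exit_link(destination, rib, links_up):
    """Longest matching prefix first, then the highest LOCAL-PREF for that prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [(ipaddress.ip_network(prefix), link, pref)
               for prefix, link, pref in rib
               if link in links_up and addr in ipaddress.ip_network(prefix)]
    if not matches:
        return None
    longest = max(net.prefixlen for net, _, _ in matches)
    candidates = [m for m in matches if m[0].prefixlen == longest]
    return max(candidates, key=lambda m: m[2])[1]


both = {"Los Angeles", "New York"}
print(exit_link("192.168.7.1", RIB, both))             # Los Angeles
print(exit_link("10.20.30.40", RIB, both))             # New York
print(exit_link("203.0.113.5", RIB, both))             # Los Angeles (default route)
print(exit_link("10.20.30.40", RIB, {"Los Angeles"}))  # Los Angeles (backup)
```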
To load-balance and add fault tolerance to a multihomed configuration that has connections to multiple ISPs (as Figure 6 shows), you can use the same methods that you use for configurations that have multiple connections to one ISP, with one exception. MED is nontransitive (a neighboring AS doesn't pass your MED values on to other ASs), and BGP routers compare MED values only among routes received from the same neighboring AS. Thus, MED is useful only when an AS has multiple connections to the same neighboring AS; if you have only one link to each of several ISPs, you can't use it. In Figure 6, Company A has only one connection to each ISP, so Company A's administrator can't use the MED attribute. Instead, the administrator can manipulate the AS-PATH attribute of an advertised route. For example, to make AS1 the backup link for 18.104.22.168/16, the administrator can lengthen the route's AS-PATH by prepending an extra 4 (Company A's own AS number) to the normal AS-PATH value of 4. When Router 3 advertises the route with an AS-PATH value of 4 4 to AS1, AS1 advertises it with an AS-PATH value of 1 4 4 to AS3. Router 4 advertises the same route with the normal AS-PATH value of 4 to AS2, and AS2 advertises it with an AS-PATH value of 2 4 to AS3. Therefore, AS3 chooses the AS2 link for traffic to 18.104.22.168/16 because that route's AS-PATH is shorter.
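The effect of prepending can be checked with a small model of what AS3 receives. In this Python sketch, `advertised_path` and `path_seen_by_as3` are hypothetical helpers; Company A is AS 4 as in the example, and AS3 is assumed to pick the shortest AS-PATH when nothing else distinguishes the routes.

```python
def advertised_path(own_as, extra_prepends=0):
    """AS-PATH Company A sends: its AS number, repeated to make the path longer."""
    return (own_as,) * (1 + extra_prepends)


def path_seen_by_as3(transit_as, path_from_company):
    """Each ISP prepends its own AS number before passing the route on."""
    return (transit_as,) + path_from_company


candidates = {
    "via AS1": path_seen_by_as3(1, advertised_path(4, extra_prepends=1)),
    "via AS2": path_seen_by_as3(2, advertised_path(4)),
}
print(candidates)  # {'via AS1': (1, 4, 4), 'via AS2': (2, 4)}
print(min(candidates, key=lambda k: len(candidates[k])))  # via AS2
```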
When you connect to multiple ISPs, configure your routers to advertise only the routes you specify (typically your own prefixes) and to block every route learned from one ISP from being re-advertised to another. Otherwise, an ISP might discover a short path to another destination through your AS, and your network might become a transit AS that carries traffic between ISPs.
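The filtering rule can be sketched as an export policy. This Python sketch treats a route with an empty AS-PATH as originated within your own AS and advertises only prefixes that also appear on an explicit allow list; the `export_to_isp` helper, the prefixes, and the AS numbers are hypothetical.

```python
def export_to_isp(rib, allowed_prefixes):
    """rib: list of (prefix, as_path). Announce only our own, explicitly listed prefixes."""
    return [(prefix, path) for prefix, path in rib
            if path == () and prefix in allowed_prefixes]


rib = [
    ("10.4.0.0/16", ()),      # Company A's own block (originated locally)
    ("10.1.0.0/16", (1,)),    # learned from ISP AS1
    ("10.2.0.0/16", (2, 7)),  # learned from ISP AS2
]
print(export_to_isp(rib, {"10.4.0.0/16"}))  # [('10.4.0.0/16', ())]
```

Requiring both conditions means that even if an ISP's prefix is accidentally added to the allow list, a route learned from that ISP (non-empty AS-PATH) still isn't re-advertised.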
Fasten Your Seat Belts
You can use the building blocks I've described to build a redundant IP routing configuration. Multiple default gateways, IRDP, and VRRP provide first-layer routing redundancy. Multihomed Internet connections that use BGP provide second-layer routing redundancy. If you set up additional routers between the first and second layers, such as backbone routers for your network, be sure to use multiple routers and paths to incorporate redundancy. In addition, consider using reliable or redundant switches for your Internet hosts and routers. When you have a highly redundant network in place, you provide a disaster-resistant vehicle to safely carry your e-business onto the Internet.