In the past, most networks housed few shared resources; users shared access only to file servers, printers, and perhaps a database. Today, many companies distribute information and applications across networks, and many users share access to information. Employees depend on the network to do their job, so when the network goes down, users' loss of access translates into a loss of revenue or customers.
To prevent network outages that cause revenue losses, you need to regularly test your network's capabilities and plan purchases around the network's ability to meet applications' throughput needs. On small networks, you can use network monitoring software and a few simple formulas to diagnose problems or predict the network's reaction to new hardware or software. (For information about testing applications across simple networks, see "Application Testing with Network Monitor," September 1998.) However, as networks grow, their numerous devices and connections make the network's behavior impossible to understand by inspection. Too many conversations among too many devices via too many network routes occur simultaneously for you to accurately predict how one application's traffic will affect another part of the network. To diagnose problems or test new applications on a complex network, you need to simulate the network: you use a simulation program, or simulator, to build a software model of key network elements and test how well the model functions with various traffic loads or network designs.
Because the purpose of modeling network traffic is to reduce the number of devices, routes, and transactions on a network to a manageable number, your model must include simplifications and assumptions. The trick to successfully simulating your network is knowing which aspects of the network you can simplify without compromising the model's effectiveness.
To simulate your production network, you need to construct a reasonable representation of the network's topology, including the physical devices and logical parameters that comprise the network. You need to determine how much traffic is on your network during the period you want to emulate. You must specify a question you want the simulation to answer. And finally, you need to run the model through a simulator.
Your network's topology is the framework for your model. You need to include the following physical devices in your representation of the network: routers, computers, switches, WAN links, LANs, and point-to-point connections. You also need to include network parameters such as router interface settings, LAN speeds, WAN speeds, router capabilities (e.g., backplane speed), routing protocols, and naming conventions. How much detail you need to include depends on the question you want the model to answer. For example, if you are interested in WAN utilization, you don't need to include all your network's PCs in your model; you need to include only the traffic the PCs generate.
You can use software to discover your network's physical devices and logical settings. Some simulators include a discovery tool; others import information about network topology from network monitoring tools. Third-party topology discovery tools include HP's OpenView, Cabletron Systems' SPECTRUM, IBM's NetView for AIX, Digital Equipment's POLYCENTER Manager on NetView for Windows NT, Castle Rock Computing's SNMPc NT, CACI's SIMPROCESS, and Network Analysis Center's WinMIND.
Most discovery programs use the Simple Network Management Protocol (SNMP) to query the devices. Some programs require you to provide a list of the network's routers and those routers' IP addresses. Other programs need only the address of a seed router. A discovery program learns what it can from a router's Management Information Base (MIB), then asks the router to direct it to a neighboring router. The discovery program queries the next router's MIB and steps through the network, learning about routers, interfaces, and devices on each network segment.
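The seed-router walk described above is essentially a breadth-first traversal. The following is a minimal sketch of that idea, not the method any particular discovery product uses. The `FAKE_MIBS` table, the `query_mib` function, and the addresses are hypothetical stand-ins for real SNMP queries; the sketch makes no network calls.

```python
from collections import deque

# Hypothetical stand-in for what a discovery tool would learn from each
# router's MIB: the router's interfaces and its neighboring routers.
FAKE_MIBS = {
    "10.0.0.1": {"interfaces": ["eth0", "serial0"],
                 "neighbors": ["10.0.1.1"]},
    "10.0.1.1": {"interfaces": ["eth0", "serial0", "serial1"],
                 "neighbors": ["10.0.0.1", "10.0.2.1"]},
    "10.0.2.1": {"interfaces": ["eth0", "serial0"],
                 "neighbors": ["10.0.1.1"]},
}


def query_mib(address):
    """Placeholder for an SNMP query; here it only reads the table above."""
    return FAKE_MIBS[address]


def discover(seed):
    """Step through the network from a seed router, one neighbor at a time."""
    inventory = {}
    queue = deque([seed])
    while queue:
        router = queue.popleft()
        if router in inventory:
            continue  # already queried this router
        mib = query_mib(router)
        inventory[router] = mib["interfaces"]
        for neighbor in mib["neighbors"]:
            if neighbor not in inventory:
                queue.append(neighbor)
    return inventory


inventory = discover("10.0.0.1")
for router, interfaces in inventory.items():
    print(router, interfaces)
```

The resulting `inventory` dictionary is the kind of device list you could export to a spreadsheet.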
This discovery of your network topology provides you the first benefit of simulation: network documentation. The discovery process creates an inventory of the most important devices on the network and their settings. You can export this information to a spreadsheet or use the information to update previous device inventories you've taken.
Traffic
If your network's traffic stopped, users wouldn't be able to communicate with network devices, and many users would be unable to perform their job. Your model must emulate traffic in as much detail as is computationally possible.
Before you can emulate network traffic, you need to learn how information flows on your network. Collecting traffic doesn't require special hardware; you need only a computer with an NIC that operates in promiscuous mode (many standard cards do). Collecting traffic does, however, require special software that captures data or communicates with SNMP or Remote Network Monitoring (RMON) agents. Some network analysts use a special device for troubleshooting networks, but underneath the device's case lies a standard PC with one or more standard NICs. For the sake of simplicity, I'll break these devices into two groups: network analyzers and network probes. The functionality of devices within these categories overlaps, but your enterprise probably needs both a network analyzer and a network probe because products in the two categories serve specific needs common to all networks.
Network analyzers. Network, or protocol, analyzers are devices that take a snapshot of traffic on a network. For example, you can use an analyzer to see instantly whether a large number of frames on the network have errors from a problem NIC or computer. You can look inside a frame and view its contents. (However, you'll be disappointed most of the time because instead of seeing usernames and passwords, you'll see only unintelligible characters.) Because analyzers are designed to gather and report data about the lowest layers of the Open Systems Interconnect (OSI) model, and because their buffers have limited sizes, you can't use analyzers to gather data about how protocols function on the network over a long period. Administrators usually don't leave analyzers in one place for long, so analyzers often have rugged cases.
Most analyzers can record network traffic and play it back to the network later. This playback feature is useful when you want to redesign your network and test how the new system reacts to a specific user request, such as a database query. In addition, analyzers can generate traffic to stress test your production network and determine the maximum amount of traffic the network can support.
Analyzers let you filter traffic based on source or destination addresses, so they can capture the traffic that one conversation between two machines generates. This specificity is important when you want to analyze a new application before deployment, because you want to include in your analysis only traffic that the application you're testing generates. Network analyzers that run on NT include 3Com's Transcend LANsentry Manager, Network Associates' Expert Sniffer Network Analyzer and Distributed Sniffer System (DSS), AXON's LANServant Manager, NetScout Systems' NetScout Manager Plus, and Compuware's EcoSCOPE.
Network probes. Network probes use RMON2 to analyze network traffic at the network and application layers. They provide much of the same detailed network information that analyzers generate, but they can't provide single-packet analysis. Probes are best at monitoring a network's health over time because they don't suffer from the myopia inherent in analyzers.
Probes can be PCs running probe software, or they can be separate hardware devices that are smaller than PCs and have no monitor or keyboard. You attach a probe to your network, and it gathers the data that you request. You can perform administration tasks remotely, via a serial port connected to the back of the probe, or at the probe console if the unit has a console. You need probes throughout your network to keep an eye on the entire network, so most probes don't move around much. The optimal simulation environment has a probe on every network segment.
If budget constraints prevent you from placing probes on every segment of your network, you must determine where on the network to place probes to maximize traffic capture for the number of probes you can afford. Looking at your network's design and traffic data helps you determine where to place probes. You want to place them at traffic sources or sinks. For example, if you have a Fiber Distributed Data Interface (FDDI) ring with 20 servers attached to it, the FDDI ring is an appropriate place for a probe because a lot of traffic traverses it. Wherever you place them, probes contain information that you don't want anyone except administrators to access. Make sure your probes are password-protected, and keep them behind a locked door.
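One way to reason about the placement problem above is a greedy heuristic: repeatedly pick the segment that captures the most traffic not already seen by a chosen probe. The sketch below is an illustrative assumption, not a procedure from any probe vendor, and the segment names and byte counts are made up.

```python
def place_probes(conversations, probe_count):
    """Greedily choose segments that capture the most not-yet-covered bytes.

    Each conversation is (name, bytes, set_of_segments_it_traverses).
    """
    segments = sorted({seg for _, _, path in conversations for seg in path})
    chosen = []
    uncovered = list(conversations)
    for _ in range(probe_count):
        best, best_gain = None, 0
        for seg in segments:
            if seg in chosen:
                continue
            gain = sum(size for _, size, path in uncovered if seg in path)
            if gain > best_gain:
                best, best_gain = seg, gain
        if best is None:
            break  # no remaining segment would capture additional traffic
        chosen.append(best)
        uncovered = [c for c in uncovered if best not in c[2]]
    return chosen


# Made-up traffic: most bytes cross the FDDI ring where the servers sit.
traffic = [
    ("a", 500, {"fddi", "lan1"}),
    ("b", 300, {"fddi", "lan2"}),
    ("c", 200, {"lan3"}),
    ("d", 50, {"lan1"}),
]
placement = place_probes(traffic, probe_count=2)
print(placement)
```

With two probes, the heuristic picks the FDDI ring first because it carries the servers' traffic, then the segment that carries the largest remaining conversation.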
To ease the management problems that distributed monitoring devices cause, most network monitoring solutions have probes set up to report to a midlevel management station (i.e., a workstation running management software). The probes are smart, so they are capable of gathering data, running statistical analyses on the data, and sending only the important numbers to the midlevel management stations. The midlevel management stations consolidate information from all the probes that report to them, and some perform additional analysis. You need a midlevel management station for every 10 to 15 probes, depending on the volume of traffic the probes process. You also need a higher-order management system that periodically queries the midlevel management stations for probe information. This higher-order manager assimilates all the network's data and produces data sets and graphs.
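As a quick worked example of the 10-to-15-probes rule of thumb above, the sketch below estimates how many midlevel management stations a given number of probes calls for. The probe count of 40 is an arbitrary illustration.

```python
import math


def stations_needed(probes, probes_per_station):
    """Round up, because a partly loaded station is still a whole station."""
    return math.ceil(probes / probes_per_station)


probes = 40
fewest = stations_needed(probes, 15)  # lightly loaded probes
most = stations_needed(probes, 10)    # heavily loaded probes
print(f"{probes} probes need between {fewest} and {most} midlevel stations")
```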
Although UNIX probes and UNIX management software have traditionally dominated the network probe market, many vendors are porting their UNIX probes to NT. A UNIX or NT probe can monitor networks running either OS, because probes function independently of the local OS. Table 1 lists probes that work with NT. Probes for faster LAN protocols such as FDDI usually cost more than Ethernet or Token-Ring probes.
If you have multiple probes capturing conversations on your network, the probes might capture thousands of conversations per minute. This amount of traffic bogs down simulations on the most powerful machines. You need to remove or consolidate conversations to prevent unnecessary conversations from slowing your simulation.
Remove duplicate conversations. One method for reducing the number of conversations you capture is eliminating duplicate conversations. Two probes might capture and record the same conversation; for an accurate measure of network traffic, you need to make sure you record conversations only once. Most network probe software packages eliminate duplicate conversations. If your probe software doesn't remove duplicate conversations, you must create a utility that compares conversations and deletes one of every pair of conversations that have identical attributes.
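If you do need to write such a utility, the core of it can be small. The following sketch assumes each captured conversation is a dictionary with the hypothetical fields shown; real probe exports will use different names and formats.

```python
def remove_duplicates(conversations):
    """Keep the first copy of each conversation with identical attributes."""
    seen = set()
    unique = []
    for conv in conversations:
        key = (conv["source"], conv["destination"], conv["application"],
               conv["start"], conv["packets"], conv["bytes"])
        if key not in seen:
            seen.add(key)
            unique.append(conv)
    return unique


# Two probes recorded the same ws1-to-srv1 conversation.
captured = [
    {"source": "ws1", "destination": "srv1", "application": "Telnet",
     "start": 0.0, "packets": 40, "bytes": 3200, "probe": "probe-A"},
    {"source": "ws1", "destination": "srv1", "application": "Telnet",
     "start": 0.0, "packets": 40, "bytes": 3200, "probe": "probe-B"},
    {"source": "ws2", "destination": "srv1", "application": "Telnet",
     "start": 1.5, "packets": 12, "bytes": 900, "probe": "probe-A"},
]
deduplicated = remove_duplicates(captured)
print(len(captured), "captured,", len(deduplicated), "unique")
```

The `probe` field is deliberately left out of the comparison key, because the same conversation seen by two probes differs only in which probe reported it.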
Reduce the number of remaining conversations. You can use two methods to reduce the number of valid (i.e., not duplicate) conversations. Few probes or simulators provide these methods of data reduction, so you might need to create a standalone application to use them. The first method eliminates conversations that are too small to significantly affect network traffic. For example, 40 percent of conversations across your network might make up less than 1 percent of the network's total traffic because these conversations consist of very few packets and very few bytes. These small conversations (such as pings) register in the simulation and use up precious resources on the machine running the simulation.
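The first method can be sketched as follows: drop the smallest conversations until the traffic you've discarded would exceed a small share of the total. The 1 percent budget mirrors the example above; the field names and byte counts are hypothetical.

```python
def drop_small(conversations, max_lost_fraction=0.01):
    """Drop the smallest conversations while losing at most the given
    fraction of total bytes. Surviving conversations keep their order."""
    total = sum(conv["bytes"] for conv in conversations)
    budget = total * max_lost_fraction
    dropped = set()
    lost = 0
    ranked = sorted(enumerate(conversations), key=lambda pair: pair[1]["bytes"])
    for index, conv in ranked:
        if lost + conv["bytes"] > budget:
            break  # dropping this one would exceed the budget
        lost += conv["bytes"]
        dropped.add(index)
    return [c for i, c in enumerate(conversations) if i not in dropped]


conversations = [
    {"application": "ping", "bytes": 4},
    {"application": "database", "bytes": 500},
    {"application": "ping", "bytes": 3},
    {"application": "mail", "bytes": 491},
    {"application": "ping", "bytes": 2},
]
kept = drop_small(conversations)
print(len(conversations), "conversations reduced to", len(kept))
```

In this made-up data set, three of the five conversations carry less than 1 percent of the bytes, so the filter removes them and leaves the traffic totals nearly unchanged.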
The second method consolidates conversations with the same source, destination, and application. This method combines all the consolidated conversations' packets and bytes, so it doesn't lose any traffic data. You can set a time criterion (e.g., 5 seconds), so the software consolidates only conversations that took place within a specific period of time. Eliminating small conversations and consolidating remaining conversations can reduce your number of conversations by up to 70 percent.
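The consolidation method might look like the sketch below, which merges conversations that share a source, destination, and application and that start within a time window of the first conversation in the group. The field names and the way the window is measured are assumptions for illustration.

```python
def consolidate(conversations, window=5.0):
    """Merge same-source, same-destination, same-application conversations
    that start within `window` seconds of the first one in their group."""
    merged = []
    open_groups = {}
    for conv in sorted(conversations, key=lambda c: c["start"]):
        key = (conv["source"], conv["destination"], conv["application"])
        group = open_groups.get(key)
        if group is not None and conv["start"] - group["start"] <= window:
            # Add the packets and bytes so no traffic data is lost.
            group["packets"] += conv["packets"]
            group["bytes"] += conv["bytes"]
        else:
            group = dict(conv)  # copy so the input list is left untouched
            open_groups[key] = group
            merged.append(group)
    return merged


raw = [
    {"source": "ws1", "destination": "srv1", "application": "mail",
     "start": 0.0, "packets": 10, "bytes": 1000},
    {"source": "ws2", "destination": "srv1", "application": "mail",
     "start": 1.0, "packets": 5, "bytes": 400},
    {"source": "ws1", "destination": "srv1", "application": "mail",
     "start": 2.0, "packets": 20, "bytes": 2500},
    {"source": "ws1", "destination": "srv1", "application": "mail",
     "start": 9.0, "packets": 8, "bytes": 700},
]
combined = consolidate(raw, window=5.0)
print(len(raw), "conversations consolidated to", len(combined))
```

The total packet and byte counts are the same before and after consolidation; only the number of conversation records shrinks.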
You can use simulation to estimate the effect that deploying a new application will have on your production network. Until you deploy the application, probes on your network won't capture the application's traffic. To measure a new application's effect on a simulation, you first must add the application to one machine on the production network or (preferably) to a lab network.
Then, you can gather basic information about the application's individual conversations. Use probes that detect packets' header information, including their network protocol and application. Most probes automatically determine which network protocol a packet uses (e.g., IP, IPX, DECnet), but you must manually configure them to identify many applications (e.g., Word, Telnet, CAD). After you configure your probes, you can identify the following information for every conversation on the network: the network protocol, the application that generates the conversation, the conversation's source and destination computers, the number of packets and bytes that travel in each direction, total round-trip latency for the application, and the duration of the conversation.
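The per-conversation attributes listed above map naturally onto a simple record type. The sketch below is one possible layout, with hypothetical field names and made-up sample values, plus a helper that totals bytes per application.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Conversation:
    protocol: str        # network protocol, e.g., IP or IPX
    application: str     # application that generated the conversation
    source: str
    destination: str
    packets_out: int     # source to destination
    packets_back: int    # destination to source
    bytes_out: int
    bytes_back: int
    latency_ms: float    # total round-trip latency for the application
    duration_s: float

    @property
    def total_bytes(self):
        return self.bytes_out + self.bytes_back


def bytes_by_application(conversations):
    """Sum the bytes in both directions for each application."""
    totals = defaultdict(int)
    for conv in conversations:
        totals[conv.application] += conv.total_bytes
    return dict(totals)


sample = [
    Conversation("IP", "Telnet", "ws1", "srv1", 120, 110, 9_000, 14_000, 35.0, 60.0),
    Conversation("IP", "cc:Mail", "ws2", "srv1", 40, 38, 52_000, 4_000, 80.0, 12.0),
    Conversation("IP", "cc:Mail", "ws3", "srv1", 20, 19, 26_000, 2_000, 75.0, 6.0),
]
totals = bytes_by_application(sample)
print(totals)
```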
Looking at your traffic at the application level is useful. You can check latency for applications to see whether they are meeting a minimum quality of service. You can see how much throughput each application uses. Because you know the source and destination of all traffic flow, you can determine where your intranet traffic is going and which users are using which resources. To simulate deployment of an application, you need to gather data about the application's conversations and manually add this data to your network model. Then, you must instruct the simulator to expand the information you've gathered about the application to simulate the activities of many users at many locations within your model.
Developing a Question
Before you run a simulation, you must devise a specific question that you want the simulation to answer. You might want to ask a change-analysis question. For example, you might want to analyze what would happen if you changed your network's WAN links, LANs, or routers or added a new application to your network. You might want the simulation to answer a question about the network's fault tolerance. For example, you might want to determine how the failure of a specific device or group of devices (such as LANs, WAN links, or an entire facility of your organization) would affect application demands. Answering these questions can help you in capacity planning, rollout validation, disaster recovery, and life cycle management.
After you decide which question you want your simulation to answer, you might need to tweak the network model's topology and traffic to make sure the simulation addresses your question. Simulators automatically alter a network model to test its fault tolerance; they have built-in utilities that emulate the failure of network devices. You can select devices from a menu, and your simulation will predict what effects that device's failure would have on your production network. However, you must manually alter your model for your simulation to answer a change-analysis question. To add or move servers or users, you must change the volume of bytes an application produces in the model or add network devices or demands to the model before running your simulation.
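To make the fault-tolerance idea concrete, the sketch below marks a device as failed in a small model and reports which traffic demands can no longer reach their destination. This is only a connectivity check under assumed names and topology; a commercial simulator would also reroute traffic and recompute utilization.

```python
from collections import deque

# Hypothetical model: two LANs joined by routers and one WAN link.
EDGES = [
    ("ws1", "lan1"), ("srv1", "lan1"), ("lan1", "router1"),
    ("router1", "wan"), ("wan", "router2"),
    ("router2", "lan2"), ("srv2", "lan2"),
]


def build_links(edges):
    """Turn an edge list into a two-way adjacency map."""
    links = {}
    for a, b in edges:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def reachable(links, start, goal, failed):
    """Breadth-first search that refuses to pass through failed devices."""
    if start in failed or goal in failed:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in links.get(node, ()):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(nxt)
    return False


def broken_demands(links, demands, failed):
    return [d for d in demands if not reachable(links, d[0], d[1], failed)]


links = build_links(EDGES)
demands = [("ws1", "srv1"), ("ws1", "srv2")]
broken = broken_demands(links, demands, failed={"wan"})
print("Demands cut off by a WAN failure:", broken)
```

In this model, losing the WAN link cuts off only the demand that crosses between the two LANs; traffic that stays on the first LAN is unaffected.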
After you develop a model of your network and define the question you want to answer, you're ready to run simulations. Simulators have traditionally run on powerful UNIX workstations, but some simulators are now available for NT. Most cost between $40,000 and $100,000, and running them requires training. Because of simulators' cost and complexity, many companies contract with firms that specialize in simulations to provide the necessary tools and run the simulations.
Simulators usually use a discrete-event or an analytical approach to model network traffic. Discrete-event simulation analyzes the network traffic that each packet generates to determine the network's behavior. The analytical approach makes assumptions about network traffic before a simulation. The analytical method can be as accurate as the discrete-event method, because discrete-event simulators can't store detailed information about each packet in simulations that include the many thousands of conversations (each of which can contain many packets) that most networks generate. Because the discrete-event method is much slower than the analytical method on large networks, I recommend that you use an analytical simulator for simulations of networks with more than 50 routers.
My COMNET Predictor Tests
COMNET Predictor is an analytical modeling tool from CACI. (For more information about CACI, see http://www.caciasl.com.) COMNET Predictor runs on NT and is intuitive to install. I recently used COMNET Predictor to build a simple simulation from scratch on a lab network. I defined my network topology for the simulation, and I manually added all the network's traffic demands because my network was so simple.
Screen 1 illustrates my test network. The network consisted of two LAN segments separated by a WAN link. An NT server and workstation resided on each LAN. You can see COMNET Predictor's selection of network-building tools on the left side of the screen. To create a network, you select a device icon (such as the LAN or server icon) and drag the icon into the main window. You connect devices with links and define the links' characteristics by choosing characteristics from a predefined list (such as T1 or ISDN) or customizing the specifications. My test network contained two models of Cisco routers. COMNET Predictor was familiar with the routers' capabilities and automatically included the routers' characteristics in the simulation.
Screen 2 shows five conversations that I manually entered into my simulation. For each conversation, I entered the origin computer, destination computer, application, protocol, and rate of transfer. You can select applications from a predefined list or add your applications manually (as I did in my simulation for Lotus cc:Mail).
After I entered my network topology and network traffic, I ran a simulation. I clicked the Run Simulation icon (the stoplight in Screen 1's toolbar), and the simulation began. Because my model network was simple and had little traffic, the simulation was complete in a fraction of a second. I generated reports that examine current and forecasted network utilization and potential failures. Screen 3 shows a report that provides the average percent utilization for each device and LAN on the network during the time that my simulation's traffic demands were active. Most network devices have a limit on the number of packets they can process per second; for network devices, percent utilization measures how much of this limit network traffic is consuming. For WAN and LAN segments, percent utilization measures how much of the available bandwidth network traffic is using.
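The two utilization definitions above reduce to the same ratio of load to capacity. The following sketch applies that ratio to a device and a link; the packet rates and the 10,000 packets-per-second limit are made-up values, not figures from my test report or from COMNET Predictor.

```python
def percent_utilization(load, capacity):
    """Share of a device's or link's capacity that traffic consumes."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100 * load / capacity


# Device: packets processed per second against the device's limit.
router_util = percent_utilization(load=1_500, capacity=10_000)

# Link: bits per second carried against the link's bandwidth (a T1 here).
wan_util = percent_utilization(load=772_000, capacity=1_544_000)

print(f"router: {router_util:.1f}%  WAN link: {wan_util:.1f}%")
```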
I told COMNET Predictor I expect a 10 percent growth in traffic each year. The software calculated the demand for each device and LAN and projected the network components' percent utilization for the next 2 years. If my test were a simulation of an actual production network, this report would tell me to keep an eye on my WAN link. If I decided to increase the throughput of my WAN link, I could easily change the throughput of the model's WAN link and run another simulation to determine the effect that change would have on the network.
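A growth forecast of this kind can be approximated with simple compounding. The sketch below assumes utilization grows in step with traffic and starts from a hypothetical 80 percent WAN utilization; it doesn't reproduce COMNET Predictor's calculations or my test results.

```python
def project_utilization(current_percent, annual_growth, years):
    """Compound the current utilization by the annual growth rate.

    Returns one value per year, starting with the current year.
    """
    return [current_percent * (1 + annual_growth) ** year
            for year in range(years + 1)]


forecast = project_utilization(current_percent=80.0, annual_growth=0.10, years=2)
for year, utilization in enumerate(forecast):
    flag = "  <- watch this link" if utilization > 90 else ""
    print(f"year {year}: {utilization:.1f}%{flag}")
```

With these assumed numbers, a link that looks comfortable today approaches saturation within 2 years, which is the kind of warning the forecast report provides.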
Life Cycle Management
Conducting simulations is a very important part of life cycle management for your network. You need to carry out simulations often, because most networks' topology and traffic are constantly evolving. Figure 1 illustrates how the different stages of simulations fit into the life cycle management of your network.
To maintain a healthy network, use simulations to predict your needs for the future and develop cost-effective solutions to problems before a disaster strikes. Keep an up-to-date inventory of the devices and settings on your network. Conduct traffic analyses to keep abreast of LAN, WAN, and network device utilization. Determine how long users must wait for applications to respond, which might be part of a quality-of-service contract. Use periodic simulations to help plan your capacity for future growth and determine emergency plans to deal with network failures. In addition to your periodic tests, run simulations before you roll out new applications or hardware.
A simulation is neither the starting point nor the end point for answering your network design questions. Running simulations will incite your curiosity and generate more questions than the process answers. As you run simulations, you will learn to appreciate your network's complexity and gain a better sense of how all the components work together.