Building a high-performance Web site involves more than creating HTML pages and Active Server Pages (ASP) that meet your business requirements. To build a high-performance Web site, you must also address performance, cost, and reliability concerns. However, companies often don't have time to address these concerns because they are busy keeping up with a rapidly changing Internet.
Reality Online (a fully owned subsidiary of Reuters) was one company that went against the trend and addressed these concerns. A team of engineers at Reality Online had built the Reuters Web site (http://www.moneynet.com) and the Web-based Reuters Investor product on UNIX using Netscape Web server software, a server-side scripting language, and C-based Common Gateway Interface (CGI) programs. Under this platform, the Web site and the Reuters Investor product received about 500,000 page views per month. The engineers migrated the Web site to NT using Internet Information Server (IIS) 3.0, ASP, and the Component Object Model (COM). The site now receives more than 10 million page views per month. This article highlights the problems that Reality Online encountered with Reuters' UNIX implementation and how the engineers at Reality Online addressed those problems using NT. (For more information on the implementation, see the sidebar, "Windows NT Magazine Interviews Nicholas DiLisi.")
Reuters Investor Product
Making Reuters market data available via the Internet required a reliable, fault-tolerant solution capable of handling millions of page views per month. Reality Online, a fully owned subsidiary of Reuters, migrated its existing UNIX and Netscape implementation to a suite of Windows NT Web servers running Internet Information Server (IIS), using Active Server Pages (ASP) to communicate with its legacy UNIX-based Oracle data servers. Now, the Reuters Web site receives more than 10 million page views per month. Migrating to NT also helped Reality Online improve its Web site development cycle.
The Reuters Investor product lets other companies partner with Reuters to customize the appearance of Reuters' content and integrate Reuters market data into their Web sites using HTML pages. Reuters can use templates to customize these pages with the colors, fonts, and graphics that each partner site uses. The HTML pages support framed and frameless sites. A partner site can also use these pages to provide HTML fragments that the Reuters server can host to deliver content in a frameless Web site using existing templates for specific market data types.
Partner sites that want to display data in their own architecture can use Reuters' Data API. The Data API lets partner sites access Reuters' data servers via the Internet using an HTTP request. The first approach for integrating Reuters data lets a partner site begin using the Reuters content almost immediately through the use of HTML pages. The second approach lets the partner site migrate to the Data API for additional flexibility with the user interface.
The UNIX Days
When Reuters decided to provide information to partner sites on the Internet about 2 years ago, Reality Online was using a homegrown solution to connect Netscape Web servers with Oracle data servers, both running on the Sun Solaris platform. In 1992, Reality Online built a suite of content data servers on the UNIX platform consisting of custom applications and Oracle databases to deliver market data to its Reuters SmartInvestor desktop application. Investors used this application to track portfolios and investments via a modem connection to Reality Online's private network. Reality Online invested a lot of time and money in building the data servers; thus, connectivity between the existing data servers and the Web servers was important.
Reality Online used the Netscape Web servers to build the presentation layer for its moneynet Web site. The Web pages required a certain amount of flexibility, which made a server-side scripting language necessary. Initially, Reality Online used a homegrown scripting language. However, as the industry changed and companies released new products, maintaining and improving the homegrown language became costly. Thus, Reality Online migrated to MetaHTML, a first-generation commercial scripting product. However, like many first-generation scripting products, MetaHTML performed poorly as the number of hits to the Web site increased each month. This poor performance was the result of CGI programs using forks to initiate processes for each HTTP request the Web server received, and starting multiple processes is time-consuming.
Reality Online was also concerned about the Oracle data servers' connection to the legacy databases. To maximize server performance, Reality Online wrote C-based CGI programs and compiled the programs to the machine code level to communicate with the data servers via sockets.
Reuters used Oracle Web Server and Rogue Wave's database library to access the legacy databases. When a Web page displayed Reuters data with data from sources other than the database, the company used Rogue Wave's database library. When the Web page displayed only data from the Oracle database, the company used Oracle Web Server.
Reality Online used six Sun Microsystems SPARC 2 servers to run Netscape's Web server software. In this configuration, the Web servers averaged about 85 to 90 percent CPU utilization to deliver approximately 500,000 page views per month. Reality Online wanted to deliver millions of page views per month. However, the company would need to add six more SPARC 2 servers to deliver 1 million page views per month. For a scalable Web site, adding six more servers wasn't a cost-effective solution.
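The scaling arithmetic behind this conclusion can be checked with a short calculation. The sketch below assumes, as the paragraph does, that capacity grows linearly with the number of servers; the function name is illustrative only.

```python
# Figures from the article: six SPARC 2 servers delivered about
# 500,000 page views per month at 85 to 90 percent CPU utilization.
SERVERS = 6
PAGE_VIEWS_PER_MONTH = 500_000


def servers_needed(target_page_views: int) -> int:
    """Estimate the server count for a monthly target, assuming linear scaling."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (target_page_views * SERVERS + PAGE_VIEWS_PER_MONTH - 1) // PAGE_VIEWS_PER_MONTH


one_million = servers_needed(1_000_000)
ten_million = servers_needed(10_000_000)
print(one_million - SERVERS, "additional servers for 1 million page views")
print(ten_million, "servers in total for 10 million page views")
```

Under the same linear assumption, the 10 million page views per month that the site later reached would have required well over a hundred servers of this class.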
Other important concerns included the ability to make incremental changes to the Web content and the addition of new Web features to the Reuters Investor product. Under the UNIX implementation, releasing new templates and changing existing templates required days of preparation because users had to manually copy lists of files to each Web server. In addition, the sensitive nature of the Web site data didn't permit Reality Online to interrupt the Web site during normal trading hours. Therefore, engineers and systems administrators needed to release new content on the Web site during off-peak hours (i.e., at 3 a.m.).
Reuters examined several solutions to the problems of releasing and staging Web pages. At the time, many tools didn't support revision control on the Web server files or a publishing system for releasing files to production. If Reality Online found a usable tool, the company typically had trouble integrating the new tool with the existing Web development tools, or discovered that the tool couldn't support the number of files the Web site contained.
To address the concerns of maintaining high performance while lowering cost of hardware ownership, improving productivity using a rapid application development (RAD) approach, and simplifying the release process to production, the engineering team began looking at other platforms. Having had several positive experiences using Microsoft tools on other projects and knowing that these tools generally worked well together, Reality Online decided to migrate to NT. The engineering team knew they would need to maintain the existing site content during the migration phase.
In the past, Reality Online had used Microsoft's Visual Studio, Visual SourceSafe, and Visual C++ 5.0 to simplify the development of its Windows-based SmartInvestor product. Reality Online's biggest concern was whether these tools were ready for building Internet Web sites, as Microsoft claimed. The answer to that question was yes.
The NT Era
First, Reality Online's engineering team examined the development process to determine how to release new features onto the Web site without interrupting the Web content or making changes after hours. During the release process, Reality Online engineers used Microsoft's Content Replication (MCR) system to move ASP and HTML content from development servers to beta servers and then to production servers. To ensure that the MCR system moved content to all servers, the engineers set up the Web servers in groups. This way, the engineers can add servers to the groups without changing the release process and use MCR to update groups of servers. Using the MCR system, the engineers can update content on all the servers without interrupting the Web site. The routes the engineers created for the Web pages let them move code revisions from the development servers to the beta servers for integration testing. The engineers and systems administrators then moved the pages to the production servers without error. Reality Online uses this process for Web pages and ASP scripts that it releases into production.
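The idea of releasing to groups of servers, instead of to individual machines, can be sketched as follows. This is a minimal illustration and not the MCR system's actual interface: the ServerGroup class, the host names, and the file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ServerGroup:
    """A named group of Web servers that always receive the same content."""
    name: str
    hosts: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_host(self, host: str) -> None:
        # A new server joins the group with no content yet.
        self.hosts[host] = {}

    def publish(self, files: dict[str, str]) -> None:
        # Copy the same files to every server currently in the group.
        for deployed in self.hosts.values():
            deployed.update(files)


beta = ServerGroup("beta")
beta.add_host("beta1")
production = ServerGroup("production")
for host in ("www1", "www2"):
    production.add_host(host)

revision = {"quotes.asp": "rev 2", "news.htm": "rev 5"}
beta.publish(revision)          # integration testing happens on this group
production.publish(revision)    # the same revision then moves to production

production.add_host("www3")     # adding a server does not change the release step
production.publish(revision)
```

Because the release step addresses a group, adding a server changes the group's membership but not the procedure.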
Reality Online also needed a way to release new server objects (e.g., ActiveX objects) into production. To release COM objects to the production servers, Reality Online created an application that let systems administrators install ActiveX components remotely. Using this install application, Reality Online could package ActiveX .dll files and rely on the rules of creating COM interfaces to enhance functionality without interfering with service.
Reality Online also used the MCR system to retrieve Web pages from partner Web sites that made their content available to Reality Online for repackaging. When Reality Online began retrieving the data and storing it locally on the Reuters Web servers, the company no longer needed to depend on partner sites to provide financial content with the same high level of availability that Reuters provided to its partner sites. Retrieving partner sites' Web content locally also let Reality Online better integrate third-party content with standard Reuters content. Reality Online runs the MCR system periodically (depending on how often the company updates Web content) to capture a snapshot of the data that exists on partner Web sites. The Reuters Web servers automatically process the data into databases for repackaging into available HTML pages.
Solving the Server-side Scripting and CGI Programming Problems
Most of the moneynet Web site content is dynamic. Data changes every 7 minutes (e.g., delayed quotes) to every second (e.g., realtime quotes). Screen 1 shows the moneynet home page, and Screen 2 shows one of the quote pages you can access from the Web site.
To improve the performance of Web content that the Web server doesn't need to generate with every Web request, Reality Online built a caching mechanism. This page-caching mechanism let the engineers cache portions of the Web page between Web requests. Engineers can pull these Web page portions from memory instead of routing the request for data back to the data server. This mechanism has improved performance dramatically.
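A page-fragment cache of this kind can be sketched in a few lines. The article doesn't describe Reality Online's actual implementation, so the FragmentCache class, the expiry policy, and the 60-second lifetime below are assumptions chosen for illustration.

```python
import time


class FragmentCache:
    """Keep rendered page fragments in memory for a limited time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, fragment)

    def get(self, key, ttl_seconds, build):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no trip to the data server
        fragment = build()           # missing or expired: rebuild the fragment
        self._entries[key] = (now + ttl_seconds, fragment)
        return fragment


calls = []


def build_summary():
    # Stand-in for an expensive request to a data server.
    calls.append(1)
    return "<div>Market summary</div>"


fake_now = [0.0]                     # controllable clock for the demonstration
cache = FragmentCache(clock=lambda: fake_now[0])
first = cache.get("summary", 60, build_summary)
second = cache.get("summary", 60, build_summary)   # served from memory
fake_now[0] = 61.0
third = cache.get("summary", 60, build_summary)    # expired, so rebuilt
print(len(calls), "builds for 3 requests")
```

The lifetime would be chosen per fragment: content that changes every few minutes can be cached far longer than realtime quotes.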
Reality Online implemented the business logic as a series of ActiveX components that the company built using VC++. Reality Online used these components to develop the ASP on Reuters' Web site. By using VC++, Reality Online built on the skills of the engineers who worked on the SmartInvestor application.
By implementing the business logic as ActiveX objects, Reality Online minimized the amount of processing required on the ASP side. The company wrote the ASP coding primarily to fill in the dynamic portions of a Web page. Reality Online's engineers placed a Web page's HTML in an ASP file. This coding resulted in replacing the static portions of the Web page that displayed dynamic data with calls to the ActiveX objects.
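The division of labor described here, static HTML in the page and compiled components supplying the dynamic values, looks roughly like the following sketch. It uses Python's string templates in place of ASP, and QuoteComponent is a hypothetical stand-in for one of the ActiveX business objects; the symbol and price are made up.

```python
from string import Template


class QuoteComponent:
    """Hypothetical stand-in for a compiled business-logic component."""

    def __init__(self, prices):
        self._prices = prices

    def last_price(self, symbol):
        # All formatting and lookup logic lives in the component, not the page.
        return f"{self._prices[symbol]:.2f}"


# The page keeps its static HTML; only the dynamic slots are placeholders.
PAGE = Template("<html><body><h1>$symbol</h1><p>Last: $last</p></body></html>")


def render_quote_page(component, symbol):
    # The page script does nothing except fill the slots with component calls.
    return PAGE.substitute(symbol=symbol, last=component.last_price(symbol))


html = render_quote_page(QuoteComponent({"XYZ": 12.5}), "XYZ")
print(html)
```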
Supporting Multiple Browsers in One ASP Code
Writing high-performance HTML code that worked on multiple browsers was a major task. Reality Online tested pages on each supported browser and on multiple operating systems (OSs). Reality Online paid particular attention to page design. The engineers tried to minimize the use of HTML tables in the page layout so that the browser could render Web pages as efficiently as possible. Reality Online made sure tables didn't get too large and gave tables size tags to improve their rendering performance. Reality Online might have used frames to solve the problem; however, the company needed to support non-framed pages for several of Reuters' partner sites.
Connecting to the Oracle Databases
Reality Online needed to access the valuable data in the legacy Oracle databases and wanted to use only one method to access the data after migrating to NT. The company chose to use Microsoft's ActiveX Data Objects (ADOs). ADOs come as part of a set of Microsoft-supplied ActiveX objects and let any language that can call ActiveX objects access a variety of data sources. The ADOs provided a simpler interface for accessing data in the Oracle databases than Open Database Connectivity (ODBC) provided. Using the ADOs, Reality Online can access the databases consistently from the ASP script and VC++. Reality Online's engineers had to learn only one API to access the databases, which let them reuse some of the earlier code they had written in VBScript and C++.
The ADOs also helped Reality Online during new feature development. The Reality Online engineers could prototype the new feature functionality using Microsoft SQL Server 6.5 or Access 2.0 on a local computer. Thus, engineers were able to develop Web sites regardless of which database handled data storage for that Web site. After the engineers completed development, the database administrators ported the database tables to the Oracle databases and stored procedures on the databases.
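The benefit of coding against one data-access interface can be illustrated with Python's DB-API, which plays a loosely analogous role to ADO. This is an analogy and not ADO itself: the positions table and the load_positions function are hypothetical, and an in-memory SQLite database stands in for the local prototype database.

```python
import sqlite3


def load_positions(connection):
    """Read positions through the generic cursor interface only."""
    cursor = connection.cursor()
    cursor.execute("SELECT symbol, shares FROM positions ORDER BY symbol")
    return cursor.fetchall()


# Local prototype database; in production a connection to another
# DB-API-compliant database could be passed to the same function.
prototype = sqlite3.connect(":memory:")
prototype.execute("CREATE TABLE positions (symbol TEXT, shares INTEGER)")
prototype.executemany(
    "INSERT INTO positions VALUES (?, ?)",
    [("XYZ", 25), ("ABC", 100)],
)
rows = load_positions(prototype)
print(rows)
```

Only the code that opens the connection knows which database is in use; the feature code above it stays the same when the tables move.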
Forwarding HTTP Requests
Originally, Reality Online used round-robin Domain Name System (DNS) as a load-balancing mechanism to forward HTTP requests to a suite of proxy servers. Round-robin DNS lets a suite of servers appear as one domain name. Each time a user requests the site, the DNS lookup returns the next IP address in the rotation, so successive requests go to different proxy servers. Reality Online used a firewall to route requests to the proxy servers from partner sites connecting via the Internet. The proxy servers forwarded these requests to the Web servers, as Figure 1 shows. The proxy servers cached certain types of requests (e.g., multiple requests for .gif images and static HTML pages) and provided the connection to Reality Online's server outside the firewall. In this old network configuration, a one-to-one relationship existed between the proxy servers and the Web servers. Thus, if one Web server's load was too heavy or the Web server was down, the user would experience an outage, even though several other Web servers were available.
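The rotation that round-robin DNS performs can be sketched as follows. The class name is illustrative, and the addresses come from the 192.0.2.0/24 range reserved for documentation; they are not Reality Online's addresses.

```python
from collections import deque


class RoundRobinDns:
    """Hand out one address per lookup, rotating through a fixed list."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        answer = self._addresses[0]
        self._addresses.rotate(-1)   # move the address just used to the back
        return answer


dns = RoundRobinDns(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
answers = [dns.resolve() for _ in range(4)]
print(answers)
```

The rotation has no knowledge of server health. If the machine behind 192.0.2.2 is down, every third lookup still points users at it, which is the weakness the paragraph describes.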
Under the new network configuration, Reality Online wanted a more intelligent proxy server that only forwarded the request to an available Web server. The company used Microsoft's Proxy Server 2.0 for NT. Reality Online wrote a custom Internet Server API (ISAPI) .dll file that the proxy server uses to send round-robin requests to the Web servers. If a Web server isn't functioning, the proxy server automatically removes it from the round-robin rotation. The nonfunctioning Web server doesn't receive any more requests until it's functional again. Thus, Reality Online can bring a Web server offline for maintenance during business hours without interrupting the site's availability.
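The behavior of the health-aware rotation can be sketched in a few lines. The article doesn't show the custom ISAPI .dll file, so the class below, the server names, and the health-check callback are assumptions made for illustration.

```python
class HealthAwareBalancer:
    """Round-robin over servers, skipping any that fail the health check."""

    def __init__(self, servers, is_healthy):
        self._servers = list(servers)
        self._is_healthy = is_healthy
        self._next = 0

    def pick(self):
        """Return the next healthy server, or None if every server is down."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if self._is_healthy(server):
                return server
        return None


down = {"web2"}                       # web2 is offline for maintenance
balancer = HealthAwareBalancer(
    ["web1", "web2", "web3"],
    is_healthy=lambda server: server not in down,
)
during_outage = [balancer.pick() for _ in range(4)]

down.clear()                          # web2 is functional again
after_recovery = [balancer.pick() for _ in range(3)]
print(during_outage, after_recovery)
```

Because health is checked on every pick, a server rejoins the rotation as soon as it passes the check again, with no configuration change.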
Testing the Web Servers
After the engineers developed new content for the moneynet Web site and the Reuters Investor product, they tested the program code on a series of NT-based development servers before using the MCR system to move the content to beta servers for integration with the rest of the Web site. Because Reality Online's beta servers and production servers are identical, the company uses the beta servers as backup servers when a hardware problem exists on the production servers. Reality Online performed functionality tests and ran a set of scripts to check the performance of each area of Reuters' Web site. The performance scripts issued an HTTP request to a Web server to determine the server's maximum capacity and throughput. Reality Online used this information to set up the monitoring scripts that the operations group used.
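A performance script of this general shape can be sketched as follows. It is a simplified, sequential illustration and not Reality Online's test suite: finding a server's true maximum capacity would require many concurrent clients, and the function names are hypothetical.

```python
import time
from urllib.request import urlopen


def fetch_status(url):
    """Issue one HTTP GET and return the response status code."""
    with urlopen(url, timeout=10) as response:
        return response.status


def measure_throughput(send_request, count):
    """Send `count` requests in sequence; return (successes, requests per second)."""
    started = time.perf_counter()
    successes = 0
    for _ in range(count):
        try:
            if send_request() == 200:
                successes += 1
        except OSError:  # urllib's URLError and HTTPError derive from OSError
            pass
    elapsed = time.perf_counter() - started
    rate = count / elapsed if elapsed > 0 else float("inf")
    return successes, rate


# Demonstration with a stand-in request that always succeeds.
successes, rate = measure_throughput(lambda: 200, 50)
print(successes, "successful requests")
```

To test a real server, pass a callable such as `lambda: fetch_status(url)` with the URL of the page under test. The measured rates can then serve as baselines for monitoring thresholds.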
Monitoring Performance and Site Availability
To ensure that Web content is available 24 hours a day, Reality Online built a suite of monitoring plugins for Microsoft's Performance Monitor. Reality Online's operations staff uses these custom monitors to check connectivity between the NT Web servers and the UNIX data servers. Reality Online also built a suite of custom monitoring tools using HP's OpenView ManageX to monitor and restart processes (e.g., IIS) within the server. If a Web server doesn't respond, ManageX attempts to restart the Web server process. If the process restart fails, the monitors issue a page to the on-call operations staff.
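The escalation policy described above (check, restart, then page a person) can be expressed compactly. This sketch doesn't use ManageX or Performance Monitor interfaces; the three callbacks are hypothetical hooks that a real monitor would supply.

```python
def check_and_recover(is_responding, restart, page_on_call):
    """Apply the policy: do nothing if healthy, else restart, else page staff."""
    if is_responding():
        return "ok"
    restart()                         # first line of defense: automatic restart
    if is_responding():
        return "restarted"
    page_on_call("Web server did not recover after restart")
    return "paged"


# Demonstration: a server that is down but comes back after a restart.
state = {"up": False}
pages = []
outcome = check_and_recover(
    is_responding=lambda: state["up"],
    restart=lambda: state.update(up=True),
    page_on_call=pages.append,
)
print(outcome, len(pages))
```

Paging only after a failed restart keeps routine recoveries from waking the on-call staff.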
Reality Online also checks connectivity to the servers from the Internet. The company monitors Web server response and page display over a dial-up Internet connection. Reality Online created several scripts that the company runs periodically to check connectivity latency and response time to the site via an Internet connection. If performance scores are below a preset limit, the Web monitor process automatically notifies systems administrators in Reality Online's data center. Reality Online also uses Keynote Systems (a service that monitors and reports Web site response times) to monitor Web site performance from remote US locations and international locations.
Reality Online plans to concentrate on improving the performance of several key content areas on Reuters' Web site (e.g., Portfolio Tracker, News). Using ASP sessions, Reality Online can cache certain data during a Web session and use that data to improve Web site performance. Reality Online also plans to build components that the company can integrate with ASP scripts so that partner sites using NT can use the site more easily. These components will make the process of developing Web pages that communicate with the Reuters data servers simpler for partner sites. To improve the site's reliability, Reality Online will migrate to IIS 4.0. By upgrading to that version, Reuters can run an application in its own address space so that a problem in that application doesn't affect other Web applications running on the server.
Windows NT Magazine Interviews Nicholas DiLisi
Nicholas DiLisi was the vice president of development at Reality Online responsible for migrating the Reuters Web site from UNIX to Windows NT. He currently works for VerticalNet. Here are his thoughts on the project.
What did you like about the project's implementation?
Using Microsoft's development tools helped us cut our development cycle in half. The engineers developed new features on their computers, then integrated the new functionality on our development servers. This new process helped us solve the stability problems we had experienced in the past.
What didn't you like about the project's implementation?
Trying to support all the versions of available Web browsers was a headache.
What would you have done differently on the project?
We would have converted all functionality over to NT before we switched to the new architecture to save the time we spent supporting a Web site statistics process that we planned to replace. Also, we should have considered redesigning the user interface during the migration period. After we completed the migration, we decided to update the user interface.
What advice can you give your peers?
When building a mission-critical Web site, pay particular attention to performance and server capacity. Make performance testing part of the release process for all features you release on the Web site. Monitor your servers for CPU and network utilization, and manage your servers to 60 percent utilization before you add more capacity. Also, evaluate your Web site's performance from a location outside the company. Many companies seem to look at their Web sites from their desktop computers, which is usually over a dedicated T1 connection.