Everyone realizes that the Windows NT SAM is a domain-scalability bottleneck, given its practical limit of about 40,000 user accounts in any one domain. Some companies have probed the upper limits of the SAM and have built very large domains. But they've found that these domains are difficult to manage. To work around the limitations of the SAM, many implementations have included far too many account domains.
Windows 2000 is based on Active Directory (AD), a repository for user accounts and many more object types. You can be confident that Microsoft won't make the same mistake twice—AD is more scalable than the SAM is. How much more scalable is an interesting question. How many objects can you store in one domain, how large and manageable is the database, and what type of performance can you expect from AD?
To find out, we created a very large AD database, whose capabilities we demonstrated at Comdex in Las Vegas in November 1999 and at the Win2K launch in San Francisco in February 2000. Our AD scalability demonstration shows that the database can cope with 100 million entries in a realistic production environment. Before we show you how we built the demonstration database and reveal what the building process taught us about AD, let's review some AD basics.
AD Database Basics
AD is a transactional database that features a write-ahead logging model and uses Microsoft Extensible Storage Engine 97 (ESE97) technology. Microsoft Exchange Server 5.5's Information Store (IS) and Directory Store also use ESE97. Although some small differences exist between Exchange Server's and AD's ESE97 implementations, the lessons that Exchange administrators have learned over the years are good preparation for AD management. Exchange 2000 Server uses the ESE98 engine, a newer ESE version that supports database partitioning and the streaming file for Internet content. Currently, AD doesn't need to partition its databases and holds only record-oriented data, so there's no point in AD using ESE98.
We've accumulated a great deal of experience with ESE. We know it's scalable because many Exchange 5.5 servers support databases larger than 100GB. The database can grow large without compromising performance if administrators carefully balance the I/O load. Most important, ESE can deal with hardware failures through soft or hard recoveries of data from its transaction logs. In a soft recovery, the system fails for some reason, but you don't need to restore the AD database file from backup. A hard recovery is typically caused by a catastrophic disk failure that requires you to restore the AD database from backup.
Figure 1 shows AD's architecture. You access AD through various client interfaces. Clients such as Microsoft Outlook 2000 use Messaging API (MAPI), whereas Win2K's standard Find Users feature uses Lightweight Directory Access Protocol (LDAP). Programs such as ADSIEDIT (a tool from the Microsoft Windows 2000 Resource Kit that lets you examine information about AD objects) use Active Directory Service Interfaces (ADSI) as their main programming interface.
The directory service agent (DSA) handles transactions and communicates through the database layer to ESE. The DSA and database layers represent the AD schema and functions; ESE is concerned only with managing information within the database. The database layer is responsible for taking data from the DSA and transforming it into a format that ESE understands.
The files on disk include ntds.dit, which is the AD database; a set of transaction logs; a checkpoint file that records the last buffer committed to the database; and a temporary database file. These files are comparable to dir.edb (the ESE97 database that the Exchange 5.5 Directory Store uses) and its attendant transaction logs and checkpoint file.
Figure 2 shows some of the files we used in the AD scalability demonstration. Note the size of the database (ntds.dit), which contains 100 million objects. Notice also that the transaction logs are 10MB. In contrast, Exchange transaction logs are 5MB. The difference is due to the size of the records within the databases. Exchange uses a 4KB record; AD uses an 8KB record, largely because AD holds more than 4KB of information for an average user account. Microsoft could have specified a 4KB record size for AD, but the result would have been a large number of page overflows and an inefficient internal database structure. Larger records mean that each transaction can capture more data, so Microsoft increased the log-file size to avoid some file-creation overhead.
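The relationship between record size and log-file size can be checked with a little arithmetic. The sketch below uses only the sizes quoted above; the function name is ours, and treating a log file as a whole number of record-sized units is a simplification for illustration, not a description of ESE's actual log format.

```python
KB = 1024
MB = 1024 * KB


def records_per_log(log_size_bytes: int, record_size_bytes: int) -> int:
    """Return how many record-sized units fit in one transaction log file."""
    return log_size_bytes // record_size_bytes


# Sizes quoted in the article: Exchange 5.5 (5MB logs, 4KB records)
# and AD (10MB logs, 8KB records).
exchange_units = records_per_log(5 * MB, 4 * KB)
ad_units = records_per_log(10 * MB, 8 * KB)

print(exchange_units, ad_units)  # 1280 1280
```

Doubling both the record size and the log size keeps the ratio the same, so an AD log fills after roughly the same number of record-sized writes as an Exchange log does.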
I/O Patterns and Transaction Logs
Observation shows that 70 to 90 percent of all operations that AD performs on ntds.dit are reads. These numbers aren't surprising because AD's basic function is user authentication, which requires a check of passwords held in AD against user-supplied credentials. Even an application such as Exchange 2000, which puts information in AD, performs more reads than writes because it tends to retrieve far more information than it updates.
Access to ntds.dit is multithreaded and asynchronous. In other words, many threads run at one time to service requests from different clients and applications. Write operations return control to ESE immediately, but ESE might wait for a while before applying an update to the database.
The system reads log files only during a soft or hard recovery operation, when it must replay transactions. Writes to the log files are single-threaded and synchronous. One thread controls all log writes, so only one write operation can be in progress at any time. The calling thread must wait until a write is complete before the thread can proceed. AD accesses log files sequentially and appends data to the end of the current log.
When you add, modify, or delete an AD object, the database engine first writes the transaction to a set of buffers, forming an in-memory cache, then immediately captures the transaction in the current transaction log (edb.log). When both operations are complete, the system considers the transaction committed. This implementation ensures that the data is recoverable before the system makes any attempt to write it to ntds.dit.
Lsass.exe (the Local Security Authority—LSA—process) controls all AD transactions. When system load allows, lsass.exe checks the memory cache for unsaved pages, saves those pages to the database, and moves the checkpoint pointer while the system saves buffers. Should a system crash or disk fail at this point, you can recover the data from the transaction logs.
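The commit, lazy-save, and recovery sequence described above can be modeled in a few lines. This is a toy sketch of write-ahead logging in general, not ESE's implementation: the class and method names are hypothetical, Python lists and dictionaries stand in for edb.log, the memory cache, and ntds.dit, and the checkpoint is reduced to a simple count of log entries already saved.

```python
class TinyWalStore:
    """Toy model of write-ahead logging; all names here are illustrative."""

    def __init__(self):
        self.log = []        # stands in for edb.log (append-only, durable)
        self.cache = {}      # in-memory buffers holding unsaved changes
        self.database = {}   # stands in for ntds.dit
        self.checkpoint = 0  # number of log entries already saved to the database

    def commit(self, key, value):
        """Record a change in the cache and the log; it is now recoverable."""
        self.cache[key] = value
        self.log.append((key, value))

    def flush(self):
        """Lazily save cached changes to the database and advance the checkpoint."""
        self.database.update(self.cache)
        self.cache.clear()
        self.checkpoint = len(self.log)

    def crash(self):
        """Simulate a failure: everything held only in memory is lost."""
        self.cache.clear()

    def recover(self):
        """Soft recovery: replay log entries written after the checkpoint."""
        for key, value in self.log[self.checkpoint:]:
            self.database[key] = value
        self.checkpoint = len(self.log)


store = TinyWalStore()
store.commit("CN=A", "first")
store.flush()
store.commit("CN=B", "second")  # committed but not yet saved to the database
store.crash()
store.recover()
print(sorted(store.database))   # ['CN=A', 'CN=B']
```

The second change survives the simulated crash because it reached the log before the system attempted to write it to the database file.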
To write to its databases and transaction logs, AD uses exactly the same principles as Exchange does. Like Exchange, AD supports circular and noncircular logging. In circular logging, the system recycles as many as 20 transaction logs in a set to hold transactions. In noncircular logging, Exchange maintains a complete set of transaction logs until the set is removed after a successful full backup (ideally, you should perform backups daily). However, AD automatically cleans up transaction logs every 12 hours, which doesn't happen in Exchange.
AD comprises organizational units (OUs), into which you place objects. Each object has a distinguished name (DN), which is a fully qualified LDAP representation of the object. A DN is composed of a set of relative distinguished names (RDNs); an object's RDNs together describe the object's position in the directory hierarchy.
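To make the DN and RDN relationship concrete, the following sketch joins a list of RDNs into a DN and derives the DN of an object's parent container. The helper names and the contact name are hypothetical, and the naive comma handling assumes that no RDN value contains an escaped comma.

```python
def build_dn(rdns):
    """Join RDNs, ordered from the object up to the domain root, into a DN."""
    return ",".join(rdns)


def parent_dn(dn):
    """Return the DN of the containing object by dropping the first RDN.

    Assumes no RDN value contains an escaped comma.
    """
    _, _, rest = dn.partition(",")
    return rest


dn = build_dn(["CN=Jane Doe", "OU=NH", "DC=USA", "DC=Compaq", "DC=com"])
print(dn)             # CN=Jane Doe,OU=NH,DC=USA,DC=Compaq,DC=com
print(parent_dn(dn))  # OU=NH,DC=USA,DC=Compaq,DC=com
```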
The first challenge in testing how well AD scales was to find a lot of data to load in the database. We could have written a program to generate 100 million names and addresses, but using real data generates a more realistic load. Plus, we can use real data for other purposes. We obtained a set of seven CD-ROMs containing US phone numbers organized by state in zipped files from infoUSA.com. Obviously, we couldn't just dump the raw files into AD, so we wrote a program to read the data and create LDAP Data Interchange Format (LDIF) load files.
LDIF files are simple text files that contain commands to insert, update, or delete records in an LDAP-compliant database. We generated a separate LDIF file for each state. The first command in the file created an OU for the records. For example, the following command generated the OU for New Hampshire:
# Create the OU
dn: OU=NH, DC=USA, DC=Compaq, DC=com
changetype: add
objectClass: organizationalUnit
Subsequent file commands created an AD object for each phone number. The commands read a limited number of attributes from the input file and populated AD with them. The commands also performed some processing to generate a unique DN for each object in the OU and to eliminate records without phone numbers. Listing 1 shows the LDIF command to add a typical contact object.
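The processing described above (skip records without phone numbers, generate a unique DN within the OU, emit an add command) might look something like the following. This is a sketch in Python rather than our original program; the input field names (first, last, phone), the choice of attributes, and the numbering scheme for duplicate names are assumptions made for illustration, and the code doesn't escape special characters in names.

```python
def contact_to_ldif(record, ou_dn, seen_names):
    """Return an LDIF add entry for one phone record, or None if it has no phone.

    record     -- dict with hypothetical keys 'first', 'last', 'phone'
    ou_dn      -- DN of the OU that will hold the contact
    seen_names -- dict tracking names already used in this OU
    """
    phone = record.get("phone", "").strip()
    if not phone:
        return None  # eliminate records without phone numbers

    base = f"{record['first']} {record['last']}".strip()
    count = seen_names.get(base, 0) + 1
    seen_names[base] = count
    # Append a counter to repeated names so that each DN in the OU is unique.
    cn = base if count == 1 else f"{base} {count}"

    lines = [
        f"dn: CN={cn},{ou_dn}",
        "changetype: add",
        "objectClass: contact",
        f"cn: {cn}",
        f"givenName: {record['first']}",
        f"sn: {record['last']}",
        f"telephoneNumber: {phone}",
    ]
    return "\n".join(lines) + "\n"


seen = {}
ou = "OU=NH,DC=USA,DC=Compaq,DC=com"
rows = [
    {"first": "Jane", "last": "Doe", "phone": "603-555-0100"},
    {"first": "Jane", "last": "Doe", "phone": "603-555-0101"},
    {"first": "John", "last": "Roe", "phone": ""},
]
entries = [e for e in (contact_to_ldif(r, ou, seen) for r in rows) if e]
# LDIF separates entries with a blank line.
print("\n".join(entries))
```

Keeping the duplicate counter per OU matters because a DN only has to be unique within its container; the same name can appear once in every state's OU.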
After generating the LDIF files, we used the LDIFDE utility to load them into AD. Part of the resource kit, LDIFDE can import data to and export it from any LDAP directory, including AD. We used the following command to import the New Hampshire data:
C:\> LDIFDE -k -y -i -f NH.LDF
The switches instruct LDIFDE to ignore error messages warning that the object already exists (-k), use lazy commit to write to the AD (-y), import the data (-i), and specify the import file nh.ldf (-f). In a lazy commit, LDIFDE loads data into memory, then proceeds without receiving confirmation that the database engine has committed the data to disk. The data is committed later, when the system load permits.
The delay is acceptable if you protect the data by isolating the transaction logs from the database and ensuring that any write-back cache used on the storage controller has battery backup. While we were loading data for the AD demonstration at Comdex, a complete power failure occurred. When the power was restored, the storage controller fed the data held in the battery-protected cache to the system and the load proceeded without any lost transactions.
Processing and loading each state's LDIF file took us from 1 to 8 hours, depending on the size of the state. So, we created a Visual Basic (VB) program to scan the directory in which the LDIF files were generated and load them sequentially as the files became available. Because many records had no phone numbers, we loaded two states (Texas and California) twice to achieve the magic number of 100 million records. The whole loading process took 3 weeks to complete, but we weren't loading continuously during that period. We had to take time out to tune the load program and perform various other tasks.
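The idea behind the loader is simple: scan the output directory, import any LDIF file not yet loaded, and remember what has been done. The sketch below expresses that idea in Python rather than VB and is not our original program. The function names are hypothetical, the switches are the ones shown in the LDIFDE command above, and the run parameter exists only so the logic can be exercised without LDIFDE installed.

```python
import subprocess
from pathlib import Path


def build_import_command(ldif_path):
    """Build the LDIFDE import command using the switches described earlier."""
    return ["ldifde", "-k", "-y", "-i", "-f", str(ldif_path)]


def load_pending(directory, loaded, run=subprocess.run):
    """Import every .ldf file in directory that isn't already in loaded.

    Files are processed one at a time, in name order. Returns the updated
    set of loaded file names so that the caller can invoke the function
    again later as new files appear.
    """
    for path in sorted(Path(directory).glob("*.ldf")):
        if path.name in loaded:
            continue
        run(build_import_command(path), check=True)  # wait for each import
        loaded.add(path.name)
    return loaded
```

Calling load_pending repeatedly, for example once every few minutes, picks up each state's file as soon as the generation step finishes writing it. A production version would also need to confirm that a file is complete before importing it.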
The infoUSA.com CD-ROMs grouped phone number records by state, and we loaded each state into its own OU. We made no effort to break a state into counties or cities and use a separate OU for each county or city. The basic OU structure didn't affect lookups, and a last-name search usually took less than a second to complete, except when we specified a common name such as Smith without any other criteria. Figure 3 shows the results of a search for an uncommon surname (Capellas) in the state of Texas. As you can see, AD found three records (out of 5,965,164 Texas phone number records). The speed and efficiency of the searches prove the power of the ESE database engine. You can perform a search against our database yourself. To find out how, see the sidebar "A Web Demo."
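A last-name search of this kind comes down to an LDAP filter applied beneath a search base such as the Texas OU. The sketch below builds such a filter string; it assumes that the surname is stored in the sn attribute and that the objects are of class contact, and the function names are ours. The escaping follows the standard LDAP rules for special characters in filter values.

```python
def escape_filter_value(value):
    """Escape characters that have special meaning in an LDAP search filter."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def surname_filter(surname):
    """Build a filter that matches contact objects with the given surname."""
    return f"(&(objectClass=contact)(sn={escape_filter_value(surname)}))"


# Search base for Texas, following the OU naming pattern used for New Hampshire.
search_base = "OU=TX,DC=USA,DC=Compaq,DC=com"
print(surname_filter("Capellas"))  # (&(objectClass=contact)(sn=Capellas))
```

Restricting the search base to one state's OU and matching on an indexed attribute is what keeps searches like this fast even in a very large directory.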
OUs are designed to break up AD data into manageable chunks. Loading more than 10,000 objects into one OU is a bad idea because the performance of standard management tools such as the Microsoft Management Console (MMC) AD Users and Computers snap-in degrades significantly if you ask these tools to fetch large amounts of data each time an OU is expanded. By default, the MMC AD snap-ins fetch 2,000 records when they open an OU. You can set an option to fetch more records, but this will slow down operations. It's best to design an OU structure so that each OU stores fewer objects. For example, our US phone number AD would have been far more efficient if we had established an OU for each county within a state and further subdivided some large counties.
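One way to plan such a structure is to group records by county and split any county that exceeds the 10,000-object guideline into numbered OUs. The sketch below shows the idea; the county field, the OU naming scheme, and the function name are assumptions for illustration, because we didn't subdivide states in our demonstration.

```python
from collections import defaultdict


def assign_ous(records, state_dn, max_per_ou=10_000):
    """Map each county-level OU DN to the records it should hold.

    Counties with more than max_per_ou records are split into
    numbered OUs (for example, 'Harris 1', 'Harris 2').
    """
    by_county = defaultdict(list)
    for rec in records:
        by_county[rec["county"]].append(rec)

    plan = {}
    for county, recs in by_county.items():
        for start in range(0, len(recs), max_per_ou):
            part = start // max_per_ou + 1
            name = county if len(recs) <= max_per_ou else f"{county} {part}"
            plan[f"OU={name},{state_dn}"] = recs[start:start + max_per_ou]
    return plan


# A tiny limit makes the splitting visible with only a handful of records.
sample = [{"county": "Travis"}] * 5 + [{"county": "Bexar"}] * 2
plan = assign_ous(sample, "OU=TX,DC=USA,DC=Compaq,DC=com", max_per_ou=2)
for ou_dn, recs in plan.items():
    print(ou_dn, len(recs))
```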
Each phone number is represented in the AD as a separate contact object. Contacts are smaller than user objects because they aren't Win2K security principals. We could have created user objects rather than contact objects but chose not to because the load would have been slower. The resulting database would also have been larger (maybe twice as large), but searches would likely take the same amount of time because AD uses indexes effectively when searching attributes such as a last name.
AD replicates information between domain controllers to keep the directory in a state of loose consistency. The exact state of consistency depends on how often the domain controllers replicate, how many domain controllers are involved in replication, and the number of changes that occur to the data. Creating a very large directory on one server establishes a potential single point of failure, so we added a second server to our configuration.
You use the Dcpromo procedure to promote a Win2K server to become a domain controller. The promotion process replicates a complete copy of AD to the new domain controller. The load program can create new objects at slightly more than 130 objects per second, or nearly half a million objects per hour; however, replication proceeds much more slowly, at about 30 objects per second, probably because remote procedure calls (RPCs) send each object individually from an existing domain controller to the server you're promoting. The default size of a replication packet is roughly 900 objects. You can increase the packet size to send more objects at one time, but this action increases the amount of memory the process uses. As Figure 4 shows, promoting a server to a domain controller in a domain that hosts a very large AD results in a lot of data transfer, so the operation takes a long time to complete. In our case, replication finished after 7 days. Most Win2K sites don't need to replicate this much data, but replication is clearly an area that deserves attention as Microsoft continues to tune AD.
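The load-rate figures are easy to check. The short calculation below converts the 130-objects-per-second load rate into an hourly figure and estimates how long a continuous load of 100 million objects would take at that rate; the function names are ours.

```python
def objects_per_hour(rate_per_second):
    """Convert a per-second rate into objects per hour."""
    return rate_per_second * 3600


def days_at_rate(total_objects, rate_per_second):
    """Return the days needed to process total_objects at a constant rate."""
    return total_objects / rate_per_second / 86_400


print(objects_per_hour(130))                      # 468000
print(round(days_at_rate(100_000_000, 130), 1))   # 8.9
```

The hourly figure matches the "nearly half a million" quoted above, and roughly 9 days of continuous loading fits within the 3 weeks the load took, given that we weren't loading continuously.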
To be effective, NT domain controllers don't require a high-performance configuration. Some domain controllers are based on old 486-class systems equipped with 64MB of memory and a small disk, a setup that is sufficient to provide authentication services for a small domain. Win2K domain controllers also authenticate client access, but the AD replication mechanism and the transactional nature of the database mandate a more robust hardware configuration. The good news is that effective Win2K network designs feature far fewer domain controllers than NT networks do.
Some Win2K applications make specific demands on AD. For example, Exchange 2000 stores all its information about mailboxes, stores, and connectors in AD and accesses a Global Catalog (GC) server to provide the Global Address List (GAL) to clients and to route messages. Therefore, in a Win2K project that involves Exchange 2000, you must place GCs so that every Exchange 2000 server can easily connect to one.
Win2K domain controllers and GCs need the same type of hardware configuration as Exchange 2000 or Exchange 5.5 servers need. For large domains, a typical domain controller should have two CPUs, 256MB of memory, a RAID 1 mirror set for the AD logs, and a RAID 5 set for the AD database. "AD Test Hardware Configuration" details the hardware we used in our scalability test. Figure 5 shows the disk configuration on the server. As with Exchange, the database and the transaction logs must be on separate volumes. If both are on a single drive, failure of that drive will render both the database and the transaction logs unavailable and result in data loss.
The domain controller configuration we've described is a basic one—Win2K binaries and other application files require additional disks. A RAID 0+1 volume (i.e., striping with mirroring) provides better I/O performance and is appropriate for a server hosting a large database that you expect to be heavily accessed, such as a GC that serves several large Exchange 2000 servers.
Remember, Exchange 2000 depends on a GC to provide the GAL to clients and to make every routing decision that is necessary to deliver a message to a user. The GC is the definitive source for discovering which server holds a user mailbox, and the Exchange 2000 routing engine must access the GC to determine how best to process each message. The efficient use of a cache for recently accessed addresses mitigates the potential performance impact of all the GC lookups, but a cache can't compensate for an underpowered system configuration.
Most Win2K administrators won't be interested in running an AD as large as our test database. However, as the pace speeds up in the application service provider (ASP) race to deliver services based on AD-enabled applications such as Exchange 2000, an obvious need is developing for the ability to build and manage directories that host millions of objects. Our AD scalability demo proves that AD can host millions of objects and deliver good performance at the same time, provided the implementation is well planned and hosted on the right type of hardware.
AD Test Hardware Configuration
Compaq ProLiant 8500 with eight Pentium III Xeon 550MHz processors, a 2MB cache, and 2GB of 100MHz Error-Correcting Code (ECC)-protected SDRAM DIMM memory
Compaq StorageWorks ESA12000 storage controller with four I/O subsystems, each protected by a 1GB nonvolatile mirrored ECC write-back cache with battery backup
Forty-eight 18GB Ultra SCSI disk drives