Last month, I went shopping for new servers for my test lab. I haven't bought server-related equipment in several years, so I wondered whether buying computer hardware had gotten any easier. The computer industry recently appeared in the Better Business Bureau's (BBB's) list of the top 10 businesses with the most customer complaints, beating even used-car dealers! Unfortunately, my hardware-buying experience supported the BBB's findings.
My servers are just test machines, so I didn't need anything snazzy: three EIDE drives per system (so I can play with mirroring and software RAID) and 1GHz processors. I expected to run into trouble on two points: Error-Correcting Code (ECC) memory (for reliability) and Preboot Execution Environment (PXE) boot support for access to Microsoft Remote Installation Services (RIS) servers. I found good news on the ECC front and bad news on the PXE front.
ECC is an essential part of any Windows NT-based system. When IBM PCs first appeared, they had a memory feature called parity checking that used an extra bit to detect memory failures. Unfortunately, parity checking could only find memory errors, not fix them, so the feature simply reported the error, then locked up the system. About 10 years ago, hardware vendors started shipping systems without parity bits. In the process, they saved a few bucks and prevented support phone calls from irate users whose systems crashed. Without parity checking, whenever a memory failure caused a system to crash, users generally blamed the software rather than the hardware. (I believe that many people who thought that NT was unstable were running NT on an unreliable memory platform.)
But Pentium systems changed that situation. A side effect of the Pentium's hardware configuration is that if you give those parity bits back to a system built with the right motherboard chips and add ECC memory, the system can detect an error and correct it automatically. And the cost difference is negligible: when I recently priced 512MB of SDRAM, it was $74 without ECC compared with $84 with ECC.
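To make the difference between detecting and correcting concrete, here is a minimal Python sketch. It is an illustration only: it uses a single even-parity bit and the textbook Hamming(7,4) code on a handful of bits, which is an assumption chosen for readability and not a description of what any particular motherboard chipset implements. The function names are hypothetical.

```python
def parity_bit(bits):
    """Even parity: the extra bit makes the total count of 1s even."""
    return sum(bits) % 2


def parity_ok(bits, stored_parity):
    """Parity can say that something changed, but not which bit."""
    return parity_bit(bits) == stored_parity


def hamming74_encode(data):
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data bits, position of the repaired bit, or 0 if none)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome names the bad bit
    if position:
        c[position - 1] ^= 1  # flip it back
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    # Parity: a flipped bit is noticed, but all the system can do is halt.
    byte = [1, 0, 1, 1, 0, 0, 1, 0]
    stored = parity_bit(byte)
    damaged = byte.copy()
    damaged[3] ^= 1
    print("parity still valid:", parity_ok(damaged, stored))

    # ECC-style code: the same kind of fault is located and repaired.
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a single-bit memory fault
    data, repaired_at = hamming74_correct(word)
    print("recovered data:", data, "repaired position:", repaired_at)
```

The parity check can only report that the byte is wrong, which is why those early systems had no choice but to lock up. The Hamming syndrome identifies the faulty position, so the bit can be flipped back and the system keeps running.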
The last time I looked at servers (a couple of years ago), the major vendors didn't offer ECC on entry-level systems. If you wanted a server with reliable memory, you had to buy a high-end system with a lot of unwanted doodads. Clones, however, are a different story. Every clone motherboard I've seen in the past 4 years supports ECC, as long as you install 72-bit SDRAM instead of 64-bit SDRAM and, on some systems, flip a CMOS switch to turn on error-checking.
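The jump from 64 bits to 72 bits is not arbitrary. As a rough sketch, assuming the memory uses a standard single-error-correcting, double-error-detecting (SECDED) Hamming-style code, the number of extra bits needed for a given data width can be worked out as follows. The function name is hypothetical, and real memory controllers may use other codes.

```python
def secded_check_bits(data_bits):
    """Check bits a SECDED Hamming-style code needs for a given data width.

    Single-error correction needs the smallest r with 2**r >= data_bits + r + 1.
    One more overall parity bit adds double-error detection.
    """
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


if __name__ == "__main__":
    for width in (8, 32, 64):
        extra = secded_check_bits(width)
        print(f"{width} data bits + {extra} check bits = {width + extra} bits")
```

For a 64-bit data path the calculation gives 8 check bits, which is the same 8 extra bits that one parity bit per byte would have occupied. That is why a 72-bit module can carry either parity or ECC.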
On my most recent hardware search, I was pleasantly surprised to see that the top hardware vendors offer only servers with ECC support. So the bottom line on server purchasing is that the world is a better and safer place these days. Now vendors need to realize that our basic desktop and laptop systems need the same ECC protection.
My second must-have feature was an integrated NIC and software support for a PXE-bootable system. I'm a big fan of Windows 2000's RIS feature, but you can get to a RIS server only by using the generic RIS boot disk or through BIOS support for a RIS connection through PXE. The RIS boot disk works only with its 25 hard-coded, built-in NIC drivers. If your system's NIC isn't supported by one of those 25 drivers, your system's BIOS must support PXE or you can't run RIS.
PXE boot support is the major reason that I decided to skip clones and go for big-name systems; I haven't found any PXE-booting clones yet. Unfortunately, none of the major vendor sites used the phrase "PXE compatible" or "RIS bootable," so I had to spend a few hours on the phone trying to find out which systems work with RIS. No vendor could answer the question in less than 30 minutes. Here's a suggestion to vendors: You spent all that money building systems that work with RIS, so tell your customers!
I ended up buying a Dell 500SC, which had the best price for the feature set I was looking for. Unfortunately, the PXE component didn't work. The system tries to boot with the PXE agent but hangs, claiming it can't find a DHCP server. So I booted the server with the RIS boot disk, hoping that the server had one of the magic 25 NICs. It did, and the RIS install worked flawlessly. But I'd paid for PXE support and it didn't work. So I contacted Dell and explained that if the RIS boot disk worked on the machine and the machine's built-in PXE agent didn't, then clearly the system's firmware had a bug, and I asked whether the company had an update. Dell's answer: That's a software problem; pay us money, and we'll look into it. I offered Network Monitor traces that clearly demonstrated that the DHCP server was responding to the Dell client software but that the Dell client software was ignoring the server. Dell still refused to support its product.
My experiences show that buying hardware is still as challenging as ever. In our rapidly changing world, I guess there's some comfort in knowing that some things never change.