Microsoft notched up another milestone in its odyssey into clustering technology with the introduction of the Compute Cluster Server (CCS) 2003 in August. This is Microsoft’s first entry into developing systems with supercomputer power.
The aim is to develop systems that aggregate a number of 64-bit computers into a number-crunching cluster that will allow massively parallel processing that was previously only available using High Performance Computing (HPC) systems. Apart from being massively parallel, HPC systems are also massively expensive and typically cost over £1 million.
That is the story according to Microsoft but the truth is slightly more complex. Nathaniel Martinez, programme manager at IDC, explains: “Microsoft has been feeling the heat from Linux in the HPC market, mostly with the growth of servers based on AMD Opteron chips in that space as well. In addition, Microsoft has a policy of Linux containment and HPC is where they didn't have a similar offering to Linux.”
Back in 2002, Cray made a deal to offer a Linux HPC cluster running on Dell PowerEdge servers. This linked the name of the best known supercomputer specialist, the leading x86 server manufacturer and Linux – a powerful driver to gain acceptance of Linux as a serious player in the HPC market. In fact the power of a cluster is equivalent to the original Cray Research company’s supercomputers from 1991 but at about a ten thousandth of the cost.
Martinez believes that developments in the supercomputer market made x86 servers more attractive. In the search for greater compute power, supercomputer architecture has moved away from scale-up, which relies on more powerful processors and buses. The rise of scale-out in the HPC market, applying more processors to the task, mirrors what has happened in the x86 world with clustering. Given that x86 processors are now challenging the speeds offered by the specialist processors developed for supercomputing, and in some cases replacing them, the future seems obvious for the likes of Cray.
Microsoft may be a new entrant but it has some perceptual advantages over Linux. Martinez says: “My feelings are that Microsoft is slightly more successful in supporting the more mission-critical enterprise applications than Linux with its typical applications at the edge of the network.”
Microsoft server skills are also more prevalent, and the complementary applications used to manage the CCS network are familiar products, such as Active Directory, Operations Manager 2005 and Systems Management Server 2003.
CCS is supplied on two CDs, one containing the base operating system and the other holding the interfaces, utilities and management tools that constitute the Compute Cluster Pack. The base operating system is a derivative of Windows Server 2003 specifically designed as a cluster node for CCS. The heart of the system is the Compute Cluster head node. This is the system manager housed on a 64-bit x86 server which also allocates and schedules jobs around the cluster. The head node also differentiates between the access rights allocated to Compute Admins and Compute Users. The management interface can also be run remotely on a 32-bit desktop running Windows XP or Windows 2000.
The head node will usually be set up with the basic CCS operating system from the first disc but it can also be built on top of Windows Server 2003 x64 Edition using the Compute Cluster Pack. This would be applicable if the server is required to run additional applications alongside CCS. Such applications might include SQL Server or a dedicated instance of Active Directory, the key application for managing administrators and users on CCS.
Linking the cluster together under the head node depends on the chosen topology. The simplest is to have a single network interface card (NIC) on each node to pass messages around the cluster. The most complex setup could require three NICs on each node: one for a high-speed, dedicated Message Passing Interface (MPI) network, another for the corporate network, and the last for a private, dedicated cluster management network.
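As a rough illustration of what the two ends of that range mean for provisioning, the sketch below counts the NICs needed for a cluster of a given size. The topology names, the role labels and the `nics_needed` helper are hypothetical, chosen only for this example; they are not part of CCS.

```python
# Hypothetical model of the two topologies described above.
# The keys and role names are illustrative labels, not CCS settings.
TOPOLOGIES = {
    "single_nic": ("shared",),
    "three_nic": ("mpi", "corporate", "cluster_management"),
}


def nics_needed(topology: str, node_count: int) -> int:
    """Return the total number of NICs to provision for the cluster."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return len(TOPOLOGIES[topology]) * node_count


# An eight-node cluster, an arbitrary size picked for the example.
print(nics_needed("single_nic", 8))
print(nics_needed("three_nic", 8))
```

The three-NIC layout triples the card count, which is the price of keeping MPI traffic, corporate traffic and cluster management on separate networks.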
Microsoft’s design aim was to make CCS as easy to deploy as possible and so avoid the typical HPC hierarchy, which resembled the old mainframe batch process. Under that model, a dedicated IT team would manage and deploy nodes, while users would submit batch jobs and compete for resource allocation. CCS automates much of the job allocation through rules set up under Active Directory and makes the cluster just another resource on the network from the user perspective.
This does not mean that CCS is a complete replacement for existing HPC systems. It is very much a Version 1 implementation: a very basic workhorse without some of the sophistication of more mature systems. For example, job allocation is fairly rudimentary, with each task issued to a node on a first-come-first-served basis. As a result, a particularly powerful server node could end up with a fairly light job while a less powerful processor labours under the strain of some compute-hungry task. At the moment, the only way around this appears to be manual intervention to allocate jobs but Annemarie Duffy, infrastructure server team manager, claims that CCS will develop to include these functions in future versions.
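The mismatch is easy to see in a toy model. The sketch below is not the CCS scheduler; it is a minimal simulation under invented assumptions, in which each job has a made-up amount of work, each node a made-up relative speed, and all function names are hypothetical. It compares first-come-first-served pairing with a simple alternative that gives the largest job to the fastest node.

```python
# Toy model only: job sizes, node speeds and names are invented for illustration.

def fcfs_assign(jobs, nodes):
    """Pair the i-th job to arrive with the i-th idle node, ignoring size and speed."""
    return list(zip(jobs, nodes))


def capacity_aware_assign(jobs, nodes):
    """Pair the largest job with the fastest node, the next largest with the next fastest."""
    jobs_by_size = sorted(jobs, key=lambda job: job[1], reverse=True)
    nodes_by_speed = sorted(nodes, key=lambda node: node[1], reverse=True)
    return list(zip(jobs_by_size, nodes_by_speed))


def makespan(assignment):
    """Time until the last job finishes, where run time = work / node speed."""
    return max(work / speed for (_, work), (_, speed) in assignment)


jobs = [("light", 10.0), ("heavy", 100.0)]   # (name, units of work), in arrival order
nodes = [("fast", 4.0), ("slow", 1.0)]       # (name, relative speed), in idle order

print(makespan(fcfs_assign(jobs, nodes)))            # 100.0: heavy job lands on slow node
print(makespan(capacity_aware_assign(jobs, nodes)))  # 25.0: heavy job lands on fast node
```

In this contrived case the first-come-first-served pairing takes four times as long to clear both jobs, which is the kind of gap that manual allocation is currently needed to close.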
Microsoft's entry into most markets in the past has been welcomed because of the legitimacy that its entry has lent to whatever technology it is espousing. The HPC arena is different because it is much more mature. However, established HPC users are not necessarily the customers that Microsoft will end up with.
Martinez says: “People currently in HPC are primarily after the performance and functionality offered in traditional HPC markets. What Microsoft is trying to do is to create some kind of HPC need at the enterprise level for people that need power but cannot justify the costs of traditional HPC. All they care about is the pure power of their boxes and that is what Microsoft is offering with its dumbed-down version of HPC. It offers an easy way to deploy small HPC environments with a very low level of support from the systems administrator. A way to cluster computational power very quickly and really efficiently. The cutting edge users will still prefer the traditional way of doing it and that's why I say Microsoft will really need to create a demand for itself.”
Even so, Microsoft has been testing the system in the traditional markets: automotive, aerospace, life sciences, geo sciences and financial modelling. Although Martinez is sceptical of the potential success of replacing existing HPC systems in these markets, he does think that seeding is a good idea. “I think Microsoft is trying to seed the market at this stage and see where it's most likely to have an opportunity in the long haul. Another value proposition is that there is very little support needed and this will make it compelling for university students who need some kind of HPC functionality but don't need the full strength of a worldwide recognised lab.”
Most of Microsoft’s test deployments have been at universities, where the economics of the system are important. When the product hits the market, Microsoft estimates it will cost around $469 (£254) per node, depending on volume and other discounts. This means that a respectable system could be put together for just a few thousand pounds and educational discounts would offer even greater value. The pricing will also make it an interesting proposition for financial modelling in enterprise markets. Corporates are used to developing quite sophisticated models of their business environment and playing “what if?” games. The lowering of the price of entry might make modelling more widespread.
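To put the per-node figure in context, the sketch below multiplies the estimated price by a few cluster sizes. The node counts are arbitrary examples, the function name is hypothetical, and the totals cover the estimated CCS price only, before any volume or educational discount and excluding hardware.

```python
# Estimated per-node price quoted above; actual pricing depends on discounts.
PRICE_PER_NODE_GBP = 254


def licence_cost_gbp(node_count: int) -> int:
    """Return the undiscounted CCS cost in pounds for node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return node_count * PRICE_PER_NODE_GBP


# Arbitrary example cluster sizes.
for node_count in (4, 8, 16):
    print(f"{node_count} nodes: £{licence_cost_gbp(node_count):,}")
```

On these assumptions an eight-node cluster comes to £2,032 and a 16-node cluster to £4,064, consistent with the "few thousand pounds" estimate.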
The only problem is that CCS is a facility, not an application. It may be that Microsoft will have to work with partners to develop CCS applications to give its HPC thrust broader appeal. In academe and R&D departments, users will be keen and able to build their own applications, which may offer a source of income if they can later be modified for commercial use.
Microsoft’s entry into HPC may not be as potentially hazardous as its initial approaches to other markets in the past. HPC users tend to be more daring because they are working towards a specific goal and will use whatever tools they think will deliver the best results. They are also cash-rich and funding a test cluster using CCS that could run alongside their traditional systems will not cause a massive dent in the annual budget.
Persuasion is the key. Linux has sneaked into HPC environments through its low cost of entry, potentially lower than Microsoft CCS. Linux also has a lead of several years’ experience in HPC. Martinez comments: “I don't think Microsoft should try to compete on the pure functionality of the product – it's like a dumbed-down version of an HPC cluster. It should emphasise the ease of deployment with basic functionality. This is a ‘good-enough’ HPC system.”