The term “high-performance computing” (HPC) would logically connote a computer system that is fast, persistent, pervasive, and capable of solving major problems. Certainly, an HPC system is all those things, but within enterprises, the term denotes a specific configuration of distinct technologies that together yield high performance.
Indeed, enterprise-focused HPC systems aggregate compute power so that the combined system can deliver much higher performance than a single large workstation. The HPC environment is designed so that a cluster of computers can pool the processing power of each individual node (which consists of multiple processors and multiple cores) and distribute the processing workload more efficiently, resulting in higher performance.
Combined with a high-speed networking architecture and vast storage systems, such a cluster lets enterprises address workloads that require significant compute performance, low system latency and speedy access to huge data repositories.
Demand for High-Performance Computing
Over the past two decades, enterprises have realized the value of using clusters of computers to solve complex mathematical, computational and simulation/modeling problems. By addressing these massive problems using parallel computing techniques (allowing the problem to be split into parts that can be tackled by individual or groups of processors), the time to complete a solution can be drastically reduced. Common HPC tasks include conducting digital simulation; stress testing; and enabling extremely high-speed, high-frequency trading activities.
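The idea of splitting a problem into parts can be illustrated with a minimal Python sketch. It is an illustration only, not a description of any vendor's software: the function names (`split_range`, `partial_sum`, `parallel_sum_of_squares`) and the toy sum-of-squares workload are assumptions chosen for brevity. A real HPC job would typically spread work across many nodes rather than the cores of one machine.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum(chunk):
    """Work done independently by one worker: sum of squares over a chunk."""
    start, stop = chunk
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4, executor_cls=ProcessPoolExecutor):
    """Split the problem, solve the parts concurrently, combine the results."""
    chunks = split_range(n, workers)
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))
```

Each chunk is independent, so the partial results can be computed in any order and simply added together. When using the default process pool, the call (for example `parallel_sum_of_squares(1_000_000)`) should sit under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` shows the same decomposition, although threads do not speed up CPU-bound Python code.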
[This article is from research firm Tractica’s report on enterprise high-performance computing.]
However, as enterprises have become more focused on automating manual processes and on incorporating some degree of cognition or intelligence into their systems, it has become clear that these processes require the ingestion and analysis of large amounts of data. Processing on a single workstation or server would simply lack the speed and power to provide results in a reasonable amount of time.
As enterprises scale these artificial intelligence (AI) pilot programs, which often incorporate deep learning (DL), machine learning (ML) and natural language processing (NLP), across the enterprise, the need for a high-performance compute and storage environment becomes clear. Tractica projects that these AI-based activities will spur additional HPC compute and storage, software and services revenue over the next six to eight years, above and beyond the spending on traditional HPC use cases and systems.
As is the case with any high-value technology purchase, the procurement of HPC compute, storage, software and services is predicated not only on the demand for increased processing horsepower, but also on the promise of a strong return on investment (ROI). Unlike their counterparts in research institutions, labs or government agencies, enterprises must be confident that any investment in HPC technology (top-end HPC servers alone can cost close to $400,000) will, in fact, help spur a significant improvement in productivity, efficiency, accuracy and, ultimately, profitability.
As noted above, enterprises are increasingly rolling out AI-focused programs that are compute- and storage-intensive, which has led to the demand for new HPC environments. While some of these activities may be handled via cloud service providers or HPC-as-a-service (HPCaaS) vendors, Tractica projects that the majority of investments will be made in on-premises data centers or server farms. Utilization rates, concerns about data security and latency, and the type of data being processed will likely help fuel the demand for on-premises HPC environments.
However, not all enterprises want or are able to make a significant capital expenditure (capex) to handle these build-outs. Several HPC vendors interviewed for this report noted that they can offer flexible HPC acquisition terms, which include offering compute and storage on a service-based model (while putting the hardware on the company’s premises), so that expenses can be classified as operational expenditure (opex), rather than capex. Others are offering “elastic” HPC, which allows the enterprise to purchase a modest HPC system and “burst” into a cloud-based HPC offering during times of heavy demand.
Still, there are significant barriers to steep revenue growth in this market. HPC environments are expensive, and despite vendors’ claims of offering “plug and play” solutions, many enterprises still do not have the requisite infrastructure or data science knowledge in-house to make an informed purchase decision.
Furthermore, despite the hype surrounding AI, it has become apparent that a relatively small slice of the market accounts for much of the activity, at least in terms of rolling out enterprise-wide AI applications that require massive compute and storage power. While the news media is all too happy to publicize AI activity from “hyperscale” companies, such as Uber, Amazon and Google, many more mainstream enterprises are still in the proof-of-concept (PoC) phase and may not be quite ready to invest in an expensive HPC environment.
Another potential speed bump may be related to the collection and use of data that powers AI learning algorithms. Congressional testimony from Facebook founder Mark Zuckerberg has helped shine a light on the vast amount of personal data that is being collected on citizens in the United States and around the world, even when they are not logged into the social media platform. Similarly, news reports highlighting other companies that may collect and hold even more data (such as Google) have led to increased scrutiny by regulators regarding who owns this data and who has the right to use and monetize it.
If significant regulatory changes are put into place that curtail the use of personal data, there may be a decline in demand for the HPC systems that are used to crunch the numbers that power DL algorithms. While it is unclear which way the regulatory winds are blowing (the public’s attention to privacy and security issues tends to wax and wane, depending on how recently a security breach or data privacy scandal occurred), HPC demand from enterprises that collect, mine and monetize personal data may be affected.
High-Performance Computing Market Structure
HPC environments generally consist of several components or elements that are often sold as a package or a solution, though individual elements can usually be purchased separately as well. Most major vendors support open-source programming languages and protocols, thereby allowing enterprises to select technology from their preferred vendors.
Most HPC environments follow a standard design built on parallel systems, in which large and complex problems are solved by adding processors, or nodes, to increase the system’s compute capability. This is somewhat different from a traditional supercomputer, which is generally a single massive computer system.
From a hardware perspective, HPC environments consist of four primary groups of components:
- Servers: Usually configured to work in clusters, they distribute the compute workload over many machines, thereby improving performance and efficiency.
- Processors: Traditional HPC environments use multi-core CPUs, which offer parallel execution, streamlined memory access and cutting-edge processing techniques. For training AI models, many enterprises are selecting CPU-based servers that are augmented with graphics processing units (GPUs) to handle specialized floating-point, graphics and other complex computations.
- Low-latency networks: Specialized hardware, such as ultra-low-latency switches, message acceleration appliances and high-performance network monitoring and management tools, is employed to accelerate, streamline and manage the intense network traffic typical of HPC deployments.
- Parallel storage: Parallel file systems running across multiple compute nodes can eliminate bottlenecks for large file transfers and ensure performance across large storage infrastructures. Solid-state (such as flash) storage sharply improves the system’s response time, throughput and reliability, while hybrid storage arrays are designed to efficiently house data across tiers of solid-state and spinning media to balance cost and system performance.
Riding on top of the hardware are HPC-optimized versions of Windows, Linux or other large enterprise-grade operating systems. When the OS does not provide it, cluster software links the nodes and enables parallel processing, distributing the workload across the entire HPC environment. Scheduling software, which prioritizes and queues jobs for processing, is also required to maximize the usage and efficiency of an HPC environment.
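To make the role of scheduling software more concrete, the following Python sketch shows one simple way to prioritize and queue jobs against a fixed pool of nodes. It is a toy model under stated assumptions, not how any particular commercial or open-source scheduler works. The `Job` and `JobQueue` classes, the strict-priority policy and the node counts are all hypothetical, and production schedulers add features such as backfilling, fair-share accounting and time limits.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                       # lower number runs sooner (assumed convention)
    seq: int                            # tie-breaker: submission order
    name: str = field(compare=False)
    nodes: int = field(compare=False)   # nodes the job needs to start


class JobQueue:
    """Strict-priority queue over a fixed number of cluster nodes."""

    def __init__(self, total_nodes):
        self.free_nodes = total_nodes
        self._heap = []
        self._counter = itertools.count()

    def submit(self, name, nodes, priority):
        heapq.heappush(self._heap, Job(priority, next(self._counter), name, nodes))

    def dispatch(self):
        """Start queued jobs in priority order for as long as the head job fits."""
        started = []
        while self._heap and self._heap[0].nodes <= self.free_nodes:
            job = heapq.heappop(self._heap)
            self.free_nodes -= job.nodes
            started.append(job.name)
        return started

    def release(self, nodes):
        """Return nodes to the pool when a job finishes."""
        self.free_nodes += nodes
```

In this sketch a high-priority job that does not fit blocks everything behind it, which keeps the policy easy to reason about but can leave nodes idle. That trade-off between fairness and utilization is what real scheduling software is tuned to manage.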
Tractica views the HPC market as two distinct, yet related, segments. HPC vendors report that the teams responsible for HPC are similarly divided: traditional HPC programmers focus on programming HPC software to address a specific problem, often one rife with data and analytics, while other teams are often set up to manage the growing use of HPC environments to conduct full-scale rollouts of ML and DL programs.
Tractica has segmented the market into two broad groups: traditional HPC use cases and AI-focused use cases. This ensures that a wide range of use cases and industries is represented across the forecast, allowing for comparisons between traditional HPC and the growing range of AI-focused use cases.
Tractica forecasts that the overall market for enterprise HPC hardware, software, storage and networking equipment will reach $31.5 billion annually by 2025, an increase from approximately $18.8 billion in 2017.
The market is dominated by HPC equipment used for traditional use cases, or situations in which an HPC system is used for heavy-duty number crunching, simulation and analysis, techniques that require the brute force of cluster computing to reduce the time to complete complex calculations. In these cases, HPC is not used to support any AI-related techniques, such as ML, DL or NLP.
Tractica projects that while AI-focused HPC use cases accounted for just 7.4% of the overall segment revenue in 2017, that percentage will increase to 35.4% by 2025. Essentially, while revenue from traditional HPC activities will remain larger than revenue from HPC AI use cases, the latter segment is growing significantly more rapidly.
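The growth rates implied by these figures can be worked out with a few lines of Python. The sketch below uses only the numbers quoted in this article and rests on one assumption: that the 7.4% and 35.4% shares apply to the overall market totals of $18.8 billion (2017) and $31.5 billion (2025). The resulting segment revenues and compound annual growth rates are derived here for illustration and are not figures published by Tractica. The helper names `cagr` and `split_by_share` are likewise illustrative.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


def split_by_share(total, ai_share):
    """Return (ai_revenue, traditional_revenue) for a given AI share of the total."""
    ai = total * ai_share
    return ai, total - ai


# Figures quoted in the article (revenue in $ billions).
TOTAL_2017, TOTAL_2025 = 18.8, 31.5
AI_SHARE_2017, AI_SHARE_2025 = 0.074, 0.354
YEARS = 2025 - 2017

# Assumption: the AI shares apply to the overall totals above.
ai_2017, trad_2017 = split_by_share(TOTAL_2017, AI_SHARE_2017)
ai_2025, trad_2025 = split_by_share(TOTAL_2025, AI_SHARE_2025)

for label, start, end in [
    ("Total", TOTAL_2017, TOTAL_2025),
    ("AI-focused", ai_2017, ai_2025),
    ("Traditional", trad_2017, trad_2025),
]:
    print(f"{label:<12} ${start:5.1f}B -> ${end:5.1f}B  CAGR {cagr(start, end, YEARS):.1%}")
```

Under these assumptions, the overall market grows at roughly 6.7% a year, the implied AI-focused segment at close to 30% a year, and the traditional segment at about 2% a year, which is consistent with the observation that the AI segment is smaller but growing far faster.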
It should also be noted that the line between traditional HPC and AI HPC will increasingly blur, particularly as traditional systems and processes start to incorporate ML and DL functionality. Tractica projects, however, that much of the spending on AI-focused HPC may be classified as traditional spending, particularly if physical infrastructure and resources are shared between the traditional and AI-focused tasks.