When one technology ascends, another usually finds itself falling out of favor. Such has been the case, it seems, for Hadoop, which has slipped from its once-prominent position over the past 10 years as cloud computing has become increasingly available and powerful, even as the volume of data to be managed has grown exponentially.
Hadoop (officially known as Apache Hadoop) is an open-source framework for storing data and running applications, offering huge storage capacity and processing power. With Hadoop, big data is stored in a distributed environment and processed in parallel: data is split into blocks of a configurable size and spread across machines, so the system scales horizontally rather than vertically. Hadoop became a top-level Apache Software Foundation project in 2008, with Yahoo as a major early contributor. The framework is valuable for search, log processing, and video and image analysis, and part of its appeal comes from the fact that Hadoop is low cost, scalable and flexible.
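Hadoop itself is Java-based, but the core idea it rests on, splitting a dataset into blocks, processing the blocks in parallel, and merging the partial results, can be sketched in a few lines. The following is a toy Python illustration, not Hadoop code: multiprocessing workers stand in for cluster nodes, and a word count stands in for a real analytics job.

```python
from multiprocessing import Pool

def split_into_blocks(records, num_blocks=4):
    """Partition records across blocks, roughly as a distributed
    file system spreads data across nodes."""
    return [records[i::num_blocks] for i in range(num_blocks)]

def count_words(block):
    """The 'map' step: each block of records is processed independently,
    so blocks can run on different workers at the same time."""
    counts = {}
    for line in block:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def merge_counts(partials):
    """The 'reduce' step: combine the per-block partial results."""
    total = {}
    for counts in partials:
        for word, n in counts.items():
            total[word] = total.get(word, 0) + n
    return total

if __name__ == "__main__":
    records = ["to be or not to be"] * 100
    blocks = split_into_blocks(records)
    with Pool(4) as pool:  # four parallel workers stand in for nodes
        partials = pool.map(count_words, blocks)
    print(merge_counts(partials))  # e.g. 'to' appears 200 times
```

The pattern is the same one Hadoop's MapReduce model formalizes, only there the blocks live on a distributed file system and the workers are machines in a cluster rather than local processes.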
A decade ago, things looked very different than they do today, and the future of Hadoop is now in question. Back then, big data was becoming increasingly important for a wide variety of organizations, and startups such as Cloudera, Hortonworks and MapR sprang up to make use of Hadoop’s open-source base. Those three companies, in particular, raised a combined $1.5 billion, much of it going to Cloudera thanks to an investment of hundreds of millions of dollars from Intel.
In February 2008, Yahoo launched what it claimed was the world’s largest Hadoop production application, its search webmap, and a year later the company made the source code publicly available. By 2010, Facebook claimed to have the world’s largest Hadoop cluster, and by 2013 more than half of the Fortune 50 companies were using Hadoop.
In 2014, Hortonworks became the first Hadoop company to go public, and it eventually merged with Cloudera in a $5.2 billion deal. As of August, Cloudera’s market cap was below $2 billion — down from a peak of $4 billion in March. And in early August, MapR’s assets were acquired by Hewlett Packard Enterprise in a deal pegged as worth less than $50 million — a significant downgrade from the $280 million MapR raised.
It’s not that big data — increasingly just what data itself is these days — is no longer relevant. The problem for Hadoop is the cloud: instead of running their own Hadoop clusters, organizations increasingly choose cloud platforms. The cloud is the main culprit in Hadoop’s decline, according to Todd Wright, head of Data Management Solutions at SAS.
“For all the great promises of Hadoop, it could not compete with the ease of outsourcing and managing data within the various cloud providers available,” Wright said.
Hadoop’s open-source model may have been part of the reason for its downfall. Open source can be an asset: by its nature, anybody can try out the raw code, and commercial services can then be built on top of it and sold to enterprises. If an open-source project becomes popular, other people end up using the technology and helping you advance it, often for free.
But for a project like Hadoop, while its open-source nature was an asset, its complexity was not. By 2016, InfoWorld was saying “peak Hadoop” had been reached, and in 2017, Datanami reported that the cost and complexity of bringing data and computation together would push the open-source framework to the sidelines, putting the future of Hadoop in doubt.
However, Hadoop isn’t going to go away overnight, said Mathivanan Venkatachalam, vice president of ManageEngine. After all, many organizations still use it for data storage. Hadoop should still have a place in projects involving parallel processing of massive amounts of data, Venkatachalam said, and even though frameworks such as Spark and Kafka seem to outperform Hadoop, it can still coexist with them.
“A bigger threat to Hadoop may come from migration of big data workloads to cloud-based options, due to their reduced complexity and better resource usage,” he said.
A Future with the Cloud
There is that pesky cloud again, raining on Hadoop’s parade. Is it possible that cloud storage and processing changed the market so significantly that Hadoop’s usefulness changed with it? After all, the remaining Hadoop vendors have put some of their eggs in other baskets, including Spark.
Cloud platforms and containers will continue to grow in popularity and use for big data, Wright said. But at its core, big data is about getting insights from more data, from more sources than ever before, and that’s where the market will see growth in the model management space, he said.
“The promise of big data never came from simply having more data — and from more sources — but by being able to develop analytical models to gain better insights on this data,” Wright said.
However, all of this doesn’t mean that there’s nothing more to watch for in Hadoop. “Hadoop will continue to play an important role in the big data ecosystem. It has established itself as a core element of an enterprise data strategy over the past years and continues to serve well in conjunction with other emerging technologies,” said Michael Zeller, secretary/treasurer on the executive committee of the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining.
In July, Cloudera announced it would change its open-source licensing model to be more in line with that of Hortonworks. And just as the cloud has changed the situation for Hadoop, it might also be the way forward; Cloudera, Hortonworks and MapR have all made their platforms easier to deploy and manage in the cloud. Managed services like Google Cloud Dataproc (based on the open-source Hadoop and Spark platforms) and Microsoft Azure HDInsight (based on the Hortonworks platform) are also available.
The data already stored and processed using Hadoop also remains valuable, Zeller said.
“While the big data hype might have faded, the attention at the executive level is now on creating value from the data that has been stored in Hadoop,” he said. “Therefore, data science, machine learning and AI [artificial intelligence] tremendously benefit from all the past efforts that went into creating the Hadoop ecosystem.”
It’s natural for a pioneering technology like Hadoop to experience some decline after more than a decade in the market, said Alex Bekker, head of data analytics at ScienceSoft — but that doesn’t mean it’s done.
“The solutions of the early big data adopters already rely on Hadoop,” he said. “As a complete revamp of these solutions requires both time and investments, it’s unlikely to happen massively. That’s why we can expect the demand for Hadoop support at least.”