Information interchange has reached all-new levels. More than ever, organizations rely on large data sets to help them run, quantify, and grow their business. Just a few years ago, we were already working with large databases; over the last couple of years, those demands have grown from gigabytes into terabytes and petabytes. And this data no longer resides in just one location. With cloud computing, it is truly distributed.
More organizations will be placing their core business components within a data center and the cloud. Why? It simply makes sense. High-density computing and shared environments are the core structure of the modern data center. The big question is no longer whether to move there, but whether you want to manage it all yourself or have someone else do it. Remember, data center dependency is only slated to increase over the coming years.
The reality really sets in when we look at the numbers some organizations have published:
- IBM recently released a study showing that end users create over 2.5 quintillion bytes of data every day. Furthermore, it points out that more than 90% of all the data in the world was created in the last two years alone.
- Giants like Walmart face equally daunting challenges. With numerous stores all over the world, its IT systems have to process over 1 million customer transactions every hour. And because of its size and the amount of product it carries, Walmart has to manage over 2.5 petabytes of data.
This growing reliance on data will be offloaded to the only platform that can handle these kinds of demands: the data center. Any growing organization must look at data center hosting options as a viable answer to an ever-evolving business and IT environment. Whether through a cloud solution or a managed services option, the modern data center is the place that can support changing business needs and evolving IT solutions.
Database administrators have been forced to find new and creative ways to manage and control this vast amount of information. The goal isn't just to organize the data but to use it to help develop the business further. To that end, there are strong open-source management options that large organizations should evaluate:
- Apache HBase. This big data management platform is modeled on Google's very powerful BigTable design. An open-source distributed database written in Java, HBase was designed to run on top of the already widely used Hadoop environment. As a powerful tool for managing large amounts of data, Apache HBase was adopted by Facebook for its messaging platform.
- Apache Hadoop. One technology that quickly became the standard in big data management is Apache Hadoop. When it comes to open-source management of large data sets, Hadoop is known as a workhorse for truly data-intensive distributed applications. The platform's flexibility lets it run on commodity hardware and integrate easily with structured, semi-structured, and even unstructured data sets.
- MongoDB. This solid platform has been growing in popularity among organizations looking to get control over their big data needs. MongoDB was originally created by 10gen, a company founded by former DoubleClick engineers, and is now used by several companies as an integration piece for big data management. Built on an open-source NoSQL engine, it stores and processes structured data as JSON-like documents. Organizations such as The New York Times and Craigslist have adopted MongoDB to help them control big data sets.
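To make HBase's BigTable-style design concrete, here is a minimal sketch of its logical data model in plain Python. This is an illustration, not the HBase API: real access goes through the HBase shell or a client library, and the table and column names below ("msg:body", the "user1#msg001" row keys) are hypothetical. The point is the shape of the store: a sparse, sorted map keyed by (row key, column family:qualifier, timestamp), with range scans over sorted row keys.

```python
# Toy stand-in for HBase's logical data model: a sorted, sparse,
# versioned map. Not the real HBase API -- an illustration only.
from collections import defaultdict

class SketchTable:
    def __init__(self):
        # row key -> {"family:qualifier" -> {timestamp -> value}}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, timestamp, value):
        self.rows[row_key][column][timestamp] = value

    def get(self, row_key, column):
        """Return the newest version of a cell, as HBase does by default."""
        versions = self.rows[row_key][column]
        return versions[max(versions)] if versions else None

    def scan(self, start_row, stop_row):
        """Range scan over sorted row keys (stop is exclusive, as in HBase)."""
        for key in sorted(self.rows):
            if start_row <= key < stop_row:
                yield key, self.rows[key]

# Hypothetical messaging-style rows (row keys chosen so one user's
# messages sort together -- a common HBase schema trick).
table = SketchTable()
table.put("user1#msg001", "msg:body", 1, "hello")
table.put("user1#msg001", "msg:body", 2, "hello, edited")
table.put("user2#msg001", "msg:body", 1, "hi there")

print(table.get("user1#msg001", "msg:body"))            # newest version wins
print([k for k, _ in table.scan("user1", "user2")])     # one user's rows
```

Because rows are sorted, fetching all of one user's messages is a cheap contiguous scan rather than a full-table query, which is exactly why row-key design matters so much in HBase.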
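Hadoop's workhorse programming model is MapReduce, and the classic way to see it is the word-count example. The sketch below runs the map, shuffle, and reduce phases in a single local process; a real Hadoop job distributes each phase across the cluster, but the logic per phase is the same.

```python
# Local, single-process sketch of the MapReduce phases Hadoop
# distributes across a cluster: the classic word-count example.
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as Hadoop does
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse one word's list of counts into a total."""
    return key, sum(values)

lines = ["big data big demands", "data center data"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 3, 'demands': 1, 'center': 1}
```

The appeal for commodity hardware follows directly from this shape: map tasks need only their own slice of input, so data can be split across many cheap machines and only the shuffle moves records between them.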
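And to show what "JSON-like documents" means in practice for MongoDB, here is a sketch using plain Python dicts. The collection and field names are made up for illustration, and the tiny `find` function only imitates the flavor of a MongoDB query (exact-value matching, plus array-membership matching); real access goes through a driver such as pymongo against a running server, and MongoDB itself stores documents as BSON, a binary JSON-like format.

```python
# Hypothetical "articles" collection: schema-free, JSON-like documents.
articles = [
    {"_id": 1, "title": "Big data trends", "tags": ["data", "cloud"], "views": 120},
    {"_id": 2, "title": "Data center design", "tags": ["dc"], "views": 45},
    {"_id": 3, "title": "NoSQL engines", "tags": ["data", "nosql"], "views": 300},
]

def matches(doc, key, value):
    """Match one field: exact value, or membership when the field
    is an array (mirroring how MongoDB matches array fields)."""
    field = doc.get(key)
    if isinstance(field, list):
        return value in field
    return field == value

def find(collection, query):
    """Tiny stand-in for a find() call: return every document
    matching all fields in the query dict."""
    return [doc for doc in collection
            if all(matches(doc, k, v) for k, v in query.items())]

hits = find(articles, {"tags": "data"})
print([d["title"] for d in hits])  # ['Big data trends', 'NoSQL engines']
```

Note that the three documents don't share an identical structure; that schema flexibility is a big part of why document stores like MongoDB suit fast-changing data.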
Our new “data-on-demand” society has resulted in vast amounts of information being collected by major IT systems. Whether these are social media photos or international store transactions, the amount of good, quantifiable data is increasing. The only way to control this growth is to deploy an efficient management solution quickly. Remember, beyond being able to sort and organize the data, IT managers must be able to mine the information and make it work for the organization. I know there are many other open-source big data options out there. Where have you seen success, and what have you been using?