Artificial intelligence has moved far beyond the stuff of science fiction. And, for all the benefits AI provides today, we can only guess at what the future of artificial intelligence holds. To help ensure that they will be able to take advantage of any and all AI advancements, many companies are making use of data lakes.
Indeed, one of the single largest tech trends of the last five years has undoubtedly been the mainstream adoption of artificial intelligence. Within just a few years' time, artificial intelligence has gone from being relatively obscure to being used almost everywhere. In many ways, it reminds me of the way that cloud services suddenly gained mainstream acceptance a decade ago. All at once, software vendors collectively felt the need to rebrand their products to reflect cloud readiness. Today, the same thing is happening with AI.
As with cloud services, there are countless use cases for artificial intelligence. One of the main use cases that is driving adoption (at least, in a generic sense) is that artificial intelligence engines can sometimes be used to spot trends and derive meaningful insight from an organization’s existing data. The flip side to that idea, however, is that for the artificial intelligence engine to do its job, it needs access to raw data. There are obviously a number of different ways of making this data available for analysis, but one of the best options may be to create a data lake.
If you aren’t familiar with data lakes, they are essentially just big collections of mostly unstructured data. Generally speaking, a data lake can contain just about anything, from file data to data that has been created by IoT-enabled industrial sensors. Data lakes, by their very nature, are large and disorganized.
This, of course, raises the question of why an organization should create something as seemingly chaotic as a data lake, when it’s probably going to be easier to configure an artificial intelligence engine to analyze structured data instead.
There are a few different reasons why the data lake trend is taking hold. For starters, data lakes give you the opportunity to analyze data that might have previously been ignored. Structured data sets, by their very nature, are limited. The requirement for the data to adhere to a schema means that the data set can only accommodate very specific data. Data lakes, on the other hand, essentially act as repositories for pretty much anything and everything. As such, there is a feasible path for analyzing data that otherwise would not be usable.
The second reason why data lakes are worth considering goes back to the reason AI became popular in the first place. As previously noted, artificial intelligence really began to take hold in the enterprise when it was discovered that a carefully tuned AI engine could extract hidden business insight from otherwise mundane data, thereby giving the organization a competitive edge. The important thing to understand is that the technology that acts as the basis for artificial intelligence is still in its infancy. If today’s artificial intelligence engines are able to find hidden business value in our data, then just imagine what types of insight future artificial intelligence engines might be able to find. Given this possibility, organizations are increasingly retaining data within data lakes so that the data can be analyzed in the future.
The third, and possibly the most important, reason why data lakes are gaining popularity is that the data lake approach to storage allows an organization to be more agile.
One of the big problems with structured data is that it has to be carefully curated and organized before it can be used. The rigid way in which the data is organized then makes it very difficult to make schema-level changes later on.
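To make that rigidity a little more concrete, here is a minimal sketch of the traditional "schema on write" model, using Python's built-in sqlite3 module. The `readings` table, its columns, and the sample values are all made up for illustration. The point is simply that a record carrying a field the schema doesn't know about is rejected until someone changes the schema itself.

```python
import sqlite3

# Hypothetical example: an in-memory table with a fixed, predefined schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT NOT NULL, temp_c REAL NOT NULL)")

# A record that matches the schema is accepted.
conn.execute("INSERT INTO readings (sensor_id, temp_c) VALUES (?, ?)", ("s1", 21.5))

# A record with an extra field (humidity) does not fit the schema and is rejected.
try:
    conn.execute(
        "INSERT INTO readings (sensor_id, temp_c, humidity) VALUES (?, ?, ?)",
        ("s2", 19.0, 40.0),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Storing the new field requires a schema-level change first.
conn.execute("ALTER TABLE readings ADD COLUMN humidity REAL")
conn.execute(
    "INSERT INTO readings (sensor_id, temp_c, humidity) VALUES (?, ?, ?)",
    ("s2", 19.0, 40.0),
)

# Rows written before the change have no humidity value (NULL / None).
rows = conn.execute(
    "SELECT sensor_id, temp_c, humidity FROM readings ORDER BY sensor_id"
).fetchall()
print(rejected, rows)
```

Adding one column to a toy table is trivial, of course. In a production database with many dependent applications and reports, the same kind of change tends to be far more involved.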
Data lakes allow data to be handled in a completely different way. Rather than requiring data to be carefully curated and organized prior to being written to a database, data lakes can accommodate all data, independently of any schema. The schema is created at the time that the data is used (using a technique called "schema on read"). This makes it very easy for people to pick and choose the data that they want to work with, regardless of how that data is currently organized.
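The schema on read idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: the `RAW_LAKE` list stands in for a real data lake, and the `read_with_schema` function and both schemas are hypothetical names invented for this example. The raw records are stored exactly as they arrived, and each consumer applies its own schema only at the moment it reads the data.

```python
import json

# Hypothetical stand-in for a data lake: mixed records kept exactly as received.
RAW_LAKE = [
    '{"sensor_id": "s1", "temp_c": "21.5", "ts": "2024-01-01T00:00:00"}',
    '{"user": "alice", "action": "login"}',
    '{"sensor_id": "s2", "temp_c": "19.0", "humidity": "40"}',
    "free-form text note that is not JSON at all",
]


def read_with_schema(raw_records, schema):
    """Apply a schema (field name -> converter) at read time.

    Records that cannot be parsed, or that lack a required field,
    are skipped rather than rejected when they are written.
    """
    for raw in raw_records:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON; this reader ignores it
        if not isinstance(record, dict):
            continue
        try:
            yield {field: convert(record[field]) for field, convert in schema.items()}
        except (KeyError, ValueError, TypeError):
            continue  # record does not fit this particular schema


# Two different consumers impose two different schemas on the same raw data.
temperature_schema = {"sensor_id": str, "temp_c": float}
activity_schema = {"user": str, "action": str}

temperatures = list(read_with_schema(RAW_LAKE, temperature_schema))
activity = list(read_with_schema(RAW_LAKE, activity_schema))

print(temperatures)
print(activity)
```

Nothing was thrown away or reshaped when the data was stored. The temperature reader and the activity reader each pull out only what they care about, and a future reader could define a third schema (one that uses the humidity field, for instance) without touching the stored data.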
Data lakes require IT pros to think of data storage in a way that is completely different from how they might have thought of storage in the past. Even so, this new approach holds great promise for making organizations more agile and better positioned to take advantage of advancements in artificial intelligence.