A tech company that has spent the past several years building tools to help organizations better understand their customers is betting that its newest and most ambitious effort will resonate loudly and clearly.
Segment, which provides a customer data platform, this week announced Segment Data Lakes, which it says will vastly improve the value companies can get from their traditional data warehouses and data lakes.
Segment Data Lakes, which was built from the ground up as a cloud-based solution, combines data lake and data warehouse architecture. It aims to make data more usable without the complexities of traditional data warehousing, in effect turning a data warehouse into a more valuable data lake. It does this by providing an out-of-the-box solution that preconditions and enriches data, so companies can build the queries they need to better understand their customers while bypassing the data warehouse layer.
"It can be very difficult to make the data in data lakes discoverable and usable because of schema issues," explained Daniel Newman, principal analyst of Futurum Research. "As a result, data sets are rarely able to be fully utilized, or the engineering requirements to get it structured are really expensive and time-consuming."
Segment's approach circumvents this problem by automating the architecture and quality validation while handling storage, schema inference, security and privacy management. It also maintains the system on an ongoing basis.
The solution includes a storage layer to hold optimized and schematized customer data in a scalable object store. Because it uses object storage optimized for storing massive amounts of raw data, users can both store growing amounts of data and access that data quickly and efficiently. While most object storage solutions today don't have a universal data structure and typically require a lot of engineering work to meet the basic needs of data scientists and analysts, Segment Data Lakes takes a different approach.
"It takes unstructured raw data and automatically applies a universal structure to it, partitioning the data by source, event type, date and time and converting it into an event table format," said Segment co-founder Calvin French-Owen. "This allows data scientists to take any raw data set and narrow in quickly on the data they care about, giving them the granular access they need to more easily power advanced analytics and build AI/ML models."
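To make the partitioning idea concrete, here is a minimal Python sketch that groups raw events by source, event type, date and hour. The key layout (`source=.../type=.../day=.../hour=...`), the event field names and the function names are all assumptions made for illustration; the article does not describe Segment's actual storage layout.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_path(event):
    """Return a hypothetical object-store prefix for one raw event.

    Assumes the event is a dict with "source", "type" and an ISO 8601
    "timestamp" that includes a UTC offset. These names are illustrative.
    """
    # Normalize to UTC so the same instant always lands in the same partition.
    ts = datetime.fromisoformat(event["timestamp"]).astimezone(timezone.utc)
    return (
        f"source={event['source']}/type={event['type']}/"
        f"day={ts:%Y-%m-%d}/hour={ts:%H}"
    )


def partition_events(events):
    """Group raw events into per-partition lists, one list per event table slice."""
    tables = defaultdict(list)
    for event in events:
        tables[partition_path(event)].append(event)
    return dict(tables)
```

With a layout like this, an analyst interested in one event type from one source on one day only needs to read the objects under a single prefix rather than scanning the whole store.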
The storage layer connects with a metadata store, which makes data discoverable and integrates with a decoupled compute and query platform. This allows analysts and data scientists to query data using engines like Amazon Athena or load it directly into a Jupyter notebook, according to the company. It also integrates with distributed frameworks such as Apache Spark and Hadoop.
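As a rough illustration of what querying such data might look like, the sketch below builds a SQL string that filters on partition columns, which lets a SQL engine such as Athena skip data outside the requested slice. The table name, the column names (`source`, `type`, `day`) and the helper itself are hypothetical, and the snippet only constructs the query text; it does not call any AWS API.

```python
import re
from datetime import date

# Conservative pattern for table names and partition values used in the query.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_partition_query(table, source, event_type, day, limit=100):
    """Build a SQL query restricted to one source, event type and day.

    All names here are assumptions for illustration, not Segment's schema.
    """
    for name in (table, source, event_type):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # Round-trip through date to reject malformed day strings.
    day = date.fromisoformat(day).isoformat()
    return (
        f"SELECT * FROM {table} "
        f"WHERE source = '{source}' AND type = '{event_type}' AND day = '{day}' "
        f"LIMIT {int(limit)}"
    )
```

The resulting string could then be submitted through whichever query client a team already uses, or adapted for a notebook session.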
While the data warehouse approach has merit, it can also limit the ability of companies to segment and analyze data in ways that provide personalized, real-time experiences. Data warehouses can also become more expensive to operate as data stores grow. Although data warehouses are useful for storing recent, structured data, data lakes are the best-in-class solution for storing large quantities of raw information that can be used to create better customer experiences, French-Owen said.
Segment Data Lakes is initially available on Amazon Web Services (AWS), and the company plans to expand to Microsoft Azure, Google Cloud Platform and others.