Ascend has introduced its Structured Data Lake that unifies and synchronizes data with the pipelines that operate on it.
The data lake, which works with Ascend's Dataflow Control Plane, allows users to directly connect existing business intelligence tools and data processing engines to its data management system. It uses artificial intelligence to manage data storage across the data life cycle. According to the company, this makes even mid-pipeline data sets available to existing processing engines like external Apache Spark, Presto and Apache Hadoop, as well as to familiar tools such as Jupyter and Zeppelin notebooks, without requiring additional code.
This approach tackles the age-old problem of data movement far upstream, said Eric Kavanagh, CEO of research firm The Bloor Group.
"The de facto information architecture for nearly all organizations is a labyrinth-like spaghetti ball of ETL scripts, with very little strategic visibility across the enterprise. That results in a ton of redundancy, and serious challenges to data quality and governance," he said. "With this new approach, Ascend can help solve for this unruly reality. In essence, Ascend has done for data movement what the likes of DataRobot, Datatron and Squark have done for machine learning."
The AI-powered storage layer allows data engineers to query data flows directly. A "queryable" data flow is one that lets users see what is moving through the pipes at any stage. Kavanagh calls this the "missing ingredient" that data integration has lacked until now.
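The article doesn't describe Ascend's internals, but the idea of a queryable dataflow can be illustrated with a small sketch (all names here are hypothetical, not Ascend's API): a pipeline that materializes each intermediate stage so it can be inspected without rerunning upstream steps.

```python
# Hypothetical sketch of a "queryable" dataflow: each stage's output is
# materialized so mid-pipeline data can be inspected. Illustrative only.

class QueryableDataflow:
    def __init__(self):
        self.stages = []          # ordered (name, fn) pairs
        self.materialized = {}    # stage name -> cached output

    def add_stage(self, name, fn):
        self.stages.append((name, fn))
        return self

    def run(self, records):
        data = records
        for name, fn in self.stages:
            data = [fn(r) for r in data]
            self.materialized[name] = data  # keep mid-pipeline results
        return data

    def query(self, stage_name, predicate):
        # Inspect what is "moving through the pipes" at any stage,
        # without disrupting or rerunning the pipeline itself.
        return [r for r in self.materialized[stage_name] if predicate(r)]

flow = (QueryableDataflow()
        .add_stage("cleaned", lambda r: {**r, "amount": abs(r["amount"])})
        .add_stage("taxed", lambda r: {**r, "amount": r["amount"] * 1.1}))

flow.run([{"id": 1, "amount": -100}, {"id": 2, "amount": 50}])
print(flow.query("cleaned", lambda r: r["amount"] > 60))
```

In a real system the materialized stages would live in the storage layer rather than in memory, which is what lets external engines and notebooks read them.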
The built-in intelligence allows the software to understand and react to the pipelines running against it.
"This refers to dynamic cataloguing and processing of metadata, which can alert the organization to new apps coming online, and which data sets they’re accessing," Kavanagh explained. "That kind of visibility opens the door to dynamic optimization of the data lake itself, much like we’ve seen with query optimizers for the last couple decades."
And with the right metadata in place, IT staff can build apps more quickly. That's because the dynamically generated data pipelines will be much more streamlined and thus optimized, saving code, time and compute resources, Kavanagh said.
This approach also helps optimize performance, ensure data integrity and track data lineage.
"Declarative software is very powerful for optimizing code. That’s really what it does best," Kavanagh said. "Combine that with this upstream, dynamically generated, automatically catalogued data orchestration engine, and you’ve tackled data lineage — and thus, to a large degree, integrity — virtually out of the box."
Other features include management of all data and updates as they happen, automatic lineage tracking and dependency management, automated storage maintenance, and deduplication of redundant storage and operations.
This is the third time in as many months that the Palo Alto, Calif.-based company, which came out of stealth mode in July, has introduced significant functionality. With an overall goal of de-risking big data projects, Ascend first introduced its Autonomous Dataflow Service, which runs on its Dataflow Control Plane, in July. That product's goal is to allow data engineering teams to quickly build, scale and operate continuously optimized, Apache Spark-based pipelines.
In August, the company announced "queryable dataflows," which aim to speed up data development by allowing data engineers to directly query incremental stages of any dataflow without changing tools or disrupting the development process, according to the company.