
Iterative Expands Open Source DVC MLOps with Experiment Versioning

Widely used open source MLOps tool DVC gains new capabilities that will help data scientists more easily automate machine learning experiments at scale.

Among the increasingly popular open source MLOps tools in use today is the Data Version Control (DVC) project led by San Francisco-based Iterative.

DVC is a tool that helps data scientists and machine learning professionals manage datasets and machine learning models, and on Dec. 7, Iterative announced its latest release. This version of DVC provides new capabilities to improve machine learning operations (MLOps), and among its biggest new features is a capability that Iterative is calling experiment versioning.

Iterative co-founder and CEO Dmitry Petrov told ITPro Today that experiment versioning will enable data scientists to save all metrics and artifact information into their Git service alongside their code.

"By saving this meta information together with source code for experiments, data scientists can reproduce experiments and track changes faster across their entire team," he said.

With experiment versioning, Petrov said, ML teams no longer have to waste time switching between tools to work out how to set up and improve on old experiments using stale information stored in a single machine learning tool.

"Meta information stored in one solution may not correspond to the correct experiment source code as things like data drift or business requirements changing occur," he said. "Experiment versioning from DVC fixes this."
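
To make the idea concrete, here is a minimal Python sketch of keeping experiment metadata in step with the code that produced it. It is not DVC's implementation or API. The function names (`fingerprint`, `record_experiment`, `is_stale`) and the idea of a JSON log file committed next to the source are assumptions made purely for illustration.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(source: str, params: dict) -> str:
    """Hash the code text and parameters so a record is tied to one exact version."""
    payload = source + json.dumps(params, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def record_experiment(log_path: Path, source: str, params: dict, metrics: dict) -> dict:
    """Append one experiment record to a JSON file meant to be committed with the code."""
    records = json.loads(log_path.read_text()) if log_path.exists() else []
    entry = {"id": fingerprint(source, params), "params": params, "metrics": metrics}
    records.append(entry)
    log_path.write_text(json.dumps(records, indent=2))
    return entry


def is_stale(entry: dict, source: str) -> bool:
    """True when the recorded id no longer matches the current code and parameters."""
    return entry["id"] != fingerprint(source, entry["params"])
```

Because each record carries a fingerprint of the code and parameters, a metrics entry that no longer matches the current source can be detected rather than silently reused, which is the mismatch Petrov describes.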

Defining the Big MLOps Challenges that DVC Helps Solve

According to Petrov, a big challenge that will extend through 2022 is automation of processes around ML model development.

Organizations will continue to need faster feedback loops as data comes in, new parameters are introduced or code changes, Petrov said.

"ML models need to be retrained as fast as possible using the flexibility and scalability of clouds and Kubernetes clusters, so businesses can react to changes faster and keep up with competitive forces by testing new ideas in a more rapid manner," he said.

Looking forward, Petrov said Iterative will automate more manual ML processes and fill in gaps along the ML model development lifecycle. One example will be an ML-specific Terraform provider for training models on clouds such as AWS, Azure and Google Cloud Platform, as well as on Kubernetes. Terraform is a popular infrastructure-as-code tool that enables rapid and repeatable deployment of infrastructure configuration.

"Our goal is for data scientists to have similar sets of tools that DevOps teams provide for software development engineers, all optimized for AI and ML model development," Petrov said.
