Anyone working in data management and data science can attest to how difficult and time-consuming it is to map a set of data from a new source into a platform where it can be cleaned, validated, and ultimately analyzed and used to train algorithms. After all, your algorithms are only as good as the data used to train them.
Now imagine that these data sets come from hundreds of external users who have used any number of systems to collect the data, from Excel files to actual shoeboxes full of photos. That is the challenge Wild Me, a non-profit provider of machine learning and artificial intelligence services for wildlife conservation, has faced for more than a decade. The organization builds open software and AI for the conservation research community. It is made up of technologists -- software and machine learning pros -- and is designed to be the "trusted engineering powerhouse for wildlife biologists across the globe."
Wild Me's AI software enables researchers to track individuals within a species -- whale sharks, for example -- by identifying them from their unique patterns of spots. Wild Me created the algorithm and technology for this initial use case by modifying a Hubble Space Telescope algorithm that looked at the pattern of stars in the night sky, according to Jason Holmberg, the organization's executive director, co-founder, and director of engineering.
During a scuba trip in Djibouti in 2002, Holmberg saw his first whale shark and learned how researchers physically tagged and tracked the animals. He thought there might be a better way, through computer vision algorithms that could identify individuals by their unique spot patterns. This work turned into Whaleshark.org, a library of encounters and individual whale sharks used and maintained by marine biologists.
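To give a rough sense of how a star-matching idea can carry over to spot patterns, here is a minimal Python sketch. It is not Wild Me's algorithm or the Hubble code. It assumes each photo has already been reduced to a list of (x, y) spot coordinates, and it uses a simplified triangle comparison: every triple of spots forms a triangle whose side-length ratios stay the same when the pattern is shifted, rotated, or scaled. The function names and the tolerance value are hypothetical.

```python
from itertools import combinations
from math import dist


def triangle_signatures(points):
    """Describe every triple of spots by its side-length ratios.

    Dividing the two shorter sides by the longest side gives a pair of
    numbers that does not change under translation, rotation, or scaling.
    """
    signatures = []
    for a, b, c in combinations(points, 3):
        sides = sorted((dist(a, b), dist(b, c), dist(a, c)))
        longest = sides[2]
        if longest == 0:
            continue  # skip degenerate triples made of identical points
        signatures.append((sides[0] / longest, sides[1] / longest))
    return signatures


def match_score(pattern_a, pattern_b, tolerance=0.01):
    """Fraction of triangles in pattern_a that have a look-alike in pattern_b.

    The tolerance is an illustrative assumption, not a tuned value.
    """
    sigs_a = triangle_signatures(pattern_a)
    sigs_b = triangle_signatures(pattern_b)
    if not sigs_a or not sigs_b:
        return 0.0
    matched = sum(
        1
        for sa in sigs_a
        if any(
            abs(sa[0] - sb[0]) <= tolerance and abs(sa[1] - sb[1]) <= tolerance
            for sb in sigs_b
        )
    )
    return matched / len(sigs_a)
```

A high score between a new photo and a catalog entry would only suggest a candidate match for a researcher to confirm. A production system would also have to cope with missing spots, lens distortion, and the animal's curved body.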
But that was just the first use case. From there Wild Me expanded as a platform for other animal researchers, allowing them to upload their data to catalog a series of other species from manta rays to giraffes to sea dragons. The platform serves more than 200 organizations and nearly 1,000 researchers tracking nearly 90,000 animals around the world with close to 444,000 sightings in its database.
The challenge of moving biologists' catalogs of encounters and sightings and individuals into the Wild Me platforms has been a thorny problem from the start.
"It's been an evolving process," said Holmberg. "When we first started working with biologists across the globe, we would write custom importers for every piece of data. That custom one-off code would take weeks."
There were no universal standards for how individual researchers cataloged their data. Each researcher created their own system.
Because of this, the idea of a "universal data importer is sort of farcical," Holmberg said. "But we were able to solve half the problem." Wild Me started using a tool to let field biologists begin mapping their data to a common set of fields and descriptors. These biologists could review the data in the system and then approve it.
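The mapping step Holmberg describes can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the common field names, the list of header aliases, and the function names are hypothetical and are not taken from Wild Me's platforms. The idea is simply to propose a mapping from a researcher's own column headers to a shared schema and to flag the columns that could not be mapped so a person can review them.

```python
# Hypothetical common schema with example header aliases (illustrative only).
COMMON_FIELDS = {
    "sighting_date": {"date", "sighting date", "encounter date", "observed on"},
    "latitude": {"lat", "latitude", "gps lat"},
    "longitude": {"lon", "long", "lng", "longitude", "gps lon"},
    "species": {"species", "taxon", "animal"},
}


def normalize(header):
    """Lower-case a header and collapse underscores and extra whitespace."""
    return " ".join(header.strip().lower().replace("_", " ").split())


def propose_mapping(source_headers):
    """Return (mapping, unmapped): source header -> common field, plus leftovers."""
    mapping = {}
    unmapped = []
    for header in source_headers:
        key = normalize(header)
        target = next(
            (field for field, aliases in COMMON_FIELDS.items() if key in aliases),
            None,
        )
        if target is None:
            unmapped.append(header)  # left for the biologist to map by hand
        else:
            mapping[header] = target
    return mapping, unmapped


def apply_mapping(row, mapping):
    """Rename the keys of one source row to the common field names."""
    return {mapping[key]: value for key, value in row.items() if key in mapping}
```

For a spreadsheet with the headers "Encounter Date", "GPS_Lat", "Lng", and "Photographer", this sketch would map the first three and return "Photographer" as unmapped, which mirrors the review-and-approve step: the tool proposes, and the biologist decides.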
While this streamlined the process and made it faster, there was still room for improvement. The system wasn't all that scalable, and it didn't let the researchers validate their own data. Wild Me began piloting a tool from a company called Flatfile, designed to solve the problems of processing and validating external data from multiple sources.
David Boskovic founded Flatfile after working at a few different SaaS companies and running into the same annoying problem each time: how to get new customers' data into the system when each customer had used a different system.
"It has been a universal problem. The cost and effort of bringing data in is one of the costs of innovation," Boskovic said. But it was very frustrating. "I like to say I rage-designed this product."
The other aspect of bringing data into a system is that your customers need to maintain ownership and control of that data. That's important for marketers. It's also important for field biologists. It's one of the reasons why Wild Me pursued the pilot with Flatfile.
"It's an intuitive system whereby a field biologist can maintain ownership of their data through the process of importing it into our system, and it will do things that we didn't currently have like data validation," Holmberg said. For instance, it will help "make sure all the GPS coordinates are in the right format. These are human-curated data catalogs. They do have errors."
During the validation process anomalies are presented back to the biologists who curated the data so that they can go back and clean up the data. This lets biologists see their data in one of the Wild Me platforms and work with that data in the platform.
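As an illustration of the kind of check involved, the following Python sketch validates GPS coordinates and collects anomalies for the data's curator to fix. It is not Flatfile's API or Wild Me's code. It assumes coordinates are expected as decimal degrees in fields named `latitude` and `longitude`, and both the field names and the function name are hypothetical.

```python
def validate_coordinates(rows):
    """Split rows into clean records and a list of anomalies to send back.

    Each row is a dict. Latitude must parse as a number in [-90, 90] and
    longitude as a number in [-180, 180] (decimal degrees assumed).
    """
    valid = []
    anomalies = []
    for index, row in enumerate(rows, start=1):
        problems = []
        parsed = {}
        for field, limit in (("latitude", 90.0), ("longitude", 180.0)):
            raw = row.get(field)
            try:
                value = float(raw)
            except (TypeError, ValueError):
                problems.append(f"{field} is not a decimal number: {raw!r}")
                continue
            if not -limit <= value <= limit:
                problems.append(f"{field} is out of range: {raw!r}")
            else:
                parsed[field] = value
        if problems:
            # Report the row number so the curator can find and correct it.
            anomalies.append({"row": index, "problems": problems})
        else:
            valid.append({**row, **parsed})
    return valid, anomalies
```

Rows with problems are reported rather than silently dropped or corrected, which keeps the decision with the biologist who owns the catalog.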
The platforms are changing biologists' knowledge of the species they study.
"When I first started on whale shark research, everyone thought the Indian Ocean was the big spot for that," Holmberg said. "As we built these online platforms, we could identify the movement of individuals ... We now see the Gulf of Mexico as one of the biggest hotspots for studying whale shark behavior."
In many cases, Wild Me is a researcher's first experience with cloud computing and storage and analysis for their data, so the goal is to make the system easy to use for people whose primary job is not technology.
Holmberg said that the data processing needs to be fast so that biologists can react to population changes with better conservation policy and strategies.
"Maybe that means to put up a fence, or take down a fence, or allow fishing, or ban fishing, depending on how variables impact population numbers," he said. "The faster we can estimate population numbers, the faster we can respond to changes and make sure our conservation strategies are iterating towards ever more successful solutions that help increase population numbers, especially for threatened and endangered animals."