UNAVCO has put its data practices and services on firmer ground following the implementation of the Observable data visualization platform.
UNAVCO, a nonprofit consortium supported by the National Science Foundation, NASA, and the U.S. Geological Survey, aims to understand the Earth and how it changes over time. UNAVCO analysts dig deep into geodesy data, measuring the Earth in submillimeter detail. UNAVCO has more than 30 years of data, with roughly 1,000 sensors to capture information. Currently, the organization keeps about half a petabyte of data, and data stores grow at about 17 terabytes per quarter.
But while UNAVCO has used its data for successful research projects, it recognized it could do much more if the data were easier to discover, access, and manage. The challenges were so significant that some researchers had abandoned potentially valuable research projects.
“We’ve got sensors pulling in all sorts of data, but it’s not useful if humans can’t interact with it and use it in scientific research,” said Brooks Mershon, a software engineer who builds web-based pedagogical tools for data access and exploration on UNAVCO’s geodetic data services team. “If there is no easy way to comb through time and space queries to get what you want, people tend to give up.”
Discovering data has long posed a major challenge for the organization. The issue began in the 1990s, when UNAVCO installed GPS sensors around the western part of the United States, as well as in Greenland and Antarctica. The sensors created a large network, which then started to receive massive amounts of data.
“When the data started pouring in, it became apparent that it would be difficult to find what you needed for any type of experiment,” Mershon said. “It’s like having a library where all of the books have been tossed on the ground.”
At the time, the only way to provide researchers with the specific data was to go through the data archive manually. The limitations brought the problem to a head. “We needed to find a way to organize [the data] sensibly, to serve it to anyone who asked for it, and, ideally, to let people find the data they needed themselves,” he said.
Another issue that held UNAVCO back is that its different types of data followed different standards. Trimble receivers, for example, which UNAVCO depends on to gather data, use a proprietary standard to encode GPS satellite data.
UNAVCO must continue to support plenty of other legacy standards, as it is hard to fathom how a single standard would ever work for all data.
To the Cloud and Beyond
UNAVCO launched several initiatives to improve its data discovery, access, and management capabilities.
The first step was to migrate data from data centers in Boulder, Colo., to the cloud. Today, UNAVCO uses cloud storage for most of its data.
Next, UNAVCO looked for a tool that would organize data and streamline data access, making it more available for researchers. The organization settled on Observable, which enables organizations to develop data libraries and build data visualizations, interactive data applications, and analytics dashboards. Observable uses the concept of computational “notebooks” composed of spreadsheet-like cells, which are building blocks for projects and can contain charts, images, and text.
Most importantly, perhaps, the Observable platform encourages users to explore how other users have worked with data. “If you see someone else solving a problem similar to yours, you can dive in and try to understand how they did it and transplant parts of functionality into what you’re doing,” Mershon noted. “You end up attacking ideas you wouldn’t even try without the community of input.”
New Ways of Working
With Observable in its toolset, UNAVCO’s data access, discovery, and management capabilities have reached a new level, Mershon said. For example, Mershon’s group now uses Observable to build live demos of data APIs, visual tools to query data, and exploratory dashboards.
“[Observable] provides a low-friction playground for testing out ideas that more often than not involve hitting web services that our backend team creates that pull data from archival tables,” Mershon explained. “Our web services [are] easy to play with when we can create interactive visualizations with user inputs … in an Observable Notebook.”
The feedback loop between accessing a dataset and getting a working prototype that can be shared via a URL is much faster now, he added. “I can have an idea on Monday and send a URL out on our Slack communication channel by Wednesday to see if teammates want to weigh in.”
The platform’s data visualization features also help users explore data. “Very clever people keep coming up with great ways to interact with quantitative data to facilitate rather complex qualitative conclusions,” Mershon said. “It’s my job to use the tricks I see in the wild, along with my own tricks, to create better ways to help users interact with data.”
For example, Mershon tends to use interactive maps with colors, bar charts, histograms, and movable polygons. One project he is particularly proud of is an interactive globe that enables users to view an animated timeline of station maintenance for all stations feeding data to the scientific community. The tables are searchable, and users can easily see the amount of maintenance needed across time from various stations. When experienced staffers request a deeper dive into the data, Mershon can now more easily and quickly deliver it.
For another project, Mershon was asked by UNAVCO’s president to build a tool that shows which stations are located on indigenous territories. Mershon combined UNAVCO’s own data with that from Native Land, an application for mapping indigenous territories. Using Observable and the Mapbox mapping platform, he created a mockup in four hours. He completed the project within two days.
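A station-to-territory lookup of this kind comes down to a point-in-polygon test. The sketch below is illustrative only and is not Mershon’s code: Observable notebooks are written in JavaScript, while this example uses Python to show the logic. The station IDs, coordinates, and territory outline are made up, and the sketch assumes each territory is available as a simple polygon of (longitude, latitude) vertices. It also treats coordinates as planar, which is an approximation that a production tool would likely replace with a geospatial library.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: cast a ray east from the point and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through the point can cross the ray
        if (y1 > y) != (y2 > y):
            # Longitude at which this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def stations_by_territory(
    stations: Dict[str, Point],
    territories: Dict[str, List[Point]],
) -> Dict[str, List[str]]:
    """Map each territory name to the sorted IDs of stations that fall inside it."""
    return {
        name: sorted(
            station_id
            for station_id, location in stations.items()
            if point_in_polygon(location, polygon)
        )
        for name, polygon in territories.items()
    }


# Hypothetical sample data, not real UNAVCO stations or Native Land boundaries
stations = {
    "STN1": (-105.0, 40.0),
    "STN2": (-110.5, 36.5),
    "STN3": (-120.0, 45.0),
}
territories = {
    "Territory A": [(-111.0, 36.0), (-110.0, 36.0), (-110.0, 37.0), (-111.0, 37.0)],
}

matches = stations_by_territory(stations, territories)
```

With this sample data, only the hypothetical station STN2 lies inside the square outline of “Territory A,” so it is the only station listed for that territory.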
With the right tools, UNAVCO can now accomplish many of its goals, including expanded data access for geoscientists, support for research institutions to create their own visualizations, and the ability to rapidly develop prototypes to test ideas.