With questions of scale, “the cloud” is increasingly the answer (or at least the punchline) to tough computing problems. But when it comes to the Internet of Things, with tens of thousands (or millions) of connected endpoints, how do you scale the cloud?
Meet “the fog,” a concept developed by Microsoft researchers Rimma Nehme and David DeWitt that proposes offloading certain computational tasks to the endpoints themselves, and then uploading only the interesting, aggregate data.
The two presented their thinking at the Day Two Keynote of the PASS Summit, taking place this week in Seattle.
“As a disclaimer, we’re not announcing a product,” DeWitt joked.
Indeed, ask a database professional about their IoT strategy and chances are they’ll roll their eyes rather than lay out a road map: the Internet of Things is still in its early days. But it’s not too early to start figuring out how to manage the influx of connections when a single company potentially goes from tens of thousands of Internet-accessible endpoints to hundreds of thousands or millions.
“Somewhere around 2008, the number of things connected to the Internet exceeded the number of people on Earth,” Nehme told the audience. And while that imbalance will likely become even more skewed in the coming years, device-to-device communication is still in its infancy. “Both on the consumer side and the industry side, it’s for early adopters.”
But now that there’s money to be made, adoption may accelerate quickly.

“The target is ultimately to create some value, or as I put it, make money,” said Nehme, who, perhaps not coincidentally, is finishing her MBA (she also holds a doctorate in computer science).
The two broke down the ways to make that money into three broad categories:
- Unconventional revenues
- Incremental revenues
- Operational efficiency
But however a business finds that value, it often runs into similar bottlenecks: a massive number of endpoints, all with varying levels of accuracy, connectivity, and throughput, all inconsistently pushing their data back to centralized control servers that have to manage authentication and analysis at massive scale.
But there’s a better approach: embrace what they call “the fog,” doing lower-level data crunching at the endpoints themselves, then sending that pre-processed data back to centralized servers or cloud storage for higher-level analysis.

Their approach relies heavily on field gateways, which typically already provide sensors with both the connectivity to the centralized collection point and the signals for when and what information to collect.
As an example, DeWitt referenced a boiler sensor that reads pressure and temperature every second. If one signal that a boiler is near failure is that the pressure averages 1,000 PSI over a minute, then instead of sending a measurement every second to a server, batch the data into 60-second increments and send it on a minute-by-minute rolling basis.
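The batching idea can be sketched in a few lines. This is an illustrative example, not anything the presenters showed; the function name and the summary fields are invented for the sketch. A field gateway would collect a minute of per-second readings, reduce them to a small aggregate record, and forward only that record to the cloud:

```python
from statistics import mean

def aggregate_minute(per_second_psi):
    """Reduce ~60 per-second PSI readings to one summary record.

    Only this aggregate travels upstream; the raw samples stay at
    the edge. (Hypothetical sketch of the batching DeWitt described.)
    """
    return {
        "min_psi": min(per_second_psi),
        "max_psi": max(per_second_psi),
        "avg_psi": mean(per_second_psi),
        "samples": len(per_second_psi),
    }

# One minute of readings hovering around 1,000 PSI
minute = [1000 + (i % 3) for i in range(60)]
summary = aggregate_minute(minute)
```

Sixty raw measurements become one record of four numbers, which is the bandwidth and server-load saving the fog model is after.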
This ties in closely with the idea of real-time analytics, which doesn’t necessarily store the data being analyzed permanently.

“Unlike a traditional database system, there’s no long-term storage of data,” DeWitt said. “It’s the queries that are long-term.” In this case, the query looks for the PSI average over a minute. “The IoT makes streaming databases become really, really important.”
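A long-lived query over short-lived data can be sketched as a sliding window: the only state kept is the last minute of readings, while the query itself runs indefinitely. This is a hypothetical illustration (the function name, threshold, and stream are invented), not the streaming engine DeWitt had in mind:

```python
from collections import deque

def pressure_alerts(readings, window=60, threshold=1000):
    """A long-running 'query' over a stream of per-second PSI readings.

    Yields (second, rolling_avg) whenever the one-minute average
    reaches the threshold. Raw readings older than the window are
    discarded automatically -- the query persists, the data doesn't.
    """
    recent = deque(maxlen=window)  # only the last minute lives in memory
    for second, psi in enumerate(readings):
        recent.append(psi)
        if len(recent) == window:
            rolling_avg = sum(recent) / window
            if rolling_avg >= threshold:
                yield (second, rolling_avg)

# A simulated stream drifting upward toward the danger zone
stream = [900 + t * 2 for t in range(120)]
first_alert = next(pressure_alerts(stream))
```

The `deque(maxlen=window)` does the forgetting: appending a 61st reading silently drops the oldest one, so storage stays constant no matter how long the stream runs.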
This is the key to fog computing: local decisions made out at the edge, then pushed to the cloud.
“We can do better in terms of real time response, in terms of scalability,” said Nehme. “Computation gets pushed into the edge, and only interesting events get pushed back to the cloud.”