(Bloomberg) -- Facebook Inc. will allow anyone to freely download and use the same artificial intelligence tools it used to make key improvements to the social network’s video and notification features as well as its Messenger messaging app.
The software, which Facebook calls Horizon, will be available on the code repository GitHub starting today, the company said in a blog post Thursday. GitHub is owned by Microsoft Corp.
Facebook used this set of tools internally to optimize how 360-degree videos are displayed on the social network, taking into account such factors as the available bandwidth and how much of the video has already been buffered. The same tools, according to the blog post, were also used to improve the choice of what content to push to users through notifications. And they were used to hone the suggestions that its artificial intelligence assistant, which is called M, makes to users of its Messenger app.
The Horizon software is focused on reinforcement learning, in which software learns from experience, by trial and error, to maximize some reward or minimize some loss, rather than learning from labeled data sets.
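To make the trial-and-error idea concrete, here is a minimal sketch of an epsilon-greedy learner choosing among three options with unknown payoffs. It is a textbook toy, not Horizon's code; the function name `run_epsilon_greedy`, the payoff probabilities and the step count are all assumptions made for illustration.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn which option pays best purely from observed rewards.

    true_means holds the hidden payoff probability of each option; the
    learner never reads it directly and only sees 0/1 rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each option was tried
    estimates = [0.0] * n_arms   # running average reward per option

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random option.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the option that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns a reward of 1 or 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Update the running average for the chosen option.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


# Hypothetical payoff probabilities, chosen only for this example.
estimates, counts = run_epsilon_greedy([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated payoffs:", [round(e, 2) for e in estimates])
print("option judged best:", best_arm)
```

No labeled examples are involved: the learner is never told which option is correct and improves only by acting and observing the reward.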
Reinforcement learning underlies a number of breakthroughs in AI -- most notably the algorithm that beat the world’s top human players at the strategy game Go as well as the ones that are now competitive with humans in complex multiplayer computer games such as Dota 2.
But so far, it’s only rarely been used by businesses to address real-world problems -- in part because, outside of games, it often isn’t wise or safe to let an algorithm learn by trial and error. And, for many real-world phenomena, there aren’t accurate simulators in which an algorithm can be safely trained.
Figuring out what goal to give the algorithm and how to reward it for actions that seem to lead toward that goal while penalizing it for actions that might have adverse consequences is also tricky outside of games, where these elements are often built into the structure of the game.
To overcome some of these limitations, Facebook developed Horizon so that its teams could use reinforcement learning on real problems the company was facing, Srinivas Narayanan, the company’s director of applied machine learning, said in emailed responses to questions. But he said the company now wanted to share the software with others.
“We’re committed to open source, so it was a natural decision to share this latest production-ready system for the community,” Narayanan said.
Facebook follows other AI research groups, including Alphabet Inc.’s DeepMind and Google Brain AI teams, and OpenAI, which have recently made reinforcement learning algorithms, programming tools and test environments publicly available.
Jason Gauci, a Facebook engineer who worked on Horizon, said in an email that Facebook was the first company to make what it calls an “end-to-end” reinforcement learning program designed for addressing large-scale business problems freely available.
Horizon contains several features that make it safer to use reinforcement learning on real-world problems. For instance, the software helps programmers pick the right goal and rewards to feed the algorithm.
Rather than having algorithms start with zero knowledge and learn from random actions, Horizon initially trains algorithms to take a set of actions that a product engineer has specified. It then uses several kinds of counterfactual analysis, based on existing data, to simulate different actions the algorithm could have taken. In this way, Horizon mimics training the algorithm in a simulator, allowing it to be refined without worrying about it wreaking havoc in the real world.
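The article does not say which kinds of counterfactual analysis Horizon uses. One standard technique of this type is inverse propensity scoring, which estimates how a new decision rule would have performed using only logged decisions and their outcomes. The sketch below illustrates that general idea under stated assumptions: the record layout `LoggedStep`, the function `ips_value` and the tiny log are hypothetical, not taken from Horizon.

```python
from typing import Callable, List, NamedTuple


class LoggedStep(NamedTuple):
    context: str         # situation the system was in (hypothetical labels)
    action: int          # action the existing system actually took
    reward: float        # outcome that was observed
    logging_prob: float  # probability the existing system gave that action


def ips_value(log: List[LoggedStep], policy: Callable[[str], int]) -> float:
    """Estimate the average reward a new policy would have earned.

    Only records where the new policy agrees with the logged action
    contribute; each is reweighted by 1 / logging_prob to correct for how
    often the old system chose that action.
    """
    total = 0.0
    for step in log:
        if policy(step.context) == step.action:
            total += step.reward / step.logging_prob
    return total / len(log)


# A tiny made-up log: the old system picked each action half the time.
log = [
    LoggedStep("a", 0, 1.0, 0.5),
    LoggedStep("a", 1, 0.0, 0.5),
    LoggedStep("b", 0, 0.0, 0.5),
    LoggedStep("b", 1, 1.0, 0.5),
]

# Two candidate policies, evaluated without ever being deployed.
good_policy = lambda context: 0 if context == "a" else 1
bad_policy = lambda context: 1 if context == "a" else 0

print(ips_value(log, good_policy))  # 1.0
print(ips_value(log, bad_policy))   # 0.0
```

The logged system earned an average reward of 0.5 on this data, so the estimate suggests the first candidate would do better and the second worse, all without exposing any real user to either one.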
Gauci said that, in general, using an actual simulator would be better than doing this counterfactual analysis. “But for many problems at Facebook, building a simulator is not trivial,” he said. “The team is looking into building simulators from datasets as future work.”
Once the algorithm seems to be working well, Horizon allows users to carry out small-scale online experiments, using real data in real time, and then gradually roll the new algorithm out to larger sets of users or data. This entire process can then be repeated, with the fully trained algorithm being used as the starting point for a new training series.
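A gradual rollout of this kind can be sketched in a few lines. The example below is a generic illustration, not Horizon's mechanism: the stage percentages, the hash-based user bucketing and the names `in_rollout` and `next_percent` are all assumptions.

```python
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    """Decide whether a user falls inside the current rollout percentage.

    Hashing the ID gives each user a stable bucket from 0 to 99, so a user
    who is included at 5% stays included when the rollout grows to 25%.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


def next_percent(stages, current, new_metric, baseline_metric):
    """Pick the next rollout size from the online experiment's result.

    If the new algorithm does worse than the baseline, pull it back to 0%.
    Otherwise move to the next larger stage, or stay put at the last one.
    """
    if new_metric < baseline_metric:
        return 0
    larger = [stage for stage in stages if stage > current]
    return larger[0] if larger else current


# Hypothetical stages and metric values, for illustration only.
stages = [1, 5, 25, 100]
print(next_percent(stages, 5, new_metric=0.31, baseline_metric=0.30))   # 25
print(next_percent(stages, 25, new_metric=0.28, baseline_metric=0.30))  # 0
```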