GitHub, the code repository that's home to a large portion of the world's open source software projects, has been busy physically locking code up as a hedge against calamity.
They're burying it 750 feet underground in a repurposed coal mine on Spitsbergen, an island in Norway's arctic Svalbard archipelago. On the same mountain is the Global Seed Vault, where over a million seed samples are safely locked away so that, if a global disaster upsets the balance of biodiversity or wipes out valuable food sources, we'll have the resources to reboot.
GitHub's projects, the Archive Program and Arctic Code Vault, serve a similar function, except the focus is on the open source code that's become essential to the modern world.
The old coal mine houses the Arctic World Archive, a joint venture of the Norwegian data storage company Piql and Store Norske, a state-owned coal mining company. AWA is a store of digitized artifacts, such as the Vatican archives, movies, political histories, digitized art, scientific information -- and now open source code -- tucked away for protection against the unknown.
In a way, the GitHub Archive Program and GitHub Arctic Code Vault are two parts of a single project, although the Archive Program has some separate projects it performs for GitHub's parent company, Microsoft. For this project, the Archive Program is tasked with saving all public-facing open source code on GitHub, not only in the Arctic Code Vault but in other locations as well, and with making sure the code can be made useful even in a world without computers or an understanding of software.
"We will protect this priceless knowledge by storing multiple copies, on an ongoing basis, across various data formats and locations, including a very-long-term archive designed to last at least 1,000 years," is how GitHub explains it.
In November GitHub announced it had archived and deposited in the code vault an initial 6,000 of its most popular repositories as a proof of concept. The code is stored on silver halide on polyester film developed specifically for the AWA by Piql, utilizing images that look like small QR codes. These images are high-density, however, with each frame containing 8.8 million microscopic pixels.
"It can withstand extreme electromagnetic exposure and has undergone extensive longevity and accessibility testing," Piql said in a statement about the film, which is also being used for other AWA projects.
The arctic location of AWA's storage facility guarantees that even in the event of a long-term power failure, the temperature in the vault will remain below freezing, low enough to preserve the vault's contents for decades or longer. For further protection, the film is stored in a steel-walled container inside a sealed chamber.
In early February, satisfied with the results of the trial run, GitHub took a snapshot of all active public repositories on its site to archive in the vault. After that, Piql took the resulting 21 TB of data and wrote it to 186 reels of piqlFilm, with each reel holding a kilometer of film.
"Our original plan was for our team to fly to Norway and personally escort the world’s open source code to the Arctic," Julia Metcalf, GitHub's director of strategic programs, wrote in a recent blog post, "but as the world continues to endure a global pandemic, we had to adjust our plans.
"We stayed in close contact with our partners, waiting for the time when it was safe for them to travel to Svalbard. We’re happy to report that the code was successfully deposited in the Arctic Code Vault on July 8, 2020."
Storing code in a facility designed to survive any unknown catastrophe the future might offer is one thing. Making sure that our grandchildren's grandchildren can understand what it is and how to use it is another. GitHub has already started tackling that problem by including in every reel a human-readable copy of the “Guide to the GitHub Code Vault” in five languages and written with input from the GitHub community.
In addition, the project will eventually add a human-readable reel of film called the Tech Tree, which the company said will consist primarily of existing works selected to provide a detailed understanding of modern computing, open source and its applications, modern software development, popular programming languages, and more.
"It will also include works which explain the many layers of technical foundations that make software possible: microprocessors, networking, electronics, semiconductors, and even pre-industrial technologies," Metcalf explained. "This will allow the archive’s inheritors to better understand today’s world and its technologies, and may even help them recreate computers to use the archived software."
Going forward, GitHub plans to update the software archives every five years or so.