Last August at a big open source conference at the Vancouver Convention Center, Ben Golub, whose name you most likely associate with Docker, spoke about a program the startup he currently leads had just launched to get open source companies to help draw more users to its platform.
This week he came to the Hilton San Diego Bayfront, the conference’s venue this year, to talk about the platform’s new design and announce a beta launch of this new version.
The company, where Golub is executive chairman and interim CEO, is Storj Labs, and its ambition is nothing short of taking on Amazon Web Services in the cloud storage market. But unlike other AWS challengers – the likes of Microsoft Azure and Google Cloud Platform, which have been spending tens of billions of dollars annually to build out global networks of hyperscale data centers – Storj (pronounced the same as “storage”) has a plan for beating Jeff Bezos’s cloud giant without a single server farm of its own.
“We’re aiming to be the world’s largest cloud storage provider without owning a single disk drive,” Golub, former CEO of Docker, Inc., the company that was at the forefront of the Linux-container revolution, said in an interview with Data Center Knowledge.
Storj is an IT-infrastructure take on the “sharing economy.” Much like Uber became the world’s largest taxi service without owning a single car, and Airbnb became the largest short-term room rental company without owning a single hotel, Storj wants to crowdsource the world’s unutilized data storage space to outgun the cloud behemoths. “We’re Airbnb but for disk drives,” Golub likes to say.
Most hard drives in homes, offices, and data centers are less than 25 percent full, he reckons. With some clever software – and the power of persuasion to convince enough people to give it access to their computers – Storj wants to turn that capacity into a single pool of storage that’s bigger and more distributed than anything AWS can ever build on its own.
That’s the vision. Today, the company is only in the earliest stages of implementing the plan, and so far, not surprisingly, it hasn’t been a smooth ride.
Storj managed to accumulate 150 petabytes of capacity across about 100,000 “nodes,” or individual hard drive owners, in 180 countries for the last version of the platform (V2), but that version had to be scrapped. It didn’t perform well enough, wasn’t durable enough, and didn’t yield the right economics, Golub explained.
V3, the version that’s going into beta this week after a year in alpha, is better in all those areas, but so far, there are only about 1,500 nodes on the new network – largely people who were part of the previous one – totaling about 4 petabytes. But Golub isn’t rushing into production, well aware of the low appetite for poorly implemented storage services in the market.
“We will release when we’re ready,” he said. “In storage, you only get one chance to do it right.”
Away with Replication
A big part of what “getting it right” meant in V3 was rethinking availability. V2 replicated each file a user stored eight times, consuming eight times more capacity than the amount of data users needed to store. That architecture wasn’t just uneconomical – it couldn’t get to exabyte scale, Golub explained.
Using a technique called erasure coding, the new network breaks up each uploaded file into 80 pieces, any 30 of which can be used to put the whole file back together. Each piece is stored on a separate hard drive, owned by a single “operator.” That means you can lose as many as 50 drives storing your file without any effect on its availability. The likelihood of that kind of outage, where hard drives in 50 different locations suddenly go down, is very close to zero. (Storj figures there’s a 5 percent chance of losing a single node on its network, but the chance of losing 50 is 0.05^50, “which is a ridiculously low number,” Golub said.)
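The any-30-of-80 property Golub describes is characteristic of Reed-Solomon-style erasure codes. The sketch below is illustrative only – it is not Storj’s implementation, and it uses toy parameters (any 3 of 8 pieces, over the small prime field GF(257)) – but it shows the core idea: the data defines a polynomial, each piece is one evaluation of it, and any k evaluations pin the polynomial down. It also runs the durability math quoted above as a binomial tail rather than the simpler 0.05^50 figure.

```python
from math import comb

P = 257  # a prime just large enough to hold byte values 0..255

def interpolate(points, x):
    """Lagrange-interpolate the polynomial's value at x from known (xi, yi) points, mod P."""
    total = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P-2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(data, n):
    """Encode k data symbols into n pieces; the first k pieces equal the data (systematic)."""
    base = list(enumerate(data))  # k points define a degree-(k-1) polynomial
    return [(x, interpolate(base, x)) for x in range(n)]

def decode(pieces, k):
    """Recover the original k symbols from ANY k surviving pieces."""
    return [interpolate(pieces[:k], x) for x in range(k)]

data = [83, 116, 106]                          # k = 3 symbols
pieces = encode(data, 8)                       # n = 8 pieces on 8 "drives"
survivors = [pieces[1], pieces[4], pieces[7]]  # lose any 5 of the 8
assert decode(survivors, 3) == data

# Durability: a file is lost only if MORE than n - k pieces fail. With a 5%
# chance of losing any one node, for Storj's stated n = 80, k = 30:
q, n, k = 0.05, 80, 30
p_loss = sum(comb(n, f) * q**f * (1 - q)**(n - f) for f in range(n - k + 1, n + 1))
assert 0 < p_loss < 1e-40   # vanishingly small, as the article says
```

Production erasure codes work over GF(256) with optimized arithmetic rather than Lagrange interpolation over a prime field, but the reconstruction guarantee is the same.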
The approach also provides a thick layer of security. User files are encrypted as they’re uploaded, with the users’ own encryption keys (“We can never see what you’re uploading, nobody else can.”). And even if hackers could somehow decrypt the data, they would first have to identify at least 30 hard drives around the world storing pieces of a specific file – and gain access to all 30 – just to put that file together.
Challenging AWS on Price and Performance
Storj aims to provide its service, called Tardigrade, at about half the price of Amazon’s S3 cloud storage service, Golub said. The cost comparison includes data egress charges, which is what a cloud provider charges you for the network bandwidth your applications consume when accessing the data stored in its cloud. It’s a famous “hidden” cost of using big cloud providers that’s not immediately apparent when you browse through pricing pages on their websites.
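To see why egress matters to the comparison, here is a back-of-the-envelope sketch. The rates are assumptions on my part – approximate published S3 Standard (US East) list prices from around the time of this article, not figures from Storj or the interview – but they illustrate how transfer-out fees can dominate the storage line item:

```python
# Assumed, approximate S3 Standard (US East) list prices circa 2019 -
# illustrative only, not quoted in the article:
S3_STORAGE_PER_GB_MONTH = 0.023   # storage rate, $/GB-month
S3_EGRESS_PER_GB = 0.09           # internet data-transfer-out rate, $/GB

def monthly_cost(stored_gb, downloaded_gb):
    """Total monthly bill: storage plus egress for data read back out."""
    return stored_gb * S3_STORAGE_PER_GB_MONTH + downloaded_gb * S3_EGRESS_PER_GB

# Storing 1 TB and reading it back out once in the month:
cost = monthly_cost(1024, 1024)   # ~$23.55 storage + ~$92.16 egress = ~$115.71
```

On these assumed rates, egress is roughly 80 percent of the bill for a workload that reads its data back once a month – which is why a price comparison that ignores egress understates the real cost of the hyperscale clouds.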
There are about 150 alpha users on Tardigrade, and the platform already beats AWS in performance in some cases, Golub claimed: “On both uploads and downloads, we’re seeing speeds that are anywhere from on par with AWS to two to four times faster.”
And he expects performance to get better over time, as more node “operators” sign up and make the platform more distributed.
Lend Your Hard Drive
So, how does Storj recruit operators? According to Golub, there hasn’t been a shortage of people willing to make some cash on vacant disk space they have sitting inside their desktops or in data centers. The challenging part is finding operators that tick the right boxes for Storj.
To earn the roughly $5 per terabyte of space on your computer per month, you have to prove that the machine is constantly powered on and connected to the internet. This naturally excludes smartphones and laptops from consideration, leaving only desktops and data centers. Storj puts every new operator through a probation period, storing only test data on their hardware for a month before making it part of the live network.
That $5 per terabyte per month is an approximation. Operators get paid based on the amount of storage used and the amount of bandwidth consumed to access the stored data. If you store 2 terabytes for 15 days, for example, you get paid the same amount you would for storing 1 terabyte for a full month, which works out to roughly $5 plus bandwidth fees.
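The proration described above is just terabyte-months. A minimal sketch of the storage side of the payout (the $5 rate is the article’s approximate figure; the 30-day month and the function name are my assumptions, and bandwidth fees are left out):

```python
RATE_PER_TB_MONTH = 5.00  # approximate rate from the article, in USD

def storage_payout(terabytes, days, days_in_month=30):
    """Prorated storage payout: capacity x time, priced per terabyte-month.

    Bandwidth (egress) fees are paid on top of this and are not modeled here.
    """
    tb_months = terabytes * days / days_in_month
    return tb_months * RATE_PER_TB_MONTH

# 2 TB for 15 days is one terabyte-month - the same payout as 1 TB for a full month:
assert storage_payout(2, 15) == storage_payout(1, 30) == 5.0
```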
But you don’t get $5 cash. What you get is the $5 equivalent in STORJ, the company’s own cryptocurrency, which runs on top of the Ethereum blockchain. A full-fledged cryptocurrency, trading on the market the same as Bitcoin or Ethereum itself, it is Storj Labs’ preferred way to transact. As a customer, you can choose to pay in cash to use the distributed storage cloud, but everyone else, including node operators and partners, gets paid in STORJ.
The biggest advantage of using cryptocurrency for a business like Storj is it makes paying lots of people around the world – reminder: there were about 100,000 node operators on the V2 network – a lot easier than with regular currency, Golub explained. Additionally, the company is using the blockchain for smart contracts with all its node operators and partners.
Why create a whole new cryptocurrency to solve that problem and not use an existing one? According to Golub, “it helps align resources and economics and governance with our project and value proposition.” It’s similar to “why Norway would use their own currency rather than, say, the Euro.”
As of early Thursday afternoon Pacific Time, STORJ was trading at just under 15 cents. After being tied fairly closely to Bitcoin and Ethereum prices in the early stages, the token’s price fluctuations are now increasingly driven by the supply-demand dynamics of the data storage service, Golub said. The company created 425 million STORJ tokens, and will not create any more, he said.
Data Centers on the Network
There are a “handful of data centers” among the current node operators on the new network, Golub said, describing them as enterprise and research data centers. Like individual desktop users, data center operators sign up to recover capital and operational costs “locked” in unused storage capacity they have.
Capacity overprovisioning remains common practice for data center operators, and it’s easy to see the appeal of getting paid for spare space on disks that are spinning in your racks around the clock anyway. The biggest concern these operators are likely to have is security, since participating means bringing an additional third party onto their networks. According to Golub, however, it shouldn’t be a worry, since the data they store is encrypted and inactive – Storj doesn’t run applications on the operators’ infrastructure. “You’re essentially getting encrypted blobs of data,” he said.
Playing Nice With the Open Source Community
One clever way Storj is growing its network is the reason Golub came to speak at this and last year’s Open Source Summit by the Linux Foundation.
Having led Docker for four years, and prior to that Gluster (acquired by Red Hat in 2012), he is deeply familiar with the mechanics of the open source software world. And one of the biggest issues that world has been grappling with lately has been cloud giants like Amazon taking popular open source projects and turning them into cloud services, thereby making it extremely difficult for startups to build businesses around those projects. Doing this without “giving back” to the open source community is a common accusation leveled particularly at AWS.
Storj, all of whose software is open source, positions itself as a better steward of the open source ethos – and not in words only. Other open source companies can partner with the startup and get paid when their users store data in the Storj cloud. The company splits 40 percent of revenue from those users with the partners, spending the rest on payouts to node operators. This is its way of giving back to the open source community, and Golub is quick to use it to draw a contrast between Storj and AWS.