Eight years of content that two Department of Defense commands scraped from websites and social media – the equivalent of at least 1.8 billion posts – was sitting in publicly downloadable cloud storage, raising new questions about domestic surveillance and the privacy of U.S. citizens.
According to UpGuard researchers, who disclosed the discovery of the data on Friday, the repository of XML files contained “many apparently benign public internet and social media posts by Americans, collected in an apparent Pentagon intelligence-gathering operation, raising serious questions of privacy and civil liberties.”
A third-party government contractor named VendorX built and operated the data stores, a reminder of the risk that third-party vendors can pose to an organization in the public or private sector.
The content originates from countries around the world, and from a broad array of sources, including Facebook. The posts are in many different languages, but “with an emphasis on Arabic, Farsi (spoken in Iran and Afghanistan), and a number of Central and South Asian dialects spoken in Afghanistan and Pakistan.”
UpGuard discovered the three AWS S3 buckets on Sept. 6, finding an “enormous amount of XML files” of scraped content, as well as a folder called “Coral”, likely in reference to the U.S. Army’s Coral Reef intelligence software.
“The possible misuse or exploitation of this data, perhaps against internet users in foreign countries wracked by civil violence, is a troubling possibility, as is the presence of US citizens’ internet content in buckets associated with US military intelligence operations,” UpGuard said in a blog post detailing its discovery.
The researchers, who have discovered a string of publicly downloadable AWS S3 buckets containing sensitive information in the past several months, said that it is unclear why these posts were collected over an eight-year period.
“Given the enormous size of these data stores, a cursory search reveals a number of foreign-sourced posts that either appear entirely benign, with no apparent ties to areas of concern for US intelligence agencies, or ones that originate from American citizens, including a vast quantity of Facebook and Twitter posts, some stating political opinions. Among the details collected are the web addresses of targeted posts, as well as other background details on the authors which provide further confirmation of their origins from American citizens,” UpGuard said.
Amazon recently released new security and encryption features for AWS S3 that give users additional visibility into when a cloud storage bucket may be publicly accessible.
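For readers who manage their own buckets, one place this kind of exposure can show up is in a bucket's access control list (ACL). The sketch below is illustrative only and is not UpGuard's or Amazon's method: it assumes an ACL dictionary shaped like the response of boto3's `get_bucket_acl` call, and it flags grants to S3's predefined AllUsers and AuthenticatedUsers groups. The function name `public_grants` and the sample data are hypothetical.

```python
# Illustrative sketch: find ACL grants that open a bucket to the public.
# Assumes `acl` has the shape of a boto3 get_bucket_acl() response.

# URIs of S3's predefined groups covering anyone on the internet
# and anyone with an AWS account, respectively.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
PUBLIC_GROUPS = {ALL_USERS, AUTHENTICATED_USERS}


def public_grants(acl):
    """Return the permissions that an ACL grants to public groups."""
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUPS:
            found.append(grant.get("Permission"))
    return found


# Hypothetical sample: the owner has full control, and everyone can read.
sample_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "example-owner-id"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {"Type": "Group", "URI": ALL_USERS},
            "Permission": "READ",
        },
    ]
}

print(public_grants(sample_acl))
```

A non-empty result means the ACL alone exposes the bucket. An empty result is not proof of safety, since a bucket policy can also grant public access and is not examined here.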
On Monday, Amazon launched its AWS Secret Region, becoming “the first and only commercial cloud provider to offer regions to serve government workloads across the full range of data classifications, including Unclassified, Sensitive, Secret, and Top Secret.”