Personal information belonging to more than 198 million registered U.S. voters was exposed in what security researchers at UpGuard are calling the largest known data exposure of its kind.
Republican data firm Deep Root Analytics, which was working on behalf of the Republican National Committee (RNC), is taking full responsibility for the disclosure, which was made possible by an unsecured Amazon Web Services S3 bucket. The data was exposed after the company updated its security settings on June 1, according to Deep Root founder Alex Lundry.
The misconfigured database contained 1.1 terabytes of downloadable personal information compiled by Deep Root Analytics and at least two other Republican contractors, TargetPoint Consulting, Inc. and Data Trust. An additional 24 terabytes of data were stored in the warehouse but had been configured to prevent public access. According to UpGuard, these firms were paid in excess of $5 million, and “were among the RNC-hired outfits working as the core of the Trump campaign’s 2016 general election data team, relied upon in the GOP effort to influence potential voters and accurately predict their behavior.”
The data included names, dates of birth, home addresses, phone numbers, and voter registration details. Perhaps more troubling, data on voters’ “likely political preferences using advanced algorithmic modeling across forty-eight different categories” was also among the downloadable files, UpGuard said. The data is current, too, having been last updated around January 2017, when Trump was inaugurated.
UpGuard cyber risk analyst Chris Vickery discovered the data on June 12 while searching for misconfigured data sources on behalf of the company’s research unit. The data was secured on June 14 after Vickery notified federal authorities.
According to a blog post by UpGuard, “anyone with an internet connection could have accessed the Republican data operation used to power Donald Trump’s presidential victory, simply by navigating to a six-character Amazon subdomain: ‘dra-dw.’”
“Upon inspection of the contents, ‘dra-dw’ is shown to stand for ‘Deep Root Analytics Data Warehouse.’ The concept of a ‘data warehouse’ is common in modern business— essentially, it is a massive collection of data prepared specifically for complex analysis,” UpGuard said.
The absence of basic security practices in this case is troubling, particularly at an organization that handles this volume of data. Before contracting a third-party consultant, particularly one that will handle any type of personally identifiable data, an organization must understand what safeguards the consultant has in place and, if necessary, what additional measures are needed to secure the data.
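One safeguard an organization could ask a contractor to demonstrate is a routine check for storage that is readable by the public. The article does not say which setting was wrong in this case, so the following is only a minimal sketch under an assumption: that the bucket's access control list has already been fetched in the shape returned by the S3 GetBucketAcl call. The function name and the example ACL are hypothetical; the two group URIs are the ones AWS uses for "everyone" and "any AWS account holder."

```python
# Group URIs that AWS uses to mean "the whole internet" and
# "anyone with an AWS account".
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTH_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"


def public_grants(acl):
    """Return (group, permission) pairs that open a bucket to broad access.

    `acl` is assumed to be a dict shaped like a GetBucketAcl response,
    i.e. it carries a "Grants" list of grantee/permission entries.
    """
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        uri = grantee.get("URI")
        if grantee.get("Type") == "Group" and uri in (ALL_USERS, AUTH_USERS):
            # Keep only the short group name, e.g. "AllUsers".
            found.append((uri.rsplit("/", 1)[-1], grant.get("Permission")))
    return found


# Hypothetical ACL for illustration; not taken from the exposed bucket.
example_acl = {
    "Owner": {"ID": "owner-id"},
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": ALL_USERS},
         "Permission": "READ"},
    ],
}

print(public_grants(example_acl))  # [('AllUsers', 'READ')]
```

An empty result does not prove a bucket is private, since bucket policies and account-level settings can also grant access; the check covers ACL grants only.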
The publicly accessible files included two directories called “data_trust” and “target_point.”
This is what was found in the first folder:
Within “data_trust” are two massive stores of personal information collectively representing up to 198 million potential voters. Consisting primarily of two file repositories, a 256 GB folder for the 2008 presidential election and a 233 GB folder for 2012, each containing fifty-one files - one for every state, as well as the District of Columbia. Each file, formatted as a comma separated value (.csv), lists an internal, 32-character alphanumeric “RNC ID”—such as, for example, 530C2598-6EF4-4A56-9A7X-2FCA466FX2E2—used to uniquely identify every potential voter in the database. These RNC IDs uniquely link disparate data sets together, combining dozens of sensitive and personally identifying data points, making it possible to piece together a striking amount of detail on individual Americans specified by name.
In the second folder, here’s what the researchers found:
The contents of the “target_point” folder were even more intrusive than those of the Data Trust repository, if less obviously intimidating at first glance: fourteen files saved in the Alteryx Database format (.yxdb), a file format designed specifically for large-scale data analysis. Most of the files were last updated in mid to late-January 2017, with several labeled as “Contact File,” with different dates signifying when they were updated.
Contained within these “Contact File” spreadsheets are the aforementioned 32-character alphanumeric RNC IDs for 198 million potential American voters, as well as the corresponding names and addresses of the voters. The clear linkage between every RNC ID and the name and identifying personal details of all 198 million people ensures all data using the RNC ID as an identifier can be tied back to the person’s real name.
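To see why a shared identifier matters, consider a simplified sketch of the linkage UpGuard describes. Everything here is hypothetical: the IDs, names, field names, and scores are made up for illustration and do not reflect the actual file layouts. The point is only that once one table maps an ID to a name, any other table keyed on the same ID can be tied to that name with a simple lookup.

```python
# Hypothetical "contact file": identifier -> name and address.
contact_file = {
    "ID-0001": {"name": "Jane Example", "address": "1 Sample St"},
    "ID-0002": {"name": "John Placeholder", "address": "2 Demo Ave"},
}

# Hypothetical modeling output that carries only the identifier.
model_scores = [
    {"rnc_id": "ID-0001", "category": "issue_a", "score": 0.82},
    {"rnc_id": "ID-0002", "category": "issue_a", "score": 0.17},
    {"rnc_id": "ID-0003", "category": "issue_a", "score": 0.50},
]


def link_by_id(contacts, rows, key="rnc_id"):
    """Attach name and address to every row whose ID appears in contacts."""
    linked = []
    for row in rows:
        person = contacts.get(row[key])
        if person is not None:
            # Merge the personal details with the otherwise anonymous row.
            linked.append({**person, **row})
    return linked


for record in link_by_id(contact_file, model_scores):
    print(record["name"], record["category"], record["score"])
```

Rows whose ID has no entry in the contact file (such as ID-0003 above) stay unlinked, which is why the completeness of the contact files is what makes the rest of the data identifiable.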
Dan O'Sullivan, an analyst with UpGuard, said, "Beyond the almost limitless criminal applications of the exposed data for purposes of identity theft, fraud, and resale on the black market, the heft of the data and analytical power of the modeling could be applied to even more ambitious efforts - corporate marketing, spam, advanced political targeting. Any of these potential misuses of private information can be prevented, provided stakeholders obey a few simple precepts in collecting and storing data."
Deep Root has hired third-party cybersecurity and forensics firm Stroz Friedberg to investigate. Deep Root does not believe the data was accessed by malicious third parties in the 13 days it was publicly accessible, but says the investigation will answer that definitively.