Protecting valuable data has never been a simple proposition. However, as data boundaries have blurred and machine learning (ML) and other forms of artificial intelligence (AI) have taken root, the ability to manage and safeguard data has become exponentially more complicated for security teams.
"No longer is there a central database to lock down," says Eliano Marques, executive vice president of data and AI at data security firm Protegrity.
To be sure, the Internet of Things (IoT), application programming interfaces (APIs), cloud-based analytics, AI, ML, mobile computing, and other tools continue to radically reshape the data landscape by altering computational and security requirements. Not only are data volumes growing, but data sprawl is also increasingly common. In many cases, training algorithms extend across multiple data sets residing in different clouds and at different companies. There's a growing need to combine and adapt data dynamically.
A 2020 IDC report projects that 175 zettabytes of data will exist by 2025. And according to PwC's "CEO Survey," 61% of chief executive officers plan to further digitize core business operations in 2021.
"We see deals and alliances escalating over the next four years as companies compete to control data along their value chain," states Jay Cline, US privacy leader and principal at PwC US.
Beyond the Database
A growing desire to compute in the cloud and on the edge has profound ramifications. For one, organizations must accommodate increasingly stringent data regulations, such as the EU's General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA) and the US Health Insurance Portability and Accountability Act (HIPAA). Yet there's also a need to unlock the full value of data within supply chains and business partnerships, without revealing trade secrets and personally identifiable information (PII).
"AI and machine learning introduce entirely new and different computational and cybersecurity requirements that are tightly linked," Marques explains. "It's impossible to constantly adjust data protection for different data-localization requirements and data-sharing frameworks. You can't have teams assessing every situation individually and using an array of tools and technologies."
Organizations must focus on "multiple data security and privacy risks as data flows across devices, systems, and clouds," says Rehan Jahil, CEO of cybersecurity firm Securiti. This includes internal risks, external risks, and third-party access to sensitive data.
A particularly nettlesome problem, Jahil warns, is that organizations often capture a Privacy Impact Assessment (PIA) that's nothing more than a "snapshot in time." This method of identifying and managing privacy risks may work well within traditional waterfall development models, but "modern, agile software development processes involve frequent code updates, which can render these PIAs obsolete by the time they are written," he says.
As a result, a more automated, API-driven approach to building privacy into the development process is needed, Jahil adds.
Adding to the complexity is ensuring that AI and data are used ethically, Marques points out. Secure AI comprises two key categories, he says: responsible AI and confidential AI. Responsible AI focuses on regulations, privacy, trust, and ethics related to decision-making using AI and ML models. Confidential AI involves how companies share data with others to address a common business problem.
For example, airlines might want to pool data to better understand maintenance, repair, and parts failure issues but avoid exposing proprietary data to the other companies. Without protections in place, others might see confidential data. The same types of issues are common among healthcare companies and financial services firms. Despite the desire to add more data to a pool, there are also deep concerns about how, where, and when the data is used.
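The article does not name a specific technique for this kind of pooling. As one illustrative possibility, the sketch below uses additive secret sharing, in which each company splits its private figure into random-looking shares so that only the combined total can be reconstructed. The function names, the airline labels, and the failure counts are all hypothetical, and a real deployment would need secure channels, protection against colluding parties, and much more.

```python
import secrets

# Public modulus for all arithmetic (an assumption; any value larger than
# the largest possible total would work for this sketch).
MODULUS = 2**61 - 1


def split_into_shares(value, n_parties):
    """Split value into n shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def pooled_total(private_values):
    """Compute the sum of all private values from shares alone."""
    n = len(private_values)
    # Each party splits its own value and hands one share to every party.
    all_shares = [split_into_shares(v, n) for v in private_values]
    # Party j adds up the shares it received and publishes only that sum.
    partial_sums = [sum(row[j] for row in all_shares) % MODULUS for j in range(n)]
    # Combining the published partial sums yields the total and nothing else.
    return sum(partial_sums) % MODULUS


# Hypothetical per-airline counts of a particular part failure.
failures = {"airline_a": 12, "airline_b": 7, "airline_c": 30}
total = pooled_total(list(failures.values()))
print(total)
```

Each individual share is uniformly random, so no single share reveals anything about the airline that produced it. Only the final sum is meaningful.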
In fact, complying with regulations is merely a starting point for a more robust and digital-centric data management framework, Jahil explains. Security and privacy must extend into a data ecosystem and out to customers and their PII. For example, CCPA has expanded the concept of PII to include any information that may help identify the individual, like hair color or personal preferences.
Unfortunately, many IT and security leaders — and their teams — view regulations and internal processes designed to manage and secure data as additional red tape, slowing processes and innovation. Nothing could be further from the truth, Marques says.
"When organizations have the right tools, technologies, and automation in place, it's possible to speed projects and address gaps," he says. "There's no need to discuss and verify every single project."
Dynamic data discovery capabilities for both structured and unstructured data are paramount, PwC's Cline says.
"This will become the norm across sectors in the future," he says. Yet the process cannot stop there. "Data classification is becoming too complex to be a [categorization] exercise because of the increasingly diverse way that privacy regulations around the world classify data," he adds.
This points to the need for specific data protection tools and technologies. These include data pseudonymization and tokenization technologies that replace one or more actual data entries or an entire record with identifiers; anonymized, non-reversible data-sharing that strips out sensitive data or PII; cloud protection tools; and emerging cryptography frameworks such as homomorphic encryption, which makes it possible to process and analyze encrypted data without decrypting it.
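To make the first two ideas concrete, here is a minimal sketch of both, using only the Python standard library. It assumes a keyed hash (HMAC-SHA256) for pseudonymization and a simple in-memory lookup table for tokenization. The function and class names, the `tok_` prefix, the demo key, and the sample record are hypothetical and do not come from any vendor mentioned here. Production systems would add key management, access control, and durable storage.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value, key):
    """Return a deterministic keyed pseudonym for a sensitive value.

    The same value and key always give the same pseudonym, so records can
    still be joined, but the original is not recoverable without the key
    and a guess at the input.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


class TokenVault:
    """Hypothetical in-memory vault: random tokens that can be reversed."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so one value always maps to one token.
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token):
        # Only a holder of the vault can map a token back to the original.
        return self._token_to_value[token]


key = b"demo-key-not-for-production"  # assumption: real keys live in a KMS
record = {"name": "Alice Example", "email": "alice@example.com", "plan": "premium"}

# Replace the identifying fields and leave the non-sensitive field intact.
safe_record = {
    **record,
    "name": pseudonymize(record["name"], key),
    "email": pseudonymize(record["email"], key),
}

vault = TokenVault()
card_token = vault.tokenize("4111-1111-1111-1111")
```

The difference in design is reversibility: the keyed hash is one-way, while the vault deliberately keeps a mapping so that an authorized system can recover the original value.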
Another emerging resource is tooling built on ML libraries such as PyTorch and scikit-learn, which can embed security and privacy protections at the code level. In some cases, it may also be possible to use formal software verification techniques. These methods rely on mathematical proofs to produce components and applications free from the coding errors that lead to bugs, exploits, and privacy breaches. This technique is now used by Amazon, Microsoft, and Google for niche components.
Jahil says the end goal is a "privacy by design" framework, along with automated workflows that integrate real-time data intelligence with granular insights about how personal data is used and protected within an organization and across business partners. Using robust reporting and snapshots, this approach allows an enterprise to accurately map data and use a more automated, API-driven approach to building data security and privacy into development and the data management process.
Organizations that build the right AI security framework — which typically includes reorganizing data governance, AI, and privacy functions under a single enterprise leader or authority — are well-positioned to unlock the full value of data, while protecting it and boosting data privacy, according to Cline.
"Within this type of model, the privacy strategy enables the data and AI strategies, which in turn support the overall business strategy," he says.