Originally published in Security Boulevard
Every effective PII protection effort addresses three critical imperatives for zero-trust security: data discovery, access governance and risk mitigation. IT teams grappling with privacy mandates need to consider these factors for both their unstructured and structured data. And while regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA, the “California GDPR”) outline expectations for handling personally identifiable information (PII), they aren’t much help when it comes to the tactics you need to succeed. Let’s take a look at some effective strategies – and how they differ – across structured and unstructured data.
Data Discovery
A typical organization stores unstructured data in more than 10 million files, containing everything from marketing and sales material to client contracts, employee insurance and human resources information. Discovering PII in these files remains one of the toughest data security challenges for a zero-trust model, and it’s easy to understand why. It is, on the other hand, a bit harder to understand why structured data discovery can also be tough.
Structured databases should provide an easy map to PII – but database designs often predate modern privacy regulations and, as a result, few production databases were designed with privacy in mind. Sensitive information is often scattered across different databases, in different tables and in different fields. Sometimes PII is duplicated across tables or in unrelated databases. Finding it all can be tougher than you think, but it’s a critical first step: database privacy starts with PII discovery.
Fortunately, emerging automated PII protection tools can help find PII in both structured and unstructured data. In the unstructured data world, rules and end-user data classification programs have long been used in an attempt to identify PII – but they haven’t been effective or manageable. Finding PII across an organization’s databases, on the other hand, is a question of determining which databases and tables contain regulated data, identifying duplication and assessing risk. Recent artificial intelligence innovations show promise in automating discovery for both structured and unstructured data.
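To make the structured-data side concrete, here is a minimal sketch of two of the steps described above: finding which tables hold regulated data and spotting PII duplicated across tables. It assumes a SQLite database and uses two illustrative regular expressions (an email and a US SSN format, in the hypothetical `PII_PATTERNS`) as a stand-in for the AI-based classification the article refers to; the function names are hypothetical. Pattern matching on sampled column values is a baseline, not a substitute for a dedicated discovery tool.

```python
import re
import sqlite3
from collections import defaultdict

# Illustrative detectors only (an assumption, not a recommended rule set).
PII_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def find_pii_columns(conn, sample_size=100, threshold=0.8):
    """Return {(table, column): pii_kind} for every column whose sampled
    text values mostly match one of PII_PATTERNS."""
    findings = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        # Identifiers are quoted naively: fine for a sketch, not for hostile names.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            sample = conn.execute(
                f'SELECT "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL LIMIT ?',
                (sample_size,)).fetchall()
            values = [v for (v,) in sample if isinstance(v, str)]
            if not values:
                continue  # nothing textual to inspect (e.g. a numeric column)
            for kind, pattern in PII_PATTERNS.items():
                hits = sum(1 for v in values if pattern.match(v))
                if hits / len(values) >= threshold:
                    findings[(table, column)] = kind
                    break
    return findings


def find_duplicated_pii(findings):
    """Return {pii_kind: [tables]} for kinds stored in more than one table."""
    tables_by_kind = defaultdict(set)
    for (table, _column), kind in findings.items():
        tables_by_kind[kind].add(table)
    return {kind: sorted(tables) for kind, tables in tables_by_kind.items()
            if len(tables) > 1}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER, email TEXT, note TEXT);
        CREATE TABLE orders (id INTEGER, contact TEXT, total REAL);
        INSERT INTO customers VALUES (1, 'ana@example.com', 'vip'), (2, 'bo@example.org', '');
        INSERT INTO orders VALUES (10, 'ana@example.com', 9.5);
    """)
    findings = find_pii_columns(conn)
    print(findings)
    print(find_duplicated_pii(findings))  # {'email': ['customers', 'orders']}
```

In the demo the same customer email shows up in both `customers.email` and `orders.contact`, which is exactly the kind of duplication that makes structured PII harder to track than the schema suggests.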
Data Access Governance
A clear and complete understanding of who can access PII, and how, is the key to understanding risk and implementing mitigation strategies. But these notions of “who and how” differ quite a bit for structured and unstructured data. For example, large-scale databases supporting web applications, such as those handling ecommerce operations, typically connect those applications to databases via a handful of service accounts, so tracing who has access isn’t usually a problem. Increasingly, though, API connections to databases extend access, sometimes outside the organization itself. Even when it’s simple to determine who has access, each connection needs careful oversight.
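For databases behind web applications, the “who and how” can often be captured in a small inventory of service accounts and API clients. The sketch below assumes such an inventory already exists (the `DbConnection` record and its fields are hypothetical) and cross-references it with a set of PII tables, such as the output of a discovery scan, to show who can reach each PII table and which external connections deserve the closest oversight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbConnection:
    """Hypothetical inventory record for one path into a database."""
    principal: str     # service account or API client name
    kind: str          # "service_account" or "api_client" (informational)
    external: bool     # True if the caller sits outside the organization
    tables: frozenset  # tables the principal can read


def pii_access_report(connections, pii_tables):
    """Map each PII table to the principals that can read it, and list the
    external connections that reach PII and so need the closest oversight."""
    pii_tables = set(pii_tables)
    report = {table: [] for table in sorted(pii_tables)}
    needs_review = []
    for c in connections:
        reachable = c.tables & pii_tables
        for table in sorted(reachable):
            report[table].append(c.principal)
        if c.external and reachable:
            needs_review.append(c.principal)
    return report, needs_review


if __name__ == "__main__":
    inventory = [
        DbConnection("web-app", "service_account", False, frozenset({"customers", "orders"})),
        DbConnection("partner-api", "api_client", True, frozenset({"orders"})),
        DbConnection("analytics", "service_account", False, frozenset({"products"})),
    ]
    report, review = pii_access_report(inventory, {"customers", "orders"})
    print(report)  # {'customers': ['web-app'], 'orders': ['web-app', 'partner-api']}
    print(review)  # ['partner-api']
```

The hard part in practice is keeping the inventory accurate; the cross-referencing itself is simple once the inventory exists.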
Cataloging access for unstructured data is far more complicated. Empowered end users make highly consequential access control decisions, and those decisions are dispersed and ungoverned. Inappropriate sharing with external or personal email addresses, link sharing (especially unprotected or non-expiring links), files stored outside of designated locations, and unclassified files that slip by data loss prevention services are just a few ways data can be lost. Understanding and managing access in this context is an enormous data access governance challenge.
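Several of the sharing behaviors listed above can be expressed as simple checks once sharing metadata has been exported from the file platform. This is a minimal sketch assuming a hypothetical `FileRecord` shape, an organization domain of `example.com`, a short list of personal email domains and a pair of designated folders; real platforms expose this metadata through their own APIs, which are not modeled here. It also assumes `contains_pii` has already been set by a discovery step.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

ORG_DOMAIN = "example.com"                                    # assumption
PERSONAL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}  # assumption
DESIGNATED_PREFIXES = ("/hr/", "/legal/")  # hypothetical approved locations


@dataclass
class SharedLink:
    anyone_with_link: bool           # True means no sign-in is required
    expires: Optional[date] = None   # None means the link never expires


@dataclass
class FileRecord:
    path: str
    contains_pii: bool
    shared_with: list = field(default_factory=list)  # email addresses
    links: list = field(default_factory=list)        # SharedLink objects


def sharing_risks(record: FileRecord) -> list:
    """Return findings for one file; an empty list means no issue was found."""
    if not record.contains_pii:
        return []
    findings = []
    for address in record.shared_with:
        domain = address.rsplit("@", 1)[-1].lower()
        if domain in PERSONAL_DOMAINS:
            findings.append(f"shared with personal address {address}")
        elif domain != ORG_DOMAIN:
            findings.append(f"shared with external address {address}")
    for link in record.links:
        if link.anyone_with_link:
            findings.append("unprotected link")
        if link.expires is None:
            findings.append("non-expiring link")
    if not record.path.startswith(DESIGNATED_PREFIXES):
        findings.append("stored outside designated locations")
    return findings


if __name__ == "__main__":
    rec = FileRecord("/shared/all-hands/benefits.xlsx", True,
                     ["hr@example.com", "someone@gmail.com", "vendor@partner.net"],
                     [SharedLink(anyone_with_link=True)])
    for finding in sharing_risks(rec):
        print(finding)
```

Rules like these catch the obvious cases; they cannot judge whether a given external share is legitimate, which is where the content-aware approaches described next come in.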
As with the data discovery process, recent innovations in AI can clarify who has access and whether PII access is appropriate. Replacing legacy approaches that rely on file locations, pattern-matching rules or end-user document markup, AI can assess risk based on document content and the security practices in use for similar content.
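One way to read “the security practices in use for similar content” is as a peer comparison: files with the same kind of content should be shared with roughly the same audience. The sketch below assumes each file already carries a content category (produced by whatever classifier you use, which is out of scope here) and flags files shared far more widely than the median for their category. The median and the `factor=3` threshold are arbitrary, illustrative choices; this is a simple statistical stand-in, not the AI approach the article refers to.

```python
from collections import defaultdict
from statistics import median


def flag_overshared(files, factor=3):
    """files: iterable of (name, content_category, audience_size).

    Returns the names of files shared with more than `factor` times the
    median audience of files in the same content category."""
    files = list(files)  # iterated twice below
    sizes = defaultdict(list)
    for _name, category, audience in files:
        sizes[category].append(audience)
    typical = {category: median(values) for category, values in sizes.items()}
    # max(..., 1) keeps a zero median from flagging every non-private file.
    return [name for name, category, audience in files
            if audience > factor * max(typical[category], 1)]


if __name__ == "__main__":
    inventory = [
        ("offer_ana.pdf", "hr_offer_letter", 2),
        ("offer_bo.pdf", "hr_offer_letter", 3),
        ("offer_cy.pdf", "hr_offer_letter", 2),
        ("offer_dee.pdf", "hr_offer_letter", 450),
        ("roadmap.pptx", "product_plan", 40),
        ("roadmap_v2.pptx", "product_plan", 60),
    ]
    print(flag_overshared(inventory))  # ['offer_dee.pdf']
```

The median is used rather than the mean so that a single badly overshared file does not drag the “normal” audience for its category upward.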
Risk Mitigation
Security professionals, now armed with a clear understanding of what data they have and where the risks are, can develop more effective PII protection strategies. The tactics for protecting structured and unstructured data are, again, quite different. Here are some key tips for database privacy and risk mitigation:
- Refactor your database to eliminate duplication, clarify data structure and make PII discovery easier for whoever has to do the job once you’re gone.
- Tokenize and/or encrypt sensitive fields to add an extra layer of security on top of your access control best practices.
- Delete what you don’t need. A major PII spill of unneeded years-old data is, to be blunt, an unforced error.
- Explore emerging technologies for API security and granular database access control. Most service accounts currently have very broad access and, consequently, poor API design or implementation can be a weak link. See what you can do to tighten things up.
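The tokenization and deletion tips can be sketched against a SQLite table like the one used for discovery. `tokenize` below uses keyed hashing (HMAC-SHA256), so equal inputs yield equal tokens and joins keep working; note that this is one-way pseudonymization, so if applications need the original values back you would use encryption or a token vault instead. The table layout, the ISO-8601 date column, the in-code key and the seven-year retention period are all assumptions for illustration.

```python
import hashlib
import hmac
import sqlite3
from datetime import date, timedelta

# Assumption: in production this key comes from a KMS or HSM, never from code.
TOKEN_KEY = b"replace-with-a-managed-secret"


def tokenize(value, key: bytes = TOKEN_KEY) -> str:
    """Deterministic keyed token: the same input always maps to the same
    token, but the raw value cannot be recovered from it."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:32]  # 128 bits of the HMAC output


def tokenize_column(conn, table, column):
    """Replace every non-NULL value in table.column with its token."""
    rows = conn.execute(
        f'SELECT rowid, "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL').fetchall()
    conn.executemany(f'UPDATE "{table}" SET "{column}" = ? WHERE rowid = ?',
                     [(tokenize(value), rowid) for rowid, value in rows])
    conn.commit()


def purge_older_than(conn, table, date_column, retention_days, today=None):
    """Delete rows whose ISO-8601 date is older than the retention window;
    returns the number of rows removed."""
    cutoff = (today or date.today()) - timedelta(days=retention_days)
    cur = conn.execute(f'DELETE FROM "{table}" WHERE "{date_column}" < ?',
                       (cutoff.isoformat(),))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (email TEXT, created TEXT);
        INSERT INTO customers VALUES ('ana@example.com', '2015-03-01'),
                                     ('bo@example.org', '2024-06-01');
    """)
    tokenize_column(conn, "customers", "email")
    removed = purge_older_than(conn, "customers", "created",
                               retention_days=7 * 365, today=date(2025, 1, 1))
    print(removed)  # 1
    print(conn.execute("SELECT email, created FROM customers").fetchall())
```

Running the retention purge on a schedule, rather than once, is what keeps years-old data from quietly accumulating again.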
For unstructured data, there are also emerging tactics to consider:
- Strive for least-privilege access control at the file level for all business-critical data.
- Leverage AI-based automation to discover data and assess risk.
- Folder-level security isn’t good enough – in our research we’ve found sensitive files in all-hands folders in nearly every organization.
- Continuously monitor the situation. Users create thousands of new files each year and a one-time audit is not going to cut it.
- Look for ways to enlist your entire security stack in the PII risk management effort. With AI, for example, you can now autonomously assess risk and automatically tag files as sensitive. Those tags help data loss prevention solutions come closer to zero trust security nirvana.
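Continuous monitoring and tagging can be combined in an incremental pass that re-examines only files changed since the last run. In this minimal sketch the `TrackedFile` record, the `BROAD_GROUPS` names and the `is_sensitive` callable are hypothetical; `is_sensitive` is where a real AI classifier would plug in, and the keyword check in the demo is only a placeholder. The `sensitive` tag it adds is the kind of signal a DLP tool could then act on, and files in organization-wide groups are flagged for least-privilege review.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Iterable

BROAD_GROUPS = {"all-hands", "everyone"}  # hypothetical org-wide group names


@dataclass
class TrackedFile:
    path: str
    modified: datetime
    text: str
    groups: set = field(default_factory=set)  # groups with access
    tags: set = field(default_factory=set)    # tags visible to DLP tooling


def incremental_scan(files: Iterable[TrackedFile], since: datetime,
                     is_sensitive: Callable[[str], bool]) -> list:
    """Re-check only files modified after `since`: tag sensitive ones and
    return the paths of sensitive files open to organization-wide groups.
    The caller records the run time and passes it as `since` next time."""
    flagged = []
    for f in files:
        if f.modified <= since:
            continue  # unchanged since the last run
        if is_sensitive(f.text):
            f.tags.add("sensitive")
            if f.groups & BROAD_GROUPS:
                flagged.append(f.path)
    return flagged


if __name__ == "__main__":
    files = [
        TrackedFile("/all-hands/q3-bonus.xlsx", datetime(2024, 5, 2), "employee SSN list", {"all-hands"}),
        TrackedFile("/hr/policy.docx", datetime(2024, 5, 3), "vacation policy", {"hr"}),
    ]
    placeholder_classifier = lambda text: "ssn" in text.lower()  # stand-in only
    print(incremental_scan(files, datetime(2024, 1, 1), placeholder_classifier))
```

Keeping the classifier as a parameter separates the monitoring loop, which is simple, from the sensitivity judgment, which is the hard part the article attributes to AI.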
Data privacy compliance is a complex topic, and each situation depends on the particular data and regulatory environment. A clear understanding of how to discover, assess and protect structured and unstructured data, and of how the two differ, provides the foundation for an effective, manageable program for protecting PII and other regulated data.