In Lewis Carroll’s Through the Looking-Glass, Alice and the Red Queen find themselves running in a landscape that’s running with them. A surprised Alice says that, in her country, running gets you somewhere. “A slow sort of country!” replies the Queen. In the Queen’s country, all the running you can do only keeps you in the same place.
In business, this is often referred to as the Red Queen effect, and the same theory is at work in cybersecurity, too. As cybercriminals find new ways to attack, defenders must evolve to keep up. Driven to exploit the growth in remote work and online collaboration, attackers increasingly focus on users and the data they manage.
In a data risk report we produced in the second quarter of 2020, we found that 80% of an organization’s data is unstructured, meaning it’s housed in the files and documents created, controlled and secured primarily by end users. Widespread work-from-home practices have degraded data security, and security professionals know it: Data exfiltration is now the primary concern for CISOs.
Data breaches grew by 273% in the first quarter, with island hopping — where an attacker uses ill-gotten access to move laterally — up 33%. Attacks that start with a compromised user via spear phishing and credential theft are now the most common pathway for data theft. User error is a factor in 22% of breaches, with oversharing (e.g. misrouted messages and inappropriate link sharing) a growing concern.
In response, cybersecurity must evolve in two critical ways. First, organizations must act to help their users protect the data they create and control. It’s no longer sufficient to leave critical data security decisions such as file access privileges, storage locations or sharing practices solely in the end user’s hands. Second, security concepts once reserved for networked resources — specifically, zero trust and least privilege — must now be applied to unstructured data to exponentially raise difficulty levels faced by would-be attackers.
The OWASP Cyber Defense Matrix offers a powerful framework for thinking about the concrete steps we need to take. The model crosses five security domains (devices, applications, networks, data and users) with five activity categories: identify, protect, detect, respond and recover. It also highlights the balance between technology-centric and people-centric activities.
The entire OWASP model is well beyond the scope of this article, so we’ll focus on the three key elements with the highest potential to help us outrun the Queen: data identification, data protection and data monitoring.
Let’s start with a bare-bones maturity model to better understand the journey:
Data identification: Today, content, location and business criticality are typically unknown except to end users. The first step is to inventory and categorize all unstructured data. The goal is comprehensive visibility into file meaning and criticality, regardless of the data's location.
Data protection: Today, data is protected by access control and sharing settings managed by the end user, which are not visible to security professionals. To protect data, start by identifying inappropriate sharing and access grants, notify the users and correct any critical security issues. The end goal is file access and sharing governed by least privilege, with zero trust applied at the file level.
Data monitoring: After identifying and protecting data, the final step is to monitor changes and risk levels that are otherwise not visible to security professionals. Start by defining the critical data to be monitored and establishing continuous coverage. The goal is full visibility into data duplication, risky user activities and exfiltration.
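As one small illustration of the monitoring goal, exact duplicate files can be found by hashing file contents. This is a minimal sketch, not a description of any particular product: the function name find_duplicates and the sample paths are hypothetical, and it only catches byte-for-byte copies, not near duplicates or edited versions.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group file paths by the SHA-256 digest of their content.

    files: dict mapping path -> bytes content (hypothetical sample input).
    Returns a list of path lists; each inner list holds identical files.
    """
    by_digest = defaultdict(list)
    for path, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(path)
    # Keep only digests seen more than once, i.e. real duplicates.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Hypothetical in-memory sample; a real inventory would read from storage.
files = {
    "/finance/q2_plan.xlsx": b"q2 plan contents",
    "/home/alice/q2_plan_copy.xlsx": b"q2 plan contents",
    "/legal/nda.docx": b"nda contents",
}
duplicates = find_duplicates(files)
print(duplicates)
```

Here the two spreadsheet paths are reported together because their contents are identical, while the unrelated document is left out.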
As suggested by the OWASP model, these tasks require a technological approach capable of autonomously processing the millions of documents in routine use by every organization. It’s a tall order, but fortunately, artificial intelligence technologies have evolved to help defenders stay ahead of the Red Queen. Deep learning and natural language processing (NLP) can make two critical contributions to the effort.
Data Discovery And Categorization
NLP offers a scalable, automated way to uncover the meaning of each file under management. The technology is also highly adept at categorizing data and identifying data peer groups. There are often dozens of data categories in any given organization. Discovery and categorization are essential first steps for perfecting unstructured data security. From there, this newfound understanding of document meaning and peer groups becomes the basis for a genuinely new approach to unstructured data security.
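To make the idea of peer groups concrete, here is a toy sketch that groups documents by word overlap. It is a deliberately simple stand-in, assuming bag-of-words vectors and cosine similarity in place of the deep learning and NLP models discussed here. The function names, the 0.5 threshold and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_documents(docs, threshold=0.5):
    """Greedily place each document in the first group whose seed document
    is at least `threshold` similar; otherwise start a new group.

    docs: dict mapping file name -> text.
    Returns a list of groups, each a list of file names.
    """
    groups = []  # list of (seed_vector, [names])
    for name, text in docs.items():
        vec = Counter(tokenize(text))
        for seed, names in groups:
            if cosine(seed, vec) >= threshold:
                names.append(name)
                break
        else:
            groups.append((vec, [name]))
    return [names for _, names in groups]


# Hypothetical sample documents.
docs = {
    "nda_acme.txt": "This agreement governs confidential information shared between the parties.",
    "nda_globex.txt": "This agreement governs confidential information disclosed between the parties.",
    "q2_forecast.txt": "Quarterly revenue forecast and sales pipeline by region.",
}
peer_groups = group_documents(docs)
print(peer_groups)
```

The two agreements land in one group and the forecast in another. A production system would rely on far richer language models, but the output has the same shape: files clustered by what they mean rather than where they are stored.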
Risk Assessment And Monitoring
Once files are categorized into peer groups, we have the necessary foundation for automated risk assessment and ongoing monitoring. Within a group, the security practices followed by the files in aggregate (such as storage locations, sharing practices and access control) create a baseline that can be used to evaluate individual files. If, for example, a peer group of legal contracts is never shared with users outside the legal team, it’s a simple and automatable exercise to find similar contracts that don’t follow that practice.
This is how least privilege can be automated, and how zero trust can be applied at the file level. Otherwise, unstructured data will remain what it is for most organizations today: opaque, unknown and at high risk of compromise.
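The legal contracts example can be sketched in a few lines. This is an illustration under stated assumptions, not a definitive method: the FileRecord type, the find_outliers function and the 80% cutoff (min_share) are hypothetical, and real access data would include users, links and permission levels rather than just team names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    peer_group: str          # category assigned during discovery
    shared_with: frozenset   # teams that can access the file


def find_outliers(files, min_share=0.8):
    """Flag files whose access goes beyond their peer group's baseline.

    The baseline for a group is the set of teams with access to at least
    `min_share` of the group's files. A file is an outlier if it is shared
    with any team outside that baseline.
    """
    by_group = {}
    for f in files:
        by_group.setdefault(f.peer_group, []).append(f)

    outliers = []
    for members in by_group.values():
        counts = Counter(team for f in members for team in f.shared_with)
        baseline = {t for t, c in counts.items() if c / len(members) >= min_share}
        for f in members:
            extra = f.shared_with - baseline
            if extra:
                outliers.append((f.name, sorted(extra)))
    return outliers


# Hypothetical peer group: five contracts, one also shared with marketing.
files = [
    FileRecord("contract_a.docx", "legal_contracts", frozenset({"legal"})),
    FileRecord("contract_b.docx", "legal_contracts", frozenset({"legal"})),
    FileRecord("contract_c.docx", "legal_contracts", frozenset({"legal"})),
    FileRecord("contract_d.docx", "legal_contracts", frozenset({"legal"})),
    FileRecord("contract_e.docx", "legal_contracts", frozenset({"legal", "marketing"})),
]
outliers = find_outliers(files)
print(outliers)
```

Only contract_e.docx is flagged, because marketing has access to one of five contracts, well below the assumed cutoff. The extra grant could then be reviewed or revoked, which is the least privilege correction described above.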
Meeting The Moment
These new AI tools for unstructured data security have powerful defensive implications for some of today’s most pernicious attacks:
Spear-Phishing And Credential Theft
AI tools provide tighter access controls that limit data loss and harden against island-hopping attacks. Fewer duplicate files mean less data exposed to compromise, while overall, attackers face exponentially higher barriers.
AI-driven dynamic monitoring tracks document oversharing, while specific file risks, such as unnecessary access privileges, can be found and fixed. Automation also makes it feasible to monitor and protect millions of files.
With AI, data is centrally categorized and classified without relying on end users. This can eliminate error-prone rules that rely on IT generalists, not content experts, to protect data. It also improves the accuracy and efficacy of existing data loss prevention tools without the overhead.
Perhaps the day will come when cybersecurity can outrun the Red Queen once and for all. For now, our best hope is to redouble our efforts to protect data. Fortunately, technology has once again evolved to help us stay one step ahead.
Originally published at Forbes.com.