Under GDPR, You Can Still Get Security Data


Artificial intelligence is the topic of so many conversations in the cybersecurity industry these days, and rightly so. As we gather more intelligence on cyber threats, we can use AI to do ever smarter things with that data to prevent attacks. The potential of AI will grow as we collect more data. But businesses need to keep in mind several key issues, like the General Data Protection Regulation (GDPR), when implementing AI for cybersecurity.

Companies have never generated so much data as they do today. They are gathering ever more security data, logs, events and “artifacts”—threat data that may indicate an attack. But gathering a thousand security events in a day or a week isn’t enough.

If you want to use machine learning, data analytics or AI to identify security threats, you need to gather lots of data, but critically, it must be the right kinds of data.

Machine learning, which rapidly finds patterns in data, adds value when it crunches massive data sets and can spot the needle in the haystack. The more data you feed in, the more accurate and effective it becomes.

But if you ask companies how much security data they have, they’ll often say not much, as they discard it after a few weeks or only keep it as metadata. They invariably cite concerns over GDPR and data protection.

Organizations are understandably nervous that some of the security data may be personally identifiable information, so they could fall foul of strict rules in GDPR limiting how long personal data can be stored.

However, GDPR makes provision for cybersecurity. Data can be kept for “no longer than you need it,” according to the UK’s Information Commissioner, so if it is needed for security analysis, it can be kept until that analysis has taken place and discarded afterwards.

And a clause in GDPR’s Article 5 says data may be kept for research or statistical purposes. The key point is data may be legally stored for months for analysis by AI algorithms if it is for the purposes of cybersecurity.
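A retention rule of this kind can be sketched as a simple filter over stored events. The Python below is a minimal illustration only: the 180-day window, the `split_by_retention` function and the event fields are all assumptions made for the example, and the right retention period is for each organization to decide.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention window; the article only says "months".
RETENTION = timedelta(days=180)

def split_by_retention(events, now, retention=RETENTION):
    """Split events into those still inside the window and those to purge."""
    cutoff = now - retention
    keep, purge = [], []
    for event in events:
        # Events at or after the cutoff are still needed for analysis.
        (keep if event["timestamp"] >= cutoff else purge).append(event)
    return keep, purge

# Made-up sample events for illustration
now = datetime(2024, 7, 1, tzinfo=timezone.utc)
events = [
    {"id": 1, "timestamp": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "timestamp": datetime(2023, 6, 1, tzinfo=timezone.utc)},
]
keep, purge = split_by_retention(events, now)
```

Here the event from June 2024 stays available for analysis, while the one from a year earlier falls outside the assumed window and is marked for deletion.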

This then raises the question of how organizations are physically able to store such huge amounts of data and where they will get the computational horsepower to carry out rigorous machine learning analysis of the data.

All of this becomes possible today for one reason. The cloud. Businesses can store the petabytes of data that need analysing in remote locations run by cloud providers whether Amazon Web Services, Microsoft Azure or any other. That way, there is no need to use up valuable storage space in an organization’s own data centers.

It is vital to make sure that all sorts of data – the right sorts of data – are stored for cybersecurity analysis. That means capturing not just the bad and suspicious data. Machine learning relies on comparing the good with the bad, so data on legitimate web traffic is needed as well.

The data must be cross-referenceable. It could be cloud security data or what’s known as user behaviour analytics (UBA), which analyses the behaviour of systems and the people using them to identify potential cybersecurity threats. It could require gathering data from security information and event management (SIEM) software and other tools, then analysing it for anomalies and potential threats.
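To make the idea of comparing current behaviour against a legitimate baseline concrete, here is a minimal Python sketch. It is not taken from any SIEM or UBA product: the `flag_anomalies` function, the sample counts and the threshold of three standard deviations are all assumptions for illustration, and real tools use far richer models.

```python
from statistics import mean, stdev

def flag_anomalies(baseline, today, threshold=3.0):
    """Return users whose count today deviates from their own history.

    baseline:  dict of user -> list of past daily event counts ("good" data)
    today:     dict of user -> today's event count
    threshold: assumed cutoff, in standard deviations
    """
    flagged = []
    for user, count in today.items():
        history = baseline.get(user, [])
        if len(history) < 2:
            continue  # too little legitimate data to judge this user
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as a deviation.
            if count != mu:
                flagged.append(user)
            continue
        if abs(count - mu) / sigma > threshold:
            flagged.append(user)
    return flagged

# Made-up daily failed-login counts per user
baseline = {"alice": [2, 3, 2, 4, 3], "bob": [1, 0, 1, 2, 1]}
today = {"alice": 3, "bob": 40}
suspicious = flag_anomalies(baseline, today)
```

With this sample data only `bob` is flagged, because 40 failed logins is far outside his usual range. The sketch also shows why the legitimate data matters: without several days of normal history per user, there is nothing to compare against.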

An organization can also access immense computational power through the cloud, which runs huge stacks of CPUs – the brains of computing. Again, many businesses are nervous about storing security data in the cloud precisely for security reasons. They worry that the cloud is far from safe, though it is no more prone to attacks than other parts of a network. Cloud infrastructure provides the scale for businesses to store data and apply machine learning to identify threats. This opens the way for vast improvements in cybersecurity.

So, what’s the takeaway here? Cybersecurity has relied on humans to identify correlations between events occurring together and to identify potential threats. There is so much discussion of the power of AI and machine learning to cross-check data much faster and more effectively than a human could. Yet to achieve this you must have the right data for AI to be effective, and the computational capacity to process it in a timely fashion.

Of course, the cybercriminals are themselves looking at how to use AI to sharpen their attacks. The future of cybersecurity will be as much about machines fighting machines as conflict between humans.

Greg Day is vice president and chief security officer for Europe, Middle East and Africa at Palo Alto Networks.

End Points

  • If you want to use machine learning, data analytics or AI to identify security threats, you need to gather lots of data.
  • But if you ask companies how much security data they have, they’ll often say not much, citing GDPR concerns.
  • But GDPR has provisions that allow your organization to store the right sorts of data for cybersecurity analysis.