Jack Jones led the discussion at this month’s meeting of the FAIR Institute’s Data Utilization Work Group, including fielding this question from a FAIR Institute member about data breaches. Jack is the Institute’s Chairman and the co-author of Measuring and Managing Information Risk: A FAIR Approach.
Question: How do you distinguish what’s a breach and what is not from audit trails and our other typical sources of data?
Jack Jones: That’s a really good question. To some degree, it depends on what you mean by "breach". For example, some people consider run-of-the-mill malware infections and DDoS outages to be breaches. Others reserve the word "breach" to describe compromises of confidential information.
For this discussion, let's think of "breach" in a relatively broad sense.
If we're talking about malware, for example, any malware event on an internal system (server, workstation, etc.) is a clear indication that a breach of the perimeter occurred. In FAIR terms, we had a loss event at the perimeter layer, which resulted in a threat event (or loss event) at the internal system layer – i.e., malware somehow got past whatever resistive controls we had on the perimeter, which enabled an attack on an internal system.
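That layered chain of events can be sketched in code. This is a minimal illustration of the reasoning Jack describes, not FAIR tooling; the zone names and event fields are assumptions made for the example.

```python
# Sketch of the layered logic above: malware detected on an internal
# system implies a loss event at the perimeter layer (something got
# through) plus a threat event at the internal-system layer.
# Zone names and event fields are illustrative assumptions.
def classify_malware_detection(host_zone):
    """host_zone: 'internal' or 'perimeter' (simplified)."""
    if host_zone == "internal":
        return [
            {"layer": "perimeter", "type": "loss_event"},
            {"layer": "internal_system", "type": "threat_event"},
        ]
    # Malware stopped at the perimeter is a threat event there,
    # not (yet) a loss event deeper in.
    return [{"layer": host_zone, "type": "threat_event"}]
```

The point of the sketch is simply that one detection can carry information about more than one layer of the defense model.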
That's a fairly simple and straightforward example though, which only covers part of the "breach" problem space.
Another easy one is availability events, which are recognized when a system goes down or is degraded in some fashion. Availability events tend to be obvious by their very nature. Of course, sometimes it can be challenging to determine whether an outage was due to malicious action, human error, or simple technology failure. Logs can be valuable in answering that question.
For confidentiality breaches, how informative the logs will be about whether a "breach" has occurred depends on what they're capturing. Even then, typical access logs will rarely be able to definitively tell us that a breach has occurred. Usually, some amount of human (or machine-learning-based) analysis has to take place to establish with any degree of confidence whether a breach actually took place.
If, for example, normal access takes place between 8 AM and 5 PM, but the logs show that an access took place at 2 AM, that may be an indication that something is amiss and worth further investigation. Or if the logs show that the normal volume of account access is between 5 and 20 customer records a day, but today 500 records were accessed, that, too, might be indicative of a problem.
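The two heuristics just described can be sketched as a simple log check. The thresholds below are taken straight from the examples above (8 AM–5 PM business hours, 5–20 records a day); the function name and input format are hypothetical, not any particular product's API.

```python
from datetime import time

# Illustrative thresholds from the examples in the text.
BUSINESS_START, BUSINESS_END = time(8, 0), time(17, 0)
NORMAL_DAILY_MAX = 20

def flag_anomalies(access_events):
    """Return 'worth investigating' flags for one day's access log.

    access_events: list of (timestamp, record_id) tuples, where
    timestamp is a datetime.datetime.
    """
    flags = []
    for ts, record_id in access_events:
        # Heuristic 1: access outside normal business hours.
        if not (BUSINESS_START <= ts.time() <= BUSINESS_END):
            flags.append(f"off-hours access to {record_id} at {ts.isoformat()}")
    # Heuristic 2: daily volume well above the normal range.
    if len(access_events) > NORMAL_DAILY_MAX:
        flags.append(f"unusual volume: {len(access_events)} records accessed")
    return flags
```

As the text notes, a flag here is only an indication that something may be amiss; a human (or further automated analysis) still has to decide whether it's actually a breach.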
Another less obvious example might come from DLP technology that scours your internal network for where sensitive data exists. Maybe the baseline it’s operating from says, hey, this server or workstation doesn’t normally have sensitive information, but now it's finding a bunch of sensitive information on one or more of those systems. That could, of course, simply represent a change in business usage practices, or a legitimate temporary condition, but it could also be a clue that a breach has occurred.
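The DLP baseline comparison can be sketched the same way. This is a generic illustration of the idea, assuming a simple scan result keyed by hostname; it's not tied to any specific DLP product's output format.

```python
# Sketch of the baseline-deviation idea above: which systems that
# normally hold no sensitive data are now showing sensitive findings?
# Input shapes are illustrative assumptions.
def new_sensitive_hosts(baseline_hosts, current_scan):
    """baseline_hosts: set of hostnames known/expected to hold sensitive data.
    current_scan: dict mapping hostname -> count of sensitive files found.

    Returns a dict of out-of-baseline hosts and their finding counts.
    """
    return {
        host: count
        for host, count in current_scan.items()
        if count > 0 and host not in baseline_hosts
    }
```

As the text says, a hit here could just reflect a change in business usage or a legitimate temporary condition, so the output is a list of leads to investigate, not a breach verdict.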
The bottom line is that some of our telemetry will be pretty definitive regarding "breaches" (e.g., anti-malware technology), while other sources will simply be indicative or suggestive in nature, which should prompt further investigation.