Unlike the scope and model dimensions, where deficiencies are commonly fatal in terms of their effect on analysis accuracy, there’s more room for imperfection in data. That’s in large part because there are well-established methods for dealing with sparse data — things like Monte Carlo simulation and other stochastic methods that exist specifically to deal with data-related uncertainty, as well as calibration techniques that dramatically improve the quality of subject matter expert estimates. That’s the good news.
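To make the Monte Carlo point concrete, here’s a minimal sketch of the general idea: uncertain inputs are expressed as calibrated ranges and simulated many times, producing a distribution of outcomes rather than a single point estimate. All of the ranges below are hypothetical placeholders, not data from this post.

```python
import random
import statistics

random.seed(7)

def simulate_ale(trials: int = 10_000) -> list[float]:
    """Monte Carlo sketch: combine uncertain loss-event frequency and
    loss magnitude ranges into a distribution of annualized loss.
    All ranges here are illustrative assumptions."""
    losses = []
    for _ in range(trials):
        # Calibrated-estimate style inputs: (min, max, most likely).
        frequency = random.triangular(0.1, 2.0, 0.5)             # events/year
        magnitude = random.triangular(50_000, 500_000, 150_000)  # $ per event
        losses.append(frequency * magnitude)
    return losses

losses = simulate_ale()
median_loss = statistics.median(losses)
p90_loss = sorted(losses)[int(0.9 * len(losses))]
```

The output is a range of plausible annualized losses (e.g., a median and a 90th percentile), which communicates the underlying data uncertainty instead of hiding it behind a single number.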
But wait, there’s even more good news. We’re finally beginning to see significantly better loss magnitude data (unfortunately, that’s because there have been so many breaches to gather data from), and threat-related data also continue to improve. Which just leaves controls-related data, and we have a ton of that, right? Well, yes and no. Yes, the cybersecurity industry does generate a lot of controls-related data (more of that, in fact, than loss or threat data). But no, in many cases the data are simply not usable for CRQ.
But didn’t I just say that there’s more room for imperfection in data? Yes, but imperfection in terms of volume (sparseness) is very different from imperfection in terms of accuracy. If the sources of our data are fundamentally flawed, and particularly if there isn’t a pragmatic way of separating the good from the bad, then using those data in CRQ will inevitably lead to unreliable results. Let’s look at two primary examples of this…
The simple fact is that the CVSS scoring model involves a lot of math on ordinal values. Functionally, that’s the same thing as doing math on a color scale. In order to be quantitative, there has to be a unit of measurement — a quantity of something — like frequency, percentage, monetary values, time, etc. If there is no unit of measurement, a numeric value isn’t quantitative. In a numeric ordinal scale (e.g., 1-5 or 1-10), the numbers are simply labels for buckets/categories, just as colors would be. Yes, I know “almost everyone in cybersecurity does math on ordinal scales”, but there was a time when almost everyone in medicine practiced blood-letting. Volume of use doesn’t always correlate with efficacy.
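The ordinal-math problem is easy to demonstrate. In this toy example (my own, not drawn from CVSS itself), two equally valid label-to-number mappings preserve the same ordering of categories yet produce different “average severity” values, because the numbers are just category labels with no unit behind them:

```python
# Toy illustration: arithmetic on ordinal labels yields a number whose
# value depends entirely on the arbitrary labeling scheme chosen.

severities = ["Low", "High", "High", "Critical"]

scale_a = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}
scale_b = {"Low": 1, "Medium": 2, "High": 5, "Critical": 10}  # same ordering

avg_a = sum(scale_a[s] for s in severities) / len(severities)  # 2.75
avg_b = sum(scale_b[s] for s in severities) / len(severities)  # 5.25

# Both mappings rank the categories identically, yet the "average
# severity" differs; neither result measures a quantity of anything.
```

Swap the labels and the “score” changes, while nothing about the underlying risk has changed — which is exactly why such numbers can’t be fed into quantitative analysis as if they were measurements.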
Math on ordinal values is only one of several problems with CVSS scores, but it is significant enough by itself to invalidate them as reliable data. From a CRQ perspective, these problems mean that CVSS scores can’t reliably be translated into quantitative values. And by the way, this isn’t a “precision problem” — it’s an accuracy problem. We could live with it if it just affected measurement precision.
NIST CSF has gained a lot of traction as a means of gauging and communicating about cybersecurity programs at a high level, and I’m not here to debate its efficacy for those purposes. However, it has several characteristics that make it inappropriate as a source of control-related data for CRQ. For the sake of brevity, I’ll just discuss two of them:
And for the record, the second of these problems is true to some degree in the other control framework standards as well. I’ll discuss this in a future blog post regarding what we’re learning as we map these control frameworks to FAIR-CAM.
In the interest of avoiding TL;DR, I’ll wrap up the discussion on data in the next (and final) post of this series, and touch on what is possible from a CRQ automation perspective. I’ll also pull together everything from the series to provide an overall conclusion. Thanks for hanging in there with me.
Read the series by Jack Jones on automating cyber risk quantification.