Don’t Believe Everything You Read
Lies, damned lies, and statistics – it’s a phrase that has been attributed to Mark Twain and former British Prime Minister Benjamin Disraeli, among others. Whoever said it first could have been talking about vulnerability statistics.
This is the view of two researchers who presented their case this week at the Black Hat hackers convention in Las Vegas: Steve Christey, an information security engineer at the MITRE Corporation, and Brian Martin, president of the Open Security Foundation.
In a paper titled “Buying Into the Bias: Why Vulnerability Statistics Suck”, Christey and Martin say most statistical analyses of vulnerabilities demonstrate a serious fault in methodology or are “pure speculation.”
“They use easily available, but drastically misunderstood data to craft irrelevant questions based on wild assumptions, while never figuring out (or even asking the sources about) the limitations of the data,” their paper says. “This leads to a wide variety of bias that typically goes unchallenged, that ultimately forms statistics that make headlines and, far worse, are used to justify security budget and spending.”
Christey and Martin are involved in maintaining two repositories of vulnerability data, Common Vulnerabilities and Exposures (CVE) and the Open Sourced Vulnerability Database (OSVDB).
“We’re sick of hearing about research that is quickly determined to be sloppy after it’s been released and gained public attention,” they say. “In almost every case, the research casts aside any logical approach to generating the statistics.”
The targets of Christey and Martin’s ire “frequently do not release their methodology, and they rarely disclaim the serious pitfalls in their conclusions.”
“This stems from their serious lack of understanding about the data source they use, and how it operates. In short, vulnerability databases (VDBs) are very different and very fickle creatures. They are constantly evolving and see the world of vulnerabilities through very different glasses.”
Christey and Martin say bias is inherent in everything humans do, including the creation of VDBs, how the databases are populated with vulnerability data, and the subsequent analysis of that data.
“Not all bias is bad,” their paper says. “For example, VDBs have a bias to avoid providing inaccurate information whenever possible, and each VDB effectively has a customer base whose needs directly drive what content is published. Bias comes in many forms that we see as strongly influencing vulnerability statistics, via a number of actors involved in the process. It is important to remember that VDBs catalog the public disclosure of security vulnerabilities by a wide variety of people with vastly different skills and motivations. The disclosure process varies from person to person and introduces bias for sure, but even before the disclosure occurs, bias has already entered the picture.”
The paper says various forms of bias can combine to create “interesting spikes in vulnerability disclosure trends. To the VDB worker, they are typically apparent and sometimes amusing. To an outsider just using a data set to generate statistics, they can be a serious pitfall.”
Christey and Martin say one of the ways statistics can break down is when they are used to rate vulnerability severity.
“Researchers and journalists like to mention the raw number of vulnerabilities in two products and try to compare their relative security,” their paper says. “They frequently overlook the severity of the vulnerabilities and may not note that while one product had twice as many disclosures, a significant percentage of them were low severity. Further, they do not understand how the industry-standard scoring system works, or the bias that can creep in when using it to score vulnerabilities.”
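The pitfall they describe can be sketched in a few lines of code. The data below is entirely hypothetical, and the severity bands are a simplified bucketing in the spirit of CVSS rating bands, not the full scoring formula:

```python
from collections import Counter

def severity_profile(cvss_scores):
    """Bucket base scores into severity bands (simplified, CVSS-style cut-offs)."""
    def band(score):
        if score >= 7.0:
            return "high"
        if score >= 4.0:
            return "medium"
        return "low"
    return Counter(band(s) for s in cvss_scores)

# Hypothetical disclosure data for two imaginary products.
product_a = [9.3, 7.5, 7.2, 6.8, 5.0]                    # 5 disclosures
product_b = [4.3, 3.5, 2.6, 2.1, 5.0,
             4.0, 3.7, 2.9, 1.9, 4.4]                    # 10 disclosures

# Raw counts alone suggest product B is "twice as insecure" as product A...
print(len(product_a), len(product_b))        # 5 10

# ...but the severity profiles tell a different story: A's disclosures
# are dominated by high-severity issues, while B has none.
print(severity_profile(product_a))
print(severity_profile(product_b))
```

Comparing only the raw counts here would rank the products exactly backwards, which is the kind of headline-friendly but misleading conclusion the paper warns against.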