At the new Marsh McLennan Cyber Analytics Center of Excellence, a core focus is to continually survey the cyber vendor landscape, both to stay aware of the analytics players and their latest developments and to source the best possible data to bring in-house to inform a comprehensive view of risk. The elevated risk posed by recent cyber-attacks increasingly highlights the need for businesses to have a holistic view of potential cyber vulnerabilities and exposures.
Data for cyber analytics fall into several categories, including firmographics, historical incidents, technographics (inside-out and outside-in), scoring, and loss modeling. The rest of this article walks through the importance of each type of data to the cyber analytics community.
Firmographics
Metrics about an organization, including its industry, size, and location, form the basis of any cyber analytics exercise. Firmographic data consists of this information plus many additional fields. Industry can be described textually or using one of several coding schemes, such as NAICS or SIC. Company size is typically measured by revenue/turnover and employee count. These data sources also give an indication of corporate linkages, including parent companies, subsidiaries, and branch locations. High-quality firmographic data sources will be updated frequently and will use sources such as 10-K filings to derive the most accurate information for publicly traded companies. Furthermore, sophisticated modeling techniques can be used to estimate information for non-public entities.
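As a rough illustration of how such a record and its corporate linkages might be represented, the sketch below defines a hypothetical `Firmographics` record and a helper that walks parent links up to the top of the corporate tree. The field names, company names, and values are assumptions made for illustration only and do not reflect any particular vendor's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Firmographics:
    """Hypothetical firmographic record; field names are illustrative."""
    name: str
    naics_code: str
    annual_revenue_usd: float
    employee_count: int
    country: str
    parent: Optional[str] = None  # name of the immediate parent, if any


def ultimate_parent(name, records):
    """Follow parent links upward until reaching an entity with no parent."""
    seen = set()
    current = records[name]
    while current.parent is not None:
        if current.name in seen:
            raise ValueError(f"circular ownership at {current.name}")
        seen.add(current.name)
        current = records[current.parent]
    return current


# Made-up companies and values, for illustration only.
RECORDS = {
    "Holdco": Firmographics("Holdco", "551112", 5e9, 12_000, "US"),
    "Midco": Firmographics("Midco", "541511", 8e8, 2_500, "US", parent="Holdco"),
    "Branch": Firmographics("Branch", "541511", 5e7, 150, "GB", parent="Midco"),
}

print(ultimate_parent("Branch", RECORDS).name)
```

Resolving each entity to its ultimate parent is one simple way to keep a subsidiary or branch from being analyzed in isolation from the group it belongs to.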
Historical Incidents
Historical incident data, together with annual revenue, is useful for understanding the general level of cyber risk for a particular industry. Correlating historic data across attacks helps identify patterns, detect intrusions, and mitigate potential risks. Several vendors scrape 10-K filings and news articles and file Freedom of Information Act requests to collect broad swaths of this data, especially from the United States. This information must be treated with care, as there are several known biases in these data sets. First, the data is much less complete for non-US entities than for US entities. Second, due to cyber reporting requirements, the data is less complete for smaller companies than for larger companies. Third, since many cyber breaches go unreported until well after the incident occurred, recent data is, counterintuitively, less reliable than older data. High-quality sources of historical incident data will try to overcome these three biases. To enhance Marsh McLennan’s view of cyber risk, historical data is further supplemented with internal loss data to debias analyses and give businesses a more accurate view of risk.
Technographics (inside-out and outside-in)
Technographic data gives us information about the cyber security stance and posture of an organization. There are two high-level methods of sourcing such data: inside-out and outside-in. In the ‘outside-in’ paradigm, various types of sensors are deployed on the internet to unobtrusively collect internet traffic. This data can be used to determine which organizations are connected in a virtual sense, both intentionally for business purposes and unintentionally, if a company is infected with a virus and transmitting data to a malicious server. Similar to how one can wait on a public sidewalk and observe who enters and who leaves an office building, collecting this data does not require the knowledge or permission of the organizations being analyzed. From the raw data, a map of a company’s virtual supply chain can be drawn, augmented with metrics for how much traffic is flowing to malicious destinations. Additionally, the status of various traditional cyber defenses – such as email, firewall, and spoofing security – can be determined. This data is complemented by ‘inside-out’ data, which requires the cooperation of a business to collect. This can be facilitated via a questionnaire completed by the company or an application or device installed on the company’s network. This ‘inside-out’ analysis can generate a deep understanding of a company’s network, security systems, configurations, and policies, further informing the holistic view of cyber risk.
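To make the ‘outside-in’ idea more concrete, here is a minimal sketch of how observed traffic records could be turned into a per-organization map of destinations and a malicious-traffic metric. The record layout, the organization and destination names, and the list of malicious destinations are all hypothetical. Real sensor data and threat-intelligence feeds would be far richer.

```python
from collections import defaultdict

# Hypothetical flow records: (observed organization, destination, bytes transferred).
FLOWS = [
    ("acme", "cloud-vendor", 9_000),
    ("acme", "payroll-vendor", 500),
    ("acme", "bad-server", 500),
    ("globex", "cloud-vendor", 4_000),
]
# Hypothetical list of destinations known to be malicious.
MALICIOUS = {"bad-server"}


def summarize_traffic(flows, malicious):
    """Build each organization's destination map and malicious-traffic share."""
    totals = defaultdict(int)   # total bytes observed per organization
    bad = defaultdict(int)      # bytes sent to malicious destinations
    links = defaultdict(set)    # distinct destinations per organization
    for org, dest, n_bytes in flows:
        totals[org] += n_bytes
        links[org].add(dest)
        if dest in malicious:
            bad[org] += n_bytes
    return {
        org: {
            "destinations": sorted(links[org]),
            "malicious_share": bad[org] / totals[org] if totals[org] else 0.0,
        }
        for org in totals
    }


summary = summarize_traffic(FLOWS, MALICIOUS)
print(summary["acme"])
```

In this toy data set, 500 of the 10,000 bytes observed for the made-up organization "acme" go to a flagged destination, so its malicious share is 5%.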
Scoring
Many vendors take firmographic data, historical incident data, and ‘outside-in’ technographic data and develop indices, scores, and ratings. These ratings can illuminate a company’s relative cyber security exposure and provide insight into risks based on an organization’s cyber symptoms. If built and calibrated properly, these ratings can also provide a view into the likelihood of potential future cyber losses. There are many such indices in the industry today, so it is important to understand and vet a particular methodology before utilizing it.
Loss Modeling
Several vendors combine many or all of the previously discussed data sources to develop modeled loss estimates at a company level or for a portfolio of companies. Most of these models employ Monte Carlo simulation to model next year’s potential cyber events tens of thousands of times, forming a full distribution of potential loss scenarios. This distribution is then used to derive metrics such as the average annual loss (AAL) or the exceedance probability (EP) curve. The AAL is a representation of the expected loss for the risk, while the EP curve illustrates various ‘tail’ scenarios that must be accounted for in a robust risk management strategy. The more sophisticated models allow the various data sources and model parameters to be refined to better represent a particular risk or portfolio of risks. For example, some models allow the analyst to study the impact of various cyber controls on loss outcomes. Marsh McLennan uses both third-party loss models and proprietary models to help businesses understand their full cyber risk profile.
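The Monte Carlo approach described above can be sketched in a few lines. This is a toy illustration, not any vendor's or Marsh McLennan's model. It assumes, purely for demonstration, that the number of events per year follows a Poisson distribution and that each event's loss follows a lognormal distribution. The parameter values are made up.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson-distributed event count (Knuth's method, fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_annual_losses(n_years, frequency, sev_mu, sev_sigma, seed=0):
    """Simulate the total cyber loss for each of n_years hypothetical years."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_years):
        n_events = poisson(rng, frequency)
        losses.append(sum(rng.lognormvariate(sev_mu, sev_sigma) for _ in range(n_events)))
    return losses


def average_annual_loss(losses):
    """AAL: the mean of the simulated annual losses."""
    return sum(losses) / len(losses)


def exceedance_probability(losses, threshold):
    """One point on the EP curve: share of simulated years with loss above threshold."""
    return sum(1 for loss in losses if loss > threshold) / len(losses)


def loss_at_return_period(losses, years):
    """Loss level exceeded roughly once every `years` simulated years."""
    ordered = sorted(losses, reverse=True)
    index = max(int(len(ordered) / years) - 1, 0)
    return ordered[index]


# Assumed, illustrative parameters: 0.3 events per year, lognormal severity.
losses = simulate_annual_losses(10_000, frequency=0.3, sev_mu=13.0, sev_sigma=1.5)
print("AAL:", round(average_annual_loss(losses)))
print("P(loss > 1,000,000):", exceedance_probability(losses, 1_000_000))
print("1-in-100-year loss:", round(loss_at_return_period(losses, 100)))
```

Evaluating `exceedance_probability` over a range of thresholds traces out the EP curve. Rerunning the simulation with a lower assumed frequency or severity is one simple way to mimic the effect of a cyber control on loss outcomes.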
To have a comprehensive analytic suite for cyber risk management, one must source data from each of these categories.