An enterprise’s cyber-attack surface changes constantly as new technologies arrive with built-in vulnerabilities and attackers develop cutting-edge methods to exploit those flaws. However, existing repositories that associate vulnerabilities with attack techniques remain incomplete, and the missing information increases the likelihood of underestimating the risk posed by a specific set of attack techniques.
Moreover, these associations often rely on manual interpretation, which cannot keep pace with attacks and is therefore ineffective against the ever-growing list of vulnerabilities and attack actions. Hence, it is critically important to develop methodologies that associate vulnerabilities with all relevant attack techniques automatically and accurately.
A new study introduces an AI model that aims to solve this problem. Scientists at the Department of Energy’s Pacific Northwest National Laboratory, Purdue University, Carnegie Mellon University, and Boise State University linked three large databases of information about computer vulnerabilities, weaknesses, and likely attack patterns.
This new framework, which the scientists dubbed ‘Vulnerabilities and Weakness to Common Attack Pattern Mapping (VWC-MAP),’ uses natural language processing (NLP) techniques to automatically identify all relevant attack techniques for a vulnerability, by way of its associated weaknesses, based on their text descriptions.
Mahantesh Halappanavar, a chief computer scientist at PNNL who led the overall effort, said, “Cyber defenders are inundated with information and lines of code. What they need is interpretation and support for prioritization. Where are we vulnerable? What actions can we take?”
“If you are a cyber defender, you may deal with hundreds of vulnerabilities daily. You need to know how those could be exploited and what you need to do to mitigate those threats. That’s the crucial missing piece. You want to know the implications of a bug, how that might be exploited, and how to stop that threat.”
The model is enabled by a novel two-tiered classification approach: the first tier classifies vulnerabilities into weaknesses, and the second tier classifies weaknesses into attack techniques.
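To make the two-tiered idea concrete, here is a minimal Python sketch, not the team's implementation. It assumes a simple word-overlap (cosine) score as a stand-in for the language models used in the study. The CWE and CAPEC identifiers are real, but their descriptions are abbreviated paraphrases, and the vulnerability text and all function names are invented for illustration.

```python
import re
from collections import Counter
from math import sqrt

def tokens(text):
    """Bag-of-words counts; a crude stand-in for a language-model embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_match(query, candidates):
    """Return the candidate ID whose description is most similar to the query."""
    q = tokens(query)
    return max(candidates, key=lambda cid: cosine(q, tokens(candidates[cid])))

# Abbreviated, paraphrased descriptions (assumption: not the official wording).
WEAKNESSES = {
    "CWE-89": "improper neutralization of special elements used in an sql command sql injection",
    "CWE-79": "improper neutralization of input during web page generation cross site scripting",
}
ATTACK_PATTERNS = {
    "CAPEC-66": "sql injection attacker crafts input that alters an sql command sent to the database",
    "CAPEC-63": "cross site scripting attacker embeds malicious scripts in web page content served to a browser",
}

def map_vulnerability(description):
    """Tier 1: vulnerability text -> weakness. Tier 2: weakness text -> attack pattern."""
    weakness = best_match(description, WEAKNESSES)
    attack = best_match(WEAKNESSES[weakness], ATTACK_PATTERNS)
    return weakness, attack

# Hypothetical vulnerability description, written for this example only.
cve_text = "The login form allows SQL injection through the username parameter of an SQL command"
print(map_vulnerability(cve_text))  # ('CWE-89', 'CAPEC-66')
```

The point of the sketch is the chaining: the second tier never sees the vulnerability text, only the weakness chosen by the first tier, so an error in tier one carries through to the attack pattern.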
Halappanavar said, “If we can classify the vulnerabilities into general categories, and we know exactly how an attack might proceed, we could neutralize threats much more efficiently.”
The new model also extends the project to a third category: attack actions.
The team’s algorithm automatically links vulnerabilities with suitable weaknesses with up to 87% accuracy and links weaknesses with suitable attack patterns with up to 80% accuracy. These results are far better than what today’s tools can achieve, but the scientists warn that their new techniques require further extensive testing.
In this study, the scientists also presented two novel automated approaches, an auto-encoder (BERT) and a sequence-to-sequence model (T5), for mapping weaknesses to attack techniques by applying text-to-text and link prediction techniques. The first approach used a language model to associate CVEs (Common Vulnerabilities and Exposures entries) with CWEs (Common Weakness Enumeration entries), and then CWEs with CAPECs (Common Attack Pattern Enumeration and Classification entries), through binary link prediction. The second approach used sequence-to-sequence techniques to translate CWEs into CAPECs, with intuitive prompts for ranking the associations.
The approaches generated very similar results, which the cybersecurity expert on the team then validated.
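As a rough illustration of binary link prediction between weaknesses and attack patterns, the Python sketch below scores every (CWE, CAPEC) pair, keeps the pairs that clear a threshold, and ranks them. It is only a sketch under stated assumptions: the Jaccard word-overlap score stands in for the fine-tuned language model, the threshold of 0.15 is arbitrary, the descriptions are abbreviated paraphrases, and the function names are hypothetical.

```python
import re
from itertools import product

def word_set(text):
    """Unique lowercase words in a description."""
    return set(re.findall(r"[a-z]+", text.lower()))

def link_score(cwe_text, capec_text):
    """Jaccard overlap in [0, 1]; a stand-in for a learned pair classifier's probability."""
    a, b = word_set(cwe_text), word_set(capec_text)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_links(cwes, capecs, threshold=0.15):
    """Binary decision per pair (link or no link), then rank kept links by score."""
    links = []
    for (cwe_id, cwe_text), (capec_id, capec_text) in product(cwes.items(), capecs.items()):
        score = link_score(cwe_text, capec_text)
        if score >= threshold:  # the binary link / no-link decision
            links.append((cwe_id, capec_id, score))
    return sorted(links, key=lambda link: link[2], reverse=True)

# Abbreviated, paraphrased descriptions (assumption: not the official wording).
cwes = {
    "CWE-89": "improper neutralization of special elements used in an sql command sql injection",
    "CWE-79": "improper neutralization of input during web page generation cross site scripting",
}
capecs = {
    "CAPEC-66": "sql injection attacker crafts input that alters an sql command sent to the database",
    "CAPEC-63": "cross site scripting attacker embeds malicious scripts in web page content served to a browser",
}

for cwe_id, capec_id, score in predict_links(cwes, capecs):
    print(f"{cwe_id} -> {capec_id} (score {score:.2f})")
```

In this toy data only the two matching pairs survive the threshold; the mismatched pairs share a single common word and are rejected.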
Note: The work is open source, with a portion now available on GitHub. The team will release the rest of the code soon.
- Siddhartha Shankar Das; Ashutosh Dutta; Sumit Purohit et al. Towards Automatic Mapping of Vulnerabilities to Attack Patterns using Large Language Models. 2022 IEEE International Symposium on Technologies for Homeland Security (HST) (2023). DOI: 10.1109/HST56032.2022.10025459