CISUC

An Exploratory Study of the Combination of Static Analysis Tools Alerts for Vulnerability Detection using Machine Learning

Authors

Abstract

Due to time-to-market needs and cost of manual validation techniques, software systems are often deployed with vulnerabilities that may be exploited to gain illegitimate ac- cess/control, ultimately resulting in non-negligible consequences. Static Analysis Tools (SATs) are widely used for vulnerability detection, where the source code is analyzed without executing it. However, the performance of SATs varies considerably and a high detection rate usually comes with significant false alarms. Recent studies considered combining various SATs to improve the overall detection ability, but they do not allow exploring different performance trade-offs, as basic and rigid rules are normally followed. Machine Learning (ML) algorithms have shown promising results in several complex problems, due to their ability to fit specific needs. This paper presents an exploratory study on the combination of the output of SATs through ML algorithms to improve vulnerability detection while trying to reduce false alarms. The dataset consists of SQL Injection (SQLi) and Cross-Site Scripting (XSS) vulnerabilities detected by five different SATs in a large set of WordPress plugins developed in PHP. Results show that, for the case of SQLi, a false alarm reduction is possible without compromising the vulnerabilities detected, and that using ML allows trade-offs (e.g., reduction in false alarms at the expense of a few vulnerabilities) that are not possible with existing techniques. The paper also proposes a regression-based approach for ranking source code files considering estimates of vulnerabilities computed using the output of SATs. Results show that the approach allows creating a ranking of the source code files that largely overlaps the real ranking (based on real known vulnerabilities).

Keywords

Security, Vulnerability Detection, Static Code Analysis, Machine Learning

Conference

IX Latin-American Symposium on Dependable Computing (LADC 2019), November 2019


Cited by

No citations found