Document Type
Event
Start Date
1-12-2022 5:00 PM
Description
One of the most important challenges in software code auditing is the presence of vulnerabilities in source code. Every year, more and more software flaws are found, either internally in proprietary code or disclosed publicly. These flaws are highly likely to be exploited and can lead to system compromise, data leakage, or denial of service. Open-source C and C++ code is now available to build a large-scale machine-learning system for function-level vulnerability identification. We assembled a sizable dataset of millions of open-source functions that point to potential exploits. We created an efficient and scalable vulnerability detection method based on deep neural network models that learn features extracted from the source code. To remove irrelevant components and shorten dependencies, the source code is first converted into a minimal intermediate representation. We preserve the semantic and syntactic information using state-of-the-art word embedding algorithms. The embedded vectors are subsequently fed into convolutional neural networks to classify the possible vulnerabilities. Furthermore, we propose a new neural network model that appears to overcome issues associated with traditional neural networks. To measure performance, we used evaluation metrics such as F1 score, precision, recall, accuracy, and total execution time.
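As an illustration of the evaluation metrics named above, the following is a minimal sketch of how accuracy, precision, recall, and F1 score can be computed for a binary vulnerable/not-vulnerable classifier. The function name `classification_metrics`, the label encoding (1 = vulnerable, 0 = not vulnerable), and the sample labels are assumptions made for this example; they are not taken from the project itself.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumed encoding (hypothetical, for illustration): 1 = vulnerable, 0 = not vulnerable.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for eight functions, purely for demonstration.
sample_true = [1, 1, 1, 0, 0, 0, 0, 1]
sample_pred = [1, 0, 1, 0, 1, 0, 0, 1]
sample_metrics = classification_metrics(sample_true, sample_pred)
print(sample_metrics)
```

Because vulnerable functions are typically a small minority of any large code corpus, precision, recall, and F1 tend to be more informative than accuracy alone for this kind of task.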
GR-284 Automated Vulnerability Detection in Source Code Using Deep Neural Networks