Deep Learning at the Shallow End: Malware Classification for Non-Domain Experts

Quan Le; Oisín Boydell; Brian Mac Namee; Mark Scanlon

doi:10.1016/j.diin.2018.04.024

Article

July 2018 Digital Investigation

Contribution Summary

This paper addresses the need for tools that non-experts can use in digital evidence discovery and analysis. It presents a deep learning-based malware classification approach that requires no expert domain knowledge and relies on a purely data-driven method for identifying complex patterns and features, using a combined Convolutional Neural Network and Bidirectional Long Short-Term Memory (CNN-BiLSTM) architecture. The model classifies raw binary files into one of 9 malware classes with 98.2% accuracy, at a processing time of 0.02 seconds per file. Compared with traditional approaches, this substantially reduces the manual effort and domain expertise required. The paper also discusses the limitations of existing machine learning approaches to malware analysis and highlights the potential of deep learning to reduce the manual effort involved.

Keywords: Deep learning; Malware classification; Non-expert tools; Digital evidence discovery; Data-driven approach; Convolutional Neural Network; Bidirectional Long Short-Term Memory; Malware analysis
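
To make the "raw binary files in, one of 9 classes out" idea in the summary above more concrete, the following is a minimal sketch of how a file's bytes might be turned into a fixed-length integer sequence for a sequence model, and how a class might be read off the model's output scores. It is not the authors' pipeline: the truncate-and-pad strategy, the `PAD_TOKEN` value, the `max_len` default, and the function names `bytes_to_sequence` and `predict_class` are all assumptions made for illustration. Only the class count comes from the summary.

```python
from typing import List

PAD_TOKEN = 256   # hypothetical padding id, chosen outside the 0..255 byte range
NUM_CLASSES = 9   # from the summary: nine malware classes


def bytes_to_sequence(raw: bytes, max_len: int = 16) -> List[int]:
    """Map raw file bytes to a fixed-length list of integer tokens.

    Assumption: long files are truncated and short files are right-padded.
    The paper may use a different length-normalisation strategy.
    """
    tokens = list(raw[:max_len])                      # each byte becomes an int in 0..255
    tokens += [PAD_TOKEN] * (max_len - len(tokens))   # pad short inputs up to max_len
    return tokens


def predict_class(scores: List[float]) -> int:
    """Return the index of the highest-scoring class (argmax over 9 scores)."""
    if len(scores) != NUM_CLASSES:
        raise ValueError(f"expected {NUM_CLASSES} scores, got {len(scores)}")
    return max(range(NUM_CLASSES), key=scores.__getitem__)
```

No hand-crafted signatures or domain features appear anywhere in this sketch: the model would see only byte values, which is the sense in which the approach is described as purely data-driven.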

Abstract

Current malware detection and classification approaches generally rely on time consuming and knowledge intensive processes to extract patterns (signatures) and behaviors from malware, which are then used for identification. Moreover, these signatures are often limited to local, contiguous sequences within the data whilst ignoring their context in relation to each other and throughout the malware file as a whole. We present a Deep Learning based malware classification approach that requires no expert domain knowledge and is based on a purely data driven approach for complex pattern and feature identification.

BibTeX

@article{le2018deeplearningmalware,
author="Le, Quan and Boydell, Oisín and Mac Namee, Brian and Scanlon, Mark",
title="Deep Learning at the Shallow End: Malware Classification for Non-Domain Experts",
journal="Digital Investigation",
volume = "26",
year=2018,
month=jul,
pages = "S118--S126",
publisher="Elsevier",
doi = "10.1016/j.diin.2018.04.024",
url = "http://www.sciencedirect.com/science/article/pii/S1742287618302032",
keywords = "Deep learning, Machine learning, Malware analysis, Reverse engineering",
abstract="Current malware detection and classification approaches generally rely on time consuming and knowledge intensive processes to extract patterns (signatures) and behaviors from malware, which are then used for identification. Moreover, these signatures are often limited to local, contiguous sequences within the data whilst ignoring their context in relation to each other and throughout the malware file as a whole. We present a Deep Learning based malware classification approach that requires no expert domain knowledge and is based on a purely data driven approach for complex pattern and feature identification."
}