Methodology for the Automated Metadata-Based Classification of Incriminating Digital Forensic Artefacts

Xiaoyu Du; Mark Scanlon

doi:10.1145/3339252.3340517

Inproceedings

August 2019 The 12th International Workshop on Digital Forensics (WSDF), held at the 14th International Conference on Availability, Reliability and Security (ARES)

Contribution Summary

This paper presents a methodology for automatically prioritizing suspicious file artefacts in digital forensic investigations, addressing the growing volume of data that investigators must examine. The methodology employs a supervised machine learning approach that leverages the recorded results of previously processed cases. It is designed to work in a human-in-the-loop fashion, predicting and recommending suspicious artefacts rather than providing final analysis results. The paper outlines the process of feature extraction, dataset generation, training, and evaluation, and presents a toolkit for data extraction from disk images that enables the method to be integrated with the conventional investigation process and to work in an automated fashion. The aim is to increase automation in digital forensic investigation, reducing manual analysis effort and improving efficiency.

Keywords: Digital Forensics; Automatic Forensic Investigation; Artefact Relevancy; Machine Learning; Digital Forensic Data Processing; Automation; Data Deduplication; Triage

Abstract

The ever-increasing volume of data in digital forensic investigation is one of the most discussed challenges in the field. Usually, most of the file artefacts on the seized device are not relevant to the investigation. Manually retrieving suspicious files relevant to the investigation is like finding a needle in a haystack. In this paper, a methodology for automatically prioritising suspicious file artefacts (i.e., file artefacts that are relevant to the investigation) is proposed to reduce the manual work to be conducted. This methodology is designed to work in a human-in-the-loop fashion. In other words, it predicts/recommends that an artefact is suspicious rather than giving the final analysis result. A supervised machine learning approach is employed, which leverages the recorded results of previously processed cases. The processes of feature extraction, dataset generation, training and evaluation are presented in this paper. In addition, a toolkit for data extraction from disk images is outlined, which enables this method to be integrated with the conventional investigation process and work in an automated fashion.
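
As a rough illustration of the workflow the abstract describes (feature extraction from file metadata, training on labelled artefacts from earlier cases, then ranking new artefacts for a human examiner), the following is a minimal sketch only. It is not the authors' toolkit or model: the feature choices (`ext`, `size_bucket`, `parent_dir`), the `NaiveBayesRanker` class, the use of naive Bayes, and the toy records are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def extract_features(meta):
    """Map one file-metadata record to categorical features (hypothetical choice)."""
    path = PurePosixPath(meta["path"])
    size = meta["size"]
    if size < 10_000:
        bucket = "small"
    elif size < 1_000_000:
        bucket = "medium"
    else:
        bucket = "large"
    return {
        "ext": path.suffix.lower() or "<none>",
        "size_bucket": bucket,
        "parent_dir": path.parent.name or "<root>",
    }


class NaiveBayesRanker:
    """Categorical naive Bayes with Laplace smoothing (illustrative stand-in)."""

    def fit(self, records, labels):
        # Both classes (True and False) must appear in `labels`.
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.vocab = defaultdict(set)             # feature -> values seen in training
        for rec, label in zip(records, labels):
            for name, value in extract_features(rec).items():
                self.value_counts[(label, name)][value] += 1
                self.vocab[name].add(value)
        return self

    def _log_joint(self, feats, label):
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for name, value in feats.items():
            counts = self.value_counts[(label, name)]
            # The extra +1 in the denominator reserves mass for unseen values.
            denom = self.class_counts[label] + len(self.vocab[name]) + 1
            score += math.log((counts[value] + 1) / denom)
        return score

    def score(self, record):
        """Estimated probability that the artefact is relevant (label True)."""
        feats = extract_features(record)
        log_pos = self._log_joint(feats, True)
        log_neg = self._log_joint(feats, False)
        return 1.0 / (1.0 + math.exp(log_neg - log_pos))


# Toy "previously processed cases": (metadata, flagged as relevant by an examiner).
# These records are invented for the example.
history = [
    ({"path": "/home/u/Pictures/a.jpg", "size": 250_000}, True),
    ({"path": "/home/u/Pictures/b.jpg", "size": 310_000}, True),
    ({"path": "/usr/lib/libc.so", "size": 2_000_000}, False),
    ({"path": "/usr/share/icons/i.png", "size": 4_000}, False),
    ({"path": "/etc/hosts", "size": 200}, False),
]
model = NaiveBayesRanker().fit([r for r, _ in history], [y for _, y in history])

# Artefacts from a new case, ranked so the examiner reviews likely hits first.
new_case = [
    {"path": "/etc/passwd", "size": 1_500},
    {"path": "/home/x/Pictures/c.jpg", "size": 280_000},
]
ranked = sorted(new_case, key=model.score, reverse=True)
for record in ranked:
    print(f"{model.score(record):.2f}  {record['path']}")
```

The output is a ranked list rather than a verdict, which mirrors the human-in-the-loop design: the score only orders the examiner's review queue, and the final decision on each artefact stays with the investigator.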

BibTeX

@inproceedings{du2019artefactclassification,
	author={Du, Xiaoyu and Scanlon, Mark},
	title="{Methodology for the Automated Metadata-Based Classification of Incriminating Digital Forensic Artefacts}",
	booktitle="{The 12th International Workshop on Digital Forensics (WSDF), held at the 14th International Conference on Availability, Reliability and Security (ARES)}",
	series = {ARES '19},
	year=2019,
	month=08,
	location={Canterbury, UK},
	publisher={ACM},
	address = {New York, NY, USA},
	abstract="The ever-increasing volume of data in digital forensic investigation is one of the most discussed challenges in the field. Usually, most of the file artefacts on the seized device are not relevant to the investigation. Manually retrieving suspicious files relevant to the investigation is like finding a needle in a haystack. In this paper, a methodology for automatically prioritising suspicious file artefacts (i.e., file artefacts that are relevant to the investigation) is proposed to reduce the manual work to be conducted. This methodology is designed to work in a human-in-the-loop fashion. In other words, it predicts/recommends that an artefact is suspicious rather than giving the final analysis result. A supervised machine learning approach is employed, which leverages the recorded results of previously processed cases. The processes of feature extraction, dataset generation, training and evaluation are presented in this paper. In addition, a toolkit for data extraction from disk images is outlined, which enables this method to be integrated with the conventional investigation process and work in an automated fashion.",
	doi={10.1145/3339252.3340517},
}