PhD Thesis

Alleviating the Digital Forensic Backlog: A Methodology for Automated Digital Evidence Processing

Xiaoyu Du

September 2020, School of Computer Science, University College Dublin

Contribution Summary

The digital forensic backlog is a significant challenge, with severe, case-hindering backlogs commonplace in law enforcement agencies worldwide. This thesis aims to alleviate the backlog through automated digital evidence processing, achieved by reducing or eliminating redundant handling of digital evidence data through data deduplication and automated analysis techniques. A deduplicated evidence processing framework is designed with a Digital Forensics as a Service (DFaaS) paradigm in mind, leveraging a centralised database of previously analysed files to identify common files and to detect known pertinent artefacts at the earliest possible stage of the investigation. The proposed methodology includes a novel, forensically sound technique for reconstructing an entire disk image from a deduplicated evidence acquisition system, enabling remote disk acquisitions faster than the network throughput would otherwise allow. Machine learning models are trained on known pertinent artefacts to produce a relevancy score for unknown file artefacts, so that the artefacts most likely to be relevant to the case are analysed first.
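The deduplication step described above can be illustrated with a minimal sketch: each artefact is hashed and looked up in a central database of previously analysed files, and its content is transferred only when the hash is unseen. The `KNOWN_HASHES` store and `dedup_acquire` helper are hypothetical names for illustration, not the thesis's actual implementation.

```python
import hashlib

# Hypothetical central database of previously analysed file hashes.
# In the thesis's DFaaS framework this would be a shared store built
# up across prior investigations.
KNOWN_HASHES = {
    # SHA-256 of the bytes b"test"
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08": "known-benign",
}

def dedup_acquire(file_bytes: bytes, known: dict) -> tuple[str, bool]:
    """Hash an artefact; acquire (transfer) its content only if unseen.

    Returns (digest, acquired): acquired is False when the file is
    already in the central database and only a reference is stored.
    """
    digest = hashlib.sha256(file_bytes).hexdigest()
    if digest in known:
        # Common file: record a reference, skip re-acquisition.
        return digest, False
    # Unknown file: transfer and store it, then register the hash.
    known[digest] = "newly-acquired"
    return digest, True

digest, acquired = dedup_acquire(b"test", KNOWN_HASHES)
print(acquired)  # the file is already known, so it is not re-acquired
```

In a real acquisition the hash comparison happens on the examined device before transfer, which is what lets common files be skipped entirely and known pertinent files be flagged during acquisition.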

Keywords: digital forensic backlog; automated digital evidence processing; data deduplication; digital forensic as a service; machine learning; digital forensics; cybersecurity; digital evidence

Abstract

The ever-increasing volume of data in digital forensic investigation is one of the most discussed challenges in the field. Severe, case-hindering digital evidence backlogs have become commonplace in law enforcement agencies throughout the world. The objective of the research outlined as part of this thesis is to help alleviate the backlog through automated digital evidence processing. This is achieved by reducing or eliminating redundant digital evidence data handling through leveraging data deduplication and automated analysis techniques. This helps avoid the repeated re-acquisition, re-storage, and re-analysis of common evidence during investigations. This thesis describes a deduplicated evidence processing framework designed with a Digital Forensics as a Service (DFaaS) paradigm in mind. In the proposed system, prior to the acquisition, artefacts are hashed and compared with a centralised database of previously analysed files to identify common files. Moreover, this process facilitates known pertinent artefacts being detected at the earliest possible stage of the investigation, i.e., during the acquisition step. The proposed methodology includes a novel, forensically sound technique for reconstructing an entire disk image from a deduplicated evidence acquisition system. That is to say, the reconstructed disk's hash matches that of the source device without every artefact having to be acquired directly from it. This enables remote disk acquisitions faster than the network throughput would otherwise allow. Known, i.e., previously encountered, pertinent artefacts identified during the acquisition stage are then used to train machine learning models that assign a relevancy score to unknown, i.e., previously unencountered, file artefacts. The proposed technique generates a relevancy score for file similarity using each artefact's file system metadata and associated timeline events. The file artefacts are subsequently ordered by these relevancy scores to focus the investigator on the analysis of the artefacts most likely to be relevant to the case first.
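The relevancy-ranking idea can be sketched as follows: represent each artefact by a metadata feature vector and score unknown artefacts by their similarity to the known-pertinent ones. This stand-in uses cosine similarity to the centroid of known-pertinent vectors rather than the thesis's trained machine learning models, and the feature set (file size, path depth, timeline event count) is an illustrative assumption.

```python
import math

def centroid(vectors):
    """Mean feature vector of the known-pertinent artefacts."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical metadata features: [size_kb, path_depth, timeline_events]
known_pertinent = [[120.0, 4.0, 9.0], [90.0, 5.0, 7.0]]
unknown = {
    "a.doc": [100.0, 4.0, 8.0],   # resembles the known-pertinent files
    "b.sys": [4096.0, 1.0, 0.0],  # a large system file with no events
}

ref = centroid(known_pertinent)
# Order unknown artefacts by similarity to known-pertinent ones,
# so the most likely relevant artefacts are examined first.
ranked = sorted(unknown, key=lambda f: cosine(unknown[f], ref), reverse=True)
print(ranked)
```

In practice one would normalise the features and use a trained classifier, as the thesis does; the point here is only the ordering step, which turns per-artefact scores into an examination priority queue.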

BibTeX

@phdthesis{du2020PhDAutomatedEvidenceProcessing,
	title={{Alleviating the Digital Forensic Backlog: A Methodology for Automated Digital Evidence Processing}},
	author={Du, Xiaoyu},
	school={School of Computer Science, University College Dublin},
	month=sep,
	year={2020},
	address={Dublin, Ireland},
	abstract={The ever-increasing volume of data in digital forensic investigation is one of the most discussed challenges in the field. Severe, case-hindering digital evidence backlogs have become commonplace in law enforcement agencies throughout the world. The objective of the research outlined as part of this thesis is to help alleviate the backlog through automated digital evidence processing. This is achieved by reducing or eliminating redundant digital evidence data handling through leveraging data deduplication and automated analysis techniques. This helps avoid the repeated re-acquisition, re-storage, and re-analysis of common evidence during investigations. This thesis describes a deduplicated evidence processing framework designed with a Digital Forensics as a Service (DFaaS) paradigm in mind. In the proposed system, prior to the acquisition, artefacts are hashed and compared with a centralised database of previously analysed files to identify common files. Moreover, this process facilitates known pertinent artefacts being detected at the earliest possible stage of the investigation, i.e., during the acquisition step. The proposed methodology includes a novel, forensically sound technique for reconstructing an entire disk image from a deduplicated evidence acquisition system. That is to say, the reconstructed disk's hash matches that of the source device without every artefact having to be acquired directly from it. This enables remote disk acquisitions faster than the network throughput would otherwise allow. Known, i.e., previously encountered, pertinent artefacts identified during the acquisition stage are then used to train machine learning models that assign a relevancy score to unknown, i.e., previously unencountered, file artefacts. The proposed technique generates a relevancy score for file similarity using each artefact's file system metadata and associated timeline events. The file artefacts are subsequently ordered by these relevancy scores to focus the investigator on the analysis of the artefacts most likely to be relevant to the case first.}
}