Inproceedings
Automated Artefact Relevancy Determination from Artefact Metadata and Associated Timeline Events
Contribution Summary
This paper addresses digital forensic evidence backlogs by presenting an approach for automated artefact relevancy determination. The method uses each file artefact's filesystem metadata and associated timeline events to generate a relevancy score, which is then used to rank artefacts by their likely relevance to the investigation. The approach builds on a centralised, Digital Forensics as a Service (DFaaS) paradigm, in which files found pertinent in previously processed cases can be used to classify newly discovered files. The method is validated through experimentation with three emulated investigation scenarios, which indicate that it can aid investigators in the discovery and prioritisation of evidence. By ranking artefacts ahead of manual review, the approach could reduce the time and effort that manual analysis requires.
Keywords: Automated Artefact Analysis; Evidence Prioritisation; Event-based Evidence Analysis; Digital Forensics as a Service; Machine Learning; Artefact Relevancy Determination; Timeline Analysis; Digital Evidence Backlogs
Abstract
Case-hindering, multi-year digital forensic evidence backlogs have become commonplace in law enforcement agencies throughout the world. This is due to an ever-growing number of cases requiring digital forensic investigation coupled with the growing volume of data to be processed per case. Leveraging previously processed digital forensic cases and their component artefact relevancy classifications facilitates the opportunity for training automated artificial intelligence based evidence processing systems to aid investigators in the discovery and prioritisation of evidence. This paper presents one approach for file artefact relevancy determination based on the growing move towards a centralised, Digital Forensics as a Service (DFaaS) paradigm. This approach enables the use of previously encountered illegal files to detect pertinent files in an investigation. Trained models can aid in the detection of these files during the acquisition stage, i.e., during their upload to a DFaaS system. The technique used is based on a relevancy score determined from file similarity using each artefact's filesystem metadata and associated timeline events. The approach presented is validated against three experimental usage scenarios.
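To make the idea of a similarity-based relevancy score concrete, the following is a minimal Python sketch. It is not the authors' implementation: the paper describes trained models, whereas this sketch substitutes a simple Jaccard similarity against previously flagged files. The `Artefact` record, its field names, the order-of-magnitude size bucket, and the event labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artefact:
    """Hypothetical artefact record; field names are illustrative only."""
    name: str
    extension: str
    size_bytes: int
    events: frozenset = field(default_factory=frozenset)  # timeline event labels


def features(artefact: Artefact) -> set:
    """Turn metadata and timeline events into a set of categorical tokens."""
    # Assumption: bucket file size by its number of decimal digits.
    size_bucket = len(str(artefact.size_bytes))
    tokens = {f"ext:{artefact.extension.lower()}", f"size:{size_bucket}"}
    tokens |= {f"event:{label}" for label in artefact.events}
    return tokens


def jaccard(x: set, y: set) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def relevancy_score(candidate: Artefact, known_relevant: list) -> float:
    """Score a candidate by its closest match among previously flagged files."""
    cand = features(candidate)
    return max((jaccard(cand, features(k)) for k in known_relevant), default=0.0)


def rank(candidates: list, known_relevant: list) -> list:
    """Order candidates from most to least likely relevant."""
    return sorted(
        candidates,
        key=lambda a: relevancy_score(a, known_relevant),
        reverse=True,
    )
```

Under these assumptions, `rank(new_files, flagged_files)` would place files whose metadata and event patterns resemble earlier pertinent files at the top of an investigator's review queue. A learned model, as described in the abstract, would replace the hand-written `relevancy_score` function.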
BibTeX
@inproceedings{du2020ArtefactRelevancy,
author={Du, Xiaoyu and Le, Quan and Scanlon, Mark},
title="{Automated Artefact Relevancy Determination from Artefact Metadata and Associated Timeline Events}",
booktitle="{The 6th IEEE International Conference on Cyber Security and Protection of Digital Services (Cyber Security)}",
year=2020,
month=jun,
location={Virtual Event},
publisher={IEEE},
abstract="Case-hindering, multi-year digital forensic evidence backlogs have become commonplace in law enforcement agencies throughout the world. This is due to an ever-growing number of cases requiring digital forensic investigation coupled with the growing volume of data to be processed per case. Leveraging previously processed digital forensic cases and their component artefact relevancy classifications facilitates the opportunity for training automated artificial intelligence based evidence processing systems to aid investigators in the discovery and prioritisation of evidence. This paper presents one approach for file artefact relevancy determination based on the growing move towards a centralised, Digital Forensics as a Service (DFaaS) paradigm. This approach enables the use of previously encountered illegal files to detect pertinent files in an investigation. Trained models can aid in the detection of these files during the acquisition stage, i.e., during their upload to a DFaaS system. The technique used is based on a relevancy score determined from file similarity using each artefact's filesystem metadata and associated timeline events. The approach presented is validated against three experimental usage scenarios.",
doi={10.1109/CyberSecurity49315.2020.9138874},
}