ChatGPT for digital forensic investigation: The good, the bad, and the unknown

Mark Scanlon; Frank Breitinger; Christopher Hargreaves; Jan-Niclas Hilgert; John Sheppard

doi:10.1016/j.fsidi.2023.301609

Article

January 2023 Forensic Science International: Digital Investigation

Best Paper DFRWS APAC 2023

Open Access

Contribution Summary

This paper evaluates the impact of ChatGPT on digital forensics, focusing on its latest pre-trained model, GPT-4. Through a series of experiments, the authors assess its capabilities across digital forensic use cases including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education, and outline the strengths and risks in each. The study concludes that ChatGPT can be a useful supporting tool for knowledgeable users, but that it can generate incorrect, incomplete, or misleading information, so investigators must critically evaluate its output rather than rely on it. The paper also highlights the need for further research in this area.

Keywords: ChatGPT; digital forensics; artificial intelligence; generative pre-trained transformers; large language models; GPT-4; digital forensic investigation; artefact understanding

Abstract

The disruptive application of ChatGPT (GPT-3.5, GPT-4) to a variety of domains has become a topic of much discussion in the scientific community and society at large. Large Language Models (LLMs), e.g., BERT, Bard, Generative Pre-trained Transformers (GPTs), LLaMA, etc., have the ability to take instructions, or prompts, from users and generate answers and solutions based on very large volumes of text-based training data. This paper assesses the impact and potential impact of ChatGPT on the field of digital forensics, specifically looking at its latest pre-trained LLM, GPT-4. A series of experiments are conducted to assess its capability across several digital forensic use cases including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education. Across these topics, its strengths and risks are outlined and a number of general conclusions are drawn. Overall this paper concludes that while there are some potential low-risk applications of ChatGPT within digital forensics, many are either unsuitable at present, since the evidence would need to be uploaded to the service, or they require sufficient knowledge of the topic being asked of the tool to identify incorrect assumptions, inaccuracies, and mistakes. However, to an appropriately knowledgeable user, it could act as a useful supporting tool in some circumstances.

BibTeX

@article{scanlon2023ChatGPTforDigitalForensics,
title = "{ChatGPT for digital forensic investigation: The good, the bad, and the unknown}",
journal = {Forensic Science International: Digital Investigation},
volume = {46},
pages = {301609},
year = {2023},
issn = {2666-2817},
doi = {10.1016/j.fsidi.2023.301609},
url = {https://www.sciencedirect.com/science/article/pii/S266628172300121X},
author = {Mark Scanlon and Frank Breitinger and Christopher Hargreaves and Jan-Niclas Hilgert and John Sheppard},
keywords = {ChatGPT, Digital forensics, Artificial intelligence, Generative pre-trained transformers (GPT), Large language models (LLM)},
abstract = {The disruptive application of ChatGPT (GPT-3.5, GPT-4) to a variety of domains has become a topic of much discussion in the scientific community and society at large. Large Language Models (LLMs), e.g., BERT, Bard, Generative Pre-trained Transformers (GPTs), LLaMA, etc., have the ability to take instructions, or prompts, from users and generate answers and solutions based on very large volumes of text-based training data. This paper assesses the impact and potential impact of ChatGPT on the field of digital forensics, specifically looking at its latest pre-trained LLM, GPT-4. A series of experiments are conducted to assess its capability across several digital forensic use cases including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education. Across these topics, its strengths and risks are outlined and a number of general conclusions are drawn. Overall this paper concludes that while there are some potential low-risk applications of ChatGPT within digital forensics, many are either unsuitable at present, since the evidence would need to be uploaded to the service, or they require sufficient knowledge of the topic being asked of the tool to identify incorrect assumptions, inaccuracies, and mistakes. However, to an appropriately knowledgeable user, it could act as a useful supporting tool in some circumstances.}
}