Article

Fine-Tuning Large Language Models for Digital Forensics: Case Study and General Recommendations

Gaëtan Michelet; Hans Henseler; Harm van Beek; Mark Scanlon; Frank Breitinger

July 2025 ACM Digital Threats: Research and Practice

Contribution Summary

This paper addresses the underexplored area of fine-tuning large language models (LLMs) for specific digital forensics tasks. The authors propose recommendations for fine-tuning LLMs tailored to digital forensics, covering aspects such as task definition, base model selection, and dataset preparation. A case study on chat summarization demonstrates the practicality of the approach, evaluating multiple fine-tuned models to assess their performance. The study shares insights from the fine-tuning process, covering computational constraints, data challenges, and evaluation methods. The authors also weigh the costs and limitations of fine-tuning against its performance gains to determine whether the process is worthwhile. The developed datasets and fine-tuned models are made available to the academic community for further experimentation.

Keywords: Digital Forensics Investigation; Fine-tuning; Local Large Language Models (LLM); Chat Logs Summarization; Reporting Automation

Abstract

Large language models (LLMs) have rapidly gained popularity in various fields, including digital forensics (DF), where they offer the potential to accelerate investigative processes. Although several studies have explored LLMs for tasks such as evidence identification, artifact analysis, and report writing, fine-tuning models for specific forensic applications remains underexplored. This paper addresses this gap by proposing recommendations for fine-tuning LLMs tailored to digital forensics tasks. A case study on chat summarization is presented to showcase the applicability of the recommendations, where we evaluate multiple fine-tuned models to assess their performance. The study concludes with sharing the lessons learned from the case study.

BibTeX

@article{Michelet2025Fine-TuningLLMDF,
title = {Fine-Tuning Large Language Models for Digital Forensics: Case Study and General Recommendations},
journal = {ACM Digital Threats: Research and Practice},
pages = {3748264},
month = jul,
year = {2025},
issn = {2576-5337},
doi = {10.1145/3748264},
author = {Michelet, Ga\"{e}tan and Henseler, Hans and van Beek, Harm and Scanlon, Mark and Breitinger, Frank},
keywords = {Digital Forensics Investigation, Fine-tuning, Local Large Language Models (LLM), Chat Logs Summarization, Reporting Automation},
abstract = {Large language models (LLMs) have rapidly gained popularity in various fields, including digital forensics (DF), where they offer the potential to accelerate investigative processes. Although several studies have explored LLMs for tasks such as evidence identification, artifact analysis, and report writing, fine-tuning models for specific forensic applications remains underexplored. This paper addresses this gap by proposing recommendations for fine-tuning LLMs tailored to digital forensics tasks. A case study on chat summarization is presented to showcase the applicability of the recommendations, where we evaluate multiple fine-tuned models to assess their performance. The study concludes with sharing the lessons learned from the case study.}
}