Fine-Tuning Large Language Models for Digital Forensics: Case Study and General Recommendations

Congratulations to Gaetan Michelet and co-authors Hans Henseler, Harm van Beek, Mark Scanlon, and Frank Breitinger on the publication of "Fine-Tuning Large Language Models for Digital Forensics: Case Study and General Recommendations" in ACM Digital Threats: Research and Practice.

AI-generated summary of the contribution: This paper addresses the underexplored area of fine-tuning large language models (LLMs) for specific digital forensics tasks. The authors propose recommendations for this process, emphasizing aspects such as task definition, base model selection, and dataset preparation. A case study on chat summarization demonstrates the practicality of the approach, with multiple fine-tuned models evaluated to assess their performance. The study shares insights from the fine-tuning process, covering computational power issues, data challenges, and evaluation methods. The authors also weigh the costs and limitations of fine-tuning against the resulting performance gains to determine whether the process is worthwhile. The developed datasets and fine-tuned models are made available to the academic community for further experimentation.

Read the publication.