Congratulations to Akila Wickramasekara and co-authors Alanna Densmore, Frank Breitinger, Hudan Studiawan, and Mark Scanlon on the publication of "AutoDFBench: A Framework for AI Generated Digital Forensic Code and Tool Testing and Evaluation" at the Digital Forensics Doctoral Symposium.
AI-generated summary of the contribution: AutoDFBench is a novel framework designed to address the challenge of manually evaluating AI-generated digital forensic code. It validates AI-generated code and tools against the test procedures of NIST's Computer Forensic Tool Testing (CFTT) program and calculates a benchmarking score. The framework operates in four phases: data preparation, API handling, code execution, and result recording with score calculation. It benchmarks generative AI systems, such as Large Language Models (LLMs) and automated code-generation agents, for digital forensic applications. The framework is validated using forensic string search tests, comprising over 24,200 tests across five top-performing code-generation LLMs. The results highlight significant limitations in the digital forensics-specific solutions generated by generic LLMs.
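The four-phase loop described above can be sketched in miniature. This is purely illustrative: the announcement does not describe AutoDFBench's actual interfaces, so every function name, the stubbed LLM call, and the toy string-search test cases below are assumptions, not the framework's real implementation.

```python
def prepare_data():
    # Phase 1 (data preparation): assemble test cases in the spirit of
    # CFTT string-search tests. These two toy cases are hypothetical.
    return [
        {"prompt": "search for 'evidence' in the image",
         "needle": "evidence",
         "haystack": "disk image containing an evidence string"},
        {"prompt": "search for 'missing' in the image",
         "needle": "missing",
         "haystack": "disk image with nothing relevant"},
    ]

def call_model_api(prompt):
    # Phase 2 (API handling): request code from a generative model.
    # Stubbed here: always returns the same naive substring search.
    return "result = needle in haystack"

def execute_code(code, case):
    # Phase 3 (code execution): run the generated snippet in an
    # isolated namespace and collect its output.
    ns = {"needle": case["needle"], "haystack": case["haystack"]}
    exec(code, ns)
    return ns["result"]

def record_results(cases, outcomes):
    # Phase 4 (result recording): compare outcomes against ground
    # truth and reduce them to a single benchmarking score.
    expected = [c["needle"] in c["haystack"] for c in cases]
    passed = sum(o == e for o, e in zip(outcomes, expected))
    return passed / len(cases)

cases = prepare_data()
outcomes = [execute_code(call_model_api(c["prompt"]), c) for c in cases]
score = record_results(cases, outcomes)
```

In the real framework, the score would aggregate thousands of CFTT-derived test runs per model rather than two toy cases.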

Read the publication.