Putting AI to the Test in Real-World Legal Work

An AI evaluation report for in-house counsel

OVERVIEW

This AI benchmarking project aims to answer the question:

Is AI ready to handle in-house legal work, or is it all marketing hype?

Our goal is to promote a transparent understanding of how these AI tools perform in real-world conditions and to encourage responsible adoption of AI in legal work. [1] 

This is the first report in a series examining AI tools across common in-house legal workflows. In this edition, we focus on information extraction tasks: the questions lawyers often ask of their own document sets ("Info Extraction Tasks"). We tested 6 AI tools using real-world queries submitted by in-house counsel and compared the outputs of purpose-built legal AI tools with those of general-purpose AI tools.

Discover how these tools performed and learn what every legal counsel needs to know before trusting AI with legal work.

KEY NUMBERS

- 6 AI tools tested, covering both legal-domain and general-purpose AI tools
- 108 AI outputs reviewed and scored by legal domain experts
- 6 common AI failure modes identified, revealing where AI struggles with legal tasks

CITE THIS REPORT

@report{guo2025putting,
  title={Putting AI To Work In-House: A report on AI performance in real-world info extraction tasks},
  author={Guo, Anna and Souza Rodrigues, Arthur},
  year={2025},
  month={April},
  note={Preprint},
  url={https://www.legalbenchmarks.ai}
}

[1] This report was co-authored by Anna Guo and Arthur Souza Rodrigues, both practicing lawyers. They are not affiliated with or paid by any vendor, law firm, or corporate buyer.

The project was undertaken independently, alongside their legal practice. While the evaluation had limitations due to resource constraints, they strove to be as rigorous and thorough as possible, and they are eager to keep learning together with the legal community in this fast-moving AI landscape.