The Legal Industry's Shared AI Evaluation Framework

Designed and validated by 100+ legal and technology leaders from the buy side across the globe.

Legal teams have well-established playbooks for hiring humans, but none for hiring AI agents. This is that playbook, built by the legal community.

Independent

No vendor sponsorship or commercial influence.

Practitioner-shaped

Built and refined by buy-side legal, AI, and technology leaders.

Open-source

Publicly available and community-driven.

The Problem

Why Legal AI Procurement Is Broken

0%*

Of legal teams are not committed to their current AI vendor.

Many legal teams are still searching for tools that meet their needs.

0%*

Of legal teams evaluate AI tools without IT or security involvement.

Many procurement decisions happen without a structured technical review.

0+**

AI tools now target legal teams.

Many promise similar capabilities, making meaningful vendor comparison difficult.

The result: months of duplicated effort, inconsistent evaluation, and decisions driven by demos and marketing rather than evidence.

The Solution

The Legal AI Evaluation Framework

The Legal AI Evaluation Framework is the legal community's first open-access evaluation framework built to help legal teams make defensible, evidence-based procurement decisions.

100+

Legal leaders across 82 organizations and law firms

25

Countries represented

100

Evaluation sub-tests across 8 core criteria

Evaluation Criteria

The 8 Core Evaluation Criteria

1. Strategic Fit
Alignment to legal use cases, IT systems, jurisdictions, languages, and long-term legal team needs.
2. Functionality
User interface design, workflow integration, customisation, input handling, and real-world usability within existing technology stacks and processes.
3. Robustness
Factual accuracy, completeness, instruction fidelity, verifiability, citation quality, and consistency of AI-generated outputs across legal tasks.
4. Security
Architecture and data flow transparency, access control, retrieval boundaries, adversarial resistance, AI safety, and alignment with the organisation's security policy requirements.
5. Data Privacy
Data use restrictions (including no-training provisions covering aggregated and derived data), deletion, localisation, vector embedding governance, and sub-processor practices.
6. Vendor Risk
Licensing, contractual security commitments, data portability, audit trails, transparency, vendor track record, business continuity, and incident response.
7. Adoption Support
Training, support, issue resolution, documentation, and usage reporting to support rollout and sustained adoption.
8. Cost & Resourcing
Pricing model, total cost of ownership, and internal operational capacity required to deploy and maintain the tool.

Evaluation Toolkit

The 3 Stages of Legal AI Evaluation

Like hiring a knowledge worker, each stage of evaluating a legal AI tool demands a different level of scrutiny. The Evaluation Toolkit turns the Legal AI Evaluation Framework into practical tools for buyers at each stage of evaluation.

Stage 1 · Resume screening

Pre-Demo Checklist

Pass/fail screening to decide if a demo is worth booking

Proceed / Do Not Proceed

Stage 2 · Interview

Demo Scorecard

Scored validation of whether the tool performs as claimed

Proceed / Do Not Proceed

Stage 3 · Working trial

Pilot Scorecard

Evidence-based, weighted evaluation using real workflows

Proceed / Do Not Proceed

Getting Started

How to Use This Toolkit

This toolkit applies whether you are:

  • Starting a new search for a legal AI tool
  • Midway through vendor conversations and need structure
  • Running a pilot and need a consistent way to compare tools
  • Reassessing or renewing a contract with your current provider

Researching tools?

Start with the Pre-Demo Checklist (Stage 1)

Sitting in demos?

Use the Demo Scorecard (Stage 2)

Trialing a tool?

Use the Pilot Scorecard (Stage 3)

Starting from scratch? Work through all 3 stages in order.

Then make it yours. Adapt criteria based on your team's priorities. The toolkits are starting points, not rigid templates. Every legal team operates differently. The framework gives you the structure — you decide what matters most.

Timeline & Roadmap

How We Built This

This framework was built from the ground up by the people who actually do this work.

Framework Development

Complete

November to December 2025

Synthesised legal AI evaluation approaches used in real vendor selection processes and assembled the first draft of the framework.

Community Feedback

Complete

February 21, 2026

Opened the draft for community input. More than 100 legal leaders contributed feedback.

v1 Finalisation

Complete

Late February 2026

Incorporated community feedback into Version 1 of the Legal AI Evaluation Framework and the accompanying Evaluation Toolkit.

Publication and Launch

Live

March 11, 2026

Public release of the Legal AI Evaluation Framework v1, together with the practical toolkits that operationalise the framework for real evaluation workflows.

What Comes Next

Ongoing

From March 2026

Gathering post-launch feedback and releasing additional practical resources, including a vendor questionnaire, the evaluation framework report, a quick-reference cheat sheet, and community insights.

This is a living framework. It will evolve as the tools, the models, and the market change. Want to shape what comes next?

The Team

Built by Practitioners Across Legal, IT, Security, and Privacy

This framework is built and maintained by legal professionals, technologists, security experts, and privacy specialists.

Core Team

Designed and published the framework. Responsible for day-to-day development, content, and coordination.

Anna Guo

Founder, Legal Benchmarks

Over the past 13 months, Anna has benchmarked AI performance in real legal workflows alongside 500+ legal professionals. That work has shaped hundreds of procurement decisions around AI tooling.

LinkedIn
Elgar Weijtmans

Technologist & Former Lawyer

Led legal AI procurement and evaluated 40+ tools last year. Brings an end-to-end view of AI assessment, from screening and testing to piloting and making the final tool decision.

LinkedIn
Roel Schrijvers

General Counsel

Focused on data security and operational risk in AI systems. Pushes the team to ask the questions most organisations miss around security, safety, and deployment risk.

LinkedIn
Sunny Kim

Comms Lead

Leads communications and PR for the framework. Brings experience in legal and professional services communications to help the community understand and engage with the project.

LinkedIn

Advisory Board

Senior leaders from legal, technology, AI governance, and security who guide strategic decisions, share their perspective on new directions, and review drafts before they go public.

Alexandra Robins

GC, Reptune

Andie Garford-Tull

General Counsel, AI Transformation and Governance, Dentsu

Andrew Greenfeld

Head of Legal Operations, Zoom

Blake Hei

General Counsel, Hoyoverse

Celia Reinsvold

Director, Legal AI Transformation

Daniel Lönnberg

Head of Data and Innovation, Familjens Jurist i Sverige AB

Danylo Panov

Head of Legal Operations, Romexsoft

Gary Ang, PhD

ex-MAS AI Risk Lead & Investment Risk Head

Jason Tamara Widjaja

Executive AI Director

Jean Li

APAC General Counsel, Rapyd

Johan Granqvist

Head of Legal and Compliance, Swarmia

Laura Jeffords Greenberg

General Counsel, Worksome

Marc Mandel

General Counsel, Exos

Nada Alnajafi

Legal Ops Leader & Senior Corporate Counsel, Franklin Templeton

Nicolas Alejandro Panigutti

Legal Manager, Global Legal Transformation Team, Santander

Stephanie Dominy

General Counsel and Head of Ops, Tessl

Tara Herman

Senior Vice President, Deputy General Counsel, Girl Scouts of the USA

Timm Ernst

Head of Legal Tech Group, Henkel

Victor Green

Vice President, Contracts Services, Morgan Stanley

Weng Yip

Vice President, Legal, Razer Inc.

Steering Committee

A broader group of practitioners across legal, IT, security, and privacy who validate and shape the assessment criteria, review and give feedback on the framework, and publicly signal support. All members are acknowledged in the published v1 guideline and toolkit.

Aalia Manie

Ahmet Guler

Alicia Larrazabal Erminy

Anastasiia Chiganov-Zalesskaia

Andrada Popescu

Annemarie Brennan

Arjen Eeken

Arthur Souza Rodrigues

Azhar Aziz-Ismail

Bensu Aydin

Bert Vries

Carlos Eduardo Vieira Costa de Lamare

Carrie Stephenson

Charlie Morgan

Chris Werner

Claudio Bild

Claudio Klaus

Damian Tommasino

Dana Kempler

Dana Kretschmer-Konzal

Daniella Domokos

Denisa Kopandi

Dick van Lankeren Matthes

Douwe Groenevelt

Elena Tzvetinova

Elizabeth Evans

Emelie Wesselink

Eric Lystrup

Fiona Q. Nguyen

Gayk Ayvazyan

Grant Ramsey

Hugo Chow

Jan-Willem Prakke

Jennifer Sing

Joep Feringa

Jolanda Rose

Jonathan Savar

Joseph Zeke Rucker

Kars van Houten

Kevan Wee

Kim Humphrey

Lasse Milinski

Lien Tran

Luara Locateli

Lucia Loyo

Mark Zijlstra

Martin Woodward

Matthew Kohel

Maxim Svyatov

Meena Parbhu

Meeta Agarwal

Melissa Köhler-van der Hulst

Michael Odo

Mohamed Al Mamari

Nate Kostelnik

Olivia Singh

Pim Betist

Prapti Patel

Rebecca Gwilt

Robert Brosgill

Robert Jweinat

Robin Musch

Rok Popov Ledinski

Rutger Lambriex

Sarah R. Moros

Shaik Ashfaq

Siavash Shamskho

Spencer Gusick

Sven van de Kamp

Tommaso Ricci

Tunaseli Kamburoglu

Waridah Makena

Wei Yee Tan

Xuan Ming Tan

Zehra Sahin

and more

FAQ

Frequently Asked Questions

Newsletter

Stay updated

Get behind-the-scenes updates, contribution opportunities, and report updates.

* Legal Benchmarks 2026 community survey data; forthcoming AI Evaluation Framework Report.

** December 2025 Legaltech Hub directory/map count of GenAI legal tech solutions.

Disclaimer

This framework is a community resource published by Legal Benchmarks, developed with contributions from legal and AI professionals across more than 100 organisations. It is provided for informational and educational purposes only and does not constitute legal, technical, or professional advice. No contributor, editor, or affiliated organisation accepts liability for decisions made on the basis of this framework. Organisations should adapt the framework to their own context, risk profile, and regulatory environment, and consult qualified professionals before making procurement or deployment decisions.

License

This document may be shared and adapted under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence, provided that Legal Benchmarks is credited as the original source.