4.2 Limitations
Narrow Task Scope
We tested only 18 information extraction tasks. Broader legal tasks such as legal research, drafting, or redlining were not assessed, so findings may not generalize beyond similar use cases.
English-Only Evaluation
All tasks used English documents, though real-world scenarios often involve multiple languages. We did not test multilingual capabilities, even though some vendors offer them. One task involved back-translation to Chinese, but it was administered in English.
Snapshot in Time
Results reflect AI performance as of early 2025. With rapid AI development, accuracy and features may evolve significantly within months.
Subjectivity in Human Evaluation
To ensure objectivity, each output was blind-reviewed by two independent human evaluators using a standardized rubric, with disagreements resolved by a third reviewer. We did not conduct LLM-based reviews due to resource constraints and their known limitations, such as inconsistent judgments, prompt sensitivity, and limited legal domain expertise.
Broader Factors Not Assessed
This evaluation did not assess the following broader dimensions of AI platforms:
- Security & Privacy (e.g. data handling, storage, compliance with privacy laws)
- Governance & Assurance (e.g. explainability, auditability, alignment with standards)
- Pricing & Value (e.g. cost-effectiveness, pricing models)
- Support & Reliability (e.g. uptime, vendor responsiveness)
- Trust & Safety (e.g. bias, misuse risks)
These areas are critical for enterprise adoption, and we aim to consider them in future reviews.
No Multi-Turn Dialogue
Each AI Assistant had one attempt per task. In practice, results may improve with prompt iteration and follow-up dialogue. Our findings therefore reflect a "cold start" scenario.
Limited Legal AI Vendor Coverage
We focused our evaluation on two legal AI vendors: GC AI and Vecflow (Oliver). We're grateful to both for supporting our independent review. While many other legal AI vendors were keen to provide demos, several declined to participate in a structured evaluation, preferring either to self-publish performance results or to share only results they had collected themselves, which we did not accept.
We are practicing lawyers and real users who utilize these tools in our day-to-day work. Our aim is to share honest, hands-on insights to help other lawyers like us better understand and adopt these technologies. While this assessment doesn't replicate the formality of a "Michelin-style" review or academic research paper, we remain committed to transparency, independence, and practical relevance.
We welcome legal AI vendors that believe in fostering transparency, informed adoption, and open dialogue in the legal community to participate in future evaluations.