3.4 AI Assistant Weaknesses
While AI Assistants can streamline certain legal workflows, they frequently fail in 6 scenarios that legal teams should anticipate. Understanding where AI goes wrong and how to work around it is essential for in-house counsel seeking safe, efficient use of these tools.
Failure Mode 1: AI struggles with open-ended questions
When prompts are vague, broad, or lack clear boundaries, AI often returns incomplete answers.
Example (Task 13): The prompt instructed the AI Assistant to: "Identify all the commitments the Agent makes regarding regulatory compliance and conduct." Only 2 of 6 tools (NotebookLM and ChatGPT) correctly extracted all the relevant clauses from across the agreement. The others produced partial results, each omitting at least 1 major section.
Failure Mode 2: AI hallucinates when the answer is missing
When the required information isn't present but the question implies it should be, some AI tools fabricate an answer rather than admit uncertainty.
Example (Task 9): 4 tools fabricated or misclassified the definition of "Critical Data", a term not defined in the source regulation.
Example (Task 10): 2 tools fabricated the liability cap amount and/or the time-based qualifier, even though the actual figures were redacted in the contract.
Task 10 Query: Extract the rules from the following limitation-of-liability clause. Specifically: 1) What is the liability cap and are there any special time-based provisions affecting it?
Liability Cap and Time-Based Provisions:
• The total cumulative liability of either party under the agreement is capped, but the specific cap amount is redacted
• There is a time-based provision affecting the cap:
- If a claim arises within a certain period after the Effective Date, the liability cap is equal to the total amount of payments projected to be made by the Customer during that period
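One way to catch this kind of fabrication is to check that every figure in an answer also appears in the source text. The following is a minimal sketch under stated assumptions, not a method used in the evaluation: the function name `ungrounded_amounts`, the dollar-amount pattern, and the sample strings are all hypothetical, and the check only covers dollar figures in plain extracted text.

```python
import re

# Hypothetical pattern: dollar amounts such as $250,000 or $1,250.50
AMOUNT = re.compile(r"\$\d[\d,]*(?:\.\d+)?")


def _normalize(amount: str) -> str:
    """Drop thousands separators so "$1,000" and "$1000" compare equal."""
    return amount.replace(",", "")


def ungrounded_amounts(answer: str, source: str) -> list[str]:
    """Return dollar amounts that appear in the answer but not in the source."""
    source_amounts = {_normalize(a) for a in AMOUNT.findall(source)}
    flagged = []
    for amount in AMOUNT.findall(answer):
        if _normalize(amount) not in source_amounts:
            # Strip a trailing comma picked up from sentence punctuation
            flagged.append(amount.rstrip(","))
    return flagged


# Illustrative strings only; these are not taken from the tested contract.
source_text = "Total cumulative liability shall not exceed [REDACTED]."
ai_answer = "The liability cap is $250,000."
print(ungrounded_amounts(ai_answer, source_text))
```

Any amount this check flags should be verified manually against the contract, since it has no counterpart in the text the AI Assistant was given.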
Failure Mode 3: AI is error-prone when handling multiple documents
Despite marketing claims, most AI Assistants cannot properly and consistently analyze or extract information across multiple files, even when the documents are short and the task is seemingly simple.
Example (Task 16): None of the AI Assistants accurately extracted the governing laws from 11 contracts. Even where the documents fit within an AI Assistant's context window, the answers were incomplete and/or partially inaccurate.
Failure Mode 4: AI mirrors user assumptions in leading questions
If a question contains a false premise, AI Assistants tend to reinforce it rather than verify it against the document.
Example (Task 6): When asked, "What are LinkedIn's indemnity obligations?" in a LinkedIn MSA, only GC AI and Oliver correctly answered that LinkedIn had none. Other AI Assistants incorrectly described vendor indemnities as LinkedIn's, or referenced unrelated clauses. In a follow-up test, when asked a neutral version ("What are the indemnification obligations of the parties?"), all AI Assistants gave the correct answer.
Task 6 Query: What are clauses that detail the LinkedIn's indemnity obligations?
The indemnity obligations in LinkedIn's Master Services Agreement (MSA) are detailed under the Indemnity section.
Failure Mode 5: AI fails when technical or file-handling issues prevent content access
Some AI Assistants fail not because of poor reasoning but due to technical constraints, like file format issues, OCR failures, upload limits, or content filters.
Example (Task 16): The task required AI Assistants to extract the governing law from 11 contracts, but several tools hit file upload limits (e.g., Copilot allows only one file per query, GPT up to ten). DeepSeek could only process 84% of the combined input.
Example (Task 14): Copilot refused to process a file entirely, likely due to content filters misclassifying it as sensitive.
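Where a tool caps the number of files per query, one workaround is to split the document set into batches and run the same prompt on each batch. The sketch below is illustrative only: `batch_files` and the file names are hypothetical, and the per-query limit must be set to whatever the tool in use actually allows.

```python
from typing import Iterator, Sequence


def batch_files(files: Sequence[str], max_per_query: int) -> Iterator[list[str]]:
    """Yield consecutive groups of at most max_per_query files."""
    if max_per_query < 1:
        raise ValueError("max_per_query must be at least 1")
    for start in range(0, len(files), max_per_query):
        yield list(files[start:start + max_per_query])


# Hypothetical file names standing in for the 11 contracts
contracts = [f"contract_{i:02d}.pdf" for i in range(1, 12)]

# With a limit of ten files per query, 11 contracts need two queries
for batch in batch_files(contracts, max_per_query=10):
    print(len(batch), batch[0], "...", batch[-1])
```

Batching only addresses the upload limit. The results from each batch still need to be merged and checked, since answers were incomplete even when all documents fit.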
Example (Task 7): When faced with low-quality scanned PDFs, some tools confidently returned incorrect answers, even when the underlying text was unreadable, instead of flagging OCR errors.
Task 7 Query: What's the total value contract value?
The total fixed contract value is $1,860,561 as stated in Article B.2. PRICES of the contract document.
Failure Mode 6: AI struggles with contradictory information
When multiple conflicting references appear in the source, AI Assistants may extract one and ignore the other, failing to recognize or flag the discrepancy.
Example (Task 11): When asked to extract the effective date from a contract that did not explicitly define it but mentioned 2 different dates in separate sections (each potentially valid depending on interpretation), some AI Assistants returned only 1 date without acknowledging the other.
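A simple pre-check can reveal whether a document contains more than one candidate date before an AI Assistant's single-date answer is accepted. This is a rough sketch, assuming the contract text is available as plain text and that dates are written in the "Month D, YYYY" style; the function name `distinct_dates` and the sample text are hypothetical.

```python
import re

# Assumed date style: "March 1, 2021". Other formats would need more patterns.
DATE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def distinct_dates(text: str) -> list[str]:
    """Return each distinct date found in the text, in order of first appearance."""
    seen: list[str] = []
    for match in DATE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


# Illustrative text only; this is not the contract used in Task 11.
sample = (
    "This Agreement is entered into as of March 1, 2021. "
    "Services shall commence on April 15, 2021."
)
dates = distinct_dates(sample)
if len(dates) > 1:
    print("Multiple candidate dates found:", dates)
```

If more than one date is found, the follow-up prompt can ask the AI Assistant to address each date explicitly and explain which one it treats as the effective date.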