3.4 AI Assistant Weaknesses

While AI Assistants can streamline certain legal workflows, they frequently fail in 6 scenarios that legal teams should anticipate. Understanding where AI goes wrong and how to work around it is essential for in-house counsel seeking safe, efficient use of these tools.

Failure Mode 1: AI struggles with open-ended questions

When prompts are vague, broad, or lack clear boundaries, AI often returns incomplete answers.

Example (Task 13): The prompt instructed the AI Assistant to: "Identify all the commitments the Agent makes regarding regulatory compliance and conduct." Only 2 of 6 tools (NotebookLM and ChatGPT) correctly extracted all the relevant clauses from across the agreement. The others produced partial results, each omitting at least 1 major section.

Why it matters: AI may miss key clauses if the task isn't narrowly scoped, which can lead counsel to overlook important rights or obligations and give incomplete answers.

Failure Mode 2: AI hallucinates when the answer is missing

When the required information isn't present, but the question suggests it should be, some AI tools fabricate an answer rather than admit uncertainty.

Example (Task 9): 4 tools fabricated or misclassified the definition of "Critical Data", a term not defined in the source regulation.

Example (Task 10): 2 tools fabricated the liability cap amount and/or the time-based qualifier, even though the actual figures were redacted in the contract.

Task 10 Query: Extract the rules from the following limitation-of-liability clause. Specifically: 1) What is the liability cap and are there any special time-based provisions affecting it?

Liability Cap and Time-Based Provisions:

• The total cumulative liability of either party under the agreement is capped, but the specific cap amount is redacted

• There is a time-based provision affecting the cap: if a claim arises within a certain period after the Effective Date, the liability cap is equal to the total amount of payments projected to be made by the Customer during that period

Correct Response (Pass)

Incorrect Response (Fail)

Why it matters: Confident, fabricated answers can easily slip past review and lead to incorrect legal advice or risk assessments.

Failure Mode 3: AI is error-prone when handling multiple documents

Despite marketing claims, most AI Assistants cannot properly and consistently analyze or extract information across multiple files, even when the documents are short and the task is seemingly simple.

Example (Task 16): None of the AI Assistants accurately extracted the governing laws from 11 contracts. Even for AI Assistants whose context window limits could accommodate the documents, the answers were incomplete and/or partially inaccurate.

Why it matters: Multi-document review is a standard part of legal work, and AI's inability to handle it reliably limits its usefulness in real-world scenarios.

Failure Mode 4: AI mirrors user assumptions in leading questions

If a question contains a false premise, AI Assistants tend to reinforce it rather than verifying it against the document.

Example (Task 6): When asked, "What are LinkedIn's indemnity obligations?" about a LinkedIn MSA, only GC AI and Oliver correctly answered that LinkedIn had none. Other AI Assistants incorrectly described vendor indemnities as LinkedIn's or referenced unrelated clauses. In a follow-up test using a neutral version of the question ("What are the indemnification obligations of the parties?"), all AI Assistants gave the correct answer.

Task 6 Query: What are the clauses that detail LinkedIn's indemnity obligations?

The indemnity obligations in LinkedIn's Master Services Agreement (MSA) are detailed under the Indemnity section.

Correct Response (Pass)

Incorrect Response (Fail)

Why it matters: Leading prompts can steer AI toward incorrect interpretations that look convincing on the surface.

Failure Mode 5: AI fails when technical or file-handling issues prevent content access

Some AI Assistants fail not because of poor reasoning but because of technical constraints, such as file format issues, OCR failures, upload limits, or content filters.

Example (Task 16): Task 16 required AI Assistants to extract the governing law from 11 contracts, but tools hit file upload limits (e.g., Copilot allows only one file per query, GPT up to ten). DeepSeek could process only 84% of the combined input.
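To make the upload-limit arithmetic from Task 16 concrete, the following is a minimal Python sketch of how 11 contracts would have to be split across queries under a per-query file limit. The function name batch_files and the contract file names are hypothetical placeholders, not part of any tool's interface, and the two limits are simply the figures quoted in the example. Splitting files this way only addresses the upload limit; it does not address the accuracy problems described under Failure Mode 3.

```python
def batch_files(files, max_files_per_query):
    """Split a list of files into consecutive batches no larger than the limit."""
    if max_files_per_query < 1:
        raise ValueError("max_files_per_query must be at least 1")
    return [
        files[i:i + max_files_per_query]
        for i in range(0, len(files), max_files_per_query)
    ]


# Hypothetical file names standing in for the 11 contracts in Task 16.
contracts = [f"contract_{n:02d}.pdf" for n in range(1, 12)]

# Per-query limits quoted in the example: one file vs. up to ten files.
for label, limit in [("one file per query", 1), ("ten files per query", 10)]:
    batches = batch_files(contracts, limit)
    sizes = [len(batch) for batch in batches]
    print(f"{label}: {len(batches)} queries, batch sizes {sizes}")
```

Under these assumptions, a one-file limit means 11 separate queries, and a ten-file limit means 2 queries (10 files, then 1), so the answers still have to be collated by hand afterwards.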

Example (Task 14): Copilot refused to process a file entirely, likely because content filters misclassified it as sensitive.

Example (Task 7): When faced with low-quality scanned PDFs, some tools confidently returned incorrect answers, even when the underlying text was unreadable, instead of flagging OCR errors.

Task 7 Query: What's the total contract value?

The total fixed contract value is $1,860,561 as stated in Article B.2. PRICES of the contract document.

Correct Response (Pass)

Incorrect Response (Fail)

Why it matters: Technical limitations can prevent AI from accessing or processing content, leading to incomplete or failed analysis regardless of the AI's capabilities. AI tools may silently skip unreadable content, or refuse to process a file without saying so or explaining why. Lawyers are then left to troubleshoot, often without knowing whether the content was ever reviewed.

Failure Mode 6: AI struggles with contradictory information

When the source contains conflicting references, AI Assistants may extract one and ignore the others, failing to recognize or flag the discrepancy.

Example (Task 11): The contract did not explicitly define the effective date but mentioned 2 different dates in separate sections, each potentially valid depending on interpretation. When asked to extract the effective date, some AI Assistants returned only 1 date without acknowledging the other.

Why it matters: If an AI assistant doesn't flag conflicting information, a lawyer might take the answer at face value and miss a critical ambiguity.