2.2 Task Dataset

In-house counsel working in the United States, the United Kingdom, Singapore and China submitted real-world Info Extraction Tasks to this evaluation. [4]

Each task contribution consisted of:

  1. a user query (in the in-house lawyer's own words);
  2. one or more source documents (e.g. contracts, privacy policies, terms & conditions, or regulations) from which the answer must be extracted; and
  3. attributes for an accurate answer (Accuracy Attributes).

We selected a diverse mix of 18 Info Extraction Tasks from these submissions. The tasks varied along the following key dimensions:

Dataset Characteristics

Labels                   Tags
Query Scope              Open-Ended; Narrow/Binary
Query Clarity            Clear; Ambiguous
Document Quality         High Quality; Low Quality
Document Scope           Single Document; Multiple Documents
Information Match        Clear Single Match; Multiple Matches; Ambiguous/Partial Match; No Match; Contradictory Matches
Extraction Complexity    Non-interpretative (simple data extraction); Interpretative (requiring legal/contextual understanding)

This variety was intentional. It allows us to observe how each AI Assistant handles easy and hard questions, clear and vague instructions, and single and multiple source documents, much as real queries vary in legal practice.
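As an illustration only, the task structure and labels described above could be represented as in the sketch below. The class name `ExtractionTask`, its field names, the helper `tag_counts`, and the two sample tasks are hypothetical and do not come from the evaluation's actual tooling or dataset. The sketch also assumes each task carries exactly one tag per label, which the evaluation does not state.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

# Label names and tag values taken from the Dataset Characteristics table.
ALLOWED_TAGS = {
    "Query Scope": {"Open-Ended", "Narrow/Binary"},
    "Query Clarity": {"Clear", "Ambiguous"},
    "Document Quality": {"High Quality", "Low Quality"},
    "Document Scope": {"Single Document", "Multiple Documents"},
    "Information Match": {
        "Clear Single Match",
        "Multiple Matches",
        "Ambiguous/Partial Match",
        "No Match",
        "Contradictory Matches",
    },
    "Extraction Complexity": {"Non-interpretative", "Interpretative"},
}


@dataclass
class ExtractionTask:
    """Hypothetical record for one submitted task (field names are assumptions)."""

    query: str                      # the lawyer's query, in their own words
    source_documents: List[str]     # one or more documents to extract from
    accuracy_attributes: List[str]  # attributes an accurate answer must have
    labels: Dict[str, str] = field(default_factory=dict)  # label -> tag (assumed one tag per label)

    def __post_init__(self) -> None:
        if not self.source_documents:
            raise ValueError("a task needs at least one source document")
        for label, tag in self.labels.items():
            if tag not in ALLOWED_TAGS.get(label, set()):
                raise ValueError(f"unknown tag {tag!r} for label {label!r}")


def tag_counts(tasks: List[ExtractionTask], label: str) -> Counter:
    """Count how many tasks carry each tag for the given label."""
    return Counter(t.labels[label] for t in tasks if label in t.labels)


# Two invented placeholder tasks, purely to show usage; not real dataset entries.
sample_tasks = [
    ExtractionTask(
        query="placeholder query A",
        source_documents=["contract_a.pdf"],
        accuracy_attributes=["placeholder attribute"],
        labels={"Query Scope": "Narrow/Binary", "Document Scope": "Single Document"},
    ),
    ExtractionTask(
        query="placeholder query B",
        source_documents=["policy_b.pdf", "terms_b.pdf"],
        accuracy_attributes=["placeholder attribute"],
        labels={"Query Scope": "Open-Ended", "Document Scope": "Multiple Documents"},
    ),
]

scope_counts = tag_counts(sample_tasks, "Document Scope")
```

Counting tags per label in this way is one simple check that a selected set of tasks actually covers each dimension of variation.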

The dataset tasks are available on request by email to aguozy@gmail.com.

[4] This evaluation uses the term “Info Extraction Tasks” based on how in-house lawyers themselves defined the tasks they submitted. In practice, these tasks often involve more than surface-level retrieval. Some require legal interpretation to understand the query or to navigate the nuances in the source materials.

We did not filter or rewrite tasks to make them narrowly scoped or cleanly extractive. Instead, we preserved the queries in their original form and labeled each task based on its observable characteristics. This reflects how in-house legal work actually unfolds, where questions are rarely tidy and the distinction between extraction, reasoning, and analysis often blurs.