The Washington Law Benchmark (WLB)

CSI-lab/RCW_2025_Positive_Query_Pairs

WLB is a large-scale synthetic dataset designed to advance legal information retrieval (IR) and semantic search. It bridges the "semantic gap" between natural language and formal statutory legalese.

Massive Scale

Contains hundreds of thousands of paired examples that map hypothetical scenarios and local legislative drafts to the Washington State statutes that govern them.

Bridging the Gap

Pairs conversational and drafted text with the dense, structured legalese of the official Revised Code of Washington (RCW).

Perfect for Training

Ideal for fine-tuning dense retrievers (like BGE, E5, or MiniLM) using contrastive learning techniques such as Multiple Negatives Ranking (MNR) loss.
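
To make the training objective concrete, here is a minimal pure-Python sketch of the in-batch idea behind MNR loss: each query's paired statute is the positive, and every other statute in the batch serves as a negative. The toy vectors, the function names, and the scale value of 20 are illustrative assumptions, not part of the dataset or of any specific library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(query_vecs, context_vecs, scale=20.0):
    """Mean cross-entropy where context i is the positive for query i and
    all other contexts in the batch act as negatives."""
    losses = []
    for i, q in enumerate(query_vecs):
        logits = [scale * cosine(q, c) for c in context_vecs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy 2-d "embeddings" (assumed values, for illustration only).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each statute sits near its own query
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each statute sits near the wrong query

loss_aligned = mnr_loss(queries, aligned)
loss_swapped = mnr_loss(queries, swapped)
```

The loss is small when each query is closest to its own statute and large when the pairs are mismatched. Because the negatives come from the rest of the batch, a dataset of positive pairs alone is enough to train with this objective.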

Use Case: Bridging the Semantic Gap

The dataset can train models to map conversational questions and draft text to the statutes that govern them.

Citizen / Draft Query

"Can the city fine me $1,000 for leaving trash in the local park?"

Matched in vector space to:

Formal Statute (RCW)

RCW 70A.200.060
Maximum penalty for civil littering infractions shall not exceed five hundred dollars...
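
The matching step illustrated above can be sketched as a nearest-neighbor lookup: embed the query, then rank statutes by cosine similarity. The three-dimensional vectors below are hand-written stand-ins for real embeddings, and `top_k` is a hypothetical helper, so treat this as an illustration of the mechanism only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, statute_vecs, k=1):
    """Return the k RCW numbers whose vectors are most similar to the query."""
    scored = [(cosine(query_vec, vec), rcw) for rcw, vec in statute_vecs.items()]
    scored.sort(reverse=True)
    return [rcw for _, rcw in scored[:k]]

# Assumed toy embeddings; a real system would produce these with a trained encoder.
statute_vecs = {
    "70A.200.060": [0.9, 0.1, 0.0],
    "4.08.120": [0.1, 0.9, 0.1],
    "4.12.070": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stand-in for the embedded littering question

best = top_k(query_vec, statute_vecs, k=1)
top_two = top_k(query_vec, statute_vecs, k=2)
```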

Supported Tasks

Sentence Similarity & IR

The primary use case. Train embedding models to retrieve the correct governing RCW given a plain-English query.

Retrieval-Augmented Generation

Evaluate the retrieval step of legal AI assistants and chatbots so that the model receives accurate context, which helps reduce LLM hallucinations.
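
One way to score the retrieval step is to compare each ranked result list against the gold `rcw_number` of the pair. The sketch below computes Recall@k and mean reciprocal rank (MRR); the ranked lists are invented for illustration and the helper names are assumptions, not an evaluation protocol defined by the dataset.

```python
def recall_at_k(ranked_ids, gold_id, k):
    """1.0 if the gold statute appears in the top k results, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0

def reciprocal_rank(ranked_ids, gold_id):
    """1 / rank of the gold statute, or 0.0 if it was not retrieved."""
    for rank, rcw in enumerate(ranked_ids, start=1):
        if rcw == gold_id:
            return 1.0 / rank
    return 0.0

def evaluate(runs, k=3):
    """Average both metrics over (ranked_ids, gold_id) pairs."""
    n = len(runs)
    recall = sum(recall_at_k(ranked, gold, k) for ranked, gold in runs) / n
    mrr = sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / n
    return {"recall@k": recall, "mrr": mrr}

# Hypothetical retriever output for three queries.
runs = [
    (["4.08.120", "4.08.110", "4.12.070"], "4.08.120"),   # gold at rank 1
    (["4.08.120", "4.12.070", "4.08.110"], "4.12.070"),   # gold at rank 2
    (["4.08.110", "4.08.120"], "70A.200.060"),            # gold not retrieved
]
metrics = evaluate(runs, k=1)
```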

Text Classification

Categorize legal queries into one of the 500+ distinct legal domains implied by the dataset metadata.
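
The card does not spell out how those domains are defined. One plausible reading, assumed here, is that a label can be derived from the `rcw_number` field by keeping its title and chapter parts (for example, `4.08` from `4.08.120`). The helper below is a hypothetical sketch under that assumption.

```python
from collections import Counter

def chapter_label(rcw_number):
    """Derive a 'title.chapter' label from a 'title.chapter.section' number.

    Assumption: the number has three dot-separated parts. Anything else is
    returned unchanged rather than guessed at.
    """
    parts = rcw_number.split(".")
    if len(parts) < 3:
        return rcw_number
    return ".".join(parts[:2])

# RCW numbers that appear in this card.
rcw_numbers = ["4.08.120", "4.08.110", "4.12.070", "70A.200.060"]
labels = [chapter_label(n) for n in rcw_numbers]
label_counts = Counter(labels)
```

Counting labels this way gives a quick view of class balance before training a classifier.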

Dataset Structure

Each row in the dataset represents a one-to-one positive pair: a synthetic query and the state law it maps to. The data is stored in standard JSONL format, with one JSON object per line.

Example Instances (JSONL)
{
  "query_type": "natural_language",
  "query": "what are the rules for suing a washington state county",
  "context": "RCW 4.08.120 - Action against public corporations. An action may be maintained against a county or other of the public corporations mentioned or described in RCW 4.08.110, either upon a contract made by such county, or other public corporation in its corporate character and within the scope of its authority, or for an injury to the rights of the plaintiff arising from some act or omission of such county or other public corporation. [ 1953 c 118 s 2 . Prior: Code 1881 s 662; 1869 p 154 s 602 ; RRS s 951.]",
  "metadata": {
    "rcw_number": "4.08.120",
    "title": "Action against public corporations."
  }
}

{
  "query_type": "exact_keyword",
  "query": "RCW 4.12 washington superior court new county",
  "context": "RCW 4.12.070 - Change to newly created county. Any party in a civil action pending in the superior court in a county out of whose limits a new county, in whole or in part, has been created, may file with the clerk of such superior court an affidavit setting forth that he or she is a resident of such newly created county, and that the venue of such action is transitory, or that the venue of such action is local, and that it ought properly to be tried in such newly created county; and thereupon the clerk shall make out a transcript of the proceedings already had in such action in such superior court, and certify it under the seal of the court, and transmit such transcript, together with the papers on file in his or her office connected with such action, to the clerk of the superior court of such newly created county, wherein it shall be proceeded with as in other cases. [ 2011 c 336 s 80 ; 1891 c 33 s 2 ; Code 1881 s 53; 1877 p 12 s 54 ; 1869 p 14 s 54 ; 1854 p 377 s 2 ; RRS s 211.]",
  "metadata": {
    "rcw_number": "4.12.070",
    "title": "Change to newly created county."
  }
}
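
Rows with this shape can be turned into training triples with the standard library alone. The sketch below assumes one JSON object per line, as JSONL implies, and uses the field names from the examples above. The `read_pairs` helper is hypothetical, and the context strings are shortened to their headings to keep the sample compact.

```python
import json

def read_pairs(lines):
    """Turn JSONL lines into (query, context, rcw_number) triples."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        row = json.loads(line)
        pairs.append((row["query"], row["context"], row["metadata"]["rcw_number"]))
    return pairs

# Two rows shaped like the examples above, with shortened context strings.
sample_rows = [
    {
        "query_type": "natural_language",
        "query": "what are the rules for suing a washington state county",
        "context": "RCW 4.08.120 - Action against public corporations.",
        "metadata": {"rcw_number": "4.08.120", "title": "Action against public corporations."},
    },
    {
        "query_type": "exact_keyword",
        "query": "RCW 4.12 washington superior court new county",
        "context": "RCW 4.12.070 - Change to newly created county.",
        "metadata": {"rcw_number": "4.12.070", "title": "Change to newly created county."},
    },
]
sample_lines = [json.dumps(row) for row in sample_rows]
pairs = read_pairs(sample_lines)
```

To read a downloaded file instead, pass an open file handle to `read_pairs`, since iterating a text file yields one line at a time.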

Interactive Dataset Viewer

Explore a live sample of the training pairs in the Hugging Face dataset viewer. You can search, filter, and inspect the raw structure of the data before downloading it for your own fine-tuning runs.

huggingface.co/datasets/CSI-lab/RCW_2025_Positive_Query_Pairs