Revised Code of Washington (RCW) 2026 Edition

CSI-lab/Law_RCW_Dataset

A comprehensive, structured dataset containing the full text of all permanent laws in force in the State of Washington as of 2026, designed to facilitate legal Natural Language Processing (NLP) research.

Preprocessing Pipeline

The diagram below shows the transformation from unstructured legislative web pages to a machine-readable JSON array.

Raw Web Data
<div class="rcw">
<h3>1.04.010</h3>
<p>The ninety-one titles...</p>
<span class="note">[1950 c 16]</span>
</div>
Extraction & Normalization
Structured Corpus
"rcw_number": "9A.60.020",
"title": "Forgery",
"text": "A person is guilty of forgery..."

Motivation & Intended Use

While federal laws are frequently modeled, state-level statutory frameworks represent a critical, under-resourced domain for legal AI. This dataset provides a reliable knowledge base for the following applications:

LLM Pre-training

Continual pre-training or fine-tuning of models to improve comprehension of statutory drafting conventions and legal terminology.

Information Retrieval

Developing and evaluating dense retrieval systems for precise legal search and semantic matching.

RAG & QA

Providing a reliable, structured knowledge base for retrieval-augmented generation and open-domain statutory Question Answering.

Network Analysis

Analyzing cross-references and citations within the text to map the dependency structure of state law.
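
As a rough illustration of the cross-reference analysis described above, the following sketch pulls RCW citations out of each record's text and builds a simple adjacency map. The regular expression is an assumption about how citations are written (the literal "RCW" followed by a Title.Chapter.Section identifier); real statutory text may use other forms that it would miss. The sample records are invented for the example and are not taken from the dataset.

```python
import re
from collections import defaultdict

# Assumed citation pattern: "RCW" followed by Title.Chapter.Section, where any
# component may carry a single letter suffix (for example "9A").
RCW_CITATION = re.compile(r"\bRCW\s+(\d+[A-Z]?\.\d+[A-Z]?\.\d+[A-Z]?)")


def build_citation_graph(records):
    """Map each rcw_number to the sorted list of other sections its text cites."""
    graph = defaultdict(set)
    for record in records:
        source = record["rcw_number"]
        for target in RCW_CITATION.findall(record["text"]):
            if target != source:  # ignore self-references
                graph[source].add(target)
    return {source: sorted(targets) for source, targets in graph.items()}


# Hypothetical records with made-up text, used only to exercise the function.
sample = [
    {"rcw_number": "9A.60.020", "title": "Forgery",
     "text": "See RCW 9A.60.010 and RCW 9A.20.021 for definitions and penalties."},
    {"rcw_number": "9A.60.010", "title": "Definitions",
     "text": "No cross-references here."},
]
graph = build_citation_graph(sample)
print(graph)
```

Sections that cite nothing are left out of the map, so the result only contains nodes with outgoing edges.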

Dataset Structure

The dataset is distributed as a single JSON array. Each element of the array represents one distinct section of the Revised Code of Washington.

Data Fields

  • rcw_number

    The official, hierarchical section identifier (e.g., "1.04.010"), corresponding to Title, Chapter, and Section.

  • title

    The official heading or catchline of the statutory section.

  • text

    The complete, unabridged statutory text including legislative histories, notes, and citations.
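
Because rcw_number encodes Title, Chapter, and Section, it can be split into its parts for grouping or filtering. The sketch below assumes every identifier has exactly three dot-separated components; identifiers that do not fit this assumption raise an error rather than being silently mis-parsed. The names RcwId and parse_rcw_number are hypothetical helpers, not part of the dataset.

```python
from typing import NamedTuple


class RcwId(NamedTuple):
    title: str
    chapter: str
    section: str


def parse_rcw_number(rcw_number: str) -> RcwId:
    """Split 'Title.Chapter.Section' into its three components.

    Components stay as strings so that leading zeros ("010") and
    letter suffixes ("9A") are preserved.
    """
    parts = rcw_number.strip().split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected rcw_number format: {rcw_number!r}")
    return RcwId(*parts)


parsed = parse_rcw_number("1.04.010")
print(parsed.title, parsed.chapter, parsed.section)
```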

Example Instance (JSON Array)
[
  {
    "rcw_number": "1.04.010",
    "title": "Revised Code of Washington enacted.",
    "text": "The ninety-one titles with chapters and sections designated as the \"Revised Code of Washington...\""
  }
]
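
A minimal sketch of how the array might be loaded and checked with Python's standard json module. The function name load_sections and the validation rules (all three fields present, no duplicate rcw_number) are assumptions made for illustration. In practice the JSON would be read from the distributed file, whose name is not given here, so the example parses an inline string instead.

```python
import json

REQUIRED_FIELDS = ("rcw_number", "title", "text")


def load_sections(raw_json: str) -> dict:
    """Parse the JSON array and index its records by rcw_number."""
    records = json.loads(raw_json)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    index = {}
    for position, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"record {position} is missing fields: {missing}")
        if record["rcw_number"] in index:
            raise ValueError(f"duplicate rcw_number: {record['rcw_number']}")
        index[record["rcw_number"]] = record
    return index


# Inline stand-in for the dataset file; the text is truncated as in the example.
raw = """[
  {"rcw_number": "1.04.010",
   "title": "Revised Code of Washington enacted.",
   "text": "The ninety-one titles with chapters and sections designated..."}
]"""
sections = load_sections(raw)
print(sections["1.04.010"]["title"])
```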

Collection & Preprocessing

Data was systematically scraped from official repositories for the 2026 legislative year.

  • Extraction: HTML/XML pages were parsed to isolate individual sections, retaining their notes.
  • Normalization: Extraneous formatting was stripped and Unicode was normalized.
  • Alignment: The extracted sections were validated against the official Table of Contents for completeness.
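
The three steps above can be sketched with the Python standard library. This is not the authors' pipeline. It assumes the markup shape shown in the raw-data illustration (a div with class "rcw" holding an h3 section number), assumes NFKC as the Unicode normalization form, and uses a made-up Table of Contents list for the alignment check. Because that markup carries no heading, the sketch does not extract the title field.

```python
import re
import unicodedata
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Collect the <h3> section number and all other text in each <div class="rcw">."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "rcw") in attrs:
            self._current = {"rcw_number": "", "text": []}
        elif tag == "h3" and self._current is not None:
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_heading = False
        elif tag == "div" and self._current is not None:
            self.records.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is None or not data.strip():
            return
        if self._in_heading:
            self._current["rcw_number"] += data.strip()
        else:
            # Notes such as "[1950 c 16]" are kept along with the body text.
            self._current["text"].append(data.strip())


def normalize(text):
    # NFKC is an assumed choice; the source only says Unicode was normalized.
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def extract_sections(html):
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return [{"rcw_number": r["rcw_number"], "text": normalize(" ".join(r["text"]))}
            for r in parser.records]


def missing_sections(records, toc_numbers):
    """Return Table of Contents entries that have no extracted record."""
    found = {r["rcw_number"] for r in records}
    return sorted(set(toc_numbers) - found)


raw_html = """<div class="rcw">
<h3>1.04.010</h3>
<p>The ninety-one titles...</p>
<span class="note">[1950 c 16]</span>
</div>"""
records = extract_sections(raw_html)
# "1.04.020" is a hypothetical extra entry used to show a detected gap.
gaps = missing_sections(records, ["1.04.010", "1.04.020"])
print(records, gaps)
```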

Limitations & Legal Status

Public Domain

In accordance with U.S. copyright law and the "Edicts of Government" doctrine, official state laws are not subject to copyright. This dataset is released into the Public Domain.

Structural Limitations

This dataset reflects the 2026 edition only. It does not contain judicial precedent (case law) or agency regulations. Models trained on it may perform worse on statutes from jurisdictions other than Washington.

Not Legal Counsel

Models trained on this data must not provide automated legal advice. Developers should implement robust safeguards against the unauthorized practice of law.

Interactive Dataset Viewer

Explore the corpus structure below. You can search by RCW number or keywords to inspect how the legislative text was parsed and normalized.

huggingface.co/datasets/CSI-lab/Law_RCW_Dataset