Revised Code of Washington (RCW)
2026 Edition
CSI-lab/Law_RCW_Dataset
A comprehensive, structured dataset containing the full text of all permanent laws in force in the State of Washington as of 2026, designed specifically to facilitate legal Natural Language Processing (NLP) research.
Preprocessing Pipeline
Figure: the transformation from unstructured legislative web pages to a machine-readable JSON array. Raw HTML input such as <h3>1.04.010</h3>, <p>The ninety-one titles...</p>, and <span class="note">[1950 c 16]</span> passes through a Normalization stage before being written out as JSON.
Motivation & Intended Use
While federal law is frequently used in legal NLP modeling, state-level statutory frameworks remain a critical, under-resourced domain for legal AI. This dataset supports the following applications:
LLM Pre-training
Continual pre-training or fine-tuning of models to improve comprehension of statutory drafting conventions and legal terminology.
Information Retrieval
Developing and evaluating dense retrieval systems for precise legal search and semantic matching.
RAG & QA
Providing a structured knowledge base for retrieval-augmented generation and open-domain statutory question answering.
Network Analysis
Analyzing cross-references and citations within the text to map the dependency structure of state law.
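As a rough illustration of the network-analysis use case, the sketch below builds a citation graph from the text field of each record. It assumes, as a simplification, that in-text cross-references take the form "RCW <title>.<chapter>.<section>" (for example "RCW 1.04.010"). The regular expression and the name build_citation_graph are hypothetical; chapter-level citations and bare lists of section numbers without a leading "RCW" are not captured.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "RCW" followed by title.chapter.section, where the title
# and chapter may carry a letter suffix (e.g. "9A.04.110", "43.21C.030").
# Chapter-level citations such as "chapter 43.21C RCW" are deliberately not matched.
RCW_CITE = re.compile(r"\bRCW\s+(\d+[A-Z]?\.\d+[A-Z]?\.\d+)")


def build_citation_graph(records):
    """Return {citing rcw_number: set of cited rcw_numbers} for the given records."""
    graph = defaultdict(set)
    for rec in records:
        source = rec["rcw_number"]
        for target in RCW_CITE.findall(rec["text"]):
            if target != source:  # skip self-references
                graph[source].add(target)
    return dict(graph)


if __name__ == "__main__":
    sample = [  # toy records for illustration, not taken from the dataset
        {"rcw_number": "1.04.020", "text": "See RCW 1.04.010 and RCW 9A.04.110."},
        {"rcw_number": "1.04.010", "text": "Repeats itself: RCW 1.04.010."},
    ]
    print(build_citation_graph(sample))
```

The resulting adjacency map could then be loaded into a graph library for degree or centrality analysis of the dependency structure.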
Dataset Structure
The dataset is distributed as a single JSON array. Each element of the array represents one unique section of the Revised Code of Washington.
Data Fields
- rcw_number: The official, hierarchical section identifier (e.g., "1.04.010"), corresponding to Title, Chapter, and Section.
- title: The official heading or catchline of the statutory section.
- text: The complete, unabridged statutory text, including legislative histories, notes, and citations.
[
  {
    "rcw_number": "1.04.010",
    "title": "Revised Code of Washington enacted.",
    "text": "The ninety-one titles with chapters and sections designated as the \"Revised Code of Washington\"..."
  }
]
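A minimal sketch of loading the array with the Python standard library, checking that each record carries the three fields above, and indexing it by section number. The helper names are hypothetical, the file path passed to load_sections is left to the caller (the distribution file name is not stated here), and split_rcw_number assumes every rcw_number has exactly three dot-separated parts.

```python
import json
from typing import NamedTuple

REQUIRED_FIELDS = {"rcw_number", "title", "text"}


class RCWNumber(NamedTuple):
    title: str
    chapter: str
    section: str


def split_rcw_number(rcw_number: str) -> RCWNumber:
    # "1.04.010" -> ("1", "04", "010"). Parts stay strings so leading zeros and
    # letter suffixes such as "9A" are preserved. Unpacking raises ValueError
    # unless there are exactly three parts (an assumption about the format).
    title, chapter, section = rcw_number.split(".")
    return RCWNumber(title, chapter, section)


def index_sections(records):
    """Check each record's fields and return {rcw_number: record}."""
    index = {}
    for rec in records:
        missing = REQUIRED_FIELDS - set(rec)
        if missing:
            raise ValueError(f"record missing fields: {sorted(missing)}")
        if rec["rcw_number"] in index:  # each element should be a unique section
            raise ValueError(f"duplicate rcw_number: {rec['rcw_number']}")
        index[rec["rcw_number"]] = rec
    return index


def load_sections(path):
    with open(path, encoding="utf-8") as f:
        return index_sections(json.load(f))  # the whole file is one JSON array


def keyword_search(index, keyword):
    """Case-insensitive substring match on title and text; returns sorted rcw_numbers."""
    kw = keyword.lower()
    return sorted(n for n, rec in index.items()
                  if kw in rec["title"].lower() or kw in rec["text"].lower())
```

Keeping the identifier components as strings, rather than converting them to integers, avoids losing the zero-padding and letter suffixes that RCW numbers use.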
Collection & Preprocessing
Data was systematically scraped from official repositories for the 2026 legislative year.
- Extraction: HTML/XML parsed to isolate sections without omitting notes.
- Normalization: Extraneous formatting stripped and Unicode normalized.
- Alignment: Validated against the official Table of Contents for completeness.
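The three steps above can be sketched with the standard library alone. This is not the pipeline actually used to build the dataset: it assumes section pages shaped like the fragment in the pipeline figure (section number in <h3>, body in <p>, history note in <span class="note">), uses NFKC as one plausible choice of Unicode normalization, and treats the table of contents as a plain list of section numbers. SectionParser, normalize, parse_section, and missing_from_toc are hypothetical names.

```python
import unicodedata
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Extraction: collect the section number plus body and note text."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &nbsp; etc. into characters
        self.number_parts = []
        self.text_parts = []
        self._field = None  # "number", "text", or None outside tags of interest
        self._tag = None    # tag name that opened the current field

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._field, self._tag = "number", tag
        elif tag == "p" or (tag == "span" and ("class", "note") in attrs):
            self._field, self._tag = "text", tag  # notes are kept, not dropped

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._field = self._tag = None

    def handle_data(self, data):
        if self._field == "number":
            self.number_parts.append(data)
        elif self._field == "text":
            self.text_parts.append(data)


def normalize(s):
    """Normalization: NFKC-normalize, then collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", s).split())


def parse_section(html):
    parser = SectionParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return {
        "rcw_number": normalize("".join(parser.number_parts)),
        "text": normalize(" ".join(parser.text_parts)),
    }


def missing_from_toc(records, toc_numbers):
    """Alignment: section numbers listed in the table of contents but never parsed."""
    parsed = {rec["rcw_number"] for rec in records}
    return sorted(set(toc_numbers) - parsed)
```

This parser is deliberately simple: a nested tag with the same name as the one that opened a field (for example a <span> inside the note) ends that field early, so real pages may need a more robust approach.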
Limitations & Legal Status
Public Domain
In accordance with U.S. copyright law and the "Edicts of Government" doctrine, official state laws are not subject to copyright. This dataset is released into the Public Domain.
Structural Limitations
The dataset reflects the 2026 edition only. It does not contain judicial precedent (case law) or agency regulations, and models trained on it may perform worse on the law of jurisdictions other than Washington.
Not Legal Counsel
Models trained on this data must not provide automated legal advice. Developers should implement robust safeguards against the unauthorized practice of law.