
sledtools/pika branch #112

pika-orch-incus-cleanup-23

Add codex-backed invariants check

Target branch: master

Merge Commit: 4acb7cd6a4bdec60d9ab2341f45f71b61a63dd22

branch: merged · tutorial: ready · ci: success

Continuous Integration

CI: success

Compact status on the review page, with full logs on the CI page.


Latest run #139 success

9 passed

head c54b2473cf2e46b145ac2e14e5502a373d888c83 · queued 2026-03-26 17:43:53 · 9 lane(s)

queued 12s · ran 1m 34s

  • check-pika-rust · success
  • check-pika-followup · success
  • check-notifications · success
  • check-agent-contracts · success
  • check-rmp · success
  • check-pikachat · success
  • check-apple-host-sanity · success
  • check-pikachat-openclaw-e2e · success
  • check-fixture · success

Summary

This branch introduces a Codex-backed architecture invariant review system for the pika project. It adds four pieces: a TOML specification format for declaring architectural invariants (rules the codebase must uphold); a Python script that feeds those invariants to OpenAI Codex for automated grading; Just task wiring so developers can run the check with just invariants; and a unit test suite covering the spec loader, prompt builder, report validator, and schema generator. The goal is to catch architectural drift, such as unwanted coupling between the reusable pikaci execution layer and Pika-specific CI policy, through LLM-assisted code review rather than purely mechanical linting.

Tutorial Steps

Define the invariants specification in TOML

Intent: Establish a declarative, version-controlled file where architectural rules are expressed as structured records. Each invariant carries an ID, area tag, kind (must / allowed), a natural-language statement, optional file-glob scope, and an optional reviewer hint.

Affected files: invariants/invariants.toml

Evidence
@@ -0,0 +1,34 @@
+version = 1
+name = "pika-project-invariants"
+
+[[invariant]]
+id = "PIKACI-001"
+area = "pikaci"
+kind = "allowed"
+statement = "pikaci may depend on pika-cloud."
+scope = [
+  "crates/pikaci/**",
+  "crates/pika-news/**",
+]
+
+[[invariant]]
+id = "PIKACI-002"
+...
+kind = "must"
+statement = "The reusable pikaci execution layer does not hardcode Pika-specific CI lanes, targets, or package-specific test commands."
+
+[[invariant]]
+id = "PIKACI-003"
+...
+statement = "Pika-specific path filters and lane catalogs live outside the reusable pikaci execution layer."

The file invariants/invariants.toml is the single source of truth for every architectural rule the project wants to enforce via LLM review.

Format highlights

Field     Purpose
version   Schema version (must be 1).
id        Unique, human-readable identifier (e.g. PIKACI-001).
kind      "must" = required property; "allowed" = permitted dependency / coupling.
scope     Array of file globs the reviewer should focus on.
hint      Free-text guidance steering the LLM toward non-obvious checks.

Three invariants are shipped initially, all in the pikaci area:

  1. PIKACI-001 (allowed) — pikaci may depend on pika-cloud.
  2. PIKACI-002 (must) — The reusable execution layer must not hard-code Pika-specific lanes or test commands.
  3. PIKACI-003 (must) — Pika-specific path filters and lane catalogs must live outside the execution layer.

Because the hint field appears only on PIKACI-002 and PIKACI-003, reviewers (human or LLM) get extra direction only where the check is subtle.

Implement the invariant review script

Intent: Provide the end-to-end orchestration that loads the TOML spec, constructs a structured prompt, invokes the Codex CLI with a JSON output schema, and prints a human-readable pass/fail report.

Affected files: scripts/check_invariants.py

Evidence
@@ -0,0 +1,275 @@
+ROOT = Path(__file__).resolve().parents[1]
+DEFAULT_SPEC_PATH = ROOT / "invariants" / "invariants.toml"
@@ ... @@
+def load_spec(path: Path) -> dict[str, Any]:
+    with path.open("rb") as handle:
+        spec = tomllib.load(handle)
+    if spec.get("version") != 1:
+        raise SystemExit(f"unsupported invariants spec version in {path}")
@@ ... @@
+def run_codex_review(
+    prompt: str,
+    schema: dict[str, Any],
+    model: str | None,
+    verbose: bool = False,
+) -> tuple[dict[str, Any], str]:
+    ...
+        cmd = [
+            "codex",
+            "-a", "never",
+            "exec",
+            "--sandbox", "workspace-write",
+            "--ephemeral",
+            ...
@@ ... @@
+def print_report(spec: dict[str, Any], report: dict[str, Any]) -> int:
+    results = validate_report(spec, report)
+    ...
+    return 1 if failures else 0

scripts/check_invariants.py is the heart of the feature. It is structured as five composable stages:

1. Argument parsing (parse_args)

Accepts --spec (path to TOML), --model (Codex model override, also via PIKA_INVARIANTS_CODEX_MODEL), --json-out (persist raw report), and --verbose.

2. Spec loading and validation (load_spec)

Uses tomllib (Python 3.11+) to parse the TOML. Validates:

  • version == 1
  • At least one [[invariant]] entry
  • No duplicate IDs
  • kind is "must" or "allowed"
  • statement is a non-empty string
  • scope is a string array when present

Every validation failure raises SystemExit(...) with a descriptive message. Python prints that message to stderr and exits with status 1, so a failing CI run states which rule in the spec was violated.
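
The checks listed above can be sketched roughly as follows. Only the version message and the "duplicate invariant id ..." message appear in the diff; the function name validate_spec, the other messages, and the non-empty id check are assumptions made for this illustration.

```python
from typing import Any

ALLOWED_KINDS = {"must", "allowed"}


def validate_spec(spec: dict[str, Any], source: str = "<spec>") -> list[dict[str, Any]]:
    """Hypothetical stand-in for the validation half of load_spec."""
    if spec.get("version") != 1:
        raise SystemExit(f"unsupported invariants spec version in {source}")

    invariants = spec.get("invariant")
    if not isinstance(invariants, list) or not invariants:
        raise SystemExit(f"{source} must define at least one [[invariant]]")

    seen: set[str] = set()
    for inv in invariants:
        inv_id = inv.get("id")
        # Assumption: id must be a non-empty string (not listed in the source).
        if not isinstance(inv_id, str) or not inv_id:
            raise SystemExit(f"invariant without a valid id in {source}")
        if inv_id in seen:
            raise SystemExit(f"duplicate invariant id {inv_id}")
        seen.add(inv_id)

        if inv.get("kind") not in ALLOWED_KINDS:
            raise SystemExit(f"{inv_id}: kind must be 'must' or 'allowed'")

        statement = inv.get("statement")
        if not isinstance(statement, str) or not statement.strip():
            raise SystemExit(f"{inv_id}: statement must be a non-empty string")

        scope = inv.get("scope")
        if scope is not None and not (
            isinstance(scope, list) and all(isinstance(s, str) for s in scope)
        ):
            raise SystemExit(f"{inv_id}: scope must be an array of strings")

    return invariants
```

Raising SystemExit with a string avoids a traceback while still producing a non-zero exit status, which suits a command-line check.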

3. Prompt construction (build_prompt)

Assembles a structured natural-language prompt containing:

  • The repo root path for context
  • Each invariant rendered with id, area, kind, statement, scope, and optional hint
  • Explicit instructions to grade as pass/fail, default to fail on ambiguity, and return JSON
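
One way such a prompt could be assembled is sketched below. The instruction wording and line layout are assumptions; the source only confirms that the prompt contains scope: and hint: lines when those fields are present.

```python
from typing import Any


def build_prompt(spec: dict[str, Any], repo_root: str) -> str:
    """Hypothetical prompt builder; the exact wording is illustrative."""
    lines = [
        f"Repository root: {repo_root}",
        "Grade each invariant below as pass or fail.",
        "If the evidence is ambiguous, grade the invariant as fail.",
        "Return JSON only.",
        "",
    ]
    for inv in spec["invariant"]:
        lines.append(f"- id: {inv['id']}")
        lines.append(f"  area: {inv.get('area', '')}")
        lines.append(f"  kind: {inv['kind']}")
        lines.append(f"  statement: {inv['statement']}")
        # scope and hint are optional, so emit them only when present.
        scope = inv.get("scope")
        if scope:
            lines.append(f"  scope: {', '.join(scope)}")
        hint = inv.get("hint")
        if hint:
            lines.append(f"  hint: {hint}")
    return "\n".join(lines)
```

Keeping the builder a pure function of the spec and the repo root is what allows it to be tested without a Codex call.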

4. Schema generation (output_schema)

Builds a JSON Schema (draft-07) that restricts each result's id to the spec's invariant IDs (via an enum) and requires a grade, a rationale, and 1–3 evidence file paths per result. The schema is passed to Codex via --output-schema so the response comes back as structured JSON.
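
A schema of that shape might look like the sketch below. The results and id keys and the enum on id match what the unit test in this branch asserts; the grade, rationale, and evidence property names are assumptions.

```python
from typing import Any


def output_schema(spec: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical schema builder mirroring the description above."""
    ids = [inv["id"] for inv in spec["invariant"]]
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "additionalProperties": False,
        "required": ["results"],
        "properties": {
            "results": {
                "type": "array",
                "items": {
                    "type": "object",
                    "additionalProperties": False,
                    # Property names other than "id" are assumed.
                    "required": ["id", "grade", "rationale", "evidence"],
                    "properties": {
                        # Only IDs declared in the spec are accepted.
                        "id": {"type": "string", "enum": ids},
                        "grade": {"type": "string", "enum": ["pass", "fail"]},
                        "rationale": {"type": "string"},
                        "evidence": {
                            "type": "array",
                            "items": {"type": "string"},
                            "minItems": 1,
                            "maxItems": 3,
                        },
                    },
                },
            }
        },
    }
```

An enum on id cannot express that every ID appears exactly once, so completeness and uniqueness still need a separate check after the response arrives.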

5. Codex invocation and report rendering

run_codex_review shells out to the codex CLI with --sandbox workspace-write, --ephemeral, and -a never (never prompt for approval, so the run cannot stall waiting for input). The JSON report is written to a temp file and read back.

validate_report checks completeness and uniqueness of the returned IDs, then re-sorts results to match the spec order.
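
The completeness, uniqueness, and re-ordering steps can be sketched as follows. The error messages and the handling of unknown IDs are assumptions; only the behaviour of returning results in spec order is confirmed by the unit test.

```python
from typing import Any


def validate_report(spec: dict[str, Any], report: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical validator: one result per spec ID, returned in spec order."""
    expected = [inv["id"] for inv in spec["invariant"]]

    by_id: dict[str, dict[str, Any]] = {}
    for entry in report.get("results", []):
        entry_id = entry.get("id")
        if entry_id not in expected:
            raise SystemExit(f"report contains unexpected invariant id {entry_id}")
        if entry_id in by_id:
            raise SystemExit(f"report contains duplicate result for {entry_id}")
        by_id[entry_id] = entry

    missing = [inv_id for inv_id in expected if inv_id not in by_id]
    if missing:
        raise SystemExit(f"report is missing results for: {', '.join(missing)}")

    # Rebuild the list by walking the spec, so output order never depends on
    # the order in which the model happened to answer.
    return [by_id[inv_id] for inv_id in expected]
```

Indexing results by ID and then walking the spec gives a stable order without needing a sort key.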

print_report prints a human-readable summary and returns 1 if any invariant failed and 0 otherwise, so the process exit code can gate a CI job.
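
The exit-code convention can be illustrated with a small sketch. The function name summarize, the grade and rationale field names, and the output format are hypothetical; only the rule "1 if there are failures, else 0" comes from the diff.

```python
from typing import Any


def summarize(results: list[dict[str, Any]]) -> int:
    """Print one line per invariant and return a process exit code."""
    # Assumption: any grade other than "pass" counts as a failure.
    failures = [r for r in results if r["grade"] != "pass"]
    for r in results:
        print(f"[{r['grade'].upper()}] {r['id']}: {r['rationale']}")
    print(f"{len(results) - len(failures)} passed, {len(failures)} failed")
    return 1 if failures else 0
```

A caller would typically end with raise SystemExit(summarize(results)), which turns the return value into the exit status that a CI runner or Just recipe observes.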

Wire the check into the Just task system

Intent: Make the invariant review discoverable and runnable through the project's existing Just-based developer workflow.

Affected files: just/checks.just, justfile

Evidence
@@ -34,6 +34,10 @@ pre-commit-full: pre-commit
+# Run the Codex-backed architecture invariant review.
+invariants:
+    python3 ./scripts/check_invariants.py
@@ -160,6 +162,10 @@ qa:
+# Run the Codex-backed architecture invariant review.
+invariants:
+    @just checks::invariants
@@ -28,6 +28,8 @@ info:
+    @echo "  Architecture invariants:"
+    @echo "    just invariants"

Two Just targets are added:

Target               Location              Purpose
checks::invariants   just/checks.just:37   Runs python3 ./scripts/check_invariants.py directly.
invariants           justfile:165          Top-level alias forwarding to checks::invariants.

The info recipe in the root justfile is also updated to advertise just invariants under a new "Architecture invariants" heading, keeping the built-in help text current.

This follows the project's convention: implementation recipes live in just/checks.just, top-level convenience aliases live in the root justfile.

Add unit tests for the review script

Intent: Verify the pure-logic functions (spec loading, prompt building, report validation, schema generation) without requiring a live Codex backend, enabling fast CI feedback.

Affected files: scripts/test_check_invariants.py

Evidence
@@ -0,0 +1,119 @@
+def load_script_module():
+    spec = importlib.util.spec_from_file_location("check_invariants", SCRIPT)
+    ...
+    spec.loader.exec_module(module)
+    return module
@@ ... @@
+    def test_load_spec_rejects_duplicate_ids(self) -> None:
+        ...
+        with self.assertRaises(SystemExit) as ctx:
+            module.load_spec(spec_path)
+        self.assertIn("duplicate invariant id DUP-001", str(ctx.exception))
@@ ... @@
+    def test_validate_report_preserves_spec_order(self) -> None:
+        ...
+        self.assertEqual([entry["id"] for entry in normalized], ["ONE", "TWO"])
@@ ... @@
+    def test_output_schema_matches_invariant_ids(self) -> None:
+        ...
+        self.assertEqual(
+            schema["properties"]["results"]["items"]["properties"]["id"]["enum"],
+            ["ONE", "TWO"],
+        )

scripts/test_check_invariants.py uses importlib.util to dynamically import check_invariants.py as a module (avoiding package installation), then exercises four scenarios:

Tests

  1. test_load_spec_rejects_duplicate_ids — Writes a TOML file with two DUP-001 entries to a temp directory and asserts load_spec raises SystemExit mentioning the duplicate.

  2. test_build_prompt_includes_scope_and_hint — Feeds a synthetic spec with scope globs and a hint, then checks the generated prompt string contains the expected scope: and hint: lines.

  3. test_validate_report_preserves_spec_order — Supplies a report where results arrive in reverse order (TWO before ONE) and verifies validate_report re-sorts them to match the spec's declaration order.

  4. test_output_schema_matches_invariant_ids — Confirms the JSON Schema's enum constraint lists exactly the IDs from the spec, ensuring Codex is constrained to valid invariant IDs.

All tests are offline (no Codex call) and run with python3 -m unittest scripts/test_check_invariants.py.
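
The path-based import technique is the standard importlib recipe and can be tried in isolation. The sketch below is not the project's test file: the helper name load_module_from_path, the throwaway demo_script.py, and its double function are invented for the example.

```python
import importlib.util
import tempfile
import unittest
from pathlib import Path
from types import ModuleType


def load_module_from_path(name: str, path: Path) -> ModuleType:
    """Import a standalone .py file as a module without installing it."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    # exec_module runs the file's top-level code inside the new module object.
    spec.loader.exec_module(module)
    return module


class LoadByPathTest(unittest.TestCase):
    def test_loads_standalone_script(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            # Hypothetical script written on the fly for the test.
            script = Path(tmp) / "demo_script.py"
            script.write_text("def double(x):\n    return 2 * x\n", encoding="utf-8")
            module = load_module_from_path("demo_script", script)
            self.assertEqual(module.double(21), 42)
```

Because exec_module runs top-level code, a script loaded this way should keep its entry point behind an if __name__ == "__main__": guard so that importing it from a test has no side effects.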
