
sledtools/pika branch #107

pika-cloud-incus-lifecycle-helper

Unify Incus lifecycle guest writers

Target branch: master

Merge Commit: 606071442097a94b996dc09f0aca73d48f4ca349

branch: merged · tutorial: ready · ci: success

Continuous Integration

CI: success

Compact status on the review page, with full logs on the CI page.


Latest run #133 success

10 passed

head 64bedb53375b26ac2edbefd41ab6bc9f352f5395 · queued 2026-03-26 02:22:53 · 10 lane(s)

queued 12s · ran 1m 58s

  • check-pika-rust · success
  • check-pika-followup · success
  • check-notifications · success
  • check-agent-contracts · success
  • check-rmp · success
  • check-pikachat · success
  • check-pikachat-typescript · success
  • check-apple-host-sanity · success
  • check-pikachat-openclaw-e2e · success
  • check-fixture · success

Summary

This branch extracts duplicated lifecycle-artifact writing logic (status, events, results) from three independent guest writers—the Rust-generated managed OpenClaw autostart script, the pikaci Incus guest Python runner, and their inline helpers—into a single shared Python CLI tool (pika-cloud-lifecycle). The new helper is packaged as a Nix derivation, installed into both Incus guest images, and invoked by each caller through subprocess/exec with a structured CLI interface. Inline jq/bash JSON construction and Python write_json/append_event implementations are removed in favour of delegating to the helper, which owns schema versioning, atomic writes, boot-id stamping, and file-lock-protected event sequencing. Contract tests are updated to pin against the helper source rather than the individual callers.

Tutorial Steps

Introduce the shared lifecycle helper Python script

Intent: Create a single authoritative implementation for writing lifecycle status, event, and result JSON artifacts so that every Incus guest image uses identical serialization, atomic-write, and sequencing logic.

Affected files: nix/incus/pika-cloud-lifecycle.py.in

Evidence
@@ -0,0 +1,204 @@
+#!@python3@
+import argparse
+import fcntl
+import json
+import os
+import pathlib
+import tempfile
+from datetime import datetime
+
+
+LIFECYCLE_SCHEMA_VERSION = 1
@@ +76,0 +76,16 @@
+def write_status(
+    status_path: pathlib.Path,
+    state: str,
+    message: str,
+    details,
+) -> None:
+    payload = {
+        "schema_version": LIFECYCLE_SCHEMA_VERSION,
+        "state": state,
+        "updated_at": finished_at(),
+        "message": message,
+    }
@@ +100,0 +100,30 @@
+def append_event(
+    events_path: pathlib.Path,
+    seq_path: pathlib.Path,
+    kind: str,
+    message: str,
+    details,
+) -> None:
+    ...
+    with seq_path.open("a+", encoding="utf-8") as seq_file:
+        fcntl.flock(seq_file.fileno(), fcntl.LOCK_EX)
@@ +148,0 +148,12 @@
+def write_result(
+    result_path: pathlib.Path,
+    status: str,
+    exit_code: int,
+    message: str,
+    details,
+) -> None:

The new file nix/incus/pika-cloud-lifecycle.py.in is a self-contained Python CLI with three subcommands: status, event, and result. Key design choices:

  • Schema version is owned here (LIFECYCLE_SCHEMA_VERSION = 1), removing it from every caller.
  • Atomic writes use tempfile.NamedTemporaryFile + os.replace + os.fsync, an upgrade over the previous bare write_text in the pikaci image.
  • Event sequencing is file-lock protected (fcntl.LOCK_EX) with a dedicated .seq sidecar file, replacing the fragile in-process global counter (EVENT_SEQ) and the bash event_seq variable.
  • Boot ID is read from /proc/sys/kernel/random/boot_id once per invocation and included when available.
  • Argument validation uses argparse with choices constraints (RUNTIME_STATES, TERMINAL_STATUSES) so invalid lifecycle states are rejected at the CLI boundary.
  • The @python3@ shebang placeholder is substituted at Nix build time.
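
The file-lock-protected sequencing described above can be sketched as follows. This is a minimal illustration, not the helper's actual code: only the `"a+"` open mode and the `fcntl.flock(..., fcntl.LOCK_EX)` call appear in the evidence. The function name `next_event_seq`, the plain-integer file format, and the choice to start numbering at 1 are assumptions made for the example.

```python
import fcntl
import pathlib


def next_event_seq(seq_path: pathlib.Path) -> int:
    """Allocate the next sequence number under an exclusive file lock.

    Hypothetical sketch: the counter is stored as a decimal integer in the
    sidecar file, and the first allocated value is 1 (both are assumptions).
    """
    seq_path.parent.mkdir(parents=True, exist_ok=True)
    # "a+" creates the file if it is missing and never truncates on open.
    with seq_path.open("a+", encoding="utf-8") as seq_file:
        # Blocks until no other process holds the lock; closing releases it.
        fcntl.flock(seq_file.fileno(), fcntl.LOCK_EX)
        seq_file.seek(0)
        raw = seq_file.read().strip()
        seq = (int(raw) if raw else 0) + 1
        # Replace the stored counter while the lock is still held.
        seq_file.seek(0)
        seq_file.truncate()
        seq_file.write(f"{seq}\n")
        seq_file.flush()
        return seq
```

Because the counter lives in a file rather than in process memory, two separate invocations of a CLI (each a fresh process) still observe a single increasing sequence, which an in-process global such as EVENT_SEQ cannot provide.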

Add the Nix derivation that packages the helper

Intent: Wrap the Python script into a Nix package (`writeScriptBin`) so it can be composed into any NixOS guest image and appear on `$PATH` as `pika-cloud-lifecycle`.

Affected files: nix/incus/pika-cloud-lifecycle-helper.nix

Evidence
@@ -0,0 +1,8 @@
+{ pkgs }:
+
+pkgs.writeScriptBin "pika-cloud-lifecycle" (
+  builtins.replaceStrings
+    [ "@python3@" ]
+    [ "${pkgs.python3}/bin/python3" ]
+    (builtins.readFile ./pika-cloud-lifecycle.py.in)
+)

The derivation is a minimal Nix expression: it reads the .py.in template, substitutes the @python3@ placeholder with the store path of the python3 interpreter, and passes the result to writeScriptBin. When the package is included in environment.systemPackages, the resulting executable script is available at /run/current-system/sw/bin/pika-cloud-lifecycle.

Install the helper into both Incus guest images

Intent: Make the lifecycle helper binary available at runtime inside both the managed-agent (OpenClaw) image and the pikaci image.

Affected files: nix/incus/managed-agent-image.nix, nix/incus/pikaci-image.nix

Evidence
@@ -1,6 +1,7 @@
 { lib, pkgs, modulesPath, pikachatPkg, openclawGatewayPkg, ... }:
 
 let
+  pikaCloudLifecycle = import ./pika-cloud-lifecycle-helper.nix { inherit pkgs; };
@@ -64,6 +65,7 @@
     python3
     pikachatPkg
     openclawGatewayPkg
+    pikaCloudLifecycle
@@ -1,6 +1,7 @@
 { lib, pkgs, modulesPath, ... }:
 
 let
+  pikaCloudLifecycle = import ./pika-cloud-lifecycle-helper.nix { inherit pkgs; };
@@ -351,6 +366,7 @@
     postgresql
     procps
     python3
+    pikaCloudLifecycle

Both managed-agent-image.nix and pikaci-image.nix import the new helper derivation and add it to environment.systemPackages (and in the managed-agent case, also to the systemd unit's path). This ensures the pika-cloud-lifecycle binary is available during guest bootstrap regardless of which image type is running.

Rewrite the managed OpenClaw autostart script to delegate to the helper

Intent: Remove all inline `jq`-based JSON construction, the `write_json_atomically` bash function, the `current_boot_id` function, and the `event_seq` counter from the Rust-generated guest autostart script, replacing them with CLI calls to `pika-cloud-lifecycle`.

Affected files: crates/pika-server/src/managed_openclaw_guest.rs

Evidence
@@ -126,8 +126,7 @@
 set -euo pipefail
 
 STARTUP_PLAN_PATH="/{startup_plan_path}"
-LIFECYCLE_SCHEMA_VERSION={lifecycle_schema_version}
-event_seq=0
+LIFECYCLE_HELPER="/run/current-system/sw/bin/pika-cloud-lifecycle"
@@ -212,60 +194,32 @@
-  payload="$(
-    jq -nc \
-      --argjson schema_version "$LIFECYCLE_SCHEMA_VERSION" \
-      --arg state "$state" \
+  local -a args=(
+    status
+    --status-path "$(runtime_status_path)"
+    --state "$state"
+    --message "$message"
+  )
+  if [[ "$details_json" != "null" ]]; then
+    args+=(--details-json "$details_json")
+  fi
+  "$LIFECYCLE_HELPER" "${args[@]}"
@@ -155,12 +155,6 @@
-current_boot_id() {{
-  if [[ -r /proc/sys/kernel/random/boot_id ]]; then
-    tr -d '\n' < /proc/sys/kernel/random/boot_id
-  fi
-}}
-
@@ -174,17 +167,6 @@
-write_json_atomically() {{
-  local path="$1"
-  local payload="$2"
-  local dir
-  dir="$(dirname "$path")"
-  mkdir -p "$dir"
-  local tmp="$dir/.tmp.$(basename "$path").$$"
-  printf '%s\n' "$payload" > "$tmp"
-  mv "$tmp" "$path"
-}}

The generated bash autostart script is significantly simplified:

  1. LIFECYCLE_SCHEMA_VERSION and event_seq variables are removed — the helper owns both.
  2. current_boot_id() — deleted; the helper reads /proc/sys/kernel/random/boot_id itself.
  3. write_json_atomically() — deleted; the helper uses tempfile + os.replace.
  4. write_status(), append_event(), write_result() — each now builds an argument array and invokes $LIFECYCLE_HELPER as a child process with the appropriate subcommand and flags. The --details-json flag is appended only when the caller passes a non-null value.

The Rust constant LIFECYCLE_SCHEMA_VERSION is also removed from the import list since the generated script no longer references it.
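
For comparison with the deleted bash write_json_atomically(), the tempfile + os.fsync + os.replace pattern attributed to the helper might look roughly like this. It is a sketch under assumptions: the function name, the temp-file prefix, and the cleanup behaviour are illustrative and are not taken from pika-cloud-lifecycle.py.in.

```python
import json
import os
import pathlib
import tempfile


def write_json_atomically(path: pathlib.Path, payload: dict) -> None:
    """Write payload to path so readers see the old or the new file, never a partial one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = None
    try:
        # The temp file must live in the target directory so that
        # os.replace is a same-filesystem rename.
        with tempfile.NamedTemporaryFile(
            "w",
            encoding="utf-8",
            dir=path.parent,
            prefix=f".tmp.{path.name}.",
            delete=False,
        ) as tmp:
            tmp_path = pathlib.Path(tmp.name)
            json.dump(payload, tmp)
            tmp.write("\n")
            tmp.flush()
            # Push the data to disk before the rename makes it visible.
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if tmp_path is not None:
            tmp_path.unlink(missing_ok=True)
        raise
```

The removed bash version had the same rename step (`mv "$tmp" "$path"`) but no fsync, so this shape adds durability on top of the atomicity the script already had.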

Rewrite the pikaci guest runner to delegate to the helper

Intent: Remove the inline Python lifecycle functions (`write_json`, `write_status`, `append_event`, `write_result`, `finished_at`, `EVENT_SEQ`) from the pikaci runner and replace them with thin wrappers that shell out to `pika-cloud-lifecycle`.

Affected files: nix/incus/pikaci-image.nix

Evidence
@@ -14,61 +15,8 @@
     PIKACI_UID = 1000
     USERS_GID = 100
     SYSTEM_BIN = "/run/current-system/sw/bin"
+    LIFECYCLE_HELPER = "/run/current-system/sw/bin/pika-cloud-lifecycle"
     INCUS_GUEST_RUN_REQUEST_SCHEMA_VERSION = 2
-    LIFECYCLE_SCHEMA_VERSION = 1
-    EVENT_SEQ = 0
@@ -114,6 +62,73 @@
+    def run_lifecycle_helper(*args) -> None:
+        subprocess.run([LIFECYCLE_HELPER, *args], check=True)
+
+
+    def write_status(
+        status_path: pathlib.Path,
+        state: str,
+        message: str,
+        details_json: str | None = None,
+    ) -> None:
@@ -132,7 +147,9 @@
+    status_path = state_dir / "status.json"
     events_path = state_dir / "events.jsonl"
+    result_path = state_dir / "result.json"
@@ -204,18 +221,16 @@
             write_status(
-                state_dir,
+                status_path,
                 "starting",
                 "launching guest command",
-                command=command,
-                run_as_root=run_as_root,
+                json.dumps({"command": command, "run_as_root": run_as_root}),

The pikaci runner sees the largest structural change:

  • ~55 lines of inline lifecycle code deleted including finished_at(), write_json(), write_status(), append_event(), write_result(), and the EVENT_SEQ global.
  • Replaced with thin wrappers that call run_lifecycle_helper(*args), which in turn runs subprocess.run([LIFECYCLE_HELPER, *args], check=True).
  • Callers now pass explicit file paths (status_path, events_path, result_path) instead of state_dir, giving each function a single-responsibility path argument.
  • Details changed from **kwargs to details_json — callers now json.dumps(...) their detail dicts and pass the string, which the helper parses. This avoids tight coupling between the caller's Python kwargs and the helper's serialization.
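
A thin wrapper of the kind described above could assemble its argument list like this. The flag names (--status-path, --state, --message, --details-json) are the ones visible in the bash evidence for the managed OpenClaw script; that the pikaci wrapper uses the same flags is an assumption, and `status_command` is a hypothetical name introduced so the example can be checked without running the helper.

```python
import json
import pathlib
from typing import List, Optional

LIFECYCLE_HELPER = "/run/current-system/sw/bin/pika-cloud-lifecycle"


def status_command(
    status_path: pathlib.Path,
    state: str,
    message: str,
    details_json: Optional[str] = None,
) -> List[str]:
    """Build the argv for a `status` call (hypothetical helper for illustration)."""
    argv = [
        LIFECYCLE_HELPER,
        "status",
        "--status-path", str(status_path),
        "--state", state,
        "--message", message,
    ]
    # Mirror the bash script: pass details only when the caller supplied some.
    if details_json is not None:
        argv += ["--details-json", details_json]
    return argv


# The caller serializes its own details; the helper only sees a JSON string.
example = status_command(
    pathlib.Path("/tmp/state/status.json"),
    "starting",
    "launching guest command",
    json.dumps({"command": ["true"], "run_as_root": False}),
)
```

In the runner itself the list would be handed to subprocess.run(argv, check=True), so a non-zero exit from the helper surfaces as CalledProcessError instead of being silently ignored.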

Update contract tests to pin against the helper source

Intent: Shift the lifecycle contract tests from asserting against inline code in the guest image sources to asserting against the shared helper, ensuring the single source of truth is what gets validated.

Affected files: crates/pikaci/src/executor.rs, crates/pika-server/src/managed_openclaw_guest.rs

Evidence
@@ -2477,6 +2477,10 @@
+    fn incus_lifecycle_helper_source() -> &'static str {
+        include_str!("../../../nix/incus/pika-cloud-lifecycle.py.in")
+    }
@@ -2579,51 +2583,99 @@
     #[test]
-    fn remote_linux_incus_image_pins_shared_status_and_event_contract() {
-        let source = incus_guest_image_source();
+    fn remote_linux_incus_lifecycle_helper_pins_shared_status_and_event_contract() {
+        let source = incus_lifecycle_helper_source();
@@ -2583,99 +2583,99 @@
+                "LIFECYCLE_SCHEMA_VERSION = 1",
+                "def write_status(",
                 "\"schema_version\": LIFECYCLE_SCHEMA_VERSION",
+                "payload[\"boot_id\"] = boot_id",
+                "def default_seq_path(events_path: pathlib.Path) -> pathlib.Path:",
+                "fcntl.flock(seq_file.fileno(), fcntl.LOCK_EX)",
@@ -787,10 +730,12 @@
+                "LIFECYCLE_HELPER=\"/run/current-system/sw/bin/pika-cloud-lifecycle\"",
+                "\"$LIFECYCLE_HELPER\" \"${args[@]}\"",
+                "--status-path \"$(runtime_status_path)\"",
@@ +810,3 +754,3 @@
+        assert!(!script.contains("event_seq="));
+        assert!(!script.contains("write_json_atomically"));
+        assert!(!script.contains("current_boot_id"));

The Rust test suite is restructured to match the new architecture:

In executor.rs (pikaci tests):

  • A new incus_lifecycle_helper_source() function embeds the helper .py.in source directly via include_str!.
  • remote_linux_incus_image_pins_shared_status_and_event_contract is renamed to remote_linux_incus_lifecycle_helper_pins_shared_status_and_event_contract and now asserts against the helper source for schema fields, boot ID handling, file-lock sequencing, and default_seq_path.
  • Two new tests (remote_linux_incus_image_calls_shared_lifecycle_helper_for_status_and_events and ..._for_terminal_results) assert that the pikaci image source delegates to the helper binary rather than implementing lifecycle logic inline.
  • Negative assertions confirm the image source no longer contains EVENT_SEQ = 0 or def finished_at().

In managed_openclaw_guest.rs (OpenClaw tests):

  • Assertions shift from checking for jq template fragments (schema_version: $schema_version, state: $state) to checking for CLI delegation patterns (--status-path, --state, "$LIFECYCLE_HELPER" "${args[@]}").
  • Negative assertions confirm the bash script no longer contains event_seq=, write_json_atomically, or current_boot_id.
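
The pinning technique used by these tests (positive assertions on required fragments, negative assertions on removed ones) is language-independent. The branch implements it in Rust with include_str!; the sketch below restates the idea in Python purely for illustration. `check_contract` and the sample source string are hypothetical, while the fragment strings are copied from the evidence above.

```python
from typing import List, Tuple


def check_contract(
    source: str, required: List[str], forbidden: List[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing required fragments, forbidden fragments still present)."""
    missing = [frag for frag in required if frag not in source]
    present = [frag for frag in forbidden if frag in source]
    return missing, present


# Stand-in for the helper source; the real tests read the .py.in file itself.
HELPER_SOURCE_SAMPLE = '''\
LIFECYCLE_SCHEMA_VERSION = 1

def write_status(status_path, state, message, details):
    payload = {
        "schema_version": LIFECYCLE_SCHEMA_VERSION,
        "state": state,
    }
'''

REQUIRED = [
    "LIFECYCLE_SCHEMA_VERSION = 1",
    "def write_status(",
    '"schema_version": LIFECYCLE_SCHEMA_VERSION',
]
# Fragments the pikaci image source must no longer contain.
FORBIDDEN = ["EVENT_SEQ = 0", "def finished_at()"]

result = check_contract(HELPER_SOURCE_SAMPLE, REQUIRED, FORBIDDEN)
```

Substring pinning is deliberately coarse: it fails on harmless reformatting as well as on real contract changes, which is the trade-off for catching silent drift in a file that is otherwise only exercised inside a guest image.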

Refine orchestration skill documentation

Intent: Capture process improvements discovered during this refactoring—specifically around tracking abstraction opportunities and reviewing the journal before declaring a project complete.

Affected files: .agents/skills/orchestrate/SKILL.md

Evidence
@@ -21,8 +21,10 @@
+  - Capture abstraction opportunities too, especially when you deliberately choose a smaller tactical seam over a larger redesign.
   - Do not widen the active chunk just because the journal grew.
   - Triage the journal at the end of the broader project or migration once the primary seams are landed.
+  - Before declaring the broader project done, review the journal and explicitly decide what to land now vs leave for later.

Two lines are added to the orchestration skill's journal guidance:

  1. Capture abstraction opportunities — when you consciously pick a smaller tactical seam instead of a larger redesign (as this branch does by extracting a CLI helper rather than, say, a shared library), note the larger opportunity in the journal.
  2. Review before done — before declaring the broader project complete, explicitly triage journal entries to decide what ships now versus what becomes future work.

These additions codify lessons learned from the lifecycle-helper extraction itself.
