
sledtools/pika branch #160

pika-cloud-incus-plan

docs: add pika cloud incus plan

Target branch: master

Merge Commit: dd226e25633b04501b30652d682a2de49116c592

branch: merged · tutorial: ready · ci: success

Continuous Integration

CI: success

Latest run #199 success

5 passed

head 52c1a91320a42331d9e96a803c73f20572dce0f6 · queued 2026-04-01 08:57:44 · 5 lane(s)

queued 15s · ran 13s

  • check-pika-rust · success
  • check-pika-followup · success
  • check-apple-host-sanity · success
  • check-apple-desktop-compile · success
  • check-apple-ios-compile · success

Summary

This branch introduces the Pika Cloud Incus plan, a living design document that defines a shared runtime substrate (pika-cloud) to unify the managed-agent and CI VM contracts under a single Incus-only architecture. Alongside the plan, the branch makes three small hygiene changes: it replaces #[allow(dead_code)] with #[cfg(test)] on test-only constants in jerichoci, removes dead code (an unused variable assignment and an unused shell function) from the Apple remote CI script, and, in the Apple remote access docs, fixes a link and drops a stale legacy reference.

Tutorial Steps

Replace `#[allow(dead_code)]` with `#[cfg(test)]` for test-only constants

Intent: Four SSH-related environment variable constants in `jerichoci/src/run.rs` were annotated with `#[allow(dead_code)]` to suppress compiler warnings, but they are actually only used in test code. Replacing the annotation with `#[cfg(test)]` is more precise: it communicates intent, removes the constants from production builds entirely, and avoids masking genuinely dead code in the future.

Affected files: crates/jerichoci/src/run.rs

Evidence
@@ -101,15 +101,15 @@ const PREPARED_OUTPUT_FULFILLMENT_SSH_BINARY_ENV: &str =
-#[allow(dead_code)]
+#[cfg(test)]
 const PREPARED_OUTPUT_FULFILLMENT_SSH_HOST_ENV: &str = "PIKACI_PREPARED_OUTPUT_FULFILL_SSH_HOST";
-#[allow(dead_code)]
+#[cfg(test)]
 const PREPARED_OUTPUT_FULFILLMENT_SSH_REMOTE_LAUNCHER_BINARY_ENV: &str =
-#[allow(dead_code)]
+#[cfg(test)]
 const PREPARED_OUTPUT_FULFILLMENT_SSH_REMOTE_HELPER_BINARY_ENV: &str =
-#[allow(dead_code)]
+#[cfg(test)]
 const PREPARED_OUTPUT_FULFILLMENT_SSH_REMOTE_WORK_DIR_ENV: &str =

In crates/jerichoci/src/run.rs, four constants related to SSH fulfillment configuration (SSH_HOST_ENV, SSH_REMOTE_LAUNCHER_BINARY_ENV, SSH_REMOTE_HELPER_BINARY_ENV, SSH_REMOTE_WORK_DIR_ENV) had their #[allow(dead_code)] attributes replaced with #[cfg(test)].

Why this matters:

  • #[allow(dead_code)] silences the compiler warning but keeps the constants in every build. If one of them later becomes truly unused, even by tests, the compiler will no longer report it.
  • #[cfg(test)] tells the compiler these constants exist only in test builds, which is the correct semantic: they are environment variable keys used exclusively by the crate's own tests for SSH fulfillment. (Items behind #[cfg(test)] are visible only when the crate itself is compiled for testing, not to code built against the normal crate.)
  • This is a low-risk, high-signal cleanup that aligns with the broader theme of this branch: removing ambiguity and tightening contracts.
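To make the difference concrete, here is a minimal sketch of the pattern, not the actual contents of run.rs. Only PREPARED_OUTPUT_FULFILLMENT_SSH_HOST_ENV and its value come from the diff; EXAMPLE_ALWAYS_COMPILED_ENV, always_compiled_key, and the test name are hypothetical and exist purely for illustration.

```rust
// Hypothetical key that non-test code reads, so it needs no attribute at all.
const EXAMPLE_ALWAYS_COMPILED_ENV: &str = "EXAMPLE_ALWAYS_COMPILED";

// Taken from the diff: compiled only when the crate is built for testing,
// so a normal build never sees it and no dead_code warning can fire.
#[cfg(test)]
const PREPARED_OUTPUT_FULFILLMENT_SSH_HOST_ENV: &str = "PIKACI_PREPARED_OUTPUT_FULFILL_SSH_HOST";

/// Hypothetical non-test consumer of the always-compiled key.
pub fn always_compiled_key() -> &'static str {
    EXAMPLE_ALWAYS_COMPILED_ENV
}

#[cfg(test)]
mod tests {
    use super::*;

    // The test-only constant is in scope here because this module is also
    // gated on cfg(test).
    #[test]
    fn ssh_host_key_is_visible_in_tests() {
        assert!(PREPARED_OUTPUT_FULFILLMENT_SSH_HOST_ENV.ends_with("SSH_HOST"));
    }
}
```

If a later change stops using the gated constant in tests, a test build will flag it as unused, which is exactly the signal that #[allow(dead_code)] would have suppressed.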

Fix documentation links and remove legacy migration reference

Intent: The Apple remote access documentation linked to a file through an absolute path on a developer's machine and carried a parenthetical reference to a legacy secret file format (`secrets/pikaci-apple.env.age`) that is no longer relevant. Both are corrected to keep the docs accurate and portable.

Affected files: docs/pikaci-apple-remote-access.md

Evidence
@@ -74,8 +74,8 @@
-- Non-secret Apple CI config is checked into [.github/pikaci-apple.env](/Users/justin/code/pika/worktrees/pikaci-mac/.github/pikaci-apple.env).
-- Apple CI secrets are checked into `secrets/pikaci-apple.sops.yaml` (with legacy fallback to `secrets/pikaci-apple.env.age` during migration).
+- Non-secret Apple CI config is checked into [.github/pikaci-apple.env](../.github/pikaci-apple.env).
+- Apple CI secrets are checked into `secrets/pikaci-apple.sops.yaml`.

Two fixes in docs/pikaci-apple-remote-access.md:

  1. Absolute path replaced with relative path. The link to .github/pikaci-apple.env was pointing at /Users/justin/code/pika/worktrees/pikaci-mac/.github/pikaci-apple.env, an absolute path from a developer worktree. This is now the portable relative path ../.github/pikaci-apple.env.

  2. Legacy fallback reference removed. The parenthetical (with legacy fallback to secrets/pikaci-apple.env.age during migration) is deleted. The migration to SOPS-based secrets is complete, so referencing the old .env.age format is misleading. The line now simply states that secrets live in secrets/pikaci-apple.sops.yaml.
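A problem like the first one is easy to reintroduce, so a small check can help. The following is a hedged sketch of such a check, not something this branch adds: absolute_link_targets is a hypothetical helper, and treating the /Users/ and /home/ prefixes as "developer-local" is an assumption made for illustration.

```rust
/// Returns Markdown link targets that look like absolute paths on a
/// developer machine. The prefix list is an illustrative assumption.
pub fn absolute_link_targets(markdown: &str) -> Vec<&str> {
    let mut found = Vec::new();
    let mut rest = markdown;
    // Walk every "](target)" occurrence; this is a plain substring scan,
    // not a full Markdown parser.
    while let Some(start) = rest.find("](") {
        let after = &rest[start + 2..];
        let Some(end) = after.find(')') else { break };
        let target = &after[..end];
        if target.starts_with("/Users/") || target.starts_with("/home/") {
            found.push(target);
        }
        rest = &after[end + 1..];
    }
    found
}
```

Run against the old line from the diff, the helper reports the worktree path; run against the new line, it reports nothing.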

Remove dead code from the Apple remote CI script

Intent: Two pieces of dead shell code are removed from `scripts/pikaci-apple-remote.sh`: an unused variable assignment (`ssh_binary_defaulted=0`) and an unused helper function (`remote_q`). This reduces maintenance surface and eliminates confusion about whether these are part of active control flow.

Affected files: scripts/pikaci-apple-remote.sh

Evidence
@@ -128,7 +128,6 @@ while [[ $# -gt 0 ]]; do
     --ssh-binary)
       ssh_binary="${2:?missing value for --ssh-binary}"
-      ssh_binary_defaulted=0
       shift 2
@@ -487,10 +486,6 @@ run_locked_body() {
   trap cleanup EXIT
 
-  remote_q() {
-    printf "'%s'" "${1//\'/\'\"\'\"\'}"
-  }

Two removals in scripts/pikaci-apple-remote.sh:

  1. ssh_binary_defaulted=0 assignment removed (line 131). When the --ssh-binary flag was parsed, the script set ssh_binary_defaulted=0, but this variable was never read anywhere in the script. The ssh_binary value itself is still correctly captured; only the unused tracking flag is deleted.

  2. remote_q() function removed (lines 490-492). This was a shell-quoting helper that single-quoted a string with internal quote escaping. It was defined inside run_locked_body() but never called. Removing it shrinks the function body and eliminates a potential source of confusion for future readers who might assume it is part of the remote execution contract.
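For readers unfamiliar with the quoting idiom that remote_q appears to implement, here is the same idea sketched in Rust. This is an illustration of the conventional technique (wrap in single quotes, and replace each embedded single quote with '"'"'), not code from the repository; shell_single_quote is a hypothetical name.

```rust
/// Wraps `input` in single quotes for a POSIX shell. Each embedded single
/// quote is replaced by the sequence '"'"' which closes the quoted run,
/// emits a double-quoted single quote, and reopens the quoted run.
pub fn shell_single_quote(input: &str) -> String {
    format!("'{}'", input.replace('\'', "'\"'\"'"))
}
```

For example, shell_single_quote("it's") yields 'it'"'"'s', which a POSIX shell reads back as the single word it's.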

Add the Pika Cloud Incus plan document

Intent: This is the core of the branch: a new living plan document (`todos/pika-cloud-incus-plan.md`) that defines the architectural direction for converging the managed-agent and CI runtime substrates into a single shared `pika-cloud` boundary built exclusively on Incus.

Affected files: todos/pika-cloud-incus-plan.md

Evidence
@@ -0,0 +1,406 @@
+# Pika Cloud (Incus-Only) Living Plan
+- Incus is the only target substrate we should actively support for new work.
+- `microvm.nix` should be treated as legacy or transitional, not as a co-equal backend.
+- hard rename or replace `pika-agent-control-plane` with `pika-cloud`
+- keep it library-first initially
+- make it Incus-first rather than pretending to be backend-neutral
+1. Create the focused `pika-cloud` substrate crate.
+2. Move or rename the shared contract currently living in `pika-agent-control-plane`.
+3. Add the shared guest lifecycle event and final-result schema.
+4. Add one Incus-first `RuntimeSpec`.
+5. Teach one narrow `pikaci` Incus path to use the shared guest lifecycle/result contract

The plan document is 406 lines and covers the full arc from problem statement through phased implementation. Here is a structured walkthrough of its key sections.

Problem Statement

Today two overlapping but separate VM contracts exist:

  • Managed agents: centered on pika-agent-control-plane, pika-server, and the agent startup/ready-marker model.
  • CI runtimes: centered on pikaci, backend-specific guest request payloads, snapshot mounts, and backend-specific launch logic.

This duplication makes Incus migration feel like two separate projects and scatters runtime layout assumptions across multiple crates and Nix configurations.

Core Decisions (Locked)

The document explicitly locks several directional decisions so they are not reopened during implementation:

| Decision | Detail |
| --- | --- |
| Incus-only | New substrate work targets Incus exclusively; microvm.nix is legacy. |
| Hard rename | pika-agent-control-plane becomes pika-cloud with no compatibility shims. |
| Library-first | The first slice is a crate, not a service. |
| File-based guest lifecycle | v1 uses events.jsonl, status.json, result.json under /run/pika-cloud/. |
| One persistent volume per managed agent | Simplest durable-state model for v1. |
| CI destroy-on-completion | CI VMs are ephemeral by default. |
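As a rough illustration of the file-based lifecycle decision, the sketch below appends guest events to events.jsonl. The three file names and the /run/pika-cloud directory come from the plan; everything else (LifecyclePaths, lifecycle_paths, append_event, and the one-field JSON shape) is an assumption, since the actual event schema is not shown here.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Directory named in the plan; the helpers take a root so tests can redirect it.
pub const DEFAULT_LIFECYCLE_DIR: &str = "/run/pika-cloud";

/// Hypothetical bundle of the three v1 lifecycle files.
pub struct LifecyclePaths {
    pub events: PathBuf,
    pub status: PathBuf,
    pub result: PathBuf,
}

pub fn lifecycle_paths(root: &Path) -> LifecyclePaths {
    LifecyclePaths {
        events: root.join("events.jsonl"),
        status: root.join("status.json"),
        result: root.join("result.json"),
    }
}

/// Appends one JSON line such as {"state":"ready"} to events.jsonl.
/// The event shape is an assumption. States are restricted to lowercase
/// ASCII letters so no JSON escaping is needed in this sketch.
pub fn append_event(root: &Path, state: &str) -> io::Result<()> {
    if state.is_empty() || !state.bytes().all(|b| b.is_ascii_lowercase()) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "unsupported state token"));
    }
    fs::create_dir_all(root)?;
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open(lifecycle_paths(root).events)?;
    writeln!(file, "{{\"state\":\"{}\"}}", state)
}
```

An append-only JSON Lines file lets the host read events with nothing more than file access, which fits the library-first decision; a real schema would presumably carry more than a bare state name.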

Shared Runtime API

The plan proposes a single RuntimeSpec type covering both managed-agent and CI use cases. The two are distinguished by policy fields (restart policy, retention policy, mount types), not by separate API surfaces. The spec covers image selection, volume/directory mounts, bootstrap payloads, lifecycle events, terminal-completion semantics, output collection, and restart/retention policy.
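One way to picture "one spec, differentiated by policy" is the sketch below. Only the name RuntimeSpec and the general field categories come from the plan; every field name, variant name, constructor, and guest path is hypothetical, and the choice of RestartPolicy::Always for managed agents is an assumption.

```rust
use std::path::PathBuf;

/// Illustrative policy vocabulary; the plan does not fix these variant names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartPolicy { Never, Always }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetentionPolicy { DestroyOnCompletion, Keep }

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Mount {
    PersistentVolume { name: String, guest_path: PathBuf },
    ReadOnlyDir { host_path: PathBuf, guest_path: PathBuf },
}

/// A single spec type shared by both consumers (field set is a sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RuntimeSpec {
    pub image: String,
    pub mounts: Vec<Mount>,
    pub restart: RestartPolicy,
    pub retention: RetentionPolicy,
}

impl RuntimeSpec {
    /// Managed agent: one persistent volume, kept after the workload exits.
    pub fn managed_agent(image: &str, volume: &str) -> Self {
        RuntimeSpec {
            image: image.to_string(),
            mounts: vec![Mount::PersistentVolume {
                name: volume.to_string(),
                guest_path: PathBuf::from("/var/lib/agent"), // hypothetical path
            }],
            restart: RestartPolicy::Always, // assumption
            retention: RetentionPolicy::Keep,
        }
    }

    /// CI job: read-only snapshot mount, destroyed on completion.
    pub fn ci_job(image: &str, snapshot_dir: &str) -> Self {
        RuntimeSpec {
            image: image.to_string(),
            mounts: vec![Mount::ReadOnlyDir {
                host_path: PathBuf::from(snapshot_dir),
                guest_path: PathBuf::from("/workspace"), // hypothetical path
            }],
            restart: RestartPolicy::Never,
            retention: RetentionPolicy::DestroyOnCompletion,
        }
    }
}
```

Both constructors return the same type, so any code that launches or inspects a runtime needs no knowledge of which consumer built the spec.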

Guest Lifecycle Contract

A two-tier state model separates infrastructure-observed states from guest-emitted states:

  • Infrastructure (host-observed): requested, provisioning, booted, unreachable, stopped, destroyed.
  • Workload (guest-emitted): starting, ready, failed, completed.

The cloud layer owns the mechanism (collecting events, enforcing watchdog timeouts) but not the meaning: what ready implies is left to the consumer, pika-server or pikaci.
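The two tiers map naturally onto two enums. In the sketch below the state names are taken from the lists above, while Observation, watchdog_expired, and the rule "expired when the guest has reported nothing by the deadline" are assumptions chosen to illustrate mechanism without meaning.

```rust
use std::str::FromStr;

/// States the host observes about the instance itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InfraState { Requested, Provisioning, Booted, Unreachable, Stopped, Destroyed }

/// States the guest workload emits about itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkloadState { Starting, Ready, Failed, Completed }

impl FromStr for WorkloadState {
    type Err = String;

    // Parses the fixed guest vocabulary; anything else is rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "starting" => Ok(WorkloadState::Starting),
            "ready" => Ok(WorkloadState::Ready),
            "failed" => Ok(WorkloadState::Failed),
            "completed" => Ok(WorkloadState::Completed),
            other => Err(format!("unknown workload state: {other}")),
        }
    }
}

/// Hypothetical snapshot combining both tiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Observation {
    pub infra: InfraState,
    pub workload: Option<WorkloadState>,
}

/// Mechanism only: true when the guest has emitted nothing by the deadline.
/// What to do about it is left to the consumer.
pub fn watchdog_expired(obs: &Observation, elapsed_secs: u64, timeout_secs: u64) -> bool {
    obs.workload.is_none() && elapsed_secs >= timeout_secs
}
```

Note that "booted" is rejected by the workload parser: keeping the vocabularies in separate types prevents a guest from claiming a host-observed state.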

Consumer Boundaries

  • pika-server retains ownership of user/customer semantics, billing, agent product states, and translating app intent into a RuntimeSpec.
  • pikaci retains ownership of lane scheduling, job-to-RuntimeSpec translation, and interpreting terminal results into CI pass/fail.
  • pika-cloud owns the shared runtime spec types, lifecycle schemas, mount/retention policy types, Incus orchestration helpers, and high-level runtime operations (ensure, inspect, collect, destroy).
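The four high-level operations named in the last bullet could be expressed as a trait that both consumers program against. This is a sketch under stated assumptions: the operation names (ensure, inspect, collect, destroy) come from the plan, but the signatures, the Runtime trait name, and the in-memory FakeRuntime are hypothetical and no Incus call is made.

```rust
use std::collections::HashMap;

/// Hypothetical shape of the shared runtime operations.
pub trait Runtime {
    /// Creates the instance if it does not exist; idempotent.
    fn ensure(&mut self, name: &str) -> Result<(), String>;
    /// Reports a coarse status, or None if the instance is unknown.
    fn inspect(&self, name: &str) -> Option<&'static str>;
    /// Returns the names of collected output files.
    fn collect(&self, name: &str) -> Result<Vec<String>, String>;
    /// Removes the instance; fails if it does not exist.
    fn destroy(&mut self, name: &str) -> Result<(), String>;
}

/// In-memory stand-in used only to exercise the trait.
#[derive(Default)]
pub struct FakeRuntime {
    instances: HashMap<String, Vec<String>>,
}

impl FakeRuntime {
    /// Test helper: pretend the guest produced an output file.
    pub fn record_output(&mut self, name: &str, file: &str) {
        if let Some(outputs) = self.instances.get_mut(name) {
            outputs.push(file.to_string());
        }
    }
}

impl Runtime for FakeRuntime {
    fn ensure(&mut self, name: &str) -> Result<(), String> {
        self.instances.entry(name.to_string()).or_default();
        Ok(())
    }

    fn inspect(&self, name: &str) -> Option<&'static str> {
        self.instances.get(name).map(|_| "running")
    }

    fn collect(&self, name: &str) -> Result<Vec<String>, String> {
        self.instances
            .get(name)
            .cloned()
            .ok_or_else(|| format!("no such instance: {name}"))
    }

    fn destroy(&mut self, name: &str) -> Result<(), String> {
        self.instances
            .remove(name)
            .map(|_| ())
            .ok_or_else(|| format!("no such instance: {name}"))
    }
}
```

With a boundary like this, pikaci would decide pass/fail from what collect returns and pika-server would map inspect onto product states, while neither would need to know how the instance is actually driven.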

Phased Implementation

| Phase | Scope |
| --- | --- |
| 0 - Lock Direction | Commit to Incus-only, library-first, guest-defined readiness. |
| 1 - Create pika-cloud | Rename/replace pika-agent-control-plane; add lifecycle and result schemas. |
| 2 - CI Contract Migration | Move shared Incus runtime request concepts out of pikaci. |
| 3 - Managed Agent Migration | Point pika-server at the same shared substrate. |
| 4 - Legacy Cleanup | Remove duplicate runtime code, narrow or delete vm-spawner, update Nix composition. |

Open Questions

Three questions are explicitly left open for implementation to inform:

  1. Crate topology — direct rename of existing path vs. new crate that absorbs old code.
  2. Lifecycle richness — fixed vocabulary vs. extensible typed event payloads from day one.
  3. Incus ownership model — both consumers call Incus directly via the shared library, or one process is the sole Incus caller.
