Evidence
@@ -0,0 +1,406 @@
+# Pika Cloud (Incus-Only) Living Plan
+- Incus is the only target substrate we should actively support for new work.
+- `microvm.nix` should be treated as legacy or transitional, not as a co-equal backend.
+- hard rename or replace `pika-agent-control-plane` with `pika-cloud`
+- keep it library-first initially
+- make it Incus-first rather than pretending to be backend-neutral
+1. Create the focused `pika-cloud` substrate crate.
+2. Move or rename the shared contract currently living in `pika-agent-control-plane`.
+3. Add the shared guest lifecycle event and final-result schema.
+4. Add one Incus-first `RuntimeSpec`.
+5. Teach one narrow `pikaci` Incus path to use the shared guest lifecycle/result contract
The plan document is 406 lines and covers the full arc from problem statement through phased implementation. Here is a structured walkthrough of its key sections.
Problem Statement
Today two overlapping but separate VM contracts exist:
- Managed agents: centered on pika-agent-control-plane, pika-server, and the agent startup/ready-marker model.
- CI runtimes: centered on pikaci, backend-specific guest request payloads, snapshot mounts, and backend-specific launch logic.
This duplication makes Incus migration feel like two separate projects and scatters runtime layout assumptions across multiple crates and Nix configurations.
Core Decisions (Locked)
The document explicitly locks several directional decisions so they are not reopened during implementation:
| Decision | Detail |
| --- | --- |
| Incus-only | New substrate work targets Incus exclusively; microvm.nix is legacy. |
| Hard rename | pika-agent-control-plane becomes pika-cloud with no compatibility shims. |
| Library-first | The first slice is a crate, not a service. |
| File-based guest lifecycle | v1 uses events.jsonl, status.json, result.json under /run/pika-cloud/. |
| One persistent volume per managed agent | Simplest durable-state model for v1. |
| CI destroy-on-completion | CI VMs are ephemeral by default. |
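
The file-based guest lifecycle row can be made concrete with a small path helper. This is a minimal sketch, not code from the plan: only the directory `/run/pika-cloud/` and the three file names come from the document, while the `LifecycleFile` enum and its methods are hypothetical names introduced for illustration.

```rust
use std::path::{Path, PathBuf};

/// Guest-side directory the plan names for v1 lifecycle files.
pub const GUEST_RUN_DIR: &str = "/run/pika-cloud";

/// The three v1 lifecycle files listed in the plan.
/// (Hypothetical type; the plan does not name one.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleFile {
    /// Append-only event log, one JSON object per line.
    Events,
    /// Latest status snapshot.
    Status,
    /// Final result, written once at completion.
    Result,
}

impl LifecycleFile {
    /// File name exactly as given in the plan.
    pub fn file_name(self) -> &'static str {
        match self {
            LifecycleFile::Events => "events.jsonl",
            LifecycleFile::Status => "status.json",
            LifecycleFile::Result => "result.json",
        }
    }

    /// Path under an arbitrary root, e.g. a host directory the files were copied into.
    pub fn path_in(self, root: &Path) -> PathBuf {
        root.join(self.file_name())
    }

    /// Path as seen from inside the guest.
    pub fn guest_path(self) -> PathBuf {
        self.path_in(Path::new(GUEST_RUN_DIR))
    }
}
```

Keeping the names in one place would let both consumers read the same files without repeating string literals, which is the kind of scattered layout assumption the problem statement describes.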
Shared Runtime API
The plan proposes a single RuntimeSpec type covering both managed-agent and CI use cases. Policy fields (restart policy, retention policy, mount types) differentiate the two — not separate API surfaces. The spec covers image selection, volume/directory mounts, bootstrap payloads, lifecycle events, terminal-completion semantics, output collection, and restart/retention policy.
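To illustrate how one spec type could serve both consumers, here is a hedged sketch. The name `RuntimeSpec` and the idea of policy fields come from the plan; every field name, enum variant, constructor, and the placeholder guest paths below are assumptions made for illustration, and the real type will likely carry more (lifecycle and output-collection settings, for example).

```rust
/// Restart policy (variant names are assumptions).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RestartPolicy {
    Never,
    Always,
}

/// Retention policy (variant names are assumptions).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RetentionPolicy {
    DestroyOnCompletion,
    Retain,
}

/// Mount types (variants and fields are assumptions).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Mount {
    PersistentVolume { name: String, guest_path: String },
    ReadOnlyDir { host_path: String, guest_path: String },
}

/// One spec for both managed agents and CI; only the policies differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RuntimeSpec {
    pub image: String,
    pub mounts: Vec<Mount>,
    pub bootstrap: Vec<u8>,
    pub restart: RestartPolicy,
    pub retention: RetentionPolicy,
}

impl RuntimeSpec {
    /// Managed agent: one persistent volume, kept after the workload stops.
    /// Restarting always is an assumption, not something the plan states.
    pub fn managed_agent(image: &str, volume: &str) -> Self {
        RuntimeSpec {
            image: image.to_string(),
            mounts: vec![Mount::PersistentVolume {
                name: volume.to_string(),
                guest_path: "/data".to_string(), // placeholder path
            }],
            bootstrap: Vec::new(),
            restart: RestartPolicy::Always,
            retention: RetentionPolicy::Retain,
        }
    }

    /// CI job: snapshot mounted read-only, VM destroyed on completion.
    pub fn ci_job(image: &str, snapshot_dir: &str) -> Self {
        RuntimeSpec {
            image: image.to_string(),
            mounts: vec![Mount::ReadOnlyDir {
                host_path: snapshot_dir.to_string(),
                guest_path: "/snapshot".to_string(), // placeholder path
            }],
            bootstrap: Vec::new(),
            restart: RestartPolicy::Never,
            retention: RetentionPolicy::DestroyOnCompletion,
        }
    }
}
```

Both constructors return the same type, so any shared orchestration code would take a `RuntimeSpec` and never need to know which consumer built it.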
Guest Lifecycle Contract
A two-tier state model separates infrastructure-observed states from guest-emitted states:
- Infrastructure (host-observed): requested, provisioning, booted, unreachable, stopped, destroyed.
- Workload (guest-emitted): starting, ready, failed, completed.
The cloud layer owns the mechanism (collecting events, enforcing watchdog timeouts) but not the meaning: what ready implies is left to the consumer, either pika-server or pikaci.
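The two tiers can be sketched as two separate enums, with the watchdog as a pure function over both. The state names are taken from the lists above; the type names, the `parse` and `is_terminal` helpers, and the specific watchdog rule (a booted guest must report something past `starting` within a limit) are assumptions for illustration only.

```rust
use std::time::Duration;

/// States the host observes about the instance itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InfraState {
    Requested,
    Provisioning,
    Booted,
    Unreachable,
    Stopped,
    Destroyed,
}

/// States the guest workload reports about itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkloadState {
    Starting,
    Ready,
    Failed,
    Completed,
}

impl WorkloadState {
    /// Parse the lowercase name used in the plan; unknown names are rejected.
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "starting" => Some(WorkloadState::Starting),
            "ready" => Some(WorkloadState::Ready),
            "failed" => Some(WorkloadState::Failed),
            "completed" => Some(WorkloadState::Completed),
            _ => None,
        }
    }

    /// Assumption: failed and completed end the workload.
    pub fn is_terminal(self) -> bool {
        matches!(self, WorkloadState::Failed | WorkloadState::Completed)
    }
}

/// Mechanism only: true when a booted guest has not reported anything
/// past `starting` within `limit`. What to do about it is the consumer's call.
pub fn watchdog_expired(
    infra: InfraState,
    workload: Option<WorkloadState>,
    since_boot: Duration,
    limit: Duration,
) -> bool {
    let reported = workload.map_or(false, |w| w != WorkloadState::Starting);
    infra == InfraState::Booted && !reported && since_boot > limit
}
```

Note that the function only answers whether the timeout fired. Deciding whether that means a failed CI job or an agent to restart stays with pikaci or pika-server, matching the mechanism/meaning split.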
Consumer Boundaries
pika-server retains ownership of user/customer semantics, billing, agent product states, and translating app intent into a RuntimeSpec.
pikaci retains ownership of lane scheduling, job-to-RuntimeSpec translation, and interpreting terminal results into CI pass/fail.
pika-cloud owns the shared runtime spec types, lifecycle schemas, mount/retention policy types, Incus orchestration helpers, and high-level runtime operations (ensure, inspect, collect, destroy).
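The ownership split above might look like the following sketch: a trait for the four high-level operations the plan lists (ensure, inspect, collect, destroy), and a consumer-side function that interprets the collected result. Only the four operation names come from the plan. The trait shape, the `RuntimeId` and `Observed` types, the in-memory stand-in (which does not talk to Incus), and the `"ok"` result convention are all hypothetical.

```rust
use std::collections::HashMap;

/// Opaque handle to a runtime (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RuntimeId(pub String);

/// Coarse observation returned by `inspect` (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Observed {
    Running,
    Gone,
}

/// The four high-level operations the shared layer would own.
pub trait Runtime {
    /// Idempotent: creates the runtime if missing, otherwise returns the existing one.
    fn ensure(&mut self, name: &str) -> RuntimeId;
    fn inspect(&self, id: &RuntimeId) -> Observed;
    /// Final result payload, if the guest has written one.
    fn collect(&self, id: &RuntimeId) -> Option<String>;
    fn destroy(&mut self, id: &RuntimeId);
}

/// In-memory stand-in for tests; a real implementation would call Incus.
#[derive(Default)]
pub struct InMemoryRuntime {
    results: HashMap<String, Option<String>>,
}

impl InMemoryRuntime {
    /// Simulate the guest writing its final result.
    pub fn finish(&mut self, id: &RuntimeId, result: &str) {
        if let Some(slot) = self.results.get_mut(&id.0) {
            *slot = Some(result.to_string());
        }
    }
}

impl Runtime for InMemoryRuntime {
    fn ensure(&mut self, name: &str) -> RuntimeId {
        self.results.entry(name.to_string()).or_insert(None);
        RuntimeId(name.to_string())
    }

    fn inspect(&self, id: &RuntimeId) -> Observed {
        if self.results.contains_key(&id.0) {
            Observed::Running
        } else {
            Observed::Gone
        }
    }

    fn collect(&self, id: &RuntimeId) -> Option<String> {
        self.results.get(&id.0).cloned().flatten()
    }

    fn destroy(&mut self, id: &RuntimeId) {
        self.results.remove(&id.0);
    }
}

/// Consumer-side interpretation, which stays in pikaci under the plan.
/// Treating "ok" as a pass is an assumption.
pub fn ci_passed(result: Option<&str>) -> bool {
    result == Some("ok")
}
```

A consumer would call `ensure`, poll `inspect` and `collect`, pass the payload to its own interpreter such as `ci_passed`, and then call `destroy`. The shared trait never decides pass or fail.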
Phased Implementation
| Phase | Scope |
| --- | --- |
| 0: Lock Direction | Commit to Incus-only, library-first, guest-defined readiness. |
| 1: Create pika-cloud | Rename/replace pika-agent-control-plane; add lifecycle and result schemas. |
| 2: CI Contract Migration | Move shared Incus runtime request concepts out of pikaci. |
| 3: Managed Agent Migration | Point pika-server at the same shared substrate. |
| 4: Legacy Cleanup | Remove duplicate runtime code, narrow or delete vm-spawner, update Nix composition. |
Open Questions
Three questions are explicitly left open for implementation to inform:
- Crate topology — direct rename of existing path vs. new crate that absorbs old code.
- Lifecycle richness — fixed vocabulary vs. extensible typed event payloads from day one.
- Incus ownership model — both consumers call Incus directly via the shared library, or one process is the sole Incus caller.