sledtools/pika branch #36

pika-orch-incus-cleanup-3

Simplify Incus guest runtime contract

Target branch: master

branch: closed tutorial: ready ci: failed

Continuous Integration

CI: failed

Latest run #46 failed

9 passed 1 failed

head e5de18afb0d4b97e9c0e792f6ac52c7e7827a5f0 · queued 2026-03-24 16:22:38 · 10 lane(s)

queued 4m 12s · ran 30m 15s

  • check-pika-rust · success
  • check-pika-followup · success
  • check-notifications · success
  • check-agent-contracts · success
  • check-rmp · success
  • check-pikachat · success
  • check-pikachat-typescript · success
  • check-apple-host-sanity · failed
  • check-pikachat-openclaw-e2e · success
  • check-fixture · success

Summary

This branch simplifies the Incus guest runtime contract in the pikaci CI executor by eliminating the dual-mode (transfer vs. single_host_shared) architecture. The transfer mode — which copied Nix closures and workspace snapshots into the guest — is removed entirely, leaving only the shared-mount path where host directories are exposed as read-only virtiofs devices. The guest bootstrap script that executor.rs used to synthesize per-job is replaced by a stable image-owned binary at /run/current-system/sw/bin/pikaci-incus-run, with job parameters passed as environment variables on the incus exec command line. Correspondingly, the Nix CI expressions switch all staged wrapper shebangs and interpreter references from host /nix/store paths to guest-local paths under /run/current-system/sw/bin, removing the PIKACI_STAGED_HOST_NIX_STORE_ROOT indirection and the host /nix/store mount. The migration plan documentation is updated to reflect the consolidated single-path architecture and the image-owned guest contract.

Tutorial Steps

Remove the RemoteLinuxVmIncusMode enum and transfer-mode infrastructure

Intent: Eliminate the dual-mode branching (Transfer vs SingleHostShared) from the executor, collapsing the Incus backend to a single shared-mount code path. This removes the mode enum, its environment variable, the default constant, and the incus_mode field from RemoteLinuxVmContext.

Affected files: crates/pikaci/src/executor.rs

Evidence
@@ -81,7 +81,6 @@ struct RemoteLinuxVmContext {
-    incus_mode: RemoteLinuxVmIncusMode,
@@ -97,22 +96,6 @@
-    remote_incus_closure_dir: PathBuf,
-}
-
-#[derive(Clone, Copy, Debug, Eq, PartialEq)]
-enum RemoteLinuxVmIncusMode {
-    Transfer,
-    SingleHostShared,
-}
@@ -238,16 +221,14 @@
-const REMOTE_LINUX_VM_INCUS_MODE_ENV: &str = "PIKACI_REMOTE_LINUX_VM_INCUS_MODE";
-const REMOTE_LINUX_VM_INCUS_MODE_DEFAULT: &str = "single_host_shared";
@@ -2261,20 +2232,6 @@
-fn remote_linux_vm_incus_mode() -> anyhow::Result<RemoteLinuxVmIncusMode> {

The RemoteLinuxVmIncusMode enum (Transfer / SingleHostShared) and all supporting code are deleted:

  • The incus_mode field is removed from RemoteLinuxVmContext (executor.rs:83).
  • The remote_incus_closure_dir field is removed — it was only used by the transfer path to track Nix store closure state.
  • The PIKACI_REMOTE_LINUX_VM_INCUS_MODE environment variable, its default constant, the remote_linux_vm_incus_mode() parser, and the remote_linux_vm_incus_mode_label() helper are all deleted.
  • In remote_linux_vm_context(), the match on backend to determine incus_mode is removed; the context is now constructed unconditionally without a mode field.

This is the foundational change — every subsequent diff hunk in executor.rs follows from the removal of this branching point.

Delete transfer-mode guest preparation functions

Intent: Remove the large block of functions that existed solely to support the transfer code path: closure import, guest filesystem preparation, snapshot staging, workspace finalization, and the per-job bash launcher script generator.

Affected files: crates/pikaci/src/executor.rs

Evidence
@@ -3152,293 +3083,13 @@
-fn build_remote_incus_run_command(
-fn build_remote_incus_prepare_script(
-fn build_remote_incus_transfer_workspace_finalize_command(
-fn import_remote_path_closure_into_incus(
-fn import_remote_incus_closures(
-fn prepare_remote_incus_guest_filesystem(
-fn stage_snapshot_into_incus_guest(
-fn finalize_remote_incus_transfer_workspace(

Eight functions are deleted in a single contiguous block (~280 lines):

  1. build_remote_incus_run_command — assembled the runuser -u pikaci / timeout guest command string.
  2. build_remote_incus_prepare_script — generated the multi-hundred-line bash heredoc that was pushed into the guest to create /usr/local/bin/pikaci-incus-run, set up symlinks, ownership, and environment variables.
  3. build_remote_incus_transfer_workspace_finalize_command — produced the mount --bind / mount -o remount,bind,ro sequence for read-only workspace remounting.
  4. import_remote_path_closure_into_incus — ran nix-store --export | nix-store --import to push a Nix closure into the guest.
  5. import_remote_incus_closures — orchestrated closure imports for workspace-deps and workspace-build, then wrote guest-store-paths.json.
  6. prepare_remote_incus_guest_filesystem — resolved real paths and called the prepare script inside the guest via incus exec.
  7. stage_snapshot_into_incus_guest — used incus file push -r to copy the snapshot directory tree into the guest.
  8. finalize_remote_incus_transfer_workspace — executed the finalize mount command.

All of this per-job guest setup is now the responsibility of the image-owned pikaci-incus-run binary and the virtiofs device mounts configured before launch.

Simplify ensure_remote_incus_runtime to use devices and delegate to the image

Intent: With the transfer path gone, the runtime setup function no longer needs to branch on mode or call any of the deleted preparation functions. It unconditionally resets artifacts, configures disk devices, starts the instance, and waits for readiness.

Affected files: crates/pikaci/src/executor.rs

Evidence
@@ -2941,14 +2903,11 @@
-    if remote.incus_mode == RemoteLinuxVmIncusMode::SingleHostShared {
-        reset_remote_linux_vm_artifacts(remote, log_path)?;
-    }
+    reset_remote_linux_vm_artifacts(remote, log_path)?;
@@ -2973,9 +2932,7 @@
-    if remote.incus_mode == RemoteLinuxVmIncusMode::SingleHostShared {
-        configure_remote_incus_single_host_shared_devices(job, remote, log_path)?;
-    }
+    configure_remote_incus_devices(job, remote, log_path)?;
@@ -2987,18 +2944,7 @@
-    wait_for_remote_incus_instance(remote, log_path)?;
-    match remote.incus_mode {
-        RemoteLinuxVmIncusMode::Transfer => {
-            ...
-        }
-        RemoteLinuxVmIncusMode::SingleHostShared => {
-            prepare_remote_incus_guest_filesystem(job, remote, log_path)
-        }
-    }
+    wait_for_remote_incus_instance(remote, log_path)

ensure_remote_incus_runtime is simplified from a mode-dispatching orchestrator to a linear sequence:

  1. The log message no longer mentions the mode; it now reads: configure remote Linux VM backend 'incus' on {host}.
  2. reset_remote_linux_vm_artifacts is called unconditionally (was gated on SingleHostShared).
  3. configure_remote_incus_single_host_shared_devices is renamed to configure_remote_incus_devices and called unconditionally.
  4. After starting the instance, the function returns immediately after wait_for_remote_incus_instance — no more post-boot preparation steps. The guest image's own pikaci-incus-run handles everything after boot.
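
The linear sequence above can be sketched roughly as follows. This is an illustration under stated assumptions, not the executor's code: the Remote type, the step log, the String error type (the real code returns anyhow::Result), and the start_remote_incus_instance name are all hypothetical. Only the other three function names and the log message come from the diff.

```rust
// Hypothetical stand-ins: the Remote type, StepLog, and the stub bodies are
// assumptions. Only the step order and three function names mirror the diff.
struct Remote {
    host: String,
}

type StepLog = Vec<String>;

fn reset_remote_linux_vm_artifacts(remote: &Remote, log: &mut StepLog) -> Result<(), String> {
    log.push(format!("reset artifacts on {}", remote.host));
    Ok(())
}

fn configure_remote_incus_devices(remote: &Remote, log: &mut StepLog) -> Result<(), String> {
    log.push(format!("configure disk devices on {}", remote.host));
    Ok(())
}

// Hypothetical name: the diff does not show how the instance is started.
fn start_remote_incus_instance(remote: &Remote, log: &mut StepLog) -> Result<(), String> {
    log.push(format!("start instance on {}", remote.host));
    Ok(())
}

fn wait_for_remote_incus_instance(remote: &Remote, log: &mut StepLog) -> Result<(), String> {
    log.push(format!("wait for instance on {}", remote.host));
    Ok(())
}

// No mode branching: every step runs unconditionally, and the function
// returns as soon as the instance is ready.
fn ensure_remote_incus_runtime(remote: &Remote, log: &mut StepLog) -> Result<(), String> {
    log.push(format!(
        "configure remote Linux VM backend 'incus' on {}",
        remote.host
    ));
    reset_remote_linux_vm_artifacts(remote, log)?;
    configure_remote_incus_devices(remote, log)?;
    start_remote_incus_instance(remote, log)?;
    wait_for_remote_incus_instance(remote, log)
}
```

The point of the sketch is the shape: with the mode enum gone, there is nothing left to match on, so the function is a straight chain of fallible steps.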

Move guest command and parameters to the incus exec launch command

Intent: Instead of synthesizing a bash script inside the guest, pass the job command, timeout, and run-as-root flag as environment variables on the incus exec invocation, letting the image-owned pikaci-incus-run binary consume them.

Affected files: crates/pikaci/src/executor.rs

Evidence
@@ -2815,7 +2771,9 @@
-fn build_remote_incus_launch_command(remote: &RemoteLinuxVmContext) -> String {
+fn build_remote_incus_launch_command(job: &JobSpec, remote: &RemoteLinuxVmContext) -> String {
+    let (guest_command, run_as_root) = compiled_guest_command(job);
+    let run_as_root_value = if run_as_root { "1" } else { "0" };
@@ -2824,7 +2782,11 @@
-                "/usr/local/bin/pikaci-incus-run",
+                "env",
+                &format!("PIKACI_INCUS_GUEST_COMMAND={guest_command}"),
+                &format!("PIKACI_INCUS_TIMEOUT_SECS={}", job.timeout_secs),
+                &format!("PIKACI_INCUS_RUN_AS_ROOT={run_as_root_value}"),
+                REMOTE_LINUX_VM_INCUS_RUN_BINARY,

build_remote_incus_launch_command now accepts a &JobSpec parameter and constructs the launch command as:

sudo incus exec --project {project} {instance} -- \
  env \
  PIKACI_INCUS_GUEST_COMMAND={compiled_command} \
  PIKACI_INCUS_TIMEOUT_SECS={timeout} \
  PIKACI_INCUS_RUN_AS_ROOT={0|1} \
  /run/current-system/sw/bin/pikaci-incus-run

Key design points:

  • The binary path is now the NixOS system-profile path /run/current-system/sw/bin/pikaci-incus-run (constant REMOTE_LINUX_VM_INCUS_RUN_BINARY) rather than the old /usr/local/bin/pikaci-incus-run that was written by the prepare script.
  • The guest command, timeout, and privilege level are passed as env vars, making the launch contract declarative.
  • The call site in spawn_remote_linux_vm_process is updated to pass job to the Incus branch.
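
A minimal sketch of how such a launch command could be assembled is shown below. The LaunchSpec struct, the build_launch_command name, and the choice to single-quote every argument are assumptions made for illustration; the diff only shows the argument list and the REMOTE_LINUX_VM_INCUS_RUN_BINARY constant, not how the executor joins or quotes the arguments.

```rust
// Value taken from the tutorial text; the constant name appears in the diff.
const REMOTE_LINUX_VM_INCUS_RUN_BINARY: &str = "/run/current-system/sw/bin/pikaci-incus-run";

// Hypothetical input type: the real function takes &JobSpec and
// &RemoteLinuxVmContext, whose fields are not shown in the diff.
struct LaunchSpec<'a> {
    project: &'a str,
    instance: &'a str,
    guest_command: &'a str,
    timeout_secs: u64,
    run_as_root: bool,
}

// POSIX-style single quoting: wrap in '...' and escape embedded quotes.
// The executor has a helper with this name; this body is an assumption.
fn shell_single_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', "'\\''"))
}

fn build_launch_command(spec: &LaunchSpec<'_>) -> String {
    let run_as_root_value = if spec.run_as_root { "1" } else { "0" };
    let args: Vec<String> = vec![
        "sudo".to_string(),
        "incus".to_string(),
        "exec".to_string(),
        "--project".to_string(),
        spec.project.to_string(),
        spec.instance.to_string(),
        "--".to_string(),
        "env".to_string(),
        format!("PIKACI_INCUS_GUEST_COMMAND={}", spec.guest_command),
        format!("PIKACI_INCUS_TIMEOUT_SECS={}", spec.timeout_secs),
        format!("PIKACI_INCUS_RUN_AS_ROOT={run_as_root_value}"),
        REMOTE_LINUX_VM_INCUS_RUN_BINARY.to_string(),
    ];
    // Quote each argument so a guest command containing spaces or quotes
    // survives the trip through the remote shell as a single env assignment.
    args.iter()
        .map(|arg| shell_single_quote(arg))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Quoting each argument separately keeps a guest command such as cargo test -p foo inside one PIKACI_INCUS_GUEST_COMMAND assignment rather than letting the shell split it into extra env arguments.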

Consolidate mount paths and remove nix-store sharing

Intent: Rename mount path constants to drop the 'SHARED' qualifier, remove the host /nix/store mount device, and update paths from symlink targets to direct mount points.

Affected files: crates/pikaci/src/executor.rs

Evidence
@@ -238,16 +221,14 @@
-const REMOTE_LINUX_VM_INCUS_SHARED_SNAPSHOT_MOUNT_PATH: &str = "/mnt/pikaci-snapshot";
-const REMOTE_LINUX_VM_INCUS_SHARED_NIX_STORE_MOUNT_PATH: &str = "/mnt/pikaci-nix-store";
-const REMOTE_LINUX_VM_INCUS_SHARED_WORKSPACE_DEPS_MOUNT_PATH: &str = "/mnt/pikaci-workspace-deps";
-const REMOTE_LINUX_VM_INCUS_SHARED_WORKSPACE_BUILD_MOUNT_PATH: &str = "/mnt/pikaci-workspace-build";
+const REMOTE_LINUX_VM_INCUS_SNAPSHOT_MOUNT_PATH: &str = "/workspace/snapshot";
+const REMOTE_LINUX_VM_INCUS_WORKSPACE_DEPS_MOUNT_PATH: &str = "/staged/linux-rust/workspace-deps";
+const REMOTE_LINUX_VM_INCUS_WORKSPACE_BUILD_MOUNT_PATH: &str = "/staged/linux-rust/workspace-build";

Four constants are replaced with three:

Old                            New
/mnt/pikaci-snapshot           /workspace/snapshot
/mnt/pikaci-nix-store          (removed)
/mnt/pikaci-workspace-deps     /staged/linux-rust/workspace-deps
/mnt/pikaci-workspace-build    /staged/linux-rust/workspace-build

The mount paths now match the guest filesystem layout directly — no more symlinks from /mnt/pikaci-* to /workspace/snapshot or /staged/linux-rust/*. The host /nix/store mount (pikaci-nix-store) is removed entirely because the guest image now carries its own runtime dependencies.

The renamed configure_remote_incus_devices function (formerly configure_remote_incus_single_host_shared_devices) configures three virtiofs devices instead of four, all using the new path constants.
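
As a rough illustration, the three mounts could be expressed as a small table that drives device creation. The three path constants come from the diff. Everything else is an assumption: the GuestMount struct, the device names (modelled on the pikaci-nix-store name mentioned above), and the incus config device add options, which the diff does not show.

```rust
const REMOTE_LINUX_VM_INCUS_SNAPSHOT_MOUNT_PATH: &str = "/workspace/snapshot";
const REMOTE_LINUX_VM_INCUS_WORKSPACE_DEPS_MOUNT_PATH: &str = "/staged/linux-rust/workspace-deps";
const REMOTE_LINUX_VM_INCUS_WORKSPACE_BUILD_MOUNT_PATH: &str =
    "/staged/linux-rust/workspace-build";

// Hypothetical table row; the executor's real representation is not shown.
struct GuestMount {
    device_name: &'static str, // assumed naming scheme
    guest_path: &'static str,
}

// Three devices instead of four: there is no host /nix/store entry.
fn guest_mounts() -> [GuestMount; 3] {
    [
        GuestMount {
            device_name: "pikaci-snapshot",
            guest_path: REMOTE_LINUX_VM_INCUS_SNAPSHOT_MOUNT_PATH,
        },
        GuestMount {
            device_name: "pikaci-workspace-deps",
            guest_path: REMOTE_LINUX_VM_INCUS_WORKSPACE_DEPS_MOUNT_PATH,
        },
        GuestMount {
            device_name: "pikaci-workspace-build",
            guest_path: REMOTE_LINUX_VM_INCUS_WORKSPACE_BUILD_MOUNT_PATH,
        },
    ]
}

// Assumed argument shape for a read-only virtiofs disk device; the exact
// options the executor passes may differ.
fn device_add_args(instance: &str, mount: &GuestMount, host_source: &str) -> Vec<String> {
    vec![
        "incus".to_string(),
        "config".to_string(),
        "device".to_string(),
        "add".to_string(),
        instance.to_string(),
        mount.device_name.to_string(),
        "disk".to_string(),
        format!("source={host_source}"),
        format!("path={}", mount.guest_path),
        "readonly=true".to_string(),
        "io.bus=virtiofs".to_string(),
    ]
}
```

Because each device's path is already the final guest location, nothing inside the guest has to create symlinks after boot.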

Reduce ensure_remote_linux_vm_directories to match simplified context

Intent: Remove the remote_incus_closure_dir from the mkdir command since it no longer exists in the context.

Affected files: crates/pikaci/src/executor.rs

Evidence
@@ -2354,7 +2311,7 @@
-            "mkdir -p {} {} {} {} {} {} {}; ",
+            "mkdir -p {} {} {} {} {} {}; ",
@@ -2364,7 +2321,6 @@
-        shell_single_quote(&remote.remote_incus_closure_dir.display().to_string()),

The ensure_remote_linux_vm_directories function's mkdir -p command drops from 7 directories to 6. The removed directory is remote_incus_closure_dir, which stored guest-store-paths.json for the transfer path. The conditional if [ ! -e ... ] guards for the workspace-deps and workspace-build symlinks remain unchanged.
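
The diff keeps a fixed format string whose placeholder count has to match the argument list by hand. A hedged alternative sketch, not what the branch does, builds the command from a slice so the two cannot drift apart. The build_mkdir_command name and the body of shell_single_quote are assumptions; only the helper's name appears in the diff.

```rust
use std::path::PathBuf;

// POSIX-style single quoting; assumed implementation of the executor helper.
fn shell_single_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', "'\\''"))
}

// Hypothetical variant: derive the mkdir arguments from the directory list
// instead of counting `{}` placeholders in a format string.
fn build_mkdir_command(dirs: &[PathBuf]) -> String {
    let quoted: Vec<String> = dirs
        .iter()
        .map(|dir| shell_single_quote(&dir.display().to_string()))
        .collect();
    format!("mkdir -p {}; ", quoted.join(" "))
}
```

With this shape, removing a field such as remote_incus_closure_dir means deleting one list entry, with no matching placeholder to remember.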

Switch staged Nix wrappers to guest-local interpreter paths

Intent: Replace all /nix/store-based shebangs and interpreter references in the staged CI wrappers with paths under /run/current-system/sw/bin, eliminating the dependency on a mounted host Nix store.

Affected files: nix/ci/linux-rust.nix

Evidence
@@ -16,6 +16,11 @@
+  guestSystemBin = "/run/current-system/sw/bin";
+  guestBash = "${guestSystemBin}/bash";
+  guestBashShebang = "#!${guestBash}";
+  guestNode = "${guestSystemBin}/node";
+  guestPython = "${guestSystemBin}/python3";
@@ -573,27 +578,27 @@
-              "#!${pkgs.bash}/bin/bash" \
+              "${guestBashShebang}" \

Five new Nix let-bindings define the guest interpreter paths:

guestSystemBin = "/run/current-system/sw/bin";
guestBash = "${guestSystemBin}/bash";
guestBashShebang = "#!${guestBash}";
guestNode = "${guestSystemBin}/node";
guestPython = "${guestSystemBin}/python3";

Every write_wrapper call and every heredoc cat >"$out/bin/..." block that previously used #!${pkgs.bash}/bin/bash (which resolves to a /nix/store/...-bash-5.x/bin/bash path) now uses ${guestBashShebang} instead. Similarly, the ${pkgs.python3}/bin/python3 reference in the fixture-relay-smoke test is replaced with ${guestPython}.

This is what makes the host /nix/store mount unnecessary — the staged wrappers no longer contain /nix/store paths that would need to exist in the guest.
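
One way to guard this property is a small check that a staged wrapper's shebang points under the guest system profile. The sketch below is hypothetical and not part of the branch; the function names are invented, and the /nix/store path in the usage note is a placeholder rather than a real store path.

```rust
use std::path::Path;

// Value of guestSystemBin from the Nix let-bindings.
const GUEST_SYSTEM_BIN: &str = "/run/current-system/sw/bin";

// Returns the interpreter path from the first line, if it is a shebang.
fn shebang_interpreter(script: &str) -> Option<&str> {
    let first_line = script.lines().next()?;
    let rest = first_line.strip_prefix("#!")?;
    rest.split_whitespace().next()
}

// True when the interpreter lives under the guest system profile, so the
// wrapper does not need any host /nix/store path to exist in the guest.
fn uses_guest_local_interpreter(script: &str) -> bool {
    shebang_interpreter(script)
        .map_or(false, |path| Path::new(path).starts_with(GUEST_SYSTEM_BIN))
}
```

A wrapper starting with #!/run/current-system/sw/bin/bash passes this check, while one starting with a placeholder such as #!/nix/store/example-bash/bin/bash does not.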

Remove PIKACI_STAGED_HOST_NIX_STORE_ROOT from the staged runtime environment

Intent: Eliminate the path-rewriting indirection that allowed staged wrappers to find Nix store paths via a mounted host store, replacing it with direct guest-local package paths.

Affected files: nix/ci/linux-rust.nix

Evidence
@@ -77,13 +82,13 @@
-    host_store_root="''${PIKACI_STAGED_HOST_NIX_STORE_ROOT:-/nix/store}"
+    export PATH="${pkgs.postgresql}/bin:${guestSystemBin}:$PATH"
     staged_runtime_ld="${pkgs.lib.makeLibraryPath commonArgs.buildInputs}"
-    staged_runtime_ld="''${staged_runtime_ld//\/nix\/store/$host_store_root}"
-    staged_postgres_bin="${pkgs.postgresql}/bin"
-    staged_postgres_bin="''${staged_postgres_bin//\/nix\/store/$host_store_root}"
-    export PATH="$staged_postgres_bin:$PATH"
+    if [ -d /run/opengl-driver/lib ]; then
+      staged_runtime_ld="/run/opengl-driver/lib:$staged_runtime_ld"
+    fi
     export LD_LIBRARY_PATH="$staged_runtime_ld''${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+    export LIBCLANG_PATH="${pkgs.llvmPackages.libclang.lib}/lib"

The stagedRuntimeEnv block is rewritten:

Removed:

  • host_store_root variable and the ${staged_runtime_ld//\/nix\/store/$host_store_root} rewrite. This was the core indirection: when the host /nix/store was mounted at /mnt/pikaci-nix-store, the variable would rewrite all Nix store paths to point through the mount. No longer needed because the image carries the required libraries.
  • Separate staged_postgres_bin rewriting — PostgreSQL is now on PATH directly.

Added:

  • PATH now includes ${pkgs.postgresql}/bin and the guest system bin directory directly.
  • Conditional /run/opengl-driver/lib inclusion in LD_LIBRARY_PATH for GPU-enabled lanes.
  • LIBCLANG_PATH export for bindgen-dependent crates.

The LD_LIBRARY_PATH still uses the Nix-evaluated makeLibraryPath output (which contains /nix/store/... paths), but these paths now exist in the guest image's own Nix store rather than being rewritten through a host mount.

Fix rmp-init-smoke-ci wrapper to use executable text file

Intent: Replace writeShellScript with writeTextFile { executable = true } so the materialized wrapper preserves its shebang and executable permission correctly in the realized Nix store output.

Affected files: nix/ci/linux-rust.nix

Evidence
@@ -1237,55 +1242,62 @@
-          cp ${pkgs.writeShellScript "run-rmp-init-smoke-ci" ''
-            set -euo pipefail
+          install -Dm555 ${
+            pkgs.writeTextFile {
+              name = "run-rmp-init-smoke-ci";
+              executable = true;
+              text = ''
+                ${guestBashShebang}
+                set -euo pipefail

The run-rmp-init-smoke-ci wrapper was previously created with pkgs.writeShellScript, which implicitly uses the build-time bash path as the shebang. It is now created with pkgs.writeTextFile { executable = true; } and installed via install -Dm555.

This fixes two issues:

  1. The shebang is now the explicit guest-local #!/run/current-system/sw/bin/bash rather than a /nix/store/...-bash/bin/bash path.
  2. Using writeTextFile with executable = true ensures the file is materialized as a proper executable in the Nix store, avoiding a mode-preservation issue where writeShellScript + cp could lose the executable bit in certain realization contexts.

The script body itself is unchanged in logic — it still creates a temporary RMP project, runs cargo build --offline, and validates the smoke test.

Update Incus image definition to include pikaci-incus-run

Intent: Ensure the guest image carries the pikaci-incus-run binary at the expected NixOS system path so it is available when the executor invokes incus exec.

Affected files: nix/incus/pikaci-image.nix

Evidence
(diff for this file was included in the payload but truncated)

The Incus VM image Nix expression is updated so that pikaci-incus-run is part of the NixOS system profile, available at /run/current-system/sw/bin/pikaci-incus-run. This is the counterpart to the executor change: where the executor previously wrote the run script into /usr/local/bin/ via the prepare-script heredoc, the image now owns the binary.

The image also needs to include the runtime dependencies (bash, node, python3, PostgreSQL, etc.) in its system profile so that the guest-local paths referenced by the staged wrappers resolve correctly.
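
The guest side of the contract can be pictured as parsing the three environment variables the executor sets. The diff for the image is truncated, so the following is only a sketch of what a consumer of that contract might look like: GuestRunRequest, parse_guest_run_request, the error messages, and the default of not running as root when the flag is absent are all assumptions, and the real pikaci-incus-run may well be a shell script.

```rust
// Hypothetical parsed form of the env contract set on `incus exec`.
#[derive(Debug, PartialEq, Eq)]
struct GuestRunRequest {
    command: String,
    timeout_secs: u64,
    run_as_root: bool,
}

// Takes a lookup closure instead of reading std::env directly so the
// parsing rules can be exercised without touching the process environment.
fn parse_guest_run_request<F>(lookup: F) -> Result<GuestRunRequest, String>
where
    F: Fn(&str) -> Option<String>,
{
    let command = lookup("PIKACI_INCUS_GUEST_COMMAND")
        .filter(|value| !value.is_empty())
        .ok_or_else(|| "PIKACI_INCUS_GUEST_COMMAND is missing or empty".to_string())?;

    let timeout_raw = lookup("PIKACI_INCUS_TIMEOUT_SECS")
        .ok_or_else(|| "PIKACI_INCUS_TIMEOUT_SECS is missing".to_string())?;
    let timeout_secs = timeout_raw
        .parse::<u64>()
        .map_err(|err| format!("invalid PIKACI_INCUS_TIMEOUT_SECS {timeout_raw:?}: {err}"))?;

    // Assumption: an absent flag means "do not run as root".
    let run_as_root = match lookup("PIKACI_INCUS_RUN_AS_ROOT").as_deref() {
        Some("1") => true,
        Some("0") | None => false,
        Some(other) => return Err(format!("invalid PIKACI_INCUS_RUN_AS_ROOT {other:?}")),
    };

    Ok(GuestRunRequest {
        command,
        timeout_secs,
        run_as_root,
    })
}
```

In a real binary the lookup would be something like |key| std::env::var(key).ok(); rejecting unknown flag values early keeps a malformed launch from silently running with the wrong privileges.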

Update migration plan documentation

Intent: Reflect the simplified single-path architecture in the planning document, removing references to transfer mode and documenting the image-owned guest contract.

Affected files: docs/incus-migration-plan.md

Evidence
@@ -858,24 +858,26 @@
-- transfer the workspace snapshot
+- mount the prepared workspace snapshot
@@ -889,15 +891,11 @@
-  - `pika-actionlint` transfer Incus: about `155s` wall
-  - `pika-doc-contracts` transfer Incus: about `67s` wall
@@ -938,6 +947,8 @@
+  and with the guest runtime owned by the Incus image plus staged payloads,
+  not by an executor-mounted host `/nix/store`

The migration plan is updated in several areas:

  1. Phase B description: "transfer the workspace snapshot" becomes "mount the prepared workspace snapshot".
  2. Proof status: References to transfer mode measurements (155s pika-actionlint, 67s pika-doc-contracts) are removed. The transfer fallback via PIKACI_REMOTE_LINUX_VM_INCUS_MODE=transfer is no longer mentioned as an option.
  3. Architecture description: Updated to describe "one shared-mount Incus path rather than a mode split" with virtiofs mounts at their final guest paths.
  4. Guest contract: New paragraph documents that the image owns the mounted-path layout and pikaci-incus-run owns the guest env/log/result contract.
  5. Host /nix/store: Documented that staged Linux runtime no longer depends on a mounted host /nix/store seam, with details about how staged wrappers now use guest-local interpreters.
  6. Closing summary: Updated to note the guest runtime is "owned by the Incus image plus staged payloads, not by an executor-mounted host /nix/store".

Update and simplify test suite

Intent: Remove tests for deleted functionality, consolidate duplicated test context construction into shared helpers, and update assertions for the new launch command shape.

Affected files: crates/pikaci/src/executor.rs

Evidence
@@ -4151,6 +3766,44 @@
+    fn sample_shell_job(command: &'static str) -> JobSpec {
+    fn sample_remote_context(backend: RemoteLinuxVmBackend) -> RemoteLinuxVmContext {
@@ -4334,39 +3963,21 @@
-        let command = build_remote_incus_launch_command(&RemoteLinuxVmContext {
+        let command = build_remote_incus_launch_command(
+            &sample_shell_job("actionlint"),
+            &sample_remote_context(RemoteLinuxVmBackend::Incus),
+        );
@@ -4387,55 +3998,12 @@
-    fn remote_linux_incus_transfer_finalize_remounts_read_only_workspace() {

The test module is significantly cleaned up:

New helpers:

  • sample_shell_job(command) — creates a minimal JobSpec with a shell command.
  • sample_remote_context(backend) — creates a RemoteLinuxVmContext without the deleted incus_mode and remote_incus_closure_dir fields.

These replace ~25-line inline struct literals that were duplicated across multiple tests.
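
The helper pattern might look roughly like this. The helper names and the RemoteLinuxVmBackend::Incus variant come from the diff; every struct field below is a trimmed-down, hypothetical stand-in for the real JobSpec and RemoteLinuxVmContext, and the 120-second timeout is inferred from the PIKACI_INCUS_TIMEOUT_SECS=120 assertion rather than shown in the helper itself.

```rust
// Hypothetical, reduced versions of the executor types. Other backend
// variants and most real fields are omitted.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RemoteLinuxVmBackend {
    Incus,
}

#[derive(Debug)]
struct JobSpec {
    id: &'static str,
    command: &'static str,
    timeout_secs: u64,
}

#[derive(Debug)]
struct RemoteLinuxVmContext {
    backend: RemoteLinuxVmBackend,
    incus_project: String,
    incus_instance: String,
}

// One place to build a minimal shell job for tests.
fn sample_shell_job(command: &'static str) -> JobSpec {
    JobSpec {
        id: "sample-job",
        command,
        timeout_secs: 120, // inferred from the updated launch-command assertion
    }
}

// One place to build a context; note there is no incus_mode or
// remote_incus_closure_dir field to fill in any more.
fn sample_remote_context(backend: RemoteLinuxVmBackend) -> RemoteLinuxVmContext {
    RemoteLinuxVmContext {
        backend,
        incus_project: "sample-project".to_string(),
        incus_instance: "sample-instance".to_string(),
    }
}
```

Centralizing construction means a future field removal touches one helper rather than every test that previously spelled out the full struct literal.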

Deleted tests:

  • remote_linux_incus_transfer_finalize_remounts_read_only_workspace — tested the deleted transfer finalize command.
  • remote_linux_incus_single_host_prepare_script_avoids_snapshot_staging_copy — tested the deleted prepare script in shared mode.
  • remote_linux_incus_transfer_prepare_script_keeps_guest_store_paths — tested the deleted prepare script in transfer mode.
  • remote_linux_incus_mode_env_accepts_single_host_shared — tested the deleted mode parser.
  • remote_linux_incus_mode_defaults_to_single_host_shared — tested the deleted mode default.
  • with_incus_mode_env helper — no longer needed.

Updated tests:

  • remote_linux_incus_launch_uses_incus_exec_runner — now asserts on env, PIKACI_INCUS_GUEST_COMMAND=, PIKACI_INCUS_TIMEOUT_SECS=120, PIKACI_INCUS_RUN_AS_ROOT=0, and the new binary path.
  • remote_linux_incus_read_only_disk_device_uses_virtiofs_bus (renamed from ..._single_host_shared_...) — uses sample_remote_context and the new mount path constant.
  • ensure_remote_linux_vm_directories_skips_existing_staged_output_symlinks — updated context construction and mkdir assertion to match 6 directories instead of 7.

Diff