This branch hardens the pika-news mirror runtime by introducing an exclusive file lock (pika-news-mirror.lock) that prevents concurrent mirror syncs, a configurable hard timeout (mirror_timeout_secs) that kills git processes that exceed their deadline, and structured lock-status metadata that surfaces active/stale mirror runs in both the admin dashboard and health API. Together these changes eliminate the failure mode where a hung git push holds the repo lock indefinitely, blocking all subsequent mirror and CI work. The lock file carries JSON metadata (PID, trigger source, start time) so operators can diagnose stuck syncs from the admin UI without SSH-ing into the host. Failure classification in the branch store is extended with four new kinds — obsolete, stale, busy, and timeout — and the admin dashboard uses a refined tone function to render these as warnings rather than hard errors where appropriate.
Tutorial Steps
Add the `fs2` dependency for cross-platform file locking
Intent: Bring in the `fs2` crate which provides `try_lock_exclusive()` / `unlock()` on `std::fs::File`, enabling advisory file locks without platform-specific syscall wrappers.
The fs2 crate is added as a workspace dependency. It wraps flock(2) on Unix and LockFileEx on Windows, giving portable exclusive file locking semantics. The lock is used later in forge.rs to serialize mirror sync operations on the canonical bare repo.
Introduce the `mirror_timeout_secs` configuration knob
Intent: Give operators a way to bound how long any single mirror git command (push or ls-remote) is allowed to run before being killed, preventing indefinitely-hung processes.
A new mirror_timeout_secs field is added to ForgeRepoConfig with a default of 120 seconds. The Config::effective_forge_repo() normalizer applies the default whenever a mirror remote is configured and the value is absent or explicitly zero — this prevents operators from accidentally disabling the timeout. The example TOML, README documentation, and the NixOS module are all updated in lockstep.
Three new tests cover the config layer:
Parsing an explicit value (Some(120)).
Defaulting when mirror_remote is set but no timeout is specified.
Treating an explicit 0 as "use the default" rather than "no timeout".
Implement the mirror file lock with JSON metadata
Intent: Serialize concurrent mirror sync attempts by acquiring an exclusive flock on `<git_dir>/pika-news-mirror.lock`, and write structured JSON metadata into the lock file so that other processes (and the admin UI) can inspect who holds the lock and for how long.
MirrorLockMetadata — a JSON-serializable struct written into the lock file after acquisition: { pid, trigger_source, operation, started_at, started_at_unix_secs }.
MirrorLockGuard — an RAII wrapper around std::fs::File whose Drop impl calls file.unlock(), ensuring the flock is released even on panics.
MirrorLockStatus — the public-facing view returned by current_mirror_lock_status(), which non-destructively probes the lock file with try_lock_exclusive and reads back the metadata if the lock is held.
The acquire_mirror_lock function:
Opens <git_dir>/pika-news-mirror.lock with create(true).
Tries try_lock_exclusive().
On success: writes metadata JSON and returns the guard.
On WouldBlock: reads the existing metadata, computes the age, and bails with either a "stale" message (age > timeout) or a "busy" message.
This means a second mirror trigger arriving while one is already running gets a clear, classifiable error rather than silently queuing behind a potentially-hung git process.
Gate `sync_mirror` behind the lock and add the obsolete-trigger fast path
Intent: Make every mirror sync attempt acquire the lock, and add a short-circuit: if the lock is held but the mirror is already fully in sync, skip with an 'obsolete' classification instead of failing with 'busy'.
@@ -451,7 +483,20 @@ pub fn sync_mirror(
repo: &ForgeRepoConfig,
remote_name: &str,
github_token: Option<&str>,
+ trigger_source: &str,
) -> anyhow::Result<MirrorSyncOutcome> {
+ let _mirror_lock = match acquire_mirror_lock(repo, trigger_source) {
+ Ok(lock) => lock,
+ Err(lock_err) => {
+ if let Ok(outcome) = inspect_mirror(repo, remote_name) {
+ if outcome.lagging_ref_count == 0 {
+ return Err(lock_err
+ .context("mirror is already in sync; treating this trigger as obsolete"));
+ }
+ }
+ return Err(lock_err);
+ }
+ };
sync_mirror gains a new trigger_source: &str parameter (propagated from the caller in mirror.rs). Before touching git, it attempts to acquire the mirror lock:
Lock acquired: proceeds with push_mirror + inspect_mirror as before.
Lock denied + mirror already in sync: wraps the lock error with an "obsolete" context string. The branch store classifier later maps this to failure_kind = "obsolete", a benign state.
Lock denied + mirror lagging: returns the raw lock error, classified as "busy" or "stale".
This avoids unnecessary noise: if two webhook deliveries race and the first one already pushed everything, the second one silently records itself as obsolete rather than showing a scary red error.
Add hard timeouts to git push and ls-remote via `run_command_output_bounded`
Intent: Replace unbounded `Command::output()` calls for mirror push and remote inspection with a polling loop that kills the child process after the configured timeout, preventing hung git processes from holding the lock forever.
@@ -713,6 +901,58 @@ fn git_bare<const N: usize>
+fn run_command_output_bounded(
+ mut cmd: Command,
+ timeout: Duration,
+ context: &str,
+) -> anyhow::Result<Output> {
+ cmd.stdout(Stdio::piped()).stderr(Stdio::piped());
+ let mut child = cmd.spawn().with_context(|| context.to_string())?;
+ let started = Instant::now();
+ loop {
+ if child.try_wait()...is_some() {
+ return child.wait_with_output()...;
+ }
+ if started.elapsed() > timeout {
+ let _ = child.kill();
+ ...bail!("{context} timed out after {}s", timeout.as_secs());
+ }
+ thread::sleep(Duration::from_millis(50));
+ }
+}
A new generic helper run_command_output_bounded(cmd, timeout, context) replaces direct .output() calls for the two mirror-facing git commands:
push_mirror — the git push --prune that writes to the remote.
remote_head_refs — the git ls-remote --heads that inspects the remote.
The helper spawns the child with piped stdout/stderr, then polls try_wait() in a 50ms loop. If the child hasn't exited within timeout, it sends SIGKILL, collects whatever stderr was produced, and returns a bail error containing "timed out after Ns".
Both commands also gain -c gc.auto=0 -c maintenance.auto=0 flags to prevent git from opportunistically running garbage collection mid-push, which could add unpredictable latency inside the timeout window.
A dedicated test (bounded_command_times_out_and_kills_child) validates that sleep 2 is killed after 100ms.
Extend failure classification with runtime safety error kinds
Intent: Teach the branch store's mirror failure classifier to recognize the four new error signatures — obsolete, stale, busy, timeout — so they can be stored and queried distinctly from configuration errors or generic failures.
@@ -3234,6 +3234,18 @@ fn classify_mirror_failure_kind(status: &str, error_text: Option<&str>) -> Option
let lower = error_text.unwrap_or_default().to_ascii_lowercase();
+ if lower.contains("already in sync; treating this trigger as obsolete") {
+ return Some("obsolete");
+ }
+ if lower.contains("stale mirror sync still holds repo lock") {
+ return Some("stale");
+ }
+ if lower.contains("mirror sync already running") {
+ return Some("busy");
+ }
+ if lower.contains("timed out after") {
+ return Some("timeout");
+ }
The classify_mirror_failure_kind function gains four new prefix checks, each returning a distinct kind string:
Marker substring
Kind
Meaning
already in sync; treating this trigger as obsolete
obsolete
Lock was held but mirror was already up-to-date
stale mirror sync still holds repo lock
stale
Lock is held past the timeout threshold
mirror sync already running
busy
Another sync is actively running
timed out after
timeout
Git process exceeded mirror_timeout_secs
These are checked before the existing config_markers block, so they take precedence. A new test (classifies_mirror_failure_kinds_for_runtime_safety_cases) exercises all four paths with representative error strings.
Surface lock status and timeout in the mirror runtime status API
Intent: Expose the live lock state and the configured timeout to API consumers so the admin dashboard and any monitoring integrations can display whether a mirror sync is currently running, stale, or idle.
timeout_secs: the effective timeout for the current configuration.
active_run: an Option<MirrorLockStatus> that is Some(...) whenever the lock file is currently held, populated by the non-blocking current_mirror_lock_status() probe.
The mirror_runtime_status() function populates these by reading the forge repo config and probing the lock file. This data flows through to the /news/admin JSON endpoint and the health status builder.
Refine admin dashboard health rendering for new failure kinds
Intent: Update the server-side health builder and the client-side admin JavaScript to render busy/obsolete failures as warnings (non-alarming) and stale/timeout failures as errors, and to show live lock status when a sync is in progress.
build_mirror_health_status is restructured with a new early-return path: if runtime.active_run is Some, it returns immediately with state "active" or "error" (for stale locks) and a descriptive summary including PID, trigger, and elapsed time.
The existing failure-kind mapping is refined:
config, stale, timeout → state "error"
busy, obsolete → state "active" (non-alarming)
Everything else falls through to the previous idle/error logic.
The summary text is now a match on the failure kind rather than a generic "last background attempt failed" message.
Client-side (admin.html)
A new mirrorFailureTone(kind, status, lagging) helper centralizes the CSS class selection for both the current-status card and the history list, replacing duplicated ternary chains. The card rendering gains a new block for runtime.active_run that shows trigger source, PID, elapsed time, and timeout in real time. The timeout value is shown in all mirror status views.
Propagate `mirror_timeout_secs` through all test fixtures
Intent: Keep all existing tests compiling by adding the new required field to every `ForgeRepoConfig` literal across the test suite.
Every ForgeRepoConfig struct literal in the test modules of ci.rs, forge.rs, mirror.rs, poller.rs, and web.rs is extended with mirror_timeout_secs: None (or Some(120) where a mirror remote is configured). This is purely mechanical — no test logic changes, just keeping the struct exhaustive. The MirrorRuntimeStatus test fixtures in web.rs similarly gain timeout_secs and active_run fields.