Evaluation Metrics
Eval-workers capture per-evaluation Nix metrics and persist them, mirroring how builds capture resource metrics. The data drives the Job Board's "Expensive Evals" panel and self-tunes the scheduler so heavy evaluations land on big-RAM machines.
What's captured
Three tables are written per evaluation:
evaluation_metric - per-eval aggregate: total thunks, function calls, primop calls, lookups, allocated bytes, peak GC heap (MB), peak RSS (MB), the per-phase wall-clock (fetch_ms, eval_flake_ms, eval_drv_ms, total_eval_ms) and the worker_id that ran it.
evaluation_attr_cost - per-entry-point hotspots: thunks, function calls, eval wall-clock and allocated bytes bucketed by user entry-point.
flake_output_node - the walked flake-output subgraph: path, parent, name, kind, is_derivation, drv_path.
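As a rough picture of the row shapes, the three tables could be modelled as below. This is only a sketch: fetch_ms, eval_flake_ms, eval_drv_ms, total_eval_ms, worker_id, peak_rss_mb and the flake_output_node columns are named in this page, while evaluation_id, entry_point and the remaining counter field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvaluationMetric:
    """One aggregate row per evaluation."""
    evaluation_id: str      # assumed key name
    worker_id: str
    thunks: int             # assumed column name
    function_calls: int     # assumed column name
    primop_calls: int       # assumed column name
    lookups: int            # assumed column name
    allocated_bytes: int    # assumed column name
    peak_gc_heap_mb: int    # assumed column name
    peak_rss_mb: int
    fetch_ms: int
    eval_flake_ms: int
    eval_drv_ms: int
    total_eval_ms: int


@dataclass
class EvaluationAttrCost:
    """One row per user entry-point within an evaluation."""
    evaluation_id: str      # assumed key name
    entry_point: str        # assumed column name
    thunks: int
    function_calls: int
    eval_ms: int
    allocated_bytes: int


@dataclass
class FlakeOutputNode:
    """One row per flake output that the discovery walk visited."""
    path: str
    parent: Optional[str]   # assumed: None for a top-level output
    name: str
    kind: str
    is_derivation: bool
    drv_path: Optional[str]  # assumed: None when the node is not a derivation


def phase_breakdown(metric: EvaluationMetric) -> dict:
    """Return the individual wall-clock phases of one evaluation."""
    return {
        "fetch_ms": metric.fetch_ms,
        "eval_flake_ms": metric.eval_flake_ms,
        "eval_drv_ms": metric.eval_drv_ms,
    }
```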
Per-entry-point hotspots
Costs are bucketed by the eval's wildcard target (the user entry-point), so you can see which outputs are expensive to evaluate rather than just the eval total. The aggregation runs in the resolver as each request completes.
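The bucketing described above can be pictured as a running sum keyed by entry-point, updated each time a request completes. The sketch below is not the resolver's actual code; the class, method and field names are hypothetical, and ranking by wall-clock time is just one possible ordering.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AttrCost:
    """Accumulated cost for one entry-point (hypothetical field names)."""
    thunks: int = 0
    function_calls: int = 0
    eval_ms: int = 0
    allocated_bytes: int = 0


class EntryPointAggregator:
    """Sums per-request costs into one bucket per user entry-point."""

    def __init__(self):
        self._costs = defaultdict(AttrCost)

    def record(self, entry_point, thunks, function_calls, eval_ms, allocated_bytes):
        # Called once per completed request; adds that request's cost
        # to the bucket of the wildcard target it belongs to.
        cost = self._costs[entry_point]
        cost.thunks += thunks
        cost.function_calls += function_calls
        cost.eval_ms += eval_ms
        cost.allocated_bytes += allocated_bytes

    def hotspots(self, top=10):
        # Most expensive entry-points first, ranked by wall-clock time.
        ranked = sorted(self._costs.items(), key=lambda kv: kv[1].eval_ms, reverse=True)
        return ranked[:top]
```

Calling `record` for every finished request and `hotspots` at the end of the evaluation would yield the rows that go into evaluation_attr_cost.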
Walked flake graph
The discovery BFS records every flake output it actually walked - no extra
evaluation is forced. The subgraph is stored as flake_output_node rows and
served back as a tree in the style of nix flake show for frontend rendering.
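To illustrate how flat flake_output_node rows can be turned back into a tree, here is a minimal sketch. It assumes that each row's parent holds the path of its parent node and that top-level outputs have a parent of None; neither detail is stated on this page, and the function name is hypothetical.

```python
from collections import defaultdict


def render_tree(rows):
    """Render flake_output_node-like rows as indented text lines.

    rows: iterable of dicts with at least path, parent, name, is_derivation.
    """
    children = defaultdict(list)
    for row in rows:
        children[row["parent"]].append(row)

    lines = []

    def walk(parent_path, depth):
        # Visit children in name order so the output is stable.
        for row in sorted(children.get(parent_path, []), key=lambda r: r["name"]):
            suffix = " (derivation)" if row["is_derivation"] else ""
            lines.append("  " * depth + row["name"] + suffix)
            walk(row["path"], depth + 1)

    walk(None, 0)
    return lines
```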
Self-tuning RAM-to-machine routing
A per-project rolling window takes the p95 of eval peak_rss_mb over the last
24h and feeds it to the scheduler's ResourceFitRule (see
scheduler scoring). A project whose evaluations have
historically needed lots of RAM is routed to big-RAM eval machines, and the
prediction re-tunes itself as new evals complete - there are no manual
thresholds.
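The rolling-window prediction can be sketched as follows. The nearest-rank percentile method, the function name and the (finished_at, peak_rss_mb) sample shape are assumptions for illustration; the page does not say how the p95 is computed or how it is passed to ResourceFitRule.

```python
from datetime import datetime, timedelta, timezone


def p95_peak_rss_mb(samples, now, window=timedelta(hours=24)):
    """Return the p95 of peak_rss_mb over samples inside the window.

    samples: iterable of (finished_at, peak_rss_mb) for one project.
    Returns None when the project has no evaluations in the window.
    """
    recent = sorted(rss for finished_at, rss in samples if now - finished_at <= window)
    if not recent:
        return None
    # Nearest-rank percentile: rank = ceil(0.95 * n), in integer arithmetic.
    rank = (95 * len(recent) + 99) // 100
    return recent[rank - 1]
```

Because the window slides, an old memory-hungry evaluation stops influencing the prediction once it is more than 24 hours old.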
Board endpoints
GET /board/evals/expensive-by-resource?metric={time,rss,heap,thunks,fncalls,alloc}&window_days=N - the top evaluations by a resource, org-scoped. metric is matched against a closed allow-list, so it cannot inject SQL.
GET /evals/{evaluation}/flake-graph - the walked flake-output graph for one evaluation.
Both are surfaced in the Job Board's "Expensive Evals" panel.
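The closed allow-list idea can be illustrated like this: the metric parameter only ever selects a column name from a fixed mapping, and its raw value never reaches the SQL text. The column names other than total_eval_ms and peak_rss_mb, the org_id and created_at columns, the query shape and the function name are all assumptions made for this sketch.

```python
# Allowed metric values mapped to (partly assumed) column names.
METRIC_COLUMNS = {
    "time": "total_eval_ms",
    "rss": "peak_rss_mb",
    "heap": "peak_gc_heap_mb",
    "thunks": "thunks",
    "fncalls": "function_calls",
    "alloc": "allocated_bytes",
}


def expensive_evals_query(metric):
    """Build the query text for one allowed metric.

    Raises ValueError for anything outside the allow-list. The org id and
    the window cutoff are left as bound parameters (the ? placeholders).
    """
    column = METRIC_COLUMNS.get(metric)
    if column is None:
        raise ValueError(f"unsupported metric: {metric!r}")
    return (
        f"SELECT evaluation_id, {column} FROM evaluation_metric "
        "WHERE org_id = ? AND created_at >= ? "
        f"ORDER BY {column} DESC"
    )
```

Only values taken from the dictionary are interpolated, so a request such as metric=rss;DROP TABLE x is rejected before any SQL is built.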
Toggle and overhead
GRADIENT_EVAL_METRICS_ENABLED (worker-side, default true) gates capture.
When false, the eval-worker skips the per-request stats read entirely, so
there is zero added overhead. Even when enabled, the overhead is one cheap
cumulative-counter read per resolver request, diffed per worker - there is no
--count-calls-style instrumentation.
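The toggle and the counter diffing can be sketched as below. This is an illustration only: the set of strings treated as false, the helper names and the counter keys are assumptions, and the real worker may parse the variable differently.

```python
import os


def metrics_enabled(env=os.environ):
    """Read GRADIENT_EVAL_METRICS_ENABLED, defaulting to true when unset."""
    raw = env.get("GRADIENT_EVAL_METRICS_ENABLED", "true")
    # Assumed spellings for "off"; anything else counts as enabled.
    return raw.strip().lower() not in ("false", "0", "no", "off")


class CounterDiffer:
    """Turns cumulative counters into per-request deltas for one worker."""

    def __init__(self):
        self._last = {}

    def delta(self, snapshot):
        # snapshot holds cumulative totals since the worker started, so
        # the cost of the latest request is the difference to the last read.
        diff = {key: value - self._last.get(key, 0) for key, value in snapshot.items()}
        self._last = dict(snapshot)
        return diff
```

Keeping one CounterDiffer per worker matters because the counters are cumulative per process; mixing snapshots from different workers would produce meaningless deltas.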