Scheduler Scoring¶
When a worker requests work, the scheduler ranks every eligible queued job for
that worker and offers the highest scorer. Scoring is pluggable: the score
crate (backend/score) defines a ScoringPolicy trait and a set of composable
rules, and the active policy is selected at startup.
Selecting a policy¶
Set the server option settings.schedulerScoringPolicy (env
GRADIENT_SCHEDULER_SCORING_POLICY, clap field
EvalArgs.scheduler_scoring_policy). Values: simple (the basic rule set) and
resource-aware. resource-aware is the default selected when the env var is
unset, and unknown names log a warning and fall back to resource-aware.
policy_by_name (backend/score/src/policy.rs) resolves the string to an
Arc<dyn ScoringPolicy>.
Architecture¶
ScoringPolicy(policy.rs):name()plusscore(&JobContext, &WorkerContext, &InstanceContext) -> f64.RulePolicy: a namedVec<Box<dyn ScoreRule>>whosescoresums each rule.ScoreRule(rule.rs): onescore(&JobContext, &WorkerContext, &InstanceContext) -> f64contribution plus adescription()explaining what it rewards or penalizes. The board UI reads these viaGET /board/scoring/rules(rule_catalog()) to show a help popup next to each rule name.- Three scoring inputs:
JobContext- per-candidate: kind, architecture, missing-path count, missing NAR size, dependency count,queued_at/ready_at,rescore_countand the owning org's work share.WorkerContext- the worker's architectures, system features, fetch capability and optional live metrics (free RAM, CPU-core score, disk/network speed).InstanceContext- instance-wide 5m/1h/24h windowed averages plus live counts (active/pending builds, total/idle workers), recomputed everyGRADIENT_INSTANCE_METRICS_INTERVALseconds (default 30) and used to make soft-rule thresholds instance-relative.ScoredJobexposes lazy providers (closure_size,history) so a policy pays for closure/history lookups only when a rule reads them.
The scheduler builds the contexts, calls the configured policy per candidate, and assigns the worker its top-scoring job - unless that score is negative.
Soft rules vs. disqualifiers¶
Each rule is one of two classes:
- Soft rule - returns a bounded
[0, cap]bonus, with the threshold for saturation taken from the instance window (e.g.MissingNarSizeRulecaps at 500,MissingPathsRuleat 200,DependencyCountRuleat 50,WaitTimeRulescaled by the instance average wait). Soft rules never push a job below zero. - Disqualifier - may go negative (
RescoreWaitRule,ReserveFetchWorkersRule).ResourceFitRuleis mixed: its RAM-overshoot side is a disqualifier; its CPU-affinity side is a soft bonus.
FairShareRule is currently disabled in resource_aware_rules: its idle gate
counted zero-occupancy workers rather than spare build capacity, so multi-slot
workers below their limits were treated as saturated and the penalty fired
spuriously.
take_best_of_kind will not dispatch a candidate whose total score is
negative - the worker idles that round and the job is retried next cycle. This
gates builds the worker cannot yet serve well rather than forcing a bad
placement.
RescoreWaitRule¶
A build for which no worker has yet reported missing_nar_size scores -1000,
holding it back until the cache state is known. After rescore_count reaches 4
dispatch rounds the penalty drops to 0 so the build stops blocking. Eval jobs
are never penalized.
WaitTimeRule¶
Wait is measured from ready_at (when the job's dependencies cleared), not
queued_at, so dependency wait is excluded. The bonus grows with wait, scaled
by the instance average wait, and saturates at the rule cap for anti-starvation.
FairShareRule¶
Penalty proportional to the owning org's share of in-flight work - weighted by build duration (prefer-local builds count at half), not by job count - so a few long builds and many short ones are balanced fairly.
simple policy rules¶
| Rule | Class | Effect |
|---|---|---|
MissingPathsRule |
soft | [0,200] bonus for path availability the worker can serve, instance-relative to the average missing-path count. |
MissingNarSizeRule |
soft | [0,500] bonus for low fetch size, scaled by the instance average NAR size. |
DependencyCountRule |
soft | [0,50] bonus per dependency for build jobs (unblocks more downstream work first). |
WaitTimeRule |
soft | Bonus growing with ready_at wait, scaled by the instance average wait, for anti-starvation. |
RescoreWaitRule |
disqualifier | -1000 for a build with no reported missing_nar_size, until rescore_count hits 4; never penalizes eval. |
BuiltinDeprioritizeRule |
soft | +50 bonus for real-architecture build jobs, 0 for builtin builds so they yield their slot, and +100 for a builtin build on a worker reporting no architectures so that arch-less worker is not left idle. |
ReserveFetchWorkersRule |
disqualifier | Penalty when a fetch-capable worker is offered a cached-eval job, relaxed as idle capacity grows. |
resource-aware policy rules¶
Adds the following on top of the simple rule set:
| Rule | Class | Effect |
|---|---|---|
ResourceFitRule |
soft + disqualifier | Penalty scaling with predicted-RAM overshoot of free RAM (amplified by past/instance OOM rate); bonus for CPU-heavy jobs on higher-CPU-score workers. Now also applies to evaluation jobs (previously builds-only), using a per-project p95 of historical eval peak-RSS so heavy evals route to big-RAM workers. No-op without history samples or worker metrics. |
ResourceSaturationRule |
disqualifier | -1000 when the worker's live CPU usage is >= 90% or free RAM is <= 10% of total, plus another -1000 when the build's historical peak RAM x1.1 exceeds the worker's free RAM (likely OOM); the two stack (up to -2000). Keeps real builds off overloaded or too-small workers. Exempts builtin-architecture (substitute-only) builds and evals; no-op without worker metrics, and the RAM-fit check needs history samples. |
PreferLocalBuildRule |
soft | Bonus for preferLocalBuild derivations on a worker that already holds (most of) the closure, decaying with missing paths. |
FairShareRule |
disqualifier (disabled) | Penalty proportional to the org's share of in-flight work (duration-weighted; prefer-local at half), so a quiet org is served promptly when a busy org floods the queue. Currently disabled - see the idle-gate note above. |
NetworkAffinityRule |
soft | Bonus for fixed-output derivations on faster-network workers, scaling to a reference speed then capping. No-op for non-FOD jobs or without a network metric. |
DiskAffinityRule |
soft | Bonus for disk-heavy jobs on faster-disk workers, scaling to a reference speed then capping. No-op below the disk-heavy threshold or without a disk metric. |
Worker speed signals¶
The worker measures both speeds passively and reports them on the 10 s
WorkerMetrics heartbeat as exponentially-weighted moving averages:
- Network speed (Mbps): derived from real NAR transfers (upload in
push_direct, download inNarReceiver::accept_chunk) - bytes over elapsed transfer time. Fixed-output derivations are flagged at parse time (is_fixed_output, persisted on thederivationrow) and steered toward faster-network workers. - Disk speed (MB/s): derived from per-build cgroup
io.stat(disk_read_bytes + disk_write_bytes) over build wall-time. The wall-time divisor underestimates peak throughput for CPU-bound builds, but across many builds it reliably separates tmpfs-RAM / SSD / HDD/tmpbuild dirs. Per derivation,HistoryPrediction.avg_disk_bytesestimates disk dependence.
Both stay None until the first NAR transfer / build, in which case the
affinity rules are no-ops.
Adding a rule or policy¶
Implement ScoreRule (including description()) for a new contribution, add it
to a rule list in policy.rs (or a new list), and register the named policy in
policy_by_name. The rule is automatically picked up by rule_catalog() and so
appears in the board's rule-help popup. Each rule has unit tests in
backend/gradient-score/src/rules; run them with cargo test -p gradient-score.