end at 0:15 0:09 0:08 0:08; 0:07 - how servo can have fast CI for not a lot of money

Web engine CI
on a shoestring budget

delan azabani (she/her)
azabani.com
December 2025
end at 2:43 2:01 2:00 1:53; 1:49 - servo: greenfield web browser engine → demanding CI requirements - we’re currently on github and github actions - early on that was for their merits, but as time goes on, it’s more for the network effects - wouldn’t be surprised if we moved to codeberg in the next year or two

Servo’s situation

end at 6:17 5:07 4:19 3:21; 3:34 - very useful for small workloads, plus the logic that coordinates workloads - things like turning a tryjob request for “linux” into a run that just builds for linux - or a tryjob request for “full” into a run that builds all the platforms and runs all the tests - for a project of our scale, these runners fall apart for anything beyond that
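
a minimal sketch of that coordination logic; this is a hypothetical workflow, not servo’s - the `try` input, job names, and mapping are illustrative:

# sketch: expand a tryjob request into a build matrix
on:
  workflow_dispatch:
    inputs:
      try:
        description: tryjob request, e.g. "linux" or "full"
        required: true
jobs:
  decision:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.plan.outputs.matrix }}
    steps:
      - id: plan
        run: |
          case "${{ inputs.try }}" in
            linux) echo 'matrix=["ubuntu-latest"]' >> "$GITHUB_OUTPUT" ;;
            full) echo 'matrix=["ubuntu-latest","windows-latest","macos-latest"]' >> "$GITHUB_OUTPUT" ;;
          esac
  build:
    needs: decision
    strategy:
      matrix:
        platform: ${{ fromJSON(needs.decision.outputs.matrix) }}
    runs-on: ${{ matrix.platform }}
    steps:
      - run: echo "build (and test) on ${{ matrix.platform }}"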

GitHub-hosted runners

end at 12:34 10:52 8:37 6:26; 6:47 - alternatives we considered - third-party runner providers - namespace, warpbuild, … - often just as expensive per hour as github’s first-party runners - key selling point tends to be better caching - “dumb terminal” jobs → tricky to do without losing access to actions ecosystem - but you should probably avoid it anyway: it’s platform lock-in and yaml sucks - and if it weren’t for forgejo actions, it would be vendor lock-in too - self-host a whole CI service - some CI services like jenkins and bamboo have built-in container orchestration - none of them have really solved the problem of virtual machine orchestration - we lacked the dedicated personnel to operate something on the critical path

Alternatives considered

end at 14:17 12:26 9:49 7:27; 7:38

Self-hosted runners

end at ...; 8:31

How much faster?

end at 17:09 14:35 11:45 9:49; 10:41 - common to all versions of this system - augments the built-in CI service of the forge (github / forgejo) - almost transparent user experience - just one or two extra jobs per job, and some unique ids
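
the general shape, as a sketch with illustrative names - a decision job mints a unique id, and the workload and watchdog jobs are tied together by it:

# sketch: one or two extra jobs per workload job, tied by a unique id
jobs:
  decision:
    runs-on: ubuntu-latest
    outputs:
      unique-id: ${{ steps.gen.outputs.unique-id }}
    steps:
      - id: gen
        run: echo "unique-id=$(uuidgen)" >> "$GITHUB_OUTPUT"
  workload:
    needs: decision
    name: workload [${{ needs.decision.outputs.unique-id }}]
    runs-on: ubuntu-latest
    steps:
      - run: echo "the real job goes here"
  timeout:
    needs: decision
    runs-on: ubuntu-latest
    steps:
      - run: echo "watchdog for ${{ needs.decision.outputs.unique-id }}"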

What makes our system unique

end at -:-- 16:09 13:09 11:10; 12:11 - unfair comparison, because it assumes we would need the same amount of hours

Completely self-hosted
so it’s dirt cheap^

^ operating expenses, not necessarily labour
end at ...; ...;

Three ways to use runners

end at --:-- --:-- 29:25 28:10; 16:13 - faster checkouts

Faster checkouts

end at --:-- --:-- 30:17 29:18; 17:43

Incremental builds

end at -:-- 17:06 13:58 11:55; 13:08

Servo’s deployment

^ not including OpenHarmony runners
end at -:-- 18:37 15:01 13:15; 14:36

How does it work?

end at ...; 18:58



Graceful fallback

(skip)
end at -:-- 21:45 17:18 15:27; skipped

Graceful fallback

end at -:-- 22:54 18:22 16:46; skipped

Decision jobs

end at -:-- 24:38 20:11 18:34; skipped - problem: most solutions known at the time were inherently racy - solved: *reserve* a runner by applying a label to it - these labels are of the form `reserved-for:uuidv4` - then the workload job can `runs-on: reserved-for:uuidv4`
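
a sketch of the reservation; the label-adding call is github’s real “add custom labels to a self-hosted runner” endpoint, but the RUNNER_ADMIN_TOKEN secret and the RUNNER_ID selection are elided / hypothetical:

# sketch: reserve by labelling, then target the label
decision:
  runs-on: ubuntu-latest
  outputs:
    unique-id: ${{ steps.reserve.outputs.unique-id }}
  steps:
    - id: reserve
      env:
        GH_TOKEN: ${{ secrets.RUNNER_ADMIN_TOKEN }}  # hypothetical secret
      run: |
        unique_id=$(uuidgen)
        # choosing an idle RUNNER_ID is elided; the label is applied
        # via github's "add custom labels to a self-hosted runner" api
        gh api --method POST \
          "repos/$GITHUB_REPOSITORY/actions/runners/$RUNNER_ID/labels" \
          -f "labels[]=reserved-for:$unique_id"
        echo "unique-id=$unique_id" >> "$GITHUB_OUTPUT"
workload:
  needs: decision
  runs-on: reserved-for:${{ needs.decision.outputs.unique-id }}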

Decisions must be serialised

end at -:-- 26:38 21:55 20:49; skipped - initial architecture (#33081) - monitor service but no monitor api - reserve runner by applying a runner label - serialise* decision jobs with job concurrency - “serialise” as in concurrency, not encoding - known caveats - decision job needs github api token - downtime after reservation? timeout job - job concurrency could fail under contention - uh oh (#33276) - run id labels (#33283) - if we label runners to take them, why not label them with more metadata?
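
a sketch of that serialisation, assuming a job-level concurrency group - note that github keeps at most one pending job per group and cancels the rest, which is exactly the contention caveat above:

# sketch: serialise decision jobs with a job-level concurrency group
decision:
  # at most one job in this group runs at a time; github also keeps
  # at most one *pending* job per group and cancels any others,
  # which is the "could fail under contention" failure mode above
  concurrency:
    group: runner-reservation
  runs-on: ubuntu-latest
  steps:
    - run: echo "pick a runner and apply its reserved-for label here"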

Decisions must be serialised

end at -:-- 28:16 23:45 22:29; skipped - monitor api (#33315) - move reservation into the servers managing the runners - serialise the monitor api requests - problem: what happens if the runner fails to materialise? - jobs are `queued`, then they are `in_progress` - problem: you can only declare a time limit for `in_progress`, not `queued`
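
concretely, `timeout-minutes` (the only built-in limit) starts counting at `in_progress`, so a workload stuck in `queued` waits indefinitely - hence the timeout jobs on the next slide:

workload:
  needs: decision
  runs-on: reserved-for:${{ needs.decision.outputs.unique-id }}
  # bounds in_progress time only; a job stuck in queued is not
  # covered, hence the separate timeout jobs
  timeout-minutes: 90
  steps:
    - run: echo "the real job goes here"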

Decisions must be serialised

end at -:-- 30:39 24:56 24:06; skipped - timeout jobs - each timeout job is a watchdog for your workload job, ensuring that it actually gets a runner - query the github api for that job run id, check `status` / `created_at`
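
a sketch of the watchdog, assuming JOB_ID is already known (finding it is the next problem); the real check can compare `created_at` from the same api response instead of a local deadline:

timeout:
  needs: decision
  runs-on: ubuntu-latest
  steps:
    - env:
        GH_TOKEN: ${{ github.token }}
      run: |
        # fail loudly if the workload is still queued after ten minutes
        deadline=$(( $(date +%s) + 600 ))
        while [ "$(gh api "repos/$GITHUB_REPOSITORY/actions/jobs/$JOB_ID" --jq .status)" = queued ]; do
          if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "runner never materialised for $JOB_ID" >&2
            exit 1
          fi
          sleep 30
        done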

Timeout jobs

end at -:-- 32:52 27:02 26:02; skipped - unique ids - problem: you can’t know the job run id of the workload job - they can be instantiated multiple times via workflow calls - timeout job does not (and cannot) express any dependency on the workload job - in other words, the workload job and the timeout job are just two jobs

Uniquely identifying jobs

end at --:-- --:-- 27:50 26:41; skipped - solved: tie them together with the uuidv4 generated in the decision job - in the friendly / display `name` of the job. yes, really, we string-match
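
a sketch of the string-match - the decision job’s uuidv4 goes into the workload’s display name, and the timeout job greps the “list jobs for a workflow run” response for it:

workload:
  needs: decision
  name: workload [${{ needs.decision.outputs.unique-id }}]
  runs-on: reserved-for:${{ needs.decision.outputs.unique-id }}
  steps:
    - run: echo "the real job goes here"
timeout:
  needs: decision
  runs-on: ubuntu-latest
  env:
    UNIQUE_ID: ${{ needs.decision.outputs.unique-id }}
    GH_TOKEN: ${{ github.token }}
  steps:
    - run: |
        # resolve the workload's job run id by string-matching its
        # display name, then check its status as before
        gh api "repos/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID/jobs" \
          --paginate --jq \
          ".jobs[] | select(.name | contains(\"$UNIQUE_ID\")) | .id"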

Uniquely identifying jobs

end at --:-- --:-- 31:59 30:45; skipped - tokenless api - with the monitor api, the workflow now needs a secret - otherwise anyone could deny service by reserving all of the runners - so while the workflow no longer needs a privileged GitHub API token, we’ve got the same problem, just in a different place

Tokenless API

end at --:-- --:-- 33:06 31:34; skipped - we can publish an artifact representing the request - this is unforgeable!
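
a sketch: the request rides on a workflow artifact, which only a genuine run in the repo can produce, so the monitor can trust it without the workflow holding any secret (artifact name and contents here are illustrative):

decision:
  runs-on: ubuntu-latest
  steps:
    - run: echo "runner-profile=linux" > reservation-request.txt
    - uses: actions/upload-artifact@v4
      with:
        name: reservation-request
        path: reservation-request.txt
# the monitor polls recent workflow runs for artifacts with this
# name, then reserves a runner and applies the label itself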

Tokenless API

end at --:-- --:-- 35:17 34:02; skipped

Global queue





Runner images

(skip)
end at --:-- --:-- 37:27 36:17; 19:26

Runner images

end at --:-- --:-- 38:45 37:17

Runner images

end at --:-- --:-- 39:41 37:56

Linux runners

#!/bin/sh
# first boot: configure the image, mark it built, power off;
# every later boot: act as a runner
if ! [ -e built ]; then
    # Image config
    touch built
    poweroff
else
    # Runner boot
    :
fi
end at --:-- --:-- 40:57 38:58

Windows runners

end at --:-- --:-- 43:04 40:34

macOS runners

end at --:-- --:-- 47:03 44:34; 24:37

Future directions





github.com/servo/ci-runners

Slides: go.daz.cat/3tdhp

Transcript: go.daz.cat/2ra8x