The Fellowship of the Smart Things
Taming the dragon (house) with orchestrated agents.
- home-infra
- home-assistant
- agents
- ai
Actual footage of Smaug.
When I moved into my house, things started breaking. Constantly. The brand-new top-tier Miele dishwasher bricked itself after two weeks, and I ate off paper plates for the next month-plus while a tech tried and failed to fix it multiple times, each time shipping a part from Germany. A contractor sloppily installed a bathroom fan vent that would’ve dumped moist air into the attic and rotted out the framing if I hadn’t accidentally bumped into it while installing flooring. The crawlspace had a $20,000 flooding problem that got missed at inspection. And on and on, ad nauseam. The house had a lot of pipes, and I’d heard plenty of horror stories about six-figure insurance claims from leaks.
I told a friend I’d bought a dragon, Smaug, who was grumpy, hoarded my treasure, and randomly spit fire at my shit at least three times a day 🤦‍♂️.
I managed Smaug the best way I knew how: As a distributed system. I added canaries. 16 humidity sensors. One for every room with multiple in the attic and crawlspace. 14 leak sensors under every sink, every toilet, fridge, laundry, crawlspace. 49 motion sensors to manage 124 lights. Smart smoke, CO, and air-quality alarms that would alert my phone. North of 200 sensors all in, all relayed to a Home Assistant instance running on a Talos Kubernetes cluster in the basement gym (a spot chosen because it’s nice and cool down there for the servers).
The gym’s CO₂ + temp + humidity monitor. $60 on Amazon.
It worked, and…I very quickly found myself in a maintenance overhead nightmare. 50,000 lines of Home Assistant code to manage all the automations and config. I refactored it the best I could. I built a custom Jinja template engine that was DRY and rendered out the full YAML config. I had a staging environment. Backups. Automated deployments and rollbacks with Ansible.
But it was still just hours of extra work every week. I joked to my uncle (a contractor) that I thought I would save time by managing the house with code, but I in fact spent more time managing all the bugs in the code.
Enter The Fellowship.
I was already solving AI orchestration at work. So I did the obvious thing: I assembled a dragon-taming team 🔥.
The Fellowship has eighteen members and counting. It runs every part of my Home Assistant deployment, and can reach every smart device in the house directly when it needs to.
The Shire
A quick frame, because the rest of the post depends on it.
The Shire is my Home Assistant config repo. HA itself runs on a Talos Linux Kubernetes cluster and serves the dashboards I actually use day to day. The config is generated from Jinja2 templates by Ansible, then deployed into the cluster. Nothing handwritten lives in config/generated/ — that directory is output, not source.
Over 350 tests cover the templates: 336 unit and integration tests across ~31 pytest files, plus a 21-test Playwright e2e suite that drives the live dashboard. A predeploy gate refuses to ship if the pytest suite is unhappy; the Playwright suite runs after deploy. The repo is encrypted at rest with git-crypt, and a 1Password-backed unlock script wires the key in for new clones.
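Treating config/generated/ as output is what makes a drift check possible: regenerate into a scratch directory and compare it with what is committed. The real gate is a shell script whose contents aren’t shown here, so the following is only a minimal Python sketch of the comparison step. The function names tree_digest and find_drift are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to a SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_drift(committed: Path, regenerated: Path) -> list[str]:
    """List files that were added, removed, or changed between the two trees."""
    old, new = tree_digest(committed), tree_digest(regenerated)
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

A gate built on this would abort the deploy whenever find_drift returns a non-empty list. Hashing contents, rather than comparing sizes or timestamps, keeps the check independent of when the files were written.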
Gandalf — the orchestrator
Gandalf at work. Rivendell Brew within reach.
Gandalf (shire-dev-agent) is the orchestrator. The One Agent to Coordinate Them All. Every request enters through him. He doesn’t usually do the work himself — he dispatches the rest of the Fellowship, relays evidence between them, resolves disagreements, and synthesizes a single coherent answer for me at the end.
He’s pinned to Claude Opus for the deeper reasoning work, has a higher turn budget than a normal skill (80 turns), and carries persistent project memory across sessions. He remembers the zone inventory, the macro patterns, the entity registry, the bug log, and the feature flag state — so the second session in a given area is always faster than the first.
The handoff protocol is boring on purpose:
    Context: <what David asked and the relevant repo/runtime facts>
    Question/Task: <bounded specialist task>
    Constraints: <files owned, guardrails, invariants, tests required>
    Return: findings, changed files, verification run, remaining risks

When two specialists disagree, Gandalf doesn’t average them. He asks the agent who owns the disputed layer to re-check using the other agent’s evidence, then he calls it.
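The four-part handoff template is easy to model as a small data structure, which keeps every dispatch in the same shape. This is an illustrative sketch only: the Handoff class, its field names, and the example values are hypothetical, not the orchestrator’s actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """One specialist dispatch, mirroring the four-part handoff template."""

    context: str
    task: str
    constraints: list[str] = field(default_factory=list)
    # What the specialist is expected to send back, per the template.
    returns: tuple[str, ...] = (
        "findings",
        "changed files",
        "verification run",
        "remaining risks",
    )

    def render(self) -> str:
        """Render the handoff as the plain-text message a specialist receives."""
        if not self.task.strip():
            raise ValueError("a handoff needs a bounded task")
        constraints = "; ".join(self.constraints) or "none stated"
        return "\n".join(
            [
                f"Context: {self.context}",
                f"Question/Task: {self.task}",
                f"Constraints: {constraints}",
                f"Return: {', '.join(self.returns)}",
            ]
        )


message = Handoff(
    context="Hall lights ignore motion at night",
    task="Check the last trace for the hall motion automation",
    constraints=["read-only"],
).render()
```

Rejecting an empty task at render time is one cheap way to enforce the "bounded specialist task" part of the protocol.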
The Core Four
These four touch nearly every change.
Aragorn (The Inspector) is the ranger. He talks to a running Home Assistant over REST and WebSocket — entity state, recent logs, automation traces, the integrations panel — and produces evidence. He doesn’t write code. He answers “what is actually true in the running system right now?” Every other agent gets quieter and more correct when Aragorn goes first.
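For a sense of what "evidence" means at the wire level, Home Assistant’s REST API exposes each entity’s current state at /api/states/<entity_id>, authenticated with a long-lived access token sent as a Bearer header. A minimal sketch using only the standard library follows. The helper names, base URL, and token are placeholders, and this is not Aragorn’s actual implementation.

```python
import json
import urllib.request


def state_request(base_url: str, token: str, entity_id: str) -> urllib.request.Request:
    """Build a GET for Home Assistant's /api/states/<entity_id> endpoint."""
    url = f"{base_url.rstrip('/')}/api/states/{entity_id}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="GET",
    )


def fetch_state(base_url: str, token: str, entity_id: str, timeout: float = 5.0) -> dict:
    """Return the entity's state object as a dict (state, attributes, timestamps)."""
    request = state_request(base_url, token, entity_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder values; fetch_state needs a reachable instance and a real token.
request = state_request("http://homeassistant.local:8123/", "TOKEN", "light.hall")
```

Splitting request construction from the network call keeps the read-only part easy to inspect and test without a running instance.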
Sam (The Builder) is the steadfast builder. He writes the Jinja2, YAML, packages, blueprints, and macros. His brief is clean, DRY, reusable — he factors macros, names invariants, and refuses to copy-paste a fourth time. The actual implementation work in the repo is mostly Sam’s.
Legolas (The Designer) is the keen-eyed one. He owns the dashboards — Bubble Card configs, Lovelace layouts, section/view structure, tablet sizing, all the visual polish. The reason the wall tablet doesn’t look like a Home Assistant tutorial screenshot is Legolas.
Galadriel (The Ideator) is the Lady of Lothlórien, keeper of the Mirror. She doesn’t write automations. She mines HA’s logbook and recorder history for patterns I’m doing manually — “you turned off these three lights in this order four nights in a row” — and writes ranked Mirror Readings of new-automation candidates and missing-device gaps. Her output is a report. Implementation gets routed to Sam or one of the specialists.
The Specialists
The rest of the Fellowship is the ecosystem layer — one specialist per vendor cloud or local API. The shape is the same in each case: they own the bridge or cloud directly, separate from the HA integration, so when the HA side is out of sync with reality, somebody can go check reality.
- Eärendil (Speaker of Hue) is the bearer of the Silmaril. He talks to the Philips Hue CLIP v2 API directly — scenes, behavior instances, and accessory bindings.
- Nimrodel (Keeper of the Cool) is the cool stream of Lothlórien. She owns SleepMe Dock Pro bed cooling and cloud-vs-LAN connectivity triage. The reason the bed is the right temperature at 11pm.
- Faramir (Watcher of the Bell) is the Ranger of Ithilien, watcher of the borderlands. He owns Ring cameras, doorbells, chimes, and recording pulls. He can pull footage from any camera, enable and disable cameras, even assemble a multi-camera grid to correlate footage across multiple angles.
- Radagast (Steward of Strays) is the Wizard of Rhosgobel, friend to birds and beasts. He owns Luke and Leia’s Petlibro feeders and their meals. 🐈
- Gimli (Anchor of Power) is the son of Glóin. He holds the line with deep reserves — Anker Solix portable power and outage-capacity triage.
- Éomer (Marshal of the Switch) is the Third Marshal of the Riddermark, swift cavalry. He owns TP-Link Kasa smart plugs and switches over LAN via python-kasa.
- Lúthien (Choirmaster) is Tinúviel, singer of the First Age. She owns Sonos over local UPnP via SoCo — grouping, queue, EQ.
- Glóin (Bellows-keeper) is the dwarven smith of Erebor, keeper of clean air. He owns VeSync purifiers and filter life, and coordinates allergy mode.
- Arwen (Glow-keeper) is Undómiel, the Evening Star. She owns Govee lights and sensors via the Developer API.
- Treebeard (Layer-wright) is the eldest of Fangorn, as slow and deliberate as a print going down layer by layer. He owns the Bambu Lab printer over local MQTT.
- Beregond (Smokewarden) is the Guard of the Citadel. He sounds the alarm when fire comes — X-Sense smoke and CO detectors, and the alerting templates that route life-safety events.
- Fëanor (Vesselsmith) is the Spirit of Fire, greatest craftsman of the Noldor, who once captured the light of the Two Trees in vessels. He owns Project Snow Moon ❄️🌙: the EcoFlow whole-home battery backup system, an 18-battery Delta Pro Ultra stack (~111 kWh).
- Shadowfax (Errand-rider) is the only Fellowship member whose domain extends past the house itself. He talks to my Model Y through the Tesla Fleet API — charging, climate pre-conditioning, locking, navigation. When the car needs to coordinate with the house (don’t charge during peak, pre-cool before I leave), he’s the one Gandalf calls.
Guardrails
Eighteen agents with write access to a config repo is a lot of trust to hand out, and I have not earned it. So the guardrails are aggressive.
- Source of truth is templates, not output. Never edit config/generated/. Sam will refuse. The pre-commit hook checks regeneration is clean.
- Block-scoping check. Any Jinja2 template that uses {% extends %} must keep all content inside a {% block %}. Anything between the extends and the first block (besides set/import/comments) is the #1 source of dashboard config errors. The preflight greps for it and fails loudly.
- make test is mandatory after every template change. Not optional. Not “fix it later.” If tests fail, stop.
- Predeploy gate before any deploy. scripts/predeploy-gate.sh re-runs the test suite, then regenerates the config into a tmp directory and diffs it against config/generated/. If anything has drifted, deploy aborts. Bypass exists (SKIP_PREDEPLOY=1) and is loud about it; it is used only for emergency reverts.
- No direct peer-to-peer chat. Specialists communicate through Gandalf. This sounds bureaucratic, and it is, but it’s the thing that keeps a multi-agent system from devolving into Telephone.
- Publish convention. Shire pushes to main, no PRs, no topic branches unless I explicitly ask. The repo stays branch-light by default.
- git-crypt covers nearly everything. Before creating an encrypted path, .gitattributes has to cover it first.
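The block-scoping rule is mechanical enough to check with a few regular expressions. The real preflight is described as a grep, so treat this as a hedged Python equivalent: the function name stray_content and the sample templates are hypothetical, and the sketch deliberately ignores the multi-line {% set %}...{% endset %} form.

```python
import re

# Tags are matched with optional whitespace-control dashes, e.g. {%- extends ... -%}.
EXTENDS = re.compile(r"\{%-?\s*extends\b.*?-?%\}", re.S)
BLOCK = re.compile(r"\{%-?\s*block\b")
ALLOWED = re.compile(
    r"\{#.*?#\}"  # comments
    r"|\{%-?\s*(?:set|import|from)\b.*?-?%\}",  # inline set, import, from-import
    re.S,
)


def stray_content(template: str) -> str:
    """Return what sits between extends and the first block, minus allowed tags."""
    extends = EXTENDS.search(template)
    if extends is None:
        return ""  # the rule only applies to child templates
    first_block = BLOCK.search(template, extends.end())
    end = first_block.start() if first_block else len(template)
    gap = template[extends.end():end]
    return ALLOWED.sub("", gap).strip()


ok = (
    '{% extends "base.j2" %}\n'
    "{# zone settings #}\n"
    '{% set zone = "hall" %}\n'
    "{% block cards %}- type: x{% endblock %}\n"
)
bad = '{% extends "base.j2" %}\n- type: markdown\n{% block cards %}{% endblock %}\n'
```

A preflight built on this would fail whenever stray_content returns a non-empty string, and could print that string as the error message.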
Invariants
A subset of the guardrails got promoted into a separate file — .grace/invariants.md — because they came from real bugs and I never want to debug them twice. Each entry is a one-line rule, plus the context that earned it. A taste:
- Never chain homeassistant.turn_on after scene.turn_on on Hue bulbs. The follow-up turn_on restores the bulb’s last state and silently clobbers the scene. Symptom: motion at night fires the warm nightlight scene, then immediately overrides it with full-brightness cool white. The fix is stop helping: scene.turn_on already turns the lights on.
- The motion blueprint’s targets_selected must include lights in the nightlight path. All-light zones have no non-light targets. Strip lights from the list to “avoid double-firing with the scene” and motion silently fails to turn anything on at night.
- Applying a manual scene to a zone must disable that zone’s motion and nightlight booleans. Otherwise the motion automation reasserts itself on the next event and the user experiences this as “the lights keep changing by themselves.”
- Nightlight scenes must be turtle-safe: ≤2700K. 🐢 I keep turtles, and light cooler than 2700K disrupts their biology. A nightlight scene that drifts cool (because someone tweaked it in the Hue app) runs every night in the turtle-adjacent zones without anyone noticing until the animals start showing signs of stress. Hard cap, not a soft preference. Verified via the Hue CLIP v2 API before shipping.
- Bubble sub-button chrome suppression has to target the sub-button’s shadow root, not the outer card. Bubble Card auto-promotes any sub-button with an input_select entity into a dropdown picker and paints two layers of chrome: Bubble’s own, and Home Assistant’s ha-card:host rules. Per-sub-button styles: blocks reach both. Card-level styles don’t cross the shadow DOM and won’t.
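The first invariant, no homeassistant.turn_on after scene.turn_on, can be checked statically against an automation’s action list. The sketch below is one possible lint, not the repo’s actual test. It assumes the actions are already parsed into dicts, only looks at a flat sequence (it does not descend into choose/if branches or compare targets), and uses hypothetical entity names.

```python
def turn_on_after_scene(actions: list[dict]) -> list[int]:
    """Return indexes of homeassistant.turn_on steps that come after a scene.turn_on."""
    seen_scene = False
    hits: list[int] = []
    for index, step in enumerate(actions):
        # Newer Home Assistant YAML uses `action:`; older configs use `service:`.
        name = step.get("action") or step.get("service") or ""
        if name == "scene.turn_on":
            seen_scene = True
        elif name == "homeassistant.turn_on" and seen_scene:
            hits.append(index)
    return hits


clobbering = [
    {"action": "scene.turn_on", "target": {"entity_id": "scene.hall_nightlight"}},
    {"action": "homeassistant.turn_on", "target": {"entity_id": "light.hall"}},
]
safe = [
    {"service": "scene.turn_on", "target": {"entity_id": "scene.hall_nightlight"}},
]
```

Because it ignores targets, this version can flag a later turn_on aimed at an unrelated device. That errs on the side of a false alarm rather than a clobbered scene.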
Every one of those rules represents an evening I’m not getting back. They live next to the code so the next agent — human or otherwise — doesn’t repeat the trip.
Tests
336 unit and integration tests across ~31 pytest files, plus 21 Playwright e2e tests against the live dashboard. Highlights:
- Golden-snapshot tests. Render every template, compare against a checked-in snapshot. Intentional changes are accepted with make golden-update. Unintentional changes fail loudly.
- Entity-reference audit. Walks every generated YAML, extracts referenced entities, diffs against a fixture of known-good HA entities. Catches typos and renames before deploy.
- Bubble convention tests. show_state defaults, entity_id emission, card-id uniqueness, width regressions on the tablet view.
- Mode-exclusivity and motion-lookback tests. The motion system has enough state to be subtle: home_mode_interrupts, motion_ttl_selects, allergy_mode_hepa, away_home_activity. Each invariant about who-wins-when has a test.
- HEPA/indoor-camera automation gates. The privacy-sensitive automations (when do cameras run, when does HEPA bump to max) have explicit allow/deny gate tests.
- Hue dimmer target tests. Validates which scene each dimmer button maps to in each zone. Bugs here are infuriatingly subtle in real life.
- Playwright e2e against the live dashboard. Loads the deployed dashboard, clicks through the actual cards, asserts the DOM didn’t drift. Not part of the predeploy gate; it runs after deploy to catch runtime-only regressions.
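The entity-reference audit amounts to a recursive walk plus a set difference. Here is a minimal sketch, assuming the YAML has already been parsed into dicts and lists (for example with a YAML loader). The function names, the choice of keys, and the sample config are illustrative, and references inside template strings or bare string lists under entities: are not covered.

```python
from collections.abc import Iterator

# Keys assumed to hold entity references: `entity_id` in automations and
# service targets, `entity` in dashboard cards.
ENTITY_KEYS = {"entity_id", "entity"}


def referenced_entities(node: object) -> Iterator[str]:
    """Yield every entity ID found under an entity key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in ENTITY_KEYS:
                if isinstance(value, str):
                    yield value
                elif isinstance(value, list):
                    yield from (item for item in value if isinstance(item, str))
            else:
                yield from referenced_entities(value)
    elif isinstance(node, list):
        for item in node:
            yield from referenced_entities(item)


def unknown_entities(config: object, known: set[str]) -> list[str]:
    """Return referenced entity IDs that are missing from the known-good set."""
    return sorted(set(referenced_entities(config)) - known)


config = {
    "automation": [
        {
            "trigger": {"platform": "state", "entity_id": "binary_sensor.hall_motion"},
            "action": [
                {
                    "action": "light.turn_on",
                    "target": {"entity_id": ["light.hall", "light.hal_lamp"]},
                }
            ],
        }
    ],
    "views": [{"cards": [{"type": "custom:bubble-card", "entity": "light.hall"}]}],
}
known = {"binary_sensor.hall_motion", "light.hall", "light.hall_lamp"}
```

Here unknown_entities(config, known) reports the misspelled light.hal_lamp, which is exactly the kind of typo the audit is meant to catch before deploy.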
I lean hard on Given / When / Then comments in the integration tests. The setup, the action, and the assertion get their own labeled blocks. If a test doesn’t fit that shape, it should be a unit test or it should be rewritten until it does.
Why Lord of the Rings
Back to Smaug.
The Fellowship name started as a one-line joke. I was telling someone about the agent I’d just built to check on the Hue bridge directly when our connection to it was flaking, and I called it “basically Gandalf for the appliance dragon.” The next agent — the one that reads HA’s runtime state and reports back evidence — became Aragorn the same week, because that’s what a ranger does. By the time I had three of them, the bit had legs.
It kept going for one real reason: the names are functional. When Gandalf says “Aragorn confirms the Hue bridge is unreachable on IPv4 but answering on IPv6; Eärendil, can you re-check the scene state from the bridge side,” I know exactly who’s doing what without parsing six shire-foo-agent strings. The roles compress into single-token identities. Aragorn means runtime evidence. Sam means careful implementation. Legolas means the dashboard. Galadriel means what should we automate next. Fëanor means the energy stack, and the name does some of the dramatic lifting his job genuinely deserves.
It’s also a small act of resistance against the corporate-AI naming convention, which seems to want every agent to be CompanionAssistantPro™. The Fellowship doesn’t sound like a product. It sounds like a team. That’s the right frame.
And, to bring it back, the Fellowship exists to fight Smaug. Smaug is the entropy in the house. He’s Mayhem from Allstate: the surge that fried the dishwasher control board, the Hue bridge that’s somehow on the wrong IP again, the Zigbee channel that won’t stay quiet, the Sonos group that splits at random. Shit breaks. Second law of thermodynamics. Specialists run diagnostics. A wizard coordinates the response. Most nights the lights come on warm and the bed is the right temperature and the cats get fed, and that’s the win condition.
What’s next
A few things in the queue:
- More work for Galadriel. Her Mirror Readings keep surfacing automations I’d never have thought to ask for. I want to run her on a schedule and let her file candidates into a backlog Sam can pick from.
- A real review loop. Right now Gandalf does single-pass dispatch. I want a small “council” step where another instance grades the diff before it merges into main. Not a full PR review (the repo doesn’t use PRs), but a structured second look.
- Snow Moon arbitrage. Fëanor knows the panel state and the battery SOC, but the bills-vs-resiliency math is still mostly in my head. Moving that into a template he can reason about is the next big one.
- An evict-Smaug dashboard. Aragorn and the ecosystem specialists already gather most of the evidence. A single view that surfaces “what in this house is misbehaving right now” — instead of waiting for me to notice — is the obvious end state.
Eighteen agents is a lot. It’s also the right number, because the house genuinely has that many surfaces and I want each one inspected by something that knows only that surface deeply. The orchestrator keeps it coherent. The guardrails keep it safe. The invariants keep the bugs from coming back. And the names — bless them — keep it personable and fun.
Fly, you fools! 🧙