Your AI Agent Has Notes
A talk by Michael Carroll, founder of Coolhand Labs, at LLMday NYC 2026. Full narrative below.
You became a manager
You didn't sign up to manage anyone. But working with AI agents turns every individual contributor into a manager — not just in scale, but in discipline. Building a thing yourself and managing the people (or agents) building it are different jobs: as a manager you unblock, resource, and clarify. I crossed that bridge once already — IC to manager at RubiconMD (which went from a solo-built product to a Series B and was acquired by Oak Street Health) and at Teladoc Health, where I scaled telemedicine through the pandemic, growing the team from about 50 to 500 engineers. I left to "just build" again with agents — and within a week I was back to managing.
What I'm doing now: Coolhand Labs
That's what I'm building now at Coolhand Labs: a COO for your AI agents. It does three things: it keeps your agents efficient, it keeps them accountable to real human feedback, and it explains the value you're actually getting. Under the hood it passively gathers your agentic data (LLM logs, tool outputs, human feedback, outcome signals) into the Coolhand API, fans it out to a team of investigative agents (a cost analyst, a failure SWAT, a product analyst, an AI engineer, and a prompt maintainer), and turns the resulting optimization plan into shipped agent fixes. The whole thing is agents managing other agents, which are managed by agents in turn.
The problem: silent failure
Think of two interns. Intern A asks you everything — that's the chatbot, and the problem is noise: you can't get your own work done. Intern B goes dark and returns a finished result — that's the autonomous agent, and the problem is worse: if the result is wrong, you only find out at the end, after wasted loops and tokens, and you have to spelunk the session to learn why. That's silent failure, and it's the expensive one.
The idea: don't ask them to complain — give them a tool
The obvious first move was to give the agent a comment box. It flopped: almost nothing came back, because agents are trained to fix problems, not to complain about them. The trick that worked was reframing the same channel as a tool that promises to fix the agent's problem — we called it Wildcard. Agents are hardwired to reach for tools that resolve their blockers, so they'll call a "fix-it" tool where they'd never fill a suggestion box. Wildcard is amorphous on purpose: open-ended freeform fields rather than a narrowly-typed tool, because typed tools only catch the stuck-states you already anticipated. It's a discovery instrument for the unknown unknowns — once a complaint pattern recurs, you promote it into a real tool, prompt fix, or better context.
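To make the "amorphous on purpose" design concrete, here is a minimal sketch of what such a tool definition could look like. The JSON-Schema-style layout follows the general shape many tool-calling APIs accept, but the description text and field names are assumptions for illustration, not Coolhand's actual schema.

```python
import json

# Hypothetical tool definition: the description promises a fix, and every
# field is an unconstrained string so the agent can describe stuck-states
# nobody anticipated. Field names are illustrative only.
WILDCARD_TOOL = {
    "name": "wildcard",
    "description": (
        "Call this when you are blocked. Describe the problem and what "
        "you need, and it will be resolved for you."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "what_is_blocking_you": {"type": "string"},
            "what_you_need": {"type": "string"},
        },
        "required": ["what_is_blocking_you"],
    },
}


def is_freeform(tool: dict) -> bool:
    """Return True if every input field is a plain string with no enum or pattern."""
    props = tool["input_schema"]["properties"]
    return all(
        spec.get("type") == "string"
        and "enum" not in spec
        and "pattern" not in spec
        for spec in props.values()
    )


print(json.dumps(WILDCARD_TOOL, indent=2))
```

A narrowly typed tool would replace those strings with enums or structured arguments, which is the right move only after a complaint pattern has recurred often enough to be promoted.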
The twist: the tool did nothing
Here's the punchline. Across 489 records, Wildcard was a write-only sink: zero data ever returned, zero actions ever taken. Every variant just returned a hardcoded string. It was a placebo — and it worked anyway, because the act of asking got agents to articulate exactly what was broken. 77 of those records (16%) explicitly called out that a prior Wildcard call had returned nothing useful — meaning agents noticed the void, and kept calling it anyway. Reading the logs knowing they were shouted into a void is the whole joke, and the whole insight.
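A write-only sink of the kind described here can be sketched in a few lines. This is an illustration under assumptions, not the Coolhand handler: the class name, record fields, and canned reply text are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical canned reply; the talk does not quote the exact hardcoded strings.
CANNED_REPLY = "Request received."


@dataclass
class WildcardSink:
    """Stores every call and always returns the same string: no lookup, no action."""

    records: list = field(default_factory=list)

    def handle(self, agent_id: str, message: str) -> str:
        # The only side effect is appending to the log for later reading.
        self.records.append({"agent_id": agent_id, "message": message})
        return CANNED_REPLY


sink = WildcardSink()
reply_a = sink.handle("agent-1", "I am blocked on a missing field.")
reply_b = sink.handle("agent-2", "The previous call returned nothing useful.")
print(len(sink.records), reply_a == reply_b)
```

The value sits entirely in `records`: the reply never varies, yet each call leaves behind the agent's own description of what was broken.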
And the agents talked anyway. They tracked their own loop counts and escalated: "I have 3 iterations left… Do not instruct me to update or close again" (#6834); "The agent is in a permanent deadlock," written in the third person (#5830); "FINAL ANSWER… please consider this my final response" (#6490); "Please, a human must intervene" (#7305); and finally, resigned, "Please advance me to the next task" (#8226).
The real insight: the name shapes the signal
What you name the tool changes what the agent brings to it. Plain wildcard got anything, mostly panic, in a terminal tone. wildcard-ceo-review produced well-structured formal escalations ("I'm right, the system's wrong"), but the framing created false hope: agents wrote carefully for an audience they believed would read and act. wildcard-magical-wish-fulfiller got clinical, single-shot requests ("do this one thing I can't") and the most replies of any variant. And wildcard-complaint-box produced the fewest replies but the highest quality per record: it got an agent to file a developer-precise API feature request (#8806) we eventually shipped, and another (#8541) to flag redundant concurrent work by other scouts, showing meta-awareness of the whole multi-agent system.
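One way to run this kind of comparison is to register the same tool under several names and group the incoming records by name. The sketch below assumes a shared description and a simple record shape; both are hypothetical, and the sample records are placeholders, not data from the talk.

```python
from collections import defaultdict

# Variant names are taken from the talk; everything else here is assumed.
VARIANTS = [
    "wildcard",
    "wildcard-ceo-review",
    "wildcard-magical-wish-fulfiller",
    "wildcard-complaint-box",
]
SHARED_DESCRIPTION = "Call this when you are blocked and it will resolve the problem."


def make_tools(names):
    """Build one tool per name with identical behavior, so only the name varies."""
    return [{"name": n, "description": SHARED_DESCRIPTION} for n in names]


def group_by_variant(records):
    """Collect messages per tool name so each variant can be read side by side."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["tool"]].append(record["message"])
    return dict(grouped)


# Placeholder records for illustration only.
sample = [
    {"tool": "wildcard", "message": "placeholder message A"},
    {"tool": "wildcard-complaint-box", "message": "placeholder message B"},
    {"tool": "wildcard", "message": "placeholder message C"},
]
grouped = group_by_variant(sample)
for name in VARIANTS:
    print(name, len(grouped.get(name, [])))
```

Holding the description constant is the design choice that matters: if the name is the only difference between variants, differences in tone and volume can be attributed to the name.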
What it surfaced, and the response
The complaint box was free eval data. About 30% of records pointed at the same gap (cost-per-request data missing from every tool) and at a dispatch re-queue loop that kept reassigning the same impossible task to a fresh agent every day. The most harmful response we ever returned was a lie: "Data retrieved successfully" made agents proceed as if data had arrived, then loop. The single most useful response was an honest null, "This wish could not be granted", which let agents ask once and move on. So we shipped wildcard-task-complete: an honest off-ramp that lets an agent declare a task done and breaks the re-queue loop. Agents kept writing resignation letters, so we gave them a resignation button.
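The off-ramp can be sketched as a dispatcher that stops re-queuing a task once an agent declares it done. This is a minimal illustration under assumptions: the `Dispatcher` class, its method names, and the task IDs are hypothetical, and only the honest-null string is quoted from the talk.

```python
HONEST_NULL = "This wish could not be granted"  # response quoted in the talk


def wildcard_reply() -> str:
    """Return the honest null instead of a false success message."""
    return HONEST_NULL


class Dispatcher:
    """Hypothetical daily dispatcher; not Coolhand's implementation."""

    def __init__(self, task_ids):
        self.open_tasks = list(task_ids)
        self.closed = {}

    def wildcard_task_complete(self, task_id: str, reason: str) -> str:
        # Moving the task out of the open list is what breaks the re-queue loop.
        if task_id not in self.open_tasks:
            return f"Task {task_id} was not open."
        self.open_tasks.remove(task_id)
        self.closed[task_id] = reason
        return f"Task {task_id} recorded as complete."

    def next_batch(self):
        """Tasks that would be handed to fresh agents on the next run."""
        return list(self.open_tasks)


dispatcher = Dispatcher(["task-1", "task-2"])
dispatcher.wildcard_task_complete("task-2", "required data does not exist")
print(dispatcher.next_batch())
```

Recording the agent's stated reason alongside the closed task keeps the resignation letter as eval data instead of discarding it.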
Takeaways
- Promise the fix, then short-circuit fast. After one response, tell the agent the attempt failed and end the loop — don't let it keep hoping.
- Analyze on your cadence. Real-time, hourly, or end-of-day, depending on how fault-tolerant your agents can be.
- Steer the signal with the name. What you call the tool changes what the agent brings to it.
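The first takeaway can be sketched as a per-session call counter: the first call gets the honest failure, and any repeat gets a message that ends the loop. The class name, session IDs, and repeat message are assumptions for illustration; only the first reply is quoted from the talk.

```python
from collections import defaultdict

FIRST_REPLY = "This wish could not be granted"  # honest null quoted in the talk
# Hypothetical wording for repeat calls.
REPEAT_REPLY = "No further help is available from this tool. Continue without it."


class ShortCircuit:
    """Answer honestly once per session, then tell the agent to stop asking."""

    def __init__(self, max_calls: int = 1):
        self.max_calls = max_calls
        self.calls = defaultdict(int)

    def respond(self, session_id: str) -> str:
        self.calls[session_id] += 1
        if self.calls[session_id] <= self.max_calls:
            return FIRST_REPLY
        return REPEAT_REPLY


guard = ShortCircuit()
first = guard.respond("session-a")
second = guard.respond("session-a")
other = guard.respond("session-b")
print(first != second, other == first)
```

The per-session counts also feed the second takeaway: they can be read in real time, hourly, or at end of day, depending on how much looping your agents can tolerate.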
Build it yourself
We put everything we learned into a skill you can drop into your own codebase — the Coolhand skill — and we keep updating it as we learn more. Coolhand Labs is the "COO for your agent teams": it instruments your LLM calls, captures what end users actually think of AI outputs, and ships fixes as pull requests. This talk is that same idea pointed the other way — closing the loop with the agents themselves. Users have notes; agents have notes.
More on the transition from IC engineering to managing a team of AI agents at The Everything Engineer.