Case Study · Proof of Play
I built a Claude Code skill marketplace so every engineer could run sophisticated AI workflows (the full build-test-review-ship loop, live business reports, and balance simulations) without reinventing them or blowing their context window.
AI leverage was siloed. The most useful workflows (a rigorous build-and-review loop, reports that read live business data, a combat balance simulator) lived in one person's head or one person's prompt history. Everyone else either reinvented them, used a weaker version, or pasted so much context into a single session that the model lost the thread. The result was inconsistent quality and a ceiling on how much the team could actually get out of AI.
The fix could not be “one giant mega-prompt.” It had to be modular: a shared library of focused skills that any engineer could install on demand, run the same way every time, and compose, without each one carrying the weight of all the others.
A Claude Code plugin marketplace for the company: a curated set of agentic skills, organized into suites, that an engineer installs from a single source. You pull in only what your task needs, so your context window stays clean and the workflow runs identically for everyone. Four suites cover the work:
The everyday engineering loop as one command. /dev runs the full arc: clarify the task, plan it, critique the plan, implement, test, review, open the PR, and triage feedback. Focused skills sit underneath it: review-local, review-pr, review-triage, scrutinize, adversary, draft-pr-ticket, sync-base, fix-line. One engineer plus this loop ships like a small team.
Reports that used to be manual analyst work, now one command each, reading straight from the systems of record: crash reports (Sentry + Amplitude), desync reports, daily summaries, push-notification performance (Braze + Amplitude), app-store rating reports (stores + Sentry + Linear), paid-marketing and growth reports, guild-chat analysis, and API-latency reports. An allreports orchestrator runs the whole briefing.
Designer-facing skills built on the simulation work: balancelab (a what-if combat simulator), balancelab-benchmark (replays real player runs against new content), and unlocks documentation and auditing. Designers ask balance questions without pulling an engineer.
Audience-reframing skills like ceoify that turn dense engineering output into the brief a given reader actually needs.
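The /dev arc described above (work stages with critic and review gates between them) can be sketched as a small staged pipeline. This is an illustrative model only, not the skill's implementation: the real loop is a Claude Code skill, and every name here (Task, step, gate, run_dev_loop, DEV_LOOP) and both gate checks are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    description: str
    log: List[str] = field(default_factory=list)  # stages that have run, in order
    approved: bool = True  # a gate sets this to False to halt the loop


Stage = Callable[[Task], None]


def step(name: str) -> Stage:
    """Placeholder work stage: only records that it ran."""
    def run(task: Task) -> None:
        task.log.append(name)
    return run


def gate(name: str, check: Callable[[Task], bool]) -> Stage:
    """Gate stage: records that it ran, then approves or rejects the task."""
    def run(task: Task) -> None:
        task.log.append(name)
        task.approved = check(task)
    return run


def run_dev_loop(task: Task, stages: List[Stage]) -> Task:
    """Run stages in order and stop at the first gate that rejects."""
    for stage in stages:
        stage(task)
        if not task.approved:
            break
    return task


# Stage order follows the arc in the text; the gate checks are stand-ins.
DEV_LOOP: List[Stage] = [
    step("clarify"),
    step("plan"),
    gate("critic", lambda t: bool(t.description.strip())),  # hypothetical check
    step("implement"),
    step("test"),
    gate("review", lambda t: "test" in t.log),  # hypothetical check
    step("open_pr"),
    step("triage_feedback"),
]

done = run_dev_loop(Task("Fix login timeout"), DEV_LOOP)
print(done.log)
# ['clarify', 'plan', 'critic', 'implement', 'test', 'review', 'open_pr', 'triage_feedback']

rejected = run_dev_loop(Task("   "), DEV_LOOP)
print(rejected.log)
# ['clarify', 'plan', 'critic']
```

The point of the sketch is the shape: a gate that rejects stops the task before any later stage spends effort on it, which is why the critic sits before implementation and the review sits before the PR.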
The architecture is the point. A single marketplace holds the skills; each engineer installs only the suite or skill they need; the heavy context lives inside the skill, not in the person's session. New capability ships once and the whole team has it.
flowchart TB
  subgraph MKT["Shared skill marketplace (built by Ben)"]
    direction LR
    DEV["Dev loop<br/>/dev · review · triage · PR"]:::a
    REP["Operational reports<br/>crash · ratings · growth · paid"]:::b
    GD["Game-design tooling<br/>BalanceLab · benchmark · unlocks"]:::c
    CM["Comms<br/>audience reframing"]:::d
  end
  MKT ==>|"install only what you need"| ENG["Every engineer<br/>clean context window"]:::e
  ENG --> OUT["Faster loops · fewer regressions<br/>consistent quality · shared AI leverage"]:::out
  classDef a fill:#dbeafe,stroke:#2563eb,color:#0b3a8f;
  classDef b fill:#fef9c3,stroke:#ca8a04,color:#713f12;
  classDef c fill:#dcfce7,stroke:#16a34a,color:#14532d;
  classDef d fill:#fae8ff,stroke:#a21caf,color:#701a75;
  classDef e fill:#eef2ff,stroke:#6366f1,color:#1e1b4b;
  classDef out fill:#e0f2fe,stroke:#0284c7,color:#06425e;
  style MKT fill:#f0f9ff,stroke:#2563eb,stroke-width:2px;
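The install-on-demand idea in this architecture can be illustrated with a toy catalog lookup. This is a sketch under stated assumptions, not Claude Code's plugin or marketplace format: the suite keys, the MARKETPLACE dictionary, and the install function are all hypothetical, and only skill names mentioned in this case study are listed.

```python
from typing import Dict, List

# Hypothetical catalog: suite key -> skills named in this case study.
MARKETPLACE: Dict[str, List[str]] = {
    "dev-loop": [
        "dev", "review-local", "review-pr", "review-triage", "scrutinize",
        "adversary", "draft-pr-ticket", "sync-base", "fix-line",
    ],
    "operational-reports": ["allreports"],
    "game-design": ["balancelab", "balancelab-benchmark"],
    "comms": ["ceoify"],
}


def install(selection: List[str]) -> List[str]:
    """Resolve requested suite or skill names to the skills to load.

    Anything not requested stays out of the result, which is the
    'install only what you need' property.
    """
    all_skills = [s for skills in MARKETPLACE.values() for s in skills]
    installed: List[str] = []
    for name in selection:
        if name in MARKETPLACE:          # a whole suite was requested
            wanted = MARKETPLACE[name]
        elif name in all_skills:         # a single skill was requested
            wanted = [name]
        else:
            raise KeyError(f"unknown suite or skill: {name}")
        for skill in wanted:
            if skill not in installed:   # avoid duplicates across requests
                installed.append(skill)
    return installed


print(install(["comms", "balancelab"]))
# ['ceoify', 'balancelab']
```

A designer who asks for the comms suite plus balancelab gets two skills and none of the nine dev-loop skills, so the session carries only the context those two need.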
flowchart LR
  C[Clarify]:::s --> P[Plan]:::s --> CR[Critic]:::g --> I[Implement]:::s --> T[Test]:::s --> R[Review]:::g --> PR[Open PR]:::s --> TR[Triage feedback]:::good
  classDef s fill:#eef2ff,stroke:#6366f1,color:#1e1b4b;
  classDef g fill:#fef9c3,stroke:#ca8a04,color:#713f12;
  classDef good fill:#dcfce7,stroke:#16a34a,color:#14532d;
/dev loop: a single command walks a task from clarify to triaged PR, with critic and review gates built in.
The artifact is a skill marketplace; the capability is bigger. Most organizations adopting AI end up with a few power users and everyone else left behind, because the good workflows never get packaged and shared. What I built is the opposite: a shared AI operating model where leverage is captured once, distributed to everyone, and improved centrally. Standing that up (deciding what to package, designing the install-on-demand model, and getting a team to actually run on it) is a repeatable capability, not a one-off script. It is exactly the kind of system I would set up for a portfolio company that wants its whole team operating at the AI frontier, not just its loudest engineer.