benyoung.ai

Case Study · Proof of Play

A shared operating system of AI skills my whole team plugged into.

I built a Claude Code skill marketplace so every engineer could run sophisticated AI workflows (the full build-test-review-ship loop, live business reports, and balance simulations) without reinventing them or blowing their context window.

Role: Special Projects Lead · Internal platform · 4 plugin suites, ~30 skills · Adopted team-wide
A modular grid of glowing skill plugins connected out to every engineer's workstation
One shared marketplace of agentic skills; each engineer installs only the ones they need.

The problem

AI leverage was siloed. The most useful workflows (a rigorous build-and-review loop, reports that read live business data, a combat balance simulator) lived in one person's head or one person's prompt history. Everyone else either reinvented them, used a weaker version, or pasted so much context into a single session that the model lost the thread. The result was inconsistent quality and a ceiling on how much the team could actually get out of AI.

The fix could not be “one giant mega-prompt.” It had to be modular: a shared library of focused skills that any engineer could install on demand, run the same way every time, and compose, without each one carrying the weight of all the others.

What I built

A Claude Code plugin marketplace for the company: a curated set of agentic skills, organized into suites, that an engineer installs from a single source. You pull in only what your task needs, so your context window stays clean and the workflow runs identically for everyone. Four suites cover the work:

Dev loop

The everyday engineering loop as one command. /dev runs the full arc: clarify the task, plan it, critique the plan, implement, test, review, open the PR, and triage feedback. Focused skills sit underneath it: review-local, review-pr, review-triage, scrutinize, adversary, draft-pr-ticket, sync-base, fix-line. One engineer plus this loop ships like a small team.

Operational reports, wired to live data

Reports that used to be manual analyst work are now one command each, reading straight from the systems of record: crash reports (Sentry + Amplitude), desync reports, daily summaries, push-notification performance (Braze + Amplitude), app-store rating reports (stores + Sentry + Linear), paid-marketing and growth reports, guild-chat analysis, and API latency. An allreports orchestrator runs the whole briefing.
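A minimal sketch of how an orchestrator like allreports could fan out over individual reports. Everything here is an assumption made for illustration: the function names, the registry, and the choice to record a failed report instead of aborting the briefing are hypothetical, and the report bodies are placeholders where the real skills would query the systems of record.

```python
from typing import Callable, Dict


def crash_report() -> str:
    # Placeholder: the real skill would read Sentry + Amplitude.
    return "crash: placeholder summary"


def ratings_report() -> str:
    # Simulated outage, to show that one failure does not sink the briefing.
    raise RuntimeError("store API unavailable")


def daily_summary() -> str:
    # Placeholder for the daily summary report.
    return "daily: placeholder summary"


# Hypothetical registry mapping report names to the functions that build them.
REPORTS: Dict[str, Callable[[], str]] = {
    "crash": crash_report,
    "ratings": ratings_report,
    "daily": daily_summary,
}


def allreports(reports: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Run every registered report and collect one briefing.

    A report that raises is recorded as FAILED so the rest still run.
    """
    briefing: Dict[str, str] = {}
    for name, build in reports.items():
        try:
            briefing[name] = build()
        except Exception as exc:  # isolate failures per report
            briefing[name] = f"FAILED: {exc}"
    return briefing


if __name__ == "__main__":
    for name, text in allreports(REPORTS).items():
        print(f"{name}: {text}")
```

Isolating failures per report is a design choice for this sketch: a briefing with one gap is usually more useful than no briefing at all.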

Game-design tooling

Designer-facing skills built on the simulation work: balancelab (a what-if combat simulator), balancelab-benchmark (replays real player runs against new content), and unlocks documentation and auditing. Designers ask balance questions without pulling an engineer.
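To make "what-if" concrete, here is a toy sketch of the kind of question such a simulator answers. It is not balancelab's model: the Unit fields, the damage formula, and every number below are hypothetical, chosen only to show a baseline being compared against a proposed stat change.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Unit:
    hp: int
    attack: int
    armor: int


def hits_to_kill(attacker: Unit, defender: Unit) -> int:
    """Hits needed under a toy rule: damage = attack - armor, minimum 1."""
    damage = max(1, attacker.attack - defender.armor)
    return math.ceil(defender.hp / damage)


# Illustrative stats only; not real game data.
hero = Unit(hp=100, attack=12, armor=3)
boss = Unit(hp=240, attack=15, armor=4)

baseline = hits_to_kill(hero, boss)
# What if the hero's attack were raised from 12 to 16?
what_if = hits_to_kill(replace(hero, attack=16), boss)

print(f"baseline: {baseline} hits, what-if: {what_if} hits")
```

With these made-up numbers the change cuts the fight from 30 hits to 20, which is the shape of answer a designer can get without pulling an engineer.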

Comms

Audience-reframing skills like ceoify that turn dense engineering output into the brief a given reader actually needs.

How it works

The architecture is the point. A single marketplace holds the skills; each engineer installs only the suite or skill they need; the heavy context lives inside the skill, not in the person's session. New capability ships once and the whole team has it.
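The install-on-demand idea can be sketched in a few lines. This is an illustration under stated assumptions, not the real marketplace format: the Skill class, the CATALOG list, the suite identifiers, and the install function are all hypothetical, and only the skill names come from the suites described above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Skill:
    name: str
    suite: str


# Hypothetical catalog. Skill names are from the case study; the structure
# and the suite identifiers are assumptions for this sketch.
CATALOG = [
    Skill("dev", "dev-loop"),
    Skill("review-local", "dev-loop"),
    Skill("review-pr", "dev-loop"),
    Skill("allreports", "reports"),
    Skill("balancelab", "game-design"),
    Skill("balancelab-benchmark", "game-design"),
    Skill("ceoify", "comms"),
]


def install(
    catalog: List[Skill],
    suites: Iterable[str] = (),
    skills: Iterable[str] = (),
) -> List[Skill]:
    """Return only the skills requested, by whole suite or by skill name."""
    suites, skills = set(suites), set(skills)
    unknown = (suites - {s.suite for s in catalog}) | (
        skills - {s.name for s in catalog}
    )
    if unknown:
        raise KeyError(f"not in marketplace: {sorted(unknown)}")
    return [s for s in catalog if s.suite in suites or s.name in skills]


# A designer pulls in the game-design suite plus one comms skill.
designer_setup = install(CATALOG, suites=["game-design"], skills=["ceoify"])
print([s.name for s in designer_setup])
# → ['balancelab', 'balancelab-benchmark', 'ceoify']
```

Nothing from the dev-loop or reports suites is loaded for that designer, which is the property that keeps each session's context small.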

flowchart TB
  subgraph MKT["Shared skill marketplace (built by Ben)"]
    direction LR
    DEV["Dev loop<br/>/dev · review · triage · PR"]:::a
    REP["Operational reports<br/>crash · ratings · growth · paid"]:::b
    GD["Game-design tooling<br/>BalanceLab · benchmark · unlocks"]:::c
    CM["Comms<br/>audience reframing"]:::d
  end
  MKT ==>|"install only what you need"| ENG["Every engineer<br/>clean context window"]:::e
  ENG --> OUT["Faster loops · fewer regressions<br/>consistent quality · shared AI leverage"]:::out
  classDef a fill:#dbeafe,stroke:#2563eb,color:#0b3a8f;
  classDef b fill:#fef9c3,stroke:#ca8a04,color:#713f12;
  classDef c fill:#dcfce7,stroke:#16a34a,color:#14532d;
  classDef d fill:#fae8ff,stroke:#a21caf,color:#701a75;
  classDef e fill:#eef2ff,stroke:#6366f1,color:#1e1b4b;
  classDef out fill:#e0f2fe,stroke:#0284c7,color:#06425e;
  style MKT fill:#f0f9ff,stroke:#2563eb,stroke-width:2px;
One marketplace, four suites, installed on demand. Capability ships once; the whole team gets it, without anyone overloading a single session.
flowchart LR
  C[Clarify]:::s --> P[Plan]:::s --> CR[Critic]:::g --> I[Implement]:::s --> T[Test]:::s --> R[Review]:::g --> PR[Open PR]:::s --> TR[Triage feedback]:::good
  classDef s fill:#eef2ff,stroke:#6366f1,color:#1e1b4b;
  classDef g fill:#fef9c3,stroke:#ca8a04,color:#713f12;
  classDef good fill:#dcfce7,stroke:#16a34a,color:#14532d;
The /dev loop: a single command walks a task from clarify to triaged PR, with critic and review gates built in.
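The stage order in the diagram can be expressed as a small pipeline. This is a sketch under assumptions, not how /dev is implemented: run_dev_loop, the run_stage callback, and the rule that the loop halts at the first stage that does not pass are all hypothetical. Only the stage names and their order come from the diagram.

```python
from typing import Callable, List, Optional, Tuple

# Stage names and order follow the diagram; critic and review are its gates.
STAGES = [
    "clarify", "plan", "critic", "implement",
    "test", "review", "open_pr", "triage_feedback",
]


def run_dev_loop(
    task: str,
    run_stage: Callable[[str, str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Run stages in order and stop at the first one that does not pass.

    run_stage(stage, task) is a hypothetical callback returning True on pass.
    Returns the completed stages and the stage that halted the loop, if any.
    """
    completed: List[str] = []
    for stage in STAGES:
        if not run_stage(stage, task):
            return completed, stage
        completed.append(stage)
    return completed, None


def stub_runner(stage: str, task: str) -> bool:
    # Stand-in for real work: pretend the review gate rejects the change.
    return stage != "review"


done, halted_at = run_dev_loop("fix login bug", stub_runner)
print(done, halted_at)
```

In this run the loop completes clarify through test and halts at review, so no PR is opened until the gate passes.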

The value it added

Why this matters beyond one team

The artifact is a skill marketplace; the capability is bigger. Most organizations adopting AI end up with a few power users and everyone else left behind, because the good workflows never get packaged and shared. What I built is the opposite: a shared AI operating model where leverage is captured once, distributed to everyone, and improved centrally. Standing that up (deciding what to package, designing the install-on-demand model, and getting a team to actually run on it) is a repeatable capability, not a one-off script. It is exactly the kind of system I would set up for a portfolio company that wants its whole team operating at the AI frontier, not just its loudest engineer.

By the numbers

~30 agentic skills across four suites
4 suites: dev loop, reports, game-design, comms
1 command (/dev) for the full clarify-to-PR loop
7+ live data sources wired into reports (Sentry, Amplitude, Braze, Linear, app stores, and more)
Team-wide: adopted across the engineering org, not a personal tool
Install only what you need, so context windows stay clean
Built at Proof of Play and run by the engineering team. The same pattern (capture AI leverage once and distribute it to everyone) is one of the highest-leverage things I can stand up for an organization. Happy to walk through the architecture in a conversation.