Case Study · Personal product
Inner Voice is a multimodal AI companion I prototyped: it interviews you about your goals, clones your voice while you talk, and then reflects your own aspirations and peak moments back to you, in your own voice.
There is real power in hearing your own voice. I had been recording affirmations and playing them back to myself, and it works in a way another narrator never quite does. Inner Voice turns that into a product: an AI coach that speaks to you in your own voice, reminding you of the person you are trying to be and the wins you have already had.
It grows out of earlier wellness work of mine on countering negative self-talk. The wager is simple: a coach that sounds like you, recalling your peak moments back to you, is more believable and more motivating than any generic voice.
The non-obvious part is the onboarding. Instead of asking you to read scripted lines into a microphone, Inner Voice interviews you about your goals and aspirations, and captures and refines your voice while you answer. The voice clone is a byproduct of a conversation you would want to have anyway. From there it runs a real-time multimodal loop:
flowchart LR U["You speak
goals + key questions"]:::a --> W["Whisper
speech to text"]:::b W --> CL["Claude
conversation + coaching"]:::c U -. "voice captured
during the interview" .-> VC["Voice clone TTS
Cartesia / ElevenLabs"]:::d CL --> VC VC --> OUT["It speaks back
in your own voice"]:::good classDef a fill:#eef2ff,stroke:#6366f1,color:#1e1b4b; classDef b fill:#fef9c3,stroke:#ca8a04,color:#713f12; classDef c fill:#fae8ff,stroke:#a21caf,color:#701a75; classDef d fill:#dcfce7,stroke:#16a34a,color:#14532d; classDef good fill:#dbeafe,stroke:#2563eb,color:#0b3a8f;
End to end it is a React Native and Expo client over a Python FastAPI backend, wiring Claude for the conversation, Whisper for transcription, and voice-cloning TTS (Cartesia and ElevenLabs) for the reply, with real audio processing (noise reduction, segmentation) in between.
This is a working prototype, not a shipped product. It spins up and runs the full loop, and I paused it when work got busy. I am including it because the interesting part is not polish, it is that the hard, multimodal plumbing works: a real-time pipeline across three AI systems, plus an original onboarding idea that turns voice capture into something a user actually wants to do. It is proof I can take a non-obvious product concept and stand up the full multimodal stack behind it.