I built Langusta because I could read Spanish but froze whenever I had to speak it, and booking human tutors meant scheduling and a fair bit of embarrassment.
It's a PWA (no install needed) with a real-time voice pipeline: streaming speech-to-text -> an LLM tutor that holds an unscripted conversation at your level and corrects you -> low-latency text-to-speech back. The part I care most about isn't the chat; it's the loop around it. Every word you stumble on is auto-captured, then resurfaced later with a Leitner spaced-repetition schedule and silent flashcards, so talking actually turns into retention. It also persists transcripts and seeds each new session with a compact recap of recent ones (a few summaries plus the tail of the last conversation, ~1k tokens, prompt-cached), so the conversation continues across sessions instead of resetting. Happy to go into the recall/token-budget tradeoffs.
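For anyone unfamiliar with Leitner scheduling, here's a minimal sketch of the classic rule, not Langusta's actual code: a correct answer moves a card up one box, a miss sends it back to box 1, and each box has a longer review interval. The box count, the intervals, and the helper names (`review`, `due_cards`, `capture_stumble`) are all hypothetical; the post doesn't say which values it uses.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical review intervals (in days) per Leitner box; not taken from Langusta.
BOX_INTERVALS = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}


@dataclass
class Card:
    word: str
    box: int = 1
    due: date = date.min  # date.min means "due immediately"


def review(card: Card, correct: bool, today: date) -> Card:
    """Classic Leitner rule: promote one box on success, back to box 1 on failure."""
    if correct:
        card.box = min(card.box + 1, max(BOX_INTERVALS))
    else:
        card.box = 1
    card.due = today + timedelta(days=BOX_INTERVALS[card.box])
    return card


def due_cards(cards: list[Card], today: date) -> list[Card]:
    """Cards whose review date has arrived, weakest (lowest box) first."""
    return sorted((c for c in cards if c.due <= today), key=lambda c: c.box)


def capture_stumble(deck: dict[str, Card], word: str, today: date) -> Card:
    """A word the learner stumbled on enters (or re-enters) box 1, due tomorrow."""
    card = deck.setdefault(word, Card(word))
    return review(card, correct=False, today=today)
```

E.g. stumbling on "aprovechar" on Jan 1 puts it in box 1, due Jan 2; getting it right on Jan 2 moves it to box 2, due Jan 4. Treating a fresh stumble the same as a failed review keeps the loop simple: anything you tripped over in conversation gets the shortest interval.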
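To make the recap/token-budget tradeoff concrete, here's a rough sketch of one way to build a ~1k-token recap under stated assumptions: tokens are estimated at ~4 characters each (a real setup would use the provider's token counter), at most three summaries are kept with older ones dropped first, and the tail of the last conversation is filled backwards from the final turn until the budget runs out. `build_recap`, `system_blocks`, and `Turn` are hypothetical names. The `system` list with a `cache_control` breakpoint is the shape Anthropic's Messages API accepts for prompt caching; it's shown here as plain data, not an API call.

```python
from dataclasses import dataclass


def approx_tokens(text: str) -> int:
    # Assumption: ~4 characters per token, rounded up.
    return (len(text) + 3) // 4


@dataclass
class Turn:
    speaker: str  # e.g. "learner" or "tutor"
    text: str


def build_recap(summaries: list[str], last_turns: list[Turn],
                budget: int = 1000, max_summaries: int = 3) -> str:
    """Budget counts summary and transcript lines only (headers are free)."""
    used = 0
    kept: list[str] = []
    for s in reversed(summaries[-max_summaries:]):  # newest summary first
        line = f"- {s}"
        if used + approx_tokens(line) > budget:
            break
        kept.append(line)
        used += approx_tokens(line)
    kept.reverse()  # back to chronological order

    tail: list[str] = []
    for turn in reversed(last_turns):  # walk back from the end, keep it contiguous
        line = f"{turn.speaker}: {turn.text}"
        if used + approx_tokens(line) > budget:
            break
        tail.append(line)
        used += approx_tokens(line)
    tail.reverse()

    out: list[str] = []
    if kept:
        out += ["Earlier sessions:", *kept]
    if tail:
        out += ["End of last session:", *tail]
    return "\n".join(out)


def system_blocks(tutor_prompt: str, recap: str) -> list[dict]:
    # The cache_control breakpoint caches the prompt prefix up to and including this block.
    return [
        {"type": "text", "text": tutor_prompt},
        {"type": "text", "text": recap, "cache_control": {"type": "ephemeral"}},
    ]
```

Spending the budget on summaries first favors long-range recall; reversing the order would favor picking up mid-conversation. That's the tradeoff the budget forces.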
Stack: Pipecat for the voice pipeline (AssemblyAI STT, an Anthropic model for the tutor with prompt caching, ElevenLabs TTS), FastAPI + Postgres, React PWA, on a single VM because WebRTC media needs UDP. Happy to go deep on the latency tradeoffs and how the SRS interacts with the conversation.
There's a 10-minute trial with no signup wall to start talking. I'd love feedback on the conversation quality and the corrections, and on whether the spaced-repetition loop feels useful or gimmicky.
Sounds interesting. FYI I had to vouch this comment because it got killed.
thank you!