[ 01 / 05 ]
Infrastructure
Voice agent stack
┌──────────────────────┐          ┌─────────────────────────┐
│ App Server    :8004  │  ←────   │ GPU Server    :8005     │
│ WebSocket · VAD      │    WS    │ Ultravox · Veena TTS    │
│ session manager      │          │ FastAPI · vLLM          │
└──────────────────────┘          └─────────────────────────┘
Dual-server voice agent: an App server handling VAD, WebSocket, and session state, paired with a GPU server running Ultravox and Veena TTS. Fixed Veena's output drift with sentence-level chunking, moved UltraVAD from CPU to GPU, and cured audio-playback jitter with a single persistent AudioContext and scheduled buffer starts.
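The sentence-level chunking fix reduces to a small buffer that releases only complete sentences to the TTS engine. A minimal sketch — the splitter regex and class name are illustrative, not the production code:

```python
import re

# Hypothetical sketch: buffer streamed LLM text and release only complete
# sentences, so the TTS engine never synthesizes a mid-sentence fragment.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

class SentenceChunker:
    def __init__(self):
        self._buf = ""

    def feed(self, text: str) -> list[str]:
        """Append streamed text; return any sentences completed so far."""
        self._buf += text
        parts = _SENTENCE_END.split(self._buf)
        # The last part may be an unfinished sentence: keep it buffered.
        self._buf = parts[-1]
        return [p for p in parts[:-1] if p]

    def flush(self) -> str:
        """Return whatever remains when the stream ends."""
        rest, self._buf = self._buf.strip(), ""
        return rest
```

Each completed sentence goes to Veena as one synthesis request, so drift never accumulates across sentence boundaries.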
Self-hosted GLM-4.7-Flash
30B MoE served on an H200 NVL via vLLM nightly + transformers 5.x. Fixed repetition degeneration traced to the FP8 KV cache by moving to BF16, traced silent token consumption to thinking mode, and shelved MTP speculative decoding after poor acceptance rates. Final config: BF16, 65K context, tool-call parser, systemd daemon.
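The final config roughly corresponds to a `vllm serve` invocation along these lines — a sketch, not the exact production unit: the model path and the tool-call parser name are placeholders, and the flags shown are standard vLLM options:

```shell
# BF16 weights with a matching BF16 KV cache (FP8 KV cache caused
# repetition degeneration), 65K context, tool-call parsing enabled.
# Model path and parser name are placeholders for the production values.
vllm serve /models/GLM-4.7-Flash \
  --dtype bfloat16 \
  --kv-cache-dtype auto \
  --max-model-len 65536 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

Wrapped in a systemd unit, this becomes the `ExecStart` line of a daemon that restarts on failure.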
[ 03 / 05 ]
Product · SaaS
Mailer — inbox intelligence
Multi-user SaaS for email analytics with Gmail + Outlook support. AI classification through a self-hosted GLM-4.7-Flash, daily PDF intelligence reports via WeasyPrint, and a React 18 + Vite frontend against a FastAPI backend. asyncpg / PostgreSQL for durability, Redis Streams for work queues.
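The classification step needs defensive parsing of the model's reply before anything reaches the database. A hypothetical sketch — the category taxonomy and fallback label here are illustrative, not Mailer's real schema:

```python
import json
import re

# Illustrative taxonomy; Mailer's real category set may differ.
CATEGORIES = {"invoice", "newsletter", "personal", "support", "spam"}

def parse_classification(reply: str) -> str:
    """Extract a category from a model reply expected to contain JSON
    like {"category": "invoice"}; fall back to 'unknown' on any mismatch."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match:
        try:
            label = json.loads(match.group(0)).get("category", "").lower()
            if label in CATEGORIES:
                return label
        except json.JSONDecodeError:
            pass
    return "unknown"
```

Unknown labels are kept rather than dropped, so a taxonomy change upstream degrades gracefully instead of losing mail.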
Vaayu — meeting analysis
Python tool that ingests diarized transcript JSON, runs GLM-4.7-Flash over it, and emits branded PDF reports via ReportLab. A later iteration moved to custom Flowable subclasses, zebra-striped tables, and color-coded priority badges; an earlier four-pass generation version was built for VIH Metaverse with its brand colors intact.
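The zebra striping boils down to generating one `BACKGROUND` command per data row in the tuple form ReportLab's `TableStyle` accepts. A sketch, with assumed hex colors and a single header row:

```python
# Sketch: build the ('BACKGROUND', ...) command tuples that ReportLab's
# TableStyle accepts, alternating two fill colors over the data rows.
# The hex colors and single-header-row layout are assumptions.
def zebra_commands(n_rows: int, even="#FFFFFF", odd="#F2F2F2", header_rows=1):
    commands = []
    for row in range(header_rows, n_rows):
        color = even if (row - header_rows) % 2 == 0 else odd
        # (command, (start_col, start_row), (end_col, end_row), value)
        commands.append(("BACKGROUND", (0, row), (-1, row), color))
    return commands
```

The resulting list is passed straight into `TableStyle(...)` alongside the border and font commands.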
[ 05 / 05 ]
Frontend · LMS
ViH Playground — voice LMS
React 18 + Vite LMS frontend for voice-based teacher agents. Alongside the UI I owned the audio-playback layer, hunting down a choppy-audio bug caused by per-chunk AudioContext creation and rewriting the player around a single persistent context with precisely scheduled buffer starts for seamless streaming.
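The scheduling rule behind the fix is language-agnostic: keep one playhead cursor and start each decoded buffer at `max(now, playhead)`, then advance the cursor by the buffer's duration. Sketched here in Python for clarity (the browser code expresses the same rule with `AudioContext.currentTime` and `AudioBufferSourceNode.start()`):

```python
# Language-agnostic sketch of the gapless-playback rule: one persistent
# playback clock, each chunk scheduled to start exactly where the
# previous one ends, never earlier than "now".
class PlaybackScheduler:
    def __init__(self):
        self.playhead = 0.0  # time at which the next chunk should start

    def schedule(self, now: float, duration: float) -> float:
        """Return the start time for a chunk of `duration` seconds."""
        start = max(now, self.playhead)   # never schedule in the past
        self.playhead = start + duration  # next chunk starts back-to-back
        return start
```

Because the context (and its clock) persists across chunks, buffers queue gaplessly instead of each racing a freshly created clock.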
ASR pipeline
faster-whisper transcription service for English + Hindi call-center audio on dual L40S GPUs — adaptive dual-pass pipeline, word-level speaker diarization, real-time WebSocket streaming endpoints.
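Word-level speaker diarization reduces to attributing each transcribed word to the speaker turn covering its midpoint. A simplified sketch — the tuple shapes are illustrative, not faster-whisper's actual output types:

```python
# Simplified sketch of word-level speaker attribution: each word (with
# start/end times from the ASR pass) is assigned to the diarization turn
# containing its midpoint. Data shapes here are illustrative.
def assign_speakers(words, turns):
    """words: [(text, start, end)]; turns: [(speaker, start, end)]."""
    labeled = []
    for text, w_start, w_end in words:
        mid = (w_start + w_end) / 2
        speaker = next(
            (spk for spk, t_start, t_end in turns if t_start <= mid < t_end),
            "unknown",
        )
        labeled.append((speaker, text))
    return labeled
```

Using the midpoint rather than the word's start makes attribution robust to words that straddle a turn boundary.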