[ 01 / 05 ]
Infrastructure
Voice agent stack
┌──────────────────────┐          ┌─────────────────────────┐
│ App Server    :8004  │  ←────   │ GPU Server    :8005     │
│ WebSocket · VAD      │    WS    │ Ultravox · Veena TTS    │
│ session manager      │          │ FastAPI · vLLM          │
└──────────────────────┘          └─────────────────────────┘
Dual-server voice agent: an App server handling VAD, WebSocket, and session state, paired with a GPU server running Ultravox and Veena TTS. Fixed Veena's output drift with sentence-level chunking, moved UltraVAD from CPU to GPU, and cured audio-playback jitter with a single persistent AudioContext and scheduled buffer starts.
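The sentence-level chunking fix reduces to a small buffer that releases only complete sentences to the TTS engine. A minimal sketch — the splitter regex and class name are illustrative, not the production code:

```python
import re

# Hypothetical sketch: buffer streamed LLM text and release only complete
# sentences, so the TTS engine never synthesizes a mid-sentence fragment.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

class SentenceChunker:
    def __init__(self):
        self._buf = ""

    def feed(self, text: str) -> list[str]:
        """Append streamed text; return any sentences completed so far."""
        self._buf += text
        parts = _SENTENCE_END.split(self._buf)
        # The last part may be an unfinished sentence: keep it buffered.
        self._buf = parts[-1]
        return [p for p in parts[:-1] if p]

    def flush(self) -> str:
        """Return whatever remains when the stream ends."""
        rest, self._buf = self._buf.strip(), ""
        return rest
```

Each completed sentence goes to Veena as one synthesis request, so drift never accumulates across sentence boundaries.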
Self-hosted GLM-4.7-Flash
30B MoE served on an H200 NVL via vLLM nightly + transformers 5.x. Fixed repetition degeneration traced to the FP8 KV cache by moving to BF16, traced silent token consumption to thinking mode, and shelved MTP speculative decoding after poor acceptance rates. Final config: BF16, 65K context, tool-call parser, systemd daemon.
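The final config roughly corresponds to a `vllm serve` invocation along these lines — a sketch, not the exact production unit: the model path and the tool-call parser name are placeholders, and the flags shown are standard vLLM options:

```shell
# BF16 weights with a matching BF16 KV cache (FP8 KV cache caused
# repetition degeneration), 65K context, tool-call parsing enabled.
# Model path and parser name are placeholders for the production values.
vllm serve /models/GLM-4.7-Flash \
  --dtype bfloat16 \
  --kv-cache-dtype auto \
  --max-model-len 65536 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

Wrapped in a systemd unit, this becomes the `ExecStart` line of a daemon that restarts on failure.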
[ 03 / 05 ]
Product · SaaS
Mailer — inbox intelligence
Multi-user SaaS for email analytics with Gmail + Outlook support. AI classification through a self-hosted GLM-4.7-Flash, daily PDF intelligence reports via WeasyPrint, and a React 18 + Vite frontend against a FastAPI backend. asyncpg / PostgreSQL for durability, Redis Streams for work queues.
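The classification step needs defensive parsing of the model's reply before anything reaches the database. A hypothetical sketch — the category taxonomy and fallback label here are illustrative, not Mailer's real schema:

```python
import json
import re

# Illustrative taxonomy; Mailer's real category set may differ.
CATEGORIES = {"invoice", "newsletter", "personal", "support", "spam"}

def parse_classification(reply: str) -> str:
    """Extract a category from a model reply expected to contain JSON
    like {"category": "invoice"}; fall back to 'unknown' on any mismatch."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match:
        try:
            label = json.loads(match.group(0)).get("category", "").lower()
            if label in CATEGORIES:
                return label
        except json.JSONDecodeError:
            pass
    return "unknown"
```

Unknown labels are kept rather than dropped, so a taxonomy change upstream degrades gracefully instead of losing mail.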
Vaayu — meeting analysis
Python tool that ingests diarized transcript JSON, runs GLM-4.7-Flash over it, and emits branded PDF reports via ReportLab. A later iteration moved to custom Flowable subclasses, zebra-striped tables, and color-coded priority badges; an earlier four-pass generation version was built for VIH Metaverse with its brand colors intact.
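The zebra striping boils down to generating one `BACKGROUND` command per data row in the tuple form ReportLab's `TableStyle` accepts. A sketch, with assumed hex colors and a single header row:

```python
# Sketch: build the ('BACKGROUND', ...) command tuples that ReportLab's
# TableStyle accepts, alternating two fill colors over the data rows.
# The hex colors and single-header-row layout are assumptions.
def zebra_commands(n_rows: int, even="#FFFFFF", odd="#F2F2F2", header_rows=1):
    commands = []
    for row in range(header_rows, n_rows):
        color = even if (row - header_rows) % 2 == 0 else odd
        # (command, (start_col, start_row), (end_col, end_row), value)
        commands.append(("BACKGROUND", (0, row), (-1, row), color))
    return commands
```

The resulting list is passed straight into `TableStyle(...)` alongside the border and font commands.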
[ 05 / 05 ]
Frontend · LMS
ViH Playground — voice LMS
React 18 + Vite LMS frontend for voice-based teacher agents. Alongside the UI I owned the audio-playback layer, hunting down a choppy-audio bug caused by per-chunk AudioContext creation and rewriting the player around a single persistent context with precisely scheduled buffer starts for seamless streaming.
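The scheduling rule behind the fix is language-agnostic: keep one playhead cursor and start each decoded buffer at `max(now, playhead)`, then advance the cursor by the buffer's duration. Sketched here in Python for clarity (the browser code expresses the same rule with `AudioContext.currentTime` and `AudioBufferSourceNode.start()`):

```python
# Language-agnostic sketch of the gapless-playback rule: one persistent
# playback clock, each chunk scheduled to start exactly where the
# previous one ends, never earlier than "now".
class PlaybackScheduler:
    def __init__(self):
        self.playhead = 0.0  # time at which the next chunk should start

    def schedule(self, now: float, duration: float) -> float:
        """Return the start time for a chunk of `duration` seconds."""
        start = max(now, self.playhead)   # never schedule in the past
        self.playhead = start + duration  # next chunk starts back-to-back
        return start
```

Because the context (and its clock) persists across chunks, buffers queue gaplessly instead of each racing a freshly created clock.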
ASR pipeline
faster-whisper transcription service for English + Hindi call-center audio on dual L40S GPUs — adaptive dual-pass pipeline, word-level speaker diarization, real-time WebSocket streaming endpoints.
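Word-level speaker diarization reduces to attributing each transcribed word to the speaker turn covering its midpoint. A simplified sketch — the tuple shapes are illustrative, not faster-whisper's actual output types:

```python
# Simplified sketch of word-level speaker attribution: each word (with
# start/end times from the ASR pass) is assigned to the diarization turn
# containing its midpoint. Data shapes here are illustrative.
def assign_speakers(words, turns):
    """words: [(text, start, end)]; turns: [(speaker, start, end)]."""
    labeled = []
    for text, w_start, w_end in words:
        mid = (w_start + w_end) / 2
        speaker = next(
            (spk for spk, t_start, t_end in turns if t_start <= mid < t_end),
            "unknown",
        )
        labeled.append((speaker, text))
    return labeled
```

Using the midpoint rather than the word's start makes attribution robust to words that straddle a turn boundary.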