R.S // 001 Bengaluru · IN
2026 — portfolio

Building production AI systems at the intersection of voice, language, and inference.

I'm Rishabh — an AI Research Scientist at ViH Metaverse. I work across the stack of modern applied AI: voice agents, GPU inference, agentic pipelines, and the quiet plumbing that lets large models actually ship.

Now · 2025 →
AI Research Scientist
ViH Metaverse
Prev · 2024 — 2025
Java Developer
HSBC · Payments
Prev · 2023 — 2024
SDE Intern
Morgan Stanley
Based in
Bengaluru, IN
Remote-friendly
01 — About

I build the unsexy parts — the inference servers, the agent loops, the retry logic — so the magic on top actually works in production.

Brief

My work lives somewhere between research engineering and product — close enough to GPUs to debug a KV-cache regression, close enough to users to care about a slow page load.

Before ViH Metaverse, I was a Java Developer in the Payments department at HSBC, building financial messaging systems on Java / Spring Boot / Kubernetes, and before that I interned at Morgan Stanley, working on observability and secure database practices. Somewhere along the way I won the Smart India Hackathon with a tourism platform built for the Ministry of Culture, and I've been mentoring in UI/UX and AR/VR ever since.

These days I spend most of my time deep in vLLM configs, voice latency budgets, and the strange craft of making a 30B MoE behave itself under real traffic.

Focus areas
  • 01 · Voice AI Infrastructure: Ultravox, Veena TTS, Indic TTS, VAD pipelines, WebSocket streaming. [GPU]
  • 02 · Model Deployment & Serving: vLLM, FP8 / BF16 tradeoffs, speculative decoding, systemd daemons. [Ops]
  • 03 · Agentic Pipelines: GLM-powered classification, routing, drafting, reporting. [LLM]
  • 04 · ASR & Diarization: faster-whisper, adaptive dual-pass, word-level speaker attribution. [Speech]
  • 05 · Full-stack Product: React + Vite frontends, FastAPI backends, Postgres + Redis Streams. [Web]
02 — Experience

A short timeline of the places that have shaped how I build.

2025  —  Present // now
AI Research Scientist · ViH Metaverse · India
Leading voice AI and agentic infra. Deployed GLM-4.7-Flash (30B MoE) on H200 NVL via vLLM; built dual-server voice agent architecture (Ultravox + Veena TTS); shipped Mailer — a multi-user SaaS for email intelligence — end-to-end.
vLLM · H200 NVL · Ultravox · FastAPI · React 18
2024  —  2025
Java Developer · HSBC · Payments Department
Built and optimized Java Spring Boot services for financial messaging and payments infrastructure. Owned CI/CD automation, Kubernetes deployments, and production monitoring via Splunk. Collaborated with QA on release hardening.
Java · Spring Boot · Kubernetes · Splunk
2023  —  2024
SDE Intern · Morgan Stanley
Worked on observability and system security. Built production log filters in Splunk, implemented high-latency alerting, and helped enforce secure database practices across services.
Observability · Splunk · Security
2019  —  2023
B.Tech · Computer Science & Engineering · Lakshmi Narain College of Technology, Bhopal
Foundations in Data Structures, Algorithms, Operating Systems, and Database Systems. Won the Smart India Hackathon building a tourism platform for the Ministry of Culture; mentored UI/UX and AR/VR at Developer Student Clubs.
SIH Winner · DSC Mentor
03 — Selected Work

Recent systems I've designed, shipped, or rescued from a late-night bug.

[ 01 / 05 ]
Infrastructure

Voice agent stack

  ┌──────────────────────┐        ┌─────────────────────────┐
  │  App Server :8004    │ ←────  │  GPU Server :8005       │
  │  WebSocket · VAD     │   WS   │  Ultravox · Veena TTS   │
  │  session manager     │        │  FastAPI · vLLM         │
  └──────────────────────┘        └─────────────────────────┘

Dual-server voice agent: an App server handling VAD, WebSocket connections, and session state, paired with a GPU server running Ultravox and Veena TTS. Fixed Veena output drift with sentence-level chunking, moved UltraVAD from CPU to GPU, and eliminated audio-playback jitter with a single persistent AudioContext and precisely scheduled buffer starts.

Ultravox · Veena TTS · FastAPI · WebSocket · vLLM · H200 NVL
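The drift fix comes down to buffering streamed LLM tokens into sentence-sized chunks before handing them to TTS. A minimal Python sketch of that idea (the function name and regex are mine, not the production code):

```python
import re

def sentence_chunks(token_stream):
    """Accumulate streamed tokens and yield complete sentences for TTS.

    Sending sentence-sized chunks (rather than raw token deltas) keeps
    the TTS engine's prosody stable and bounds drift between the text
    the LLM produced and the audio that gets synthesized.
    """
    buffer = ""
    for token in token_stream:
        buffer += token
        # Flush every complete sentence currently sitting in the buffer:
        # sentence-final punctuation followed by whitespace or end-of-buffer.
        while True:
            match = re.search(r"[.!?](\s+|$)", buffer)
            if not match:
                break
            chunk = buffer[:match.end()].strip()
            if chunk:
                yield chunk
            buffer = buffer[match.end():]
    # Whatever remains when the stream closes is the final chunk.
    tail = buffer.strip()
    if tail:
        yield tail
```

Token boundaries from the LLM rarely align with sentence boundaries, which is exactly why the buffer is needed.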
[ 02 / 05 ]
Inference

Self-hosted GLM-4.7-Flash

30B MoE served on an H200 NVL via vLLM nightly + transformers 5.x. Fixed repetition degeneration traced to the FP8 KV cache by moving to BF16, tracked down silent token consumption caused by thinking mode, and shelved MTP speculative decoding after poor acceptance rates. Final config: BF16, 65K context, tool-call parser, systemd daemon.

vLLM · BF16 · 65K ctx · systemd · 30B MoE
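A serving setup of that shape might be captured in a systemd unit roughly like this (an illustrative sketch, not the production unit; the paths, service name, and tool-call parser value are assumptions):

```ini
# /etc/systemd/system/vllm-glm.service (illustrative; paths and the
# parser name are assumptions, not the production config)
[Unit]
Description=vLLM serving GLM-4.7-Flash (BF16, 65K context)
After=network-online.target

[Service]
# kv-cache-dtype "auto" keeps the KV cache in the model dtype (BF16 here),
# sidestepping the FP8 repetition degeneration described above.
ExecStart=/usr/bin/env vllm serve /models/GLM-4.7-Flash \
  --dtype bfloat16 \
  --kv-cache-dtype auto \
  --max-model-len 65536 \
  --enable-auto-tool-choice \
  --tool-call-parser glm45
Restart=on-failure

[Install]
WantedBy=multi-user.target
```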
[ 03 / 05 ]
Product · SaaS

Mailer — inbox intelligence

Multi-user SaaS for email analytics with Gmail + Outlook support. AI classification through a self-hosted GLM-4.7-Flash, daily PDF intelligence reports via WeasyPrint, and a React 18 + Vite frontend against a FastAPI backend. asyncpg / PostgreSQL for durability, Redis Streams for work queues.

React 18 · Vite · FastAPI · asyncpg · Redis Streams · WeasyPrint
[ 04 / 05 ]
Tooling

Vaayu — meeting analysis

Python tool that ingests diarized transcript JSON, runs GLM-4.7-Flash over it, and emits branded PDF reports via ReportLab. A later iteration moved to custom Flowable subclasses, zebra-striped tables, and color-coded priority badges; an earlier four-pass generation version was built for ViH Metaverse with brand colors intact.

Python · GLM-4.7-Flash · ReportLab · PDF
[ 05 / 05 ]
Frontend · LMS

ViH Playground — voice LMS

React 18 + Vite LMS frontend for voice-based teacher agents. Alongside the UI I owned the audio-playback layer: hunted down a choppy-audio bug caused by per-chunk AudioContext creation and rewrote it around a single persistent context with precisely scheduled buffer starts for seamless streaming.

React 18 · Vite · Web Audio API · WebSocket · LMS
[ + ]
Earlier

ASR pipeline

faster-whisper transcription service for English + Hindi call-center audio on dual L40S GPUs — adaptive dual-pass pipeline, word-level speaker diarization, real-time WebSocket streaming endpoints.

faster-whisper · L40S × 2 · Hindi + EN
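Word-level speaker attribution here reduces to a timestamp join between ASR word timings and diarization segments. A toy Python sketch of the idea (the names and the midpoint heuristic are my own, not the production pipeline):

```python
def attribute_words(words, segments):
    """Assign a speaker label to each transcribed word.

    `words` is a list of (word, start, end) tuples from ASR word-level
    timestamps; `segments` is a list of (speaker, start, end) tuples from
    diarization. Each word gets the speaker whose segment contains the
    word's midpoint, which tolerates small boundary disagreements between
    the two systems.
    """
    labeled = []
    for word, start, end in words:
        mid = (start + end) / 2
        speaker = next(
            (spk for spk, s, e in segments if s <= mid < e),
            "unknown",  # no diarization segment covers this word
        )
        labeled.append((word, speaker))
    return labeled
```

Joining on the midpoint rather than the word's start avoids mislabeling words that straddle a speaker-change boundary.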
04 — Writing

I teach Agentic AI in public — mostly on Instagram, where the feedback is fast and honest.

Handle

@algo.bites

A steady stream of short-form posts breaking down Gen AI and Agentic AI concepts — the patterns, the tradeoffs, and the stuff the papers don't quite say out loud.

Follow on Instagram
Audience
Growing.
05 — Say hello

If you're shipping AI, or thinking about it, I'd love to hear from you.