Pratham Savaliya

ML Engineer · Bengaluru

I ship things that run in production, break them, fix them, and learn faster that way.

Shipped

Helpish · Solo-built · Production · 1,000+ min processed

Meeting intelligence. Layered async pipeline — audio capture → transcription → analysis → persistence, each stage independently swappable. Browser recorder with mic + tab audio mixing, live transcription via Deepgram, speaker diarisation, and voice profile embeddings. Live sales coaching that tracks pitch coverage and surfaces client signals in real time. Cross-session memory with contradiction detection, visual timeline, and shared collaborator access.
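A minimal sketch of the "independently swappable stages" idea, assuming each stage is an async function joined to the next by an `asyncio.Queue`. This is not the Helpish code: `run_stage`, `run_pipeline`, and the three stand-in stages are hypothetical names for illustration, and the real stages (Deepgram transcription, analysis, storage) are replaced by trivial stubs.

```python
import asyncio
from typing import Awaitable, Callable, List

# A stage is any async callable taking one item and returning one item.
Stage = Callable[[object], Awaitable[object]]

_DONE = object()  # sentinel that tells each worker to shut down


async def run_stage(stage: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Pull items from inbox, apply the stage, push results to outbox."""
    while True:
        item = await inbox.get()
        if item is _DONE:
            await outbox.put(_DONE)  # propagate shutdown downstream
            return
        await outbox.put(await stage(item))


async def run_pipeline(stages: List[Stage], items: list) -> list:
    """Chain stages with queues; swapping a stage means swapping one list entry."""
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]
    workers = [
        asyncio.create_task(run_stage(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for item in items:
        await queues[0].put(item)
    await queues[0].put(_DONE)

    results = []
    while True:
        out = await queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    await asyncio.gather(*workers)
    return results


# Hypothetical stand-in stages (assumptions, not real service calls).
async def transcribe(chunk: bytes) -> str:
    return chunk.decode("utf-8")


async def analyse(text: str) -> dict:
    return {"text": text, "words": len(text.split())}


store: list = []


async def persist(record: dict) -> dict:
    store.append(record)
    return record


if __name__ == "__main__":
    print(asyncio.run(run_pipeline([transcribe, analyse, persist], [b"hello there"])))
```

Because each worker only sees its inbox and outbox, replacing `transcribe` with another implementation leaves the other stages untouched.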

Open Source

Chronicle · Self-hostable wearable AI backend · 60+ ★ · Active contributor
  • Added Google Drive, Dropbox, and WebSocket as first-class audio sources; co-designed plugin architecture enabling third-party apps to build on the voice AI stack
  • Built speaker diarisation + identification benchmark suite — measures accuracy, latency, and speaker confusion across diverse recordings
  • Implementing MCP connector support enabling Chronicle plugins to interact with external services via natural language tool calls

From Scratch

I implement papers to actually understand them. Each has unit tests validating against reference implementations.

Stable Diffusion: Full architecture in PyTorch
Transformers: Attention, positional encoding, encoder-decoder
LoRA: Low-rank adaptation from paper to training
BERT: Masked LM + NSP pretraining
Swin Transformer: Shifted window attention
CUDA / GPU Programming: Memory hierarchy, kernels, parallel patterns
SonicSpeech: Deep learning architectures for speech + audio
Backprop: Scalar autograd from scratch
Dillusion: SOTA diffusion paper implementations

Also: cross-lingual embedding alignment (English ↔ Hindi/Gujarati) via Procrustes alignment on FastText — word translation and multilingual similarity search in a shared vector space.
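A minimal sketch of the Procrustes step, assuming the two embedding matrices are already row-aligned by a seed dictionary (row i of the source is a translation of row i of the target). It uses NumPy and synthetic vectors rather than real FastText embeddings; `procrustes_align` and `nearest_neighbours` are hypothetical helper names, not taken from the project.

```python
import numpy as np


def procrustes_align(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal W minimising ||src @ W - tgt||_F.

    Closed form: with U, S, Vt = SVD(src.T @ tgt), the solution is W = U @ Vt.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def nearest_neighbours(queries: np.ndarray, w: np.ndarray, tgt_matrix: np.ndarray) -> np.ndarray:
    """Map source vectors into the target space and return, for each one,
    the index of the most cosine-similar target vector."""
    mapped = queries @ w
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    return np.argmax(mapped @ tgt.T, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tgt = rng.normal(size=(50, 8))                    # stand-in "target language" vectors
    rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
    src = tgt @ rotation.T                            # source space = rotated target space
    w = procrustes_align(src, tgt)
    print(np.allclose(src @ w, tgt))
```

Constraining W to be orthogonal preserves distances and angles within the source space, which is why the same mapping can serve both word translation and similarity search.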

Stack

ML & Training
PyTorch · LoRA · SFT fine-tuning · RAG · GraphRAG · agentic systems · eval pipelines
Speech & Audio
Deepgram · Whisper · VAD · speaker diarisation · ASR · voice-profile embeddings
Infrastructure
FastAPI · Redis · Docker · AWS · queue-based async
Low-level
CUDA · C/C++ · parallel programming

More

Hackathon winner ×10 — OMI Bangalore, Indo-Japan @ IIT Gandhinagar, Hack This Fall, and others
IEEE: APBTMS — computer vision for real-time bus passenger tracking and overcrowding detection, 2024
Cohere AI Gujarati LLM — top contributor to dataset creation and curation for a low-resource Indian language model
Currently exploring: concurrency-chaos — threads, async, queues, Redis, event-driven systems
Reading: Goodreads  ·  Hack demos: YouTube