Pratham Savaliya

ML Engineer · Bengaluru

I ship things that run in production, break them, fix them, and learn faster that way.

Shipped

Helpish Solo-built · Production · 1,000+ minutes processed

Meeting intelligence. Layered async pipeline — audio capture → transcription → analysis → persistence, each stage independently swappable. Browser recorder with mic + tab audio mixing, live transcription via Deepgram, speaker diarisation, and voice profile embeddings. Live sales coaching that tracks pitch coverage and surfaces client signals in real time. Cross-session memory with contradiction detection, visual timeline, and shared collaborator access.
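
A minimal sketch of the "independently swappable stages" idea follows. It assumes each stage is an async function wired to the next by an `asyncio.Queue`, with `None` as a shutdown sentinel. The stage names and string payloads are hypothetical stand-ins. The real pipeline's interfaces, Deepgram calls, and storage layer are not shown here.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence

# Hypothetical interface: a stage is any async function from one payload to the next.
Stage = Callable[[Any], Awaitable[Any]]


async def run_stage(stage: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Process items from inbox into outbox until the None shutdown sentinel arrives."""
    while True:
        item = await inbox.get()
        if item is None:
            await outbox.put(None)  # forward the sentinel so downstream stages also stop
            return
        await outbox.put(await stage(item))


async def run_pipeline(stages: Sequence[Stage], items: Sequence[Any]) -> list:
    """Chain stages with one queue between each pair and collect the final outputs in order."""
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]
    workers = [
        asyncio.create_task(run_stage(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for item in items:
        await queues[0].put(item)
    await queues[0].put(None)

    results = []
    while (out := await queues[-1].get()) is not None:
        results.append(out)
    await asyncio.gather(*workers)
    return results


# Stand-in stages; real ones would capture audio, call an ASR service, run analysis, write to a DB.
async def capture(chunk: str) -> str:
    return f"audio:{chunk}"

async def transcribe(audio: str) -> str:
    return audio.replace("audio:", "text:")

async def analyse(text: str) -> dict:
    return {"text": text, "chars": len(text)}

async def persist(record: dict) -> dict:
    return record  # a real stage would write the record somewhere


results = asyncio.run(run_pipeline([capture, transcribe, analyse, persist], ["hello", "world"]))

# Swapping one stage only requires another async function with the same shape.
async def transcribe_v2(audio: str) -> str:
    return audio.replace("audio:", "TEXT:")

swapped = asyncio.run(run_pipeline([capture, transcribe_v2, analyse, persist], ["hi"]))
```

Because every stage only sees its inbox and outbox, replacing `transcribe` with `transcribe_v2` touches nothing else. A production version would also need error handling, since an exception inside a stage would leave downstream queues waiting.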

Open Source

Chronicle Self-hostable wearable AI backend · 60+ ★ · Active contributor

Added Google Drive, Dropbox, and WebSocket as first-class audio sources; co-designed the plugin architecture that lets third-party apps build on the voice AI stack
Built speaker diarisation + identification benchmark suite — measures accuracy, latency, and speaker confusion across diverse recordings
Implementing MCP (Model Context Protocol) connector support so Chronicle plugins can interact with external services through natural-language tool calls
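
Speaker confusion is awkward to measure because diarisation labels are arbitrary: the system's "s1" might be the reference's "B". A minimal sketch of one such metric follows, assuming frame-aligned label sequences with exactly one speaker per frame (no overlap, missed speech, or false alarms). It is an illustration, not the benchmark suite's actual code. The brute-force search over label mappings is exponential in the number of speakers. A real benchmark would more likely use the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) or a library such as `pyannote.metrics`.

```python
from collections import Counter
from itertools import permutations


def speaker_confusion(reference: list[str], hypothesis: list[str]) -> float:
    """Fraction of frames attributed to the wrong speaker under the best one-to-one label mapping.

    Simplifying assumptions: the two label sequences are frame-aligned and every frame has
    exactly one speaker, so missed speech, false alarms, and overlap are not modelled.
    """
    if len(reference) != len(hypothesis) or not reference:
        raise ValueError("need two non-empty, frame-aligned label sequences")
    ref_spk = sorted(set(reference))
    hyp_spk = sorted(set(hypothesis))
    counts = Counter(zip(reference, hypothesis))  # (ref_label, hyp_label) -> co-occurring frames

    # Labels are arbitrary, so try every one-to-one mapping between the two label sets.
    if len(hyp_spk) <= len(ref_spk):
        mappings = (zip(perm, hyp_spk) for perm in permutations(ref_spk, len(hyp_spk)))
    else:
        mappings = (zip(ref_spk, perm) for perm in permutations(hyp_spk, len(ref_spk)))
    best = max(sum(counts[pair] for pair in mapping) for mapping in mappings)

    return (len(reference) - best) / len(reference)


ref = ["A", "A", "B", "B", "B"]
hyp = ["s1", "s1", "s2", "s2", "s1"]  # the last frame lands in the wrong cluster
print(speaker_confusion(ref, hyp))  # 0.2
```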

From Scratch

I implement papers to actually understand them. Each has unit tests validating against reference implementations.
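
As an illustration of that style of test, and not the author's actual suite, here is a minimal NumPy sketch for the LoRA entry. The adapted layer computes W0 x + (alpha / r) B A x with W0 frozen, A initialised from a Gaussian and B from zeros, as the LoRA paper describes. The test checks the low-rank forward pass against a dense "merged weight" reference. The class name, the 0.01 init scale, and the shapes are hypothetical choices.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W0 plus a trainable low-rank update: y = x W0^T + (alpha / r) x A^T B^T."""

    def __init__(self, W0: np.ndarray, r: int, alpha: float, rng: np.random.Generator):
        d_out, d_in = W0.shape
        self.W0 = W0                                     # frozen pretrained weight, (d_out, d_in)
        self.A = rng.normal(scale=0.01, size=(r, d_in))  # Gaussian init (0.01 is an arbitrary scale)
        self.B = np.zeros((d_out, r))                    # zero init, so the update starts at 0
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        """Forward pass for x of shape (batch, d_in), never forming B @ A explicitly."""
        return x @ self.W0.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged(self) -> np.ndarray:
        """Dense reference weight W0 + (alpha / r) B A."""
        return self.W0 + self.scale * self.B @ self.A


# Unit-test style checks against the dense reference.
rng = np.random.default_rng(0)
W0 = rng.normal(size=(4, 6))
layer = LoRALinear(W0, r=2, alpha=4.0, rng=rng)
x = rng.normal(size=(3, 6))

assert np.allclose(layer(x), x @ W0.T)              # zero-initialised B: identical to the base layer
layer.B = rng.normal(size=(4, 2))                   # pretend training has updated B
assert np.allclose(layer(x), x @ layer.merged().T)  # low-rank path matches the merged weight
```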

Stable Diffusion Full architecture in PyTorch

Transformers Attention, positional encoding, encoder-decoder

LoRA Low-rank adaptation from paper to training

BERT Masked LM + NSP pretraining

Swin Transformer Shifted window attention

CUDA / GPU Programming Memory hierarchy, kernels, parallel patterns

SonicSpeech Deep learning architectures for speech + audio

Backprop Scalar autograd from scratch

Dillusion SOTA diffusion paper implementations
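
The Backprop entry is about scalar reverse-mode autograd. Below is a minimal sketch of the idea in the style popularised by micrograd. It is an illustration, not the code from the repository. Each `Value` remembers its parents and a closure that pushes its gradient back to them. `backward()` seeds the output gradient with 1 and runs those closures in reverse topological order, so every node's gradient is complete before it is propagated further.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back through it."""

    def __init__(self, data: float, parents: tuple = ()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # replaced by the operation that creates this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    __radd__ = __add__  # lets plain numbers appear on the left, e.g. 1 + x
    __rmul__ = __mul__  # e.g. 2 * x

    def backward(self) -> None:
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)  # appended only after all of its inputs

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x, y = Value(2.0), Value(-3.0)
z = x * y + x  # dz/dx = y + 1, dz/dy = x
z.backward()
print(z.data, x.grad, y.grad)  # -4.0 -2.0 2.0
```

Gradients accumulate with `+=` because a node such as `x` can feed several operations; that is also why `x.grad` picks up contributions from both the product and the sum.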

Also: cross-lingual embedding alignment (English ↔ Hindi/Gujarati) via Procrustes alignment on FastText — word translation and multilingual similarity search in a shared vector space.
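
The Procrustes step has a closed form. Stack the seed translation pairs as rows of X (source language) and Y (target language). The orthogonal W minimising ‖XW − Y‖_F is then U Vᵀ, where U Σ Vᵀ is the SVD of XᵀY. A minimal NumPy sketch follows. It uses synthetic vectors instead of real FastText embeddings, and the function names are hypothetical. Real pipelines commonly length-normalise (and often mean-centre) the embeddings before fitting.

```python
import numpy as np


def procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal W minimising ||X @ W - Y||_F; row i of X and Y is one seed translation pair."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def nearest_words(query: np.ndarray, W: np.ndarray, tgt_emb: np.ndarray,
                  tgt_words: list[str], k: int = 1) -> list[str]:
    """Map a source vector into the target space and return the k most cosine-similar words."""
    mapped = query @ W
    sims = tgt_emb @ mapped / (np.linalg.norm(tgt_emb, axis=1) * np.linalg.norm(mapped))
    return [tgt_words[i] for i in np.argsort(-sims)[:k]]


# Synthetic check: the target space is a hidden rotation of the source space.
rng = np.random.default_rng(0)
d, n = 8, 50
X = rng.normal(size=(n, d))                   # stand-in source-language vectors
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # hidden orthogonal map between the spaces
Y = X @ Q                                     # stand-in target-language vectors
words = [f"w{i}" for i in range(n)]           # hypothetical target vocabulary

W = procrustes(X, Y)
translation = nearest_words(X[3], W, Y, words)
```

Restricting W to be orthogonal preserves distances and angles within the source space, so monolingual similarity structure survives the mapping and cosine search in the shared space stays meaningful.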

Stack

ML & Training

PyTorch LoRA SFT fine-tuning RAG GraphRAG agentic systems eval pipelines

Speech & Audio

Deepgram Whisper VAD speaker diarisation ASR voice-profile embeddings

Infrastructure

FastAPI Redis Docker AWS queue-based async

Low-level

CUDA C/C++ parallel programming

Hackathon winner ×10 — OMI Bangalore, Indo-Japan @ IIT Gandhinagar, Hack This Fall, and others

IEEE: APBTMS — computer vision for real-time bus passenger tracking and overcrowding detection, 2024

Cohere AI Gujarati LLM — top contributor to dataset creation and curation for a low-resource Indian language model

Currently exploring: concurrency-chaos — threads, async, queues, Redis, event-driven systems

Reading: Goodreads · Hack demos: YouTube
