Kaushik Bokka | Advicehub

About my work

I build systems where models meet meaning. Over the past few years, I’ve worked across open source, devtools, consumer, and multimodal AI, from designing frameworks that help developers integrate Visual Language Models into production to building narrative-aware agents that understand long videos and movies.

My work sits at the intersection of multimodal AI (VLMs), LLM agents, and real-world product systems — not just models, but systems that actually ship and scale.

Selected experience across open source, startups, and research systems:

Senior Research Engineer & maintainer for PyTorch Lightning (~5M monthly downloads), where I led distributed training features like TPU Pod support
Built evaluation and retrieval systems for long video understanding at Rumi Labs (a16z-backed), including multi-agent pipelines and multi-context RAG
Co-developed VLM Run’s vision MCP server and schema-driven extraction systems for structured data from images, PDFs, and video
Co-founded Sequels AI, an agentic AI media OS generating contextual content in real time, later licensed to an a16z-backed startup

What I can help you with:

Multimodal AI systems (Vision + Language)

Extract structured data from images, documents, or video
Build pipelines that actually work in production (not just demos)

LLM agents & workflows

LangGraph / agent orchestration
Tool-using agents, evaluation, and reliability

AI system architecture

Designing scalable backends (FastAPI, queues, async pipelines)
RAG systems with real-world constraints (latency, retrieval quality, cost)

Evaluation & reliability

Build eval systems for LLMs, agents, or multimodal models
Define metrics that actually reflect product performance

How I typically work:

Turn vague ideas into clear system designs
Identify what’s actually hard vs hype
Help you ship faster with fewer iterations

If you’re building:

AI products using real-world data
Multimodal systems (vision, video, documents)
Agent-based workflows
Developer tools for AI

I can help you go from idea → architecture → working system.

Experience

AI Consultant

Current

Reiko | 2024-12-01 – 2026-04-11

Built evaluation, retrieval, and RL systems for multimodal AI, with a focus on long video understanding, agentic pipelines, and computer use agents.

Rumi Labs (a16z-backed)

Built RUMI-EVALS, an automated evaluation framework for long video understanding that generates high-quality MCQs from scene annotations to test narrative, temporal, and character reasoning

Developed a LangGraph-based multi-agent pipeline for question synthesis, tournament-based scene ranking, and multi-context evaluation across modalities

Architected a RAG system with dual-context retrieval (temporal + semantic), query intent classification, and multi-vector embeddings (narrative/sensory/spatial/emotional)
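
As a rough illustration of the dual-context retrieval idea above (not the Rumi Labs implementation), the sketch below blends a semantic similarity score with a temporal proximity score. The `Scene` type, the hand-written toy embeddings, the `alpha` weight, and the exponential decay window are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Scene:
    scene_id: str
    start_s: float          # scene start time in seconds
    embedding: list[float]  # toy semantic embedding (hand-written, hypothetical)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def temporal_score(scene: Scene, query_time_s: float, window_s: float = 300.0) -> float:
    """Exponential decay with distance from the query timestamp (assumed form)."""
    return math.exp(-abs(scene.start_s - query_time_s) / window_s)


def retrieve(scenes, query_embedding, query_time_s, alpha=0.5, k=2):
    """Rank scenes by alpha * semantic + (1 - alpha) * temporal and keep the top k."""
    def score(scene: Scene) -> float:
        semantic = cosine(scene.embedding, query_embedding)
        temporal = temporal_score(scene, query_time_s)
        return alpha * semantic + (1 - alpha) * temporal

    return sorted(scenes, key=score, reverse=True)[:k]


scenes = [
    Scene("opening", 0.0, [1.0, 0.0]),
    Scene("midpoint", 3600.0, [0.0, 1.0]),
    Scene("finale", 7000.0, [1.0, 0.0]),
]

# alpha=1.0 uses only semantic similarity; alpha=0.0 uses only temporal proximity.
semantic_first = retrieve(scenes, [1.0, 0.0], query_time_s=3600.0, alpha=1.0)
temporal_first = retrieve(scenes, [1.0, 0.0], query_time_s=3600.0, alpha=0.0)
```

In a setup like this, a query intent classifier could plausibly choose `alpha` per query, for instance weighting the temporal score more heavily for "what happened just before this?" questions.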

AGI Inc (Menlo-backed)

Designed evaluation + RL systems for computer-use agents, focusing on trajectory-based evaluation, reward modeling, and measuring task success, efficiency, and robustness
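
To make the three measures named above concrete, here is a minimal sketch of trajectory-level metrics for task success, efficiency, and robustness. It is an illustration under stated assumptions, not the AGI Inc system: the `Trajectory` fields, the availability of a reference `optimal_steps` count, and the exact metric definitions are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trajectory:
    task_id: str
    steps: int          # actions the agent actually took
    optimal_steps: int  # reference step count (assumed to be known)
    succeeded: bool


def success_rate(trajs) -> float:
    """Fraction of trajectories that completed their task."""
    return mean(1.0 if t.succeeded else 0.0 for t in trajs)


def efficiency(trajs) -> float:
    """Mean optimal/actual step ratio over successful runs; 0.0 if none succeeded."""
    ratios = [
        min(1.0, t.optimal_steps / t.steps)
        for t in trajs
        if t.succeeded and t.steps > 0
    ]
    return mean(ratios) if ratios else 0.0


def robustness(trajs) -> float:
    """Fraction of tasks that succeed on every repeated attempt."""
    by_task: dict[str, list[bool]] = {}
    for t in trajs:
        by_task.setdefault(t.task_id, []).append(t.succeeded)
    return mean(1.0 if all(results) else 0.0 for results in by_task.values())


runs = [
    Trajectory("book_flight", steps=10, optimal_steps=5, succeeded=True),
    Trajectory("book_flight", steps=5, optimal_steps=5, succeeded=True),
    Trajectory("fill_form", steps=8, optimal_steps=4, succeeded=True),
    Trajectory("fill_form", steps=12, optimal_steps=4, succeeded=False),
]

report = {
    "success_rate": success_rate(runs),
    "efficiency": efficiency(runs),
    "robustness": robustness(runs),
}
```

Reporting the three numbers separately, rather than as one blended score, keeps it visible when an agent succeeds often but wastefully or inconsistently.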

VLM Run (SPC-backed)

Co-developed the MCP server for vision, enabling LLM agents to process and extract structured data from visual inputs

Built a scalable video transcription backend using queue-based pipelines for high-throughput processing

Led development of open-source tooling (SDK, hub, cookbooks) to improve developer adoption

Implemented schema-driven extraction (Pydantic + GraphQL-style filtering) with autocasting and visualization

Built agentic systems for extracting structured data from complex documents (multi-layout tables)

Guardrails AI (Zetta Ventures-backed)

Built integrations across LlamaIndex, LangChain, NVIDIA NeMo Guardrails, and Portkey

Improved traceability via LangSmith + RunnableConfig

Led open-source growth strategy and contributor experience improvements

Co-Founder & Head of AI

Sequels AI | 2023-05-11 – 2024-11-11

Licensed the tech to an a16z-backed startup.

An agentic AI television OS that enhances the viewing experience through contextual content generation and personalized entertainment ecosystems.

Product Vision & Leadership

Co-founded an AI-driven entertainment platform that generates contextually relevant content, adapts to users' real-time viewing and emotional states, and integrates social elements and universal remote control capabilities

Built and led a cross-functional team of 7+ engineers recruited from Meta, Twitch, IBM, OKX, ShareChat, etc.

Core AI & ML Systems

Developed the Perceiver Engine, an agentic research framework that generates contextual "bullets" for media scenes. For example, it produced 140+ diverse content blocks for a complex scene such as Einstein and Oppenheimer's first encounter.

Created Subplot Story Tree algorithm to decompose complex narratives into hierarchical subplots, enabling enhanced story comprehension and detailed character analysis

Engineered content scene mapping infrastructure using fine-tuned transformer models to associate generated content with precise timestamps, delivering contextually relevant information at exact viewing moments

Implemented the LLMOps platform, ensuring efficient, reliable, and scalable LLM utilization across production systems

Platform & Infrastructure

Built Cosmo Observability Platform as a consumer app proxy, providing comprehensive analytics on movie titles and content performance to drive ML system optimization

Deployed Workflow Orchestration Platform using self-hosted Prefect for automated data retrieval, ingestion, and research pipeline management

Developed a comprehensive content management platform with FastAPI backend for optimized CRUD operations and a Python client interface for seamless research pipeline integration

Product Features & Applications

Created "Catch Me Up!" interactive feature delivering personalized recaps, character insights, and thematic breakdowns for enhanced viewer engagement

Spearheaded the Sports Match Post Generator System, automating engaging content creation for pre-game, in-game, and post-game coverage across international sports events

Senior Research Engineer & Open Source Maintainer

Lightning AI | 2021-01-11 – 2023-01-11

Lightning AI is a leading MLOps startup that makes it easy to build, train, fine-tune, and deploy models.

PyTorch Lightning is a lightweight PyTorch framework for high-performance AI research (~31k stars, ~5M monthly downloads).

Collaborated on and led features with the PyTorch team at Meta, the TPU XLA team at Google, the AI Economist team at Salesforce, the Habana team at Intel, and the SageMaker team at Amazon, among others.

Pioneered the TPU Pod training feature in Lightning, making it the first PyTorch framework to offer this capability.

Spearheaded initial development of the Generative AI Muse App.

Core contributor to the new stable accelerator and strategy API.

Led the stable development for training on TPU Accelerator.

Drove integration of Habana Accelerator with Lightning.

Led the WarpDrive Lightning integration with Salesforce.

Presented at major technical conferences including ODSC Boston, ODSC Europe, ODSC West, Geekle Data Science Summit, and FOSS United, among others.

Led the integration of Rich Progress Bar for PyTorch Lightning.

Co-led the initial Lightning Documentation Revamp.

Established a procedural pipeline for app development and submission with the product team.

Developed multiple Lightning apps, including Video Search and HackerNews App.

Worked on integrating the SageMaker Distributed Training Strategy.

Added support for the Object Detection Task in Lightning Flash.

Consistently contributed to Lightning projects through feature development, bug fixes, code reviews, releases, refactors, and technical blog posts.