Multimodal Video Search Platform

Video search case study combining keyframes, ASR/OCR, object and face signals, visual embeddings, transcript embeddings, and hybrid retrieval.

Overview

Multimodal Video Search Platform is a case study for search across video and rich media. The system normalizes uploads, extracts keyframes, runs transcript (ASR) and OCR processing, maintains visual and text embeddings, writes dense and sparse indexes, and serves ranked results through calibrated hybrid retrieval. This public entry covers architecture, agent responsibilities, benchmark posture, and recovery paths, and it draws only on sanitized architecture notes.
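As a rough illustration of the ingestion flow described above, the sketch below runs a few stages in order and skips any stage already marked complete, which is one simple way a recovery path could resume a partially processed upload. Every name here (`VideoAsset`, `extract_keyframes`, `run_asr`, `run_ocr`, `run_pipeline`) is hypothetical, and the stage bodies are placeholders rather than real decoding, ASR, or OCR calls.

```python
from dataclasses import dataclass, field


@dataclass
class VideoAsset:
    """Hypothetical record carrying extracted signals through the pipeline."""
    video_id: str
    keyframes: list[str] = field(default_factory=list)
    transcript: str = ""
    ocr_text: str = ""
    completed: list[str] = field(default_factory=list)  # stage names already done


def extract_keyframes(asset: VideoAsset) -> None:
    # Placeholder: a real stage would decode the video and sample frames.
    asset.keyframes = [f"{asset.video_id}/frame_{i:04d}.jpg" for i in range(3)]


def run_asr(asset: VideoAsset) -> None:
    # Placeholder: a real stage would call a speech recognition model.
    asset.transcript = "placeholder transcript"


def run_ocr(asset: VideoAsset) -> None:
    # Placeholder: a real stage would read on-screen text from keyframes.
    asset.ocr_text = "placeholder on-screen text"


# Assumed stage order; the source does not specify one.
STAGES = [
    ("keyframes", extract_keyframes),
    ("asr", run_asr),
    ("ocr", run_ocr),
]


def run_pipeline(asset: VideoAsset) -> VideoAsset:
    """Run each stage once, skipping stages recorded as completed."""
    for name, stage in STAGES:
        if name in asset.completed:
            continue  # resume point: do not redo finished work
        stage(asset)
        asset.completed.append(name)
    return asset


fresh = run_pipeline(VideoAsset("vid1"))
resumed = run_pipeline(VideoAsset("vid2", completed=["keyframes"]))
```

In this sketch `resumed` keeps an empty keyframe list because that stage was recorded as done before the run. A real system would persist the `completed` markers (for example in Postgres) instead of holding them in memory.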

What It Covers

Combines keyframe extraction, ASR/OCR, visual embeddings, transcript embeddings, object signals, and face signals
Uses dense vector retrieval and sparse search together instead of relying on a single modality
Adds quality-agent-style regression checks for hybrid retrieval, ASR/OCR coverage, and recovery workflows
Uses sanitized architecture diagrams, metrics posture, and recovery notes for public review
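
The combination of dense and sparse retrieval mentioned above can be sketched with reciprocal rank fusion (RRF), a common way to merge ranked lists without comparing raw scores. The case study does not say which fusion or calibration method it uses, so treat this as an illustrative stand-in. The function name, the clip ids, and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    Each id receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. A larger k flattens the gap between ranks.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a dense vector index and a sparse text index.
dense_hits = ["clip_07", "clip_02", "clip_11"]
sparse_hits = ["clip_02", "clip_05", "clip_07"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Clips that appear in both lists rise to the top, which is the behavior a hybrid setup is usually after: agreement between modalities counts for more than a strong rank in only one of them.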

Stack And Topics

Python
FastAPI
Qdrant
Postgres
CLIP
OCR
ASR
Hybrid Search
Celery

Public Signals

Signal lanes: 5 (keyframes, ASR, OCR, objects, faces)
Index types: 2 (dense vector and sparse retrieval)
Agent roles: 5 (ingestion, embedding, retrieval, quality, recovery)
Metric posture: sample benchmark used as a regression signal, not a production accuracy claim
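
A regression signal of this kind could look like the following minimal sketch, which computes mean recall@k over a small sample of queries and compares it with a stored baseline. The metric choice, the tolerance, the sample data, and all function names are assumptions. The source only states that the benchmark serves as a regression signal.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


def passes_regression(current, baseline, tolerance=0.02):
    """True when the current score has not dropped more than `tolerance`."""
    return current >= baseline - tolerance


# Hypothetical sample benchmark: two queries with labeled relevant clips.
sample_runs = [
    (["a", "b", "c"], {"a", "c"}),  # both relevant clips found
    (["d", "e", "f"], {"d", "x"}),  # one of two relevant clips found
]

score = mean_recall_at_k(sample_runs, k=3)
ok = passes_regression(score, baseline=0.76)
```

A check like this flags drops relative to an earlier run on the same sample. It says nothing about accuracy on production traffic, which matches the posture stated above.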
