From AI Research to Production: Optimizing Multimodal Retrieval, OCR, and System Design
A grounded technical breakdown of three multimodal AI systems I co-built in 2025: KPTER (SOICT 2025), ZSE-Cap (Top-4 @ ACM MM 2025), and a fine-tuned Sino-Nom OCR detector, plus the engineering lessons that survived past the leaderboard.
May 5, 2026