Portfolio case study
Android Remote Control with VLM AI Agents
Hands-free Android automation: server-side VLM agents decide the next tap, swipe, or text input.
Overview
An Android app streams screenshots to vision-language model (VLM) agents, which decide on and execute actions. It is built for real-time instruction following, automated testing, and accessibility and operations automation.
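The screenshot, decide, execute cycle described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `Action`, `run_episode`, the callback signatures, and the `max_steps` limit are all hypothetical names chosen for the example, and the VLM call is replaced by a scripted stand-in so the loop runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """Hypothetical action record: a kind such as "tap" plus its arguments."""
    kind: str
    args: dict = field(default_factory=dict)


def run_episode(
    instruction: str,
    capture: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat capture -> decide -> execute until decide returns None
    or max_steps actions have run. Returns the executed actions."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                       # latest frame from the device
        action = decide(instruction, screenshot, history)
        if action is None:                           # agent signals the task is done
            break
        execute(action)                              # forward the action to the device
        history.append(action)
    return history


# Stand-ins for the device and the VLM agent (assumptions for this sketch only).
def fake_capture() -> bytes:
    return b"png-bytes"


scripted = [Action("tap", {"x": 100, "y": 200}), Action("text", {"value": "hello"})]


def fake_decide(instruction: str, screenshot: bytes, history: List[Action]) -> Optional[Action]:
    # Replays a fixed script; a real agent would query a model here.
    return scripted[len(history)] if len(history) < len(scripted) else None


executed: List[Action] = []
history = run_episode("type hello", fake_capture, fake_decide, executed.append)
```

Passing the capture, decide, and execute steps in as callables keeps the loop testable without a device or a model, and the step limit guards against an agent that never reports completion.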
What It Covers
- Real-time instruction processing
- Automated testing & task automation
- Novel device interaction
Stack And Topics
- Android
- Vision-Language Models
- Server-side AI
Public Signals
- Components: 2 (Android client, Python server)
- Supported actions: 7 (tap, scroll, text, home, back, overview, screenshot)
- API endpoints: 8+, including healthz, metrics, devices, actions, screenshots, debug
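One way a server might validate an agent's output against the seven supported actions listed above is sketched below. The action names come from the list; everything else is an assumption made for illustration, including the `ActionType` enum, the `parse_action` helper, the payload shape, and the required field names (`x`, `y`, `direction`, `value`).

```python
from enum import Enum
from typing import Dict, Set, Tuple


class ActionType(str, Enum):
    """The seven action names from the case study."""
    TAP = "tap"
    SCROLL = "scroll"
    TEXT = "text"
    HOME = "home"
    BACK = "back"
    OVERVIEW = "overview"
    SCREENSHOT = "screenshot"


# Hypothetical required fields per action; the source does not specify these.
REQUIRED_FIELDS: Dict[ActionType, Set[str]] = {
    ActionType.TAP: {"x", "y"},
    ActionType.SCROLL: {"direction"},
    ActionType.TEXT: {"value"},
}


def parse_action(payload: dict) -> Tuple[ActionType, dict]:
    """Validate a payload such as {"type": "tap", "x": 1, "y": 2}.

    Returns the action type and its remaining arguments, or raises
    ValueError for an unknown type or a missing required field.
    """
    raw_type = payload.get("type")
    try:
        kind = ActionType(raw_type)
    except ValueError:
        raise ValueError(f"unsupported action: {raw_type!r}") from None

    missing = REQUIRED_FIELDS.get(kind, set()) - set(payload)
    if missing:
        raise ValueError(f"{kind.value} is missing fields: {sorted(missing)}")

    args = {k: v for k, v in payload.items() if k != "type"}
    return kind, args


kind, args = parse_action({"type": "tap", "x": 120, "y": 640})
```

Rejecting unknown or incomplete actions before they reach the device keeps a malformed model response from turning into an unintended gesture.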