Portfolio case study

Android Remote Control with VLM AI Agents

Hands-free Android automation in which server-side VLM agents decide the next tap, swipe, or text entry.

Overview

An Android app streams screenshots to server-side vision-language model (VLM) agents, which decide the next action and send it back to the device for execution. Built for real-time instruction following, automated testing, and accessibility and operations automation.
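The screenshot, decide, execute cycle described above can be sketched as a simple loop. This is a minimal illustration, not the project's actual code: `Action`, `run_episode`, and `scripted_policy` are hypothetical names, and the scripted policy stands in for a real VLM call so the example runs without a model or a device.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One device action chosen by the agent (hypothetical schema)."""
    kind: str                                  # e.g. "tap", "text", "back"
    params: dict = field(default_factory=dict)


# A policy maps (instruction, screenshot bytes, history) to the next action,
# or returns None when it considers the task finished.
Policy = Callable[[str, bytes, List[Action]], Optional[Action]]


def run_episode(
    instruction: str,
    capture: Callable[[], bytes],
    execute: Callable[[Action], None],
    policy: Policy,
    max_steps: int = 10,
) -> List[Action]:
    """Repeat capture -> decide -> execute until the policy stops or max_steps is hit."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                 # the client would upload this frame
        action = policy(instruction, screenshot, history)
        if action is None:                     # agent signals completion
            break
        execute(action)                        # the client would perform this on the device
        history.append(action)
    return history


def scripted_policy(instruction: str, screenshot: bytes, history: List[Action]) -> Optional[Action]:
    """Stand-in for a VLM: replays a fixed two-step script (assumption for the demo)."""
    script = [
        Action("tap", {"x": 120, "y": 480}),
        Action("text", {"value": "hello"}),
    ]
    return script[len(history)] if len(history) < len(script) else None
```

In a real deployment `capture` and `execute` would talk to the Android client and `policy` would call the model; the `max_steps` cap is one way to keep a confused agent from looping indefinitely.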

What It Covers

  • Real-time instruction processing
  • Automated testing & task automation
  • Novel device interaction

Stack And Topics

  • Android
  • Vision-Language Models
  • Server-side AI

Public Signals

  • Components: 2 (Android client + Python server)
  • Supported actions: 7 (tap, scroll, text, home, back, overview, screenshot)
  • API endpoints: 8+ (including healthz, metrics, devices, actions, screenshots, debug)
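As an illustration of how a server might check an action request before forwarding it to a device, here is a minimal validation sketch covering the seven supported action types listed above. The payload shape and the parameter names (`type`, `x`, `y`, `direction`, `value`) are assumptions made for the example; the source does not specify the actual schema.

```python
# Required parameters per action type. The seven type names come from the
# list above; the parameter names are assumed for illustration.
REQUIRED_PARAMS = {
    "tap": {"x", "y"},
    "scroll": {"direction"},
    "text": {"value"},
    "home": set(),
    "back": set(),
    "overview": set(),
    "screenshot": set(),
}


def validate_action(payload: dict) -> list:
    """Return a list of human-readable errors; an empty list means the payload is valid."""
    kind = payload.get("type")
    if kind not in REQUIRED_PARAMS:
        return [f"unsupported action type: {kind!r}"]
    # Report missing parameters in sorted order so the output is deterministic.
    missing = sorted(REQUIRED_PARAMS[kind] - set(payload))
    return [f"missing parameter: {name}" for name in missing]
```

Rejecting unknown types up front keeps a model's malformed output from reaching the device, which matters when the action is produced by a VLM rather than typed by a person.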
