
Description: An Android application for “hands-free” device automation. The app captures screenshots and sends them to server-based Vision-Language AI agents, which determine and execute the next UI action (tap, swipe, or type).
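
The capture, decide, and act cycle described above can be sketched roughly as follows. This is an illustrative Python outline only, not the project's actual code: the `Action` fields, the `run_agent_loop` function, the `"done"` stop signal, and the `max_steps` limit are all assumptions made for the example, and the screenshot capture, VLM call, and UI execution are passed in as plain callables so that no particular Android or model API is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Action kinds named in the description; "done" is an assumed stop signal.
UI_ACTIONS = {"tap", "swipe", "type"}


@dataclass
class Action:
    """One UI action. All field names are hypothetical."""
    kind: str                   # "tap", "swipe", "type", or "done"
    x: Optional[int] = None     # tap position or swipe start
    y: Optional[int] = None
    x2: Optional[int] = None    # swipe end
    y2: Optional[int] = None
    text: Optional[str] = None  # text to enter for "type"


def run_agent_loop(
    capture: Callable[[], bytes],
    decide: Callable[[bytes, str], Action],
    execute: Callable[[Action], None],
    instruction: str,
    max_steps: int = 10,
) -> List[Action]:
    """Repeat screenshot -> agent decision -> UI action until done."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                    # device side: grab the screen
        action = decide(screenshot, instruction)  # server side: VLM agent picks an action
        if action.kind == "done":
            break
        if action.kind not in UI_ACTIONS:
            raise ValueError(f"unsupported action kind: {action.kind!r}")
        execute(action)                           # device side: perform the action
        history.append(action)
    return history


if __name__ == "__main__":
    # Stand-ins for the real components, for demonstration only.
    scripted = iter([
        Action("tap", x=120, y=480),
        Action("type", text="hello"),
        Action("done"),
    ])
    performed = run_agent_loop(
        capture=lambda: b"fake-screenshot-bytes",
        decide=lambda shot, instr: next(scripted),
        execute=lambda action: print("executing", action),
        instruction="Send a greeting",
    )
    print(len(performed), "actions performed")
```

Keeping the three stages as injected callables makes the loop easy to exercise with scripted actions, which fits the automated-testing use case listed under Key Features.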

Key Features: Real-time instruction processing, automated testing, task automation, novel device interaction.

Tech Stack: Android, Vision-Language Models (VLMs), Server-side AI.

Demo: View MP4 on Google Drive

Android Remote Control with VLM AI Agents
