Description: An Android application for hands-free device automation. It captures screenshots and sends them to server-side vision-language AI agents, which determine and execute the next UI action (tap, swipe, or type).
Key Features: Real-time instruction processing, automated testing, task automation, novel device interaction.
Tech Stack: Android, Vision-Language Models (VLMs), Server-side AI.
Demo: View MP4 on Google Drive
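
The capture, decide, and execute cycle in the description can be sketched roughly as below. This is a minimal illustration and not the project's actual code: the JSON action schema, the `done` stop signal, the step limit, and the names `Action`, `parse_action`, and `run_loop` are all assumptions made for the example. The screenshot capture, the model call, and the on-device executor are passed in as plain callables so the loop stays independent of any particular Android or server API.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """Hypothetical action schema; the source names only tap, swipe, and type."""
    kind: str                  # "tap", "swipe", "type", or the assumed "done"
    x: Optional[int] = None    # tap point, or swipe start
    y: Optional[int] = None
    x2: Optional[int] = None   # swipe end
    y2: Optional[int] = None
    text: Optional[str] = None  # text to enter for "type"


# "done" is an assumed stop signal, not something the source describes.
ALLOWED_KINDS = {"tap", "swipe", "type", "done"}


def parse_action(raw: str) -> Action:
    """Turn the agent's JSON reply into an Action, rejecting unknown kinds."""
    data = json.loads(raw)
    kind = data.get("kind")
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported action: {kind!r}")
    fields = {k: data[k] for k in ("x", "y", "x2", "y2", "text") if k in data}
    return Action(kind=kind, **fields)


def run_loop(
    instruction: str,
    capture: Callable[[], bytes],         # returns the current screenshot
    decide: Callable[[str, bytes], str],  # server-side agent: JSON reply
    execute: Callable[[Action], None],    # performs the action on the device
    max_steps: int = 20,                  # assumed safety limit
) -> List[Action]:
    """Repeat capture -> decide -> execute until "done" or the step limit."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()
        action = parse_action(decide(instruction, screenshot))
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history
```

Validating the reply before executing it keeps a malformed or unexpected model output from reaching the device, and the step limit stops the loop if the agent never signals completion.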