
whoami

I’m a deep learning and computer vision enthusiast who loves building things that just might make life easier—or at least more interesting. If you spot me squinting at a screen, I’m probably wrangling neural networks or figuring out how to make an AI agent tap around an Android screen all by itself. Here’s a quick peek at my journey and some of the projects I’ve built along the way.


In a Nutshell


Toolbelt & Tech Playground


Some Projects & Creations

  1. Android Remote Control with VLM AI Agents
  2. Control VLM-LLM Agent Silently With Your Breath
  3. Create, Chat & AR Experience with AI-Character (Text2Room)
  4. Label and Inpaint Anything in a Room Interior
  5. Smart Drive for Smart City: Predict Optimal Speed
  6. Estimate Golf Ball Trajectory
  7. Pixel-Wise Segmentation of Spare Parts for 3D Printing
  8. Food Recognition App
  9. Python Library: AutoToloka
  10. Python Library: shiftlab-ocr
  11. Face Antispoofing & Multi-Modal Vision-Language Models
  12. GitHub Repo Summarizer (Chrome Extension)
  13. ChatGPT Scrollbar (Chrome Extension)

1. Android Remote Control with VLM AI Agents

“Hands-free” Android automation? Yes, please.
A custom Android app captures screenshots and sends them to vision-language AI agents, which determine the next UI action: tap, swipe, or type.
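
The "screenshot in, one UI action out" step can be pictured with a small parser for the agent's reply. This is only a minimal sketch, not the app's actual protocol: the JSON reply format and the names `UiAction` and `parse_action` are hypothetical, chosen purely for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class UiAction:
    """One UI step chosen by the agent (hypothetical schema)."""
    kind: str                                # "tap", "swipe", or "type"
    start: Optional[Tuple[int, int]] = None  # tap point or swipe start, in pixels
    end: Optional[Tuple[int, int]] = None    # swipe end, in pixels
    text: Optional[str] = None               # text to enter for "type"


def parse_action(reply: str) -> UiAction:
    """Validate the model's JSON reply and turn it into a UiAction."""
    data = json.loads(reply)
    kind = data.get("action")
    if kind == "tap":
        return UiAction("tap", start=(int(data["x"]), int(data["y"])))
    if kind == "swipe":
        return UiAction(
            "swipe",
            start=(int(data["x1"]), int(data["y1"])),
            end=(int(data["x2"]), int(data["y2"])),
        )
    if kind == "type":
        return UiAction("type", text=str(data["text"]))
    # Anything else is rejected instead of being guessed at.
    raise ValueError(f"unsupported action: {kind!r}")
```

Rejecting unknown actions keeps a malformed model reply from turning into an accidental tap on the device.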

Demo Link: View MP4 in Google Drive


2. Control VLM-LLM Agent Silently With Your Breath

Start or stop the neural network with your breath. “Start listening” might be 2–3 short exhalations, while a smooth exhalation says “stop.” It’s all about recognizing breathing patterns, not your voice. After calibration (reading text aloud vs. silently), it learns to detect words from the sounds of breathing or even sniffles.
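
The start/stop idea can be sketched with a toy rule on a loudness envelope. This is not the trained recognizer described above, just a hand-written stand-in: the function names, the threshold, and the frame counts (`short_max`, `long_min`) are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def exhalation_segments(envelope: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs where the envelope is above threshold."""
    segments = []
    start = None
    for i, value in enumerate(envelope):
        if value > threshold and start is None:
            start = i                      # an exhalation begins
        elif value <= threshold and start is not None:
            segments.append((start, i))    # the exhalation ended on the previous frame
            start = None
    if start is not None:                  # envelope ended while still exhaling
        segments.append((start, len(envelope)))
    return segments


def classify_breath(envelope: Sequence[float], threshold: float = 0.5,
                    short_max: int = 3, long_min: int = 6) -> str:
    """Map a breathing pattern to "start", "stop", or "unknown" (assumed rules)."""
    lengths = [end - start for start, end in exhalation_segments(envelope, threshold)]
    # Two or three short bursts: the "start listening" gesture.
    if 2 <= len(lengths) <= 3 and all(n <= short_max for n in lengths):
        return "start"
    # One long, smooth exhalation: the "stop" gesture.
    if len(lengths) == 1 and lengths[0] >= long_min:
        return "stop"
    return "unknown"
```

In practice the envelope would come from microphone frames, and the fixed thresholds would be replaced by whatever the calibration step learns.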

Demo Link: View GIF in Google Drive


3. Create, Chat & AR Experience with AI-Character (Text2Room)

Image & Video Generation • Inpainting • TryOn • Reasoning & More
Spin up an AI “character,” style them, dress them up, chat via Telegram, or even place them in your living room! Ideal for marketing campaigns, creative collaborations, or simply exploring next-gen generative AI.



4. Label and Inpaint Anything in a Room Interior

Label objects in a photo and then seamlessly inpaint them—complete with realistic shadows and lighting for interior makeovers.

Inpainting Demos (Google Drive):

Marble Floor with Reflections: Download sample

Original: Interior Example 1
Another Example: Interior Example 2

5. Smart Drive for Smart City: Predict Optimal Speed to the Nearest Traffic Light or Jam

Find the optimal speed for reaching the nearest traffic light. While driving, you may wonder whether to speed up a bit or slow down. The program predicts the ideal speed for your journey by calculating an optimal speed for each upcoming traffic light, or even for nearby traffic jams.
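
For a single traffic light, one simple notion of "optimal" is the highest allowed speed that still arrives while the light is green. The sketch below works under that assumption only; the function name `advisory_speed`, its parameters, and the idea that the green window is known in advance are illustrative and not taken from the project itself.

```python
from typing import Optional


def advisory_speed(distance_m: float, green_start_s: float, green_end_s: float,
                   v_min: float, v_max: float) -> Optional[float]:
    """Highest speed (m/s) in [v_min, v_max] that reaches the light during green.

    green_start_s and green_end_s are seconds from now; green_start_s == 0
    means the light is already green. Returns None if no speed in range works.
    """
    if distance_m <= 0 or not (0 <= green_start_s < green_end_s) or not (0 < v_min <= v_max):
        raise ValueError("invalid distance, green window, or speed range")
    # Arriving before the green phase means stopping, so cap the speed.
    fastest = v_max if green_start_s == 0 else min(v_max, distance_m / green_start_s)
    # Going slower than this would arrive after the green phase has ended.
    slowest = max(v_min, distance_m / green_end_s)
    return fastest if fastest >= slowest else None
```

For example, with the light 200 m ahead and green between 20 s and 40 s from now, `advisory_speed(200, 20, 40, 5, 16.7)` returns `10.0`: driving at 10 m/s arrives exactly as the light turns green.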



6. Estimate Golf Ball Trajectory

Analyze your golf swing or develop sports analytics solutions—this AI estimates the golf ball trajectory and more.



7. Pixel-Wise Segmentation of Spare Parts for 3D Printing

Precisely identify which parts need 3D printing or rework.

Local files:

Example 1: Key Segmentation 1
Example 2: Key Segmentation 2

8. Food Recognition App

When you want your phone to know what’s for dinner
An AI app that identifies food items (packaged or fresh) and performs OCR on labels.

Demo Link: View GIF in Google Drive


9. Python Library: AutoToloka

Speedy Dataset Prep & Crowdsourcing
A Python library to help set up and validate datasets using interactive segmentation and multi-modal networks under the hood.

AutoToloka on PyPI


10. Python Library: shiftlab-ocr

A library for handwritten text segmentation and character recognition.

shiftlab-ocr on PyPI


11. Face Antispoofing & Multi-Modal Vision-Language Models

This project experiments with CLIP and other multi-modal setups to tackle spoofing of face authentication, bridging text-image embeddings with specialized neural networks.
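
One common way to use text-image embeddings is zero-shot scoring: compare an image embedding against embeddings of text prompts and apply a softmax over the similarities. The sketch below shows that general technique only and is not the project's actual model. It assumes the embeddings were already produced by a CLIP-style encoder; the function names and the `temperature` value are assumptions.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def prompt_probabilities(image_emb: Sequence[float],
                         prompt_embs: Dict[str, Sequence[float]],
                         temperature: float = 0.07) -> Dict[str, float]:
    """Softmax over temperature-scaled cosine similarities, one entry per prompt."""
    scores = {label: cosine(image_emb, emb) / temperature
              for label, emb in prompt_embs.items()}
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - peak) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}
```

Here `prompt_embs` might hold one embedding for a "live face" prompt and one for a "spoof" prompt. A specialized network trained on top of such embeddings would replace this fixed softmax.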

YouTube Presentation


12. GitHub Repo Summarizer (Chrome Extension)

Speed-read Your Repositories
Fetches and summarizes the code structure of GitHub repositories using your locally stored GitHub personal access token—no servers involved.
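
As a rough sketch of the "summarize the code structure" step: the GitHub REST API's "Get a tree" endpoint, called recursively, returns entries that each carry a `path` and a `type` (`blob` for files, `tree` for directories). Whether the extension uses that endpoint is an assumption, and the real extension runs as browser JavaScript. The Python function `summarize_tree` below is a hypothetical illustration of the counting logic only.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict, List


def summarize_tree(entries: List[Dict[str, str]]) -> Dict[str, object]:
    """Count files by extension and by top-level directory.

    `entries` is a list of dicts with at least "path" and "type" keys,
    shaped like the items in a Git tree listing.
    """
    files = [e["path"] for e in entries if e.get("type") == "blob"]
    # Files without a suffix (e.g. "LICENSE", ".gitignore") are grouped as "(none)".
    by_ext = Counter(PurePosixPath(p).suffix or "(none)" for p in files)
    # Files in the repository root are grouped under ".".
    by_dir = Counter(p.split("/", 1)[0] if "/" in p else "." for p in files)
    return {
        "files": len(files),
        "by_extension": dict(by_ext),
        "by_top_level_dir": dict(by_dir),
    }
```

Because the function only looks at paths, the summary can be computed entirely on the client once the listing has been fetched.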

GitHub Repo Summarizer


13. ChatGPT Scrollbar (Chrome Extension)

Tired of endlessly scrolling through ChatGPT’s conversation feed?
This nifty extension adds a navigable scrollbar with clickable dashes for quick jumps.

Demo Link: View GIF in Google Drive

ChatGPT Scrollbar on Chrome Web Store


More Highlights

See more on LinkedIn


More Projects

GitHub: github.com/zack-dev-cm
GitHub: github.com/ZackPashkin

If you’re looking for:

Then let’s talk!

Email: kaisenaiko@gmail.com

Thanks for stopping by. Let’s see where AI can take us next!