whoami
I’m a deep learning and computer vision enthusiast who loves building things that just might make life easier—or at least more interesting. If you spot me squinting at a screen, I’m probably wrangling neural networks or figuring out how to make an AI agent tap around an Android screen all by itself. Here’s a quick peek at my journey and some of the projects I’ve built along the way.
In a Nutshell
- Senior Deep Learning Engineer with 5+ years of experience, focusing on everything from classification and segmentation to OCR and multi-modal transformer reasoning models.
- Upwork Top Rated Plus contractor, recognized in the top 1% of AI developers, with a 100% Job Success Score.
- Mentor & Teacher: I’ve led corporate programs, taught undergraduates the fundamentals of computer vision, and discovered I really enjoy sharing knowledge.
Toolbelt & Tech Playground
- Frameworks/Libraries: PyTorch, TensorFlow, Keras, fastai, OpenAI APIs, CLIP, vision-language foundation models.
- Languages: Primarily Python, with supporting roles from Dart (Flutter), Kotlin/Java, Swift, and C++.
- DevOps & Infra: GCP, AWS, Docker, Kubernetes, Cloud Build, Cloud Run, and a dash of MLOps for good measure.
- Mobile & Embedded: TensorRT, TFLite, Core ML, ONNX, etc.
Some Projects & Creations
Quick Links to Projects
- Android Remote Control with VLM AI Agents
- Control VLM-LLM Agent Silently With Your Breath
- Create, Chat & AR Experience with AI-Character (Text2Room)
- Label and Inpaint Anything in a Room Interior
- Smart Drive for Smart City: Predict Optimal Speed
- Estimate Golf Ball Trajectory
- Pixel-Wise Segmentation of Spare Parts for 3D Printing
- Food Recognition App
- Python Library: AutoToloka
- Python Library: shiftlab-ocr
- Face Antispoofing & Multi-Modal Vision-Language Models
- GitHub Repo Summarizer (Chrome Extension)
- ChatGPT Scrollbar (Chrome Extension)
1. Android Remote Control with VLM AI Agents
“Hands-free” Android automation? Yes, please.
A custom Android app that captures screenshots and sends them to vision-language model (VLM) agents, which decide the next UI action: tap, swipe, or type.
- Real-Time: Receives step-by-step action instructions from server-hosted models.
- Use Cases: Automated testing, daily phone tasks, or exploring novel ways to control a device.
Demo Link: View MP4 in Google Drive
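The agent loop above ends with turning a model reply into a device command. Below is a minimal sketch, assuming the VLM replies with a JSON object such as `{"action": "tap", "x": 540, "y": 1200}`. That schema and the `action_to_adb_args` helper are hypothetical, and `adb shell input` stands in for whatever on-device mechanism the app actually uses to dispatch gestures.

```python
import json
import shlex


def action_to_adb_args(reply: str) -> list[str]:
    """Convert a (hypothetical) JSON action from the VLM into adb arguments."""
    action = json.loads(reply)
    kind = action["action"]
    if kind == "tap":
        coords = [action["x"], action["y"]]
        return ["adb", "shell", "input", "tap", *map(str, coords)]
    if kind == "swipe":
        coords = [action["x1"], action["y1"], action["x2"], action["y2"]]
        duration_ms = action.get("duration_ms", 300)  # assumed default gesture length
        return ["adb", "shell", "input", "swipe", *map(str, coords), str(duration_ms)]
    if kind == "type":
        # `input text` interprets %s as a space; quote the rest for the remote shell
        text = action["text"].replace(" ", "%s")
        return ["adb", "shell", "input", "text", shlex.quote(text)]
    raise ValueError(f"unsupported action: {kind!r}")
```

Running `subprocess.run(action_to_adb_args(reply), check=True)` against a connected device would then perform the action. Keeping parsing separate from execution makes it easy to validate or log model output before anything touches the screen.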
2. Control VLM-LLM Agent Silently With Your Breath
Start or stop the neural network with your breath: “start listening” might be 2–3 short exhalations, while a smooth exhalation means “stop.” It’s all about recognizing breathing patterns, not your voice. After calibration (reading text both aloud and silently), it learns to detect words from the sounds of breathing or even sniffles.
Demo Link: View GIF in Google Drive
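A rule-based stand-in for the learned breath classifier shows how the two gestures can be told apart: split a per-frame breath-energy envelope into bursts, then look at how many there are and how long they last. This is only a sketch; it assumes the energy values (for example, RMS of short microphone frames) are already computed, and the function names and frame thresholds are hypothetical rather than the project's trained model.

```python
def find_bursts(energy, threshold):
    """Return (start, length) for each run of frames whose energy exceeds threshold."""
    bursts, start = [], None
    for i, value in enumerate(energy):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            bursts.append((start, i - start))
            start = None
    if start is not None:  # burst still open at the end of the signal
        bursts.append((start, len(energy) - start))
    return bursts


def classify_breath(energy, threshold=0.5, short_max=5, long_min=15):
    """Map an energy envelope to 'start', 'stop', or None (hypothetical frame thresholds)."""
    lengths = [length for _, length in find_bursts(energy, threshold)]
    # 2-3 short exhalations -> start listening
    if 2 <= len(lengths) <= 3 and all(n <= short_max for n in lengths):
        return "start"
    # one long, smooth exhalation -> stop
    if len(lengths) == 1 and lengths[0] >= long_min:
        return "stop"
    return None
```

A trained model replaces these hand-set thresholds with patterns learned during calibration, but the input and output contract stays the same.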
3. Create, Chat & AR Experience with AI-Character (Text2Room)
Image & Video Generation • Inpainting • Try-On • Reasoning & More
Spin up an AI “character,” style them, dress them up, chat via Telegram, or even place them in your living room! Ideal for marketing campaigns, creative collaborations, or simply exploring next-gen generative AI.
4. Label and Inpaint Anything in a Room Interior
Label objects in a photo, then seamlessly inpaint them with realistic shadows and lighting for interior makeovers.
Inpainting Demos (Google Drive): Original, Another Example
5. Smart Drive for Smart City: Predict Optimal Speed to the Nearest Traffic Light or Jam
Find the optimal speed for reaching the nearest traffic light. For example, while driving you may wonder whether to speed up a bit or slow down: the program predicts the ideal speed for your journey, calculating it for each upcoming traffic light or even a nearby traffic jam.
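A simple kinematic version of the traffic-light case: if the light is green during the window `[t1, t2]` seconds from now and the car holds a constant speed `v` over distance `d`, it arrives after `d / v` seconds, so it needs `d / t2 <= v <= d / t1`. The sketch below is a hypothetical helper, not the project's predictive model; it intersects that interval with a speed limit and recommends the feasible speed closest to the current one, ignoring acceleration limits and traffic jams.

```python
def recommend_speed(distance_m, green_start_s, green_end_s, current_mps, limit_mps, min_mps=1.0):
    """Suggest a constant speed (m/s) that reaches the light while it is green.

    Arrival time is distance / v, so arriving inside [green_start, green_end]
    requires distance / green_end <= v <= distance / green_start.
    Assumes green_end_s > 0. Returns the feasible speed closest to the
    current one, or None if no legal constant speed catches this green phase.
    """
    low = max(distance_m / green_end_s, min_mps)
    # A light that is already green (start <= 0) imposes no upper bound besides the limit
    high = limit_mps if green_start_s <= 0 else min(distance_m / green_start_s, limit_mps)
    if low > high:
        return None
    # Clamp the current speed into [low, high]: the smallest change that still works
    return min(max(current_mps, low), high)
```

For example, `recommend_speed(200, 10, 30, 25, 16.7)` returns `16.7`: the driver should slow down to the limit, which still reaches the light about 12 s from now, inside the green window. Choosing the speed closest to the current one keeps the advice to "speed up a bit or slow down" as gentle as possible.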
6. Estimate Golf Ball Trajectory
Analyze your golf swing or build sports analytics solutions: this AI estimates the golf ball's trajectory and more.
7. Pixel-Wise Segmentation of Spare Parts for 3D Printing
Precisely identify which parts need 3D printing or rework.
Local files: Example 1, Example 2
8. Food Recognition App
When you want your phone to know what’s for dinner
An AI app that identifies food items (packaged or fresh) and performs OCR on labels.
- Nutritional Info: Extracts brand names, nutrient data, and portion sizes.
- Real-Time Performance: Over 90% accuracy, optimized for CPU/GPU inference.
- Cross-Platform: Available on both iOS & Android.
Demo Link: View GIF in Google Drive
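The nutrient-extraction step can be illustrated with a regular expression over OCR output. This is a minimal sketch, assuming the OCR model has already produced plain-text lines such as `Protein 12 g`. The keyword list, the pattern, and the `parse_nutrients` name are hypothetical and are not the app's actual parser.

```python
import re

# Hypothetical nutrient keywords; real labels vary by language and region.
NUTRIENTS = ("energy", "fat", "carbohydrate", "sugars", "protein", "salt")

# "<name> [:|-] <number>[.,<decimals>] <unit>", e.g. "Fat: 9,5 g"
LINE_RE = re.compile(
    r"^\s*([A-Za-z ]+?)\s*[:\-]?\s*(\d+(?:[.,]\d+)?)\s*(kcal|kj|mg|g)\b",
    re.IGNORECASE,
)


def parse_nutrients(ocr_text: str) -> dict:
    """Extract {nutrient: (value, unit)} from OCR'd label text."""
    found = {}
    for line in ocr_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        name = match.group(1).strip().lower()
        if name in NUTRIENTS:
            # Accept both decimal comma and decimal point
            value = float(match.group(2).replace(",", "."))
            found[name] = (value, match.group(3).lower())
    return found
```

Lines that match the shape but not a known nutrient (such as "Nutrition per 100 g") are skipped, which keeps serving-size headers out of the nutrient table.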
9. Python Library: AutoToloka
Speedy Dataset Prep & Crowdsourcing
A Python library that helps set up and validate datasets, with interactive segmentation and multi-modal networks under the hood.
- Reduces Labeling Costs: Automates a significant portion of manual labeling.
- Scalable: Easily integrates with pipeline tools, containerized deployments, and major cloud providers.
10. Python Library: shiftlab-ocr
A library for handwritten text segmentation and character recognition.
11. Face Antispoofing & Multi-Modal Vision-Language Models
This project experiments with CLIP and other multi-modal setups to tackle face-authentication spoofing, bridging text-image embeddings with specialized neural networks.
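The text-image bridging idea can be illustrated with CLIP-style zero-shot scoring: compare an image embedding with embeddings of text prompts such as "a photo of a real face" and "a photo of a printed face", then apply a softmax to the scaled cosine similarities. This is a minimal sketch, assuming the embeddings have already been produced by a CLIP encoder. The prompts, toy vectors, and function names are hypothetical, and the project combines such embeddings with specialized networks rather than relying on this score alone.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_probs(image_emb, prompt_embs, scale=100.0):
    """Softmax over scaled cosine similarities between one image and each prompt.

    scale plays the role of CLIP's temperature (logit scale); 100 is an assumed value.
    """
    logits = [scale * cosine(image_emb, p) for p in prompt_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With prompt embeddings ordered as `[real_face, spoof]`, the first probability acts as a liveness score. A fixed prompt pair is brittle, which is one reason to feed these embeddings into a trained head instead.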
12. GitHub Repo Summarizer (Chrome Extension)
Speed-read Your Repositories
Fetches and summarizes the code structure of GitHub repositories using your locally stored GitHub personal access token, with no third-party servers involved.
- Privacy First: Your token remains on your device.
- Auto Summaries: Quickly see how a repository is organized, from directories to key code files.
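The summarization step can be sketched with GitHub's Git Trees API, which returns every path in a repository when called with `recursive=1`. This is a Python sketch (the extension itself runs in the browser), and it assumes the token is passed in from local storage. `fetch_tree` and `summarize_tree` are hypothetical helpers, and a real summary would go further than counting files per directory and extension.

```python
import json
import urllib.request
from collections import Counter
from pathlib import PurePosixPath


def fetch_tree(owner, repo, ref, token):
    """Fetch the full file tree for a branch, tag, or SHA via the Git Trees API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"
    request = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",  # the token goes straight to api.github.com
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_tree(tree_json):
    """Count files (blobs) per top-level directory and per file extension."""
    files = [entry["path"] for entry in tree_json["tree"] if entry["type"] == "blob"]
    by_dir = Counter(path.split("/")[0] if "/" in path else "." for path in files)
    by_ext = Counter(PurePosixPath(path).suffix or "(none)" for path in files)
    return {"files": len(files), "by_dir": dict(by_dir), "by_ext": dict(by_ext)}
```

For very large repositories the API may set `truncated` to true in the response, in which case a complete summary would need to walk subtrees individually.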
13. ChatGPT Scrollbar (Chrome Extension)
Tired of endlessly scrolling through ChatGPT’s conversation feed?
This nifty extension adds a navigable scrollbar with clickable dashes for quick jumps.
- Local-Only Storage: No external data collection.
- Auto-Hide Feature: Keeps your screen tidy when not in use.
Demo Link: View GIF in Google Drive
ChatGPT Scrollbar on Chrome Web Store
More Highlights
- Top 1% on Upwork for AI/ML tasks.
- Mentored teams at corporate events, universities, and School of AI chapters.
- Hackathon Finalist: Recognized in competitions like Digital Transformation and PicsArt AI.
More Projects
GitHub: github.com/zack-dev-cm
GitHub: github.com/ZackPashkin
If you’re looking for:
- Custom AI Solutions (computer vision, NLP, or multi-modal)
- Mobile & Embedded Model Optimization
- MLOps for GCP/AWS or on-prem solutions
Then let’s talk!
Email: kaisenaiko@gmail.com
Thanks for stopping by. Let’s see where AI can take us next!