Surfaced
AI Search Visibility scanner + GEO content methodology.
Measures where a brand is missing from Google's AI Overview and generates content to fill the gap.
Stack: early R&D, not finalized.
Solo · R&D phase.
I build AI products end-to-end.
From the business need down to the infrastructure.
From the business need, through the agentic flows that run them, down to the infrastructure that makes them work: inference, proxies, data. Most people do one layer; I build the whole stack. 79 PRs into vllm-mlx and a self-built LTE proxy pool are how far down that goes. Exit Hosting90 (2020).
I founded Hosting90 in 2002. Eighteen years of building from garage to a 25-person team to an international exit to WY Group in 2020.
Then I took a year off. Went deeper into the AI/ML stack — local LLM models, agentic workflows, inference infrastructure. Realized there's a massive gap between what AI labs publish and what a solo founder can actually run on their own infrastructure.
That gap is what interests me most right now.
So that's what I do now: I take a business need and build the whole working system around it — the agentic flows that run it and the infrastructure underneath. Most people do one layer; I span the full stack, from need to infra. As CTO at MirandaMedia Group I built and led the technical architecture and AI stack of three production AI products — Advanty, Margly, and Discury — agentic systems that make decisions, call tools, and carry out multi-step work on their own. Today I build independently (Surfaced, R&D) and contribute to vllm-mlx.
That span shows in how I chose the inference stack per workload across those three: Advanty's batch work ran fully on owned inference (Qwen 3.6 on vllm-mlx, Apple Silicon); Margly ran on frontier cloud (Google AI) for the reliability its agent orchestration needed; Discury orchestrated both. The 79 PRs I've merged into vllm-mlx — and the LTE proxy pool I built for data access — are how far down the stack I go when the economics demand it.
Alongside my own work, I do audits of inference economics and agentic workflows for AI startups and tech companies, and I'm open to advisory and fractional-CTO engagements.
An open question I'm working through: I'm building Surfaced to apply GEO (Generative Engine Optimization) — getting cited in Google's AI Overview — but I don't yet know how reproducible that is across different niches. Until I have my own proven case studies, it stays a project in development; a content offering without its own track record is just selling promises.
Based near Prague. Czech and English (written). I publish about LLM economics and infrastructure patterns.

I build across the whole stack — the agentic flows that run a product, and the inference, proxy, and data infrastructure underneath them. Here's what that looks like at the orchestration layer; the infrastructure proof is below.
Routing requests between local models (Qwen, Gemma) and cloud APIs (Claude, GPT) based on task complexity + cost. Production cost savings of 60–80% vs. pure cloud setup.
→ Built into Discury's high-volume agent tasks — owned Qwen 3.6, with frontier cloud models only where task quality demanded the premium.
No brittle prompt chains. Agent receives a tool set, decides on its own. Requires a strong reasoning model + correct tool granularity. Lessons learned from production deployment.
→ Built into Margly for autonomous multi-step orchestration over merchant order, cost, and ad data.
Model Context Protocol as foundation for tool integration. Practical patterns for context management, error recovery, and debugging multi-step agents.
Tool calling loops, hallucinated calls, context window poisoning, infinite retry loops. What I've seen in production and how to fix it.
→ Patterns derived from building and running three production AI products as CTO.
Prefill vs generation cost. KV cache reuse. Speculative decoding for agent loops. Practical ROI analyses.
→ Why Advanty and Discury were built on owned inference — measured ROI on M3 Ultra vs. cloud API per task class; local inference automatically failed over to public cloud when unavailable.
I write about this regularly. If you have a production agentic workflow that's bleeding tokens or has failure mode issues, get in touch →
AI Search Visibility scanner + GEO content methodology.
Measures where a brand is missing from Google's AI Overview and generates content to fill the gap.
Stack: early R&D, not finalized.
Solo · R&D phase.
Three production AI products where I built and led the technical architecture and AI stack. Past role — not products of mine today.
AI-powered competitive intelligence for marketing agencies.
Agents auto-tag ads, extract hooks, classify CTAs, and tag creatives — all as reliable structured outputs.
Stack: Qwen 3.6 on vllm-mlx (Apple Silicon M3 Ultra). A batch-friendly workload with reliable structured outputs — owned inference made sense economically and operationally.
Built and led the technical architecture and AI stack as CTO at MirandaMedia Group.
Shoptet e-commerce analytics for online merchants.
AI agents identify margin leaks, recommend pricing changes, auto-tag transactions, and run multi-step orchestration over orders, shipping, ad costs, and returns.
Stack: Google AI (Gemini). Chosen deliberately — Margly's complex multi-step tool calling and autonomous orchestration required frontier-model reliability that open-weights models didn't yet match at this task class.
Built and led the technical architecture and AI stack as CTO at MirandaMedia Group.
Customer intelligence — mines Reddit, Hacker News, and Product Hunt for pain points, trends, and market gaps.
Discovery and classification agents surface signals at high volume; summarization agents distill the nuance worth acting on.
Stack: hybrid orchestration. Discovery and classification agents on Qwen 3.6 / vllm-mlx (high-volume, batch-tolerant); final summarization and nuance-heavy reasoning on Google AI where the per-token premium was justified by output quality. Routing decided per agent task.
Built and led the technical architecture and AI stack as CTO at MirandaMedia Group.
When the unit economics demand it, I go all the way down — to the inference layer and to the data/access layer. vllm-mlx (79 merged PRs) and a self-built LTE proxy pool are the two ends of that story: owned inference that made products affordable to run, and a residential-IP scraping stack that makes gated public data reachable. I do MLX out of efficiency necessity, not as a research specialty.
79 merged PRs to open-source LLM inference for Apple Silicon (581+ stars). Primary implementor of Anthropic Messages API (/v1/messages) — the compatibility layer that makes vllm-mlx work with Claude Code and OpenCode.
Main areas of work:
A self-built LTE proxy pool — Raspberry Pis plus consumer MiFi modems on rotating CGNAT residential IPs — that puts scraping traffic on organically residential addresses, with a commercial proxy as hot fallback. Anti-detect scraping across Cloudflare / DataDome / Akamai. Real throughput from production pipelines (10k+ requests/day).
→ Read: the LTE proxy poolApple M3 Ultra 256GB as primary inference machine. Workloads with 9:1 prefill/generation ratio (image classification, content tagging, structured extraction). 274 tok/s sustained throughput on Gemma 4 26B-A4B at concurrency 8.
Real ROI analyses: M3 Ultra vs RTX PRO 6000 Blackwell for different workload types. Cost-per-token calculations across cloud providers vs. owned infrastructure. Payback period modeling for hardware investments.
vLLM, SGLang, llama.cpp, MLX. When to use which stack. Quantization tradeoffs. Multi-model serving. Auto-scaling on bare metal vs Kubernetes.
Cost arbitrage is strategy, not preference. Who owns the inference stack competes on different terms than who pays the OpenAI bill. Engineering decision with P&L impact.
Trends are expensive. Working production systems = long-term moat. Six months with one provider > three months chasing every new release.
I spent 18 years running a tech company — where the CEO chair meant understanding code and cash flow at the same time. Today, when I solve architecture, I see P&L consequences. When I talk to investors, I talk about KV cache too. This combination is rare and that's where the value lives.
I've had an exit. I know what the VC track looks like. I consciously choose bootstrap because for AI infrastructure tooling, profit beats scale. Not dogma — context-aware decision.
20 years taught me shipping features ≠ creating value. I measure myself and projects by real outcomes (retention, margin, ARR), not activities (PRs, posts, meetings). This perspective only comes after several building/selling cycles.
If this resonates, we might be on the same wavelength.
Core contributor to vllm-mlx. Building Surfaced (solo, R&D). Open to fractional-CTO and advisory work.
79 merged PRs to vllm-mlx (open-source LLM inference for Apple Silicon, 581+ stars). Authored the Anthropic Messages API compatibility layer that makes vllm-mlx work with Claude Code. Main focus: KV cache quantization (QuaRot, asymmetric, TurboQuant), constrained decoding, MLLM infrastructure, production reliability.
Built and led the technical architecture and AI stack of Advanty — AI-powered competitive intelligence for marketing agencies — as CTO at MirandaMedia Group.
Built and led the technical architecture and AI stack of Margly (e-commerce profitability analytics for Shoptet) and Discury (customer intelligence platform) as CTO at MirandaMedia Group.
AI customer-support chatbot for e-commerce. Live today, handed over to the operating team.
Started working with local LLM models and inference infrastructure.
B2B monitoring dashboard for manufacturers and distributors tracking partner stock and pricing across e-shops. Live today, handed over.
Sale of Hosting90 systems s.r.o. to WY Group (operator of Ignum brand). Transaction publicly announced.
→ hostingy.netStart of entrepreneurial journey in hosting and web services. Operated as Hosting90 systems s.r.o. (Company ID 28545711).
AI customer-support chatbot for e-commerce — resolves up to 98% of inquiries without a human, recommends products and closes sales. Drops into Shopify, WooCommerce, Magento, PrestaShop or OpenCart via a JS snippet. Co-founded; I owned the technical build. Live today, now run by the team.
→ lobot.chatB2B monitoring dashboard for manufacturers and distributors — tracks partner stock levels and pricing across e-shops, with real-time alerts and historical price trends. Customers include Lenovo, Niceboy and Infinix. Co-founded; owned the data pipeline and infrastructure. Live today, handed over.
→ www.guruwatch.czEmail or LinkedIn — written communication in Czech or English, same speed.
For calls, I'm strongest in Czech; English calls work best when scheduled with a clear agenda. I usually reply same day.