I architect production AI systems.
For over 16 years, I’ve built scalable backend infrastructure. Today, I design deterministic systems around probabilistic models: multi-agent orchestration platforms, retrieval infrastructure, and distributed execution engines that operate reliably under load.
I don’t build AI demos. I build systems that survive production.
My work sits at the boundary between stochastic language models and deterministic software architecture. That boundary is where most systems fail and where real engineering matters.
Core domains:
- Multi-agent orchestration and execution engines
- Retrieval-Augmented Generation (RAG) infrastructure
- Distributed task systems and background processing
- AI evaluation harnesses and reliability tooling
- Schema-first backend architecture using FastAPI, Celery, Redis, and PostgreSQL
Generative models are probabilistic. Infrastructure must not be.
Orchestration and execution:
- Multi-agent coordination graphs and execution loops
- Tool routing and structured output enforcement
- Streaming pipelines with background task isolation
- Failure recovery, retry policies, and state reconciliation
- Long-running workflow orchestration
Designed to eliminate silent failure modes and to contain nondeterministic model behavior behind deterministic boundaries.
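As an illustration of the retry policies above, here is a minimal sketch of an idempotent retry wrapper with exponential backoff and jitter. It is not the production implementation. `retry_idempotent`, `TransientError`, and the in-memory `completed` dict are hypothetical stand-ins. A real deployment would keep the idempotency record in a shared store such as Redis or PostgreSQL.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry (timeouts, 503s)."""


def retry_idempotent(
    op: Callable[[str], T],
    idempotency_key: str,
    completed: dict[str, T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
) -> T:
    """Run op at most max_attempts times; never re-run a key that already succeeded.

    `completed` is an in-memory stand-in for a shared idempotency store
    (assumption: Redis or PostgreSQL in a real deployment).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    # A key that already succeeded returns its recorded result, so duplicate
    # deliveries of the same job do not repeat side effects.
    if idempotency_key in completed:
        return completed[idempotency_key]
    for attempt in range(1, max_attempts + 1):
        try:
            # The key is passed through so downstream calls can deduplicate too.
            result = op(idempotency_key)
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure instead of hiding it
            # Exponential backoff with full jitter: sleep in [0, base_delay * 2^(attempt-1)].
            time.sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
        else:
            completed[idempotency_key] = result
            return result
    raise AssertionError("unreachable")
```

Only `TransientError` is retried. Any other exception propagates on the first attempt, so a genuine bug fails loudly instead of being retried into silence.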
Retrieval infrastructure:
- Hybrid search pipelines combining sparse and dense retrieval
- Embedding normalization and scoring strategies
- Retrieval evaluation using precision, recall, and nDCG
- FAISS-to-cuVS experimentation and performance benchmarking
- Hallucination risk mitigation through retrieval grounding
Built to turn retrieval quality from a matter of intuition into a measurable signal.
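A hybrid search pipeline needs a way to merge its sparse and dense result lists. Reciprocal rank fusion (RRF) is one common option that uses only ranks, not raw scores. The sketch below assumes each retriever returns an ordered list of document IDs. It is not necessarily the fusion method used in these pipelines, and `reciprocal_rank_fusion` is a hypothetical helper name.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


sparse_hits = ["d1", "d2", "d3"]  # e.g. from a BM25 index
dense_hits = ["d3", "d1", "d4"]   # e.g. from a vector index
print(reciprocal_rank_fusion([sparse_hits, dense_hits]))  # ['d1', 'd3', 'd2', 'd4']
```

Because RRF ignores raw scores, it sidesteps the problem of comparing BM25 scores with cosine similarities. Score-normalization strategies are the alternative when the score magnitudes themselves carry useful signal.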
Distributed backend systems:
- Asynchronous task orchestration and worker pools
- Message brokers and job queues
- Schema-first API contracts and strict boundaries
- Observability, tracing, and load diagnostics
- Deterministic control planes around AI components
Focused on reliability, not novelty.
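One way to express "deterministic control planes around AI components" in code is to treat model output as untrusted input. It gets validated against a strict schema before anything acts on it. This sketch uses only the standard library; in a FastAPI stack, Pydantic models would usually play this role. `ToolCall`, `parse_tool_call`, the `search_docs` tool, and the `top_k` bound of 50 are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical contract for a model-proposed tool invocation."""
    tool: str
    query: str
    top_k: int


class SchemaError(ValueError):
    """Raised when model output does not match the contract."""


# Exact expected type for each field; extra or missing keys are rejected.
_FIELD_TYPES = {"tool": str, "query": str, "top_k": int}
_MAX_TOP_K = 50  # hypothetical upper bound


def parse_tool_call(raw: str, allowed_tools: frozenset[str]) -> ToolCall:
    """Validate raw model output; raise SchemaError instead of guessing."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("expected a JSON object")
    if set(data) != set(_FIELD_TYPES):
        raise SchemaError(f"expected keys {sorted(_FIELD_TYPES)}, got {sorted(data)}")
    for name, expected in _FIELD_TYPES.items():
        # `type(...) is` rather than isinstance, so JSON `true` is not accepted as an int.
        if type(data[name]) is not expected:
            raise SchemaError(f"field {name!r} must be {expected.__name__}")
    if data["tool"] not in allowed_tools:
        raise SchemaError(f"unknown tool {data['tool']!r}")
    if not 1 <= data["top_k"] <= _MAX_TOP_K:
        raise SchemaError(f"top_k must be between 1 and {_MAX_TOP_K}")
    return ToolCall(**data)
```

Rejecting unknown keys and wrong types keeps a malformed generation from silently becoming a malformed action. The caller can then re-prompt the model or fail loudly.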
Selected work:
Architected and deployed a multi-agent execution system integrating:
- Tool routing
- RAG context injection
- Structured outputs
- Streaming and background processing
- Failure handling and idempotent retries
Built with FastAPI, Celery, Redis, and PostgreSQL. Designed for concurrency, resilience, and long-running execution flows.
Engineered automated evaluation pipelines to:
- Measure retrieval quality
- Compare embedding and scoring strategies
- Detect hallucination risk patterns
- Benchmark latency and system stability under stress
Focused on bridging research metrics with production guarantees.
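The retrieval metrics named earlier (precision, recall, and nDCG) are straightforward to compute once each query has labelled relevant documents. Below is a minimal sketch, assuming document IDs are strings and relevance judgments are non-negative numbers. It uses the linear-gain form of DCG; some evaluations use `2**rel - 1` as the gain instead. The function names are illustrative, not taken from a specific library.

```python
import math


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant (divides by k)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return sum(doc in relevant for doc in retrieved[:k]) / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if k < 1:
        raise ValueError("k must be >= 1")
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def ndcg_at_k(retrieved: list[str], relevance: dict[str, float], k: int) -> float:
    """nDCG@k with linear gains: DCG = sum(rel_i / log2(i + 1)), ranks starting at 1."""
    if k < 1:
        raise ValueError("k must be >= 1")

    def dcg(gains) -> float:
        return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

    actual = dcg(relevance.get(doc, 0.0) for doc in retrieved[:k])
    # Ideal DCG: the best possible ordering of the judged documents.
    ideal = dcg(sorted(relevance.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


judgments = {"d1": 3, "d3": 2, "d5": 1}  # graded relevance for one query
ranking = ["d1", "d2", "d3", "d4"]        # what the retriever returned
print(precision_at_k(ranking, set(judgments), 3), ndcg_at_k(ranking, judgments, 3))
```

Precision and recall only ask whether relevant documents appear in the top k. nDCG also rewards placing the most relevant ones first, which is why it is useful for comparing embedding and scoring strategies that retrieve similar sets.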
Certifications:
- NVIDIA AI Certified: Generative AI
- NVIDIA AI Certified: Agentic AI Applications with Large Language Models
- Codeable Certified WordPress Expert
Engineering principles:
- Explicit over magical
- Boring systems are better than clever hacks
- Deterministic boundaries around probabilistic models
- Clean architecture enables safe iteration
- Production reliability is the benchmark
Contact:
LinkedIn: https://www.linkedin.com/in/joseph-gabito/
Email: dsc [dot] official [dot] mail at gmail [dot] com