Capability map
Technical Expertise
As a senior freelance SRE, my work spans high-volume observability (11 TB/day across Loki, Elasticsearch, and Thanos), infrastructure automation (a VM delivery pipeline that managed 3000+ VMs), on-prem Proxmox/Ceph at scale (100+ nodes), and AI-assisted SRE tooling (Graphia, GPU serving, RAG pipelines). The stack below reflects what I run in production today.
- Observability & Reliability: Grafana (2000 users, 10 instances), Loki, Thanos, Vector, Elasticsearch, CloudWatch. SLO-oriented diagnostics, recording rules, and platform hygiene at the scale of a gambling platform.
- Infrastructure Automation: Terraform multi-provider modules (vSphere, Proxmox, OpenStack), Ansible, Consul, NetBox. I build the tooling that lets small teams operate at fleet scale.
- On-prem & Virtualization: Proxmox (100+ nodes, PXE-automated), Ceph, ZFS, NFS, SAN, NVMe over Fabrics. vSphere migrations, HA cluster design, multi-DC networking.
- AI-Assisted Tooling: Graphia (SRE agent for Grafana), vLLM serving, RAG pipelines, MCP integration. RBAC-aware, built for real operations.
Below is a structured view of the technologies I use most: