Technical Expertise · RetakeData

As a senior freelance SRE, I work across high-volume observability (11 TB/day across Loki, Elasticsearch, and Thanos), infrastructure automation (the VM delivery pipeline that managed 3000+ VMs), on-prem Proxmox/Ceph at scale (100+ nodes), and AI-assisted SRE tooling (Graphia, GPU serving, RAG pipelines). The stack below reflects what I run in production today.

Observability & Reliability: Grafana (2000 users, 10 instances), Loki, Thanos, Vector, Elasticsearch, CloudWatch. SLO-oriented diagnostics, recording rules, and platform hygiene at gambling scale.
Infrastructure Automation: Terraform multi-provider modules (vSphere, Proxmox, OpenStack), Ansible, Consul, NetBox. I build the tooling that lets small teams operate at fleet scale.
On-prem & Virtualization: Proxmox (100+ nodes, PXE-automated), Ceph, ZFS, NFS, SAN, NVMe over Fabrics. vSphere migrations, HA cluster design, multi-DC networking.
AI-Assisted Tooling: Graphia (SRE agent for Grafana), vLLM serving, RAG pipelines, MCP integration. RBAC-aware, built for real operations.

Below is a structured view of the technologies I use most: