Observability at scale
Grafana, Loki, Thanos, Vector, Elasticsearch. Stack audits, ingestion tuning, Fluentd-to-Vector migrations, recording rules, query performance, and SLO dashboarding.
- Grafana
- Loki
- Thanos
- Vector
RetakeData / infrastructure practice
RetakeData helps infrastructure teams regain control of sensitive systems — observability, automation, virtualization, private cloud, and local AI. Designed and operated by a senior SRE with 10+ years on high-stakes platforms.
Hardware to software · Full-stack infrastructure · Bare metal to private AI
On-prem, private-cloud, and hybrid infrastructure engineering. Observability, automation, virtualization, and private AI — for teams that need to keep sensitive workloads under their own control.
Terraform multi-provider modules, Ansible, GitOps delivery pipelines, Consul and NetBox inventory. The automation that lets small teams operate at fleet scale.
100+ Proxmox nodes deployed with PXE automation. Storage backends including ZFS, NFS, SAN, NVMe-oF, and Ceph. Migrations from vSphere, HA cluster design.
Some operational data should not leave your network: incidents, logs, runbooks, internal docs, and procedures. RetakeData builds local AI systems for those environments: vLLM model serving, private RAG pipelines, RBAC-aware assistants, and integrations with your existing Proxmox, observability, and documentation stack. No external API dependency for sensitive workflows, predictable costs, full data control.
A few receipts from 10 years operating infrastructure that can't quietly fail.
Operated multi-cluster observability handling 11 TB/day across Loki, Elasticsearch, and Thanos. Tuned ingestion, migrated Fluentd to Vector, and built recording rules that convert TB/day of load-balancer logs into metrics, in a business where minutes of downtime mean seven-figure impact.
Built a 6-phase autonomous VM delivery pipeline: Git PR, Terraform across vSphere/Proxmox/OpenStack, Consul and NetBox auto-registration, Ansible configuration, HAProxy backend registration, Centreon monitoring. No Kubernetes needed.
Deployed GPU-equipped servers in the datacenter running vLLM with a private RAG pipeline over 1000+ documents. Built Graphia, an RBAC-aware SRE agent that abstracts Grafana complexity for engineering teams.
Tools built around real operational pain points, kept practical, open, and useful beyond our own environment.

Modern SSH multiplexing with multi-source inventory and tmux or iTerm2 backends.
GitHub →
Terminal audit UI for OpenClaw sessions with live events and real-time streaming.
GitHub →
Terraform provider for Centreon API V2, monitoring configuration managed as infrastructure as code.
GitHub →
If your team is building on-prem, private-cloud, or hybrid infrastructure for sensitive workloads, let's talk.
Get in touch