RetakeData / infrastructure practice

Take control of your data

RetakeData helps infrastructure teams regain control of sensitive systems — observability, automation, virtualization, private cloud, and local AI. Designed and operated by a senior SRE with 10+ years on high-stakes platforms.

Let's chat · View CV

Hardware to software · Full-stack infrastructure · Bare metal to private AI

FULL-STACK CONTROL

What we do

On-prem, private-cloud, and hybrid infrastructure engineering. Observability, automation, virtualization, and private AI — for teams that need to keep sensitive workloads under their own control.

OBS

Observability at scale

Grafana, Loki, Thanos, Vector, Elasticsearch. Stack audits, ingestion tuning, Fluentd-to-Vector migrations, recording rules, query performance, and SLO dashboarding.

Grafana
Loki
Thanos
Vector

IAC

Infrastructure automation & IaC

Terraform multi-provider modules, Ansible, GitOps delivery pipelines, Consul and NetBox inventory. The automation that lets small teams operate at fleet scale.

Terraform
Ansible
NetBox
Consul

ONP

Proxmox / Ceph / on-prem HA

100+ Proxmox nodes deployed with PXE automation. Storage backends including ZFS, NFS, SAN, NVMe-oF, and Ceph. Migrations from vSphere and HA cluster design.

Proxmox
Ceph
ZFS
HA

Private AI for infrastructure teams

Some operational data should not leave your network: incidents, logs, runbooks, internal docs, and procedures. RetakeData builds local AI systems for those environments: vLLM model serving, private RAG pipelines, RBAC-aware assistants, and integrations with your existing Proxmox, observability, and documentation stack. No external API dependency for sensitive workflows, predictable costs, full data control.

LLM

OSS Models

vLLM
Llama
Qwen
Mistral

EMB

Embedder

BGE-M3
Nomic
sentence-transformers

VDB

Vector DB

pgvector
Qdrant
Chroma

Selected work

A few receipts from 10+ years of operating infrastructure that can't quietly fail.

11 TB/day - 400 Loki pods - 50 TB Thanos

Observability at high-stakes scale

Operated multi-cluster observability handling 11 TB/day across Loki, Elasticsearch, and Thanos. Fine-tuned ingestion, migrated Fluentd to Vector, and built recording rules that convert TB/day of load-balancer logs into metrics, all in a business where minutes of downtime mean seven-figure impact.

Role: Lead SRE
Scale: 11 TB/day
Stack: Loki · Thanos · Vector
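
To illustrate the idea of turning load-balancer logs into metrics, here is a minimal sketch in Python. It is not how Loki evaluates recording rules; it only shows the aggregation such a rule expresses. The log format (`<epoch_seconds> <backend> <status_code>`) and the function name are assumptions made for this example.

```python
from collections import Counter


def logs_to_request_counts(lines, bucket_seconds=60):
    """Count requests per (bucket start, backend, status class).

    Assumes a hypothetical log format: "<epoch_seconds> <backend> <status_code>".
    Lines that do not match are skipped rather than raising.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[0].isdigit() or not parts[2].isdigit():
            continue  # malformed line: ignore it
        epoch, backend, status = int(parts[0]), parts[1], int(parts[2])
        bucket = epoch - epoch % bucket_seconds  # align to the start of the window
        status_class = f"{status // 100}xx"      # e.g. 503 -> "5xx"
        counts[(bucket, backend, status_class)] += 1
    return counts


if __name__ == "__main__":
    sample = ["0 web 200", "10 web 204", "30 web 503", "65 api 200", "not a log line"]
    for key, value in sorted(logs_to_request_counts(sample).items()):
        print(key, value)
```

Storing a handful of counters per window instead of every raw line is what makes dashboards and alerts over TB/day of logs cheap to query.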

3000 VMs - 4 providers - Git to monitored

VM delivery pipeline

Built a 6-phase autonomous VM delivery pipeline: Git PR, Terraform across vSphere/Proxmox/OpenStack, Consul and NetBox auto-registration, Ansible configuration, HAProxy backend registration, Centreon monitoring. No Kubernetes needed.

Role: Infra engineer
Scale: 3000 VMs
Stack: Terraform · Ansible · Consul
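
The phase ordering described above can be sketched as a small runner that executes each step in sequence and stops at the first failure. This is an illustrative sketch, not the production pipeline: the phase identifiers, the `run_pipeline` function, and the handler signature are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Phase names mirror the six steps listed above; the identifiers are hypothetical.
PHASES = [
    "git_pr",
    "terraform_apply",
    "consul_netbox_registration",
    "ansible_configuration",
    "haproxy_backend_registration",
    "centreon_monitoring",
]


@dataclass
class PipelineResult:
    completed: List[str] = field(default_factory=list)
    failed: Optional[str] = None


def run_pipeline(vm_name: str, handlers: Dict[str, Callable[[str], bool]]) -> PipelineResult:
    """Run each phase in order; stop at the first missing or failing handler."""
    result = PipelineResult()
    for phase in PHASES:
        handler = handlers.get(phase)
        if handler is None or not handler(vm_name):
            result.failed = phase
            break
        result.completed.append(phase)
    return result


if __name__ == "__main__":
    # Stand-in handlers that always succeed; real ones would call out to each tool.
    always_ok = {phase: (lambda vm: True) for phase in PHASES}
    print(run_pipeline("app-01", always_ok))
```

Stopping at the first failed phase means a VM is never registered behind the load balancer or in monitoring before it has been provisioned and configured.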

6 GPUs - 1000+ PDFs - vLLM serving

On-prem AI infrastructure

Deployed GPU-equipped servers in the datacenter running vLLM with a private RAG pipeline over 1000+ documents. Built Graphia, an RBAC-aware SRE agent that abstracts Grafana complexity for engineering teams.

Role: Platform lead
Scale: 6 GPUs · 1000+ docs
Stack: vLLM · RAG · Graphia
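
As a rough illustration of what "RBAC-aware" retrieval can mean in a private RAG pipeline, the sketch below filters document chunks by the caller's roles before ranking them by cosine similarity. It is not Graphia's implementation: the `Chunk` type, the `retrieve` function, and the toy two-dimensional embeddings are assumptions made for this example. A real deployment would take embeddings from an embedding model and query a vector database.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Chunk:
    text: str
    embedding: Tuple[float, ...]
    allowed_roles: FrozenSet[str]  # roles permitted to read this chunk


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_embedding: Sequence[float], chunks: Iterable[Chunk],
             user_roles: Iterable[str], top_k: int = 3) -> List[Chunk]:
    """Drop chunks the user may not read, then rank the rest by similarity."""
    roles = frozenset(user_roles)
    visible = [c for c in chunks if c.allowed_roles & roles]
    visible.sort(key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return visible[:top_k]


if __name__ == "__main__":
    corpus = [
        Chunk("Ceph recovery runbook", (1.0, 0.0), frozenset({"sre"})),
        Chunk("Salary review procedure", (1.0, 0.0), frozenset({"hr"})),
        Chunk("Switch upgrade checklist", (0.0, 1.0), frozenset({"sre", "net"})),
    ]
    for chunk in retrieve((1.0, 0.1), corpus, {"sre"}, top_k=2):
        print(chunk.text)
```

Filtering before ranking keeps restricted text out of the model's context entirely, rather than relying on the model to withhold it.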

See full CV →

Open Source

Tools built around real operational pain points, kept practical, open, and useful beyond our own environment.

OPEN SOURCE

SSHplex

Modern SSH multiplexing with multi-source inventory and tmux or iTerm2 backends.

GitHub →

OPEN SOURCE

OpenClaw Audit TUI

Terminal audit UI for OpenClaw sessions with live events and real-time streaming.

GitHub →

OPEN SOURCE

terraform-provider-centreon

Terraform provider for the Centreon API v2: monitoring configuration managed as infrastructure as code.

GitHub →

See all open source work →

Need to keep critical infrastructure under your control?

If your team is building on-prem, private-cloud, or hybrid infrastructure for sensitive workloads, let's talk.

Get in touch