Building tools from operational pain
How RetakeData turns real infrastructure bottlenecks into open-source tools: a Terraform provider for Centreon, an SSH multiplexer for large fleets, and AI-assisted development workflows.
Every tool we build starts the same way: a recurring operational pain that no existing product solves well enough. This is the story of two tools that came directly from production infrastructure work, and the AI-assisted workflow that made them possible.
The Centreon gap
At a previous engagement, the team managed monitoring configuration through Centreon. The only Terraform provider available targeted the legacy CLAPI and had gone unmaintained for over five years. The V1 API provider lacked features needed for modern infrastructure management.
We needed monitoring configuration as code, and the tool didn’t exist. So we built it.
terraform-provider-centreon was our first Go project, built from scratch against Centreon API V2. The approach: feed the OpenAPI documentation to an AI assistant, generate integration code with logging, unit tests, and error handling, then iterate through testing until it met production standards.
The result: a fully open-source Terraform provider that fills a five-year gap in the Centreon ecosystem. No CLAPI, no legacy debt, API V2 native.
Stack: Go, Terraform Plugin SDK v2, GitHub Actions CI/CD, semantically versioned releases.
The SSH fleet problem
Managing connections to hundreds of VMs across multiple datacenters is a daily reality for infrastructure teams. The team relied on Remote Desktop Manager (RDM), but licensing costs were high and every new host required manual database entry. Broadcasting commands across sessions was clunky.
After finding no suitable alternative, we built SSHplex.
The requirements were clear:
- Pull hosts dynamically from NetBox and Ansible inventory, not a manual database
- Broadcast commands across multiple SSH sessions simultaneously
- tmux integration for session persistence, because closing a terminal shouldn’t kill a background task
- A modern terminal UI that doesn’t feel like a relic
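To make the broadcast and tmux requirements concrete, here is a minimal sketch of one way to drive tmux from Python. It is not SSHplex's actual code: the function names (`open_session_commands`, `broadcast_commands`, `run_all`) are hypothetical, and the pane targets assume tmux's default `base-index` and `pane-base-index` of 0.

```python
import shlex
import subprocess
from typing import List


def open_session_commands(session: str, hosts: List[str]) -> List[List[str]]:
    """Build tmux commands for a detached session with one SSH pane per host."""
    if not hosts:
        raise ValueError("at least one host is required")
    commands = [
        ["tmux", "new-session", "-d", "-s", session, f"ssh {shlex.quote(hosts[0])}"]
    ]
    for host in hosts[1:]:
        commands.append(
            ["tmux", "split-window", "-t", session, f"ssh {shlex.quote(host)}"]
        )
        # Re-tile after each split so panes stay large enough to split again.
        commands.append(["tmux", "select-layout", "-t", session, "tiled"])
    return commands


def broadcast_commands(session: str, pane_count: int, command: str) -> List[List[str]]:
    """Build one `tmux send-keys` call per pane so every host gets the command."""
    # Assumption: window 0 and panes numbered from 0 (tmux defaults).
    return [
        ["tmux", "send-keys", "-t", f"{session}:0.{index}", command, "Enter"]
        for index in range(pane_count)
    ]


def run_all(commands: List[List[str]]) -> None:
    """Execute the prepared tmux commands in order, stopping on the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Because the session is created detached, the SSH panes keep running after the terminal that launched them closes, and `tmux attach -t <session>` brings them back.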
SSHplex was built in three phases: foundation (config, NetBox connectivity, basic TUI, single SSH), core features (multi-select, tmux management, error handling), and polish (command broadcasting, session persistence, caching). CI/CD was set up early with two pipelines: PR-triggered testing and tag-based releases to GitHub and PyPI.
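As an illustration of the NetBox connectivity and caching pieces, the sketch below turns a NetBox-style list response into host entries and keeps them in a small time-based cache. It assumes the usual NetBox REST shape (a `results` list whose items carry `name` and an optional `primary_ip.address` in CIDR form); `parse_hosts`, `HostCache`, and the 300-second default are hypothetical choices for the example, not SSHplex internals.

```python
import time
from typing import Callable, Dict, List, Optional

Host = Dict[str, Optional[str]]


def parse_hosts(payload: dict) -> List[Host]:
    """Extract name and bare IP from a NetBox-style list response."""
    hosts: List[Host] = []
    for item in payload.get("results", []):
        primary = item.get("primary_ip") or {}  # primary_ip may be null
        address = primary.get("address")        # e.g. "192.0.2.10/24"
        ip = address.split("/")[0] if address else None
        hosts.append({"name": item["name"], "ip": ip})
    return hosts


class HostCache:
    """Cache the host list and refetch only after the TTL has expired."""

    def __init__(
        self,
        fetch: Callable[[], List[Host]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._hosts: Optional[List[Host]] = None
        self._fetched_at = 0.0

    def get(self) -> List[Host]:
        now = self._clock()
        if self._hosts is None or now - self._fetched_at >= self._ttl:
            self._hosts = self._fetch()
            self._fetched_at = now
        return self._hosts
```

In a real tool, `fetch` would wrap the HTTP call to NetBox; keeping it as a plain callable means the TUI can reopen instantly from cache while the API is slow or unreachable.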
Stack: Python, Textual TUI framework, NetBox API, tmux, GitHub Actions.
The AI-assisted workflow
Both projects share a common thread: they were built with AI as a pair programmer, not an architect.
The workflow that worked:
- Architecture decisions stay human. We define the module boundaries, the data flow, the interfaces.
- Structured prompts, not random suggestions. Example: “Create a NetBox API client class with connection pooling, automatic retry logic with exponential backoff, proper SSL certificate handling, and comprehensive error handling for device and VM queries.”
- Iterative refinement. Start with high-level structure, break into specific tasks, use agent mode for debugging passes.
- Code review is mandatory. All generated code goes through manual review and real-world testing.
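The example prompt above asks for retry logic with exponential backoff. A minimal sketch of that one piece could look like the following; the helper name `retry_with_backoff`, its defaults, and the choice of retryable exceptions are assumptions made for illustration, and a real client would typically add jitter and a delay cap.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on transient errors with a doubling delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error to the caller
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

This is also the kind of generated code the review step is for: whether a given failure is safe to retry is a domain decision, so the `retry_on` tuple deserves a human look before it ships.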
The AI excelled at boilerplate elimination, API integration patterns, test scaffolding, and maintaining consistency across modules. Where it struggled: context drift in long sessions, over-engineering with early models (Claude 3.5 era), and occasional misalignment on domain-specific logic.
Net result: roughly 40% faster development while keeping architectural integrity intact. The tools are production-tested, not AI-generated prototypes.
What’s next
These tools exist because real infrastructure needed them. The next projects in the pipeline follow the same pattern: Graphia (RBAC-aware SRE agent for Grafana infrastructure), observability tooling around Loki and Thanos, and automation that bridges the gap between Git push and live traffic.
Open source isn’t a side project for us. It’s an extension of the operational work, kept practical, inspectable, and useful beyond our own environment.
Browse all projects on the Open Source page.