<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>RetakeData</title><link>https://retakedata.com/</link><description>Recent content on RetakeData</description><generator>Hugo</generator><language>en</language><copyright>&lt;a href="https://creativecommons.org/licenses/by-nc/4.0/" target="_blank" rel="noopener">CC BY-NC 4.0&lt;/a></copyright><lastBuildDate>Mon, 09 Jun 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://retakedata.com/index.xml" rel="self" type="application/rss+xml"/><item><title>Observability Stack Audit</title><link>https://retakedata.com/missions/observability-stack-audit/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://retakedata.com/missions/observability-stack-audit/</guid><description>&lt;p>You have dashboards nobody trusts, alerts that fire at 3 AM for no reason, and a Loki bill that keeps climbing. The stack works, technically. But it is not working for your team.&lt;/p>
&lt;p>I operated an 11 TB/day observability stack across Loki (400 pods, 3 TB RAM), Elasticsearch (200 TB, 3 DCs), and Thanos (50 TB, 200 nodes). My specific work: migrating from Fluentd to Vector to fix performance issues, fine-tuning Loki with Memcached clusters, building recording rules that converted TB/day of load-balancer logs into queryable metrics, and maintaining 12 months of Thanos long-term storage for ML forecasting.&lt;/p></description></item><item><title>VM Migration Factory</title><link>https://retakedata.com/missions/vm-migration-factory/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://retakedata.com/missions/vm-migration-factory/</guid><description>&lt;p>Most teams operating hundreds of VMs do it manually: ticket, provision, configure, register, repeat. Every VM is slightly different. Every migration is a project. Every provider has its own quirks that someone has to remember.&lt;/p>
&lt;p>This mission replaces that with a pipeline. I built this exact system to manage 3000+ VMs across 4 providers and 10 datacenters, without Kubernetes. You open a PR with a tfvars file, merge it, and the VM goes from nothing to configured, monitored, and receiving traffic.&lt;/p></description></item><item><title>Proxmox / Ceph HA Platform</title><link>https://retakedata.com/missions/proxmox-ceph-ha-platform/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://retakedata.com/missions/proxmox-ceph-ha-platform/</guid><description>&lt;p>VMware licensing changes pushed a lot of teams to look for alternatives. Proxmox is the answer, but a Proxmox cluster that survives real failure scenarios needs proper Ceph design, network segmentation, quorum tuning, and automated provisioning.&lt;/p>
&lt;p>I deployed 100+ Proxmox nodes with PXE automation across the major storage backends: ZFS, NFS, SAN, NVMe over Fabrics, and Ceph. I led vSphere-to-Proxmox migrations (Pure Storage SAN, NVMe-oF, multipathd), Proxmox 4-to-8 upgrades with near-zero downtime using NFS as a buffer, and designed HA architectures with LACP, EVPN, and vPC.&lt;/p></description></item><item><title>On-Prem AI for Operations</title><link>https://retakedata.com/missions/onprem-ai-operations/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://retakedata.com/missions/onprem-ai-operations/</guid><description>&lt;p>Every team wants AI-assisted operations. Not every team can send their logs, metrics, and incident data to OpenAI. If you operate under GDPR constraints, data sovereignty requirements, or strict security policies, on-prem AI is not optional.&lt;/p>
&lt;p>I built this exact setup: 6 GPUs across 2 servers in the datacenter, vLLM serving, a private RAG pipeline over 1000+ PDFs with PostgreSQL/pgvector, and Graphia, an RBAC-aware SRE agent that lets engineering teams query their Grafana infrastructure without needing to know LogQL or PromQL.&lt;/p></description></item><item><title>Building SSHplex: A Modern TUI for SSH Connection Multiplexing</title><link>https://retakedata.com/posts/2025/06/building-sshplex-a-modern-tui-for-ssh-connection-multiplexing/</link><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid>https://retakedata.com/posts/2025/06/building-sshplex-a-modern-tui-for-ssh-connection-multiplexing/</guid><description>&lt;p>&lt;img src="https://retakedata.com/images/sshplex-session-manager.png" alt="SSHplex Session Manager">&lt;/p>
&lt;h2 id="the-problem">The Problem&lt;/h2>
&lt;p>At Kindred, we relied on Remote Desktop Manager (RDM) to manage connections to our Windows and Linux hosts for broadcasting commands and checking system states. However, licensing costs were high and every new host required manual database entry. After finding no suitable alternatives, I decided to build my own solution.&lt;/p>
&lt;h2 id="solution-design">Solution Design&lt;/h2>
&lt;p>SSHplex needed three core capabilities: a modern terminal UI with host selection and bulk operations, flexible data source integration (&lt;span class="key">NetBox&lt;/span> and Ansible inventory), and terminal multiplexer support with session persistence for background tasks.&lt;/p></description></item><item><title>Building SSHplex: More details</title><link>https://retakedata.com/posts/2025/06/building-sshplex-more-details/</link><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid>https://retakedata.com/posts/2025/06/building-sshplex-more-details/</guid><description>&lt;p>&lt;img src="https://retakedata.com/images/sshplex-session-manager.png" alt="SSHplex Session Manager">&lt;/p>
&lt;h2 id="the-problem">The Problem&lt;/h2>
&lt;p>At Kindred, we relied on Remote Desktop Manager (RDM) to manage connections to our Windows and Linux hosts. I primarily used it to connect to multiple VMs simultaneously and broadcast commands, either to check system states or to run quick commands when Ansible ad-hoc commands were too slow or when I needed immediate feedback.&lt;/p>
&lt;p>However, we faced two major issues:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Licensing costs&lt;/strong>: The license was expiring and renewal was expensive&lt;/li>
&lt;li>&lt;strong>Maintenance overhead&lt;/strong>: Every new host had to be manually added to the RDM SQL Server database&lt;/li>
&lt;/ul>
&lt;p>After searching for alternatives, I found nothing that met our specific needs, so I decided to build my own solution.&lt;/p></description></item><item><title>AI Transformed My Journey as a System Engineer: Developing a Terraform Provider for Centreon</title><link>https://retakedata.com/posts/2025/02/ai-transformed-my-journey-as-a-system-engineer-developing-a-terraform-provider-for-centreon/</link><pubDate>Tue, 25 Feb 2025 00:00:00 +0000</pubDate><guid>https://retakedata.com/posts/2025/02/ai-transformed-my-journey-as-a-system-engineer-developing-a-terraform-provider-for-centreon/</guid><description>&lt;p>As a day-to-day &lt;span class="key">Terraform&lt;/span> user with a decent foundation in Python, I never imagined that developing a Terraform provider would significantly impact my system engineering skills. Yet AI tools enabled me to build a provider for Centreon API V2 and step into the &lt;span class="key">Go&lt;/span> ecosystem, an essential leap for my work at Kindred.&lt;/p>
&lt;h2 id="overview">Overview&lt;/h2>
&lt;p>For years, there was a significant gap in available tools: the only existing Centreon Terraform provider was built around the legacy CLAPI, which had not been updated in over five years. There was also a V1 API (distinct from CLAPI), but it lacked the features needed for modern infrastructure management. My need for an up-to-date solution at Kindred pushed me to create a new provider based on the current Centreon API V2, so that it would keep pace with Centreon and integrate cleanly with our existing workflows.&lt;/p></description></item><item><title>Hello World</title><link>https://retakedata.com/posts/2025/02/hello-world/</link><pubDate>Mon, 24 Feb 2025 00:00:00 +0000</pubDate><guid>https://retakedata.com/posts/2025/02/hello-world/</guid><description>&lt;p>Welcome to my blog! I&amp;rsquo;m a French systems engineer with a long-standing passion for systems, security, and networking that dates back to my younger years. What started as curiosity has evolved into a fulfilling career and continuous learning journey.&lt;/p>
&lt;h2 id="about-me">About Me&lt;/h2>
&lt;p>I&amp;rsquo;ve built my career around understanding and implementing robust system architectures, but I believe there&amp;rsquo;s always room to grow. Recently, I&amp;rsquo;ve been diving deeper into programming with a particular focus on Go and Python. Despite being what some might call a &amp;ldquo;late learner&amp;rdquo; in the programming world, I&amp;rsquo;m determined to master these skills to complement my systems expertise.&lt;/p></description></item><item><title>About</title><link>https://retakedata.com/about/</link><pubDate>Wed, 01 Jan 2025 00:00:00 +0000</pubDate><guid>https://retakedata.com/about/</guid><description>&lt;p>SMJED is the independent infrastructure engineering practice of Sabri MJAHED, an SRE with 10+ years of experience across sysadmin and platform engineering roles. We build on-prem, private-cloud, and hybrid platforms for teams that need full control over their infrastructure and their data, rather than a dependency on someone else&amp;rsquo;s cloud.&lt;/p>
&lt;h2 id="the-approach">The approach&lt;/h2>
&lt;p>Generalists by design. Real infrastructure problems don&amp;rsquo;t stay inside one specialty. They cross observability, automation, storage, networking, and security in ways that require someone who can operate across all of them. Rack and cable on Monday, debug a Loki ingestion bottleneck on Tuesday, architect a Proxmox HA cluster on Wednesday.&lt;/p></description></item><item><title/><link>https://retakedata.com/missions/skills-context/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://retakedata.com/missions/skills-context/</guid><description>&lt;h1 id="skills-context--sabris-real-experience">Skills Context — Sabri&amp;rsquo;s Real Experience&lt;/h1>
&lt;blockquote>
&lt;p>Working doc for posts, missions, and narrative refinement.
NOT part of the Hugo build. Reference-only.&lt;/p>
&lt;/blockquote>
&lt;hr>
&lt;h2 id="1-observability">1. Observability&lt;/h2>
&lt;h3 id="grafana">Grafana&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Scale&lt;/strong>: 2000 users across 10 instances&lt;/li>
&lt;li>&lt;strong>Deployment&lt;/strong>: Both on-prem (deb packages + DBs) and fully containerized on Kubernetes via Helm&lt;/li>
&lt;li>&lt;strong>Datasources&lt;/strong>: Elasticsearch, VictoriaLogs, Splunk, Loki, Prometheus metrics&lt;/li>
&lt;li>&lt;strong>Dashboarding&lt;/strong>: Heavy transformation work across multiple datasources&lt;/li>
&lt;li>&lt;strong>Tooling built&lt;/strong>: &amp;ldquo;Grafana Housekeeping&amp;rdquo;, which gathers all resources (users, dashboards, alerts, contact points, datasources), checks for stale, broken, or unused items, and sends reports to Jira for manager-driven cleanups&lt;/li>
&lt;li>&lt;strong>MCP&lt;/strong>: Used the Grafana MCP server&lt;/li>
&lt;/ul>
&lt;h3 id="prometheus--thanos">Prometheus / Thanos&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Scale&lt;/strong>: Thanos cluster with a global Querier, so Grafana queries a single endpoint instead of scattered Thanos instances&lt;/li>
&lt;li>&lt;strong>Topology&lt;/strong>: Storage gateway cache + read gateway cache across 4 different clusters&lt;/li>
&lt;li>&lt;strong>Storage&lt;/strong>: ~50 TB managed in a long-term (12-month retention) bucket&lt;/li>
&lt;li>&lt;strong>Nodes&lt;/strong>: ~200 replica nodes (8 GB RAM, 3 CPUs each)&lt;/li>
&lt;li>&lt;strong>Project focus&lt;/strong>: Maintaining and adding long-term storage for ML toolbox forecasting&lt;/li>
&lt;/ul>
&lt;h3 id="loki">Loki&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Scale&lt;/strong>: 4 clusters, ~400 pods total, ~3TB RAM + significant CPU&lt;/li>
&lt;li>&lt;strong>Cache&lt;/strong>: 2 TB Memcached cluster for 24h hot storage&lt;/li>
&lt;li>&lt;strong>Work&lt;/strong>: Fine-tuning, installing cluster cache, experimenting with metric-splitting configurations&lt;/li>
&lt;li>&lt;strong>Recording rules&lt;/strong>: Converted TB/day of load-balancer logs into metrics&lt;/li>
&lt;/ul>
&lt;h3 id="vector">Vector&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Migration&lt;/strong>: Migrated from Fluentd to Vector due to Fluentd performance issues&lt;/li>
&lt;li>&lt;strong>Scope&lt;/strong>: Full pipeline migration + log transformations&lt;/li>
&lt;/ul>
&lt;h3 id="elk">ELK&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Full stack&lt;/strong>: Beats, Kafka, Logstash (heavy transformations, also shipped to Splunk HEC), Elasticsearch, APM + RUM&lt;/li>
&lt;li>&lt;strong>Scale&lt;/strong>: 200 TB across 3 datacenters, high-availability setup&lt;/li>
&lt;li>&lt;strong>Users&lt;/strong>: Team of 20 developers&lt;/li>
&lt;li>&lt;strong>Compliance&lt;/strong>: Differentiated retention policies to meet gambling-regulator requirements&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="2-kubernetes--platform">2. Kubernetes &amp;amp; Platform&lt;/h2>
&lt;h3 id="kubernetes">Kubernetes&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Level&lt;/strong>: Primarily user-level (not cluster administrator), operates apps via CI/CD&lt;/li>
&lt;li>&lt;strong>Workflow&lt;/strong>: Jenkins + Makefile → build → push to JFrog → merged and deployed via ArgoCD&lt;/li>
&lt;li>&lt;strong>Helm&lt;/strong>: Created Helm charts for own apps (Graphia, search query exporter, Grafana Housekeeping)&lt;/li>
&lt;li>&lt;strong>Apps deployed&lt;/strong>: Graphia (SRE agent), search query exporter, Grafana Housekeeping tool&lt;/li>
&lt;/ul>
&lt;h3 id="search-query-exporter-app-built-at-fdj">Search Query Exporter (app built at FDJ)&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Purpose&lt;/strong>: Measure real user-facing search latency to back SLIs/SLOs&lt;/li>
&lt;li>&lt;strong>How it works&lt;/strong>: Queries Splunk, Thanos, and Loki across multiple time ranges (10m, 1h, 6h, 24h, 7d, 30d)&lt;/li>
&lt;li>&lt;strong>Output&lt;/strong>: Tracks how the platform responds over time, compares results against SLO budgets, and flags queries that are too slow&lt;/li>
&lt;li>&lt;strong>Design&lt;/strong>: Easily configurable, multi-backend querying&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="3-cicd">3. CI/CD&lt;/h2>
&lt;ul>
&lt;li>TODO: Still need detail on Jenkins, GitLab CI, GitHub Actions specifically&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="4-automation--iac">4. Automation &amp;amp; IaC&lt;/h2>
&lt;h3 id="pxe--proxmox">PXE / Proxmox&lt;/h3>
&lt;ul>
&lt;li>Automated PXE installation for Proxmox nodes using Ansible&lt;/li>
&lt;/ul>
&lt;h3 id="full-vm-delivery-pipeline-kubernetes-without-kubernetes">Full VM Delivery Pipeline (&amp;ldquo;Kubernetes without Kubernetes&amp;rdquo;)&lt;/h3>
&lt;p>The complete flow, end to end:&lt;/p></description></item><item><title>My Resume</title><link>https://retakedata.com/cv/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://retakedata.com/cv/</guid><description>&lt;h2 id="summary">Summary&lt;/h2>
&lt;blockquote>
&lt;p>Senior freelance &lt;span class="key">SRE&lt;/span> with &lt;span class="key">10 years&lt;/span> of experience operating infrastructure that can&amp;rsquo;t quietly fail. I work across high-volume observability, &lt;span class="key">Kubernetes&lt;/span> platform engineering, on-prem &lt;span class="key">Proxmox/Ceph&lt;/span> clusters, and AI-assisted SRE tooling.&lt;/p>
&lt;/blockquote>
&lt;div class="cv-downloads">
 &lt;a href="https://retakedata.com/files/sabri-mjahed-cv-en.pdf" download>Download CV PDF - English&lt;/a>
 &lt;a href="https://retakedata.com/files/sabri-mjahed-cv-fr.pdf" download>Download CV PDF - French&lt;/a>
&lt;/div>
&lt;h2 id="open-source-contributions">Open Source Contributions&lt;/h2>
&lt;p>🌟 &lt;strong>SSHplex&lt;/strong>&lt;/p>
&lt;blockquote>
&lt;p>Built and maintained an open source terminal UI for SSH connection multiplexing, designed for infrastructure teams that need fast host discovery, bulk operations, and persistent sessions.&lt;/p>
&lt;/blockquote>
&lt;ul>
&lt;li>GitHub Repository: &lt;a href="https://github.com/Sabrimjd/SSHPlex">SSHPlex&lt;/a>&lt;/li>
&lt;li>Blog Post: &lt;a href="https://retakedata.com/posts/2025/06/building-sshplex-a-modern-tui-for-ssh-connection-multiplexing/">Building SSHplex&lt;/a>&lt;/li>
&lt;li>Combines NetBox, Ansible, Consul, and static lists as sources of truth for hosts and devices&lt;/li>
&lt;li>Supports three mux backends: tmux standalone, tmux + iTerm2, and native iTerm2 on macOS&lt;/li>
&lt;li>Provides broadcast commands and persistent sessions to replace expensive legacy tooling&lt;/li>
&lt;/ul>
&lt;h2 id="experience">Experience&lt;/h2>
&lt;h3 id="a-hrefhttpswwwkindredgroupcomimg-srcimgkindredwebp-altkindred-france-styleheight-30px-padding-right-10px-vertical-align-middlea-kindred-france--site-reliability-engineer--2021---present">&lt;a href="https://www.kindredgroup.com/">&lt;img src="https://retakedata.com/img/Kindred.webp" alt="Kindred France" style="height: 30px; padding-right: 10px; vertical-align: middle;">&lt;/a> Kindred France | Site Reliability Engineer | 2021 - Present&lt;/h3>
&lt;div class="skills-container">
 &lt;a href="https://retakedata.com/tags/kubernetes" class="skill-badge">Kubernetes&lt;/a>
 &lt;a href="https://retakedata.com/tags/grafana" class="skill-badge">Grafana&lt;/a>
 &lt;a href="https://retakedata.com/tags/loki" class="skill-badge">Loki&lt;/a>
 &lt;a href="https://retakedata.com/tags/thanos" class="skill-badge">Thanos&lt;/a>
 &lt;a href="https://retakedata.com/tags/vector" class="skill-badge">Vector&lt;/a>
 &lt;a href="https://retakedata.com/tags/jenkins" class="skill-badge">Jenkins&lt;/a>
 &lt;a href="https://retakedata.com/tags/gitlab-ci" class="skill-badge">GitLab CI&lt;/a>
 &lt;a href="https://retakedata.com/tags/terraform" class="skill-badge">Terraform&lt;/a>
&lt;/div>
&lt;ul>
&lt;li>Progressed from System Engineer to Site Reliability Engineer, shifting focus from infrastructure automation toward platform reliability, observability, diagnostics and performance.&lt;/li>
&lt;li>Operate observability workflows around Kubernetes with Thanos, Loki, Grafana, and Vector as core technologies.&lt;/li>
&lt;li>Built Grafana Housekeeping, a tool that diagnoses stale and broken Grafana resources, reducing dashboard/config drift and improving platform hygiene.&lt;/li>
&lt;li>Built a Search Query Exporter to diagnose query slowness and establish SLOs across Thanos and Loki.&lt;/li>
&lt;li>Designed an SLO Dashboard Framework to standardize service-level visibility and make reliability reporting easier to adopt across teams.&lt;/li>
&lt;li>Building Graphia, a domain-specific SRE agent for Grafana diagnosis, with RBAC-aware behavior, MCP-based diagnosis flows, and safeguards for enterprise operations.&lt;/li>
&lt;li>Daily hands-on work with Helm charts, Argo CD, container image lifecycle, Jenkins, GitLab, AWS CloudWatch, and CUR2 cost analysis.&lt;/li>
&lt;/ul>
&lt;h4 id="current-stack-and-ownership">Current stack and ownership&lt;/h4>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Area&lt;/th>
 &lt;th>Components/Tools&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Observability&lt;/td>
 &lt;td>Grafana, Loki, Thanos, Vector, AWS CloudWatch&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Platform Engineering&lt;/td>
 &lt;td>Kubernetes, Helm, Argo CD, Container Images&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>CI/CD &amp;amp; Automation&lt;/td>
 &lt;td>Jenkins, GitLab CI, Terraform, Ansible&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Data &amp;amp; Storage&lt;/td>
 &lt;td>Kafka, Redis, PostgreSQL, Microsoft SQL Server, Couchbase&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Programming &amp;amp; AI&lt;/td>
 &lt;td>Go, Python, Bash, AI, MCP&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h4 id="previous-impact-within-the-same-company">Previous impact within the same company&lt;/h4>
&lt;ul>
&lt;li>Led the automated deployment of VMs and applications through CI/CD, enabling multiple deployments per day.&lt;/li>
&lt;li>Used Terraform to deploy across 10 datacenters and 4 providers (OpenStack, Proxmox, vSphere, NetBox) from shared templates.&lt;/li>
&lt;li>Used Ansible for VM initialization and application deployment, with Consul feeding service pools for HAProxy and Prometheus.&lt;/li>
&lt;li>Operated multi-cluster observability at &lt;span class="key">multi-TB/day&lt;/span> ingestion across logs, metrics, and traces, with &lt;span class="key">Kafka&lt;/span> pipelines feeding SIEM, logging, EDR, APM, and uptime monitoring.&lt;/li>
&lt;li>Integrated a highly available Proxmox cluster across 4 racks and 2 datacenters with Ceph, including PXE-based automation and 25 Gb networking per host.&lt;/li>
&lt;li>Accountable for the French security scope, driving vulnerability remediation and production hardening.&lt;/li>
&lt;/ul>
&lt;h3 id="a-hrefhttpswwwvincfrimg-srcimgvincpng-altvinc-styleheight-30px-padding-right-10px-vertical-align-middlea-vinc--system-engineer--2019---2021">&lt;a href="https://www.vinc.fr/">&lt;img src="https://retakedata.com/img/Vinc.png" alt="VINC" style="height: 30px; padding-right: 10px; vertical-align: middle;">&lt;/a> VINC | System engineer | 2019 - 2021&lt;/h3>
&lt;div class="skills-container">
 &lt;a href="https://retakedata.com/tags/proxmox" class="skill-badge">Proxmox&lt;/a>
 &lt;a href="https://retakedata.com/tags/dns" class="skill-badge">DNS&lt;/a>
 &lt;a href="https://retakedata.com/tags/ha" class="skill-badge">High Availability&lt;/a>
&lt;/div>
&lt;ul>
&lt;li>Architected the new platform, including new BGP routers and firewalls.&lt;/li>
&lt;li>Managed a Proxmox cluster across 2 datacenters.&lt;/li>
&lt;li>Responsible for SLA and client communication during production incidents.&lt;/li>
&lt;li>Built websites tailored to client needs.&lt;/li>
&lt;li>Implemented a new DNS stack designed for high availability.&lt;/li>
&lt;/ul>
&lt;h3 id="multi-visp--azuria--system-administrator--2017---2019">Multi-Visp / Azuria | System administrator | 2017 - 2019&lt;/h3>
&lt;div class="skills-container">
 &lt;a href="https://retakedata.com/tags/network" class="skill-badge">Network Infrastructure&lt;/a>
 &lt;a href="https://retakedata.com/tags/vpn" class="skill-badge">VPN&lt;/a>
 &lt;a href="https://retakedata.com/tags/datacenter" class="skill-badge">Datacenter Management&lt;/a>
 &lt;a href="https://retakedata.com/tags/wifi" class="skill-badge">WiFi&lt;/a>
&lt;/div>
&lt;ul>
&lt;li>Installed complete new racks at Telehouse 2.&lt;/li>
&lt;li>Managed cabling between two rooms.&lt;/li>
&lt;li>Installed managed Wi-Fi equipment.&lt;/li>
&lt;li>Implemented high-availability multi-datacenter VPN services.&lt;/li>
&lt;/ul>
&lt;h2 id="contact-information">Contact Information&lt;/h2>
&lt;ul>
&lt;li>Email: &lt;a href="mailto:contact@smjed.net">contact@smjed.net&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Open Source</title><link>https://retakedata.com/open-source/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://retakedata.com/open-source/</guid><description>&lt;style>
 .oss-grid {
 display: grid;
 grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
 gap: 1.5rem;
 margin: 2rem 0;
 }

 .oss-card {
 position: relative;
 border: 1px solid rgba(120, 120, 120, 0.25);
 border-radius: 14px;
 overflow: hidden;
 background: rgba(120, 120, 120, 0.06);
 box-shadow: 0 18px 40px rgba(0, 0, 0, 0.08);
 transition: transform 180ms ease, box-shadow 180ms ease, border-color 180ms ease;
 cursor: pointer;
 }

 .oss-card:hover,
 .oss-card:focus-within {
 transform: translateY(-4px);
 box-shadow: 0 24px 48px rgba(0, 0, 0, 0.14);
 border-color: rgba(120, 120, 120, 0.4);
 }

 .oss-card__overlay {
 position: absolute;
 inset: 0;
 z-index: 1;
 }

 .oss-card img {
 display: block;
 width: 100%;
 aspect-ratio: 16 / 10;
 object-fit: cover;
 background: #0f1115;
 }

 .oss-card__body {
 padding: 1.1rem 1.15rem 1.2rem;
 }

 .oss-card__body h3 {
 margin: 0 0 0.5rem;
 }

 .oss-card__body p {
 margin: 0 0 0.85rem;
 line-height: 1.6;
 }

 .oss-card__body ul {
 margin: 0 0 1rem 1.15rem;
 }

 .oss-badges {
 display: flex;
 flex-wrap: wrap;
 gap: 0.45rem;
 margin: 0 0 0.9rem;
 }

 .oss-badge {
 display: inline-flex;
 align-items: center;
 padding: 0.22rem 0.55rem;
 border-radius: 999px;
 border: 1px solid rgba(120, 120, 120, 0.24);
 background: rgba(120, 120, 120, 0.1);
 font-size: 0.78rem;
 line-height: 1.1;
 white-space: nowrap;
 }

 .oss-card__links {
 position: relative;
 z-index: 2;
 display: flex;
 flex-wrap: wrap;
 gap: 0.8rem;
 }

 .oss-card__links a {
 text-decoration: none;
 }

 .oss-mini-grid {
 display: grid;
 grid-template-columns: repeat(auto-fit, minmax(220px, 1fr));
 gap: 1rem;
 margin: 1.5rem 0 2rem;
 }

 .oss-mini-card {
 position: relative;
 border: 1px solid rgba(120, 120, 120, 0.22);
 border-radius: 12px;
 padding: 1rem;
 background: rgba(120, 120, 120, 0.04);
 transition: transform 180ms ease, box-shadow 180ms ease, border-color 180ms ease;
 cursor: pointer;
 }

 .oss-mini-card:hover,
 .oss-mini-card:focus-within {
 transform: translateY(-3px);
 box-shadow: 0 16px 32px rgba(0, 0, 0, 0.12);
 border-color: rgba(120, 120, 120, 0.35);
 }

 .oss-mini-card__overlay {
 position: absolute;
 inset: 0;
 z-index: 1;
 }

 .oss-mini-card h3 {
 margin: 0 0 0.45rem;
 font-size: 1rem;
 }

 .oss-mini-card p {
 margin: 0 0 0.8rem;
 line-height: 1.55;
 font-size: 0.96rem;
 }

 .oss-mini-card a:not(.oss-mini-card__overlay) {
 position: relative;
 z-index: 2;
 }
&lt;/style>
&lt;p>I treat open source as an extension of my &lt;span class="key">SRE&lt;/span> and platform engineering work: build tools around real operational pain points, keep them practical, and make them useful beyond my own environment.&lt;/p></description></item><item><title>Technical Expertise</title><link>https://retakedata.com/skills/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://retakedata.com/skills/</guid><description>&lt;p>As a senior freelance &lt;span class="key">SRE&lt;/span>, my work spans high-volume &lt;span class="key">observability&lt;/span> (11 TB/day across Loki, Elasticsearch, and Thanos), &lt;span class="key">infrastructure automation&lt;/span> (the VM delivery pipeline that managed 3000+ VMs), on-prem &lt;span class="key">Proxmox/Ceph&lt;/span> at scale (100+ nodes), and &lt;span class="key">AI-assisted SRE tooling&lt;/span> (Graphia, GPU serving, RAG pipelines). The stack below reflects what I run in production today.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Observability &amp;amp; Reliability&lt;/strong>: Grafana (2000 users, 10 instances), Loki, Thanos, Vector, Elasticsearch, CloudWatch. SLO-oriented diagnostics, recording rules, and platform hygiene at the scale of a regulated online-gambling operator.&lt;/li>
&lt;li>&lt;strong>Infrastructure Automation&lt;/strong>: Terraform multi-provider modules (vSphere, Proxmox, OpenStack), Ansible, Consul, NetBox. I build the tooling that lets small teams operate at fleet scale.&lt;/li>
&lt;li>&lt;strong>On-prem &amp;amp; Virtualization&lt;/strong>: Proxmox (100+ nodes, PXE automated), Ceph, ZFS, NFS, SAN, NVMe over Fabrics. vSphere migrations, HA cluster design, multi-DC networking.&lt;/li>
&lt;li>&lt;strong>AI-Assisted Tooling&lt;/strong>: Graphia (SRE agent for Grafana), vLLM serving, RAG pipelines, MCP integration. RBAC-aware, built for real operations.&lt;/li>
&lt;/ul>
&lt;p>Below is a structured view of the technologies I use most:&lt;/p></description></item></channel></rss>