From per-customer Azure stacks to RKE2 + GitOps

We replaced manual, per-customer Azure deployments with a portable Kubernetes platform built on RKE2, Rancher, Flux v2, and OCI-backed Helm charts. The result is three tenancy tiers across cloud and on-prem, Entra ID–backed RBAC, auditable GitOps, and cost that tracks usage: idle capacity stays lean, and peak capacity scales up to serve every request.
The client’s B2C product ran as per-customer Azure stacks that were expensive (> $10k/month), hard to update, and lacked affordable observability. Azure limitations (e.g., VNet IP ranges that cannot be changed after provisioning) and Terraform-driven update chains turned routine releases into multi-hour operations, repeated for every tenant. We re-platformed to RKE2 with Rancher, drove all application changes through Flux v2 GitOps and OCI Helm charts, and added Entra ID SSO with RBAC plus External Secrets, which can read from either Azure Key Vault or HashiCorp Vault. This is not a “flat cost reduction” story, because compute scales up with traffic and down when quiet. What changed is that deployments now take under 10 minutes with near-zero downtime, telemetry costs per client dropped from ~$600 to $20–$100, and the platform runs in Azure or on-prem under the same operating model.
End-to-end application release: <10 minutes
Per-tenant logs/traces/metrics spend: $600 → $20–$100
4 deployment models: shared tenant, client subscription, BYO Azure subscription, fully isolated on-prem.
The team needed a portable, compliant, multi-tenant platform without per-customer duplication, Terraform-only updates, or Azure IP-range traps.
Per-customer Azure stacks with no compute sharing
Each tenant required a full environment, driving costs and operational overhead; scaling one customer didn’t help another.
Updates bound to Terraform replays and Azure immutabilities
Rolling out app changes meant re-running Terraform against preplanned address spaces; a mistake forced a rebuild and risked downtime.
Observability too expensive for B2C scale
Azure-managed telemetry made full logs/traces/metrics economically unrealistic across many tenants; visibility was inconsistent.
We built a portable Kubernetes platform with RKE2, Flux v2, Rancher, and Entra ID to support shared, BYO, and isolated deployments with auditable changes and predictable ops.
GitOps with Flux v2 and OCI-backed Helm
All workloads ship as Helm charts stored in an OCI registry (ACR/GHCR). Flux Kustomizations, one per environment (dev/test/stage/prod), reconcile changes continuously. Signed commits and PR approvals record authorship and provenance; a rollback is a Git revert, not a Terraform rerun.
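As a rough illustration of the pattern described above, the following Python sketch generates the two Flux objects involved: an OCI-type HelmRepository and one HelmRelease per environment. The registry URL, chart name, version, and namespaces are hypothetical placeholders, not the project's real values, and the API versions assume a recent Flux release (older installations use beta versions of the same kinds).

```python
import json

# Environments named in the text; one HelmRelease is generated for each.
ENVIRONMENTS = ["dev", "test", "stage", "prod"]


def helm_repository(name, url, namespace="flux-system"):
    """Build a Flux HelmRepository that points at an OCI registry.

    Assumption: `url` is a placeholder such as oci://registry.example.com/charts.
    """
    return {
        "apiVersion": "source.toolkit.fluxcd.io/v1",
        "kind": "HelmRepository",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"type": "oci", "interval": "10m", "url": url},
    }


def helm_release(app, env, chart_version, repo_name, values=None):
    """Build a Flux HelmRelease that installs `app` into the `env` namespace."""
    return {
        "apiVersion": "helm.toolkit.fluxcd.io/v2",
        "kind": "HelmRelease",
        "metadata": {"name": f"{app}-{env}", "namespace": env},
        "spec": {
            "interval": "5m",
            "chart": {
                "spec": {
                    "chart": app,
                    # Pinning the version in Git is what makes a rollback
                    # a plain revert of the commit that bumped it.
                    "version": chart_version,
                    "sourceRef": {
                        "kind": "HelmRepository",
                        "name": repo_name,
                        "namespace": "flux-system",
                    },
                }
            },
            "values": values or {},
        },
    }


# Hypothetical names used only for illustration.
repo = helm_repository("platform-charts", "oci://registry.example.com/charts")
releases = [
    helm_release("webapp", env, "1.4.2", "platform-charts")
    for env in ENVIRONMENTS
]

if __name__ == "__main__":
    print(json.dumps([repo] + releases, indent=2))
```

The output is JSON, which Kubernetes tooling accepts alongside YAML, so the generated manifests could be committed to the environment folders that the Flux Kustomizations watch.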
RKE2 everywhere + Rancher fleet management with Entra ID RBAC
RKE2 powers clusters in Azure and on-prem. Rancher manages cluster lifecycle and upgrades; Microsoft Entra ID provides SSO, with Entra groups mapped to Roles and RoleBindings (developers, cluster admins, compliance reviewers). NetworkPolicies and ResourceQuotas enforce tenant boundaries.
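A minimal sketch of what those per-tenant guardrails might look like, again generated from Python. The namespace, group identifier, and quota figures are hypothetical. How a group name appears in a RoleBinding subject depends on the authentication path (Rancher manages its own bindings when access goes through it), so the identifier below is only a stand-in.

```python
def tenant_guardrails(namespace, group_id, cluster_role="edit"):
    """Return a RoleBinding, NetworkPolicy, and ResourceQuota for one tenant.

    Assumption: `group_id` is whatever string the cluster's auth setup
    presents for an Entra ID group; the value used below is a placeholder.
    """
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "tenant-developers", "namespace": namespace},
        "subjects": [
            {
                "kind": "Group",
                "name": group_id,
                "apiGroup": "rbac.authorization.k8s.io",
            }
        ],
        # Binding a built-in ClusterRole inside a namespace grants it
        # only within that namespace.
        "roleRef": {
            "kind": "ClusterRole",
            "name": cluster_role,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    network_policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},  # applies to every pod in the namespace
            "policyTypes": ["Ingress"],
            # Only pods from the same namespace may connect. A real policy
            # would also need a rule admitting the ingress controller.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    resource_quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {
            "hard": {  # illustrative limits, not the project's figures
                "requests.cpu": "4",
                "requests.memory": "8Gi",
                "limits.cpu": "8",
                "limits.memory": "16Gi",
                "pods": "50",
            }
        },
    }
    return [role_binding, network_policy, resource_quota]


# Hypothetical tenant namespace and group identifier.
guardrails = tenant_guardrails("tenant-acme", "entra-group-object-id-placeholder")
```

Generating all three objects from one function keeps the tenant boundary definition in a single reviewable place, which fits the PR-based change flow described earlier.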
Multi-tenancy architecture and deployment options
Three tenancy tiers, plus an on-prem option for regulated environments: (1) a shared tenant cluster with per-tenant namespaces, ingress, quotas, and centralized monitoring; (2) a tenant on the client’s subscription, with logs kept local or forwarded to a shared monitoring cluster; (3) a BYO Azure subscription with full isolation. Per-tenant Helm values drive configuration without cloning infrastructure.
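The idea that per-tenant Helm values drive configuration can be sketched as a layered merge: a shared base plus a small override per tenant. The sketch below mimics how Helm layers values (maps merge recursively, scalars and lists are replaced). The tenant names, hostnames, and keys are invented for illustration.

```python
import copy


def merge_values(base, override):
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key; any other value (scalar or list)
    from `override` replaces the base value outright.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Shared defaults that every tenant inherits (hypothetical keys).
BASE_VALUES = {
    "replicaCount": 2,
    "ingress": {"enabled": True, "className": "nginx", "host": ""},
    "telemetry": {"mode": "central"},
}

# Each tenant only states what differs from the base (hypothetical tenants).
TENANT_OVERRIDES = {
    "acme": {"ingress": {"host": "acme.example.com"}},
    "globex": {
        "replicaCount": 4,
        "ingress": {"host": "globex.example.com"},
        "telemetry": {"mode": "local"},
    },
}


def tenant_values(tenant):
    """Return the effective Helm values for one tenant."""
    return merge_values(BASE_VALUES, TENANT_OVERRIDES[tenant])
```

For example, `tenant_values("acme")` keeps the base replica count and ingress class and changes only the host, so onboarding a tenant amounts to adding a few lines of overrides rather than a new infrastructure stack.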
Technologies we used
• RKE2 (Rancher Kubernetes Engine 2)
• Rancher (cluster access, upgrades, fleet management)
• Flux v2 (GitOps Toolkit) and Kustomize
• Helm 3 with OCI registry (ACR/GHCR)
• Microsoft Entra ID (SSO and RBAC groups)
• External Secrets Operator
• Azure Key Vault
• HashiCorp Vault
• Kubernetes Namespaces, NetworkPolicies, ResourceQuotas, Ingress
• OpenTelemetry Collector
• Prometheus
• Grafana
• Loki
• Tempo
• NGINX Ingress Controller
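To illustrate how the External Secrets Operator lets the same workload read from either Azure Key Vault or HashiCorp Vault, here is a small sketch that builds an ExternalSecret manifest. The store names, secret paths, and API version are assumptions; the two backing stores themselves would be defined separately as ClusterSecretStore objects, which are not shown.

```python
def external_secret(name, namespace, store_name, remote_key,
                    remote_property=None,
                    api_version="external-secrets.io/v1beta1"):
    """Build an ExternalSecret that syncs one remote value into a Secret.

    Assumptions: `store_name` refers to an existing ClusterSecretStore, and
    `api_version` matches the operator release installed in the cluster.
    """
    remote_ref = {"key": remote_key}
    if remote_property is not None:
        # Key/value backends such as Vault often need a field within the key.
        remote_ref["property"] = remote_property
    return {
        "apiVersion": api_version,
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",
            "secretStoreRef": {"kind": "ClusterSecretStore", "name": store_name},
            "target": {"name": name},  # name of the resulting Kubernetes Secret
            "data": [{"secretKey": "password", "remoteRef": remote_ref}],
        },
    }


# Same application secret, two hypothetical backends.
azure_variant = external_secret(
    "db-credentials", "tenant-acme", "azure-key-vault", "acme-db-password"
)
vault_variant = external_secret(
    "db-credentials", "tenant-acme", "hashicorp-vault",
    "tenants/acme/db", remote_property="password",
)
```

Both variants produce a Secret with the same name and key, so the application chart does not need to know which secret backend a given deployment uses.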