The Multigres Operator is a Kubernetes operator for managing distributed, sharded PostgreSQL clusters across multiple failure domains (zones or regions). It provides a unified API to define the topology of your database system, handling the complex orchestration of shards, cells (failure domains), and gateways.
- Features
- Installation
- How it Works
- Configuration & Defaults
- Backup & Restore
- Observability
- Webhook & Certificate Management
- Pool Replication & Quorum
- Constraints & Limits
- Further Reading
## Features

- Global Cluster Management: Single source of truth (`MultigresCluster`) for the entire database topology.
- Automated Sharding: Manages `TableGroups` and `Shards` as first-class citizens.
- Direct Pod Management: Manages individual Pods and PVCs directly (no StatefulSets), enabling targeted decommissioning, rolling updates with primary awareness, and granular PVC lifecycle control.
- Failover & High Availability: Orchestrates Primary/Standby failovers across defined Cells.
- Template System: Define configuration once (`CoreTemplate`, `CellTemplate`, `ShardTemplate`) and reuse it across the cluster.
- Hierarchical Defaults: Smart override logic allowing for global defaults, namespace defaults, and granular overrides.
- Integrated Cert Management: Built-in self-signed certificate generation and rotation for validating webhooks, with optional support for `cert-manager`.
## Installation

Prerequisites:

- Kubernetes v1.25+
Install the operator with built-in self-signed certificate management:
```shell
kubectl apply --server-side -f \
  https://github.com/numtide/multigres-operator/releases/latest/download/install.yaml
```

This deploys the operator into the `multigres-operator` namespace with:
- All CRDs (MultigresCluster, Cell, Shard, TableGroup, TopoServer, and templates)
- RBAC roles and bindings
- Mutating and Validating webhooks with self-signed certificates (auto-rotated)
- The operator Deployment
- Metrics endpoint
Once the operator is running, try a sample cluster:
```shell
kubectl apply -f https://raw.githubusercontent.com/numtide/multigres-operator/main/config/samples/minimal.yaml
```

For more sample configurations, see the samples directory.
| Option | Description | Guide |
|---|---|---|
| Self-signed certs (default) | Zero-config TLS – operator generates and rotates its own CA. | (Installed above) |
| cert-manager | External certificate management via cert-manager. | Cert-Manager Demo |
| Observability stack | Full metrics, tracing, and dashboards (Prometheus, Tempo, Grafana). | Observability Demo |
## How it Works

The Multigres Operator follows a Parent/Child architecture. You, the user, manage the Root resource (MultigresCluster) and its shared Templates. The operator automatically creates and reconciles all necessary child resources (Cells, TableGroups, Shards, TopoServers) to match your desired state.
```
[MultigresCluster]  (Root CR - User Editable)
 │
 ├── Defines [TemplateDefaults] (Cluster-wide default templates)
 │
 ├── [GlobalTopoServer] (Child CR) → Uses [CoreTemplate] OR inline [spec]
 │
 ├── MultiAdmin Resources → Uses [CoreTemplate] OR inline [spec]
 │
 ├── [Cell] (Child CR) → Uses [CellTemplate] OR inline [spec]
 │    │
 │    ├── MultiGateway Resources
 │    └── [LocalTopoServer] (Child CR, optional)
 │
 └── [TableGroup] (Child CR)
      │
      └── [Shard] (Child CR) → Uses [ShardTemplate] OR inline [spec]
           │
           ├── MultiOrch Resources (Deployment)
           └── Pools (Operator-managed Pods + PVCs)

[CoreTemplate] (User-editable, scoped config)
 ├── globalTopoServer
 └── multiadmin

[CellTemplate] (User-editable, scoped config)
 ├── multigateway
 └── localTopoServer (optional)

[ShardTemplate] (User-editable, scoped config)
 ├── multiorch
 └── pools (postgres + multipooler)
```
Important:
- Only `MultigresCluster`, `CoreTemplate`, `CellTemplate`, and `ShardTemplate` are meant to be edited by users.
- Child resources (`Cell`, `TableGroup`, `Shard`, `TopoServer`) are Read-Only. Any manual changes to them will be immediately reverted by the operator to ensure the system stays in sync with the root configuration.
## Configuration & Defaults

The operator uses a 4-Level Override Chain to resolve configuration for every component. This allows you to keep your MultigresCluster spec clean while maintaining full control when needed.
When determining the configuration for a component (e.g., a Shard), the operator looks for configuration in this order:
1. Inline Spec / Explicit Template Ref: Defined directly on the component in the `MultigresCluster` YAML.
2. Cluster-Level Template Default: Defined in `spec.templateDefaults` of the `MultigresCluster`.
3. Namespace-Level Default: A template of the correct kind (e.g., `ShardTemplate`) named `"default"` in the same namespace.
4. Operator Hardcoded Defaults: Fallback values built into the operator webhook.
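The four levels above can be sketched as a first-match resolver. This is illustrative Python only, not the operator's actual code; all names and shapes here are hypothetical:

```python
def resolve_template(component, cluster, namespace_templates, hardcoded):
    """Return the first matching config source for a component.

    Mirrors the 4-level override chain: inline spec / explicit ref,
    cluster-level templateDefaults, namespace-level "default" template,
    then operator hardcoded defaults.
    """
    # 1. Inline spec or explicit template reference on the component itself
    if component.get("spec") or component.get("templateRef"):
        return component.get("spec") or component["templateRef"]
    # 2. Cluster-level default from spec.templateDefaults
    cluster_default = cluster.get("templateDefaults", {}).get(component["kind"])
    if cluster_default:
        return cluster_default
    # 3. Namespace-level template literally named "default"
    if "default" in namespace_templates.get(component["kind"], {}):
        return namespace_templates[component["kind"]]["default"]
    # 4. Hardcoded fallback built into the operator webhook
    return hardcoded[component["kind"]]
```

The key property is that resolution stops at the first level that yields a value, so an explicit reference always shadows every default below it.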
Templates allow you to define standard configurations (e.g., "Standard High-Availability Cell"). You can then apply specific overrides on top of a template.
Example: Using a Template with Overrides
```yaml
spec:
  cells:
    - name: "us-east-1a"
      cellTemplate: "standard-ha-cell"  # <--- Uses the template
      overrides:                        # <--- Patches specific fields
        multigateway:
          replicas: 5                   # <--- Overrides only the replica count
```

Note on Overrides: When using overrides, you must provide the complete struct for the section you are overriding if it's a pointer. For specific fields like `resources`, it's safer to provide the full context if the merge behavior isn't granular enough for your needs (currently, the resolver performs a deep merge).
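The deep-merge behavior mentioned above can be illustrated with a minimal sketch. Assumed semantics for illustration only (override values win on conflicts, nested maps merge recursively); this is not the operator's actual resolver:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# A template's multigateway config plus an override of only `replicas`:
template = {"multigateway": {"replicas": 3, "image": "multigres/gateway:1.2"}}
override = {"multigateway": {"replicas": 5}}
# The result keeps `image` from the template and takes `replicas` from the override.
```

Under these semantics a sibling field you omit survives the merge, which is why overriding only `replicas` does not wipe the rest of the `multigateway` section.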
Warning
When a template (CoreTemplate, CellTemplate, or ShardTemplate) is updated, all clusters using that template are reconciled immediately. This means changes to a shared template propagate to every referencing cluster at once.
For production environments where you want controlled rollouts, consider versioning templates by name:
```yaml
# Instead of editing "standard-shard" in-place...
apiVersion: multigres.com/v1alpha1
kind: ShardTemplate
metadata:
  name: standard-shard-v2  # <--- New version = new resource
spec:
  # ... updated configuration
```

Then update each cluster's `templateRef` individually when ready:
```yaml
spec:
  templateDefaults:
    shardTemplate: "standard-shard-v2"  # <--- Opt-in to the new version
```

Note
Avoid using default-named templates (the namespace-level fallback) in production if you need controlled rollouts. They cannot be versioned since any cluster without an explicit template reference will automatically use whichever template is named default.
This mechanism may change in future versions. See Template Propagation for details on planned improvements.
## Backup & Restore

The operator integrates pgBackRest for automated backups, WAL archiving, and point-in-time recovery (PITR). Two storage backends are supported: S3 (recommended for production and multi-cell clusters) and Filesystem (PVC-based, for development/single-node). Backup configuration is fully declarative and propagates from the cluster level down to individual shards.
Key features:
- Replica-based backups – backups run on a replica to avoid impacting the primary
- S3 credential options – IRSA, static credentials, or EC2 instance metadata
- Auto-generated TLS – pgBackRest inter-node TLS is managed automatically, with optional cert-manager support
Warning
Filesystem backups are cell-local. Cross-cell failover cannot restore from another cell's backup. Use S3 for multi-cell clusters.
Full documentation: Backup & Restore Guide
## Observability

The operator ships with built-in support for metrics, alerting, distributed tracing, and structured logging.
- Metrics – Prometheus endpoint with 8 operator-specific metrics + controller-runtime framework metrics
- Alerts – 7 pre-configured PrometheusRule alerts with dedicated runbooks (view runbooks)
- Grafana Dashboards – Operator health dashboard + per-cluster topology dashboard
- Distributed Tracing – OpenTelemetry OTLP support, disabled by default, zero overhead when off
- Structured Logging – JSON logging with automatic `trace_id`/`span_id` injection for log-trace correlation
Full documentation: Observability Guide · Observability Demo
## Webhook & Certificate Management

The operator includes a Mutating and Validating Webhook to enforce defaults and data integrity.
By default, the operator manages its own TLS certificates using the generic pkg/cert module. This implements a Split-Secret PKI architecture:
- Bootstrap: On startup, the cert rotator generates a self-signed Root CA (ECDSA P-256) and a Server Certificate, storing them in two separate Kubernetes Secrets.
- CA Bundle Injection: A post-reconcile hook automatically patches the `MutatingWebhookConfiguration` and `ValidatingWebhookConfiguration` with the CA bundle.
- Rotation: A background loop checks certificates hourly. Certs nearing expiry (or signed by a rotated CA) are automatically renewed without downtime.
- Owner References: Both secrets are owned by the operator Deployment, so they are garbage-collected on uninstall.
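The hourly rotation check described above boils down to "renew if the certificate is inside its expiry window, or was signed by a CA that has since rotated". A minimal sketch of that decision, with a hypothetical helper and an assumed 30-day renewal window (not the actual `pkg/cert` code):

```python
from datetime import datetime, timedelta

def needs_renewal(not_after: datetime, ca_serial: str,
                  current_ca_serial: str, now: datetime,
                  window: timedelta = timedelta(days=30)) -> bool:
    """Renew when the cert nears expiry or its signing CA has rotated."""
    nearing_expiry = now >= not_after - window   # inside the renewal window
    ca_rotated = ca_serial != current_ca_serial  # signed by a stale CA
    return nearing_expiry or ca_rotated
```

Because renewal happens well before `not_after`, the old cert remains valid while the new one is issued, which is what makes the rotation downtime-free.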
If you prefer to use cert-manager or another external tool, deploy using the cert-manager overlay (install-certmanager.yaml). This overlay:
- Creates a `Certificate` and `ClusterIssuer` resource for cert-manager to manage.
- Mounts the cert-manager-provisioned secret to `/var/run/secrets/webhook` so certificates exist on disk at startup.
The operator automatically detects the certificate management strategy on startup:
- If certificates already exist on disk and the operator did not previously manage them (no cert-strategy annotation), it assumes an external provider (e.g. cert-manager) and skips internal rotation.
- If no certificates exist on disk, or the operator previously annotated the ValidatingWebhookConfiguration, internal certificate rotation is enabled.
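The detection logic above can be sketched as follows. The mount path and annotation key are assumptions for illustration, not the operator's exact identifiers:

```python
import os

CERT_DIR = "/var/run/secrets/webhook"  # assumed certificate mount path
STRATEGY_ANNOTATION = "cert-strategy"  # assumed self-managed marker annotation

def use_internal_rotation(cert_dir: str, webhook_annotations: dict) -> bool:
    """Decide whether the operator should rotate certificates itself.

    An external provider is assumed only when certs already exist on disk
    AND the operator never annotated the webhook config as self-managed.
    """
    certs_on_disk = os.path.exists(os.path.join(cert_dir, "tls.crt"))
    previously_self_managed = STRATEGY_ANNOTATION in webhook_annotations
    if certs_on_disk and not previously_self_managed:
        return False  # external provider (e.g. cert-manager); skip rotation
    return True       # enable internal certificate rotation
```

The annotation check is what prevents a restart of a self-managing operator from misreading its own previously written certificates as externally provided.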
Cert-Manager walkthrough: Cert-Manager Demo
## Pool Replication & Quorum

Multigres uses the ANY_2 durability policy by default, which requires every write to be acknowledged by at least 2 nodes (the primary + 1 synchronous standby). This has implications for how many replicas you should run per cell in readWrite pools.
| Replicas per Cell | Configuration | Rolling Upgrade Behavior |
|---|---|---|
| 1 | 1 pod (primary only, no standbys) | Downtime during upgrades. No standby to maintain quorum. |
| 2 | 1 primary + 1 standby | Downtime during upgrades. Draining the standby leaves zero synchronous standbys, violating ANY_2. Upstream multigres rejects the UpdateSynchronousStandbyList REMOVE because it would empty the synchronous standby list. |
| 3 (recommended) | 1 primary + 2 standbys | Zero-downtime upgrades. One standby can be drained while the other maintains quorum. |
The operator enforces a hard minimum of 1 replica per cell (the CRD rejects replicasPerCell: 0). For readWrite pools with fewer than 3 replicas, the webhook returns an admission warning (not a rejection) explaining the quorum limitation.
readOnly pools are not subject to this warning since they don't participate in write quorum.
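The ANY_2 rolling-upgrade constraint in the table above reduces to a simple check: draining one standby must still leave at least one synchronous standby to acknowledge writes. An illustrative sketch (not multigres code):

```python
def can_drain_standby(replicas_per_cell: int, min_sync_standbys: int = 1) -> bool:
    """Under ANY_2 (primary + 1 synchronous standby), a standby can only be
    drained if at least one other standby remains to keep the write quorum."""
    standbys = replicas_per_cell - 1   # one replica is the primary
    return standbys - 1 >= min_sync_standbys

# 1 or 2 replicas per cell -> downtime during upgrades; 3+ -> zero-downtime.
```

This is why 3 replicas per cell is the recommended minimum for readWrite pools: it is the smallest count where one standby can be taken down without emptying the synchronous standby list.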
## Constraints & Limits

Please be aware of the following constraints in the current version:
- Database Limit: Only 1 database is supported per cluster. It must be named `postgres` and marked `default: true`.
- Shard Naming: Shards currently must be named `0-inf` – this is a limitation of the current implementation of Multigres.
- Naming Lengths:
  - TableGroup Names: If the combined name (`cluster-db-tg`) exceeds 28 characters, the operator automatically hashes the database and tablegroup names to ensure that the resulting child resource names (Shards, Pods, PVCs) stay within Kubernetes limits (63 chars).
  - Cluster Name: Recommended to be under 20 characters to ensure that even with hashing, suffixes fit comfortably.
- Immutable Fields: Some fields like `zone` and `region` in Cell definitions are immutable after creation.
- Append-Only Pools and Cells: Pools and cells cannot be renamed or removed from a cluster. This prevents orphaned pods and stale etcd registrations.
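The name-length handling above can be sketched as follows. The hash scheme and digest length are assumptions for illustration, not the operator's exact algorithm:

```python
import hashlib

def child_name(cluster: str, db: str, tg: str, limit: int = 28) -> str:
    """Hash the db/tablegroup parts when the combined name is too long, so
    child resource names (Shards, Pods, PVCs) stay within the 63-char
    Kubernetes name limit once operator suffixes are appended."""
    combined = f"{cluster}-{db}-{tg}"
    if len(combined) <= limit:
        return combined  # short enough to keep human-readable
    digest = hashlib.sha256(f"{db}-{tg}".encode()).hexdigest()[:8]
    return f"{cluster}-{digest}"
```

Keeping the cluster name under 20 characters means even the hashed form leaves headroom for the per-resource suffixes the operator appends.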
## Further Reading

| Resource | Description |
|---|---|
| Operator Capability Levels | Maturity assessment against the Operator Framework capability model |
| Storage Management | PVC deletion policies (Retain/Delete) and volume expansion |
| Configuration Reference | Operator flags, environment variables, and logging |
| Demos | Guided walkthroughs (webhook, cert-manager, observability) |
| Developer Documentation | Internal architecture, controller patterns, caching strategy |
| Contributing | Development setup, local Kind deployment, code style |
| Changelog | Release history |