Custom domains¶
Note
Canonical source: guides/custom-domains-setup.md. Transcluded here for the
docs site. Edit the source file if anything is wrong.
(DigitalOcean Kubernetes / Rancher)
This guide walks through deploying the custom domain feature on a DO Kubernetes cluster managed via Rancher. It covers infrastructure provisioning, Kubernetes manifests, environment configuration, and the end-to-end verification checklist.
Read the design doc first: .plans/custom-domain-support.md
Architecture Recap¶
                    ┌──────────────────────────────────────────────┐
*.midsummer.cloud   │ LB1 (EXISTING) TLS-term wildcard HTTP:8080   │──▶ nginx :8080 ──▶ Hypercorn :8082
                    └──────────────────────────────────────────────┘
                    ┌──────────────────────────────────────────────┐
custom domains      │ LB2 (NEW) TLS passthrough TCP 80+443         │──▶ nginx :8080 (ACME + 301→https)
                    └──────────────────────────────────────────────┘──▶ nginx :8443 (SSL, SNI per-domain cert) ──▶ Hypercorn :8082
LB1 (existing): Wildcard *.midsummer.cloud cert, terminates TLS, forwards HTTP to pod :8080. Untouched.
LB2 (new): TLS passthrough, no cert on the LB. Traffic reaches nginx :8443, which picks the per-domain LE cert via SNI. Port 80 is forwarded to pod :8080 for ACME HTTP-01 challenges and 301→https redirects.
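The SNI step on :8443 amounts to a lookup from the requested hostname to a certificate directory, with a fallback for unknown names. The sketch below illustrates that idea in Python; it is not how nginx implements it. `select_cert_dir`, `issued` and `DEFAULT_CERT` are hypothetical names, and the directory layout assumes certbot's standard /etc/letsencrypt/live/<domain>/ structure.

```python
from pathlib import PurePosixPath

LIVE_DIR = PurePosixPath("/etc/letsencrypt/live")
DEFAULT_CERT = "default (snakeoil)"  # label for the fallback server, not a real path


def select_cert_dir(sni_hostname: str, issued: set) -> str:
    """Return the cert directory for a known domain, else the default label."""
    # Hostnames are case-insensitive and may carry a trailing dot.
    host = sni_hostname.strip().rstrip(".").lower()
    if host in issued:
        return str(LIVE_DIR / host)
    return DEFAULT_CERT


issued_domains = {"register.furrycon.org"}
print(select_cert_dir("Register.FurryCon.org", issued_domains))
print(select_cert_dir("unknown.example.com", issued_domains))
```

Unknown hostnames fall through to the default server, which is why an unconfigured name is answered with the snakeoil certificate rather than another tenant's.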
Step 1 — Provision Load Balancer 2 (LB2)¶
Create the load balancer with doctl:
# doctl uses underscore keys; multiple forwarding rules go in one space-separated value
doctl compute load-balancer create \
  --name midsummer-custom-lb \
  --region <YOUR_REGION> \
  --forwarding-rules "entry_protocol:tcp,entry_port:443,target_protocol:tcp,target_port:8443 entry_protocol:tcp,entry_port:80,target_protocol:tcp,target_port:8080" \
  --health-check protocol:http,port:8080,path:/ping/,check_interval_seconds:10,response_timeout_seconds:5,healthy_threshold:2,unhealthy_threshold:3 \
  --tag-name k8s:<YOUR_CLUSTER_ID>
Or via the DO cloud console:
Networking → Load Balancers → Create
Name: midsummer-custom-lb
Region: same as your cluster
Forwarding Rules:
  Rule 1: TCP:443 → TCP:8443 (TLS passthrough for custom domains)
  Rule 2: TCP:80 → TCP:8080 (ACME HTTP-01 + redirect)
Health Check: HTTP:8080 /ping/ every 10s, healthy threshold 2, unhealthy threshold 3
Sticky Sessions: Disabled
TLS: None. This LB must NOT terminate TLS (passthrough only)
Important: Do NOT add a certificate to this LB. TLS termination happens inside nginx on port 8443 using per-domain Let’s Encrypt certs.
Note the LB’s public IP (e.g. 203.0.113.50). You’ll need it for DNS.
Step 2 — DNS Records¶
In your DNS provider (managing midsummer.cloud):
| Record | Type | Name | Value |
|---|---|---|---|
| A | A | custom.midsummer.cloud | LB2’s public IP (from Step 1) |
This custom.midsummer.cloud hostname is what tenants will CNAME their custom domains to. It’s configured via CUSTOM_DOMAIN_CNAME_TARGET in the app’s environment.
Tenants will create their own DNS records:
CNAME register.furrycon.org → custom.midsummer.cloud
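A DNS check for a tenant domain can be approximated by comparing the addresses it resolves to with those of the CNAME target. The sketch below uses only the standard library and is an assumption about how such a check could look, not the app's actual Verify DNS logic; `resolve_ips` and `points_at_target` are hypothetical names. It compares resolved IPs rather than the CNAME record itself, so an ALIAS/ANAME at an apex passes too.

```python
import socket
from typing import Callable, Set


def resolve_ips(hostname: str) -> Set[str]:
    """Resolve a hostname to its set of IP addresses via the system resolver."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def points_at_target(domain: str, target: str,
                     resolver: Callable[[str], Set[str]] = resolve_ips) -> bool:
    """True if every address of `domain` is also an address of `target`."""
    try:
        domain_ips = resolver(domain)
        target_ips = resolver(target)
    except OSError:  # socket.gaierror (resolution failure) is an OSError
        return False
    return bool(domain_ips) and domain_ips <= target_ips
```

Called as `points_at_target("register.furrycon.org", "custom.midsummer.cloud")`, it returns False until the tenant's record has propagated to the resolver in use.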
Step 3 — Persistent Volume for Let’s Encrypt Certificates¶
Certs in /etc/letsencrypt/ must survive pod restarts. Create a PVC and mount it.
kustomize/resources/custom-domains-pvc.yaml (or equivalent Helm values)¶
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: midsummer-letsencrypt
  namespace: midsummer
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: do-block-storage
Update the Deployment to mount it¶
Add to your Deployment/Pod spec (whether you use a Helm chart, Kustomize, or Rancher UI):
# In the container spec
volumeMounts:
  - name: letsencrypt
    mountPath: /etc/letsencrypt
  - name: acme-webroot
    mountPath: /var/www/letsencrypt
  - name: nginx-confd
    mountPath: /etc/nginx/conf.d
# In the pod spec
volumes:
  - name: letsencrypt
    persistentVolumeClaim:
      claimName: midsummer-letsencrypt
  - name: acme-webroot
    persistentVolumeClaim:
      claimName: midsummer-acme-webroot  # not defined in this guide; create it like midsummer-letsencrypt
  - name: nginx-confd
    emptyDir: {}  # generated at runtime by the management command
Note on nginx-confd: The custom-domains.conf file is generated at runtime by manage.py customdomain_regen_nginx and written to /etc/nginx/conf.d/. If you mount an emptyDir volume there, the file is lost on pod restart. You must run manage.py customdomain_regen_nginx as part of pod startup (see Step 8), for example from an init container.
Note on multi-pod replicas: A ReadWriteOnce PVC can be mounted read-write by only one node at a time, so replicas scheduled on other nodes cannot mount it. For multi-replica setups, use a ReadWriteMany volume (for example an NFS-backed storage class; do-block-storage supports only ReadWriteOnce) or switch to a cert-manager CRD approach. For initial deployment, single-replica is recommended.
Step 4 — Update the Kubernetes Service¶
Your existing Service likely only exposes 8080 and 8082. Add port 8443:
apiVersion: v1
kind: Service
metadata:
  name: midsummer
  namespace: midsummer
spec:
  type: LoadBalancer  # or ClusterIP + Ingress if you're using an Ingress controller
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: https-custom
      port: 443
      targetPort: 8443
    - name: app
      port: 8082
      targetPort: 8082
  selector:
    app: midsummer
If you use an Ingress controller (nginx-ingress, Traefik, etc.) instead of a raw LB, the TLS passthrough setup differs: you would expose port 443 → 8443 as TCP passthrough through the Ingress controller’s Service of type LoadBalancer. The DO LB approach above is simpler.
Step 5 — Environment Variables¶
Add these to your Deployment’s environment (Rancher UI → Workloads → your deployment → Environment Variables, or in your Helm values / Kustomize overlay):
| Variable | Value | Notes |
|---|---|---|
| CUSTOM_DOMAIN_CNAME_TARGET | custom.midsummer.cloud | The hostname tenants CNAME to. Must resolve to LB2’s IP. |
|  |  | Let’s Encrypt registration email. |
| ACME_STAGING | true | Start with true; flip to false once the Step 10 checklist passes. |
|  |  | Must match |
|  | certmgr | Celery queue name for cert tasks. Create a worker for it (see Step 7). |
|  |  | Set |
| DEPLOYMENT_NAME | midsummer | K8s Deployment name, used by kubectl exec to find sibling pods. |
| POD_NAMESPACE | (from Downward API) | K8s namespace the pods run in. |
| POD_NAME | (from Downward API) | Current pod name (set via Downward API). |
Other existing vars (MIDSUMMER_PROD, MIDSUMMER_DB_URL, CELERY_BROKER_URL, etc.) remain unchanged.
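A small startup check can catch a missing or inconsistent variable before the first certificate request. The following is a minimal sketch, assuming the variable names that appear elsewhere in this guide (CUSTOM_DOMAIN_CNAME_TARGET, ACME_STAGING, RELOAD_ALL_PODS, DEPLOYMENT_NAME, POD_NAMESPACE, POD_NAME). The helper names, the accepted boolean spellings and the defaults are assumptions, not the app's actual settings code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean-ish environment variable; unset means `default`."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


@dataclass(frozen=True)
class CustomDomainEnv:
    cname_target: str
    acme_staging: bool
    reload_all_pods: bool
    deployment_name: Optional[str]
    pod_namespace: Optional[str]
    pod_name: Optional[str]


def load_custom_domain_env(env: Mapping[str, str] = os.environ) -> CustomDomainEnv:
    """Read and sanity-check the custom-domain variables (hypothetical helper)."""
    target = env.get("CUSTOM_DOMAIN_CNAME_TARGET", "").strip()
    if not target:
        raise ValueError("CUSTOM_DOMAIN_CNAME_TARGET must be set")
    cfg = CustomDomainEnv(
        cname_target=target,
        # Assumed defaults: staging on, cross-pod reload off.
        acme_staging=_flag(env, "ACME_STAGING", default=True),
        reload_all_pods=_flag(env, "RELOAD_ALL_PODS", default=False),
        deployment_name=env.get("DEPLOYMENT_NAME"),
        pod_namespace=env.get("POD_NAMESPACE"),
        pod_name=env.get("POD_NAME"),
    )
    # Cross-pod reload needs to know which pods to target and which one to skip.
    if cfg.reload_all_pods and not (
        cfg.deployment_name and cfg.pod_namespace and cfg.pod_name
    ):
        raise ValueError(
            "RELOAD_ALL_PODS needs DEPLOYMENT_NAME, POD_NAMESPACE and POD_NAME"
        )
    return cfg
```

Defaulting ACME_STAGING to true is a deliberate choice in this sketch: a forgotten variable then produces untrusted staging certificates instead of consuming production rate limits.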
Step 6 — Build & Deploy the Updated Image¶
The Dockerfile already includes the certbot/cron/ssl-cert installs and EXPOSEs port 8443. Build and push:
# From the project root
docker build -t your-registry/midsummer:custom-domains .
docker push your-registry/midsummer:custom-domains
Update your Deployment’s image tag to custom-domains (or whatever your tag strategy is).
Step 7 — Celery Worker for Certificate Management¶
The tenant.tasks.issue_custom_domain_task is routed to the certmgr queue. You need at least one Celery worker listening on that queue:
celery -A midsummer worker -Q certmgr --loglevel=info
In Kubernetes, add this as a sidecar container in the same pod (shares the /etc/letsencrypt volume), or as a separate Deployment with the same image + volume mount. In Rancher, you can add a sidecar via the workload UI.
Example sidecar in the Deployment:
- name: celery-certmgr
  image: your-registry/midsummer:custom-domains
  command: ["celery", "-A", "midsummer", "worker", "-Q", "certmgr", "--loglevel=info"]
  envFrom:
    - configMapRef:
        name: midsummer-env
    - secretRef:
        name: midsummer-secrets
  volumeMounts:
    - name: letsencrypt
      mountPath: /etc/letsencrypt
    - name: acme-webroot
      mountPath: /var/www/letsencrypt
    - name: nginx-confd
      mountPath: /etc/nginx/conf.d
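For reference, routing a task to a dedicated queue is normally expressed as a mapping from task name to queue, in the shape Celery's `task_routes` setting accepts. The sketch below shows that mapping as plain Python; where the project actually declares its routes is not shown in this guide, and `queue_for` is a hypothetical helper that only illustrates how the lookup falls back to the default queue.

```python
# Routing table in the shape accepted by Celery's `task_routes` setting.
TASK_ROUTES = {
    "tenant.tasks.issue_custom_domain_task": {"queue": "certmgr"},
}
DEFAULT_QUEUE = "celery"  # Celery's default queue name when no route matches


def queue_for(task_name: str) -> str:
    """Return the queue a task would be sent to under TASK_ROUTES."""
    return TASK_ROUTES.get(task_name, {}).get("queue", DEFAULT_QUEUE)


print(queue_for("tenant.tasks.issue_custom_domain_task"))
```

A worker started without `-Q certmgr` listens only on the default queue, so certificate tasks would sit unprocessed until a certmgr worker is running.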
Step 8 — Startup / Init Procedure¶
On first deployment (and after any pod restart), the generated custom-domains.conf will be empty (or lost if on an emptyDir). You need an init step that regenerates it from the database:
Add an init container to the pod:
initContainers:
  - name: regen-nginx
    image: your-registry/midsummer:custom-domains
    command: ["python", "manage.py", "customdomain_regen_nginx"]
    envFrom:
      - configMapRef:
          name: midsummer-env
      - secretRef:
          name: midsummer-secrets
    volumeMounts:
      - name: letsencrypt
        mountPath: /etc/letsencrypt
      - name: acme-webroot
        mountPath: /var/www/letsencrypt
      - name: nginx-confd
        mountPath: /etc/nginx/conf.d
This ensures /etc/nginx/conf.d/custom-domains.conf is populated from the DB before nginx starts.
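To make the regeneration step concrete, the sketch below renders one nginx server block per domain. It is a guess at the general shape of the output, not the real customdomain_regen_nginx command: the template, the function name and the 127.0.0.1 upstream address are assumptions, while port 8443, the certbot live/ paths and the Hypercorn port 8082 come from this guide.

```python
from typing import Iterable

# Doubled braces are literal braces for str.format; {domain} is substituted.
SERVER_TEMPLATE = """\
server {{
    listen 8443 ssl;
    server_name {domain};
    ssl_certificate     /etc/letsencrypt/live/{domain}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/{domain}/privkey.pem;
    location / {{
        proxy_pass http://127.0.0.1:8082;
        proxy_set_header Host $host;
    }}
}}
"""


def render_custom_domains(domains: Iterable[str]) -> str:
    """Render one server block per unique domain, in sorted order."""
    blocks = [SERVER_TEMPLATE.format(domain=d) for d in sorted(set(domains))]
    return "\n".join(blocks)


print(render_custom_domains(["register.furrycon.org"]))
```

Sorting and de-duplicating keeps the generated file stable between runs, so an unchanged domain list produces an identical config and a reload can be skipped.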
Step 9 — Cron Setup¶
The Dockerfile installs cron, and supervisord runs cron -f. The Dockerfile also copies the cron file cron/midsummer-custom-domains to /etc/cron.d/.
In-pod cron jobs run inside the app container, so they see the same mounts as nginx: the PVC at /etc/letsencrypt and the emptyDir at /etc/nginx/conf.d. Renewals and config regeneration therefore work without any extra volume sharing.
If you prefer Kubernetes-native CronJobs instead of in-pod cron:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: custom-domain-renew
  namespace: midsummer
spec:
  schedule: "17 6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: renew
              image: your-registry/midsummer:custom-domains
              command: ["python", "manage.py", "customdomain_renew"]
              envFrom:
                - configMapRef:
                    name: midsummer-env
                - secretRef:
                    name: midsummer-secrets
              volumeMounts:
                - name: letsencrypt
                  mountPath: /etc/letsencrypt
                - name: acme-webroot
                  mountPath: /var/www/letsencrypt
                - name: nginx-confd
                  mountPath: /etc/nginx/conf.d
          restartPolicy: OnFailure
          volumes:
            - name: letsencrypt
              persistentVolumeClaim:
                claimName: midsummer-letsencrypt
            - name: acme-webroot
              emptyDir: {}
            - name: nginx-confd
              emptyDir: {}  # Won't work across pods; see note below
Important: If you use K8s CronJobs instead of in-pod cron, the emptyDir for nginx-confd won’t be shared with the app pod. You need one of: (a) a shared PVC for nginx-confd, (b) a CronJob that calls customdomain_regen_nginx on the app pod via kubectl exec, or (c) in-pod cron (simplest for single-replica). For initial deployment, in-pod cron is recommended.
Step 10 — End-to-End Verification Checklist¶
Perform these steps in order to verify the full flow. Keep ACME_STAGING=true until all steps pass, then flip to false.
10.1 — Infrastructure¶
LB2 is provisioned with TLS passthrough on 443→8443 and TCP 80→8080
custom.midsummer.cloud A record resolves to LB2’s public IP: dig custom.midsummer.cloud
Pod starts successfully; nginx -t passes (check logs: kubectl logs <pod> -c proxy)
Health check at http://<pod-ip>:8080/ping/ returns pong
Port 8443 is reachable from LB2: curl -k https://custom.midsummer.cloud:443/ returns 421 (snakeoil default)
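The reachability items in this checklist can also be scripted. The sketch below is a plain TCP connect probe using the standard library; `port_open` is a hypothetical helper and checks only that something accepts connections on the port, not that nginx answers correctly.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False
```

For example, `port_open("custom.midsummer.cloud", 443)` and `port_open("custom.midsummer.cloud", 80)` should both return True once LB2 and the DNS record are in place.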
10.2 — Add a Custom Domain (Staging)¶
In the tenantui, go to Domains → Add Custom Domain
Enter a test subdomain you control (e.g., test.yourdomain.com)
Select mode “Event” and pick an event
Click Add Domain and confirm the DNS instructions dialog appears
In your DNS provider, create a CNAME: test.yourdomain.com → custom.midsummer.cloud
Wait 1-5 minutes for DNS propagation
Click Verify DNS; status should change to DNS Verified
10.3 — Issue SSL Certificate (Staging)¶
Click Issue SSL; this enqueues the Celery task
Watch the celery-certmgr logs: kubectl logs <pod> -c celery-certmgr and look for Certificate issued for test.yourdomain.com
Or run manually: kubectl exec <pod> -c api -- python manage.py customdomain_issue --domain test.yourdomain.com
Verify in tenantui: status changes to SSL Issued, cert expiry date populated
Check the cert: kubectl exec <pod> -- certbot certificates --staging
Check the nginx config was regenerated: kubectl exec <pod> -- cat /etc/nginx/conf.d/custom-domains.conf
Verify HTTPS works: curl -kvI https://test.yourdomain.com/ should show the Let’s Encrypt staging cert and a 301 or 200 response (-k is needed because staging certificates are not publicly trusted)
10.4 — Switch to Production Let’s Encrypt¶
Update the deployment environment: ACME_STAGING=false
Redeploy / restart the pod
Force-reissue the test domain: kubectl exec <pod> -c api -- python manage.py customdomain_issue --domain test.yourdomain.com
This will issue a production LE cert (the staging cert will be replaced)
Verify: curl -vI https://test.yourdomain.com/ should show a valid (non-staging) Let’s Encrypt certificate issued by a production intermediate (for example R10 or R11; older certificates show R3)
Verify the tenant’s site loads correctly on the custom domain
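The issuer check can be done programmatically as well. The sketch below reads the issuer from the dictionary returned by the standard library's `getpeercert()`; `issuer_field` and `fetch_cert` are hypothetical helper names. `fetch_cert` uses the default trust store, so it is expected to raise a verification error for a staging certificate and succeed only for a production one.

```python
import socket
import ssl
from typing import Optional


def issuer_field(cert: dict, field: str) -> Optional[str]:
    """Pull one field (e.g. 'commonName') out of a getpeercert() issuer."""
    # The issuer is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == field:
                return value
    return None


def fetch_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect with certificate verification on and return the peer cert."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

For example, `issuer_field(fetch_cert("test.yourdomain.com"), "organizationName")` should name Let’s Encrypt once the production certificate is live.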
10.5 — Renewal¶
Test renewal: kubectl exec <pod> -c api -- python manage.py customdomain_renew
For staging: add the --staging flag (or ensure ACME_STAGING=true)
Verify the cron logs: kubectl logs <pod> -c custom-domain-cron should show the daily renewal entry
Check that the customdomain_renew management command runs without error
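Whether a daily renewal run actually touches a certificate depends on how close it is to expiry. The sketch below shows that decision under an assumed 30-day window, the long-standing certbot default; the window customdomain_renew really uses is not stated in this guide, and `needs_renewal` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

RENEW_WINDOW = timedelta(days=30)  # assumed window; adjust to the real setting


def needs_renewal(not_after: datetime, now: datetime,
                  window: timedelta = RENEW_WINDOW) -> bool:
    """True if the certificate expires within `window` of `now` (or already has)."""
    return not_after - now <= window


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(needs_renewal(now + timedelta(days=10), now))
print(needs_renewal(now + timedelta(days=60), now))
```

A freshly issued certificate therefore makes the renewal command a no-op, which is the expected result when testing right after Step 10.3.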
10.6 — Cleanup¶
Remove the test domain from tenantui: Delete → confirm the Domain routing row is also removed
Verify the domain returns 421 on port 8443 (or no longer resolves)
Troubleshooting¶
nginx -t fails after customdomain_regen_nginx¶
Check the generated config: kubectl exec <pod> -- cat /etc/nginx/conf.d/custom-domains.conf
Check for missing cert files: ls -la /etc/letsencrypt/live/<domain>/ inside the pod
If cert dirs are missing, the Domain was marked ssl_issued but the cert wasn’t actually issued. Reset its status: python manage.py shell → CustomDomain.objects.filter(domain='<domain>').update(status='failed') and re-issue
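The missing-cert check can be scripted for several domains at once. The sketch below looks for the two files an nginx server block typically references in certbot's live/ layout; `missing_cert_files` is a hypothetical helper, and the assumption that the generated config uses fullchain.pem and privkey.pem should be confirmed against the actual custom-domains.conf.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = ("fullchain.pem", "privkey.pem")


def missing_cert_files(domain: str,
                       live_dir: Path = Path("/etc/letsencrypt/live")) -> List[str]:
    """Return the required cert files that are absent for `domain`."""
    cert_dir = live_dir / domain
    # exists() follows symlinks, so a dangling live/ symlink counts as missing.
    return [name for name in REQUIRED_FILES if not (cert_dir / name).exists()]
```

An empty list means the files are in place; any other result points at a domain whose status should be reset and re-issued as described above.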
Pod restart loses nginx config¶
The init container (regen-nginx) should regenerate the config from the DB on every pod start
If you skipped Step 8 (init container), run manually: kubectl exec <pod> -c api -- python manage.py customdomain_regen_nginx
Custom domain shows 404 or “No Event matched”¶
Verify the Domain row exists: Domain.objects.filter(domain='<domain>')
Verify the CustomDomain row has status='ssl_issued'
Verify EventSetupMiddleware finds the event: check that CustomDomain.event is set (not null for mode='event')
Check request.current_custom_domain in a Django shell to confirm the middleware is routing correctly
Apex domain (e.g., furrycon.org) doesn’t work with CNAME¶
DNS does not allow a CNAME at a zone apex. Use an ALIAS/ANAME record or CNAME flattening where the provider offers it (for example Cloudflare or DNSimple; Route 53 alias records only target AWS resources)
Alternative: advise tenants to use a subdomain (e.g., events.furrycon.org), which always supports CNAME
The UI already shows an “apex note” in the DNS instructions dialog
Rancher-Specific Notes¶
Workload → Deployments → midsummer → Environment Variables — add the custom domain vars here
Workload → Deployments → midsummer → Add Sidecar: add the celery-certmgr container with the same image + celery -A midsummer worker -Q certmgr command
Storage → PersistentVolumeClaims: create the midsummer-letsencrypt PVC (1Gi, do-block-storage)
Service Discovery → Services: ensure the midsummer service exposes port 8443
Load Balancing → Load Balancers — create LB2 with the forwarding rules from Step 1
The Rancher UI makes it straightforward to add environment variables, volumes, and sidecars to an existing Deployment without hand-editing YAML.
Step 11 — Multi-Pod Deployment (5+ Replicas)¶
With multiple pods, you have two problems:
Certificates and nginx config must be shared across all pods — otherwise only one pod has the certs
nginx reload must reach all pods — otherwise only one pod serves the right config
nginx Reload Across All Pods¶
After cert issuance, regenerate_nginx_config() calls reload_nginx_all_pods(), which:
Reloads nginx on the local pod
Uses kubectl exec to reload nginx on all sibling pods matching the deployment label
For this to work, set these environment variables:
| Variable | Value | Notes |
|---|---|---|
| DEPLOYMENT_NAME | midsummer | K8s Deployment name, used to find sibling pods |
|  |  | Namespace the pods run in |
| POD_NAME | (from Downward API) | The current pod’s name, used to skip self |
|  |  | Set to |
Rancher adds CATTLE_K8S_NAMESPACE automatically. For POD_NAME, use the Downward API:
spec:
  containers:
    - name: midsummer
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: DEPLOYMENT_NAME
          value: "midsummer"
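The cross-pod reload described above boils down to one kubectl exec per sibling pod, skipping the pod that already reloaded itself. The sketch below only builds those command lines; it is not the project's reload_nginx_all_pods(), the function name is hypothetical, and the container name proxy is an assumption taken from the log command in Step 10.1.

```python
from typing import Iterable, List


def sibling_reload_commands(pod_names: Iterable[str], self_name: str,
                            namespace: str,
                            container: str = "proxy") -> List[List[str]]:
    """Build one `kubectl exec ... nginx -s reload` argv per sibling pod."""
    return [
        ["kubectl", "exec", "-n", namespace, pod, "-c", container,
         "--", "nginx", "-s", "reload"]
        for pod in pod_names
        if pod != self_name  # the local pod is reloaded directly
    ]


for argv in sibling_reload_commands(
    ["midsummer-abc", "midsummer-def"], "midsummer-abc", "midsummer"
):
    print(" ".join(argv))
```

Each argv list can be handed to `subprocess.run` as-is; running them requires the RBAC permissions described in the next section.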
RBAC for kubectl exec (if using pod-wide reload)¶
If you use the kubectl exec approach, the pod’s service account needs RBAC permission to exec into pods:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-exec-role
  namespace: midsummer  # must be the namespace the pods run in
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-exec-binding
  namespace: midsummer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-exec-role
subjects:
  - kind: ServiceAccount
    name: default  # the service account the pod runs as
    namespace: midsummer
Simpler Alternative: Rolling Restart¶
If you don’t want to deal with RBAC and kubectl exec, set RELOAD_ALL_PODS=false and instead trigger a rolling restart after cert changes:
kubectl rollout restart deployment/midsummer -n midsummer
Each pod starts fresh, the init container runs customdomain_regen_nginx, and nginx loads the latest config. This is the simplest approach and works with ReadWriteOnce PVCs (the cert volume is mounted only by the single pod that manages certificates).
The tradeoff: ~30 seconds of downtime per pod during restart, but pods restart one at a time in a rolling update so there’s no full outage.