Custom domains

Note

Canonical source: guides/custom-domains-setup.md. Transcluded here for the docs site. Edit the source file if anything is wrong.

(DigitalOcean Kubernetes / Rancher)

This guide walks through deploying the custom domain feature on a DO Kubernetes cluster managed via Rancher. It covers infrastructure provisioning, Kubernetes manifests, environment configuration, and the end-to-end verification checklist.

Read the design doc first: .plans/custom-domain-support.md

Architecture Recap

                   ┌───────────────────────────────────────────────┐
 *.midsummer.cloud │ LB1 (EXISTING)  TLS-term wildcard  HTTP:8080  │──▶ nginx :8080 ──▶ Hypercorn :8082
                   └───────────────────────────────────────────────┘

                   ┌───────────────────────────────────────────────┐
 custom domains    │ LB2 (NEW)  TLS passthrough  TCP 80+443        │──▶ nginx :8080 (ACME + 301→https)
                   └───────────────────────────────────────────────┘──▶ nginx :8443 (SSL, SNI per-domain cert) ──▶ Hypercorn :8082

LB1 (existing): Wildcard *.midsummer.cloud cert, terminates TLS, forwards HTTP to pod :8080. Untouched.
LB2 (new): TLS passthrough, with no certificate on the LB. Traffic reaches nginx on :8443, which picks the per-domain Let’s Encrypt cert from the SNI hostname the client sends in the TLS handshake. Port 80 is forwarded to pod :8080 for ACME HTTP-01 challenges and 301→https redirects.
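To make the SNI step concrete, here is a simplified model (not nginx itself) of how nginx on :8443 chooses a certificate. It assumes one server block per custom domain named exactly after the domain, certificates at certbot’s usual /etc/letsencrypt/live/<domain>/ paths, and a default server using the Debian ssl-cert snakeoil pair that answers 421. `select_server` and its return shape are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Assumed layout: certbot's standard live/ directory and the snakeoil pair
# installed by Debian's ssl-cert package (both are assumptions for this sketch).
LE_LIVE = PurePosixPath("/etc/letsencrypt/live")
SNAKEOIL_CERT = "/etc/ssl/certs/ssl-cert-snakeoil.pem"
SNAKEOIL_KEY = "/etc/ssl/private/ssl-cert-snakeoil.key"


def select_server(sni: str | None, issued_domains: set[str]) -> tuple[str, str, int | None]:
    """Hypothetical model of nginx's choice on :8443.

    Returns (certificate, private key, forced status). A forced status of
    None means the request is proxied on to Hypercorn :8082.
    """
    host = (sni or "").lower().rstrip(".")
    if host in issued_domains:
        live = LE_LIVE / host
        return str(live / "fullchain.pem"), str(live / "privkey.pem"), None
    # Unknown name or no SNI at all: the default server answers with the
    # snakeoil certificate and a 421 Misdirected Request.
    return SNAKEOIL_CERT, SNAKEOIL_KEY, 421
```

For example, `select_server("register.furrycon.org", {"register.furrycon.org"})` yields the live/ paths and `None`, while any name without an issued certificate falls through to the snakeoil pair and 421, which is what the checklist in 10.1 expects.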

Step 1 — Provision Load Balancer 2 (LB2)

In the DigitalOcean console (or via doctl):

# doctl takes multiple forwarding rules as one quoted, space-separated value
doctl compute load-balancer create \
  --name midsummer-custom-lb \
  --region <YOUR_REGION> \
  --forwarding-rules "entry_protocol:tcp,entry_port:443,target_protocol:tcp,target_port:8443 entry_protocol:tcp,entry_port:80,target_protocol:tcp,target_port:8080" \
  --health-check protocol:http,port:8080,path:/ping/,check_interval_seconds:10,response_timeout_seconds:5,healthy_threshold:2,unhealthy_threshold:3 \
  --tag-name k8s:<YOUR_CLUSTER_ID>

Or via the DO cloud console:

Networking → Load Balancers → Create
Name: midsummer-custom-lb
Region: same as your cluster
Forwarding Rules:
- Rule 1: TCP:443 → TCP:8443 (TLS passthrough for custom domains)
- Rule 2: TCP:80 → TCP:8080 (ACME HTTP-01 + redirect)
Health Check: HTTP:8080/ping/ every 10s, 5s timeout, healthy threshold 2, unhealthy threshold 3
Sticky Sessions: Disabled
TLS: None — this LB must NOT terminate TLS (passthrough only)

Important: Do NOT add a certificate to this LB. TLS termination happens inside nginx on port 8443 using per-domain Let’s Encrypt certs.

Note the LB’s public IP (e.g. 203.0.113.50). You’ll need it for DNS.

Step 2 — DNS Records

In your DNS provider (managing midsummer.cloud):

| Type | Name | Value |
|---|---|---|
| A | `custom.midsummer.cloud` | `<LB2_PUBLIC_IP>` |

This custom.midsummer.cloud hostname is what tenants will CNAME their custom domains to. It’s configured via CUSTOM_DOMAIN_CNAME_TARGET in the app’s environment.

Tenants will create their own DNS records:

CNAME  register.furrycon.org  →  custom.midsummer.cloud
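How the app’s Verify DNS check inspects a tenant’s record isn’t described in this guide. As a minimal sketch under that caveat, you can resolve both the tenant hostname and `custom.midsummer.cloud` and require that every address of the former belongs to the latter. Comparing addresses rather than looking for a literal CNAME also accepts ALIAS/flattened records at an apex (see Troubleshooting). `resolve_ipv4` and `points_at_target` are hypothetical names; the sketch uses only the standard library and checks IPv4 only.

```python
from __future__ import annotations

import socket
from typing import Callable


def resolve_ipv4(name: str) -> set[str]:
    """Return the IPv4 addresses a name resolves to (CNAMEs are followed)."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return set()  # NXDOMAIN, no A record, or resolver failure
    return {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos}


def points_at_target(
    domain: str,
    target: str = "custom.midsummer.cloud",
    resolve: Callable[[str], set[str]] = resolve_ipv4,
) -> bool:
    """Hypothetical DNS check: does `domain` land only on the target's IPs?"""
    domain_ips = resolve(domain)
    return bool(domain_ips) and domain_ips <= resolve(target)
```

Once the tenant’s CNAME has propagated, `points_at_target("register.furrycon.org")` should return `True`; the injectable `resolve` argument makes the logic testable without DNS.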

Step 3 — Persistent Volume for Let’s Encrypt Certificates

Certs in /etc/letsencrypt/ must survive pod restarts. Create a PVC and mount it.

`kustomize/resources/custom-domains-pvc.yaml` (or equivalent Helm values)

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: midsummer-letsencrypt
  namespace: midsummer
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: do-block-storage

Update the Deployment to mount it

Add to your Deployment/Pod spec (whether you use a Helm chart, Kustomize, or Rancher UI):

volumeMounts:
  - name: letsencrypt
    mountPath: /etc/letsencrypt
  - name: acme-webroot
    mountPath: /var/www/letsencrypt
  - name: nginx-confd
    mountPath: /etc/nginx/conf.d

volumes:
  - name: letsencrypt
    persistentVolumeClaim:
      claimName: midsummer-letsencrypt
  - name: acme-webroot
    emptyDir: {}           # HTTP-01 challenge files, shared by the containers in this pod
  - name: nginx-confd
    emptyDir: {}           # generated at runtime by the management command

Note on nginx-confd: The custom-domains.conf file is generated at runtime by manage.py customdomain_regen_nginx and written to /etc/nginx/conf.d/. An emptyDir is wiped whenever the pod is recreated, so the file must be regenerated during pod startup; Step 8 adds an init container that does this. Mounting a volume at /etc/nginx/conf.d also hides any files the image itself ships in that directory, so make sure nothing else nginx needs lives there.
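The exact output of customdomain_regen_nginx isn’t shown in this guide. As a rough sketch of what regenerating the file involves, assuming one `listen 8443 ssl` server block per issued domain that proxies to Hypercorn on 127.0.0.1:8082, the code renders a config and replaces the file atomically (temp file in the same directory, then `os.replace`), so a reload never reads a half-written file. The template, `render_config` and `write_atomically` are hypothetical.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path, PurePosixPath

LIVE = PurePosixPath("/etc/letsencrypt/live")

# Hypothetical template; the real generated file may look different.
SERVER_TEMPLATE = """\
server {{
    listen 8443 ssl;
    server_name {domain};
    ssl_certificate     {live}/fullchain.pem;
    ssl_certificate_key {live}/privkey.pem;
    location / {{
        proxy_pass http://127.0.0.1:8082;   # Hypercorn in the same pod (assumed)
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }}
}}
"""


def render_config(domains: list[str], live_root: PurePosixPath = LIVE) -> str:
    """Render one server block per domain, sorted for stable diffs."""
    return "\n".join(
        SERVER_TEMPLATE.format(domain=d, live=live_root / d) for d in sorted(domains)
    )


def write_atomically(path: Path, text: str) -> None:
    """Replace `path` so readers see either the old or the new file, never a partial one."""
    # Same directory means same filesystem, so the rename is atomic; the .tmp
    # suffix keeps the file out of an `include conf.d/*.conf` glob meanwhile.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".custom-domains.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(text)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Writing to a temp file and renaming is a general pattern; whether the project’s command already does this is not stated here.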

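Note on multi-pod replicas: A ReadWriteOnce volume can be attached to only one node at a time, so replicas scheduled on other nodes can’t mount it. For multi-replica setups, use ReadWriteMany with a DO Filespace (NFS) or switch to a cert-manager CRD approach (see Step 11). For initial deployment, single-replica is recommended; with an RWO volume, consider the Recreate deployment strategy so a rollout doesn’t leave the new pod waiting for the old one to release the volume.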

Step 4 — Update the Kubernetes Service

Your existing Service likely only exposes 8080 and 8082. Add port 8443:

apiVersion: v1
kind: Service
metadata:
  name: midsummer
  namespace: midsummer
spec:
  type: LoadBalancer   # or ClusterIP + Ingress if you're using an Ingress controller
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: https-custom
      port: 443
      targetPort: 8443
    - name: app
      port: 8082
      targetPort: 8082
  selector:
    app: midsummer

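If you use an Ingress controller (nginx-ingress, Traefik, etc.) instead of a raw LB, the TLS passthrough setup differs: you’d expose TCP 443 → 8443 through the controller’s own Service of type LoadBalancer. The DO LB approach above is simpler, but note how it interacts with this Service on DOKS: a Service of type LoadBalancer makes the DigitalOcean cloud controller provision a load balancer of its own, while a load balancer created by hand in Step 1 forwards to ports on the cluster nodes, not to pods. Pick one: either let this Service create LB2 (skip the manual creation in Step 1 and take the public IP from kubectl get svc midsummer -n midsummer), or keep the Step 1 LB and expose the pod ports on the nodes yourself (for example type: NodePort, with LB2’s target ports set to the assigned node ports). If this Service becomes LB2, leave the app port 8082 out of it so Hypercorn isn’t exposed publicly.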

Step 5 — Environment Variables

Add these to your Deployment’s environment (Rancher UI → Workloads → your deployment → Environment Variables, or in your Helm values / Kustomize overlay):

| Variable | Value | Notes |
|---|---|---|
| `CUSTOM_DOMAIN_CNAME_TARGET` | `custom.midsummer.cloud` | The hostname tenants CNAME to. Must resolve to LB2’s IP. |
| `ACME_EMAIL` | `ops@midsummer.cloud` | Let’s Encrypt registration email. |
| `ACME_STAGING` | `true` | Start with `true`! Flip to `false` after E2E verification. |
| `ACME_WEBROOT` | `/var/www/letsencrypt` | Must match the ACME location in `nginx.conf`. |
| `CUSTOM_DOMAIN_QUEUE` | `certmgr` | Celery queue name for cert tasks. Create a worker for it (see Step 7). |
| `RELOAD_ALL_PODS` | `true` | Set to `false` to only reload local nginx (no kubectl needed). |
| `DEPLOYMENT_NAME` | `midsummer` | K8s Deployment name, used by kubectl exec to find sibling pods. |
| `POD_NAMESPACE` | (from Downward API) | K8s namespace the pods run in. |
| `POD_NAME` | (from Downward API) | Current pod name. |

Other existing vars (MIDSUMMER_PROD, MIDSUMMER_DB_URL, CELERY_BROKER_URL, etc.) remain unchanged.
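How the app reads these variables isn’t shown here. One pitfall worth guarding against in whatever does read them: environment values are strings, so `bool("false")` is `True`. A minimal sketch of strict parsing follows, with defaults taken from the table above as an assumption; `env_bool` and `custom_domain_settings` are hypothetical helpers, not the project’s settings module.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name: str, default: bool, env: Mapping[str, str] = os.environ) -> bool:
    """Parse a boolean env var strictly; unset or empty means `default`."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognized boolean")


def custom_domain_settings(env: Mapping[str, str] = os.environ) -> dict[str, object]:
    """Hypothetical settings loader; defaults mirror the table above."""
    return {
        "CUSTOM_DOMAIN_CNAME_TARGET": env.get("CUSTOM_DOMAIN_CNAME_TARGET", "custom.midsummer.cloud"),
        "ACME_EMAIL": env.get("ACME_EMAIL"),                   # no safe default
        "ACME_STAGING": env_bool("ACME_STAGING", True, env),   # fail safe: staging
        "ACME_WEBROOT": env.get("ACME_WEBROOT", "/var/www/letsencrypt"),
        "CUSTOM_DOMAIN_QUEUE": env.get("CUSTOM_DOMAIN_QUEUE", "certmgr"),
        "RELOAD_ALL_PODS": env_bool("RELOAD_ALL_PODS", True, env),
        "DEPLOYMENT_NAME": env.get("DEPLOYMENT_NAME", "midsummer"),
        "POD_NAMESPACE": env.get("POD_NAMESPACE"),             # set via Downward API
        "POD_NAME": env.get("POD_NAME"),                       # set via Downward API
    }
```

Defaulting `ACME_STAGING` to staging means a missing variable can’t accidentally spend production rate limits.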

Step 6 — Build & Deploy the Updated Image

The Dockerfile already includes the certbot/cron/ssl-cert installs and EXPOSEs port 8443. Build and push:

# From the project root
docker build -t your-registry/midsummer:custom-domains .
docker push your-registry/midsummer:custom-domains

Update your Deployment’s image tag to custom-domains (or whatever your tag strategy is).

Step 7 — Celery Worker for Certificate Management

The tenant.tasks.issue_custom_domain_task is routed to the certmgr queue. You need at least one Celery worker listening on that queue:

celery -A midsummer worker -Q certmgr --loglevel=info

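In Kubernetes, add this as a sidecar container in the same pod (so it shares the pod’s volumes, including /etc/letsencrypt), or run it as a separate Deployment with the same image and volume mounts; a separate Deployment only works if those volumes are ReadWriteMany (see Step 11). In Rancher, you can add a sidecar via the workload UI.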

Example sidecar in the Deployment:

- name: celery-certmgr
  image: your-registry/midsummer:custom-domains
  command: ["celery", "-A", "midsummer", "worker", "-Q", "certmgr", "--loglevel=info"]
  envFrom:
    - configMapRef:
        name: midsummer-env
    - secretRef:
        name: midsummer-secrets
  volumeMounts:
    - name: letsencrypt
      mountPath: /etc/letsencrypt
    - name: acme-webroot
      mountPath: /var/www/letsencrypt
    - name: nginx-confd
      mountPath: /etc/nginx/conf.d
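Step 7 states that tenant.tasks.issue_custom_domain_task is routed to the certmgr queue, but the routing code isn’t shown. One common way to express that, assuming the Celery app loads Django settings with `app.config_from_object("django.conf:settings", namespace="CELERY")`, is a `CELERY_TASK_ROUTES` entry that reads the same CUSTOM_DOMAIN_QUEUE variable the worker’s `-Q` uses. Treat it as a sketch; `worker_command` is a hypothetical helper that only mirrors the sidecar command above.

```python
from __future__ import annotations

import os

# Assumed settings.py fragment. With namespace="CELERY", Celery reads this
# as its `task_routes` setting.
CUSTOM_DOMAIN_QUEUE = os.environ.get("CUSTOM_DOMAIN_QUEUE", "certmgr")

CELERY_TASK_ROUTES = {
    "tenant.tasks.issue_custom_domain_task": {"queue": CUSTOM_DOMAIN_QUEUE},
}


def worker_command(queue: str = CUSTOM_DOMAIN_QUEUE) -> list[str]:
    """Hypothetical helper: the sidecar command, built from the same queue name."""
    return ["celery", "-A", "midsummer", "worker", "-Q", queue, "--loglevel=info"]
```

If the route and the worker’s `-Q` disagree, issuance tasks sit in the broker with no consumer, so driving both from one variable avoids a silent mismatch.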

Step 8 — Startup / Init Procedure

On first deployment, and whenever the pod is recreated, /etc/nginx/conf.d/custom-domains.conf doesn’t exist yet, because the nginx-confd emptyDir starts out empty. You need an init step that regenerates it from the database:

Add an init container to the pod:

initContainers:
  - name: regen-nginx
    image: your-registry/midsummer:custom-domains
    command: ["python", "manage.py", "customdomain_regen_nginx"]
    envFrom:
      - configMapRef:
          name: midsummer-env
      - secretRef:
          name: midsummer-secrets
    volumeMounts:
      - name: letsencrypt
        mountPath: /etc/letsencrypt
      - name: acme-webroot
        mountPath: /var/www/letsencrypt
      - name: nginx-confd
        mountPath: /etc/nginx/conf.d

This ensures /etc/nginx/conf.d/custom-domains.conf is populated from the DB before nginx starts.

Step 9 — Cron Setup

The Dockerfile installs cron and supervisord runs cron -f. The cron file cron/midsummer-custom-domains is copied to /etc/cron.d/ by the Dockerfile.

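In-pod cron runs inside the app container, so its jobs see the volumes mounted there: the PVC at /etc/letsencrypt and the emptyDir at /etc/nginx/conf.d. One caveat: cron doesn’t pass the container’s environment variables to the jobs it starts, so unless the cron file already takes care of it, make sure the jobs can see the settings they need (MIDSUMMER_DB_URL and the variables from Step 5).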

If you prefer Kubernetes-native CronJobs instead of in-pod cron:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: custom-domain-renew
  namespace: midsummer
spec:
  schedule: "17 6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: renew
              image: your-registry/midsummer:custom-domains
              command: ["python", "manage.py", "customdomain_renew"]
              envFrom:
                - configMapRef:
                    name: midsummer-env
                - secretRef:
                    name: midsummer-secrets
              volumeMounts:
                - name: letsencrypt
                  mountPath: /etc/letsencrypt
                - name: acme-webroot
                  mountPath: /var/www/letsencrypt
                - name: nginx-confd
                  mountPath: /etc/nginx/conf.d
          restartPolicy: OnFailure
          volumes:
            - name: letsencrypt
              persistentVolumeClaim:
                claimName: midsummer-letsencrypt
            - name: acme-webroot
              emptyDir: {}   # Won't work either: the app pod's nginx must serve these challenge files
            - name: nginx-confd
              emptyDir: {}   # Won't work across pods, see note below

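Important: A K8s CronJob runs in its own pod, so emptyDir volumes are not shared with the app pod. That breaks two things: a regenerated nginx-confd never reaches the app pod’s nginx, and HTTP-01 challenge files written to acme-webroot are never served by the app pod’s nginx, so renewals fail validation. The midsummer-letsencrypt PVC is also ReadWriteOnce, so the CronJob pod can only mount it if it lands on the same node as the app pod. You need either: (a) ReadWriteMany PVCs for all three volumes (see Step 11), (b) a CronJob that runs customdomain_renew and customdomain_regen_nginx inside the app pod via kubectl exec, or (c) in-pod cron (simplest for single-replica). For initial deployment, in-pod cron is recommended.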

Step 10 — End-to-End Verification Checklist

Perform these steps in order to verify the full flow. Keep ACME_STAGING=true until all steps pass, then flip to false.

10.1 — Infrastructure

LB2 is provisioned with TLS passthrough on 443→8443 and TCP 80→8080
custom.midsummer.cloud A record resolves to LB2’s public IP: dig custom.midsummer.cloud
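Pod starts successfully and nginx -t passes: kubectl exec <pod> -c proxy -- nginx -t, or check kubectl logs <pod> -c proxy (container names such as proxy and api in this checklist are examples; use the names from your pod spec)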
Health check at http://<pod-ip>:8080/ping/ returns pong
Port 8443 is reachable through LB2: curl -k https://custom.midsummer.cloud/ returns 421 from the default (snakeoil) server

10.2 — Add a Custom Domain (Staging)

In the tenantui, go to Domains → Add Custom Domain
Enter a test subdomain you control (e.g., test.yourdomain.com)
Select mode “Event” and pick an event
Click Add Domain — confirm the DNS instructions dialog appears
In your DNS provider, create a CNAME: test.yourdomain.com → custom.midsummer.cloud
Wait 1-5 minutes for DNS propagation
Click Verify DNS — status should change to DNS Verified

10.3 — Issue SSL Certificate (Staging)

Click Issue SSL — this enqueues the Celery task
Watch the celery-certmgr logs: kubectl logs <pod> -c celery-certmgr — look for Certificate issued for test.yourdomain.com
Or run manually: kubectl exec <pod> -c api -- python manage.py customdomain_issue --domain test.yourdomain.com
Verify in tenantui: status changes to SSL Issued, cert expiry date populated
Check the cert: kubectl exec <pod> -- certbot certificates --staging
Check the nginx config was regenerated: kubectl exec <pod> -- cat /etc/nginx/conf.d/custom-domains.conf
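Verify HTTPS works: curl -kvI https://test.yourdomain.com/ should show the Let’s Encrypt staging certificate and the app’s response (-k is needed because staging certificates don’t chain to a trusted root); curl -vI http://test.yourdomain.com/ should return a 301 to https://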

10.4 — Switch to Production Let’s Encrypt

Update the deployment environment: ACME_STAGING=false
Redeploy / restart the pod
Force-reissue the test domain: kubectl exec <pod> -c api -- python manage.py customdomain_issue --domain test.yourdomain.com
- This will issue a production LE cert (the staging cert will be replaced)
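Verify: curl -vI https://test.yourdomain.com/ should now succeed without -k and show a Let’s Encrypt production certificate. The intermediate named as issuer (for example R11 or E6) depends on the key type and changes as Let’s Encrypt rotates intermediates, so don’t expect a fixed name such as R3, which is no longer used for issuance.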
Verify the tenant’s site loads correctly on the custom domain
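To check the served certificate from a script rather than reading curl’s verbose output, the sketch below opens a verified TLS connection with the tenant domain as SNI and summarizes the issuer using only Python’s standard `ssl` module. `fetch_peer_cert`, `issuer_fields` and `describe` are hypothetical helper names. Because `ssl.create_default_context()` verifies the chain, a staging certificate makes the handshake raise `ssl.SSLCertVerificationError`; during 10.3 that is expected, and after 10.4 it means the production certificate isn’t being served.

```python
from __future__ import annotations

import socket
import ssl


# Hypothetical helpers for checking which certificate a custom domain serves.
def fetch_peer_cert(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Connect with SNI=host, verify chain and hostname, return the parsed cert."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def issuer_fields(cert: dict) -> dict[str, str]:
    """Flatten getpeercert()'s issuer (a tuple of RDN tuples) into a dict."""
    return {key: value for rdn in cert.get("issuer", ()) for key, value in rdn}


def describe(cert: dict) -> str:
    """One-line summary: issuer organization / issuer CN, plus expiry."""
    issuer = issuer_fields(cert)
    return (f"{issuer.get('organizationName', '?')} / {issuer.get('commonName', '?')}, "
            f"expires {cert.get('notAfter', '?')}")
```

Usage: `print(describe(fetch_peer_cert("test.yourdomain.com")))` from any machine that can reach LB2.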

10.5 — Renewal

Test renewal: kubectl exec <pod> -c api -- python manage.py customdomain_renew
- For staging: add --staging flag (or ensure ACME_STAGING=true)
Verify the cron logs: kubectl logs <pod> -c custom-domain-cron (or whichever container runs supervisord and cron in your pod spec) should show the daily renewal entry
Check that the customdomain_renew management command runs without error
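To cross-check a renewal run against the expiry dates tenantui shows, here is a small sketch that lists the domains inside a renewal window. The 30-day window is an assumption matching certbot’s traditional renew-before-expiry default (check your certbot version); `due_for_renewal` is a hypothetical helper, and where the expiry dates come from (the database or `certbot certificates`) is up to you.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Assumption: 30 days, certbot's traditional renew-before-expiry window.
RENEW_BEFORE = timedelta(days=30)


def due_for_renewal(
    expiries: dict[str, datetime],
    now: datetime | None = None,
    window: timedelta = RENEW_BEFORE,
) -> list[str]:
    """Return the sorted domains whose certificate expires within `window`.

    Already-expired certificates are included. Datetimes must be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(domain for domain, expires in expiries.items() if expires - now <= window)
```

Any domain this returns that still shows an old expiry after customdomain_renew has run is worth investigating.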

10.6 — Cleanup

Remove the test domain from tenantui: Delete → confirm the Domain routing row is also removed
Verify the domain returns 421 on port 8443 (or no longer resolves)

Troubleshooting

certbot fails with “Failed authorization procedure”

Confirm DNS CNAME is in place: dig test.yourdomain.com CNAME
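Confirm port 80 is reachable from the internet to the pod: curl http://test.yourdomain.com/.well-known/acme-challenge/test (a 404 from nginx is fine; a timeout or connection refused means port 80 isn’t reaching the pod)
If using ACME_STAGING=true, the certificate is issued by a staging intermediate whose name starts with “(STAGING)” and isn’t publicly trusted; this is expected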
Check certbot logs: kubectl exec <pod> -- cat /var/log/letsencrypt/letsencrypt.log
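To separate “port 80 doesn’t reach the webroot” from other certbot failures, you can place a file exactly where certbot’s webroot plugin writes challenges (`<webroot>/.well-known/acme-challenge/<token>`) and fetch it over plain HTTP from outside the cluster. The sketch assumes the webroot is ACME_WEBROOT (/var/www/letsencrypt) and that it runs inside the pod; `write_probe` is a hypothetical helper.

```python
from __future__ import annotations

import secrets
from pathlib import Path

CHALLENGE_DIR = ".well-known/acme-challenge"


def write_probe(webroot: Path, domain: str) -> tuple[Path, str]:
    """Hypothetical helper: write a random token file under the ACME webroot.

    Returns the file path and the URL an HTTP-01 validator would request.
    Fetching the URL should return the token itself; delete the file afterwards.
    """
    token = secrets.token_urlsafe(16)
    path = webroot / CHALLENGE_DIR / token
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(token)
    return path, f"http://{domain}/{CHALLENGE_DIR}/{token}"
```

For example, `path, url = write_probe(Path("/var/www/letsencrypt"), "test.yourdomain.com")`, then curl the URL from your workstation and `path.unlink()` when done. With several replicas, curl it a few times: intermittent 404s suggest the webroot isn’t shared between pods.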

nginx -t fails after customdomain_regen_nginx

Check the generated config: kubectl exec <pod> -- cat /etc/nginx/conf.d/custom-domains.conf
Check for missing cert files: ls -la /etc/letsencrypt/live/<domain>/ inside the pod
If cert dirs are missing, the Domain was marked ssl_issued but the cert wasn’t actually issued — reset its status: python manage.py shell → CustomDomain.objects.filter(domain='<domain>').update(status='failed') and re-issue
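Before running nginx -t, you can list which domains marked ssl_issued are missing certificate files. This sketch assumes certbot’s standard live/<domain>/fullchain.pem and privkey.pem layout; `missing_cert_files` is hypothetical, and the domain list could come from something like `CustomDomain.objects.filter(status="ssl_issued").values_list("domain", flat=True)` in manage.py shell. Files under live/ are symlinks into archive/; `Path.is_file()` follows them, so a dangling link also counts as missing.

```python
from __future__ import annotations

from pathlib import Path

REQUIRED_FILES = ("fullchain.pem", "privkey.pem")


def missing_cert_files(domains, live_root: Path = Path("/etc/letsencrypt/live")) -> dict[str, list[str]]:
    """Hypothetical check: map each domain to the cert files nginx would fail to open.

    Domains with everything in place are left out of the result.
    """
    report: dict[str, list[str]] = {}
    for domain in domains:
        missing = [name for name in REQUIRED_FILES if not (live_root / domain / name).is_file()]
        if missing:
            report[domain] = missing
    return report
```

Any domain in the result is a candidate for the status reset described above.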

Pod restart loses nginx config

The init container (regen-nginx) should regenerate from DB on every pod start
If you skipped Step 8 (init container), run manually: kubectl exec <pod> -c api -- python manage.py customdomain_regen_nginx

Custom domain shows 404 or “No Event matched”

Verify the Domain row exists: Domain.objects.filter(domain='<domain>')
Verify the CustomDomain row has status='ssl_issued'
Verify EventSetupMiddleware finds the event: check that CustomDomain.event is set (not null for mode='event')
Check request.current_custom_domain (for example by logging it from the middleware or a view) to confirm the middleware is routing correctly
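One common cause of a mismatch is a Host value that doesn’t compare equal to the stored Domain.domain because of case, an explicit port, or a trailing dot. Whether EventSetupMiddleware already normalizes hosts isn’t stated here; as a hypothetical debugging aid, you can compare the stored value with a normalized form of the incoming Host. `normalize_host` and `host_matches` are illustrative names only.

```python
from __future__ import annotations


def normalize_host(raw_host: str) -> str:
    """Hypothetical helper: lower-case a Host value, drop :port and a trailing dot.

    Bracketed IPv6 literals are returned lower-cased but otherwise untouched,
    since they are never custom domains.
    """
    host = raw_host.strip().lower()
    if host.startswith("["):
        return host
    if ":" in host:
        host = host.rsplit(":", 1)[0]
    return host.rstrip(".")


def host_matches(raw_host: str, stored_domain: str) -> bool:
    """Hypothetical check: does an incoming Host refer to the stored domain?"""
    return normalize_host(raw_host) == normalize_host(stored_domain)
```

If `host_matches` is true but the lookup still fails, the problem is more likely the CustomDomain status or event assignment listed above than the hostname.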

Apex domain (e.g., `furrycon.org`) doesn’t work with CNAME

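DNS doesn’t allow a CNAME at the zone apex, because it can’t coexist with the apex’s SOA and NS records. Some providers work around this with ALIAS/ANAME records or CNAME flattening (Cloudflare and DNSimple, for example). Route 53 alias records can only target AWS resources, so they won’t work here.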
Alternative: advise tenants to use a subdomain (e.g., events.furrycon.org) which always supports CNAME
The UI already shows an “apex note” in the DNS instructions dialog

Rancher-Specific Notes

Workload → Deployments → midsummer → Environment Variables — add the custom domain vars here
Workload → Deployments → midsummer → Add Sidecar — add the celery-certmgr container with the same image + celery -A midsummer worker -Q certmgr command
Storage → PersistentVolumeClaims — create the midsummer-letsencrypt PVC (1Gi, do-block-storage)
Service Discovery → Services — ensure the midsummer service exposes port 8443
Load Balancing → Load Balancers — create LB2 with the forwarding rules from Step 1

The Rancher UI makes it straightforward to add environment variables, volumes, and sidecars to an existing Deployment without hand-editing YAML.

Step 11 — Multi-Pod Deployment (5+ Replicas)

With multiple pods, you have two problems:

Certificates and nginx config must be shared across all pods — otherwise only one pod has the certs
nginx reload must reach all pods — otherwise only one pod serves the right config

Shared Storage (Required)

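Use a ReadWriteMany (RWX) volume for /etc/letsencrypt, /etc/nginx/conf.d and the ACME webroot (/var/www/letsencrypt). The webroot has to be shared too: Let’s Encrypt’s HTTP-01 validation request arrives through LB2 and can be answered by any pod, not just the one that ran certbot. DigitalOcean offers Filespaces (NFS) or you can use a NAS: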

volumes:
  - name: letsencrypt
    persistentVolumeClaim:
      claimName: midsummer-letsencrypt   # Must be ReadWriteMany
  - name: nginx-confd
    persistentVolumeClaim:
      claimName: midsummer-nginx-confd    # Must be ReadWriteMany
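  - name: acme-webroot
    persistentVolumeClaim:
      claimName: midsummer-acme-webroot  # Must be ReadWriteMany: LE may validate against any pod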

Note: DO Block Storage (do-block-storage) only supports ReadWriteOnce. For RWX, use a DO Filespace (NFS) or a Longhorn distributed volume.

nginx Reload Across All Pods

After cert issuance, regenerate_nginx_config() calls reload_nginx_all_pods(), which:

Reloads nginx on the local pod
Uses kubectl exec to reload nginx on all sibling pods matching the deployment label
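The actual reload_nginx_all_pods() isn’t reproduced in this guide. The sketch below performs the same two steps with plain kubectl calls via `subprocess`, which makes it easier to see what the RBAC Role later in this step has to allow (get/list on pods, exec on pods/exec). It assumes the pods are labelled app=<DEPLOYMENT_NAME> (matching the Service selector from Step 4), that nginx runs in a container named proxy, and that the local `nginx -s reload` runs in the nginx container; all of these, and every function name, are hypothetical.

```python
from __future__ import annotations

import os
import subprocess


def list_sibling_pods(namespace: str, selector: str) -> list[str]:
    """Names of Running pods matching the label selector (needs get/list on pods)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", selector,
         "--field-selector=status.phase=Running",
         "-o", "jsonpath={.items[*].metadata.name}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def reload_commands(pods: list[str], self_name: str, namespace: str,
                    container: str = "proxy") -> list[list[str]]:
    """One `kubectl exec ... nginx -s reload` per sibling, skipping this pod."""
    return [
        ["kubectl", "exec", "-n", namespace, pod, "-c", container, "--", "nginx", "-s", "reload"]
        for pod in pods
        if pod != self_name
    ]


def reload_everywhere(container: str = "proxy") -> list[str]:
    """Hypothetical flow: reload local nginx, then siblings; return pods that failed."""
    namespace = os.environ["POD_NAMESPACE"]
    self_name = os.environ["POD_NAME"]
    # Assumption: pods carry the label app=<DEPLOYMENT_NAME>.
    selector = f"app={os.environ.get('DEPLOYMENT_NAME', 'midsummer')}"
    subprocess.run(["nginx", "-s", "reload"], check=True)  # assumes nginx is in this container
    failed = []
    for cmd in reload_commands(list_sibling_pods(namespace, selector), self_name, namespace, container):
        if subprocess.run(cmd, capture_output=True, text=True).returncode != 0:
            failed.append(cmd[4])  # index 4 is the pod name
    return failed
```

Returning the failed pod names, rather than raising on the first error, lets the caller log them and fall back to a rolling restart.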

For this to work, set these environment variables:

| Variable | Value | Notes |
|---|---|---|
| `DEPLOYMENT_NAME` | `midsummer` | K8s Deployment name, used to find sibling pods |
| `POD_NAMESPACE` | (from Downward API) | Namespace the pods run in |
| `POD_NAME` | (from Downward API) | The current pod’s name, used to skip self |
| `RELOAD_ALL_PODS` | `true` | Set to `false` to only reload local nginx |

Rancher adds CATTLE_K8S_NAMESPACE automatically. For POD_NAME and POD_NAMESPACE, use the Downward API:

spec:
  containers:
    - name: midsummer
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: DEPLOYMENT_NAME
          value: "midsummer"

RBAC for kubectl exec (if using pod-wide reload)

If you use the kubectl exec approach, the pod’s service account needs RBAC permission to exec into pods:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-exec-role
  namespace: midsummer          # same namespace as the Deployment
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]      # find sibling pods
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create", "get"]    # newer kubectl clients exec over WebSockets (an HTTP GET)
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-exec-binding
  namespace: midsummer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-exec-role
subjects:
  - kind: ServiceAccount
    name: default               # or a dedicated ServiceAccount set in the pod spec
    namespace: midsummer

Simpler Alternative: Rolling Restart

If you don’t want to deal with RBAC and kubectl exec, set RELOAD_ALL_PODS=false and instead trigger a rolling restart after cert changes:

kubectl rollout restart deployment/midsummer -n midsummer

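Each pod starts fresh, the init container runs customdomain_regen_nginx, and nginx loads the latest config. This avoids kubectl exec and the RBAC above. It doesn’t avoid shared storage: every pod’s nginx reads certificates from /etc/letsencrypt, so with more than one replica the certificate and config volumes must still be ReadWriteMany as described above. A ReadWriteOnce PVC is only enough for a single replica.

The tradeoff: each pod is unavailable for roughly 30 seconds while it restarts, but a rolling update replaces pods one at a time, so the remaining replicas keep serving and there’s no full outage.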
