Custom domains

Note

Canonical source: guides/custom-domains-setup.md. Transcluded here for the docs site. Edit the source file if anything is wrong.

(DigitalOcean Kubernetes / Rancher)

This guide walks through deploying the custom domain feature on a DO Kubernetes cluster managed via Rancher. It covers infrastructure provisioning, Kubernetes manifests, environment configuration, and the end-to-end verification checklist.

Read the design doc first: .plans/custom-domain-support.md


Architecture Recap

                    ┌──────────────────────────────────────────────┐
 *.midsummer.cloud │ LB1 (EXISTING)  TLS-term wildcard  HTTP:8080  │──▶ nginx :8080 ──▶ Hypercorn :8082
                    └──────────────────────────────────────────────┘

                    ┌──────────────────────────────────────────────┐
 custom domains    │ LB2 (NEW)  TLS passthrough  TCP 80+443        │──▶ nginx :8080 (ACME + 301→https)
                    └──────────────────────────────────────────────┘──▶ nginx :8443 (SSL, SNI per-domain cert) ──▶ Hypercorn :8082
  • LB1 (existing): Wildcard *.midsummer.cloud cert, terminates TLS, forwards HTTP to pod :8080. Untouched.

  • LB2 (new): TLS passthrough — no cert on the LB. Traffic reaches nginx :8443 which picks the per-domain LE cert via SNI. Port 80 forwarded to pod :8080 for ACME HTTP-01 challenges and 301→https redirects.


Step 1 — Provision Load Balancer 2 (LB2)

Create LB2 with doctl:

doctl compute load-balancer create \
  --name midsummer-custom-lb \
  --region <YOUR_REGION> \
  --forwarding-rules "entry_protocol:tcp,entry_port:443,target_protocol:tcp,target_port:8443 entry_protocol:tcp,entry_port:80,target_protocol:tcp,target_port:8080" \
  --health-check protocol:http,port:8080,path:/ping/,check_interval_seconds:10,response_timeout_seconds:5,healthy_threshold:2,unhealthy_threshold:3 \
  --tag-name k8s:<YOUR_CLUSTER_ID>

Key names use underscores, as in the DigitalOcean API, and multiple forwarding rules go in one quoted, space-separated string. Confirm the flags against doctl compute load-balancer create --help for your doctl version.

Or via the DO cloud console:

  1. Networking → Load Balancers → Create

  2. Name: midsummer-custom-lb

  3. Region: same as your cluster

  4. Forwarding Rules:

    • Rule 1: TCP:443 → TCP:8443 (TLS passthrough for custom domains)

    • Rule 2: TCP:80 → TCP:8080 (ACME HTTP-01 + redirect)

  5. Health Check: HTTP:8080/ping/ every 10s, threshold 2/3

  6. Sticky Sessions: Disabled

  7. TLS: None — this LB must NOT terminate TLS (passthrough only)

Important: Do NOT add a certificate to this LB. TLS termination happens inside nginx on port 8443 using per-domain Let’s Encrypt certs.

Note the LB’s public IP (e.g. 203.0.113.50). You’ll need it for DNS.


Step 2 — DNS Records

In your DNS provider (managing midsummer.cloud):

| Record | Type | Name | Value |
| --- | --- | --- | --- |
| A | A | custom.midsummer.cloud | <LB2_PUBLIC_IP> |

This custom.midsummer.cloud hostname is what tenants will CNAME their custom domains to. It’s configured via CUSTOM_DOMAIN_CNAME_TARGET in the app’s environment.

Tenants will create their own DNS records:

CNAME  register.furrycon.org  →  custom.midsummer.cloud
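The CNAME relationship above can be checked from any machine by comparing what the tenant hostname and the CNAME target resolve to. The following is a minimal sketch, not the application's own "Verify DNS" logic (which is not shown in this guide); the function names are hypothetical, and it compares resolved IPv4 addresses rather than inspecting the CNAME record itself.

```python
import socket
from typing import Callable, Set


def resolve_ips(hostname: str) -> Set[str]:
    """Return the IPv4 addresses a hostname resolves to (CNAMEs are followed)."""
    _canonical, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return set(addresses)


def points_at_target(
    domain: str,
    target: str,
    resolver: Callable[[str], Set[str]] = resolve_ips,
) -> bool:
    """True if every address of `domain` is also an address of `target`."""
    try:
        domain_ips = resolver(domain)
        target_ips = resolver(target)
    except OSError:
        # NXDOMAIN, timeouts, etc.: the record is not in place (yet).
        return False
    return bool(domain_ips) and domain_ips <= target_ips


if __name__ == "__main__":
    # Example hostnames taken from this guide.
    print(points_at_target("register.furrycon.org", "custom.midsummer.cloud"))
```

The resolver is injectable so the check can be exercised without live DNS. An ALIAS/ANAME record at an apex domain passes this check too, because only the resulting addresses are compared.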

Step 3 — Persistent Volume for Let’s Encrypt Certificates

Certs in /etc/letsencrypt/ must survive pod restarts. Create a PVC and mount it.

kustomize/resources/custom-domains-pvc.yaml (or equivalent Helm values)

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: midsummer-letsencrypt
  namespace: midsummer
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: do-block-storage

Update the Deployment to mount it

Add to your Deployment/Pod spec (whether you use a Helm chart, Kustomize, or Rancher UI):

volumeMounts:
  - name: letsencrypt
    mountPath: /etc/letsencrypt
  - name: acme-webroot
    mountPath: /var/www/letsencrypt
  - name: nginx-confd
    mountPath: /etc/nginx/conf.d

volumes:
  - name: letsencrypt
    persistentVolumeClaim:
      claimName: midsummer-letsencrypt
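  - name: acme-webroot
    # This claim is not created by the PVC manifest above: create it as well,
    # or use emptyDir: {} as Steps 9 and 11 do.
    persistentVolumeClaim:
      claimName: midsummer-acme-webroot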
  - name: nginx-confd
    emptyDir: {}           # generated at runtime by the management command

Note on nginx-confd: The custom-domains.conf file is generated at runtime by manage.py customdomain_regen_nginx and written to /etc/nginx/conf.d/. If you mount an emptyDir volume there, the file is lost whenever the pod is recreated. You must run manage.py customdomain_regen_nginx on every pod start, either as part of the container startup or from an init container (see Step 8).

Note on multi-pod replicas: If you run >1 pod replica, the PVC ReadWriteOnce access mode means only one pod can mount it. For multi-replica setups, use ReadWriteMany with a DO Filespace (NFS) or switch to a cert-manager CRD approach. For initial deployment, single-replica is recommended.


Step 4 — Update the Kubernetes Service

Your existing Service likely only exposes 8080 and 8082. Add port 8443:

apiVersion: v1
kind: Service
metadata:
  name: midsummer
  namespace: midsummer
spec:
  type: LoadBalancer   # or ClusterIP + Ingress if you're using an Ingress controller
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: https-custom
      port: 443
      targetPort: 8443
    - name: app
      port: 8082
      targetPort: 8082
  selector:
    app: midsummer

If you use an Ingress controller (nginx-ingress, Traefik, etc.) instead of a raw LB, the TLS passthrough setup differs — you’d use TCP passthrough via an Ingress Service of type LoadBalancer for port 443 → 8443. The DO LB approach above is simpler.


Step 5 — Environment Variables

Add these to your Deployment’s environment (Rancher UI → Workloads → your deployment → Environment Variables, or in your Helm values / Kustomize overlay):

| Variable | Value | Notes |
| --- | --- | --- |
| CUSTOM_DOMAIN_CNAME_TARGET | custom.midsummer.cloud | The hostname tenants CNAME to. Must resolve to LB2’s IP. |
| ACME_EMAIL | ops@midsummer.cloud | Let’s Encrypt registration email. |
| ACME_STAGING | true | Start with true! Flip to false after E2E verification. |
| ACME_WEBROOT | /var/www/letsencrypt | Must match nginx.conf ACME location. |
| CUSTOM_DOMAIN_QUEUE | certmgr | Celery queue name for cert tasks. Create a worker for it (see Step 7). |
| RELOAD_ALL_PODS | true | Set false to only reload local nginx (no kubectl needed). |
| DEPLOYMENT_NAME | midsummer | K8s Deployment name, used by kubectl exec to find sibling pods. |
| POD_NAMESPACE | (from Downward API) | K8s namespace the pods run in. |
| POD_NAME | (from Downward API) | Current pod name (set via Downward API). |

Other existing vars (MIDSUMMER_PROD, MIDSUMMER_DB_URL, CELERY_BROKER_URL, etc.) remain unchanged.
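To make the semantics of the table concrete, here is a minimal sketch of how these variables could be read into typed settings. It is an illustration only: the class and function names are hypothetical, the application's real settings code is not shown in this guide, and the fallback values are assumptions copied from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping

TRUE_VALUES = {"1", "true", "yes", "on"}


def as_bool(raw: str) -> bool:
    """Interpret common truthy strings; anything else counts as False."""
    return raw.strip().lower() in TRUE_VALUES


@dataclass(frozen=True)
class CustomDomainSettings:
    cname_target: str
    acme_email: str
    acme_staging: bool
    acme_webroot: str
    queue: str
    reload_all_pods: bool


def load_settings(env: Mapping[str, str] = os.environ) -> CustomDomainSettings:
    """Build settings from the environment; missing required keys raise KeyError."""
    return CustomDomainSettings(
        cname_target=env["CUSTOM_DOMAIN_CNAME_TARGET"],  # required
        acme_email=env["ACME_EMAIL"],  # required
        # Assumed default: stay on the staging CA unless told otherwise.
        acme_staging=as_bool(env.get("ACME_STAGING", "true")),
        acme_webroot=env.get("ACME_WEBROOT", "/var/www/letsencrypt"),
        queue=env.get("CUSTOM_DOMAIN_QUEUE", "certmgr"),
        reload_all_pods=as_bool(env.get("RELOAD_ALL_PODS", "true")),
    )
```

Defaulting ACME_STAGING to true in this sketch mirrors the advice in the table: an unset variable should never hit the production Let’s Encrypt rate limits by accident.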


Step 6 — Build & Deploy the Updated Image

The Dockerfile already includes the certbot/cron/ssl-cert installs and EXPOSEs port 8443. Build and push:

# From the project root
docker build -t your-registry/midsummer:custom-domains .
docker push your-registry/midsummer:custom-domains

Update your Deployment’s image tag to custom-domains (or whatever your tag strategy is).


Step 7 — Celery Worker for Certificate Management

The tenant.tasks.issue_custom_domain_task task is routed to the certmgr queue. You need at least one Celery worker listening on that queue:

celery -A midsummer worker -Q certmgr --loglevel=info
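For orientation, routing a task to a dedicated queue is usually expressed through Celery's task_routes setting. The fragment below is a sketch of what such a route might look like in Django settings. It assumes the Celery app loads Django settings with the CELERY namespace (so the setting is spelled CELERY_TASK_ROUTES); the helper name is hypothetical and the project's actual routing code is not shown in this guide.

```python
import os

# Task name as given in this guide.
TASK_NAME = "tenant.tasks.issue_custom_domain_task"


def build_task_routes(queue: str) -> dict:
    """Return a Celery task_routes mapping that sends cert issuance to `queue`."""
    return {TASK_NAME: {"queue": queue}}


# Django settings fragment. CUSTOM_DOMAIN_QUEUE is the variable from Step 5;
# the "certmgr" fallback is an assumption matching the table there.
CELERY_TASK_ROUTES = build_task_routes(os.environ.get("CUSTOM_DOMAIN_QUEUE", "certmgr"))
```

If no worker consumes the queue named here, issuance tasks are accepted by the broker but never run, which shows up in the UI as a certificate that stays pending.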

In Kubernetes, add this as a sidecar container in the same pod (shares the /etc/letsencrypt volume), or as a separate Deployment with the same image + volume mount. In Rancher, you can add a sidecar via the workload UI.

Example sidecar in the Deployment:

- name: celery-certmgr
  image: your-registry/midsummer:custom-domains
  command: ["celery", "-A", "midsummer", "worker", "-Q", "certmgr", "--loglevel=info"]
  envFrom:
    - configMapRef:
        name: midsummer-env
    - secretRef:
        name: midsummer-secrets
  volumeMounts:
    - name: letsencrypt
      mountPath: /etc/letsencrypt
    - name: acme-webroot
      mountPath: /var/www/letsencrypt
    - name: nginx-confd
      mountPath: /etc/nginx/conf.d

Step 8 — Startup / Init Procedure

On first deployment (and after any pod restart), the generated custom-domains.conf will be empty (or lost if on an emptyDir). You need an init step that regenerates it from the database:

Add an init container to the pod:

initContainers:
  - name: regen-nginx
    image: your-registry/midsummer:custom-domains
    command: ["python", "manage.py", "customdomain_regen_nginx"]
    envFrom:
      - configMapRef:
          name: midsummer-env
      - secretRef:
          name: midsummer-secrets
    volumeMounts:
      - name: letsencrypt
        mountPath: /etc/letsencrypt
      - name: acme-webroot
        mountPath: /var/www/letsencrypt
      - name: nginx-confd
        mountPath: /etc/nginx/conf.d

This ensures /etc/nginx/conf.d/custom-domains.conf is populated from the DB before nginx starts.
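The output of customdomain_regen_nginx is not reproduced in this guide. As a rough mental model only, the sketch below renders one nginx server block per domain and skips any domain whose certificate files are missing, since a reference to a nonexistent certificate makes nginx -t fail. The template, the function name, and the proxy target 127.0.0.1:8082 are assumptions for illustration; fullchain.pem and privkey.pem are the standard certbot file names.

```python
from pathlib import Path
from typing import Iterable

LIVE_DIR = Path("/etc/letsencrypt/live")

# Hypothetical template; the real generated config may differ.
SERVER_BLOCK = """\
server {{
    listen 8443 ssl;
    server_name {domain};
    ssl_certificate     {live}/{domain}/fullchain.pem;
    ssl_certificate_key {live}/{domain}/privkey.pem;
    location / {{
        proxy_pass http://127.0.0.1:8082;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }}
}}
"""


def render_custom_domains(domains: Iterable[str], live_dir: Path = LIVE_DIR) -> str:
    """Render server blocks for every domain that has both cert files on disk."""
    blocks = []
    for domain in sorted(set(domains)):
        cert_dir = live_dir / domain
        has_cert = (cert_dir / "fullchain.pem").is_file()
        has_key = (cert_dir / "privkey.pem").is_file()
        if not (has_cert and has_key):
            # Skipping keeps nginx startable even if one issuance failed.
            continue
        blocks.append(SERVER_BLOCK.format(domain=domain, live=live_dir))
    return "\n".join(blocks)
```

In the real command the domain list would come from the database; here it is passed in so the rendering can be tried against a temporary directory.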


Step 9 — Cron Setup

The Dockerfile installs cron and supervisord runs cron -f. The cron file cron/midsummer-custom-domains is copied to /etc/cron.d/ by the Dockerfile.

In-pod cron jobs run inside the same container as nginx and the app, so they see the same mounted volumes. With a PVC for /etc/letsencrypt and an emptyDir for /etc/nginx/conf.d, renewal and config regeneration therefore work without extra setup.

If you prefer Kubernetes-native CronJobs instead of in-pod cron:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: custom-domain-renew
  namespace: midsummer
spec:
  schedule: "17 6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: renew
              image: your-registry/midsummer:custom-domains
              command: ["python", "manage.py", "customdomain_renew"]
              envFrom:
                - configMapRef:
                    name: midsummer-env
                - secretRef:
                    name: midsummer-secrets
              volumeMounts:
                - name: letsencrypt
                  mountPath: /etc/letsencrypt
                - name: acme-webroot
                  mountPath: /var/www/letsencrypt
                - name: nginx-confd
                  mountPath: /etc/nginx/conf.d
          restartPolicy: OnFailure
          volumes:
            - name: letsencrypt
              persistentVolumeClaim:
                claimName: midsummer-letsencrypt
            - name: acme-webroot
              emptyDir: {}
            - name: nginx-confd
              emptyDir: {}   # Won't work across pods — see note below

Important: If you use K8s CronJobs instead of in-pod cron, the emptyDir for nginx-confd won’t be shared with the app pod. You need either: (a) a shared PVC for nginx-confd, (b) the CronJob calls customdomain_regen_nginx on the app pod via kubectl exec, or (c) stick with in-pod cron (simplest for single-replica). For initial deployment, in-pod cron is recommended.


Step 10 — End-to-End Verification Checklist

Perform these steps in order to verify the full flow. Keep ACME_STAGING=true until all steps pass, then flip to false.

10.1 — Infrastructure

  • LB2 is provisioned with TLS passthrough on 443→8443 and TCP 80→8080

  • custom.midsummer.cloud A record resolves to LB2’s public IP: dig custom.midsummer.cloud

  • Pod starts successfully; nginx -t passes (check logs: kubectl logs <pod> -c proxy)

  • Health check at http://<pod-ip>:8080/ping/ returns pong

  • Port 8443 is reachable from LB2: curl -k https://custom.midsummer.cloud:443/ returns 421 (snakeoil default)
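The health check item above can be scripted. This is a minimal sketch, assuming the endpoint answers HTTP 200 with the literal body pong as described in the checklist; the function name is hypothetical, and the opener parameter exists only so the check can be exercised without a running pod.

```python
import urllib.error
import urllib.request
from typing import Callable


def ping_ok(
    base_url: str,
    timeout: float = 5.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """True if GET <base_url>/ping/ returns status 200 and the body 'pong'."""
    url = base_url.rstrip("/") + "/ping/"
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace").strip()
            return response.status == 200 and body == "pong"
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    # Replace with the pod IP from `kubectl get pod -o wide`.
    print(ping_ok("http://127.0.0.1:8080"))
```

The same URL is what LB2 probes, so a False result here usually means the load balancer will also mark the node unhealthy.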

10.2 — Add a Custom Domain (Staging)

  1. In the tenantui, go to Domains → Add Custom Domain

  2. Enter a test subdomain you control (e.g., test.yourdomain.com)

  3. Select mode “Event” and pick an event

  4. Click Add Domain — confirm the DNS instructions dialog appears

  5. In your DNS provider, create a CNAME: test.yourdomain.com → custom.midsummer.cloud

  6. Wait 1-5 minutes for DNS propagation

  7. Click Verify DNS — status should change to DNS Verified

10.3 — Issue SSL Certificate (Staging)

  1. Click Issue SSL — this enqueues the Celery task

  2. Watch the celery-certmgr logs: kubectl logs <pod> -c celery-certmgr — look for Certificate issued for test.yourdomain.com

  3. Or run manually: kubectl exec <pod> -c api -- python manage.py customdomain_issue --domain test.yourdomain.com

  4. Verify in tenantui: status changes to SSL Issued, cert expiry date populated

  5. Check the cert: kubectl exec <pod> -- certbot certificates --staging

  6. Check the nginx config was regenerated: kubectl exec <pod> -- cat /etc/nginx/conf.d/custom-domains.conf

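  7. Verify HTTPS works: curl -kvI https://test.yourdomain.com/ should show the Let’s Encrypt staging cert and a 301 or 200 response (-k is needed because staging certs are not publicly trusted, so curl would otherwise abort on verification)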

10.4 — Switch to Production Let’s Encrypt

  1. Update the deployment environment: ACME_STAGING=false

  2. Redeploy / restart the pod

  3. Force-reissue the test domain: kubectl exec <pod> -c api -- python manage.py customdomain_issue --domain test.yourdomain.com

    • This will issue a production LE cert (the staging cert will be replaced)

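  4. Verify: curl -vI https://test.yourdomain.com/ should show a valid (non-staging) Let’s Encrypt certificate. The issuer is whichever production intermediate is current (for example R10 or R11; certificates issued before mid-2024 show R3)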

  5. Verify the tenant’s site loads correctly on the custom domain

10.5 — Renewal

  1. Test renewal: kubectl exec <pod> -c api -- python manage.py customdomain_renew

    • For staging: add --staging flag (or ensure ACME_STAGING=true)

  2. Verify the cron logs: kubectl logs <pod> -c custom-domain-cron — should show the daily renewal entry

  3. Check that the customdomain_renew management command runs without error
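A daily renewal job is safe to run because renewal only acts on certificates that are close to expiry. The sketch below shows that decision in isolation. It assumes a 30-day window, which is the threshold certbot has traditionally used by default; the function name is hypothetical and the actual logic of customdomain_renew is not shown in this guide.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed window: renew once fewer than 30 days of validity remain.
RENEW_BEFORE = timedelta(days=30)


def due_for_renewal(
    not_after: datetime,
    now: Optional[datetime] = None,
    window: timedelta = RENEW_BEFORE,
) -> bool:
    """True if the certificate expires within `window` (or has already expired).

    Both datetimes must be timezone-aware.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now <= window
```

With 90-day certificates and this window, a cert issued today is left alone for roughly 60 days, so most daily runs do nothing.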

10.6 — Cleanup

  • Remove the test domain from tenantui: Delete → confirm the Domain routing row is also removed

  • Verify the domain returns 421 on port 8443 (or no longer resolves)


Troubleshooting

certbot fails with “Failed authorization procedure”

  • Confirm DNS CNAME is in place: dig test.yourdomain.com CNAME

  • Confirm port 80 is reachable from the internet to the pod: curl http://test.yourdomain.com/.well-known/acme-challenge/test

  • If using ACME_STAGING=true, the cert is issued by an untrusted staging CA whose issuer name starts with “(STAGING)” (older staging certs show “Fake LE Intermediate X1”); this is expected

  • Check certbot logs: kubectl exec <pod> -- cat /var/log/letsencrypt/letsencrypt.log
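To test the port 80 path end to end, it helps to place a known file where the HTTP-01 challenge would go and fetch it from outside. This is a minimal sketch, assuming nginx serves ACME_WEBROOT under /.well-known/acme-challenge/ as Step 5 requires; the function names, token, and file content are hypothetical.

```python
from pathlib import Path

# Standard HTTP-01 challenge location relative to the webroot.
CHALLENGE_PATH = ".well-known/acme-challenge"


def write_probe(webroot: str, token: str = "test", content: str = "probe-ok") -> Path:
    """Create <webroot>/.well-known/acme-challenge/<token> and return its path."""
    challenge_dir = Path(webroot) / CHALLENGE_PATH
    challenge_dir.mkdir(parents=True, exist_ok=True)
    probe = challenge_dir / token
    probe.write_text(content)
    return probe


def probe_url(domain: str, token: str = "test") -> str:
    """URL an ACME server (or curl) would request for this token."""
    return f"http://{domain}/{CHALLENGE_PATH}/{token}"


if __name__ == "__main__":
    # Run inside the pod, then curl the printed URL from the internet.
    write_probe("/var/www/letsencrypt")
    print(probe_url("test.yourdomain.com"))
```

If curl on the printed URL returns probe-ok, the DNS record, LB2 port 80 rule, and nginx ACME location are all wired correctly, and the failure lies elsewhere. Remove the probe file afterwards.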

nginx -t fails after customdomain_regen_nginx

  • Check the generated config: kubectl exec <pod> -- cat /etc/nginx/conf.d/custom-domains.conf

  • Check for missing cert files: ls -la /etc/letsencrypt/live/<domain>/ inside the pod

  • If cert dirs are missing, the Domain was marked ssl_issued but the cert wasn’t actually issued. Reset its status in python manage.py shell with CustomDomain.objects.filter(domain='<domain>').update(status='failed'), then re-issue

Pod restart loses nginx config

  • The init container (regen-nginx) should regenerate from DB on every pod start

  • If you skipped Step 8 (init container), run manually: kubectl exec <pod> -c api -- python manage.py customdomain_regen_nginx

Custom domain shows 404 or “No Event matched”

  • Verify the Domain row exists: Domain.objects.filter(domain='<domain>')

  • Verify the CustomDomain row has status='ssl_issued'

  • Verify EventSetupMiddleware finds the event: check that CustomDomain.event is set (not null for mode='event')

  • Check request.current_custom_domain in a Django shell to confirm the middleware is routing correctly

Apex domain (e.g., furrycon.org) doesn’t work with CNAME

  • DNS does not permit a CNAME at the zone apex, because it would conflict with the SOA and NS records there. Use an ALIAS/ANAME record instead (Cloudflare, DNSimple, Route53 support this)

  • Alternative: advise tenants to use a subdomain (e.g., events.furrycon.org) which always supports CNAME

  • The UI already shows an “apex note” in the DNS instructions dialog


Rancher-Specific Notes

  • Workload → Deployments → midsummer → Environment Variables — add the custom domain vars here

  • Workload → Deployments → midsummer → Add Sidecar — add the celery-certmgr container with the same image + celery -A midsummer worker -Q certmgr command

  • Storage → PersistentVolumeClaims — create the midsummer-letsencrypt PVC (1Gi, do-block-storage)

  • Service Discovery → Services — ensure the midsummer service exposes port 8443

  • Load Balancing → Load Balancers — create LB2 with the forwarding rules from Step 1

The Rancher UI makes it straightforward to add environment variables, volumes, and sidecars to an existing Deployment without hand-editing YAML.


Step 11 — Multi-Pod Deployment (5+ Replicas)

With multiple pods, you have two problems:

  1. Certificates and nginx config must be shared across all pods — otherwise only one pod has the certs

  2. nginx reload must reach all pods — otherwise only one pod serves the right config

Shared Storage (Required)

Use a ReadWriteMany (RWX) volume for /etc/letsencrypt and /etc/nginx/conf.d. DigitalOcean offers Filespaces (NFS) or you can use a NAS:

volumes:
  - name: letsencrypt
    persistentVolumeClaim:
      claimName: midsummer-letsencrypt   # Must be ReadWriteMany
  - name: nginx-confd
    persistentVolumeClaim:
      claimName: midsummer-nginx-confd    # Must be ReadWriteMany
  - name: acme-webroot
    emptyDir: {}

Note: DO Block Storage (do-block-storage) only supports ReadWriteOnce. For RWX, use a DO Filespace (NFS) or a Longhorn distributed volume.

nginx Reload Across All Pods

After cert issuance, regenerate_nginx_config() calls reload_nginx_all_pods(), which:

  1. Reloads nginx on the local pod

  2. Uses kubectl exec to reload nginx on all sibling pods matching the deployment label
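The two steps above could look roughly like the sketch below. It is not the project's reload_nginx_all_pods() implementation, which is not shown in this guide. Assumptions: sibling pods carry the label app=<DEPLOYMENT_NAME> (matching the Service selector in Step 4), nginx runs in a container named proxy (the name used in the Step 10 log command), and kubectl is available inside the pod.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], str]


def run_command(cmd: Sequence[str]) -> str:
    """Run a command, raise on non-zero exit, and return its stdout."""
    result = subprocess.run(list(cmd), check=True, capture_output=True, text=True)
    return result.stdout


def sibling_pods(
    namespace: str, deployment: str, self_pod: str, run: Runner = run_command
) -> List[str]:
    """List pods labelled app=<deployment>, excluding the current pod."""
    output = run([
        "kubectl", "get", "pods", "-n", namespace,
        "-l", f"app={deployment}",  # assumed label convention
        "-o", "jsonpath={.items[*].metadata.name}",
    ])
    return [name for name in output.split() if name != self_pod]


def reload_siblings(
    namespace: str,
    deployment: str,
    self_pod: str,
    container: str = "proxy",  # assumed nginx container name
    run: Runner = run_command,
) -> List[str]:
    """Send `nginx -s reload` to every sibling pod; return the pods reloaded."""
    reloaded = []
    for pod in sibling_pods(namespace, deployment, self_pod, run=run):
        run(["kubectl", "exec", "-n", namespace, pod, "-c", container,
             "--", "nginx", "-s", "reload"])
        reloaded.append(pod)
    return reloaded
```

The three string arguments correspond to POD_NAMESPACE, DEPLOYMENT_NAME, and POD_NAME. Both kubectl calls need the pod's service account to be allowed to list pods and create pods/exec in that namespace.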

For this to work, set these environment variables:

| Variable | Value | Notes |
| --- | --- | --- |
| DEPLOYMENT_NAME | midsummer | K8s Deployment name, used to find sibling pods |
| POD_NAMESPACE | (from Downward API) | Namespace the pods run in (midsummer in the manifests from Steps 3, 4, and 9) |
| POD_NAME | (from Downward API) | The current pod’s name, used to skip self |
| RELOAD_ALL_PODS | true | Set to false to only reload local nginx |

Rancher adds CATTLE_K8S_NAMESPACE automatically. For POD_NAME and POD_NAMESPACE, use the Downward API:

spec:
  containers:
    - name: midsummer
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: DEPLOYMENT_NAME
          value: "midsummer"

RBAC for kubectl exec (if using pod-wide reload)

If you use the kubectl exec approach, the pod’s service account needs RBAC permission to exec into pods:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-exec-role
  namespace: midsummer   # must be the namespace the pods run in
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-exec-binding
  namespace: midsummer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-exec-role
subjects:
  - kind: ServiceAccount
    name: default          # the service account the pods run as
    namespace: midsummer

Simpler Alternative: Rolling Restart

If you don’t want to deal with RBAC and kubectl exec, set RELOAD_ALL_PODS=false and instead trigger a rolling restart after cert changes:

kubectl rollout restart deployment/midsummer -n midsummer

Each pod starts fresh, the init container runs customdomain_regen_nginx, and nginx loads the latest config. This is the simplest approach and works with ReadWriteOnce PVCs (the cert volume is mounted only on the single pod that manages certificates).

The tradeoff: ~30 seconds of downtime per pod during restart, but pods restart one at a time in a rolling update so there’s no full outage.