Setup Docker Swarm

  • This is a guide on setting up the NG Network: a Docker Swarm cluster running over an encrypted Tailscale/Headscale VPN

Prerequisites

  • Working VPN (Tailscale/Headscale) that allows ESP
    • In order to use an encrypted overlay network, Headscale must be configured to allow ESP (Protocol 50) through its tunnel. Sample ACL (a verification sketch follows this list):
      "acls": [
        {
            "Action": "accept",
            "src": [
                "*",
            ],
            "proto": "tcp",
            "dst": [
                "*:*",
            ],
        },
        {
            "Action": "accept",
            "src": [
                "*",
            ],
            "proto": "udp",
            "dst": [
                "*:53",
            ],
        },
        {
            "Action": "accept",
            "src": [
                "*",
            ],
            "proto": "icmp",
            "dst": [
                "*:*",
            ],
        },
        {
            "Action": "accept",
            "src": [
                "*",
            ],
            "proto": "esp",
            "dst": [
                "*:*",
            ],
        },
    ],
}
  • Make sure DNS A records exist for $HOSTNAME (public IP) and $HOSTNAME.v (VPN IP)
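
A quick way to sanity-check both prerequisites (the hostnames and interface name are assumptions; substitute your own):

    # Each A record should resolve: one to the public IP and one to the
    # Tailscale (100.64.0.0/10) address
    dig +short node1.example.com
    dig +short node1.v.example.com

    # Once an encrypted overlay is carrying traffic, ESP (protocol 50)
    # packets should be visible on the Tailscale interface
    sudo tcpdump -ni tailscale0 ip proto 50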

Setup Hosts (All)

  • Run arch-install.sh in easy mode to get a basic "Server" setup in place

    • Create an sdadmin account when prompted to log in locally on first boot
  • Install and enable Tailscale from the package manager

    pacman -Syyu tailscale --noconfirm
    systemctl enable tailscaled --now
    
  • Setup IPTables

    • Add the following to the install script's /etc/iptables/iptables.rules file. iptables-restore does not expand shell substitutions, so replace $(tailscale ip --4) with a concrete source before saving (see the sketch at the end of this list). Note that Swarm's node discovery uses port 7946 on both TCP and UDP.
    -A FILTERS -m state --state NEW -m tcp -p tcp -s $(tailscale ip --4) --dport 2377 -m comment --comment "Docker Swarm - Manager" -j ACCEPT
    -A FILTERS -m state --state NEW -m udp -p udp -s $(tailscale ip --4) --dport 4789 -m comment --comment "Docker Swarm - Ingress Traffic UDP" -j ACCEPT
    -A FILTERS -m state --state NEW -m tcp -p tcp -s $(tailscale ip --4) --dport 4789 -m comment --comment "Docker Swarm - Ingress Traffic TCP" -j ACCEPT
    -A FILTERS -m state --state NEW -m tcp -p tcp -s $(tailscale ip --4) --dport 7946 -m comment --comment "Docker Swarm - Container Network Discovery TCP" -j ACCEPT
    -A FILTERS -m state --state NEW -m udp -p udp -s $(tailscale ip --4) --dport 7946 -m comment --comment "Docker Swarm - Container Network Discovery UDP" -j ACCEPT
    
  • Prune old Docker networks left over from the install script

     docker network prune -f
    
  • Join the host to the tailnet, then follow the instructions at the link provided

    tailscale up --login-server https://v.example.com
    
  • Authorize the connection on the Headscale control server

    headscale nodes register --user selfdesign --key nodekey:<whatever the nodekey is>
    

Setup Docker Swarm

Manager

  • Initialize the swarm (first manager only)

    docker swarm init --listen-addr $(tailscale ip --4) --advertise-addr $(tailscale ip --4)

  • Delete the unencrypted / MTU 1500 ingress network

    docker network rm ingress -f

  • Create new Docker overlay networks

    # Recreate with encryption and set MTU to 1280 to support Tailscale VPN
    docker network create --driver overlay --ingress --opt encrypted --opt com.docker.network.driver.mtu=1280 --subnet 10.10.0.0/16 --gateway 10.10.0.1 ingress

    # host network for outside of docker
    docker network create --subnet 10.11.0.0/16 --driver overlay --scope swarm --opt encrypted --attachable edge

    # network hosting the socket proxy
    docker network create --subnet 10.12.0.0/16 --driver overlay --scope swarm --opt encrypted --attachable socket-proxy

    # network hosting the services that are routed by the public traefik
    docker network create --subnet 10.13.0.0/16 --driver overlay --scope swarm --opt encrypted --attachable proxy-public

    # network hosting the services that are routed by the private traefik
    docker network create --subnet 10.14.0.0/16 --driver overlay --scope swarm --opt encrypted --attachable proxy-private

  • Add more managers with a manager join token

    docker swarm join-token manager
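
To sanity-check the manager setup, the node should be listed and the recreated ingress network should show the encrypted and MTU options:

    docker node ls
    docker network inspect ingress --format '{{.Options}}'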

Worker

  • On a manager node, get a join token

    $ docker swarm join-token worker
    
  • Run the command given on a worker node, targeting the manager's Tailscale address, e.g.

    docker swarm join --token abcdef12345667 100.64.0.4:2377
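
A compact variant of the two steps above, assuming the manager's Tailscale address is in MANAGER_TS_IP:

    # -q prints only the token so it can be captured and reused directly
    TOKEN=$(docker swarm join-token -q worker)
    docker swarm join --token "$TOKEN" "${MANAGER_TS_IP}:2377"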
    

Create Network Ingress

Socket Proxy

Create the following compose file at /var/local/data/_system/socket-proxy/compose.yml

services:
  socket-proxy:
    image: docker.io/tiredofit/socket-proxy:latest
    deploy:
      placement:
        constraints:
          - node.labels.socket-proxy.proxy-public == true
          - node.labels.socket-proxy.proxy-private == true
          - node.role == manager
      resources:
        limits:
          memory: 128m
      restart_policy:
        condition: on-failure
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./logs/socket-proxy:/logs
    environment:
      - TIMEZONE=America/Vancouver
      - CONTAINER_NAME=socket-proxy
      - CONTAINER_ENABLE_MONITORING=FALSE

      - ALLOWED_IPS=127.0.0.1,10.12.0.0/16
      - ENABLE_READONLY=TRUE
      - MODE=containers,events,networks,ping,services,tasks,version
    networks:
      socket-proxy:
        aliases:
          - socket-proxy

networks:
  socket-proxy:
    external: true
  • Start the stack
  docker stack deploy -c /var/local/data/_system/socket-proxy/compose.yml socket-proxy
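
Since the socket-proxy network was created with --attachable, a throwaway container can confirm the proxy responds (the curl image here is just an example):

    docker run --rm --network socket-proxy docker.io/curlimages/curl:latest \
      -s http://socket-proxy:2375/_ping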

Public

This will create a Traefik Instance that is accessible to the public IPv4 internet.

On a manager node, set the following label so that the Socket Proxy is always deployed to the same node as our Traefik services and uses the same volume.

docker node update --label-add socket-proxy.proxy-public=true $(docker info -f '{{.Swarm.NodeID}}')

On a manager node, set the following label so that Traefik is always deployed to the same node and uses the same volume; the label name must match the placement constraint in the compose file below.

docker node update --label-add proxy.traefik-public-certificates=true $(docker info -f '{{.Swarm.NodeID}}')
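
To confirm the labels landed on the intended node:

    docker node inspect --format '{{.Spec.Labels}}' $(docker info -f '{{.Swarm.NodeID}}')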

Create the following compose entry. Note the differences in the deploy key, DOCKER_CONTEXT, and networking (ports + overlay) compared to the private deployment below.

/var/local/data/_system/proxy-public/compose.yml

services:
  traefik:
    image: docker.io/tiredofit/traefik:2.10
    deploy: 
      labels:
        - traefik.constraint=proxy-public
      placement:
        constraints:
          - node.labels.proxy.traefik-public-certificates == true
          - node.role == manager
      resources:
        limits:
          memory: 512m
      restart_policy:
        condition: on-failure
    ports:
      - 80:80
      - 443:443
    volumes:
      - proxy-public-traefik-certs:/data/certs
      - proxy-public-traefik-logs:/data/logs
    environment:
      - TIMEZONE=America/Vancouver
      - CONTAINER_NAME=traefik
      - CONTAINER_ENABLE_MONITORING=FALSE

      - ENABLE_DASHBOARD=FALSE
      #- DASHBOARD_HOSTNAME=proxy-public.example.com

      - DOCKER_ENDPOINT=http://socket-proxy:2375
      - ENABLE_DOCKER_SWARM_MODE=TRUE
      - DOCKER_CONTEXT=Label(`traefik.constraint`, `proxy-public`)
      - DOCKER_DEFAULT_NETWORK=proxy-public

      - ACCESS_LOG_TYPE=FILE
      - LOG_TYPE=FILE

      - TRAEFIK_USER=traefik

      - LETSENCRYPT_EMAIL=zonemaster@example.com
      - LETSENCRYPT_CHALLENGE=DNS
      - LETSENCRYPT_DNS_PROVIDER=cloudflare

      - CF_API_EMAIL=zonemaster@example.com
      - CF_API_KEY=token
    networks:
      edge:
        aliases:
          - traefik-public-edge
      proxy-public:
        aliases:
          - traefik-public
      socket-proxy:
        aliases:
          - traefik-public-socket-proxy

  cloudflare-companion:
    image: docker.io/tiredofit/traefik-cloudflare-companion:latest
    deploy: 
      placement:
        constraints:
          - node.labels.proxy.traefik-public-certificates == true
          - node.role == manager
      resources:
        limits:
          memory: 128m
      restart_policy:
        condition: on-failure
    volumes: 
      - proxy-public-tcc-logs:/logs
    environment:
      - TIMEZONE=America/Vancouver
      - CONTAINER_NAME=cloudflare-companion
      - CONTAINER_ENABLE_LOGSHIPPING=FALSE
      - CONTAINER_ENABLE_MONITORING=FALSE

      - TCC_USER=tcc
      
      - DOCKER_HOST=http://socket-proxy:2375

      - TRAEFIK_VERSION=2
      - CF_EMAIL=zonemaster@example.com
      - CF_TOKEN={{CLOUDFLARE_TOKEN}}
      - TARGET_DOMAIN={{HOSTNAME}}.example.com

      - TRAEFIK_FILTER=proxy-public
      
      - REFRESH_ENTRIES=TRUE

      - DOMAIN1=example.com
      - DOMAIN1_ZONE_ID=b44d3be5b2a3e526b2c57842d26d926e
    networks:
      socket-proxy:
        aliases:
          - cloudflare-companion-public

volumes:
  proxy-public-traefik-certs:
  proxy-public-traefik-logs:
  proxy-public-tcc-logs:

networks:
  edge:
    external: true
  proxy-public:
    external: true
  socket-proxy:
    external: true
  • Start the stack
  docker stack deploy -c /var/local/data/_system/proxy-public/compose.yml proxy-public
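
To verify the deployment, check that every task scheduled and that Traefik started cleanly (the service name follows the stack_service convention):

    docker stack ps proxy-public --no-trunc
    docker service logs proxy-public_traefik --tail 50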

Private

This will create a private traefik instance that is only accessible via the VPN.

On a manager node, set the following label so that the Socket Proxy is always deployed to the same node as our Traefik services and uses the same volume.

    docker node update --label-add socket-proxy.proxy-private=true $(docker info -f '{{.Swarm.NodeID}}')

On a manager node, set the following label so that Traefik is always deployed to the same node and uses the same volume; again, the label name must match the placement constraint in the compose file.

    docker node update --label-add proxy.traefik-private-certificates=true $(docker info -f '{{.Swarm.NodeID}}')

Create the following compose entry. Note the differences in the deploy key, DOCKER_CONTEXT, and networking compared to the public one.

/var/local/data/_system/proxy-private/compose.yml

services:
  traefik:
    image: docker.io/tiredofit/traefik:2.10
    deploy: 
      placement:
        constraints:
          - node.labels.proxy.traefik-private-certificates == true
          - node.role == manager
      resources:
        limits:
          memory: 512m
      restart_policy:
        condition: on-failure
    ports:
      - 80:80
      - 443:443
    volumes:
      - proxy-private-traefik-certs:/data/certs
      - proxy-private-traefik-logs:/data/logs
    environment:
      - TIMEZONE=America/Vancouver
      - CONTAINER_NAME=traefik
      - CONTAINER_ENABLE_MONITORING=FALSE

      - ENABLE_DASHBOARD=FALSE
      #- DASHBOARD_HOSTNAME=proxy-private.example.com

      - DOCKER_ENDPOINT=http://socket-proxy:2375
      - ENABLE_DOCKER_SWARM_MODE=TRUE
      - DOCKER_CONTEXT=Label(`traefik.constraint`, `proxy-private`)
      - DOCKER_DEFAULT_NETWORK=proxy-private

      - ACCESS_LOG_TYPE=FILE
      - LOG_TYPE=FILE

      - TRAEFIK_USER=traefik

      - LETSENCRYPT_EMAIL=zonemaster@example.com
      - LETSENCRYPT_CHALLENGE=DNS
      - LETSENCRYPT_DNS_PROVIDER=cloudflare

      - CF_API_EMAIL=zonemaster@example.com
      - CF_API_KEY=token
    networks:
      edge:
        aliases:
          - traefik-private-edge
      proxy-private:
        aliases:
          - traefik-private
      socket-proxy:
        aliases:
          - traefik-private-socket-proxy

  cloudflare-companion:
    image: docker.io/tiredofit/traefik-cloudflare-companion:latest
    deploy: 
      placement:
        constraints:
          - node.labels.proxy.traefik-private-certificates == true
          - node.role == manager
      resources:
        limits:
          memory: 128m
      restart_policy:
        condition: on-failure
    volumes: 
      - proxy-private-tcc-logs:/logs
    environment:
      - TIMEZONE=America/Vancouver
      - CONTAINER_NAME=cloudflare-companion
      - CONTAINER_ENABLE_MONITORING=FALSE
      
      - TCC_USER=tcc
      
      - DOCKER_HOST=http://socket-proxy:2375

      - TRAEFIK_VERSION=2
      - CF_EMAIL=zonemaster@example.com
      - CF_TOKEN={{CLOUDFLARE_TOKEN}}
      - TARGET_DOMAIN={{HOSTNAME}}.v.example.com

      - TRAEFIK_FILTER=proxy-private
      - REFRESH_ENTRIES=TRUE

      - DOMAIN1=example.com
      - DOMAIN1_ZONE_ID=b44d3be5b2a3e526b2c57842d26d926e
    networks:
      socket-proxy:
        aliases:
          - cloudflare-companion-private

volumes:
  proxy-private-traefik-certs:
  proxy-private-traefik-logs:
  proxy-private-tcc-logs:

networks:
  edge:
    external: true
  proxy-private:
    external: true
  socket-proxy:
    external: true
  • Start the stack
  docker stack deploy -c /var/local/data/_system/proxy-private/compose.yml proxy-private
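
Once a private service is deployed (such as the nginx.v.example.com example below), it should answer only from inside the tailnet:

    # Run from a machine on the VPN; -k tolerates the certificate until
    # Let's Encrypt issuance completes
    curl -skI https://nginx.v.example.com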

Create services

Network Scope

Public (Nginx Webserver)

This will create a simple service that will be deployed on a worker node. Note the label traefik.constraint, which limits it to the proxy-public Traefik instance.

On a manager create the following file:

/var/local/data/nginx.example.com/compose.yml

services:
  nginx:
    image: docker.io/tiredofit/nginx:latest
    deploy:
      labels:
        - traefik.enable=true
        - traefik.http.routers.nginx-example-com.rule=Host(`nginx.example.com`)
        - traefik.http.services.nginx-example-com.loadbalancer.server.port=80
        - traefik.constraint=proxy-public
      replicas: 1
      resources:
        limits:
          memory: 128m
      restart_policy:
        condition: on-failure
    volumes:
      - nginx-example-com-data:/www/html
      - nginx-example-com-logs:/www/logs/nginx/
    environment:
      - TIMEZONE=America/Vancouver
      - CONTAINER_ENABLE_MONITORING=FALSE
      - CONTAINER_NAME=nginx-example-com-nginx
    networks:
      - proxy-public

volumes:
  nginx-example-com-data:
  nginx-example-com-logs:
  
networks:
  proxy-public:
    external: true
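  • Start the stack (the stack name below is an assumption; any valid name works)
  docker stack deploy -c /var/local/data/nginx.example.com/compose.yml nginx-example-com
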
Private (Nginx Webserver)

This will create a simple service that will be deployed on a worker node. Note the label traefik.constraint, which limits it to the proxy-private Traefik instance.

On a manager create the following file:

/var/local/data/nginx.v.example.com/compose.yml

services:
  nginx:
    image: docker.io/tiredofit/nginx:latest
    deploy:
      labels:
        - traefik.enable=true
        - traefik.http.routers.nginx-v-example-com.rule=Host(`nginx.v.example.com`)
        - traefik.http.services.nginx-v-example-com.loadbalancer.server.port=80
        - traefik.constraint=proxy-private
      replicas: 1
      resources:
        limits:
          memory: 128m
      restart_policy:
        condition: on-failure
    volumes:
      - nginx-v-example-com-data:/www/html
      - nginx-v-example-com-logs:/www/logs/nginx/
    environment:
      - TIMEZONE=America/Vancouver
      - CONTAINER_ENABLE_MONITORING=FALSE
      - CONTAINER_NAME=nginx-v-example-com-nginx
    networks:
      - proxy-private

volumes:
  nginx-v-example-com-data:
  nginx-v-example-com-logs:
  
networks:
  proxy-private:
    external: true
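  • Start the stack (stack name assumed)
  docker stack deploy -c /var/local/data/nginx.v.example.com/compose.yml nginx-v-example-com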

Full stack Application

Wordpress Basic

This will create a basic WordPress instance that will be deployed on a worker node. On a manager, create /var/local/data/wordpress.example.com/compose.yml

services:
  wordpress:
    image: docker.io/tiredofit/wordpress:latest
    deploy:
      labels:
        - traefik.enable=true
        - traefik.http.routers.wordpress-example-com.rule=Host(`wordpress.example.com`)
        - traefik.http.services.wordpress-example-com.loadbalancer.server.port=80
        - traefik.constraint=proxy-public
      replicas: 1
      resources:
        limits:
          memory: 512m
      restart_policy:
        condition: on-failure
    volumes:
      - wordpress-example-com-data:/www/wordpress
      - wordpress-example-com-logs:/www/logs
    environment:
      - TIMEZONE=America/Vancouver
      - CONTAINER_ENABLE_MONITORING=FALSE
      - CONTAINER_NAME=wordpress

      - DB_HOST=wordpress-db
      - DB_NAME=wordpress
      - DB_USER=wordpress
      - DB_PASS=userpassword

      - ENABLE_HTTPS_REVERSE_PROXY=FALSE

      - ADMIN_EMAIL=email@example.com 
      - ADMIN_USER=admin 

      - SITE_URL=https://wordpress.example.com
      - SITE_TITLE=Docker Wordpress
    networks:
      - proxy-public

  wordpress-db:
    image: docker.io/tiredofit/mariadb
    deploy:
      replicas: 1
      resources:
        limits:
          memory: 512m
      restart_policy:
        condition: on-failure
    volumes:
      - wordpress-example-com-db:/var/lib/mysql
    environment:
      - TIMEZONE=America/Vancouver
      - CONTAINER_NAME=wordpress-db

      - ROOT_PASS=password

      - DB_NAME=wordpress
      - DB_USER=wordpress
      - DB_PASS=userpassword
    networks:
      - proxy-public

volumes:
  wordpress-example-com-data:
  wordpress-example-com-logs:
  wordpress-example-com-db:
  
networks:
  proxy-public:
    external: true
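  • Start the stack (stack name assumed)
  docker stack deploy -c /var/local/data/wordpress.example.com/compose.yml wordpress-example-com
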
Wordpress Advanced

This advanced setup splits frontend and backend duties into their own containers for scalability purposes.

On a manager create /var/local/data/advanced.wordpress.example.com/compose.yml

services:
  frontend:
    image: docker.io/tiredofit/wordpress:latest
    deploy:
      labels:
        - traefik.enable=true
        - traefik.http.routers.advanced-wordpress-example-com.rule=Host(`advanced-wordpress.example.com`)
        - traefik.http.services.advanced-wordpress-example-com.loadbalancer.server.port=80
        - traefik.constraint=proxy-public
      replicas: 1
      resources:
        limits:
          memory: 512m
      restart_policy:
        condition: on-failure
    volumes:
      - advanced-wordpress-example-com-data:/www/wordpress
      - advanced-wordpress-example-com-logs:/www/logs
    environment:
      - TIMEZONE=America/Vancouver
      - CONTAINER_ENABLE_MONITORING=FALSE
      - CONTAINER_NAME=wordpress

      - PHP_FPM_CONTAINER_MODE=nginx
      # the variable name below is an assumption for pointing nginx at the fpm backend alias
      - PHP_FPM_HOST=advanced-wordpress-example-com-backend

      - DB_HOST=advanced-wordpress-db
      - DB_NAME=wordpress
      - DB_USER=wordpress
      - DB_PASS=userpassword

      - ADMIN_EMAIL=email@example.com 
      - ADMIN_USER=admin 

      - SITE_URL=https://advanced.wordpress.example.com
      - SITE_TITLE=Docker Wordpress Advanced
    networks:
      - proxy-public
      
  backend:
    image: docker.io/tiredofit/wordpress:latest
    deploy:
      replicas: 1
      resources:
        limits:
          memory: 512m
      restart_policy:
        condition: on-failure
    volumes:
      - advanced-wordpress-example-com-data:/www/wordpress
      - advanced-wordpress-example-com-logs:/www/logs
    environment:
      - TIMEZONE=America/Vancouver
      - CONTAINER_ENABLE_MONITORING=FALSE
      - CONTAINER_NAME=backend

      - PHP_FPM_CONTAINER_MODE=php-fpm
    networks:
      proxy-public:
        aliases:
          - advanced-wordpress-example-com-backend

  db:
    image: docker.io/tiredofit/mariadb:10.11
    deploy:
      replicas: 1
      resources:
        limits:
          memory: 512m
      restart_policy:
        condition: on-failure
    volumes:
      - advanced-wordpress-example-com-db:/var/lib/mysql
    environment:
      - TIMEZONE=America/Vancouver
      - CONTAINER_NAME=advanced-wordpress-db

      - ROOT_PASS=password

      - DB_NAME=wordpress
      - DB_USER=wordpress
      - DB_PASS=userpassword
    networks:
      - proxy-public

volumes:
  advanced-wordpress-example-com-data:
  advanced-wordpress-example-com-logs:
  advanced-wordpress-example-com-db:

networks:
  proxy-public:
    external: true
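  • Start the stack; the split design lets the PHP-FPM tier scale independently of nginx afterwards (stack and service names assumed)
  docker stack deploy -c /var/local/data/advanced.wordpress.example.com/compose.yml advanced-wordpress-example-com
  docker service scale advanced-wordpress-example-com_backend=3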

TODO

  • Switch the example to named volumes
  • Drop Logs and switch to fluent-bit logshipping
  • Build healthchecks example
  • Create Ingress constraints example for private and public services
  • Create broken-out Nginx/PHP-FPM example
  • Create Galera Example
  • Create Redis session cache
  • Ceph / Distributed Volumes