I have been running some workloads on Raspberry Pis with Leap for some time. I manage them using salt-ssh from a Pine64 running OpenBSD. You can read more about using Salt this way in my Using Salt like Ansible post.
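For context, salt-ssh drives everything over plain SSH from a roster of targets. A minimal sketch of such a roster (the host names below are placeholders; rpi03 just matches the server name used later in this post):

rpi03:
  host: rpi03.lan
  user: root
rpi04:
  host: rpi04.lan
  user: root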
The workloads ran in containers managed with systemd and podman. Salt managed the systemd service files under /etc/systemd, which start, monitor and stop the containers. For example, the homeassistant.sls state managed the service file for mosquitto:
homeassistant.mosquito.deploy:
  file.managed:
    - name: /root/config/eclipse-mosquitto/mosquitto.conf
    - source: salt://homeassistant/files/mosquitto/mosquitto.conf

homeassistant.eclipse-mosquitto.container.service:
  file.managed:
    - name: /etc/systemd/system/eclipse-mosquitto.service
    - contents: |
        [Unit]
        Description=%N Podman Container
        After=network.target

        [Service]
        Type=simple
        TimeoutStartSec=5m
        ExecStartPre=-/usr/bin/podman rm -f "%N"
        ExecStart=/usr/bin/podman run -ti --rm --name="%N" -p 1883:1883 -p 9001:9001 -v /root/config/eclipse-mosquitto:/mosquitto/config -v /etc/localtime:/etc/localtime:ro --net=host docker.io/library/eclipse-mosquitto
        ExecReload=-/usr/bin/podman stop "%N"
        ExecReload=-/usr/bin/podman rm "%N"
        ExecStop=-/usr/bin/podman stop "%N"
        Restart=on-failure
        RestartSec=30

        [Install]
        WantedBy=multi-user.target
  service.running:
    - name: eclipse-mosquitto
    - enable: True
    - require:
      - pkg: homeassistant.podman.pkgs
      - file: /etc/systemd/system/eclipse-mosquitto.service
      - file: /root/config/eclipse-mosquitto/mosquitto.conf
    - watch:
      - file: /root/config/eclipse-mosquitto/mosquitto.conf
The Salt state also made sure the right packages and other details were ready before the service was started.
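For completeness, the package part referenced by the require above could look roughly like this (a sketch; the exact package list in my homeassistant.podman.pkgs state may differ):

homeassistant.podman.pkgs:
  pkg.installed:
    - pkgs:
      - podman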
This was very simple and has worked well so far. One disadvantage is that the workloads are tied to a particular Pi, and I was not going to make the setup more complex by building my own orchestrator.
Another disadvantage is that I was pulling the container images onto the SD card, which I did not expect to live long. When it died, I took it as a good opportunity to redo this setup.
My long-term goal would be to netboot the Pis and have the storage mounted. I am not very familiar with the whole procedure, so I will go step by step.
For the initial iteration I decided on:
- k3s (Lightweight Kubernetes) on the Pis
- The k3s server to use a USB disk/SSD with btrfs as storage
- The worker nodes to mount /var/lib/rancher/k3s from USB storage
- Applying the states over almost stock Leap 15.2 images should result in a working cluster
- All of the above managed with a salt-ssh tree in a git repository, just like I was used to (see the sketch below)
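As a rough idea, such a tree looks more or less like this; the layout is my own convention rather than something Salt mandates, and the file names below are illustrative:

.
├── Saltfile
├── pillar
│   ├── top.sls
│   └── k3s.sls
└── salt
    ├── top.sls
    ├── k3s
    │   └── init.sls
    └── homeassistant
        ├── init.sls
        └── files/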
k3s installation
We start by creating k3s/init.sls. For the k3s state I defined a minimal pillar with the server and the shared token:
k3s:
  token: xxxxxxxxx
  server: rpi03
The first part of the k3s state ensures cgroups are configured correctly and disables swap:
k3s.boot.cmdline:
  file.managed:
    - name: /boot/cmdline.txt
    - contents: |
        cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory

k3s.disable.swap:
  cmd.run:
    - name: swapoff -a
    - onlyif: swapon --noheadings --show=name,type | grep .
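Note that swapoff -a only disables swap until the next reboot. A possible addition (not part of my original state, and assuming swap would come from an /etc/fstab entry rather than e.g. a zram service) is to also comment out the fstab entry:

k3s.disable.swap.persist:
  cmd.run:
    - name: sed -i -e '/\sswap\s/ s/^#*/#/' /etc/fstab
    - onlyif: grep -E '^[^#].*\sswap\s' /etc/fstab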
As the goal was to avoid using the SD card, the next state makes sure /var/lib/rancher/k3s is a mount. I have to admit I wasted quite some time getting the storage mount state right. Using mount.mounted did not work because it is buggy and treats different btrfs subvolume mounts from the same device as the same mount.
k3s.volume.mount:
  mount.mounted:
    - name: /var/lib/rancher/k3s
    - device: /dev/sda1
    - mkmnt: True
    - fstype: btrfs
    - persist: False
    - opts: "subvol=/@k3s"
I then resorted to writing my own state. I discovered the awesome findmnt command, and my workaround looked like:
k3s.volume.mount:
  cmd.run:
    - name: mount -t btrfs -o subvol=/@{{ grains['id'] }}-data /dev/sda1 /data
    - unless: findmnt --mountpoint /data --noheadings | grep '/dev/sda1[/@k3s]'
    - require:
      - file: k3s.volume.mntpoint
This later turned out to be a pain, as the k3s installer started k3s without caring much whether this volume was mounted or not. Then I remembered: systemd does exactly that. It manages mounts and their dependencies. This simplified the mount state to:
k3s.volume.mount:
  file.managed:
    - name: /etc/systemd/system/var-lib-rancher-k3s.mount
    - contents: |
        [Unit]

        [Install]
        RequiredBy=k3s.service
        RequiredBy=k3s-agent.service

        [Mount]
        What=/dev/sda1
        Where=/var/lib/rancher/k3s
        Options=subvol=/@k3s
        Type=btrfs
  cmd.run:
    - name: systemctl daemon-reload
    - onchanges:
      - file: k3s.volume.mount
  service.running:
    - name: var-lib-rancher-k3s.mount
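A quick manual check that the unit mounts the subvolume where k3s expects its data (just verification on the node, not part of the state):

systemctl daemon-reload
systemctl start var-lib-rancher-k3s.mount
findmnt /var/lib/rancher/k3s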
The k3s state works as follows: it runs the installation script in server or agent mode, depending on whether the k3s:server pillar entry matches the node where the state is applied.
{%- set k3s_server = salt['pillar.get']('k3s:server') -%}
{%- if grains['id'] == k3s_server %}
{%- set k3s_role = 'server' -%}
{%- set k3s_suffix = "" -%}
{%- else %}
{%- set k3s_role = 'agent' -%}
{%- set k3s_suffix = '-agent' -%}
{%- endif %}
k3s.{{ k3s_role }}.install:
  cmd.run:
    - name: curl -sfL https://get.k3s.io | sh -s -
    - env:
      - INSTALL_K3S_TYPE: {{ k3s_role }}
{%- if k3s_role == 'agent' %}
      - K3S_URL: "https://{{ k3s_server }}:6443"
{%- endif %}
      - INSTALL_K3S_SKIP_ENABLE: "true"
      - INSTALL_K3S_SKIP_START: "true"
      - K3S_TOKEN: {{ salt['pillar.get']('k3s:token', {}) }}
    - unless:
      # Run install on these failed conditions
      # No binary
      - ls /usr/local/bin/k3s
      # Token changed/missing
      - grep '{{ salt['pillar.get']('k3s:token', {}) }}' /etc/systemd/system/k3s{{ k3s_suffix }}.service.env
      # Changed/missing server
{%- if k3s_role == 'agent' %}
      - grep 'K3S_URL=https://{{ k3s_server }}:6443' /etc/systemd/system/k3s{{ k3s_suffix }}.service.env
{%- endif %}
    - require:
      - service: k3s.volume.mount
      - service: k3s.kubelet.volume.mount

k3s.{{ k3s_role }}.running:
  service.running:
    - name: k3s{{ k3s_suffix }}
    - enable: True
    - require:
      - cmd: k3s.{{ k3s_role }}.install
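After applying the state to the server and the agents, a quick sanity check on the server node (manual, not part of the states) is to list the nodes with the kubectl bundled in k3s:

k3s kubectl get nodes -o wide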
Workloads
The next step is to move workloads like homeassistant into this setup.
k3s automatically deploys manifests placed in /var/lib/rancher/k3s/server/manifests on the server node. We can deploy e.g. mosquitto like the following:
homeassistant.mosquitto:
  file.managed:
    - name: /var/lib/rancher/k3s/server/manifests/mosquitto.yml
    - source: salt://homeassistant/files/mosquitto.yml
    - require:
      - k3s.volume.mount
With mosquitto.yml being:
---
apiVersion: v1
kind: Namespace
metadata:
  name: homeassistant
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mosquitto
  namespace: homeassistant
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mosquitto
  template:
    metadata:
      labels:
        app: mosquitto
    spec:
      containers:
      - name: mosquitto
        image: docker.io/library/eclipse-mosquitto
        resources:
          requests:
            memory: "64Mi"
            cpu: "100m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 1883
        imagePullPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: mosquitto
  namespace: homeassistant
spec:
  ports:
  - name: mqtt
    port: 1883
    targetPort: 1883
    protocol: TCP
  selector:
    app: mosquitto
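Once k3s picks the manifest up, the broker runs in the homeassistant namespace and other pods can reach it through the service, e.g. as mosquitto.homeassistant.svc.cluster.local on port 1883 (assuming the default cluster domain). A quick manual check from the server node:

k3s kubectl -n homeassistant get deployment,pods,svc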
Homeassistant is no different, except that we use a ConfigMap resource to store the configuration and define an Ingress resource to access it from the LAN:
---
apiVersion: v1
kind: Namespace
metadata:
  name: homeassistant
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: homeassistant-config
  namespace: homeassistant
data:
  configuration.yaml: |
    homeassistant:
      auth_providers:
        - type: homeassistant
        - type: trusted_networks
          trusted_networks:
            - 192.168.178.0/24
            - 10.0.0.0/8
            - fd00::/8
          allow_bypass_login: true
      name: Home
      latitude: xx.xxxx
      longitude: xx.xxxx
      elevation: xxx
      unit_system: metric
      time_zone: Europe/Berlin
    frontend:
    config:
    http:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: homeassistant
  namespace: homeassistant
spec:
  replicas: 1
  selector:
    matchLabels:
      app: homeassistant
  template:
    metadata:
      labels:
        app: homeassistant
    spec:
      containers:
      - name: homeassistant
        image: homeassistant/raspberrypi3-64-homeassistant:stable
        volumeMounts:
        - name: config-volume-configuration
          mountPath: /config/configuration.yaml
          subPath: configuration.yaml
        livenessProbe:
          httpGet:
            scheme: HTTP
            path: /
            port: 8123
          initialDelaySeconds: 30
          timeoutSeconds: 30
        resources:
          requests:
            memory: "512Mi"
            cpu: "100m"
          limits:
            memory: "1024Mi"
            cpu: "500m"
        ports:
        - containerPort: 8123
          protocol: TCP
        imagePullPolicy: Always
      volumes:
      - name: config-volume-configuration
        configMap:
          name: homeassistant-config
          items:
          - key: configuration.yaml
            path: configuration.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: homeassistant
  namespace: homeassistant
spec:
  selector:
    app: homeassistant
  ports:
  - port: 8123
    targetPort: 8123
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: homeassistant
  namespace: homeassistant
  annotations:
    kubernetes.io/ingress.class: traefik
    traefik.frontend.rule.type: PathPrefixStrip
spec:
  rules:
  - host: homeassistant.int.mydomain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: homeassistant
          servicePort: 8123
Setting up Ingress was the most time-consuming part. It took me a while to figure out how it was supposed to work, and customizing the Traefik Helm chart is not intuitive to me. While homeassistant was more straightforward, as it is a plain HTTP service behind the SSL-terminating proxy, the Kubernetes dashboard is already deployed with SSL inside the cluster. I am still figuring out how ingress.kubernetes.io/protocol: https, traefik.ingress.kubernetes.io/pass-tls-cert: "true" (oh, don't forget the quotes!) and insecureSkipVerify work together, and what is the best way to expose it to the LAN.
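For reference, those annotations would sit on an Ingress roughly like the one below. This is only a sketch of what I am experimenting with for an HTTPS backend; the kubernetes-dashboard namespace/service names, host and port are assumptions, and I have not settled on this yet:

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/ingress.class: traefik
    ingress.kubernetes.io/protocol: https
    traefik.ingress.kubernetes.io/pass-tls-cert: "true"
spec:
  rules:
  - host: dashboard.int.mydomain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: kubernetes-dashboard
          servicePort: 443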
In a future post, I will describe the dashboard setup and other improvements.