Member post originally published on Linbit’s blog by Matt Kereczman

Edge computing is a distributed computing paradigm that brings data processing and computation closer to the data source or “edge” of the network. This reduces latency and removes Internet connectivity as a point of failure for users of edge services.

Since more hardware is involved in an edge computing environment than in a traditional central data center topology, there is a need to keep that hardware relatively inexpensive and replaceable. Generally, that means the hardware running at the edge will have fewer system resources than you might find in a central data center.

LINBIT SDS, which consists of LINSTOR® and DRBD® from LINBIT®, has a very small footprint on system resources. This leaves more resources available for other edge services and applications and makes LINBIT SDS an ideal candidate for solving persistent storage needs at the edge.

The core function of LINBIT SDS is to provide resilient and feature-rich block storage to the many platforms it integrates with. The resilience comes from DRBD, the block storage replication driver that LINSTOR manages in LINBIT SDS, which allows services to tolerate host-level failures. This is an important feature at the edge, since host-level failures may be more frequent when using less expensive hardware that might not be as fault tolerant as hardware you would run in a proper data center.

To prove and highlight some of the claims I’ve made about LINBIT SDS above, I used my trusty Libre Computer AML-S905X-CC (Le Potato) ARM-based single board computer (SBC) cluster to run LINBIT SDS and K3s. If you’re not familiar with “Le Potato” SBCs, they are essentially 2GB Raspberry Pi 4 Model B clones. I would characterize my “Potato cluster” as severely underpowered compared to the enterprise-grade hardware used by some of LINBIT’s users, and would even go so far as to say this is the floor in terms of hardware capability that I would try something like this on. To read about LINBIT SDS on a much more capable ARM-based system, read my blog, Benchmarking on Ampere Altra Max Platform with LINBIT SDS. That said, if LINBIT SDS can run on my budget Raspberry Pi clone cluster, it can run anywhere.

Here is a real photograph* of my Le Potato cluster and cooling system in my home lab:

[Image: the Le Potato cluster and its cooling system]
📝 NOTE: This is not a real photograph.

Small Footprint on System Resources

The cluster I’m using does not have a ton of resources available. Using the kubectl top node command, we can see what each of these nodes has available with LINBIT SDS already deployed.

root@potato-0:~# kubectl top node
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
potato-0   1111m        27%    1495Mi          77%
potato-1   1277m        31%    1666Mi          86%
potato-2   1096m        27%    1634Mi          84%

A single CPU core in these quad-core Libre Computer SBCs is equal to 1000m, or 1000 millicpus. This output shows us that with LINBIT SDS and Kubernetes running, the nodes still have 69-73% of their CPU resources available. Memory pressure on these 2Gi SBCs is extremely limiting, but we still have a little room to play with.

A typical LINBIT SDS deployment in Kubernetes consists of the LINSTOR controller, a LINSTOR satellite on each node, the LINSTOR CSI controller and CSI node plugins, the LINSTOR HA Controller, and the LINSTOR Operator, all of which appear in the kubectl top output below.

📝 NOTE: You can verify which containers the latest LINBIT SDS deployment in Kubernetes uses by viewing the image lists that LINBIT maintains at charts.linstor.io.

Using kubectl top pods -n linbit-sds --sum, we can see how much memory and CPU the LINBIT SDS containers are using.

root@potato-0:~# kubectl top -n linbit-sds pods --sum
NAME                                                   CPU(cores)   MEMORY(bytes)
ha-controller-bk2rh                                    4m           20Mi
ha-controller-bxhs7                                    5m           19Mi
ha-controller-knvg8                                    3m           21Mi
linstor-controller-5b84bfc497-wrbdn                    25m          168Mi
linstor-csi-controller-8c9fdd6c-q7rj5                  45m          124Mi
linstor-csi-node-c46bb                                 3m           31Mi
linstor-csi-node-knmv2                                 4m           29Mi
linstor-csi-node-x9bhr                                 3m           33Mi
linstor-operator-controller-manager-6dd5bfbfc8-cfp7t   10m          57Mi
potato-0                                               9m           87Mi
potato-1                                               10m          68Mi
potato-2                                               10m          57Mi
                                                       ________     ________
                                                       127m         719Mi

That’s less than a quarter of a single CPU core and under 1Gi of the 6Gi available in my tiny cluster.

If I create a LINBIT SDS provisioned persistent volume claim (PVC) replicated twice for a demo MinIO pod, we can check the utilization again while we’re actually running services. Using the following namespace, PVC, and pod manifests, LINBIT SDS will provision a LINSTOR volume, replicate it between two nodes (as defined in my storage class) using DRBD, and Kubernetes will schedule the MinIO pod with its data persisted on LINBIT SDS managed storage.
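The linstor-csi-lvm-thin-r2 storage class referenced in the PVC below was created ahead of time. As a minimal sketch only, it would look something like the following. The storage pool name (lvm-thin) is an assumption specific to my cluster, and parameter names can vary between LINSTOR CSI versions, so check the LINSTOR User's Guide rather than copying this verbatim:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-csi-lvm-thin-r2
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  # place two physical replicas of every volume
  autoPlace: "2"
  # LINSTOR storage pool backed by an LVM thin pool (pool name assumed)
  storagePool: lvm-thin
  # file system created on top of the DRBD device
  csi.storage.k8s.io/fstype: ext4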

root@potato-0:~# kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: minio
  labels:
    name: minio
EOF
namespace/minio created

root@potato-0:~# kubectl apply -f - <<EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-pvc-0
  namespace: minio
spec:
  storageClassName: linstor-csi-lvm-thin-r2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4G
EOF
persistentvolumeclaim/demo-pvc-0 created

root@potato-0:~# kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: minio
  name: minio
  namespace: minio
spec:
  containers:
  - name: minio
    image: quay.io/minio/minio:latest
    command:
    - /bin/bash
    - -c
    args:
    - minio server /data --console-address :9090
    volumeMounts:
    - mountPath: /data
      name: demo-pvc-0
  volumes:
  - name: demo-pvc-0
    persistentVolumeClaim:
      claimName: demo-pvc-0
EOF
pod/minio created

Using the kubectl port-forward pod/minio 9000 9090 -n minio --address 0.0.0.0 command, I forwarded traffic on ports 9000 and 9090 from my potato-0 node to the MinIO pod. I then started an upload of a Debian image (583Mi) to a new MinIO bucket (bucket-0) using the MinIO console accessible at http://potato-0:9090. During the upload I captured the output of kubectl top again to compare against my previous results.
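As an aside, if you prefer a command line to the MinIO console, a roughly equivalent upload could be done with the MinIO client (mc). This is only a sketch: it assumes mc is installed wherever the port-forward terminates and that the server is running with MinIO's default minioadmin credentials, which is acceptable for a throwaway demo and nothing else.

# register the port-forwarded S3 API endpoint under the alias "potato"
mc alias set potato http://localhost:9000 minioadmin minioadmin
# create the bucket and upload an image (the file name here is a placeholder)
mc mb potato/bucket-0
mc cp ./debian-netinst.iso potato/bucket-0/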

root@potato-2:~# kubectl top pods -n linbit-sds --sum
NAME                                                   CPU(cores)   MEMORY(bytes)
ha-controller-bk2rh                                    3m           22Mi
ha-controller-bxhs7                                    5m           26Mi
ha-controller-knvg8                                    6m           16Mi
linstor-controller-5b84bfc497-wrbdn                    72m          174Mi
linstor-csi-controller-8c9fdd6c-q7rj5                  30m          127Mi
linstor-csi-node-c46bb                                 8m           33Mi
linstor-csi-node-knmv2                                 3m           25Mi
linstor-csi-node-x9bhr                                 2m           31Mi
linstor-operator-controller-manager-6dd5bfbfc8-cfp7t   174m         58Mi
potato-0                                               4m           118Mi
potato-1                                               8m           131Mi
potato-2                                               7m           81Mi
                                                       ________     ________
                                                       317m         846Mi
root@potato-2:~# kubectl top node
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
potato-0   2581m        64%    1623Mi          84%
potato-1   1556m        38%    1713Mi          88%
potato-2   1147m        28%    1636Mi          84%

That’s a pretty minimal difference. The LINSTOR satellite pods (potato-0, potato-1, and potato-2) are using a little more memory than in the first sample. The extra memory is most likely DRBD’s bitmap on the nodes holding physical replicas, potato-0 and potato-1; the diskless “tiebreaker” assignment on potato-2 does not store a bitmap.

root@potato-2:~# kubectl exec -n linbit-sds deployments/linstor-controller -- linstor resource list
+----------------------------------------------------------------------------------------------------------------+
| ResourceName                             | Node     | Port | Usage  | Conns |      State | CreatedOn           |
|================================================================================================================|
| pvc-e5c40aa3-e9b2-40dc-b096-e96a61a27d47 | potato-0 | 7000 | InUse  | Ok    |   UpToDate | 2023-09-08 17:37:32 |
| pvc-e5c40aa3-e9b2-40dc-b096-e96a61a27d47 | potato-1 | 7000 | Unused | Ok    |   UpToDate | 2023-09-08 17:37:18 |
| pvc-e5c40aa3-e9b2-40dc-b096-e96a61a27d47 | potato-2 | 7000 | Unused | Ok    | TieBreaker | 2023-09-08 17:37:40 |
+----------------------------------------------------------------------------------------------------------------+

Resilience During Host Failures

Now that the storage has data, I can simulate a failure in the cluster and see whether the data persists. I can tell that the MinIO pod is running on potato-0 because the linstor resource list command shows the resource backing the PVC as InUse on potato-0. To simulate the failure, I ran echo c > /proc/sysrq-trigger on potato-0. This immediately crashes the kernel, and unless you’ve configured your system otherwise, the node will not reboot on its own.

While waiting for Kubernetes to notice and react to the failure, I checked DRBD’s state on the remaining nodes and could see that potato-1, the remaining “diskful” peer, reported UpToDate data, so it would be able to take over services:

root@potato-2:~# kubectl exec -it -n linbit-sds potato-1 -- drbdadm status
pvc-e5c40aa3-e9b2-40dc-b096-e96a61a27d47 role:Secondary
  disk:UpToDate
  potato-0 connection:Connecting
  potato-2 role:Secondary
    peer-disk:Diskless

root@potato-2:~# kubectl exec -it -n linbit-sds potato-2 -- drbdadm status
pvc-e5c40aa3-e9b2-40dc-b096-e96a61a27d47 role:Secondary
  disk:Diskless
  potato-0 connection:Connecting
  potato-1 role:Secondary
    peer-disk:UpToDate

After roughly five minutes, Kubernetes picked up on the failure and began terminating potato-0’s pods. I didn’t use a Deployment or any other workload resource to manage this pod, so it would not be rescheduled on its own. To delete a pod from a dead node I needed to use the force, that is: kubectl delete pod -n minio minio --force. With the pod deleted, I could recreate it by using the same command used earlier:

root@potato-2:~# kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: minio
  name: minio
  namespace: minio
spec:
  containers:
  - name: minio
    image: quay.io/minio/minio:latest
    command:
    - /bin/bash
    - -c
    args:
    - minio server /data --console-address :9090
    volumeMounts:
    - mountPath: /data
      name: demo-pvc-0
  volumes:
  - name: demo-pvc-0
    persistentVolumeClaim:
      claimName: demo-pvc-0
EOF
pod/minio created

After the recreated pod was scheduled on potato-1, and the port-forward to the MinIO pod was restarted from potato-1, I could once again access the console and see that the contents of my bucket were intact. This is because the DRBD resource that LINBIT SDS created for the MinIO pod’s persistent storage replicates writes synchronously between the cluster peers. By using DRBD, you have a block-for-block copy of your block devices on more than one node in the cluster at all times.

In this scenario, K3s happened to schedule the new MinIO pod on a node with a physical replica of the DRBD device, but that isn’t necessarily always the case. If K3s had scheduled the MinIO pod on a node without a physical replica of the DRBD device, the LINSTOR CSI driver would have created what we call a “diskless” resource on that node. A “diskless” resource uses DRBD’s replication network to attach the diskless peer to a node in the cluster that does hold a physical replica of the volume, allowing reads and writes to occur over the network. You can think of this like NVMe-oF or iSCSI targets and initiators, except that it uses DRBD’s internal protocols. Since this may be undesirable for workloads that are sensitive to latency, such as databases, you can configure LINBIT SDS to enforce volume locality in Kubernetes.
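As a rough sketch of what enforcing locality could look like, the LINSTOR CSI driver supports an allowRemoteVolumeAccess storage class parameter. Setting it to "false", combined with WaitForFirstConsumer volume binding, restricts pods to nodes that hold a physical replica of their volume. The storage class name below is hypothetical, and parameter behavior differs between driver versions, so verify against the LINSTOR User's Guide for your release:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-csi-lvm-thin-r2-local
provisioner: linstor.csi.linbit.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  autoPlace: "2"
  storagePool: lvm-thin
  # only schedule pods onto nodes that hold a diskful replica of the volume
  allowRemoteVolumeAccess: "false"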

Total Cost of Ownership

LINBIT SDS is open source, with LINBIT offering support and services on a subscription basis. This means that the total cost of ownership (TCO) in terms of acquisition can be as low as the price of your hardware. My Potato cluster can be had for less than $100 USD from Amazon at the time I’m writing this blog. Realistically, you’re not going to run anything meaningful on a few Raspberry Pi clones, but I think I’ve made my point: you don’t need to spend tens of thousands of dollars on hardware to run a LINBIT SDS cluster.

The other side of TCO is the operating cost. This is where TCO involving open source software gets a bit more abstract. The price of hiring a Linux system administrator familiar with distributed storage can vary widely depending on the region you operate in, and you’ll want enough other work to keep an admin busy to make their salary a good investment. If that makes open source sound expensive, you’re not wrong, but LINBIT stands by its software and its users, offering subscriptions at a fraction of the cost of hiring your own distributed storage expert.

Ultimately the actual TCO will come down to the expertise your organization has on staff and how many spare cycles they can put towards maintaining an open source solution like LINBIT SDS. I feel like this is where I can insert one of my favorite quotes regarding open source software, “think free as in free speech, not free beer.”

Concluding Thoughts

I’ve proven, at least to myself but hopefully also to you the reader, that you can run LINBIT SDS and Kubernetes on a cluster that fits in a shoe box, with a price tag that’s probably lower than that of the shoes that came in said shoe box. The efficiency of LINSTOR, coupled with resilient block storage from DRBD, makes running edge services possible on replaceable hardware. The self-healing nature of Kubernetes and LINBIT SDS makes replacing a node as easy as running a single command to add it to the Kubernetes cluster, which makes the combination an excellent platform for running persistent containers at the edge.
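For example, joining a fresh replacement node to an existing K3s cluster is a one-liner along these lines, where the server address and token are placeholders for values from your own cluster; the LINSTOR Operator can then roll a satellite out to the new node:

# run on the replacement node; the token lives in /var/lib/rancher/k3s/server/node-token on a server node
curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<token> sh -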

After using this “Potato cluster” for a few days to write this blog, I am happy with it, but I’m also eager to tinker with other ARM-based systems that are a bit more powerful. In the past I’ve used DRBD and Pacemaker for HA clustering on small form factor Micro ATX boards with Intel processors to great success, but the low power and size requirements of newer ARM-based systems are attractive for edge environments. If you have experience with a specific hardware platform that could fit this bill, consider joining the LINBIT Slack community and dropping me a message.