Raspberry Pi K3s Cluster Part 2: Networking
In the previous part of this series we bootstrapped the cluster without a CNI because we planned to use Cilium instead of Flannel. Cilium offers features such as advanced security policies and application-aware networking, as opposed to Flannel, which only provides basic networking. I decided to go with Cilium for two major reasons. First, it's a graduated CNCF project. Second, it has built-in load balancing, IPAM, ingress, and egress, so I don't need additional workloads such as MetalLB or ingress-nginx.
Prerequisites
- You have a bare-bones cluster with no CNI, or you have followed Part 1.
- A domain
- API access to edit DNS records (i.e., AzureDNS, Cloudflare, DigitalOcean, Google CloudDNS, Route53)
- A gateway, such as a Unifi Cloud Gateway, Netgate appliance, or some other device that allows you to create:
  - DHCP reservations
  - DNS records
Software
- CNI: Cilium – Container Network Interface with built-in load balancing.
- Cilium CLI – Command line tool for configuring Cilium CNI.
- Network Observability: Hubble
- Hubble CLI – Command line tool for Hubble
- Certificate Management: cert-manager – Used to automatically request certificates when new workloads are deployed.
Setup
There are two routes to installing the CNI into the cluster: through the Cilium CLI or with Helm. Under the hood, the CLI installs Cilium via a Helm chart, making the two processes almost identical. In addition to installing the CNI, the Cilium CLI offers some other features, such as easily enabling Hubble, managing multiple clusters (Cluster Mesh), and troubleshooting. With that said, we will use the CLI to install Cilium.
As mentioned earlier, Cilium will replace kube-proxy. We must do this in order to use L2 Announcements, since BGP is not typically found in consumer-grade networking hardware. Also included is an LB IPAM feature that allows Cilium to assign IP addresses to Kubernetes Services of type LoadBalancer. Once an IP address is assigned, Cilium can advertise it through BGP or L2 Announcements, removing the need for tools such as MetalLB.
Note: All work will be done on the main control plane.
Cilium CLI
Installing the CLI is straightforward:
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
Cilium CNI
Since Cilium is installed via a Helm chart, we're going to create a new file called cilium-values.yaml. Before we do that, we must go over some under-the-hood details.
There are some prerequisites in order to use L2 Announcements:
- Kube-proxy replacement mode must be enabled, which is done by setting kubeProxyReplacement: true in the values file below.
- All network devices on which L2-aware LB will be announced should be enabled by setting devices. This is not set here; the default behavior when unset is to autodetect the devices.
- If using external IPs is desired, then externalIPs.enabled must be set to true. This is enabled by default.
The limitations of L2 Announcements matter less in a test/home-lab environment than in an enterprise environment:
- No IPv6/NDP support
- One node receives all ARP requests for a specific IP. This means that no load balancing can happen before the traffic hits the cluster.
- There is no traffic balancing mechanism. Nodes within the same policy might be asymmetrically loaded.
- Cannot use externalTrafficPolicy: local on services; it may cause service IPs to be announced on nodes without pods, causing traffic to be dropped.
ARP and NDP caches store only a single MAC address per IP, taken from the last reply they saw. Because of this, only one node in the cluster is allowed to reply to requests for a given IP.
Cilium implements this as follows: each Cilium agent resolves which services are selected for its node and participates in leader election for each of them, using the Kubernetes lease mechanism. Each service is translated into a lease, and the lease holder starts replying to requests on the selected interfaces. Leadership is first come, first served, which may result in some asymmetric traffic distribution. To list the leases:
kubectl -n kube-system get lease
With that out of the way we can create the values file.
# Cilium operator config
operator:
  # replicas: 1 # Uncomment this if you only have one node
  # Roll out cilium-operator pods automatically when the ConfigMap is updated.
  rollOutPods: true
  # Install the operator on the control-plane node
  nodeSelector:
    node-role.kubernetes.io/master: "true"
# Roll out cilium agent pods automatically when the ConfigMap is updated.
rollOutCiliumPods: true
# K8s API service
k8sServiceHost: auto
k8sServicePort: 6443
# Replace kube-proxy
kubeProxyReplacement: true
kubeProxyReplacementHealthzBindAddr: "0.0.0.0:10256"
# Configure L2 announcements (LB-IPAM configuration)
l2announcements:
  enabled: true
  leaseDuration: 15s
  leaseRenewDeadline: 5s
  leaseRetryPeriod: 2s
# Enable external IP support. Allows you to use external IP addresses for services.
externalIPs:
  enabled: true
# Increase the K8s API client rate limit to avoid being throttled due to increased API usage
k8sClientRateLimit:
  qps: 50
  burst: 200
# Enable the ingress controller and make it the default IngressClass
ingressController:
  enabled: true
  loadbalancerMode: dedicated
  default: true
gatewayAPI:
  enabled: true
# Traffic may be filtered if you have VLANs set up on a network switch.
# Allow no more than 5 VLAN tags here, or use 0 to allow all VLAN traffic.
bpf:
  vlanBypass:
    - 0
# Specify which network interfaces can run the eBPF datapath. This means that a
# packet sent from a pod to a destination outside the cluster will be
# masqueraded (to an output device IPv4 address) if the output device runs the
# program. When not specified, probing will automatically detect devices that
# have a non-local route. This should be used only when autodetection is not
# suitable.
devices:
  - eth0
Note: Some of the values, such as kubeProxyReplacement and externalIPs.enabled, are the same as the defaults. They are included here to illustrate their importance.
Not setting k8sServiceHost and k8sServicePort will cause the CNI to crash. Note that the CNI takes a bit of time to start. Now we can install Cilium into the cluster:
cilium install --values cilium-values.yaml
You can watch the status by running the following command:
cilium status --wait
Hubble
The first thing to do is enable Hubble, which is very easy:
cilium hubble enable
Running cilium status afterwards shows Hubble Relay is now OK:
/¯¯\
/¯¯\__/¯¯\ Cilium: OK
\__/¯¯\__/ Operator: OK
/¯¯\__/¯¯\ Envoy DaemonSet: OK
\__/¯¯\__/ Hubble Relay: OK
\__/ ClusterMesh: disabled
DaemonSet cilium Desired: 4, Ready: 4/4, Available: 4/4
DaemonSet cilium-envoy Desired: 4, Ready: 4/4, Available: 4/4
Deployment cilium-operator Desired: 1, Ready: 1/1, Available: 1/1
Deployment hubble-relay Desired: 1, Ready: 1/1, Available: 1/1
Containers: cilium Running: 4
cilium-envoy Running: 4
cilium-operator Running: 1
clustermesh-apiserver
hubble-relay Running: 1
Cluster Pods: 3/3 managed by Cilium
Helm chart version: 1.18.0
Image versions cilium quay.io/cilium/cilium:v1.18.0@sha256:dfea023972d06ec183cfa3c9e7809716f85daaff042e573ef366e9ec6a0c0ab2: 4
cilium-envoy quay.io/cilium/cilium-envoy:v1.34.4-1753677767-266d5a01d1d55bd1d60148f991b98dac0390d363@sha256:231b5bd9682dfc648ae97f33dcdc5225c5a526194dda08124f5eded833bf02bf: 4
cilium-operator quay.io/cilium/operator-generic:v1.18.0@sha256:398378b4507b6e9db22be2f4455d8f8e509b189470061b0f813f0fabaf944f51: 1
hubble-relay quay.io/cilium/hubble-relay:v1.18.0@sha256:c13679f22ed250457b7f3581189d97f035608fe13c87b51f57f8a755918e793a: 1
Install Hubble client
Next, we need to install the client, which makes it easy to observe the cluster’s traffic.
HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
HUBBLE_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then HUBBLE_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-${HUBBLE_ARCH}.tar.gz{,.sha256sum}
sha256sum --check hubble-linux-${HUBBLE_ARCH}.tar.gz.sha256sum
sudo tar xzvfC hubble-linux-${HUBBLE_ARCH}.tar.gz /usr/local/bin
rm hubble-linux-${HUBBLE_ARCH}.tar.gz{,.sha256sum}
Once Hubble has been enabled and the client installed, we can view the status of the cluster:
sudo hubble status -P
Healthcheck (via 127.0.0.1:4245): Ok
Current/Max Flows: 16,380/16,380 (100.00%)
Flows/s: 23.02
Connected Nodes: 4/4
hubble observe -P
You can run cilium hubble port-forward & to create a persistent port-forward; otherwise you will need to add -P to every Hubble command.
cert-manager
For now, we're going to install cert-manager and configure it later.
helm install cert-manager oci://quay.io/jetstack/charts/cert-manager --version v1.19.0 --namespace cert-manager --create-namespace --set crds.enabled=true
Configuration
IP Pools and L2Announcement
The first thing we must do is create an IP pool so that ingresses can be assigned external IP addresses, which in turn allows access to workloads from outside the cluster.
apiVersion: cilium.io/v2
kind: CiliumLoadBalancerIPPool
metadata:
  name: pool1
spec:
  blocks:
    - start: 192.168.1.65
      stop: 192.168.1.127
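As a sanity check, you can count how many LoadBalancer IPs the block above provides; the start and stop values are inclusive. A quick sketch in shell, using the last octets of the 192.168.1.65–192.168.1.127 range:

```shell
# Inclusive range: stop - start + 1 addresses are available to LB-IPAM
start=65; stop=127
echo "$(( stop - start + 1 ))"   # prints 63
```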
The next thing that we need to do is create an L2 Announcement Policy. Without one we won’t be able to access the service from outside the cluster.
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: default-l2-announcement-policy
spec:
  nodeSelector:
    matchExpressions:
      - key: "node-role.kubernetes.io/control-plane"
        operator: DoesNotExist
  interfaces:
    - eth0
  loadBalancerIPs: true
  externalIPs: true
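With the pool and policy in place, any Service of type LoadBalancer is assigned an IP from the pool and announced on the LAN. A minimal sketch (the service name, selector, and pinned IP are placeholders; the lbipam.cilium.io/ips annotation is Cilium's way of requesting a specific address from a pool):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo   # hypothetical workload
  annotations:
    lbipam.cilium.io/ips: "192.168.1.70"   # optional: pin an IP from the pool
spec:
  type: LoadBalancer
  selector:
    app: demo
  ports:
    - port: 80
      targetPort: 8080
```

Without the annotation, LB-IPAM simply picks the next free address from a matching pool.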
Cert-Manager
Unless this cluster is reachable from the internet, which the HTTP01 solver requires, we're going to rely on the DNS01 solver. We need to create at least two resources:
- A secret containing the API credentials.
- Either a ClusterIssuer or an Issuer. The only difference between them is that an Issuer is namespaced while a ClusterIssuer is not.
Important: Just because the ClusterIssuer is not namespaced doesn't mean the secret isn't; a ClusterIssuer looks for its secrets in cert-manager's own namespace by default.
apiVersion: v1
kind: Secret
metadata:
  name: digitalocean-dns
  namespace: default
data:
  access-token: "base64 encoded access-token here"
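The value under data must be base64-encoded. A quick way to generate it (the token string here is a placeholder, not a real DigitalOcean token):

```shell
# printf avoids a trailing newline, which would corrupt the encoded token
printf '%s' 'my-token' | base64   # prints bXktdG9rZW4=
```

Alternatively, use the Secret's stringData field, which accepts the plain token and lets Kubernetes do the encoding for you.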
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
  namespace: default
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # If the ACME server supports profiles, you can specify the profile name here.
    # See the cert-manager documentation on ACME certificate profiles.
    profile: tlsserver
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: user@example.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-staging
    # ACME DNS-01 provider configurations
    solvers:
      # An empty 'selector' means that this solver matches all domains
      - selector: {}
        dns01:
          digitalocean:
            tokenSecretRef:
              name: digitalocean-dns
              key: access-token
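To exercise the issuer, you can request a certificate for a test hostname. A minimal sketch, assuming a placeholder domain (example.com); cert-manager will complete the DNS-01 challenge through the DigitalOcean solver above and store the result in the named secret:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: test-cert
  namespace: default
spec:
  secretName: test-cert-tls   # where the signed certificate and key are stored
  issuerRef:
    name: letsencrypt-staging
    kind: Issuer
  dnsNames:
    - test.example.com        # placeholder; use a name inside your DNS zone
```

Watch its progress with kubectl get certificate -n default; once Ready is True, the staging certificate has been issued.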