Kubernetes

We are running a Kubernetes cluster on top of CloudStack.

User documentation is here: https://docs.cloud.csclub.uwaterloo.ca/kubernetes/

CloudStack setup

Enable the Kubernetes plugin from the CloudStack UI. This will require a restart of the management servers.

We currently have one control plane node and three worker nodes. Each node uses the same Compute offering (8 CPUs, 16 GB of RAM). Autoscaling is enabled, so CloudStack will automatically create more worker nodes if necessary.

The admin kubeconfig has been installed on biloba and chamomile.

Note that we cannot use Services of type LoadBalancer, because we already run our own load balancer (NGINX) outside of Kubernetes, which accepts all external traffic. To expose services, use Ingresses or NodePorts instead.
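
As a rough sketch (the hostname, Service name, and port below are all hypothetical), exposing a web service through the NGINX Ingress controller described in the next section looks like this:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress                    # hypothetical name
spec:
  rules:
  - host: example.csclub.uwaterloo.ca      # hypothetical hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example-service          # hypothetical Service in the same namespace
            port:
              number: 80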

NGINX Ingress

Read this first: https://kubernetes.github.io/ingress-nginx/deploy/#bare-metal-clusters

Then run:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.1.0/deploy/static/provider/baremetal/deploy.yaml

Get the NodePort:

kubectl -n ingress-nginx get svc

Create an upstream in /etc/nginx/nginx.conf which points to the IPs of one or more Kubernetes VMs, using the HTTP NodePort obtained above. Then reload NGINX on biloba and chamomile.
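
For illustration only (the upstream name, node IPs, and port are placeholders; use the real node IPs and the NodePort reported by the command above):

upstream k8s_ingress {
    # Kubernetes node IPs, each with the HTTP NodePort (placeholder values)
    server 172.19.134.150:30080;
    server 172.19.134.151:30080;
}

# referenced from the relevant server block, e.g.:
location / {
    proxy_pass http://k8s_ingress;
}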

Mark the NGINX IngressClass as the default IngressClass:

kubectl edit ingressclass nginx

This will open up Vim; add the annotation ingressclass.kubernetes.io/is-default-class: "true" to the annotations section.
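
After saving, the IngressClass should look roughly like this (unrelated fields omitted; the controller value is whatever the deploy manifest created, shown here with its upstream default):

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx
  annotations:
    ingressclass.kubernetes.io/is-default-class: "true"
spec:
  controller: k8s.io/ingress-nginx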

Edit the global ConfigMap:

kubectl -n ingress-nginx edit configmap ingress-nginx-controller

Add the following to the 'data' section:

  allow-backend-server-header: "true"
  use-forwarded-headers: "true"
  proxy-buffer-size: 128k
  server-snippet: |
    proxy_http_version 1.1;
    proxy_pass_header Connection;
    proxy_pass_header Upgrade;
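
To confirm the controller picked up the change, tail its logs; this assumes the Deployment created by the manifest above kept its default name, ingress-nginx-controller:

kubectl -n ingress-nginx logs deployment/ingress-nginx-controller --tail=20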

CSI Driver

We are using a CSI driver for PersistentVolume storage.

Installation:
UPDATE: don't apply the manifest directly; you'll need to download and edit it first. It seems like the labels on the control plane node changed starting from v1.24.
After downloading the manifest, open it in an editor and change node-role.kubernetes.io/master: "" to node-role.kubernetes.io/control-plane: "".
If you already applied the manifest and need to edit it, just run kubectl -n kube-system edit deployment cloudstack-csi-controller.

wget https://github.com/apalia/cloudstack-csi-driver/releases/latest/download/manifest.yaml
# Make necessary edits
vim manifest.yaml
kubectl apply -f manifest.yaml
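
To check that the driver came up, look for its pods (the exact names are whatever the upstream manifest creates, so a simple filter is enough):

kubectl -n kube-system get pods | grep cloudstack-csi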

To make this the default StorageClass, clone the repo, and edit examples/k8s/0-storageclass.yml so that it looks like this:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cloudstack-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.cloudstack.apache.org
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: false
parameters:
  csi.cloudstack.apache.org/disk-offering-id: 0da1f706-fd2e-4203-8bae-1b740aef9886

Change the disk-offering-id to the ID of the 'custom' disk size offering in CloudStack. Apply the YAML file once you are done editing it.
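
For example (the path is relative to the cloned repo):

kubectl apply -f examples/k8s/0-storageclass.yml
# cloudstack-storage should now be marked as "(default)"
kubectl get storageclass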

Note: only a single writer is allowed, so do NOT use ReadWriteMany on any PersistentVolumeClaims.
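
A minimal sketch of a valid claim (the name and size are hypothetical; note the access mode):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc            # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce            # ReadWriteMany is not supported by this driver
  storageClassName: cloudstack-storage
  resources:
    requests:
      storage: 5Gi             # hypothetical size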

Testing

Create a PersistentVolumeClaim and bind it to a Pod just to make sure that everything's working:

kubectl apply -f ./examples/k8s/pvc.yaml
kubectl apply -f ./examples/k8s/pod.yaml

Run kubectl get pv to make sure that a PersistentVolume was dynamically provisioned.

Once you're done testing, delete the resources:

kubectl delete -f ./examples/k8s/pvc.yaml
kubectl delete -f ./examples/k8s/pod.yaml

SSH'ing into a node

If you need to SSH into one of the Kubernetes nodes, get the IP from the CloudStack UI, and run e.g.

ssh -i /var/lib/cloudstack/management/.ssh/id_rsa core@172.19.134.149

(Do this from biloba or chamomile.)

Docker shim

The original CloudStack Kubernetes ISO which we used (v1.22) used Docker as the container engine, which is no longer supported; after upgrading to v1.24, all hell broke loose because kubelet tried to use containerd instead. As a workaround, we are using cri-dockerd (https://github.com/Mirantis/cri-dockerd) on the control plane and the worker nodes. Each VM should have this in /var/lib/kubelet/kubeadm-flags.env:

KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///run/cri-dockerd.sock --pod-infra-container-image=k8s.gcr.io/pause:3.7"

The solution was found here: https://discuss.kubernetes.io/t/unable-to-determine-runtime-api-version-rpc-error/20736
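
If kubeadm-flags.env ever needs to be edited again, restart kubelet afterwards and check that the node returns to Ready; a minimal sequence, assuming the usual systemd-managed kubelet set up by kubeadm:

sudo systemctl restart kubelet
systemctl status kubelet
# from biloba or chamomile (or anywhere with the admin kubeconfig):
kubectl get nodes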

Enabling a feature gate

UPDATE: starting from v1.23, the PodSecurity feature gate is enabled by default (https://v1-23.docs.kubernetes.io/docs/concepts/security/pod-security-admission/#enabling-the-podsecurity-admission-plugin), so there is no need to manually enable it. The rest of this section is kept for historical purposes only.

In v1.22, the PodSecurity feature gate is an Alpha feature and must be enabled in kube-apiserver (https://kubernetes.io/docs/concepts/security/pod-security-admission/).

SSH into the control node, and edit /etc/kubernetes/manifests/kube-apiserver.yaml so that the 'command' list has the following flag:

--feature-gates=PodSecurity=true

(If the flag is already present, add the gate after a comma, e.g. --feature-gates=Feature1=true,PodSecurity=true.)
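
For illustration, the relevant fragment of the manifest then looks roughly like this (every other existing flag stays as it is):

spec:
  containers:
  - command:
    - kube-apiserver
    - --feature-gates=PodSecurity=true
    # ...remaining flags unchanged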

This will automatically restart the kube-apiserver; wait a minute and run kubectl -n kube-system get pods to check.

Certificates

Admin kubeconfig

The kubeconfig in the CloudStack UI is only valid for one year (as of this writing, it has expired, so don't use it). If it expires again, here's how to renew it:

  1. SSH into the control plane VM (see instructions above).
  2. Create a file called e.g. kubeadm-config.yaml with this content:
    apiVersion: kubeadm.k8s.io/v1beta3
    kind: ClusterConfiguration
    clusterName: "kubernetes"
    controlPlaneEndpoint: "172.19.134.149:6443"
    certificatesDir: "/etc/kubernetes/pki"
  3. Generate a new admin kubeconfig which will last ten years:
    kubeadm kubeconfig user --config=kubeadm-config.yaml --client-name=kubernetes-admin --org=system:masters --validity-period=87600h0m0s
    Reference: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#kubeconfig-additional-users
  4. Copy the output into /root/.kube/config on biloba and chamomile.
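
Before overwriting the existing config, it can be worth sanity-checking the generated kubeconfig, e.g. by writing it to a temporary file first (the file name below is arbitrary):

kubeadm kubeconfig user --config=kubeadm-config.yaml --client-name=kubernetes-admin --org=system:masters --validity-period=87600h0m0s > new-admin.conf
kubectl --kubeconfig=new-admin.conf get nodes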

Kubelet certificate rotation

See https://kubernetes.io/docs/tasks/tls/certificate-rotation/.

SSH into the control plane and make sure that rotateCertificates: true is set in /var/lib/kubelet/config.yaml.
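
To double-check, you can grep the config and look at the expiry of the current kubelet client certificate (the pki path below is the kubeadm default):

grep rotateCertificates /var/lib/kubelet/config.yaml
sudo openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem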

What to do if the certificates expire

SSH into the control plane VM. To check the expired certs, run kubeadm certs check-expiration. From biloba or chamomile, you can also run kubectl -n kube-system get cm kubeadm-config -o yaml if the admin kubeconfig has not expired.

To renew the expired certificates, run

kubeadm certs renew all

You now need to restart kube-apiserver, kube-controller-manager, kube-scheduler and etcd. Unfortunately I was never able to figure out how to do this. Deleting the pods doesn't seem to work. We might need to restart all of the Docker containers running on the control plane. You can check the logs of the kube-apiserver pod to see if it's still having certificate expiry issues.
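
One way to check whether the API server is already serving a renewed certificate (run from a host that can reach the control plane endpoint; the IP is the controlPlaneEndpoint used above):

echo | openssl s_client -connect 172.19.134.149:6443 2>/dev/null | openssl x509 -noout -dates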

Anywho, the safe but slow option is to just restart all of the Kubernetes VMs from the CloudStack web UI.

To be sure that everything is working again, make sure that you can create a temporary pod successfully.
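
For example, a throwaway busybox pod is a quick smoke test (it is deleted automatically on exit; the pod name is arbitrary):

kubectl run cert-test --rm -it --restart=Never --image=busybox -- sh -c 'echo ok'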

Members

ceo manages the creation of new Kubernetes namespaces for members. See here for how this works.

We are also using OPA Gatekeeper (https://open-policy-agent.github.io/gatekeeper/website/docs/) to restrict the Ingresses which members can create. See https://git.csclub.uwaterloo.ca/cloud/manifests/src/branch/master/cscingressconstraint-template.yaml and https://git.csclub.uwaterloo.ca/cloud/manifests/src/branch/master/cscingressconstraint-constraint.yaml for details.

Certificate Signing Requests

We set the maximum CSR signing duration to 10 years so that members don't have to worry about their kubeconfig certs expiring (at least, not for a long time).

SSH into the control node and edit /etc/kubernetes/manifests/kube-controller-manager.yaml so that it has the following CLI flag:

--cluster-signing-duration=87600h

The controller will automatically restart after you save and close the file.
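
To confirm that the restarted controller is actually running with the new flag, you can grep the live (mirror) pod spec; the component label below is what kubeadm puts on its static pods:

kubectl -n kube-system get pod -l component=kube-controller-manager -o yaml | grep cluster-signing-duration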