Setting up a new cluster¶
These are notes from winter 2021 about setting up a new cluster from scratch.
Resources¶
setup pages for this project
Assumptions¶
have a gcloud project (ea-jupyter) and the gcloud command line tools set up
have kubectl and helm 3 installed
Long version¶
What follows is “this is what I did, including mistakes that needed to be fixed” rather than “follow these steps exactly as a tutorial”.
Creating a new kubernetes cluster¶
Create a new cluster called jhub2 in the ea-jupyter project:
gcloud container clusters create jhub2 \
--num-nodes=1 --machine-type=n1-standard-2 \
--zone=us-central1-b --image-type=cos_containerd \
--enable-autoscaling --max-nodes=3 --min-nodes=1
Reference https://cloud.google.com/sdk/gcloud/reference/container/clusters/create
Changes from cluster creation in existing docs are:
use the default version of kubernetes rather than specifying a version. You can see the default version of kubernetes using gcloud container get-server-config. At this moment, the defaultVersion on the RELEASE channel is 1.17.13-gke.2600.
use the containerd image type rather than the default cos type because the latter is being deprecated, see https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd
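To see only the version information rather than the full server config, the output can be narrowed with gcloud's --format flag. This is a sketch: the zone is the one used for jhub2 above, and the field names (defaultClusterVersion, channels) are my understanding of what the server config contains, so adjust them if your gcloud release prints something different.

```shell
# Show the static default version and the per-channel defaults for this zone.
# Field names are an assumption about the get-server-config output.
gcloud container get-server-config \
  --zone us-central1-b \
  --format "yaml(defaultClusterVersion,channels)"
```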
Output from cluster creation includes the following warnings:
WARNING: Starting in January 2021, clusters will use the Regular release channel by default when `--cluster-version`, `--release-channel`, `--no-enable-autoupgrade`, and `--no-enable-autorepair` flags are not specified.
WARNING: Currently VPC-native is not the default mode during cluster creation. In the future, this will become the default mode and can be disabled using `--no-enable-ip-alias` flag. Use `--[no-]enable-ip-alias` flag to suppress this warning.
WARNING: Starting with version 1.18, clusters will have shielded GKE nodes by default.
WARNING: Your Pod address range (`--cluster-ipv4-cidr`) can accommodate at most 1008 node(s).
and this status:
NAME   LOCATION       MASTER_VERSION    MASTER_IP       MACHINE_TYPE   NODE_VERSION      NUM_NODES  STATUS
jhub2  us-central1-b  1.16.15-gke.4901  35.184.210.231  n1-standard-2  1.16.15-gke.4901  1          RUNNING
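The 1008-node limit in the last warning above can be reproduced with a little arithmetic. This sketch assumes the GKE defaults for a routes-based cluster, none of which are stated in these notes: a /14 Pod range, a /20 Services range carved out of it, and a /24 block per node.

```shell
# Assumed GKE defaults (not taken from the notes above)
pod_range_prefix=14       # --cluster-ipv4-cidr default size
service_range_prefix=20   # Services range taken out of the Pod range
node_range_prefix=24      # Pod addresses reserved per node

total_addresses=$(( 1 << (32 - pod_range_prefix) ))
service_addresses=$(( 1 << (32 - service_range_prefix) ))
addresses_per_node=$(( 1 << (32 - node_range_prefix) ))

# Nodes that fit once the Services range is subtracted
max_nodes=$(( (total_addresses - service_addresses) / addresses_per_node ))
echo "$max_nodes"   # prints 1008
```

With a maximum of 3 core nodes and 5 user nodes, this limit is nowhere near a concern for this cluster.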
Then create a static IP:
gcloud compute addresses create jhub2-ip --region us-central1
List the ip addresses for the cluster(s):
$ gcloud compute addresses list
NAME      ADDRESS/RANGE   TYPE      PURPOSE  NETWORK  REGION       SUBNET  STATUS
jhub-ip   35.226.96.84    EXTERNAL                    us-central1          IN_USE
jhub2-ip  35.225.148.166  EXTERNAL                    us-central1          RESERVED
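If only the new address is needed (for example to paste into a DNS record), it can be read directly instead of scanning the list. A small sketch using the address name and region from the commands above:

```shell
# Print just the reserved IP for jhub2-ip
gcloud compute addresses describe jhub2-ip \
  --region us-central1 \
  --format "value(address)"
```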
This is a single node cluster with the n1-standard-2 node type. In the current jhub cluster, the core-pool is a custom node type and there is also a separate node pool for users. There does not seem to be documentation about setting this up in our notes, but the z2jh documentation has notes on setting up a node pool on GKE (see step 7).
Create the user pool (using n1-standard-8 nodes with the autoscaling maximum at 5 nodes):
gcloud beta container node-pools create user-pool \
--machine-type n1-standard-8 \
--num-nodes 0 \
--enable-autoscaling \
--min-nodes 0 \
--max-nodes 5 \
--node-labels hub.jupyter.org/node-purpose=user \
--node-taints hub.jupyter.org_dedicated=user:NoSchedule \
--zone us-central1-b \
--cluster jhub2
I forgot to set the image type to cos_containerd to match the core-pool, so I changed that using the web console UI.
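For next time, the image type can be set when the pool is created, which avoids the manual fix in the console. This is a sketch of the same command with the --image-type flag added. It is not what was actually run for jhub2.

```shell
# Same user pool as above, with the image type set up front
gcloud beta container node-pools create user-pool \
  --machine-type n1-standard-8 \
  --image-type cos_containerd \
  --num-nodes 0 \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 5 \
  --node-labels hub.jupyter.org/node-purpose=user \
  --node-taints hub.jupyter.org_dedicated=user:NoSchedule \
  --zone us-central1-b \
  --cluster jhub2
```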
Note that creating a cluster changes the default context for kubectl.
When you create a cluster using gcloud container clusters create, an entry is automatically added to the kubeconfig in your environment, and the current context changes to that cluster (from https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl).
Other users will need to change their context using gcloud container clusters get-credentials cluster-name.
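For this cluster, that would look roughly like the following, using the cluster name, zone and project from earlier in these notes:

```shell
# Add jhub2 to the local kubeconfig and make it the current context
gcloud container clusters get-credentials jhub2 \
  --zone us-central1-b \
  --project ea-jupyter

# Confirm which cluster kubectl is now pointing at
kubectl config current-context
```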
Install JupyterHub¶
Following https://zero-to-jupyterhub.readthedocs.io/en/latest/jupyterhub/installation.html:
$ helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
$ helm repo update
Add the secret token from secrets/staginghub.yaml to config.yaml as specified in the z2jh docs (I will set up the decryption of secrets later, before committing this file!) and then install the helm chart:
helm upgrade staginghub jupyterhub/jupyterhub --install \
--cleanup-on-fail --namespace staginghub --create-namespace \
--version 0.10.6 --values config.yaml
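To check that the install worked, the z2jh docs suggest looking at the pods and at the public proxy service. A sketch for the staginghub namespace used above:

```shell
# The hub and proxy pods should reach the Running state
kubectl get pods --namespace staginghub

# At this stage the chart exposes the hub through the proxy-public service
kubectl get service proxy-public --namespace staginghub
```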
Set up docker image and gitpuller¶
Following https://zero-to-jupyterhub.readthedocs.io/en/latest/jupyterhub/customizing/user-environment.html, I added the following to config.yaml:
singleuser:
  image:
    name: earthlabhubops/ea-k8s-user-staginghub
    tag: 9d034c2
  lifecycleHooks:
    postStart:
      exec:
        command: ["gitpuller", "https://github.com/earthlab-education/ea-bootcamp-fall-2020", "master", "ea-bootcamp-shared"]
Remove the token from config.yaml and provide it on the command line when we upgrade (also add a timeout to allow for downloading the image):
helm upgrade --cleanup-on-fail staginghub jupyterhub/jupyterhub --namespace staginghub --version 0.10.6 --timeout 600s --debug -f config.yaml -f ../../secrets/staginghub.yaml
Ingress and https¶
Ingress¶
In order to have multiple hubs at the same URL (e.g. hub.earthdatascience.org/hub1, hub.earthdatascience.org/hub2, etc) we need to set up an ingress controller. As recommended by the z2jh team, we use kubernetes/ingress-nginx. Following the ingress-nginx Helm installation instructions:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
kubectl create namespace ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx -n ingress-nginx
The output includes the following info:
An example Ingress that makes use of the controller:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: example
  namespace: foo
spec:
  rules:
    - host: www.example.com
      http:
        paths:
          - backend:
              serviceName: exampleService
              servicePort: 80
            path: /
  # This section is only required if TLS is to be enabled for the Ingress
  tls:
    - hosts:
        - www.example.com
      secretName: example-tls
If TLS is enabled for the Ingress, a Secret containing the certificate and key must also be provided:
apiVersion: v1
kind: Secret
metadata:
  name: example-tls
  namespace: foo
data:
  tls.crt: <base64 encoded cert>
  tls.key: <base64 encoded key>
type: kubernetes.io/tls
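Before moving on, it is worth checking that the controller is running and has been given an external IP. A sketch, assuming the chart names its service ingress-nginx-controller (which is what I would expect for a release called ingress-nginx, but check the output of the first command if it differs):

```shell
# The controller pod should be Running
kubectl get pods --namespace ingress-nginx

# EXTERNAL-IP shows the load balancer address once it has been provisioned
kubectl get service ingress-nginx-controller --namespace ingress-nginx
```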
Cert-manager¶
Now we need a TLS certificate manager for https. Here, we deviate from the z2jh documentation and use cert-manager rather than the (deprecated) kube-lego. Following the cert-manager installation guide, specifically the parts about installing with helm:
kubectl create namespace cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
Then install the custom resource definitions (CRDs):
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager.crds.yaml
And install the helm chart:
helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v1.1.0
Check the installation:
kubectl get pods --namespace cert-manager
Now you need to install a clusterIssuer resource (this is very poorly documented in the cert-manager docs, presumably because they assume their users know more about k8s than I do).
Create a cluster-issuer.yaml file based on the ACME template, using:
name=letsencrypt-prod
email=Leah.Wasser@colorado.edu
url=https://acme-v02.api.letsencrypt.org/directory
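Since the template itself is not reproduced here, this is a sketch of what the resulting file might look like, written out with a shell heredoc. The name, email and server URL are the values listed above. The privateKeySecretRef name and the http01 solver using the nginx ingress class are my assumptions, based on the general shape of a cert-manager ACME ClusterIssuer, and may differ from the file actually used.

```shell
# Write a ClusterIssuer manifest to cluster-issuer.yaml
cat > cluster-issuer.yaml <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: Leah.Wasser@colorado.edu
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Assumption: secret that stores the ACME account key
      name: letsencrypt-prod-account-key
    solvers:
      # Assumption: HTTP-01 challenges served through the nginx ingress
      - http01:
          ingress:
            class: nginx
EOF
```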
And create (and check) the clusterissuer:
kubectl create -f cluster-issuer.yaml
kubectl describe clusterissuer letsencrypt-prod
Updating values.yaml¶
Add the following setup to your values.yaml file:
proxy:
  service:
    type: ClusterIP
hub:
  baseUrl: /staginghub/
ingress:
  enabled: true
  hosts:
    - hub.earthdatascience.org
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
  tls:
    - secretName: cert-manager-tls
      hosts:
        - hub.earthdatascience.org
Then upgrade the helm release:
helm upgrade --cleanup-on-fail staginghub jupyterhub/jupyterhub --namespace staginghub --version 0.10.6 --timeout 600s --debug -f config.yaml -f ../../secrets/staginghub.yaml
I had to delete the proxy-public service that got created before switching over to manual ingress setup:
kubectl delete service proxy-public -n staginghub
and then run the helm upgrade again.
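Once the upgrade has gone through, the ingress and the certificate can be checked from the command line. A sketch, assuming cert-manager creates a Certificate named after the secretName in the ingress tls section (cert-manager-tls here):

```shell
# The ingress should list hub.earthdatascience.org and an address
kubectl get ingress --namespace staginghub

# READY becomes True once Let's Encrypt has issued the certificate
kubectl get certificate --namespace staginghub

# If it stays not ready, the events at the bottom usually explain why
kubectl describe certificate cert-manager-tls --namespace staginghub
```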
GKE version updating¶
In the GCloud console UI, find the jhub2 GKE cluster, and the release channel option. Change the setting from Static version to Release channel and choose the Stable channel. This ensures that the kubernetes version will be automatically updated. Note that this will not be true for the core-pool, since there is only one node.
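The same change can probably be made from the command line instead of the console. I did it through the UI, so treat this as an untested sketch using the cluster name and zone from above:

```shell
# Enroll the cluster in the Stable release channel
gcloud container clusters update jhub2 \
  --zone us-central1-b \
  --release-channel stable

# Check which channel the cluster is on
gcloud container clusters describe jhub2 \
  --zone us-central1-b \
  --format "value(releaseChannel.channel)"
```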
Deployment¶
We are using GitHub Actions for continuous integration - building and pushing docker images, and deploying updates to the cluster.
Create a gcloud service account and assign the Kubernetes Engine Admin role (roles/container.clusterAdmin). See gcloud iam docs for details about permissions, or run gcloud iam roles describe roles/container.clusterAdmin.
The Kubernetes Developer role is not sufficient because the jupyter helm chart requires the ability to add and delete rbac roles as part of installing the pre-upgrade hooks.
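A sketch of doing this with gcloud rather than the console. The service account name and display name are hypothetical, and the role ID is copied as written in the note above, so run the describe command first to confirm it carries the permissions the deployment needs.

```shell
PROJECT=ea-jupyter
SA_NAME=github-actions-deploy   # hypothetical name, pick your own
SA_EMAIL="${SA_NAME}@${PROJECT}.iam.gserviceaccount.com"
ROLE=roles/container.clusterAdmin   # role ID as given in these notes

# Inspect what the role allows before granting it
gcloud iam roles describe "$ROLE"

# Create the service account
gcloud iam service-accounts create "$SA_NAME" \
  --project "$PROJECT" \
  --display-name "GitHub Actions deploy"

# Grant the role on the project
gcloud projects add-iam-policy-binding "$PROJECT" \
  --member "serviceAccount:${SA_EMAIL}" \
  --role "$ROLE"

# Create a key file for the CI workflow to authenticate with
gcloud iam service-accounts keys create key.json \
  --iam-account "$SA_EMAIL"
```

The key file is a credential: store its contents as an encrypted secret in the GitHub repository settings and do not commit it.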