Monitoring the Hubs¶
To get an overview of the health of the hubs and infrastructure visit the Grafana page.
The Hub monitor
page lets you see how many pods are running, launch success
rate, which users are using a lot of CPU, etc. For each hub separately.
The Node monitor
page contains information about each of the compute nodes
that are part of the cluster.
The best way to use the monitoring is to watch it for a while when not a lot is happening to get a feeling for what “baseline” looks like. Then login to a hub and observe what it looks like on the monitoring. Next time a workshop is happening keep an eye on the monitoring to see what happens when lots of people login at the same time.
For more about the our Grafana setup, see Grafana and Prometheus.
Operations¶
This section contains commands and snippets that let you inspect the state of the cluster and perform tasks that are useful when things are broken.
To perform these commands you need to have kubectl
installed and setup
on your local laptop (see Google Cloud & Kubernetes Tools for details).
Inspecting what is going on in the cluster¶
To see what is running on the cluster (and get a feeling for what is the normal
state of affairs) run kubectl get pods --all-namespaces
. This will list all
pods that are running, including service pods that we don’t ever interact with.
You should see at least two pods in each namespace that is associated to a hub.
The namespace and hubname are the same, so an eahub
hub lives in the
eahub
namespace.
Pods in the kube-system
, monitoring
and router
namespaces are best
left alone.
Each hub specific namespace should contain at least two pods: hub-77fbd96bb-dh2b5
and proxy-6549f4fbc8-8zn67
. Everything after hub- and proxy- will change
when you restart the hub or make configuration changes. The status of these
two pods should be Running
.
To see what a pod is printing to its terminal run kubectl logs <podname> --namespace <hubname>
.
This will let you see errors or exceptions that happened.
You can find out more about a pod by running kubectl describe pod <podname> --namespace <hubname>
.
This will give you information on why a pod is not running or what it is up to
while you are waiting for it to start running.
When someone logs in to the hub and starts their server a new pod will appear in
the namespace of the hub. The pod will be named jupyter-<username>
. You can
inspect it with the usual kubectl
commands.
Known problems and solutions¶
There is a known problem in one of JupyterHub’s components that means sometimes
the hub misses that a user’s pod has started and will keep that user waiting
after they logged in. The symptoms of this are that there is a running pod for
a user in the right namespace but the login process does not complete. In this
case restart the offending hub by running kubectl delete pod hub-... --namespace <hubname>
,
replacing the … in the hub name with the proper name of the hub pod. This should
not interrupt currently active users and fixes a lot of things that can go wrong.
Inspecting virtual machines¶
To tell how many virtual machines (or nodes) are part of the cluster run
kubectl get nodes
. There should be at least one node with core-pool
in
its name running at all times. Once users login and start their servers new
nodes will appear that have user-pool
in their name. These nodes are
automatically created and destroyed based on demand.
You can learn about a node by running kubectl describe node <nodename>
.