Customize Your Hub¶

This is a collection of snippets and pointers to further customize your hub. Common customization include the image used to run the hub. A summary of the different ways to customize images:

hub-chart images: in the helm chart, you can specify a pre-built docker image for the JupyterHub environment; set in :code`hub-charts/<hubname>/values.yaml`
user image: when none of the pre-built images are right, you can create a Dockerfile in this repo that gets built as part of deployment; lives in user-images/<hubname>/Dockerfile
hub images: these alter the environment in which JupyterHub runs (as opposed to the packages installed in JupyterHub); we don’t use this currently, but added it when testing hash authentication (see Hub Authentication).

Custom user image¶

Each hub can have a different environment, set of libraries and tools that is provided to students. Hub’s have a default user image, but it does not contain many tools useful for doing science. However you can use it to test the rest of your hub’s setup.

The Jupyter docker stacks provide a good collection of user images to start from. They are maintained by the Jupyter team and updated reasonably often. They already work with JupyterHub so you can quickly get going. Take a look at the relationship between images to get an idea of how the images relate to each other and what is installed in each.

To configure your hub to use the datascience-notebook image edit your hub-charts/<hubname>/values.yaml and add the following snippet:

jupyterhub:
  singleuser:
    image:
      name: jupyter/datascience-notebook
      tag: 135a595d2a93
    startTimeout: 600

You have to specify both the name and an explicit tag. You can not use latest as tag.

Pulling images from docker hub can be a bit slow at times. This means it is a good idea to increase the startTimeout to 600 seconds as shown above.

Self-made user image¶

Sometimes none of the images available as part of the docker-stacks is enough and you want to build a custom image. A good way to get started with this is to base your work on a docker-stacks image that does most of what you need and then customise it further. If you need the libraries from the earth-analytics-python-env you can start from that Docker image as well.

To create a self-made user image create a new directory in the user-images/ directory that has the same name as your hub.

In this directory place a Dockerfile. This will be automatically built by travis. To allow travis to find images and determine which need rebuilding when you need to follow this naming convention.

Below an example of a minimally modified earth-analytics-python-env docker image. It picks a specific tag of the earth-analytics-python-env and then installs JupyterHub version 0.9.2. It also installs the nbzip notebook extension that lets students download the contents of their JupyterHub home directories as a ZIP file to their local machine. The three commands that install and enable the extension are typical for notebook extensions.

FROM earthlab/earth-analytics-python-env:41ae80f

RUN pip install --no-cache --upgrade --upgrade-strategy only-if-needed \
  jupyterhub==0.9.2 nbzip==0.1.0

RUN jupyter serverextension enable --py nbzip --sys-prefix
RUN jupyter nbextension install --py nbzip --sys-prefix
RUN jupyter nbextension enable --py nbzip --sys-prefix

This image will be automatically built by travis. You will need to adjust your hub’s values.yaml to use this image:

jupyterhub:
  singleuser:
    image:
      # tag will be set by travis on deployment
      name: earthlabhubops/ea-k8s-user-<hubname>
      tag: set-on-deployment
    startTimeout: 600

By following the convention that the custom user image for your hub is placed in user-images/<hubname> your docker image will be called earthlabhubops/ea-k8s-user-<hubname>. You do not have to set the tag by hand, travis will take care of that for you.

Pulling images from docker hub can be a bit slow at times. This means it is a good idea to increase the startTimeout to 600 seconds as shown above.

Prefetching data¶

It can be worth prefetching data for your students and including it directly in the docker image. This means they will not have to wait when the course starts. The downside is that your docker image gets bigger. Unfortunately we can not directly add data to student’s home directories. We can only bake this data into the docker image used for each user. In this example we also setup the necessary steps for the data to be copied over to each student’s home directory when they log into the hub.

To include data in your docker image create a custom user image for your hub by following Self-made user image.

An example of using earthpy to download the spatial-vector-lidar dataset is given below:

# Have to explicitly change the matplotlib backend in order to use
# earthpy on the command line.
RUN python -c "import matplotlib; matplotlib.use('Agg'); import earthpy; data = earthpy.io.EarthlabData('/data'); data.get_data('spatial-vector-lidar')"

The general idea is to execute a Python command to trigger the download and store the results in /data. You could use any kind of command to do this. For example you could use wget to fetch a dataset from FigShare or any other website. Try out your command locally to make sure it does exactly what you think it should do.

You can place the data in almost any location inside the container. By convention we use /data though.

If all you need is that the data is available in the container then you are done now. If you’d like to also copy the data over to the students home directory read the below snippet:

jupyterhub:
  singleuser:
    lifecycleHooks:
      postStart:
        exec:
          command:
            - "sh"
            - "-c"
            - >
              mkdir -p /home/jovyan/earth-analytics/data;
              rsync --ignore-existing -razv --progress /data/ /home/jovyan/earth-analytics/data;

The lifecycleHooks entry in the values.yaml of your hub give you the option to run commands when a user’s pod starts. You can place any command here. Keep in mind that the user can start interacting with their pod already before these commands complete. This means you want commands in this section to run reasonably quickly. Otherwise users might be confused or interfere with the commands here.

The above snippet does two things: it makes sure that the earth-analytics/data directory exists in the users home directory. After that it uses rsync to copy the data from /data to this directory. The way rsync is configured means that it will not overwrite files that already exist in the user’s home directory. The assumption is that a user might have edited these files and does not want them to be overwritten. If users want to refresh their datasets because they broke something they can delete that file or dataset, stop their server, and then restart it. They should now have the latest version of the data again. Or they can run the above rsync command manually.

Self-made hub image¶

You can customise the image and environment in which the JupyterHub itself runs. This is useful when you want to use custom authenticators. To create a custom hub image create a directory called hub-images/<hubname>.

An example of installing the Hash authenticator is given here:

# the tag given here has to be compatible with the version of the
# helm chart you are using for this hub.
FROM jupyterhub/k8s-hub:f8dec3f

USER root
RUN pip3 install --no-cache-dir \
         jupyterhub-hashauthenticator==0.4.0

USER ${NB_USER}

This image will be automatically built by travis. You will need to adjust your hub’s values.yaml to use this image:

jupyterhub:
  hub:
    image:
      # tag will be set by travis on deployment
      name: earthlabhubops/ea-k8s-hub-<hubname>
      tag: set-on-deployment

By following the convention that the custom hub image for your hub is placed in hub-images/<hubname> your hub’s docker image will be called earthlabhubops/ea-k8s-hub-<hubname>. You do not have to set the tag by hand, travis will take care of that for you.

Custom authentication¶

To configure the authentication mechanism read Hub Authentication.

Setting up a new cluster