Today, we’re diving into setting up Dask and Jupyter on a Kubernetes cluster hosted on AWS. If you haven’t already got a Kubernetes cluster up and running, you might want to check out my previous guide on how to set it up.
Before we start, here’s a handy YouTube tutorial demonstrating the process of adding Dask and Jupyter to an existing Kubernetes cluster, following the steps below:
Step 1: Install Helm
Helm is like the magic wand for managing Kubernetes packages. We’ll kick off by installing Helm. On Mac OS X, it’s as easy as using brew:
```bash
brew update && brew install kubernetes-helm
helm init
```
Once Helm is initialized, you’ll get a confirmation message stating that Tiller (the server-side component of Helm) has been successfully installed into your Kubernetes cluster.
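If you’d like to double-check (assuming kubectl is already configured against your cluster), you can look for the Tiller pod and compare client and server versions:

```bash
# Tiller runs as a deployment in the kube-system namespace
kubectl get pods --namespace kube-system -l name=tiller

# Seeing both a client and a server version means Helm can reach Tiller
helm version
```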
Step 2: Install Dask
Now, let’s install Dask using Helm charts. Helm charts are curated application definitions specifically tailored for Helm. First, we need to update the known charts channels and then install the stable version of Dask:
```bash
helm repo update
helm install stable/dask
```
Oops! Looks like we’ve hit a snag. Despite having Dask in the stable Charts channels, the installation failed. The error message hints that we need to grant the serviceaccount API permissions. This involves some Kubernetes RBAC (Role-based access control) configurations.
Thankfully, a StackOverflow post provides us with the solution:
```bash
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
helm init --service-account tiller --upgrade
```
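To verify the patch took effect, you can read the service account back off the Tiller deployment (a quick sanity check; the jsonpath below assumes the standard field Kubernetes records for the pod spec):

```bash
# Should print "tiller" once the patch has been applied
kubectl get deploy tiller-deploy --namespace kube-system \
  -o jsonpath='{.spec.template.spec.serviceAccountName}'
```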
Let’s give installing Dask another shot:
```bash
helm install stable/dask
```
Voila! Dask is now successfully installed on our Kubernetes cluster. Helm has assigned the release the name “running-newt”, so you’ll notice various resources such as pods and services prefixed with “running-newt”. By default, the deployment includes a dask-scheduler, a dask-jupyter, and three dask-worker processes.
Also, take note of the default Jupyter password: “dask”. We’ll need it to log in to our Jupyter server later.
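To see everything the chart created, one option is to filter the cluster’s resources by the release name (substitute whatever name Helm generated for you in place of “running-newt”):

```bash
# Pods and services created by the release; expect a scheduler,
# a Jupyter server, and three workers
kubectl get pods,services | grep running-newt
```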
Step 3: Obtain AWS DNS Entry
Before we can access our deployed Jupyter server, we need to determine the URL. Let’s list all services in the namespace:
```bash
kubectl get services
```
The EXTERNAL-IP column shows generated identifiers for AWS ELB (Elastic Load Balancer) instances. Match the EXTERNAL-IP of the Jupyter service to the corresponding load balancer in your AWS console (EC2 -> Load Balancers) to obtain its public DNS entry.
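If you’d rather skip the console, the ELB hostname can usually be read straight off the service object once AWS has provisioned the load balancer. The service name below is a guess based on our example release; use whichever name `kubectl get services` showed for the Jupyter service:

```bash
# Prints the public DNS name of the load balancer fronting Jupyter
kubectl get service running-newt-jupyter \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```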
Step 4: Access Jupyter Server
Now, fire up your browser and head over to the Jupyter server using the obtained DNS entry. You’ll be prompted to enter the Jupyter password, which, as we remember, is “dask”. And there you have it – you’re all set to explore Dask and Jupyter on your Kubernetes cluster!
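Once you’re in, a quick smoke test from a notebook cell confirms the workers are reachable. This is a sketch: the chart sets a DASK_SCHEDULER_ADDRESS environment variable in the Jupyter pod, which Client() should pick up automatically; if it doesn’t, pass the scheduler service’s address explicitly.

```python
from dask.distributed import Client
import dask.array as da

# With DASK_SCHEDULER_ADDRESS set (as in the chart's Jupyter pod),
# Client() connects to the cluster scheduler; without it, Dask
# falls back to a local cluster in the notebook process.
client = Client()

# Sum a modest random array split into chunks across the workers
x = da.random.random((2000, 2000), chunks=(500, 500))
total = x.sum().compute()
print(total)  # roughly 2,000,000 for 4 million uniform(0, 1) samples
```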