Running Spark + Scala + Jupyter on Dataproc

I haven't yet managed to get Spark, Scala, and Jupyter to co-operate. Does anyone have a simple recipe? Which version of each component did you use?

Apache Toree is compatible with Dataproc's 1.0 image, which currently includes Spark 1.6.1. I had tried, unsuccessfully, to use it with the preview image, which includes the Spark 2.0 preview. To install Toree on the Dataproc master, you can run:
# Install pip and a per-user copy of Jupyter
sudo apt install python3-pip
pip3 install --user jupyter
# Point Toree at the Spark that ships with Dataproc
export SPARK_HOME=/usr/lib/spark
pip3 install --pre --user toree
# Make the user-level scripts (jupyter, etc.) visible on the PATH
export PATH=$HOME/.local/bin:$PATH
# Register the Toree (Scala) kernel with Jupyter
jupyter toree install --user --spark_home=$SPARK_HOME
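Once that completes, you can sanity-check the kernel registration and start the notebook server (a minimal check; the port number here is an arbitrary choice, not from the original answer):
jupyter kernelspec list    # should list an apache_toree_scala kernel
jupyter notebook --no-browser --port=8123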

Spark comes standard on Dataproc clusters.
Here is a gcloud command you can use to create a Dataproc cluster (named "dplab") that includes Jupyter listening on port 8124:
$ gcloud dataproc clusters create dplab \
--initialization-actions \
gs://dataproc-initialization-actions/jupyter/jupyter.sh \
--metadata "JUPYTER_PORT=8124" \
--zone=us-central1-c
Then run this command to port-forward from your host to the cluster master:
$ gcloud compute ssh dplab-m \
--ssh-flag="-Llocalhost:8124:localhost:8124" --zone=us-central1-c
Open localhost:8124 in your browser and you should see the Jupyter page.
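If the page doesn't come up, you can first confirm the tunnel is answering from your host (a convenience check, not part of the original recipe):
$ curl -sI http://localhost:8124 | head -n1
A 200 or 302 status line means Jupyter is reachable through the tunnel.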

Related

Pyspark kernel error while running jupyter notebook on Amazon EMR cluster

I'm new to AWS EMR. I would like to run a Jupyter notebook on my cluster from the command line with the pyspark kernel.
To create the cluster I ran the following command:
aws emr create-cluster --release-label emr-5.32.0 --name 'spark_jupyter_2' \
  --applications Name=Hadoop Name=Spark Name=Livy Name=JupyterEnterpriseGateway Name=Hive \
  --ec2-attributes KeyName=…………..,InstanceProfile=EMR_EC2_DefaultRole \
  --service-role EMR_DefaultRole \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.xlarge \
                    InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
  --region eu-central-1 --log-uri s3://………….../logs/ --no-termination-protected
Then I installed Jupyter with sudo pip3 install jupyter.
Then I signed in again with ssh -i "…………...pem" -L 8888:localhost:8888 hadoop@ec2-…………..compute.amazonaws.com, ran jupyter notebook, and went to the web page from the link on the screen.
Up to this step everything went smoothly, but when I tried to run a notebook with the pyspark kernel I got a kernel error.
I don't completely understand why that is. When I run pyspark from the command line there's no error.
What do I have to do to run Jupyter with the pyspark kernel without any errors?
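Since pyspark works from the command line, one common approach (a sketch, not from this thread) is to skip the separate kernel and let pyspark launch Jupyter itself, so the notebook inherits a working Spark configuration:
# Tell pyspark to start Jupyter as its driver front-end
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser --port=8888'
pyspark    # now serves a notebook whose Python kernel has the Spark context predefined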

Kubeflow kale failed to connect Rok module

I am trying to integrate Kubeflow Kale into JupyterLab. For that, I installed the recommended packages using the command below:
RUN pip3 --no-cache-dir install \
--upgrade pip \
urllib3==1.24.3 \
jupyter-client==6.1.5 \
nbformat==5.0.2 \
six==1.15 \
numpy==1.17.3 \
jupyter-console==6.0.0 \
jupyterlab==1.1.1 \
jupyterthemes \
xgboost \
kubeflow-fairing==1.0.0 \
kubeflow-kale
# Kale installation
RUN jupyter labextension install kubeflow-kale-launcher
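For reference, an image from a Dockerfile like this would typically be built with a command along these lines (the tag name is illustrative):
docker build -t kale-jupyterlab .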
The Docker image was built successfully. When I run this JupyterLab in the cluster, I get the below error:
Details: Rok Gateway Client module not found
Do I need to install any other plugins?
Can anyone help me fix this problem? Thanks in advance.
You need to install Rok on your Kubernetes cluster.
Without Rok, you can't use it.

How to use Confluent CLI on docker

I have started Confluent Platform on Windows 10 using Docker, with the help of https://docs.confluent.io/current/quickstart/ce-docker-quickstart.html. Now I want to try the Confluent CLI, but I don't see any documentation on how to use the CLI with Docker. Can you please suggest how I can do this?
Confluent does not provide a Docker image for the CLI at this time (that I'm aware of). Until then, you can build a simple image locally to package up the CLI for experimenting with the command.
Create Dockerfile:
FROM ubuntu:latest
RUN apt update && apt upgrade -y
RUN apt install -y curl
RUN curl -L --http1.1 https://cnfl.io/cli | sh -s -- -b /usr/local/bin
Then build with:
docker build -t confluent-cli:latest .
Then run on the cp-all-in-one network with:
$ docker run -it --rm --network="cp-all-in-one_default" confluent-cli:latest bash
Then, from the container's shell, experiment with the command:
root@421e53d4a04a:/# confluent
Manage your Confluent Platform.
Usage:
confluent [command]
Available Commands:
cluster Retrieve metadata about Confluent clusters.
completion Print shell completion code.
help Help about any command
iam Manage RBAC, ACL and IAM permissions.
local Manage a local Confluent Platform development environment.
login Log in to Confluent Platform (required for RBAC).
logout Logout of Confluent Platform.
secret Manage secrets for Confluent Platform.
update Update the confluent CLI.
version Print the confluent CLI version.
Flags:
-h, --help help for confluent
-v, --verbose count Increase verbosity (-v for warn, -vv for info, -vvv for debug, -vvvv for trace).
--version version for confluent
Use "confluent [command] --help" for more information about a command.
Here is the image:
https://hub.docker.com/r/confluentinc/confluent-cli
Basically run the following commands:
devbox1@devbox1:~/onibex/wa$ docker pull confluentinc/confluent-cli
devbox1@devbox1:~/onibex/wa$ docker run confluentinc/confluent-cli
To check if the image was added:
devbox1@devbox1:~/onibex/wa$ docker ps -a | grep confluent-cli
a5ecf9223d35 confluentinc/confluent-cli
Add "sudo" if it is needed.

Kubernetes fission create env error

I've installed the fission CLI with
curl -Lo fission https://github.com/fission/fission/releases/download/v0.2.1/fission-cli-osx && chmod +x fission && sudo mv fission /usr/local/bin/
Now I want to create a Node.js environment:
fission env create --name nodejs --image fission/node-env:v0.2.1
but it returns an error:
/usr/local/bin/fission: cannot execute binary file: Exec format error
What should I do?
There are two versions of fission available, one for Linux and one for Mac OS X; see here: http://fission.io/docs/0.3.0/install/
It looks like you have downloaded the fission CLI for OS X and are using it on Linux. See here: http://fission.io/docs/0.3.0/install/#linux
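Assuming the Linux binary follows the same naming pattern as the OS X one from the question (check the release page to confirm the asset name), the fix would look like:
curl -Lo fission https://github.com/fission/fission/releases/download/v0.2.1/fission-cli-linux && chmod +x fission && sudo mv fission /usr/local/bin/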

Local Kubernetes on CentOS

I am trying to install Kubernetes locally on my CentOS machine. I am following this blog, http://containertutorials.com/get_started_kubernetes/index.html, with appropriate changes for CentOS and the latest Kubernetes version.
The ./kube-up.sh script runs and exits with no errors, but I don't see the server started on port 8080. Is there a way to find out what the error was, and is there any other procedure to follow on CentOS 6.3?
The easiest way to install a Kubernetes cluster is using kubeadm. The initial post detailing the setup steps is here, and the detailed kubeadm documentation can be found here. With this you will get the latest released Kubernetes.
If you really want to use the script to bring up the cluster, I did the following:
Install the required packages
yum install -y git docker etcd
Start docker process
systemctl enable --now docker
Install golang
Install the latest Go release, because the default CentOS golang package is old and Kubernetes needs at least go1.7 to compile:
curl -O https://storage.googleapis.com/golang/go1.8.1.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.8.1.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
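A quick sanity check that the new toolchain is the one being picked up (not from the original answer):
go version    # should print go1.8.1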
Setup GOPATH
export GOPATH=~/go
export GOBIN=$GOPATH/bin
export PATH=$PATH:$GOBIN
Download k8s source and other golang dependencies
Note: this might take some time depending on your internet speed.
go get -d k8s.io/kubernetes
go get -u github.com/cloudflare/cfssl/cmd/...
Start cluster
cd $GOPATH/src/k8s.io/kubernetes
./hack/local-up-cluster.sh
In new terminal
alias kubectl=$GOPATH/src/k8s.io/kubernetes/cluster/kubectl.sh
kubectl get nodes
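If the cluster came up, a couple of standard checks (plain kubectl, nothing specific to this setup):
kubectl cluster-info
kubectl get pods --all-namespaces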