PySpark kernel doesn't start on JupyterHub

The JupyterHub PySpark kernel used to work very well, but now it either will not start or shows a "Kernel connected" status (it never goes to idle) and will not run any code in cells. It is using LocalProcessSpawner with PAM authentication on CentOS. In the JupyterHub logs we see these messages:
[I 2020-03-10 15:13:08.644 SingleUserNotebookApp restarter:110] KernelRestarter: restarting kernel (4/5), new random ports
Assertion failed: rc == 0 (src/socket_poller.cpp:41)
[W 2020-03-10 15:13:11.666 SingleUserNotebookApp restarter:100] KernelRestarter: restart failed
[W 2020-03-10 15:13:11.666 SingleUserNotebookApp kernelmanager:127] Kernel 0148c77d-143a-4721-90e7-0d9a41a878c4 died, removing from map.
Any thoughts?

This was resolved by reinstalling sparkmagic in the base conda environment:
(base) [root#]# pip install sparkmagic

Use notebook with slurm

I want to be able to connect to our institution's cluster using VS Code Remote SSH with the server running on a compute node instead of the login node. The preferred workflow is to SSH into the login node, use a command to allocate a job and spin up an interactive shell on a compute node, and then run the Jupyter notebook kernel on that node.
I launch jupyter-lab through this bash script:
#!/bin/bash
#SBATCH -J jupyterTest
#SBATCH -N 1
#SBATCH --mem=16GB
#SBATCH --time=5:00:00
#SBATCH --output=/home/adufour/work/jupyter.log
#Load necessary modules
module purge
source /home/adufour/.bashrc
conda activate singlecell
#Go to the folder you wanna run jupyter in
cd ~/work
#Start the notebook
jupyter lab --ip=0.0.0.0 --port=8888
Which gives me the following log:
[I 2021-02-16 16:14:29.363 ServerApp] jupyterlab | extension was successfully linked.
[I 2021-02-16 16:14:31.551 ServerApp] nbclassic | extension was successfully linked.
[I 2021-02-16 16:14:31.690 LabApp] JupyterLab extension loaded from /work/adufour/anaconda3/envs/singlecell/lib/python3.9/site-packages/jupyterlab
[I 2021-02-16 16:14:31.691 LabApp] JupyterLab application directory is /work/adufour/anaconda3/envs/singlecell/share/jupyter/lab
[I 2021-02-16 16:14:31.698 ServerApp] jupyterlab | extension was successfully loaded.
[I 2021-02-16 16:14:31.714 ServerApp] nbclassic | extension was successfully loaded.
[I 2021-02-16 16:14:31.719 ServerApp] Serving notebooks from local directory: /work/adufour
[I 2021-02-16 16:14:31.719 ServerApp] Jupyter Server 1.3.0 is running at:
[I 2021-02-16 16:14:31.719 ServerApp] http://node126:8888/lab?token=6a1178ff82901e42559423100b13ba12105935432fb0efdd
[I 2021-02-16 16:14:31.719 ServerApp] or http://127.0.0.1:8888/lab?token=6a1178ff82901e42559423100b13ba12105935432fb0efdd
[I 2021-02-16 16:14:31.719 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2021-02-16 16:14:31.744 ServerApp]
To access the server, open this file in a browser:
file:///home/adufour/.local/share/jupyter/runtime/jpserver-108627-open.html
Or copy and paste one of these URLs:
http://node126:8888/lab?token=6a1178ff82901e42559423100b13ba12105935432fb0efdd
or http://127.0.0.1:8888/lab?token=6a1178ff82901e42559423100b13ba12105935432fb0efdd
Unable to connect to VS Code server.
Error in request
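One common way to reach a Jupyter server that runs on a compute node is to forward a local port through the login node with SSH. The sketch below is not from the original post: it only parses the server URL printed in the log above and assembles a matching `ssh -N -L` command. The login host `user@login.example.org` is a placeholder, and whether port forwarding through the login node is permitted depends on how the cluster is configured.

```python
from urllib.parse import urlparse


def parse_server_url(url):
    """Return (host, port) from a Jupyter server URL as printed in the log."""
    parsed = urlparse(url)
    return parsed.hostname, parsed.port


def build_tunnel_command(compute_node, remote_port, login_host, local_port=None):
    """Assemble an `ssh -N -L` command that forwards a local port to the compute node.

    login_host is a placeholder such as "user@login.example.org" (hypothetical).
    """
    if local_port is None:
        local_port = remote_port
    forward = f"{local_port}:{compute_node}:{remote_port}"
    return ["ssh", "-N", "-L", forward, login_host]


if __name__ == "__main__":
    # The token is elided here on purpose; only host and port are needed.
    host, port = parse_server_url("http://node126:8888/lab?token=...")
    print(" ".join(build_tunnel_command(host, port, "user@login.example.org")))
```

With such a tunnel open, the `http://127.0.0.1:8888/lab?token=...` URL from the log could be used from the local machine, assuming the job is still running on the same node.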

Opening the JupyterHub URL on Kubernetes is very slow (fetching JS, icons)

I deployed JupyterHub on Kubernetes using Helm,
and I can log in with the ID 'admin'.
But when I first log in, the URL either doesn't respond or responds only after 30 to 50 seconds; it seems to fail to fetch a JavaScript file or an icon.
When I refresh the page, it works.
Is there a problem with the network in my Kubernetes cluster?
I'm using a GlusterFS storage class for dynamic provisioning.
This is my config file for installing JupyterHub with Helm:
proxy:
  secretToken: "34999170ac41826f956ee1a757b53ff91ce6efabc3dfe24fcee863955efcc6b9"
The pod's log looks like this (for user qqqqq):
[I 2020-12-23 05:22:21.664 SingleUserNotebookApp extension:158] JupyterLab extension loaded from /opt/conda/lib/python3.7/site-packages/jupyterlab
[I 2020-12-23 05:22:21.665 SingleUserNotebookApp extension:159] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 2020-12-23 05:22:22.015 SingleUserNotebookApp singleuser:561] Starting jupyterhub-singleuser server version 1.1.0
[I 2020-12-23 05:22:22.022 SingleUserNotebookApp notebookapp:1924] Serving notebooks from local directory: /home/jovyan
[I 2020-12-23 05:22:22.022 SingleUserNotebookApp notebookapp:1924] The Jupyter Notebook is running at:
[I 2020-12-23 05:22:22.022 SingleUserNotebookApp notebookapp:1924] http://jupyter-qqqqq:8888/user/qqqqq/
[I 2020-12-23 05:22:22.022 SingleUserNotebookApp notebookapp:1925] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 2020-12-23 05:22:22.038 SingleUserNotebookApp singleuser:542] Updating Hub with activity every 300 seconds
[I 2020-12-23 05:22:25.096 SingleUserNotebookApp log:174] 302 GET /user/qqqqq/ -> /user/qqqqq/tree? (@10.233.79.154) 0.93ms
[I 2020-12-23 05:22:25.165 SingleUserNotebookApp log:174] 302 GET /user/qqqqq/ -> /user/qqqqq/tree? (@10.233.93.0) 0.76ms
[I 2020-12-23 05:22:25.185 SingleUserNotebookApp log:174] 302 GET /user/qqqqq/tree? -> /hub/api/oauth2/authorize?client_id=jupyterhub-user-qqqqq&redirect_uri=%2Fuser%2Fqqqqq%2Foauth_callback&response_type=code&state=[secret] (@10.233.93.0) 2.31ms
[I 2020-12-23 05:22:25.561 SingleUserNotebookApp auth:981] Logged-in user {'kind': 'user', 'name': 'qqqqq', 'admin': False, 'groups': [], 'server': '/user/qqqqq/', 'pending': None, 'created': '2020-12-23T05:22:16.257525Z', 'last_activity': '2020-12-23T05:22:25.524384Z', 'servers': None}
[I 2020-12-23 05:22:25.562 SingleUserNotebookApp log:174] 302 GET /user/qqqqq/oauth_callback?code=[secret]&state=[secret] -> /user/qqqqq/tree? (@10.233.93.0) 250.52ms
[I 2020-12-23 05:22:25.654 SingleUserNotebookApp log:174] 200 GET /user/qqqqq/tree? (qqqqq@10.233.93.0) 71.92ms
It gets stuck here, at GET /user/qqqqq/tree?
Thanks for any advice!
Try looking at the events for the hub pod and for the user pods. The delay is most likely in allocating the PVC, pulling the image (first time only, if it is not already present on the machine), or setting up the route in JupyterHub. Also check whether you've added any postHook in the user image config.
I don't know why, but after I changed the Kubernetes version from 1.16 to 1.17, it works fine.

Minikube not starting on Ubuntu, throwing errors

I'm running Ubuntu 17.04 (zesty) on a Dell XPS 13 (3854 MB of RAM and an Intel Core i5-5200U CPU @ 2.20GHz) and trying to start Minikube, but I'm getting a couple of errors.
➜ minikube version
minikube version: v0.22.3
➜ kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.5", GitCommit:"17d7182a7ccbb167074be7a87f0a68bd00d58d97", GitTreeState:"clean", BuildDate:"2017-08-31T09:14:02Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?
I have VM VirtualBox Version 5.2.0 r118431 (Qt5.7.1). I've checked the BIOS settings and have virtualization enabled.
➜ minikube start
Starting local Kubernetes v1.7.5 cluster...
Starting VM...
E1025 09:49:40.206594 22972 start.go:146] Error starting host: Error starting stopped host: Unable to start the VM: /usr/bin/VBoxManage startvm minikube --type headless failed:
VBoxManage: error: The virtual machine 'minikube' has terminated unexpectedly during startup with exit code 1 (0x1)
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component MachineWrap, interface IMachine
.
Retrying.
E1025 09:49:40.207051 22972 start.go:152] Error starting host: Error starting stopped host: Unable to start the VM: /usr/bin/VBoxManage startvm minikube --type headless failed:
VBoxManage: error: The virtual machine 'minikube' has terminated unexpectedly during startup with exit code 1 (0x1)
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component MachineWrap, interface IMachine
I've tried some suggestions that I found online, like running rm -rf ~/.minikube/ and then starting minikube again. I've tried running minikube stop followed by minikube delete and then starting minikube again. I've also tried specifying the VirtualBox driver when starting: minikube start --vm-driver=virtualbox. None of these work; I still get the same error.
This looks like an issue with your VirtualBox installation. Have you tried reinstalling it?
sudo apt-get purge virtualbox virtualbox-dkms
sudo apt-get install virtualbox-5.1
Try enabling virtualization in the BIOS. In my case that resolved the problem.
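As a rough companion to the BIOS suggestion, the following sketch looks for the `vmx` (Intel) or `svm` (AMD) CPU flags in `/proc/cpuinfo` on Linux. It is only an illustration, not part of the original answers: the flags show that the CPU advertises hardware virtualization, which does not by itself prove that the firmware setting is enabled or that VirtualBox can use it.

```python
def virtualization_flags(cpuinfo_text):
    """Return the set of hardware virtualization flags (vmx, svm) found in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        # Only the per-CPU "flags : ..." lines are inspected.
        if line.startswith("flags"):
            _, _, values = line.partition(":")
            found |= {"vmx", "svm"} & set(values.split())
    return found


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        flags = virtualization_flags(handle.read())
    print("virtualization flags:", sorted(flags) or "none found")
```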

Failed to start puppetserver Service

While trying to run a puppet update from a node:
sudo /opt/puppetlabs/bin/puppet agent -t
I get an error:
Error: Could not retrieve catalog; skipping run
Error: Could not send report: Connection refused - connect(2) for "puppet" port 8140
Other posts indicate this is likely a problem with the puppetserver service and suggest rebooting the server. Rebooting didn't help, and when I try to restart the service it fails:
~$ sudo service puppetserver restart
Job for puppetserver.service failed because the control process exited with error code. See "systemctl status puppetserver.service" and "journalctl -xe" for details.
I've looked at these logs, and as a puppet/linux noob, I'm not sure what to do next.
systemctl status puppetserver.service
● puppetserver.service - puppetserver Service
Loaded: loaded (/lib/systemd/system/puppetserver.service; enabled; vendor preset: enabled)
Active: activating (start-post) since Fri 2016-09-02 15:54:26 PDT; 2s ago
Process: 22301 ExecStartPre=/usr/bin/install --directory --owner=puppet --group=puppet --mode=775 /var/run/puppetlabs/puppetserver (code=exited
Main PID: 22306 (java); : 22307 (bash)
Tasks: 17
Memory: 335.7M
CPU: 5.535s
CGroup: /system.slice/puppetserver.service
├─22306 /usr/bin/java -Xms6g -Xmx6g -XX:MaxPermSize=256m -XX:OnOutOfMemoryError=kill -9 %p -Djava.security.egd=/dev/urandom -cp /opt/p
└─control
├─22307 /bin/bash /opt/puppetlabs/server/apps/puppetserver/ezbake-functions.sh wait_for_app
└─22331 sleep 1
Sep 02 15:54:26 puppet systemd[1]: Starting puppetserver Service...
Sep 02 15:54:26 puppet java[22306]: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
puppet version 4.6.1
The Puppet master communicates with the other nodes over port 8140.
I don't think a restart will help, since this looks like a connection issue between the server and the node.
Please try the following.
First, make sure that the Puppet master is actually listening on port 8140. Run the following command on the Puppet master:
netstat -ntlp | grep 8140
This command should return something like this:
tcp 0 0 0.0.0.0:8140 0.0.0.0:* LISTEN 1783/puppetmaster
If you don't get similar output, your Puppet master is not listening and therefore cannot compile catalogs for the node.
Try checking the Puppet master log at /var/log/puppetmaster.log.
Next, check that the node can communicate with the Puppet master on the relevant port. You can check this quickly with the telnet command. Run this on your node:
telnet <puppetmaster IP address or DNS name> 8140
You should get something like:
Connected to <puppet-master-IP/DNS-name>
Escape character is '^]'.
If you don't get this output, something is blocking access to the Puppet master. Try opening the port in your firewall.
If you're still stuck, try running with the --debug flag for verbose output and edit your question.
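Where telnet is not installed on the node, the same reachability check can be approximated with a short script. This is a minimal sketch, assuming the server name `puppet` and port 8140 from the agent's error message; it only tests whether a TCP connection can be opened and says nothing about certificates or catalog compilation.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # "puppet" and 8140 are taken from the agent's error message.
    print("reachable" if can_connect("puppet", 8140) else "not reachable")
```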
It could be one of two things: (1) in puppet.conf you have configured more memory than your machine has, or (2) you installed both packages, with apt-get install puppetserver and apt-get install puppet.
If you get a "failed to start puppet.service: unit not found" error on the slave machine while connecting to Puppet,
close PuTTY, then open it again and reconnect. The issue does not recur after restarting PuTTY on the slave.
The error occurs because there is not enough RAM. To fix it, open the Puppet server configuration file (on Debian and Ubuntu installs this file is typically /etc/default/puppetserver instead):
sudo nano /etc/sysconfig/puppetserver
And reduce the amount of allocated RAM for the Puppet server (for example, I specified 512m instead of 2g):
JAVA_ARGS="-Xms512m -Xmx512m"
Now let’s start the Puppet server:
sudo systemctl start puppetserver
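To judge whether the heap setting is plausible for a given machine, the `-Xmx` value in `JAVA_ARGS` can be compared with the installed RAM. The sketch below is an illustration only: the function names are made up for this example, and the RAM lookup via `os.sysconf` assumes Linux. The status output in the question shows `-Xms6g -Xmx6g`, which is the kind of value this check would flag on a small machine.

```python
import os
import re

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def max_heap_bytes(java_args):
    """Return the -Xmx value in bytes, or None if JAVA_ARGS sets no -Xmx."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_args)
    if match is None:
        return None
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def heap_fits(java_args, total_ram_bytes):
    """True if the configured maximum heap is smaller than the machine's RAM."""
    heap = max_heap_bytes(java_args)
    return heap is None or heap < total_ram_bytes


if __name__ == "__main__":
    # Total physical memory on Linux (assumption: these sysconf names exist).
    total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    for args in ("-Xms6g -Xmx6g", "-Xms512m -Xmx512m"):
        print(args, "fits" if heap_fits(args, total) else "does not fit")
```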

Vagrant startup times out on setup

I am doing this as part of the Ambari setup. I followed the steps for the quick start with Ambari and Vagrant.
I am using this CentOS 6.4 image:
https://github.com/u39kun/ambari-vagrant/blob/master/centos6.4/Vagrantfile
I did this on Google Cloud from a RHEL 7.2 host with VirtualBox 5, but installed CentOS 6.4 guests, as suggested.
I successfully installed and configured the prerequisites (with the tweaking required to make VirtualBox 5 work on RHEL 7.2).
When I try to bring up 6 hosts, I see timeouts and the machines do not come up.
Host machine I am running on is fast - 32 cores, 64 GB RAM, 500 GB SSD ...
Does anyone know what might be the issue?
Is there some firewall I need to turn off, etc.?
[<myuser>@ambari-host-rhel7 centos6.4]$ ./up.sh 6
Bringing machine 'c6401' up with 'virtualbox' provider...
==> c6401: Box 'centos6.4' could not be found. Attempting to find and install...
    c6401: Box Provider: virtualbox
    c6401: Box Version: >= 0
==> c6401: Box file was not detected as metadata. Adding it directly...
==> c6401: Adding box 'centos6.4' (v0) for provider: virtualbox
    c6401: Downloading: http://developer.nrel.gov/downloads/vagrant-boxes/CentOS-6.4-x86_64-v20130427.box
==> c6401: Box download is resuming from prior download progress
==> c6401: Successfully added box 'centos6.4' (v0) for 'virtualbox'!
==> c6401: Importing base box 'centos6.4'...
==> c6401: Matching MAC address for NAT networking...
==> c6401: Setting the name of the VM: centos64_c6401_1456171923223_2329
==> c6401: Clearing any previously set network interfaces...
==> c6401: Preparing network interfaces based on configuration...
    c6401: Adapter 1: nat
    c6401: Adapter 2: hostonly
==> c6401: Forwarding ports...
    c6401: 22 (guest) => 2222 (host) (adapter 1)
==> c6401: Running 'pre-boot' VM customizations...
==> c6401: Booting VM...
==> c6401: Waiting for machine to boot. This may take a few minutes...
    c6401: SSH address: 127.0.0.1:2222
    c6401: SSH username: vagrant
    c6401: SSH auth method: private key
Timed out while waiting for the machine to boot. This means that
Vagrant was unable to communicate with the guest machine within
the configured ("config.vm.boot_timeout" value) time period.
If you look above, you should be able to see the error(s) that
Vagrant had when attempting to connect to the machine. These errors
are usually good hints as to what may be wrong.
If you're using a custom box, make sure that networking is properly
working and you're able to connect to the machine. It is a common
problem that networking isn't setup properly in these boxes.
Verify that authentication configurations are also setup properly,
as well.
If the box appears to be booting properly, you may want to increase
the timeout ("config.vm.boot_timeout") value.
As a final step I get this summary error:
There was an error while executing `VBoxManage`, a CLI used by Vagrant
for controlling VirtualBox.
The command and stderr is shown below.
Command: ["import", "/home/<me>/.vagrant.d/boxes/centos6.4/0/virtualbox/box.ovf", "--vsys", "0", "--vmname", "CentOS-6.4-x86_64_1456173504674_45962", "--vsys", "0", "--unit", "9", "--disk", "/home/<me>/VirtualBox VMs/CentOS-6.4-x86_64_1456173504674_45962/box-disk1.vmdk"]
Stderr: 0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Interpreting /home/<me>/.vagrant.d/boxes/centos6.4/0/virtualbox/box.ovf...
OK.
0%...
Progress state: VBOX_E_FILE_ERROR
VBoxManage: error: Appliance import failed
VBoxManage: error: Could not create the imported medium '/home/<me>/VirtualBox VMs/CentOS-6.4-x86_64_1456173504674_45962/box-disk1.vmdk'.
VBoxManage: error: VMDK: cannot write allocated data block in '/home/<me>/VirtualBox VMs/CentOS-6.4-x86_64_1456173504674_45962/box-disk1.vmdk' (VERR_DISK_FULL)
VBoxManage: error: Details: code VBOX_E_FILE_ERROR (0x80bb0004), component ApplianceWrap, interface IAppliance
VBoxManage: error: Context: "RTEXITCODE handleImportAppliance(HandlerArg*)" at line 877 of file VBoxManageAppliance.cpp
Any ideas what might be going on?
Do you still have free space on your drive?
Generally, VERR_DISK_FULL indicates that the hard drive is full, so VirtualBox cannot allocate enough space for the virtual disk files (here, the .vmdk images).
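A quick way to answer the free-space question before importing several boxes is to query the filesystem that holds the VM directory. This is a minimal sketch, assuming the VMs live under the home directory as in the `/home/<me>/VirtualBox VMs/...` path from the error output; the size needed per guest is not stated in the post, so any threshold passed to `has_room` is the caller's own estimate.

```python
import os
import shutil


def has_room(path, required_bytes):
    """True if the filesystem holding `path` has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    home = os.path.expanduser("~")
    free_gib = shutil.disk_usage(home).free / 1024 ** 3
    print(f"{free_gib:.1f} GiB free under {home}")
```

If the reported free space is small compared with the combined disk images of the guests, moving the VirtualBox machine folder to a larger volume or freeing space would be the next thing to try.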