I have deployed JupyterHub to GKE using the Zero to JupyterHub Helm chart.
I've set up my python environment and notebooks in my spawned singleuser instance, and now I would like to remotely (e.g., via API on another server) run a notebook in my singleuser environment and then download the outputs.
If it helps, I've parameterized the notebook to run it with papermill, because I'd like to write a script to run the notebook over a series of datasets.
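A minimal sketch of the driver script I have in mind, assuming the notebook's parameters cell takes a dataset_path (all names and paths below are placeholders):

    import papermill as pm

    # Hypothetical list of inputs; each run gets its own executed copy of the notebook.
    datasets = ["data/set1.csv", "data/set2.csv", "data/set3.csv"]

    for path in datasets:
        pm.execute_notebook(
            "analysis.ipynb",                              # parameterized source notebook
            f"runs/analysis-{path.split('/')[-1]}.ipynb",  # executed notebook with outputs
            parameters={"dataset_path": path},             # injected into the parameters cell
        )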
Have a look here. This allows you to deploy your Jupyter notebook as a serverless function. You can then invoke that serverless function with different parameters (datasets) on demand.
I normally connect my Databricks notebooks to a cluster.
However, we are now starting to use pools instead.
There are temporary "job" clusters which get attached to our pool.
How can I connect a notebook to a pool? (for example, if I am running a notebook interactively via the browser)
I did not find an obvious answer when reviewing the documentation, such as
https://learn.microsoft.com/en-us/azure/databricks/clusters/instance-pools/
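As far as I understand, a notebook only ever attaches to a cluster; the pool comes into play when that cluster is configured to draw its nodes from the pool via instance_pool_id. Below is a rough sketch of creating such a cluster through the Clusters REST API (the workspace URL, token, IDs, and Spark version are placeholders); the notebook is then attached to the new cluster as usual.

    import requests

    workspace = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
    token = "dapiXXXXXXXXXXXXXXXX"                                    # placeholder access token

    resp = requests.post(
        f"{workspace}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "cluster_name": "pool-backed-interactive",
            "spark_version": "11.3.x-scala2.12",         # placeholder runtime version
            "instance_pool_id": "0101-120000-pool1234",  # the pool to draw nodes from
            "num_workers": 2,
            "autotermination_minutes": 60,
        },
    )
    resp.raise_for_status()
    print(resp.json()["cluster_id"])  # attach the notebook to this cluster in the UI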
I'm trying to connect a PySpark session running locally to a DataProc cluster. I want to be able to work with files on GCS without downloading them. My goal is to perform ad-hoc analyses using local Spark, then switch to a larger cluster when I'm ready to scale. I realize that DataProc runs Spark on YARN, and I've copied over the yarn-site.xml locally. I've also opened an SSH tunnel from my local machine to the DataProc master node and set up port forwarding for the ports identified in the YARN XML. It doesn't seem to be working, though: when I try to create a session in a Jupyter notebook, it hangs indefinitely, and there is nothing in stdout or the DataProc logs that I can see. Has anyone had success with this?
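As a side note on the GCS part: a local Spark session can read gs:// paths directly once the GCS connector is configured, independent of any YARN connection. A minimal sketch, where the connector version, bucket, and key-file path are assumptions:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("adhoc-gcs")
        # GCS connector for Hadoop; the version below is a placeholder
        .config("spark.jars.packages", "com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.11")
        .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
        .config("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
        .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
        .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "/path/to/service-account.json")
        .getOrCreate()
    )

    # Read straight from GCS without copying the file locally (bucket is a placeholder).
    df = spark.read.csv("gs://my-bucket/some-data.csv", header=True)
    df.show(5)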
For anyone interested, I eventually abandoned this approach. I'm instead running Jupyter Enterprise Gateway on the master node, setting up port forwarding, and then launching my notebooks locally to connect to kernel(s) running on the server. It works very nicely so far.
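For anyone wanting to reproduce that setup, the local notebook server is pointed at the remote Enterprise Gateway through its gateway client settings. A minimal sketch of jupyter_notebook_config.py, assuming the SSH tunnel forwards the gateway's port on the master node to localhost:8889 (the port number is arbitrary):

    # jupyter_notebook_config.py (client side) -- minimal sketch
    # Assumes an SSH tunnel forwards the Enterprise Gateway port on the master
    # node to localhost:8889; the port is arbitrary.
    c.GatewayClient.url = "http://localhost:8889"
    # Fail reasonably fast if the tunnel is down (seconds; value is arbitrary).
    c.GatewayClient.connect_timeout = 30.0

The same URL can also be passed on the command line as jupyter notebook --gateway-url=http://localhost:8889.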
How does one instruct Pulumi to execute one or more commands on a remote host?
The Terraform equivalent is the remote-exec provisioner.
Pulumi currently doesn't support remote-exec-like provisioners, but they are on the roadmap (see https://github.com/pulumi/pulumi/issues/1691).
For now, I'd recommend using the cloud-init user data functionality of the various providers, as in this AWS EC2 example.
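For instance, a rough Python sketch (not the linked example; the AMI ID, instance type, and bootstrap script are made up):

    import pulumi_aws as aws

    # Run a bootstrap script at first boot via EC2 user data.
    # The AMI ID, instance type, and script contents are placeholders.
    server = aws.ec2.Instance(
        "provisioned-server",
        ami="ami-0123456789abcdef0",
        instance_type="t3.micro",
        user_data="""#!/bin/bash
    set -e
    apt-get update -y
    apt-get install -y nginx
    """,
    )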
Pulumi supports this with the Command package as of 2021-12-31: https://github.com/pulumi/pulumi/issues/99#issuecomment-1003445058
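A minimal sketch of that in Python, where the host, user, key path, and command are placeholders:

    import pulumi
    from pulumi_command import remote

    # Connection details are placeholders; the private key is read from a local file.
    conn = remote.ConnectionArgs(
        host="203.0.113.10",
        user="ubuntu",
        private_key=open("/home/me/.ssh/id_rsa").read(),
    )

    # Runs over SSH on the remote host during `pulumi up`.
    install = remote.Command(
        "install-packages",
        connection=conn,
        create="sudo apt-get update && sudo apt-get install -y nginx",
    )

    pulumi.export("install_stdout", install.stdout)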
I have managed to use this command on my HDInsight cluster when I connect via SSH using the Azure CLI; however, I want to create an Azure PowerShell script that will run the following command, but I can't figure out how. I have tried searching for it online but can't find anything.
sudo -HE /usr/bin/anaconda/bin/conda install pandas
In this documentation, see the section titled "Apply a script action to a running cluster from Azure PowerShell". You will need to put your script in blob storage and then have the cluster execute that script on each node using an HDInsight script action. The nice thing about script actions is that when cluster maintenance patches the underlying servers and a node has to be taken down and replaced (or when you scale the cluster), the script action will also run on any new nodes.
Is it possible to run a notebook server with kernels scheduled as processes on a remote cluster (via SSH or PBS), with a common directory on NFS?
For example, I have three servers with GPUs and would like to run a notebook on one of them, but I would prefer not to start more than one notebook server. It would be ideal to have the notebook server on a fourth machine, which would schedule kernels onto the GPU servers either automatically or manually.
I did some trials with a cluster with one engine. Using %%px in each cell is almost a solution, but introspection does not work, and the notebook code then depends on the cluster configuration, which is not very good.
This is not possible with the notebook at this time. The notebook cannot use a kernel that it did not start.
You could possibly write a new KernelManager that starts kernels remotely distributed across your machines and plug that into the notebook server, but you cannot attach an Engine or other existing kernel to the notebook server.
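To make that concrete, the hook would look roughly like the skeleton below. It is only a skeleton: the actual remote launch, connection-file distribution, and port handling are the hard parts and are only marked as a TODO, and the module path in the config line is hypothetical.

    from jupyter_client.ioloop import IOLoopKernelManager


    class RemoteKernelManager(IOLoopKernelManager):
        """Skeleton: would start the kernel on a remote GPU node instead of locally."""

        def format_kernel_cmd(self, extra_arguments=None):
            cmd = super().format_kernel_cmd(extra_arguments=extra_arguments)
            # TODO: wrap `cmd` in an ssh/qsub invocation and make the ports in the
            # kernel's connection file reachable from the notebook server.
            return cmd


    # In jupyter_notebook_config.py (module path is hypothetical):
    # c.MultiKernelManager.kernel_manager_class = "remote_km.RemoteKernelManager"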