Pointing to a private GitHub repository or AWS S3 as the notebook directory for JupyterHub notebook servers

Is it possible to point to a private GitHub repository or AWS S3 as the notebook directory for JupyterHub notebook servers?
In the JupyterHub config file, I can set c.Spawner.notebook_dir to point to local directories, but how can I point to a password-protected file share, a private GitHub repository, or AWS S3?
There is some information here - https://github.com/jupyterhub/jupyterhub/issues/314 - on customizing the directory location for each user. Is there a way to extend the custom spawner class so it can point to a private GitHub repo or S3?

The simplest way, if you can satisfy the requirements, would be to use an S3 FUSE filesystem to mount an S3 bucket at a path in your local directory tree.
You could also further extend the custom spawner in that issue to re-clone/update a GitHub repo every time you spawn a notebook (and then pass the path into the notebook), but that would be pretty slow. In that case, the user account running the spawner also needs to be able to read the credentials for the GitHub account. The S3 solution lets you handle this outside of the Jupyter workflow, so the credentials can live under a different permissions scheme.
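With s3fs-fuse, for example, the mount looks roughly like this (a sketch only; the bucket name, mount point, and credential values are placeholders, and the {username} template in notebook_dir assumes a recent JupyterHub):
echo "AKIAEXAMPLEKEY:examplesecret" > /etc/passwd-s3fs && chmod 600 /etc/passwd-s3fs   # s3fs credential file (placeholder keys)
mkdir -p /srv/notebooks
s3fs your-notebook-bucket /srv/notebooks -o passwd_file=/etc/passwd-s3fs -o allow_other
# then, in jupyterhub_config.py: c.Spawner.notebook_dir = '/srv/notebooks/{username}'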

Related

Right way of using Google Storage on a GCE VM

I want to know the right/best way of having one machine copy data to Google Cloud Storage.
I need one machine to be able to write to a bucket, but not be able to create or delete other buckets.
While researching, I found out that you should create a service account so that account can log in to Google Cloud and then use the storage.
But the problem is, when the machine is a GCE instance, there are access scopes. With the "Default" scope it can read from Google Storage but cannot write to it, even after authenticating with a service account.
With the devstorage.read_write scope, the machine can create and remove buckets in that storage without any login. I find that too risky.
Does anyone have any recommendations?
Thanks
The core problem here is that the "write" scope covers both write and delete, and that the GCE service account is likely a member of project-editors, which can create and delete buckets. It sounds like what you want to do is restrict a service account to only being able to affect a single bucket. You should be able to do this with these steps:
Create a service account in your project (and save the private key file).
In the permissions page for the project, make sure that service account is not a project editor for your project.
Using an account that does have full permissions to your project, create the bucket, then grant the service account write access to the bucket. Example gsutil commands to do this:
gsutil mb gs://yourbucket
gsutil acl ch -u your-service-account-name@gserviceaccount.com:W gs://yourbucket
Create a VM that does not have a GCE service account enabled.
Push the service account's private key file to that VM.
On the VM, gcloud auth activate-service-account --key-file=your-key-file.json
Now gsutil commands run on the VM should be able to write to (and delete) objects in that bucket, but not any other buckets in your project.
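A quick way to sanity-check the result from the VM (the file and bucket names are placeholders; the outcomes below are what you should expect if the ACL and project roles are set up as described):
gsutil cp ./data.csv gs://yourbucket/          # should succeed: the service account has W on this bucket
gsutil rm gs://yourbucket/data.csv             # should succeed: write access covers object deletion
gsutil mb gs://some-other-bucket               # should fail: the account is not a project editor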

How to gain SSH access from one AWS instance to another without a private key?

I have an SSH key pair: the private key lives on my local Mac, the public key lives on several AWS cloud machines.
From my Mac, I can SSH to a cloud instance; call it the "deploy server". From there, I need to deploy my application to several instances (I cannot deploy locally).
I authenticate to the other instances with my private key. I can do this either by leaving my private key on the deploy server (insecure) or by using SSH agent forwarding (probably not much better).
Moreover, the deploy takes a while, so I run it in a GNU screen or tmux session; then I just detach and end the SSH session with the deploy server, which means I cannot use SSH agent forwarding (as I believe it requires the SSH connection to remain open).
What other options are available to me?
You can use a deploy key. That is a server-specific key that has read-only access to the repository.
To use this, you need to:
Generate a private key for the server (ssh-keygen on the server)
Add the public key to the GitHub repo as a deploy key (https://github.com/<user>/<repo>/settings/keys). That grants read-only permissions on the repo; there is a checkbox if you also need write access.
Read more in the GitHub help guide on deploy keys; it covers more methods for letting a server access a repository.
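A rough sketch of the whole flow on the server (the key path is a placeholder, and GIT_SSH_COMMAND requires a reasonably recent git):
ssh-keygen -t ed25519 -f ~/.ssh/deploy_key -N ""       # generate the server-specific key
cat ~/.ssh/deploy_key.pub                              # paste this into the repo's deploy keys page
GIT_SSH_COMMAND="ssh -i ~/.ssh/deploy_key" git clone git@github.com:<user>/<repo>.git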

How to use service accounts with gsutil for downloading from a Google-owned private Cloud Storage bucket (DCM)

A project and a Google Group have been set up for controlling data access, following the DCM guide: https://support.google.com/dcm/partner/answer/3370481?hl=en-GB&ref_topic=6107456
The project does not contain the bucket I want to access (under Storage -> Cloud Storage), since it is a Google-owned bucket to which I only have read-only access. I can see the bucket in my browser, since my Google account is a member of the ACL.
I used the gsutil tool to configure the service account of the project that was linked with the private bucket, using
gsutil config -e
but when I try to access that private bucket with
gsutil ls gs://<bucket_name>
I always get 403 errors, and I don't know why. Has anyone tried this before? Any ideas are welcome.
Since the bucket is private and in project A, service accounts in your project (project B) will not have access. The service account for your project (project B) would need to be added to the ACL for that bucket.
Note that since you can read this bucket as a user, you can run gsutil config to give gsutil your user credentials and use those to read the bucket.
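That second option looks roughly like this (the bucket name is the same placeholder as above, and the object name is also a placeholder; gsutil config walks you through a browser-based OAuth flow for your own account):
gsutil config                                 # authorize gsutil with the Google account that is on the ACL
gsutil ls gs://<bucket_name>
gsutil cp gs://<bucket_name>/some-object .    # some-object is a placeholder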

Mounting private GitHub repository into a pod's volume

Has anyone mounted a private GitHub repository into a Kubernetes pod volume?
What is the best way to achieve this?
I thought of two possible ways:
Using a username/password in the HTTPS repository URL
Using a private SSH key on the machine
I like the second option better, but I couldn't figure out which user pulls the repository, so I could put the appropriate SSH configuration in place for it.
Any thoughts?
GitHub allows cloning repositories using an OAuth token in HTTPS URLs, like so:
$ git clone https://$GH_TOKEN@github.com/owner/repo.git
see
https://help.github.com/articles/creating-an-access-token-for-command-line-use/
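If you want to keep the token out of the pod spec itself, one sketch (the names are placeholders, and exposing the Secret as an environment variable is an assumption, not the only option) is to store it in a Kubernetes Secret and do the clone from inside the container:
kubectl create secret generic gh-token --from-literal=token=YOUR_OAUTH_TOKEN
# inside the container, with the secret exposed as the GH_TOKEN environment variable:
git clone https://$GH_TOKEN@github.com/owner/repo.git /data/repo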

How to automate cloning a private GitHub repo with Chef

Every day I must launch new EC2 instances (or other servers with public IPs). I provision them with Chef, creating vhosts, uploading databases, etc.
I need to clone a couple of private GitHub repos onto them. What would be the best way to do this?
I could manually generate an SSH key and add it to each GitHub repo I need, then run the script, but that's a lot of work.
I could go for git clone https://user:password@github.com/*****/*****.git, but obviously I don't want to store my password this way.
What else?
Is there any way to:
store a private key (or password?) in a recipe/cookbook, or
generate a new key and synchronize it via the GitHub API (but this would lead to hundreds of keys in my GitHub account)?
Store your key in an S3 bucket and use IAM roles/policies to control access. Citadel makes this easy to integrate with Chef. See my post about secrets management with Chef for a summary of other options.
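The same idea without Chef, as a rough sketch (the bucket, key file, and repo names are placeholders; the instance's IAM role needs read access to the secrets bucket):
aws s3 cp s3://your-secrets-bucket/github_deploy_key /root/.ssh/id_github   # fetch the key using instance-role credentials
chmod 600 /root/.ssh/id_github
GIT_SSH_COMMAND="ssh -i /root/.ssh/id_github" git clone git@github.com:<user>/<repo>.git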
Shameless plug: the deploy_key cookbook.
I created this cookbook with this precise use case in mind. It manages the entire lifecycle of deploy keys in GitHub, Bitbucket, and GitLab. It creates a key locally (so that it never has to be sent over the network), adds it to the repo as a deploy key (read-only, so these keys can never push changes to the repo), and can be used to delete the key files and remove the keys from the repo.
All actions are idempotent, so if you're afraid your repos will be flooded with too many deploy keys, you can either remove the key from the repo after use (via Chef, with the :remove action) or have a periodic clean-up task delete all deploy keys. The next time Chef runs, it will notice that the key is absent and re-add it.
The only secret you need to protect is the credentials to the repo, which can be protected in the same way as your other secrets.