How to share a Jupyter installation for team notebook development

I'm developing a Jupyter Notebook for my team to use to catalogue and analyse some proprietary data. I'm ready to share it with the team for ongoing execution and development. The team generally have Windows 10 workstations and are skilled engineers, though not data scientists. No one currently uses Jupyter.
I now realise I might have thoroughly misjudged Jupyter's ability to support this sort of working environment.
Option 1: Individual installations
This is the worst-case scenario. Anyone who wants to run or modify the notebook needs to install Jupyter. Anaconda is probably the best way to go, but it's a big, ugly, scary install. Worse, every user will have to install and manage additional libraries, and any notebook change that requires a kernel change will have to be applied manually to each installation.
Surely, given that Jupyter is client-server, this is not how it's intended to be used.
Option 2: One server, many clients
The obvious alternative is to host the Jupyter server on a network-accessible computer and have all users connect to it with a browser. That way there's only one shared installation to manage, and each user just needs a URL to access it.
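Getting the server to listen beyond localhost is a small config change. A sketch of the relevant settings, assuming the classic notebook server (the port and the hashed password are placeholders):

    # jupyter_notebook_config.py -- sketch only; values are placeholders.
    c = get_config()  # provided by Jupyter when it loads this file
    c.NotebookApp.ip = '0.0.0.0'       # listen on all interfaces, not just localhost
    c.NotebookApp.port = 8888
    c.NotebookApp.open_browser = False
    # Generate the hash with: from notebook.auth import passwd; passwd()
    c.NotebookApp.password = 'sha1:...'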
But there's a gotcha - the server expects the notebook to be on its own file system! So every user will access the same notebook file. This makes version control very problematic - no one can check out their own copy of the notebook for independent edit and commit sessions. Instead, changes will overwrite the only copy, and commits/reverts/diffs will have to be done on the server (or by mounting the server's file system).
Option 3: Server in Docker image, each user runs a container
Docker to the rescue? That way we can build/maintain one server image (and even version control it) and each user only needs to have a Docker engine installed to instantiate the image (which is a friendly 8GB download!!). They connect to their own container which, with a bit of scripting trickery, will be pointing at their own copy of the notebook.
This option took only 20 hours of investigation to discover that it fundamentally sucks. Working with the kernel is tricky, with lots of new skills required. But more of a showstopper: nothing that shows a Qt window will work. The qtconsole we can do without, but part of our notebook shows a File Open dialog, and the best way to do that is with a Qt widget. With the server in a Docker container expecting an X Window System environment, and the client in a Windows browser, the widget cannot be shown.
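For the record, the pattern that breaks is the ordinary Qt file dialog, which needs a display on the machine where the kernel runs. A minimal illustration, assuming PyQt5 (any Qt binding fails the same way inside the container):

    # Works on a local install; aborts in a headless Docker container
    # because QApplication cannot connect to a display.
    from PyQt5.QtWidgets import QApplication, QFileDialog

    app = QApplication([])
    path, _ = QFileDialog.getOpenFileName(None, "Open data file")
    print(path)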
The Qt issue was the last of many, many issues in trying to get the Docker option running. Everything from matplotlib to path mapping, from os library calls to ipywidgets, needed to be investigated, tweaked, Googled, chopped, and changed to work. I'm fairly convinced these dramas would be ongoing.
Conclusion
There are lots of discussions around Jupyter version control, lots of options for read-only sharing, and even a project for building a Docker container at runtime to provide executable access to a notebook. But there is scant advice on using Jupyter in a team environment.
Given the endless complications when the server is not natively running on the same machine as the client, I'm starting to believe Option 1 is the only sane way to go. Before I go to my colleagues with the crappy news, are there any other suggestions?

Ended up having a fruitful discussion on the Jupyter Google Group and have confirmed that, out of the box, Jupyter does not support this sort of working environment. Indeed, crucially, Jupyter expects each server to have a single user.
The most promising DIY solution was firstly to deploy JupyterHub, for two reasons:
It launches a new server instance for each user, preventing any multi-user per server issues.
It prompts for users to identify themselves, allowing different actions to be taken depending on the user.
And secondly, to have the server mount each user's file system (or an equivalent network arrangement), so it can point each user back to their own local files.
I have not implemented this strategy (making do with Option 1 for now!) but it certainly makes sense.
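For anyone who wants to try it, the pieces live in jupyterhub_config.py. A minimal sketch, not a tested deployment; the authenticator choice and the per-user paths are assumptions:

    # jupyterhub_config.py -- sketch only.
    c = get_config()  # injected by JupyterHub when it loads this file

    # Each authenticated user gets their own single-user server.
    c.JupyterHub.authenticator_class = 'jupyterhub.auth.PAMAuthenticator'

    # Point each spawned server at that user's own notebook directory,
    # so everyone edits their own copy rather than one shared file.
    c.Spawner.notebook_dir = '/home/{username}/notebooks'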

Related

Run Eclipse EPIC Perl Plugin on Remote Project/Files

I recently started a new job, where all development is done on a remote dev server. I really like Eclipse as a centralized development environment for all the different stuff I'm working on, and am not a particularly big fan of emacs or vi. I'll use emacs if I have to change something quickly, but after really trying to like it for normal development, I'm really starting to miss Eclipse.
That said, is there any way I can use Eclipse with EPIC for Perl development on a remote server? I can live without debugging functionality, but proper syntax highlighting, and the ability to create projects would be really, really nice. So far, I've tried using a remote browser plugin for Eclipse to peruse the remote dev server and open stuff into Eclipse that way, but it is far from ideal. Anyone have any better ideas?
Answering my own question (which no one seems to have looked at or cared about, but what the hell -- maybe someone will have the same issue):
Grab the Remote Systems Explorer (RSE) plugin for Eclipse.
Set up RSE to ssh into your remote server.
Create a new empty EPIC project (or use whatever plugin/language you want).
Right-click the project, select "New Folder", then
Advanced >> Link to alternate location (Linked Folder)
Switch the file system to RSE, then just browse to the folder on your remote system that you'd like to become a project, and add it.
That's it, you're done. Now when you open your project in Eclipse, you'll see that folder with all the code you wanted, and you can use it just like you would locally.
The main problem I'm seeing with this right now is that I can't get it to do any error checking, which is too bad. I'll work on finding a workaround and update here if I do.
If you're on Linux, you can also mount the remote drive/folder with sshfs and use the same "linked folder" technique. I do this all the time for Java EE development. sshfs is also very reliable, unlike Windows network shares mounted on Linux with a Samba client. (Sometimes the Windows sharing service gets confused and needs to be restarted on the remote server; I use a PowerShell one-liner for this, "restart-service -name 'sharing service'" or something like that.)

Netbeans: Remote project w/source files over SSH?

Is it possible to set up a remote NetBeans C++ project where the source files are only accessible via SSH?
My project needs to build on a Linux box, but I'd like to develop it on a Windows machine.
Checking out the code via SVN to my Windows machine is not an option, since there are a few files that differ only by case, and NTFS is not case-sensitive (unfortunately, I cannot change them).
I'm well aware that Windows can be kind of forced to be case-aware, and that the ideal solution is to just rename those files to something sane.
However, I'm just trying to solve this using NetBeans. Since it's a remote project anyway, why bother keeping any files locally?
Thanks
Currently, no. In general, naming source files so they differ only by case is bad practice.
You can enable case sensitivity in Windows - you may need to have a Professional version or better.
For Windows XP: http://support.microsoft.com/kb/817921
For Windows 7: http://technet.microsoft.com/en-us/library/cc732389.aspx
See also: Windows Services for Unix
Another solution would be to set up VNC/RDP on the remote Unix system. The better overall solution, though, is to conform to a saner file-naming convention:
Programmer 1: "Hey man, take a look at noCamelCase.cs - I just rewrote it."
Programmer 2: "Um, nocamelcase.cs is blank."
There are two ways of doing remote builds with NetBeans. In the first, the project is stored locally. You just create a regular project, and on the second page of the wizard you specify the network directory containing the source and the remote build host. I've used this from a Solaris client to a Linux server, but not from Windows, as we don't have the mounts exported by SMB. This uses ssh and some shared-library interposers to get the build info.
The second way is to create a remote project. In this case the project is created on the remote host and data is copied on demand to the client. I've only done a few tests with this, as I preferred the first method; it had much better latency.
Lastly, you could either use VNC or install X on your Windows machine and do everything on the Linux machine.

Emacs cloud storage options?

What options are there for saving and retrieving documents to and from the cloud, from within Emacs?
I use Emacs at work, on a Windows machine, and at home, on a Linux box, so ideally I would want a solution that works more or less out of the box for both operating systems.
I had a brief look at g-client, but could not quite get it to work. Obviously, if there are no other, simpler options, I'm just going to have to spend a couple more hours on it.
Many thanks,
Andreas
Dropbox is pretty universal. I even store my Emacs config files there. Works on Windows, Linux, OS X, and iPhone. Syncs automatically. Stores history. Is free. What else do you want? :-)
Two options that I can think of:
If you have access to a server somewhere that runs ssh, then use ssh with Tramp. You can also run an ssh server on your home Linux box and access your home files from work. Tramp works perfectly fine on Windows with ssh from Cygwin. It will automatically grab a file (provided that you give Emacs something like /ssh:yourusername@yourserverhost:~/yourfile), put it in a temporary file on your computer, then copy it back to the host when you save it.
Use a source control system like SVN or Git. Again, you can host the server at home, or you can find online hosts (most are for open source and thus public, but some are free and private; I use unfuddle.com). You would have to commit/update regularly, but you can easily automate that if you want, and the source control system gives you a nice history of your files and a safety net in case you do something very wrong.
Emacs has excellent integration with source control systems. If you find the built-in one insufficient (it is quite generic and thus does not offer an interface to specific features of a particular source control system), there are plenty of good alternatives (psvn for SVN and Magit for Git, for example).
sshfs, if you have good connection speed.
Otherwise there's always tramp-mode for Emacs.
Edit: Just saw you are using Windows.
It's been some years since I used Windows as my desktop, but I used WebDrive back then. It sort-of works, although it always was a bit unstable.
Emacs has great support for remote file systems via Tramp. So the real question is what should you use as a remote FS. There are a bunch of them and as long as they have a way of mounting them or logging in via ssh (for Tramp) you should be ok.
I use JungleDisk - works great for Windows, Linux and Mac. Starts around $2 per month and there's a cap of around $90 per year. You can back up to S3 or to Rackspace.
It integrates at the file system level so you can either read/write directly to it or create links from it to your local file system. I use that to share my .emacs, .bash etc between multiple machines.
Chris

Remote Administration of Windows XP through the Command Line

Does anyone know of a good way to do remote administration of a Windows XP machine using just the command line?
At the moment, the only things it needs to do are install applications/patches and transfer files to and from the machine; installing registry patches would be nice as well.
Currently we use a horrible hacked-together solution that uses NetMeeting. In the past I've thrown together a proof of concept using SSH for Windows (at the time, Windows 2000), but it didn't work to my satisfaction and was pretty buggy, which was probably the fault of the SSH daemon I was running more than anything.
I'm pretty much open to anything, however a solution using SSH would be ideal since it's already approved for installation in my organization, and it's free. I work in the Canadian Government so anything free is best, and anything that we've already got approved for installation is even better.
psexec will allow you to run commands remotely. Some of the other PsTools can help you kill applications, get a list of processes, etc.
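For example, to get a remote command prompt (machine name and account are placeholders):

    psexec \\workstation01 -u DOMAIN\admin cmd.exe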
Why must it be "remote administration of a Windows XP machine using just the command line"?
I think you're very much limiting what's possible by sticking to the command line. In Windows environments you can easily use Group Policy to distribute most software and/or patches, and for the ones you can't, you can usually script the changes in any of the popular scripting languages, such as JScript, VBScript, KiXtart, AutoIt, PowerShell, etc. With these scripting languages you can easily leverage WMI to execute and monitor processes on remote systems, copy files, update the registry... basically everything you're trying to accomplish. It won't cost you anything but the time to learn these technologies, and there are many online resources documenting how to use them. The Microsoft Script Center is a great start: http://www.microsoft.com/technet/scriptcenter/default.mspx
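And if you'd rather drive WMI from one controller box, the same calls are reachable from Python as well. A rough sketch, assuming the third-party wmi package (pip install wmi); host, credentials, and the patch path are placeholders:

    # Sketch only: host name, credentials, and paths are placeholders.
    import wmi

    conn = wmi.WMI("xp-workstation01", user=r"DOMAIN\admin", password="secret")

    # Win32_Process.Create starts a process on the remote machine and
    # returns the new process id plus a result code (0 means success).
    pid, result = conn.Win32_Process.Create(
        CommandLine=r"msiexec /i c:\patches\update.msi /qn"
    )
    print("pid:", pid, "result:", result)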
I wrote this a while back, and used it to maintain my home windows XP desktop for a while:
ssh and telnet on windows
I used the SSH option (not telnet). It worked for my purposes (killing remote tasks, copying files, etc.). It uses Cygwin, but you're able to run regular Windows commands as well as the bash commands that come with Cygwin.
The Software Testing Automation Framework (STAF) is designed for remote access, installing software, transferring files, etc. It's open source, and you can write your own service if there isn't one that does what you need. It also has a GUI component for writing, scheduling, queueing, and monitoring jobs across a pool of machines.
"At the moment, the only things it needs to do are install applications/patches and transfer files to and from the machine; installing registry patches would be nice as well."
Try downloading and installing eurysco, which can, in order:
transfer files, applications, and patches, with a multiple-upload feature, from the eurysco file browser
install applications and patches in silent mode from the eurysco command-line feature
edit the registry from the eurysco system-registry feature
http://www.eurysco.com/features

Best practices for deploying tools & scripts to production?

I've got a number of batch processes that run behind the scenes for a Linux/PHP website. They are starting to grow in number and complexity, so I want to bring a small amount of process to bear on them.
My source tree has a bunch of cpp files and scripts, organized with development but not deployment in mind. After compiling all the executables, I need to put various scripts and binaries on a cluster of machines. Different machines need different executables, scripts, and config files for their batch processes. I also have a few tools that I've written that belong on every machine. At the moment, this deployment process is manual and error-prone.
I'm guessing I'm just going to end up with a script that runs at the root of the source tree and builds a smaller tree of everything necessary for any of the machines. Then, I'll just rsync that to the appropriate machines. But I'm curious how other people are managing this type of problem. Any ideas?
There are several categories of tools here. Some people use a combination of tools from these categories; I sometimes use, for example, both Puppet and Capistrano. See Puppet or Capistrano - Use the Right Tool for the Job for a discussion.
Scripting Tools aimed at Deploying an Application:
The general pattern with tools in this category is that you create a script and/or config file, often with sets of commands similar to a Makefile, and the tool will ssh over to your production box, do a checkout of your source, and run whatever other steps are necessary.
Tools in this area usually have facilities for rolling back to a previous version. They'll check out your source to a directory under releases/ and, if all goes well, point a "current" symbolic link at the new release. If there's a problem, you can revert to the previous version by running a command that removes "current" and links it to the previous release directory.
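Stripped of any particular tool, the convention is roughly this (a sketch in plain Python; the timestamped directory names are made up):

    # The releases/current symlink convention, independent of any tool.
    import os

    releases = sorted(os.listdir("releases"))      # e.g. ["2019-01-07", "2019-02-12"]
    newest = os.path.join("releases", releases[-1])

    if os.path.lexists("current"):
        os.remove("current")                       # drop the old link
    os.symlink(newest, "current")                  # rollback: re-link to releases[-2]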
Capistrano comes from the Rails community but is general-purpose. Users of Capistrano may be interested in deprec, a set of deployment recipes for Capistrano.
Vlad the Deployer is an alternative to Capistrano, again from the Rails community.
Write your own shell script or Makefile.
Options for getting the files to the production box:
Direct checkout from source. Not always possible if your production boxes lack development tools, specifically source code management tools.
Checkout source locally, then tar/zip it up. Use scp or rsync to copy the tarball over. This is sometimes preferred for something like an Amazon EC2 deployment, where a compressed tarball can save time/bandwidth.
Checkout source locally, then rsync it over to the production box.
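In its simplest form, that last option is just a small staging script plus rsync. A hedged sketch in Python; the manifest, paths, and host names below are all made up for illustration:

    #!/usr/bin/env python
    # Build a per-machine tree under stage/, then rsync it out.
    import pathlib
    import shutil
    import subprocess

    # Which files each machine should receive (hypothetical layout).
    MANIFEST = {
        "batch01.example.com": ["bin/indexer", "scripts/nightly.sh", "conf/batch01.conf"],
        "batch02.example.com": ["bin/indexer", "scripts/hourly.sh", "conf/batch02.conf"],
    }

    for host, files in MANIFEST.items():
        stage = pathlib.Path("stage") / host
        shutil.rmtree(stage, ignore_errors=True)
        for name in files:
            dest = stage / name
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(name, dest)
        # Trailing slash on the source: sync the staged tree's contents.
        subprocess.run(
            ["rsync", "-az", "--delete", f"{stage}/", f"{host}:/opt/batch/"],
            check=True,
        )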
Packaging Tools
Use your OS's packaging system to generate packages containing the files for your app. Create a master package that has as dependencies the other packages you need. The RubyWorks system is an example of this, used to deploy a Rails stack and sample application. Then it's a matter of using apt, yum/rpm, Windows msi, or whatever to deploy a given version. Rollback involves uninstalling and reinstalling an old version.
General Tools Aimed at Installing Apps/Configs and Maintaining a Set of Systems
These tools do not specifically target the problem of deploying a web app, but rather the more general problem of deploying/maintaining Apps/Configs for a set of servers, or an entire company's workstations. They are aimed more at the system administrator than the web developer, though either can find them useful.
Cfengine is a tool in this category.
Puppet aims to improve on Cfengine. It's got a learning curve but many find it worth the time to figure out how to do the configs. Once you've got it going, each box checks the central server periodically and makes sure everything is up to date. If someone edits a file or changes a permission, this is detected and corrected. So, unlike the deployment tools above, Puppet not only puts files in the right place for you, it ensures they stay that way.
Chef is a little younger than Puppet with a similar approach.
Smartfrog is another tool in this category.
Ansible works with plain YAML files and does not require agents running on the servers it manages.
For a comparison of these and many more tools in this category, see the Wikipedia article, Comparison of open source configuration management software.
Take a look at the cfengine tutorial to see if cfengine looks like the right tool for your situation. It may be a little too complicated for a small website, but if it is going to involve more computers and more configuration in the future, at some point you will end up using cfengine or something like that.
Create your own packages in the format your distribution uses, e.g. Debian packages (.deb). These can either be copied to each machine and installed manually, or you can set up your own repository, and add it to your list of sources.
Your packages should be set up so that the scripts they contain consult a configuration file, which is different on each host, depending on what scripts need to be run on each.
To tie it all together, you can create a meta package that just depends on each of the other packages you create. That way, when you set up a new server, you install that one meta package, and the other packages are brought in as dependencies.
Although this process sounds a bit complicated, if you have many scripts and many hosts to deploy them to, it can really pay off in the long run.
I have to roll out PHP scripts and Apache configurations to several customers on a frequent basis. Since they all run Debian Linux, I've set up a Debian package repository on my server, and all a customer has to do is type apt-get upgrade to get the latest version.
The first thing to do is get all these scripts into a source control repository (svn or git are good) so that you can track changes to these scripts over time.
If you are interested in Ruby, check out Capistrano; it is well suited to deploying things to multiple machines in a cluster, and it is fairly easy to set up. It can read files directly from your version control system.
Puppet is another tool that can be used in this situation. It is similar to cfengine: you create a model of the desired deployment, and Puppet figures out how to get the environment into that state.