Running Beats with docker or without? - elastic-stack

As Beats are data shippers, supposed to ship files, metrics, and so on from a machine, is it a good approach to run Beats with Docker, so that Beats itself is containerized?
I currently have the issue that I want to ship log files from an application, and if I install Filebeat with Docker, I somehow have to provide the logs to the container. Is it a good approach to do something like this with Docker, or should I install Filebeat normally, configure it, and run it on the machine without a container?

I don't think there is a strict rule, but I would use the same approach for your application as for the Beats — either containerize both or neither. That will also help you keep the expectations and setups aligned: logging to a file vs stdout and how to collect that.
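If you do containerize Filebeat, the usual pattern is to bind-mount the host's log directory into the container read-only and point the Filebeat config at the mounted path. A minimal sketch, assuming the official Filebeat image; /var/log/myapp and the version tag are placeholders:

docker run -d --name filebeat \
  -v /var/log/myapp:/var/log/myapp:ro \
  -v "$PWD/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro" \
  docker.elastic.co/beats/filebeat:8.13.4

The filebeat.yml you mount in would then declare its inputs against /var/log/myapp/*.log, exactly as it would when running directly on the host.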

Related

How to share a Jupyter installation for team notebook development

I'm developing a Jupyter Notebook for my team to use to catalogue and analyse some proprietary data. I'm ready to share it with the team for ongoing execution and development. The team generally have Windows 10 workstations and are skilled engineers, though not data scientists. No one currently uses Jupyter.
I now realise I might have thoroughly misjudged Jupyter's ability to support this sort of working environment.
Option 1: Individual installations
This is the worst-case scenario. Anyone who wants to run or modify the notebook needs to install Jupyter. Anaconda is probably the best way to go, but it's a big, ugly, scary install. Worse, every user will have to install and manage additional libraries. Any notebook change that requires a kernel change will have to be manually applied to each installation.
Surely, being client-server, this is not the intention of Jupyter.
Option 2: One server, many clients
The obvious alternative is to host the Jupyter server on a network accessible computer and have all users connect to it with a browser. That way there's only one shared installation to manage and each user just needs a URL to access it.
But there's a gotcha - the server expects the notebook to be on its own file system! So every user will access the same notebook file. This makes version control very problematic - no one can check out their own copy of the notebook for independent edit and commit sessions. Instead, changes will overwrite the only copy, and commits/reverts/diffs will have to be done on the server (or by mounting the server's file system).
Option 3: Server in Docker image, each user runs a container
Docker to the rescue? That way we can build/maintain one server image (and even version control it) and each user only needs to have a Docker engine installed to instantiate the image (which is a friendly 8GB download!!). They connect to their own container which, with a bit of scripting trickery, will be pointing at their own copy of the notebook.
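For what it's worth, the per-user container idea boils down to a one-liner like the following, run from each user's Windows command prompt; the jupyter/scipy-notebook image and the notebooks folder under the user's profile are assumptions for the sake of the sketch:

docker run -p 8888:8888 -v "%USERPROFILE%\notebooks:/home/jovyan/work" jupyter/scipy-notebook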
This option only took 20 hours to investigate before discovering that it fundamentally sucks. Working with the kernel is tricky, with lots of new skills necessary. But more of a showstopper: nothing that shows a Qt window will work. The qtconsole we can do without, but part of our notebook shows a File Open dialog, and the best way to do that is with a Qt widget. With the server in a Docker container expecting an X Windows environment, and the client in a Windows browser, the widget cannot be shown.
The Qt issue was the last of many, many issues trying to get the Docker option running. Everything from matplotlib to path mapping, from os library calls to ipywidgets needed to be investigated, tweaked, Googled, chopped and changed to work. I'm fairly convinced that these dramas would be ongoing.
Conclusion
There are lots of discussions around Jupyter version control. There's lots of options for read-only sharing. And there's even a project for runtime-building a Docker container to provide executable access to a notebook. But there is scant advice on using Jupyter in a team environment.
Given the endless complications when the server is not natively running on the same machine as the client, I'm starting to believe Option 1 is the only sane way to go. Before I go to my colleagues with the crappy news, are there any other suggestions?
Ended up having a fruitful discussion on the Jupyter Google Group and have confirmed that out-of-the-box, Jupyter does not support this sort of working environment. Indeed, crucially, Jupyter expects the server to have a single user.
The most promising DIY solution was firstly to deploy JupyterHub, for two reasons:
It launches a new server instance for each user, preventing any multi-user per server issues.
It prompts for users to identify themselves, allowing different actions to be taken depending on the user.
And secondly, have the server mount each user's file system (or equivalent network architecture), so it can point the user back to their own local files.
I have not implemented this strategy (making do with Option 1 for now!) but it certainly makes sense.
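For anyone heading down that road, standing up a bare JupyterHub on a shared Linux box looks roughly like this (a sketch only; the per-user file-system mounting still has to be solved separately):

python3 -m pip install jupyterhub jupyterlab
npm install -g configurable-http-proxy   # the proxy JupyterHub routes through
jupyterhub --generate-config             # writes jupyterhub_config.py
jupyterhub -f jupyterhub_config.py       # spawns one single-user server per login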

Running SBT (Scala) on several (cluster) machines at the same time

So I've been playing with Akka Actors for a while now, and have written some code that can distribute computation across several machines in a cluster. Before I run the "main" code, I need to have an ActorSystem waiting on each machine I will be deploying over, and I usually do this via a Python script that SSH's into all the machines and starts the process by doing something like cd /into/the/proper/folder/ and then sbt 'run-main ActorSystemCode'.
I run this Python script on one of the machines (call it "Machine X"), so I will see the output of SSH'ing into all the other machines in my Machine X SSH session. Whenever I do run the script, it seems all the machines are re-compiling the entire code before actually running it, making me sit there for a few minutes before anything useful is done.
My question is this:
Why do they need to re-compile at all? The same JVM is available on all machines, so shouldn't it just run immediately?
How do I get around this problem of making each machine compile "its own copy"?
sbt is a build tool, not an application runner. Use sbt-assembly to build an all-in-one jar, put that jar on each machine, and run it with the scala or java command.
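Concretely, that workflow looks something like this; a sketch that assumes the sbt-assembly plugin is already added to the project, with hypothetical jar and host names:

sbt assembly                           # compile once, on your build machine
scp target/scala-2.11/myapp-assembly-1.0.jar node01:/opt/myapp/
ssh node01 'java -cp /opt/myapp/myapp-assembly-1.0.jar ActorSystemCode'

Because the jar already contains the compiled classes and every dependency, nothing is recompiled on the worker machines.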
It's usual for a cluster to have a single partition mounted on every node (via NFS or Samba). You just need to copy the artifact to that partition and it will be directly accessible on each node. If that's not the case, you should ask your sysadmin to set one up.
Then you will need to launch the application. Again, most clusters come with MPI. The tools mpirun (or mpiexec) are not restricted to real MPI applications and will launch any script you want on several nodes.
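As a hedged illustration (Open MPI syntax; the host names and jar path are placeholders):

mpirun -np 4 --host node01,node02,node03,node04 \
  java -cp /shared/myapp-assembly-1.0.jar ActorSystemCode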

How do I setup an init.d rc script for a Daemon-kit project?

I am using the Ruby Daemon-kit to set up a service that does various background operations for my Rails application.
It works fine when I call it on the command line:
./bin/bgservice
How do I go about creating an init.d start script for it, so that it will autostart on reboot?
There are a few approaches:
You could write /etc/init.d/ scripts that could be placed into the /etc/rc?.d/ directories (or wherever they live on your target distributions). Some details on this mechanism can be found in the Debian policy guidelines and the openSUSE initscript tutorial. There's an annoying number of distribution-specific idiosyncrasies in initscripts, so don't feel bad about writing a simple one and asking distributions to contribute 'better' ones tailored for their environment. (For example, any Debian-derived distribution will provide the immensely useful start-stop-daemon(8) helper, but it is sorely missing from other distributions.)
You could write upstart job specifications for the distributions that support upstart (which I think is Ubuntu, Google ChromeOS, Fedora, .. more?). upstart documentation is still pretty weak, but there are some details and plenty of examples in /etc/init/ on Ubuntu, probably the same location in other distributions that use upstart. Getting the dependencies correct may be some work across all distributions, but upstart job specifications look far simpler to write and maintain than initscripts.
You could add lines to /etc/inittab on distributions that still support the standard SysV-init inittab(5) file. This would only be useful if your program doesn't do the usual daemon fork(2)/setsid(2)/fork(2) incantation, as init uses the pid it gets from fork(2) to determine if your program needs to be restarted.
Modern Vixie cron(8) supports an @reboot specifier in crontab(5) files. This can be used both by the system crontab and by user crontabs, which might be nice if you just want to run the program as your usual login account.
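The cron route is the least distribution-specific of the four. A minimal sketch, with a hypothetical path to the daemon (this assumes a crontab already exists; otherwise just add the line with crontab -e):

(crontab -l; echo '@reboot /home/deploy/app/bin/bgservice') | crontab -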
As the author of daemon-kit, I've avoided making any init-style scripts due to having to cope with the various distributions and their migrations from the old SysV init style to the newer upstart/insserv, saving myself a nightmare.
The way I recommend doing this is to use the god config generator, ensure god is started on boot (by runit or some other means), and have god start the daemon initially and keep it running.
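In shell terms, the supervision end of that might look like the following; the config path is a hypothetical stand-in for whatever the generator produces:

god -c config/bgservice.god -D   # -D keeps god in the foreground, handy under runit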
At best I'll expand daemon-kit to be able to generate runit scripts for boot...
HTH.

Best practices for deploying tools & scripts to production?

I've got a number of batch processes that run behind the scenes for a Linux/PHP website. They are starting to grow in number and complexity, so I want to bring a small amount of process to bear on them.
My source tree has a bunch of cpp files and scripts, organized with development but not deployment in mind. After compiling all the executables, I need to put various scripts and binaries on a cluster of machines. Different machines need different executables, scripts, and config files for their batch processes. I also have a few tools that I've written that belong on every machine. At the moment, this deployment process is manual and error-prone.
I'm guessing I'm just going to end up with a script that runs at the root of the source tree and builds a smaller tree of everything necessary for any of the machines. Then, I'll just rsync that to the appropriate machines. But I'm curious how other people are managing this type of problem. Any ideas?
There are several categories of tools here. Some people use a combination of tools from these categories. I sometimes use, for example, both Puppet and Capistrano. See Puppet or Capistrano - Use the Right Tool for the Job for a discussion.
Scripting Tools aimed at Deploying an Application:
The general pattern with tools in this category is that you create a script and/or config file, often with sets of commands similar to a Makefile, and the tool will ssh over to your production box, do a checkout of your source, and run whatever other steps are necessary.
Tools in this area usually have facilities for rollback to a previous version. They'll check out your source to a timestamped directory under releases/ and create a symbolic link from "current" to it if all goes well. If there's a problem, you can revert to the previous version by running a command that removes "current" and links it to the previous releases/ directory.
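The resulting layout on the production box, sketched with hypothetical paths and timestamps:

/var/www/app/releases/20240101120000/   # each deploy gets its own directory
/var/www/app/current -> releases/20240101120000
ln -sfn /var/www/app/releases/20231231090000 /var/www/app/current   # rollback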
Capistrano comes from the Rails community but is general-purpose. Users of Capistrano may be interested in deprec, a set of deployment recipes for Capistrano.
Vlad the Deployer is an alternative to Capistrano, again from the Rails community.
Write your own shell script or Makefile.
Options for getting the files to the production box:
Direct checkout from source. Not always possible if your production boxes lack development tools, specifically source code management tools.
Checkout source locally, then tar/zip it up. Use scp or rsync to copy the tarball over. This is sometimes preferred for something like an Amazon EC2 deployment, where a compressed tarball can save time/bandwidth.
Checkout source locally, then rsync it over to the production box.
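A minimal sketch of the tarball variant, with hypothetical host and paths:

tar -czf /tmp/app.tar.gz -C ~/checkout/myapp .   # pack a local checkout
scp /tmp/app.tar.gz prod01:/tmp/
ssh prod01 'mkdir -p /opt/app && tar -xzf /tmp/app.tar.gz -C /opt/app'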
Packaging Tools
Use your OS's packaging system to generate packages containing the files for your app. Create a master package that has as dependencies the other packages you need. The RubyWorks system is an example of this, used to deploy a Rails stack and sample application. Then it's a matter of using apt, yum/rpm, Windows msi, or whatever to deploy a given version. Rollback involves uninstalling and reinstalling an old version.
General Tools Aimed at Installing Apps/Configs and Maintaining a Set of Systems
These tools do not specifically target the problem of deploying a web app, but rather the more general problem of deploying/maintaining Apps/Configs for a set of servers, or an entire company's workstations. They are aimed more at the system administrator than the web developer, though either can find them useful.
Cfengine is a tool in this category.
Puppet aims to improve on Cfengine. It's got a learning curve but many find it worth the time to figure out how to do the configs. Once you've got it going, each box checks the central server periodically and makes sure everything is up to date. If someone edits a file or changes a permission, this is detected and corrected. So, unlike the deployment tools above, Puppet not only puts files in the right place for you, it ensures they stay that way.
Chef is a little younger than Puppet with a similar approach.
Smartfrog is another tool in this category.
Ansible works with plain YAML files and does not require agents running on the servers it manages.
For a comparison of these and many more tools in this category, see the Wikipedia article, Comparison of open source configuration management software.
Take a look at the cfengine tutorial to see if cfengine looks like the right tool for your situation. It may be a little too complicated for a small website, but if it is going to involve more computers and more configuration in the future, at some point you will end up using cfengine or something like that.
Create your own packages in the format your distribution uses, e.g. Debian packages (.deb). These can either be copied to each machine and installed manually, or you can set up your own repository, and add it to your list of sources.
Your packages should be set up so that the scripts they contain consult a configuration file, which is different on each host, depending on what scripts need to be run on each.
To tie it all together, you can create a meta package that just depends on each of the other packages you create. That way, when you set up a new server, you install that one meta package, and the other packages are brought in as dependencies.
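On Debian, one lightweight way to sketch that meta package is with equivs; the package names here are hypothetical:

cat > mycompany-batch.ctl <<'EOF'
Package: mycompany-batch
Version: 1.0
Depends: mycompany-tools, mycompany-cron-scripts
Description: Meta package pulling in every batch-processing piece
EOF
equivs-build mycompany-batch.ctl   # produces mycompany-batch_1.0_all.deb
apt-get install mycompany-batch    # from your repository, pulls in the dependencies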
Although this process sounds a bit complicated, if you have many scripts and many hosts to deploy them to, it can really pay off in the long run.
I have to roll out PHP scripts and Apache configurations to several customers on a frequent basis. Since they all run Debian Linux, I've set up a Debian package repository on my server, and all the customer has to do is type apt-get upgrade and they get the latest version.
The first thing to do is get all these scripts into a source control repository (svn or git are good) so that you can track changes to these scripts over time.
If you are interested in Ruby, check out Capistrano; it is well suited to deploying things to multiple machines in a cluster, and is fairly easy to set up. It can read files directly from your version control system.
Puppet is another tool that can be used in this situation. It is similar to cfengine - you create a model of the desired deployment, and Puppet figures out how to get the environment to that state.

Deploying Perl to a shared-nothing cluster

Anyone have suggestions for deployment methods for Perl modules to a shared-nothing cluster?
Our current method is very manual.
Take down half the cluster
Copy Perl modules (CPAN-style modules) to the downed cluster members
ssh to each member and run perl Makefile.PL; make; make install for each module to be installed
Confirm deployment
Put the newly deployed cluster members in service, take the old cluster members out of service, and repeat steps 2 -> 4
This is obviously far from optimal. Does anyone have or know of good tool chains for deploying Perl modules to a shared-nothing cluster?
Take one node offline, install Perl, and then use it to reimage the other nodes.
At least, that's how I imagine you'd want to install software in a shared-nothing cluster. Perl is just the application you happen to be installing.
Assuming all the machines are identical, you should be able to keep one canonical installation and use rsync or something to keep the others updated.
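For example, mirroring a canonical tree to another node (hypothetical paths and host):

rsync -az --delete /opt/perl/ node02:/opt/perl/   # --delete keeps replicas exact mirrors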
I have, in the past, developed a Perl program which used the Expect module (from CPAN) to automate basically the process you described, automatically sshing to each host, copying any necessary files, and performing the installations. Unfortunately, this was developed on-site for a client, so I do not have access to the code to share. If you're familiar with Expect, it shouldn't be too difficult to set up, though.
We currently have a clustered Perl application that does data processing. We also have numerous CPAN modules, and modules that we've developed, that the software depends on. When you say 'shared nothing', I'm assuming you're referring to not having things like NFS mounts.
If the machines have identical configurations, then you may be able to build your entire application into a single directory structure (e.g., /opt/my-app), tar it up, and that could become the only thing you need to push to the boxes.
As far as deploying it to the boxes, you might be able to use Capistrano. We developed a couple of our own cluster utilities that piggybacked off of ssh - I've released one form of that utility: parallel-jobs. Its README shows an example of executing multiple parallel ssh commands. It's a small step to extend that program to be able to know about your cluster and then be able to execute the same command across the cluster (as opposed to a series of different commands).
If you are using a Debian or Ubuntu OS, you could package your Perl modules - I have open-sourced some code to help with this: Perl module builder. It's still very rough, but it does work and can be made to work on your own code as well as CPAN modules, which then makes deployment much easier.
There is also a project to get Red Hat RPMs for all of CPAN; Dave Cross gave a talk, Perl in RPM-Land, which may be of use.
If you are on some other system which doesn't have packaging, then the rsync option (install on one machine and then rsync to the others) should work as well; note that you can mount a Windows share and rsync to it from Unix if needed.
Using a central manager like Puppet makes creating and maintaining machines in a cluster a lot easier, from installing code to managing users and email configuration. There is also a Perl project in the pipeline to do something similar, but this has not been made public yet.
Capistrano is a tool that allows you to run commands on a group of servers; it is perfectly suited to making your task considerably easier.
Further down the line of automation, but also complexity, is Puppet, which allows you to define a group of servers, give them roles, and then push out sets of code to every machine subscribing to a certain role.
I am not sure exactly what a shared-nothing cluster is, but if it uses some base *nix system like Fedora, Mandriva, or Ubuntu, many of the Perl modules are precompiled for specific architectures, and you can easily install those.
If these systems are of the same arch, you can do as someone else said and just copy the compiled modules from system to system; just make sure you have all of the dependencies on the recipient system as well.