Using Apache Cascading on Windows with Eclipse

I am starting to use the Cascading library, but all the information I can find is about Cascading on Linux. I have successfully run the Impatient examples on an Ubuntu server.
But I want to develop and test my application using Eclipse on Windows.
Is that possible? How can I do it?
Thanks

Glad to hear the "Impatient" examples helped out -
There are two concerns: (1) Windows and (2) Eclipse.
Hadoop runs on Java and is primarily meant for running apps on clusters. You must be careful on Windows, because Hadoop's Java support there is problematic. I've seen many students attempt to use Cygwin, thinking that would provide a Java layer; it does not. Running Hadoop atop Cygwin is typically more trouble than it's worth. Microsoft's HDInsight work is a great way to run Hadoop on Windows, on Azure. To run Hadoop on your Windows desktop, it's best to use a virtual machine. Then be certain to run in "Standalone Mode", rather than in pseudo-distributed mode or trying to create a cluster on your desktop. Otherwise, it'd be better to run Cascading apps in HDInsight for Hadoop on Azure.
Eclipse is a much simpler answer. The Gradle build scripts for the "Impatient" series show how to use "gradle eclipse" to generate a project you can import into your IDE. Even so, you may have to clean up some paths; from what I've seen, Eclipse doesn't handle Gradle imports as cleanly as it should.
Hope that helps -

To develop and test your Cascading application using Eclipse on Windows, you need to apply a patch (https://github.com/congainc/patch-hadoop_7682-1.0.x-win). Download the patch jar and add it to your application's CLASSPATH. Then, in your code, set the "fs.file.impl" property when running on Windows:
import java.util.Properties;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.property.AppProps;

Properties properties = new Properties();
AppProps.setApplicationJarClass(properties, Main.class);
// On Windows, use the patched local file system class shipped in the patch jar
if (System.getProperty("os.name").toLowerCase().indexOf("win") >= 0) {
    properties.put("fs.file.impl",
        "com.conga.services.hadoop.patch.HADOOP_7682.WinLocalFileSystem");
}
HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);
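Because the override should only apply on Windows, it can help to pull the check into a small helper that can be unit-tested on any OS without Cascading on the classpath. This is a minimal sketch: `WindowsProps`, `isWindows` and `withLocalFsFix` are hypothetical names, and the `startsWith("windows")` test is a slightly stricter variant of the `indexOf("win")` check above, not something the patch itself requires.

```java
import java.util.Locale;
import java.util.Properties;

final class WindowsProps {
    // Class name taken from the patch jar referenced above.
    static final String WIN_FS =
        "com.conga.services.hadoop.patch.HADOOP_7682.WinLocalFileSystem";

    // On Windows the JVM reports os.name values such as "Windows 10".
    // Matching the prefix is stricter than indexOf("win"), which would accept
    // any os.name that merely contains those three letters.
    static boolean isWindows(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Adds the fs.file.impl override only when the given OS name is Windows.
    static Properties withLocalFsFix(Properties props, String osName) {
        if (isWindows(osName)) {
            props.put("fs.file.impl", WIN_FS);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = withLocalFsFix(new Properties(), System.getProperty("os.name"));
        System.out.println(props);
    }
}
```

In the Cascading code, the returned `Properties` would then be passed to `new HadoopFlowConnector(...)` exactly as before.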


Deployment strategies for Go services?

I'm writing some new web services in Go.
What are some deployment strategies I can use, regardless of the target platform? For example, I'm developing on a Mac, but the staging/production servers will be running Linux.
Are there some existing deployment tools I can use that support Go? If not, what are some things I can do to streamline the process?
I use LiteIDE for development. Is there any way to hook LiteIDE into the deployment process?
Unfortunately, since Go is such a young language, not much exists yet, or at least it has been hard to find. I would also be interested in the development of such tools for Go.
What I have found is that some people have been doing it themselves, or they've adapted other tools, such as Capistrano, to do it for them.
Most likely it's something you'll have to do yourself. And you don't have to limit yourself to shell scripts: do it in Go! In fact, many of the Go tools are written in Go. You should avoid compiling on the target system, as it's usually bad practice to have build tools on your production system. Go makes it really easy to cross-compile binaries. For example, this is how you compile for ARM and Linux:
GOARCH=arm GOOS=linux go build myapp
One thing you could do is hop on the #go-nuts freenode IRC channel or join the Go mailing list and ask other Gophers what they're doing.
Capistrano sounds like a good idea for deployment alone. You can also do cross-compilation as Luke suggested. Both will work just fine.
More generally, though, I'm also kind of torn between OS X (development) and Linux (deployment), and in fact I ended up just developing in a virtual machine via VirtualBox and Vagrant. I'm using TextMate 2 for text editing, but installing many development tools on a Mac is a major PITA, and I'm more comfortable having Debian or the like running somewhere in the background. The bonus is that this virtual environment can mirror the deployment environment, so I can avoid surprises when I deploy my code, whatever the language.
I haven't tried it myself, but it appears you can cross compile golang (either with goxc or Dave Cheney's golang-crosscompile), albeit with some caveats.
But if you need to match the production environment, which you probably should most of the time, it's safest to do as Marcin suggested.
You can find some prebuilt VirtualBox images on http://virtualboxes.org/images/ although creating one yourself is pretty easy.
what are some things I can do to streamline the process?
The cross-compilation idea should be even more appealing with Go 1.5 (Q3 2015), as Dave Cheney details in "Cross compilation just got a whole lot better in Go 1.5":
Before:
For successful cross compilation you would need
compilers for the target platform, if they differed from your host platform, ie you’re on darwin/amd64 (6g) and you want to compile for linux/arm (5g).
a standard library for the target platform, which included some files generated at the point your Go distribution was built.
After (Go 1.5+):
With the plan to translate the Go compiler into Go coming to fruition in the 1.5 release the first issue is now resolved.
package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Printf("Hello %s/%s\n", runtime.GOOS, runtime.GOARCH)
}
Build for darwin/386:
% env GOOS=darwin GOARCH=386 go build hello.go
# scp to darwin host
$ ./hello
Hello darwin/386
Or build for linux/arm
% env GOOS=linux GOARCH=arm GOARM=7 go build hello.go
# scp to linux host
$ ./hello
Hello linux/arm
I'm developing on a Mac, but the staging/production servers will be running Linux.
Since the Go compiler is now written in Go, producing a Linux executable from your Mac should become straightforward.

Portable scripting language for deployment?

We have a set of Unix shell (ksh) scripts used to deploy our product.
Actually, it's a mixture of ksh, sed, awk and Ant code.
Our product runs only on AIX, so we did not try to make our scripts portable.
But now we need to run some of our scripts on Windows as well as on AIX.
We need to be able to create/update the DB from both AIX and Windows.
Currently we use ksh+Ant for this part of the functionality.
We have a requirement to install as few tools as possible on this Windows box.
In the best case it should be just the JRE plus our products.
What do you propose to use instead of ksh?
As far as I know, we could put the Groovy jar into our project and write this part of the functionality in Groovy.
In that case the code would be portable.
Maybe there are better solutions than Groovy?
Any JVM language such as Jython or Scala should work as well as Groovy, so it's really a choice of what the developers are comfortable with. I've had good success with Groovy and have been able to bundle Groovy as a jar file and execute any script I wanted in the following way:
java -jar groovy.jar myscript.groovy
I’ve been able to do this on z/OS, Windows, and Linux.
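If even a Groovy jar is more than the Windows box should carry, a typical sed-style step (filling values into a config or SQL template before running it) can also be done in plain Java, which needs nothing beyond the JRE. This is a minimal sketch, assuming a hypothetical `@KEY@` placeholder syntax; the class and method names are illustrative only, and `Map.of` needs Java 9 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateSubst {
    // Hypothetical placeholder syntax: @KEY@, where KEY is upper case letters, digits or underscores.
    private static final Pattern TOKEN = Pattern.compile("@([A-Z0-9_]+)@");

    // Replaces every @KEY@ with vars.get(KEY). Unknown placeholders are left as they are,
    // much like a sed script that only knows some of the keys.
    static String substitute(String template, Map<String, String> vars) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), m.group(0));
            // quoteReplacement stops '$' or '\' in values being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "CREATE USER @DB_USER@ IDENTIFIED BY '@DB_PASS@';";
        System.out.println(substitute(sql, Map.of("DB_USER", "app", "DB_PASS", "s3cr$t")));
    }
}
```

Compiled into the product's own jar, a class like this runs on both AIX and Windows with only the JRE installed.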

Is there a way to view commands run in the background in Eclipse?

I use Eclipse Helios for developing J2ME applications. I wanted to know if there is a way to see the background command-line commands that are executed for each click/action in the GUI (e.g. Compile, Run, Create J2ME package). I am interested in this so I can run through the process using a script.
There is nothing specific you can do to see all the commands run in Eclipse. First, there is no one-to-one mapping between code executed in Eclipse and the java commands you might run from the command line. So, even though you might be able to view individual commands in vim or Emacs, you won't be able to do this in Eclipse.
However, there are some things that may help. There is a platform tracing facility that will print some trace messages to stdout. However, this is not widely used outside of core Eclipse projects so you won't be able to get full tracing on all commands. Also, this facility is not meant to be a general tracing facility, but really only for a few plugins that you are interested in. You can find more information about it here:
http://wiki.eclipse.org/FAQ_How_do_I_use_the_platform_debug_tracing_facility%3F
So, you will need to start Eclipse with the -debug option and create a .options file in your Eclipse install directory with something like the following contents:
org.eclipse.platform/debug=true
org.eclipse.ui/debug=true
org.eclipse.core.runtime/debug=true
org.eclipse.core.resources/debug=true
org.eclipse.core.commands/debug=true
org.eclipse.core.filesystem/debug=true
org.eclipse.core.jobs/debug=true
These are a few of the low-level Eclipse plugins that will very likely contain some tracing information. However, without knowing more about what you are trying to do, it is hard to recommend specific plugins to trace.
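Since the .options file is just one `plugin-id/debug=true` line per plugin, it can also be generated from a list of plugin IDs, which is handy when switching traced plugins on and off between runs. This is a minimal sketch; `TraceOptions` and its methods are hypothetical helpers, and Eclipse still has to be started with `-debug` for the file to take effect.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

final class TraceOptions {
    // Builds one "<plugin-id>/debug=true" line per plugin, matching the format shown above.
    static String render(List<String> pluginIds) {
        StringBuilder sb = new StringBuilder();
        for (String id : pluginIds) {
            sb.append(id).append("/debug=true").append(System.lineSeparator());
        }
        return sb.toString();
    }

    // Usage: java TraceOptions <eclipse-install-dir>
    public static void main(String[] args) throws IOException {
        Path target = Paths.get(args[0], ".options");
        List<String> plugins = List.of(
            "org.eclipse.core.runtime",
            "org.eclipse.core.resources",
            "org.eclipse.core.jobs");
        Files.write(target, render(plugins).getBytes(StandardCharsets.UTF_8));
    }
}
```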

Running java without installing jre?

As asked and answered here, Python has a useful way of deploying without installers. Can Java do the same thing?
Is there any way to run a Java jar file without installing a JRE?
Is there a tool like java2exe (win32), java2bin (linux) or java2app (mac)?
You can use Launch4j for this. It's well documented and easy to use. While the resulting program still needs a JRE to run, you don't have to install the JRE on the target system. You can just copy it along with your application and tell Launch4j where to find it, or wrap it up with everything else.
For creating native executables, you can use Excelsior JET, which compiles Java to native code. We used it for a project at work, and we had to make zero modifications to the original source code (which targeted Sun's JDK).
You can embed the JRE inside your application and create a setup or installer for your application.
You can have a look at
http://www.bearcave.com/software/java/comp_java.html
You might find what you want there.
You might want to check out how Eclipse does it - it has a native .exe that can use a local (to the installation) JRE.
You might have some luck with GCJ, though I haven't tried it myself.
You can do it with NetBeans and a couple of tools. The result is a standalone installer that packages everything you need, so your software can run without installing a JRE. It is also completely portable, because it installs your software in AppData, which means it does not need elevated privileges to be installed. You may even be able to configure the installation path, or you can install it on your own PC, locate the folder and copy it to distribute your software that way.
Check the answer I gave on a different post.
You can use jlink to create your own customized JRE containing only the modules your application needs to run. This deployment method is really efficient. Please follow this link for one such example.
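One way to check that an application really runs on a jlink-built runtime is to have it report its `java.home` and the modules in its boot layer. This is a minimal sketch (Java 9 or later, since jlink and the module system arrived in Java 9); `RuntimeInfo` is a hypothetical class name.

```java
import java.util.Set;
import java.util.TreeSet;

final class RuntimeInfo {
    // Names of all modules resolved in the boot layer of the running JVM.
    // On a jlink-built runtime, only modules linked into the image can appear here.
    static Set<String> bootModules() {
        Set<String> names = new TreeSet<>();
        for (Module m : ModuleLayer.boot().modules()) {
            names.add(m.getName());
        }
        return names;
    }

    static boolean hasModule(String name) {
        return ModuleLayer.boot().findModule(name).isPresent();
    }

    public static void main(String[] args) {
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println("modules   = " + bootModules());
    }
}
```

This class only needs `java.base`, so on recent JDKs something like `jlink --add-modules java.base --output custom-jre` should produce a runtime that can run it (`custom-jre/bin/java -cp . RuntimeInfo` after compiling with the full JDK). A real application would list more modules in `--add-modules`.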

Best practices for deploying tools & scripts to production?

I've got a number of batch processes that run behind the scenes for a Linux/PHP website. They are starting to grow in number and complexity, so I want to bring a small amount of process to bear on them.
My source tree has a bunch of cpp files and scripts, organized with development but not deployment in mind. After compiling all the executables, I need to put various scripts and binaries on a cluster of machines. Different machines need different executables, scripts, and config files for their batch processes. I also have a few tools that I've written that belong on every machine. At the moment, this deployment process is manual and error prone.
I'm guessing I'm just going to end up with a script that runs at the root of the source tree and builds a smaller tree of everything necessary for any of the machines. Then, I'll just rsync that to the appropriate machines. But I'm curious how other people are managing this type of problem. Any ideas?
There are several categories of tools here. Some people use a combination of tools from these categories; I sometimes use, for example, both Puppet and Capistrano. See "Puppet or Capistrano - Use the Right Tool for the Job" for a discussion.
Scripting Tools aimed at Deploying an Application:
The general pattern with tools in this category is that you create a script and/or config file, often with sets of commands similar to a Makefile, and the tool will ssh over to your production box, do a checkout of your source, and run whatever other steps are necessary.
Tools in this area usually have facilities for rolling back to a previous version. So they'll check out your source to a releases/ directory, and create a symbolic link from "current" to "releases/" if all goes well. If there's a problem, you can revert to the previous version by running a command that will remove "current" and link it to the previous releases/ directory.
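The releases/ plus "current" symlink layout described above can be sketched in a few lines of Java NIO. This is a minimal illustration of the pattern, not how Capistrano or Vlad implement it; the class name, the `releases` and `current` paths under a deploy root, and the assumption that release directory names sort chronologically (for example timestamps) are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

final class ReleaseLinks {
    // Points <root>/current at the given release directory by removing the old link
    // and creating a new one, as described above. Note: delete + create is not atomic.
    static void activate(Path root, Path release) throws IOException {
        Path current = root.resolve("current");
        Files.deleteIfExists(current);              // removes the link itself, not its target
        Files.createSymbolicLink(current, release);
    }

    // Re-points "current" at the release that sorts just before the active one
    // and returns it. Assumes release names sort chronologically.
    static Path rollback(Path root) throws IOException {
        Path active = Files.readSymbolicLink(root.resolve("current")).getFileName();
        List<Path> releases = new ArrayList<>();
        try (Stream<Path> s = Files.list(root.resolve("releases"))) {
            s.filter(Files::isDirectory).forEach(releases::add);
        }
        Collections.sort(releases);
        for (int i = releases.size() - 1; i > 0; i--) {
            if (releases.get(i).getFileName().equals(active)) {
                Path previous = releases.get(i - 1);
                activate(root, previous);
                return previous;
            }
        }
        throw new IOException("no release earlier than " + active);
    }
}
```

Deleting and re-creating the link leaves a brief window with no "current". On Linux, one way to avoid that is to create the new link under a temporary name and rename it over the old one, since rename(2) replaces the target atomically.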
Capistrano comes from the Rails community but is general-purpose. Users of Capistrano may be interested in deprec, a set of deployment recipes for Capistrano.
Vlad the Deployer is an alternative to Capistrano, again from the Rails community.
Write your own shell script or Makefile.
Options for getting the files to the production box:
Direct checkout from source. Not always possible if your production boxes lack development tools, specifically source code management tools.
Checkout source locally, then tar/zip it up. Use scp or rsync to copy the tarball over. This is sometimes preferred for something like an Amazon EC2 deployment, where a compressed tarball can save time/bandwidth.
Checkout source locally, then rsync it over to the production box.
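The approach from the question (a script at the root of the source tree that builds a smaller per-machine tree, which is then rsynced or tarred up) might look like this as a minimal Java sketch. The machine names, file lists and staging layout are hypothetical; in practice the manifest would live in a config file rather than in code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.Map;

final class StageTree {
    // Copies each listed file (relative to srcRoot) into stagingRoot/<machine>/<same relative path>.
    // The resulting stagingRoot/<machine> directory is what would be copied to that host.
    static void stage(Path srcRoot, Path stagingRoot, Map<String, List<String>> manifest)
            throws IOException {
        for (Map.Entry<String, List<String>> e : manifest.entrySet()) {
            Path machineDir = stagingRoot.resolve(e.getKey());
            for (String rel : e.getValue()) {
                Path from = srcRoot.resolve(rel);
                Path to = machineDir.resolve(rel);
                Files.createDirectories(to.getParent());
                // COPY_ATTRIBUTES keeps the modification time and, on Linux,
                // the permission bits (such as the executable bit on scripts).
                Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.COPY_ATTRIBUTES);
            }
        }
    }
}
```

A manifest such as `Map.of("batch1", List.of("bin/rotate_logs", "conf/batch1.conf"))` would then yield a `staging/batch1` tree containing only what that machine needs.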
Packaging Tools
Use your OS's packaging system to generate packages containing the files for your app. Create a master package that has as dependencies the other packages you need. The RubyWorks system is an example of this, used to deploy a Rails stack and sample application. Then it's a matter of using apt, yum/rpm, Windows msi, or whatever to deploy a given version. Rollback involves uninstalling and reinstalling an old version.
General Tools Aimed at Installing Apps/Configs and Maintaining a Set of Systems
These tools do not specifically target the problem of deploying a web app, but rather the more general problem of deploying/maintaining Apps/Configs for a set of servers, or an entire company's workstations. They are aimed more at the system administrator than the web developer, though either can find them useful.
Cfengine is a tool in this category.
Puppet aims to improve on Cfengine. It's got a learning curve but many find it worth the time to figure out how to do the configs. Once you've got it going, each box checks the central server periodically and makes sure everything is up to date. If someone edits a file or changes a permission, this is detected and corrected. So, unlike the deployment tools above, Puppet not only puts files in the right place for you, it ensures they stay that way.
Chef is a little younger than Puppet with a similar approach.
Smartfrog is another tool in this category.
Ansible works with plain YAML files and does not require agents running on the servers it manages.
For a comparison of these and many more tools in this category, see the Wikipedia article, Comparison of open source configuration management software.
Take a look at the cfengine tutorial to see if cfengine looks like the right tool for your situation. It may be a little too complicated for a small website, but if it is going to involve more computers and more configuration in the future, at some point you will end up using cfengine or something like that.
Create your own packages in the format your distribution uses, e.g. Debian packages (.deb). These can either be copied to each machine and installed manually, or you can set up your own repository, and add it to your list of sources.
Your packages should be set up so that the scripts they contain consult a configuration file, which is different on each host, depending on what scripts need to be run on each.
To tie it all together, you can create a meta package that just depends on each of the other packages you create. That way, when you set up a new server, you install that one meta package, and the other packages are brought in as dependencies.
Although this process sounds a bit complicated, if you have many scripts and many hosts to deploy them to, it can really pay off in the long run.
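For the per-host configuration file mentioned above, one simple shape is a properties file listing which of the packaged scripts this host should run. This is a minimal sketch; the `scripts.enabled` key, the comma-separated format and the `HostScripts` class are hypothetical, not part of any Debian packaging convention.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class HostScripts {
    // Reads a hypothetical per-host file such as:
    //   scripts.enabled = rotate_logs, rebuild_index
    // and returns the script names, trimmed, in the order given.
    static List<String> enabledScripts(Reader config) throws IOException {
        Properties p = new Properties();
        p.load(config);
        List<String> names = new ArrayList<>();
        for (String s : p.getProperty("scripts.enabled", "").split(",")) {
            if (!s.isBlank()) {
                names.add(s.trim());
            }
        }
        return names;
    }

    // A script shipped in the package runs only if this host's config lists it.
    static boolean shouldRun(String script, List<String> enabled) {
        return enabled.contains(script);
    }

    public static void main(String[] args) throws IOException {
        String hostConfig = "scripts.enabled = rotate_logs, rebuild_index\n";
        List<String> enabled = enabledScripts(new StringReader(hostConfig));
        System.out.println(enabled);
        System.out.println(shouldRun("send_reports", enabled));
    }
}
```

In a real package, a small wrapper would read the host's own file (its location is up to you) instead of the inline string, so the same package can be installed everywhere while each host runs only its own scripts.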
I have to roll out PHP scripts and Apache configurations to several customers on a frequent basis. Since they all run Debian Linux, I've set up a Debian package repository on my server, and all the customer has to do is type apt-get upgrade to get the latest version.
The first thing to do is get all these scripts into a source control repository (svn or git are good) so that you can track changes to these scripts over time.
If you are interested in Ruby, check out Capistrano. It is well suited to deploying things to multiple machines in a cluster and is fairly easy to set up. It can read files directly from your version control system.
Puppet is another tool that can be used in this situation. It is similar to cfengine: you create a model of the desired deployment, and Puppet figures out how to get the environment to that state.