MapReduce programs using Eclipse in CDH4

I am very new to Java, Eclipse and Hadoop, so please pardon me if my question seems too silly.
The question is:
I have a 3-node CDH4 cluster running RHEL5 on a cloud platform. The CDH4 setup is complete, and now I want to write some sample MapReduce programs to learn about it.
Here is my understanding of how to do it:
To write Java MapReduce programs I will have to install Eclipse on my main server, right? Which version of Eclipse should I go for?
Just installing Eclipse will not be enough; I will have to change some settings so that it can use my CDH cluster. What do I need to do for this?
Last but not least, could you please suggest some sites where I can get more information on this? Remember, I am just a beginner in all of this. :)
Thanks in advance!

Pankaj, you can always visit the official page. Apart from this, you might find these links helpful:
It is not mandatory to have Eclipse on the main server (if by "main server" you mean the master machine). Any of the last three versions of Eclipse work perfectly fine; I don't know about earlier versions. You can either run your job through Eclipse directly, or write your job in Eclipse and export it as a jar. You can then copy this jar to your JobTracker (JT) machine and execute it there from the shell using the hadoop jar command. If you are running your job directly through Eclipse, you need to tell it the location of your NameNode and JobTracker machines through these properties:
Configuration conf = new Configuration();
conf.set("", "hdfs://NN_HOST:9000");
conf.set("mapred.job.tracker", "JT_HOST:9001");
(Change the hostnames and ports as per your configuration).
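As a minimal sketch, assuming a CDH4 cluster running MRv1 and the Hadoop client jars on the Eclipse build path, the two settings could be wrapped in a small helper. The property names are my assumption of the standard MRv1 keys: `fs.default.name` holds the NameNode URI (newer Hadoop releases call it `fs.defaultFS`) and `mapred.job.tracker` holds the JobTracker host:port. The class name `RemoteClusterConf` is hypothetical, and NN_HOST/JT_HOST and the ports are placeholders, as in the answer.

```java
import org.apache.hadoop.conf.Configuration;

/** Hypothetical helper: points a client running inside Eclipse at a remote MRv1 cluster. */
public class RemoteClusterConf {

    static Configuration create(String nnHost, int nnPort, String jtHost, int jtPort) {
        Configuration conf = new Configuration();
        // NameNode URI (MRv1-era key; newer releases use "fs.defaultFS")
        conf.set("fs.default.name", "hdfs://" + nnHost + ":" + nnPort);
        // JobTracker address as host:port
        conf.set("mapred.job.tracker", jtHost + ":" + jtPort);
        return conf;
    }

    public static void main(String[] args) {
        // Placeholders: replace with your own hosts and ports
        Configuration conf = create("NN_HOST", 9000, "JT_HOST", 9001);
        System.out.println(conf.get("fs.default.name"));    // hdfs://NN_HOST:9000
        System.out.println(conf.get("mapred.job.tracker")); // JT_HOST:9001
    }
}
```

The returned Configuration would then be passed to the job you submit from Eclipse; everything else in the driver stays the same as when running from the shell.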
One quick suggestion, though: you can always search for these kinds of things before posting a question. A lot of information is available online and it is very easily accessible.


How do I get a server up and running in WebStorm?

Because of cross-domain issues, I need to run my code (which is HTML & JS) via a server in WebStorm. None of the instructions I can find are simple and straightforward. Can someone give me an idiot's guide to doing this?
Alternatively, I have Tomcat up and running in Eclipse, but I can't figure out how to import a non-Java project into it. Again, instructions that a bear of little brain can follow would be appreciated.
WebStorm comes with a built-in static web server listening on localhost:63342. All you need to do to run your code on it is right-click your .html file and choose Run.
See also: Debugging an application running on the built-in server.

How to run multiple JBoss EAP 6.3 instances as Windows services

We are migrating our JBoss EAP 4.3 infrastructure to EAP 6.3 (standalone).
We currently run several instances on each machine by having different server folders:
and a different set of startup scripts for each instance:
JBOSS_HOME\bin\run_instance_1.bat ; JBOSS_HOME\bin\service_instance_1.bat
JBOSS_HOME\bin\run_instance_n.bat ; JBOSS_HOME\bin\service_instance_n.bat
This way you can define SERVERNAME and SERVERIP for each instance from service_instance_X.bat.
The problem I'm facing is that I cannot seem to find a similar mechanism in EAP 6.3. The closest I got was this command:
JBOSS_HOME\bin\standalone.bat -Djboss.bind.address=%SERVERIP% -Djboss.server.base.dir
which does the job, but does not help when running it as a service.
One promising option is prunsrv's ++JvmOptions parameter, which lets you pass -D and -X options to the JVM at service install time. However, even when the install command with the added options runs successfully, the service keeps starting up with JBOSS_HOME\standalone as jboss.server.base.dir.
Should I instead write custom service.bat, standalone.bat and standalone.conf.bat scripts? That looks like the best approach, but migrating or patching might become troublesome.
Any ideas would be welcome.
I got it working with the last option I mentioned: custom service_instancename.bat, standalone_instancename.bat and standalone_instancename.conf.bat scripts.
However, I had to edit several parts of the scripts. Definitely not ideal, but I see no other choice. If anyone comes up with a better idea, please share.

How do I use Eclipse to debug Hadoop WordCount?

I want to use Eclipse to debug WordCount, because I want to see how the job runs in the JobTracker. But Hadoop uses a proxy, so I don't know the concrete process by which the job runs in the JobTracker. How should I debug it?
You are better off debugging "locally" against a single-node cluster (e.g. one of the sandboxes supplied by Cloudera or Hortonworks): that way you can truly step through the code, as there is only one mapper/reducer in play. That has been my approach, at least: usually the problems I had to debug were to do with the contents of specific files, so I just copied the relevant file over to my test system and debugged there.
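Before even moving to a sandbox, the map and reduce logic of WordCount can be stepped through in the Eclipse debugger as plain Java. The sketch below is not the Hadoop API and not what the answer describes; it is a hypothetical in-memory stand-in (class and method names are my own) that mimics the map phase, the shuffle (grouping values by key, with keys sorted) and the reduce phase in a single JVM, so breakpoints in `map` or `reduce` are hit directly.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for WordCount's map, shuffle and reduce phases (not the Hadoop API). */
public class LocalWordCount {

    /** Map phase: emit (word, 1) for every whitespace-separated token of one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) { // a blank line splits into a single empty token
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    /** Reduce phase: sum all counts collected for one word. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Runs map over every line, groups values by key (the "shuffle"), then reduces each group. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // sorted keys, like Hadoop's sort step
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Set breakpoints in map() or reduce() and launch this class with Eclipse's debugger.
        System.out.println(run(Arrays.asList("the quick fox", "the lazy dog")));
        // prints {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

Once the logic behaves as expected here, the same input file can be copied to a single-node sandbox to check the real job, as suggested above.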

DFS locations in Eclipse Europa after accessing Hadoop in a VM

I am very new to Hadoop. I need to install it and play around with the samples.
So I followed this tutorial. I have installed the Sandbox given in that tutorial. Now I need to configure Eclipse on Windows, specifying the VM location as shown in the tutorial's screenshot.
I have installed Eclipse Europa and the Hadoop plugin.
Then, in Map/Reduce Locations, I entered the VM IP as the host name, my Linux user name as the user name, 9001 as the Map/Reduce port and 9000 as the DFS port.
In the Advanced tab I set mapred.system.dir to /hadoop/mapred/system,
but there is no hadoop.job.ugi property where I can enter the user name.
After I click OK, I don't see the HDFS file system under DFS Locations in Eclipse.
Please help me with this.
I had the same problem. The problem here is not related to the Hadoop configuration but to Eclipse. To fix it, go to "\workspace\.metadata\.plugins\org.apache.hadoop.eclipse\locations", open the XML file there, add the property "hadoop.job.ugi" with the value "hadoop-user,ABC", and then restart Eclipse. It worked for me.
I tried giving just one value (i.e. without ABC), but it didn't work. I don't know the significance of this comma-separated value, but since I have just started the tutorial I hope to find out soon :)
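On the comma-separated value: as far as I recall from the Hadoop 0.20-era `UnixUserGroupInformation` class, `hadoop.job.ugi` was read as `user,group1,group2,...`, i.e. a user name followed by one or more group names, and a value without at least one group was rejected. Under that assumption, `hadoop-user,ABC` means user `hadoop-user` in group `ABC`, which would explain why the single value failed. The hypothetical `UgiValue` class below only illustrates that assumed layout; it is not Hadoop code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the assumed "user,group1,group2,..." layout of hadoop.job.ugi. */
public class UgiValue {
    final String user;
    final List<String> groups;

    UgiValue(String user, List<String> groups) {
        this.user = user;
        this.groups = groups;
    }

    static UgiValue parse(String raw) {
        String[] parts = raw.split(",");
        // Assumption: old Hadoop releases required a user name plus at least one group.
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected user,group[,group...] but got: " + raw);
        }
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        // First entry is the user; everything after it is a group name.
        return new UgiValue(parts[0], Arrays.asList(parts).subList(1, parts.length));
    }

    public static void main(String[] args) {
        UgiValue v = parse("hadoop-user,ABC");
        System.out.println(v.user + " " + v.groups); // prints: hadoop-user [ABC]
    }
}
```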
I ran into the same issue too. I installed Red Hat Cygwin (the openssh and openssl packages) and added the Cygwin bin directory (c:\rhcygwin\bin) to the "Path" environment variable. After that, my Eclipse DFS location was able to connect to Hadoop on the virtual machine, and once it had connected I saw the "hadoop.job.ugi" option. describes installing Cygwin.
Note: I am running the Hadoop VM on Windows Vista.
I spawned Eclipse from within Cygwin and it worked fine for me (i.e. I could see the "hadoop.job.ugi" parameter). Also, I didn't make any changes to my PATH environment variable.

Running a mapreduce jar on Hadoop cluster

I'm trying to run a MapReduce implementation of the quadratic sieve algorithm on Hadoop. For this I'm using the Karmasphere Hadoop Community plugin with NetBeans. The program works fine using the plugin, but I'm unable to run it on an actual cluster.
I'm running this command:
bin/hadoop jar MRIF.jar 689
where MRIF.jar is the jar file produced by building the NetBeans project and 689 is the number to be factored. The input and output directories are hard-coded in the program itself. When running on the actual cluster, it appears that my Java classes are not being executed: reduce reaches 100% while map is still at 0%, and the input and output files are created with no content.
But this works fine when running using Karmasphere plugin.
Try running it as bin/hadoop -jar MRIF.jar 689. The -jar forces it to run locally and displays information on the console as well as logging to that machine. You can also check the Hadoop logs for any indication of why it's not working correctly.
When using -jar, you can use System.out.println(...) to display information on the console, which further helps with debugging.
You can also use Hadoop Counters to help troubleshoot when running (pseudo-)distributed.
I admit this post isn't a 'solution' to the problem; without more information about what is happening and where, there is a wide range of things that could be going on. If, as you mention, it is not processing your 'inside Java classes', then the cause is likely in your implementation, which we can't see in order to make suggestions.
More data about the issue, such as logs, errors or output, will likely help in getting more solution-y responses instead of debugging tips. :)
EDIT: Thanks for the link to the files. I think your call is missing a component.
I looked through the files and think this might get it to work for you:
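As a sketch of the counters idea, assuming the new `org.apache.hadoop.mapreduce` API and the Hadoop client jars on the classpath, a WordCount-style mapper can increment custom counters through `context.getCounter(Enum)`. The class name and the counter enum below are hypothetical. If `RECORDS_SEEN` stays at 0 on the cluster, the mapper never received any input, which would help narrow down the "map at 0%, empty output" symptom described in the question.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/** Hypothetical WordCount-style mapper that records custom counters for troubleshooting. */
public class CountingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    /** Hypothetical counter names; any enum can serve as a counter group. */
    public enum Debug { RECORDS_SEEN, EMPTY_LINES, WORDS_EMITTED }

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text value, Context context)
            throws IOException, InterruptedException {
        context.getCounter(Debug.RECORDS_SEEN).increment(1);
        String line = value.toString().trim();
        if (line.isEmpty()) {
            context.getCounter(Debug.EMPTY_LINES).increment(1);
            return;
        }
        for (String token : line.split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
            context.getCounter(Debug.WORDS_EMITTED).increment(1);
        }
    }
}
```

The totals are printed by the job client when the job finishes and are shown in the JobTracker web UI; in the driver they can also be read with `job.getCounters().findCounter(CountingMapper.Debug.RECORDS_SEEN).getValue()` after `job.waitForCompletion(true)` returns.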
bin/hadoop jar mrif.jar com.javiertordable.mrif.MapReduceQuadraticSieve 689
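Whether the extra class-name argument is needed depends on the jar's manifest: `hadoop jar` (Hadoop's RunJar) uses the manifest's `Main-Class` when one is present and passes every remaining argument, including any class name you add, to that class's `main`; only when there is no `Main-Class` does it treat the first argument after the jar as the class to run. A small stdlib-only sketch (the `JarMainClass` name is hypothetical) to check which case applies to a jar built by NetBeans:

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical utility: reports the Main-Class a jar's manifest declares, or null if none. */
public class JarMainClass {

    static String mainClassOf(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest(); // null when the jar has no manifest
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        String main = mainClassOf(new File(args[0])); // e.g. path to mrif.jar
        if (main == null) {
            System.out.println("No Main-Class: pass the driver class name after the jar path.");
        } else {
            System.out.println("Main-Class: " + main);
        }
    }
}
```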