Debug MapReduce (of Hadoop 2.2 or higher) in Eclipse - eclipse

I am able to debug MapReduce (of Hadoop 1.2.1) in Eclipse by following the steps in http://www.thecloudavenue.com/2012/10/debugging-hadoop-mapreduce-program-in.html. But how do I debug MapReduce (of Hadoop 2.2 or higher) in Eclipse?

You can debug it in the same way.
Just run your MapReduce code in standalone mode and use Eclipse to debug the MR code like any other Java code.

Here are the steps I set up in Eclipse. Environment: Ubuntu 16.04.2, Eclipse Neon.3 Release (4.6.3RC2), jdk1.8.0_121. I did a fresh hadoop-2.7.3 installation under /j01/srv/hadoop, which is my $HADOOP_HOME. Replace the $HADOOP_HOME value with your actual path wherever it is referenced below. For running Hadoop from Eclipse, you do not need to do any Hadoop configuration; what is really needed is to pull the right set of Hadoop jars into Eclipse.
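Why no configuration is needed: with no *-site.xml files on the classpath, Hadoop 2.x falls back to the local job runner and the local filesystem, which is exactly standalone mode. If a cluster configuration ever leaks onto the Eclipse classpath, you can pin the standalone defaults explicitly; a minimal sketch of what you could put at the top of main() in the WordCount example from Step 2 below (illustrative only; the two property keys are standard Hadoop 2.x settings):
// Illustrative only: these values are already the defaults in standalone mode,
// so this is just a guard against a stray cluster configuration on the classpath.
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "file:///");           // local filesystem instead of HDFS
conf.set("mapreduce.framework.name", "local");  // local job runner instead of YARN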
Step 1 Create new Java Project
File > New > Project...
Select Java Project, Next
Enter Project name: hadoopmr
Click Configure default...
Source folder name: src/main/java
Output folder name: target/classes
Click Apply, OK, then Next
Click tab Libraries
Click Add External JARs...
Browse to the Hadoop installation folder and add the following jars; when done, click Finish (a quick classpath sanity check follows the jar list):
$HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.3.jar
$HADOOP_HOME/share/hadoop/common/hadoop-nfs-2.7.3.jar
$HADOOP_HOME/share/hadoop/common/lib/avro-1.7.4.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-collections-3.2.2.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-configuration-1.6.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-io-2.4.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-lang-2.6.jar
$HADOOP_HOME/share/hadoop/common/lib/commons-logging-1.1.3.jar
$HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.7.3.jar
$HADOOP_HOME/share/hadoop/common/lib/httpclient-4.2.5.jar
$HADOOP_HOME/share/hadoop/common/lib/httpcore-4.2.5.jar
$HADOOP_HOME/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar
$HADOOP_HOME/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar
$HADOOP_HOME/share/hadoop/common/lib/log4j-1.2.17.jar
$HADOOP_HOME/share/hadoop/common/lib/slf4j-api-1.7.10.jar
$HADOOP_HOME/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.3.jar
$HADOOP_HOME/share/hadoop/mapreduce/lib-examples/hsqldb-2.0.0.jar
$HADOOP_HOME/share/hadoop/tools/lib/guava-11.0.2.jar
$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-api-2.7.3.jar
$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-common-2.7.3.jar
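Before moving on, a quick way to confirm the jars were picked up is a throwaway class that prints the Hadoop version; VersionInfo lives in hadoop-common, so this will not compile if the jars above are missing. A small sanity-check sketch (the class name is my own, not part of the original steps):
import org.apache.hadoop.util.VersionInfo;

// Throwaway sanity check: if this compiles and prints 2.7.3,
// hadoop-common and its dependencies are on the project classpath.
public class HadoopClasspathCheck {
    public static void main(String[] args) {
        System.out.println("Hadoop version: " + VersionInfo.getVersion());
    }
}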
Step 2 Create a MapReduce example
Create a new package: org.apache.hadoop.examples
Create WordCount.java under package org.apache.hadoop.examples with the following contents:
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.examples;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job,
        new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Create input.txt under /home/hadoop/input/ (or your path) with the following contents:
What do you mean by Object
What is Java Virtual Machine
How to create Java Object
How Java enabled High Performance
Step 3 Setup Debug Configuration
In Eclipse, open WordCount.java and set breakpoints wherever you like.
Right-click WordCount.java, Debug As > Debug Configurations...
Select Java Application, then click the New launch configuration icon at the top left
Enter org.apache.hadoop.examples.WordCount in Main class box
Click Arguments tab
enter
/home/hadoop/input/input.txt /home/hadoop/output
into Program arguments
Click Apply, then Debug
The program starts along with Hadoop; it should hit the breakpoints you set.
Check the results:
ls -l /home/hadoop/output
-rw-r--r-- 1 hadoop hadoop 131 Apr 5 22:59 part-r-00000
-rw-r--r-- 1 hadoop hadoop 0 Apr 5 22:59 _SUCCESS
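One practical note when debugging repeatedly: the second run fails because /home/hadoop/output already exists, so delete it between runs, or add a small helper to WordCount. A minimal sketch of such a helper using the Hadoop FileSystem API (the method name clearOutput is my own and is not part of the original example):
// Additional import needed in WordCount (Configuration, Path and IOException are already imported):
import org.apache.hadoop.fs.FileSystem;

// Call clearOutput(conf, otherArgs[otherArgs.length - 1]) in main() before
// FileOutputFormat.setOutputPath(...) so repeated debug runs do not fail
// because the output directory already exists.
private static void clearOutput(Configuration conf, String output) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path outputPath = new Path(output);
    if (fs.exists(outputPath)) {
        fs.delete(outputPath, true); // true = delete recursively
    }
}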
Notes:
1) If the program does not run, make sure Project > Build Automatically is checked. Use Project > Clean... to force a build.
2) You can get more examples from
jar xvf $HADOOP_HOME/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.3-sources.jar
Copy them into this project to continue exploring.
3) You can download this Eclipse project from
git clone https://github.com/drachenrio/hadoopmr
In Eclipse, File > Import... > Existing Projects into Workspace > Next
Browse to the cloned project and import it.
Open .classpath and replace /j01/srv/hadoop-2.7.3 with your Hadoop installation home.

Related

Loading JDBC connector in Eclipse Jython plugin

I'm working with an Eclipse-based tool which provides an interactive Jython shell for scripting and data analysis against an internal data model.
I'm trying to write a script which exports results to some form of database, so I'm trying to use the built-in com.ziclix.python.sql package in Jython to provide the interface and the Xerial JDBC connector for SQLite (https://github.com/xerial/sqlite-jdbc) to provide the backend.
The script below works perfectly when run outside of the third-party tool using a standard command line Jython interpreter, including relying on the importJar() hack which is commonly used to work around Jython not always using the user CLASSPATH when run using java -jar <blah>:
from com.ziclix.python.sql import zxJDBC
from java.net import URL, URLClassLoader
from java.lang import ClassLoader
from java.io import File

JDBC_URL = "jdbc:sqlite:test.db"
JDBC_DRIVER = "org.sqlite.JDBC"
JDBC_JAR = "E:/sqlite-jdbc-3.21.0.jar"

# Import Jar file into local class path
def importJar(jarFile):
    m = URLClassLoader.getDeclaredMethod("addURL", [URL])
    m.accessible = 1
    m.invoke(ClassLoader.getSystemClassLoader(), [File(jarFile).toURL()])

def main():
    try:
        importJar(JDBC_JAR)
        dbConn = zxJDBC.connect(JDBC_URL, None, None, JDBC_DRIVER)
        cursor = dbConn.cursor()
        # Do something useful
        cursor.close()
        dbConn.close()
    except zxJDBC.DatabaseError, msg:
        print msg

if __name__ == '__main__':
    main()
... but fails when run from the plugin inside Eclipse, where the zxJDBC.connect() call errors with:
driver [org.sqlite.JDBC] not found
If I add the Jar file to the JYTHONPATH environment I can do import org.sqlite.JDBC successfully in the Python script, but the connect call still fails in the Java-side of the JDBC driver manager.
For the sake of completeness, the full path to the Jar file is included in the CLASSPATH, PYTHONPATH, and JYTHONPATH environment variables ...
Any ideas?

NoClassDefFoundError when running HelloWorld.class

I'm getting this error when I try to run HelloWorld.class.
From the error it looks like it's trying to run HelloWorld/class. The program should simply print out "Hello World!".
package threads;

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
Any ideas?
Check your classpath: Select Start > Control Panel > System > Advanced > Environment Variables > System Variables > CLASSPATH.
You can make a new variable there OR in the command prompt type: SET CLASSPATH=.;C:\Program Files\Java\jdk-10.0.2(or whatever version you are using)\bin.
Type: cd C:\Users\David\Desktop\eclipse\JNP\bin\threads
this is your DIRECTORY NOT your CLASSPATH
Type: javac HelloWorld.java
A class file named HelloWorld.class should appear in the threads folder.
Because the class is declared in package threads, run it from the folder above threads with its fully qualified name: cd .. and then type: java threads.HelloWorld
Also make sure you have named the file HelloWorld.java
I hope this helped!

Postgres PL/JAVA: java.lang.ClassNotFoundException error after loading JAR file in database

I am getting a java.lang.ClassNotFoundException error inside Postgres when running a function that calls a JAR file I have loaded. I have installed and configured PL/JAVA (including the delivered examples) in my database and can run the examples successfully. I am now attempting to load/install my first JAR, but I am doing something wrong.
My host controls the OS version: CentOS 6.8. Postgres is version 8.4.
I am attempting to install my own very simple Java class, which is a derivative of the delivered example Parameters.addOne class. All my code is in /tmp. Here are the steps I've followed:
Doug.java:
package com.msmetric;

import java.math.BigDecimal;
import java.sql.Date;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Time;
import java.sql.Timestamp;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.TimeZone;
import java.util.logging.Logger;

public class Doug {
    public static int addOne(int value) {
        return value + 1;
    }
}
Compiling Doug.java with 'javac Doug.java' succeeds.
Creating the JAR file with the Doug.class file in it using 'jar -cvf Doug.jar Doug.class' works fine.
Now I load the JAR file into Postgres (public schema), change the classpath, create the function that calls the JAR, then attempt to run at psql prompt.
Run sqlj.install_jar from psql:
select sqlj.install_jar('file:/tmp/Doug.jar','Doug',false);
Set the classpath inside Postgres (from psql prompt postgres=#):
select sqlj.set_classpath('public','Doug');
Create the function that calls the JAR. This create function code is taken directly from the examples.ddr file that came with PL/JAVA. I simply changed org.postgres to com.msmetric.
create or replace function addone(int) returns int as 'com.msmetric.Doug.addOne(java.lang.Integer)' language java;
Now with the JAR loaded and function created, I attempt to run it. This function should simply add 1 to the number provided.
select addone(3);
Results:
ERROR: java.lang.ClassNotFoundException: com.msmetric.Doug
Thoughts?
I'm very sorry I didn't see your question sooner. Underneath all the exotic details (PostgreSQL, PL/Java, schemas, classpaths...), there's just a bit of basic Java going on here: if a jar file contains a class Doug.class in package com.msmetric, its path within the jar has to reflect that: it has to be com/msmetric/Doug.class. Otherwise, it won't be found.
You can set up that whole structure step by step:
javac Doug.java
mkdir com
mkdir com/msmetric
mv Doug.class com/msmetric/
jar -cvf Doug.jar com/msmetric/Doug.class
Or, you can let javac do more of the work for you:
mkdir classes
javac -d classes Doug.java
jar -cvf Doug.jar -C classes .
When you give javac a -d directory option, instead of just writing class files next to their .java sources, it will put them all in their proper places under the directory you named, and then you can just tell jar to change into that directory and slurp them all up (don't overlook the . at the end of that jar command).
Once you fix that, if you retry your original steps, you'll see that you now get a different error:
ERROR: Unable to find static method com.msmetric.Doug.addOne with signature (Ljava/lang/Integer;)I
That happens because you declared the function in Doug.java with int addOne(int value) (that is, taking a primitive int argument), but you declared it in SQL with returns int as 'com.msmetric.Doug.addOne(java.lang.Integer)' taking an Integer object.
Once you correct that:
create or replace function addone(int) returns int as 'com.msmetric.Doug.addOne(int)' language java;
you'll be able to see:
# select addone(3);
addone
--------
4
(1 row)
If you happen to see this belated answer, may I ask what version of PL/Java you are using? That's one detail you didn't mention. If it is older than 1.5.0, there are newer features that can help you out. For one, you can just annotate that function:
@Function
public static int addOne(int value) {
    return value + 1;
}
and have javac spit out not only the Doug.class file but also a pljava.ddr file with your SQL function declaration already written correctly (no mixing up argument types!). There is a way to include that .ddr file into the jar you create so that you can just call sqlj.install_jar with the last parameter true so it runs the commands in the .ddr and your functions are ready to use. There's a Hello, world example in the docs that shows more of how it's done.
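For reference, a minimal sketch of what the annotated class could look like, assuming PL/Java 1.5.0+ with its pljava-api jar on the compile classpath (the annotation import path is taken from the PL/Java API; verify it against the version you install):
package com.msmetric;

import org.postgresql.pljava.annotation.Function;

public class Doug {
    // The annotation lets PL/Java's annotation processor emit a pljava.ddr entry
    // whose CREATE FUNCTION signature matches (int), avoiding the Integer-vs-int
    // mixup described above.
    @Function
    public static int addOne(int value) {
        return value + 1;
    }
}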
Cheers,
-Chap

How to execute JMeter test case from Java code

How do I run a JMeter test case from Java code?
I have followed the example Here from Blazemeter.com
My code is as follows:
public class BasicSampler {
    public static void main(String[] argv) throws Exception {
        // JMeter Engine
        StandardJMeterEngine jmeter = new StandardJMeterEngine();
        // Initialize Properties, logging, locale, etc.
        JMeterUtils.loadJMeterProperties("/home/stone/Workbench/automated-testing/apache-jmeter-2.11/bin/jmeter.properties");
        JMeterUtils.setJMeterHome("/home/stone/Workbench/automated-testing/apache-jmeter-2.11");
        JMeterUtils.initLogging(); // you can comment this line out to see extra log messages of i.e. DEBUG level
        JMeterUtils.initLocale();
        // Initialize JMeter SaveService
        SaveService.loadProperties();
        // Load existing .jmx Test Plan
        FileInputStream in = new FileInputStream("/home/stone/Workbench/automated-testing/apache-jmeter-2.11/bin/examples/CSVSample.jmx");
        HashTree testPlanTree = SaveService.loadTree(in);
        in.close();
        // Run JMeter Test
        jmeter.configure(testPlanTree);
        jmeter.run();
    }
}
but I keep getting the following messages in the console and my test never executes.
INFO 2014-09-23 12:04:40.492 [jmeter.e] (): Listeners will be started after enabling running version
INFO 2014-09-23 12:04:40.511 [jmeter.e] (): To revert to the earlier behaviour, define jmeterengine.startlistenerslater=false
I have also tried uncommenting jmeterengine.startlistenerslater=false in the jmeter.properties file.
How do you know that your "test never executes"?
What is in the jmeter.log file (it should be in the root of your project)? Alternatively, comment out the JMeterUtils.initLogging() line to see the full output in STDOUT.
Have you changed the relative path to CSVSample_user.csv in the "Get user details" CSV Data Set Config? It may resolve to a different location, as recommended in Using CSV DATA SET CONFIG.
Is a CSVSample.jtl file generated anywhere (again, it should be in the root of your project by default)? What is in it?
The code looks good, and I'm pretty sure the problem is with the path to the CSVSample_user.csv file and that you have something like java.io.FileNotFoundException in your log. Please double-check that the CSVSample.jmx file contains a valid full path to CSVSample_user.csv.
UPDATE TO ANSWER QUESTIONS IN COMMENTS
The jmeter.log file should be under your Eclipse workspace folder by default.
Looking into CSVSample.jmx, there is a View Results in Table listener which is configured to store results under ~/CSVSample.jtl.
If you want to see summariser messages and "classic" .jtl reporting, add the next few lines before the jmeter.configure(testPlanTree); stanza:
Summariser summer = null;
String summariserName = JMeterUtils.getPropDefault("summariser.name", "summary");
if (summariserName.length() > 0) {
    summer = new Summariser(summariserName);
}
String logFile = "/path/to/jtl/results/file.jtl";
ResultCollector logger = new ResultCollector(summer);
logger.setFilename(logFile);
testPlanTree.add(testPlanTree.getArray()[0], logger);
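To make the placement concrete, the tail end of main() in the BasicSampler above would then look roughly like this (assembled from the snippets already shown; treat it as a sketch rather than a verified drop-in):
// Attach the summariser-backed ResultCollector to the loaded test plan,
// then configure and run the engine exactly as before.
Summariser summer = null;
String summariserName = JMeterUtils.getPropDefault("summariser.name", "summary");
if (summariserName.length() > 0) {
    summer = new Summariser(summariserName);
}
ResultCollector logger = new ResultCollector(summer);
logger.setFilename("/path/to/jtl/results/file.jtl");
testPlanTree.add(testPlanTree.getArray()[0], logger);

jmeter.configure(testPlanTree);
jmeter.run();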
Try using the jmeter-java-dsl library: https://github.com/abstracta/jmeter-java-dsl.
It supports implementing JMeter tests as Java code.
The example below shows how to implement and execute a test for a REST API. The same approach can be applied to other types of tests as well.
@Test
public void testPerformance() throws IOException {
    TestPlanStats stats = testPlan(
        threadGroup(2, 10,
            httpSampler("http://my.service")
                .post("{\"name\": \"test\"}", Type.APPLICATION_JSON)
        ),
        // this is just to log details of each request's stats
        jtlWriter("test" + Instant.now().toString().replace(":", "-") + ".jtl")
    ).run();
    assertThat(stats.overall().elapsedTimePercentile99()).isLessThan(Duration.ofSeconds(5));
}

How do I manage an unmanaged Eclipse formatting profile?

Our project has an "Unmanaged profile" and save-time autoformatting. I'd like to be able to modify the settings for this unmanaged profile and be able to check them back in to version control.
Eclipse's help documents are quite unhelpful ("You are not allowed to change such a profile, only the creator (manager) of the profile can change it.").
I'm not sure if this will allow you to check your profile back in to version control, but the following process will allow you to edit your profile on any computer that has the source checked out.
To edit your profile, you have to recreate the profile in Eclipse, which you can easily do as follows:
Create a new profile by clicking "New..."
Give the new profile the same name as your existing unmanaged profile.
Before clicking OK, make sure you have selected your unmanaged profile in the "Initialize settings with the following profile" drop-down list.
This will let you recreate the profile, and allow you to modify it in Eclipse, as normal.
Note: This process works with Eclipse Indigo
The problem is that a managed profile is actually stored in your workspace, not your project. Settings are pushed into your project when you make project-specific changes, such as selecting a different profile. But the settings in the project are in a different format from those in the profile (in the workspace).
At present Eclipse does not have the ability to perform this in reverse. That is, it can't take settings from your project folder to create a profile in the workspace. Effectively, an "Unmanaged Profile" is a profile to which you have lost the source code.
The simplest way I've found to reverse-engineer the profile is to generate an XML profile file that can be imported (under the formatter settings --> Import button).
To reverse-engineer the settings from a project, I wrote the following program. It reads the settings from a project folder and writes them out as an XML file:
import java.io.*;

public class ExtractFormatter {

    public static void main(String args[]) throws IOException {
        if (args.length < 2)
            throw new RuntimeException("No arguments specified; expected <project folder> <output file>");

        File inFile = new File(args[0]);
        File outFile = new File(args[1]);

        BufferedReader reader = new BufferedReader(new FileReader(new File(inFile,
                ".settings/org.eclipse.jdt.core.prefs")));
        PrintWriter writer = new PrintWriter(outFile);

        writer.println("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>");

        // Retain the date from the file as a comment
        String line = reader.readLine();
        writer.println("<!-- Exported from " + inFile + " -->");
        writer.println("<!-- " + line + " -->");

        writer.println("<profiles version=\"12\">");
        writer.println("<profile kind=\"CodeFormatterProfile\" name=\"" + inFile.getName()
                + "\" version=\"12\">");

        // Now read every setting for the formatter and write it out as an XML tag.
        for (line = reader.readLine(); line != null; line = reader.readLine()) {
            if (line.startsWith("org.eclipse.jdt.core.formatter.")) {
                String[] parts = line.split("=", 2);
                writer.println("<setting id=\"" + parts[0] + "\" value=\"" + parts[1] + "\" />");
            }
        }

        writer.println("</profile>");
        writer.println("</profiles>");

        reader.close();
        writer.close();
    }
}
Eclipse should put a .settings folder in your project directory when you have an unmanaged profile. The only way I've found to change the settings so far is to go into the .settings folder within the project and hand-edit the .prefs files in there.