Hadoop v2.7 and Eclipse

I have set up Hadoop v2.7 on my Mac and I am able to start the Hadoop daemons.
I would like to write MR programs using Eclipse, so I need some help getting Hadoop into my Eclipse setup: which jar files to add, plus a basic setup guide.
The following is my driver class code, which I couldn't execute:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.InverseMapper;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJobDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        JobConf job = new JobConf(conf, MyJobDriver.class);
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setJobName("Patent");
        job.setMapperClass(InverseMapper.class);
        // Input split consists of two values separated by ","
        // K1 and V1 type is Text
        job.setInputFormat(KeyValueTextInputFormat.class);
        // Everything before the separator is the key, everything after is the value
        job.set("key.value.separator.in.input.line", ",");
        // Key and value are written as strings, separated by a tab (the default)
        job.setOutputFormat(TextOutputFormat.class);
        // When K1/K2 and V1/V2 are of the same types,
        // we can skip job.setMapOutputKeyClass() and job.setMapOutputValueClass()
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // JobClient communicates with the JobTracker to start the job across the cluster
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        MyJobDriver driver = new MyJobDriver();
        System.out.println("Calling the run method");
        int exitCode = ToolRunner.run(driver, args);
        System.exit(exitCode);
    }
}

It is too much trouble to track down and retrieve the necessary jar files (there are many). Instead, create a Maven project in Eclipse and add the necessary dependencies as described here: https://hadoopi.wordpress.com/2013/05/25/setup-maven-project-for-hadoop-in-5mn/
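For Hadoop 2.x the easiest route is the single hadoop-client aggregator artifact, which transitively pulls in the HDFS, MapReduce and YARN client jars a basic MR job needs. A minimal sketch of the corresponding pom.xml entry (2.7.3 is just an example version; pick the release matching your installation):
<dependency>
    <!-- aggregator artifact; brings in the HDFS/MapReduce/YARN client jars -->
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
</dependency>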


Dataflow uploading file encoding error

My development environment uses Eclipse Oxygen with Google Cloud Tools for Eclipse 1.7.0 installed.
I created a Google Cloud Dataflow Java project and ran into a problem testing the WordCount example.
When reading a file from the bucket, the content shows up normally in the log.
The problem occurs when the WordCount results are processed and stored back in the bucket: the saved file contains the garbled text shown in the picture above.
Does Dataflow not support Korean?
Here is my TextIO.write code:
static class WriteData extends PTransform<PCollection<KV<URI, String>>, PDone> {
    private String output;

    public WriteData(String output) {
        this.output = output;
    }

    @Override
    public Coder<?> getDefaultOutputCoder() {
        return KvCoder.of(StringDelegateCoder.of(URI.class), StringUtf8Coder.of());
    }

    @Override
    public PDone expand(PCollection<KV<URI, String>> outputfile) {
        return outputfile
            .apply(ParDo.of(new DoFn<KV<URI, String>, String>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                    output = c.element().getKey().toString();
                    LOG.info("WRITE DATA : " + c.element().getValue());
                    c.output(c.element().getValue());
                }
            }))
            .apply(TextIO.write().to(output).withSuffix(".txt"));
    }
}
Most of the time the correct coder can be inferred automatically, but if it can't, make sure you specify a coder when reading the data.
You typically need to specify the coder when reading data into your pipeline from an external source (or when creating pipeline data from local data), and also when you output pipeline data to an external sink.
For example, you can decode the data you read with:
StringUtf8Coder.of().decode(inStream)
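As a concrete illustration, a minimal sketch of being explicit about the string coder on a read, using the Beam-style API from the question (assuming p is your Pipeline; the bucket path is a placeholder):
PCollection<String> lines = p
        .apply(TextIO.read().from("gs://my-bucket/input.txt"))
        // be explicit that these strings are encoded/decoded as UTF-8
        .setCoder(StringUtf8Coder.of());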

where to put images uploaded to be viewed in browser [duplicate]

I read here that one should not save files in the server anyway, as that is not portable, not transactional, and requires external parameters. However, given that I need a temporary solution for Tomcat (7) and that I have (relative) control over the server machine, I want to know:
What is the best place to save the file? Should I save it in /WEB-INF/uploads (advised against here), or someplace under $CATALINA_BASE (see here), or ...? The Java EE 6 tutorial gets the path from the user (:wtf:). NB: the file should not be downloadable by any means.
Should I set up a config parameter as detailed here? I'd appreciate some code (I'd rather give it a relative path, so it is at least Tomcat-portable). Part.write() looks promising, but apparently needs an absolute path.
I'd be interested in an exposition of the disadvantages of this approach vs a database/JCR repository one.
Unfortunately the FileServlet by @BalusC concentrates on downloading files, while his answer on uploading files skips the part on where to save the file.
A solution easily convertible to use a DB or a JCR implementation (like jackrabbit) would be preferable.
Store it anywhere in an accessible location except the IDE's project folder, aka the server's deploy folder, for reasons mentioned in the answer to Uploaded image only available after refreshing the page:
Changes in the IDE's project folder do not immediately get reflected in the server's work folder. There's a kind of background job in the IDE which takes care that the server's work folder gets synced with the last updates (in IDE terms this is called "publishing"). This is the main cause of the problem you're seeing.
In real-world code there are circumstances where storing uploaded files in the webapp's deploy folder will not work at all. Some servers do not (either by default or by configuration) expand the deployed WAR file into the local disk file system, but instead keep it fully in memory. You can't create new files in memory without basically editing the deployed WAR file and redeploying it.
Even when the server expands the deployed WAR file into the local disk file system, all newly created files will get lost on a redeploy or even a simple restart, simply because those new files are not part of the original WAR file.
It really doesn't matter to me or anyone else where exactly on the local disk file system it will be saved, as long as you do not ever use the getRealPath() method. Using that method is in any case alarming.
The path to the storage location can in turn be defined in many ways. You have to do it all by yourself. Perhaps this is where your confusion is caused because you somehow expected that the server does that all automagically. Please note that @MultipartConfig(location) does not specify the final upload destination, but the temporary storage location for the case the file size exceeds the memory storage threshold.
So, the path to the final storage location can be defined in any of the following ways:
Hardcoded:
File uploads = new File("/path/to/uploads");
Environment variable via SET UPLOAD_LOCATION=/path/to/uploads:
File uploads = new File(System.getenv("UPLOAD_LOCATION"));
VM argument during server startup via -Dupload.location="/path/to/uploads":
File uploads = new File(System.getProperty("upload.location"));
*.properties file entry as upload.location=/path/to/uploads:
File uploads = new File(properties.getProperty("upload.location"));
web.xml <context-param> with name upload.location and value /path/to/uploads:
File uploads = new File(getServletContext().getInitParameter("upload.location"));
If any, use the server-provided location, e.g. in JBoss AS/WildFly:
File uploads = new File(System.getProperty("jboss.server.data.dir"), "uploads");
Either way, you can easily reference and save the file as follows:
File file = new File(uploads, "somefilename.ext");
try (InputStream input = part.getInputStream()) {
Files.copy(input, file.toPath());
}
Or, when you want to autogenerate a unique file name to prevent users from overwriting existing files which coincidentally have the same name:
File file = File.createTempFile("somefilename-", ".ext", uploads);
try (InputStream input = part.getInputStream()) {
Files.copy(input, file.toPath(), StandardCopyOption.REPLACE_EXISTING);
}
How to obtain part in JSP/Servlet is answered in How to upload files to server using JSP/Servlet? and how to obtain part in JSF is answered in How to upload file using JSF 2.2 <h:inputFile>? Where is the saved File?
Note: do not use Part#write(), as it interprets the path relative to the temporary storage location defined in @MultipartConfig(location). Also make absolutely sure that you aren't corrupting binary files such as PDF or image files by converting bytes to characters during reading/writing through incorrectly using a Reader/Writer instead of an InputStream/OutputStream.
See also:
How to save uploaded file in JSF (JSF-targeted, but the principle is pretty much the same)
Simplest way to serve static data from outside the application server in a Java web application (in case you want to serve it back)
How to save generated file temporarily in servlet based web application
I post my final way of doing it based on the accepted answer:
@SuppressWarnings("serial")
@WebServlet("/")
@MultipartConfig
public final class DataCollectionServlet extends Controller {

    private static final String UPLOAD_LOCATION_PROPERTY_KEY = "upload.location";
    private String uploadsDirName;

    @Override
    public void init() throws ServletException {
        super.init();
        uploadsDirName = property(UPLOAD_LOCATION_PROPERTY_KEY);
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // ...
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        Collection<Part> parts = req.getParts();
        for (Part part : parts) {
            File save = new File(uploadsDirName, getFilename(part) + "_"
                    + System.currentTimeMillis());
            final String absolutePath = save.getAbsolutePath();
            log.debug(absolutePath);
            part.write(absolutePath);
        }
        // forward once, after all parts have been written (forwarding inside
        // the loop would commit the response on the first part)
        sc.getRequestDispatcher(DATA_COLLECTION_JSP).forward(req, resp);
    }

    // helpers
    private static String getFilename(Part part) {
        // courtesy of BalusC : http://stackoverflow.com/a/2424824/281545
        for (String cd : part.getHeader("content-disposition").split(";")) {
            if (cd.trim().startsWith("filename")) {
                String filename = cd.substring(cd.indexOf('=') + 1).trim()
                        .replace("\"", "");
                return filename.substring(filename.lastIndexOf('/') + 1)
                        .substring(filename.lastIndexOf('\\') + 1); // MSIE fix.
            }
        }
        return null;
    }
}
where :
@SuppressWarnings("serial")
class Controller extends HttpServlet {

    static final String DATA_COLLECTION_JSP = "/WEB-INF/jsp/data_collection.jsp";
    static ServletContext sc;
    Logger log;
    // private
    // "/WEB-INF/app.properties" also works...
    private static final String PROPERTIES_PATH = "WEB-INF/app.properties";
    private Properties properties;

    @Override
    public void init() throws ServletException {
        super.init();
        // synchronize !
        if (sc == null) sc = getServletContext();
        log = LoggerFactory.getLogger(this.getClass());
        try {
            loadProperties();
        } catch (IOException e) {
            throw new RuntimeException("Can't load properties file", e);
        }
    }

    private void loadProperties() throws IOException {
        try (InputStream is = sc.getResourceAsStream(PROPERTIES_PATH)) {
            if (is == null)
                throw new RuntimeException("Can't locate properties file");
            properties = new Properties();
            properties.load(is);
        }
    }

    String property(final String key) {
        return properties.getProperty(key);
    }
}
and the /WEB-INF/app.properties :
upload.location=C:/_/
HTH and if you find a bug let me know

Ubuntu 12.04 - Eclipse 3.8 - hadoop-1.2.1 - Input path does not exist

I set up Hadoop on Ubuntu and followed all the necessary steps: 1. created the HDFS file system, 2. moved the text files to the input directory, 3. made sure I have privileges to access all the directories. But when I run the simple word count example below, I get an "Input path does not exist" error:
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class wordcount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/HADOOP_HOME/conf/core-site.xml"));
        conf.addResource(new Path("/HADOOP_HOME/conf/hdfs-site.xml"));
        Job job = new Job(conf, "wordcount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setJarByClass(wordcount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // FileInputFormat.addInputPath(job, new Path(args[0]));
        // FileOutputFormat.setOutputPath(job, new Path(args[1]));
        FileInputFormat.setInputPaths(job, new Path("/user/gabriele/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/gabriele/output"));
        job.waitForCompletion(true);
    }
}
But the input path is valid (I checked it from the command line too), and I can even view the files at that path from Eclipse itself, so please help me see where I went wrong.
There was a solution that says to add the following 2 lines:
config.addResource(new Path("/HADOOP_HOME/conf/core-site.xml"));
config.addResource(new Path("/HADOOP_HOME/conf/hdfs-site.xml"));
But it still does not work.
Here are the errors (Run As -> Run on Hadoop):
13/11/08 08:39:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/11/08 08:39:12 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/11/08 08:39:12 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/11/08 08:39:12 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-gabriele/mapred/staging/gabriele481581440/.staging/job_local481581440_0001
13/11/08 08:39:12 ERROR security.UserGroupInformation: PriviledgedActionException as:gabriele cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/gabriele/input
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/gabriele/input
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at wordcount.main(wordcount.java:74)
Thanks
Unless your Hadoop installation really is rooted at /HADOOP_HOME, I suggest you change the following lines so that HADOOP_HOME is replaced with the directory where Hadoop is actually installed (/usr/lib/hadoop, /opt/hadoop, or wherever you put it):
conf.addResource(new Path("/usr/lib/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/usr/lib/hadoop/conf/hdfs-site.xml"));
Or, in Eclipse, add the /usr/lib/hadoop/conf folder (or wherever you installed Hadoop) to the build classpath.
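Note that the error shows the job falling back to the local file system (file:/user/gabriele/input), which is what happens when core-site.xml is never loaded. As a quick sanity check you can also point the configuration at HDFS directly; a sketch, assuming a pseudo-distributed setup on the default port (adjust host/port to match your core-site.xml):
// Hadoop 1.x key for the default file system; makes /user/... paths
// resolve against HDFS instead of file:/
conf.set("fs.default.name", "hdfs://localhost:9000");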

How to programmatically create a Java class which implements a specified interface in Eclipse plugin development

Friends,
We are developing an Eclipse plugin; it contains an action to generate a service interface and its impl stub.
Now that the interface is generated, I want to use Eclipse JDT to create a Java class which implements the specified interface, but I don't know how.
The info we have:
the interface name, the impl class name, the package name, and the Java project that contains them.
Thanks in advance for your kind help.
A quick scan of how the new class wizard does it suggests there is no public, easy-to-use API for this. You can have a look at the org.eclipse.jdt.ui.wizards.NewTypeWizardPage.createType(IProgressMonitor) method to see how JDT itself creates new classes.
It should be possible to extend org.eclipse.jdt.ui.wizards.NewTypeWizardPage, so you can leverage the createType() method.
Probably the minimal approach is to simply generate the source content into a correctly placed IFile, e.g.:
public Object execute(ExecutionEvent event) throws ExecutionException {
    final String PACKAGE_PATH = "z.ex/src/z/ex/go";
    final String CONTENT = "package z.ex.go;\n"
            + "public class RunAway {\npublic static void main(String[] args) {\n"
            + "System.out.println(\"Run Away\");\n}\n}\n";
    final IWorkspaceRoot root = ResourcesPlugin.getWorkspace().getRoot();
    final IResource packageResource = root.findMember(PACKAGE_PATH);
    if (packageResource instanceof IFolder) {
        IFolder packageFolder = (IFolder) packageResource;
        final IFile file = packageFolder.getFile("RunAway.java");
        try {
            if (!file.exists()) {
                file.create(new ByteArrayInputStream(CONTENT.getBytes()),
                        true, new NullProgressMonitor());
            } else {
                file.setContents(
                        new ByteArrayInputStream(CONTENT.getBytes()),
                        IFile.FORCE | IFile.KEEP_HISTORY,
                        new NullProgressMonitor());
            }
        } catch (CoreException e) {
            e.printStackTrace();
        }
    }
    return null;
}
See AbstractNewClassWizard for a smaller example that is similar to NewTypeWizardPage and uses some of the JDT APIs.
You can use the new class wizard to create classes.
This will prompt the user for the class name, et cetera. You can initialize the values of the wizard page. Below I am setting the source folder only (and telling the wizard that it cannot be changed, hence the second false parameter). You might want to set the interface and possibly the package as well; a sketch of that follows the snippet.
OpenNewClassWizardAction wizard = new OpenNewClassWizardAction();
wizard.setOpenEditorOnFinish(false);
NewClassWizardPage page = new NewClassWizardPage();
page.setPackageFragmentRoot(sourceFolder, false);
wizard.setConfiguredWizardPage(page);
wizard.run();
return (IType) wizard.getCreatedElement();
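To pre-set the interface as suggested above, a minimal sketch (the interface name is a placeholder; setSuperInterfaces is inherited from NewTypeWizardPage):
// second argument false = the user cannot edit the pre-filled interface list
page.setSuperInterfaces(Arrays.asList("z.ex.MyService"), false);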
Hope that helps!
Alternatively, create the entire Java file using the JDT AST: first build the AST, then write it out to a Java file. It might look like heavy work, but it is the most robust option and gives you complete control.
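A minimal sketch of that approach, assuming org.eclipse.jdt.core.dom is available on the plugin's classpath (package, class and interface names are placeholders):
AST ast = AST.newAST(AST.JLS8);
CompilationUnit cu = ast.newCompilationUnit();
PackageDeclaration pkg = ast.newPackageDeclaration();
pkg.setName(ast.newName("z.ex.go"));
cu.setPackage(pkg);
// builds: public class MyServiceImpl implements MyService { }
TypeDeclaration type = ast.newTypeDeclaration();
type.setName(ast.newSimpleName("MyServiceImpl"));
type.modifiers().add(ast.newModifier(Modifier.ModifierKeyword.PUBLIC_KEYWORD));
type.superInterfaceTypes().add(ast.newSimpleType(ast.newName("MyService")));
cu.types().add(type);
// ASTNode#toString() is officially a debug aid, but is fine for simple stubs;
// write the result into an IFile as in the snippet above
String source = cu.toString();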

Why am I having trouble accessing a .properties file in a standalone instance of tomcat but not in an eclipse-embedded instance?

I wrote a simple Hello World servlet in Eclipse, containing the following in the doGet method of my HelloWorldServlet.java:
PrintWriter writer = response.getWriter();
String hello = PropertyLoader.bundle.getProperty("hello");
writer.append(hello);
writer.flush();
PropertyLoader is a simple class in the same package as the Servlet that does the following:
public class PropertyLoader {
    public static final Properties bundle = new Properties();
    static {
        try {
            URL url = PropertyLoader.class.getResource("/helloSettings.properties");
            InputStream stream = new FileInputStream(url.getFile());
            bundle.load(stream);
        } catch (IOException e) {
            // a static initializer cannot throw checked exceptions
            throw new ExceptionInInitializerError(e);
        }
    }
}//End of class
I placed a file called helloSettings.properties in /WebContent/WEB-INF/classes containing the following single line of content:
hello=Hello Settings World
When I add Tomcat 6.0 to my project and run it in Eclipse, it successfully prints
"Hello Settings World" to the web browser.
However, when I export the project as a WAR file and manually place it in .../Tomcat 6.0/webapps, I get "null" as my result.
Is it a problem with the classpath/classloader configuration? Permissions? One of the other configuration files? I know for a fact that the helloSettings.properties file is in the WEB-INF/classes folder.
Any help?
Well, after much browsing I found what seems to be a "normal" way to do what I'm trying to do.
Instead of... (how I was doing it):
public class PropertyLoader {
    public static final Properties bundle = new Properties();
    static {
        try {
            URL url = PropertyLoader.class.getResource("/helloSettings.properties");
            InputStream stream = new FileInputStream(url.getFile());
            bundle.load(stream);
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}//End of class
THE FIX
public class PropertyLoader {
    public static final Properties bundle = new Properties();
    static {
        try {
            InputStream stream = SBOConstants.class.getResourceAsStream("/sbonline.properties");
            bundle.load(stream);
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}//End of class
I'm modifying someone else's code, so I'm not sure why they did it the other way in the first place... but I guess url.getFile() was my problem, and I don't know why.
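For what it's worth: URL#getFile() merely returns the path component of the URL. As soon as the resource lives inside a packed WAR/JAR, or the path contains URL-encoded characters (your standalone path .../Tomcat 6.0/webapps contains a space, which getResource() encodes as %20), that string is not a valid file system path, so new FileInputStream(url.getFile()) fails; that is a likely reason the same code worked inside Eclipse but not standalone. getResourceAsStream avoids the conversion entirely. A minimal sketch of the same loader using the context class loader (resource name as in the question):
public class PropertyLoader {
    public static final Properties bundle = new Properties();
    static {
        // no leading "/" here: the context class loader resolves names
        // against the classpath root
        try (InputStream stream = Thread.currentThread().getContextClassLoader()
                .getResourceAsStream("helloSettings.properties")) {
            bundle.load(stream);
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}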