Beam job written in Java not running on the Portable Flink runner - apache-beam

When I try to run the simplest PortableRunner pipeline on Java, I get the error:
Exception in thread "main" java.lang.RuntimeException: The Runner experienced the following error during execution:
java.io.IOException: error=2, No such file or directory
at org.apache.beam.runners.portability.JobServicePipelineResult.propagateErrors(JobServicePipelineResult.java:176)
at org.apache.beam.runners.portability.JobServicePipelineResult.waitUntilFinish(JobServicePipelineResult.java:117)
at org.apache.beam.examples.PythonExternal.runWordCount(PythonExternal.java:74)
at org.apache.beam.examples.PythonExternal.main(PythonExternal.java:81)
However, I do not read any files anywhere in the pipeline, nor do I write to any sink. My arguments are
--runner=PortableRunner --jobEndpoint=localhost:8099
and code
package org.apache.beam.examples;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.options.*;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.Row;
import java.util.Arrays;
import java.util.List;
public class Simplest {
public static final Schema SCHEMA =
Schema.of(
Schema.Field.of("sentence", Schema.FieldType.STRING),
Schema.Field.of("count", Schema.FieldType.INT32));
public interface WordCountOptions extends PipelineOptions {
/** Set this option to specify Python expansion service URL. */
@Description("URL of Python expansion service")
String getExpansionService();
void setExpansionService(String value);
}
static void runWordCount(WordCountOptions options){
final List<String> LINES = Arrays.asList(
"To be, or not to be: that is the question: ",
"Whether 'tis nobler in the mind to suffer ",
"The slings and arrows of outrageous fortune, ",
"Or to take arms against a sea of troubles, ");
Pipeline p = Pipeline.create(options);
p.apply("ReadLines", Create.of(LINES)).setCoder(StringUtf8Coder.of())
.apply(ParDo
.of(new DoFn<String, Row>() {
@ProcessElement
public void processElement(@Element String element, OutputReceiver<Row> out, ProcessContext c) {
// Wrap each input line in a Row with a count of 1.
out.output(Row.withSchema(SCHEMA)
.withFieldValue("sentence", element)
.withFieldValue("count", 1)
.build());
}
}))
.setRowSchema(SCHEMA);
p.run().waitUntilFinish();
}
public static void main(String[] args) {
WordCountOptions options =
PipelineOptionsFactory.fromArgs(args).withValidation().as(WordCountOptions.class);
runWordCount(options);
}
}
Although the manual suggests adding --environment_type=LOOPBACK, that also fails with:
Class interface org.apache.beam.examples.PythonExternal$WordCountOptions missing a property named 'environment_type'.
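For reference, the Java SDK's PipelineOptionsFactory derives flag names from the camelCase getters on the options interface, which is presumably why a Python-style flag such as --environment_type has no matching property. A minimal sketch of an options interface that would at least accept a default environment type, assuming org.apache.beam.sdk.options.PortablePipelineOptions is the right base interface here, would be:
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PortablePipelineOptions;

// Sketch only: PortablePipelineOptions already declares
// getDefaultEnvironmentType()/setDefaultEnvironmentType(), so the Java-style flag
// --defaultEnvironmentType=LOOPBACK should be recognized, unlike --environment_type.
public interface WordCountOptions extends PortablePipelineOptions {
    @Description("URL of Python expansion service")
    String getExpansionService();
    void setExpansionService(String value);
}
Passing --defaultEnvironmentType=LOOPBACK instead of --environment_type=LOOPBACK would then match a real property.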
Beam Version: 2.40.0
I started the JobService endpoint with docker run --net=host apache/beam_flink1.14_job_server:latest, as suggested.
Any suggestions?

Related

Hadoop MapReduce word count program

I tried to run a word count program in Eclipse. In the browser the output directories are being created, but I am not getting any output. I am getting the following error; please help me out, I am stuck at the last point.
I gave the following command to execute the program.
hduser@niki-B85M-D3H:/home/niki/workspace$ hadoop jar wrd.jar WordCount /user/hduser/outputwc /user/hduser/outputwc/
The output file named wordcountoutput is created but the error is displayed as follows.
5/12/24 15:15:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
at WordCount.run(WordCount.java:29)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at WordCount.main(WordCount.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
This is the exception I am getting now at the final step.
WordCount.java
import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider.Text;
public class WordCount extends Configured implements Tool{
@Override
public int run(String[] args) throws Exception {
if(args.length<2){
System.out.println("plz give input and output directories");
return -1;
}
JobConf conf = new JobConf();
//conf.setJarByClass(WordCount.class);
conf.setJar("wrd.jar");
FileInputFormat.setInputPaths(conf,new Path(args[1]));
FileOutputFormat.setOutputPath(conf,new Path(args[2]));
conf.setMapperClass(WordMapper.class);
conf.setReducerClass(WordReducer.class);
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(IntWritable.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
JobClient.runJob(conf);
return 0;
}
public static void main(String args[]) throws Exception{
int exitCode =ToolRunner.run(new WordCount(), args);
System.exit(exitCode);
}
}
WordReducer.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class WordReducer extends MapReduceBase implements Reducer<Text,IntWritable,Text,IntWritable>{
@Override
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter r)
throws IOException {
int count=0;
while(values.hasNext()){
IntWritable i= values.next();
count+=i.get();
}
output.collect(key,new IntWritable(count));
}
}
WordMapper.java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class WordMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
@Override
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter r)
throws IOException {
String s = value.toString();
for(String word:s.split(" ")){
if(word.length()>0){
output.collect(new Text(word), new IntWritable(1));
}
}
}
}
Try this out:
public static void main(String args[]) throws Exception {
// requires: import org.apache.hadoop.conf.Configuration;
int exitCode = ToolRunner.run(new Configuration(), new WordCount(), args);
System.exit(exitCode);
}
according to this
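For what it's worth, ToolRunner passes the remaining (non-generic) command-line arguments straight through to run(), so with the two-argument command shown in the question they arrive as args[0] and args[1]. A sketch of the path setup inside run() using zero-based indices (my assumption, not part of the fix above):
// Sketch only: with "hadoop jar wrd.jar WordCount <input> <output>",
// args[0] is the input path and args[1] is the output path; args[2] does not exist,
// which is what triggers the ArrayIndexOutOfBoundsException at WordCount.java:29.
if (args.length < 2) {
    System.out.println("please give input and output directories");
    return -1;
}
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));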

Create Scalding Source like TextLine that combines multiple files into single mappers

We have many small files that need combining. In Scalding you can use TextLine to read files as text lines. The problem is we get 1 mapper per file, but we want to combine multiple files so that they are processed by 1 mapper.
I understand we need to change the input format to an implementation of CombineFileInputFormat, and this may involve using Cascading's CombinedHfs. We cannot work out how to do this, but it should be just a handful of lines of code to define our own Scalding source called, say, CombineTextLine.
Many thanks to anyone who can provide the code to do this.
As a side question, we have some data in S3; it would be great if the given solution works for S3 files - I guess it depends on whether CombineFileInputFormat or CombinedHfs works for S3.
You get the idea in your question, so here is what may be a solution for you.
Create your own input format that extends CombineFileInputFormat and uses your own custom RecordReader. I am showing you Java code, but you could easily convert it to Scala if you want.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
import org.apache.hadoop.mapred.lib.CombineFileSplit;
public class CombinedInputFormat<K, V> extends CombineFileInputFormat<K, V> {
public static class MyKeyValueLineRecordReader implements RecordReader<LongWritable,Text> {
private final RecordReader<LongWritable,Text> delegate;
public MyKeyValueLineRecordReader(CombineFileSplit split, Configuration conf, Reporter reporter, Integer idx) throws IOException {
FileSplit fileSplit = new FileSplit(split.getPath(idx), split.getOffset(idx), split.getLength(idx), split.getLocations());
delegate = new LineRecordReader(conf, fileSplit);
}
@Override
public boolean next(LongWritable key, Text value) throws IOException {
return delegate.next(key, value);
}
@Override
public LongWritable createKey() {
return delegate.createKey();
}
@Override
public Text createValue() {
return delegate.createValue();
}
@Override
public long getPos() throws IOException {
return delegate.getPos();
}
@Override
public void close() throws IOException {
delegate.close();
}
@Override
public float getProgress() throws IOException {
return delegate.getProgress();
}
}
@Override
public RecordReader getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException {
return new CombineFileRecordReader(job, (CombineFileSplit) split, reporter, (Class) MyKeyValueLineRecordReader.class);
}
}
Then you need to extend the TextLine class and make it use the input format you just defined (Scala code from now on).
import cascading.scheme.hadoop.TextLine
import cascading.flow.FlowProcess
import org.apache.hadoop.mapred.{OutputCollector, RecordReader, JobConf}
import cascading.tap.Tap
import com.twitter.scalding.{FixedPathSource, TextLineScheme}
import cascading.scheme.Scheme
class CombineFileTextLine extends TextLine{
override def sourceConfInit(flowProcess: FlowProcess[JobConf], tap: Tap[JobConf, RecordReader[_, _], OutputCollector[_, _]], conf: JobConf) {
super.sourceConfInit(flowProcess, tap, conf)
conf.setInputFormat(classOf[CombinedInputFormat[String, String]])
}
}
Create a scheme for your combined input.
trait CombineFileTextLineScheme extends TextLineScheme{
override def hdfsScheme = new CombineFileTextLine().asInstanceOf[Scheme[JobConf,RecordReader[_,_],OutputCollector[_,_],_,_]]
}
Finally, create your source class:
case class CombineFileMultipleTextLine(p : String*) extends FixedPathSource(p :_*) with CombineFileTextLineScheme
If you want to use a single path instead of multiple ones, the change to your source class is trivial.
I hope that helps.
This should do the trick: https://wiki.apache.org/hadoop/HowManyMapsAndReduces

Error in Eclipse when compiling code but not when is validated

I have Eclipse Java EE IDE Helios Service Release 2 (Build id: 20110218-0911); I am trying to use it with Selenium RC, and when I compile the code the following error message appears:
Type mismatch: cannot convert from DataProviderSites to annotation
The attribute DataProviderSites is undefined for the annotation type Test
However, when I validate the code, it reports "the validation completed with no errors or warnings".
The Code in DataProviderSites.java is:
package script;
import org.junit.Test;
import junit.framework.TestCase;
import com.thoughtworks.selenium.SeleneseTestBase;
import org.junit.AfterClass;
import org.openqa.selenium.server.SeleniumServer;
import org.testng.annotations.*;
import java.io.File;
import jxl.*;
public class DataProviderSites extends SeleneseTestBase {
@BeforeClass
public void setUp() throws Exception {
SeleniumServer seleniumserver=new SeleniumServer();
seleniumserver.boot();
seleniumserver.start();
setUp("http://www.examinator.ws/", "*firefox");
selenium.open("/");
selenium.windowMaximize();
selenium.windowFocus();
}
@DataProviderSites
(name = "DPS1")
public Object[][] createData1() throws Exception{
Object[][] retObjArr=getTableArray("test\\Resources\\Data\\sitios.xls",
"DataPool", "TestData");
return(retObjArr);
}
@Test(DataProviderSites = "DPS1")
public void testDataProviderSites(String nombre) throws Exception {
selenium.type("sitio", nombre);
if (selenium.isTextPresent("examinator"))
selenium.click("xpath=/descendant::button[#type='submit']");
else
selenium.waitForPageToLoad("30000");
selenium.click("xpath=/descendant::a[text()='"+nombre+"']");
}
@AfterClass
public void tearDown(){
selenium.close();
selenium.stop();
}
public String[][] getTableArray(String xlFilePath, String sheetName, String tableName) throws Exception{
String[][] tabArray=null;
Workbook workbook = Workbook.getWorkbook(new File(xlFilePath));
Sheet sheet = workbook.getSheet(sheetName);
int startRow,startCol, endRow, endCol,ci,cj;
Cell tableStart=sheet.findCell(tableName);
startRow=tableStart.getRow();
startCol=tableStart.getColumn();
Cell tableEnd= sheet.findCell(tableName, startCol+1,startRow+1, 100, 64000, false);
endRow=tableEnd.getRow();
endCol=tableEnd.getColumn();
System.out.println("startRow="+startRow+", endRow="+endRow+", " +
"startCol="+startCol+", endCol="+endCol);
tabArray=new String[endRow-startRow-1][endCol-startCol-1];
ci=0;
for (int i=startRow+1;i<endRow;i++,ci++){
cj=0;
for (int j=startCol+1;j<endCol;j++,cj++){
tabArray[ci][cj]=sheet.getCell(j,i).getContents();
}
}
return(tabArray);
}
}
Does anybody have an idea how to resolve this?
Just click the error and click "Convert to TestNG(Annotations)".
Validation does not check the logic, just as it does not check the syntax of the code.
But when you compile the program the syntax is checked, which is why you get the type mismatch error.
Validation succeeding does not mean the program is correct.
I think the annotations (@Test(DataProviderSites = "DPS1") and @DataProviderSites(name = "DPS1")) cause the error.

Reading an xls file in GWT

I am trying to read an xls file using GWT RPC. The same code executed fine in a normal Java file, but here it is unable to load the file and gives me a NullPointerException.
Following is the code
import com.arosys.readExcel.ReadXLSX;
import com.google.gwt.user.server.rpc.RemoteServiceServlet;
import org.Preview.client.GWTReadXL;
import java.io.InputStream;
import com.arosys.customexception.FileNotFoundException;
import com.arosys.logger.LoggerFactory;
import java.util.Iterator;
import org.apache.log4j.Logger;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
/**
*
* @author Amandeep
*/
public class GWTReadXLImpl extends RemoteServiceServlet implements GWTReadXL
{
private String fileName;
private String[] Header=null;
private String[] RowData=null;
private int sheetindex;
private String sheetname;
private XSSFWorkbook workbook;
private XSSFSheet sheet;
private static Logger logger=null;
public void loadXlsxFile() throws Exception
{
logger.info("inside loadxlsxfile:::"+fileName);
InputStream resourceAsStream =ClassLoader.getSystemClassLoader().getSystemResourceAsStream("c:\\test2.xlsx");
logger.info("resourceAsStream-"+resourceAsStream);
if(resourceAsStream==null)
throw new FileNotFoundException("unable to locate give file");
else
{
try
{
workbook = new XSSFWorkbook(resourceAsStream);
sheet = workbook.getSheetAt(sheetindex);
}
catch (Exception ex)
{
logger.error(ex.getMessage());
}
}
}// end loadxlsxFile
public String getNumberOfColumns() throws Exception
{
int NO_OF_Column=0; XSSFCell cell = null;
loadXlsxFile();
Iterator rowIter = sheet.rowIterator();
XSSFRow firstRow = (XSSFRow) rowIter.next();
Iterator cellIter = firstRow.cellIterator();
while(cellIter.hasNext())
{
cell = (XSSFCell) cellIter.next();
NO_OF_Column++;
}
return NO_OF_Column+"";
}
}
I am calling it in the client program with this code:
final AsyncCallback<String> callback1 = new AsyncCallback<String>() {
public void onSuccess(String result) {
RootPanel.get().add(new Label("In success"));
if(result==null)
{
RootPanel.get().add(new Label("result is null"));
}
RootPanel.get().add(new Label("result is"+result));
}
public void onFailure(Throwable caught) {
RootPanel.get().add(new Label("In Failure"+caught));
}
};
try{
getService().getNumberOfColumns(callback1);
}catch(Exception e){}
}
Please tell me how I can resolve this issue, as the code runs fine when run as a normal Java file.
Why are you using the system classloader, rather than the normal one?
But if you still want to, then look at this:
You are running it as a web application. In that case, you need to use the ClassLoader which is obtained as follows:
ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
This one has access to all the classpath paths tied to the web application in question, and you are no longer dependent on which parent classloader (a webapp has more than one!) loaded your class.
Then, on this classloader, you just need to call getResourceAsStream() to get a classpath resource as a stream, not getSystemResourceAsStream(), which depends on how the web application is started. You don't want to depend on that either, since you have no control over it with external hosting:
InputStream input = classLoader.getResourceAsStream("filename.extension");
The location of the file should be on your CLASSPATH.
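Putting that together with the original loadXlsxFile(), a sketch (assuming test2.xlsx has been placed on the webapp classpath, e.g. under WEB-INF/classes, rather than at c:\test2.xlsx) might look like:
// Sketch only: load the workbook via the context classloader with a
// classpath-relative name, instead of the system classloader with an absolute path.
public void loadXlsxFile() throws Exception {
    ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
    InputStream resourceAsStream = classLoader.getResourceAsStream("test2.xlsx");
    if (resourceAsStream == null)
        throw new FileNotFoundException("unable to locate the given file");
    workbook = new XSSFWorkbook(resourceAsStream);
    sheet = workbook.getSheetAt(sheetindex);
}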

Unable to find the org.drools.builder.KnowledgeType Drools class

While trying to execute the HelloWorld process example from section 2.3 at
https://hudson.jboss.org/hudson/job/drools/lastSuccessfulBuild/artifact/trunk/target/docs/drools-flow/html_single/index.html#d4e24 I am unable to find the class mentioned below.
org.drools.builder.KnowledgeType
Could anyone please tell me which package this class comes from?
Thanks!
That part of the documentation seems a little outdated. You should use ResourceType. I've updated the docs with the following code fragment instead (should also appear on the link you're using once the build succeeds):
package com.sample;
import org.drools.KnowledgeBase;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.ResourceFactory;
import org.drools.logger.KnowledgeRuntimeLogger;
import org.drools.logger.KnowledgeRuntimeLoggerFactory;
import org.drools.runtime.StatefulKnowledgeSession;
/**
* This is a sample file to launch a process.
*/
public class ProcessTest {
public static final void main(String[] args) {
try {
// load up the knowledge base
KnowledgeBase kbase = readKnowledgeBase();
StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
KnowledgeRuntimeLogger logger = KnowledgeRuntimeLoggerFactory.newFileLogger(ksession, "test");
// start a new process instance
ksession.startProcess("com.sample.ruleflow");
logger.close();
} catch (Throwable t) {
t.printStackTrace();
}
}
private static KnowledgeBase readKnowledgeBase() throws Exception {
KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
kbuilder.add(ResourceFactory.newClassPathResource("ruleflow.rf"), ResourceType.DRF);
return kbuilder.newKnowledgeBase();
}
}