Hadoop TotalOrderPartitioner

I am trying to use the total order partitioner in Hadoop with the following code:
job.setNumReduceTasks(4);
Path partitionFile = new Path(args[1]);
InputSampler.Sampler sampler = new InputSampler.RandomSampler(0.1, 3, 1);
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),partitionFile);
InputSampler.writePartitionFile(job, sampler);
job.setPartitionerClass(TotalOrderPartitioner.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
While running, this code throws the following exception:
Exception running child : java.lang.IllegalArgumentException: Can't read partitions file
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.io.FileNotFoundException: File file:/tmp/hadoop-sarang/nm-local-dir/usercache/sarang/appcache/application_1417956066584_0001/container_1417956066584_0001_01_000005/_partition.lst does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1749)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:301)
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
... 10 more

I also ran into this error when using Hadoop MapReduce on a setup where the MapReduce service had not been installed and started. After installing MapReduce and starting it, the exception disappeared.
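For reference, a minimal driver sketch with the usual ordering (input paths and key type registered before sampling, partition file on a filesystem every task can read) might look like the following. It is hedged: it assumes SequenceFile input with Text keys, and the job name, the _partition.lst name under otherArgs[1], and the argument layout are illustrative, not from the original post.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;
import org.apache.hadoop.util.GenericOptionsParser;

Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
Job job = Job.getInstance(conf, "total-order-sort");    // illustrative job name
job.setNumReduceTasks(4);
job.setInputFormatClass(SequenceFileInputFormat.class); // the sampler reads the job's input
job.setMapOutputKeyClass(Text.class);                   // the key type being range-partitioned
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
// Keep the partition file on HDFS, not the local filesystem, so every task can read it.
Path partitionFile = new Path(otherArgs[1], "_partition.lst");
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);
// Sample only after the input paths are registered, then install the partitioner.
InputSampler.Sampler<Text, Text> sampler = new InputSampler.RandomSampler<>(0.1, 3, 1);
InputSampler.writePartitionFile(job, sampler);
job.setPartitionerClass(TotalOrderPartitioner.class);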

Related

Spark fails to read from Elasticsearch/OpenSearch: Invalid map received dynamic_date_formats

Hi, I'm trying to use Scala 2.11.12, Spark 2.3.0 and elasticsearch-spark-20 7.7.0 to read from an OpenSearch 1.3.4 index with the following code:
spark.read.format("org.elasticsearch.spark.sql")
.load("myIndex")
.filter('Timestamp === lit(dateToRead))
But I get this error:
22/08/17 15:30:42 ERROR EventManager$: Unexpected error retrieving offsets. Bailing out...
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: invalid map received dynamic_date_formats=[yyyy-MM-dd HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS||yyyy-MM-dd||yyyy-MM-dd'T'HH||yyyy-MM-dd'T'HH:mm]
at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseField(FieldParser.java:146)
at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseMapping(FieldParser.java:88)
at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseIndexMappings(FieldParser.java:69)
at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseMappings(FieldParser.java:40)
at org.elasticsearch.hadoop.rest.RestClient.getMappings(RestClient.java:321)
at org.elasticsearch.hadoop.rest.RestClient.getMappings(RestClient.java:307)
at org.elasticsearch.hadoop.rest.RestRepository.getMappings(RestRepository.java:293)
at org.elasticsearch.spark.sql.SchemaUtils$.discoverMappingAndGeoFields(SchemaUtils.scala:103)
at org.elasticsearch.spark.sql.SchemaUtils$.discoverMapping(SchemaUtils.scala:91)
at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema$lzycompute(DefaultSource.scala:229)
at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema(DefaultSource.scala:229)
at org.elasticsearch.spark.sql.ElasticsearchRelation$$anonfun$schema$1.apply(DefaultSource.scala:233)
at org.elasticsearch.spark.sql.ElasticsearchRelation$$anonfun$schema$1.apply(DefaultSource.scala:233)
at scala.Option.getOrElse(Option.scala:121)
at org.elasticsearch.spark.sql.ElasticsearchRelation.schema(DefaultSource.scala:233)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:431)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
at com.MyCalass$$anonfun$myMethod2$1.apply(MyCalass.scala:130)
at com.MyCalass$$anonfun$myMethod2$1.apply(MyCalass.scala:126)
at scala.util.Try$.apply(Try.scala:192)
at com.MyCalass$.myMethod2(MyCalass.scala:126)
at com.MyCalass$.myMethod(MyCalass.scala:55)
at com.MyApp$.MyApp$$myMethod(MyApp.scala:107)
at com.MyApp$$anonfun$main$2.apply(MyApp.scala:86)
at com.MyApp$$anonfun$main$2.apply(MyApp.scala:76)
at scala.Option.fold(Option.scala:158)
at com.MyApp$.main(MyApp.scala:76)
at com.MyApp.main(MyApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Command exiting with ret '1'
I've set the dynamic date mapping in OpenSearch, and I am able to write to the index with the correct mapping, but when I try to read, it fails.
I found the problem: the Elasticsearch connector does not detect OpenSearch properly and treats OpenSearch 1.3.4 as Elasticsearch 1.3.4. To solve this, add compatibility.override_main_response_version: true to your opensearch.yml file.
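For reference, in opensearch.yml that is the single line below (the file-based setting needs a node restart); as a hedged alternative, OpenSearch 1.x also documents this as a dynamic cluster setting, so it can be applied via the settings API instead:

compatibility.override_main_response_version: true

PUT _cluster/settings
{
  "persistent": {
    "compatibility.override_main_response_version": true
  }
}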

AWS GLUE: Cassandra connection using SSL is not working

I wanted to connect to Cassandra using Spark. Connecting to Cassandra over the default port works, but when I try accessing it via SSL the job fails. Below is the code:
val spark: SparkSession = SparkSession.builder()
.config("spark.cassandra.connection.host","server.abc")
.config("spark.cassandra.connection.port","9142")
.config("spark.cassandra.connection.ssl.enabled",true)
.config("spark.cassandra.connection.ssl.trustStore.path","s3:/dev-code/certs/trust.jks")
.config("spark.cassandra.connection.ssl.trustStore.password","mypass")
.config("spark.cassandra.auth.username","myuser")
.config("spark.cassandra.auth.password","userpass")
.appName("CassandraIntegration").getOrCreate()
FYI: the job has access to the S3 bucket; I am able to read a CSV file from the same location. Also, both ports 9042 and 9142 are enabled. I closed 9042 and kept only port 9142, but the error still persists.
Below is the error:
ERROR [main] glue.ProcessLauncher (Logging.scala:logError(94)): Exception in User Class
java.io.IOException: Failed to open native connection to Cassandra at {server.abc:9142} :: Error instantiating class com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory (specified by advanced.ssl-engine-factory.class): Cannot initialize SSL Context
at com.datastax.spark.connector.cql.CassandraConnector$.createSession(CassandraConnector.scala:173)
at com.datastax.spark.connector.cql.CassandraConnector$.$anonfun$sessionCache$1(CassandraConnector.scala:161)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:32)
at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:103)
at com.datastax.spark.connector.datasource.CassandraCatalog$.com$datastax$spark$connector$datasource$CassandraCatalog$$getMetadata(CassandraCatalog.scala:455)
at com.datastax.spark.connector.datasource.CassandraCatalog$.getTableMetaData(CassandraCatalog.scala:421)
at org.apache.spark.sql.cassandra.DefaultSource.getTable(DefaultSource.scala:68)
at org.apache.spark.sql.cassandra.DefaultSource.inferSchema(DefaultSource.scala:72)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:296)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:266)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:226)
at MyCsvToCassandrsJob$.main(csv-to-cassanra-job:63)
at MyCsvToCassandrsJob.main(csv-to-cassanra-job-job)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.amazonaws.services.glue.SparkProcessLauncherPlugin.invoke(ProcessLauncher.scala:47)
at com.amazonaws.services.glue.SparkProcessLauncherPlugin.invoke$(ProcessLauncher.scala:47)
at com.amazonaws.services.glue.ProcessLauncher$$anon$1.invoke(ProcessLauncher.scala:75)
at com.amazonaws.services.glue.ProcessLauncher.launch(ProcessLauncher.scala:123)
at com.amazonaws.services.glue.ProcessLauncher$.main(ProcessLauncher.scala:29)
at com.amazonaws.services.glue.ProcessLauncher.main(ProcessLauncher.scala)
Caused by: java.lang.IllegalArgumentException: Error instantiating class com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory (specified by advanced.ssl-engine-factory.class): Cannot initialize SSL Context
at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:253)
at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:108)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.buildSslEngineFactory(DefaultDriverContext.java:414)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.lambda$new$4(DefaultDriverContext.java:279)
at com.datastax.oss.driver.internal.core.util.concurrent.LazyReference.get(LazyReference.java:55)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.getSslEngineFactory(DefaultDriverContext.java:733)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.buildSslHandlerFactory(DefaultDriverContext.java:470)
at com.datastax.oss.driver.internal.core.util.concurrent.LazyReference.get(LazyReference.java:55)
at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.getSslHandlerFactory(DefaultDriverContext.java:799)
at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.init(DefaultSession.java:348)
at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.access$1100(DefaultSession.java:300)
at com.datastax.oss.driver.internal.core.session.DefaultSession.lambda$init$0(DefaultSession.java:146)
at com.datastax.oss.driver.shaded.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
at com.datastax.oss.driver.shaded.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
at com.datastax.oss.driver.shaded.netty.channel.DefaultEventLoop.run(DefaultEventLoop.java:54)
at com.datastax.oss.driver.shaded.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at com.datastax.oss.driver.shaded.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Cannot initialize SSL Context
at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.<init>(DefaultSslEngineFactory.java:74)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:246)
... 18 more
Caused by: java.nio.file.NoSuchFileException: s3:/dev-code/certs/trust.jks
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
at java.nio.file.Files.newInputStream(Files.java:152)
at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.buildContext(DefaultSslEngineFactory.java:119)
at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.<init>(DefaultSslEngineFactory.java:72)
... 23 more
Any workaround for this problem would be a big help.
At the bottom of your error message, I see this:
NoSuchFileException: s3:/dev-code/certs/trust.jks
Alex is right, in that you need to provide a path to that file that the Spark connector can actually get to. From the looks of it, S3 won't work here.
I added the .jks file from S3 into the "Referenced files path" of the Glue job and then accessed it by providing just the file name, as the file is automatically placed under the /tmp folder. But that alone still did not solve the issue.
From this website, I understood that we need to provide all the default values as well.
Below is my final code:
val spark: SparkSession = SparkSession.builder()
.config("spark.cassandra.connection.host","server.abc")
.config("spark.cassandra.connection.port","9142")
.config("spark.cassandra.connection.ssl.enabled",true)
.config("spark.cassandra.connection.ssl.enabledAlgorithms", "TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA")
.config("spark.cassandra.connection.ssl.trustStore.path","trust.jks")
.config("spark.cassandra.connection.ssl.trustStore.password","mypass")
.config("spark.cassandra.connection.ssl.trustStore.type","JKS")
.config("spark.cassandra.connection.ssl.protocol","TLS")
.config("spark.cassandra.auth.username","myuser")
.config("spark.cassandra.auth.password","userpass")
.appName("CassandraIntegration").getOrCreate()

Problems with Kafka Source initialization in Siddhi

I can't create a stream from a Kafka topic using Siddhi, even when I create the stream with Design View.
I copied all required jars to the lib and bundle folders, and even started Kafka with ZooKeeper locally (I don't know why I need it locally, but never mind).
On tooling.sh start I get the following error:
[2020-02-26 22:15:43,041] WARNING {org.wso2.carbon.launcher.extensions.OSGiLibBundleDeployerUtils lambda$getBundlesInfo$1} - Error when loading the OSGi bundle information from /home/Hed/StreamProcessor/siddhi-tooling-5.1.2/lib/kafka-clients-2.3.0.jar
java.io.IOException: Required bundle manifest headers do not exist
at org.wso2.carbon.launcher.extensions.OSGiLibBundleDeployerUtils.getBundleInfo(OSGiLibBundleDeployerUtils.java:183)
at org.wso2.carbon.launcher.extensions.OSGiLibBundleDeployerUtils.lambda$getBundlesInfo$1(OSGiLibBundleDeployerUtils.java:135)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
at java.util.stream.StreamSpliterators$DistinctSpliterator.forEachRemaining(StreamSpliterators.java:1291)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747)
at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721)
at java.util.stream.AbstractTask.compute(AbstractTask.java:327)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
For this script:
#App:name("HelloKafka")
#App:description('Consume events from a Kafka Topic and publish to a different Kafka Topic')
#source(type='kafka',
topic.list='kafka_topic',
partition.no.list='0',
threading.option='single.thread',
group.id="group",
bootstrap.servers='localhost:9092',
#map(type='json'))
define stream SweetProductionStream (name string, amount double);
I see this error on the Run command:
io.siddhi.core.exception.SiddhiAppCreationException: Error on 'HelloKafka' @ Line: 10. Position: 26, near '#source(type='kafka',
topic.list='kafka_topic',
partition.no.list='0',
threading.option='single.thread',
group.id="group",
bootstrap.servers='localhost:9092',
#map(type='json'))'. org/apache/kafka/clients/producer/Producer
at io.siddhi.core.util.ExceptionUtil.populateQueryContext(ExceptionUtil.java:43)
at io.siddhi.core.util.parser.helper.DefinitionParserHelper.addEventSource(DefinitionParserHelper.java:388)
at io.siddhi.core.util.SiddhiAppRuntimeBuilder.defineStream(SiddhiAppRuntimeBuilder.java:117)
at io.siddhi.core.util.parser.SiddhiAppParser.defineStreamDefinitions(SiddhiAppParser.java:374)
at io.siddhi.core.util.parser.SiddhiAppParser.parse(SiddhiAppParser.java:230)
at io.siddhi.core.SiddhiManager.createSiddhiAppRuntime(SiddhiManager.java:85)
at io.siddhi.core.SiddhiManager.createSiddhiAppRuntime(SiddhiManager.java:95)
at io.siddhi.distribution.editor.core.internal.DebugRuntime.createRuntime(DebugRuntime.java:201)
at io.siddhi.distribution.editor.core.internal.DebugRuntime.<init>(DebugRuntime.java:56)
at io.siddhi.distribution.editor.core.internal.DebugProcessorService.start(DebugProcessorService.java:38)
at io.siddhi.distribution.editor.core.internal.EditorMicroservice.start(EditorMicroservice.java:761)
at io.siddhi.distribution.editor.core.internal.EditorMicroservice.startWithVariables(EditorMicroservice.java:781)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.wso2.msf4j.internal.router.HttpMethodInfo.invokeResource(HttpMethodInfo.java:187)
at org.wso2.msf4j.internal.router.HttpMethodInfo.invoke(HttpMethodInfo.java:143)
at org.wso2.msf4j.internal.MSF4JHttpConnectorListener.dispatchMethod(MSF4JHttpConnectorListener.java:218)
at org.wso2.msf4j.internal.MSF4JHttpConnectorListener.lambda$onMessage$58(MSF4JHttpConnectorListener.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/kafka/clients/producer/Producer
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructor0(Class.java:3075)
at java.lang.Class.newInstance(Class.java:412)
at io.siddhi.core.util.SiddhiClassLoader.loadClass(SiddhiClassLoader.java:32)
at io.siddhi.core.util.SiddhiClassLoader.loadExtensionImplementation(SiddhiClassLoader.java:48)
at io.siddhi.core.util.parser.helper.DefinitionParserHelper.addEventSource(DefinitionParserHelper.java:346)
... 21 more
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.clients.producer.Producer cannot be found by siddhi-io-kafka_5.0.7
at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:448)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:361)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:353)
at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:161)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
... 28 more
Can somebody tell me what I am doing wrong? :(
Please make sure you have added the OSGi-converted jars to "C:\Program Files\WSO2\Enterprise Integrator\7.0.2\streaming-integrator\lib".
The OSGi-converted jar list:
kafka_2.12_2.3.0_1.0.0
kafka_clients_2.3.0_1.0.0
metrics_core_2.2.0_1.0.0
scala_library_2.12.8_1.0.0
zkclient_0.11_1.0.0
zookeeper_3.4.14_1.0.0
Then, copy the original jars to "C:\Program Files\WSO2\Enterprise Integrator\7.0.2\streaming-integrator\samples\sample-clients\lib".
The list of original jars:
kafka_2.12-2.3.0
kafka-clients-2.3.0
metrics-core-2.2.0
scala-library-2.12.8
zkclient-0.11
zookeeper-3.4.14
In order to generate the OSGi-converted jars, copy all original jars to a folder called "source" and create an empty folder called "destination". Then run the following command in the terminal:
MINGW32 /c/Program Files/WSO2/Enterprise Integrator/7.0.2/streaming-integrator/bin
$ ./jartobundle.sh C:/DevTools/source C:/DevTools/destination
Finally, distribute the OSGi-converted and original jars to the directories listed above.
PS1: in my case I am using kafka_2.12-2.4.1, but the basenames of the jars do not change.
PS2: adapt the directories to your installation path.
For more details, check the WSO2 documentation: Kafka transport

MongoDB Hadoop connector fails to query on Mongo Hive table

I am using the MongoDB Hadoop connector to query MongoDB through a Hive table in Hadoop.
I am able to execute
select * from mongoDBTestHiveTable;
But when I try to execute the following query
select id from mongoDBTestHiveTable;
it throws the following exception. The class in question does exist in the Hive lib folder.
Exception stacktrace:
Diagnostic Messages for this Task:
Error: java.io.IOException: Cannot create an instance of InputSplit class = com.mongodb.hadoop.hive.input.HiveMongoInputFormat$MongoHiveInputSplit:Class com.mongodb.hadoop.hive.input.HiveMongoInputFormat$MongoHiveInputSplit not found
at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:147)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:370)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.hive.input.HiveMongoInputFormat$MongoHiveInputSplit not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:144)
... 10 more
Container killed by the ApplicationMaster.
Please advise.
You also need to add the mongo-hadoop-* jars as well as the MongoDB driver jar to the MR1/MR2 classpath on all workers.
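A hedged way to do this for a single Hive session is with ADD JAR, which also ships the jars to the MapReduce tasks via the distributed cache. The paths and version numbers below are illustrative and must match your installation:

ADD JAR /usr/lib/hive/lib/mongo-hadoop-core-1.3.2.jar;
ADD JAR /usr/lib/hive/lib/mongo-hadoop-hive-1.3.2.jar;
ADD JAR /usr/lib/hive/lib/mongo-java-driver-2.12.3.jar;

For a permanent fix, place the same jars on the MR1/MR2 classpath on every worker node, for example under $HADOOP_HOME/lib or via HADOOP_CLASSPATH.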

JBoss - JMS - Failed to download and/or install client side AOP stack

When I tried to build a JMS client to get a connection to messaging on JBoss 5 (default configuration), I encountered this error (at the line Connection conn = qcf.createQueueConnection();). This is a Maven project with the following libraries on the classpath:
jboss:jnp-client:jar:4.0.2:compile
jboss:jboss-aop:jar:JBOSSAS-5.1:compile
jboss:jboss-messaging-client:jar:1.4.7.GA:compile
jboss:jbossall-client:jar:JBOSSAS-5.1:compile
jboss:jboss-common-core:jar:JBOSSAS-5.1:compile
jboss:jboss-mdr:jar:JBOSSAS-5.1:compile
jboss:jboss-logging-spi:jar:JBOSSAS-5.1:compile
org.jboss.remoting:jboss-remoting:jar:2.5.3.SP1:compile
For such simple code this did not make sense. Any help is appreciated.
My code is as follows:
Hashtable env = new Hashtable();
env.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
env.put(Context.PROVIDER_URL, "jnp://localhost:1099");
env.put(Context.OBJECT_FACTORIES, "ConnectionFactory");
env.put(Context.URL_PKG_PREFIXES, "org.jboss.naming:org.jnp.interfaces");
InitialContext iniCtx = new InitialContext(env);
Object tmp = iniCtx.lookup("java:/XAConnectionFactory");
QueueConnectionFactory qcf = (QueueConnectionFactory) tmp;
Connection conn = qcf.createQueueConnection();
The error I was getting is:
Exception in thread "main" java.lang.RuntimeException: Failed to download and/or install client side AOP stack
at org.jboss.jms.client.JBossConnectionFactory.createConnectionInternal(JBossConnectionFactory.java:199)
at org.jboss.jms.client.JBossConnectionFactory.createQueueConnection(JBossConnectionFactory.java:101)
at org.jboss.jms.client.JBossConnectionFactory.createQueueConnection(JBossConnectionFactory.java:95)
at com.test.JMSExample.main(JMSExample.java:120)
Caused by: org.jboss.jms.exception.MessagingNetworkFailureException: Failed to connect client
at org.jboss.jms.client.delegate.ClientConnectionFactoryDelegate.createClient(ClientConnectionFactoryDelegate.java:347)
at org.jboss.jms.client.delegate.ClientConnectionFactoryDelegate.org$jboss$jms$client$delegate$ClientConnectionFactoryDelegate$getClientAOPStack$aop(ClientConnectionFactoryDelegate.java:246)
at org.jboss.jms.client.delegate.ClientConnectionFactoryDelegate.getClientAOPStack(ClientConnectionFactoryDelegate.java)
at org.jboss.jms.client.ClientAOPStackLoader.load(ClientAOPStackLoader.java:75)
at org.jboss.jms.client.JBossConnectionFactory.createConnectionInternal(JBossConnectionFactory.java:192)
... 3 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.jboss.remoting.InvokerRegistry.loadClientInvoker(InvokerRegistry.java:460)
at org.jboss.remoting.InvokerRegistry.createClientInvoker(InvokerRegistry.java:359)
at org.jboss.remoting.Client$6.run(Client.java:724)
at java.security.AccessController.doPrivileged(Native Method)
at org.jboss.remoting.Client.connect(Client.java:720)
at org.jboss.remoting.Client.connect(Client.java:668)
at org.jboss.jms.client.delegate.ClientConnectionFactoryDelegate.createClient(ClientConnectionFactoryDelegate.java:343)
... 7 more
Caused by: java.lang.NoSuchMethodError: org.jboss.util.propertyeditor.PropertyEditors.mapJavaBeanProperties(Ljava/lang/Object;Ljava/util/Properties;Z)V
at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.mapJavaBeanProperties(MicroSocketClientInvoker.java:1359)
at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.setup(MicroSocketClientInvoker.java:533)
at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.<init>(MicroSocketClientInvoker.java:292)
at org.jboss.remoting.transport.socket.SocketClientInvoker.<init>(SocketClientInvoker.java:78)
at org.jboss.remoting.transport.bisocket.BisocketClientInvoker.<init>(BisocketClientInvoker.java:166)
at org.jboss.remoting.transport.bisocket.TransportClientFactory.createClientInvoker(TransportClientFactory.java:44)
... 18 more
It looks like there is a mismatch between JAR files or a connectivity problem. Try the following steps:
1) Set the -verbose:class option on the JVM running JBoss AS and examine the output to find where MicroSocketClientInvoker.class comes from; it looks like JBoss couldn't find a method in it.
2) Check whether port 4457 is open, because the JBoss Messaging connector uses a default serverBindPort of 4457.
Hope it helps.
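For step 1, a hedged example of enabling the option on a default JBoss AS 5 install (Unix-like systems; the path assumes the standard distribution layout) is to append the flag to JAVA_OPTS in bin/run.conf:

# bin/run.conf -- log every class load with the jar it came from
JAVA_OPTS="$JAVA_OPTS -verbose:class"

The class-loading log then shows which jar each class was loaded from, which reveals which version of MicroSocketClientInvoker is actually on the classpath.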
I had the same issue. It can be caused by several underlying root causes; to identify yours, you need to follow the exception stack trace down the "Caused by" chain to the root exception.
In my case the problem was a truststore certificate file that had been corrupted by incorrectly filtering it during the processResources task of my Gradle project. Binary files get corrupted when they are filtered during processResources, so for me the fix was to exclude my certificate.truststore file from resource filtering.
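As a hedged illustration of that fix (Gradle Kotlin DSL; the **/*.truststore pattern is an assumption matching the file named above, and the expand() call stands in for whatever filtering the build already does), binary resources can be excluded so that only text resources are filtered:

tasks.processResources {
    // Expand placeholders in text resources only; running binary files
    // such as a truststore through expand()/filter() corrupts them.
    filesNotMatching("**/*.truststore") {
        expand(project.properties)
    }
}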