Compilation Error while writing Hopping Window in Kafka Streams (Confluent 4.0.0) - apache-kafka

I am trying to write an aggregation operation on time windows in Confluent Open Source 4.0.0, as below.
KTable<Windowed<String>, aggrTest> testWinAlerts =
    testRecords.groupByKey()
        .windowedBy(TimeWindows.of(TimeUnit.SECONDS.toMillis(120))
            .advanceBy(TimeUnit.SECONDS.toMillis(1)))
        .aggregate(
            new aggrTestInitilizer(),
            new minMaxCalculator(),
            Materialized.<String, aggrTest, WindowStore<Bytes, byte[]>>as("queryable-store-name")
                .withValueSerde(aggrMessageSerde)
                .withKeySerde(Serdes.String()));
But the above code gives the following error at compilation:
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
The method aggregate(Initializer<VR>, Aggregator<? super String,? super TestFields,VR>, Materialized<String,VR,WindowStore<Bytes,byte[]>>) in the type TimeWindowedKStream<String,TestFields> is not applicable for the arguments (aggrTestInitilizer, minMaxCalculator, Materialized<String,aggrTest,WindowStore<Bytes,byte[]>>)
The same code written for version 3.3.1, as below, does not give any error:
KTable<Windowed<String>, aggrTest> testWinAlerts =
    testRecords.groupByKey()
        .aggregate(
            new aggrTestInitilizer(),
            new minMaxCalculator(),
            TimeWindows.of(TimeUnit.SECONDS.toMillis(120))
                .advanceBy(TimeUnit.SECONDS.toMillis(1)),
            aggrMessageSerde,
            "aggr-test");
What might be the issue? Also, the aggrTestInitilizer, minMaxCalculator, and aggrMessageSerde used in both cases are the same.
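For reference, the compiler message spells out the parameter types the 4.0.0 overload expects: an Initializer<VR>, an Aggregator<? super String, ? super TestFields, VR>, and a Materialized<String, VR, WindowStore<Bytes, byte[]>>, where VR is fixed to aggrTest by the Materialized argument being passed. A minimal sketch of declarations that would satisfy those types (class names taken from the question; the bodies are placeholders, not the real implementations):

import org.apache.kafka.streams.kstream.Aggregator;
import org.apache.kafka.streams.kstream.Initializer;

// Must produce the aggregate type used in Materialized, i.e. aggrTest.
class aggrTestInitilizer implements Initializer<aggrTest> {
    @Override
    public aggrTest apply() {
        return new aggrTest();
    }
}

// Must consume the stream's key/value types (String, TestFields) and return aggrTest.
class minMaxCalculator implements Aggregator<String, TestFields, aggrTest> {
    @Override
    public aggrTest apply(String key, TestFields value, aggrTest aggregate) {
        // placeholder: update the min/max fields of aggregate from value
        return aggregate;
    }
}

If one of the three arguments is parameterized differently, for example an aggregator whose value type parameter is not TestFields, Java overload resolution fails with exactly this kind of "not applicable for the arguments" message, so it is worth double-checking the declared type parameters of these classes, and of aggrMessageSerde (which needs to be a Serde<aggrTest>), against the signature above.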

Related

createOrReplaceTempView is not a member of org.apache.spark.rdd.RDD

I am using Hadoop 2.7.2, HBase 1.4.9, Spark 2.2.0, Scala 2.11.8 and Java 1.8.
I run this command without any error:
val Patterns_fromHbase = mimic_PatternsFromHbase.mapPartitions(f =>
  f.map(row1 => (Bytes.toString(row1._2.getRow),
    Bytes.toString(row1._2.getValue(Bytes.toBytes("sepsiscategories"), Bytes.toBytes("subject_id")))))
).toDF("id", "subject_id")
Then I run this command:
mimic_PatternsFromHbase.createOrReplaceTempView("subject_id_table")
and I get this error:
:57: error: value createOrReplaceTempView is not a member of
org.apache.spark.rdd.RDD[(org.apache.hadoop.hbase.io.ImmutableBytesWritable,
org.apache.hadoop.hbase.client.Result)]
mimic_PatternsFromHbase.createOrReplaceTempView("subject_id_table")
What is the cause of this error, and how can I fix it?
I found my mistake; it was just carelessness.
Instead of calling the createOrReplaceTempView method on Patterns_fromHbase, I called it on mimic_PatternsFromHbase.
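In other words, the temp view has to be created from the DataFrame produced by toDF, not from the raw RDD:

Patterns_fromHbase.createOrReplaceTempView("subject_id_table")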

Assertion Error while using Scala future with Spark

I have been running Spark code inside a Scala Future.
I am getting an error like Assertion Failed <None>.
It comes wrapped in a Boxed Error.
Each time, the stack trace starts at the sqlContext.udf.register line.
When I put a synchronized block around the UDF register statement, the error goes away.
Scala version - 2.10.8
Spark - 1.6.0
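A minimal sketch of that workaround, serializing the registration behind one shared lock so concurrent futures cannot race on sqlContext.udf.register (the question's code is Scala; this sketch uses Spark's Java UDF API, and the UDF itself is just a placeholder):

import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

public class UdfSetup {
    // One shared lock for every thread/future that registers UDFs.
    private static final Object UDF_LOCK = new Object();

    public static void registerUdfs(SQLContext sqlContext) {
        synchronized (UDF_LOCK) {
            // Placeholder UDF; the point is the synchronized wrapper around register().
            sqlContext.udf().register("plusOne",
                    (UDF1<Integer, Integer>) x -> x + 1,
                    DataTypes.IntegerType);
        }
    }
}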

The initialization of the DataSource's outputs caused an error: The UDF class is not a proper subclass

I have this issue:
The initialization of the DataSource's outputs caused an error: The UDF class is not a proper subclass of org.apache.flink.api.common.functions.MapFunction
generated by this code:
val probes: DataSet[Probe] = env.createInput[InputProbe](new ProbesInputFormat).map { i =>
  new Probe(
    i.rssi,
    0,
    i.macHash,
    i.deviceId,
    0,
    i.timeStamp)
}
I'm using Scala 2.11 on Flink 1.4.0 with IDEA.
On my dev machine I have no issues and the job runs properly, while on a Flink standalone cluster of 3 nodes I encounter the above error.
Can you help me, please? ;(
UPDATE:
I resolved it by implementing a class that extends RichMapFunction; I don't know why, but it seems that lambda functions (=>) are not supported properly.
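Schematically, that change replaces the lambda with an explicit RichMapFunction subclass. A sketch with hypothetical In/Out records standing in for InputProbe and Probe (their real definitions are not shown in the post):

import org.apache.flink.api.common.functions.RichMapFunction;

// Hypothetical stand-ins for the question's InputProbe and Probe types.
class In { double rssi; String macHash; }
class Out {
    double rssi;
    String macHash;
    Out(double rssi, String macHash) { this.rssi = rssi; this.macHash = macHash; }
}

// The body of the original lambda moves into map().
class ToOut extends RichMapFunction<In, Out> {
    @Override
    public Out map(In value) {
        return new Out(value.rssi, value.macHash);
    }
}

The job then uses dataSet.map(new ToOut()) instead of a => lambda.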
Now I have a new issue:
java.lang.ClassCastException: hk.huko.aps2.entities.Registry cannot be cast to scala.Product
Should I open a new post?
I resolved the issue. It happened because Flink loaded my job JAR many times (classloader), and somehow that produced the error.
The solution is not to create a JAR including all the external JAR dependencies, but to copy those libraries, plus your job JAR, into the flink/lib folder.

How to catch an exception that occurred on a spark worker?

val HTF = new HashingTF(50000)
val Tf = Case.map(row =>
  HTF.transform(row)
).cache()
val Idf = new IDF().fit(Tf)

try {
  Idf.transform(Tf).map(x => LabeledPoint(1, x))
} catch {
  case ex: Throwable =>
    println(ex.getMessage)
}
Code like this isn't working.
HashingTF/IDF belong to org.apache.spark.mllib.feature.
I'm still getting an exception that says
org.apache.spark.SparkException: Failed to get broadcast_5_piece0 of broadcast_5
I cannot see any of my files in the error log; how do I debug this?
It seems that the worker ran out of memory.
Instant temporary fix:
Run the application without caching; just remove .cache().
How to debug:
The Spark UI probably has the complete exception details.
Check the stage details.
Check the logs and the thread dump in the Executor tab.
If you find multiple exceptions or errors, try to resolve them in sequence; most of the time, resolving the first error will resolve the subsequent ones.

Spark with MongoDB error

I'm learning to use Spark with MongoDB, but I've encountered a problem that I think is related to the way I use Spark, because it doesn't make any sense to me.
My proof of concept is to filter a collection containing about 800K documents by a certain field.
My code is very simple: connect to my MongoDB, apply a filter transformation and then count the elements:
JavaSparkContext sc = new JavaSparkContext("local[2]", "Spark Test");
Configuration config = new Configuration();
config.set("mongo.input.uri", "mongodb://127.0.0.1:27017/myDB.myCollection");
JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config, com.mongodb.hadoop.MongoInputFormat.class, Object.class, BSONObject.class);
long numberOfFilteredElements = mongoRDD.filter(myCollectionDocument -> myCollectionDocument._2().get("site").equals("marfeel.com")).count();
System.out.format("Filtered collection size: %d%n", numberOfFilteredElements);
When I execute this code, the Mongo driver splits my collection into 2810 partitions, so an equal number of tasks starts processing.
At about task number 1000, I get the following error message:
ERROR Executor: Exception in task 990.0 in stage 0.0 (TID 990) java.lang.OutOfMemoryError: unable to create new native thread
I've searched a lot about this error, but it doesn't make any sense to me. I came to the conclusion that either I have a problem with my code, I have some library version incompatibilities, or my real problem is that I'm getting the whole Spark concept wrong and the code above doesn't make any sense at all.
I'm using the following library versions:
org.apache.spark.spark-core_2.11 -> 1.2.0
org.apache.hadoop.hadoop-client -> 2.4.1
org.mongodb.mongo-hadoop.mongo-hadoop-core -> 1.3.1
org.mongodb.mongo-java-driver -> 2.13.0-rc1