The initialization of the DataSource's outputs caused an error: The UDF class is not a proper subclass - scala

I have this issue:
The initialization of the DataSource's outputs caused an error: The UDF class is not a proper subclass of org.apache.flink.api.common.functions.MapFunction
generated by this code:
val probes: DataSet[Probe] = env.createInput[InputProbe](new ProbesInputFormat).map { i =>
  new Probe(
    i.rssi,
    0,
    i.macHash,
    i.deviceId,
    0,
    i.timeStamp)
}
I'm using Scala 2.11 on Flink 1.4.0 with IDEA.
On my dev machine I have no issues and the job runs properly, but on a standalone Flink cluster of 3 nodes I encounter the above error.
Can you help me please? ;(
UPDATE:
I resolved it by implementing a class that extends RichMapFunction; I don't know why, but it seems that lambda functions (=>) are not supported properly.
Now I have a new issue:
java.lang.ClassCastException: hk.huko.aps2.entities.Registry cannot be cast to scala.Product
Should I open a new post?
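For reference, a rough sketch of the RichMapFunction approach (field names follow the snippet above; the InputProbe/Probe constructors are assumed from it):

import org.apache.flink.api.common.functions.RichMapFunction

// An explicit MapFunction subclass instead of a Scala lambda, so the
// cluster classloader sees a proper UDF class.
class ProbeMapper extends RichMapFunction[InputProbe, Probe] {
  override def map(i: InputProbe): Probe =
    new Probe(i.rssi, 0, i.macHash, i.deviceId, 0, i.timeStamp)
}

// usage:
// val probes: DataSet[Probe] =
//   env.createInput[InputProbe](new ProbesInputFormat).map(new ProbeMapper)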

I resolved the issue. It happened because Flink loads my job JAR multiple times (classloading), and somehow that produced the error.
The solution is not to create a fat JAR that includes all external dependencies, but to copy those libraries, plus your job JAR, into the flink/lib folder.
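If you build with sbt, a rough sketch of what that looks like (module names and versions are assumed from the question):

// build.sbt (sketch, assuming an sbt build): mark Flink itself as
// "provided" so it is not bundled into the job JAR; copy the remaining
// external JARs into flink/lib on every node instead.
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-scala"   % "1.4.0" % "provided",
  "org.apache.flink" %% "flink-clients" % "1.4.0" % "provided"
)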

Related

I get 'Task not serializable' when I try to run the John Snow spark-nlp example in Scala

I have been trying to run the John Snow Labs Spark-NLP example from this repository:
https://github.com/JohnSnowLabs/spark-nlp/blob/master/example/src/TrainViveknSentiment.scala
on my local machine, but it throws org.apache.spark.SparkException: Task not serializable when it reaches val sparkPipeline = pipeline.fit(training). The stack trace also says Caused by: java.io.NotSerializableException: com.johnsnowlabs.nlp.annotators.param.AnnotatorParam$SerializableFormat$
I might be wrong, but as far as I could research, Seq is probably a non-serializable trait, so using an Array or a List instead to build the immutable training variable should solve the issue.
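A rough sketch of what I mean (the column names and sample rows here are made up; spark is the SparkSession and pipeline is the Pipeline from the linked example):

import spark.implicits._

// Build the training DataFrame from a List instead of a Seq before fitting.
val training = List(
  ("I really liked this movie", "positive"),
  ("This was a waste of time", "negative")
).toDF("text", "sentiment")

val sparkPipeline = pipeline.fit(training)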

Scala Spark : (org.apache.spark.repl.ExecutorClassLoader) Failed to check existence of class org on REPL class server at path

Running basic df.show() post spark notebook installation
I am getting the following error when running Scala Spark code on spark-notebook. Any idea why this occurs and how to avoid it?
[org.apache.spark.repl.ExecutorClassLoader] Failed to check existence of class org.apache.spark.sql.catalyst.expressions.Object on REPL class server at spark://192.168.10.194:50935/classes
[org.apache.spark.util.Utils] Aborting task
[org.apache.spark.repl.ExecutorClassLoader] Failed to check existence of class org on REPL class server at spark://192.168.10.194:50935/classes
[org.apache.spark.util.Utils] Aborting task
[org.apache.spark.repl.ExecutorClassLoader] Failed to check existence of class
I installed Spark locally, and when I was using the following code it was giving me the same error.
spark.read.format("json").load("Downloads/test.json")
I think the issue was that it was trying to find a master node and picking some random or default IP. I specified the mode as local and provided the IP 127.0.0.1, which resolved my issue.
Solution
Run spark-shell with a local master:
/usr/local/bin/spark-shell --master "local[4]" --conf spark.driver.host=127.0.0.1
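If you build the session yourself instead of launching spark-shell, the same settings can be applied programmatically (a sketch):

import org.apache.spark.sql.SparkSession

// Run with a local master and an explicit driver host.
val spark = SparkSession.builder()
  .master("local[4]")
  .config("spark.driver.host", "127.0.0.1")
  .getOrCreate()

spark.read.format("json").load("Downloads/test.json").show()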

Starting KsqlRestApplication from Scala and getting NoSuchMethodError org.apache.kafka.streams.StreamsConfig.getConsumerConfigs

I am trying to write a program that enables me to run predefined KSQL operations on Kafka topics in Scala, but I don't want to open the KSQL CLI every time. Therefore I want to start the KSQL "server" from within my Scala program. If I understand the KSQL source code correctly, I have to build and start a KsqlRestApplication:
def restServer = KsqlRestApplication.buildApplication(
  new KsqlRestConfig(defaultServerProperties),
  true,
  new VersionCheckerAgent {
    override def start(ksqlModuleType: KsqlModuleType, properties: Properties): Unit = ???
  })
But when I try doing that, I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.kafka.streams.StreamsConfig.getConsumerConfigs(Ljava/lang/String;Ljava/lang/String;)Ljava/util/Map;
at io.confluent.ksql.rest.server.BrokerCompatibilityCheck.create(BrokerCompatibilityCheck.java:62)
at io.confluent.ksql.rest.server.KsqlRestApplication.buildApplication(KsqlRestApplication.java:241)
I looked into the function call in BrokerCompatibilityCheck, and in the create function it calls StreamsConfig.getConsumerConfigs() with 2 Strings as parameters instead of the parameters defined in
https://kafka.apache.org/0102/javadoc/org/apache/kafka/streams/StreamsConfig.html#getConsumerConfigs(StreamThread,%20java.lang.String,%20java.lang.String).
Are my KSQL and Kafka version simply not compatible or am I doing something wrong?
I am using KSQL version 4.1.0-SNAPSHOT and Kafka version 1.0.0.
Yes, NoSuchMethodError typically indicates a version incompatibility between libraries.
The link you posted is to javadoc for kafka 0.10.2. The method hasn't changed in 1.0 but indeed in the upcoming 1.1 it only takes 2 Strings:
https://kafka.apache.org/11/javadoc/org/apache/kafka/streams/StreamsConfig.html#getConsumerConfigs(java.lang.String,%20java.lang.String)
That suggests the version of KSQL you're using (4.1.0-SNAPSHOT) depends on version 1.1 of Kafka Streams, which is currently in the release candidate phase and I believe should be out soon:
https://lists.apache.org/thread.html/780c4458b16590e99261b69d7b41b9ec374a3226d72c8d38885a008a#%3Cusers.kafka.apache.org%3E
As per that email, you can find the latest (1.1.0-rc2) artifacts in the Apache staging repo:
https://repository.apache.org/content/groups/staging/
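If you want to try the staged release candidate, something along these lines should pull it in (a sketch, assuming an sbt build; the resolver URL is the staging repo above, and 1.1.0 is my guess at the RC's coordinates):

// build.sbt: resolve kafka-streams from the Apache staging repo so it
// matches what KSQL 4.1.0-SNAPSHOT compiles against.
resolvers += "Apache Staging" at "https://repository.apache.org/content/groups/staging/"

libraryDependencies += "org.apache.kafka" % "kafka-streams" % "1.1.0"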

GCS Connector Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found

We are trying to run Hive queries on HDP 2.1 using the GCS Connector. It was working fine until yesterday, but since this morning our jobs have randomly started failing. When we restart them manually they work fine. I suspect it has something to do with the number of parallel Hive jobs running at a given point in time.
Below is the error message:
vertexId=vertex_1407434664593_37527_2_00, diagnostics=[Vertex Input: audience_history initializer failed., java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found]
DAG failed due to vertex failure. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Any help will be highly appreciated.
Thanks!

NoClassDefFoundError in openid4java

I have downloaded openid4java-0.9.6.662 and implemented a class using it. When I execute:
List discoveries = manager.discover("https://www.google.com/accounts/o8/id");
I get a
java.lang.NoClassDefFoundError: org/apache/http/protocol/ImmutableHttpProcessor
at org.apache.http.impl.client.AbstractHttpClient.getProtocolProcessor(AbstractHttpClient.java:656)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:804)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at org.openid4java.util.HttpCache.head(HttpCache.java:335)
at org.openid4java.discovery.yadis.YadisResolver.retrieveXrdsLocation(YadisResolver.java:400)
at org.openid4java.discovery.yadis.YadisResolver.discover(YadisResolver.java:248)
at org.openid4java.discovery.yadis.YadisResolver.discover(YadisResolver.java:232)
at org.openid4java.discovery.yadis.YadisResolver.discover(YadisResolver.java:166)
at org.openid4java.discovery.Discovery.discover(Discovery.java:147)
at org.openid4java.discovery.Discovery.discover(Discovery.java:129)
at org.openid4java.consumer.ConsumerManager.discover(ConsumerManager.java:542)
at com.sugra.openid.helper.OpenIDConsumer.authRequest(OpenIDConsumer.java:90)
The funny thing is that this class cannot be found in any of the JARs, though it is supposed to be in httpcore-4.0.1.jar, as that JAR contains classes of the same package. The class is available in httpcore-4.2.1.jar, but when I tried it I got
org.openid4java.discovery.yadis.YadisException: 0x704: I/O transport error: hostname in certificate didn't match: <www.google.com/173.194.35.144> != <www.google.com>
which is reported to be a portability issue for which a previous version should be used.
What is the right approach to using this method?
I found it. There was a conflict with another JAR (httpclient.jar) I had in my app. I just had to upgrade it.