I am writing an Apache Flink streaming application that deserializes data (Avro format) read off a Kafka bus (more details here). The data is being deserialized into a Scala case class. I am getting an exception when I run the program and it receives the first message from Kafka:
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.myorg.quickstart.DeviceData.<init>()
at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:625)
at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:121)
at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)
at org.myorg.quickstart.StreamingKafkaClient$.main(StreamingKafkaClient.scala:26)
at org.myorg.quickstart.StreamingKafkaClient.main(StreamingKafkaClient.scala)
Caused by: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.myorg.quickstart.DeviceData.<init>()
at org.apache.avro.specific.SpecificData.newInstance(SpecificData.java:353)
at org.apache.avro.specific.SpecificData.newRecord(SpecificData.java:369)
at org.apache.avro.reflect.ReflectData.newRecord(ReflectData.java:901)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:212)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
at org.myorg.quickstart.AvroDeserializationSchema.deserialize(AvroDeserializationSchema.scala:20)
at org.apache.flink.streaming.util.serialization.KeyedDeserializationSchemaWrapper.deserialize(KeyedDeserializationSchemaWrapper.java:44)
at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:142)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:738)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodException: org.myorg.quickstart.DeviceData.<init>()
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at org.apache.avro.specific.SpecificData.newInstance(SpecificData.java:347)
... 16 more
Process finished with exit code 1
The Scala case class is very simple:
package org.myorg.quickstart

/** Case class to hold the Device data. */
case class DeviceData(deviceId: String,
                      sw_version: String,
                      timestamp: String,
                      reading: Double)
Not sure why an "init" method is needed for the case class. Any examples of how to do this? Should I be using a different data structure other than a case class?
The Avro serializer, or more specifically SpecificData, requires the target type to have a default constructor (a constructor with no arguments). Otherwise, Avro cannot instantiate an object of the target type.
Try adding a default constructor via:
case class DeviceData(
    deviceId: String,
    sw_version: String,
    timestamp: String,
    reading: Double) {
  def this() = this("default", "default", "default", 0)
}
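For illustration, the failing call at the bottom of the stack trace boils down to a reflective lookup of a zero-argument constructor. A minimal sketch of that mechanism (the variable names are just for demonstration):
// What Avro's SpecificData.newInstance effectively does: reflectively look up
// a zero-argument constructor and invoke it. Without the `def this()` above,
// getDeclaredConstructor() throws the NoSuchMethodException seen in the trace.
val ctor = classOf[DeviceData].getDeclaredConstructor()
val empty: DeviceData = ctor.newInstance()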
Related
I have a simple service as an example:
object SimpleService {
  def findById(id: String, col: MongoCollections): Future[Option[Simple]] =
    collection(col).flatMap(c => c.find(selectorId(id)).one[Simple])
}
where Simple is:
case class Simple(@Key("_id") id: String, name: String)

object Simple {
  implicit val eventHandler: BSONDocumentHandler[Simple] =
    Macros.using[MacroOptions.ReadDefaultValues].handler[Simple]
}
Then I have written some integration tests and everything works fine when I run the tests. I can do all CRUD operations in the tests.
But when I added a server with a simple API and call the findById method from SimpleService,
I get an error at runtime only:
Uncaught error from thread [SimpleServer-akka.actor.default-dispatcher-5]: reactivemongo/api/bson/SafeBSONWriter, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[SomeServer]
java.lang.NoClassDefFoundError: reactivemongo/api/bson/SafeBSONWriter
FYI: SafeBSONWriter is a private trait and object in the library.
Could you suggest where to look?
Add the "reactivemongo-bson-api" dependency, this will solve the issue
I am trying to read an Avro file into the case class UserItemIds, which includes another case class type, User, using sbt and Scala 2.11:
case class User(id: Long, description: String)
case class UserItemIds(user: User, itemIds: List[Long])
val UserItemIdsInputStream = env.createInput(new AvroInputFormat[UserItemIds](user_item_ids_In, classOf[UserItemIds]))
UserItemIdsInputStream.print()
but I receive the error:
Caused by: java.lang.NoSuchMethodException: schema.User.<init>()
Can anyone guide me on how to work with these types, please? This example uses Avro files, but it could be Parquet or any custom DB input.
Do I need to use TypeInformation? For example, something like this, and if yes, how do I do so?
val tupleInfo: TypeInformation[(User, List[Long])] = createTypeInformation[(User, List[Long])]
I also saw env.registerType(); does it relate to the issue at all? Any help is greatly appreciated.
I found that the suggested solution to this Java error is adding a default constructor, so in this case I added a factory method to the Scala case class by putting it in a companion object:
object UserItemIds {
  case class UserItemIds(
      user: User,
      itemIds: List[Long])

  def apply(user: User, itemIds: List[Long]) = new UserItemIds(user, itemIds)
}
but this has not resolved the issue
You have to add a default constructor for the User and UserItemIds types. This could look the following way:
case class User(id: Long, description: String) {
  def this() = this(0L, "")
}

case class UserItemIds(user: User, itemIds: List[Long]) {
  def this() = this(new User(), List())
}
I am using IntelliJ Community Edition with the Scala plugin and the Spark libraries. I am still learning Spark and am using the Scala Worksheet.
I have written the below code which removes punctuation marks in a String:
def removePunctuation(text: String): String = {
  val punctPattern = "[^a-zA-Z0-9\\s]".r
  punctPattern.replaceAllIn(text, "").toLowerCase
}
Then I read a text file and try to remove punctuation:
val myfile = sc.textFile("/home/ubuntu/data.txt",4).map(removePunctuation)
This gives the error below; any help would be appreciated:
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(/home/ubuntu/src/main/scala/Test.sc:294)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(/home/ubuntu/src/main/scala/Test.sc:284)
at org.apache.spark.util.ClosureCleaner$.clean(/home/ubuntu/src/main/scala/Test.sc:104)
at org.apache.spark.SparkContext.clean(/home/ubuntu/src/main/scala/Test.sc:2090)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(/home/ubuntu/src/main/scala/Test.sc:366)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(/home/ubuntu/src/main/scala/Test.sc:365)
at org.apache.spark.rdd.RDDOperationScope$.withScope(/home/ubuntu/src/main/scala/Test.sc:147)
at #worksheet#.#worksheet#(/home/ubuntu/src/main/scala/Test.sc:108)
Caused by: java.io.NotSerializableException: A$A21$A$A21
Serialization stack:
- object not serializable (class: A$A21$A$A21, value: A$A21$A$A21@62db3891)
- field (class: A$A21$A$A21$$anonfun$words$1, name: $outer, type: class A$A21$A$A21)
- object (class A$A21$A$A21$$anonfun$words$1, )
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:369)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.map(RDD.scala:369)
at A$A21$A$A21.words$lzycompute(Test.sc:27)
at A$A21$A$A21.words(Test.sc:27)
at A$A21$A$A21.get$$instance$$words(Test.sc:27)
at A$A21$.main(Test.sc:73)
at A$A21.main(Test.sc)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jetbrains.plugins.scala.worksheet.MyWorksheetRunner.main(MyWorksheetRunner.java:22)
As T. Gaweda already pointed out, you're most likely defining your function in a class that's not serializable. Because it is a pure function, i.e. it doesn't depend on any context of the enclosing class, I suggest you put it into an object that extends Serializable. This would be Scala's equivalent of a Java static method:
object Helper extends Serializable {
  def removePunctuation(text: String): String = {
    val punctPattern = "[^a-zA-Z0-9\\s]".r
    punctPattern.replaceAllIn(text, "").toLowerCase
  }
}
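Used from the question's driver code, that would look roughly like this (path and partition count taken from the question):
// Reference the function through the serializable Helper object, so the closure
// no longer drags the worksheet's outer class along.
val myfile = sc.textFile("/home/ubuntu/data.txt", 4).map(Helper.removePunctuation)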
As @TGaweda suggests, Spark's SerializationDebugger is very helpful for identifying "the serialization path leading from the given object to the problematic object." All the dollar signs before the "Serialization stack" in the stack trace indicate that the container object for your method is the problem.
While it is easiest to just slap Serializable on your container class, I prefer to take advantage of the fact that Scala is a functional language and use your function as a first-class citizen:
sc.textFile("/home/ubuntu/data.txt", 4).map { text =>
  val punctPattern = "[^a-zA-Z0-9\\s]".r
  punctPattern.replaceAllIn(text, "").toLowerCase
}
Or if you really want to keep things separate:
val removePunctuation: String => String = (text: String) => {
  val punctPattern = "[^a-zA-Z0-9\\s]".r
  punctPattern.replaceAllIn(text, "").toLowerCase
}

sc.textFile("/home/ubuntu/data.txt", 4).map(removePunctuation)
These options work, of course, since Regex is serializable, as you should confirm.
On a secondary but very important note, constructing a Regex is expensive, so factor it out of your transformations for the sake of performance, possibly with a broadcast.
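One way to do that, as a sketch reusing the question's path and partition count, is to build the Regex once on the driver and share it via a broadcast variable:
// Construct the (serializable) Regex a single time instead of once per record.
val punctPattern = sc.broadcast("[^a-zA-Z0-9\\s]".r)
val cleaned = sc.textFile("/home/ubuntu/data.txt", 4)
  .map(text => punctPattern.value.replaceAllIn(text, "").toLowerCase)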
Read the stack trace; there is:
$outer, type: class A$A21$A$A21
It is a very good hint. Your lambda is serializable, but your class is not serializable.
When you create a lambda expression, that expression holds a reference to its outer class. In your case the outer class is not serializable, i.e. it does not implement Serializable, or one of its fields is not an instance of Serializable.
I have the following Scala class hierarchy:
abstract class BaseModule(val appConf: AppConfig) {
  // ...
}

class SimpleModule(appConf: AppConfig) extends BaseModule(appConf) {
  // ...
}

class FairlyComplexModule(appConf: AppConfig) extends BaseModule(appConf) {
  // ...
}
// dozens of other BaseModule subclasses...
At runtime, my app will accept a String input argument for the fully-qualified class name of a BaseModule subclass to instantiate, but the code won't know which concrete subclass it will be. So I have:
val moduleFQCN = loadFromInputArgs() // ex: "com.example.myapp.SimpleModule"
val moduleClass = Class.forName(moduleFQCN)
println(s"Found ${moduleFQCN} on the runtime classpath.")
val module = Class.forName(moduleFQCN).getConstructor(classOf[AppConfig]).newInstance(appConf).asInstanceOf[BaseModule]
So this way, the input specifies which BaseModule subclass to look for on the classpath and then instantiate. The first three lines above execute just fine, and I see the println fire. However, the last line above throws an exception:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
<rest of stacktrace omitted for brevity>
So clearly I'm doing something wrong when trying to create an instance of the SimpleModule subclass; I just can't figure out what it is. Any ideas?
You're probably failing because you call newInstance() without any arguments; no default constructor is found, and therefore the instantiation fails.
Try this:
Class.forName(moduleFQCN).getConstructor(classOf[AppConfig])
.newInstance(appConf).asInstanceOf[BaseModule]
Here appConf is an instance of AppConfig and is the argument with which to instantiate the BaseModule subclass.
Say I have a case class that maps to a table with two Timestamp fields in my database, one of which can be null. I want to define my own custom DateTime class to map to the timestamp fields:
case class DateTime(val time: Long) extends TimestampField(new Timestamp(time)) {
  def this() = this(System.currentTimeMillis)
  def this(ts: Timestamp) = this(ts.getTime)
}
And I define my entity class thus:
case class Period(id: Int, name: String, begin: DateTime, end: Option[DateTime])
But as soon as I run my program I get the exception below. Can someone tell me what I am missing?
...
Caused by: java.lang.RuntimeException: error while reflecting on metadata for (Some(private final scala.Option data.Period.end),Some(public scala.Option data.Period.end()),None,Set()) of class data.Period
at org.squeryl.internals.PosoMetaData$$anonfun$6.apply(PosoMetaData.scala:126)
at org.squeryl.internals.PosoMetaData$$anonfun$6.apply(PosoMetaData.scala:83)
at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
at org.squeryl.internals.PosoMetaData.<init>(PosoMetaData.scala:83)
at org.squeryl.View.<init>(View.scala:64)
at org.squeryl.Table.<init>(Table.scala:29)
at org.squeryl.Schema.table(Schema.scala:345)
at org.squeryl.Schema.table(Schema.scala:341)
at data.Library$.<init>(Library.scala:211)
at data.Library$.<clinit>(Library.scala)
... 17 more
Caused by: java.lang.RuntimeException: class data.Period used in table Period, needs a zero arg constructor with sample values for Option[] field end
at org.squeryl.internals.Utils$.throwError(Utils.scala:95)
at org.squeryl.internals.FieldMetaData$$anon$1.build(FieldMetaData.scala:490)
at org.squeryl.internals.PosoMetaData$$anonfun$6.apply(PosoMetaData.scala:118)
... 28 more
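Going by that last "Caused by" line, Squeryl is asking for the same thing as the other answers above: a zero-argument constructor, here with sample (non-None) values for the Option field so the field type can be inspected. A hypothetical sketch of what that might look like (the sample values are purely illustrative):
// Hypothetical fix following the error message: a zero-arg constructor whose
// Option field carries a Some(...) sample value rather than None.
case class Period(id: Int, name: String, begin: DateTime, end: Option[DateTime]) {
  def this() = this(0, "", new DateTime(), Some(new DateTime()))
}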