no valid constructor on spark - scala

This is my code:
class FNNode(val name: String)
case class Ingredient(override val name: String, category: String) extends FNNode(name)
val ingredients: RDD[(VertexId, FNNode)] =
filter(! _.startsWith("#")).
map(line => line.split('\t')).
map(x => (x(0).toInt, Ingredient(x(1), x(2))))
and there are no errors when I define these variables. However, when trying to execute it:
I get
org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: $iwC$$iwC$Ingredient; no valid constructor
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
It seems this could be related to serialization, as per the answer here. However, I have no idea how to solve this if it is indeed a serialization issue.
I'm following along with the code in this book, by the way, so I would assume this should have worked at some point?

This fixed the issue for me:
class FNNode(val name: String) extends Serializable
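A likely reason this works, sketched outside Spark: when Java serialization reads an object back, it calls the no-argument constructor of the first non-serializable superclass. FNNode only has a one-argument constructor, so reading an Ingredient fails until FNNode itself is Serializable. The class names and the roundTrip helper below are hypothetical stand-ins, not the book's code.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-ins for FNNode before and after the fix.
class PlainNode(val name: String)                      // not Serializable, no no-arg constructor
class SerNode(val name: String) extends Serializable   // the fix from the answer

case class BadIngredient(override val name: String, category: String) extends PlainNode(name)
case class GoodIngredient(override val name: String, category: String) extends SerNode(name)

object SerializationCheck {
  // Serialize to a byte array and read the object back with plain Java serialization.
  def roundTrip[A](value: A): A = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

Calling SerializationCheck.roundTrip on a GoodIngredient returns an equal copy. The same call on a BadIngredient writes fine but throws java.io.InvalidClassException ("no valid constructor") on read, which matches the error in the question.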


Scala/Spark serializable error - join doesn't work

I am trying to join two RDDs and save the result to Cassandra, but my code doesn't work. At the beginning I had one huge main method and everything worked well, but once I split it into functions and classes it stopped working. I am new to Scala and Spark.
The code is:
class Migration extends Serializable {
case class userId(offerFamily: String, bp: String, pdl: String) extends Serializable
case class siteExternalId(site_external_id: Option[String]) extends Serializable
case class profileData(begin_ts: Option[Long], Source: Option[String]) extends Serializable
def SparkMigrationProfile(sc: SparkContext) = {
val test = sc.cassandraTable[siteExternalId](KEYSPACE,TABLE)
.filter(x => x._2.site_external_id != None)
val profileRDD = sc.cassandraTable[profileData](KEYSPACE,TABLE)
//dont work
// don't work
.saveToCassandra(keyspace, table)
At first I got the famous: Exception in thread "main" org.apache.spark.SparkException: Task not serializable at . . .
So I made my main class and the case classes extend Serializable, but it still doesn't work.
I think you should move your case classes out of the Migration class into a dedicated file and/or object. This should solve your problem. Additionally, Scala case classes are serializable by default.
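A minimal sketch of that layout, without the Spark or Cassandra dependencies. The object names MigrationModel and MigrationCheck are hypothetical; the case class and field names are kept as in the question so any column mapping stays the same. Note that no `extends Serializable` is needed on the case classes.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Row types live in a top-level object, so instances carry no reference
// to an enclosing Migration instance.
object MigrationModel {
  case class userId(offerFamily: String, bp: String, pdl: String)
  case class siteExternalId(site_external_id: Option[String])
  case class profileData(begin_ts: Option[Long], Source: Option[String])
}

object MigrationCheck {
  // Returns the number of bytes Java serialization produced;
  // throws if the value (or anything it references) is not serializable.
  def serializedSize(value: AnyRef): Int = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    buffer.size()
  }
}
```

The Migration class would then `import MigrationModel._` and keep only the methods that take the SparkContext.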

Flink read custom types - implicit value error: `Caused by: java.lang.NoSuchMethodException: <init>()`

I am trying to read an Avro file into the case class UserItemIds, which contains another case class, User. I am using sbt and Scala 2.11.
case class User(id: Long, description: String)
case class UserItemIds(user: User, itemIds: List[Long])
val UserItemIdsInputStream = env.createInput(new AvroInputFormat[UserItemIds](user_item_ids_In, classOf[UserItemIds]))
but I receive this error:
Caused by: java.lang.NoSuchMethodException: schema.User.<init>()
Can anyone guide me how to work with these types please? This example is with avro files, but this could be parquet or any custom DB input.
Do I need to use TypeInformation? If so, how? For example:
val tupleInfo: TypeInformation[(User, List[Long])] = createTypeInformation[(User, List[Long])]
I also saw env.registerType(). Does it relate to the issue at all? Any help is greatly appreciated.
I found a suggested solution to this Java error: adding a default constructor. In this case I added a factory method to the Scala case class via its companion object:
object UserItemIds{
case class UserItemIds(
user: User,
itemIds: List[Long])
def apply(user:User,itemIds:List[Long]) = new
but this has not resolved the issue
You have to add a default constructor for the User and UserItemIds types. This could look as follows:
case class User(id: Long, description: String) {
def this() = this(0L, "")
}
case class UserItemIds(user: User, itemIds: List[Long]) {
def this() = this(new User(), List())
}
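To check the effect without Flink or Avro, the sketch below instantiates the classes through reflection, which is roughly the kind of call that produces `NoSuchMethodException: <init>()` when no zero-argument constructor exists. DefaultCtorCheck and NoDefault are hypothetical names added for illustration; whether Flink's Avro input uses exactly this call is an assumption.

```scala
// Classes from the answer, with auxiliary no-argument constructors.
case class User(id: Long, description: String) {
  def this() = this(0L, "")
}
case class UserItemIds(user: User, itemIds: List[Long]) {
  def this() = this(new User(), List())
}

// Hypothetical counter-example: no zero-argument constructor.
case class NoDefault(id: Long)

object DefaultCtorCheck {
  // Look up the zero-argument constructor and call it, as
  // reflection-based readers typically do before filling in fields.
  def newInstanceOf[A](cls: Class[A]): A =
    cls.getDeclaredConstructor().newInstance()
}
```

DefaultCtorCheck.newInstanceOf(classOf[UserItemIds]) yields UserItemIds(User(0, ""), List()), while the same call with classOf[NoDefault] throws NoSuchMethodException.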

Apache Spark Task not Serializable when Class extends Serializable

I am consistently having errors regarding Task not Serializable.
I have made a small Class and it extends Serializable - which is what I believe is meant to be the case when you need values in it to be serialised.
class SGD(filePath : String) extends Serializable {
val rdd = sc.textFile(filePath)
val mappedRDD = rdd.map(x => x.split(" "))
.map(y => Rating(y(0).toInt, y(1).toInt, y(2).toDouble))
val RNG = new Random(1)
val factorsRDD = mappedRDD.map(x => (x.user, (x.product, x.rating)))
.mapValues(listOfItemsAndRatings =>
The final line always results in a Task not Serializable error. What I do not understand is: the Class is Serializable; and, the Class Random is also Serializable according to the API. So, what am I doing wrong? I consistently can't get stuff like this to work; therefore, I imagine my understanding is wrong. I keep being told the Class must be Serializable... well it is and it still doesn't work!?
scala.util.Random was not Serializable until 2.11.0-M2.
Most likely you are using an earlier version of Scala.
A class doesn't become Serializable until all its members are Serializable as well (or some other mechanism is provided to serialize them, e.g. transient or readObject/writeObject.)
I get the following stacktrace when running given example in spark-1.3:
Caused by: scala.util.Random
Serialization stack:
- object not serializable (class: scala.util.Random, value: scala.util.Random@52bbf03d)
- field (class: $iwC$$iwC$SGD, name: RNG, type: class scala.util.Random)
One way to fix it is to move the instantiation of the random variable inside mapValues:
mapValues(listOfItemsAndRatings => { val RNG = new Random(1)
Vector(Array.fill(2)(RNG.nextDouble)) })
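The transient mechanism mentioned above is another option. A minimal sketch, assuming a hypothetical LegacyRandom class that stands in for a non-serializable generator such as the pre-2.11 scala.util.Random: marking the field `@transient lazy val` keeps it out of the serialized form and rebuilds it on first use after deserialization. The holder classes and the roundTrip helper are illustrative names, not part of the question's code.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a generator that is not Serializable.
class LegacyRandom(seed: Long) {
  private val underlying = new java.util.Random(seed)
  def nextDouble(): Double = underlying.nextDouble()
}

// Eager field: serializing an instance fails on the LegacyRandom member.
class EagerHolder extends Serializable {
  val rng = new LegacyRandom(1L)
}

// Transient lazy field: skipped during serialization, created on first use.
class LazyHolder extends Serializable {
  @transient lazy val rng = new LegacyRandom(1L)
  def draw(): Double = rng.nextDouble()
}

object TransientCheck {
  // Serialize to a byte array and read the object back with plain Java serialization.
  def roundTrip[A](value: A): A = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

One design consequence: each deserialized copy builds its own generator from the same seed, so every task would draw the same sequence unless the seed is varied per partition.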

Task not serializable: only due to a fixed String?

I have the following RDD:
myRDD:org.apache.spark.rdd.RDD[(String, org.apache.spark.mllib.linalg.Vector)]
Then I want to add a fixed key:
myRDD.map(("myFixedKey",_)):org.apache.spark.rdd.RDD[(String, (String, org.apache.spark.mllib.linalg.Vector))]
But if I use a constant String val instead of a hardcoded/explicit string:
val myFixedKeyVal:String = "myFixedKey"
myRDD.map((myFixedKeyVal,_))
The previous line of code gives the following exception:
org.apache.spark.SparkException: Task not serializable
Am I missing something?
OK, I found the problem. myRdd is an object that extends a Serializable class, but the RDD is then processed by another class, e.g. Process:
class Process(someRdd:MyRddClass) extends Serializable{
def preprocess =
val someprocess = Process(myRdd)
val newRdd = someprocess.preprocess.map(x => ("newkey", x))
This class Process must extend Serializable too in order to work. I thought that the newRdd was extending the root class MyRddClass...
The string constant is not the problem. Turn on serialization debugging to figure out the real cause.
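A plausible real cause, sketched without Spark: a lambda that reads a field of its enclosing object captures the whole object, not just the string. Copying the field into a local val first means only the String is captured. Pipeline and ClosureCheck are hypothetical names, and this assumes the val in the question was a member of some non-serializable enclosing instance.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical driver-side class that is NOT Serializable.
class Pipeline {
  val fixedKey: String = "myFixedKey"

  // Reads the field inside the lambda, so the lambda captures `this`.
  def capturesThis: String => (String, String) = x => (fixedKey, x)

  // Copies the field to a local val first, so only the String is captured.
  def capturesLocal: String => (String, String) = {
    val key = fixedKey
    x => (key, x)
  }
}

object ClosureCheck {
  // True if Java serialization accepts the object, false if it rejects it.
  def isSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(value); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

ClosureCheck.isSerializable accepts capturesLocal and rejects capturesThis. To see which field drags in the offending object, one JVM switch that prints the chain of references is `-Dsun.io.serialization.extendedDebugInfo=true`.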

Kryo class error in Apache-Spark

I have some Spark code and I use Kryo serialization. When no server fails everything runs fine, but when a server fails, I come across some big issues as it tries to recover itself. Basically the error message says that my Article class becomes unknown to a server.
Job aborted due to stage failure: Task 29 in stage 4.0 failed 4 times, most recent failure: Lost task 29.3 in stage 4.0 (TID 316, DATANODE-3): com.esotericsoftware.kryo.KryoException: Unable to find class: $line50.$read$$iwC$$iwC$Article
I really have a lot of trouble understanding what I am doing wrong...
I declare these classes outside of my maps
case class Contrib ( contribType: Option[String], surname: Option[String], givenNames: Option[String], phone: Option[String], email: Option[String], fax: Option[String] )
// Class to hold references
case class Reference( idRef:Option[String], articleNameRef:Option[String], pmIDFrom: Option[Long], pmIDRef:Option[Long])
// Class to hold articles
case class Article(articleName:String, articleAbstract: Option[String],
pmID:Option[Long], doi:Option[String],
references: Iterator[Reference],
contribs: Iterator[Contrib],
keywords: List[String])
It seems that some executors just don't know what an Article is anymore...
How can I resolve that issue?