Persistence with NamedObjects in Spark Job Server (Scala)

I'm using the latest SJS version (master), and the application extends SparkHiveJob. In the runJob implementation, I have the following:
val eDF1 = hive.applySchema(rowRDD1, schema)
I would like to persist eDF1 and tried the following
val rdd_topersist = namedObjects.getOrElseCreate("cleanedDF1", {
  NamedDataFrame(eDF1, true, StorageLevel.MEMORY_ONLY)
})
which produces the following compile errors:
could not find implicit value for parameter persister: spark.jobserver.NamedObjectPersister[spark.jobserver.NamedDataFrame]
not enough arguments for method getOrElseCreate: (implicit timeout:scala.concurrent.duration.FiniteDuration, implicit persister:spark.jobserver.NamedObjectPersister[spark.jobserver.NamedDataFrame])spark.jobserver.NamedDataFrame. Unspecified value parameter persister.
Obviously this is wrong, but I can't figure out what the problem is. I'm fairly new to Scala.
Can someone help me understand this syntax from NamedObjectSupport?
def getOrElseCreate[O <: NamedObject](name: String, objGen: => O)
                                     (implicit timeout: FiniteDuration = defaultTimeout,
                                      persister: NamedObjectPersister[O]): O

I think you should define an implicit persister. Looking at the test code, I see something like this:
https://github.com/spark-jobserver/spark-jobserver/blob/ea34a8f3e3c90af27aa87a165934d5eb4ea94dee/job-server-extras/test/spark.jobserver/NamedObjectsSpec.scala#L20
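For reference, a minimal sketch modelled on that spec, reusing eDF1 and namedObjects from the question: declare an implicit NamedObjectPersister for NamedDataFrame (the DataFramePersister from job-server-extras is assumed here) so that getOrElseCreate can fill in its second implicit argument.

import org.apache.spark.storage.StorageLevel
import spark.jobserver.{DataFramePersister, NamedDataFrame, NamedObjectPersister}

// DataFramePersister is the persister used in the linked NamedObjectsSpec
implicit def dataFramePersister: NamedObjectPersister[NamedDataFrame] = new DataFramePersister

val rdd_topersist = namedObjects.getOrElseCreate("cleanedDF1", {
  NamedDataFrame(eDF1, true, StorageLevel.MEMORY_ONLY)
})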

Related

No implicit arguments of type T: trying to cast a DataFrame to a Dataset[T]

I'm trying to build a generic function to consolidate some MongoDB collections. I'm using case classes to type the collections, and my function receives the type as a parameter [T], like this:
def refreshCollection[T](newDS: Dataset[T], oldDS: Dataset[T]): Dataset[T] = {
  val filteredOldDS = oldDS.join(newDS, Seq("id"), "left_anti").as[T]
  filteredOldDS.union(newDS)
}
The problem is that when I try to convert the DataFrame result of the join back to the original case class using .as[T] to return a Dataset[T], I get this error, even though I've imported sparkSession.implicits._:
no implicit arguments of type: Encoder[T]
The interesting part is that when I make the conversion with a fixed case class, it works fine. Any advice on this?
Thanks in advance!
I believe it's because your code doesn't guarantee that an Encoder[T] will be available when necessary.
You can parametrise your method with an implicit encoder and postpone the moment when the compiler will try to find the required encoder:
def refreshCollection[T](newDS: Dataset[T], oldDS: Dataset[T])(implicit enc: Encoder[T]): Dataset[T] = {
  val filteredOldDS = oldDS.join(newDS, Seq("id"), "left_anti").as[T]
  filteredOldDS.union(newDS)
}
Of course, you will need to somehow bring an Encoder[MyCaseClass] into scope when calling refreshCollection.
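For example, a minimal sketch with a hypothetical MyCaseClass, relying on the encoders derived by spark.implicits._:

import org.apache.spark.sql.{Dataset, SparkSession}

case class MyCaseClass(id: Long, name: String)

val spark = SparkSession.builder().master("local[*]").appName("refresh-example").getOrCreate()
import spark.implicits._  // provides the implicit Encoder[MyCaseClass]

val newDS: Dataset[MyCaseClass] = spark.emptyDataset[MyCaseClass]
val oldDS: Dataset[MyCaseClass] = spark.emptyDataset[MyCaseClass]

// the Encoder[MyCaseClass] derived above is passed to refreshCollection implicitly
val refreshed = refreshCollection(newDS, oldDS)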

How to extend the Transformer in Kafka Streams (Scala)?

I am working on a Kafka Streams implementation of a word counter in Scala, in which I extended the Transformer:
class WordCounter extends Transformer[String, String, (String, Long)]
It is then called in the stream as follows:
val counter: KStream[String, Long] = filtered_record.transform(new WordCounter, "count")
However, I am getting the error below when running my program via sbt:
[error] required: org.apache.kafka.streams.kstream.TransformerSupplier[String,String,org.apache.kafka.streams.KeyValue[String,Long]]
I can't seem to figure out how to fix it, and could not find any appropriate Kafka example of a similar implementation.
Anyone got any idea of what I am doing wrong?
The signature of transform() is:
def transform[K1, V1](transformerSupplier: TransformerSupplier[K, V, KeyValue[K1, V1]],
                      stateStoreNames: String*): KStream[K1, V1]
Thus, transform() takes a TransformerSupplier as its first argument, not a Transformer.
See also the javadocs
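For example, a sketch of the fix, assuming a recent Kafka Streams version (where Transformer has init/transform/close) and the filtered_record stream from the question; the actual counting logic is only stubbed here:

import org.apache.kafka.streams.KeyValue
import org.apache.kafka.streams.kstream.{KStream, Transformer, TransformerSupplier}
import org.apache.kafka.streams.processor.ProcessorContext

class WordCounter extends Transformer[String, String, KeyValue[String, Long]] {
  override def init(context: ProcessorContext): Unit = ()
  override def transform(key: String, value: String): KeyValue[String, Long] =
    KeyValue.pair(value, 1L) // placeholder; real counting would use a state store
  override def close(): Unit = ()
}

val counter: KStream[String, Long] = filtered_record.transform(
  new TransformerSupplier[String, String, KeyValue[String, Long]] {
    override def get(): Transformer[String, String, KeyValue[String, Long]] = new WordCounter
  },
  "count"
)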

Scala: Unspecified value parameter evidence$3

I've looked around and found several other examples of this, but I don't really understand from those answers what's actually going on.
I'd like to understand why the following code fails to compile:
val df = readFiles(sqlContext).
withColumn("timestamp", udf(UDFs.parseDate _)($"timestamp"))
Giving the error:
Error:(29, 58) not enough arguments for method udf: (implicit evidence$2: reflect.runtime.universe.TypeTag[java.sql.Date], implicit evidence$3: reflect.runtime.universe.TypeTag[String])org.apache.spark.sql.UserDefinedFunction.
Unspecified value parameter evidence$3.
withColumn("timestamp", udf(UDFs.parseDate _)($"timestamp")).
^
Whereas this code does compile:
val parseDate = udf(UDFs.parseDate _)
val df = readFiles(sqlContext).
withColumn("timestamp", parseDate($"timestamp"))
Obviously I've found a "workaround", but I'd really like to understand:
1. What this error really means. The info I have found on TypeTags and ClassTags has been really difficult to understand. I don't come from a Java background, which perhaps doesn't help, but I think I should be able to grasp it…
2. Whether I can achieve what I want without a separate function definition.
The error message is indeed a bit misleading; the reason for it is that the function udf takes an implicit parameter list, but you are passing an actual parameter. Since I don't know much about Spark, and since the udf signature is a bit convoluted, I'll try to explain what is going on with a simplified example.
In practice, udf is a function that, given some explicit parameters and an implicit parameter list, gives you another function. Let's define an analogous function: given a pivot of type T for which we have an implicit Ordering, it returns a function that splits a sequence in two, one part containing elements smaller than the pivot and the other containing elements that are bigger:
def buildFn[T](pivot: T)(implicit ev: Ordering[T]): Seq[T] => (Seq[T], Seq[T]) = ???
Let's leave out the implementation as it's not important. Now, if I do the following:
val elements: Seq[Int] = ???
val (small, big) = buildFn(10)(elements)
I will make the same kind of mistake that you are showing in your code: the compiler will think that I am explicitly passing elements as the implicit parameter list, and this won't compile. The error message in my example will be somewhat different from yours because in my case the number of parameters I am mistakenly passing for the implicit parameter list matches the expected one, so the error will be about types not lining up.
Instead, if I write it as:
val elements: Seq[Int] = ???
val fn = buildFn(10)
val (small, big) = fn(elements)
In this case the compiler will correctly pass the implicit parameters to the function. I don't know of any way to circumvent this problem, unless you want to pass the actual implicit parameters explicitly, but I find it quite ugly and not always practical; for reference, this is what I mean:
val elements: Seq[Int] = ???
val (small, big) = buildFn(10)(implicitly[Ordering[Int]])(elements)
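Mapped back to the original udf call, that would look roughly like this (a sketch, assuming UDFs.parseDate is a String => java.sql.Date, as the error message suggests, and passing the TypeTags in the order the error message lists them):

import scala.reflect.runtime.universe.typeTag

val df = readFiles(sqlContext).
  withColumn(
    "timestamp",
    udf(UDFs.parseDate _)(typeTag[java.sql.Date], typeTag[String])($"timestamp"))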

Play! 2.4 - Implicit Reads for ScalaJson w/ Generic Types

Using Play 2.4 ScalaWS. I've defined a method that takes a type parameter T (with a Manifest) and performs a GET request to an external API. The problem is that it won't compile because there isn't an implicit Reads for parsing the JSON.
Here's the code:
def myGet[T](path: String)(implicit m: Manifest[T]): Future[Either[model.MyError, T]] = {
  val url = MY_HOST + "/" + path
  ws
    .url(url)
    .withHeaders(myHeaders: _*)
    .get()
    .map { response =>
      try {
        Right(response.json.as[T])
      } catch {
        // check if this response was an error
        case _: Exception => Left(response.json.as[model.MyError])
      }
    }
}
The compilation error is specifically:
Compilation error[No Json deserializer found for type T. Try to implement an implicit Reads or Format for this type.]
I'm not sure of the simplest way to do this. Thanks for your help.
Edit
I also tried (implicit m: Manifest[T], reads: Reads[T]) with no luck.
It turns out that using (implicit m: Manifest[T], readsT: Reads[T]), i.e. having the Reads as an implicit parameter, was the correct way of doing this. I had to run sbt clean, since something was improperly cached in the incremental compiler.
It now works just fine.
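For reference, a minimal usage sketch with a hypothetical Widget type whose Reads is derived with Json.reads, so that both implicit parameters can be resolved at the call site:

import play.api.libs.json.{Json, Reads}

case class Widget(id: Long, name: String)
object Widget {
  implicit val widgetReads: Reads[Widget] = Json.reads[Widget]
}

// both Manifest[Widget] and Reads[Widget] are resolved implicitly here
val result = myGet[Widget]("widgets/42")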

How to make this Scala implicit conversion work?

I'm using a library that gives me the following (relations can be implicitly converted to nodes):
class Relation[A,B]
class Node[A,B](r: Relation[A,B])
implicit def relation2node[A,B](r: Relation[A,B]) = new Node(r)
I'm extending Relation for my own use:
class XRelation[A] extends Relation[A,Int]
Relations/XRelations are meant to be subclassed:
class User extends XRelation[Int]
Now, I also define my own Helper methods like GET, designed to work with any Node and anything that converts to Node:
class Helper[A,B](n: Node[A,B]) { def GET {} }
// note: this is the only way I know of to make the next example work.
implicit def xx2node2helper[A,B,C[_,_]](x: C[A,B])(implicit f: C[A,B] => Node[A,B]) = new Helper(x)
So this example works:
new Relation[Int,Int]().GET
And if I add another implicit conversion:
// don't understand why this doesn't work for the previous example
// but works for the next example
implicit def x2node2helper[A,B,C](x: C)(implicit f: C => Node[A,B]) = new Helper(x)
I can also make the following conversion work:
new XRelation[Int]().GET
But this doesn't work:
new User().GET
Sadly, that fails with:
error: No implicit view available from Sandbox3.User => Sandbox3.Node[A,B]
Can anyone make sense of all this and explain how to get the last example to work? Thanks in advance.
Update: I know you can just introduce implicit conversions from Relation, but I'm asking to (1) figure out how to do this without having to introduce implicits from every single type that could possibly implicitly convert to Node, and (2) to solidify my understanding of implicits.
implicit def nodeLike2Helper[R, C <: R](r: C)(implicit f: R => Node[_,_]) = {
  new Helper(r)
}
Just as the error message indicates, User does not have an implicit conversion to Node. But its super-super-class Relation does. So you just give the right bounds to the type parameters.
FYI, there is syntactic sugar, <%, for view bounds, so the above code can be shorter:
implicit def nodeLike2Helper[R <% Node[_,_], C <: R](r: C) = {
  new Helper(r)
}
Scala's implicit resolution only goes one superclass deep when checking whether User matches the type pattern C[_,_]. You can fix this by doing away with the pattern, as in the following code:
implicit def x2node2helper[A,B](x: Relation[A,B])(implicit f: Relation[A,B] => Node[A,B]) = new Helper(x)
And if the implicit relation2node is in scope at the definition of x2node2helper, it can be written as:
implicit def x2node2helper[A,B](x: Relation[A,B]) = new Helper(x)
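Putting the pieces together, a consolidated sketch (library and user classes taken from the question, plus the Relation-based conversion); with this in scope, all three GET calls should resolve:

object Sandbox3 {
  import scala.language.implicitConversions

  // library code from the question
  class Relation[A, B]
  class Node[A, B](r: Relation[A, B])
  implicit def relation2node[A, B](r: Relation[A, B]): Node[A, B] = new Node(r)

  // user code from the question
  class XRelation[A] extends Relation[A, Int]
  class User extends XRelation[Int]
  class Helper[A, B](n: Node[A, B]) { def GET: Unit = {} }

  // the fix: anchor the conversion on Relation rather than on the type pattern C[_,_]
  implicit def x2node2helper[A, B](x: Relation[A, B]): Helper[A, B] = new Helper(x)

  new Relation[Int, Int]().GET
  new XRelation[Int]().GET
  new User().GET
}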