Scala: wrapping a method of a parameterized class (spark-cassandra-connector)

I am writing a set of methods that extend Spark's RDD API.
I have to implement a general method for storing RDDs, and as a first step I tried to wrap spark-cassandra-connector's saveAsCassandraTable, without success.
Here's the "extending RDD's API" part:
import org.apache.spark.rdd.RDD
import com.datastax.spark.connector._

object NewRDDFunctions {
  implicit def addStorageFunctions[T](rdd: RDD[T]): RDDStorageFunctions[T] =
    new RDDStorageFunctions(rdd)
}

class RDDStorageFunctions[T](rdd: RDD[T]) {
  def saveResultsToCassandra() {
    rdd.saveAsCassandraTable("ks_name", "table_name") // this line produces errors!
  }
}
...and importing the object as: import ...NewRDDFunctions._.
The marked line produces following errors:
Error:(99, 29) could not find implicit value for parameter rwf: com.datastax.spark.connector.writer.RowWriterFactory[T]
rdd.saveAsCassandraTable("ks_name", "table_name")
^
Error:(99, 29) not enough arguments for method saveAsCassandraTable: (implicit connector: com.datastax.spark.connector.cql.CassandraConnector, implicit rwf: com.datastax.spark.connector.writer.RowWriterFactory[T], implicit columnMapper: com.datastax.spark.connector.mapper.ColumnMapper[T])Unit.
Unspecified value parameters rwf, columnMapper.
rdd.saveAsCassandraTable("ks_name", "table_name")
^
I don't get why this doesn't work since saveAsCassandraTable is designed to work on any RDD. Any suggestions?
I had a similar problem with the example in the spark-cassandra-connector docs:
case class WordCount(word: String, count: Long)
val collection = sc.parallelize(Seq(WordCount("dog", 50), WordCount("cow", 60)))
collection.saveAsCassandraTable("test", "words_new", SomeColumns("word", "count"))
...and the solution was to move the case class definition out of the "main" function (but I don't know whether that applies to the problem above...).

saveAsCassandraTable needs three implicit parameters. The first one (connector) has a default value; the other two (rwf and columnMapper) are not in implicit scope inside your saveResultsToCassandra method, and as a consequence the method doesn't compile.
Look at this answer on another question, if you need some more information about implicits.
Turning your saveResultsToCassandra into the method below should work, provided you have defined your tables (TableDef) beforehand.
import com.datastax.spark.connector.writer.RowWriterFactory
import com.datastax.spark.connector.mapper.ColumnMapper

def saveResultsToCassandra()(
  // implicit parameters in a separate parameter list!
  implicit rwf: RowWriterFactory[T],
  columnMapper: ColumnMapper[T]
): Unit = {
  rdd.saveAsCassandraTable("ks_name", "table_name")
}
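For concreteness, here is a hypothetical call site (the case class and setup are illustrative, not from the question). With a case class element type that is visible at the call site, the connector should be able to derive the ColumnMapper, and the RowWriterFactory follows from it, so both implicits are filled in there:

// Hypothetical usage, assuming NewRDDFunctions._ is imported as in the question
// and sc is a SparkContext.
case class WordCount(word: String, count: Long) // defined at top level, not inside a method

val words: RDD[WordCount] = sc.parallelize(Seq(WordCount("dog", 50), WordCount("cow", 60)))
words.saveResultsToCassandra() // rwf and columnMapper are resolved here, at the call site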

Related

Scala Implicit Parameters Projection Conflict, "Ambiguous Implicit Values" Error

I have been reading Bruno's TypeClasses paper and he mentioned that implicits in the argument list are projected/propagated into the implicit scope. I followed the example with this code:
package theory

import cats.implicits.toShow
import java.io.PrintStream
import java.util.Date

object Equality extends App {
  import cats.Eq

  // assert(123 == "123")
  println(Eq.eqv(123, 123).show)

  implicit val out: PrintStream = System.out

  def log(m: String)(implicit o: PrintStream): Unit =
    o.println(m)

  def logTime(m: String)(implicit o: PrintStream): Unit =
    log(s"${new Date().getTime} : $m")
}
The kicker is that this code will not compile, with:
ambiguous implicit values:
both value out in object Equality of type java.io.PrintStream
and value o of type java.io.PrintStream
match expected type java.io.PrintStream
log(s"${new Date().getTime} : $m")
So, I assume that the compiler sees two instances of the same implicit and complains. I was able to silence the compiler by explicitly passing the received PrintStream in log's second argument list:
def logTime(m: String)(implicit o: PrintStream): Unit =
  log(s"${new Date().getTime} : $m")(o)
This works, but am I missing something? Why is there confusion inside the body of logTime()? I thought Bruno was implying that the implicit from the caller would be projected into the scope of the method, and his example does not add the extra parameter to the log() call. Why does scalac see two candidates? I suppose I assumed that the implicit from the outer method would "hide" the val out. Not so.
If anyone can explain why I see this, it would be appreciated.
Recall that the value of an implicit parameter is determined at the call site. That's why...
Equality.log("log this")
...won't compile unless an implicit value of the appropriate type is brought into scope.
implicit val ps: PrintStream = ...
Equality.log("log this")
The logTime() definition code is a call site for the log() method and, since it is defined within the Equality object, it has the implicit val out value available to it. But it is also the recipient of the implicit o value of the same type that was passed from its call site. Thus the ambiguity. Should the compiler send the implicit out value or the implicit o value to the log() method?
Now, it might seem a bit odd that the received implicit value (from the call site) is both assigned to a local identifier, o, and inserted into the local implicit scope as well. It turns out that Scala 3 has modified this behavior, and your code compiles without error even without the new given/using syntax. (Under Scala 3's resolution rules the more deeply nested binding wins, so it is the received o value, not out, that gets passed on to log().)
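For reference, a minimal Scala 3 rendering of the two methods with the newer given/using syntax (a sketch, not from the question):

import java.io.PrintStream
import java.util.Date

object Equality3:
  given out: PrintStream = System.out

  def log(m: String)(using o: PrintStream): Unit =
    o.println(m)

  // Compiles under Scala 3: the more deeply nested binding (the parameter o) is preferred.
  def logTime(m: String)(using o: PrintStream): Unit =
    log(s"${Date().getTime} : $m")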

Scala Implicit Conversion for companion object of extended class

I am trying to create a custom RDD in Java.
Spark converts an RDD[(K, V)] to PairRDDFunctions[K, V] using the Scala implicit function rddToPairRDDFunctions() defined in object RDD.
I am trying to do the same with my CustomJavaRDD, which extends CustomRDD, which extends RDD.
Now it should call the implicit function rddToCustomJavaRDDFunctions() whenever it encounters a CustomJavaRDD[(K, V)], but for some reason it still goes to rddToPairRDDFunctions().
What am I doing wrong?
RDD.scala
class RDD[T]

object RDD {
  implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
      (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null):
      PairRDDFunctions[K, V] = {
    new PairRDDFunctions(rdd)
  }
}
CustomRDD.scala
abstract class CustomRDD[T] extends RDD[T]

object CustomRDD {
  implicit def rddToCustomJavaRDDFunctions[K, V](rdd: CustomJavaRDD[(K, V)]):
      PairCustomJavaRDDFunctions[K, V] = {
    new PairCustomJavaRDDFunctions[K, V](rdd)
  }
}
PairCustomJavaRDDFunctions.scala
class PairCustomJavaRDDFunctions[K: ClassTag, V: ClassTag](self: CustomRDD[(K, V)])
    (implicit ord: Ordering[K] = null) {
  def collectAsMap() = ???
}
There is no error; the program compiles successfully.
But suppose I have data: RDD that is, at runtime, an instance of CustomJavaRDD:
data.collectAsMap()
At runtime it converts data into PairRDDFunctions, i.e. the implicit call goes to rddToPairRDDFunctions defined in RDD.scala.
But it should call rddToCustomJavaRDDFunctions defined in CustomRDD.scala and convert it into PairCustomJavaRDDFunctions.
"But it should call rddToCustomJavaRDDFunctions defined in CustomRDD.scala and convert it into PairCustomJavaRDDFunctions"
No, Scala simply does not work this way. What you want, overriding an implicit conversion depending on the runtime type of an object, is simply not possible (without pre-existing machinery on both the library's part and yours).
Implicits are a strictly compile-time feature. When the compiler sees you using an RDD as if it were a PairRDDFunctions, it splices in a call to RDD.rddToPairRDDFunctions, as if you wrote it yourself. Then, when the code is translated to bytecode, that call has already been baked in and nothing can change it. There is no dynamic dispatch for this, it's all static. The only situation where rddToCustomJavaRDDFunctions will be called is when the static type of the expression in question is already CustomJavaRDD.
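Concretely, for the data.collectAsMap() call above, the compiler emits something like the following (illustrative):

// what you wrote:
data.collectAsMap()
// what the compiler emits, chosen from data's static type (RDD):
RDD.rddToPairRDDFunctions(data).collectAsMap()
// There is no dynamic dispatch step that could pick rddToCustomJavaRDDFunctions instead.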
Really, this should not be necessary. Implicit conversions are really no more than glorified helper methods that save you keystrokes. (Implicit parameters, now those are interesting. ;) ) There should be no need to override them because the helper methods should already be polymorphic and work whether you have RDD, CustomRDD, or `RDD that travels through time to compute things faster`.
Of course, you can still do it, but it will only actually do anything under the above conditions, and that is probably not very likely, making the whole thing rather pointless.
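A self-contained sketch of the same effect with stand-in types (all names here are hypothetical, chosen only to mirror the RDD hierarchy):

import scala.language.implicitConversions

class RDDLike[T]
class CustomRDDLike[T] extends RDDLike[T]

class Ops(tag: String) { def whoAmI: String = tag }

object Convs {
  implicit def rddOps[T](r: RDDLike[T]): Ops = new Ops("RDDLike")
  implicit def customOps[T](r: CustomRDDLike[T]): Ops = new Ops("CustomRDDLike")
}

object Demo extends App {
  import Convs._
  val r: RDDLike[Int] = new CustomRDDLike[Int]
  println(r.whoAmI)                        // "RDDLike": chosen from the static type, despite the runtime type
  println((new CustomRDDLike[Int]).whoAmI) // "CustomRDDLike": the static type is now the subclass
}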

Defining a Semigroup instance that depends on itself

... or mishaps of a Haskell programmer who has to code Scala, part 5.
I have the following structure in Scala:
case class ResourceTree(
  resources: Map[String, ResourceTree]
)
And, using Cats, I would like to define a Semigroup instance for it.
object ResourceTreeInstances {
  implicit val semigroupInstance = new Semigroup[ResourceTree] {
    override def combine(x: ResourceTree, y: ResourceTree): ResourceTree = {
      ResourceTree(
        x.resources |+| y.resources
      )
    }
  }
}
This will result in the following error:
value |+| is not a member of Map[String, ResourceTree]
[error] Note: implicit value semigroupInstance is not applicable here because it comes after the application point and it lacks an explicit result type
[error]       x.resources |+| y.resources
So, my guess was that while I'm still defining the Semigroup instance, the Scala compiler cannot derive an instance of Semigroup for Map[String, ResourceTree]. This seemed to be confirmed, since the following version compiles:
implicit val semigroupInstance = new Semigroup[ResourceTree] {
  override def combine(x: ResourceTree, y: ResourceTree): ResourceTree = {
    dummyCombine(x, y)
  }
}

// FIXME: see if there's a better way to avoid the "no instance of Semigroup" problem
def dummyCombine(x: ResourceTree, y: ResourceTree): ResourceTree = {
  ResourceTree(
    x.resources |+| y.resources
  )
}
I'm really hoping I'm wrong, because if this is the right way of defining a Semigroup instance in Scala, I'll start considering giving up on FP in this language.
Is there a better way?
The following should work just fine:
import cats.Semigroup
import cats.instances.map._
import cats.syntax.semigroup._

case class ResourceTree(resources: Map[String, ResourceTree])

implicit val resourceTreeSemigroup: Semigroup[ResourceTree] =
  new Semigroup[ResourceTree] {
    def combine(x: ResourceTree, y: ResourceTree): ResourceTree =
      ResourceTree(
        x.resources |+| y.resources
      )
  }
The key is this part of the error message: "and it lacks an explicit result type". Recursive methods in Scala must have explicit return types, and similarly type class instances that depend on themselves (either directly or indirectly through something like the Map instance and |+| syntax in this case) also need them.
In general it's a good idea to put explicit return types on all implicit definitions—not doing so can lead to unexpected behavior, some of which makes sense if you think about it and read the spec (as in this case), and some of which just seems to be bugginess in the compiler.
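A quick usage sketch of the instance above (the trees are invented for illustration):

val a = ResourceTree(Map("api" -> ResourceTree(Map.empty)))
val b = ResourceTree(Map("api" -> ResourceTree(Map("v1" -> ResourceTree(Map.empty)))))

// Colliding keys are combined recursively through the very instance being defined:
a |+| b
// ResourceTree(Map(api -> ResourceTree(Map(v1 -> ResourceTree(Map())))))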

Why do we have to explicitly import implicit conversions that take implicit parameters from companion objects? Strange.

Let's consider this code:
class A

object A {
  implicit def A2Int(implicit a: A) = 1
  implicit def A2String(a: A) = "Hello"
}

object Run extends App {
  implicit val a: A = new A

  import A.A2Int
  // Without this import the code does not compile. Why?
  // And why is no import needed for A2String?

  def iWantInt(implicit i: Int) = println(i)
  def iWantString(implicit s: String) = println(s)

  iWantInt
  iWantString(a)
}
It runs and prints:
1
Hello
Now, if we comment out the line
import A.A2Int
we get a compilation error on the iWantInt call.
Why can Scala find A.A2String without an import, but not A.A2Int?
How can this be fixed?
Thanks for reading.
The difference is that when you do iWantString(a), the compiler gets some starting point to work from: you are explicitly passing a, which the compiler knows is of type A.
Given that iWantString takes a String and not an A, the compiler will search for an implicit conversion from A to String in order to insert it and make the call succeed.
Implicit lookup rules state that the compiler must look (among other places) in the companion object of class A because type A is the source type of the conversion.
This is where it finds the implicit conversion A2String.
The key point is that it is only because you passed an instance of A that the compiler knew to look for implicit conversions in the companion object of A.
When you just do iWantInt, the compiler has no reason to look into A, so it won't find your method A2Int (and as no other method/value in scope provides an implicit value of type Int, compilation then fails).
For more information about implicit lookup rules, see the Scala specification at http://www.scala-lang.org/docu/files/ScalaReference.pdf (chapter 7.2). Here's the most relevant excerpt:
The implicit scope of a type T consists of all companion modules (§5.4) of classes that are associated with the implicit parameter's type.
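As for the "how can this be fixed" part: besides the import A.A2Int shown in the question, two other options (a sketch, in the context of the Run object above):

// Option 1: pass the value explicitly; the implicit A is still filled in for A2Int.
iWantInt(A.A2Int)

// Option 2: materialize the Int locally, after which the bare call compiles.
implicit val i: Int = A.A2Int
iWantInt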

Can structural typing work with generics?

I have an interface defined using a structural type like this:
trait Foo {
  def collection: {
    def apply(a: Int): String
    def values(): scala.collection.Iterable[String]
  }
}
I wanted to have one of the implementers of this interface do so using a standard mutable HashMap:
import scala.collection.mutable.HashMap

class Bar extends Foo {
  val collection: HashMap[Int, String] = HashMap[Int, String]()
}
It compiles, but at runtime I get a NoSuchMethodException when referring to a Bar instance through a Foo-typed variable. Dumping the object's methods via reflection, I see that the HashMap's apply method takes an Object due to type erasure, and that there is a strangely renamed generated apply method that does take an int. Is there a way to make generics work with structural types? Note that in this particular case I was able to solve my problem with an actual trait instead of a structural type, and that is much cleaner overall.
The short answer is that the apply method parameter is causing you grief, because it requires a conversion of the argument (Int => Integer). Those conversions are resolved at compile time; the NoSuchMethodException is likely a result of the mismatch they leave behind at runtime.
Try the values method instead and it should work, since no conversions are involved.
I've attempted to find a way to make this example work, but have had no success so far.
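For completeness, a sketch of the trait-based solution the asker alludes to (names invented; it replaces reflective structural calls with ordinary virtual dispatch):

import scala.collection.mutable.HashMap

trait StringLookup {
  def apply(a: Int): String
  def values(): Iterable[String]
}

trait Foo {
  def collection: StringLookup
}

class Bar extends Foo {
  private val map = HashMap[Int, String]()
  // Adapt the HashMap behind a real trait; no structural types involved.
  val collection: StringLookup = new StringLookup {
    def apply(a: Int): String = map(a)
    def values(): Iterable[String] = map.values
  }
}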