I have the snippet below, where rdd is an RDD[(String, Vector)], but unfortunately my Scala compiler complains with the error Type mismatch, expected: RDD[(String,Vector)], actual: RDD[(String,Vector)] where I call flagVectorOutlier(rdd, predictedRDD):
def someFunction() {
  testData.foreachRDD( rdd => {
    val vectorsRDD = rdd.map( pair => pair._2 )
    val predictedRDD = model.latestModel().predict( vectorsRDD )
    flagVectorOutlier( rdd, predictedRDD )
  } )
  ssc.start()
  ssc.awaitTermination()
}

def flagVectorOutlier(testVectors: RDD[(String, Vector)], predicts: RDD[Int]): Unit = {
}
Considering that the actual and expected types look identical, what is actually wrong here, and how can I solve this issue?
I've had this kind of error before.
It happened when I was using a Java library that used java.util.List while my own code used scala.collection.immutable.List, and I mixed them up.
Two classes with the same name can easily coexist in code, but the error message does not display fully-qualified names, so I would get Type mismatch, expected: List[Integer], actual: List[Integer], which seems puzzling. The solution is simply to fully qualify the types of your input parameters, or to use type aliases to distinguish them.
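For illustration, here is a minimal sketch of that kind of collision; the method names are made up:

import java.util.{List => JList}  // alias the Java type so the clash stays visible

// Stand-in for a Java-library call that returns a java.util.List
def fromJavaLib(): JList[Integer] = new java.util.ArrayList[Integer]()

// Plain `List` here means scala.collection.immutable.List
def consume(xs: List[Integer]): Unit = ()

// consume(fromJavaLib())  // Type mismatch, expected: List[Integer], actual: List[Integer]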
Bonus: I've also had a similar problem with tuples. For instance, when a method can either expect tuples or direct params:
def f(param: (Integer, String)) versus def f(param1: Integer, param2: String)
When calling the method with the wrong parameters (say, 2 params instead of a tuple), on a cursory inspection the error message can seem to show two identical types. This gets worse if you have several nested tuples (thus too many parentheses but the same types).
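For what it's worth, a small made-up sketch of the nested-tuple case; the found and required types contain exactly the same element types, just grouped differently, so the message is easy to misread:

def f(param: (Int, (String, Int))): Unit = ()

f((1, ("a", 2)))     // OK: the nesting matches the parameter type
// f(((1, "a"), 2))  // type mismatch: found ((Int, String), Int),
//                   // required (Int, (String, Int)) -- same types, different parentheses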
Related
In the following piece of code, entities is a Map[String, Seq[String]] object that I receive from some other piece of code. The goal is to map the entities object into a two-column Spark DataFrame; but before I get there, I ran into some very unusual results.
val data: Map[String, Seq[String]] = Map("idtag" -> Seq("things", "associated", "with", "id"))
println(data)
println(data.toSeq)
data.toSeq.foreach{println}
data.toSeq.map{case(id: String, names: Seq[String]) => names}.foreach{println}
val eSeq: Seq[(String, Seq[String])] = entities.toSeq
println(eSeq.head)
println(eSeq.head.getClass)
println(eSeq.head._1.getClass)
println(eSeq.head._2.getClass)
eSeq.map{case(id: String, names: Seq[String]) => names}.foreach{println}
The output of the above on the console is:
Map(idtag -> List(things, associated, with, id))
ArrayBuffer((idtag,List(things, associated, with, id)))
(idtag,List(things, associated, with, id))
List(things, associated, with, id)
(0CY4NZ-E,["MEC", "Marriott-MEC", "Media IQ - Kimberly Clark c/o Mindshare", "Mindshare", "WPP", "WPP Plc", "Wavemaker Global", "Wavemaker Global Ltd"])
class scala.Tuple2
class java.lang.String
class java.lang.String
Exception in thread "main" java.lang.ClassCastException: java.lang.String cannot be cast to scala.collection.Seq
at package.EntityList$$anonfun$toStorage$4.apply(EntityList.scala:31)
The data object that I hardcoded behaves as expected. The .toSeq call on the entities map produces a Seq (implemented as an ArrayBuffer) of tuples, and these tuples can be processed through mapping.
But with the entities object, you can see that when I take the first element using .head, it is effectively a Tuple2[String, String]. How can that possibly happen? How does the second element of the tuple turn into a String and cause the exception?
Further confusing me, if the last line is changed to reflect the Tuple2[String, String]:
eSeq.map{case(id: String, names: String) => names}.foreach{println}
then we get a compile error:
/path/to/repo/src/main/scala/package/EntityList.scala:31: error: pattern type is incompatible with expected type;
found : String
required: Seq[String]
eSeq.map{case(id: String, names: String) => names}.foreach{println}
I can't replicate this odd behavior with a Map[String, Seq[String]] that I create myself, as you can see in this code. Can anyone explain this behavior and why it happens?
The problem appears to be that entities.toSeq is lying about the type of the data it returns, so I would look at "some other piece of code" and check that it is doing the right thing.
Specifically, it claims to return Seq[(String, Seq[String])] and the compiler believes it. But getClass shows that the second element of the tuple is actually a java.lang.String, not a Seq[String].
Given that claimed type, the match statement uses unapply to extract the values and then hits the error when it tries to cast names to the stated type.
I note that the string appears to be a list of strings enclosed in [ ], so it seems possible that whatever is creating entities is failing to parse this into a Seq while claiming that it has succeeded.
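For what it's worth, here is a minimal sketch that reproduces the symptom; the upstream bug is hypothetical, but an unchecked cast like this would explain everything observed above:

// Hypothetical upstream bug: a raw JSON-ish string is cast instead of parsed.
val raw: Map[String, Any] =
  Map("0CY4NZ-E" -> """["MEC", "Marriott-MEC"]""")  // the value is still a String

// The unchecked cast compiles because erasure hides the element types.
val entities: Map[String, Seq[String]] =
  raw.asInstanceOf[Map[String, Seq[String]]]

val eSeq: Seq[(String, Seq[String])] = entities.toSeq
println(eSeq.head._2.getClass)  // prints class java.lang.String, as in the question

// Pattern matching forces the deferred cast and throws, just like above:
// eSeq.map { case (id, names: Seq[String]) => names }  // ClassCastException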
SBT is throwing the following error:
value split is not a member of (String, String)
[error] .filter(arg => arg.split(delimiter).length >= 2)
For the following code block:
implicit def argsToMap(args: Array[String]): Map[String, String] = {
  val delimiter = "="
  args
    .filter(arg => arg.split(delimiter).length >= 2)
    .map(arg => arg.split(delimiter)(0) -> arg.split(delimiter)(1))
    .toMap
}
Can anyone explain what might be going on here?
Some details:
java version "1.8.0_191"
sbt version 1.2.7
scala version 2.11.8
I've tried both on the command line and in IntelliJ. I've also tried Java 11 and Scala 2.11.12, to no avail.
I'm not able to replicate this on another machine (different OS, SBT, IntelliJ, etc., though), and I can also write a minimal failing case:
value split is not a member of (String, String)
[error] Array("a", "b").map(x => x.split("y"))
The issue is that the filter method is added to arrays via an implicit conversion.
When you call args.filter(...), args would normally be converted to ArrayOps via the Predef.refArrayOps implicit method.
But you are defining an implicit conversion from Array[String] to Map[String, String] in the same scope.
That implicit has higher priority than Predef.refArrayOps and is therefore used instead.
So args is converted into a Map[String, String], and the filter method of that Map expects a function of type ((String, String)) => Boolean as its parameter; that is why arg is the pair (String, String) and has no split method.
I believe what happened is that the implicit method was getting invoked a bit too eagerly. That is, the Tuple2 that seemingly comes out of nowhere is the element type of the Map produced by the implicit conversion, whose entries are key/value pairs. The implicit function was recursively calling itself; I found this out after eventually getting a stack overflow with some other code that was manipulating a collection of Strings.
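Following that reasoning, a sketch of one possible fix: invoke the ArrayOps conversion explicitly inside the body, so the higher-priority argsToMap implicit can never hijack the call (using split with a limit of 2 also avoids splitting each argument three times):

implicit def argsToMap(args: Array[String]): Map[String, String] = {
  val delimiter = "="
  Predef.refArrayOps(args)                   // explicit, so argsToMap cannot apply
    .map(_.split(delimiter, 2))              // split once, into at most two pieces
    .collect { case Array(k, v) => k -> v }  // keep only well-formed key=value args
    .toMap
}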
I've looked around and found several other examples of this, but I don't really understand from those answers what's actually going on.
I'd like to understand why the following code fails to compile:
val df = readFiles(sqlContext).
  withColumn("timestamp", udf(UDFs.parseDate _)($"timestamp"))
Giving the error:
Error:(29, 58) not enough arguments for method udf: (implicit evidence$2: reflect.runtime.universe.TypeTag[java.sql.Date], implicit evidence$3: reflect.runtime.universe.TypeTag[String])org.apache.spark.sql.UserDefinedFunction.
Unspecified value parameter evidence$3.
withColumn("timestamp", udf(UDFs.parseDate _)($"timestamp")).
^
Whereas this code does compile:
val parseDate = udf(UDFs.parseDate _)
val df = readFiles(sqlContext).
  withColumn("timestamp", parseDate($"timestamp"))
Obviously I've found a "workaround" but I'd really like to understand:
What this error really means. The info I have found on TypeTags and ClassTags has been really difficult to understand. I don't come from a Java background, which perhaps doesn't help, but I think I should be able to grasp it…
If I can achieve what I want without a separate function definition
The error message is a bit misleading indeed; the reason for it is that the function udf takes an implicit parameter list, but you are passing an actual parameter. Since I don't know much about Spark, and since the udf signature is a bit convoluted, I'll try to explain what is going on with a simplified example.
In practice udf is a function that, given some explicit parameters and an implicit parameter list, gives you another function. Let's define a function that, given a pivot of type T for which we have an implicit Ordering, gives us a function that splits a sequence in two: one part containing the elements smaller than the pivot and the other containing the bigger ones:
def buildFn[T](pivot: T)(implicit ev: Ordering[T]): Seq[T] => (Seq[T], Seq[T]) = ???
Let's leave out the implementation as it's not important. Now, if I do the following:
val elements: Seq[Int] = ???
val (small, big) = buildFn(10)(elements)
I would be making the same kind of mistake as in your code: the compiler thinks I am explicitly passing elements as the implicit parameter list, and this won't compile. The error message in my example would differ somewhat from yours, because here the number of parameters I am mistakenly passing for the implicit list matches the expected one, so the error is about the types not lining up.
Instead, if I write it as:
val elements: Seq[Int] = ???
val fn = buildFn(10)
val (small, big) = fn(elements)
In this case the compiler correctly fills in the implicit parameters of the function. I don't know of any way to circumvent this problem other than passing the actual implicit parameters explicitly, which I find quite ugly and not always practical; for reference, this is what I mean:
val elements: Seq[Int] = ???
val (small, big) = buildFn(10)(implicitly[Ordering[Int]])(elements)
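Back in the Spark case, the same reasoning suggests a one-expression workaround, sketched here untested: selecting apply explicitly forces the compiler to complete udf(...), implicit TypeTags included, before the Column argument is applied:

val df = readFiles(sqlContext).
  withColumn("timestamp", udf(UDFs.parseDate _).apply($"timestamp"))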
Could someone shed more light on the following piece of Scala code, which is not fully clear to me? I have the following function defined:
def ids(ids: String*) = {
  _builder.ids(ids: _*)
  this
}
Then I try to pass a variable argument list to this function as follows:
def searchIds(kind: KindOfThing, adIds: String*) = {
  ...
  ids(adIds)
}
Firstly, the ids(adIds) call doesn't work, which is a bit odd at first, as the error message says: Type mismatch, expected: String, actual: Seq[String]. This suggests that variable argument lists are not typed as collections or sequences.
To fix this, use the trick ids(adIds: _*).
I am not 100% sure how : _* works; could someone shed some light on it?
If I remember correctly, : means that the operation is applied to the right argument instead of the left, and _ means "apply" to the passed element, ...
I checked the String and Seq scaladocs but wasn't able to find a :_* method.
Could someone explain this?
Thx
You should look at your method definitions:
def ids(ids: String*)
Here you're saying that this method takes a variable number of strings, eg:
def ids(id1: String, id2: String, id3: String, ...)
Then the second method:
def searchIds(kind: KindOfThing, adIds:String*)
This also takes a variable number of strings, which are packaged into a Seq[String]; so inside searchIds, adIds is actually a Seq. But your first method ids doesn't take a Seq, it takes N strings, and that's why ids(adIds: _*) works.
: _* is often called the splat operator; formally it is a type ascription that tells the compiler to expand the Seq back into N individual arguments, which is also why you won't find it as a method in any scaladoc.
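A minimal, self-contained illustration:

def ids(ids: String*): Unit = println(ids.mkString(", "))

val adIds = Seq("a1", "a2", "a3")

ids("x", "y")   // fine: N individual strings
// ids(adIds)   // type mismatch, expected: String, actual: Seq[String]
ids(adIds: _*)  // the ascription expands the Seq into N arguments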
I'm new to Scala...
Anyway, I want to do something like:
val bar = new Foo("a" -> List[Int](1), "b" -> List[String]("2"), ...)
bar("a") // gives List[Int] containing 1
bar("b") // gives List[String] containing "2"
The problem is that when I do:
class Foo(pairs: (String, List[_])*) {
  def apply(name: String): List[_] = pairs.toMap(name)
}
pairs is going to be a Seq[(String, List[Any])] (or something like that), and apply() is wrong anyway, since List[_] is one fixed type rather than "different types". Even if the varargs * returned a tuple, I'm still not sure how I'd go about getting bar("a") to return a List[OriginalTypePassedIn]. So is there actually a way of doing this? Scala seems pretty flexible, so it feels like there should be some advanced way of doing this.
No.
That's just the nature of static type systems: a method has a fixed return type. It cannot depend on the values of the method's parameters, because the parameters are not known at compile time. Suppose you have bar, which is an instance of Foo, and you don't know anything about how it was instantiated. You call bar("a"). You will get back an instance of the correct type, but since that type isn't determined until runtime, there's no way for a compiler to know it.
Scala does, however, give you a convenient syntax for subtyping Foo:
object bar extends Foo {
  val a = List[Int](1)
  val b = List[String]("2")
}
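With that encoding the member types are fully static, so, as a quick sketch:

val xs: List[Int]    = bar.a  // statically List[Int]
val ys: List[String] = bar.b  // statically List[String]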
This can't be done. Consider this:
val key = readStringFromUser()
val value = bar(key)
What would be the type of value? It would depend on what the user has input. But types are static; they're determined and used at compile time.
So you'll either have to use a fixed number of arguments for which you know their types at compile time, or use a generic vararg and do type casts during runtime.
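For completeness, a sketch of the runtime-cast variant mentioned above; the caller asserts the element type, and a wrong assertion surfaces as a ClassCastException at use time rather than at the call site:

class Foo(pairs: (String, List[_])*) {
  private val m = pairs.toMap
  // Unchecked due to erasure: we simply trust the caller's T.
  def apply[T](name: String): List[T] = m(name).asInstanceOf[List[T]]
}

val bar = new Foo("a" -> List(1), "b" -> List("2"))
val a: List[Int]    = bar.apply[Int]("a")
val b: List[String] = bar.apply[String]("b")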