val finalRDD = joinedRDD.map(x => {
val d1 = x._2._1
val d2 = x._2._2
(x._1, d1 + d2)
})
In the above code, joinedRDD has type RDD[(Row, (Double, Double))] (according to IntelliJ) while Scala compiler says d1 & d2 are AnyVal.
For time being, I cast d1 & d2 as Double using asInstanceOf but next time it says
java.lang.ClassCastException: java.lang.Integer cannot be cast to
java.lang.Double
Is it Scala compiler issue or IntelliJ issue which shows me wrong inferred types. Any insights?
Seems good to me :-S
Type inference is far from omniscient. Sometimes you need to specify the types explicitly. In my experience, this is especially true when the result type can be anything. Some things to try:
My preferred option since you are not touching the key: joinedRDD.mapValues(x => x._1 + x._2)
Add some type information: val d1: Double = x._2._1. With some luck, at least the compiler might be more explicit.
Define your function separately, assigning types to the parameters, and use if inside: map(myFunc)
Also, I've seen some differences between IntelliJ Scala Plugin and the actual Scala compiler. Given the errors you are getting and the fact that AnyVal is the common parent class for both Int and Double, there is a good chance you don't have doubles to begin with (and the compiler is trying to find a shared parent). Do double check that you are getting the type you mention by putting it explicitly. It is very possible that your type confusion occurs before this line.
Good luck!
Well, I tried in IntelliJ IDEA 14 and the type inference is correct, recognizing d1 and d2 as Double (this was expected). Nonetheless, I usually avoid the type-aware highlighting feature of IDEA since many times it goes crazy and reports fake results.
As a side note, since you are not changing the key of your RDD, consider using mapValues instead of map (this provides clarity, as well as performance since it would take advantage of the partitioner of the input RDD and reuse it in the output RDD).
Related
I would like to do dynamic casting for a scala variable, the casting type is stored in a different variable or in the database or provided by the user.
I am new to Scala and have mainly done coding on python. Here we are trying to take Any type input and query the type of the variable as per the type saved in DB eg.: "String/Int" and user-defined classes and cast them before any future processing.
in python:
eval("str('123')")
in scala I have tried
var a = 123.05
a.asInstanceOf['Int']
gives me error
error: identifier expected but string literal found.
And I want the code to be something as follows:
var a = 123.05
var b = "Int"
a.asInstanceOf[b]
gives me error
error: not found: type b
Ok so I have to be fairly exhaustive here because the stuff you're trying to do is tricky due to the static vs dynamic typing difference of scala and python.
First what you're doing in Python is not type casting but rather a conversion.
str(12)
in python takes the integer value 12 and converts it to a (UTF-8 ? dunno in python) string. The same thing in scala would be
12.toString
typecasting in the meantime is basically a pinky promise to the compilers typechecker that you know more than it and it should just believe you. It also should be avoided like the pest because you basically drop all the safety that static typechecking gives you. However there are certain cases where it is unavoidable
in a bit more fundamental terms. lets say you have the following scala snippet
sealed trait Foo
final case class Bar(i:Int) extends Foo
final case class Baz(s:String) extends Foo
val f:Foo = Bar(2)
//won't work because the compiler thinks f is of type Foo due to the type ascription bove
println(f.i)
//will work
println(f.asInstanceOf[Bar].i)
the Type Foo can be either Bar or Baz (because of the sealed and final, this is called an ADT) but we specifically told the compiler to treat f as a Foo which means we forgot which specific type it was. this is the case where you could use typecasting, note that we don't convert but rather tell the compiler that it is actually a Bar
Now this all happens during compile time which means you can't cast according to a runtime value like this
var a = 123.05
var b = "Int"
a.asInstanceOf[b]
Now as I understand you have some sort of stringly typed input and need to convert it according to some schema. The scalafiddle below has an example how you could do this: https://scalafiddle.io/sf/i97WZlA/0
However note that this makes use of some fairly advanced concepts in scala to ensure the types line up.
This will also work https://scalafiddle.io/sf/i97WZlA/1 but note that we lose all the type information requiring us to do a typecast if we want to do anything meaningful with out
EDIT/ADDENDUM: I thought I should also do an example on how to consume such values and make the schema dynamic.
https://scalafiddle.io/sf/i97WZlA/2
As a final note be warned that this is getting close to the limits what the compiler can do and necessitates a lot of boxing and unboxing to carry along the type information (in SchemaValue) also it's not stacksafe and has lackluster errorhandling. this solution would require some serious engineering to make viable but it should get the idea across
If you have a String value, then you can use toInt or toDouble to parse that string:
val s = "123.05"
val i = s.toInt
val d = s.toDouble
If you have an Any value then it is best to use match to convert it:
val a: Any = ...
a match {
case i: Int => // It is an Int
case d: Double => // It is a Double
case _ => // It is a different type
}
If all else fails you can explicitly pick a type, but this is not good practice:
val i = a.asInstanceOf[Int]
Any decent database framework will give you the ability to read and write values of specific types, so this should not be a problem with the right library.
I am new in Scala world and now I am reading the book called "Scala in Action" (by Nilanjan Raychaudhuri), namely the part called "Mutable object need to be invariant" on page 97 and I don't understand the following part which is taken directly from the mentioned book.
Assume ListBuffer is covariant and the following code snippet works without any compilation problem:
scala> val mxs: ListBuffer[String] = ListBuffer("pants")
mxs: scala.collection.mutable.ListBuffer[String] =
ListBuffer(pants)
scala> val everything: ListBuffer[Any] = mxs
scala> everything += 1
res4: everything.type = ListBuffer(1, pants)
Can you spot the problem? Because everything is of the type Any, you can store an
integer value into a collection of strings. This is a disaster waiting to happen. To avoid these kinds of problems, it’s always a
good idea to make mutable objects invariant.
I would have the following questions..
1) What type of everything is in reality? String or Any? The declaration is "val everything: ListBuffer[Any]" and hence I would expect Any and because everything should be type of Any then I don't see any problems to have Integer and String in one ListBuffer[Any]. How can I store integer value into collection of strings how they write??? Why disaster??? Why should I use List (which is immutable) instead of ListBuffer (which is mutable)? I see no difference. I found a lot of answers that mutably collections should have type invariant and that immutable collections should have covariant type but why?
2) What does the last part "res4: everything.type = ListBuffer(1, pants)" mean? What does "everything.type" mean? I guess that everything does not have any method/function or variable called type.. Why is there no ListBuffer[Any] or ListBuffer[String]?
Thanks a lot,
Andrew
1 This doesn't look like a single question, so I have to subdivide it further:
"In reality" everything is ListBuffer[_], with erased parameter type. Depending on the JVM, it holds either 32 or 64 bit references to some objects. The types ListBuffer[String] and ListBuffer[Any] is what the compiler knows about it at compile time. If it "knows" two contradictory things, then it's obviously very bad.
"I don't see any problems to have Integer and String in
one ListBuffer[Any]". There is no problem to have Int and String in ListBuffer[Any], because ListBuffer is invariant. However, in your hypothetical example, ListBuffer is covariant, so you are storing an Int in a ListBuffer[String]. If someone later gets an Int from a ListBuffer[String], and tries to interpret it as String, then it's obviously very bad.
"How can I store integer value into collection
of strings how they write?" Why would you want to do something that is obviously very bad, as explained above?
"Why disaster???" It wouldn't be a major disaster. Java has been living with covariant arrays forever. It's does not lead to cataclysms, it's just bad and annoying.
"Why should I use List (which is immutable) instead of ListBuffer (which is mutable)?" There is no absolute imperative that tells you to always use List and to never use ListBuffer. Use both when it is appropriate. In 99.999% of cases, List is of course more appropriate, because you use Lists to represent data way more often than you design complicated algorithms that require local mutable state of a ListBuffer.
"I found a lot of answers that mutably collections
should have type invariant and that immutable collections should
have covariant type but why?". This is wrong, you are over-simplifying. For example, intensional immutable sets should be neither covariant, nor invariant, but contravariant. You should use covariance, contravariance, and invariance when it's appropriate. This little silly illustration has proven unreasonably effective for explaining the difference, maybe you too find it useful.
2 This is a singleton type, just like in the following example:
scala> val x = "hello"
x: String = hello
scala> val y: x.type = x
y: x.type = hello
Here is a longer discussion about the motivation for this.
I agree with most of what #Andrey is saying I would just add that covariance and contravariance belong exclusively to immutable structures, the exercisce that the books proposes is just a example so people can understand but it is not possible to implement a mutable structure that is covariant, you won't be able to make it compile.
As an exercise you could try to implement a MutableList[+A], you'll find out that there is not way to do this without tricking the compiler putting asInstanceOf everywhere
I'm working on some fairly generic code (yet another CSV file reader) and I've run into some trouble with contravariance. I've stripped the code down to something that I think demonstrates the problem here (I copied this from my Scala worksheet):
def f1(x: Int)(s: String) = Try(x+s.toInt)
val f1a: Int=>String=>Try[Int] = f1 _
val r1 = f1a(3)("2")
def f2(x: String)(s: String) = Try(x.toInt+s.toInt)
val f2a: String=>String=>Try[Int] = f2 _
val r2 = f2a("3")("2")
val fs: Seq[String with Int=>String=>Try[Int]] = Seq(f1a, f2a)
val xs: Seq[Any] = Seq(3,"3")
val r3 = for {(f,x) <- fs zip xs} yield f(x)("2")
Last line edited to avoid irrelevant issues after #Lee and #badcook commented.
I've tried to simplify things for the reader by providing the types for the vals although of course that's not required by the compiler. Each of the expressions r1 and r2 evaluates to Success(5) as expected. The expression for r3 does not compile. The error is:
type mismatch; found : x.type (with underlying type Any) required: String with Int
The problem essentially lies with the type of x, the parameter of f(x) in the for-comprehension. This is a contravariant position (the argument to a function) but xs and fs are covariant (the element type of a Seq). Thus x has type Any, but f requires a type String with Int.
This is an uninhabited compound type and I have not been able to find a clean way to solve this problem. I can cast the values f1a and f2a to Any=>Int=>String but that's not the way we like to do things in Scala!
Based on what I assume you are trying to do, the Scala compiler is saving you from one mistake. Your for comprehension is a Cartesian product of your elements and your functions. This means your function expecting an Int is going to get called once with a String and your function expecting a String is going to get called once with an Int.
What you probably want is to zip your two Seqs together and then map over the resulting pairs, applying each function to its paired element. Unfortunately, as you've noted, contravariance will prevent this from typechecking. This is fundamentally because all elements of a Seq must be the same type, whereas you want to preserve the fact that your elements have different types and that those types match the different types of your functions.
In short, if you want to go down this path, you're looking for heterogeneous lists, also known as HLists, rather than ordinary Scala collections. shapeless has an implementation of this and provides map and zip methods to go along with it. There's a bit of a learning curve there (I would recommend getting familiar with higher-rank types and how shapeless implements them) and HLists are a bit unwieldy to use compared to ordinary collections, but this will solve this particular kind of problem.
I'm integrating with ZeroMQ and Akka to send case classes from different instances. Problem is that when I try to compile this code:
def persistRelay(relayEvent:String, relayData:Any) = {
val relayData = ser.serialize(relayData).fold(throw _, identity)
relayPubSocket ! ZMQMessage(Seq(Frame(relayEvent), Frame(relayData)))
}
The Scala compilers throws back recursive value relayData needs type.
The case classes going in are all different and look like Team(users:List[Long], teamId:Long) and so on.
Is there a method to allow any type in the serializer or a workaround? I'd prefer to avoid writing a serializer for every single function creating the data unless absolutely necessary.
Thanks!
This isn't really a typing issue. The problem is:
val relayData = ser.serialize(relayData).fold(throw _, identity)
You're declaring a val relayData in the same line that you're making a reference to the method parameter relayData. The Scala compiler doesn't understand that you have/want two variables with the same name, and, instead, interprets it as a recursive definition of val relayData. Changing the name of one of those variables should fix the error.
Regardless, since you didn't quite follow what the Scala compiler was asking for, I think that it would also be good to fill you in on what it is that the compiler even wanted from you (even though it's advice that, if followed, probably would have just led to you getting yet another error that wouldn't seem to make a lot of sense, given the circumstances).
It said "recursive value relayData needs type". The meaning of this is that it wanted you to simply specify the type of relayData by having
val relayData = ...
become something like
val relayData: Serializable = ...
(or, in place of Serializable, use whatever type it was that you wanted relayData to have)
It needs this information in order to create a recursive definition. For instance, take the simple case of
val x = x + 1
This code is... bizarre, to say the least, but what I'm doing is defining x in a (shallowly) recursive way. But there's a problem: how can the compiler know what type to use for the inner x? It can't really determine the type through type inference, because type inference involves leveraging the type information of other definitions, and this definition requires x's type information. Now, we might be able to infer that I'm probably talking about an Int, but, theoretically, x could be so many things! In fact, here's the ambiguity in action:
val x: Int = x + 1 // Default value for an Int is '0'
x: Int = 1
val y: String = y + 1 // Default value for a String is 'null'
y: String = null1
All that really changed was the type annotation, but the results are drastically different–and this is only a very simple case! So, yeah, to summarize all this... in most cases, when it's complaining about recursive values needing types, you should just have some empathy on the poor compiler and give it the type information that it so direly craves. It would do the same for you, DeLongey! It would do the same for you!
I'm porting some java code across and have the following
val overnightChanges: java.util.Hashtable[String, Double] = ...
When I try
if (null != overnightChanges.get(...))
I get the following warning
warning: comparing values of types Null and Double using `!=' will always yield true
Primitive and reference types are much less different in scala than they are in java, and so the convention is that name starts with an uppercase for all of them. Double is scala.Double which is the primitive java double, not the reference java.lang.Double.
When you need "a double or no value" in scala, you would use Option[Double] most of the time. Option has strong library support, and the type system will not let you ignore that there might be no value. However, when you need to interact closely with java, as in your example, your table does contain java.lang.Double and you should say it so.
val a = new java.util.HashMap[String, java.lang.Double]
If java.lang.Double starts to appear everywhere in your code, you can alias to JDouble, either by importing
import java.lang.{Double => JDouble}
or by defining
type JDouble = java.lang.Double
There are implicit conversions between scala.Double and java.lang.Double, so interaction should be reasonably smooth. However, java.lang.Double should probably be confined to the scala/java interaction layer, it would be confusing to have it go deep into scala code.
In Scala Double are primitives and thus cannot be null. That's annoying when using directly java maps, because when a key is not defined, you get the default primitive value, (here 0.0):
scala> val a = new java.util.Hashtable[String,Double]()
a: java.util.Hashtable[String,Double] = {}
scala> a.get("Foo")
res9: Double = 0.0
If the value is a object like String or List, your code should work as expected.
So, to solve the problem, you can:
Use contains in an outer if condition.
Use one of the Scala maps (many conversions are defined in scala.collection.JavaConversions)
Use Scala "options", also known as "maybe" in Haskell:
http://blog.danielwellman.com/2008/03/using-scalas-op.html