Scala convert Map$ to Map

I have an exception:
java.lang.ClassCastException: scala.collection.immutable.Map$ cannot
be cast to scala.collection.immutable.Map
which I'm getting in this part of the code:
val iterator = new CsvMapper()
.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
.readerFor(Map.getClass).`with`(CsvSchema.emptySchema().withHeader()).readValues(reader)
while (iterator.hasNext) {
println(iterator.next.asInstanceOf[Map[String, String]])
}
So, are there any options to avoid this issue, because this:
val iterator = new CsvMapper()
.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
.readerFor(Map[String,String].getClass).`with`(CsvSchema.emptySchema().withHeader()).readValues(reader)
doesn't help, because I get
[error] Unapplied methods are only converted to functions when a function type is expected.
[error] You can make this conversion explicit by writing `apply _` or `apply(_)` instead of `apply`.
Thanks in advance

As has been pointed out in the earlier comments, in general you need classOf[X[_,_]] rather than X.getClass or X[A, B].getClass for a class that takes two generic types. (instance.getClass retrieves the class of the associated instance; classOf[X] does the same for some type X when an instance isn't available. Since Map is an object, and objects are also instances, Map.getClass retrieves the class of the object Map, i.e. the Map trait's companion, not the trait itself.)
However, a second problem here is that scala.collection.immutable.Map is abstract (it's actually a trait), and so it cannot be instantiated as-is. (If you look at the type of Scala Map instances created via the companion's apply method, you'll see that they're actually instances of classes such as Map.EmptyMap or Map.Map1, etc.) As a consequence, that's why your modified code still produced an error.
However, the ultimate problem here is that you need, as you mentioned, a Java java.util.Map and not a Scala scala.collection.immutable.Map (which is what you get by default if you just write Map in a Scala program). Just one more thing to watch out for when converting Java code examples to Scala. ;-)
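A minimal sketch of the working version, assuming Jackson's jackson-dataformat-csv module and a java.io.Reader named reader as in the question. The key change is asking Jackson for a java.util.Map via classOf, since that is a class Jackson knows how to instantiate:
import com.fasterxml.jackson.databind.DeserializationFeature
import com.fasterxml.jackson.dataformat.csv.{CsvMapper, CsvSchema}

// Request java.util.Map, which Jackson can instantiate, instead of the class
// of the Scala Map companion object.
val iterator = new CsvMapper()
  .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
  .readerFor(classOf[java.util.Map[String, String]])
  .`with`(CsvSchema.emptySchema().withHeader())
  .readValues[java.util.Map[String, String]](reader)

while (iterator.hasNext) {
  println(iterator.next()) // each row is a java.util.Map[String, String]
}
If a Scala Map is needed afterwards, each row can be converted with scala.collection.JavaConverters (e.g. row.asScala.toMap).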

Related

Creating Spark Dataframes from regular classes

I have always seen that, when we are using a map function, we can create a dataframe from an RDD using a case class like below:
case class filematches(
row_num:Long,
matches:Long,
non_matches:Long,
non_match_column_desc:Array[String]
)
newrdd1.map(x=> filematches(x._1,x._2,x._3,x._4)).toDF()
This works great as we all know!!
I was wondering why we specifically need case classes here?
We should be able to achieve the same effect using normal classes with parameterized constructors (as they will be vals and not private):
class filematches1(
val row_num:Long,
val matches:Long,
val non_matches:Long,
val non_match_column_desc:Array[String]
)
newrdd1.map(x=> new filematches1(x._1,x._2,x._3,x._4)).toDF
Here, I am using the new keyword to instantiate the class.
Running the above gives me the error:
error: value toDF is not a member of org.apache.spark.rdd.RDD[filematches1]
I am sure I am missing some key concept on case classes vs regular classes here but not able to find it yet.
To resolve the error
value toDF is not a member of org.apache.spark.rdd.RDD[...]
you should move your case class definition out of the function where you are using it. You can refer to http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-Scala-Error-value-toDF-is-not-a-member-of-org-apache/td-p/29878 for more detail.
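A minimal sketch of what that looks like, assuming a SparkSession named spark and that newrdd1 is an RDD of 4-tuples, as the map in the question suggests:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// The case class is defined at the top level, not inside the method using it,
// so Spark's reflection can find an Encoder for it.
case class filematches(
  row_num: Long,
  matches: Long,
  non_matches: Long,
  non_match_column_desc: Array[String])

object Example {
  def toDataFrame(spark: SparkSession,
                  newrdd1: RDD[(Long, Long, Long, Array[String])]) = {
    import spark.implicits._ // brings the toDF syntax into scope
    newrdd1.map(x => filematches(x._1, x._2, x._3, x._4)).toDF()
  }
}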
On your other query: case classes are syntactic sugar, and they provide the following additional things.
Case classes are different from regular classes; they are especially useful for creating immutable objects.
They have a default apply function which is used as a constructor to create objects (so less code).
All the fields in a case class are vals by default, and hence immutable, which is a good thing in the Spark world, as all RDDs are immutable as well.
An example of a case class is
case class Book(name: String)
val book1 = Book("test")
You cannot change the value of book1.name as it is immutable, and you do not need to write new Book() to create an object here.
The class fields are public by default, so you don't need setters and getters.
Moreover, when comparing two instances of a case class, their contents are compared instead of their references.
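A quick sketch illustrating those points; Book and PlainBook are made-up names, and nothing Spark-specific is assumed:
case class Book(name: String)

val a = Book("test")  // no `new` needed: the generated apply acts as a factory
val b = Book("test")

println(a.name)       // "test" -- the field is a public val, no getter needed
// a.name = "other"   // does not compile: case class fields are immutable

println(a == b)       // true  -- case classes compare by contents

class PlainBook(val name: String)
println(new PlainBook("test") == new PlainBook("test")) // false -- reference equality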
Edit: Spark uses the following class to infer the schema.
Code link:
https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
If you check the schemaFor function (lines 719 to 791), it converts Scala types to Catalyst types. I think the case to handle non-case classes for schema inference has not been added yet, so every time you try to use a non-case class with schema inference, it falls through to the default branch and hence gives the error Schema for type $other is not supported.
Hope this helps

why redefine type variable in Predef.scala?

Why are these type aliases and vals introduced in Predef?
Things that are in Predef are automatically imported. In Scala, you can write
val mySet : Set[String] = Set( "cat", "dog", "poop" )
without having to first write
import scala.collection.immutable.Set
Note that the declaration I wrote above might have been written equivalently as
val mySet : Set[String] = Set.apply( "cat", "dog", "poop" )
On the right-hand side of the equals sign, the word Set refers to a singleton object. We can only call methods on objects (whether singletons or instances of classes). We can't call methods on types. Somehow, we must have autoimported the name of an object Set. This is what the Predef declaration
val Set = immutable.Set
does.
If you are a Java programmer, you can think of a val declaration that just points to an object that would otherwise require the use of an import or a much longer name as being the equivalent of import static.
The Set[String] after the colon and before the equals sign is a type annotation. In this context Set[String] is a type. An object is not a type. If all we had declared in Predef were the val, our declaration could not compile. We would have been saying that mySet's type is some particular object, but a type is very different from an object: a type is a description of a category to which an object may belong.
To let Set also serve as a type, we need the type alias
type Set[A] = immutable.Set[A]
If you are a Java programmer, this functions similarly to a non-static import of a type name. (Scala has imports as well, but they must be scoped to a specific block or file. Type aliases in Predef are available to all Scala files, so they do much more than an import of the type would do.)
The package scala.collection.immutable contains both an object, declared as
object Set{ ... }
and a type, declared as
trait Set[A]{ ... }
If we want to be able to use both the object (with its useful factory methods) and the type (the compile-time description of the objects our factory creates), we need to make both of them, object and type, available to our code. The two lines (for Set) that you quote above do precisely that. (And the other two lines do precisely the same for Map.)
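As an illustration, here is the same two-line pattern applied to a made-up alias (MyPredef and Dict are hypothetical names, used only to show the mechanism):
object MyPredef {
  // the type alias makes `Dict` usable in type positions...
  type Dict[A, B] = scala.collection.immutable.Map[A, B]
  // ...and the val makes `Dict` usable as the factory (companion) object
  val Dict = scala.collection.immutable.Map
}

import MyPredef._

val prices: Dict[String, Int] = Dict("apple" -> 2, "pear" -> 3)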
These types and vals just make importing unnecessary, I believe.
The vals point at the companion objects, again getting rid of an import.
I suspect they are done in this odd way to avoid a circular compile-time dependency between Predef.scala and Map/Set.scala, but I'm only guessing.
The whole point of Predef is to avoid explicit qualifications and import statements. From the docs:
The Predef object provides definitions that are accessible in all
Scala compilation units without explicit qualification.
Predef provides type aliases for types which are commonly used, such as the immutable collection types scala.collection.immutable.Map,
scala.collection.immutable.Set, and the
scala.collection.immutable.List constructors
(scala.collection.immutable.:: and scala.collection.immutable.Nil).
This way, you can refer to Map without having to explicitly qualify or import scala.collection.immutable.Map.

Reference a java nested class in Spark Scala

I'm trying to read some data from hadoop into an RDD in Spark using the interactive Scala shell but I'm having trouble accessing some of the classes I need to deserialise the data.
I start by importing the necessary class
import com.example.ClassA
Which works fine. ClassA is located in a jar in the 'jars' path and has ClassB as a public static nested class.
I'm then trying to use ClassB like so:
val rawData = sc.newAPIHadoopFile(dataPath, classOf[com.example.mapreduce.input.Format[com.example.ClassA$ClassB]], classOf[org.apache.hadoop.io.LongWritable], classOf[com.example.ClassA$ClassB])
This is slightly complicated by one of the other classes taking ClassB as a type, but I think that should be fine.
When I execute this line, I get the following error:
<console>:17: error: type ClassA$ClassB is not a member of package com.example
I have also tried using the import statement
import com.example.ClassA$ClassB
and it also seems fine with that.
Any advice as to how I could proceed to debug this would be appreciated.
Thanks for reading.
update:
Changing the '$' to a '.' to reference the nested class seems to get past this problem, although I then got the following syntax error:
<console>:17: error: inferred type arguments [org.apache.hadoop.io.LongWritable,com.example.ClassA.ClassB,com.example.mapreduce.input.Format[com.example.ClassA.ClassB]] do not conform to method newAPIHadoopFile's type parameter bounds [K,V,F <: org.apache.hadoop.mapreduce.InputFormat[K,V]]
Notice the types that the newAPIHadoopFile expects:
K,V,F <: org.apache.hadoop.mapreduce.InputFormat[K,V]
The important part here is that the generic type InputFormat expects the types K and V, i.e. the exact types of the first two parameters to the method.
In your case, the third parameter should be of type
F <: org.apache.hadoop.mapreduce.InputFormat[LongWritable, ClassA.ClassB]
Does your class extend FileInputFormat<LongWritable, V>?
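As a sketch, the call with explicit type parameters would look roughly like this (ClassA.ClassB and Format are the question's own classes, and sc and dataPath come from the shell session; this only compiles if Format[V] really does extend org.apache.hadoop.mapreduce.InputFormat[LongWritable, V]):
import org.apache.hadoop.io.LongWritable
import com.example.ClassA
import com.example.mapreduce.input.Format

val rawData = sc.newAPIHadoopFile[
    LongWritable,            // K: the key type
    ClassA.ClassB,           // V: the value type
    Format[ClassA.ClassB]](  // F, which must extend InputFormat[K, V]
  dataPath,
  classOf[Format[ClassA.ClassB]],
  classOf[LongWritable],
  classOf[ClassA.ClassB])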

Why does Scala complain about illegal inheritance when there are raw types in the class hierarchy?

I'm writing a wrapper that takes a Scala ObservableBuffer and fires events compatible with the Eclipse/JFace Databinding framework.
In the Databinding framework, there is an abstract ObservableList that decorates a normal Java list. I wanted to reuse this base class, but even this simple code fails:
val list = new java.util.ArrayList[Int]
val obsList = new ObservableList(list, null) {}
with errors:
illegal inheritance; anonymous class $anon inherits different type instances of trait Collection: java.util.Collection[E] and java.util.Collection[E]
illegal inheritance; anonymous class $anon inherits different type instances of trait Iterable: java.lang.Iterable[E] and java.lang.Iterable[E]
Why? Does it have to do with raw types? ObservableList implements IObservableList, which extends the raw type java.util.List. Is this expected behavior, and how can I work around it?
Having a Java raw type in the inheritance hierarchy causes this kind of problem. One solution is to write a tiny bit of Java to fix up the raw type, as in the answer to Scala class cant override compare method from Java Interface which extends java.util.comparator.
For more about why raw types are problematic for Scala, see this bug: http://lampsvn.epfl.ch/trac/scala/ticket/1737. That bug has a workaround using existential types that probably won't work for this particular case, at least not without a lot of casting, because the java.util.List type parameter is in both covariant and contravariant positions.
From looking at the Javadoc, the argument of the constructor isn't parameterized.
I'd try this:
val list = new java.util.ArrayList[_]
val obsList = new ObservableList(list, null) {}

Compile error in scala, why: val num =123;println(num.getClass())

I'm new to scala. I tried this code:
val name = "mike"
println(name.getClass())
It's OK and printed java.lang.String
But, when I try:
val num = 123
println(num.getClass())
There is such a compiler error:
type mismatch; found : Int required: ?{val getClass: ?} Note: primitive types are not implicitly
converted to AnyRef. You can safely force boxing by casting x.asInstanceOf[AnyRef].
I remember Scala says "Everything is an object in Scala", so why can't I invoke num.getClass()? And how do I fix it?
Yep, everything is an object, but not necessarily an instance of a Java class / something with a getClass() method :)
Java primitive values (and Unit) are AnyVals in Scala (instances of so-called value classes), and, whenever possible, they are compiled down to Java primitives in the end. When that's not possible, boxing is done (similar to autoboxing in Java). But, as the error reports, boxing did not happen ("implicitly") in your case, and value classes don't have a getClass() method, hence the compilation error.
Java classes are AnyRefs (an instance of a reference class is a class instance in Java). getClass works fine on them: AnyRef is practically the same as java.lang.Object, so it also has a getClass() method that you can call.
As the error recommends, you can force the boxing; then getClass() will work on it:
num.asInstanceOf[AnyRef].getClass
will print
class java.lang.Integer
If you want to avoid boxing (e.g. you want to differentiate between primitive and boxed values) have a look at HowTo get the class of _ :Any
The getClass method is only available for reference classes (i.e. scala.AnyRef). 123 is a member of a value class (i.e. scala.AnyVal) and thus does not have a getClass method.
See http://www.scala-lang.org/node/128 for the Scala object hierarchy. And www.scala-lang.org/docu/files/api/scala/AnyRef.html for AnyRef.
"Everything is an object" doesn't mean every object has a getClass method.
As the compiler says, 123.asInstanceOf[AnyRef].getClass would work.
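Putting the two cases side by side (this reflects the Scala versions around when the question was asked; newer compilers also accept 123.getClass directly):
val name = "mike"
println(name.getClass)                      // class java.lang.String -- String is an AnyRef

val num = 123                               // Int is an AnyVal (a value class)
println(num.asInstanceOf[AnyRef].getClass)  // class java.lang.Integer -- via forced boxing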