Using contains in scala - exception

I am encountering this error:
java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to [Ljava.lang.Object;
whenever I try to use "contains" to find if a string is inside an array. Is there a more appropriate way of doing this? Or, am I doing something wrong? (I am fairly new to Scala)
Here is the code:
val matches = Set[JSONObject]()
val config = new SparkConf()
val sc = new SparkContext("local", "SparkExample", config)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val ebay = sqlContext.read.json("/Users/thomassquires/Downloads/products.json")
val catalogue = sqlContext.read.json("/Users/thomassquires/Documents/catalogue2.json")
val eins = ebay.map(item => (item.getAs[String]("ID"), Option(item.getAs[Set[Row]]("itemSpecifics"))))
.filter(item => item._2.isDefined)
.map(item => (item._1 , item._2.get.find(x => x.getAs[String]("k") == "EAN")))
.filter(x => x._2.isDefined)
.map(x => (x._1, x._2.get.getAs[String]("v")))
.collect()
def catEins = catalogue.map(r => (r.getAs[String]("_id"), Option(r.getAs[Array[String]]("item_model_number")))).filter(r => r._2.isDefined).map(r => (r._1, r._2.get)).collect()
def matched = for(ein <- eins) yield (ein._1, catEins.filter(z => z._2.contains(ein._2)))
The exception occurs on the last line. I have tried a few different variants.
My data structure is one List[Tuple2[String, String]] and one List[Tuple2[String, Array[String]]] . I need to find the zero or more matches from the second list that contain the string.
Thanks

Long story short (there is still a part that eludes me here*): you're using the wrong types. getAs is implemented as fieldIndex (String => Int), followed by get (Int => Any), followed by asInstanceOf.
Since Spark stores array column data as a WrappedArray rather than as an Array or a Set, calls like getAs[Array[String]] or getAs[Set[Row]] are not valid. If you want a specific type, use either getAs[Seq[T]] or getSeq[T] (which takes the field index) and convert the result to the desired type with toSet / toArray.
* See Why wrapping a generic method call with Option defers ClassCastException?
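The mechanism can be reproduced without Spark. The following is a minimal sketch, not Spark's actual code: FakeRow is a hypothetical stand-in for Row whose getAs does nothing but an unchecked cast, and the column values are made up. It shows why asking for Seq[String] works while asking for Array[String] fails only when the result is used as an array.

```scala
import scala.util.Try

object GetAsSketch {
  // Hypothetical stand-in for Spark's Row: fields are stored as Any and cast on read.
  final case class FakeRow(fields: Map[String, Any]) {
    def getAs[T](name: String): T = fields(name).asInstanceOf[T]
  }

  // The array column is held as a Seq (here a List), never as an Array.
  val row = FakeRow(Map("_id" -> "p1", "item_model_number" -> Seq("123", "456")))

  // Requesting the type the data really has works, and contains behaves as expected.
  val models: Seq[String] = row.getAs[Seq[String]]("item_model_number")
  val asArray: Array[String] = models.toArray
  val hasModel: Boolean = models.contains("123")

  // Requesting Array[String] compiles, but the cast fails once the value is used as an array.
  val wrongType: Try[Int] = Try {
    val a: Array[String] = row.getAs[Array[String]]("item_model_number")
    a.length
  }
}
```

With the column read as Seq[String], the original filter(z => z._2.contains(ein._2)) should work unchanged.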

Related

Converting Iterable[(Double, Double)] to Seq(Seq(Double))

I want to convert the values of a pair RDD "myRDD" (groupedrdd in the code below) from Iterable[(Double,Double)] to Seq[Seq[Double]], but I am not sure how to do it. I tried the following, but it does not work.
val groupedrdd: RDD[(BB, Iterable[(Double,Double)])] = RDDofPoints.groupByKey()
val RDDofSeq = groupedrdd.mapValues{case (x,y) => Seq(x,y)}
myRDD is formed using a groupByKey operation on RDDofPoints, with the respective bounding boxes as keys. BB is a case class and is the key for a set of points of type (Double,Double). I want RDDofSeq to have the type RDD[(BB, Seq[Seq[Double]])], but after groupByKey, myRDD has the type RDD[(BB, Iterable[(Double,Double)])].
Here, it gives an error as:
Error:(107, 58) constructor cannot be instantiated to expected type;
found : (T1, T2)
required: Iterable[(Double, Double)]
I am new to Scala, any help in this regard is appreciated. Thanks.
ANSWER : The following is used to accomplish the above goal:
val RDDofSeq = groupedrdd.mapValues{iterable => iterable.toSeq.map{case (x,y) => Seq(x,y)}}
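The same transformation can be tried without Spark. This is only a sketch: a plain Map stands in for the grouped pair RDD, and both the BB definition and the sample points are made up for illustration.

```scala
object GroupedPointsSketch {
  // Hypothetical bounding-box key; the real BB case class is not shown in the question.
  final case class BB(id: Int)

  // Stand-in for the result of groupByKey: each key maps to an Iterable of points.
  val grouped: Map[BB, Iterable[(Double, Double)]] = Map(
    BB(1) -> List((1.1, 1.2), (2.1, 2.2)),
    BB(2) -> List((3.0, 4.0))
  )

  // Convert every point of every group into a two-element Seq.
  val asSeqs: Map[BB, Seq[Seq[Double]]] =
    grouped.map { case (bb, points) =>
      bb -> points.toSeq.map { case (x, y) => Seq(x, y) }
    }
}
```

The pattern match on (x, y) is applied to each point inside the Iterable, not to the Iterable itself, which is what the original mapValues call got wrong.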
I tried this on Scalafiddle
val myRDD: Iterable[(Double,Double)] = Seq((1.1, 1.2), (2.1, 2.2))
val RDDofSeq = myRDD.map{case (x,y) => Seq(x,y)}
println(RDDofSeq) // returns List(List(1.1, 1.2), List(2.1, 2.2))
The only difference is that I used myRDD.map(.. instead of myRDD.mapValues(..
Make sure that myRDD is really of the type Iterable[(Double,Double)]!
Update after comment:
If I understand you correctly you want a Seq[Double] and not a Seq[Seq[Double]]
That would be this:
val RDDofSeq = myRDD.map{case (k,v) => v} // returns List(1.2, 2.2)
Update after the Type is now clear:
The values are of type Iterable[(Double,Double)] so you cannot match on a pair.
Try this:
val RDDofSeq = groupedrdd.mapValues{iterable =>
Seq(iterable.head._1, iterable.head._2)}
You just need map, not mapValues.
val RDDofSeq = myRDD.map{case (x,y) => Seq(x,y)}

Scala - functional methods - expected NotInferedA

I have two Strings which represent the contents of two HTML pages. I want to remove whitespace and comments, compute the Levenshtein distance between them, and on that basis decide whether they are similar or not.
I created functions:
val removeWhiteSpacesAndHtmlComments: String => String = _.replaceAll("\\s+"," ").replaceAll("<!--.*?-->","")
val prepareContents: (String,String) => (String,String) = (s1,s2) => (removeWhiteSpacesAndHtmlComments.apply(s1), removeWhiteSpacesAndHtmlComments(s2))
val computeLevenshteinDistance:(String,String) => Int = StringUtils.getLevenshteinDistance(_,_)
val areContentsSimilarEnough: Int => Boolean = _ <= 50
I want to combine all those functions into a flow:
val isHtmlContentChanged: (String,String) => Boolean = prepareContents.tupled andThen computeLevenshteinDistance andThen areContentsSimilarEnough
Unfortunately, at the computeLevenshteinDistance part I get a compile error:
Type mismatch, expected: (String,String) => NotInferedA, actual: (String,String)=>Int
How can I solve this?
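Add .tupled to computeLevenshteinDistance. prepareContents returns a single (String,String) tuple, so the function composed after it with andThen must accept one tuple argument rather than two separate Strings. Note that the composed function then also takes one tuple argument; wrap it in Function.untupled if you want to keep the declared type (String,String) => Boolean.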
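A minimal sketch of the composed flow follows. It makes some assumptions: the levenshtein helper is a hand-written stand-in for StringUtils.getLevenshteinDistance so that the example needs no external library, whitespace is collapsed to a single space, and the names clean, tupledFlow and areSimilar are made up.

```scala
object ComposeSketch {
  val clean: String => String =
    _.replaceAll("\\s+", " ").replaceAll("<!--.*?-->", "")

  val prepareContents: (String, String) => (String, String) =
    (s1, s2) => (clean(s1), clean(s2))

  // Stand-in for StringUtils.getLevenshteinDistance (row-by-row dynamic programming).
  def levenshtein(a: String, b: String): Int = {
    var prev = (0 to b.length).toArray
    for (i <- a.indices) {
      val cur = new Array[Int](b.length + 1)
      cur(0) = i + 1
      for (j <- b.indices) {
        val cost = if (a(i) == b(j)) 0 else 1
        cur(j + 1) = math.min(math.min(cur(j) + 1, prev(j + 1) + 1), prev(j) + cost)
      }
      prev = cur
    }
    prev(b.length)
  }

  val computeLevenshteinDistance: (String, String) => Int = (a, b) => levenshtein(a, b)
  val areContentsSimilarEnough: Int => Boolean = _ <= 50

  // Both two-argument functions are tupled so that each stage takes exactly one argument.
  val tupledFlow: ((String, String)) => Boolean =
    prepareContents.tupled andThen computeLevenshteinDistance.tupled andThen areContentsSimilarEnough

  // Function.untupled restores the two-argument shape.
  val areSimilar: (String, String) => Boolean = Function.untupled(tupledFlow)
}
```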

Scala Nested HashMaps, how to access Case Class value properties?

New to Scala, continue to struggle with Option related code. I have a HashMap built of Case Class instances that themselves contain hash maps with Case Class instance values. It is not clear to me how to access properties of the retrieved Class instances:
import collection.mutable.HashMap
case class InnerClass(name: String, age: Int)
case class OuterClass(name: String, nestedMap: HashMap[String, InnerClass])
// Load some data...hash maps are mutable
val innerMap = new HashMap[String, InnerClass]()
innerMap += ("aaa" -> InnerClass("xyz", 0))
val outerMap = new HashMap[String, OuterClass]()
outerMap += ("AAA" -> OuterClass("XYZ", innerMap))
// Try to retrieve data
val outerMapTest = outerMap.get("AAA")
val nestedMap = outerMapTest.nestedMap
This produces error: value nestedMap is not a member of Option[ScalaFiddle.OuterClass]
// Try to retrieve data a different way
val outerMapTest = outerMap.getOrElse("AAA", None)
val nestedMap = outerMapTest.nestedMap
This produces error: value nestedMap is not a member of Product with Serializable
Please advise on how I would go about getting access to outerMapTest.nestedMap. I'll eventually need to get values and properties out of the nestedMap HashMap as well.
You are using .getOrElse("someKey", None), whose result type is the common supertype of OuterClass and None, namely Product with Serializable (not OuterClass, as you expected):
scala> val outerMapTest = outerMap.getOrElse("AAA", None)
outerMapTest: Product with Serializable = OuterClass(XYZ,Map(aaa -> InnerClass(xyz,0)))
So the Product either needs to be pattern matched or cast to OuterClass.
Pattern match example:
scala> outerMapTest match { case x : OuterClass => println(x.nestedMap); case _ => println("is not outerclass") }
Map(aaa -> InnerClass(xyz,0))
Casting example, which is a terrible idea because it throws a ClassCastException when outerMapTest is None (pattern matching is favored over casting):
scala> outerMapTest.asInstanceOf[OuterClass].nestedMap
res30: scala.collection.mutable.HashMap[String,InnerClass] = Map(aaa -> InnerClass(xyz,0))
But a better way of solving it is simply to use .get, which gives you an Option[OuterClass]:
scala> outerMap.get("AAA").map(outerClass => outerClass.nestedMap)
res27: Option[scala.collection.mutable.HashMap[String,InnerClass]] = Some(Map(aaa -> InnerClass(xyz,0)))
For a key that does not exist, it gives you None:
scala> outerMap.get("I dont exist").map(outerClass => outerClass.nestedMap)
res28: Option[scala.collection.mutable.HashMap[String,InnerClass]] = None
Here are some steps you can take to get deep inside a nested structure like this.
outerMap.lift("AAA") // Option[OuterClass]
.map(_.nestedMap) // Option[HashMap[String,InnerClass]]
.flatMap(_.lift("aaa")) // Option[InnerClass]
.map(_.name) // Option[String]
.getOrElse("no name") // String
Notice that if either of the inner or outer maps doesn't have the specified key ("aaa" or "AAA" respectively) then the whole thing will safely result in the default string ("no name").
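The same chain can also be written as a for-comprehension over the two Option lookups. This is a sketch using the case classes from the question; the innerName helper is a made-up name.

```scala
import scala.collection.mutable.HashMap

object NestedLookupSketch {
  case class InnerClass(name: String, age: Int)
  case class OuterClass(name: String, nestedMap: HashMap[String, InnerClass])

  val innerMap = HashMap("aaa" -> InnerClass("xyz", 0))
  val outerMap = HashMap("AAA" -> OuterClass("XYZ", innerMap))

  // Each generator short-circuits to None if its key is missing.
  def innerName(outerKey: String, innerKey: String): String = {
    val maybeName = for {
      outer <- outerMap.get(outerKey)
      inner <- outer.nestedMap.get(innerKey)
    } yield inner.name
    maybeName.getOrElse("no name")
  }
}
```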
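A HashMap's get already returns None if a key is not found, so it is unnecessary to call getOrElse with None as the default.
A simple solution to your problem would be to use get only, as below. Be aware that calling .get on the resulting Option throws a NoSuchElementException if the key is missing.
Change your first lookup to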
val outerMapTest = outerMap.get("AAA").get
you can check the output as
println(outerMapTest.name)
println(outerMapTest.nestedMap)
And change the second get as
val nestedMap = outerMapTest.nestedMap.get("aaa").get
You can test the outputs as
println(nestedMap.name)
println(nestedMap.age)
Hope this is helpful
You want
val maybeInner = outerMap.get("AAA").flatMap(_.nestedMap.get("aaa"))
val maybeName = maybeInner.map(_.name)
Which, if you're feeling adventurous, you can get with
val name: String = maybeName.get
But that will throw a NoSuchElementException if it's not there, i.e. if it's a None.
You can access the nestedMap using the expression below.
scala> outerMap.get("AAA").map(_.nestedMap).getOrElse(HashMap())
res5: scala.collection.mutable.HashMap[String,InnerClass] = Map(aaa -> InnerClass(xyz,0))
If "AAA" didn't exist in outerMap, the expression above would have returned an empty HashMap, as given by the .getOrElse argument (HashMap()).

Unable to use collectAsMap() in scala code

val titleMap = movies.map(line => line.split("\\|")).take(2)
//converting movie-id and movie name as map(key-pair)
val title1 = titleMap.map(array=>(array(0).toInt,array(1)))
val titles = movies
  .map(line => line.split("\\|").take(2))
  .map(array => (array(0).toInt, array(1)))
  .collectAsMap()
What's wrong here with "title1"? I am unable to apply the collectAsMap function to it, but I can apply the same thing to "titles".
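title1 is not an RDD, so it doesn't have the method collectAsMap(). In titleMap, take(2) is called on the RDD itself; take is an action that returns a local Array, so title1 is an Array[(Int, String)], which you can turn into a Map with toMap.
titles is an RDD, so it does have the method collectAsMap(). There, take(2) is called inside the map function, on the Array produced by split, so the result is still an RDD.
I'd advise reading up on types: https://en.wikipedia.org/wiki/Type_safety, https://en.wikipedia.org/wiki/Type_system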
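Once the data is a local Array, the usual collection methods apply. A small sketch, where the sample lines are invented and a plain Array stands in for what take(2) returns:

```scala
object TitleMapSketch {
  // Made-up sample lines in the pipe-delimited layout the question implies.
  val lines = Array(
    "1|Toy Story (1995)|01-Jan-1995",
    "2|GoldenEye (1995)|01-Jan-1995"
  )

  // Stand-in for movies.map(...).take(2): already a local Array, not an RDD.
  val titleMap: Array[Array[String]] = lines.map(_.split("\\|")).take(2)

  // On a local Array of pairs, toMap does what collectAsMap does on a pair RDD.
  val title1: Map[Int, String] = titleMap.map(array => (array(0).toInt, array(1))).toMap
}
```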

Convert java.util.Map to Scala List[NewObject]

I have a java.util.Map[String, MyObject] and want to create a Scala List[MyNewObject] consisting of all entries of the map with some special values.
I found a way but, well, this is really ugly:
val result = ListBuffer[MyNewObject]()
myJavaUtilMap.forEach(
  (es: Entry[String, MyObject]) =>
    { result += MyNewObject(es.getKey(), es.getValue().getMyParameter); println("Aa") }
)
How can I get rid of the println("Aa")? Just deleting it does not help, because forEach needs a Consumer but the += operation yields the ListBuffer.
Is there a more elegant way to convert the java.util.Map to a List[MyNewObject]?
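Scala has conversions that give you all the nice methods of the Scala collection API on Java collections. (JavaConversions, used here, is deprecated since Scala 2.12 and removed in 2.13; on those versions use the explicit .asScala from scala.collection.JavaConverters or scala.jdk.CollectionConverters instead.)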
import collection.JavaConversions._
val result = myJavaUtilMap.map{
case (k,v) => MyNewObject(k, v.getMyParameter)
}.toList
By the way: to define a function which returns Unit, you can explicitly specify the return type:
val f = (x: Int) => x: Unit
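On Scala 2.13 or later, the conversion shown above can be written with the explicit asScala converter. This is a sketch under assumptions: MyObject and MyNewObject are hypothetical stand-ins for the asker's classes, and the map contents are made up.

```scala
import scala.jdk.CollectionConverters._

object JavaMapSketch {
  // Hypothetical stand-ins for the classes in the question.
  final case class MyObject(getMyParameter: String)
  final case class MyNewObject(key: String, parameter: String)

  val myJavaUtilMap = new java.util.HashMap[String, MyObject]()
  myJavaUtilMap.put("a", MyObject("x"))

  // asScala wraps the Java map in a Scala view, so map and toList are available.
  val result: List[MyNewObject] =
    myJavaUtilMap.asScala.map { case (k, v) => MyNewObject(k, v.getMyParameter) }.toList
}
```

The explicit .asScala call makes it visible where the Java collection is wrapped, which the implicit JavaConversions import hides.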