Found Option[CSVWriter] but required CSVWriter? `var` caused the issue? - scala

The following code
var writers = new HashMap[String, CSVWriter]()
var writer = writers.get(pool)
if (writer == null) {
  //writer = new CSVWriter(new FileWriter(s"..."))
  writers.put(pool, writer) // Errr
}
I got the following error:
[error] found : Option[au.com.bytecode.opencsv.CSVWriter]
[error] required: au.com.bytecode.opencsv.CSVWriter
[error] writers.put(pool, writer)
[error] ^
[error] one error found
Does var automatically add an Option wrapper? What should I do to put a CSVWriter in the HashMap?

When a value is retrieved from a HashMap in Scala using .get(key), the returned value is an Option instance.
scala> val writers = collection.mutable.HashMap("abc" -> "def")
writers: scala.collection.mutable.HashMap[String,String] = Map(abc -> def)
scala> writers.get("abc")
res10: Option[String] = Some(def)
get returns a Some instance if the key exists in the HashMap, and None if the key is not present.
scala> writers.get("a")
res13: Option[String] = None
This helps avoid NullPointerExceptions.
To get the underlying value, we need to call .get on the Option:
scala> writers.get("abc").get
res11: String = def
But to be on the safe side we can use getOrElse, which falls back to a default when the key is missing:
scala> writers.getOrElse("abc", "no value")
res1: String = def
scala> writers.getOrElse("a", "no value")
res2: String = no value
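To answer the original question directly: writers.get(pool) returns an Option, never null, so the null check is never true. A minimal sketch of a fix using getOrElseUpdate, which returns the existing writer or creates, stores, and returns a new one (the file name is a placeholder, since the original path was elided):
import java.io.FileWriter
import scala.collection.mutable.HashMap
import au.com.bytecode.opencsv.CSVWriter

val writers = new HashMap[String, CSVWriter]()
val pool = "poolA" // example key
// Returns the writer already stored under pool, or evaluates the default,
// puts it in the map under pool, and returns it.
val writer = writers.getOrElseUpdate(pool, new CSVWriter(new FileWriter(s"$pool.csv"))) // hypothetical file name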
I hope the explanation is clear.

Related

scala spark type mismatching

I need to group my RDD by two columns and aggregate the count. I have a function:
def constructDiagnosticFeatureTuple(diagnostic: RDD[Diagnostic]): RDD[FeatureTuple] = {
  val grouped_patients = diagnostic
    .groupBy(x => (x.patientID, x.code))
    .map(_._2)
    .map { events =>
      val p_id = events.map(_.patientID).take(1).mkString
      val f_code = events.map(_.code).take(1).mkString
      val count = events.size.toDouble
      ((p_id, f_code), count)
    }
  //should be in form:
  //diagnostic.sparkContext.parallelize(List((("patient", "diagnostics"), 1.0)))
}
At compile time, I am getting an error:
/FeatureConstruction.scala:38:3: type mismatch;
[error] found : Unit
[error] required: org.apache.spark.rdd.RDD[edu.gatech.cse6250.features.FeatureConstruction.FeatureTuple]
[error] (which expands to) org.apache.spark.rdd.RDD[((String, String), Double)]
[error] }
[error] ^
How can I fix it?
I read this post: Scala Spark type missmatch found Unit, required rdd.RDD, but I do not use collect(), so it does not help me.
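The error message itself points at the fix: the last statement in the method body is a val definition (followed only by comments), and a definition has type Unit, so the whole method evaluates to Unit rather than the declared RDD[FeatureTuple]. A minimal sketch of the fix, making the grouped RDD the final expression:
def constructDiagnosticFeatureTuple(diagnostic: RDD[Diagnostic]): RDD[FeatureTuple] = {
  val grouped_patients = diagnostic
    .groupBy(x => (x.patientID, x.code))
    .map(_._2)
    .map { events =>
      val p_id = events.map(_.patientID).take(1).mkString
      val f_code = events.map(_.code).take(1).mkString
      ((p_id, f_code), events.size.toDouble)
    }
  grouped_patients // the last expression is now the RDD, not a Unit-typed definition
}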

Scala type mismatch in map operation

I am trying a map operation on a Spark DStream in the code below:
val hashesInRecords: DStream[(RecordKey, Array[Int])] = records.map(record => {
  val hashes: List[Int] = calculateIndexing(record.fields())
  val ints: Array[Int] = hashes.toArray(Array.ofDim[Int](hashes.length))
  (new RecordKey(record.key, hashes.length), ints)
})
The code looks fine in IntelliJ, but when I try to build I get an error which I don't really understand:
Error:(53, 61) type mismatch;
found : Array[Int]
required: scala.reflect.ClassTag[Int]
val ints: Array[Int] = hashes.toArray(Array.ofDim[Int](hashes.length))
This error remains even after I add the type in the map operation like so:
records.map[(RecordKey, Array[Int])](record => {...
This happens because Scala's toArray takes an implicit scala.reflect.ClassTag, not a target array like Java's Collection.toArray(T[]); passing the array fills the implicit parameter slot, hence "found: Array[Int], required: ClassTag[Int]". The version below should fix your problem, and it also avoids the call to List.length, which is O(N), using Array.length, which is O(1), instead.
val hashesInRecords: DStream[(RecordKey, Array[Int])] = records.map { record =>
  val ints = calculateIndexing(record.fields()).toArray
  (new RecordKey(record.key, ints.length), ints)
}
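For reference, the implicit ClassTag is supplied automatically by the compiler when the element type is known, so a bare toArray is all that is needed:
scala> List(1, 2, 3).toArray
res0: Array[Int] = Array(1, 2, 3)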

How to access individual values in a Map[String,(String, String)]

How can I access the individual values in a Map of type Map[String,(String, String)]? Based on the input string, I want to return value(String1) or value(String2) if the argument matches a key, or to return the argument itself if there is no match.
val mappeddata = Map("LOWES" -> ("Lowes1","Lowes2"))
Updated.
The code below works when none of the values are empty:
scala> mappeddata.find(_._1 == "LOWES").map(_._2._2).getOrElse("LOWES")
res135: String = Lowes2
scala> mappeddata.find(_._1 == "LOWES").map(_._2._1).getOrElse("LOWES")
res136: String = Lowes1
But if the value is empty, I want to return the input string itself; instead it returns the empty string:
scala> val mappeddata = Map("LOWES" -> ("Lowes1",""))
mappeddata: scala.collection.immutable.Map[String,(String, String)] = Map(LOWES -> (Lowes1,""))
scala> mappeddata.find(_._1 == "LOWES").map(_._2._2).getOrElse("LOWES")
res140: String = ""
what needs to be done to fix this?
Basically you are asking how to get the values part of a Map. In the example below, I am extracting Lowes2:
val m = Map("LOWES" -> ("Lowes1","Lowes2"), "Other" -> ("other1","other2"))
println(m.get("LOWES").get._2) // prints Lowes2
Not sure what you want, but maybe this is helpful:
val m = Map[String, (String, String)]()
val value = m("first") // the value if the key exists, otherwise throws an exception
val valueOpt: Option[(String, String)] = m.get("first") // as an Option
val values: List[(String, String)] = m.values.toList // list of all values
This works.
scala> if (mappeddata.get("LOWES").get._1.isEmpty) "LOWES" else mappeddata.get("LOWES").get._1
res163: String = Lowes1
scala> if (mappeddata.get("LOWES").get._2.isEmpty) "LOWES" else mappeddata.get("LOWES").get._2
res164: String = LOWES
//Updated
scala> if (mappeddata("LOWES")._1.isEmpty) "LOWES" else mappeddata("LOWES")._1
res163: String = Lowes1
scala> if (mappeddata("LOWES")._2.isEmpty) "LOWES" else mappeddata("LOWES")._2
res164: String = LOWES
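Putting the pieces together, a minimal sketch of a helper that returns the chosen tuple element, or the input string when the key is missing or the value is empty (the name lookup and the key "HOMEDEPOT" are just for illustration):
val mappeddata = Map("LOWES" -> ("Lowes1", ""))

// pick selects which tuple element to return; filter(_.nonEmpty)
// turns an empty value into None so getOrElse falls back to the key.
def lookup(key: String, pick: ((String, String)) => String): String =
  mappeddata.get(key).map(pick).filter(_.nonEmpty).getOrElse(key)

lookup("LOWES", _._1)     // "Lowes1"
lookup("LOWES", _._2)     // "LOWES", because the second value is empty
lookup("HOMEDEPOT", _._1) // "HOMEDEPOT", because the key is missing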

Why do I get a type mismatch in scala Spark?

First, I read a text file and turn it into RDD[(String,(String,Float))]:
val data = sc.textFile(dataInputPath)
val dataRDD: RDD[(String, (String, Float))] = data.map { f =>
  val temp = f.split("\\x01") // split on the \x01 delimiter
  (temp(0), (temp(1), temp(2).toFloat))
}
Then I run the following code to turn my data into the Rating type:
import org.apache.spark.mllib.recommendation.Rating
val imeiMap = dataRDD.reduceByKey((s1,s2)=>s1).collect().zipWithIndex.toMap;
val docidMap = dataRDD.map( f=>(f._2._1,1)).reduceByKey((s1,s2)=>s1).collect().zipWithIndex.toMap;
val ratings = dataRDD.map{case (imei, (doc_id,rating))=> Rating(imeiMap(imei),docidMap(doc_id),rating)};
But I got an error:
Error:(32, 77) type mismatch;
found : String
required: (String, (String, Float))
val ratings = dataRDD.map{case (imei, (doc_id,rating))=> Rating(imeiMap(imei),docidMap(doc_id),rating)};
Why does this happen? I think the string has already been changed to (String, (String, Float)).
The key of docidMap is not a String; it is a tuple (String, Int).
This is because you have the zipWithIndex before the .toMap method:
With this RDD as input for a quick test:
(String1,( String2,32.0))
(String1,( String2,35.0))
scala> val docidMap = dataRDD.map( f=>(f._2._1,1)).reduceByKey((s1,s2)=>s1).collect().zipWithIndex.toMap;
docidMap: scala.collection.immutable.Map[(String, Int),Int] = Map((" String2",1) -> 0)
scala> val docidMap = dataRDD.map( f=>(f._2._1,1)).reduceByKey((s1,s2)=>s1).collect().toMap;
docidMap: scala.collection.immutable.Map[String,Int] = Map(" String2" -> 1)
The same will happen with your imeiMap; it seems that you just need to remove the zipWithIndex from there too:
val imeiMap = dataRDD.reduceByKey((s1,s2)=>s1).collect.toMap
It is not about your dataRDD, it is about imeiMap:
imeiMap: scala.collection.immutable.Map[(String, (String, Float)),Int]
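Note, though, that Rating takes Int user and product IDs, so dropping zipWithIndex entirely leaves imeiMap with (String, Float) values rather than Ints. A minimal sketch that keeps an Int index by zipping over the distinct keys only (assuming each imei and doc_id should get its own index):
val imeiMap = dataRDD.keys.distinct().collect().zipWithIndex.toMap          // Map[String, Int]
val docidMap = dataRDD.map(_._2._1).distinct().collect().zipWithIndex.toMap // Map[String, Int]
val ratings = dataRDD.map { case (imei, (doc_id, rating)) =>
  Rating(imeiMap(imei), docidMap(doc_id), rating)
}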

Problem with outputting map values in scala

I have the following code snippet:
val map = new LinkedHashMap[String,String]
map.put("City","Dallas")
println(map.get("City"))
This outputs Some(Dallas) instead of just Dallas. What's the problem with my code?
Thank you.
Use the apply method; it returns the String directly and throws a NoSuchElementException if the key is not found:
scala> import scala.collection.mutable.LinkedHashMap
import scala.collection.mutable.LinkedHashMap
scala> val map = new LinkedHashMap[String,String]
map: scala.collection.mutable.LinkedHashMap[String,String] = Map()
scala> map.put("City","Dallas")
res2: Option[String] = None
scala> map("City")
res3: String = Dallas
It's not really a problem.
While Java's Map uses null to indicate that a key doesn't have an associated value, Scala's Map[A,B].get returns an Option[B], which can be Some[B] or None; None plays a similar role to Java's null.
REPL session showing why this is useful:
scala> map.get("State")
res6: Option[String] = None
scala> map.get("State").getOrElse("Texas")
res7: String = Texas
Or the simple but not recommended get:
scala> map.get("City").get
res8: String = Dallas
scala> map.get("State").get
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:262)
Check the Option documentation for more goodies.
There are two more ways you can handle Option results.
You can pattern match them:
scala> map.get("City") match {
| case Some(value) => println(value)
| case _ => println("found nothing")
| }
Dallas
Or there is another neat approach that appears somewhere in Programming in Scala: use foreach to process the result. If the result is a Some, its value is passed to the function; otherwise (if it is None), nothing happens:
scala> map.get("City").foreach(println)
Dallas
scala> map.get("Town").foreach(println)