Hi, I am very new to Scala and Spark. I am writing a test to check the integrity of my data. For this I have a CoordinateMatrix, and I put it into the result map. In my testing method I need to fetch it from the result map and cast it back to CoordinateMatrix, but it raises an exception:
Exception in thread "main" java.lang.ClassCastException: scala.Some cannot be cast to org.apache.spark.mllib.linalg.distributed.CoordinateMatrix
This is my code:
def SinghTest(map: Map[String, Any]): Boolean = {
  var res: Boolean = false
  val connection = DriverManager.getConnection("Connectionstring ")
  val statement = connection.createStatement()
  val rs = statement.executeQuery(
    "select A,B from Demo P" +
      " join Demo_REL R on P.id=R.ID" +
      " join Cpu CN on CN.id=R.CID" +
      " limit 10")
  /***
   * Mapping with ResultMap
   ***/
  val matrix = map.get("MatrixEntries").asInstanceOf[CoordinateMatrix]
  matrix.entries.take(10).foreach(x => {
    val ph = x.i
    val ch = x.j
    val pid = rs.getLong(1)
    val cid = rs.getLong(2)
    if ((ph != pid) && (ch != cid))
      throw new Exception("Fail")
  })
  res
}
The get method on maps does not return the element directly, but an Option of it. This means that for a Map[String, Any] the result type is Option[Any]. An Option either contains a value or is empty; if your map contains the key, you will get a Some with the value, otherwise a None. You can then operate on the value using the methods on Option, or get it via getOrElse, which takes a default value to use if it was a None.
val matrix = map.getOrElse("MatrixEntries", someDefaultMatrix).asInstanceOf[CoordinateMatrix]
If you are sure that the map contains the key, you can access the element directly by using map(key), just leaving out the .get. This gives you the element directly, but throws an exception if the key is not present in the map.
val matrix = map("MatrixEntries").asInstanceOf[CoordinateMatrix]
PS: Note that using Any is usually considered bad style, as it throws away any type safety. If your map contains a mostly fixed set of keys and you have control over its creation (i.e. it's not from a library), look into replacing it with a case class:
case class TestData(matrixEntries: CoordinateMatrix /* further elements here */)
// ...
val matrix = testData.matrixEntries // no casting required, type errors checked at compile time
I did it this way, with the help of @crater2150:
val matrix = map("MatrixEntries").asInstanceOf[CoordinateMatrix] // But the conversion is required.
Related
I have a Map containing two Scala objects as values, keyed by unique strings.
val vv = Map("N"-> Nconstant, "M"-> Mconstant)
Here Nconstant and Mconstant are two objects that hold constant values. I then try to access the constant variable inside the object by passing the key, as below:
val contract = vv("N").contractVal
contractVal is the variable that holds the values; it exists inside both Mconstant and Nconstant.
But IntelliJ is showing
"Cannot resolve symbol contractVal".
Can anyone help with this issue?
As an addition to Tim's answer: in case your objects have a common field but don't share a common type, you can use structural typing (duck typing):
object Nconstant {
  val contractVal = "N"
}
object Mconstant {
  val contractVal = "M"
}
object Xconstant // hypothetical third object without a contractVal field
val vv = Map("N" -> Nconstant, "M" -> Mconstant, "X" -> Xconstant)
import scala.language.reflectiveCalls
vv("N").asInstanceOf[{ val contractVal: String }].contractVal //N
But beware: it will fail at runtime if the value doesn't actually have a contractVal field!
It sounds like Nconstant and Mconstant are different types that happen to have the same field contractVal. If so, you need to determine which type you have by using match:
val contract = vv("N") match {
  case n: Nconstant.type => n.contractVal
  case m: Mconstant.type => m.contractVal
}
This will throw a MatchError if the value is neither Nconstant nor Mconstant.
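If you control the definitions of Nconstant and Mconstant, a shared trait gives contractVal a common type, so neither the cast nor the match is needed. A minimal sketch of that alternative:
trait HasContract {
  def contractVal: String
}
object Nconstant extends HasContract {
  val contractVal = "N"
}
object Mconstant extends HasContract {
  val contractVal = "M"
}

val vv: Map[String, HasContract] = Map("N" -> Nconstant, "M" -> Mconstant)
val contract = vv("N").contractVal // checked at compile time, no cast or match required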
I have an array of values as shown below:
scala> number.take(5)
res1: Array[Any] = Array(908.76, 901.74, 83.71, 39.36, 234.64)
I need to find the mean value of the array using the RDD mean method.
I have tried using the number.mean() method, but it keeps giving me the following error:
error: could not find implicit value for parameter num: Numeric[Any]
I am new to Spark, please provide some suggestions. Thank you.
That's not Spark related. The compiler gives you a hint: there is no .mean() method for Array[Any], because it requires that the elements of the Array are Numeric.
It means that it would work if it were an Array of Doubles or Ints.
number.take(5) returned Array[Any] because somewhere above it you provided no guarantee that the Array will contain only Numeric elements.
If you can't provide that guarantee, then you have to map over that array and explicitly convert all these values to Double or another Numeric type of your choice.
import scala.util.Try

implicit class AnyExtended(value: Any) {
  def toDoubleO: Option[Double] = Try(value.toString.toDouble).toOption
}

val array: Array[Double] = number.take(5).flatMap(_.toDoubleO)
val mean: Double = array.sum / array.length
Note that instead of using a plain .toDouble I've written an implicit extension, because the conversion can fail and throw an exception. Instead, we wrap it in Try and turn it into an Option: in case of an exception we get None, and that value is skipped in the computation of the mean thanks to flatMap.
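Since the question mentions the RDD mean method, the same conversion can also be applied to the RDD before calling mean() — a minimal sketch, assuming number is an RDD[Any]:
import scala.util.Try

val meanFromRdd: Double = number
  .flatMap(v => Try(v.toString.toDouble).toOption.toList) // skip values that can't be parsed
  .mean()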
If you are happy to convert to a DataFrame, then Spark will do this for you with minimal effort.
import org.apache.spark.sql.functions.{avg, col}
// assumes a SparkContext `sc` and the SQL implicits (e.g. sqlContext.implicits._) are in scope

val number = List(908.76, 901.74, 83.71, 39.36, 234.64)
val numberRDD = sc.parallelize(number)
numberRDD.toDF("x").agg(avg(col("x"))).show()
This will produce the answer 433.642
I have the following two objects (in Scala, using Spark):
1. The main object
object Omain {
  def main(args: Array[String]) {
    odbscan
  }
}
2. The object odbscan
object odbscan {
  val conf = new SparkConf().setAppName("Clustering").setMaster("local")
  conf.set("spark.driver.maxResultSize", "3g")
  val sc = new SparkContext(conf)

  val param_user_minimal_rating_count = 2

  /*** Connection ***/
  val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
  val sql = "SELECT id, data FROM user_profile"
  val options = connectMysql.getOptionsMap(sql)
  val uSQL = sqlcontext.load("jdbc", options)

  val users = uSQL.rdd.map { x =>
    val v = x.toString().substring(1, x.toString().size - 1).split(",")
    var ap: Map[Int, Double] = Map()
    if (v.size > 1)
      ap = v(1).split(";").map { y => (y.split(":")(0).toInt, y.split(":")(1).toDouble) }.toMap
    (v(0).toInt, ap)
  }.filter(_._2.size >= param_user_minimal_rating_count)

  println(users.collect().mkString("\n"))
}
When I execute this code I get an infinite loop, until I change:
filter(_._2.size >= param_user_minimal_rating_count)
to
filter(_._2.size >= 1)
or any other numeric literal. In that case the code works, and my result is displayed.
What I think is happening here is that Spark serializes functions to send them over the wire. Because your function (the one you pass to map) calls the accessor param_user_minimal_rating_count of object odbscan, the entire odbscan object needs to be serialized and sent along with it. Deserializing and then using that deserialized object causes the code in its body to be executed again, which leads to an infinite loop of serializing --> sending --> deserializing --> executing --> serializing --> ...
Probably the easiest thing to do here is changing that val to final val param_user_minimal_rating_count = 2 so the compiler will inline the value. But note that this will only be a solution for literal constants. For more information see constant value definitions and constant expressions.
Another and better solution would be to refactor your code so that no instance variables are used in lambda expressions. Referencing vals that are defined in an object or class gets the whole object serialized, so try to only refer to vals that are local to a method. And most importantly, don't execute your business logic from within a constructor or the body of an object or class.
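A minimal sketch of that refactoring, with hypothetical sample data; the point is that the closure only references a method-local val, so Spark does not need to serialize the whole object:
import org.apache.spark.SparkContext

object odbscan {
  def run(sc: SparkContext): Unit = {
    val minCount = 2 // method-local val: only this Int is captured by the closure below
    val users = sc
      .parallelize(Seq((1, Map(1 -> 1.0)), (2, Map(1 -> 1.0, 2 -> 2.0))))
      .filter(_._2.size >= minCount) // no reference to odbscan's fields, so odbscan is not serialized
    println(users.collect().mkString("\n"))
  }
}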
Your problem is somewhere else.
The only difference between the two snippets is the definition of val Eps = 5 outside of the map, which does not change the control flow of your code at all.
Please post more context so we can help.
val vJsonLoc = new HashMap[String, String]();
def getPrevJson(s:String) = vJsonLoc.get(s)
val previousFile = getPrevJson(s"/${site.toLowerCase}/$languagePath/$channel/v$v/$segment")
this returns
Some(/Users/abc/git/abc-c2c/)
On trying to append a string, previousFile + "/" + index + ".json",
the result is Some(/Users/abc/git/abc-c2c/)/0.json when the desired result is /Users/abc/git/abc-c2c/0.json.
I guess this is some concept of Option that I have not understood. I am new to Scala.
As you pointed out, you're getting back an Option type, and not a direct reference to the String contained in your data structure. This is a very standard Scala practice, allowing you to better handle cases where an expected value might not be present in your data structure.
For example, in Java, this type of method typically returns the value if it exists and null if it doesn't. This means, however, subsequent code could be operating on the null value and thus you'd need further protection against exceptions.
In Scala, you're getting a reference to an object which may, or may not, have the value you expect. This is the Option type, and can be either Some (in which case the reference is accessible) or None (in which case you have several options for handling it).
Consider your code:
val vJsonLoc = new HashMap[String, String]();
def getPrevJson(s:String) = vJsonLoc.get(s)
val previousFile = getPrevJson(s"/${site.toLowerCase}/$languagePath/$channel/v$v/$segment")
If the HashMap returned String, your previousFile reference could point to either a null value or to a String value. You'd need to protect against a potential exception (regular practice in Java).
But in Scala, get is returning an Option type, which can be handled in a number of ways:
val previousFile = getPrevJson("your_string").getOrElse("")
//or
val previousFile = getPrevJson("your_string") match {
case Some(ref) => ref
case None => ""
}
The resulting reference previousFile will point to a String value: either the expected value ("get") or the empty string ("OrElse").
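Applied to the original problem, the path can also be built inside the Option and unwrapped at the end — a sketch where key and index stand in for the real values:
val fullPath = getPrevJson(key)        // key is a hypothetical placeholder for the real lookup string
  .map(dir => s"$dir/$index.json")     // build the path while still inside the Option
  .getOrElse("")                       // fall back to an empty string if the key was absent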
Scala's Map.get returns an Option. Use vJsonLoc(s) instead of vJsonLoc.get(s) if you want the value itself (it throws if the key is missing).
I am very new to Scala. I tried some tutorials but I didn't get the point in this problem here:
val reverse = new mutable.HashMap[String, String]() with mutable.SynchronizedMap[String, String]

def search(query: String) = Future.value {
  val tokens = query.split(" ")
  val hits = tokens map { token => reverse.getOrElse(token, Set()) }
  if (hits.isEmpty)
    Nil
  else
    hits reduceLeft {_ & _} toList // value & is not a member of java.lang.Object
}
The compiler says value & is not a member of java.lang.Object. Can somebody explain why I am getting a compiler error? I have this from the tutorial here: https://twitter.github.io/scala_school/searchbird.html
"tokens" is of type Array[String]. Now, when you iterate over the array, there are two possibilities. Either reverse will have a value for the token or not. If it has, then the Array element get a string value otherwise an empty set.
For example: Lets say reverse has two values - ("a" -> "a1", "b" ->"b1") a maps to a1 and b maps to b1.
Suppose, The query string is "a c".
tokens will be ["a","c"] after splitting.
After mapping you will get in array ["a1", Set()] (a got mapped to a1 and there is no value for "c" in the map hence, you got an empty Set())
Now, the overall type of the hits array is Array[Object].
So, now you are getting an error as the last line will be "&" operator on 2 Objects according to the compiler.
Mohit has the right answer: you end up with an array of Objects. This is because your HashMap reverse has a value type of String, so it returns a String for a given key, while your getOrElse returns a Set if the key is not found in reverse. These need to return the same type so that you don't end up with an Array[Object].
If you notice a few lines above in the tutorial you linked, reverse is defined as follows:
val reverse = new mutable.HashMap[String, Set[String]] with mutable.SynchronizedMap[String, Set[String]]
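With that element type, both branches of the getOrElse are sets, so the intersection in reduceLeft compiles — a minimal sketch with a hypothetical index:
import scala.collection.mutable

val reverse = mutable.HashMap("a" -> Set("doc1", "doc2"), "b" -> Set("doc2"))
val hits = "a b".split(" ").map(token => reverse.getOrElse(token, Set.empty[String]))
val result = hits.reduceLeft(_ & _).toList // List(doc2)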