How to fix "found: AnyVal, required: Double" in Scala?

I am traversing a Scala Map and I am getting type mismatch error in my code. Here is what I am trying to do.
private var cumulativeCapacity:Map[String , Double] = Map()
private var cumulativeDelay:Map[String ,Double] = Map()
cumulativeCapacity.keys.foreach { linkId =>
val delay = cumulativeDelay.get(linkId).getOrElse(0)
val capacity = cumulativeCapacity.get(linkId).getOrElse(0)
val bin = largeset(capacity)
}
The error occurs at val bin = largeset(capacity): capacity should be a Double, but the compiler finds AnyVal. Can you suggest a solution, or let me know if I am doing something wrong?

Welcome to SO.
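The problem is that you provide an Int instead of a Double as the default value for when the key is not found in your Map. getOrElse accepts a default of any supertype of the value type (Option's signature is getOrElse[B >: A](default: => B): B), so with an Int default the compiler infers the common supertype of Double and Int, which is AnyVal. If you replace 0 with 0.0 or 0D, it should work, i.e.: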
cumulativeCapacity.keys.foreach { linkId =>
  val delay = cumulativeDelay.getOrElse(linkId, 0D)
  val capacity = cumulativeCapacity.getOrElse(linkId, 0D)
  val bin = largeset(capacity)
}
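A self-contained sketch of the fixed loop. The real largeset is not shown in the question, so the one below is a hypothetical stand-in that buckets capacities at 100.0, and the results are returned instead of discarded so they can be checked:

```scala
object CapacityBins {
  // Hypothetical stand-in for the asker's `largeset`; its real definition is not shown.
  def largeset(capacity: Double): Int = if (capacity >= 100.0) 1 else 0

  // For each link: (bin of its capacity, its delay or 0.0 if missing).
  def bins(capacity: Map[String, Double], delay: Map[String, Double]): Map[String, (Int, Double)] =
    capacity.keys.map { linkId =>
      // A Double default (0D) keeps both lookups typed as Double.
      // An Int default (0) would widen the result type to AnyVal.
      val d: Double = delay.getOrElse(linkId, 0D)
      val c: Double = capacity.getOrElse(linkId, 0D)
      linkId -> (largeset(c), d)
    }.toMap
}
```

For example, CapacityBins.bins(Map("a" -> 150.0, "b" -> 20.0), Map("a" -> 1.5)) gives Map("a" -> (1, 1.5), "b" -> (0, 0.0)).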

Scala Spark: How do I bootstrap sample from a column of a Spark Dataframe?

I am looking to sample values, with replacement, from a column of a Spark DataFrame, using the Scala programming language in a Jupyter Notebook setting in a cluster environment. How do I do this?
I tried the following function that I found online:
import scala.util
def bootstrapMean(originalData: Array[Double]): Double = {
val n = originalData.length
def draw: Double = originalData(util.Random.nextInt(n))
// a tail recursive loop to randomly draw and add a value to the accumulating sum
def drawAndSumValues(current: Int, acc: Double = 0D): Double = {
if (current == 0) acc
else drawAndSumValues(current - 1, acc + draw)
}
drawAndSumValues(n) / n
}
Like so:
val data = stack.select("column_with_values").collect.map(_.toSeq).flatten
val m = 10
val bootstraps = Vector.fill(m)(bootstrapMean(data))
But I get the error:
An error was encountered:
<console>:47: error: type mismatch;
found : Array[Any]
required: Array[Double]
Note: Any >: Double, but class Array is invariant in type T.
You may wish to investigate a wildcard type such as `_ >: Double`. (SLS 3.2.10)
val bootstraps = Vector.fill(m)(bootstrapMean(data))
Not sure how to debug this, and whether I should bother to or try another approach. I'm looking for ideas/documentation/code. Thanks.
Update:
How do I put user mck's solution below in a for loop? I tried the following:
var bootstrap_container = Seq()
var a = 1
for( a <- 1 until 3){
var sampled = stack_b.select("diff_hours").sample(withReplacement = true, fraction = 0.5, seed = a)
var smpl_average = sampled.select(avg("diff_hours")).collect()(0)(0)
var bootstrap_smpls = bootstrap_container.union(Seq(smpl_average)).collect()
}
bootstrap_smpls
but that gives an error:
<console>:49: error: not enough arguments for method collect: (pf: PartialFunction[Any,B])(implicit bf: scala.collection.generic.CanBuildFrom[Seq[Any],B,That])That.
Unspecified value parameter pf.
var bootstrap_smpls = bootstrap_container.union(Seq(smpl_average)).collect()
You can use the sample method of DataFrames. For example, to sample with replacement with a fraction of 0.5 (fraction is the expected, not exact, fraction of rows):
val sampled = stack.select("column_with_values").sample(true, 0.5)
To get the mean, you can do (avg comes from org.apache.spark.sql.functions):
import org.apache.spark.sql.functions.avg
val col_average = sampled.select(avg("column_with_values")).collect()(0)(0)
EDIT:
var bootstrap_container = List[Double]()
for (a <- 1 until 3) {
  val sampled = stack_b2.select("diff_hours").sample(withReplacement = true, fraction = 0.5, seed = a)
  // collect()(0)(0) is typed Any, so cast the average to Double before appending
  val smpl_average = sampled.select(avg("diff_hours")).collect()(0)(0)
  bootstrap_container = bootstrap_container :+ smpl_average.asInstanceOf[Double]
}
val mean_bootstrap = bootstrap_container.sum / bootstrap_container.length
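If the column is small enough to collect to the driver, the original bootstrapMean approach can also work once the collected values are converted to Array[Double]. In Spark that could be done with Row.getDouble(0) instead of _.toSeq (assuming a non-null DoubleType column). The Spark-free sketch below shows the conversion from an untyped Array[Any] and a seeded resampling loop; the names Bootstrap, toDoubles and bootstrapMeans are illustrative, not from the question:

```scala
import scala.util.Random

object Bootstrap {
  // Converts collected, untyped column values (Array[Any]) to Array[Double],
  // dropping nulls and non-numeric values.
  def toDoubles(values: Array[Any]): Array[Double] =
    values.collect { case n: java.lang.Number => n.doubleValue() }

  // Mean of one resample of size n, drawn with replacement from data.
  def bootstrapMean(data: Array[Double], rng: Random): Double = {
    val n = data.length
    val sum = (0 until n).foldLeft(0.0)((acc, _) => acc + data(rng.nextInt(n)))
    sum / n
  }

  // m bootstrap means from one seeded generator, so runs are reproducible.
  def bootstrapMeans(data: Array[Double], m: Int, seed: Long): Vector[Double] = {
    val rng = new Random(seed)
    Vector.fill(m)(bootstrapMean(data, rng))
  }
}
```

Passing one Random through all draws, rather than calling util.Random directly, is a design choice that makes the bootstrap repeatable for a given seed.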

How to append a string to a list or array in a for loop in scala?

var RetainKeyList: mutable.Seq[String] = new scala.collection.mutable.ListBuffer[String]()
for(element<-ts_rdd)
{
var elem1 = element._1
var kpssSignificance: Double = 0.05
var dOpt: Option[Int] = (0 to 2).find
{
diff =>
var testTs = differencesOfOrderD(element._2, diff)
var (stat, criticalValues) = kpsstest(testTs, "c")
stat < criticalValues(kpssSignificance)
}
var d = dOpt match
{
case Some(v) => v
case None => 300000
}
if(d.equals(300000))
{
println("Bad Key: " + elem1)
RetainKeyList += elem1
}
}
Hi all,
I created an empty mutable ListBuffer, var RetainKeyList: mutable.Seq[String] = new scala.collection.mutable.ListBuffer[String](), and I am trying to add the string elem1 to it in a for loop.
When I try to compile the code, it hangs with no error message, but if I remove RetainKeyList += elem1, all of the elem1 strings print properly.
What am I doing wrong here? Is there a cleaner way to collect all the string elem1 generated in the for loop?
Long story short, your code is running in a distributed environment: the for loop over ts_rdd is desugared to ts_rdd.foreach, whose body runs on the executors, and each task updates its own deserialized copy of RetainKeyList, so the local collection on the driver is never modified. This question comes up every week; if you do not understand the implications of distributed computing, please do not use a distributed framework like Spark.
Also, you are overusing mutability everywhere, and mutability and a distributed environment don't play nicely together.
Anyway, here is a better way to solve your problem.
val retainKeysRdd = ts_rdd.map {
  case (elem1, elem2) =>
    val kpssSignificance = 0.05d
    val dOpt = (0 to 2).find { diff =>
      val testTs = differencesOfOrderD(elem2, diff)
      val (stat, criticalValues) = kpsstest(testTs, "c")
      stat < criticalValues(kpssSignificance)
    }
    (elem1 -> dOpt)
} collect {
  case (key, None) => key
}
This returns an RDD with the keys to retain. If you are really sure you need them as a local collection and that they won't blow up your memory, you can do this:
val retainKeysList = retainKeysRdd.collect().toList
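The same map-then-collect shape works on a plain Scala collection, which makes it easy to try locally (RDD also has a collect overload taking a PartialFunction, which is what the answer uses). In the sketch below, differences and passes are hypothetical stand-ins for differencesOfOrderD and the KPSS check; passes is NOT a KPSS test, just an illustrative criterion:

```scala
object RetainKeys {
  // Hypothetical stand-in for differencesOfOrderD: the d-th order differences.
  def differences(xs: Vector[Double], d: Int): Vector[Double] =
    (1 to d).foldLeft(xs)((acc, _) => acc.zip(acc.drop(1)).map { case (a, b) => b - a })

  // Hypothetical stand-in for the KPSS check (not a real stationarity test):
  // passes when the d-th differences are non-empty and all equal.
  def passes(xs: Vector[Double], d: Int): Boolean = {
    val diffs = differences(xs, d)
    diffs.nonEmpty && diffs.forall(_ == diffs.head)
  }

  // Map each key to the first order in 0..2 that passes (if any),
  // then keep only the keys for which no order passed.
  def retainKeys(series: Seq[(String, Vector[Double])]): Seq[String] =
    series
      .map { case (key, xs) => key -> (0 to 2).find(d => passes(xs, d)) }
      .collect { case (key, None) => key }
}
```

No mutable buffer is needed: the keys come out as the result of the transformation itself, which is also what lets Spark compute it on the executors.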

Flink: PageRank type mismatch error

I want to compute PageRank from a CSV file of edges formatted as follows:
12,13,1.0
12,14,1.0
12,15,1.0
12,16,1.0
12,17,1.0
...
My code:
var filename = "<filename>.csv"
val graph = Graph.fromCsvReader[Long,Double,Double](
env = env,
pathEdges = filename,
readVertices = false,
hasEdgeValues = true,
vertexValueInitializer = new MapFunction[Long, Double] {
def map(id: Long): Double = 0.0 } )
val ranks = new PageRank[Long](0.85, 20).run(graph)
I get the following error from the Flink Scala Shell:
error: type mismatch;
found : org.apache.flink.graph.scala.Graph[Long,_23,_24] where type _24 >: Double with _22, type _23 >: Double with _21
required: org.apache.flink.graph.Graph[Long,Double,Double]
val ranks = new PageRank[Long](0.85, 20).run(graph)
^
What am I doing wrong?
(And are the initial values of 0.0 for every vertex and 1.0 for every edge correct?)
The problem is that you're passing the Scala org.apache.flink.graph.scala.Graph to PageRank.run, which expects the Java org.apache.flink.graph.Graph.
To run a GraphAlgorithm on a Scala Graph object, you have to call the Scala Graph's run method with the GraphAlgorithm:
graph.run(new PageRank[Long](0.85, 20))
Update
In the case of the PageRank algorithm it is important to note that the algorithm expects an instance of type Graph[K, java.lang.Double, java.lang.Double]. Since Java's Double type is different from Scala's Double type (in terms of type checking), this has to be accounted for.
For the example code, this means:
val graph = Graph.fromCsvReader[Long,java.lang.Double,java.lang.Double](
env = env,
pathEdges = filename,
readVertices = false,
hasEdgeValues = true,
vertexValueInitializer = new MapFunction[Long, java.lang.Double] {
def map(id: Long): java.lang.Double = 0.0 } )
.asInstanceOf[Graph[Long, java.lang.Double, java.lang.Double]]
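The scala.Double versus java.lang.Double mismatch can be reproduced without Flink. In this minimal sketch, Box is a hypothetical invariant container standing in for Graph[K, VV, EV] (Java generics are invariant when used from Scala), and needsJavaDouble is a hypothetical stand-in for an API that, like PageRank, expects java.lang.Double:

```scala
object DoubleTypes {
  // Hypothetical invariant container standing in for Flink's Graph[K, VV, EV].
  final case class Box[T](value: T)

  // Hypothetical API that, like PageRank, requires java.lang.Double.
  def needsJavaDouble(b: Box[java.lang.Double]): Double = b.value.doubleValue()

  val scalaBox: Box[Double] = Box(0.0)

  // needsJavaDouble(scalaBox) does not compile: Box[Double] is not a Box[java.lang.Double].
  // Boxing the element explicitly produces the required type.
  val javaBox: Box[java.lang.Double] = Box(java.lang.Double.valueOf(scalaBox.value))

  val result: Double = needsJavaDouble(javaBox)
}
```

Individual values do convert implicitly (Predef.double2Double is what lets def map(id: Long): java.lang.Double = 0.0 compile), but that conversion does not apply to type arguments, which is why the graph's value types have to be written as java.lang.Double.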

How to get the type of a field using reflection?

Is there a way to get the Type of a field with scala reflection?
Let's see the standard reflection example:
scala> import scala.reflect.runtime.{universe=>ru}
import scala.reflect.runtime.{universe=>ru}
scala> class C { val x = 2; var y = 3 }
defined class C
scala> val m = ru.runtimeMirror(getClass.getClassLoader)
m: scala.reflect.runtime.universe.Mirror = JavaMirror ...
scala> val im = m.reflect(new C)
im: scala.reflect.runtime.universe.InstanceMirror = instance mirror for C@5f0c8ac1
scala> val fieldX = ru.typeOf[C].declaration(ru.newTermName("x")).asTerm.accessed.asTerm
fieldX: scala.reflect.runtime.universe.TermSymbol = value x
scala> val fmX = im.reflectField(fieldX)
fmX: scala.reflect.runtime.universe.FieldMirror = field mirror for C.x (bound to C@5f0c8ac1)
scala> fmX.get
res0: Any = 2
Is there a way to do something like
val test: Int = fmX.get
That is, can I "cast" the result of a reflective get to the actual type of the field? And otherwise, is it possible to do a reflective set from a string? In the example, something like
fmX.set("10")
Thanks for hints!
Here's the deal: the type is not known at compile time, so, basically, you have to tell the compiler what type it's supposed to be. You can do it safely or not, like this:
val test: Int = fmX.get.asInstanceOf[Int]
val test: Int = fmX.get match {
  case n: Int => n
  case _ => 0 // or however you want to handle a value of the wrong type
}
Note that, since you declared test to be Int, you have to assign an Int to it. And even if you kept test as Any, at some point you have to pick a type for it, and it is always going to be something static -- as in, in the source code.
The second case just uses pattern matching to ensure you have the right type.
As for your second question (setting a field from a string), I'm not sure I understand what you mean.
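A variant of the pattern-matching approach that makes the wrong-type case explicit with Option instead of a sentinel 0. Here raw stands in for the Any returned by fmX.get, and the helper names are illustrative. If "set from a string" means parsing text into the field's type, parseInt shows that step; the resulting Int could then be passed to fmX.set, which accepts Any (an assumption about the intended use, not something the answer states):

```scala
import scala.util.Try

object ReflectedValue {
  // `raw` stands in for the Any returned by fmX.get.
  // Some(n) if it is an Int at runtime, None otherwise.
  def asInt(raw: Any): Option[Int] = raw match {
    case n: Int => Some(n)
    case _      => None
  }

  // Parses a string into the type you have decided the field has (Int here).
  def parseInt(s: String): Option[Int] = Try(s.trim.toInt).toOption
}
```

A caller can still choose a default with ReflectedValue.asInt(fmX.get).getOrElse(0), but the decision is now visible at the call site.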

how do I increment an integer variable I passed into a function in Scala?

I declared a variable outside the function like this:
var s: Int = 0
passed it such as this:
def function(s: Int): Boolean={
s += 1
return true
}
but the error underline under "s +=" won't go away for the life of me. I tried everything. I am new to Scala, by the way.
First of all, I will repeat my words of caution: the solution below is both obscure and inefficient; if possible, try to stick with values.
// In Scala 2, an implicit class must be defined inside an object, class or the REPL
implicit class MutableInt(var value: Int) {
  def inc() = { value += 1 }
}
def function(s: MutableInt): Boolean = {
  s.inc() // parentheses here to denote that the method has side effects
  true
}
And here is code in action:
scala> val x: MutableInt = 0
x: MutableInt = MutableInt@44e70ff
scala> function(x)
res0: Boolean = true
scala> x.value
res1: Int = 1
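Sticking with values, as recommended above, usually means the function returns the new value and the caller rebinds its own var. A minimal sketch; returning a (Boolean, Int) pair is an illustrative choice that keeps the original Boolean result:

```scala
object Counter {
  // Value-based alternative: instead of mutating the caller's variable,
  // return the result together with the incremented value.
  def function(s: Int): (Boolean, Int) = (true, s + 1)
}
```

Usage: with var s = 0, write val (ok, next) = Counter.function(s) and then s = next, so the only reassignment happens where the var is declared.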
If you just want continuously increasing integers, you can use a Stream.
val numberStream = Stream.iterate(0)(_ + 1).iterator
That creates an iterator over a never-ending stream of numbers, starting at zero. Then, to get the next number, call
val number: Int = numberStream.next()
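Since Scala 2.13, Stream is deprecated in favour of LazyList; for this use case Iterator.iterate produces the same never-ending sequence directly, without an intermediate stream. A minimal sketch (the Numbers object is illustrative):

```scala
object Numbers {
  // A never-ending iterator start, start + 1, start + 2, ...
  def from(start: Int): Iterator[Int] = Iterator.iterate(start)(_ + 1)
}
```

As with the Stream version, each call to next() on the same iterator advances it by one.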
I have also just started using Scala; this was my workaround.
var s: Int = 0
def function(current: Int): Boolean = {
  var newS = current
  newS = newS + 1
  s = newS // assigns the outer var s; a parameter itself cannot be reassigned
  true
}
From what I read, you are not passing the same "s" into your function as the one in the rest of the code. I am sure there is an even better way, but this is working for me.
You don't.
A var is a name that refers to a reference which might be changed. When you call a function, you pass the reference itself, and a new name gets bound to it.
So, to change what reference the name points to, you need a reference to whatever contains the name. If it is an object, that's easy enough. If it is a local variable, then it is not possible.
See also call by reference, though I don't think this question is a true duplicate.
If you just want an incrementing counter that starts at 3 (the first call returns 4):
val nextId = { var i = 3; () => { i += 1; i } }
then invoke it:
nextId()