Related
I am at the beginning of my Scala journey. I am trying to find and compare the average value of a given dataset - type Map(String, List[Int]), for two random rows selected by the user, in order to return the greater average value between the two. I can calculate the average for each row but I can't find a way to compare the average between the two rows.
I have tried in different ways, but I only get error messages. However the program calculates the average of each row
DATASET
SK1, 9, 7, 2, 0, 7, 3, 7, 9, 1, 2, 8, 1, 9, 6, 5, 3, 2, 2, 7, 2, 8, 5, 4, 5, 1, 6, 5, 2, 4, 1
SK2, 0, 7, 6, 3, 3, 3, 1, 6, 9, 2, 9, 7, 8, 7, 3, 6, 3, 5, 5, 2, 9, 7, 3, 4, 6, 3, 4, 3, 4, 1
SK3, 8, 7, 1, 8, 0, 5, 8, 3, 5, 9, 7, 5, 4, 7, 9, 8, 1, 4, 6, 5, 6, 6, 3, 6, 8, 8, 7, 4, 0, 6
This is how I the program calculates the average of a row
//Function to find the average
def average(list: List[Int]): Double = list.sum.toDouble / list.size
def averageStockLevel1(stock1: String, stock2: String): (String, Int) = {
val ave1 = mapdata.get(stock1).map(average(_).toInt).getOrElse(0)
val ave2 = mapdata.get(stock2).map(average(_).toInt).getOrElse(0)
if (ave1>ave2){
(stock1,ave1)
}else{
(stock2,ave2)
}
}
This is how I have called the function in the menu
def handleFour(): Boolean = {
menuDoubleDataStock(averageStockLevel1)
true
}
//Pull two rows from the dataset
def menuShowDoubleDataStock(f: (String) => (String, Int), g:(String) => (String, Int)) = {
print("Please insert the Stock > ")
val data = f(readLine)
println(s"${data._1}: ${data._2}")
print("Please insert the Stock > ")
val data1 = g(readLine)
println(s"${data1._1}: ${data1._2}")
}
error message
Unspecified value parameters: g: String => (String, Int)
The error message "Unspecified value parameters: g: String => (String, Int)" tells you the following:
Your menuShowDoubleDataStock expects two parameters (f and g), but where you call it (from handleFour()), you only pass one value (averageStockLevel1) - that value is accepted as f, so the compiler complains that no value was passed for g.
Besides that specific error that the compiler currently complains about, there is also a second problem (which currently seems to be overshadowed by the one above): the type of f is defined as String => (String, Int) (a function that takes one String parameter), but the value that you are passing (averageStockLevel1) has the type (String, String) => (String, Int) (a function that takes two String parameters).
I'm not 100% sure if I understood what you are aiming to do, but I think the solution could be to change the signature of menuShowDoubleDataStock so that it only takes one parameter of type (String, String) => (String, Int):
// make the user enter two stock-names and pass them into resultCalculator to
// get the result (and then print it)
def menuShowDoubleDataStock(resultCalculator: (String, String) => (String, Int)) = {
print("Please insert the Stock > ")
val stockName1 = readLine
print("Please insert the Stock > ")
val stockName2 = readLine
val result = resultCalculator(stockName1, stockName2)
println(s"${result._1}: ${result._2}")
}
Then calling menuDoubleDataStock(averageStockLevel1) should work.
I understand how to use if in map. For example, val result = list.map(x => if (x % 2 == 0) x * 2 else x / 2).
However, I want to use if for only part of the arguments.
val inputColumns = List(
List(1, 2, 3, 4, 5, 6), // first "column"
List(4, 6, 5, 7, 12, 15) // second "column"
)
inputColumns.zipWithIndex.map{ case (col, idx) => if (idx == 0) col * 2 else col / 10}
<console>:1: error: ';' expected but integer literal found.
inputColumns.zipWithIndex
res4: List[(List[Int], Int)] = List((List(1, 2, 3, 4, 5, 6),0), (List(4, 6, 5, 7, 12, 15),1))
I have searched the error info but have not found a solution.
Why my code is not 'legal' in Scala? Is there a better way to write it? Basically, I want to do a pattern matching and then do something on other arguments.
To explain your problem another way, inputColumns has type List[List[Int]]. You can verify this in the Scala REPL:
$ scala
Welcome to Scala 2.12.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_161).
Type in expressions for evaluation. Or try :help.
scala> val inputColumns = List(
| List(1, 2, 3, 4, 5, 6), // first "column"
| List(4, 6, 5, 7, 12, 15) // second "column"
| )
inputColumns: List[List[Int]] = List(List(1, 2, 3, 4, 5, 6), List(4, 6, 5, 7, 12, 15))
Now, when you call .zipWithIndex on that list, you end up with a List[(List[Int], Int)] - that is, a list of a tuple, in which the first tuple type is a List[Int] (the column) and the second is an Int (the index):
scala> inputColumns.zipWithIndex
res0: List[(List[Int], Int)] = List((List(1, 2, 3, 4, 5, 6),0), (List(4, 6, 5, 7, 12, 15),1))
Consequently, when you try to apply a map function to this list, col is a List[Int] and not an Int, and so col * 2 makes no sense - you're multiplying a List[Int] by 2. You then also try to divide the list by 10, obviously.
scala> inputColumns.zipWithIndex.map{ case(col, idx) => if(idx == 0) col * 2 else col / 10 }
<console>:13: error: value * is not a member of List[Int]
inputColumns.zipWithIndex.map{ case(col, idx) => if(idx == 0) col * 2 else col / 10 }
^
<console>:13: error: value / is not a member of List[Int]
inputColumns.zipWithIndex.map{ case(col, idx) => if(idx == 0) col * 2 else col / 10 }
^
In order to resolve this, it depends what you're trying to achieve. If you want a single list of integers, and then zip those so that each value has an associated index, you should call flatten on inputColumns before calling zipWithIndex. This will result in List[(Int, Int)], where the first value in the tuple is the column value, and the second is the index. Your map function will then work correctly without modification:
scala> inputColumns.flatten.zipWithIndex.map{ case(col, idx) => if(idx == 0) col * 2 else col / 10 }
res3: List[Int] = List(2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1)
Of course, you no longer have separate columns.
If you wish each value in each list to have an associated index, you need to firstly map inputColumns into two zipped lists, using inputColumns.map(_.zipWithIndex) to create a List[List[(Int, Int)]] - a list of a list of (Int, Int) tuples:
scala> inputColumns.map(_.zipWithIndex)
res4: List[List[(Int, Int)]] = List(List((1,0), (2,1), (3,2), (4,3), (5,4), (6,5)), List((4,0), (6,1), (5,2), (7,3), (12,4), (15,5)))
We can now apply your original map function to the result of the zipWithIndex operation:
scala> inputColumns.map(_.zipWithIndex.map { case (col, idx) => if(idx == 0) col * 2 else col / 10 })
res5: List[List[Int]] = List(List(2, 0, 0, 0, 0, 0), List(8, 0, 0, 0, 1, 1))
The result is another List[List[Int]] with each internal list being the results of your map operation on the original two input columns.
On the other hand, if idx is meant to be the index of the column, and not of each value, and you want to multiply all of the values in the first column by 2 and divide all of the values in the other columns by 10, then you need to change your original map function to map across each column, as follows:
scala> inputColumns.zipWithIndex.map {
| case (col, idx) => {
| if(idx == 0) col.map(_ * 2) // Multiply values in first column by 1
| else col.map(_ / 10) // Divide values in all other columns by 10
| }
| }
res5: List[List[Int]] = List(List(2, 4, 6, 8, 10, 12), List(0, 0, 0, 0, 1, 1))
Let me know if you require any further clarification...
UPDATE:
The use of case in map is a common Scala shorthand. If a higher-order function takes a single argument, something such as this:
def someHOF[A, B](x: A => B) = //...
and you call that function like this (with what Scala terms a partial function - a function consisting solely of a list of case statements):
someHOF {
case expr1 => //...
case expr2 => //...
...
}
then Scala treats it as a kind-of shorthand for:
someHOF {a =>
a match {
case expr1 => //...
case expr2 => //...
...
}
}
or, being slightly more terse,
someHOF {
_ match {
case expr1 => //...
case expr2 => //...
...
}
}
For a List, for example, you can use it with functions such as map, flatMap, filter, etc.
In the case of your map function, the sole argument is a tuple, and the sole case statement acts to break open the tuple and expose its contents. That is:
val l = List((1, 2), (3, 4), (5, 6))
l.map { case(a, b) => println(s"First is $a, second is $b") }
is equivalent to:
l.map {x =>
x match {
case (a, b) => println(s"First is $a, second is $b")
}
}
and both will output:
First is 1, second is 2
First is 3, second is 4
First is 5, second is 6
Note: This latter is a bit of a dumb example, since map is supposed to map (i.e. change) the values in the list into new values in a new list. If all you were doing was printing the values, this would be better:
val l = List((1, 2), (3, 4), (5, 6))
l.foreach { case(a, b) => println(s"First is $a, second is $b") }
You are trying to multiply a list by 2 when you do col * 2 as col is List(1, 2, 3, 4, 5, 6) when idx is 0, which is not possible and similar is the case with else part col / 10
If you are trying to multiply the elements of first list by 2 and devide the elements of rest of the list by 10 then you should be doing the following
inputColumns.zipWithIndex.map{ case (col, idx) => if (idx == 0) col.map(_*2) else col.map(_/10)}
Even better approach would be to use match case
inputColumns.zipWithIndex.map(x => x._2 match {
case 0 => x._1.map(_*2)
case _ => x._1.map(_/10)
})
I have user hobby data(RDD[Map[String, Int]]) like:
("food" -> 3, "music" -> 1),
("food" -> 2),
("game" -> 5, "twitch" -> 3, "food" -> 3)
I want to calculate stats of them, and represent the stats as Map[String, Array[Int]] while the array size is 5, like:
("food" -> Array(0, 1, 2, 0, 0),
"music" -> Array(1, 0, 0, 0, 0),
"game" -> Array(0, 0, 0, 0, 1),
"twitch" -> Array(0, 0, 1, 0 ,0))
foldLeft seems to be the right solution, but RDD cannot use it, and the data is too big to convert to List/Array to use foldLeft, how could I do this job?
The trick is to replace the Array in your example by a class that contains the statistic you want for some part of the data, and that can be combined with another instance of the same statistic (covering other part of the data) to provide the statistic on the whole data.
For instance, if you have a statistic that covers the data 3, 3, 2 and 5, I gather it would look something like (0, 1, 2, 0, 1) and if you have another instance covering the data 3,4,4 it would look like (0, 0, 1, 2,0). Now all you have to do is define a + operation that let you combine (0, 1, 2, 0, 1) + (0, 0, 1, 2, 0) = (0,1,3,2,1), covering the data 3,3,2,5 and 3,4,4.
Let's just do that, and call the class StatMonoid:
case class StatMonoid(flags: Seq[Int] = Seq(0,0,0,0,0)) {
def + (other: StatMonoid) =
new StatMonoid( (0 to 4).map{idx => flags(idx) + other.flags(idx)})
}
This class contains the sequence of 5 counters, and define a + operation that let it be combined with other counters.
We also need a convenience method to build it, this could be a constructor in StatMonoid, in the companion object, or just a plain method, as you prefer:
def stat(value: Int): StatMonoid = value match {
case 1 => new StatMonoid(Seq(1,0,0,0,0))
case 2 => new StatMonoid(Seq(0,1,0,0,0))
case 3 => new StatMonoid(Seq(0,0,1,0,0))
case 4 => new StatMonoid(Seq(0,0,0,1,0))
case 5 => new StatMonoid(Seq(0,0,0,0,1))
case _ => throw new RuntimeException("illegal init value: $value")
}
This allows us to easily compute instance of the statistic covering one single piece of data, for example:
scala> stat(4)
res25: StatMonoid = StatMonoid(List(0, 0, 0, 1, 0))
And it also allows us to combine them together simply by adding them:
scala> stat(1) + stat(2) + stat(2) + stat(5) + stat(5) + stat(5)
res18: StatMonoid = StatMonoid(Vector(1, 2, 0, 0, 3))
Now to apply this to your example, let's assume we have the data you mention as an RDD of Map:
val rdd = sc.parallelize(List(Map("food" -> 3, "music" -> 1), Map("food" -> 2), Map("game" -> 5, "twitch" -> 3, "food" -> 3)))
All we need to do to find the stat for each kind of food, is to flatten the data to get ("foodId" -> id) tuples, transform each id into an instance of StatMonoid above, and finally combine them all together for each kind of food:
import org.apache.spark.rdd.PairRDDFunctions
rdd.flatMap(_.toList).mapValue(stat).reduceByKey(_ + _).collect
Which yields:
res24: Array[(String, StatMonoid)] = Array((game,StatMonoid(List(0, 0, 0, 0, 1))), (twitch,StatMonoid(List(0, 0, 1, 0, 0))), (music,StatMonoid(List(1, 0, 0, 0, 0))), (food,StatMonoid(Vector(0, 1, 2, 0, 0))))
Now, for the side story, if you wonder why I call the class StateMonoid it's simply because... it is a monoid :D, and a very common and handy one, called product . In short, monoids are just thingies that can be combined with each other in associative fashion, they are super common when developing in Spark since they naturally define operations that can be executed in parallel on the distributed slaves, and gathered together into a final result.
This is my first day using scala. I am trying to make a string of the number of times each digit is represented in a string. For instance, the number 4310227 would return "1121100100" because 0 appears once, 1 appears once, 2 appears twice and so on...
def pow(n:Int) : String = {
val cubed = (n * n * n).toString
val digits = 0 to 9
val str = ""
for (a <- digits) {
println(a)
val b = cubed.count(_==a.toString)
println(b)
}
return cubed
}
and it doesn't seem to work. would like some scalay reasons why and to know whether I should even be going about it in this manner. Thanks!
When you iterate over strings, which is what you are doing when you call String#count(), you are working with Chars, not Strings. You don't want to compare these two with ==, since they aren't the same type of object.
One way to solve this problem is to call Char#toString() before performing the comparison, e.g., amend your code to read cubed.count(_.toString==a.toString).
As Rado and cheeken said, you're comparing a Char with a String, which will never be be equal. An alternative to cheekin's answer of converting each character to a string is to create a range from chars, ie '0' to '9':
val digits = '0' to '9'
...
val b = cubed.count(_ == a)
Note that if you want the Int that a Char represents, you can call char.asDigit.
Aleksey's, Ren's and Randall's answers are something you will want to strive towards as they separate out the pure solution to the problem. However, given that it's your first day with Scala, depending on what background you have, you might need a bit more context before understanding them.
Fairly simple:
scala> ("122333abc456xyz" filter (_.isDigit)).foldLeft(Map.empty[Char, Int]) ((histo, c) => histo + (c -> (histo.getOrElse(c, 0) + 1)))
res1: scala.collection.immutable.Map[Char,Int] = Map(4 -> 1, 5 -> 1, 6 -> 1, 1 -> 1, 2 -> 2, 3 -> 3)
This is perhaps not the fastest approach because intermediate datatype like String and Char are used but one of the most simplest:
def countDigits(n: Int): Map[Int, Int] =
n.toString.groupBy(x => x) map { case (n, c) => (n.asDigit, c.size) }
Example:
scala> def countDigits(n: Int): Map[Int, Int] = n.toString.groupBy(x => x) map { case (n, c) => (n.asDigit, c.size) }
countDigits: (n: Int)Map[Int,Int]
scala> countDigits(12345135)
res0: Map[Int,Int] = Map(5 -> 2, 1 -> 2, 2 -> 1, 3 -> 2, 4 -> 1)
Where myNumAsString is a String, eg "15625"
myNumAsString.groupBy(x => x).map(x => (x._1, x._2.length))
Result = Map(2 -> 1, 5 -> 2, 1 -> 1, 6 -> 1)
ie. A map containing the digit with its corresponding count.
What this is doing is taking your list, grouping the values by value (So for the initial string of "15625", it produces a map of 1 -> 1, 2 -> 2, 6 -> 6, and 5 -> 55.). The second bit just creates a map of the value to the count of how many times it occurs.
The counts for these hundred digits happen to fit into a hex digit.
scala> val is = for (_ <- (1 to 100).toList) yield r.nextInt(10)
is: List[Int] = List(8, 3, 9, 8, 0, 2, 0, 7, 8, 1, 6, 9, 9, 0, 3, 6, 8, 6, 3, 1, 8, 7, 0, 4, 4, 8, 4, 6, 9, 7, 4, 6, 6, 0, 3, 0, 4, 1, 5, 8, 9, 1, 2, 0, 8, 8, 2, 3, 8, 6, 4, 7, 1, 0, 2, 2, 6, 9, 3, 8, 6, 7, 9, 5, 0, 7, 6, 8, 7, 5, 8, 2, 2, 2, 4, 1, 2, 2, 6, 8, 1, 7, 0, 7, 6, 9, 5, 5, 5, 3, 5, 8, 2, 5, 1, 9, 5, 7, 2, 3)
scala> (new Array[Int](10) /: is) { case (a, i) => a(i) += 1 ; a } map ("%x" format _) mkString
warning: there were 1 feature warning(s); re-run with -feature for details
res7: String = a8c879caf9
scala> (new Array[Int](10) /: is) { case (a, i) => a(i) += 1 ; a } sum
warning: there were 1 feature warning(s); re-run with -feature for details
res8: Int = 100
I was going to point out that no one used a char range, but now I see Kristian did.
def pow(n:Int) : String = {
val cubed = (n * n * n).toString
val cnts = for (a <- '0' to '9') yield cubed.count(_ == a)
(cnts map (c => ('0' + c).toChar)).mkString
}
I need to shuffle part of an ArrayBuffer, preferably in-place so no copies are required. For example, if an ArrayBuffer has 10 elements, and I want to shuffle elements 3-7:
// Unshuffled ArrayBuffer of ints numbered 0-9
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
// Region I want to shuffle is between the pipe symbols (3-7)
0, 1, 2 | 3, 4, 5, 6, 7 | 8, 9
// Example of how it might look after shuffling
0, 1, 2 | 6, 3, 5, 7, 4 | 8, 9
// Leaving us with a partially shuffled ArrayBuffer
0, 1, 2, 6, 3, 5, 7, 4, 8, 9
I've written something like shown below, but it requires copies and iterating over loops a couple of times. It seems like there should be a more efficient way of doing this.
def shufflePart(startIndex: Int, endIndex: Int) {
val part: ArrayBuffer[Int] = ArrayBuffer[Int]()
for (i <- startIndex to endIndex ) {
part += this.children(i)
}
// Shuffle the part of the array we copied
val shuffled = this.random.shuffle(part)
var count: Int = 0
// Overwrite the part of the array we chose with the shuffled version of it
for (i <- startIndex to endIndex ) {
this.children(i) = shuffled(count)
count += 1
}
}
I could find nothing about partially shuffling an ArrayBuffer with Google. I assume I will have to write my own method, but in doing so I would like to prevent copies.
If you can use a subtype of ArrayBuffer you can access the underlying array directly, since ResizableArray has a protected member array:
import java.util.Collections
import java.util.Arrays
import collection.mutable.ArrayBuffer
val xs = new ArrayBuffer[Int]() {
def shuffle(a: Int, b: Int) {
Collections.shuffle(Arrays.asList(array: _*).subList(a, b))
}
}
xs ++= (0 to 9) // xs = ArrayBuffer(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
xs.shuffle(3, 8) // xs = ArrayBuffer(0, 1, 2, 4, 6, 5, 7, 3, 8, 9)
Notes:
The upper bound for java.util.List#subList is exclusive
It's reasonably efficient because Arrays#asList doesn't create a new set of elements: it's backed by the array itself (hence no add or remove methods)
If using for real, you probably want to add bounds-checks on a and b
I'm not entirely sure why it must be in place - you might want to think that over. But, assuming it's the right thing to do, this could do it:
import scala.collection.mutable.ArrayBuffer
implicit def array2Shufflable[T](a: ArrayBuffer[T]) = new {
def shufflePart(start: Int, end: Int) = {
val seq = (start.max(0) until end.min(a.size - 1)).toSeq
seq.zip(scala.util.Random.shuffle(seq)) foreach { t =>
val x = a(t._1)
a.update(t._1, a(t._2))
a(t._2) = x
}
a
}
}
You can use it like:
val a = ArrayBuffer(1,2,3,4,5,6,7,8,9)
println(a)
println(a.shufflePart(2, 7))
edit: If you're willing to pay the extra cost of an intermediate sequence, this is more reasonable, algorithmically speaking:
def shufflePart(start: Int, end: Int) = {
val seq = (start.max(0) until end.min(a.size - 1)).toSeq
seq.zip(scala.util.Random.shuffle(seq) map { i =>
a(i)
}) foreach { t =>
a.update(t._1, t._2)
}
a
}
}