Adding key-value pairs to Map in scala - scala

I am very new to scala, and want to create a hash map with the key being a candidate, and value being number of votes. Something like this: {(1:20),(2:4),(3:42),..}.
I've attempted with the following code:
val voteTypeList = textFile.map(x=>x(2)) //String array containing votes: [3,4,2,3,2,1,1,1,9,..]
var voteCount:Map[String,Int] = Map()
voteTypeList.foreach{x=>{
if (voteCount.contains(x)){ //Increment value
var i: Integer = voteCount(x)
voteCount.updated(x, i+1)
// print(voteCount(x))
}
else{ //Create new key-value pair
// println(x)
voteCount += (x -> 1)
}}}
print(voteCount.size)
But the voteCount does not get created and .size returns 0.
Thank you!

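One problem you're encountering comes from holding an immutable Map in a var: voteCount.updated(x, i+1) returns a new Map and leaves voteCount unchanged, so the incremented value is thrown away. Change that to a val holding a mutable Map, declared as follows, and increment with voteCount(x) = i + 1 instead of updated.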
val voteCount:collection.mutable.Map[String,Int] = collection.mutable.Map()
Having said that, there are a number of other issues with the code that make it non-idiomatic Scala.
What you really want is something closer to this.
val voteCount = voteTypeList.groupBy(identity).mapValues(_.length)
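To see what that one-liner produces, here is a minimal sketch on an in-memory list. The object name VoteCountGroupBy, the countVotes helper and the sample votes are made up for illustration, and it assumes voteTypeList behaves like an ordinary Scala collection of strings. It maps over the grouped result instead of calling mapValues, because mapValues on a Map is deprecated in Scala 2.13 and returns a lazy view there.

```scala
object VoteCountGroupBy {
  // Group equal votes together, then replace each group by its size.
  def countVotes(votes: Seq[String]): Map[String, Int] =
    votes.groupBy(identity).map { case (candidate, group) => candidate -> group.length }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data in the shape described by the question.
    val votes = Seq("3", "4", "2", "3", "2", "1", "1", "1", "9")
    println(countVotes(votes))
  }
}
```

The result is an immutable Map, so no var or mutable collection is needed.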

Jwvh's answer is idiomatic but non-obvious and, if the list has very large numbers of duplicates, memory-heavy.
You might also consider the literal-minded (yet still Scalarific) way of doing this:
val voteCount = voteTypeList.foldLeft(Map.empty[String, Int]) { (v, x) =>
v + (x -> (v.getOrElse(x, 0) + 1))
}

Related

Reducing/Folding a List of Strings to a Map[String,Boolean] in Scala

I have a list like this:
val objectKeys = List("Name","Place","Animal","Thing");
I want to reduce it to a Map[String,Boolean] where Boolean is element.size < 8.
Here's what I wrote:
val mappedObject = objectKeys.fold(Map[String,Boolean])((map,key) => map + (key -> key.size < 8))
which gives me the following error:
value + is not a member of Object, but could be made available as an extension method.
and
value size is not a member of Object
My understanding about fold is that it takes a default argument and reduces the entire value around it which however doesn't seem to work in this case. Can anyone help me with this?
Ideally mappedObject should be like:
val mappedObject = Map[String,Boolean]("Name"->true,"Place"->true,"Animal"->true,"Thing"->true)
An equivalent JavaScript implementation would be:
const listValues = ["Name","Place","Animal","Thing"];
const reducedObject = listValues.reduce((acc,curr) => {acc[curr] = curr.length < 8;
return acc;
},{});
If you really want to do it with a fold, that's easy to do:
objectKeys.foldLeft(Map.empty[String, Boolean]) { (acc, key) =>
acc + ((key, key.length < 8))
}
That said, I'm with Ivan on this one. map is clearly the better solution here (or fproduct if you're using the cats library).
I think in this case you should just map each key to a tuple of the key and the boolean check, and then convert the result to a Map[String, Boolean] with the toMap method, as follows:
objectKeys.map(key => (key, key.length < 8)).toMap
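As a runnable sketch of the map-then-toMap approach (the object name KeyLengthFlags and the helper shortKeyFlags are invented for this example): fold needs the accumulator and the elements to share a type, so with a Map as the start value and String elements the compiler widens both to a common supertype, which is why + and size were reported as missing. Also, key -> key.size < 8 parses as (key -> key.size) < 8, so the comparison needs its own parentheses or an explicit tuple.

```scala
object KeyLengthFlags {
  // Pair each key with the result of the length check, then build a Map from the pairs.
  def shortKeyFlags(keys: List[String]): Map[String, Boolean] =
    keys.map(key => (key, key.length < 8)).toMap

  def main(args: Array[String]): Unit = {
    val objectKeys = List("Name", "Place", "Animal", "Thing")
    println(shortKeyFlags(objectKeys))
  }
}
```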

working with RDDS and collections in Scala

I have the below function that takes an array of means of type Likes (type Likes=Int)
and an RDD of numbers of type Likes (likesVector). For each number in the likesVector RDD, it computes the distance from each mean in means array and chooses the mean which has the least distance (val distance = (mean-number).abs). While I expect a result of type Map[Likes,Array[Likes]], I get an empty map. Map[Likes,Array[Likes]] represents (mean->Array of number-nearest numbers).
What is the best way to achieve this? I suspect it has a lot to do with the mutability of Scala collections.
def assignDataPoints(means:Array[Likes],likesVector:RDD[Likes]): Map[Likes,Array[Likes]] ={
var likes_Mean = IntMap(1->1)
var likes_mean_final = mutable.Map.empty[Likes,Array[Likes]]
likesVector.map(dataPoint => {
means.foldLeft(Array.empty[Likes])( (accumulator, mean)=> {
val dist= computeDistance(dataPoint,mean)
val nearestMean = if (dist < accumulator(0)) {
accumulator(0)=dist
accumulator(0)
} else{
accumulator(0)
}
val b= IntMap(nearestMean.toInt -> dataPoint)
println("b:"+ b)
likes_mean_final ++ likes_Mean.intersectionWith(b,(_, av, bv: Likes) => Array(av, bv))
accumulator
})})
likes_mean_final.toMap
}
The reason for your empty map is that you are using the ++ operation in this line:
likes_mean_final ++ likes_Mean.intersectionWith(b,(_, av, bv: Likes) => Array(av, bv))
The ++ operation returns a new map, so this line creates a new object and leaves likes_mean_final unchanged.
The operation that mutates the existing mutable map is ++=. So you should use:
likes_mean_final ++= likes_Mean.intersectionWith(b,(_, av, bv: Likes) => Array(av, bv))
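There is also no need for var here, since neither variable is ever reassigned; val is enough.
More generally, idiomatic Scala avoids mutable state where it can. This matters even more with Spark: likesVector.map is a lazy transformation whose closure runs on the executors against a serialized copy of likes_mean_final, so updates made inside it are not visible on the driver. It is safer to compute the result with transformations and return it than to mutate a shared map.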
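The difference between ++ and ++= can be checked on a plain mutable map without Spark. This is only an illustrative sketch: the object AppendInPlace, the helper sizesAfterAppends and the sample entry are invented for the example.

```scala
import scala.collection.mutable

object AppendInPlace {
  // Returns (size of the map built by ++, size of target after ++, size of target after ++=).
  def sizesAfterAppends(): (Int, Int, Int) = {
    val target = mutable.Map.empty[Int, Array[Int]]
    val extra = Map(1 -> Array(1, 5))

    val combined = target ++ extra // builds a new map; target is untouched
    val afterPlusPlus = target.size

    target ++= extra // adds the entries to target itself
    val afterPlusPlusEq = target.size

    (combined.size, afterPlusPlus, afterPlusPlusEq)
  }

  def main(args: Array[String]): Unit =
    println(sizesAfterAppends()) // prints (1,0,1)
}
```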

What would be the idiomatic way to map this data?

I'm pretty new to Scala and while working I found the need to map some data found within a log file. The log file follows this format (values changed from original):
1343,37284.ab1-tbd,283
1344,37284.ab1-tbd,284
1345,37284.ab1-tbd,0
1346,28374.ab1-tbd,107
1347,28374.ab1-tbd,0
...
The first number is not important, but the number portion of the second field and the third field are what need to be mapped. I need the map to have keys that correspond to the number portion of the second field that map to a list of every 3rd field that follows it. That was a bad explanation, so as an example here is what I would need after parsing the above log:
{
37284 => { 283, 284, 0 }
28374 => { 107, 0 }
}
The solution I came up with is this:
val data = for (line <- Source fromFile "path/to/log" getLines) yield line.split(',')
val ls = data.toList
val keys = ls.map(_(1).split('.')(0).toInt)
val vals = ls.map(_(2).toInt)
val keys2vals = for {
(k, v) <- (keys zip vals).groupBy(_._1)
list = v.map(_._2)
} yield (k, list)
Is there a more idiomatic way to do this in Scala? This seems kinda awkward and convoluted to me. (When explaining, please assume little to no background knowledge of language features, etc.) Also, if later down the line I wanted to exclude the number zero from the mappings, how would I do so?
EDIT:
In addition, how would I similarly turn the data into the form:
{
{ 37284, { 283 ,284, 0 } }
{ 28374, { 107, 0 } }
}
i.e. a List[(Int, List[Int])]? (This form is for use with apache-spark's indexed rdds)
How about:
val assocList = for {
line <- Source.fromFile("path/to/log").getLines
Array(_, snd, thd) = line.split(',')
} yield (snd.split('.')(0).toInt, thd.toInt)
assocList.toList.groupBy(_._1).mapValues(_.map(_._2))
If you want a List[(Int, List[Int])], add .toList.
I might be tempted to write it in fewer lines (arguably clearer too) like this:
val l = List((1343,"37284.ab1-tbd",283),
(1344,"37284.ab1-tbd",284),
(1345,"37284.ab1-tbd",0),
(1346,"28374.ab1-tbd",107),
(1347,"28374.ab1-tbd",0))
// drop the unused data
val m = l.map(a => a._2.split('.')(0).toInt -> a._3)
// transform to Map of key -> matchedValues
m.groupBy(_._1) mapValues (_ map (_._2))
gives:
m: List[(Int, Int)] = List((37284,283), (37284,284), (37284,0), (28374,107), (28374,0))
res0: scala.collection.immutable.Map[Int,List[Int]] = Map(37284 -> List(283, 284, 0), 28374 -> List(107, 0))
"Also, if later down the line I wanted to exclude the number zero from the mappings, how would I do so?" - You could filter the initial list:
val m = l.filter(_._3 != 0).map(a => a._2.split('.')(0).toInt -> a._3)
To convert to List[(Int, List[Int])] you just need to call .toList on the resulting Map.
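Another option is to split on both delimiters at once:
val lines = io.Source.fromFile("path/to/log").getLines.toList
lines.map{x=>
// fields after splitting: id, numeric key, key suffix, value
val Array(_,second,_,fourth) = x.split("[,.]")
(second.toInt,fourth.toInt)
}.groupBy(_._1)
.mapValues(_.map(_._2))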
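Putting the pieces together, here is a self-contained sketch that works on in-memory lines rather than a file, so it can be run as is. The names LogGrouping, parse, group and the dropZeros flag are hypothetical, and the sketch assumes every well-formed line has exactly three comma-separated fields with numeric key and value parts.

```scala
object LogGrouping {
  // Turn "id,key.suffix,value" into (key, value); lines with a different field count are skipped.
  def parse(line: String): Option[(Int, Int)] =
    line.split(',') match {
      case Array(_, second, third) => Some((second.takeWhile(_ != '.').toInt, third.toInt))
      case _                       => None
    }

  // Group the values by key, optionally leaving out zero values.
  def group(lines: Seq[String], dropZeros: Boolean = false): Map[Int, List[Int]] = {
    val pairs = lines.flatMap(parse).filter { case (_, value) => !dropZeros || value != 0 }
    pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2).toList }
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "1343,37284.ab1-tbd,283",
      "1344,37284.ab1-tbd,284",
      "1345,37284.ab1-tbd,0",
      "1346,28374.ab1-tbd,107",
      "1347,28374.ab1-tbd,0"
    )
    println(group(lines))
    println(group(lines, dropZeros = true).toList) // List[(Int, List[Int])]
  }
}
```

Filtering before grouping keeps the zero-exclusion rule in one place, and calling .toList on the resulting Map gives the List[(Int, List[Int])] form asked about in the edit.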

Appending element to list in Scala

val indices: List[Int] = List()
val featValues: List[Double] = List()
for (f <- feat) {
val q = f.split(':')
if (q.length == 2) {
println(q.mkString("\n")) // works fine, displays info
indices :+ (q(0).toInt)
featValues :+ (q(1).toDouble)
}
}
println(indices.mkString("\n") + indices.length) // prints nothing and 0?
indices and featValues are not being filled. I'm at a loss here.
You cannot append anything to an immutable data structure such as List stored in a val (immutable named slot).
What your code is doing is creating a new list every time with one element appended, and then throwing it away (by not doing anything with it) — the :+ method on lists does not modify the list in place (even when it's a mutable list such as ArrayBuffer) but always returns a new list.
In order to achieve what you want, the quickest way (as opposed to the right way) is either to use a var (typically preferred):
var xs = List.empty[Int]
xs :+= 123 // same as `xs = xs :+ 123`
or a val containing a mutable collection:
import scala.collection.mutable.ArrayBuffer
val buf = ArrayBuffer.empty[Int]
buf += 123
However, if you really want to make your code idiomatic, you should instead just use a functional approach:
val indicesAndFeatVals = feat.map { f =>
val Array(q0, q1) = f.split(':') // pattern matching in action
(q0.toInt, q1.toDouble)
}
which will give you a sequence of pairs, which you can then unzip into two separate collections:
val (indices, featVals) = indicesAndFeatVals.unzip
This approach will avoid the use of any mutable data structures as well as vars (i.e. mutable slots).
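One caveat with the functional version: a val Array(q0, q1) = ... pattern throws a MatchError when the split does not yield exactly two parts, whereas the original loop silently skipped such entries. A possible variant that keeps the skip behaviour uses collect; the object name FeatureParsing, the helper parseFeatures and the sample input are assumptions made for the sketch.

```scala
object FeatureParsing {
  // Keep only entries of the form "index:value"; anything that does not split
  // into exactly two parts is dropped, mirroring the q.length == 2 check.
  def parseFeatures(feat: Seq[String]): (Seq[Int], Seq[Double]) =
    feat
      .map(_.split(':'))
      .collect { case Array(index, value) => (index.toInt, value.toDouble) }
      .unzip

  def main(args: Array[String]): Unit = {
    val (indices, featValues) = parseFeatures(Seq("1:0.5", "7:2.0", "malformed"))
    println(indices.mkString(","))
    println(featValues.mkString(","))
  }
}
```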

How to access/initialize and update values in a mutable map?

Consider the simple problem of using a mutable map to keep track of occurrences/counts, i.e. with:
val counts = collection.mutable.Map[SomeKeyType, Int]()
My current approach to incrementing a count is:
counts(key) = counts.getOrElse(key, 0) + 1
// or equivalently
counts.update(key, counts.getOrElse(key, 0) + 1)
This somehow feels a bit clumsy, because I have to specify the key twice. In terms of performance, I would also expect that key has to be located twice in the map, which I would like to avoid. Interestingly, this access and update problem would not occur if Int would provide some mechanism to modify itself. Changing from Int to a Counter class that provides an increment function would for instance allow:
// not possible with Int
counts.getOrElseUpdate(key, 0) += 1
// but with a modifiable counter
counts.getOrElseUpdate(key, new Counter).increment
Somehow I'm always expecting to have the following functionality with a mutable map (somewhat similar to transform but without returning a new collection and on a specific key with a default value):
// fictitious use
counts.updateOrElse(key, 0, _ + 1)
// or alternatively
counts.getOrElseUpdate(key, 0).modify(_ + 1)
However as far as I can see, such a functionality does not exist. Wouldn't it make sense in general (performance and syntax wise) to have such a f: A => A in-place modification possibility? Probably I'm just missing something here... I guess there must be some better solution to this problem making such a functionality unnecessary?
Update:
I should have clarified that I'm aware of withDefaultValue, but the problem remains the same: performing two lookups is still twice as slow as performing one, whether or not each lookup is O(1). Frankly, in many situations I would be more than happy to achieve a speed-up by a factor of 2. And the modification closure can often be constructed outside the loop, so imho this is not a big issue compared to running an operation unnecessarily twice.
You could create the map with a default value, which would allow you to do the following:
scala> val m = collection.mutable.Map[String, Int]().withDefaultValue(0)
m: scala.collection.mutable.Map[String,Int] = Map()
scala> m.update("a", m("a") + 1)
scala> m
res6: scala.collection.mutable.Map[String,Int] = Map(a -> 1)
As Impredicative mentioned, map lookups are fast so I wouldn't worry about 2 lookups.
Update:
As Debilski pointed out you can do this even more simply by doing the following:
scala> val m = collection.mutable.Map[String, Int]().withDefaultValue(0)
scala> m("a") += 1
scala> m
res6: scala.collection.mutable.Map[String,Int] = Map(a -> 1)
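Outside the REPL, the same idea can be wrapped in a small counting helper. This is a sketch only: DefaultValueCounter and countWords are names invented here. Note that counts(w) += 1 is syntactic sugar for counts(w) = counts(w) + 1, so it is shorter to write but still performs a lookup followed by an update.

```scala
import scala.collection.mutable

object DefaultValueCounter {
  def countWords(words: Seq[String]): Map[String, Int] = {
    // Missing keys read as 0, so the first increment for a word stores 1.
    val counts = mutable.Map.empty[String, Int].withDefaultValue(0)
    words.foreach(w => counts(w) += 1)
    counts.toMap
  }

  def main(args: Array[String]): Unit =
    println(countWords(Seq("a", "b", "a")))
}
```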
Starting with Scala 2.13, updateWith on mutable maps serves this exact purpose:
map.updateWith("a")({
case Some(count) => Some(count + 1)
case None => Some(1)
})
Its signature is:
def updateWith(key: K)(remappingFunction: (Option[V]) => Option[V]): Option[V]
For instance, if the key doesn't exist:
val map = collection.mutable.Map[String, Int]()
// map: collection.mutable.Map[String, Int] = HashMap()
map.updateWith("a")({ case Some(count) => Some(count + 1) case None => Some(1) })
// Option[Int] = Some(1)
map
// collection.mutable.Map[String, Int] = HashMap("a" -> 1)
and if the key exists:
map.updateWith("a")({ case Some(count) => Some(count + 1) case None => Some(1) })
// Option[Int] = Some(2)
map
// collection.mutable.Map[String, Int] = HashMap("a" -> 2)
I wanted to lazy-initialise my mutable map instead of doing a fold (for memory efficiency). The collection.mutable.Map.getOrElseUpdate() method suited my purposes. My map contained a mutable object for summing values (again, for efficiency).
val accum = accums.getOrElseUpdate(key, new Accum)
accum.add(elem.getHours, elem.getCount)
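Note that collection.mutable.Map.withDefaultValue() does not store the default in the map when a missing key is requested, so the key is still absent on the next lookup. getOrElseUpdate() inserts the value it returns.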
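The behaviour of the two methods can be compared directly. The following is a minimal sketch with invented names (DefaultVsGetOrElseUpdate, demo); it only reads a missing key from each kind of map and then checks whether the key was stored.

```scala
import scala.collection.mutable

object DefaultVsGetOrElseUpdate {
  // Returns (value read via default, key stored?, value read via getOrElseUpdate, key stored?).
  def demo(): (Int, Boolean, Int, Boolean) = {
    val withDefault = mutable.Map.empty[String, Int].withDefaultValue(0)
    val readDefault = withDefault("a")              // 0, supplied by the default
    val storedDefault = withDefault.contains("a")   // false: nothing was inserted

    val plain = mutable.Map.empty[String, Int]
    val readInserted = plain.getOrElseUpdate("a", 0) // 0, and the entry is inserted
    val storedInserted = plain.contains("a")         // true

    (readDefault, storedDefault, readInserted, storedInserted)
  }

  def main(args: Array[String]): Unit =
    println(demo()) // prints (0,false,0,true)
}
```

This is why getOrElseUpdate fits the lazy-initialisation case with a mutable accumulator object: the object has to end up in the map for later additions to be kept.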