How to access/initialize and update values in a mutable map? - scala

Consider the simple problem of using a mutable map to keep track of occurrences/counts, i.e. with:
val counts = collection.mutable.Map[SomeKeyType, Int]()
My current approach to incrementing a count is:
counts(key) = counts.getOrElse(key, 0) + 1
// or equivalently
counts.update(key, counts.getOrElse(key, 0) + 1)
This somehow feels a bit clumsy, because I have to specify the key twice. In terms of performance, I would also expect the key to be located twice in the map, which I would like to avoid. Interestingly, this access-and-update problem would not occur if Int provided some mechanism to modify itself. Changing from Int to a Counter class that provides an increment function would, for instance, allow:
// not possible with Int
counts.getOrElseUpdate(key, 0) += 1
// but with a modifiable counter
counts.getOrElseUpdate(key, new Counter).increment
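For concreteness, such a Counter could be as simple as the following sketch (hypothetical, not part of the standard library):
class Counter {
  // Mutable state standing in for the immutable Int.
  private var n = 0
  def increment: Unit = n += 1
  def value: Int = n
}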
Somehow I'm always expecting to have the following functionality with a mutable map (somewhat similar to transform but without returning a new collection and on a specific key with a default value):
// fictitious use
counts.updateOrElse(key, 0, _ + 1)
// or alternatively
counts.getOrElseUpdate(key, 0).modify(_ + 1)
However, as far as I can see, such functionality does not exist. Wouldn't it make sense in general (performance- and syntax-wise) to have such an f: A => A in-place modification possibility? Probably I'm just missing something here... I guess there must be some better solution to this problem making such functionality unnecessary?
Update:
I should have clarified that I'm aware of withDefaultValue, but the problem remains the same: performing two lookups is still twice as slow as one, no matter whether each is an O(1) operation or not. Frankly, in many situations I would be more than happy to achieve a speed-up of factor 2. And obviously the construction of the modification closure can often be moved outside the loop, so imho this is not a big issue compared to running an operation unnecessarily twice.
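For illustration, the updateOrElse I have in mind could be retrofitted as an enrichment (a hypothetical sketch; note that this naive version still pays for two lookups internally, so only a built-in could avoid that):
implicit class UpdateOrElseOps[K, V](private val m: collection.mutable.Map[K, V]) extends AnyVal {
  // Applies f to the current value, or to `default` if the key is absent.
  def updateOrElse(key: K, default: V, f: V => V): Unit =
    m.update(key, f(m.getOrElse(key, default)))
}

counts.updateOrElse(key, 0, _ + 1)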

You could create the map with a default value, which would allow you to do the following:
scala> val m = collection.mutable.Map[String, Int]().withDefaultValue(0)
m: scala.collection.mutable.Map[String,Int] = Map()
scala> m.update("a", m("a") + 1)
scala> m
res6: scala.collection.mutable.Map[String,Int] = Map(a -> 1)
As Impredicative mentioned, map lookups are fast so I wouldn't worry about 2 lookups.
Update:
As Debilski pointed out you can do this even more simply by doing the following:
scala> val m = collection.mutable.Map[String, Int]().withDefaultValue(0)
scala> m("a") += 1
scala> m
res6: scala.collection.mutable.Map[String,Int] = Map(a -> 1)
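Note that m("a") += 1 is purely syntactic sugar; roughly, the compiler rewrites it as:
// what the compiler generates, approximately:
m.update("a", m.apply("a") + 1)
// so the key is still looked up twice; withDefaultValue removes the
// need for getOrElse, not the double lookup.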

Starting with Scala 2.13, Map#updateWith serves this exact purpose:
map.updateWith("a")({
case Some(count) => Some(count + 1)
case None => Some(1)
})
def updateWith(key: K)(remappingFunction: (Option[V]) => Option[V]): Option[V]
For instance, if the key doesn't exist:
val map = collection.mutable.Map[String, Int]()
// map: collection.mutable.Map[String, Int] = HashMap()
map.updateWith("a")({ case Some(count) => Some(count + 1) case None => Some(1) })
// Option[Int] = Some(1)
map
// collection.mutable.Map[String, Int] = HashMap("a" -> 1)
and if the key exists:
map.updateWith("a")({ case Some(count) => Some(count + 1) case None => Some(1) })
// Option[Int] = Some(2)
map
// collection.mutable.Map[String, Int] = HashMap("a" -> 2)

I wanted to lazy-initialise my mutable map instead of doing a fold (for memory efficiency). The collection.mutable.Map.getOrElseUpdate() method suited my purposes. My map contained a mutable object for summing values (again, for efficiency).
val accum = accums.getOrElseUpdate(key, new Accum)
accum.add(elem.getHours, elem.getCount)
Note that collection.mutable.Map.withDefaultValue() does not store the default value in the map when a missing key is requested; getOrElseUpdate does insert the computed value, which is what I needed here.
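The Accum class isn't shown above; a minimal version along the described lines might look like this (hypothetical sketch):
import scala.collection.mutable

// Mutable accumulator, updated in place for efficiency.
class Accum {
  private var hours = 0L
  private var count = 0L
  def add(h: Long, c: Long): Unit = { hours += h; count += c }
  def total: (Long, Long) = (hours, count)
}

val accums = mutable.Map[String, Accum]()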

Related

reduce a list in scala by value

How can I reduce a list like below concisely
Seq[Temp] = List(Temp(a,1), Temp(a,2), Temp(b,1))
to
List(Temp(a,2), Temp(b,1))
Only keep Temp objects with unique first param and max of second param.
My solution is with lot of groupBys and reduces which is giving a lengthy answer.
You have to:
- groupBy the key
- sortBy values in ascending order
- take the last one, which is the largest
For example,
scala> final case class Temp (a: String, value: Int)
defined class Temp
scala> val data : Seq[Temp] = List(Temp("a",1), Temp("a",2), Temp("b",1))
data: Seq[Temp] = List(Temp(a,1), Temp(a,2), Temp(b,1))
scala> data.groupBy(_.a).map { case (k, group) => group.sortBy(_.value).last }
res0: scala.collection.immutable.Iterable[Temp] = List(Temp(b,1), Temp(a,2))
or, instead of sortBy(fn).last, you can use maxBy(fn):
scala> data.groupBy(_.a).map { case (k, group) => group.maxBy(_.value) }
res1: scala.collection.immutable.Iterable[Temp] = List(Temp(b,1), Temp(a,2))
You can generate a Map with groupBy, compute the max in mapValues and convert it back to the Temp classes as in the following example:
case class Temp(id: String, value: Int)
List(Temp("a", 1), Temp("a", 2), Temp("b", 1)).
groupBy(_.id).mapValues( _.map(_.value).max ).
map{ case (k, v) => Temp(k, v) }
// res1: scala.collection.immutable.Iterable[Temp] = List(Temp(b,1), Temp(a,2))
Worth noting that the solution using maxBy in the other answer is more efficient as it minimizes necessary transformations.
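As a side note, on Scala 2.13+ the grouping and the max can be fused into a single pass with groupMapReduce (a sketch, using the Temp(a, value) and data definitions from the first example):
data.groupMapReduce(_.a)(identity)((t1, t2) => if (t1.value >= t2.value) t1 else t2)
  .values.toList
// e.g. List(Temp(b,1), Temp(a,2)); a Map's value order is not guaranteed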
You can do this using foldLeft:
import scala.math.max

data.foldLeft(Map[String, Int]().withDefaultValue(0)) { (map, tmp) =>
  map.updated(tmp.id, max(map(tmp.id), tmp.value))
}.map { case (i, v) => Temp(i, v) }
This is essentially combining the logic of groupBy with the max operation in a single pass.
Note: this may be less efficient than the groupBy approach, because groupBy uses a mutable.Map internally, which avoids constantly re-creating a new map. If you care about performance and are prepared to use mutable data, this is another option:
import scala.collection.mutable
import scala.math.max

val tmpMap = mutable.Map[String, Int]().withDefaultValue(0)
data.foreach(tmp => tmpMap(tmp.id) = max(tmp.value, tmpMap(tmp.id)))
tmpMap.map { case (i, v) => Temp(i, v) }.toList
Use a ListMap if you need to retain the data order, or sort at the end if you need a particular ordering.

How to extract elements from 4 lists in scala?

case class TargetClass(key: Any, value: Number, lowerBound: Double, upperBound: Double)
val keys: List[Any] = List("key1", "key2", "key3")
val values: List[Number] = List(1,2,3);
val lowerBounds: List[Double] = List(0.1, 0.2, 0.3)
val upperBounds: List[Double] = List(0.5, 0.6, 0.7)
Now I want to construct a List[TargetClass] to hold the 4 lists. Does anyone know how to do it efficiently? Is using a for-loop to add the elements one by one very inefficient?
I tried to use zipped, but it seems that this only applies for combining up to 3 lists.
Thank you very much!
One approach:
keys.zipWithIndex.map {
  case (item, i) => TargetClass(item, values(i), lowerBounds(i), upperBounds(i))
}
You may want to consider using the lift method to deal with the case of the lists being of unequal lengths (and thereby provide a default if keys is longer than any of the other lists).
I realise this doesn't address your question of efficiency. You could fairly easily run some tests on different approaches.
You can apply zipped to the first two lists, to the last two lists, then to the results of the previous zips, then map to your class, like so:
val z12 = (keys, values).zipped
val z34 = (lowerBounds, upperBounds).zipped
val z1234 = (z12.toList, z34.toList).zipped
val targs = z1234.map { case ((k,v),(l,u)) => TargetClass(k,v,l,u) }
// targs = List(TargetClass(key1,1,0.1,0.5), TargetClass(key2,2,0.2,0.6), TargetClass(key3,3,0.3,0.7))
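On Scala 2.13+, lazyZip (the successor to the now-deprecated zipped) chains beyond three collections, so the intermediate toList calls go away; a sketch:
val targs = (keys lazyZip values lazyZip lowerBounds lazyZip upperBounds).map {
  (k, v, l, u) => TargetClass(k, v, l, u)
}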
How about:
keys zip values zip lowerBounds zip upperBounds map {
  case (((k, v), l), u) => TargetClass(k, v, l, u)
}
Example:
scala> val zipped = keys zip values zip lowerBounds zip upperBounds
zipped: List[(((Any, Number), Double), Double)] = List((((key1,1),0.1),0.5), (((key2,2),0.2),0.6), (((key3,3),0.3),0.7))
scala> zipped map { case (((k, v), l), u) => TargetClass(k, v, l, u) }
res6: List[TargetClass] = List(TargetClass(key1,1,0.1,0.5), TargetClass(key2,2,0.2,0.6), TargetClass(key3,3,0.3,0.7))
It would be nice if .transpose worked on a Tuple of Lists.
for (List(k, v: Number, l: Double, u: Double) <-
       List(keys, values, lowerBounds, upperBounds).transpose)
  yield TargetClass(k, v, l, u)
I think no matter what you use, from an efficiency point of view you will have to traverse the lists individually. The only question is: do you do it yourself, or, for the sake of readability, do you use Scala idioms and let Scala do the dirty work for you :)?
Other approaches are not necessarily more efficient. You can change the order of zipping and the order of assembling the return value of the map function as you like.
Here is a more functional way, but I am not sure it will be more efficient. See the comments on @wwkudu's answer (zip with index):
val res1 = keys zip lowerBounds zip values zip upperBounds
res1.map { x =>
  (x._1._1._1, x._1._1._2, x._1._2, x._2)
  // Of course, you can return an instance of TargetClass
  // here instead of the tuple I am returning.
}
I am curious, why do you need a TargetClass? Will a tuple work?

How do you stop building an Option[Collection] upon reaching the first None?

When building up a collection inside an Option, each attempt to make the next member of the collection might fail, making the collection as a whole a failure, too. Upon the first failure to make a member, I'd like to give up immediately and return None for the whole collection. What is an idiomatic way to do this in Scala?
Here's one approach I've come up with:
def findPartByName(name: String): Option[Part] = . . .
def allParts(names: Seq[String]): Option[Seq[Part]] =
  names.foldLeft(Some(Seq.empty): Option[Seq[Part]]) {
    (result, name) => result match {
      case Some(parts) =>
        findPartByName(name) flatMap { part => Some(parts :+ part) }
      case None => None
    }
  }
In other words, if any call to findPartByName returns None, allParts returns None. Otherwise, allParts returns a Some containing a collection of Parts, all of which are guaranteed to be valid. An empty collection is OK.
The above has the advantage that it stops calling findPartByName after the first failure. But the foldLeft still iterates once for each name, regardless.
Here's a version that bails out as soon as findPartByName returns a None:
def allParts2(names: Seq[String]): Option[Seq[Part]] = Some(
  for (name <- names) yield findPartByName(name) match {
    case Some(part) => part
    case None => return None
  }
)
I currently find the second version more readable, but (a) what seems most readable is likely to change as I get more experience with Scala, (b) I get the impression that early return is frowned upon in Scala, and (c) neither one seems to make what's going on especially obvious to me.
The combination of "all-or-nothing" and "give up on the first failure" seems like such a basic programming concept, I figure there must be a common Scala or functional idiom to express it.
The return in your code is actually a couple levels deep in anonymous functions. As a result, it must be implemented by throwing an exception which is caught in the outer function. This isn't efficient or pretty, hence the frowning.
It is easiest and most efficient to write this with a while loop and an Iterator.
def allParts3(names: Seq[String]): Option[Seq[Part]] = {
  val iterator = names.iterator
  var accum = List.empty[Part]
  while (iterator.hasNext) {
    findPartByName(iterator.next()) match {
      case Some(part) => accum +:= part
      case None => return None
    }
  }
  Some(accum.reverse)
}
Because we don't know what kind of Seq names is, we must create an iterator to loop over it efficiently rather than using tail or indexes. The while loop can be replaced with a tail-recursive inner function, but with the iterator a while loop is clearer.
Scala collections have some options to use laziness to achieve that.
You can use view and takeWhile:
def allPartsWithView(names: Seq[String]): Option[Seq[Part]] = {
  val successes = names.view.map(findPartByName)
    .takeWhile(!_.isEmpty)
    .map(_.get)
    .force
  if (!names.isDefinedAt(successes.size)) Some(successes)
  else None
}
Using isDefinedAt avoids potentially traversing a long input names in the case of an early failure.
You could also use toStream and span to achieve the same thing:
def allPartsWithStream(names: Seq[String]): Option[Seq[Part]] = {
  val (good, bad) = names.toStream.map(findPartByName)
    .span(!_.isEmpty)
  if (bad.isEmpty) Some(good.map(_.get).toList)
  else None
}
I've found trying to mix view and span causes findPartByName to be evaluated twice per item in case of success.
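On Scala 2.13+, where Stream is deprecated in favour of LazyList, the same span-based approach would look like this (a sketch):
def allPartsWithLazyList(names: Seq[String]): Option[Seq[Part]] = {
  val (good, bad) = names.to(LazyList).map(findPartByName).span(_.isDefined)
  if (bad.isEmpty) Some(good.map(_.get).toList)
  else None
}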
The whole idea of returning an error condition if any error occurs does, however, sound more like a job ("the" job?) for throwing and catching exceptions. I suppose it depends on the context in your program.
Combining the other answers, i.e., a mutable flag with the map and takeWhile we love.
Given an infinite stream:
scala> var count = 0
count: Int = 0
scala> val vs = Stream continually { println(s"Compute $count") ; count += 1 ; count }
Compute 0
vs: scala.collection.immutable.Stream[Int] = Stream(1, ?)
Take until a predicate fails:
scala> var failed = false
failed: Boolean = false
scala> vs map { case x if x < 5 => println(s"Yup $x"); Some(x) case x => println(s"Nope $x"); failed = true; None } takeWhile (_.nonEmpty) map (_.get)
Yup 1
res0: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> .toList
Compute 1
Yup 2
Compute 2
Yup 3
Compute 3
Yup 4
Compute 4
Nope 5
res1: List[Int] = List(1, 2, 3, 4)
or more simply:
scala> var count = 0
count: Int = 0
scala> val vs = Stream continually { println(s"Compute $count") ; count += 1 ; count }
Compute 0
vs: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> var failed = false
failed: Boolean = false
scala> vs map { case x if x < 5 => println(s"Yup $x"); x case x => println(s"Nope $x"); failed = true; -1 } takeWhile (_ => !failed)
Yup 1
res3: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> .toList
Compute 1
Yup 2
Compute 2
Yup 3
Compute 3
Yup 4
Compute 4
Nope 5
res4: List[Int] = List(1, 2, 3, 4)
I think your allParts2 function has a problem as one of the two branches of your match statement will perform a side effect. The return statement is the not-idiomatic bit, behaving as if you are doing an imperative jump.
The first function looks better, but if you are concerned with the sub-optimal iteration that foldLeft could produce you should probably go for a recursive solution as the following:
import scala.annotation.tailrec

def allParts(names: Seq[String]): Option[Seq[Part]] = {
  @tailrec
  def allPartsRec(names: Seq[String], acc: Seq[Part]): Option[Seq[Part]] = names match {
    case Seq(x, xs @ _*) => findPartByName(x) match {
      case Some(part) => allPartsRec(xs, acc :+ part)
      case None => None
    }
    case _ => Some(acc)
  }
  allPartsRec(names, Seq.empty)
}
I didn't compile/run it but the idea should be there and I believe it is more idiomatic than using the return trick!
I keep thinking that this has to be a one- or two-liner. I came up with one:
def allParts4(names: Seq[String]): Option[Seq[Part]] = Some(
  names.map(findPartByName(_) getOrElse { return None })
)
Advantage:
The intent is extremely clear. There's no clutter and there's no exotic or nonstandard Scala.
Disadvantages:
The early return violates referential transparency, as Aldo Stracquadanio pointed out. You can't put the body of allParts4 into its calling code without changing its meaning.
Possibly inefficient due to the internal throwing and catching of an exception, as wingedsubmariner pointed out.
Sure enough, I put this into some real code, and within ten minutes, I'd enclosed the expression inside something else, and predictably got surprising behavior. So now I understand a little better why early return is frowned upon.
This is such a common operation, so important in code that makes heavy use of Option, and Scala is normally so good at combining things, I can't believe there isn't a pretty natural idiom to do it correctly.
Aren't monads good for specifying how to combine actions? Is there a GiveUpAtTheFirstSignOfResistance monad?
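In effect, yes: this all-or-nothing, fail-fast combination is exactly what traverse provides in functional libraries such as Cats (a sketch, assuming a Cats dependency; Option's applicative short-circuits on the first None):
import cats.implicits._

def allParts5(names: List[String]): Option[List[Part]] =
  names.traverse(findPartByName)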

Strange behavior with reduce and fold on identical code block

EDIT
OK, @dhg discovered that dot-method syntax is required if the code block passed to fold() is not bound to a val (why one can use space-method syntax with reduce() in the same code block, I don't know). At any rate, the end result is the nicely concise:
result.map { row =>
  addLink( row.href, row.label )
}.fold(NodeSeq.Empty)(_ ++ _)
Which negates to some degree the original question; i.e. in many cases one can higher-order away either/or scenarios and avoid "fat", repetitive if/else statements.
ORIGINAL
Trying to reduce if/else handling when working with possibly empty collections like List[T]
For example, let's say I need to grab the latest news articles to build up a NodeSeq of html news <li><a>links</a></li>:
val result = dao.getHeadlines // List[of model objects]
if (result.isEmpty) NodeSeq.Empty
else
  result map { row =>
    addLink( row.href, row.label ) // NodeSeq
  } reduce (_ ++ _)
This is OK, pretty terse, but I find myself wanting to go ternary style to address these only-will-ever-be either/or cases:
result.isEmpty ? NodeSeq.Empty :
  result map { row =>
    addLink( row.href, row.label )
  } reduce (_ ++ _)
I've seen some old postings on pimping ternary onto boolean, but curious to know what the alternatives are, if any, to streamline if/else?
match {...} is, IMO, a bit bloated for this scenario, and for {...} yield doesn't seem to help much either.
You don't need to check for emptiness at all. Just use fold instead of reduce since fold allows you to specify a default "empty" value:
scala> List(1,2,3,4).map(_ + 1).fold(0)(_+_)
res0: Int = 14
scala> List[Int]().map(_ + 1).fold(0)(_+_)
res1: Int = 0
Here's an example with a List of Seqs:
scala> List(1,2).map(Seq(_)).fold(Seq.empty)(_++_)
res14: Seq[Int] = List(1, 2)
scala> List[Int]().map(Seq(_)).fold(Seq.empty)(_++_)
res15: Seq[Int] = List()
EDIT: Looks like the problem in your sample has to do with the dropping of dot (.) characters between methods. If you keep them in, it all works:
scala> List(1,2,3).map(i => node).fold(NodeSeq.Empty)(_ ++ _)
res57: scala.xml.NodeSeq = NodeSeq(<li>Link</li>, <li>Link</li>, <li>Link</li>)
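For completeness, reduceOption offers another standard-library way to sidestep the emptiness check, at the cost of an extra getOrElse (a sketch against the original example):
result.map(row => addLink(row.href, row.label))
  .reduceOption(_ ++ _)
  .getOrElse(NodeSeq.Empty)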

Easiest way to decide if List contains duplicates?

One way is this
list.distinct.size != list.size
Is there any better way? It would have been nice to have a containsDuplicates method
Assuming "better" means "faster", see the alternative approaches benchmarked in this question, which seems to show some quicker methods (although note that distinct uses a HashSet and is already O(n)). YMMV of course, depending on specific test case, scala version etc. Probably any significant improvement over the "distinct.size" approach would come from an early-out as soon as a duplicate is found, but how much of a speed-up is actually obtained would depend strongly on how common duplicates actually are in your use-case.
If you mean "better" in that you want to write list.containsDuplicates instead of containsDuplicates(list), use an implicit:
implicit def enhanceWithContainsDuplicates[T](s: List[T]) = new {
  def containsDuplicates = s.distinct.size != s.size
}
assert(List(1, 2, 2, 3).containsDuplicates)
assert(!List("a", "b", "c").containsDuplicates)
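On more recent Scala versions the same enrichment is usually written as an implicit value class, which avoids the structural type and the runtime reflection it entails (a sketch):
implicit class ContainsDuplicatesOps[T](private val s: List[T]) extends AnyVal {
  def containsDuplicates: Boolean = s.distinct.size != s.size
}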
You can also write:
list.toSet.size != list.size
But the result will be the same, because distinct is already implemented with a Set. In both cases the time complexity should be O(n): you must traverse the list, and Set insertion is O(1).
I think this would stop as soon as a duplicate was found and is probably more efficient than doing distinct.size - since I assume distinct keeps a set as well:
@annotation.tailrec
def containsDups[A](list: List[A], seen: Set[A] = Set[A]()): Boolean =
  list match {
    case x :: xs => if (seen.contains(x)) true else containsDups(xs, seen + x)
    case _ => false
  }
containsDups(List(1,1,2,3))
// Boolean = true
containsDups(List(1,2,3))
// Boolean = false
I realize you asked for easy, and I don't know that this version is, but finding a duplicate is also finding whether there is an element that has been seen before:
def containsDups[A](list: List[A]): Boolean = {
  list.iterator.scanLeft(Set[A]())((set, a) => set + a) // incremental sets
    .zip(list.iterator)
    .exists { case (set, a) => set contains a }
}
@annotation.tailrec
def containsDuplicates[T](s: Seq[T]): Boolean =
  if (s.size < 2) false
  else s.tail.contains(s.head) || containsDuplicates(s.tail)
I didn't measure this, and I think it is similar to huynhjl's solution, but a bit simpler to understand.
It returns early if a duplicate is found, so I looked into the source of Seq.contains to check whether it returns early as well; it does.
In SeqLike, 'contains (e)' is defined as 'exists (_ == e)', and exists is defined in TraversableLike:
def exists(p: A => Boolean): Boolean = {
  var result = false
  breakable {
    for (x <- this)
      if (p(x)) { result = true; break }
  }
  result
}
I'm curious how one could speed things up with parallel collections on multiple cores, but I guess early return is a general problem there: another thread will keep running because it doesn't know that the solution has already been found.
Summary:
I've written a very efficient function which returns both the equivalent of List.distinct and a List of each element that appeared more than once, together with the index at which the duplicate appeared.
Note: This answer is a straight copy of the answer on a related question.
Details:
If you need a bit more information about the duplicates themselves, like I did, I have written a more general function which iterates across a List (as ordering was significant) exactly once and returns a Tuple2 consisting of the original List deduped (all duplicates after the first are removed; i.e. the same as invoking distinct) and a second List showing each duplicate and an Int index at which it occurred within the original List.
Here's the function:
def filterDupes[A](items: List[A]): (List[A], List[(A, Int)]) = {
  def recursive(remaining: List[A], index: Int, accumulator: (List[A], List[(A, Int)])): (List[A], List[(A, Int)]) =
    if (remaining.isEmpty)
      accumulator
    else
      recursive(
          remaining.tail
        , index + 1
        , if (accumulator._1.contains(remaining.head))
            (accumulator._1, (remaining.head, index) :: accumulator._2)
          else
            (remaining.head :: accumulator._1, accumulator._2)
      )
  val (distinct, dupes) = recursive(items, 0, (Nil, Nil))
  (distinct.reverse, dupes.reverse)
}
And below is an example which might make it a bit more intuitive. Given this List of String values:
val withDupes =
List("a.b", "a.c", "b.a", "b.b", "a.c", "c.a", "a.c", "d.b", "a.b")
...and then performing the following:
val (deduped, dupeAndIndexs) =
filterDupes(withDupes)
...the results are:
deduped: List[String] = List(a.b, a.c, b.a, b.b, c.a, d.b)
dupeAndIndexs: List[(String, Int)] = List((a.c,4), (a.c,6), (a.b,8))
And if you just want the duplicates, you simply map across dupeAndIndexes and invoke distinct:
val dupesOnly =
dupeAndIndexs.map(_._1).distinct
...or all in a single call:
val dupesOnly =
filterDupes(withDupes)._2.map(_._1).distinct
...or if a Set is preferred, skip distinct and invoke toSet...
val dupesOnly2 =
dupeAndIndexs.map(_._1).toSet
...or all in a single call:
val dupesOnly2 =
filterDupes(withDupes)._2.map(_._1).toSet
This is a straight copy of the filterDupes function out of my open source Scala library, ScalaOlio. It's located at org.scalaolio.collection.immutable.List_._.
If you're trying to check for duplicates in a test then ScalaTest can be helpful.
import org.scalatest.Inspectors._
import org.scalatest.Matchers._
forEvery(list.distinct) { item =>
  withClue(s"value $item, the number of occurrences") {
    list.count(_ == item) shouldBe 1
  }
}
// example:
scala> val list = List(1,2,3,4,3,2)
list: List[Int] = List(1, 2, 3, 4, 3, 2)
scala> forEvery(list) { item => withClue(s"value $item, the number of occurrences") { list.count(_ == item) shouldBe 1 } }
org.scalatest.exceptions.TestFailedException: forEvery failed, because:
at index 1, value 2, the number of occurrences 2 was not equal to 1 (<console>:19),
at index 2, value 3, the number of occurrences 2 was not equal to 1 (<console>:19)
in List(1, 2, 3, 4)