Fastest way to append sequence objects in loop - scala

I have a for loop within which I get a Seq[Seq[(String,Int)]] on every run. In the usual way I run through the Seq[Seq[(String,Int)]] to get every Seq[(String,Int)] and then append it to a ListBuffer[(String,Int)].
Here is the following code:
import scala.collection.mutable.ListBuffer

var lis: Seq[Seq[(String, Int)]] = _ // produced on every run of the loop
var matches = new ListBuffer[(String, Int)]
someLoop.foreach { k =>
  // someLoop gives the lis object on every run,
  // and that needs to be added to the matches list
  lis.foreach(j => matches.appendAll(j))
}
Is there a better way to do this without running through the Seq[Seq[(String,Int)]] loop, say by directly adding all the inner Seq objects to the ListBuffer?
I tried the ++ operator, by adding matches and lis directly. It didn't work either. I use Scala 2.10.2

Try this:
matches.appendAll(lis.flatten)
Better still, you can avoid the mutable ListBuffer altogether: lis.flatten is already a Seq[(String, Int)], so you can shorten your code to this:
val lis = ... //whatever that is Seq[Seq[(String, Int)]]
val flatLis = lis.flatten // Seq[(String, Int)]
Avoid vars and mutable structures like ListBuffer as much as you can.
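If the surrounding loop is also under your control, here is a minimal sketch of collecting everything without a var or a ListBuffer at all; someLoop and computeLis are hypothetical stand-ins for whatever drives your loop and produces lis on each run:
// Hypothetical stand-ins for the asker's loop source and per-run result.
val someLoop: Seq[Int] = Seq(1, 2, 3)
def computeLis(k: Int): Seq[Seq[(String, Int)]] =
  Seq(Seq(("a", k)), Seq(("b", k * 10)))

// flatMap over the loop source and flatten each per-run result:
// one immutable Seq[(String, Int)] at the end, no mutation needed.
val matches: Seq[(String, Int)] =
  someLoop.flatMap(k => computeLis(k).flatten)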

You don't need to append to an empty ListBuffer, just create it directly:
import collection.breakOut
val matches: ListBuffer[(String,Int)] =
  lis.flatMap(identity)(breakOut)
breakOut is the magic here. flatten itself doesn't accept a builder argument, so we go through the equivalent flatMap(identity), which does. Flattening a Seq[Seq[T]] would usually create a Seq[T] that you'd then have to convert to a ListBuffer; passing breakOut makes the call look at the expected result type and build that kind of collection directly instead.
Of course... You were only using ListBuffer for mutability anyway, so a Seq[T] is probably exactly what you really want. In which case, just let the inferencer do its thing:
val matches = lis.flatten
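A self-contained sketch of both variants, with made-up sample data; the to[ListBuffer] line is just a plainer alternative to breakOut that also works on 2.10:
import scala.collection.breakOut
import scala.collection.mutable.ListBuffer

// Made-up data in the shape of the question's lis.
val lis: Seq[Seq[(String, Int)]] = Seq(Seq(("a", 1), ("b", 2)), Seq(("c", 3)))

// Build the ListBuffer directly via breakOut.
val asBuffer: ListBuffer[(String, Int)] = lis.flatMap(identity)(breakOut)

// Or simply convert after flattening.
val asBuffer2: ListBuffer[(String, Int)] = lis.flatten.to[ListBuffer]

// If mutability isn't needed, the flat immutable Seq is enough.
val flat: Seq[(String, Int)] = lis.flatten
// flat: Seq((a,1), (b,2), (c,3))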

Related

Char count in string

I'm new to scala and FP in general and trying to practice it on a dummy example.
val counts = ransomNote.map(e=>(e,1)).reduceByKey{case (x,y) => x+y}
The following error is raised:
Line 5: error: value reduceByKey is not a member of IndexedSeq[(Char, Int)] (in solution.scala)
The above example looks similar to the classic FP primer on word count; I'd appreciate it if you could point out my mistake.
It looks like you are trying to use a Spark method on a Scala collection. The two APIs have a few similarities, but reduceByKey is not part of it.
In pure Scala you can do it like this:
val counts =
  ransomNote.foldLeft(Map.empty[Char, Int].withDefaultValue(0)) {
    (counts, c) => counts.updated(c, counts(c) + 1)
  }
foldLeft iterates over the collection from the left, using the empty map of counts as the accumulated state (which returns 0 if no value is found for a key); the function passed as an argument updates that state by incrementing the count found for each character.
Note that accessing a map directly (counts(c)) is likely to be unsafe in most situations (since it will throw an exception if no item is found). In this situation it's fine because in this scope I know I'm using a map with a default value. When accessing a map you will more often than not want to use get, which returns an Option. More on that on the official Scala documentation (here for version 2.13.2).
You can play around with this code here on Scastie.
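To make the lookup point above concrete, here is a tiny made-up example of the difference between apply and get on a map without a default value:
val m = Map('a' -> 2) // sample map, no default value

m.get('z') // None: safe lookup, returns an Option
m('z')     // throws NoSuchElementException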
On Scala 2.13 you can use the new groupMapReduce
ransomNote.groupMapReduce(identity)(_ => 1)(_ + _)
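For example (sample input made up here; the resulting map's ordering may differ):
val ransomNote = "aabbc" // made-up sample input

// group by the character itself, map each occurrence to 1,
// and reduce the 1s per key with + to get the counts.
val counts: Map[Char, Int] =
  ransomNote.groupMapReduce(identity)(_ => 1)(_ + _)
// counts: Map(a -> 2, b -> 2, c -> 1)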
val str = "hello"
val countsMap: Map[Char, Int] = str
  .groupBy(identity)
  .mapValues(_.length)
println(countsMap)

Convert Map of mutable Set of strings to Map of immutable set of strings in Scala

I have a "dirtyMap" which is immutable.Map[String, collection.mutable.Set[String]]. I want to convert dirtyMap to immutable Map[String, Set[String]]. Could you please let me know how to do this. I tried couple of ways that didn't produce positive result
Method 1: Using map function
dirtyMap.toSeq.map(e => {
  val key = e._1
  val value = e._2.to[Set]
  e._1 -> e._2
}).toMap()
I'm getting a syntax error.
Method 2: Using foreach
dirtyMap.toSeq.foreach(e => {
  val key = e._1
  val value = e._2.to[Set]
  e._1 -> e._2
}).toMap()
cannot apply toMap to output of foreach
Disclaimer: I am a Scala noob if you couldn't tell.
UPDATE: Method 1 works when I remove the parentheses from the toMap() call. However, the following is a more elegant solution:
dirtyMap.mapValues(v => v.toSet)
Thank you Gabriele for providing an answer with a great explanation. Thanks Duelist and Debojit for your answers as well.
You can simply do:
dirtyMap.mapValues(_.toSet)
mapValues will apply the function to only the values of the Map, and .toSet converts a mutable Set to an immutable one.
(I'm assuming dirtyMap is a collection.immutable.Map. In case it's a mutable one, just add toMap at the end.)
If you're not familiar with the underscore syntax for lambdas, it's a shorthand for:
dirtyMap.mapValues(v => v.toSet)
Now, your first example doesn't compile because of the (). toMap takes no explicit arguments, but it takes an implicit argument. If you want the implicit argument to be inferred automatically, just remove the ().
The second example doesn't work because foreach returns Unit. This means that foreach executes side effects, but it doesn't return a value. If you want to chain transformations on a value, never use foreach, use map instead.
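Putting it together, here is a minimal, self-contained sketch with made-up sample data (written against pre-2.13 collections, where mapValues returns a Map; on 2.13+ you would write dirtyMap.view.mapValues(_.toSet).toMap instead):
import scala.collection.mutable

// Made-up sample in the shape of the question's dirtyMap.
val dirtyMap: Map[String, mutable.Set[String]] =
  Map("a" -> mutable.Set("x", "y"), "b" -> mutable.Set("z"))

// mapValues converts each mutable Set to an immutable one;
// the keys and the (already immutable) outer Map stay as they are.
val cleanMap: Map[String, Set[String]] = dirtyMap.mapValues(_.toSet)
// cleanMap: Map(a -> Set(x, y), b -> Set(z))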
You can use
dirtyMap.map({case (k,v) => (k,v.toSet)})
You can use flatMap for it:
dirtyMap.flatMap(entry => Map[String, Set[String]](entry._1 -> entry._2.toSet)).toMap
First you map each entry to a single-entry immutable.Map containing the updated entry, whose value is now an immutable.Set. Then the intermediate maps are flattened into one map in which every entry holds an immutable.Set, and the final toMap makes sure the result is an immutable Map. This variant is a bit convoluted; you can simply use dirtyMap.map(...).toMap, as Debojit Paul mentioned.
Another variant is foldLeft:
dirtyMap.foldLeft(Map[String, Set[String]]())(
  (map, entry) => map + (entry._1 -> entry._2.toSet)
)
You specify an accumulator, which is an immutable.Map, and you add each entry to that map with its Set converted. Personally, I think using foldLeft is the more efficient way.

Scala practices: lists and case classes

I've just started using Scala/Spark, and coming from a Java background I'm still trying to wrap my head around immutability and the other best practices of Scala.
This is a very small segment of code from a larger program:
intersections is an RDD[(Key, (String, String))]
obs is a (Key, (String, String)) tuple
Data is just a case class I've defined above.
val intersections = map1 join map2
var listOfDatas = List[Data]()
intersections take NumOutputs foreach (obs => {
  listOfDatas ::= ParseInformation(obs._1.key, obs._2._1, obs._2._2)
})
listOfDatas foreach println
This code works and does what I need it to do, but I was wondering if there was a better way of making this happen. I'm using a variable list and rewriting it with a new list every single time I iterate, and I'm sure there has to be a better way to create an immutable list that's populated with the results of the ParseInformation method call. Also, I remember reading somewhere that instead of accessing the tuple values directly, the way I have done, you should use case classes within functions (as partial functions I think?) to improve readability.
Thanks in advance for any input!
This might work locally, but only because you are take-ing locally. It will not work once distributed, as listOfDatas is passed to each worker as a copy. The better way of doing this IMO is:
val processedData = intersections map { case (key, (item1, item2)) =>
  ParseInformation(key.key, item1, item2)
}
processedData foreach println
A note for a dev new to functional programming: if all you are trying to do is transform data in an iterable (List), forget foreach. Use map instead, which runs your transformation on each item and spits out a new iterable of the results.
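As a small, local (non-Spark) illustration of that advice, with made-up data:
val numbers = List(1, 2, 3) // sample data

// foreach only performs a side effect and returns Unit...
numbers.foreach(n => println(n * 2))

// ...while map returns a new list of the transformed values.
val doubled: List[Int] = numbers.map(_ * 2) // List(2, 4, 6)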
What's the type of intersections? It looks like you can replace foreach with map:
val listOfDatas: List[Data] =
  intersections.take(NumOutputs).map(obs =>
    ParseInformation(obs._1.key, obs._2._1, obs._2._2)
  ).toList

Scala code: reverse operation

I wanted to ensure I understand some scala code correctly. I have a method in a class as:
def getNodes(): IndexedSeq[Node] = allNodes
Then somewhere this method gets called as:
val nodes = graph.getNodes()
and then there is a line
val orderedNodes = nodes ++ nodes.reverse
Does this make another sequence where the original sequence and the reversed get concatenated or is there some other subtlety to it as well?
Yes, the result is a new IndexedSeq containing exactly what you describe: the original sequence followed by its reverse. You're calling the methods ++ and reverse, which are well documented here:
http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.IndexedSeq
Your code can be written like this:
val orderedNodes = nodes.++(nodes.reverse)
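A small made-up example of the same operation on a Vector (any IndexedSeq behaves the same way):
val nodes = Vector(1, 2, 3) // stand-in for an IndexedSeq[Node]

// ++ builds a new sequence; nodes itself is left unchanged.
val orderedNodes = nodes ++ nodes.reverse
// orderedNodes: Vector(1, 2, 3, 3, 2, 1)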

Converting mutable collection to immutable

I'm looking for the best way of converting a collection.mutable.Seq[T] to a collection.immutable.Seq[T].
If you want to convert a ListBuffer into a List, use .toList. I mention this because that particular conversion is performed in constant time. Note, though, that any further use of the ListBuffer will result in its contents being copied first.
Otherwise, you can do collection.immutable.Seq(xs: _*), assuming xs is mutable, as you are unlikely to get better performance any other way.
As specified:
def convert[T](sq: collection.mutable.Seq[T]): collection.immutable.Seq[T] =
  collection.immutable.Seq[T](sq: _*)
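For example (hypothetical usage of the convert helper above):
import scala.collection.mutable

val buf: mutable.Seq[Int] = mutable.ArrayBuffer(1, 2, 3)
val immut: collection.immutable.Seq[Int] = convert(buf) // List(1, 2, 3)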
Addition
The native methods are a little tricky to use. They are already defined on scala.collection.Seq, and you'll have to take a close look at whether they return a collection.immutable or a collection.mutable. For example, .toSeq returns a collection.Seq, which makes no guarantees about mutability. .toIndexedSeq, however, returns a collection.immutable.IndexedSeq, so it seems to be fine to use. I'm not sure, though, if this is really the intended behaviour, as there is also a collection.mutable.IndexedSeq.
The safest approach would be to convert it manually to the intended collection as shown above. When using a native conversion, I think it is best practice to add a type annotation including (mutable/immutable) to ensure the correct collection is returned.
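A small sketch of that practice; the annotation names the (im)mutability you expect, so the compiler complains if the conversion ever produces the wrong kind of collection:
import scala.collection.mutable.ListBuffer

val buf = ListBuffer(1, 2, 3)
val xs: collection.immutable.IndexedSeq[Int] = buf.toIndexedSeq // Vector(1, 2, 3)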
toList (or toStream if you want it lazy) is the preferred way if you want a LinearSeq, as you can be sure what you get back is immutable (because List and Stream are). There's no toVector method on older Scala versions (newer ones have one) if you want an immutable IndexedSeq, but toIndexedSeq gives you a Vector (which is immutable) most if not all of the time.
Another way is to use breakOut. This will look at the type you're aiming for in your return type, and if possible oblige you. e.g.
scala> val ms = collection.mutable.Seq(1,2,3)
ms: scala.collection.mutable.Seq[Int] = ArrayBuffer(1, 2, 3)
scala> val r: List[Int] = ms.map(identity)(collection.breakOut)
r: List[Int] = List(1, 2, 3)
scala> val r: collection.immutable.Seq[Int] = ms.map(identity)(collection.breakOut)
r: scala.collection.immutable.Seq[Int] = Vector(1, 2, 3)
For more info on such black magic, get some strong coffee and see this question.
If you are also working with Set and Map, you can try these as well, using TreeSet as an example.
import scala.collection.immutable.TreeSet
import scala.collection.mutable

val immutableSet = TreeSet("blue", "green", "red", "yellow")
// converting an immutable set to a mutable set
val mutableSet = mutable.Set.empty[String] ++= immutableSet
// converting a mutable set back to an immutable set
val anotherImmutableSet = Set.empty[String] ++ mutableSet
The above example is adapted from the book Programming in Scala.