Getting a reference to an immutable Map - scala

I'm parallelising over a collection to count the number same item values in a List. The list in this case is uniqueSetOfLinks :
for (iListVal <- uniqueSetOfLinks.par) {
try {
val num : Int = listOfLinks.count(_.equalsIgnoreCase(iListVal))
linkTotals + iListVal -> num
}
catch {
case e : Exception => {
e.printStackTrace()
}
}
}
linkTotals is an immutable Map. To gain a reference to the total number of links do I need to update linkTotals so that it is immutable ?
I can then do something like :
linkTotals.put(iListVal, num)

You can't update immutable collection, all you can do is to combine immutable collection with addition element to get new immutable collection, like this:
val newLinkTotals = linkTotals + (iListVal -> num)
In case of collection you could create new collection of pairs and than add all pairs to the map:
val optPairs =
for (iListVal <- uniqueSetOfLinks.par)
yield
try {
val num : Int = listOfLinks.count(_.equalsIgnoreCase(iListVal))
Some(iListVal -> num)
}
catch {
case e : Exception => e.printStackTrace()
None
}
val newLinkTotals = linkTotals ++ optPairs.flatten // for non-empty initial map
val map = optPairs.flatten.toMap // in case there is no initial map
Note that you are using parallel collections (.par), so you should not use mutable state, like linkTotals += iListVal -> num.

Possible variation of #senia's answer (got rid of explicit flatten):
val optPairs =
(for {
iListVal <- uniqueSetOfLinks.par
count <- {
try
Some(listOfLinks.count(_.equalsIgnoreCase(iListVal)))
catch {
case e: Exception =>
e.printStackTrace()
None
}
}
} yield iListVal -> count) toMap

I think that you need some form of MapReduce in order to have parallel number of items estimation.
In your problem you already have all unique links. The partial intermediate result of map is simply a pair. And "reduce" is just toMap. So you can simply par-map the link to pair (link-> count) and then finally construct a map:
def count(iListVal:String) = listOfLinks.count(_.equalsIgnoreCase(iListVal))
val listOfPairs = uniqueSetOfLinks.par.map(iListVal => Try( (iListVal, count(iListVal)) ))
("map" operation is par-map)
Then remove exceptions:
val clearListOfPairs = listOfPairs.flatMap(_.toOption)
And then simply convert it to a map ("reduce"):
val linkTotals = clearListOfPairs.toMap
(if you need to check for exceptions, use Try.failure)

Related

How to make loop and exception inside flatmap?

I am having a requirement where I have to loop over a list and do create Map[String,String]. Here the header has values as list like below:
val headersMap = scala.collection.mutable.Map[String, String]()
try {
val payloadHeaders = collectorPayload.headers
for (values <- payloadHeaders.toList) {
for (value <- values) {
val header = value.split(":").map(_.trim)
headersMap += (header(0) -> header(1))
headersMap += ("Content-Type" -> "application/json; charset=UTF-8")
}
}
} catch {
case e: Exception => {
logger.error("Collector Payload extraction error with : " + e.getMessage)
}
}
Is there any better way to handle this any map or flatMap way?
Don't use mutable collections or variables (just pretend they don't exist util you run into a use case where you positively cannot do without them ... it won't be soon).
Also generally avoid using loops (because they kinda assume and promote mutability and side effects), you'll need them even less often than mutable collections.
collectorPayload
.headers
.iterator
.flatMap(_.split(":").map(_.trim))
.map { case Array(a,b) => a -> b }
.toMap + ("Content-Type" -> "application/json; charset=UTF-8")

Creating a Map by reading elements of List in Scala

I have some records in a List .
Now I want to create a new Map(Mutable Map) from that List with unique key for each record. I want to achieve this my reading a List and calling the higher order method called map in scala.
records.txt is my input file
100,Surender,2015-01-27
100,Surender,2015-01-30
101,Raja,2015-02-19
Expected Output :
Map(0-> 100,Surender,2015-01-27, 1 -> 100,Surender,2015-01-30,2 ->101,Raja,2015-02-19)
Scala Code :
object SampleObject{
def main(args:Array[String]) ={
val mutableMap = scala.collection.mutable.Map[Int,String]()
var i:Int =0
val myList=Source.fromFile("D:\\Scala_inputfiles\\records.txt").getLines().toList;
println(myList)
val resultList= myList.map { x =>
{
mutableMap(i) =x.toString()
i=i+1
}
}
println(mutableMap)
}
}
But I am getting output like below
Map(1 -> 101,Raja,2015-02-19)
I want to understand why it is keeping the last record alone .
Could some one help me?
val mm: Map[Int, String] = Source.fromFile(filename).getLines
.zipWithIndex
.map({ case (line, i) => i -> line })(collection.breakOut)
Here the (collection.breakOut) is to avoid the extra parse caused by toMap.
Consider
(for {
(line, i) <- Source.fromFile(filename).getLines.zipWithIndex
} yield i -> line).toMap
where we read each line, associate an index value starting from zero and create a map out of each association.

For loop inside a .map() in scala: return type is "Unit"

EDIT: I found this What is Scala's yield? (particularly the second, most popular, answer) to be very instructive after the accepted answer solved my problem.
==
I have a HashMap, which I want to iterate in, and for each keys, use a for loop to create new objects.
I'm trying to get a list of those new objects, but I'm always given back an empty "Unit" sequence. I'd like to understand better the behaviour of my code.
case class MyObject(one: String, two: String, three: Int)
val hm = new HashMap[String,Int]
hm += ("key" -> 3)
hm += ("key2" -> 4)
val newList = hm.map { case (key,value) =>
for (i <- 0 until value) {
new MyObject(key, "a string", i)
}}.toSeq
result:
newList:Seq[Unit] = ArrayBuffer((), ())
If I don't use any for loop inside the .map(), I have the type of structure I'm expecting:
val newList = hm.map { case (key,value) =>
new MyObject(key, "a string", value)}.toSeq
results in:
newList:Seq[MyObject] = ArrayBuffer(MyObject(key,host,3), MyObject(key2,host,4))
As I mentioned in my comment, you are missing yield on the for comprehension in your map statement. If you do not include the yield keyword then your for comprehension is purely side effecting and does not produce anything. Change it to:
for (i <- 0 until value) yield {
Now from here, you will end up with a Seq[IndexedSeq[MyObject]]. If you want to end up with just a Seq[MyObject] then you can flatten like so:
val newList = hm.map { case (key,value) =>
for (i <- 0 until value) yield {
MyObject(key, "a string", i)
}}.toSeq.flatten
}
And in fact (as pointed out by #KarolS), you can shorten this even further by replacing map with flatMap and remove the explicit flatten at the end:
val newList = hm.flatMap { case (key,value) =>
for (i <- 0 until value) yield {
MyObject(key, "a string", i)
}}.toSeq
}

How do I flatten a nested For Comprehension that uses I/O?

I am having trouble flattening a nested For Generator into a single For Generator.
I created MapSerializer to save and load Maps.
Listing of MapSerializer.scala:
import java.io.{ObjectInputStream, ObjectOutputStream}
object MapSerializer {
def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] =
(for (_ <- 1 to in.readInt()) yield {
val key = in.readUTF()
for (_ <- 1 to in.readInt()) yield {
val value = in.readInt()
(key, value)
}
}).flatten.groupBy(_ _1).mapValues(_ map(_ _2))
def saveMap(out: ObjectOutputStream, map: Map[String, Seq[Int]]) {
out.writeInt(map size)
for ((key, values) <- map) {
out.writeUTF(key)
out.writeInt(values size)
values.foreach(out.writeInt(_))
}
}
}
Modifying loadMap to assign key within the generator causes it to fail:
def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] =
(for (_ <- 1 to in.readInt();
key = in.readUTF()) yield {
for (_ <- 1 to in.readInt()) yield {
val value = in.readInt()
(key, value)
}
}).flatten.groupBy(_ _1).mapValues(_ map(_ _2))
Here is the stacktrace I get:
java.io.UTFDataFormatException
at java.io.ObjectInputStream$BlockDataInputStream.readWholeUTFSpan(ObjectInputStream.java)
at java.io.ObjectInputStream$BlockDataInputStream.readOpUTFSpan(ObjectInputStream.java)
at java.io.ObjectInputStream$BlockDataInputStream.readWholeUTFSpan(ObjectInputStream.java)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2819)
at java.io.ObjectInputStream.readUTF(ObjectInputStream.java:1050)
at MapSerializer$$anonfun$loadMap$1.apply(MapSerializer.scala:8)
at MapSerializer$$anonfun$loadMap$1.apply(MapSerializer.scala:7)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at scala.collection.immutable.Range.foreach(Range.scala:76)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:194)
at scala.collection.immutable.Range.map(Range.scala:43)
at MapSerializer$.loadMap(MapSerializer.scala:7)
I would like to flatten the loading code to a single For Comprehension, but I get errors that suggest that it is either executing in a different order or repeating steps I am not expecting it to repeat.
Why is it that moving the assignment of key into the generator causes it to fail?
Can I flatten this into a single generator? If so, what would that generator be?
Thank you for self contained compiling code in your question. I don't think you want to flatten the loops as the structure is not flat. You then need to use groupBy to recover the structure. Also if you have "zero -> Seq()" as an element of the map, it would be lost. Using this simple map avoids the groupBy and preserves the elements mapped to empty sequences:
def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] = {
val size = in.readInt
(1 to size).map{ _ =>
val key = in.readUTF
val nval = in.readInt
key -> (1 to nval).map(_ => in.readInt)
}(collection.breakOut)
}
I use breakOut to generate the right type as otherwise I think the compilers complains about generic Map and immutable Map mismatch. You can also use Map() ++ (...).
Note: I arrived at this solution by being confused by your for loop and starting to rewrite using as flatMap and map:
val tuples = (1 to size).flatMap{ _ =>
val key = in.readUTF
println("key " + key)
val nval = in.readInt
(1 to nval).map(_ => key -> in.readInt)
}
I think in the for loop, something happens when you don't use some of the generator. I though this would be equivalent to:
val tuples = for {
_ <- 1 to size
key = in.readUTF
nval = in.readInt
_ <- 1 to nval
value = in.readInt
} yield { key -> value }
But this is not the case, so I think I'm missing something in the translation.
Edit: figured out what's wrong with a single for loop. Short story: the translation of definitions within for loops caused the key = in.readUTF statement to be called consecutively before the inner loop is executed. To work around this, use view and force:
val tuples = (for {
_ <- (1 to size).view
key = in.readUTF
nval = in.readInt
_ <- 1 to nval
value = in.readInt
} yield { key -> value }).force
The issue can be demonstrated more clearly with this piece of code:
val iter = Iterator.from(1)
val tuple = for {
_ <- 1 to 3
outer = iter.next
_ <- 1 to 3
inner = iter.next
} yield (outer, inner)
It returns Vector((1,4), (1,5), (1,6), (2,7), (2,8), (2,9), (3,10), (3,11), (3,12)) which shows that all outer values are evaluated before inner values. This is due to the fact that it is more or less translated to something like:
for {
(i, outer) <- for (i <- (1 to 3)) yield (i, iter.next)
_ <- 1 to 3
inner = iter.next
} yield (outer, inner)
This computes all outer iter.next first. Going back to the original use case, all in.readUTF values would be called consecutively before in.readInt.
Here is the compacted version of #huynhjl's answer that I eventually deployed:
def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] =
((1 to in.readInt()) map { _ =>
in.readUTF() -> ((1 to in.readInt()) map { _ => in.readInt()) }
})(collection.breakOut)
The advantage of this version is that there are no direct assignments.

Tune Nested Loop in Scala

I was wondering if I can tune the following Scala code :
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] = {
var listNoDuplicates: List[(Class1, Class2)] = Nil
for (outerIndex <- 0 until listOfTuple.size) {
if (outerIndex != listOfTuple.size - 1)
for (innerIndex <- outerIndex + 1 until listOfTuple.size) {
if (listOfTuple(i)._1.flag.equals(listOfTuple(j)._1.flag))
listNoDuplicates = listOfTuple(i) :: listNoDuplicates
}
}
listNoDuplicates
}
Usually if you have someting looking like:
var accumulator: A = new A
for( b <- collection ) {
accumulator = update(accumulator, b)
}
val result = accumulator
can be converted in something like:
val result = collection.foldLeft( new A ){ (acc,b) => update( acc, b ) }
So here we can first use a map to force the unicity of flags. Supposing the flag has a type F:
val result = listOfTuples.foldLeft( Map[F,(ClassA,ClassB)] ){
( map, tuple ) => map + ( tuple._1.flag -> tuple )
}
Then the remaining tuples can be extracted from the map and converted to a list:
val uniqList = map.values.toList
It will keep the last tuple encoutered, if you want to keep the first one, replace foldLeft by foldRight, and invert the argument of the lambda.
Example:
case class ClassA( flag: Int )
case class ClassB( value: Int )
val listOfTuples =
List( (ClassA(1),ClassB(2)), (ClassA(3),ClassB(4)), (ClassA(1),ClassB(-1)) )
val result = listOfTuples.foldRight( Map[Int,(ClassA,ClassB)]() ) {
( tuple, map ) => map + ( tuple._1.flag -> tuple )
}
val uniqList = result.values.toList
//uniqList: List((ClassA(1),ClassB(2)), (ClassA(3),ClassB(4)))
Edit: If you need to retain the order of the initial list, use instead:
val uniqList = listOfTuples.filter( result.values.toSet )
This compiles, but as I can't test it it's hard to say if it does "The Right Thing" (tm):
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] =
(for {outerIndex <- 0 until listOfTuple.size
if outerIndex != listOfTuple.size - 1
innerIndex <- outerIndex + 1 until listOfTuple.size
if listOfTuple(i)._1.flag == listOfTuple(j)._1.flag
} yield listOfTuple(i)).reverse.toList
Note that you can use == instead of equals (use eq if you need reference equality).
BTW: https://codereview.stackexchange.com/ is better suited for this type of question.
Do not use index with lists (like listOfTuple(i)). Index on lists have very lousy performance. So, some ways...
The easiest:
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] =
SortedSet(listOfTuple: _*)(Ordering by (_._1.flag)).toList
This will preserve the last element of the list. If you want it to preserve the first element, pass listOfTuple.reverse instead. Because of the sorting, performance is, at best, O(nlogn). So, here's a faster way, using a mutable HashSet:
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] = {
// Produce a hash map to find the duplicates
import scala.collection.mutable.HashSet
val seen = HashSet[Flag]()
// now fold
listOfTuple.foldLeft(Nil: List[(Class1,Class2)]) {
case (acc, el) =>
val result = if (seen(el._1.flag)) acc else el :: acc
seen += el._1.flag
result
}.reverse
}
One can avoid using a mutable HashSet in two ways:
Make seen a var, so that it can be updated.
Pass the set along with the list being created in the fold. The case then becomes:
case ((seen, acc), el) =>