I'm new to Scala and FP in general and am trying to practice on a dummy example.
val counts = ransomNote.map(e=>(e,1)).reduceByKey{case (x,y) => x+y}
The following error is raised:
Line 5: error: value reduceByKey is not a member of IndexedSeq[(Char, Int)] (in solution.scala)
The above example looks similar to the introductory FP primer on word count. I'd appreciate it if you could point out my mistake.
It looks like you are trying to use a Spark method on a Scala collection. The two APIs have a few similarities, but reduceByKey is not one of them: it exists only on Spark's pair RDDs, not on Scala collections.
In pure Scala you can do it like this:
val counts =
ransomNote.foldLeft(Map.empty[Char, Int].withDefaultValue(0)) {
(counts, c) => counts.updated(c, counts(c) + 1)
}
foldLeft iterates over the collection from the left, using the empty map of counts as the initial accumulated state (the map returns 0 if no value is found for a key). The function passed as the second argument produces the next state by updating the map with the current character's count, incremented by one.
Note that accessing a map directly (counts(c)) is likely to be unsafe in most situations, since it will throw an exception if no item is found. In this situation it's fine because in this scope I know I'm using a map with a default value. When accessing a map you will more often than not want to use get, which returns an Option. More on that in the official Scala documentation (here for version 2.13.2).
You can play around with this code here on Scastie.
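To make the difference between direct access and get concrete, here is a minimal sketch. The sample input and the names plain, viaGet, viaGetOrElse and safeCounts are made up for illustration; only the counting idea comes from the answer above.

```scala
// A plain Map has no default: apply throws on a missing key, get returns an Option.
val plain: Map[Char, Int] = Map('a' -> 2)

val viaGet: Option[Int] = plain.get('z')        // None, no exception
val viaGetOrElse: Int = plain.getOrElse('z', 0) // falls back to 0

// The same character count as above, written without withDefaultValue.
val ransomNote = "aab" // hypothetical sample input
val safeCounts: Map[Char, Int] =
  ransomNote.foldLeft(Map.empty[Char, Int]) { (counts, c) =>
    counts.updated(c, counts.getOrElse(c, 0) + 1)
  }
```

Using getOrElse keeps the fallback visible at the call site, so the code does not depend on the map having been built with a default value.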
On Scala 2.13 you can use the new groupMapReduce:
ransomNote.groupMapReduce(identity)(_ => 1)(_ + _)
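As a rough illustration of what the three argument lists do, here is the same call with a made-up sample input and comments; it assumes Scala 2.13 or later.

```scala
// Requires Scala 2.13 or later.
val ransomNote = "hello" // hypothetical sample input

val counts: Map[Char, Int] =
  ransomNote.groupMapReduce(identity)(_ => 1)(_ + _)
  // 1st argument: the grouping key (the character itself)
  // 2nd argument: the value each element contributes (1 per occurrence)
  // 3rd argument: how values that share a key are combined (summed)
```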
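val str = "hello"
// group equal characters together, then replace each group by its size
val countsMap: Map[Char, Int] = str
.groupBy(identity)
.mapValues(_.length)
.toMap // needed on Scala 2.13, where mapValues is deprecated and returns a view
println(countsMap)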
pqr:String
xyz:List[(String,String)] // e.g. List((String,String),(String,String),......)
There could be one pair or many, so I am using map, and since I want to update the elements I am converting the list to a ListBuffer:
var abc:ListBuffer[(String,String)] = xyz.to[ListBuffer]
abc.map(x => x._2 = pqr)
On the line above I get the error "reassignment to val".
How can I update the second string in every element of the ListBuffer?
Scala places a big emphasis on immutable data. That is, generally in Scala you don't want to change existing values; you want to make new data that represents the changes you want. Your input is of type List, which is immutable. An immutable type will never become mutable; when you do to[ListBuffer], you're making a new list buffer which is unrelated to the previous, and the new list buffer is mutable.
If xyz is a List[(String, String)], then we can use map to change the second element.
val myNewList = xyz.map { case (x, _) => (x, pqr) }
Note that xyz is not changed. That's the point. We made a new list with the changes applied. You can't change immutable data.
(Minor note: If you're using Scala 3, you can remove the word case from the above example, as tuple destructuring is inferred in Scala 3)
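A small self-contained sketch of this, with made-up sample values for pqr and xyz. The copy variant is an alternative not mentioned above; it relies on Tuple2 being a case class.

```scala
val pqr: String = "new"                                        // hypothetical sample value
val xyz: List[(String, String)] = List("a" -> "x", "b" -> "y") // hypothetical sample data

// Tuples are immutable, so `x._2 = pqr` cannot compile.
// Build a new tuple for each element instead.
val viaPattern = xyz.map { case (first, _) => (first, pqr) }
val viaCopy    = xyz.map(_.copy(_2 = pqr)) // Tuple2 is a case class, so copy is available
```

Both results are new lists; xyz still holds its original pairs afterwards.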
I'm upgrading Scala-based software from Scala 2.12 to Scala 2.13.
Some blocks of code were broken, and while fixing them I found weird behavior in already existing code.
#Edit
Below we have an explanation of a use case of this problem.
The problem in general is the use of the .map method on mutable or immutable maps since Scala 2.13. At the end of the question there is an easy way to reproduce it, and the answer follows further down.
Obs.:
The term "variables" used here is a business field; it is not related to Scala (or Java) variables.
list is an Array[Long] containing idProject.
The return of p.getVariables is a Map[String, VariableSet].
So after the second .map (line 4), we have an Array[(Project, Map[String, VariableSet])].
The code is:
// get the projects
list.map(projectById)
// and map their variable
.map(p => p -> p.getVariables)
// and transform them to DTOs
.map(pair => pair._2.map(set => toVariableDTO(set._2, pair._1.getId)).toArray)
// and return the union of all of them
.reduce((l, r) => l.union(r))
// and sort them by name
.sortBy(_.name.toLowerCase)
The problem comes at the 6th line: after the upgrade, set (in pair._2.map(set => )) is recognized as type "Nothing".
I tried line by line and it seems to work.
Like this:
val abs = list.map(projectById).map(p => p -> p.getVariables)
val ab = abs.map(pair => pair._2)
ab.map(pair => pair)
The problem here is that in the 6th line of the previous example I need the reference to the project associated with that flow.
Of course, there would be room to rewrite this in another way (continuing the work on the second example), but I have many other cases like this and would like to know whether it is really not supposed to work anymore or whether I missed something during the upgrade.
Thanks in advance!
#Edit
Easy way to reproduce:
import scala.collection.mutable.{Map => MMap}
val mmap: MMap[String, Long] = MMap[String, Long]()
mmap.map(set => ) // Here, it recognizes 'set' as Nothing .
It looks like Scala 2.13 sees an element of a mutable Map as 'Nothing'?
Well, after searching and struggling for hours, I figured out that this is one of the major changes in Scala 2.13.
To get the expected behavior from the .map method when it is called on map objects, we need to explicitly say that it should use the implementation from Iterable (which is the default in Scala 2.12 or lower).
We do that by adding .iterator before the .map call.
So, for my "easy way to reproduce" example, it would look like this:
import scala.collection.mutable.{Map => MMap}
val mmap: MMap[String, Long] = MMap[String, Long]()
mmap.iterator.map(set => set._2) // Now we may use the 'set' normally
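For experimenting with this, here is a minimal sketch targeting Scala 2.13. The map contents and the names viaIterator and viaPattern are made up for illustration. The second variant, which destructures the key/value tuple with a pattern match, is an alternative not taken from the answer above; it also compiles on 2.13 when the function does not return a pair.

```scala
// Targets Scala 2.13; the map contents are made up for illustration.
import scala.collection.mutable.{Map => MMap}

val mmap: MMap[String, Long] = MMap("a" -> 1L, "b" -> 2L)

// Going through the iterator, as described above: the lambda receives a (String, Long) tuple.
val viaIterator: List[Long] = mmap.iterator.map(set => set._2).toList

// Destructuring the tuple with a pattern match spells out the key and the value.
val viaPattern: List[Long] = mmap.map { case (_, value) => value }.toList
```

The iteration order of a mutable map is unspecified, so the resulting lists should be sorted before comparing them.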
I will make a few changes in my question to make it easier to find, for those who may have a similar problem.
I have a "dirtyMap" which is an immutable.Map[String, collection.mutable.Set[String]]. I want to convert dirtyMap to an immutable Map[String, Set[String]]. Could you please let me know how to do this? I tried a couple of ways that didn't produce the desired result.
Method 1: Using map function
dirtyMap.toSeq.map(e => {
val key = e._1
val value = e._2.to[Set]
e._1 -> e._2
}).toMap()
I'm getting a syntax error.
Method 2: Using foreach
dirtyMap.toSeq.foreach(e => {
val key = e._1
val value = e._2.to[Set]
e._1 -> e._2
}).toMap()
I cannot apply toMap to the output of foreach.
Disclaimer: I am a Scala noob if you couldn't tell.
UPDATE: Method 1 works when I remove the parentheses from the toMap() call. However, the following is a more elegant solution:
dirtyMap.mapValues(v => v.toSet)
Thank you Gabriele for providing an answer with a great explanation. Thanks to Duelist and Debojit for your answers as well.
You can simply do:
dirtyMap.mapValues(_.toSet)
mapValues will apply the function to only the values of the Map, and .toSet converts a mutable Set to an immutable one.
(I'm assuming dirtyMap is a collection.immutable.Map. In case it's a mutable one, just add toMap in the end)
If you're not familiar with the underscore syntax for lambdas, it's a shorthand for:
dirtyMap.mapValues(v => v.toSet)
Now, your first example doesn't compile because of the (). toMap takes no explicit arguments, but it takes an implicit argument. If you want the implicit argument to be inferred automatically, just remove the ().
The second example doesn't work because foreach returns Unit. This means that foreach executes side effects, but it doesn't return a value. If you want to chain transformations on a value, never use foreach, use map instead.
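The points above can be tried out with the following sketch. It targets Scala 2.13, where mapValues called directly on a Map is deprecated in favor of .view.mapValues(...).toMap; the sample data and the names viaMap, viaView and nothingUseful are assumptions made for illustration.

```scala
// Targets Scala 2.13; the sample data stands in for the question's dirtyMap.
import scala.collection.mutable

val dirtyMap: Map[String, mutable.Set[String]] =
  Map("k1" -> mutable.Set("a", "b"), "k2" -> mutable.Set("c"))

// map rebuilds every entry and returns a new immutable Map.
val viaMap: Map[String, Set[String]] =
  dirtyMap.map { case (key, values) => key -> values.toSet }

// Same result through a view; toMap forces the lazy view into a strict Map.
val viaView: Map[String, Set[String]] =
  dirtyMap.view.mapValues(_.toSet).toMap

// foreach is only for side effects: its result type is Unit, so nothing can be chained after it.
val nothingUseful: Unit = dirtyMap.foreach { case (key, _) => println(key) }
```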
You can use
dirtyMap.map({case (k,v) => (k,v.toSet)})
You can use flatMap for it:
dirtyMap.flatMap(entry => Map[String, Set[String]](entry._1 -> entry._2.toSet)).toMap
First you map each entry to an immutable.Map(entry) containing the updated entry, whose value is now an immutable.Set. Your map looks like this: mutable.Map.
Then flatten is called, so you get a mutable.Map in which each entry has an immutable.Set. Finally, toMap converts this map to an immutable one.
This variant is a bit complicated; you can simply use dirtyMap.map(...).toMap, as Debojit Paul mentioned.
Another variant is foldLeft:
dirtyMap.foldLeft(Map[String, Set[String]]())(
(map, entry) => map + (entry._1 -> entry._2.toSet)
)
You specify the accumulator, which is an immutable.Map, and add each entry to it with its Set converted.
As for me, I think using foldLeft is a more effective way.
Problem
Maybe this is due to my lack of Scala knowledge, but it seems like adding another level to the for comprehension should just work. If the first for comprehension line is commented out, the code works. I ultimately want a Set[Int] instead of '1 to 2', but it serves to show the problem. The first two lines of the for should not need a type specifier, but I include it to show that I've tried the obvious.
Tools/Jars
IntelliJ 2016.1
Java 8
Scala 2.10.5
Cassandra 3.x
spark-assembly-1.6.0-hadoop2.6.0.jar (pre-built)
spark-cassandra-connector_2.10-1.6.0-M1-SNAPSHOT.jar (pre-built)
spark-cassandra-connector-assembly-1.6.0-M1-SNAPSHOT.jar (I built)
Code
case class NotifHist(intnotifhistid:Int, eventhistids:Seq[Int], yosemiteid:String, initiatorname:String)
case class NotifHistSingle(intnotifhistid:Int, inteventhistid:Int, dataCenter:String, initiatorname:String)
object SparkCassandraConnectorJoins {
def joinQueryAfterMakingExpandedRdd(sc:SparkContext, orgNodeId:Int) {
val notifHist:RDD[NotifHistSingle] = for {
orgNodeId:Int <- 1 to 2 // comment out this line and it works
notifHist:NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
...etc...
}
Compilation Output
Information:3/29/16 8:52 AM - Compilation completed with 1 error and 0 warnings in 1s 507ms
/home/jpowell/Projects/SparkCassandraConnector/src/com/mir3/spark/SparkCassandraConnectorJoins.scala
Error:(88, 21) type mismatch;
found : scala.collection.immutable.IndexedSeq[Nothing]
required: org.apache.spark.rdd.RDD[com.mir3.spark.NotifHistSingle]
orgNodeId:Int <- 1 to 2
^
Later
#slouc Thanks for the comprehensive answer. I was using the for comprehension's syntactic sugar to also keep state from the second statement to fill elements in the NotifHistSingle ctor, so I don't see how to get the equivalent map/flatmap to work. Therefore, I went with the following solution:
def joinQueryAfterMakingExpandedRdd(sc:SparkContext, orgNodeIds:Set[Int]) {
def notifHistForOrg(orgNodeId:Int): RDD[NotifHistSingle] = {
for {
notifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(notifHist.intnotifhistid, eventHistId, notifHist.yosemiteid, notifHist.initiatorname)
}
val emptyTable:RDD[NotifHistSingle] = sc.emptyRDD[NotifHistSingle]
val notifHistForAllOrgs:RDD[NotifHistSingle] = orgNodeIds.foldLeft(emptyTable)((accum, oid) => accum ++ notifHistForOrg(oid))
}
A for comprehension is actually syntactic sugar; what's really going on underneath is a series of nested flatMap calls, with a single map at the end that takes the place of yield. The Scala compiler translates every for comprehension like this. If you use if conditions in your for comprehension, they are translated into withFilter calls, and if you don't yield anything, foreach is used. For more information, see here.
So, to explain your case, this:
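The translation can be checked on plain collections, without Spark. The following is a minimal sketch with made-up data; the names outer, inner, sugared and desugared are hypothetical.

```scala
// Plain collections only; the data is made up for illustration.
val outer = List(1, 2)
val inner = List("a", "b")

// The for comprehension...
val sugared: List[String] =
  for {
    n <- outer
    s <- inner
    if n != 2 || s != "b" // a guard
  } yield s * n

// ...and a hand-written equivalent: every generator but the last becomes a flatMap,
// the guard becomes withFilter, and the last generator becomes a map that applies the yield.
val desugared: List[String] =
  outer.flatMap { n =>
    inner
      .withFilter(s => n != 2 || s != "b")
      .map(s => s * n)
  }
```

Because the calls are nested rather than placed side by side, the innermost function can still refer to n from the outer generator.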
val notifHist:RDD[NotifHistSingle] = for {
orgNodeId:Int <- 1 to 2 // comment out this line and it works
notifHist:NotifHist <- sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
eventHistId <- notifHist.eventhistids
} yield NotifHistSingle(...)
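is translated by the compiler to roughly this:
// each generator becomes a nested call, so inner lambdas can still see the outer variables
val notifHist:RDD[NotifHistSingle] = (1 to 2)
  .flatMap(orgNodeId => sc.cassandraTable[NotifHist](keyspace, "notifhist").where("intorgnodeid = ?", orgNodeId)
    .flatMap(notifHist => notifHist.eventhistids
      .map(eventHistId => NotifHistSingle(...))))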
You get the error when you include the 1 to 2 line because that makes your for comprehension operate on a sequence (an IndexedSeq, to be more precise). So when invoking flatMap(), the compiler expects you to supply a function that transforms each element of that sequence into a GenTraversableOnce. If you take a closer look at the signature of flatMap (most IDEs will display it when you hover over it), you can see it for yourself:
def flatMap[B, That](f: A => GenTraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That
This is the problem. The compiler doesn't know how to flatMap the sequence 1 to 2 using a function that returns a CassandraRDD. It wants a function that returns a GenTraversableOnce. If you remove the 1 to 2 line, you remove this restriction.
Bottom line - if you want to use a for comprehension and yield values out of it, you have to obey the type rules. It's impossible to flatten a sequence consisting of elements which are not sequences and cannot be turned into sequences.
You can always map instead of flatMap, since map is less restrictive (it requires A => B instead of A => GenTraversableOnce[B]). This means that instead of getting all results in one giant sequence, you will get a sequence where each element is a group of results (one group for each query). You can also play around with the types, trying to get a GenTraversableOnce from your query result (e.g. invoking sc.cassandraTable().where().toArray or something; I don't really work with Cassandra so I don't know).
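The map-instead-of-flatMap idea can be sketched without Spark or Cassandra. In the example below, FakeRdd and query are hypothetical stand-ins (not part of any library) for an RDD-like type that is not a Scala collection and for a per-id query.

```scala
// FakeRdd is a hypothetical stand-in for an RDD: it holds data but is not a Scala collection.
final case class FakeRdd[A](items: List[A]) {
  def ++(other: FakeRdd[A]): FakeRdd[A] = FakeRdd(items ++ other.items)
}

// Hypothetical query returning one FakeRdd per id.
def query(id: Int): FakeRdd[String] = FakeRdd(List(s"row-$id-a", s"row-$id-b"))

// (1 to 2).flatMap(query) would not compile: FakeRdd is not a collection the Range can flatten.
// map is accepted and yields one group of results per id.
val groups: IndexedSeq[FakeRdd[String]] = (1 to 2).map(query)

// The groups can then be combined explicitly, here with the stand-in's own ++.
val combined: FakeRdd[String] = groups.reduce(_ ++ _)
```

Combining the groups with the container's own union operation is the same shape as the foldLeft over ++ that the asker ended up with.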
I have a for loop within which I get a Seq[Seq[(String,Int)]] on every run. Currently I iterate through the Seq[Seq[(String,Int)]] to get every Seq[(String,Int)] and then append it to a ListBuffer[(String,Int)].
Here is the code:
import scala.collection.mutable.ListBuffer
var lis: Seq[Seq[(String, Int)]] = Seq.empty // placeholder; reassigned on every run of someLoop
val matches = new ListBuffer[(String, Int)]
someLoop.foreach(k =>
  // someLoop gives a lis object on every run,
  // and that needs to be added to the matches list
  lis.foreach(j => matches.appendAll(j))
)
Is there a better way to do this without looping through the Seq[Seq[(String,Int)]], say by directly adding all the Seq objects from the outer Seq to the ListBuffer?
I tried the ++ operator, adding matches and lis directly, but that didn't work either. I use Scala 2.10.2.
Try this:
matches.appendAll(lis.flatten)
In fact, this way you can avoid the mutable ListBuffer altogether: lis.flatten is already a Seq[(String, Int)]. So you can shorten your code like this:
val lis = ... //whatever that is Seq[Seq[(String, Int)]]
val flatLis = lis.flatten // Seq[(String, Int)]
Avoid vars and mutable structures like ListBuffer as much as you can.
You don't need to append to an empty ListBuffer, just create it directly:
import collection.breakOut
val matches: ListBuffer[(String,Int)] =
lis.flatten(breakOut)
breakOut is the magic here. Calling flatten on a Seq[Seq[T]] would usually create a Seq[T] that you'd then have to convert to a ListBuffer. Using breakOut causes it to look at the expected output type and build that kind of collection instead.
Of course... You were only using ListBuffer for mutability anyway, so a Seq[T] is probably exactly what you really want. In which case, just let the inferencer do its thing:
val matches = lis.flatten
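The breakOut approach applies to the Scala 2.10 setup in the question. On Scala 2.13, scala.collection.breakOut is no longer available; a possible equivalent is sketched below, with made-up sample data and a hypothetical existing buffer.

```scala
// Targets Scala 2.13, where scala.collection.breakOut is no longer available.
import scala.collection.mutable.ListBuffer

// Hypothetical sample data standing in for lis.
val lis: Seq[Seq[(String, Int)]] =
  Seq(Seq("a" -> 1, "b" -> 2), Seq("c" -> 3))

// Flatten first, then build the target collection by passing its companion to `to`.
val matches: ListBuffer[(String, Int)] = lis.flatten.to(ListBuffer)

// If an existing buffer has to be extended instead, ++= appends all elements in one call.
val existing = ListBuffer("z" -> 0)
existing ++= lis.flatten
```

Unlike breakOut, this builds an intermediate Seq before the ListBuffer, which is usually fine for small inputs.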