Extend / Replicate Scala collections syntax to create your own collection?

I want to build a map but discard all keys with empty values, as shown below:
import scala.annotation.tailrec

@tailrec
def safeFiltersMap(
    map: Map[String, String],
    accumulator: Map[String, String] = Map.empty): Map[String, String] = {
  if (map.isEmpty) return accumulator
  val (key, value) = map.head
  safeFiltersMap(
    map.tail,
    if (value.nonEmpty) accumulator + (key -> value)
    else accumulator
  )
}
Now this is fine; however, I need to use it like this:
val safeMap = safeFiltersMap(Map("a"->"b","c"->"d"))
whereas I want to use it like the way we instantiate a map:
val safeMap = safeFiltersMap("a"->"b","c"->"d")
What syntax can I follow to achieve this?

The -> syntax isn't a special syntax in Scala. It's actually just a fancy way of constructing a 2-tuple. So you can write your own functions that take 2-tuples as well. You don't need to define a new Map type. You just need a function that filters the existing one.
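A quick illustration of that point (nothing here is specific to Map):

```scala
// `->` is just sugar for building a Tuple2; it works on any value
val pair: (String, String) = "a" -> "b"
assert(pair == ("a", "b"))
```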
def safeFiltersMap(args: (String, String)*): Map[String, String] =
  Map(args: _*).filter { case (_, value) => value.nonEmpty }
Then call using
safeFiltersMap("a"->"b","c"->"d")
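Put together, a self-contained sketch (the empty value for "c" is made up to show the filtering):

```scala
// varargs of pairs; Map(args: _*) builds the map, then empty values are dropped
def safeFiltersMap(args: (String, String)*): Map[String, String] =
  Map(args: _*).filter { case (_, value) => value.nonEmpty }

val safe = safeFiltersMap("a" -> "b", "c" -> "")
println(safe) // Map(a -> b)
```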

Related

Scala map with function call results in references to the function instead of results

I have a list of keys for which I want to fetch data. The data is fetched via a function call for each key. I want to end up with a Map of key -> data. Here's what I've tried:
case class MyDataClass(val1: Int, val2: Boolean)

def getData(key: String): MyDataClass = {
  // Dummy implementation
  MyDataClass(1, true)
}

def getDataMapForKeys(keys: Seq[String]): Map[String, MyDataClass] = {
  val dataMap: Map[String, MyDataClass] = keys.map((_, getData(_))).toMap
  dataMap
}
This results in a type mismatch error:
type mismatch;
found : scala.collection.immutable.Map[String,String => MyDataClass]
required: Map[String,MyDataClass]
val dataMap: Map[String, MyDataClass] = keys.map((_, getData(_))).toMap
Why is it setting the values in the resulting Map to instances of the getData() function, rather than its result? How do I make it actually CALL the getData() function for each key and put the results as the values in the Map?
The code you wrote is equivalent to the following statements:
keys.map((_, getData(_)))
keys.map(x => (x, getData(_)))
keys.map(x => (x, y => getData(y)))
This should clarify why you get the error: the second `_` starts a new anonymous function, so the tuple's second element is a `String => MyDataClass`, not a `MyDataClass`.
As suggested in the comments, stay away from `_` except in simple cases with a single occurrence.
The gist of the issue is that `(_, getData(_))` creates a function value rather than a (key, value) pair for each key being mapped over. Naming the parameter explicitly and writing `key -> getData(key)` builds the tuple you want.
...
val dataMap: Map[String, MyDataClass] = keys.map(key => (key -> getData(key))).toMap
...
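A minimal runnable version of the fix, with a dummy `getData` (the `key.length` implementation is made up for the sketch):

```scala
case class MyDataClass(val1: Int, val2: Boolean)

// hypothetical stand-in for the real lookup
def getData(key: String): MyDataClass = MyDataClass(key.length, true)

val keys = Seq("a", "bb")
// naming the parameter forces getData to be applied per key
val dataMap: Map[String, MyDataClass] = keys.map(key => key -> getData(key)).toMap
```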

Scala Future Sequence Mapping: finding length?

I want to return both a Future[Seq[String]] from a method and the length of that Seq[String] as well. Currently I'm building the Future[Seq[String]] using a mapping function from another Future[T].
Is there any way to do this without awaiting for the Future?
You can map over the current Future to create a new one with the new data added to the type.
val fss: Future[Seq[String]] = Future(Seq("a","b","c"))
val x: Future[(Seq[String],Int)] = fss.map(ss => (ss, ss.length))
If you somehow know what the length of the Seq will be without actually waiting for it, then something like this:
val t: Future[T] = ???
def foo: (Int, Future[Seq[String]]) = {
  val length = 42 // ???
  val fut: Future[Seq[String]] = t map { v =>
    genSeqOfLength42(v)
  }
  (length, fut)
}
If you don't, then you will have to return Future[(Int, Seq[String])] as jwvh said, or you can easily get the length later in the calling function.
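A runnable sketch of the first approach (the `Await` is only there to inspect the result in this demo; production code would keep mapping over the `Future` instead of blocking):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val fss: Future[Seq[String]] = Future(Seq("a", "b", "c"))
// pair the sequence with its length inside the same Future; nothing blocks here
val withLength: Future[(Seq[String], Int)] = fss.map(ss => (ss, ss.length))

// Await is used only so this demo can look at the value
val (ss, len) = Await.result(withLength, 1.second)
println(len) // prints 3
```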

Spark: Not able to use accumulator on a tuple/count using scala

I am trying to replace reduceByKey with accumulator logic for word count.
wc.txt
Hello how are are you
Here's what I've got so far:
val words = sc.textFile("wc.txt").flatMap(_.split(" "))
val accum = sc.accumulator(0,"myacc")
for (i <- 1 to words.count.toInt)
  foreach(x => accum += x)
.....
How should I proceed? Any thoughts or ideas are appreciated.
Indeed, using Accumulators for this is cumbersome and not recommended - but for completeness - here's how it can be done (at least with Spark versions 1.6 <= V <= 2.1). Do note that this uses a deprecated API that will not be part of future versions.
You'll need a Map[String, Long] accumulator, which is not available by default, so you'll need to create your own AccumulableParam implementation and use it implicitly:
// some data:
val words = sc.parallelize(Seq("Hello how are are you")).flatMap(_.split(" "))
// aliasing the type, just for convenience
type AggMap = Map[String, Long]
// creating an implicit AccumulableParam that counts by String key
implicit val param: AccumulableParam[AggMap, String] = new AccumulableParam[AggMap, String] {
  // increase the matching value by 1, or create it if missing
  override def addAccumulator(r: AggMap, t: String): AggMap =
    r.updated(t, r.getOrElse(t, 0L) + 1L)

  // merge two maps by summing matching values
  override def addInPlace(r1: AggMap, r2: AggMap): AggMap =
    r1 ++ r2.map { case (k, v) => k -> (v + r1.getOrElse(k, 0L)) }

  // start with an empty map
  override def zero(initialValue: AggMap): AggMap = Map.empty
}
// create the accumulator; This will use the above `param` implicitly
val acc = sc.accumulable[AggMap, String](Map.empty[String, Long])
// add each word to accumulator; the `count()` can be replaced by any Spark action -
// we just need to trigger the calculation of the mapped RDD
words.map(w => { acc.add(w); w }).count()
// after the action, we can read the value of the accumulator
val result: AggMap = acc.value
result.foreach(println)
// (Hello,1)
// (how,1)
// (are,2)
// (you,1)
As I understand it, you want to count all words in your text file using a Spark accumulator; in that case a plain Long accumulator is enough:
words.foreach(_ => accum.add(1L))

Create Map in Scala using loop

I am trying to create a map after getting a result for each item in a list. Here is what I tried so far:
val sourceList: List[(Int, Int)] = ....
val resultMap: Map[Int, Int] = for (srcItem <- sourceList) {
  val result: Int = someFunction(srcItem._1)
  Map(srcItem._1 -> result)
}
But I am getting a type mismatch error in IntelliJ, and I am definitely not writing the proper syntax here. I don't think I can use yield, as I don't want a List of Maps. What is the correct way to create a Map using a for loop? Any suggestions?
The simplest way is to create the map out of a list of tuples:
val resultMap = sourceList.map(item => (item._1, someFunction(item._1))).toMap
Or, in the monadic way:
val listOfTuples = for {
  (value, _) <- sourceList
} yield (value, someFunction(value))

val resultMap = listOfTuples.toMap
Alternatively, if you want to avoid the creation of listOfTuples you can make the transformation a lazy one by calling .view on sourceList and then call toMap:
val resultMap = sourceList.view
  .map(item => (item._1, someFunction(item._1)))
  .toMap
Finally, if you really want to avoid generating extra objects, you can use a mutable Map instead and append keys and values to it using `+=` or `.put`.
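A sketch of that last option, with hypothetical inputs and a stand-in `someFunction` just to make it runnable, converting back to an immutable Map at the end:

```scala
import scala.collection.mutable

// made-up data and function for the sketch
val sourceList: List[(Int, Int)] = List((1, 10), (2, 20))
def someFunction(x: Int): Int = x * 2

val buf = mutable.Map[Int, Int]()
for ((key, _) <- sourceList) buf += key -> someFunction(key)
val resultMap: Map[Int, Int] = buf.toMap // back to an immutable Map
```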

How to avoid any mutable things in this builder?

I have a simple Scala class like this:
class FiltersBuilder {
  def build(filter: CommandFilter) = {
    val result = collection.mutable.Map[String, String]()
    if (filter.activity.isDefined) {
      result += ("activity" -> """ some specific expression """)
    } // I know very well that manipulating an Option like this is not recommended;
      // it's just for the simplicity of the example
    if (filter.gender.isDefined) {
      result += ("gender" -> """ some specific expression """)
    }
    result.toMap // in order to return an immutable Map
  }
}
used with this case class:
case class CommandFilter(activity: Option[String] = None, gender: Option[String] = None)
The result content depends on the nature of the selected filters and their associated and hardcoded expressions (String).
Is there a way to transform this code snippet by removing this "mutability" of the mutable.Map?
Map each filter field to a tuple inside a Seq, then filter out the Nones with flatten, and finally convert the Seq of tuples to a Map with toMap.
To filter on more fields, you just have to add a new line to the Seq.
def build(filter: CommandFilter) = {
  // map each filter field to the proper tuple;
  // as they are Options, `map` transforms the Somes and leaves the Nones as None
  val result = Seq(
    filter.activity.map(value => "activity" -> s""" some specific expression using $value """),
    filter.gender.map(value => "gender" -> s""" some specific expression using $value """)
  ).flatten // flatten filters out all the Nones
  result.toMap // transform the list of tuples into a map
}
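With the case class from the question, a runnable version of this approach (the `expr(...)` strings are placeholders for the real hardcoded expressions):

```scala
case class CommandFilter(activity: Option[String] = None, gender: Option[String] = None)

def build(filter: CommandFilter): Map[String, String] =
  Seq(
    filter.activity.map(value => "activity" -> s"expr($value)"),
    filter.gender.map(value => "gender" -> s"expr($value)")
  ).flatten.toMap

val m = build(CommandFilter(activity = Some("running")))
println(m) // Map(activity -> expr(running))
```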
Since there are at most 2 elements in your Map:
val activity = filter.activity.map(_ => Map("activity" -> "xx"))
val gender = filter.gender.map(_ => Map("gender" -> "xx"))
val empty = Map[String, String]()
activity.getOrElse(empty) ++ gender.getOrElse(empty)
I've just managed to achieve it with this solution:
class FiltersBuilder(commandFilter: CommandFilter) {
  def build = {
    val result = Map[String, String]()
    buildGenderFilter(buildActivityFilter(result))
  }

  private def buildActivityFilter(expressions: Map[String, String]) =
    commandFilter.activity.fold(expressions)(activity =>
      expressions + ("activity" -> """ expression regarding activity """))

  private def buildGenderFilter(expressions: Map[String, String]) =
    commandFilter.gender.fold(expressions)(gender =>
      expressions + ("gender" -> """ expression regarding gender """))
}
Any better way?