How to optionally return value from map function - scala

map function on collections requires to return some value for each iteration. But I'm trying to find a way to return value not for each iteration, but only for initial values which matches some predicate .
What I want looks something like this:
(1 to 10).map { x =>
val res: Option[Int] = service.getById(x)
if (res.isDefined) Pair(x, res.get )// no else part
}
I think something like .collect function could do it, but seems with collect function I need to write many code in guards blocks (case x if {...// too much code here})

If you are returning an Option you can flatMap it and get only the values that are present (that is, are not None).
(1 to 10).flatMap { x =>
val res: Option[Int] = service.getById(x)
res.map{y => Pair(x, y) }
}
As you suggest, an alternative way to combine map and filter is to use collect and a partially applied function. Here is a simplified example:
(1 to 10).collect{ case x if x > 5 => x*2 }
res0: scala.collection.immutable.IndexedSeq[Int] = Vector(12, 14, 16, 18, 20)

You can use the collect function (see here) to do exactly what you want. Your example would then look like:
(1 to 10) map (x => (x, service.getById(x))) collect {
case (x, Some(res)) => Pair(x, res)
}

Using a for comprehension, like this,
for ( x <- 1 to 10; res <- service.getById(x) ) yield Pair(x, res.get)
This yields pairs where res does not evaluate to None.

Getting the first element:
(1 to 10).flatMap { x =>
val res: Option[Int] = service.getById(x)
res.map{y => Pair(x, y) }
}.head

Related

Scala - access collection members within map or flatMap

Suppose that I use a sequence of various maps and/or flatMaps to generate a sequence of collections. Is it possible to access information about the "current" collection from within any of those methods? For example, without knowing anything specific about the functions used in the previous maps or flatMaps, and without using any intermediate declarations, how can I get the maximum value (or length, or first element, etc.) of the collection upon which the last map acts?
List(1, 2, 3)
.flatMap(x => f(x) /* some unknown function */)
.map(x => x + ??? /* what is the max element of the collection? */)
Edit for clarification:
In the example, I'm not looking for the max (or whatever) of the initial List. I'm looking for the max of the collection after the flatMap has been applied.
By "without using any intermediate declarations" I mean that I do not want to use any temporary collections en route to the final result. So, the example by Steve Waldman below, while giving the desired result, is not what I am seeking. (I include this condition is mostly for aesthetic reasons.)
Edit for clarification, part 2:
The ideal solution would be some magic keyword or syntactic sugar that lets me reference the current collection:
List(1, 2, 3)
.flatMap(x => f(x))
.map(x => x + theCurrentList.max)
I'm prepared to accept the fact, however, that this simply is not possible.
Maybe just define the list as a val, so you can name it? I don't know of any facility built into map(...) or flatMap(...) that would help.
val myList = List(1, 2, 3)
myList
.flatMap(x => f(x) /* some unknown function */)
.map(x => x + myList.max /* what is the max element of the List? */)
Update: By this approach at least, if you have multiple transformations and want to see the transformed version, you'd have to name that. You could get away with
val myList = List(1, 2, 3).flatMap(x => f(x) /* some unknown function */)
myList.map(x => x + myList.max /* what is the max element of the List? */)
Or, if there will be multiple transformations, get in the habit of naming the stages.
val rawList = List(1, 2, 3)
val smordified = rawList.flatMap(x => f(x) /* some unknown function */)
val maxified = smordified.map(x => x + smordified.max /* what is the max element of the List? */)
maxified
Update 2: Watch it work in the REPL even with heterogenous types:
scala> def f( x : Int ) : Vector[Double] = Vector(x * math.random, x * math.random )
f: (x: Int)Vector[Double]
scala> val rawList = List(1, 2, 3)
rawList: List[Int] = List(1, 2, 3)
scala> val smordified = rawList.flatMap(x => f(x) /* some unknown function */)
smordified: List[Double] = List(0.40730853571901315, 0.15151641399798665, 1.5305929709857609, 0.35211231420067435, 0.644241939254793, 0.15530230501048903)
scala> val maxified = smordified.map(x => x + smordified.max /* what is the max element of the List? */)
maxified: List[Double] = List(1.937901506704774, 1.6821093849837476, 3.0611859419715217, 1.8827052851864352, 2.1748349102405538, 1.6858952759962498)
scala> maxified
res3: List[Double] = List(1.937901506704774, 1.6821093849837476, 3.0611859419715217, 1.8827052851864352, 2.1748349102405538, 1.6858952759962498)
It is possible, but not pretty, and not likely something you want if you are doing it for "aesthetic reasons."
import scala.math.max
def f(x: Int): Seq[Int] = ???
List(1, 2, 3).
flatMap(x => f(x) /* some unknown function */).
foldRight((List[Int](),List[Int]())) {
case (x, (xs, Nil)) => ((x :: xs), List.fill(xs.size + 1)(x))
case (x, (xs, xMax :: _)) => ((x :: xs), List.fill(xs.size + 1)(max(x, xMax)))
}.
zipped.
map {
case (x, xMax) => x + xMax
}
// Or alternately, a slightly more efficient version using Streams.
List(1, 2, 3).
flatMap(x => f(x) /* some unknown function */).
foldRight((List[Int](),Stream[Int]())) {
case (x, (xs, Stream())) =>
((x :: xs), Stream.continually(x))
case (x, (xs, curXMax #:: _)) =>
val newXMax = max(x, curXMax)
((x :: xs), Stream.continually(newXMax))
}.
zipped.
map {
case (x, xMax) => x + xMax
}
Seriously though, I just took this on to see if I could do it. While the code didn't turn out as bad as I expected, I still don't think it's particularly readable. I'd discourage using this over something similar to Steve Waldman's answer. Sometimes, it's simply better to just introduce a val, rather than being dogmatic about it.
You could define a mapWithSelf (resp. flatMapWithSelf) operation along these lines and add it as an implicit enrichment to the collection. For List it might look like:
// Scala 2.13 APIs
object Enrichments {
implicit class WithSelfOps[A](val lst: List[A]) extends AnyVal {
def mapWithSelf[B](f: (A, List[A]) => B): List[B] =
lst.map(f(_, lst))
def flatMapWithSelf[B](f: (A, List[A]) => IterableOnce[B]): List[B] =
lst.flatMap(f(_, lst))
}
}
The enrichment basically fixes the value of the collection before the operation and threads it through. It should be possible to generify this (at least for the strict collections), though it would look a little different in 2.12 vs. 2.13+.
Usage would look like
import Enrichments._
val someF: Int => IterableOnce[Int] = ???
List(1, 2, 3)
.flatMap(someF)
.mapWithSelf { (x, lst) =>
x + lst.max
}
So at the usage site, it's aesthetically pleasant. Note that if you're computing something which traverses the list, you'll be traversing the list every time (leading to a quadratic runtime). You can get around that with some mutability or by just saving the intermediate list after the flatMap.
One somewhat-simple way of referencing prior output within the current map/collect operation is to use a named reference outside the map, then reference it from within the map block:
var prevOutput = ... // starting value of whatever is referenced within the map
myValues.map {
prevOutput = ... // expression that references prior `prevOutput`
prevOutput // return above computed value for the map to collect
}
This draws attention to the fact that we're referencing prior elements while building the new sequence.
This would be more messy, though, if you wanted to reference arbitrarily previous values, not just the previous one.

Scala collect function

Let's say I want to print duplicates in a list with their count. So I have 3 options as shown below:
def dups(dup:List[Int]) = {
//1)
println(dup.groupBy(identity).collect { case (x,ys) if ys.lengthCompare(1) > 0 => (x,ys.size) }.toSeq)
//2)
println(dup.groupBy(identity).collect { case (x, List(_, _, _*)) => x }.map(x => (x, dup.count(y => x == y))))
//3)
println(dup.distinct.map((a:Int) => (a, dup.count((b:Int) => a == b )) ).filter( (pair: (Int,Int) ) => { pair._2 > 1 } ))
}
Questions:
-> For option 2, is there any way to name the list parameter so that it can be used to append the size of the list just like I did in option 1 using ys.size?
-> For option 1, is there any way to avoid the last call to toSeq to return a List?
-> which one of the 3 choices is more efficient by using the least amount of loops?
As an example input: List(1,1,1,2,3,4,5,5,6,100,101,101,102)
Should print: List((1,3), (5,2), (101,2))
Based on #lutzh answer below the best way would be to do the following:
val list: List[(Int, Int)] = dup.groupBy(identity).collect({ case (x, ys # List(_, _, _*)) => (x, ys.size) })(breakOut)
val list2: List[(Int, Int)] = dup.groupBy(identity).collect { case (x, ys) if ys.lengthCompare(1) > 0 => (x, ys.size) }(breakOut)
For option 1 is there any way to avoid the last call to toSeq to
return a List?
collect takes a CanBuildFrom, so if you assign it to something of the desired type you can use breakOut:
import collection.breakOut
val dups: List[(Int,Int)] =
dup
.groupBy(identity)
.collect({ case (x,ys) if ys.size > 1 => (x,ys.size)} )(breakOut)
collect will create a new collection (just like map), using a Builder. Usually the return type is determined by the origin type. With breakOut you basically ignore the origin type and look for a builder for the result type. So when collect creates the resulting collection, it will already create the "right" type, and you don't have to traverse the result again to convert it.
For option 2, is there any way to name the list parameter so that it
can be used to append the size of the list just like I did in option 1
using ys.size?
Yes, you can bind it to a variable with #
val dups: List[(Int,Int)] =
dup
.groupBy(identity)
.collect({ case (x, ys # List(_, _, _*)) => (x, ys.size) } )(breakOut)
which one of the 3 choices is more efficient?
Calling dup.count on a match seems inefficient, as dup needs to be traversed again then, I'd avoid that.
My guess would be that the guard (if lengthCompare(1) > 0) takes a few cycles less than the List(,,_*) pattern, but I haven't measured. And am not planning to.
Disclaimer: There may be a completely different (and more efficient) way of doing it that I can't think of right now. I'm only answering your specific questions.

Scala: transform a collection, yielding 0..many elements on each iteration

Given a collection in Scala, I'd like to traverse this collection and for each object I'd like to emit (yield) from 0 to multiple elements that should be joined together into a new collection.
For example, I expect something like this:
val input = Range(0, 15)
val output = input.somefancymapfunction((x) => {
if (x % 3 == 0)
yield(s"${x}/3")
if (x % 5 == 0)
yield(s"${x}/5")
})
to build an output collection that will contain
(0/3, 0/5, 3/3, 5/5, 6/3, 9/3, 10/5, 12/3)
Basically, I want a superset of what filter (1 → 0..1) and map (1 → 1) allows to do: mapping (1 → 0..n).
Solutions I've tried
Imperative solutions
Obviously, it's possible to do so in non-functional maneer, like:
var output = mutable.ListBuffer()
input.foreach((x) => {
if (x % 3 == 0)
output += s"${x}/3"
if (x % 5 == 0)
output += s"${x}/5"
})
Flatmap solutions
I know of flatMap, but it again, either:
1) becomes really ugly if we're talking about arbitrary number of output elements:
val output = input.flatMap((x) => {
val v1 = if (x % 3 == 0) {
Some(s"${x}/3")
} else {
None
}
val v2 = if (x % 5 == 0) {
Some(s"${x}/5")
} else {
None
}
List(v1, v2).flatten
})
2) requires usage of mutable collections inside it:
val output = input.flatMap((x) => {
val r = ListBuffer[String]()
if (x % 3 == 0)
r += s"${x}/3"
if (x % 5 == 0)
r += s"${x}/5"
r
})
which is actually even worse that using mutable collection from the very beginning, or
3) requires major logic overhaul:
val output = input.flatMap((x) => {
if (x % 3 == 0) {
if (x % 5 == 0) {
List(s"${x}/3", s"${x}/5")
} else {
List(s"${x}/3")
}
} else if (x % 5 == 0) {
List(s"${x}/5")
} else {
List()
}
})
which is, IMHO, also looks ugly and requires duplicating the generating code.
Roll-your-own-map-function
Last, but not least, I can roll my own function of that kind:
def myMultiOutputMap[T, R](coll: TraversableOnce[T], func: (T, ListBuffer[R]) => Unit): List[R] = {
val out = ListBuffer[R]()
coll.foreach((x) => func.apply(x, out))
out.toList
}
which can be used almost like I want:
val output = myMultiOutputMap[Int, String](input, (x, out) => {
if (x % 3 == 0)
out += s"${x}/3"
if (x % 5 == 0)
out += s"${x}/5"
})
Am I really overlooking something and there's no such functionality in standard Scala collection libraries?
Similar questions
This question bears some similarity to Can I yield or map one element into many in Scala? — but that question discusses 1 element → 3 elements mapping, and I want 1 element → arbitrary number of elements mapping.
Final note
Please note that this is not the question about division / divisors, such conditions are included purely for illustrative purposes.
Rather than having a separate case for each divisor, put them in a container and iterate over them in a for comprehension:
val output = for {
n <- input
d <- Seq(3, 5)
if n % d == 0
} yield s"$n/$d"
Or equivalently in a collect nested in a flatMap:
val output = input.flatMap { n =>
Seq(3, 5).collect {
case d if n % d == 0 => s"$n/$d"
}
}
In the more general case where the different cases may have different logic, you can put each case in a separate partial function and iterate over the partial functions:
val output = for {
n <- input
f <- Seq[PartialFunction[Int, String]](
{case x if x % 3 == 0 => s"$x/3"},
{case x if x % 5 == 0 => s"$x/5"})
if f.isDefinedAt(n)
} yield f(n)
You can also use some functional library (e.g. scalaz) to express this:
import scalaz._, Scalaz._
def divisibleBy(byWhat: Int)(what: Int): List[String] =
(what % byWhat == 0).option(s"$what/$byWhat").toList
(0 to 15) flatMap (divisibleBy(3) _ |+| divisibleBy(5))
This uses the semigroup append operation |+|. For Lists this operation means a simple list concatenation. So for functions Int => List[String], this append operation will produce a function that runs both functions and appends their results.
If you have complex computation, during which you should sometimes add some elements to operation global accumulator, you can use popular approach named Writer Monad
Preparation in scala is somewhat bulky but results are extremely composable thanks to Monad interface
import scalaz.Writer
import scalaz.syntax.writer._
import scalaz.syntax.monad._
import scalaz.std.vector._
import scalaz.syntax.traverse._
type Messages[T] = Writer[Vector[String], T]
def yieldW(a: String): Messages[Unit] = Vector(a).tell
val output = Vector.range(0, 15).traverse { n =>
yieldW(s"$n / 3").whenM(n % 3 == 0) >>
yieldW(s"$n / 5").whenM(n % 5 == 0)
}.run._1
Here is my proposition for a custom function, might be better with pimp my library pattern
def fancyMap[A, B](list: TraversableOnce[A])(fs: (A => Boolean, A => B)*) = {
def valuesForElement(elem: A) = fs collect { case (predicate, mapper) if predicate(elem) => mapper(elem) }
list flatMap valuesForElement
}
fancyMap[Int, String](0 to 15)((_ % 3 == 0, _ + "/3"), (_ % 5 == 0, _ + "/5"))
You can try collect:
val input = Range(0,15)
val output = input.flatMap { x =>
List(3,5) collect { case n if (x%n == 0) => s"${x}/${n}" }
}
System.out.println(output)
I would us a fold:
val input = Range(0, 15)
val output = input.foldLeft(List[String]()) {
case (acc, value) =>
val acc1 = if (value % 3 == 0) s"$value/3" :: acc else acc
val acc2 = if (value % 5 == 0) s"$value/5" :: acc1 else acc1
acc2
}.reverse
output contains
List(0/3, 0/5, 3/3, 5/5, 6/3, 9/3, 10/5, 12/3)
A fold takes an accumumlator (acc), a collection, and a function. The function is called with the initial value of the accumumator, in this case an empty List[String], and each value of the collection. The function should return an updated collection.
On each iteration, we take the growing accumulator and, if the inside if statements are true, prepend the calculation to the new accumulator. The function finally returns the updated accumulator.
When the fold is done, it returns the final accumulator, but unfortunately, it is in reverse order. We simply reverse the accumulator with .reverse.
There is a nice paper on folds: A tutorial on the universality and expressiveness of fold, by Graham Hutton.

How to combine filter and map in Scala?

I have List[Int] in Scala. The List is List(1,2,3,4,5,6,7,8,9,10). I want to filter the list so that it only has even numbers. And I want to multiply the numbers with 2.
Is it possible?
As I state in my comment, collect should do what you want:
list.collect{
case x if x % 2 == 0 => x*2
}
The collect method allows you to both specify a criteria on the matching elements (filter) and modify the values that match (map)
And as #TravisBrown suggested, you can use flatMap as well, especially in situations where the condition is more complex and not suitable as a guard condition. Something like this for your example:
list.flatMap{
case x if x % 2 == 0 => Some(x*2)
case x => None
}
A for comprehension (which internally unfolds into a combination of map and withFilter) as follows,
for (x <- xs if x % 2 == 0) yield x*2
Namely
xs.withFilter(x => x % 2 == 0).map(x => x*2)
As #cmbaxter said, collect suits your need perfectly. The other nice thing about collect is that it figures out resulting type in case you're filtering by class:
scala> trait X
// defined trait X
scala> class Foo extends X
// defined class Foo
scala> class Bar extends X
// defined class Bar
scala> val xs = List(new Foo, new Bar, new Foo, new Bar)
// xs: List[X] = List(Foo#4cfa8227, Bar#78226c36, Foo#3f685162, Bar#11f406f8)
scala> xs.collect { case x: Foo => x }
// res1: List[Foo] = List(Foo#4cfa8227, Foo#3f685162)
On par, filter can't be that smart (see List[Foo] vs List[X]):
scala> xs.filter { case x: Foo => true; case _ => false }
// res3: List[X] = List(Foo#4cfa8227, Foo#3f685162)
This should do the work for you:
Filter first when the condition is p % 2 == 0 (for getting only even numbers).
And then use map to multiply those even numbers by 2.
var myList = List(1,2,3,4,5,6,7,8,9,10).filter(p => p % 2 == 0).map(p => {p*2})
I tend to like the filter approach.
val list1 = List(1,2,3,4,5,6,7,8,9,10)
list1.filter(x => x%2 == 0).map(_*2)
How about a good old fashioned fold?
xs.foldLeft(List[Y]()) { (ys, x) =>
val z = calculateSomethingOnX(x)
if (someConditionOnZ(z))
Y(x, z) :: ys
else
ys
}

Scala: Best way to filter & map in one iteration

I'm new to Scala and trying to figure out the best way to filter & map a collection. Here's a toy example to explain my problem.
Approach 1: This is pretty bad since I'm iterating through the list twice and calculating the same value in each iteration.
val N = 5
val nums = 0 until 10
val sqNumsLargerThanN = nums filter { x: Int => (x * x) > N } map { x: Int => (x * x).toString }
Approach 2: This is slightly better but I still need to calculate (x * x) twice.
val N = 5
val nums = 0 until 10
val sqNumsLargerThanN = nums collect { case x: Int if (x * x) > N => (x * x).toString }
So, is it possible to calculate this without iterating through the collection twice and avoid repeating the same calculations?
Could use a foldRight
nums.foldRight(List.empty[Int]) {
case (i, is) =>
val s = i * i
if (s > N) s :: is else is
}
A foldLeft would also achieve a similar goal, but the resulting list would be in reverse order (due to the associativity of foldLeft.
Alternatively if you'd like to play with Scalaz
import scalaz.std.list._
import scalaz.syntax.foldable._
nums.foldMap { i =>
val s = i * i
if (s > N) List(s) else List()
}
The typical approach is to use an iterator (if possible) or view (if iterator won't work). This doesn't exactly avoid two traversals, but it does avoid creation of a full-sized intermediate collection. You then map first and filter afterwards and then map again if needed:
xs.iterator.map(x => x*x).filter(_ > N).map(_.toString)
The advantage of this approach is that it's really easy to read and, since there are no intermediate collections, it's reasonably efficient.
If you are asking because this is a performance bottleneck, then the answer is usually to write a tail-recursive function or use the old-style while loop method. For instance, in your case
def sumSqBigN(xs: Array[Int], N: Int): Array[String] = {
val ysb = Array.newBuilder[String]
def inner(start: Int): Array[String] = {
if (start >= xs.length) ysb.result
else {
val sq = xs(start) * xs(start)
if (sq > N) ysb += sq.toString
inner(start + 1)
}
}
inner(0)
}
You can also pass a parameter forward in inner instead of using an external builder (especially useful for sums).
I have yet to confirm that this is truly a single pass, but:
val sqNumsLargerThanN = nums flatMap { x =>
val square = x * x
if (square > N) Some(x) else None
}
You can use collect which applies a partial function to every value of the collection that it's defined at. Your example could be rewritten as follows:
val sqNumsLargerThanN = nums collect {
case (x: Int) if (x * x) > N => (x * x).toString
}
A very simple approach that only does the multiplication operation once. It's also lazy, so it will be executing code only when needed.
nums.view.map(x=>x*x).withFilter(x => x> N).map(_.toString)
Take a look here for differences between filter and withFilter.
Consider this for comprehension,
for (x <- 0 until 10; v = x*x if v > N) yield v.toString
which unfolds to a flatMap over the range and a (lazy) withFilter onto the once only calculated square, and yields a collection with filtered results. To note one iteration and one calculation of square is required (in addition to creating the range).
You can use flatMap.
val sqNumsLargerThanN = nums flatMap { x =>
val square = x * x
if (square > N) Some(square.toString) else None
}
Or with Scalaz,
import scalaz.Scalaz._
val sqNumsLargerThanN = nums flatMap { x =>
val square = x * x
(square > N).option(square.toString)
}
The solves the asked question of how to do this with one iteration. This can be useful when streaming data, like with an Iterator.
However...if you are instead wanting the absolute fastest implementation, this is not it. In fact, I suspect you would use a mutable ArrayList and a while loop. But only after profiling would you know for sure. In any case, that's for another question.
Using a for comprehension would work:
val sqNumsLargerThanN = for {x <- nums if x*x > N } yield (x*x).toString
Also, I'm not sure but I think the scala compiler is smart about a filter before a map and will only do 1 pass if possible.
I am also beginner did it as follows
for(y<-(num.map(x=>x*x)) if y>5 ) { println(y)}