scala> List(List(1), List(2), List(3), List(4))
res18: List[List[Int]] = List(List(1), List(2), List(3), List(4))
scala> res18.flatten
res19: List[Int] = List(1, 2, 3, 4)
scala> res18.flatMap(identity)
res20: List[Int] = List(1, 2, 3, 4)
Is there any difference between these two functions? When is it appropriate to use one over the other? Are there any tradeoffs?
You can view flatMap(identity) as map(identity).flatten. (Of course it is not implemented that way, since that would take two passes over the collection.)
map(identity) gives you back the same collection, so in the end it is equivalent to plain flatten.
I would personally stick to flatten, since it is shorter, easier to understand, and designed to do exactly this.
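Continuing the REPL session above, map(identity).flatten does indeed give the same result:
scala> res18.map(identity).flatten
res21: List[Int] = List(1, 2, 3, 4)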
Conceptually there is no difference in the result; flatMap just takes a bit more time to produce it.
The example below compares flatMap(identity), map(identity).flatten, and plain flatten in practice:
object Test extends App {

  // flatMap(identity)
  println(timeElapsed(List(List(1, 2, 3, 4), List(5, 6, 7, 8)).flatMap(identity)))

  // map(identity) and then flatten
  println(timeElapsed(List(List(1, 2, 3, 4), List(5, 6, 7, 8)).map(identity).flatten))

  // flatten
  println(timeElapsed(List(List(1, 2, 3, 4), List(5, 6, 7, 8)).flatten))

  /** Runs the block once and prints the elapsed time in nanoseconds. */
  def timeElapsed[T](block: => T): T = {
    val start = System.nanoTime()
    val res = block
    val totalTime = System.nanoTime() - start
    println("Elapsed time: %1d nano seconds".format(totalTime))
    res
  }
}
All three produced the same result across repeated runs.
Conclusion: flatten is the most efficient.
Elapsed time: 2915949 nano seconds
List(1, 2, 3, 4, 5, 6, 7, 8)
Elapsed time: 1060826 nano seconds
List(1, 2, 3, 4, 5, 6, 7, 8)
Elapsed time: 81172 nano seconds
List(1, 2, 3, 4, 5, 6, 7, 8)
Conceptually, there is no difference. Practically, flatten is more efficient, and conveys a clearer intent.
Generally, you don't use identity directly; it exists mainly for situations where a function is passed in as a parameter or set as a default. The compiler may be able to optimize it away, but otherwise you risk a superfluous function call for every element.
You would use flatMap when you need to do a map (with a function other than identity) immediately followed by a flatten.
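For example (a snippet of my own, not from the thread), splitting each string into words is a map that produces nested lists, which flatMap flattens in a single pass:
val words = List("hello world", "scala collections")
words.map(_.split(" ").toList).flatten // List(hello, world, scala, collections)
words.flatMap(_.split(" ").toList)     // List(hello, world, scala, collections)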
Related
I have a method that looks like this:
def compute[T](l: List[T]): List[T] = {
  val shuffled = util.Random.shuffle(l)
  // do some more computations
}
I wanted to seed the random number generator for my unit tests so that I don't have to break down my method into two methods and test only the computation, since this is the method that will be used externally. Is this possible to do in ScalaTest?
I don't have much background in ScalaTest, but if you call util.Random.setSeed(seed: Long) then you'll always get the same shuffle for any given seed value.
scala> util.Random.shuffle(Seq(1,2,3,4,5,6,7,8,9,0))
res0: Seq[Int] = List(6, 0, 8, 5, 4, 7, 2, 3, 1, 9)
scala> util.Random.setSeed(57L)
scala> util.Random.shuffle(Seq(1,2,3,4,5,6,7,8,9,0))
res1: Seq[Int] = List(5, 3, 2, 0, 6, 8, 7, 4, 1, 9)
scala> util.Random.setSeed(57L)
scala> util.Random.shuffle(Seq(1,2,3,4,5,6,7,8,9,0))
res2: Seq[Int] = List(5, 3, 2, 0, 6, 8, 7, 4, 1, 9)
How would you test a method that depends on the current time? Or on database state? Or on a response from Google? Or on sleeping for some time?
Generally, when you test code that depends on some external state (time, another system, or in your case entropy / the seed), you refactor that code and extract the dependency. One way, as @jwvh said, is to extract the seed, but in my opinion you should extract the whole transformation.
In other words, create a method that receives the already-shuffled list and test that method.
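A rough sketch of that refactoring (method names are mine): keep the call to Random at the edge and test the deterministic part.
def compute[T](l: List[T]): List[T] =
  computeShuffled(util.Random.shuffle(l))

// Deterministic part: receives an already-shuffled list, so it is easy to unit test.
def computeShuffled[T](shuffled: List[T]): List[T] = {
  // do some more computations
  shuffled
}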
I have lately found myself using sliding(n, n) when I need to iterate a collection in groups of n elements without re-processing any of them. I was wondering whether it would be more correct to iterate such collections using grouped(n). Is there any special reason, in terms of performance, to use one or the other for this specific case?
val listToGroup = List(1,2,3,4,5,6,7,8)
listToGroup: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8)
listToGroup.sliding(3,3).toList
res0: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8))
listToGroup.grouped(3).toList
res1: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8))
The reason to use sliding instead of grouped is really only applicable when you want to have the 'windows' be of a length different than what you 'slide' by (that is to say, using sliding(m, n) where m != n):
listToGroup.sliding(2,3).toList
//returns List(List(1, 2), List(4, 5), List(7, 8))
listToGroup.sliding(4,3).toList
//returns List(List(1, 2, 3, 4), List(4, 5, 6, 7), List(7, 8))
As som-snytt points out in a comment, there's not going to be any performance difference, as both of them are implemented within Iterator as returning a new GroupedIterator. However, it's simpler to write grouped(n) than sliding(n, n), and your code will be cleaner and more obvious in its intended behavior, so I would recommend grouped(n).
As an example for where to use sliding, consider this problem where grouped simply doesn't suffice:
Given a list of numbers, find the sublist of length 4 with the greatest sum.
Now, putting aside the fact that a dynamic programming approach can produce a more efficient result, this can be solved as:
def maxLengthFourSublist(list: List[Int]): List[Int] = {
  list.sliding(4, 1).maxBy(_.sum)
}
If you were to use grouped here, you wouldn't get all the sublists, so sliding is more appropriate.
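For example, on a made-up input (my own), the overlapping windows are exactly what grouped would miss:
val nums = List(1, 2, 9, 8, 3, 4)
nums.sliding(4, 1).toList  // List(List(1, 2, 9, 8), List(2, 9, 8, 3), List(9, 8, 3, 4))
nums.grouped(4).toList     // List(List(1, 2, 9, 8), List(3, 4))
maxLengthFourSublist(nums) // List(9, 8, 3, 4), the sublist with sum 24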
Working with Lists in Scala I would like a simple way to get all elements but the last element. Is there a complementary method for .last similar to .head/.tail complement?
I'd rather not dirty up code with something like:
val x: List[String] = List("abc", "def", "ghi")
val allButLast: List[String] = x.reverse.tail.reverse
// List(abc, def)
Thanks.
init selects all elements but the last one.
List API for init.
scala> List(1,2,3,4,5)
res0: List[Int] = List(1, 2, 3, 4, 5)
scala> res0.init
res1: List[Int] = List(1, 2, 3, 4)
The 4 related methods here are head, tail, init, and last.
head and last get the first and final member, whereas
tail and init exclude the first and final members.
scala> val list = (0 to 10).toList
list: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> list.head
res0: Int = 0
scala> list.tail
res1: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> list.init
res2: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
scala> list.last
res3: Int = 10
You should also take care, because all 4 of them are unsafe on the empty list and will throw exceptions.
These methods are defined on GenTraversableLike, which List implements.
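If an empty list is a possibility, a safer option (my own addition, continuing the session above) is to use the variants that don't throw: headOption and lastOption return an Option, and drop(1)/dropRight(1) simply return an empty list.
scala> List.empty[Int].headOption
res4: Option[Int] = None

scala> List.empty[Int].lastOption
res5: Option[Int] = None

scala> List.empty[Int].dropRight(1)
res6: List[Int] = List()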
That's init.
link to Scaladoc: http://www.scala-lang.org/api/2.11.5/index.html#scala.collection.immutable.List#init:Repr
def init: List[A]
Selects all elements except the last.
Also, note that it's defined in GenTraversableLike, so pretty much any Scala collection has this method.
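For instance (examples of my own), it works the same way on a Vector, or even on a String via StringOps:
scala> Vector(1, 2, 3).init
res0: scala.collection.immutable.Vector[Int] = Vector(1, 2)

scala> "scala".init
res1: String = scal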
To drop any number of items from the end of a list, consider dropRight:
val xs = (1 to 5).toList
xs.dropRight(1)
List(1, 2, 3, 4)
xs.dropRight(2)
List(1, 2, 3)
xs.dropRight(10)
List()
scala> List(1,2,3,4,5,6,7).takeWhile(i=>i<5)
res1: List[Int] = List(1, 2, 3, 4)
What if I also need to include 5 in the result?
Assuming that the predicate you will be using is more complicated than just taking the first 5 elements, you can do:
scala> List(1,2,3,4,5,6,7)
res5: List[Int] = List(1, 2, 3, 4, 5, 6, 7)
scala> res5.takeWhile(_<5) ++ res5.dropWhile(_<5).take(1)
res7: List[Int] = List(1, 2, 3, 4, 5)
Also
scala> res5.span(_<5)
res8: (List[Int], List[Int]) = (List(1, 2, 3, 4),List(5, 6, 7))
scala> res8._1 ++ res8._2.take(1)
res10: List[Int] = List(1, 2, 3, 4, 5)
Also
scala> res5.take(res5.segmentLength(_<5, 0) + 1)
res17: List[Int] = List(1, 2, 3, 4, 5)
scala> res5.take(res5.indexWhere(_>5))
res18: List[Int] = List(1, 2, 3, 4, 5)
Edit
If this is not a parallelized computation and you won't be using the parallel collections:
var last = myList.head
val rem = myList.takeWhile { x =>
  last = x
  x < 5
}
rem :+ last // List(1, 2, 3, 4, 5)
The anonymous function forms a closure over last, so after takeWhile stops, last holds the first element that failed the predicate, and appending it gives the result you want. (This assumes the predicate fails somewhere; if every element passes, last duplicates the final element.)
Previous Answer
I'd settle for the far less complicated:
.takeWhile(_ <= 5)
wherein you're just using the less than or equal operator.
List(1,2,3,4,5,6,7,8).takeWhile(_ <= 5)
This is the simplest solution when the boundary value lets you just adjust the predicate.
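If you need this often, a small helper (a sketch of mine, name hypothetical) built on the span approach above keeps the first failing element as well:
def takeWhileInclusive[A](xs: List[A])(p: A => Boolean): List[A] = {
  val (prefix, rest) = xs.span(p)
  prefix ++ rest.take(1)
}

takeWhileInclusive(List(1, 2, 3, 4, 5, 6, 7))(_ < 5) // List(1, 2, 3, 4, 5)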
Should I rely on the order of combinations and permutations generated by corresponding methods of Scala's collections? For example:
scala> Seq(1, 2, 3).combinations(2).foreach(println)
List(1, 2)
List(1, 3)
List(2, 3)
Can I be sure that I will get my results always in the same precise order?
Well, the documentation does not say anything about the order. It just says:
An Iterator which traverses the possible n-element combinations of
this sequence.
So the order is not guaranteed.
In practice you should always get the order as printed above, but the library does not guarantee it. To be safe (if a little pessimistic), don't rely on it; sort the result so that you always get the same sequence:
scala> import scala.math.Ordering.Implicits._
import scala.math.Ordering.Implicits._
scala> Seq(1,2,3).combinations(2).toList.sorted.foreach(println)
List(1, 2)
List(1, 3)
List(2, 3)
The combinations implementation maintains the order of the elements in the given sequence, except that the input is first processed to group repeated elements together. The output is not sorted:
scala> Seq(3,2,1).combinations(2).toList
res1: List[Seq[Int]] = List(List(3, 2), List(3, 1), List(2, 1))
The sequence is updated to keep repeated elements together. For instance:
scala> Seq(2,1,3,1,2).combinations(2).toList
res2: List[Seq[Int]] = List(List(2, 2), List(2, 1), List(2, 3), List(1, 1), List(1, 3))
In this case the sequence is first converted to Seq(2,2,1,1,3):
scala> Seq(2,2,1,1,3).combinations(2).toList
res3: List[Seq[Int]] = List(List(2, 2), List(2, 1), List(2, 3), List(1, 1), List(1, 3))
scala> res2 == res3
res4: Boolean = true