Scala: sliding(N,N) vs grouped(N)

Lately I've found myself using sliding(n,n) when I need to iterate over a collection in groups of n elements without re-processing any of them. I was wondering whether it would be more correct to use grouped(n) instead. My question is whether there is any special reason, in terms of performance, to use one or the other for this specific case.
scala> val listToGroup = List(1,2,3,4,5,6,7,8)
listToGroup: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8)
scala> listToGroup.sliding(3,3).toList
res0: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8))
scala> listToGroup.grouped(3).toList
res1: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8))

The only real reason to use sliding instead of grouped is when you want the 'windows' to have a length different from the amount you 'slide' by (that is, sliding(m, n) where m != n):
listToGroup.sliding(2,3).toList
//returns List(List(1, 2), List(4, 5), List(7, 8))
listToGroup.sliding(4,3).toList
//returns List(List(1, 2, 3, 4), List(4, 5, 6, 7), List(7, 8))
As som-snytt points out in a comment, there's not going to be any performance difference, as both of them are implemented within Iterator as returning a new GroupedIterator. However, it's simpler to write grouped(n) than sliding(n, n), and your code will be cleaner and more obvious in its intended behavior, so I would recommend grouped(n).
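A quick way to check the equivalence described above is to compare both calls over a range of list lengths and group sizes. This is a minimal sketch; the object and method names are hypothetical:

```scala
object GroupedVsSliding {
  // Hypothetical helper: true when sliding(n, n) and grouped(n) yield the
  // same windows for xs, including the shorter final window.
  def sameWindows(xs: List[Int], n: Int): Boolean =
    xs.sliding(n, n).toList == xs.grouped(n).toList

  // Check every list length from 1 to 20 against group sizes 1 to 6.
  val allAgree: Boolean =
    (1 to 20).forall { len =>
      val xs = (1 to len).toList
      (1 to 6).forall(n => sameWindows(xs, n))
    }
}
```

Since both end up as the same kind of grouped iterator when the step equals the size, any difference here would indicate a bug rather than a design choice.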
As an example for where to use sliding, consider this problem where grouped simply doesn't suffice:
Given a list of numbers, find the sublist of length 4 with the greatest sum.
Now, putting aside the fact that a dynamic programming approach can produce a more efficient result, this can be solved as:
def maxLengthFourSublist(list: List[Int]): List[Int] = {
  list.sliding(4,1).maxBy(_.sum)
}
If you were to use grouped here, you wouldn't get all the sublists, so sliding is more appropriate.
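One way to avoid re-summing every window is a running sum: each window's sum is derived from the previous one by adding the element that enters the window and subtracting the one that leaves it. This is a swapped-in technique offered as an example of the more efficient kind of approach the answer alludes to, not necessarily the one its author had in mind. `maxSumWindow` is a hypothetical name, and the sketch assumes the input has at least `k` elements:

```scala
object MaxWindow {
  // Hypothetical running-sum variant of list.sliding(k, 1).maxBy(_.sum).
  // Assumes xs.length >= k; on ties it keeps the earliest window, like maxBy.
  def maxSumWindow(xs: Vector[Int], k: Int): Vector[Int] = {
    require(k > 0 && xs.length >= k, "need k > 0 and at least k elements")
    var sum = xs.take(k).sum   // sum of the first window
    var best = sum
    var bestStart = 0
    for (i <- k until xs.length) {
      sum += xs(i) - xs(i - k) // add the entering element, drop the leaving one
      if (sum > best) {
        best = sum
        bestStart = i - k + 1
      }
    }
    xs.slice(bestStart, bestStart + k)
  }
}
```

The sliding version is easier to read; the running sum only pays off when the window size k is large.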

Related

How do I remove the proper subsets from a list of sets in Scala?

I have a list of sets of integers as follows: {(1, 0), (0, 1, 2), (1, 2), (1, 2, 3, 4, 5), (3, 4)}.
I want to write a program in Scala that removes the sets that are proper subsets of some other set in the list, i.e. the final result would be: {(0, 1, 2), (1, 2, 3, 4, 5)}.
An O(n^2) solution that checks each set against the entire list is possible, but it would be very expensive and does not scale well to ~100000 sets. I also thought of generating edges from the sets, removing duplicate edges and running a DFS, but I have no idea how to do that in Scala (in a more Scala-ish way, not a one-to-one translation of Java code).
Individual elements (sets) need only be compared to other elements of the same size or larger.
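val ss = List(Set(1, 0), Set(0, 1, 2), Set(1, 2), Set(1, 2, 3, 4, 5), Set(3, 4))
// Largest sets first: each set is then only checked against the sets
// already kept, which are all at least as large.
ss.sortBy(- _.size) match {
  case Nil => Nil
  case hd :: tl =>
    tl.foldLeft(List(hd)) { case (acc, s) =>
      if (acc.exists(kept => s.subsetOf(kept))) acc // s is contained in a kept set: drop it
      else s :: acc
    }
}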
//res0: List[Set[Int]] = List(Set(0, 1, 2), Set(5, 1, 2, 3, 4))

Difference between flatMap and flatten in Scala

scala> List(List(1), List(2), List(3), List(4))
res18: List[List[Int]] = List(List(1), List(2), List(3), List(4))
scala> res18.flatten
res19: List[Int] = List(1, 2, 3, 4)
scala> res18.flatMap(identity)
res20: List[Int] = List(1, 2, 3, 4)
Is there any difference between these two functions? When is it appropriate to use one over the other? Are there any tradeoffs?
You can view flatMap(identity) as map(identity).flatten. (Of course it is not implemented that way, since it would take two iterations).
map(identity) gives you the same collection, so in the end it is the same as only flatten.
I would personally stick to flatten, since it is shorter/easier to understand and designed to exactly do this.
Conceptually there is no difference in the result, but flatMap takes a bit more time to produce the same result.
Here is a more practical comparison of flatMap, map followed by flatten, and flatten alone:
object Test extends App {
  // flatMap
  println(timeElapsed(List(List(1, 2, 3, 4), List(5, 6, 7, 8)).flatMap(identity)))
  // map and then flatten
  println(timeElapsed(List(List(1, 2, 3, 4), List(5, 6, 7, 8)).map(identity).flatten))
  // flatten
  println(timeElapsed(List(List(1, 2, 3, 4), List(5, 6, 7, 8)).flatten))

  /**
   * Evaluates `block`, prints how long it took in nanoseconds, and returns its result.
   */
  def timeElapsed[T](block: => T): T = {
    val start = System.nanoTime()
    val res = block
    val totalTime = System.nanoTime() - start
    println("Elapsed time: %1d nano seconds".format(totalTime))
    res
  }
}
Repeating the run several times gave the same pattern for flatMap and flatten.
Conclusion: flatten is the most efficient.
Elapsed time: 2915949 nano seconds
List(1, 2, 3, 4, 5, 6, 7, 8)
Elapsed time: 1060826 nano seconds
List(1, 2, 3, 4, 5, 6, 7, 8)
Elapsed time: 81172 nano seconds
List(1, 2, 3, 4, 5, 6, 7, 8)
Conceptually, there is no difference. Practically, flatten is more efficient, and conveys a clearer intent.
Generally, you don't use identity directly. It's more there for situations like it getting passed in as a parameter, or being set as a default. It's possible for the compiler to optimize it out, but you're risking a superfluous function call for every element.
You would use flatMap when you need to do a map (with a function other than identity) immediately followed by a flatten.
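As a small illustration of that last point, here `flatMap` applies a function that returns a list for each element, while `map` followed by `flatten` reaches the same result through an intermediate nested list. The object and value names are hypothetical:

```scala
object FlatMapVsFlatten {
  val nums = List(1, 2, 3)

  // Repeat each number n times, mapping and flattening in one call.
  val viaFlatMap: List[Int] = nums.flatMap(n => List.fill(n)(n))

  // Same result, but map first builds a List[List[Int]] that flatten then flattens.
  val viaMapThenFlatten: List[Int] = nums.map(n => List.fill(n)(n)).flatten
}
```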

Is there an extended version of unzip in Scala that works for any List[n-tuple] instead of just List[pairs]?

If I have a list of 3-tuples, I want three separate lists. Is there a better way than this:
val (listA, listB, listC) = (list.map(_._1), list.map(_._2), list.map(_._3))
that can work for any n-tuple?
EDIT: I've since learned that unzip3 exists for three elements, which I was unaware of when writing this question. Is there a way to write a function that produces n lists in general?
How about this?
scala> val array = Array((1, 2, 3), (4, 5, 6), (7, 8, 9))
array: Array[(Int, Int, Int)] = Array((1,2,3), (4,5,6), (7,8,9))
scala> val tripleArray = array.unzip3
tripleArray: (Array[Int], Array[Int], Array[Int]) = (Array(1, 4, 7),Array(2, 5, 8),Array(3, 6, 9))
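For the general n case asked about in the EDIT, one possible sketch treats each tuple as a `Product`, turns it into a list with `productIterator`, and transposes the result. The trade-off is that element types are lost (each column is a `List[Any]`), and it assumes every tuple has the same arity; `unzipN` is a hypothetical name. For exactly three elements, `List` also has `unzip3`, just like `Array`:

```scala
object UnzipN {
  // Hypothetical helper: splits a list of same-arity tuples into one list per position.
  // Element types become Any because productIterator yields Iterator[Any].
  // transpose throws if the tuples have different arities.
  def unzipN(tuples: List[Product]): List[List[Any]] =
    tuples.map(_.productIterator.toList).transpose
}
```

When the arity is fixed and known, `unzip`/`unzip3` are preferable because they keep the element types.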

Are combinations and permutations stable in Scala collections?

Should I rely on the order of combinations and permutations generated by corresponding methods of Scala's collections? For example:
scala> Seq(1, 2, 3).combinations(2).foreach(println)
List(1, 2)
List(1, 3)
List(2, 3)
Can I be sure that I will get my results always in the same precise order?
The documentation does not say anything about the order. It just says:
An Iterator which traverses the possible n-element combinations of this sequence.
So the order is not guaranteed.
In practice you should get the order you printed, but the library does not guarantee it. So the safe (pessimistic) approach is not to rely on it and to sort the results instead, so that you always get the same sequence:
scala> import scala.math.Ordering.Implicits._
import scala.math.Ordering.Implicits._
scala> Seq(1,2,3).combinations(2).toList.sorted.foreach(println)
List(1, 2)
List(1, 3)
List(2, 3)
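The same defensive sorting works for `permutations`. A minimal sketch, with hypothetical helper names, that fixes the output order by sorting lexicographically with the `Seq` ordering brought in by `scala.math.Ordering.Implicits`:

```scala
object StableOrder {
  // Brings an implicit lexicographic Ordering for Seq[Int] into scope.
  import scala.math.Ordering.Implicits._

  // Sorting makes the result order depend only on the elements,
  // not on the library's traversal order.
  def sortedCombinations(xs: Seq[Int], k: Int): List[Seq[Int]] =
    xs.combinations(k).toList.sorted

  def sortedPermutations(xs: Seq[Int]): List[Seq[Int]] =
    xs.permutations.toList.sorted
}
```

Note that sorting only reorders the results; the order of elements inside each combination still follows the input, so `Seq(3, 2, 1)` yields `Seq(2, 1)` rather than `Seq(1, 2)`.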
The combinations implementation maintains the order of the elements in the given sequence, except that the input is first rearranged to group repeated elements together.
The output is not sorted:
scala> Seq(3,2,1).combinations(2).toList
res1: List[Seq[Int]] = List(List(3, 2), List(3, 1), List(2, 1))
The sequence is updated to keep repeated elements together. For instance:
scala> Seq(2,1,3,1,2).combinations(2).toList
res2: List[Seq[Int]] = List(List(2, 2), List(2, 1), List(2, 3), List(1, 1), List(1, 3))
In this case, the sequence is first rearranged to Seq(2,2,1,1,3):
scala> Seq(2,2,1,1,3).combinations(2).toList
res3: List[Seq[Int]] = List(List(2, 2), List(2, 1), List(2, 3), List(1, 1), List(1, 3))
scala> res2 == res3
res4: Boolean = true

How to split a text into strings?

I want to split a text file into strings. Can you please tell me how? For example, given the following text file:
this course in, a style i
will have to a modern, language that encourages
writing clean; and elegant code in a good
Is there a way to split the text file into strings like the following, for example by 2 words:
this course
in a
style i
will have
to a
modern language
that encourages
writing clean
and elegant
code in
a good
Can you please give me some hints? Thank you in advance.
Some ideas:
1) Use java.util.Scanner to read tokens directly from the file using the next(pattern: String) method
or
2) Read in all lines (see scala.io.Source), concatenate them into a single string, split the string into an array, then use the grouped method to split that into sub-arrays of 2 elements
In addition to Luigi's answer:
3) You should think about filtering out the punctuation.
4) Another hint:
scala> val list = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
list: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> val listOfTwoElements = list.sliding(2).toList
listOfTwoElements: List[List[Int]] = List(List(1, 2), List(2, 3), List(3, 4), List(4, 5), List(5, 6), List(6, 7), List(7, 8), List(8, 9), List(9, 10))
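Putting ideas 2 and 3 together, a minimal sketch might strip punctuation, split on whitespace, and use `grouped(2)`, which gives non-overlapping pairs, whereas `sliding(2)` from hint 4 overlaps. It works on a `String`; reading the file into one string first (for example via `scala.io.Source`, joining the lines with spaces) is assumed, and `pairsOfWords` is a hypothetical name:

```scala
object SplitWords {
  // Hypothetical helper: drop punctuation, split on runs of whitespace,
  // and join every n consecutive words (non-overlapping) into one string.
  def pairsOfWords(text: String, n: Int = 2): List[String] =
    text
      .replaceAll("""\p{Punct}""", "") // hint 3: remove punctuation
      .split("""\s+""")                 // words separated by spaces or newlines
      .filter(_.nonEmpty)               // leading whitespace would yield an empty token
      .grouped(n)                       // idea 2: non-overlapping groups of n
      .map(_.mkString(" "))
      .toList
}
```

With the sample text from the question, this yields exactly the eleven two-word strings listed there.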