Does scala collection's flatten keep order? - scala

I have looked at the documentation of flatten and the example seems to indicate the order of the elements in the result maintains the order of the input. Is there a documentation or source code we can reference to make sure that it is the case? Or, is the documentation "Converts this collection of traversable collections into a collection formed by the elements of these traversable collections" enough to confirm this?
Update: My original question was not clear enough. I wanted to ask about the collections that maintain order internally (like List) and we use the default implicit traversable in flatten(). prayagupd has answered this question.

If you read the flatten function of scala 2.12.x, you can see it sequentially adding given inputs to a new collection.
//a sequential view of the collection
private def sequential: TraversableOnce[A] = this.asInstanceOf[GenTraversableOnce[A]].seq
def flatten[B](implicit asTraversable: A => /*<:<!!!*/ GenTraversableOnce[B]): CC[B] = {
val b = genericBuilder[B]
for (xs <- sequential)
b ++= asTraversable(xs).seq
b.result()
}
you can verify with example as well,
scala> List(List("order1", "order2"), List("order10", "order11")).flatten
res1: List[String] = List(order1, order2, order10, order11)
The order remains same even if you provide your own traversable,
scala> val asTraversable: List[String] => List[String] = list => list.map(elem => s"mutated $elem")
asTraversable: List[String] => List[String] = $$Lambda$1271/1988351538#513bec8c
scala> List(List("order1", "order2"), List("order10", "order11")).flatten(asTraversable)
res2: List[String] = List(mutated order1, mutated order2, mutated order10, mutated order11)
NOTE: above only applies to underlying data-structure that maintain the order.
For example Set does not maintain order
scala> Set(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).seq
res3: scala.collection.immutable.Set[Int] = Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)

Related

flatten and flatMap in scala

I'll like to check if I have understood flatten and flatMap functions correctly.
1) Am I correct that flatten works only when a collection constitutes of other collections. Eg flatten would work on following lists
//list of lists
val l1 = List(List(1,1,2,-1,3,1,-4,5), List("a","b"))
//list of a set, list and map
val l2 = List(Set(1,2,3), List(4,5,6), Map('a'->"x",'b'->"y"))
But flatten will not work on following
val l3 = List(1,2,3)
val l4 = List(1,2,3,List('a','b'))
val s1 = "hello world"
val m1 = Map('h'->"h", 'e'->"e", 'l'->"l",'o'->"0")
'flatten' method would create a new list consisting of all elements by removing the hierarchy. Thus it sort of 'flattens' the collection and brings all elements at the same level.
l1.flatten
res0: List[Any] = List(1, 1, 2, -1, 3, 1, -4, 5, a, b)
l2.flatten
res1: List[Any] = List(1, 2, 3, 1, 5, 6, (a,x), (b,y))
2) 'flatMap' first applies a method to elements of a list and then flattens the list. As we noticed above, the flatten method works if lists have a hierarchy (contain other collections). Thus it is important that the method we apply to the elements returns a collection otherwise flatMap will not work
//we have list of lists
val l1 = List(List(1,1,2,-1,3,1,-4,5), List("a","b"))
l1 flatMap(x=>x.toSet)
res2: List[Any] = List(5, 1, -4, 2, 3, -1, a, b)
val l2 = List(Set(1,2,3), List(1,5,6), Map('a'->"x",'b'->"y"))
l2.flatMap(x=>x.toSet)
res3: List[Any] = List(1, 2, 3, 1, 5, 6, (a,x), (b,y))
val s1 = "hello world"
s1.flatMap(x=>Map(x->x.toString))
We notice above that s1.flatten didn't work but s1.flatMap did. This is because, in s1.flatMap, we convert elements of a String (characters) into a Map which is a collection. Thus the string got converted into a collection of Maps like (Map('h'->"h"), Map('e'->"e"), Map('l'->"l"),Map ('l'->"l"),Map('o'->"o")....) Thus flatten will work now. Note that the Map created is not Map('h'->"h", 'e'->"e", 'l'->"l",....).
Take a look at the full signature for flatten:
def flatten[B](implicit asTraversable: (A) ⇒ GenTraversableOnce[B]): List[B]
As you can see, flatten takes an implicit parameter. That parameter provides the rules for how to flatten the given collection types. If the compiler can't find an implicit in scope then it can be provided explicitly.
flatten can flatten almost anything as long as you provide the rules to do so.
Flatmap is basically a map operation followed by a flatten

Scala: flatMap with tuples

Why is the below statement valid for .map() but not for .flatMap()?
val tupled = input.map(x => (x*2, x*3))
//Compilation error: cannot resolve reference flatMap with such signature
val tupled = input.flatMap(x => (x*2, x*3))
This statement has no problem, though:
val tupled = input.flatMap(x => List(x*2, x*3))
Assuming input if of type List[Int], map takes a function from Int to A, whereas flatMap takes a function from Int to List[A].
Depending on your use case you can choose either one or the other, but they're definitely not interchangeable.
For instance, if you are merely transforming the elements of a List you typically want to use map:
List(1, 2, 3).map(x => x * 2) // List(2, 4, 6)
but you want to change the structure of the List and - for example - "explode" each element into another list then flattening them, flatMap is your friend:
List(1, 2, 3).flatMap(x => List.fill(x)(x)) // List(1, 2, 2, 3, 3, 3)
Using map you would have had List(List(1), List(2, 2), List(3, 3, 3)) instead.
To understand how this is working, it can be useful to explicitly unpack the functions you're sending to map and flatMap and examine their signatures. I've rewritten them here, so you can see that f is a function mapping from Int to an (Int, Int) tuple, and g is a function that maps from Int to a List[Int].
val f: (Int) => (Int, Int) = x => (x*2, x*3)
val g: (Int) => List[Int] = x => List(x*2, x*3)
List(1,2,3).map(f)
//res0: List[(Int, Int)] = List((2,3), (4,6), (6,9))
List(1,2,3).map(g)
//res1: List[List[Int]] = List(List(2, 3), List(4, 6), List(6, 9))
//List(1,2,3).flatMap(f) // This won't compile
List(1,2,3).flatMap(g)
//res2: List[Int] = List(2, 3, 4, 6, 6, 9)
So why won't flatMap(f) compile? Let's look at the signature for flatMap, in this case pulled from the List implementation:
final override def flatMap[B, That](f : scala.Function1[A, scala.collection.GenTraversableOnce[B]])(...)
This is a little difficult to unpack, and I've elided some of it, but the key is the GenTraversableOnce type. List, if you follow it's chain of inheritance up, has this as a trait it is built with, and thus a function that maps from some type A to a List (or any object with the GenTraversableOnce trait) will be a valid function. Notably, tuples do not have this trait.
That is the in-the-weeds explanation why the typing is wrong, and is worth explaining because any error that says 'cannot resolve reference with such a signature' means that it can't find a function that takes the explicit type you're offering. Types are very often inferred in Scala, and so you're well-served to make sure that the type you're giving is the type expected by the method you're calling.
Note that flatMap has a standard meaning in functional programming, which is, roughly speaking, any mapping function that consumes a single element and produces n elements, but your final result is the concatenation of all of those lists. Therefore, the function you pass to flatMap will always expect to produce a list, and no flatMap function would be expected to know how to act on single elements.

Zip two Arrays, always 3 elements of the first array then 2 elements of the second

I've manually built a method that takes 2 arrays and combines them to 1 like this:
a0,a1,a2,b0,b1,a3,a4,a5,b2,b3,a6,...
So I always take 3 elements of the first array, then 2 of the second one.
As I said, I built that function manually.
Now I guess I could make this a one-liner instead with the help of zip. The problem is, that zip alone is not enough as zip builds tuples like (a0, b0).
Of course I can flatMap this, but still not what I want:
val zippedArray: List[Float] = data1.zip(data2).toList.flatMap(t => List(t._1, t._2))
That way I'd get a List(a0, b0, a1, b1,...), still not what I want.
(I'd then use toArray for the list... it's more convenient to work with a List right now)
I thought about using take and drop but they return new data-structures instead of modifying the old one, so not really what I want.
As you can imagine, I'm not really into functional programming (yet). I do use it and I see huge benefits, but some things are so different to what I'm used to.
Consider grouping array a by 3, and array b by 2, namely
val a = Array(1,2,3,4,5,6)
val b = Array(11,22,33,44)
val g = (a.grouped(3) zip b.grouped(2)).toArray
Array((Array(1, 2, 3),Array(11, 22)), (Array(4, 5, 6),Array(33, 44)))
Then
g.flatMap { case (x,y) => x ++ y }
Array(1, 2, 3, 11, 22, 4, 5, 6, 33, 44)
Very similar answer to #elm but I wanted to show that you can use more lazy approach (iterator) to avoid creating temp structures:
scala> val a = List(1,2,3,4,5,6)
a: List[Int] = List(1, 2, 3, 4, 5, 6)
scala> val b = List(11,22,33,44)
b: List[Int] = List(11, 22, 33, 44)
scala> val groupped = a.sliding(3, 3) zip b.sliding(2, 2)
groupped: Iterator[(List[Int], List[Int])] = non-empty iterator
scala> val result = groupped.flatMap { case (a, b) => a ::: b }
result: Iterator[Int] = non-empty iterator
scala> result.toList
res0: List[Int] = List(1, 2, 3, 11, 22, 4, 5, 6, 33, 44)
Note that it stays an iterator all the way until we materialize it with toList

Scala: Yielding from one type of collection to another

Concerning the yield command in Scala and the following example:
val values = Set(1, 2, 3)
val results = for {v <- values} yield (v * 2)
Can anyone explain how Scala knows which type of collection to yield into? I know it is based on values, but how would I go about writing code that replicates yield?
Is there any way for me to change the type of the collection to yield into? In the example I want results to be of type List instead of Set.
Failing this, what is the best way to convert from one collection to another? I know about _:*, but as a Set is not a Seq this does not work. The best I could find thus far is val listResults = List() ++ results.
Ps. I know the example does not following the recommended functional way (which would be to use map), but it is just an example.
The for comprehensions are translated by compiler to map/flatMap/filter calls using this scheme.
This excellent answer by Daniel answers your first question.
To change the type of result collection, you can use collection.breakout (also explained in the post I linked above.)
scala> val xs = Set(1, 2, 3)
xs: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
scala> val ys: List[Int] = (for(x <- xs) yield 2 * x)(collection.breakOut)
ys: List[Int] = List(2, 4, 6)
You can convert a Set to a List using one of following ways:
scala> List.empty[Int] ++ xs
res0: List[Int] = List(1, 2, 3)
scala> xs.toList
res1: List[Int] = List(1, 2, 3)
Recommended read: The Architecture of Scala Collections
If you use map/flatmap/filter instead of for comprehensions, you can use scala.collection.breakOut to create a different type of collection:
scala> val result:List[Int] = values.map(2*)(scala.collection.breakOut)
result: List[Int] = List(2, 4, 6)
If you wanted to build your own collection classes (which is the closest thing to "replicating yield" that makes any sense to me), you should have a look at this tutorial.
Try this:
val values = Set(1, 2, 3)
val results = for {v <- values} yield (v * 2).toList

Scala: Can I rely on the order of items in a Set?

This was quite an unplesant surprise:
scala> Set(1, 2, 3, 4, 5)
res18: scala.collection.immutable.Set[Int] = Set(4, 5, 1, 2, 3)
scala> Set(1, 2, 3, 4, 5).toList
res25: List[Int] = List(5, 1, 2, 3, 4)
The example by itself suggest a "no" answer to my question. Then what about ListSet?
scala> import scala.collection.immutable.ListSet
scala> ListSet(1, 2, 3, 4, 5)
res21: scala.collection.immutable.ListSet[Int] = Set(1, 2, 3, 4, 5)
This one seems to work, but should I rely on this behavior?
What other data structure is suitable for an immutable collection of unique items, where the original order must be preserved?
By the way, I do know about distict method in List. The problem is, I want to enforce uniqueness of items (while preserving the order) at interface level, so using distinct would mess up my neat design..
EDIT
ListSet doesn't seem very reliable either:
scala> ListSet(1, 2, 3, 4, 5).toList
res28: List[Int] = List(5, 4, 3, 2, 1)
EDIT2
In my search for a perfect design I tried this:
scala> class MyList[A](list: List[A]) { val values = list.distinct }
scala> implicit def toMyList[A](l: List[A]) = new MyList(l)
scala> implicit def fromMyList[A](l: MyList[A]) = l.values
Which actually works:
scala> val l1: MyList[Int] = List(1, 2, 3)
scala> l1.values
res0: List[Int] = List(1, 2, 3)
scala> val l2: List[Int] = new MyList(List(1, 2, 3))
l2: List[Int] = List(1, 2, 3)
The problem, however, is that I do not want to expose MyList outside the library. Is there any way to have the implicit conversion when overriding? For example:
trait T { def l: MyList[_] }
object O extends T { val l: MyList[_] = List(1, 2, 3) }
scala> O.l mkString(" ") // Let's test the implicit conversion
res7: String = 1 2 3
I'd like to do it like this:
object O extends T { val l = List(1, 2, 3) } // Doesn't work
That depends on the Set you are using. If you do not know which Set implementation you have, then the answer is simply, no you cannot be sure. In practice I usually encounter the following three cases:
I need the items in the Set to be ordered. For this I use classes mixing in the SortedSet trait which when you use only the Standard Scala API is always a TreeSet. It guarantees the elements are ordered according to their compareTo method (see the Ordered trat). You get a (very) small performance penalty for the sorting as the runtime of inserts/retrievals is now logarithmic, not (almost) constant like with the HashSet (assuming a good hash function).
You need to preserve the order in which the items are inserted. Then you use the LinkedHashSet. Practically as fast as the normal HashSet, needs a little more storage space for the additional links between elements.
You do not care about order in the Set. So you use a HashSet. (That is the default when using the Set.apply method like in your first example)
All this applies to Java as well, Java has a TreeSet, LinkedHashSet and HashSet and the corresponding interfaces SortedSet, Comparable and plain Set.
It is my belief that you should never rely on the order in a set. In no language.
Apart from that, have a look at this question which talks about this in depth.
ListSet will always return elements in the reverse order of insertion because it is backed by a List, and the optimal way of adding elements to a List is by prepending them.
Immutable data structures are problematic if you want first in, first out (a queue). You can get O(logn) or amortized O(1). Given the apparent need to build the set and then produce an iterator out of it (ie, you'll first put all elements, then you'll remove all elements), I don't see any way to amortize it.
You can rely that a ListSet will always return elements in last in, first out order (a stack). If that suffices, then go for it.