Scala: Can I rely on the order of items in a Set? - scala

This was quite an unpleasant surprise:
scala> Set(1, 2, 3, 4, 5)
res18: scala.collection.immutable.Set[Int] = Set(4, 5, 1, 2, 3)
scala> Set(1, 2, 3, 4, 5).toList
res25: List[Int] = List(5, 1, 2, 3, 4)
The example by itself suggests a "no" answer to my question. Then what about ListSet?
scala> import scala.collection.immutable.ListSet
scala> ListSet(1, 2, 3, 4, 5)
res21: scala.collection.immutable.ListSet[Int] = Set(1, 2, 3, 4, 5)
This one seems to work, but should I rely on this behavior?
What other data structure is suitable for an immutable collection of unique items, where the original order must be preserved?
By the way, I do know about the distinct method in List. The problem is, I want to enforce uniqueness of items (while preserving the order) at the interface level, so using distinct would mess up my neat design.
EDIT
ListSet doesn't seem very reliable either:
scala> ListSet(1, 2, 3, 4, 5).toList
res28: List[Int] = List(5, 4, 3, 2, 1)
EDIT2
In my search for a perfect design I tried this:
scala> class MyList[A](list: List[A]) { val values = list.distinct }
scala> implicit def toMyList[A](l: List[A]) = new MyList(l)
scala> implicit def fromMyList[A](l: MyList[A]) = l.values
Which actually works:
scala> val l1: MyList[Int] = List(1, 2, 3)
scala> l1.values
res0: List[Int] = List(1, 2, 3)
scala> val l2: List[Int] = new MyList(List(1, 2, 3))
l2: List[Int] = List(1, 2, 3)
The problem, however, is that I do not want to expose MyList outside the library. Is there any way to have the implicit conversion when overriding? For example:
trait T { def l: MyList[_] }
object O extends T { val l: MyList[_] = List(1, 2, 3) }
scala> O.l mkString(" ") // Let's test the implicit conversion
res7: String = 1 2 3
I'd like to do it like this:
object O extends T { val l = List(1, 2, 3) } // Doesn't work

That depends on the Set you are using. If you do not know which Set implementation you have, then the answer is simply, no you cannot be sure. In practice I usually encounter the following three cases:
I need the items in the Set to be ordered. For this I use classes mixing in the SortedSet trait, which, when you use only the standard Scala API, is in practice a TreeSet. It guarantees the elements are ordered according to their compare method (see the Ordered trait) or, more generally, an implicit Ordering. You pay a (very) small performance penalty for the sorting, as inserts and retrievals are now logarithmic rather than (almost) constant as with a HashSet (assuming a good hash function).
I need to preserve the order in which the items are inserted. Then I use a LinkedHashSet (in Scala, scala.collection.mutable.LinkedHashSet). It is practically as fast as the normal HashSet but needs a little more storage space for the additional links between elements.
I do not care about order in the Set, so I use a HashSet. (That is the default when using the Set.apply method, as in your first example.)
All this applies to Java as well: Java has TreeSet, LinkedHashSet and HashSet, and the corresponding interfaces SortedSet, Comparable and plain Set.
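As a rough illustration of the three cases above, here is a minimal sketch. The object name and the sample data are made up for this example, and the LinkedHashSet used is the mutable one from the standard library:

```scala
import scala.collection.immutable.TreeSet
import scala.collection.mutable

object SetOrderDemo {
  // Hypothetical sample data with one duplicate (the trailing 1).
  val inserted: List[Int] = List(3, 1, 5, 2, 4, 1)

  // Case 1: a TreeSet (SortedSet) iterates in sorted order.
  val sorted: List[Int] = (TreeSet.empty[Int] ++ inserted).toList

  // Case 2: a LinkedHashSet iterates in insertion order.
  val insertionOrder: List[Int] =
    (mutable.LinkedHashSet.empty[Int] ++= inserted).toList

  // Case 3: a plain Set holds the same elements, in no order you should depend on.
  val unordered: Set[Int] = inserted.toSet
}
```

Only the first two results are compared as lists; the plain Set is best compared as a set, since its iteration order is an implementation detail.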

It is my belief that you should never rely on the order in a set. In no language.
Apart from that, have a look at this question which talks about this in depth.

ListSet returns elements in the reverse order of insertion because it is backed by a List, and the optimal way of adding elements to a List is by prepending them. (This describes ListSet up to Scala 2.11; from 2.12 on it iterates in insertion order.)
Immutable data structures are problematic if you want first in, first out (a queue). You can get O(log n) or amortized O(1). Given the apparent need to build the set and then produce an iterator out of it (i.e., you first put in all the elements, then you remove all of them), I don't see any way to amortize it.
On those versions you can rely on a ListSet always returning elements in last in, first out order (a stack). If that suffices, then go for it.
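If neither set type fits, one possible way to enforce "unique items in original order" at the interface level is a small wrapper with a private constructor. This is only a sketch: UniqueSeq is a hypothetical name, not a standard library class, and it uses a linear contains check, so it suits small collections.

```scala
// Hypothetical wrapper, not part of the standard library.
final class UniqueSeq[A] private (val values: Vector[A]) {
  // Appending an element that is already present returns the same instance.
  def :+(a: A): UniqueSeq[A] =
    if (values.contains(a)) this else new UniqueSeq(values :+ a)

  def toList: List[A] = values.toList
}

object UniqueSeq {
  // distinct keeps the first occurrence of each element, in order.
  def apply[A](as: A*): UniqueSeq[A] = new UniqueSeq(as.toVector.distinct)
}
```

Because the constructor is private, every instance goes through distinct or the guarded append, so callers cannot build a value that contains duplicates.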

Related

Does scala collection's flatten keep order?

I have looked at the documentation of flatten, and the example seems to indicate that the order of the elements in the result maintains the order of the input. Is there documentation or source code we can reference to make sure that this is the case? Or is the documentation "Converts this collection of traversable collections into a collection formed by the elements of these traversable collections" enough to confirm this?
Update: My original question was not clear enough. I wanted to ask about the collections that maintain order internally (like List) and we use the default implicit traversable in flatten(). prayagupd has answered this question.
If you read the flatten function of scala 2.12.x, you can see it sequentially adding given inputs to a new collection.
// a sequential view of the collection
private def sequential: TraversableOnce[A] = this.asInstanceOf[GenTraversableOnce[A]].seq
def flatten[B](implicit asTraversable: A => /*<:<!!!*/ GenTraversableOnce[B]): CC[B] = {
  val b = genericBuilder[B]
  for (xs <- sequential)
    b ++= asTraversable(xs).seq
  b.result()
}
You can verify this with an example as well:
scala> List(List("order1", "order2"), List("order10", "order11")).flatten
res1: List[String] = List(order1, order2, order10, order11)
The order remains the same even if you provide your own traversable:
scala> val asTraversable: List[String] => List[String] = list => list.map(elem => s"mutated $elem")
asTraversable: List[String] => List[String] = $$Lambda$1271/1988351538@513bec8c
scala> List(List("order1", "order2"), List("order10", "order11")).flatten(asTraversable)
res2: List[String] = List(mutated order1, mutated order2, mutated order10, mutated order11)
NOTE: the above only applies to underlying data structures that maintain order.
For example, Set does not maintain order:
scala> Set(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).seq
res3: scala.collection.immutable.Set[Int] = Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)
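The builder loop quoted from the library can be mimicked for plain lists as follows. This is a simplified sketch rather than the library's actual code, and flattenInOrder is a hypothetical helper name:

```scala
object FlattenSketch {
  // Appends each inner list to a builder in iteration order,
  // mirroring the "for (xs <- sequential) b ++= ..." loop.
  def flattenInOrder[A](xss: List[List[A]]): List[A] = {
    val b = List.newBuilder[A]
    for (xs <- xss) b ++= xs
    b.result()
  }
}
```

For an ordered input such as a List of Lists, this produces the same sequence as calling flatten directly.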

Cannot construct a collection of type ...Inclusive[Long] with elements of type Long based on a collection of type ...Inclusive[Long]

I'm not sure I understand why the following happens.
Compiles and works:
With Ints without converting to a List
import scala.util.Random
val xs = 1 to 10
Random.shuffle(xs)
With Longs after converting to a List
import scala.util.Random
val xs = 1L to 10L
Random.shuffle(xs.toList) //<-- I had to materialize it to a list
Doesn't compile
With Longs without converting to a List
val xs = 1L to 10L
Random.shuffle(xs)
This one fails with the following compile error:
Error: Cannot construct a collection of type
scala.collection.immutable.NumericRange.Inclusive[Long] with elements of type
Long based on a collection of type
scala.collection.immutable.NumericRange.Inclusive[Long].
Random.shuffle(xs)
^
I'm curious why? Is that because there is a missing CanBuildFrom or something like that? Is there a good reason why there isn't one?
(scala version 2.11.5)
That's because of both CanBuildFrom (1) and the type inference mechanism (2).
1) You may find that the genericBuilder of Range/NumericRange (the same goes for Inclusive) is:
genericBuilder[B]: Builder[B, IndexedSeq[B]]
So there is only a CanBuildFrom[Range, B, IndexedSeq[B]], which uses this builder. The reason is simple, and you can find it in the builder's description:
A builder lets one construct a collection incrementally, by adding elements to the builder with += and then converting to the required collection type with result.
You just can't construct an inclusive range incrementally, as it wouldn't be a range anymore (though it would still be an IndexedSeq); however, you can do such constructions with Seq.
Just to demonstrate the difference between IndexedSeq and Inclusive:
scala> (1 to 5)
res14: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5)
scala> (1 to 5) ++ (7 to 10) //builder used here
res15: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 3, 4, 5, 7, 8, 9, 10)
This means that you can't "build" any range, whether Int (Range) or Long (NumericRange), and you should always pass IndexedSeq as the To parameter of the builder. However, IndexedSeq is automatically inferred for Int (Range) when you pass it to the shuffle function.
2) It doesn't work for NumericRange.Inclusive[T] because it is a polymorphic (generic) type, while the regular Range.Inclusive (not generic) explicitly extends IndexedSeq[Int]. Looking at the shuffle signature:
shuffle[T, CC[X] <: TraversableOnce[X]](xs: CC[T])(implicit bf: CanBuildFrom[CC[T], T, CC[T]]): CC[T]
The higher-kinded type CC becomes NumericRange.Inclusive here, as it is the most specific parameterized type inherited by NumericRange.Inclusive. In the case of Range.Inclusive, that was IndexedSeq (as the more specific Range.Inclusive is not generic). So Range.Inclusive just got lucky not to be affected by (1).
Finally, this will work:
scala> Random.shuffle[Long, IndexedSeq](xs)
res8: IndexedSeq[Long] = Vector(9, 3, 8, 6, 7, 2, 5, 4, 10, 1)
scala> Random.shuffle(xs: IndexedSeq[Long])
res11: IndexedSeq[Long] = Vector(6, 9, 7, 3, 1, 8, 5, 10, 4, 2)
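Another way to sidestep the range-specific builder is to materialize the range before shuffling, much as the question did with toList. A small sketch, where the object name and the seed parameter are assumptions added so the result is reproducible:

```scala
import scala.util.Random

object ShuffleLongs {
  // A NumericRange.Inclusive[Long], as in the question.
  val xs = 1L to 10L

  // Converting to a Vector first means the result type is Vector[Long],
  // a collection that can be rebuilt element by element.
  def shuffled(seed: Long): Vector[Long] =
    new Random(seed).shuffle(xs.toVector)
}
```

The result is a permutation of the original range, so sorting it gives back 1 to 10.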

Copy constructor of BitSet (or other collections) in Scala

I need to create a new instance of BitSet class from another BitSet object (input).
I expected something like new BitSet(input), but found nothing like it. I could get the new instance with the map() method as follows, but I don't think this is the best solution.
var r = input.map(_ + 0)(BitSet.canBuildFrom)
What's the copy constructor of BitSet? What's the general rule for copy constructor in Scala?
You can create another with the bitmask of the first:
var r = new BitSet(input.toBitMask)
I think, the general rule is to use immutable collections. They are, well, immutable, so you can pass them around freely without taking special care for copying them.
When you need mutable collections, however, copying collections becomes useful. I discovered that using the standard to method works:
scala> mutable.Set(1, 2, 3)
res0: scala.collection.mutable.Set[Int] = Set(1, 2, 3)
scala> res0.to[mutable.Set]
res1: scala.collection.mutable.Set[Int] = Set(1, 2, 3)
scala> res0 eq res1
res2: Boolean = false
However, it won't work with BitSet because it is not a generic collection, and to needs a type constructor as its generic parameter. For BitSet you can use the method suggested by Lee. BTW, it is intended exactly for scala.collection.mutable.BitSet, because scala.collection.immutable.BitSet does not contain such a constructor (nor does it need one).
The "copy" method on collections is called clone (to be consistent with Java style).
scala> collection.mutable.BitSet(1,2,3)
res0: scala.collection.mutable.BitSet = BitSet(1, 2, 3)
scala> res0.clone
res1: scala.collection.mutable.BitSet = BitSet(1, 2, 3)
scala> res0 += 4
res2: res0.type = BitSet(1, 2, 3, 4)
scala> res1
res40: scala.collection.mutable.BitSet = BitSet(1, 2, 3)
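Both copying approaches from the answers can be checked side by side. In this sketch the object name is made up, and the bitmask copy goes through the companion's fromBitMask factory instead of the constructor:

```scala
import scala.collection.mutable

object BitSetCopyDemo {
  val original: mutable.BitSet = mutable.BitSet(1, 2, 3)

  // Copy via clone.
  val viaClone: mutable.BitSet = original.clone()

  // Copy via the bitmask; fromBitMask copies the array it is given.
  val viaMask: mutable.BitSet = mutable.BitSet.fromBitMask(original.toBitMask)

  // Mutating the original afterwards must not affect either copy.
  original += 4
}
```

After the mutation, the original contains 4 while both copies still hold only 1, 2 and 3.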

Repeating a List in Scala

I am a Scala noob. I have decided to write a spider solitaire solver as a first exercise to learn the language and functional programming in general.
I would like to generate a randomly shuffled deck of cards containing 1, 2, or 4 suits. Here is what I came up with:
val numberOfSuits = 1
(List("clubs", "diamonds", "hearts", "spades").take(numberOfSuits) * 4).take(4)
which should return
List("clubs", "clubs", "clubs", "clubs")
List("clubs", "diamonds", "clubs", "diamonds")
List("clubs", "diamonds", "hearts", "spades")
depending on the value of numberOfSuits, except there is no List "multiply" operation that I can find. Did I miss it? Is there a better way to generate the complete deck before shuffling?
BTW, I plan on using an Enumeration for the suits, but it was easier to type my question with strings. I will take the List generated above and using a for comprehension, iterate over the suits and a similar List of card "ranks" to generate a complete deck.
Flatten a finite list of lists:
scala> List.fill(2)(List(1, 2, 3, 4)).flatten
res18: List[Int] = List(1, 2, 3, 4, 1, 2, 3, 4)
Flatten an infinite Stream of lists, take the first N elements:
scala> Stream.continually(List(1, 2, 3, 4)).flatten.take(8).toList
res19: List[Int] = List(1, 2, 3, 4, 1, 2, 3, 4)
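Applied to the suits from the question, the fill-and-flatten approach might look like this. The object and method names are hypothetical, chosen only for the sketch:

```scala
object DeckSketch {
  val suits: List[String] = List("clubs", "diamonds", "hearts", "spades")

  // Repeat the first numberOfSuits suits four times, then keep four entries.
  def suitsFor(numberOfSuits: Int): List[String] =
    List.fill(4)(suits.take(numberOfSuits)).flatten.take(4)
}
```

With numberOfSuits set to 1, 2 or 4 this yields the three lists the question asked for.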
You should look up the scaladoc for the List object. It has all manner of interesting methods for creating lists. For instance, the following does exactly what you were trying to do:
List.flatten(List.make(4, List("clubs", "diamonds", "hearts", "spades").take(numberOfSuits))).take(4)
A much nicer code, however, would be this (Scala 2.7):
val suits = List("clubs", "diamonds", "hearts", "spades")
List.tabulate(4, i => suits.apply(i % numberOfSuits))
On Scala 2.8 tabulate is curried, so the correct syntax would be:
List.tabulate(4)(i => suits.apply(i % numberOfSuits))
You can expand a numeric sequence and flatMap instead of multiplying.
scala> (1 to 3).flatMap(_=>List(1,2,3,4).take(2)).take(4)
res1: Seq[Int] = List(1, 2, 1, 2)
This works in 2.7.x also.
Edit: since you're less experienced with Scala, you may not yet have come across the enrich-my-library pattern. If you want to multiply your lists a lot, you can add a custom conversion class:
class MultipliableList[T](l: List[T]) {
def *(n: Int) = (1 to n).flatMap(_=>l).toList
}
implicit def list2multipliable[T](l: List[T]) = new MultipliableList[T](l)
and now you can
scala> List(1,2,3)*4
res2: List[Int] = List(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)
(Generally, to reuse such implicits, declare them in an object and then import MyObject._ to get the implicit conversion and corresponding class in scope.)
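On Scala 2.10 and later, the same enrichment can be written more compactly as an implicit class. A minimal sketch, where ListSyntax and RepeatOps are hypothetical names:

```scala
object ListSyntax {
  // Adds a * operator to any List: the list repeated n times.
  implicit class RepeatOps[T](l: List[T]) {
    def *(n: Int): List[T] = List.fill(n)(l).flatten
  }
}
```

After import ListSyntax._, an expression such as List(1, 2, 3) * 2 evaluates to List(1, 2, 3, 1, 2, 3), and multiplying by 0 gives the empty list.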
If you use the cats library, you can make use of Semigroup's combineN method. It repeats a list N times.
import cats.implicits._
scala> List("clubs", "diamonds", "hearts", "spades").combineN(2)
res1: List[String] = List(clubs, diamonds, hearts, spades, clubs, diamonds, hearts, spades)

Are there any methods included in Scala to convert tuples to lists?

I have a Tuple2 of List[List[String]] and I'd like to be able to convert the tuple to a list so that I can then use List.transpose(). Is there any way to do this? Also, I know it's a Pair, though I'm always a fan of generic solutions.
Works with any tuple (scala 2.8):
myTuple.productIterator.toList
Scala 2.7:
(0 to (myTuple.productArity-1)).map(myTuple.productElement(_)).toList
Not sure how to maintain type info for a general Product or Tuple, but for Tuple2:
def tuple2ToList[T](t: (T,T)): List[T] = List(t._1, t._2)
You could, of course, define similar type-safe conversions for all the Tuples (up to 22).
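For the use case in the question (a pair of lists that should be transposed), the type-safe conversions could be sketched like this. The object name and tuple3ToList are additions for illustration:

```scala
object TupleToList {
  // Type-safe conversions: every component must share the type T.
  def tuple2ToList[T](t: (T, T)): List[T] = List(t._1, t._2)
  def tuple3ToList[T](t: (T, T, T)): List[T] = List(t._1, t._2, t._3)

  // Hypothetical input: a pair of lists, as described in the question.
  val pair: (List[String], List[String]) = (List("a", "b"), List("c", "d"))

  // Convert to a List[List[String]], then transpose rows and columns.
  val transposed: List[List[String]] = tuple2ToList(pair).transpose
}
```

Unlike productIterator.toList, which yields a List[Any], these helpers keep the element type, so transpose is available on the result without a cast.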
Using Shapeless -
@ import syntax.std.tuple._
import syntax.std.tuple._
@ (1,2,3).toList
res21: List[Int] = List(1, 2, 3)
@ (1,2,3,4,3,3,3,3,3,3,3).toList
res22: List[Int] = List(1, 2, 3, 4, 3, 3, 3, 3, 3, 3, 3)
Note that type information is not lost using Shapeless's toList.