Want to know how scala arranges the data in set.
scala> val imm = Set(1, 2, 3, "four") //immutable variable
imm : scala.collection.immutable.Set[Any] = Set(1, 2, 3, four)
scala> var mu = Set(1, 2, 3, "four", 9)
mu: scala.collection.immutable.Set[Any] = Set(four, 1, 9, 2, 3)
scala> mu += "five"
scala> mu.toString
res1: String = Set(four, 1, 9, 2, 3, five)
Order remains as we insert in case of immutable Set but not in mutable Set.
Also no matter how many times I create a new set with var xyz = Set(1, 2, 3, "four", 9) the order of getting stored remains same Set(four, 1, 9, 2, 3). So it's not storing in random, there is some logic behind it which I want to know. Moreover, is there any advantage of such behavior?
Sets don't have an order. An item is in the set or it isn't. That's all you can know about a set.
If you require a certain ordering, you need to use an ordered set or possibly a sorted set.
Of course, any particular implementation of a set may or may not incidentally have an ordering which may or may not be stable across multiple calls, but that would be a purely accidental implementation detail that may change at any time, during an update of Scala or even between two calls.
Related
Given
val xs1 = Set(3, 2, 1, 4, 5, 6, 7)
val ys1 = Set(7, 2, 1, 4, 5, 6, 3)
xs1 and ys1 both result in scala.collection.immutable.Set[Int] = Set(5, 1, 6, 2, 7, 3, 4)
but smaller sets bellow
val xt1 = Set(1, 2, 3)
val yt1 = Set(3, 2, 1)
produce
xt1: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
yt1: scala.collection.immutable.Set[Int] = Set(3, 2, 1)
Why are former not ordered whilst latter seem to be ordered?
Difference in behaviour is due to optimisations for Sets of up to 4 elements
The default implementation of an immutable set uses a representation
that adapts to the number of elements of the set. An empty set is
represented by just a singleton object. Sets of sizes up to four are
represented by a single object that stores all elements as fields.
Beyond that size, immutable sets are implemented as Compressed
Hash-Array Mapped Prefix-tree.
similarly explained by Ben James:
Set is also a companion object* with an apply** method. When you call
Set(...), you're calling this factory method and getting a return
value which is some kind of Set. It might be a HashSet, but could be
some other implementation. According to 2, the default implementation
for an immutable set has special representation for empty set and sets
size up to 4. Immutable sets size 5 and above and mutable sets all use
hashSet.
Since size of Set(3, 2, 1, 4, 5, 6, 7) is greater than 4, then its concrete implementation is HashSet
Set(3, 2, 1, 4, 5, 6, 7).getClass
class scala.collection.immutable.HashSet
which does not guarantee insertion order. On the other hand, concrete implementation of Set(1, 2, 3) is dedicated class Set3
Set(1,2,3).getClass
class scala.collection.immutable.Set$Set3
which stores the three elements in corresponding three fields
final class Set3[A] private[collection] (elem1: A, elem2: A, elem3: A) extends AbstractSet[A] ...
I have tried below array example in my scala console.Declared immutable array, tried to change the array index values.Immutable should not allow modifying the values. I don't understand whats happening .can anyone explain, please.
val numbers=Array(1,2,3)
numbers(0)=5
print numbers
res1: Array[Int] = Array(5, 2, 3)
Thanks!
Declaring something as a val does not make it immutable. It merely prevents reassignment. For example, the following code would not compile:
val numbers = Array(1, 2, 3)
numbers = Array(5, 2, 3)
Notice that what changes in the above code is not something about the array object's internal state: what changes is the array that the name numbers references. At the first line, the name numbers refers to the array Array(1, 2, 3), but in the second line we try reassign the name numbers to the array Array(5, 2, 3) (which the compiler won't allow, since the name numbers is declared using val).
In contrast, the below code is allowed:
val numbers = Array(1, 2, 3)
numbers(0) = 5
In this code, the name numbers still points to the same array, but it's the internal state of the array that has changed. Using the val keyword for the name of an object cannot prevent that object's internal state from changing, the val keyword can only prevent the name from being reassigned to some other object.
You did not declare an immutable array. Array in Scala is mutable.
Where you might be confused is what exactly val vs var means. val does not make an object immutable, it makes the reference immutable so you can't reassign a different array to your variable, but you can still modify the contents since it is a mutable object.
If you want immutability you need to use val together with an immutable object like Vector or List.
Maybe your understanding of scala Array is not right.
Please pay attention to the following principle!
Arrays preserve order, can contain duplicates, and are mutable.
scala> val numbers = Array(1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
numbers: Array[Int] = Array(1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
scala> numbers(3) = 10
Lists preserve order, can contain duplicates, and are immutable.
scala> val numbers = List(1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
numbers: List[Int] = List(1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
scala> numbers(3) = 10
<console>:9: error: value update is not a member of List[Int]
numbers(3) = 10
I have a method that looks like this:
def compute[T](l: List[T]): List[T] = {
val shuffled = util.Random.shuffle(l)
// do some more computations
}
I wanted to seed the random number generator for my unit tests so that I don't have to break down my method into two methods and test only the computation, since this is the method that will be used externally. Is this possible to do in ScalaTest?
I don't have much background in ScalaTest, but if you call setSeed(seed: Long): Unit then you'll always get the same shuffle for any given seed value.
scala> util.Random.shuffle(Seq(1,2,3,4,5,6,7,8,9,0))
res0: Seq[Int] = List(6, 0, 8, 5, 4, 7, 2, 3, 1, 9)
scala> util.Random.setSeed(57L)
scala> util.Random.shuffle(Seq(1,2,3,4,5,6,7,8,9,0))
res1: Seq[Int] = List(5, 3, 2, 0, 6, 8, 7, 4, 1, 9)
scala> util.Random.setSeed(57L)
scala> util.Random.shuffle(Seq(1,2,3,4,5,6,7,8,9,0))
res2: Seq[Int] = List(5, 3, 2, 0, 6, 8, 7, 4, 1, 9)
how would you test the method that depends on current time? or database state? or response from google? or sleeping for some time?
generally when you test any code that depends on some external state (time, other system, or in your case: entropy / seed), you refactor that code and extract the dependency. one way, as #jwvh said is to extract seed. but imho, you should extract the whole transformation
therefore you should create a method that receives the shuffled array and test that method
I've manually built a method that takes 2 arrays and combines them to 1 like this:
a0,a1,a2,b0,b1,a3,a4,a5,b2,b3,a6,...
So I always take 3 elements of the first array, then 2 of the second one.
As I said, I built that function manually.
Now I guess I could make this a one-liner instead with the help of zip. The problem is, that zip alone is not enough as zip builds tuples like (a0, b0).
Of course I can flatMap this, but still not what I want:
val zippedArray: List[Float] = data1.zip(data2).toList.flatMap(t => List(t._1, t._2))
That way I'd get a List(a0, b0, a1, b1,...), still not what I want.
(I'd then use toArray for the list... it's more convenient to work with a List right now)
I thought about using take and drop but they return new data-structures instead of modifying the old one, so not really what I want.
As you can imagine, I'm not really into functional programming (yet). I do use it and I see huge benefits, but some things are so different to what I'm used to.
Consider grouping array a by 3, and array b by 2, namely
val a = Array(1,2,3,4,5,6)
val b = Array(11,22,33,44)
val g = (a.grouped(3) zip b.grouped(2)).toArray
Array((Array(1, 2, 3),Array(11, 22)), (Array(4, 5, 6),Array(33, 44)))
Then
g.flatMap { case (x,y) => x ++ y }
Array(1, 2, 3, 11, 22, 4, 5, 6, 33, 44)
Very similar answer to #elm but I wanted to show that you can use more lazy approach (iterator) to avoid creating temp structures:
scala> val a = List(1,2,3,4,5,6)
a: List[Int] = List(1, 2, 3, 4, 5, 6)
scala> val b = List(11,22,33,44)
b: List[Int] = List(11, 22, 33, 44)
scala> val groupped = a.sliding(3, 3) zip b.sliding(2, 2)
groupped: Iterator[(List[Int], List[Int])] = non-empty iterator
scala> val result = groupped.flatMap { case (a, b) => a ::: b }
result: Iterator[Int] = non-empty iterator
scala> result.toList
res0: List[Int] = List(1, 2, 3, 11, 22, 4, 5, 6, 33, 44)
Note that it stays an iterator all the way until we materialize it with toList
This was quite an unplesant surprise:
scala> Set(1, 2, 3, 4, 5)
res18: scala.collection.immutable.Set[Int] = Set(4, 5, 1, 2, 3)
scala> Set(1, 2, 3, 4, 5).toList
res25: List[Int] = List(5, 1, 2, 3, 4)
The example by itself suggest a "no" answer to my question. Then what about ListSet?
scala> import scala.collection.immutable.ListSet
scala> ListSet(1, 2, 3, 4, 5)
res21: scala.collection.immutable.ListSet[Int] = Set(1, 2, 3, 4, 5)
This one seems to work, but should I rely on this behavior?
What other data structure is suitable for an immutable collection of unique items, where the original order must be preserved?
By the way, I do know about distict method in List. The problem is, I want to enforce uniqueness of items (while preserving the order) at interface level, so using distinct would mess up my neat design..
EDIT
ListSet doesn't seem very reliable either:
scala> ListSet(1, 2, 3, 4, 5).toList
res28: List[Int] = List(5, 4, 3, 2, 1)
EDIT2
In my search for a perfect design I tried this:
scala> class MyList[A](list: List[A]) { val values = list.distinct }
scala> implicit def toMyList[A](l: List[A]) = new MyList(l)
scala> implicit def fromMyList[A](l: MyList[A]) = l.values
Which actually works:
scala> val l1: MyList[Int] = List(1, 2, 3)
scala> l1.values
res0: List[Int] = List(1, 2, 3)
scala> val l2: List[Int] = new MyList(List(1, 2, 3))
l2: List[Int] = List(1, 2, 3)
The problem, however, is that I do not want to expose MyList outside the library. Is there any way to have the implicit conversion when overriding? For example:
trait T { def l: MyList[_] }
object O extends T { val l: MyList[_] = List(1, 2, 3) }
scala> O.l mkString(" ") // Let's test the implicit conversion
res7: String = 1 2 3
I'd like to do it like this:
object O extends T { val l = List(1, 2, 3) } // Doesn't work
That depends on the Set you are using. If you do not know which Set implementation you have, then the answer is simply, no you cannot be sure. In practice I usually encounter the following three cases:
I need the items in the Set to be ordered. For this I use classes mixing in the SortedSet trait which when you use only the Standard Scala API is always a TreeSet. It guarantees the elements are ordered according to their compareTo method (see the Ordered trat). You get a (very) small performance penalty for the sorting as the runtime of inserts/retrievals is now logarithmic, not (almost) constant like with the HashSet (assuming a good hash function).
You need to preserve the order in which the items are inserted. Then you use the LinkedHashSet. Practically as fast as the normal HashSet, needs a little more storage space for the additional links between elements.
You do not care about order in the Set. So you use a HashSet. (That is the default when using the Set.apply method like in your first example)
All this applies to Java as well, Java has a TreeSet, LinkedHashSet and HashSet and the corresponding interfaces SortedSet, Comparable and plain Set.
It is my belief that you should never rely on the order in a set. In no language.
Apart from that, have a look at this question which talks about this in depth.
ListSet will always return elements in the reverse order of insertion because it is backed by a List, and the optimal way of adding elements to a List is by prepending them.
Immutable data structures are problematic if you want first in, first out (a queue). You can get O(logn) or amortized O(1). Given the apparent need to build the set and then produce an iterator out of it (ie, you'll first put all elements, then you'll remove all elements), I don't see any way to amortize it.
You can rely that a ListSet will always return elements in last in, first out order (a stack). If that suffices, then go for it.