Generate immutable collection in one expression - scala

How can I fill a collection and then add one element to it without using a mutable collection or declaring it as a var?
In other words, how can I use an immutable collection in the following code instead of mutable.Buffer?
import scala.collection.mutable
val values: mutable.Buffer[MyClass] =
  (for (i <- 1 until 10) yield MyClass(Some(i))).toBuffer
values += MyClass(None)

I switched to map, but with a for-comprehension it should be the same:
val values = (1 until gridSize.size).map(i => MyClass(Some(i))) ++ Seq(MyClass(None), ...)
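For reference, here is a minimal self-contained version of the one-expression approach. MyClass is a hypothetical case class wrapping an Option[Int], since the question does not show its definition, and the range 1 until 10 from the first snippet is used in place of gridSize.size.

```scala
// Hypothetical stand-in for the question's MyClass
final case class MyClass(value: Option[Int])

object OneExpression {
  // Build the nine elements from the range, convert to an immutable Vector,
  // then append the extra element with :+ (which returns a new Vector)
  val values: Vector[MyClass] =
    (1 until 10).map(i => MyClass(Some(i))).toVector :+ MyClass(None)

  // The same thing written as a for-comprehension
  val viaFor: Vector[MyClass] =
    (for (i <- 1 until 10) yield MyClass(Some(i))).toVector :+ MyClass(None)
}
```

No var and no mutable collection are involved: every step returns a new immutable value.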

Related

Scala: Update Array inside a Map

I am creating a Map which has an Array inside it. I need to keep adding values to that Array. How do I do that?
var values: Map[String, Array[Float]] = Map()
I tried several ways such as:
myobject.values.getOrElse("key1", Array()).++(Array(float1))
I tried a few other ways too, but nothing updates the array inside the Map.
There is a problem with this code:
values.getOrElse("key1", Array()).++(Array(float1))
This does not update the Map in values, it just creates a new Array and then throws it away.
You need to replace the original Map with a new, updated Map, like this:
values = values.updated("key1", values.getOrElse("key1", Array.empty[Float]) :+ float1)
To understand this you need to be clear on the distinction between mutable variables and mutable data.
var is used to create a mutable variable, which means that the variable can be assigned a new value, e.g.:
var name = "John"
name = "Peter" // Would not work if name was a val
By contrast, mutable data is held in objects whose contents can be changed:
val a = Array(1,2,3)
a(0) = 12 // Works even though a is a val not a var
In your example, values is a mutable variable, but the Map is immutable, so it can't be changed. You have to create a new immutable Map and assign it to the mutable var.
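The distinction can be seen in one small, self-contained sketch: a var holding an immutable Map that is replaced on every update, next to a val holding an Array whose contents are changed in place. The object name and the add helper are illustrative, not from the answer.

```scala
object VarVsMutableData {
  // A mutable *variable* holding an immutable Map: each update builds a new Map
  // and reassigns the var. The old Map object itself is never modified.
  var values: Map[String, Array[Float]] = Map()

  def add(key: String, x: Float): Unit =
    values = values.updated(key, values.getOrElse(key, Array.empty[Float]) :+ x)

  add("key1", 1.5f)
  add("key1", 2.5f)

  // A val holding mutable *data*: the reference cannot be reassigned,
  // but the array's contents can be changed in place.
  val a = Array(1, 2, 3)
  a(0) = 12
}
```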
From what I can see (judging by the ++), you would like to append one more element to the Array. But Array is a fixed-length structure, so I'd recommend using Vector instead. And since, I suppose, you are using an immutable Map, you need to update it as well.
So the final solution might look like:
var values: Map[String, Vector[Float]] = Map()
val key = "key1"
val value = 1.0f
values = values + (key -> (values.getOrElse(key, Vector.empty[Float]) :+ value))
Hope this helps!
You can use Scala 2.13's transform function to transform your map any way you want.
val values = Map("key" -> Array(1f, 2f, 3f), "key2" -> Array(4f,5f,6f))
values.transform {
case ("key", v) => v ++ Array(6f)
case (_,v) => v
}
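Result (array contents written out for readability; printing the Map itself would show the arrays' JVM identity strings):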
Map(key -> Array(1.0, 2.0, 3.0, 6.0), key2 -> Array(4.0, 5.0, 6.0))
Note that appending to arrays takes linear time, so you might want to consider a more efficient data structure such as Vector or Queue, or even a List (if you can afford to prepend rather than append).
Update:
However, if it is only one key you want to update, it is probably better to use updatedWith:
values.updatedWith("key")(_.map(_ ++ Array(6f)))
which gives the same result. The nice thing about the above code is that if the key does not exist, the map is returned unchanged and no error is thrown.
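Here is a side-by-side sketch of both calls, assuming Scala 2.13 or later. It uses Vector values, as suggested in the note on append cost, partly because Vector equality compares contents (Array equality compares references), which makes the results easy to check.

```scala
object TransformVsUpdatedWith {
  val values = Map("key" -> Vector(1f, 2f, 3f), "key2" -> Vector(4f, 5f, 6f))

  // transform visits every entry; the (key, value) function is written with case clauses
  val viaTransform = values.transform {
    case ("key", v) => v :+ 6f
    case (_, v)     => v
  }

  // updatedWith touches only one key; Option.map leaves a missing key alone
  val viaUpdatedWith = values.updatedWith("key")(_.map(_ :+ 6f))

  // An absent key: the map comes back unchanged and nothing is thrown
  val missingKey = values.updatedWith("nope")(_.map(_ :+ 6f))
}
```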
Immutable vs Mutable Collections
You need to choose which type of collection you will use: immutable or mutable. Both are great, and they work quite differently. I guess you are familiar with the mutable ones (from other languages), but immutable collections are the default in Scala, and you are probably using them in your code (because they don't need any imports). An immutable Map cannot be changed; you can only create a new one with updated values (Tim's and Ivan's answers cover that).
There are a few ways to solve your problem, and each is good depending on the use case.
See the implementations below (m1 to m6):
//just for convenience
type T = String
type E = Long
import scala.collection._
//immutable map with immutable seq (default).
var m1 = immutable.Map.empty[T,List[E]]
//mutable map with immutable seq. This is great for most use-cases.
val m2 = mutable.Map.empty[T,List[E]]
//mutable concurrent map with immutable seq.
//should be fast and threadsafe (if you know how to deal with it)
val m3 = collection.concurrent.TrieMap.empty[T,List[E]]
//mutable map with mutable seq.
//should be fast but could be unsafe. This is the default in most imperative languages (PHP/JS/Java and more).
//Probably this is what You have tried to do
val m4 = mutable.Map.empty[T,mutable.ArrayBuffer[E]]
//immutable map with mutable seq.
//still could be unsafe
var m5 = immutable.Map.empty[T,mutable.ArrayBuffer[E]]
//immutable map with mutable seq v2 (used in the next snippet)
var m6 = immutable.Map.empty[T,mutable.ArrayBuffer[E]]
//Oh... and NEVER DO THAT, this is wrong
//I mean... don't keep mutable Map in `var`
//var mX = mutable.Map.empty[T,...]
Other answers show immutable.Map with immutable.Seq, and this is the preferred way (or at least the default). It has a cost, since every update creates a new version of the structure, but for most apps it is perfectly OK. Here is a nice source of info about immutable data structures: https://stanch.github.io/reftree/talks/Immutability.html.
Each variant has its own pros and cons. Each deals with updates differently, which makes this question much harder than it looks at first glance.
Solutions
val k = "The Ultimate Answer"
val v = 42L //must match E = Long
//immutable map with immutable seq (default).
m1 = m1.updated(k, v :: m1.getOrElse(k, Nil))
//mutable map with immutable seq.
m2.update(k, v :: m2.getOrElse(k, Nil))
//mutable concurrent map with immutable seq.
//m3 is a bit harder to do in Scala 2.12... sorry :)
//mutable map with mutable seq.
m4.getOrElseUpdate(k, mutable.ArrayBuffer.empty[E]) += v
//immutable map with mutable seq.
m5 = m5.updated(k, {
val col = m5.getOrElse(k, mutable.ArrayBuffer.empty[E])
col += v
col
})
//or another implementation of immutable map with mutable seq.
m6.get(k) match {
case None => m6 = m6.updated(k, mutable.ArrayBuffer(v))
case Some(col) => col += v
}
Check this ScalaFiddle with these implementations: https://scalafiddle.io/sf/WFBB24j/3.
It is a great tool (PS: you can always save your changes with Ctrl+S and share the link when writing a question about your snippet).
Oh... and if you care about concurrency (the m3 case), then write another question. Such a topic deserves its own thread :)
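For the m3 (TrieMap) case that the answer leaves open, one possible starting point is an optimistic retry loop built on putIfAbsent and replace(key, oldValue, newValue), which scala.collection.concurrent.Map provides in both 2.12 and 2.13. This is a sketch, not part of the original answer; the object name and prepend are hypothetical, and the two calls at the end only demonstrate single-threaded behavior.

```scala
import scala.annotation.tailrec
import scala.collection.concurrent.TrieMap

object TrieMapPrepend {
  type T = String
  type E = Long

  val m3 = TrieMap.empty[T, List[E]]

  // Prepend v to the list stored under k. Each attempt reads the current value and
  // only writes if no other thread changed it in the meantime; otherwise it retries.
  @tailrec
  def prepend(k: T, v: E): Unit =
    m3.get(k) match {
      case None =>
        // putIfAbsent returns None only when this call inserted the value
        if (m3.putIfAbsent(k, List(v)).isDefined) prepend(k, v)
      case Some(old) =>
        // replace(k, old, new) succeeds only if the current value is still `old`
        if (!m3.replace(k, old, v :: old)) prepend(k, v)
    }

  prepend("The Ultimate Answer", 42L)
  prepend("The Ultimate Answer", 7L)
}
```

Because the stored lists are immutable, a losing thread simply rereads the latest list and tries again; no value is ever mutated under another thread's feet.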
(im)mutable api VS (im)mutable Collections
You can have a mutable collection and still use an immutable API that copies the original seq. For example, Array is mutable:
val example = Array(1,2,3)
example(0) = 33 //edit in place
println(example.mkString(", ")) //33, 2, 3
But some functions on it (e.g. ++) create a new sequence instead of changing the existing one:
val example2 = example ++ Array(42, 41) //++ is immutable operator
println(example.mkString(", ")) //33, 2, 3 //example stays unchanged
println(example2.mkString(", ")) //33, 2, 3, 42, 41 //but new sequence is created
There is a method updateWith that mutates in place and exists only on mutable maps. There is also updatedWith, which returns a new map instead, and if you are not careful enough you will use the wrong one (yeah... they differ by one letter).
This means you need to be careful about which functions you are using, the immutable or the mutable ones. Most of the time you can tell them apart by the result type: if something returns a collection, it is probably some kind of copy of the original; if the result is Unit, it is mutating for sure.
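A short sketch of the one-letter difference, assuming Scala 2.13 or later (where both methods were introduced): updatedWith on an immutable Map returns a new map, while updateWith on a mutable Map changes it in place and returns the resulting value.

```scala
import scala.collection.{immutable, mutable}

object UpdateVsUpdated {
  // updatedWith (with a "d") on an immutable Map returns a NEW map;
  // the original map keeps its old value.
  val im: immutable.Map[String, Int] = immutable.Map("k" -> 1)
  val im2: immutable.Map[String, Int] = im.updatedWith("k")(_.map(_ + 1))

  // updateWith (no "d") on a mutable Map changes the map IN PLACE and returns
  // the value now associated with the key (None if the entry was removed).
  val mm: mutable.Map[String, Int] = mutable.Map("k" -> 1)
  val returned: Option[Int] = mm.updateWith("k")(_.map(_ + 1))
}
```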

How to sort a list in scala

I am a newbie in scala and I need to sort a very large list with 40000 integers.
The operation is performed many times. So performance is very important.
What is the best method for sorting?
You can sort the list with List.sortWith() by providing a suitable function literal. For example, the following code prints the elements of the initial list sorted alphabetically by their first character, lowercased:
val initial = List("doodle", "Cons", "bible", "Army")
val sorted = initial.sortWith((s: String, t: String) =>
  s.charAt(0).toLower < t.charAt(0).toLower)
println(sorted)
Much shorter version will be the following with Scala's type inference:
val initial = List("doodle", "Cons", "bible", "Army")
val sorted = initial.sortWith((s, t) => s.charAt(0).toLower < t.charAt(0).toLower)
println(sorted)
For integers there is List.sorted, just use this:
val list = List(4, 3, 2, 1)
val sortedList = list.sorted
println(sortedList)
Just check the docs.
List has several methods for sorting. myList.sorted works for types that already have an ordering defined (an implicit Ordering, as for Int, String and others). myList.sortWith and myList.sortBy take a function that defines the order: sortWith takes a "less than" comparison, sortBy takes a key to sort by.
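Here is a small sketch of the three methods on the word list used above. The expected orders follow from String's default Ordering, which compares character codes, so uppercase letters sort before lowercase ones; all three sorts are stable.

```scala
object SortingMethods {
  val words = List("doodle", "Cons", "bible", "Army")

  // sorted: uses the implicit Ordering[String] (uppercase sorts before lowercase)
  val bySorted = words.sorted

  // sortBy: derive a key from each element and sort by that key's Ordering
  val byLowerFirst = words.sortBy(_.charAt(0).toLower)

  // sortWith: supply a "less than" comparison directly (here: longest first)
  val byLengthDesc = words.sortWith(_.length > _.length)
}
```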
Also, first link on google for scala List sort: http://alvinalexander.com/scala/how-sort-scala-sequences-seq-list-array-buffer-vector-ordering-ordered
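You can use (1 to 400000).toList.sorted (note that List(1 to 400000) would be a one-element list containing a Range, for which no Ordering exists).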
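Since the question stresses performance for 40000 Ints sorted many times, one option worth measuring is to sort a primitive Array[Int] instead of a List[Int], which avoids working with boxed integers. This is a sketch under that assumption, not a benchmarked recommendation; PrimitiveIntSort and sortInts are hypothetical names.

```scala
object PrimitiveIntSort {
  // Copy the list into a primitive Array[Int] and sort it in place with
  // java.util.Arrays.sort, which works on unboxed ints. Whether this beats
  // list.sorted for your data is an assumption to check with a benchmark.
  def sortInts(xs: List[Int]): List[Int] = {
    val arr = xs.toArray
    java.util.Arrays.sort(arr)
    arr.toList
  }

  val input: List[Int] = (1 to 40000).reverse.toList
  val sortedInput: List[Int] = sortInts(input)
}
```

If the data can stay in an Array[Int] between sorts, the conversions to and from List can be dropped as well.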

Populating an immutable List

Here I populate two lists, one mutable and one immutable:
var mutableList = scala.collection.mutable.MutableList[String]()
//> mutableList : scala.collection.mutable.MutableList[String] = MutableList()
//|
for (a <- 1 to 100) {
mutableList += a.toString
}
println(mutableList.size); //> 100
val immutableList = List[String]() //> immutableList : List[String] = List()
for (a <- 1 to 100) {
immutableList :+ a.toString
}
println(immutableList.size); //> 0
When I print the size of immutableList, the output is 0. Is this because within the for loop a new reference is created that does not point to immutableList? Is there a functional equivalent to populating an immutable List from within a loop?
As Gabor answered in a comment, you want to use fold, or even continue with for and yield. What he did not explain is why you are getting a size of 0. The reason is that immutableList :+ a.toString returns a new list each time, which you are not using. immutableList is exactly that: immutable.
Keep in mind that everything in Scala is an expression and therefore returns something. So you can turn your regular for (which acts like a foreach) into a comprehension by adding yield, as below:
val immutableList = for (a <- 1 to 100) yield a.toString
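This desugars into something like the following (note that the result is an IndexedSeq, not a List; append .toList if you need a List):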
(1 to 100).map(_.toString)
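The fold that Gabor suggested is not shown above. A minimal sketch of it, next to the for/yield version and a var-based version that keeps the original loop shape, could look like this (the object and value names are illustrative):

```scala
object PopulateImmutable {
  // foldLeft threads an accumulator through the loop: each step returns a new
  // list instead of discarding it. Prepending with :: is O(1), so the list is
  // built backwards and reversed once at the end.
  val viaFold: List[String] =
    (1 to 100).foldLeft(List.empty[String])((acc, a) => a.toString :: acc).reverse

  // for/yield produces an IndexedSeq here; .toList converts it to a List
  val viaYield: List[String] = (for (a <- 1 to 100) yield a.toString).toList

  // Keeping the loop: reassign a var to each new list
  // (:+ on a List copies it, so this is O(n^2) overall)
  var viaVar: List[String] = List()
  for (a <- 1 to 100) viaVar = viaVar :+ a.toString
}
```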
For completeness, the tabulate method allows for creating and populating an immutable List, for instance as follows (tabulate passes the indices 0 to 99, so the strings are "0" to "99"):
List.tabulate(100)(a => a.toString)
or equivalently
List.tabulate(100)(_.toString)

Sort a list by an ordered index

Let us assume that I have the following two sequences:
val index = Seq(2,5,1,4,7,6,3)
val unsorted = Seq(7,6,5,4,3,2,1)
The first is the index by which the second should be sorted. My current solution is to traverse over the index and construct a new sequence with the found elements from the unsorted sequence.
val sorted = index.foldLeft(Seq[Int]()) { (s, num) =>
s ++ Seq(unsorted.find(_ == num).get)
}
But this solution seems very inefficient and error-prone to me. On every iteration it searches the complete unsorted sequence. And if the index and the unsorted list aren't in sync, then either an error will be thrown or an element will be omitted. In both cases, the out-of-sync elements should be appended to the ordered sequence.
Is there a more efficient and solid solution for this problem? Or is there a sort algorithm which fits into this paradigm?
Note: This is a constructed example. In reality I would like to sort a list of MongoDB documents by an ordered list of document IDs.
Update 1
I've selected the answer from Marius Danila because it seems to be the fastest and most Scala-ish solution for my problem. It doesn't handle out-of-sync items, but this can easily be added.
So here is the updated solution:
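import scala.collection.immutable.HashMap
import scala.collection.mutable.ArrayBuffer
def sort[T, Key](index: Seq[Key], unsorted: Seq[T], key: T => Key): Seq[T] = {
  val positionMapping = HashMap(index.zipWithIndex: _*)
  // One slot per index entry; None marks index keys with no matching item
  // (Option instead of null, so this also works for primitive T such as Int)
  val inSync = Array.fill[Option[T]](index.size)(None)
  val notInSync = new ArrayBuffer[T]()
  for (item <- unsorted) {
    positionMapping.get(key(item)) match {
      case Some(position) => inSync(position) = Some(item)
      case None => notInSync.append(item)
    }
  }
  inSync.toSeq.flatten ++ notInSync
}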
Update 2
The approach suggested by Bask.cc seems to be the correct answer. It doesn't handle the out-of-sync issue on its own either, but this can also be easily implemented:
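// Given index: Seq[String] and entities: Seq[Foo], where Foo has an id: String field
val idToEntityMap = entities.map(e => e.id -> e).toMap
// flatMap with .get skips ids that have no matching entity instead of throwing
val sorted = index.flatMap(idToEntityMap.get)
// Append the entities whose id does not appear in the index
val result = sorted ++ entities.filterNot(sorted.toSet)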
Why do you want to sort the collection when you already have the sorted index collection? You can just use map.
Concerning "In reality I would like to sort a list of mongodb documents by an ordered list of document Id's.":
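// Given ids: Seq[String] and entities: Seq[Foo], where Foo has an id: String field
val idToEntityMap = entities.map(e => e.id -> e).toMap
// A Map is also a function from key to value, so it can be passed to map directly
ids.map(idToEntityMap)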
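Here is a self-contained version of the map-lookup approach that also covers the out-of-sync cases from the question: ids with no document are skipped, and documents missing from the id list are appended. Doc is a hypothetical stand-in for the MongoDB document type, and the test data is made up for illustration.

```scala
object SortByIds {
  // Hypothetical document type standing in for the MongoDB documents in the question
  final case class Doc(id: String, body: String)

  def sortByIds(ids: Seq[String], docs: Seq[Doc]): Seq[Doc] = {
    val byId: Map[String, Doc] = docs.map(d => d.id -> d).toMap
    // Look each id up; ids with no matching document are skipped instead of throwing
    val ordered: Seq[Doc] = ids.flatMap(byId.get)
    val placed: Set[Doc] = ordered.toSet
    // Documents whose id is not in `ids` go at the end, in their original order
    ordered ++ docs.filterNot(placed)
  }

  val docs = Seq(Doc("a", "A"), Doc("b", "B"), Doc("c", "C"), Doc("x", "X"))
  val result: Seq[Doc] = sortByIds(Seq("c", "a", "missing", "b"), docs)
}
```

Building the Map is O(n) and each lookup is effectively constant time, so the whole pass avoids the repeated linear scans of the find-based loop.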
This may not exactly map to your use case, but Googlers may find this useful:
scala> val ids = List(3, 1, 0, 2)
ids: List[Int] = List(3, 1, 0, 2)
scala> val unsorted = List("third", "second", "fourth", "first")
unsorted: List[String] = List(third, second, fourth, first)
scala> val sorted = ids map unsorted
sorted: List[String] = List(first, second, third, fourth)
I do not know which language you are using, but irrespective of the language, this is how I would solve the problem.
From the first list (here 'index') create a hash table taking key as the document id and the value as the position of the document in the sorted order.
Now, when traversing the list of documents, I would look up each document id in the hash table to get the position it should have in the sorted order. Then I would use that position to place the document into pre-allocated memory.
Note: if the number of documents is small (and the ids fall within a small integer range), then instead of using a hash table you could use a pre-allocated table and index it directly by document id.
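The pre-allocated table variant from the note above can be sketched in Scala as follows. It assumes every id is an Int between 0 and a known maxId (an assumption the answer does not spell out), reuses the question's example data, and appends items the index does not mention at the end, as the question asks. DirectTableSort and sortByIndex are hypothetical names.

```scala
object DirectTableSort {
  // Assumption: every id is an Int in [0, maxId], so ids can index an array directly.
  def sortByIndex(index: Seq[Int], unsorted: Seq[Int], maxId: Int): Seq[Int] = {
    // pos(id) = target position of that id, or -1 if the id is not in the index
    val pos = Array.fill(maxId + 1)(-1)
    for ((id, p) <- index.zipWithIndex) pos(id) = p
    // One pre-allocated slot per index entry; None means "no item for this id"
    val out = Array.fill[Option[Int]](index.size)(None)
    val leftovers = Seq.newBuilder[Int]
    for (item <- unsorted) {
      val p = pos(item)
      if (p >= 0) out(p) = Some(item) else leftovers += item
    }
    // Placed items in index order, followed by items the index did not mention
    out.toSeq.flatten ++ leftovers.result()
  }

  val sorted: Seq[Int] =
    sortByIndex(Seq(2, 5, 1, 4, 7, 6, 3), Seq(7, 6, 5, 4, 3, 2, 1), maxId = 7)
  val withExtra: Seq[Int] = sortByIndex(Seq(3, 1), Seq(1, 2, 3), maxId = 3)
}
```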
Flat-mapping the index over the unsorted list seems to be a safer version (if an index entry isn't found, it's just dropped, since find returns None):
index.flatMap(i => unsorted.find(_ == i))
It still has to traverse the unsorted list every time (in the worst case this is O(n^2)). With your example, I'm not sure that there's a more efficient solution.
In this case you can use zip-sort-unzip:
(unsorted zip index).sortWith(_._2 < _._2).unzip._1
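Applied to the question's data, the zip-sort-unzip trick reads index(i) as the rank of unsorted(i), which is a different interpretation from the question's find-based loop (where index lists the wanted values in order). The sketch below shows both on the same data; sortBy(_._2) is used as an equivalent of the sortWith(_._2 < _._2) above.

```scala
object ZipSortUnzip {
  val index = Seq(2, 5, 1, 4, 7, 6, 3)
  val unsorted = Seq(7, 6, 5, 4, 3, 2, 1)

  // Pair each element with the number at the same position in `index`,
  // sort the pairs by that number, then keep only the elements.
  val byRank: Seq[Int] = (unsorted zip index).sortBy(_._2).unzip._1

  // For comparison: the question's reading, where `index` lists the wanted values in order
  val byLookup: Seq[Int] = index.flatMap(i => unsorted.find(_ == i))
}
```

Which result is right depends on what the index actually encodes, so it is worth checking that before choosing this approach.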
By the way, if you can, a better solution would be to sort the list on the DB side using $orderBy.
Ok.
Let's start from the beginning.
Besides the fact that you're rescanning the unsorted list each time, Seq(...) creates a List by default. So in the foldLeft you're appending an element to the end of the list each time; each append copies the list, which makes the whole fold O(N^2).
An improvement would be
val sorted_rev = index.foldLeft(Seq[Int]()) { (s, num) =>
unsorted.find(_ == num).get +: s
}
val sorted = sorted_rev.reverse
But that is still an O(N^2) algorithm. We can do better.
The following sort function should work:
import scala.collection.immutable.HashMap
import scala.reflect.ClassTag
def sort[T: ClassTag, Key](index: Seq[Key], unsorted: Seq[T], key: T => Key): Seq[T] = {
val positionMapping = HashMap(index.zipWithIndex: _*) //1
val arr = new Array[T](unsorted.size) //2
for (item <- unsorted) { //3
val position = positionMapping(key(item))
arr(position) = item
}
arr //6
}
The function sorts the items in unsorted according to the sequence of keys index, where the key function is used to extract the id from the objects you're trying to sort.
Line 1 creates a reverse index - mapping each object id to its final position.
Line 2 allocates the array which will hold the sorted sequence. We're using an array since we need constant-time random-position set performance.
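The loop that starts at line 3 traverses the sequence of unsorted items and places each item in its intended position using the positionMapping reverse index. Note that positionMapping(key(item)) throws a NoSuchElementException if an item's key is not in the index.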
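Line 6 returns the array, implicitly converted to a Seq (via the WrappedArray wrapper in Scala 2.12; in 2.13 this implicit conversion is deprecated and copies the array into an immutable Seq).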
Since our reverse-index is an immutable HashMap, lookup should take constant-time for regular cases. Building the actual reverse-index takes O(N_Index) time where N_Index is the size of the index sequence. Traversing the unsorted sequence takes O(N_Unsorted) time where N_Unsorted is the size of the unsorted sequence.
So the complexity is O(max(N_Index, N_Unsorted)), which I guess is the best you can do in the circumstances.
For your particular example, you would call the function like so:
val sorted = sort(index, unsorted, identity[Int])
For the real case, it would probably be like this:
val sorted = sort(idList, unsorted, obj => obj.id)
The best I can do is to create a Map from the unsorted data, and use map lookups (basically the hashtable suggested by a previous poster). The code looks like:
val unsortedAsMap = unsorted.map(x => x -> x).toMap
index.map(unsortedAsMap)
Or, if there's a possibility of hash misses:
val unsortedAsMap = unsorted.map(x => x -> x).toMap
index.flatMap(unsortedAsMap.get)
It's O(n) in time*, but you're swapping time for space, as it uses O(n) space.
For a slightly more sophisticated version, that handles missing values, try:
import scala.collection.mutable
// Insertion-ordered map, so leftover entries keep their original order
val unsortedAsMap = mutable.LinkedHashMap.empty[Int, Int]
for (i <- unsorted) unsortedAsMap.put(i, i)
val newBuffer = mutable.ListBuffer.empty[Int]
for (i <- index) {
  // remove returns Some(value) if the key was present, None otherwise
  unsortedAsMap.remove(i) match {
    case Some(v) => newBuffer += v
    case None => // Not sure what to do for "else"
  }
}
// Append the values the index did not mention, in their original order
for ((_, v) <- unsortedAsMap) newBuffer += v
newBuffer.result()
If it's a MongoDB database in the first place, you might be better retrieving documents directly from the database by index, so something like:
index.map(lookupInDB)
*technically it's O(n log n), as Scala's standard immutable map is O(log n), but you could always use a mutable map, which is O(1)

Scala lazy elements in iterator

Does anyone know how to create a lazy iterator in scala?
For example, I want to iterate, instantiating each element only when it is reached. Once I have moved past an element, I want the instance to die / be removed from memory.
If I declare an iterator like so:
val xs = Iterator(
(0 to 10000).toArray,
(0 to 10).toArray,
(0 to 1000000000).toArray)
It creates the arrays when xs is declared. This can be proven like so:
def f(name: String) = {
val x = (0 to 10000).toArray
println("f: " + name)
x
}
val xs = Iterator(f("1"),f("2"),f("3"))
which prints:
scala> val xs = Iterator(f("1"),f("2"),f("3"))
f: 1
f: 2
f: 3
xs: Iterator[Array[Int]] = non-empty iterator
Anyone have any ideas?
Streams are not suitable because elements remain in memory.
Note: I am using an Array as an example, but I want it to work with any type.
Scala collections have a view method which produces a lazy equivalent of the collection. So instead of (0 to 10000).toArray, use (0 to 10000).view. This way, no array is created in memory. See also https://stackoverflow.com/a/6996166/90874, https://stackoverflow.com/a/4799832/90874, https://stackoverflow.com/a/4511365/90874 etc.
Use one of the Iterator factory methods that accept a call-by-name parameter.
For your first example, you can do one of these:
val xs1 = Iterator.fill(3)((0 to 10000).toArray)
val xs2 = Iterator.tabulate(3)(_ => (0 to 10000).toArray)
val xs3 = Iterator.continually((0 to 10000).toArray).take(3)
Arrays won't be allocated until you need them.
In case you need different expressions for each element, you can create separate iterators and concatenate them:
val iter = Iterator.fill(1)(f("1")) ++
Iterator.fill(1)(f("2")) ++
Iterator.fill(1)(f("3"))
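The f helper from the question can be instrumented to show when each element is actually built. This sketch records each construction in a buffer (an addition for illustration) and checks that the concatenated Iterator.fill version builds nothing until next() is called, then builds one element per step.

```scala
import scala.collection.mutable.ArrayBuffer

object LazyIteratorDemo {
  // Records the name of each element at the moment it is actually built
  val built = ArrayBuffer.empty[String]

  def f(name: String): Array[Int] = {
    built += name
    (0 to 10000).toArray
  }

  // Iterator.fill takes its element by name and ++ takes its argument by name,
  // so nothing is built here yet
  val it: Iterator[Array[Int]] =
    Iterator.fill(1)(f("1")) ++ Iterator.fill(1)(f("2")) ++ Iterator.fill(1)(f("3"))
  val builtBeforeNext: List[String] = built.toList

  // Pulling one element builds exactly that element
  val first: Array[Int] = it.next()
  val builtAfterFirst: List[String] = built.toList

  // Consuming the rest builds the remaining two, one at a time
  val remainingLengths: List[Int] = it.map(_.length).toList
  val builtAtEnd: List[String] = built.toList
}
```

Since the iterator does not keep references to elements it has already returned, each array becomes eligible for garbage collection once the caller stops referencing it, unlike a Stream, which memoizes its elements.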