Scala immutable list internal implementation - scala

Suppose I have a huge list containing the elements from 1 to 1 million.
val initialList = List(1,2,3,.....1 million)
and
val myList = List(1,2,3)
Now I apply an operation such as foldLeft to myList, passing initialList as the starting value:
val output = myList.foldLeft(initialList)(_ :+ _)
// result ==>> List(1,2,3,.....1 million, 1 , 2 , 3)
Here is my question: since both lists are immutable, the intermediate lists that were produced were
List(1,2,3,.....1 million, 1)
List(1,2,3,.....1 million, 1 , 2)
List(1,2,3,.....1 million, 1 , 2 , 3)
By the concept of immutability, a new list is created every time and the old one is discarded. So isn't this operation a performance killer in Scala, since a list of 1 million elements has to be copied every time to create the new list?
Please correct me if I am wrong; I am trying to understand the internal implementation of an immutable list.
Thanks in advance.

Yes, this is a performance killer, but it is the cost of having immutable structures (which are amazing, safe, and make programs much less buggy). That is why you should avoid appending to a list whenever you can. There are many tricks for avoiding this issue (for example, using accumulators).
For example:
Instead of:
val initialList = List(1,2,3,.....1 million)
val myList = List(1,2,3,...,100)
val output = myList.foldLeft(initialList)(_ :+ _)
You can write:
val initialList = List(1,2,3,.....1 million)
val myList = List(1,2,3,...,100)
val output = List(initialList,myList).flatten
flatten is implemented to copy the first list only once, instead of copying it for every single append.
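As a quick check that the variants agree, here is a minimal sketch with short stand-in lists. The sizes and the object name AppendDemo are assumptions for illustration, not part of the original answer.

```scala
object AppendDemo {
  // Small stand-ins for the million-element list in the question.
  val initialList: List[Int] = (1 to 1000).toList
  val myList: List[Int] = (1 to 100).toList

  // Copies the whole accumulated list on every step.
  val viaFold: List[Int] = myList.foldLeft(initialList)(_ :+ _)

  // Copies initialList once.
  val viaConcat: List[Int] = initialList ++ myList

  // The form used in the answer above.
  val viaFlatten: List[Int] = List(initialList, myList).flatten
}
```

All three values hold the same elements; only the amount of copying differs.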
P.S.
At least adding an element to the front of a list is fast (O(1)), because the old list can be shared. Let's look at this example:
You can see how memory sharing works for immutable linked lists. The computer keeps only one copy of the (b,c,d) tail. But if you want to append bar to the end of baz, you cannot modify baz, because you would destroy foo, bar and raz! That is why you have to copy the first list.
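The sharing described here can be observed with reference equality. The sketch below is only an illustration: the names shared, foo and bar are labels chosen here, and the layout of the original figure is an assumption.

```scala
object SharingDemo {
  val shared: List[Char] = List('b', 'c', 'd')

  // Prepending allocates one new cell and reuses `shared` as the tail.
  val foo: List[Char] = 'a' :: shared
  val bar: List[Char] = 'x' :: shared

  // Appending cannot reuse the cells of `shared`, because the last cell
  // would have to point at a new element; the cells are copied instead.
  val appended: List[Char] = shared :+ 'e'
}
```

Here foo.tail and bar.tail are the very same object as shared (checked with eq), while appended is a fresh list and shared itself is unchanged.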

Appending to a List is not a good idea because List has linear cost for appending. So, if you can,
either prepend to the List (List has constant-time prepend),
or choose another collection that is efficient for appending, such as a Queue.
For the performance characteristics of each operation on most Scala collections, see:
https://docs.scala-lang.org/overviews/collections/performance-characteristics.html
Note that, depending on your requirements, you may also write your own smarter collection, such as a chain iterable.
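A minimal sketch of the Queue option, assuming the immutable scala.collection.immutable.Queue is meant. The object and method names are hypothetical.

```scala
import scala.collection.immutable.Queue

object QueueDemo {
  // Appends every item to an immutable Queue, one at a time.
  // Each :+ returns a new Queue; the old one is left untouched.
  def buildWithQueue(items: Seq[Int]): Queue[Int] =
    items.foldLeft(Queue.empty[Int])(_ :+ _)
}
```

For example, QueueDemo.buildWithQueue(1 to 5).toList gives the items in insertion order.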

Related

Scala Array.view memory usage

I'm learning Scala and have been trying some LeetCode problems with it, but I'm having trouble with the memory limit being exceeded. One problem I have tried goes like this:
A swap is defined as taking two distinct positions in an array and swapping the values in them.
A circular array is defined as an array where we consider the first element and the last element to be adjacent.
Given a binary circular array nums, return the minimum number of swaps required to group all 1's present in the array together at any location.
and my attempted solution looks like
object Solution {
  def minSwaps(nums: Array[Int]): Int = {
    val count = nums.count(_ == 1)
    if (count == 0) return 0
    val circular = nums.view ++ nums.view
    circular.sliding(count).map(_.count(_ == 0)).min
  }
}
However, when I submit it, I'm hit with Memory Limit Exceeded for one of the test cases where nums is very large.
My understanding is that, because I'm using .view, I shouldn't be allocating more than O(1) memory. Is that understanding incorrect? To be clear, I realise this isn't the most time-efficient way of solving this, but I didn't expect it to be memory inefficient.
The version used is Scala 2.13.7, in case that makes a difference.
Update
I did some inspection of the types, and it seems circular is only a View unless I replace ++ with concat, which makes it an IndexedSeqView. Why is that? I thought ++ was just an alias for concat.
If I make the above change and replace circular.sliding(count) with (0 to circular.size - count).view.map(i => circular.slice(i, i + count)), it "succeeds" in hitting the time limit instead, so I think sliding might not be optimised for IndexedSeqView.
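For comparison, here is a sketch that sidesteps views altogether. It is not from the original post: it keeps a running count of zeros in a fixed-size window and wraps indices with a modulo, so it needs only a few integer variables beyond the input array. The object name SlidingWindow is an assumption.

```scala
object SlidingWindow {
  def minSwaps(nums: Array[Int]): Int = {
    val n = nums.length
    val ones = nums.count(_ == 1)
    if (ones == 0) 0
    else {
      // Count the zeros in the first window, nums(0) until nums(ones).
      var zeros = 0
      var i = 0
      while (i < ones) { if (nums(i) == 0) zeros += 1; i += 1 }
      var best = zeros
      // Slide the window start around the circle once.
      var start = 1
      while (start < n) {
        if (nums(start - 1) == 0) zeros -= 1               // element leaving
        if (nums((start + ones - 1) % n) == 0) zeros += 1  // element entering
        if (zeros < best) best = zeros
        start += 1
      }
      best
    }
  }
}
```

The number of zeros in a window equals the number of swaps needed to fill that window with 1's, so the minimum over all window positions is the answer.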

Mapping on slices of a List

I was wondering what the best way is to accomplish the following, given a List:
val l = List("a","b","c","d","e","f","g","h","i","j","k","l","m" /*...,x,y,z*/)
For every 5 items (or fewer for the last segment), apply a function like:
...map(_.mkString(","))
Such that I end up with a List that looks like:
List("a,b,c,d,e","f,g,h,i,j","k,l,m,n,o",/*...,*/"u,v,w,x,y","z")
Perhaps there is a common term for this type of list processing, but I'm not aware of it. Essentially I'm grouping items, so should I use zipWithIndex and then take the index modulo 5 to indicate where to partition?
You can use the grouped(n) method on the List.
val l = List("a","b","c","d","e","f","g","h","i","j","k","l","m")
l.grouped(5).map(_.mkString(",")).toList
Results in
List("a,b,c,d,e", "f,g,h,i,j", "k,l,m"): List[String]

Lazily generate partial sums in Scala

I want to produce a lazy list of partial sums and stop when I have found a "suitable" sum. For example, I want to do something like the following:
import scala.util.Random
val str = Stream.continually {
  val i = Random.nextInt
  println("generated " + i)
  List(i)
}
str
  .take(5)
  .scanLeft(List[Int]())(_ ++ _)
  .find(l => !l.forall(_ > 0))
This produces output like the following:
generated -354822103
generated 1841977627
z: Option[List[Int]] = Some(List(-354822103))
This is nice because I've avoided producing the entire list of lists before finding a suitable list. However, it's suboptimal because I generated one extra random number that I don't need (i.e., the second, positive number in this test run). I know I can hand code a solution to do what I want, but is there a way to use the core scala collection library to achieve this result without writing my own recursion?
The above example is just a toy, but the real application involves heavy-duty network traffic for each "retry" as I build up a map until the map is "complete".
EDIT: Note that even substituting take(1) for find(...) results in the generation of a random number even though the returned value List() does not depend on the number. Does anyone know why the number is being generated in this case? I would think scanLeft does not need to fetch an element of the iterable receiving the call to scanLeft in this case.
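One possible direction, offered as a sketch rather than a definitive answer: replace Stream.scanLeft with Iterator.iterate, which calls the step function only when the next element is actually requested. The helper name firstSuitable and its parameters next and maxTries are hypothetical, introduced here for illustration.

```scala
object LazySums {
  // Accumulates values from `next` one at a time and stops at the first
  // accumulated list that contains a non-positive number.
  def firstSuitable(next: () => Int, maxTries: Int): Option[List[Int]] =
    Iterator
      .iterate(List.empty[Int])(acc => acc :+ next())
      .take(maxTries + 1) // the initial empty list plus maxTries attempts
      .find(l => !l.forall(_ > 0))
}
```

With this shape, next is invoked once per list that find inspects after the initial empty one, so no value is generated beyond the one that makes the list suitable.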

Immutable DataStructures In Scala

We know that Scala supports immutable data structures, i.e. each time you update a list, it creates a new object and reference on the heap.
Example
val xs:List[Int] = List.apply(22)
val newList = xs :+ 33
So when I append the second element to the list, it creates a new list containing both 22 and 33. This works exactly like immutable Strings in Java.
So the question is: each time I append an element to the list, a new object is created. This does not look efficient to me.
Are special data structures, such as persistent data structures, used to deal with this? Does anyone know about this?
Appending to a list has O(n) complexity and is inefficient. A general approach is to prepend to a list while building it, and finally reverse it.
Now, your question about creating a new object still applies to the prepend. Note that since xs is immutable, newList just points to xs for the rest of the data after the prepended element.
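A minimal sketch of the prepend-then-reverse approach. The object name and the choice of squares as sample data are assumptions for illustration.

```scala
object BuildThenReverse {
  // Each step prepends in O(1); a single O(n) reverse at the end
  // restores the insertion order.
  def squares(n: Int): List[Int] = {
    var acc = List.empty[Int]
    for (i <- 1 to n) acc = (i * i) :: acc
    acc.reverse
  }
}
```

Building n elements this way costs O(n) in total, whereas appending each element with :+ costs O(n^2).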
While @manojlds is correct in his analysis, the original post asked about the efficiency of duplicating list nodes whenever you do an operation.
As @manojlds said, constructing lists often requires thinking backwards, i.e., building a list and then reversing it. There are a number of other situations where list building requires "needless" copying.
To that end, there is a mutable data structure available in Scala called ListBuffer which you can use to build up your list and then extract the result as an immutable list:
import scala.collection.mutable.ListBuffer
val xsa = ListBuffer[Int](22)
xsa += 33
val newList = xsa.toList
However, the fact that the list data structure is, in general, immutable means that you have some very useful tools to analyze, de-compose and re-compose the list. Many builtin operations take advantage of the immutability. By extension, your own programs can also take advantage of this immutability.

join two sequences in scala [closed]

This is a rewritten question; the previous one was unclear.
1) Introduction
I need some cool structure to store my values. I need to be able to add items to it, and sometimes to fold its elements into something based on a passed foldFunction. Generally, something like scala.collection.immutable.List is great, apart from point 2.
See this in action:
val coolContainer = 1 :: 2 :: 3 :: 4 :: Nil
val folded = coolContainer.foldLeft("")((acc, curr)=> acc + curr)
Yea! I got what I wanted - the shiny "1234" String :)
2) Additional requirements
Then I want to be able to append one of my cool structures to another. Moreover, it is very important that this operation is efficient, because my program will mostly perform such operations. For that reason, an "append two containers" algorithm with O(1) complexity is what I expect.
So, let's just try to append two Lists, one to the other, the standard way with the ++ function.
val locomotive = (1 to 1000000).toList
val vehicle = (1000000 to 2000000).toList
val train = locomotive ++ vehicle
So what is going on here? As most of you probably know, List in the Scala standard library is implemented as a head prepended to a tail, where the tail is another List. The last element of the list is prepended to the empty List, Nil.
This architecture implies that if you join two lists using the ++ function, the algorithm under the hood traverses locomotive through all its elements up to the last item and then replaces the tail (which is Nil) with the other List, vehicle in our example. The complexity is O(size_of_locomotive). Ehh, :/ I want it to be O(1).
Finally the question.
Is there a container in Scala that behaves similarly to List and meets the above requirements?
Deprecated, old question, kept in case you are interested in how it was re-asked.
Basically, I want to choose the best pair of structure and method for appending one sequence to another. Efficiency is the key in this scenario.
Read the snippet below to understand what I want to achieve, only more efficiently:
val seq1 = List(1,2,3)
val seq2 = List(4,5,6)
val seq3 = seq1 ++ seq2 // I am afraid that seq1 and seq2 will be traversed,
// which is not the most efficient way to join two seqs
If you're going to be appending over and over, where eventually the prepended list is going to be much bigger than the appended, your best bet for an immutable data structure is Vector. If the lists are all really short, though, it's hard to beat just plain List; yes, you have to allocate three extra objects but the overhead for Vector is much more.
Keep in mind, though, that you can prepend and reverse (with a List), and that there are mutable data structures that support an efficient append (e.g. ArrayBuffer, or perhaps ListBuffer if you want to convert to a list when you're done appending).
Appending to a List is O(n). For constant-time appending, you can use a ListBuffer and call toList when you want the immutable list.
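For the Vector suggestion above, a minimal sketch of repeated appends to an immutable Vector. The object and method names are hypothetical. The Scala collections performance table lists Vector's append as effectively constant time.

```scala
object VectorAppend {
  // Appends each extra element to the end of `base`. Every :+ returns a
  // new Vector; `base` itself is never modified.
  def appendEach(base: Vector[Int], extra: Seq[Int]): Vector[Int] =
    extra.foldLeft(base)(_ :+ _)
}
```

For example, VectorAppend.appendEach(Vector(1, 2, 3), Seq(4, 5)) yields a Vector holding 1 through 5, and the original Vector(1, 2, 3) is still usable afterwards.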