Join two sequences in Scala [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
This is a rewritten question; the previous one was unclear.
1) Introduction
I need some cool structure that stores my values. My cool structure must let me add items to it, and sometimes I need it to fold the contained elements into something, based on a passed foldFunction. Generally, something like scala.collection.immutable.List is great, except for point 2.
See this in action:
val coolContainer = 1 :: 2 :: 3 :: 4 :: Nil
val folded = coolContainer.foldLeft("")((acc, curr)=> acc + curr)
Yay! I got what I wanted: the shiny "1234" String :)
2) Additional requirements
Then I want to be able to append one cool structure to another. Moreover, it is very important that this operation is efficient, because my program will mostly be doing such operations. Because of that, an O(1) algorithm for "appending two containers" is highly desired.
So, let's just try to append one List to another the standard way, with the ++ method.
val locomotive = (1 to 1000000).toList
val vehicle = (1000000 to 2000000).toList
val train = locomotive ++ vehicle
So what is going on here? As most of you probably know, List in the Scala standard library is implemented as a head prepended to a tail, where the tail is another List. The last element of the list is prepended to the empty list Nil.
This design implies that when you join two lists with ++, the algorithm under the hood has to walk through all the elements of locomotive and copy each cell, because the immutable last cell (whose tail is Nil) cannot be modified in place. The last copied cell then points to the other List, vehicle in our example, which is reused as-is. The complexity is O(size_of_locomotive). Ehh :/ I want it to be O(1).
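A small check of what ++ does with two Lists, assuming Scala 2.13 or later (Scala 3 uses the same collections library). The left operand's cells are copied, while the right operand is reused as the tail of the result. The sizes are shrunk so it runs instantly, and eq compares object identity rather than contents.

```scala
object AppendSharing {
  val locomotive: List[Int] = (1 to 1000).toList
  val vehicle: List[Int] = (1000 to 2000).toList

  // ++ allocates a new cons cell for every element of `locomotive`...
  val train: List[Int] = locomotive ++ vehicle

  // ...but the last copied cell points at the existing `vehicle` list,
  // so skipping the copied prefix lands on the very same object.
  val suffixIsShared: Boolean = train.drop(locomotive.size) eq vehicle

  // The prefix cells, by contrast, are fresh copies.
  val prefixIsShared: Boolean = train.tail eq locomotive.tail
}
```

So the cost is proportional to the left list only; the right list is never traversed.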
Finally, the question:
Is there a Scala container that behaves similarly to List and meets the above requirements?
Deprecated old question, kept in case you are interested in how it was re-asked:
Basically, I want to choose the best pair of structure and method for appending one sequence to another. Efficiency is the key in this scenario.
Read the snippet below to understand what I want to achieve, only more efficiently:
val seq1 = List(1,2,3)
val seq2 = List(4,5,6)
val seq3 = seq1 ++ seq2 // I am afraid that seq1 and seq2 will be traversed,
                        // which is not the most efficient way to join two seqs

If you're going to be appending over and over, where eventually the prepended list is going to be much bigger than the appended one, your best bet for an immutable data structure is Vector. If the lists are all really short, though, it's hard to beat plain List; yes, you have to allocate three extra objects, but the overhead for Vector is much larger.
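A minimal sketch of repeated appending with an immutable Vector, assuming Scala 2.13+. The `VectorAppend` object and its methods are hypothetical names for illustration. Per the collections performance docs, `:+` on Vector is "effectively constant" time, whereas on List every `:+` copies the whole list. Concatenating two Vectors with ++ is still not O(1), though.

```scala
object VectorAppend {
  // Append elements one at a time; each :+ is effectively constant time on Vector.
  def build(n: Int): Vector[Int] =
    (1 to n).foldLeft(Vector.empty[Int])((acc, i) => acc :+ i)

  // Folding works exactly as it does on List.
  def joined(n: Int): String =
    build(n).foldLeft("")((acc, curr) => acc + curr)
}
```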
Keep in mind, though, that you can prepend and reverse (with a List), and that there are mutable data structures that support an efficient append (e.g. ArrayBuffer, or perhaps ListBuffer if you want to convert to a list when you're done appending).
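A sketch of the mutable-buffer route with ArrayBuffer, assuming Scala 2.13+. The `ArrayBufferAppend.collect` helper is hypothetical. `+=` appends one element in amortized constant time, `++=` appends a whole collection at a cost proportional to that collection's size, and an immutable snapshot is taken at the end.

```scala
import scala.collection.mutable.ArrayBuffer

object ArrayBufferAppend {
  // Hypothetical helper: gather several chunks plus one trailing element.
  def collect(chunks: List[List[Int]], last: Int): Vector[Int] = {
    val buf = ArrayBuffer.empty[Int]
    chunks.foreach(chunk => buf ++= chunk) // append whole chunks
    buf += last                            // append a single element
    buf.toVector                           // immutable snapshot when done
  }
}
```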

Appending to a List is O(n); for constant-time appending you can use a ListBuffer and call toList when you want the immutable list.
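A minimal sketch of joining several lists through one ListBuffer, assuming Scala 2.13+. The `ListBufferJoin.joinAll` helper is hypothetical. Each single-element append is O(1). `++=` costs the size of the list being appended, and `toList` at the end returns the immutable List without another copy.

```scala
import scala.collection.mutable.ListBuffer

object ListBufferJoin {
  // Hypothetical helper: concatenates many lists with one buffer.
  def joinAll[A](lists: List[List[A]]): List[A] = {
    val buf = ListBuffer.empty[A]
    lists.foreach(l => buf ++= l) // append each list's elements to the end
    buf.toList                    // hand back an immutable List
  }
}
```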

Related

How can I count occurrences in a list using this function in Scala? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 months ago.
I've found multiple ways of doing this already but haven't been able to apply them here, since I'm not familiar with the language yet, so maybe a more direct question could help.
def occurrences(cars: List[Char]): List [(Char, Int)] = {
. . .
}
Thanks in advance!
If you are not familiar with the language, this answer will probably not help you very much ... but then, it's not very likely that another will. My advice to you would be to grab a book and learn the basics of the language first, before jumping into asking other people to solve random made up problems for you.
As to counting characters in a string, it would be something like this:
chars.groupBy(identity).map { case (c, group) => c -> group.size }.toList
As both answers here show, there are plenty of ways to solve this using the stdlib; as I always say, the Scaladoc is your friend.
Also, as both answers suggest, it is recommended that you learn the language.
However, I wanted to show another cool way of solving this problem using cats.
import cats.syntax.all._
chars.foldMap(char => Map(char -> 1))
foldMap is an extension method that cats adds to List[Char]; as the name suggests, it does the mapping and the folding in the same step.
The folding is done using the default combine operation (the Monoid instance) of the type returned by the mapping function. The default combine operation of a Map joins both maps and combines the values of matching keys using their own default combine operation, which for Ints is sum.
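To make the Map-combining behaviour concrete without a cats dependency, here is a stdlib-only sketch, assuming Scala 2.13+. `MapMonoidSketch` and its `combine` are hypothetical stand-ins for what cats' Monoid for Map[Char, Int] does, not the cats API itself: merge two maps and sum the values of keys present in both, then fold the per-character singleton maps with it.

```scala
object MapMonoidSketch {
  // Hypothetical stand-in for cats' combine on Map[Char, Int]:
  // union of keys, with values of shared keys added together.
  def combine(a: Map[Char, Int], b: Map[Char, Int]): Map[Char, Int] =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0) + b.getOrElse(k, 0))).toMap

  // foldMap = map each element to a value, then fold with combine,
  // starting from the "empty" value (an empty map).
  def occurrences(chars: List[Char]): Map[Char, Int] =
    chars.map(c => Map(c -> 1)).foldLeft(Map.empty[Char, Int])(combine)
}
```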
First of all, cars is not a good name for a list of characters :) (I know, just kidding). You should definitely get more into the language and read more about some of the most used functions, like what they are, what they do and why they exist, but just to answer this specific question, you can do this:
Approach No.1:
def occurrences(chars: List[Char]): List[(Char, Int)] =
  chars.foldLeft(Map.empty[Char, Int])(
    (occurrencesMap, newChar) => occurrencesMap.updatedWith(newChar) {
      case Some(count) => Some(count + 1)
      case None => Some(1)
    }
  ).toList
Explanation:
Basically, we iterate over the list and aggregate the data in a structure that maps each character to its number of occurrences, so we use a Map[Char, Int]. During the iteration, if the aggregator (occurrencesMap in this case) already contains the char, we increment its count; otherwise we initialize it with 1.
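The same fold can be written with `updated` and `getOrElse` instead of `updatedWith`, which only exists since Scala 2.13. This is a sketch of that variant; the object name `CountWithUpdated` is hypothetical.

```scala
object CountWithUpdated {
  // Same aggregation as Approach No.1: look up the current count
  // (0 if the char is new) and store count + 1.
  def occurrences(chars: List[Char]): List[(Char, Int)] =
    chars.foldLeft(Map.empty[Char, Int]) { (counts, c) =>
      counts.updated(c, counts.getOrElse(c, 0) + 1)
    }.toList
}
```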
Approach No.2:
I can also suggest this one-line approach, but I think the first approach is more beginner-friendly:
def occurrences(chars: List[Char]): List[(Char, Int)] =
  chars.groupMapReduce[Char, Int](identity)(_ => 1)(_ + _).toList
Explanation:
As the function name says, groupMapReduce first groups the values inside the list by a key function, then maps each element to a value, and then reduces the resulting values for each key into a single item. So we use identity to group the items by the character itself. Each time a character is found, we have found exactly one occurrence, right? So we ignore the character's value and just return 1 :D (_ => 1). Conceptually, each key then has values like 'c' -> 1, 1, 1, and the reduce function _ + _ adds them all together. (With groupMap instead of groupMapReduce, you would get 'c' -> List(1, 1, 1) and could use .length to count the occurrences.)
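To see the intermediate values that groupMapReduce reduces, here is a sketch using groupMap, assuming Scala 2.13+. The object name `CountWithGroupMap` is hypothetical. groupMap keeps each key's mapped values as a List of 1s, and summing them gives the counts. groupMapReduce reaches the same result without building those lists.

```scala
object CountWithGroupMap {
  // Each char maps to a List of 1s, one per occurrence.
  def ones(chars: List[Char]): Map[Char, List[Int]] =
    chars.groupMap[Char, Int](identity)(_ => 1)

  // Summing each list (equivalently, taking its length) yields the counts.
  def occurrences(chars: List[Char]): List[(Char, Int)] =
    ones(chars).map { case (c, list) => c -> list.sum }.toList
}
```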

In Scala, is there an alternative to using :: to append to the front or end of a list?

If I have val x = List(2,3,5,8) and I want to append the element 4 to the list, x::a or a::x work as expected. But is there an alternative to this notation?
If I understood your question correctly, we have:
val x = List(2,3,5,8)
val a = 4
and you wish to append (in immutable terms) a to x.
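a::x works but returns a list with 4 prepended, so it's not what you asked for. x::a will not compile at all because, well, you can't really prepend a list to an integer: operators ending in : are right-associative, so x :: a means a.::(x), and Int has no :: method.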
What you can do, for example, is use the :+ method:
x :+ a // Returns List(2, 3, 5, 8, 4)
Notice however that appending to a List requires linear time and may therefore be a bad idea, depending on your particular application. Consider using a different data structure if the performance of this operation is important. More information here.
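For reference, a sketch of the operator and named-method forms, assuming Scala 2.13+ (`appended` and `prepended` were added in 2.13). The values reuse x and a from the answer.

```scala
object ListAddAlternatives {
  val x = List(2, 3, 5, 8)
  val a = 4

  val appendedOp: List[Int] = x :+ a             // List(2, 3, 5, 8, 4), linear time
  val prependedOp: List[Int] = a +: x            // List(4, 2, 3, 5, 8), constant time
  val appendedNamed: List[Int] = x.appended(a)   // same as x :+ a
  val prependedNamed: List[Int] = x.prepended(a) // same as a +: x
  val concatenated: List[Int] = x ++ List(a)     // same result as x :+ a
}
```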

Scala immutable list internal implementation

Suppose I have a huge list with elements from 1 to 1 million:
val initialList = (1 to 1000000).toList
and
val myList = List(1,2,3)
Now, when I apply an operation such as foldLeft on myList, giving initialList as the starting value:
val output = myList.foldLeft(initialList)(_ :+ _)
// result ==>> List(1,2,3,.....1 million, 1 , 2 , 3)
Now here comes my question: since both lists are immutable, the intermediate lists that were produced were:
List(1,2,3,.....1 million, 1)
List(1,2,3,.....1 million, 1 , 2)
List(1,2,3,.....1 million, 1 , 2 , 3)
By the concept of immutability, a new list is created every time and the old one is discarded. So isn't this operation a performance killer in Scala, since every time a list of 1 million elements has to be copied to create a new list?
Please correct me if I am wrong as I am trying to understand the internal implementation of an immutable list.
Thanks in advance.
Yup, this is a performance killer, but it is the cost of having immutable structures (which are amazing, safe, and make programs much less buggy). That's why you should avoid appending to lists when you can. There are many tricks that avoid this issue (try to use accumulators).
For example:
Instead of:
val initialList = (1 to 1000000).toList
val myList = (1 to 100).toList
val output = myList.foldLeft(initialList)(_ :+ _)
You can write:
val initialList = (1 to 1000000).toList
val myList = (1 to 100).toList
val output = List(initialList,myList).flatten
flatten builds the result in a single pass, so the first list is copied only once instead of once for every single append.
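A quick equivalence check of the two approaches, assuming Scala 2.13+ and using smaller sizes than in the question so it runs fast. It also includes plain `++`, which for two Lists copies the first list once and reuses the second as the tail. The object and field names are hypothetical.

```scala
object AppendStrategies {
  val initialList: List[Int] = (1 to 10000).toList // smaller than 1 million for speed
  val myList: List[Int] = (1 to 100).toList

  // Copies the whole growing list on every :+ (100 full copies here).
  val slow: List[Int] = myList.foldLeft(initialList)(_ :+ _)

  // Builds the combined list in one pass.
  val flattened: List[Int] = List(initialList, myList).flatten

  // Copies initialList once and shares myList as the tail.
  val concatenated: List[Int] = initialList ++ myList
}
```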
P.S.
At least adding an element to the front of a list is fast (O(1)), because the old list can be shared. Let's look at this example:
You can see how memory sharing works for immutable linked lists: the computer keeps only one copy of the (b,c,d) end. But if you want to append bar to the end of baz, you cannot modify baz, because you would destroy foo, bar and raz! That's why you have to copy the first list.
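A small sketch of that sharing, assuming Scala 2.13+. The lists `shared`, `foo` and `bar` are hypothetical values mirroring the example. Prepending reuses the common tail, while appending with `:+` has to build a completely new list.

```scala
object TailSharing {
  // Several lists ending in the same (b, c, d) tail.
  val shared: List[Char] = List('b', 'c', 'd')
  val foo: List[Char] = 'a' :: shared // O(1): one new cell, tail reused
  val bar: List[Char] = 'x' :: shared

  // `shared` cannot be modified in place (foo and bar depend on it),
  // so :+ copies every cell of foo into a brand new list.
  val fooAppended: List[Char] = foo :+ 'e'
}
```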
Appending to a List is not a good idea because List has linear cost for appending. So, if you can,
either prepend to the List (List has constant-time prepend)
or choose another collection that is efficient for appending, such as a Queue.
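A sketch of appending with an immutable Queue, assuming Scala 2.13+. The `QueueAppend.build` helper is hypothetical. The standard immutable Queue is backed by two Lists: `enqueue` prepends to the "in" list, which gives amortized constant-time appends, and elements still come out in insertion order.

```scala
import scala.collection.immutable.Queue

object QueueAppend {
  // Each enqueue adds to the back of the queue in amortized O(1).
  def build(n: Int): Queue[Int] =
    (1 to n).foldLeft(Queue.empty[Int])((q, i) => q.enqueue(i))
}
```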
For a list of performance characteristics per operation on most Scala collections, see:
https://docs.scala-lang.org/overviews/collections/performance-characteristics.html
Note that, depending on your requirements, you may also build your own smarter collection, such as a chained iterable.
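A minimal, hypothetical sketch of such a collection: an "append tree" where `++` is O(1) because it only allocates one Append node, and the traversal cost is paid once, when folding. All names here are illustrative assumptions. The cats library ships a production data type with a similar idea (cats.data.Chain). The fold uses an explicit stack so deeply nested appends do not overflow the call stack.

```scala
sealed trait Chain[+A] {
  // O(1) concatenation: no elements are copied or traversed.
  def ++[B >: A](that: Chain[B]): Chain[B] = (this, that) match {
    case (Chain.Empty, _) => that
    case (_, Chain.Empty) => this
    case _                => Chain.Append(this, that)
  }

  // Left-to-right fold with an explicit stack instead of recursion.
  def foldLeft[B](z: B)(f: (B, A) => B): B = {
    var acc = z
    var stack: List[Chain[A]] = List(this)
    while (stack.nonEmpty) {
      val current = stack.head
      stack = stack.tail
      current match {
        case Chain.Empty        => ()
        case Chain.Single(a)    => acc = f(acc, a)
        case Chain.Append(l, r) => stack = l :: r :: stack // visit l before r
      }
    }
    acc
  }

  def toList: List[A] = foldLeft(List.empty[A])((acc, a) => a :: acc).reverse
}

object Chain {
  case object Empty extends Chain[Nothing]
  final case class Single[A](value: A) extends Chain[A]
  final case class Append[A](left: Chain[A], right: Chain[A]) extends Chain[A]

  def apply[A](as: A*): Chain[A] =
    as.foldLeft(Empty: Chain[A])((c, a) => c ++ Single(a))
}
```

The trade-off is that indexed access and other List-style operations are no longer cheap; this shape only pays off when appends vastly outnumber traversals.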

Immutable DataStructures In Scala

We know that Scala supports immutable data structures, i.e., each time you update a list, a new object and reference are created on the heap.
Example
val xs: List[Int] = List(22)
val newList = xs ++ List(33)
So when I append the second element to the list, it creates a new list that contains both 22 and 33. This works exactly like immutable Strings in Java.
So the question is: each time I append an element to the list, a new object is created. This does not look efficient to me.
Are special data structures, such as persistent data structures, used when dealing with this? Does anyone know about this?
Appending to a list has O(n) complexity and is inefficient. A general approach is to prepend to a list while building it, and finally reverse it.
Your question about creating a new object still applies to prepending. Note, though, that since xs is immutable, newList just points to xs for the rest of its data after the prepend.
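A sketch of the prepend-then-reverse pattern, assuming Scala 2.13+. The `PrependThenReverse.squares` helper is hypothetical. Each prepend is O(1) and shares the existing list, and a single reverse at the end costs O(n) once, instead of paying O(n) for every append.

```scala
object PrependThenReverse {
  // Build in reverse order by prepending, then flip once at the end.
  def squares(n: Int): List[Int] =
    (1 to n).foldLeft(List.empty[Int])((acc, i) => (i * i) :: acc).reverse
}
```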
While #manojlds is correct in his analysis, the original post asked about the efficiency of duplicating list nodes whenever you do an operation.
As #manojlds said, constructing lists often requires thinking backwards, i.e., building a list and then reversing it. There are a number of other situations where list building requires "needless" copying.
To that end, there is a mutable data structure available in Scala called ListBuffer which you can use to build up your list and then extract the result as an immutable list:
import scala.collection.mutable.ListBuffer
val xsa = ListBuffer[Int](22)
xsa += 33
val newList = xsa.toList // List(22, 33)
However, the fact that the list data structure is, in general, immutable means that you have some very useful tools to analyze, decompose and recompose the list. Many built-in operations take advantage of the immutability. By extension, your own programs can also take advantage of this immutability.

scala.collection.breakOut vs views

This SO answer describes how scala.collection.breakOut can be used to prevent creating wasteful intermediate collections. For example, here we create an intermediate Seq[(String,String)]:
val m = List("A", "B", "C").map(x => x -> x).toMap
By using breakOut we can prevent the creation of this intermediate Seq:
val m: Map[String,String] = List("A", "B", "C").map(x => x -> x)(breakOut)
Views solve the same problem and in addition access elements lazily:
val m = (List("A", "B", "C").view map (x => x -> x)).toMap
I am assuming the creation of the View wrappers is fairly cheap, so my question is: Is there any real reason to use breakOut over Views?
You're going to make a trip from England to France.
With view: you're taking a set of notes in your notebook and, boom, once you've called .force() you start doing all of them: buy a ticket, board the plane, ...
With breakOut: you're departing and, boom, you're in Paris looking at the Eiffel Tower. You don't remember exactly how you got there, but you did make the trip; you just didn't make any memories.
Bad analogy, but I hope this gives you a taste of the difference between them.
I don't think views and breakOut are identical.
A breakOut is a CanBuildFrom implementation used to simplify transformation operations by eliminating intermediary steps, e.g., getting from A to B without the intermediary collection. Using breakOut means letting Scala choose the appropriate builder object for maximum efficiency of producing new items in a given scenario. More details here.
Views deal with a different type of efficiency, the main sales pitch being: "No more new objects". Views store light references to objects to tackle different usage scenarios: lazy access, etc.
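To illustrate the lazy access, here is a sketch assuming Scala 2.13+. The counter and object names are hypothetical instrumentation. Creating the view runs nothing, and every traversal re-runs the mapping function, because views do not cache results.

```scala
object ViewLaziness {
  // Counts how many times the mapping function actually runs.
  var calls = 0

  // Creating the view does not apply the function yet.
  val doubled = List(1, 2, 3).view.map { x => calls += 1; x * 2 }
  val callsAfterCreation: Int = calls

  // Each traversal evaluates the function again.
  val first: List[Int] = doubled.toList
  val second: List[Int] = doubled.toList
  val callsAfterTwoTraversals: Int = calls
}
```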
Bottom line:
If you map on a view, you may still get an intermediary collection of references created before the expected result can be produced. You could still get better performance from:
collection.view.map(someFn)(breakOut)
than from:
collection.view.map(someFn)
As of Scala 2.13, this is no longer a concern: breakOut has been removed, and views are the recommended replacement.
Scala 2.13 Collections Rework
Views are also the recommended replacement for collection.breakOut.
For example,
val s: Seq[Int] = ...
val set: Set[String] = s.map(_.toString)(collection.breakOut)
can be expressed with the same performance characteristics as:
val s: Seq[Int] = ...
val set = s.view.map(_.toString).to(Set)
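A runnable version of the migration pattern, assuming Scala 2.13+. The sample data for s is an assumption, since the quoted snippet elides it. The second value shows the same idea applied to the Map example from the question.

```scala
object BreakOutReplacement {
  val s: Seq[Int] = Seq(1, 2, 3, 2) // sample data (assumed)

  // Replacement for s.map(_.toString)(collection.breakOut): the view avoids
  // an intermediate Seq[String], and .to(Set) builds the target directly.
  val set: Set[String] = s.view.map(_.toString).to(Set)

  // Same idea for List(...).map(x => x -> x)(breakOut) into a Map.
  val m: Map[String, String] = List("A", "B", "C").view.map(x => x -> x).toMap
}
```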
What flavian said.
One use case for views is to conserve memory. For example, if you had a million-character-long string original and needed to use, one by one, all of the million suffixes of that string, you might use a collection of views on the original string:
val v = original.view
val suffixes = v.tails
Then you might loop over the suffixes one by one, using suffix.force() to convert each back to a string within the loop, thus holding only one in memory at a time. Of course, you could do the same thing by iterating with your own loop over the indices of the original string, rather than creating any kind of collection of the suffixes.
Another use-case is when creation of the derived objects is expensive, you need them in a collection (say, as values in a map), but you only will access a few, and you don't know which ones.
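A sketch of that second use case, assuming Scala 2.13+ where `Map#view.mapValues` returns a lazy MapView. The `expensive` function and the counter are hypothetical instrumentation. Only the value that is actually looked up gets derived.

```scala
object SelectiveAccess {
  var computed = 0

  // Hypothetical stand-in for an expensive derivation.
  def expensive(i: Int): String = { computed += 1; s"item-$i" }

  val keys: Map[Int, Int] = (0 until 1000).map(i => i -> i).toMap

  // Lazy map view: values are derived only when requested.
  val derived = keys.view.mapValues(expensive)

  val picked: Option[String] = derived.get(42)
  val computedAfterOneLookup: Int = computed
}
```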
If you really have a case where picking between them makes sense, prefer breakOut unless there's a good argument for using view (like those above).
Views require more code changes and care than breakOut, in that you need to add force() where needed. Depending on context, failure to do so is often only detected at run-time. With breakOut, generally if it compiles, it's right.
In cases where view does not apply, breakOut will be faster, since view generation and forcing are skipped.
If you use a debugger, you can inspect the collection contents, which you can't meaningfully do with a collection of views.