Mapping on slices of a List - scala

I was wondering what the best way to accomplish the following given a List:
val l = List("a","b","c","d","e","f","g","h","i","j","k","l","m" /*...,x,y,z*/)
For every 5 items (or fewer, for the last segment), apply a function like:
...map(_.mkString(","))
Such that I end up with a List that looks like:
List("a,b,c,d,e","f,g,h,i,j","k,l,m,n,o",/*...,*/"u,v,w,x,y,"z")
Perhaps there is a common term for this type of list processing; however, I'm not aware of it. Essentially I'm grouping items, so should I use zipWithIndex and then mod by 5 on the index to indicate where to partition?

You can use the grouped(n) method on the List.
val l = List("a","b","c","d","e","f","g","h","i","j","k","l","m")
l.grouped(5).map(_.mkString(",")).toList
Results in
List("a,b,c,d,e", "f,g,h,i,j", "k,l,m"): List[String]


Scala immutable list internal implementation

Suppose I have a huge list with elements from 1 to 1 million.
val initialList = List(1,2,3,.....1 million)
and
val myList = List(1,2,3)
Now I apply an operation such as foldLeft on myList, giving initialList as the starting value:
val output = myList.foldLeft(initialList)(_ :+ _)
// result ==>> List(1,2,3,.....1 million, 1 , 2 , 3)
Now here is my question: both lists being immutable, the intermediate lists that were produced were
List(1,2,3,.....1 million, 1)
List(1,2,3,.....1 million, 1 , 2)
List(1,2,3,.....1 million, 1 , 2 , 3)
By the concept of immutability, each time a new list is created and the old one is discarded. So isn't this operation a performance killer in Scala, since a list of 1 million elements has to be copied every time to create the new list?
Please correct me if I am wrong as I am trying to understand the internal implementation of an immutable list.
Thanks in advance.
Yup, this is a performance killer, but it is the cost of having immutable structures (which are amazing and safe, and make programs much less buggy). That's why you should avoid appending to a list if you can. There are many tricks to avoid this issue (try to use accumulators).
For example:
Instead of:
val initialList = List(1,2,3,.....1 million)
val myList = List(1,2,3,...,100)
val output = myList.foldLeft(initialList)(_ :+ _)
You can write:
val initialList = List(1,2,3,.....1 million)
val myList = List(1,2,3,...,100)
val output = List(initialList,myList).flatten
flatten is implemented to copy the first list only once, instead of copying it for every single append.
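Equivalently (a side note, not in the original answer), a single concatenation performs the same one-time copy of the prefix:
val output = initialList ++ myList // copies initialList once; myList itself is reused as the tail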
P.S.
At least adding an element to the front of a list is fast (O(1)), because the old list can be shared.
Consider several lists built by prepending different heads onto a shared (b,c,d) tail: memory sharing means the computer keeps only one copy of the (b,c,d) cells. But if you want to append to the end of one of those lists, you cannot modify it in place, because you would destroy every other list sharing that tail! That's why you have to copy the first list.
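A minimal sketch of that sharing (the names are just for illustration):
val base = List("b", "c", "d")
val foo  = "a" :: base // prepending reuses base as the tail: no copying
val raz  = "x" :: base // another list sharing the very same tail
assert(foo.tail eq base) // reference equality: the (b,c,d) cells exist only once
assert(raz.tail eq base)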
Appending to a List is not a good idea, because List has linear cost for appending. So, if you can,
either prepend to the List (Lists have constant-time prepend),
or choose another collection that is efficient for appending. That would be a Queue (see the sketch below).
For the performance characteristics per operation of most Scala collections, see:
https://docs.scala-lang.org/overviews/collections/performance-characteristics.html
Note that, depending on your requirements, you may also build your own smarter collection, such as a chained iterable, for example.
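A rough sketch of the Queue option, reusing the question's names (illustrative, not from the original answer):
import scala.collection.immutable.Queue

// enqueue is effectively constant time, so grow a Queue during the fold
// and pay for a single List conversion at the end.
val output = myList.foldLeft(Queue(initialList: _*))(_ enqueue _).toList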

filtering only one side of a list/iterable in scala

I'd like to remove only the last few elements of a List (or Seq), and avoid traversing all elements (and avoid applying the filter function to all of them).
Let's say, for example, I have a random strictly increasing list of values:
import scala.util.Random.nextInt
val r = (1 to 100).map(_ => nextInt(10)+1).scanLeft(0)(_+_)
And I want to remove the elements greater than, say, 300. I can do this like that:
r.filter(_<300)
but this method traverses the whole list. So, is it possible to filter a list from only one end? Something like a filterRight method?
Subquestions:
Also, would it be possible for a list of values that is not strictly increasing? I.e. remove elements from the end of the list until one element is, say, below 300.
If it is not possible for a List/Seq, what about an IndexedSeq like Vector or Array?
Solutions
I selected @elm's solution because it answers the subquestion for general lists, not only (strictly) increasing ones.
However, @dcastro's solution looks more efficient, as it doesn't do two reversals.
First, note SI-4247: the standard library has dropWhile but no dropRightWhile.
Still, a simple implementation that conveys the desired semantics:
def dropRightWhile[A](xs: Seq[A], p: A => Boolean) =
  xs.reverse.dropWhile(p).reverse
or equivalently
implicit class OpsSeq[A](val xs: Seq[A]) extends AnyVal {
  def dropRightWhile(p: A => Boolean) = xs.reverse.dropWhile(p).reverse
}
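For example (made-up input, not from the original answer), this also covers the subquestion about values that are not strictly increasing, since only the trailing run of matching elements is removed:
val xs = Seq(100, 400, 250, 310, 320)
xs.dropRightWhile(_ > 300) // Seq(100, 400, 250): the earlier 400 survives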
You're looking for something like dropRightWhile, which doesn't exist in the standard library (but has been requested before).
I think your best bet is:
r.takeWhile(_<300)
Since it's an increasing list of values, you can stop performing any checks as soon as you encounter the first element that is not below 300.
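As for the IndexedSeq subquestion: on a sorted Vector, the cutoff can be found by binary search instead of a scan. A sketch using the standard library's Searching utility (the example values are made up):
import scala.collection.Searching._

val v = Vector(1, 5, 42, 120, 299, 305, 410) // sorted values
val cut = v.search(300) match { // binary search, O(log n)
  case Found(i)          => i
  case InsertionPoint(i) => i
}
v.take(cut) // Vector(1, 5, 42, 120, 299)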

How to select the second smallest element from sorted list?

How can I select the second smallest element after a list has been sorted?
With this code I get an error, and I do not understand why.
object find_the_median {
  val L = List(2, 4, 1, 2, 5, 6, 7, 2)
  L(2)
  L.sorted(2) // FIXME returns an error
}
It's because sorted implicitly receives an Ordering argument, and when you write L.sorted(2) the typechecker thinks you want to pass 2 as the Ordering. So one way to do it in one line is:
L.sorted.apply(2)
or, to avoid the apply, pass the ordering explicitly:
L.sorted(implicitly[Ordering[Int]])(2)
which I admit is somewhat confusing, so I think the best option is in two lines:
val sorted = L.sorted
sorted(2)
(You may also want to adhere to the Scala convention of naming variables with lowercase).
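One more note on the title: indices are zero-based, so the second smallest element is at index 1, not 2. With the question's data:
val sorted = List(2, 4, 1, 2, 5, 6, 7, 2).sorted // List(1, 2, 2, 2, 4, 5, 6, 7)
sorted(1) // 2 -- the second smallest
sorted(2) // 2 -- the third element, which the question's code targets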

Lazily generate partial sums in Scala

I want to produce a lazy list of partial sums and stop when I have found a "suitable" sum. For example, I want to do something like the following:
import scala.util.Random

val str = Stream.continually {
  val i = Random.nextInt
  println("generated " + i)
  List(i)
}
str
  .take(5)
  .scanLeft(List[Int]())(_ ++ _)
  .find(l => !l.forall(_ > 0))
This produces output like the following:
generated -354822103
generated 1841977627
z: Option[List[Int]] = Some(List(-354822103))
This is nice because I've avoided producing the entire list of lists before finding a suitable list. However, it's suboptimal because I generated one extra random number that I don't need (i.e., the second, positive number in this test run). I know I can hand-code a solution that does what I want, but is there a way to use the core Scala collection library to achieve this result without writing my own recursion?
The above example is just a toy, but the real application involves heavy-duty network traffic for each "retry" as I build up a map until the map is "complete".
EDIT: Note that even substituting take(1) for find(...) results in the generation of a random number, even though the returned value List() does not depend on that number. Does anyone know why the number is being generated in this case? I would think scanLeft would not need to fetch an element of the underlying iterable here.
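For what it's worth, one possible workaround (a sketch, not from the original thread) is to use Iterator instead of Stream; Iterator's scanLeft is lazy, so find should stop pulling from the source as soon as a match is found:
import scala.util.Random

val it = Iterator.continually {
  val i = Random.nextInt
  println("generated " + i)
  List(i)
}
it.take(5)
  .scanLeft(List[Int]())(_ ++ _)
  .find(l => !l.forall(_ > 0))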

How to correctly get the current loop count from an Iterator in Scala

I am looping over the following lines from a CSV file to parse them. I want to identify the first line, since it's the header. What's the best way of doing this instead of keeping a var counter?
var counter = 0
for (line <- lines) {
  println(CsvParser.parse(line, counter))
  counter += 1 // note: Scala has no ++ operator
}
I know there's got to be a better way to do this; I'm a newbie to Scala.
Try zipWithIndex:
for (line <- lines.zipWithIndex) {
  println(CsvParser.parse(line._1, line._2))
}
@tenshi suggested the following improvement with pattern matching:
for ((line, count) <- lines.zipWithIndex) {
  println(CsvParser.parse(line, count))
}
I totally agree with the given answer, but I have to point out something important, which I initially planned to put in a simple comment.
However, it would be quite long, so let me set it out as a variant answer.
It's perfectly true that the zip* methods are helpful for pairing up elements of lists, but they have the drawback that they traverse the lists in order to build the pairs.
A common recommendation is therefore to sequence the operations required on the lists in a view, so that they are all combined and applied only when a result is actually required. A result is produced whenever the return value isn't an Iterable; foreach, for instance, forces one.
Now, regarding the first answer: if lines is the list of lines of a very big file (or even an enumeratee over it), zipWithIndex will go through all of them and produce a table (an Iterable of tuples). Then the for-comprehension will go through the same number of items again.
Finally, you've increased the running time by n, where n is the length of lines, and added a memory footprint of roughly m + n*16, where m is the footprint of lines.
Proposition
lines.view.zipWithIndex map Function.tupled(CsvParser.parse) foreach println
A few words left (I promise): lines.view will create something like a scala.collection.SeqView that holds all the further "mapping" functions that produce a new Iterable, such as zipWithIndex and map.
Moreover, I think the expression is more elegant because it reads logically:
"For lines, create a view that will zip each item with its index; the result has to be mapped by the parser, whose result must be printed".
HTH.