Extract list to multiple distinct list - scala

How can I extract Scala list to List with multiple distinct list in Scala?
From
val l = List(1,2,6,3,5,4,4,3,4,1)
to
List(List(1,2,3,4,5,6),List(1,3,4),List(4))

Here's a (rather inefficient) way to do this: group by value, sort result by size of group, then use first group as a basis for per-index scan of the original groups to build the distinct lists:
scala> val l = List(1,2,6,3,5,4,4,3,4,1)
l: List[Int] = List(1, 2, 6, 3, 5, 4, 4, 3, 4, 1)
scala> val groups = l.groupBy(identity).values.toList.sortBy(- _.size)
groups: List[List[Int]] = List(List(4, 4, 4), List(1, 1), List(3, 3), List(5), List(6), List(2))
scala> groups.head.zipWithIndex.map { case (_, i) => groups.flatMap(_.drop(i).headOption) }
res9: List[List[Int]] = List(List(4, 1, 3, 5, 6, 2), List(4, 1, 3), List(4))

An alternative approach after grouping like in the first answer by #TzachZohar is to keep taking one element from each list until all lists are empty:
val groups = l.groupBy(identity).values
Iterator
// continue removing the first element from every sublist, and discard empty tails
.iterate(groups)(_ collect { case _ :: (rest # (_ :: _)) => rest } )
// stop when all sublists become empty and are removed
.takeWhile(_.nonEmpty)
// build and sort result lists
.map(_.map(_.head).toList.sorted)
.toList

And here's another option - scanning the input N times with N being the largest amount of repetitions of a single value:
// this function splits input list into two:
// all duplicate values, and the longest list of unique values
def collectDistinct[A](l: List[A]): (List[A], List[A]) = l.foldLeft((List[A](), List[A]())) {
case ((remaining, distinct), candidate) if distinct.contains(candidate) => (candidate :: remaining, distinct)
case ((remaining, distinct), candidate) => (remaining, candidate :: distinct)
}
// this recursive function takes a list of "remaining" values,
// and a list of distinct groups, and adds distinct groups to the list
// until "remaining" is empty
#tailrec
def distinctGroups[A](remaining: List[A], groups: List[List[A]]): List[List[A]] = remaining match {
case Nil => groups
case _ => collectDistinct(remaining) match {
case (next, group) => distinctGroups(next, group :: groups)
}
}
// all second function with our input and an empty list of groups to begin with:
val result = distinctGroups(l, List())

Consider this approach:
trait Proc {
def process(v:Int): Proc
}
case object Empty extends Proc {
override def process(v:Int) = Processor(v, Map(0 -> List(v)), 0)
}
case class Processor(prev:Int, map:Map[Int, List[Int]], lastTarget:Int) extends Proc {
override def process(v:Int) = {
val target = if (prev==v) lastTarget+1 else 0
Processor(v, map + (target -> (v::map.getOrElse(target, Nil))), target)
}
}
list.sorted.foldLeft[Proc](Empty) {
case (acc, item) => acc.process(item)
}
Here we have simple state machine. We iterate over sorted list with initial state 'Empty'. Once 'Empty' processes item, it produces next state 'Processor'.
Processor has previous value in 'prev' and accumulated map of already grouped items. It also has lastTarget - the index of list where last write occured.
The only thing 'Processor' does is calculating the target for current processing item: if it is the same as previous, it takes next index, otherwise it starts from the beginning with index 0.

Related

Looping through a list of tuples in Scala

I have a sample List as below
List[(String, Object)]
How can I loop through this list using for?
I want to do something like
for(str <- strlist)
but for the 2d list above. What would be placeholder for str?
Here it is,
scala> val fruits: List[(Int, String)] = List((1, "apple"), (2, "orange"))
fruits: List[(Int, String)] = List((1,apple), (2,orange))
scala>
scala> fruits.foreach {
| case (id, name) => {
| println(s"$id is $name")
| }
| }
1 is apple
2 is orange
Note: The expected type requires a one-argument function accepting a 2-Tuple.
Consider a pattern matching anonymous function, { case (id, name) => ... }
Easy to copy code:
val fruits: List[(Int, String)] = List((1, "apple"), (2, "orange"))
fruits.foreach {
case (id, name) => {
println(s"$id is $name")
}
}
With for you can extract the elements of the tuple,
for ( (s,o) <- list ) yield f(s,o)
I will suggest using map, filter,fold or foreach(whatever suits your need) rather than iterating over a collection using loop.
Edit 1:
e.g
if you want to apply some func foo(tuple) on each element
val newList=oldList.map(tuple=>foo(tuple))
val tupleStrings=tupleList.map(tuple=>tuple._1) //in your situation
if you want to filter according to some boolean condition
val newList=oldList.filter(tuple=>someCondition(tuple))
or simply if you want to print your List
oldList.foreach(tuple=>println(tuple)) //assuming tuple is printable
you can find example and similar functions here https://twitter.github.io/scala_school/collections.html
If you just want to get the strings you could map over your list of tuples like this:
// Just some example object
case class MyObj(i: Int = 0)
// Create a list of tuples like you have
val tuples = Seq(("a", new MyObj), ("b", new MyObj), ("c", new MyObj))
// Get the strings from the tuples
val strings = tuples.map(_._1)
// Output: Seq[String] = List(a, b, c)
Note: Tuple members are accessed using the underscore notation (which
is indexed from 1, not 0)

Sort tuples by first element reverse, second element regular

I have tuples of the form (Boolean, Int, String).
I want to define Ordering which sorts the tuples in the following order:
Boolean - reverse order
Int - reverse order
String - regular order
Example:
For the tuples: Array((false, 8, "zz"), (false,3, "bb"), (true, 5, "cc"),(false, 3,"dd")).
The ordering should be:
(true, 5, "cc")
(false, 8,"zz")
(false, 3, "bb")
(false, 3, "dd")
I couldn't find a way to define some of the ordering reverse and some regular.
The straight forward solution in this specific case is to use sortBy on the tuples, modified on the fly to "invert" the first and second elements so that in the end the ordering is reversed:
val a = Array((false, 8, "zz"), (false,3, "bb"), (true, 5, "cc"),(false, 3,"dd"))
a.sortBy{ case (x,y,z) => (!x, -y, z) }
For cases when you cannot easily "invert" a value (say that this is a reference object and you've got an opaque ordering on them), you can instead use
sorted and explicitly pass an ordering that is constructed to invert the order on the first and second elements (you can use Ordering.reverse to reverse an ordering):
val myOrdering: Ordering[(Boolean, Int, String)] = Ordering.Tuple3(Ordering.Boolean.reverse, Ordering.Int.reverse, Ordering.String)
a.sorted(myOrdering)
You could do something like this.
case class myTuple(t: (Boolean, Int, String)) extends Ordered[myTuple] {
def compare(that: myTuple):Int = {
val (x,y,z) =t
val (x1,y1,z1) = that.t
if (x.compare(x1) != 0) x.compare(x1)
else {
if (y.compare(y1) != 0) if (y.compare(y1) == 1) 0 else 1
else z.compareTo(z1)
}
}
}
val myList = Array((false, 8, "zz"), (false,3, "bb"), (true, 5, "cc"),(false, 3,"dd"))
implicit def tupleToBeordered(t: (Boolean, Int, String)) = new myTuple(t._1,t._2,t._3)
myList.sorted

Returning values from an inner loop in Scala, use a function instead?

I'm attempting to return a value from a inner loop. I could create a outside list
and populate it within the inner loop as suggested in comments below but this does
not feel very functional. Is there a function I can use to achieve this ?
The type of the loop/inner loop is currently Unit but I would like it to be of type List[Int] or some similar collection type.
val data = List(Seq(1, 2, 3, 4), Seq(1, 2, 3, 4))
//val list : List
for(d <- data){
for(d1 <- data){
//add the result to the val list defined above
distance(d , d1)
}
}
def distance(s1 : Seq[Int], s2 : Seq[Int]) = {
s1.zip(s2).map(t => t._1 + t._2).sum
}
val list = for (x <- data; y <- data) yield distance(x, y)
will do what you want, yielding:
List(20, 20, 20, 20)
The above desugared is equivalent to:
data.flatMap { x => data.map { y => distance(x, y) } }
The trick is to not nest for-comprehensions because that way you'll only ever get nested collections; to get a flat collection from a conceptually nested iteration, you need to make sure flatMap gets used.

Combining multiple Lists of arbitrary length

I am looking for an approach to join multiple Lists in the following manner:
ListA a b c
ListB 1 2 3 4
ListC + # * § %
..
..
..
Resulting List: a 1 + b 2 # c 3 * 4 § %
In Words: The elements in sequential order, starting at first list combined into the resulting list. An arbitrary amount of input lists could be there varying in length.
I used multiple approaches with variants of zip, sliding iterators but none worked and especially took care of varying list lengths. There has to be an elegant way in scala ;)
val lists = List(ListA, ListB, ListC)
lists.flatMap(_.zipWithIndex).sortBy(_._2).map(_._1)
It's pretty self-explanatory. It just zips each value with its position on its respective list, sorts by index, then pulls the values back out.
Here's how I would do it:
class ListTests extends FunSuite {
test("The three lists from his example") {
val l1 = List("a", "b", "c")
val l2 = List(1, 2, 3, 4)
val l3 = List("+", "#", "*", "§", "%")
// All lists together
val l = List(l1, l2, l3)
// Max length of a list (to pad the shorter ones)
val maxLen = l.map(_.size).max
// Wrap the elements in Option and pad with None
val padded = l.map { list => list.map(Some(_)) ++ Stream.continually(None).take(maxLen - list.size) }
// Transpose
val trans = padded.transpose
// Flatten the lists then flatten the options
val result = trans.flatten.flatten
// Viola
assert(List("a", 1, "+", "b", 2, "#", "c", 3, "*", 4, "§", "%") === result)
}
}
Here's an imperative solution if efficiency is paramount:
def combine[T](xss: List[List[T]]): List[T] = {
val b = List.newBuilder[T]
var its = xss.map(_.iterator)
while (!its.isEmpty) {
its = its.filter(_.hasNext)
its.foreach(b += _.next)
}
b.result
}
You can use padTo, transpose, and flatten to good effect here:
lists.map(_.map(Some(_)).padTo(lists.map(_.length).max, None)).transpose.flatten.flatten
Here's a small recursive solution.
def flatList(lists: List[List[Any]]) = {
def loop(output: List[Any], xss: List[List[Any]]): List[Any] = (xss collect { case x :: xs => x }) match {
case Nil => output
case heads => loop(output ::: heads, xss.collect({ case x :: xs => xs }))
}
loop(List[Any](), lists)
}
And here is a simple streams approach which can cope with an arbitrary sequence of sequences, each of potentially infinite length.
def flatSeqs[A](ssa: Seq[Seq[A]]): Stream[A] = {
def seqs(xss: Seq[Seq[A]]): Stream[Seq[A]] = xss collect { case xs if !xs.isEmpty => xs } match {
case Nil => Stream.empty
case heads => heads #:: seqs(xss collect { case xs if !xs.isEmpty => xs.tail })
}
seqs(ssa).flatten
}
Here's something short but not exceedingly efficient:
def heads[A](xss: List[List[A]]) = xss.map(_.splitAt(1)).unzip
def interleave[A](xss: List[List[A]]) = Iterator.
iterate(heads(xss)){ case (_, tails) => heads(tails) }.
map(_._1.flatten).
takeWhile(! _.isEmpty).
flatten.toList
Here's a recursive solution that's O(n). The accepted solution (using sort) is O(nlog(n)). Some testing I've done suggests the second solution using transpose is also O(nlog(n)) due to the implementation of transpose. The use of reverse below looks suspicious (since it's an O(n) operation itself) but convince yourself that it either can't be called too often or on too-large lists.
def intercalate[T](lists: List[List[T]]) : List[T] = {
def intercalateHelper(newLists: List[List[T]], oldLists: List[List[T]], merged: List[T]): List[T] = {
(newLists, oldLists) match {
case (Nil, Nil) => merged
case (Nil, zss) => intercalateHelper(zss.reverse, Nil, merged)
case (Nil::xss, zss) => intercalateHelper(xss, zss, merged)
case ( (y::ys)::xss, zss) => intercalateHelper(xss, ys::zss, y::merged)
}
}
intercalateHelper(lists, List.empty, List.empty).reverse
}

How to use takeWhile with an Iterator in Scala

I have a Iterator of elements and I want to consume them until a condition is met in the next element, like:
val it = List(1,1,1,1,2,2,2).iterator
val res1 = it.takeWhile( _ == 1).toList
val res2 = it.takeWhile(_ == 2).toList
res1 gives an expected List(1,1,1,1) but res2 returns List(2,2) because iterator had to check the element in position 4.
I know that the list will be ordered so there is no point in traversing the whole list like partition does. I like to finish as soon as the condition is not met. Is there any clever way to do this with Iterators? I can not do a toList to the iterator because it comes from a very big file.
The simplest solution I found:
val it = List(1,1,1,1,2,2,2).iterator
val (r1, it2) = it.span( _ == 1)
println(s"group taken is: ${r1.toList}\n rest is: ${it2.toList}")
output:
group taken is: List(1, 1, 1, 1)
rest is: List(2, 2, 2)
Very short but further you have to use new iterator.
With any immutable collection it would be similar:
use takeWhile when you want only some prefix of collection,
use span when you need rest also.
With my other answer (which I've left separate as they are largely unrelated), I think you can implement groupWhen on Iterator as follows:
def groupWhen[A](itr: Iterator[A])(p: (A, A) => Boolean): Iterator[List[A]] = {
#annotation.tailrec
def groupWhen0(acc: Iterator[List[A]], itr: Iterator[A])(p: (A, A) => Boolean): Iterator[List[A]] = {
val (dup1, dup2) = itr.duplicate
val pref = ((dup1.sliding(2) takeWhile { case Seq(a1, a2) => p(a1, a2) }).zipWithIndex collect {
case (seq, 0) => seq
case (Seq(_, a), _) => Seq(a)
}).flatten.toList
val newAcc = if (pref.isEmpty) acc else acc ++ Iterator(pref)
if (dup2.nonEmpty)
groupWhen0(newAcc, dup2 drop (pref.length max 1))(p)
else newAcc
}
groupWhen0(Iterator.empty, itr)(p)
}
When I run it on an example:
println( groupWhen(List(1,1,1,1,3,4,3,2,2,2).iterator)(_ == _).toList )
I get List(List(1, 1, 1, 1), List(2, 2, 2))
I had a similar need, but the solution from #oxbow_lakes does not take into account the situation when the list has only one element, or even if the list contains elements that are not repeated. Also, that solution doesn't lend itself well to an infinite iterator (it wants to "see" all the elements before it gives you a result).
What I needed was the ability to group sequential elements that match a predicate, but also include the single elements (I can always filter them out if I don't need them). I needed those groups to be delivered continuously, without having to wait for the original iterator to be completely consumed before they are produced.
I came up with the following approach which works for my needs, and thought I should share:
implicit class IteratorEx[+A](itr: Iterator[A]) {
def groupWhen(p: (A, A) => Boolean): Iterator[List[A]] = new AbstractIterator[List[A]] {
val (it1, it2) = itr.duplicate
val ritr = new RewindableIterator(it1, 1)
override def hasNext = it2.hasNext
override def next() = {
val count = (ritr.rewind().sliding(2) takeWhile {
case Seq(a1, a2) => p(a1, a2)
case _ => false
}).length
(it2 take (count + 1)).toList
}
}
}
The above is using a few helper classes:
abstract class AbstractIterator[A] extends Iterator[A]
/**
* Wraps a given iterator to add the ability to remember the last 'remember' values
* From any position the iterator can be rewound (can go back) at most 'remember' values,
* such that when calling 'next()' the memoized values will be provided as if they have not
* been iterated over before.
*/
class RewindableIterator[A](it: Iterator[A], remember: Int) extends Iterator[A] {
private var memory = List.empty[A]
private var memoryIndex = 0
override def next() = {
if (memoryIndex < memory.length) {
val next = memory(memoryIndex)
memoryIndex += 1
next
} else {
val next = it.next()
memory = memory :+ next
if (memory.length > remember)
memory = memory drop 1
memoryIndex = memory.length
next
}
}
def canRewind(n: Int) = memoryIndex - n >= 0
def rewind(n: Int) = {
require(memoryIndex - n >= 0, "Attempted to rewind past 'remember' limit")
memoryIndex -= n
this
}
def rewind() = {
memoryIndex = 0
this
}
override def hasNext = it.hasNext
}
Example use:
List(1,2,2,3,3,3,4,5,5).iterator.groupWhen(_ == _).toList
gives: List(List(1), List(2, 2), List(3, 3, 3), List(4), List(5, 5))
If you want to filter out the single elements, just apply a filter or withFilter after groupWhen
Stream.continually(Random.nextInt(100)).iterator
.groupWhen(_ + _ == 100).withFilter(_.length > 1).take(3).toList
gives: List(List(34, 66), List(87, 13), List(97, 3))
You could use method toStream on Iterator.
Stream is a lazy equivalent of List.
As you can see from implementation of toStream it creates a Stream without traversing the whole Iterator.
Stream keeps all element in memory. You should localize usage of link to Stream in some local scope to prevent memory leaking.
With Stream you should use span like this:
val (res1, rest1) = stream.span(_ == 1)
val (res2, rest2) = rest1.span(_ == 2)
I'm guessing a bit here but by the statement "until a condition is met in the next element", it sounds like you might want to look at the groupWhen method on ListOps in scalaz
scala> import scalaz.syntax.std.list._
import scalaz.syntax.std.list._
scala> List(1,1,1,1,2,2,2) groupWhen (_ == _)
res1: List[List[Int]] = List(List(1, 1, 1, 1), List(2, 2, 2))
Basically this "chunks" up the input sequence upon a condition (a (A, A) => Boolean) being met between an element and its successor. In the example above the condition is equality, so, as long as an element is equal to its successor, they will be in the same chunk.