How to write Scala recursion with for/yield?

I have the following list of pairs (key,id):
val pairs = List(('a',1), ('a',2), ('b',1), ('b',2))
I need to generate all combinations of pairs in which no two pairs share a key, so the expected output is:
List(
List(),
List(('a', 1)),
List(('a', 2)),
List(('b', 1)),
List(('a', 1), ('b', 1)),
List(('a', 2), ('b', 1)),
List(('b', 2)),
List(('a', 1), ('b', 2)),
List(('a', 2), ('b', 2))
)
Note that List(('a',1),('a',2)) should not be part of the output, so using Scala's List.combinations on its own is not an option.
I currently have the following code:
def subSeq (xs: List[(Char, Int)]): List[(Char,Int)] = {
xs match {
case Nil => List()
case y::ys => {
val eh = xs.filter (c => c._1 == y._1)
val et = xs.filter (c => c._1 != y._1)
for (z: (Char,Int) <- eh) yield z :: subSeq(et)
}
}
}
But I get an error saying List[List[(Char,Int)]] does not match List[(Char,Int)]

What you are probably trying to do is this:
def subSeq (xs: List[(Char, Int)]): List[List[(Char,Int)]] = {
xs match {
case Nil => List(List())
case y::ys => {
val eh: List[(Char, Int)] = xs.filter (c => c._1 == y._1)
val et = xs.filter (c => c._1 != y._1)
val t = subSeq(et)
t ++ (for {
z: (Char,Int) <- eh
foo <- t
} yield z :: foo)
}
}
}
Your method has to return a List of Lists as that is what you are interested in. So when building up your combinations you have to iterate over the result from your recursion step.
A way using API functions would be to do it like this:
val sets = (0 to 2).flatMap{pairs.combinations}.toSet
sets.map{_.toMap}
Or this if you need the output as lists:
sets.map{_.toMap.toList}.toList
Obviously this will build more combinations than you need at first and then filter stuff out. If performance is an issue and the input does not contain any redundancies, the direct implementation is probably better.
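As a rough sketch of what such a direct implementation could look like, the recursion above can also be written as a fold over the distinct keys. The name keyedCombinations is hypothetical, and the output order differs from the order listed in the question:

```scala
// Hypothetical helper, not part of the original answers.
def keyedCombinations[K, V](pairs: List[(K, V)]): List[List[(K, V)]] = {
  // Distinct keys, in order of first appearance.
  val keys = pairs.map(_._1).distinct
  // Start from the single empty combination and extend it one key at a time.
  keys.foldRight(List(List.empty[(K, V)])) { (key, acc) =>
    val choices = pairs.filter(_._1 == key)
    // Either skip this key (acc) or prepend one of its pairs to each combination.
    acc ++ (for { c <- choices; rest <- acc } yield c :: rest)
  }
}

val pairs = List(('a', 1), ('a', 2), ('b', 1), ('b', 2))
val result = keyedCombinations(pairs)
```

Because each key contributes at most one pair to a combination, nothing has to be filtered out afterwards.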

Eventually I used the combinations function from Scala and filtered out the irrelevant matches with this filter function:
def filterDup(xs: List[(Char,Int)]) : Boolean = {
xs.map(x => x._1).size == xs.map(x => x._1).toSet.size
}
and used it as follows:
((0 to 3).flatMap(pairs.combinations(_)) filter( filterDup(_))).toList

Related

Count consecutive characters in a list?

I want to create a scala method that counts the number of consecutive characters where the values are the same. So I have this list:
List('a','a','b')
and I want to return something like List(('a', 2), ('b', 1)), because there are two characters with the same value beside each other. I've had a bash at this with little success:
def recursivelyCompressList(list: List[(Char, Int)], newString: List[(Char, Int)]): List[(Char, Int)] = {
list match {
case Nil => newString
case s :: tail => {
if (tail.nonEmpty && s._1 == tail.head._1) {
recursivelyCompressList(tail, newString :+ (s._1, s._2 + 1))
} else {
recursivelyCompressList(tail, newString :+ s)
}
}
case _ => newString
}
}
Grateful for any guidance.
This should work.
I would expect the code to be self-explanatory, but if you have any questions, do not hesitate to ask.
def compressList[T](list: List[T]): List[(T, Int)] = {
@annotation.tailrec
def loop(remaining: List[T], currentValue: T, currentCount: Int, acc: List[(T, Int)]): List[(T, Int)] =
remaining match {
case Nil =>
((currentValue -> currentCount) :: acc).reverse
case t :: tail =>
if (t == currentValue)
loop(
remaining = tail,
currentValue,
currentCount + 1,
acc
)
else
loop(
remaining = tail,
currentValue = t,
currentCount = 1,
(currentValue -> currentCount) :: acc
)
}
list match {
case Nil =>
Nil
case t :: tail =>
loop(
remaining = tail,
currentValue = t,
currentCount = 1,
acc = List.empty
)
}
}
Which you can use like this:
compressList(List.empty[Char])
// res: List[(Char, Int)] = List()
compressList(List('a', 'b'))
// res: List[(Char, Int)] = List(('a', 1), ('b', 1))
compressList(List('a', 'a', 'b'))
// res: List[(Char, Int)] = List(('a', 2), ('b', 1))
compressList(List('a', 'a', 'b', 'b', 'b', 'a', 'c'))
// res: List[(Char, Int)] = List(('a', 2), ('b', 3), ('a', 1), ('c', 1))
You can also use span instead of dropWhile and takeWhile to avoid scanning the list twice:
def comp(xs:List[Char]):List[(Char,Int)] =
if(xs.isEmpty) Nil
else {
val h = xs.head
val (m,r) = xs.span(_ == h)
(h, m.length) :: comp(r)
}
Use takeWhile and dropWhile
Count consecutive
def count(xs: List[Char]): List[(Char, Int)] =
if (xs.isEmpty) Nil else
(xs.head, xs.takeWhile(_ == xs.head).length) :: count(xs.dropWhile(_ == xs.head))
Scala REPL
scala> def count(xs: List[Char]): List[(Char, Int)] =
| if (xs.isEmpty) Nil else
| (xs.head, xs.takeWhile(_ == xs.head).length) :: count(xs.dropWhile(_ == xs.head))
count: (xs: List[Char])List[(Char, Int)]
scala> count(List('a', 'a', 'b', 'c', 'c', 'c'))
res0: List[(Char, Int)] = List((a,2), (b,1), (c,3))
scala> count(List('a', 'a', 'b', 'a', 'c', 'c', 'c'))
res1: List[(Char, Int)] = List((a,2), (b,1), (a,1), (c,3))
You have not specified the behavior you want if there are multiple repeating sequences of the same character. Assuming you only want the longest repeating sequence, the following code would be a good starting point:
def rec(list : List[Char]) : Map[Char, Int] = {
@scala.annotation.tailrec
def helper(prev: Char, li : List[Char], len : Int, result : Map[Char, Int]) : Map[Char,Int] = {
if(li.isEmpty) {
if(!result.contains(prev)) result + (prev -> len)
else if(result(prev) < len) result + (prev -> len)
else result
}
else {
val cur = li.head
if(cur != prev) {
if(result.contains(prev)) {
if(result(prev) < len)
helper(li.head, li.tail, 1, result + (prev -> len))
else
helper(li.head, li.tail, 1, result)
} else {
helper(li.head, li.tail, 1, result + (prev -> len))
}
} else {
helper(li.head, li.tail, len + 1, result)
}
}
}
helper('\u0000', list, 0, Map.empty[Char, Int]) - '\u0000'
}
Running
rec(List('c', 'a', 'a', 'a', 'c', 'd' ,'c', 'c' , 'a', 'a', 'd', 'd', 'd', 'd','c','c','a', 'c','c','c'))
Output:
res0: Map[Char,Int] = Map(c -> 3, a -> 3, d -> 4)
The idea is just to look at the current character in the list and the previous character. When the character changes, the sequence count is stopped and current length is compared to what is stored in the map. It's pretty simple when you come to think about it.
I think this can be written more elegantly. But it could be a good starting point.
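One possible shorter formulation, offered only as a sketch: first run-length encode the list, then keep the maximum run per character. The names runLengths and longestRuns are made up for this example:

```scala
// Hypothetical helper: collapse consecutive equal elements into (value, count) pairs.
def runLengths[A](xs: List[A]): List[(A, Int)] =
  xs.foldLeft(List.empty[(A, Int)]) {
    // Same value as the run being built: bump its count.
    case ((prev, n) :: rest, x) if prev == x => (prev, n + 1) :: rest
    // Otherwise start a new run.
    case (acc, x) => (x, 1) :: acc
  }.reverse

// Hypothetical helper: longest consecutive run for each distinct value.
def longestRuns[A](xs: List[A]): Map[A, Int] =
  runLengths(xs).groupBy(_._1).map { case (k, runs) => k -> runs.map(_._2).max }

val sample = List('c', 'a', 'a', 'a', 'c', 'd', 'c', 'c', 'a', 'a',
                  'd', 'd', 'd', 'd', 'c', 'c', 'a', 'c', 'c', 'c')
val longest = longestRuns(sample)
```

This avoids the sentinel character, at the cost of building the intermediate list of runs.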

Transforming one record into multiple records

If the format of the input is
(x1,(a,b,c,List(key1, key2)))
(x2,(a,b,c,List(key3)))
and I would like to achieve this output
(key1,(a,b,c,x1))
(key2,(a,b,c,x1))
(key3,(a,b,c,x2))
Here is the code:
var hashtags = joined_d.map(x => (x._1, (x._2._1._1, x._2._2, x._2._1._4, getHashTags(x._2._1._4))))
var hashtags_keys = hashtags.map(x => if(x._2._4.size == 0) (x._1, (x._2._1, x._2._2, x._2._3, 0)) else
x._2._4.map(y => (y, (x._2._1, x._2._2, x._2._3, 1))))
The function getHashTags() returns a list. If the list is not empty, we want to use each element in the list as the new key. How should I work around this issue?
With rdd created as:
val rdd = sc.parallelize(
Seq(
("x1",("a","b","c",List("key1", "key2"))),
("x2", ("a", "b", "c", List("key3")))
)
)
You can use flatMap like this:
rdd.flatMap{ case (x, (a, b, c, list)) => list.map(k => (k, (a, b, c, x))) }.collect
// res12: Array[(String, (String, String, String, String))] =
// Array((key1,(a,b,c,x1)),
// (key2,(a,b,c,x1)),
// (key3,(a,b,c,x2)))
Here's one way to do it:
val rdd = sc.parallelize(Seq(
("x1", ("a", "b", "c", List("key1", "key2"))),
("x2", ("a", "b", "c", List("key3")))
))
val rdd2 = rdd.flatMap{
case (x, (a, b, c, l)) => l.map( (_, (a, b, c, x) ) )
}
rdd2.collect
// res1: Array[(String, (String, String, String, String))] = Array((key1,(a,b,c,x1)), (key2,(a,b,c,x1)), (key3,(a,b,c,x2)))
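The question's own code also wanted to keep records whose list is empty, flagged with 0 instead of 1. A minimal sketch of that, using a plain Scala List in place of an RDD so that it runs without Spark (the "x3" record is invented sample data; the same function body should fit RDD.flatMap, since both branches return a collection):

```scala
// Hypothetical sample data; "x3" has no keys at all.
val records = List(
  ("x1", ("a", "b", "c", List("key1", "key2"))),
  ("x2", ("a", "b", "c", List("key3"))),
  ("x3", ("a", "b", "c", List.empty[String]))
)

// Both branches return a List, so flatMap can flatten them uniformly.
val rekeyed = records.flatMap {
  // No keys: keep the original key and flag the record with 0.
  case (x, (a, b, c, Nil))  => List((x, (a, b, c, 0)))
  // Otherwise emit one record per key, flagged with 1.
  case (x, (a, b, c, keys)) => keys.map(k => (k, (a, b, c, 1)))
}
```

The original attempt failed because map returned a tuple in one branch and a list of tuples in the other; wrapping the single tuple in a List gives both branches the same type.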

Removing the Some in a left join RDD in Spark

I'm running a left join in a Spark RDD but sometimes I get an output like this:
(k, (v, Some(w)))
or
(k, (v, None))
How do I make it give me back just
(k, (v, (w)))
or
(k, (v, ()))
Here is how I'm combining the 2 files:
def formatMap3(
left: String = "", right: String = "")(m: String = "") = {
val items = m.map{k => {
s"$k"}}
s"$left$items$right"
}
val combPrdGrp = custPrdGrp3.leftOuterJoin(cmpgnPrdGrp3)
val combPrdGrp2 = combPrdGrp.groupByKey
val combPrdGrp3 = combPrdGrp2.map { case (n, list) =>
val formattedPairs = list.map { case (a, b) => s"$a $b" }
s"$n ${formattedPairs.mkString}"
}
If you're just interested in getting formatted output without the Somes/Nones, then something like this should work:
val combPrdGrp3 = combPrdGrp2.map { case (n, list) =>
val formattedPairs = list.map {
case (a, Some(b)) => s"$a $b"
case (a, None) => s"$a ()"
}
s"$n ${formattedPairs.mkString}"
}
If you have other uses in mind then you probably need to provide more details.
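The leftOuterJoin() function in Spark returns tuples containing the join key, the left set's value and an Option of the right set's value. To extract the value from the Option, call getOrElse() on the right set's value in the resulting RDD. Called without a default, as in the example below, it falls back to the unit value (), which is why the element type is inferred as AnyVal. As an example: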
scala> val rdd1 = sc.parallelize(Array(("k1", 4), ("k4", 7), ("k8", 10), ("k6", 1), ("k7", 4)))
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[13] at parallelize at <console>:21
scala> val rdd2 = sc.parallelize(Array(("k5", 4), ("k4", 3), ("k0", 2), ("k6", 5), ("k1", 6)))
rdd2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[14] at parallelize at <console>:21
scala> val rdd_join = rdd1.leftOuterJoin(rdd2).map { case (a, (b, c: Option[Int])) => (a, (b, (c.getOrElse()))) }
rdd_join: org.apache.spark.rdd.RDD[(String, (Int, AnyVal))] = MapPartitionsRDD[18] at map at <console>:25
scala> rdd_join.take(5).foreach(println)
...
(k4,(7,3))
(k6,(1,5))
(k7,(4,()))
(k8,(10,()))
(k1,(4,6))
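If a typed result is preferable to AnyVal, one option is to pass an explicit default, or to fold the Option straight into a string. The sketch below uses a plain List shaped like the join output so it runs without Spark; the default value 0 is an assumption chosen for the example, not something the question specifies:

```scala
// Hypothetical data shaped like the result of leftOuterJoin.
val joined: List[(String, (Int, Option[Int]))] = List(
  ("k4", (7, Some(3))),
  ("k7", (4, None))
)

// Replace a missing right-hand value with an explicit default (0 is assumed here).
val withDefault = joined.map { case (k, (v, w)) => (k, (v, w.getOrElse(0))) }

// Or format directly, printing "()" when the right-hand value is missing.
val formatted = joined.map { case (k, (v, w)) =>
  s"$k $v ${w.fold("()")(_.toString)}"
}
```

With an Int default the element type stays (String, (Int, Int)).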

How to split an iterator based on a condition on prev and curr elements?

I want to split a list of elements into a list of lists such that neighboring elements in the inner list satisfy a given condition.
A simple condition would be that neighboring elements are equal. Then if the input is List(1,1,1,2,2,3,3,3,3), the output is List(List(1,1,1),List(2,2),List(3,3,3,3)).
Another condition could be that the current element should be greater than the previous element. Then if the input is List(1,2,3,1,4,6,5,7,8), the output is List(List(1,2,3), List(1,4,6), List(5,7,8)). It would also be wonderful if the method could act on an Iterator. The signatures of the method would be:
def method[A](lst:List[A], cond:(A,A)=>Boolean):List[List[A]]
def method[A](lst:Iterator[A], cond:(A,A)=>Boolean):Iterator[Iterator[A]]
You can use sliding together with span in a recursive function for the desired effect. This quick and dirty version is less efficient, but terser than some of the alternatives:
def method[A](lst: TraversableOnce[A], cond: (A, A) => Boolean): List[List[A]] = {
val iterable = lst.toIterable
iterable.headOption.toList.flatMap { head =>
val (next, rest) = iterable.sliding(2).filter(_.size == 2).span(x => cond(x.head, x.last))
(head :: next.toList.map(_.last)) :: method(rest.map(_.last), cond)
}
}
If you want to lazily execute the code, you can return an Iterator[List[A]] instead of List[List[A]]:
def method[A](lst: TraversableOnce[A], cond: (A, A) => Boolean): Iterator[List[A]] = {
val iterable = lst.toIterable
iterable.headOption.toIterator.flatMap { head =>
val (next, rest) = iterable.sliding(2).filter(_.size == 2).span(x => cond(x.head, x.last))
Iterator(head :: next.toList.map(_.last)) ++ method(rest.map(_.last), cond)
}
}
And you can verify that this is lazy:
val x = (Iterator.range(0, 10) ++ Iterator.range(3, 5) ++ Iterator.range(1, 3)).map(x => { println(x); x })
val iter = method(x, (x: Int, y: Int) => x < y) //Only prints 0-9, and then 3!
iter.take(2).toList //Prints more
iter.toList //Prints the rest
You can make it even lazier by returning an Iterator[Iterator[A]]:
def method[A](lst: TraversableOnce[A], cond: (A, A) => Boolean): Iterator[Iterator[A]] = {
val iterable = lst.toIterable
iterable.headOption.toIterator.flatMap { head =>
val (next, rest) = iterable.sliding(2).filter(_.size == 2).span(x => cond(x.head, x.last))
Iterator(Iterator(head) ++ next.toIterator.map(_.last)) ++ method(rest.map(_.last), cond)
}
}
As a relatively unrelated side note, when you have generic parameters of this form, you're better off using 2 parameter lists:
def method[A](lst: TraversableOnce[A])(cond: (A, A) => Boolean)
When you have 2 parameter lists like this, the type inference can be a little bit smarter:
//No need to specify parameter types on the anonymous function now!
method(List(1, 3, 2, 3, 4, 1, 8, 1))((x, y) => x < y).toList
//You can now even use underscore anonymous function notation!
method(List(1, 4, 2, 3, 4, 1, 8))(_ < _)
Here is something close (I believe) to what you are asking for. The only issue with this is that it always produces a List of Lists for the result as opposed to being based on the input type:
val iter = Iterator(1,1,2,2,2,3,3,3)
val list = List(4,5,5,5,5,6,6)
def same(a:Int,b:Int) = a == b
def gt(a:Int, b:Int) = b > a
println(groupByPred(iter, same))
println(groupByPred(list, gt))
def groupByPred[L <: TraversableOnce[T], T](trav:L, cond:(T,T) => Boolean):List[List[T]] = {
val (ret, inner) =
trav.foldLeft((List.empty[List[T]], List.empty[T])){
case ((acc, list), el) if list.isEmpty || cond(list.head, el) => (acc,el :: list)
case ((acc, list), el) => (list.reverse :: acc,el :: List.empty)
}
(inner.reverse :: ret).reverse
}
If you run that code, the output should be the following:
List(List(1, 1), List(2, 2, 2), List(3, 3, 3))
List(List(4, 5), List(5), List(5), List(5, 6), List(6))
Try this.
It puts the head of the list in as the first element of the first inner list of the List of Lists. It then adds each element to that first inner list if the condition holds; if it doesn't, it starts a new inner list with the current entry as the first element.
Both the inner lists and the outer list are built in reverse order, so each inner list is reversed (with map) and then the outer list is reversed.
val xs = List(1, 1, 1, 2, 2, 3, 3, 3, 3)
val ys = List(1, 2, 3, 1, 4, 6, 5, 7, 8)
def method[A](lst: List[A], cond: (A, A) => Boolean): List[List[A]] = {
lst.tail.foldLeft(List(List(lst.head))) { (acc, e) =>
if (cond(acc.head.head, e))
(e :: acc.head) :: acc.tail
else List(e) :: acc
}.map(_.reverse).reverse
}
method(xs, { (a: Int, b: Int) => a == b })
//> res0: List[List[Int]] = List(List(1, 1, 1), List(2, 2), List(3, 3, 3, 3))
method(ys, { (a: Int, b: Int) => a < b })
//> res1: List[List[Int]] = List(List(1, 2, 3), List(1, 4, 6), List(5, 7, 8))
Iterator overload
def method[A](iter:Iterator[A], cond: (A, A) => Boolean): List[List[A]] = {
val h = iter.next
iter.foldLeft(List(List(h))) { (acc, e) =>
if (cond(acc.head.head, e))
(e :: acc.head) :: acc.tail
else List(e) :: acc
}.map(_.reverse).reverse
}
method(xs.toIterator, { (a: Int, b: Int) => a == b })
//> res0: List[List[Int]] = List(List(1, 1, 1), List(2, 2), List(3, 3, 3, 3))
method(ys.toIterator, { (a: Int, b: Int) => a < b })
//> res1: List[List[Int]] = List(List(1, 2, 3), List(1, 4, 6), List(5, 7, 8))
More generic version (hat-tip to @cmbaxter for some ideas here) that works with Lists, Iterators and anything that can be traversed once:
def method[A, T <: TraversableOnce[A]](trav: T, cond: (A, A) => Boolean)
: List[List[A]] = {
// Start from no groups at all, so that no empty group is left in the result.
trav.foldLeft(List.empty[List[A]]) { (acc, e) =>
if (acc.isEmpty || !cond(acc.head.head, e)) List(e) :: acc
else (e :: acc.head) :: acc.tail
}.map(_.reverse).reverse
}
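For the Iterator case, a version that only consumes the input as groups are requested can be sketched with a BufferedIterator, which allows peeking at the next element without consuming it. The name splitWhen is hypothetical, and the two parameter lists follow the suggestion made earlier in this thread:

```scala
// Hypothetical helper: lazily split an iterator into runs whose neighbours satisfy cond.
def splitWhen[A](it: Iterator[A])(cond: (A, A) => Boolean): Iterator[List[A]] = {
  val buffered = it.buffered
  new Iterator[List[A]] {
    def hasNext: Boolean = buffered.hasNext
    def next(): List[A] = {
      var last = buffered.next()
      val group = List.newBuilder[A]
      group += last
      // Peek at the next element; stop as soon as the condition fails.
      while (buffered.hasNext && cond(last, buffered.head)) {
        last = buffered.next()
        group += last
      }
      group.result()
    }
  }
}

val increasing = splitWhen(Iterator(1, 2, 3, 1, 4, 6, 5, 7, 8))(_ < _).toList
val equalRuns = splitWhen(Iterator(1, 1, 1, 2, 2, 3, 3, 3, 3))(_ == _).toList
```

Each inner group is still materialised as a List, so this is lazy across groups but not within one.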

Groupby like Python's itertools.groupby

In Python I'm able to group consecutive elements with the same key by using itertools.groupby:
>>> items = [(1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4)]
>>> import itertools
>>> list(key for key,it in itertools.groupby(items, lambda tup: tup[0]))
[1, 2, 3, 1]
Scala has groupBy as well, but it produces different result - a map pointing from key to all the values found in the iterable with the specified key (not the consecutive runs with the same key):
scala> val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
items: List[(Int, Int)] = List((1,2), (1,5), (1,3), (2,9), (3,7), (1,5), (1,4))
scala> items.groupBy {case (key, value) => key}
res0: scala.collection.immutable.Map[Int,List[(Int, Int)]] = Map(2 -> List((2,9)), 1 -> List((1,2), (1,5), (1,3), (1,5), (1,4)), 3 -> List((3,7)))
What is the most elegant way of achieving the same as with Python's itertools.groupby?
If you just want to throw out sequential duplicates, you can do something like this:
def unchain[A](items: Seq[A]) = if (items.isEmpty) items else {
items.head +: (items zip items.drop(1)).collect{ case (l,r) if r != l => r }
}
That is, just compare the list to a version of itself shifted by one place, and only keep the items which are different. It's easy to add a (same: (A, A) => Boolean) parameter to the method and use !same(l,r) if you want custom behavior for what counts as the same (e.g. do it just by key).
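A minimal sketch of that variant, with the comparison passed in a second parameter list (the name unchainBy is made up for this example):

```scala
// Hypothetical variant of unchain with a custom notion of "same".
def unchainBy[A](items: Seq[A])(same: (A, A) => Boolean): Seq[A] =
  if (items.isEmpty) items
  else items.head +: items.zip(items.drop(1)).collect {
    // Keep the right-hand element whenever it differs from its left neighbour.
    case (l, r) if !same(l, r) => r
  }

val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
// Compare by key only, as in the Python example.
val firstOfEachRun = unchainBy(items)(_._1 == _._1)
```

Mapping the result with _._1 gives the same keys as the Python snippet in the question.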
If you want to keep the duplicates, you can use Scala's groupBy to get a very compact (but inefficient) solution:
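def groupSequential[A](items: Seq[A])(same: (A, A) => Boolean) = {
// Number the groups: the index only advances when neighbours differ.
val ns = (items zip items.drop(1)).
scanLeft(0){ (n,cc) => if (same(cc._1, cc._2)) n else n+1 }
// Group by that index, restore the order, and drop the index again.
(ns zip items).groupBy(_._1).toSeq.sortBy(_._1).map(_._2.map(_._2))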
}
Using List.span, like this
def keyMultiSpan(l: List[(Int,Int)]): List[List[(Int,Int)]] = l match {
case Nil => List()
case h :: t =>
val ms = l.span(_._1 == h._1)
ms._1 :: keyMultiSpan(ms._2)
}
Hence let
val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
and so
keyMultiSpan(items).map { _.head._1 }
res: List(1, 2, 3, 1)
Update
A more readable syntax, as suggested by @Paul, an implicit class for possibly neater usage, and type parameterisation for generality,
implicit class RichSpan[A,B](val l: List[(A,B)]) extends AnyVal {
def keyMultiSpan(): List[List[(A,B)]] = l match {
case Nil => List()
case h :: t =>
val (f, r) = l.span(_._1 == h._1)
f :: r.keyMultiSpan()
}
}
Thus, use it as follows,
items.keyMultiSpan.map { _.head._1 }
res: List(1, 2, 3, 1)
Here is a succinct but inefficient solution:
def pythonGroupBy[T, U](items: Seq[T])(f: T => U): List[List[T]] = {
items.foldLeft(List[List[T]]()) {
case (Nil, x) => List(List(x))
case (g :: gs, x) if f(g.head) == f(x) => (x :: g) :: gs
case (gs, x) => List(x) :: gs
}.map(_.reverse).reverse
}
And here is a better one, that only invokes f on each element once:
def pythonGroupBy2[T, U](items: Seq[T])(f: T => U): List[List[T]] = {
if (items.isEmpty)
Nil // no items means no groups, matching pythonGroupBy above
else {
val state = (List(List(items.head)), f(items.head))
items.tail.foldLeft(state) { (state, x) =>
val groupByX = f(x)
state match {
case (g :: gs, groupBy) if groupBy == groupByX => ((x :: g) :: gs, groupBy)
case (gs, _) => (List(x) :: gs, groupByX)
}
}._1.map(_.reverse).reverse
}
}
Both solutions fold over items, building up a list of groups as they go. pythonGroupBy2 also keeps track of the value of f for the current group. At the end, we have to reverse each group and the list of groups in order to get the correct order.
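Python's itertools.groupby yields (key, group) pairs rather than bare groups. A sketch that mirrors that shape, built on span instead of a fold, could look like the following; groupConsecutive is a hypothetical name, and the recursion is not tail recursive, so it is only meant for inputs with a moderate number of groups:

```scala
// Hypothetical helper: consecutive runs paired with the key they share.
def groupConsecutive[A, K](items: List[A])(key: A => K): List[(K, List[A])] =
  items match {
    case Nil => Nil
    case head :: _ =>
      val k = key(head)
      // Take the longest prefix with the same key; recurse on what is left.
      val (run, rest) = items.span(x => key(x) == k)
      (k, run) :: groupConsecutive(rest)(key)
  }

val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
val grouped = groupConsecutive(items)(_._1)
```

Here span builds each run in the right order, so no final reverse is needed.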
Try:
val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
val res = compress(items.map(_._1))
/** Eliminate consecutive duplicates of list elements **/
def compress[T](l : List[T]) : List[T] = l match {
case head :: next :: tail if (head == next) => compress(next :: tail)
case head :: tail => head :: compress(tail)
case Nil => List()
}
/** Tail recursive version **/
def compress[T](input: List[T]): List[T] = {
def comp(remaining: List[T], l: List[T], last: Any): List[T] = {
remaining match {
case Nil => l
case head :: tail if head == last => comp(tail, l, head)
case head :: tail => comp(tail, head :: l, head)
}
}
comp(input, Nil, Nil).reverse
}
Where compress is the solution of one of the 99 Problems in Scala.
Hmm, I couldn't find anything out of the box, but this will do it:
def groupz[T](list:List[T]):List[T] = {
list match {
case Nil => Nil
case x::Nil => List(x)
case x::xs if (x == xs.head) => groupz(xs)
case x::xs => x::groupz(xs)
}}
//now let's add this functionality to List class
implicit def addPythonicGroupToList[T](list:List[T]) = new {def pythonGroup = groupz(list)}
and now you can do:
val items = List((1, 2), (1, 5), (1, 3), (2, 9), (3, 7), (1, 5), (1, 4))
items.map(_._1).pythonGroup
res1: List[Int] = List(1, 2, 3, 1)
Here is a simple solution that I used for a problem I stumbled on at work. In this case I didn't care too much about space, so did not worry about efficient iterators. Used an ArrayBuffer to accumulate the results.
(Don't use this with enormous amounts of data.)
Sequential GroupBy
import scala.collection.mutable.ArrayBuffer
object Main {
/** Returns consecutive keys and groups from the iterable. */
def sequentialGroupBy[A, K](items: Seq[A], f: A => K): ArrayBuffer[(K, ArrayBuffer[A])] = {
val result = ArrayBuffer[(K, ArrayBuffer[A])]()
if (items.nonEmpty) {
// Iterate, keeping track of when the key changes value.
var bufKey: K = f(items.head)
var buf: ArrayBuffer[A] = ArrayBuffer()
for (elem <- items) {
val key = f(elem)
if (key == bufKey) {
buf += elem
} else {
val group: (K, ArrayBuffer[A]) = (bufKey, buf)
result += group
bufKey = key
buf = ArrayBuffer(elem)
}
}
// Append last group.
val group: (K, ArrayBuffer[A]) = (bufKey, buf)
result += group
}
result
}
def main(args: Array[String]): Unit = {
println("\nExample 1:")
sequentialGroupBy[Int, Int](
Seq(1, 4, 5, 7, 9, 8, 16),
x => x % 2
).foreach(println)
println("\nExample 2:")
sequentialGroupBy[String, Boolean](
Seq("pi", "nu", "rho", "alpha", "xi"),
x => x.length > 2
).foreach(println)
}
}
Running the above code results in the following:
Example 1:
(1,ArrayBuffer(1))
(0,ArrayBuffer(4))
(1,ArrayBuffer(5, 7, 9))
(0,ArrayBuffer(8, 16))
Example 2:
(false,ArrayBuffer(pi, nu))
(true,ArrayBuffer(rho, alpha))
(false,ArrayBuffer(xi))