"conditionalZip" operator in Akka Streams - scala

Assume that I have two sources:
val first = Source(1 :: 2 :: 4 :: 6 :: Nil)
val second = Source(1 :: 2 :: 3 :: 4 :: 5 :: 6 :: 7 :: Nil)
Is it possible to create a zip that will pair only elements based on a condition? I mean something like:
first.conditionalZip(second, _ == _) // if that method existed
That code would take the element from the first source and drop elements from the second until there is an element that satisfies the condition, and then output a tuple. The result for the above call would be (1, 1), (2, 2), (4, 4), (6, 6).

Consider zipping the two Sources, followed by using statefulMapConcat to transform the zipped elements in accordance with the condition function, as shown below:
import akka.stream.scaladsl._
import akka.NotUsed
def popFirstMatch(ls: List[Int], condF: Int => Boolean): (Option[Int], List[Int]) = {
  // index of the first element satisfying condF, or -1 if there is none
  val idx = ls.indexWhere(condF)
  if (idx < 0)
    (None, ls)
  else {
    val (l, r) = ls.splitAt(idx)
    (r.headOption, l ++ r.tail)
  }
}
def conditionalZip( first: Source[Int, NotUsed],
second: Source[Int, NotUsed],
filler: Int,
condFcn: (Int, Int) => Boolean ): Source[(Int, Int), NotUsed] = {
first.zipAll(second, filler, filler).statefulMapConcat{ () =>
var prevList = List.empty[Int]
tuple => tuple match { case (e1, e2) =>
if (e2 != filler) {
if (e1 != filler && condFcn(e1, e2))
(e1, e2) :: Nil
else {
if (e1 != filler)
prevList :+= e1
val (opElem, rest) = popFirstMatch(prevList, condFcn(_, e2))
prevList = rest
opElem match {
case None => Nil
case Some(e) => (e, e2) :: Nil
}
}
}
else
Nil
}
}
}
Test running:
import akka.actor.ActorSystem
implicit val system = ActorSystem("system")
implicit val ec = system.dispatcher
// Example 1:
val first = Source(1 :: 2 :: 4 :: 6 :: Nil)
val second = Source(1 :: 2 :: 3 :: 4 :: 5 :: 6 :: 7 :: Nil)
conditionalZip(first, second, Int.MinValue, _ == _).runForeach(println)
// (1,1)
// (2,2)
// (4,4)
// (6,6)
conditionalZip(first, second, Int.MinValue, _ > _).runForeach(println)
// (4,3)
// (6,4)
conditionalZip(first, second, Int.MinValue, _ < _).runForeach(println)
// (1,2)
// (2,3)
// (4,5)
// (6,7)
// Example 2:
val first = Source(3 :: 9 :: 5 :: 5 :: 6 :: Nil)
val second = Source(1 :: 3 :: 5 :: 2 :: 5 :: 6 :: Nil)
conditionalZip(first, second, Int.MinValue, _ == _).runForeach(println)
// (3,3)
// (5,5)
// (5,5)
// (6,6)
conditionalZip(first, second, Int.MinValue, _ > _).runForeach(println)
// (3,1)
// (9,3)
// (5,2)
// (6,5)
conditionalZip(first, second, Int.MinValue, _ < _).runForeach(println)
// (3,5)
// (5,6)
A few notes:
Method zipAll (available in Akka Streams 2.6+) zips the two Sources, padding the one with fewer elements with the provided "filler" values. These fillers are of no interest, so the filler must be a value that never occurs among the actual elements (Int.MinValue in the tests above).
An internal List, prevList, is used within statefulMapConcat to buffer elements from the 1st Source so they can be compared with elements from the 2nd Source in later iterations. If the condition is plain equality and the elements within the Sources are distinct, the List could be replaced with a Set for faster lookup; for an arbitrary condition function, popFirstMatch still has to scan the buffer in order.
Method popFirstMatch extracts the first element of prevList that satisfies the provided, partially applied condition function, returning a Tuple of that element (as an Option) and the remaining List.
NOTE that this is just an illustration of how statefulMapConcat may be a solution for the described problem. Behavior of the sample code may not necessarily match the exact requirement without detailed implementation to either cover all cases or refine the scope of the fairly broad condition function (Int, Int) => Boolean.
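For comparison, here is a minimal sketch of the greedy semantics described literally in the question, using plain Scala Lists instead of Akka Streams (so it assumes both inputs are finite and fit in memory). The object name ConditionalZipSketch is hypothetical. For each element of first it drops elements of second until the condition holds. It reproduces the question's own example, but unlike the stream version above it does not buffer unmatched elements of first: an element that never matches (such as 9 in Example 2) consumes the rest of second.

```scala
object ConditionalZipSketch {
  // Hypothetical helper: for each element a of `first`, skip elements of
  // `second` until one satisfies cond(a, b), emit (a, b), and continue
  // matching the next a against what remains after b.
  def conditionalZip[A, B](first: List[A], second: List[B])(cond: (A, B) => Boolean): List[(A, B)] = {
    @annotation.tailrec
    def loop(as: List[A], bs: List[B], acc: List[(A, B)]): List[(A, B)] =
      as match {
        case Nil => acc.reverse
        case a :: restA =>
          bs.dropWhile(b => !cond(a, b)) match {
            case b :: restB => loop(restA, restB, (a, b) :: acc)
            case Nil        => acc.reverse // `second` is exhausted, nothing else can pair
          }
      }
    loop(first, second, Nil)
  }
}
```

With List(1, 2, 4, 6) and List(1, 2, 3, 4, 5, 6, 7) and _ == _ this yields (1,1), (2,2), (4,4), (6,6), matching the question.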

Related

A "Simple" Scala question that took me a long time to debug

Please check the two pieces of script below.
genComb4(lst) works since I put z <- genComb4(xs) before i <- 0 to x._2 in the for-comprehension; genComb(lst) does not work because I changed the order of these two lines in the for-comprehension.
It took me almost half a day to find this bug, and I cannot explain it myself. Could you tell me why this happens?
Thank you very much in advance.
// generate combinations
val nums = Vector(1, 2, 3)
val strs = Vector('a', 'b', 'c')
val lst: List[(Char, Int)] = strs.zip(nums).toList
def genComb4(lst: List[(Char, Int)]): List[List[(Char, Int)]] = lst match {
case Nil => List(List())
case x :: xs =>
for {
z <- genComb4(xs) // correct
i <- 0 to x._2 // correct
} yield ( (x._1, i) :: z)
}
genComb4(lst)
def genComb(lst: List[(Char, Int)]): List[List[(Char, Int)]] = lst match {
case Nil => List(List())
case x :: xs =>
for {
i <- (0 to x._2) // wrong
z <- genComb(xs) // wrong
} yield ( (x._1, i) :: z)
}
genComb(lst)
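It's because the two for-comprehensions use different container types. A for-comprehension is rewritten into flatMap/map calls on its first generator, so the first line determines the type of the result. When you start the for-comprehension with i <- (0 to x._2), the result container is an IndexedSeq, which does not conform to the declared return type List[List[(Char, Int)]], so genComb does not compile. When the first line is z <- genComb4(xs), the result container is a List. Take a look: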
val x = 'a' -> 2
val indices: Seq[Int] = 0 to x._2
val combs: List[List[(Char, Int)]] = genComb4(List(x))
// indexed sequence
val indicesInFor: IndexedSeq[(Char, Int)] = for {
i <- 0 to x._2
} yield (x._1, i)
// list
val combsInFor: List[List[(Char, Int)]] = for {
z <- genComb4(List(x))
} yield z
So to make your second case work, you should convert the range to a List with (0 to x._2).toList:
val indicesListInFor: List[(Char, Int)] = for {
i <- (0 to x._2).toList
} yield (x._1, i)
The resulting code should be:
def genComb(lst: List[(Char, Int)]): List[List[(Char, Int)]] = lst match {
case Nil => List(List())
case x :: xs =>
for {
i <- (0 to x._2).toList
z <- genComb(xs)
} yield ( (x._1, i) :: z)
}
genComb(lst)
Remember that the first generator of a for-comprehension determines the collection type of the result, and keep the inheritance of Scala collections in mind: if the resulting type can't be converted by the inheritance rules to the type you expect, you have to take care of the conversion yourself.
A good practice is to unwrap the for-expression into flatMap, map and withFilter calls; then you will find mistyping and similar problems faster.
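To make that concrete, here is a small sketch of how the two for-comprehensions unwrap; x and rest are made-up inputs, with rest standing in for the recursive call. The outer flatMap is called on the first generator, so its collection type (IndexedSeq for a Range, List for a List) becomes the type of the result.

```scala
object DesugarDemo {
  val x: (Char, Int) = ('a', 2)
  // stands in for genComb(xs): the combinations of the remaining pairs
  val rest: List[List[(Char, Int)]] = List(List(('b', 0)), List(('b', 1)))

  // for { i <- 0 to x._2; z <- rest } yield (x._1, i) :: z
  // becomes a flatMap on the Range, so the result is an IndexedSeq
  val rangeFirst: IndexedSeq[List[(Char, Int)]] =
    (0 to x._2).flatMap(i => rest.map(z => (x._1, i) :: z))

  // for { z <- rest; i <- 0 to x._2 } yield (x._1, i) :: z
  // becomes a flatMap on the List, so the result is a List
  val listFirst: List[List[(Char, Int)]] =
    rest.flatMap(z => (0 to x._2).map(i => (x._1, i) :: z))

  // Declaring rangeFirst as List[List[(Char, Int)]] would not compile,
  // which is the same type mismatch that breaks genComb.
}
```

Both values contain the same six combinations, only in a different order and a different collection type.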
useful links:
how does yield work, scala documentation
quick review of scala for-comprehensions
for-expressions, scala documentation

Scala: Remove duplicated integers from Vector( tuples(Int,Int) , ...)

I have a large vector (about 2000 elements) consisting of many tuples, Tuple(Int,Int), i.e.
val myVectorEG = Vector((65,61), (29,49), (4,57), (12,49), (24,98), (21,52), (81,86), (91,23), (73,34), (97,41),...)
I wish to remove the tuples whose integer at index 0 is repeated, i.e. if Tuple(65,xx) is repeated as another Tuple(65,yy) inside the vector, it should be removed.
I am able to access them and print them out this way:
val (id1,id2) = ( allSource.foreach(i=>println(i._1)), allSource.foreach(i=>i._2))
How can I remove the duplicate integers? Or should I use another method, rather than foreach, to access the element at index 0?
To remove all duplicates, first group by the first element of each tuple (_._1) and collect only the groups that contain exactly one tuple for that key. Then flatten the result.
myVectorEG.groupBy(_._1).collect{
case (k, v) if v.size == 1 => v
}.flatten
This returns an Iterable (a List in practice), which you can call .toVector on if you need a Vector.
This does the job and preserves order (unlike other solutions) but is O(n^2) so potentially slow for 2000 elements:
myVectorEG.filter(x => myVectorEG.count(_._1 == x._1) == 1)
This is more efficient for larger vectors but still preserves order:
val keep =
myVectorEG.groupBy(_._1).collect{
case (k, v) if v.size == 1 => k
}.toSet
myVectorEG.filter(x => keep.contains(x._1))
You can use distinctBy (available since Scala 2.13) to remove duplicates; it keeps the first tuple for each key.
In the case of Vector[(Int, Int)] it will look like this:
myVectorEG.distinctBy(_._1)
Update: if you need to remove all the duplicates, you can use groupBy, but this will rearrange your order.
myVectorEG.groupBy(_._1).filter(_._2.size == 1).flatMap(_._2).toVector
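The two interpretations give different results. A small sketch, assuming Scala 2.13+ (for distinctBy and groupMapReduce) and a made-up sample vector: distinctBy keeps the first tuple for each key, while counting the keys once and then filtering drops every tuple whose key repeats, in a single linear pass plus a filter and without changing the order.

```scala
object DedupDemo {
  val v: Vector[(Int, Int)] = Vector((65, 61), (29, 49), (65, 10), (4, 57), (29, 3), (12, 49))

  // keep the first tuple seen for each key
  val keepFirst: Vector[(Int, Int)] = v.distinctBy(_._1)

  // count how often each key occurs, then keep only tuples with a unique key
  val counts: Map[Int, Int] = v.groupMapReduce(_._1)(_ => 1)(_ + _)
  val dropAll: Vector[(Int, Int)] = v.filter(t => counts(t._1) == 1)
}
```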
Another option, taking advantage of the fact that you want the list sorted at the end:
def sortAndRemoveDuplicatesByFirst[A : Ordering, B](input: List[(A, B)]): List[(A, B)] = {
  val sorted = input.sortBy(_._1)

  @annotation.tailrec
  def loop(remaining: List[(A, B)], previous: (A, B), repeated: Boolean, acc: List[(A, B)]): List[(A, B)] =
    remaining match {
      case x :: xs =>
        if (x._1 == previous._1)
          // same key as previous: mark the whole group as repeated
          loop(remaining = xs, previous = previous, repeated = true, acc = acc)
        else if (!repeated)
          // previous key was unique: keep it
          loop(remaining = xs, previous = x, repeated = false, acc = previous :: acc)
        else
          // previous key was repeated: drop it
          loop(remaining = xs, previous = x, repeated = false, acc = acc)
      case Nil =>
        // the last group must be dropped too if it was repeated
        (if (repeated) acc else previous :: acc).reverse
    }

  sorted match {
    case x :: xs =>
      loop(remaining = xs, previous = x, repeated = false, acc = List.empty)
    case Nil =>
      List.empty
  }
}
Which you can test like this:
val data = List(
1 -> "A",
3 -> "B",
1 -> "C",
4 -> "D",
3 -> "E",
5 -> "F",
1 -> "G",
0 -> "H"
)
sortAndRemoveDuplicatesByFirst(data)
// res: List[(Int, String)] = List((0,H), (4,D), (5,F))
(I used List instead of Vector to make it easy and performant to write the tail-rec algorithm)

How to elegantly extract range of list based on specific criteria?

I want to extract a range of elements from a list, meeting the following requirements:
The first element of the range has to be the element immediately before the first element matching a specific condition
The last element of the range has to be the element immediately after the last element matching the condition
Example: for the list (1,1,1,10,2,10,1,1,1) and the condition x >= 10 I want to get (1,10,2,10,1)
This is very simple to program imperatively, but I am just wondering if there is some smart Scala-functional way to achieve it. Is it?
Keeping to the Scala standard library, I would solve this using recursion:
def f(_xs: List[Int])(cond: Int => Boolean): List[Int] = {
def inner(xs: List[Int], res: List[Int]): List[Int] = xs match {
case Nil => Nil
case x :: y :: tail if cond(y) && res.isEmpty => inner(tail, res ++ (x :: y :: Nil))
case x :: y :: tail if cond(x) && res.nonEmpty => res ++ (x :: y :: Nil)
case x :: tail if res.nonEmpty => inner(tail, res :+ x)
case x :: tail => inner(tail, res)
}
inner(_xs, Nil)
}
scala> f(List(1,1,1,10,2,10,1,1,1))(_ >= 10)
res3: List[Int] = List(1, 10, 2, 10, 1)
scala> f(List(2,10,2,10))(_ >= 10)
res4: List[Int] = List()
scala> f(List(2,10,2,10,1))(_ >= 10)
res5: List[Int] = List(2, 10, 2, 10, 1)
Maybe there is a case I did not think of in this solution, or I misunderstood something, but I think you get the basic idea.
Good functional algorithm design practice is all about breaking complex problems into simpler ones.
The principle is called Divide and Conquer.
It's easy to extract two simpler subproblems from the subject problem:
Get a list of all elements after the first matching one, preceded by this matching element, preceded by the element before it.
Get a list of all elements up to the last matching one, followed by the matching element and the element after it.
The named problems are simple enough for the appropriate functions to be implemented, so no subdivision is required.
Here's the implementation of the first function:
def afterWithPredecessor
[ A ]
( elements : List[ A ] )
( test : A => Boolean )
: List[ A ]
= elements match {
case Nil => Nil
case a :: tail if test( a ) => Nil // since there is no predecessor
case a :: b :: tail if test( b ) => a :: b :: tail
case a :: tail => afterWithPredecessor( tail )( test )
}
Since the second problem can be seen as a direct inverse of the first one, it can be easily implemented by reversing the input and output:
def beforeWithSuccessor
[ A ]
( elements : List[ A ] )
( test : A => Boolean )
: List[ A ]
= afterWithPredecessor( elements.reverse )( test ).reverse
But here's an optimized version of this:
def beforeWithSuccessor
[ A ]
( elements : List[ A ] )
( test : A => Boolean )
: List[ A ]
= elements match {
case Nil => Nil
case a :: b :: tail if test( a ) =>
a :: b :: beforeWithSuccessor( tail )( test )
case a :: tail =>
beforeWithSuccessor( tail )( test ) match {
case Nil => Nil
case r => a :: r
}
}
Finally, composing the above functions together to produce the function solving your problem becomes quite trivial:
def range[ A ]( elements : List[ A ] )( test : A => Boolean ) : List[ A ]
= beforeWithSuccessor( afterWithPredecessor( elements )( test ) )( test )
Tests:
scala> range( List(1,1,1,10,2,10,1,1,1) )( _ >= 10 )
res0: List[Int] = List(1, 10, 2, 10, 1)
scala> range( List(1,1,1,10,2,10,1,1,1) )( _ >= 1 )
res1: List[Int] = List()
scala> range( List(1,1,1,10,2,10,1,1,1) )( _ == 2 )
res2: List[Int] = List(10, 2, 10)
The second test returns an empty list since the outermost elements satisfying the predicate have no predecessors (or successors).
def range[T](elements: List[T], condition: T => Boolean): List[T] = {
  val first = elements.indexWhere(condition)
  val last = elements.lastIndexWhere(condition)
  // without this guard, no match gives slice(-2, 1), i.e. the first element
  if (first < 0) Nil
  else elements.slice(first - 1, last + 2)
}
scala> range[Int](List(1,1,1,10,2,10,1,1,1), _ >= 10)
res0: List[Int] = List(1, 10, 2, 10, 1)
scala> range[Int](List(2,10,2,10), _ >= 10)
res1: List[Int] = List(2, 10, 2, 10)
scala> range[Int](List(), _ >= 10)
res2: List[Int] = List()
Zip and map to the rescue
val l = List(1, 1, 1, 10, 2, 1, 1, 1)
def test (i: Int) = i >= 10
((l.head :: l) zip (l.tail :+ l.last)) zip l filter {
case ((a, b), c) => (test (a) || test (b) || test (c) )
} map { case ((a, b), c ) => c }
That should work. I only have my smartphone and am miles from anywhere I could test this, so apologies for any typos or minor syntax errors.
Edit: works now. My solution shifts the list one position to the right and one position to the left to create two new lists. When these are zipped together and then zipped again with the original list, the result is a list of tuples, each containing a tuple of the element's neighbours and the original element. This is then trivial to filter and map back to a simple list.
Making this into a more general function (and using collect rather than filter -> map)...
def filterWithNeighbours[E](l: List[E])(p: E => Boolean) = l match {
case Nil => Nil
case li if li.size < 3 => if (l exists p) l else Nil
case _ => ((l.head :: l) zip (l.tail :+ l.last)) zip l collect {
case ((a, b), c) if (p (a) || p (b) || p (c) ) => c
}
}
This is less efficient than the recursive solution but makes the test much simpler and more clear. It can be difficult to match the right sequence of patterns in a recursive solution, as the patterns often express the shape of the chosen implementation rather than the original data. With the simple functional solution, each element is clearly and simply being compared to its neighbours.
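As a variation on the same idea, here is a sketch (the name filterWithNeighboursIdx is hypothetical) that looks at neighbours by index on a Vector instead of zipping shifted copies, so no padding with head and last is needed. Like filterWithNeighbours, it keeps every element that matches or sits directly next to a match, which is not always the same as the contiguous range between the first and last match that the question asks for.

```scala
object NeighbourFilter {
  // Keep each element that satisfies p or sits directly next to one that does.
  def filterWithNeighboursIdx[E](l: Vector[E])(p: E => Boolean): Vector[E] = {
    val hits: Vector[Boolean] = l.map(p) // evaluate p once per element
    // out-of-range indices simply count as "no match"
    def hit(i: Int): Boolean = i >= 0 && i < l.size && hits(i)
    l.indices.collect { case i if hit(i - 1) || hit(i) || hit(i + 1) => l(i) }.toVector
  }
}
```

For Vector(1, 10, 1, 1, 1, 10, 1) and _ >= 10 it drops the middle 1, whereas the range-based answers keep it.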

Confused about merge sort implementation

What is occurring on this line? It looks like x is being concatenated to xs1, but x and xs1 are not defined anywhere.
case (x :: xs1, y :: ys1) =>
Also here, what values do x and y have below? Is merge being called recursively as part of the case statement?
if( x < y) x :: merge(xs1 , ys)
Here is the complete Scala code :
object mergesort {
def msort(xs: List[Int]): List[Int] = {
val n = xs.length / 2
if(n == 0) xs
else {
def merge(xs: List[Int], ys: List[Int]): List[Int] = (xs , ys) match {
case (Nil, ys) => ys
case (xs, Nil) => xs
case (x :: xs1, y :: ys1) =>
if( x < y) x :: merge(xs1 , ys)
else y :: merge(xs, ys1)
}
val (fst, snd) = xs splitAt n
merge(msort(fst), msort(snd))
}
} //> msort: (xs: List[Int])List[Int]
val nums = List(2, -4, 5, 7, 1) //> nums : List[Int] = List(2, -4, 5, 7, 1)
msort(nums) //> res0: List[Int] = List(-4, 1, 2, 5, 7)
}
In
case (x :: xs1, y :: ys1) =>
x :: xs1 is infix notation for the pattern ::(x, xs1), which deconstructs a list into its head and tail:
the list xs is deconstructed into head x and tail xs1.
In pattern matching, :: deconstructs a list, the exact reverse of what it does in a normal expression, where it constructs a list.
Read De-Constructing objects in The Point of Pattern Matching in Scala
This
(xs , ys) match {
...
case (x :: xs1, y :: ys1)
is a pattern match that declares the variables x, xs1 etc. in the same statement as asserting a sequence match.
The code above is checking that xs can be decomposed into a sequence with head x and tail xs1, and if so, making the head/tail available to the successive code block in those two variables.
To answer your second question (since nobody else has!), yes, the merge function (declared within the outer function) is being called recursively.
Here's an example of how scala allows you to do pattern matching on a List:
scala> List(1,2,3)
res0: List[Int] = List(1, 2, 3)
scala> res0 match {
| case h :: t => "at least one element, " + h + " is the first"
| case _ => "empty list"
| }
res1: java.lang.String = at least one element, 1 is the first
Note that :: on the left side of the case decomposes the list into its head (1) and its tail (the rest of the list, 2, 3) and binds them to h and t, which are created and scoped only inside the first case.
Here's how you decompose a tuple:
scala> val tp = ("a", 1)
tp: (java.lang.String, Int) = (a,1)
scala> tp match {
| case (a, b) => a + " is a string, " + b + " is a number"
| case _ => "something missing"
| }
res2: java.lang.String = a is a string, 1 is a number
In the code in your question you're combining both things, pattern matching on a tuple of Lists (xs , ys).
case (x :: xs1, y :: ys1) both decomposes the tuple into its two lists and decomposes each list into its head and tail.
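A small sketch (the function describe is made up for illustration) shows what the nested pattern binds when a tuple of lists is matched: the names inside the pattern are fresh variables introduced by the case itself, just as in merge.

```scala
object PatternDemo {
  def describe(xs: List[Int], ys: List[Int]): String = (xs, ys) match {
    case (Nil, _) => "xs is empty"
    case (_, Nil) => "ys is empty"
    // x, xs1, y and ys1 are new variables bound by this pattern
    case (x :: xs1, y :: ys1) => s"x=$x xs1=$xs1 y=$y ys1=$ys1"
  }
}
```

For example, describe(List(1, 3, 5), List(2, 4)) returns "x=1 xs1=List(3, 5) y=2 ys1=List(4)".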
The match-case keywords are used in scala to perform pattern matching, which is a way to match/decompose objects using several mechanisms like case classes and extractors. Google for scala pattern matching and you'll find the answers you need.

Scala Get First and Last elements of List using Pattern Matching

I am doing pattern matching on a list. Is there any way I can access the first and last elements of the list to compare them?
I want to do something like:
case List(x, _*, y) if(x == y) => true
or
case x :: _* :: y =>
or something similar...
where x and y are the first and last elements of the list.
How can I do that? Any ideas?
Use the standard :+ and +: extractors from the scala.collection package
ORIGINAL ANSWER
Define a custom extractor object.
object :+ {
  def unapply[A](l: List[A]): Option[(List[A], A)] =
    if (l.isEmpty) None
    else Some((l.init, l.last))
}
Can be used as:
val first :: (l :+ last) = List(3, 89, 11, 29, 90)
println(first + " " + l + " " + last) // prints 3 List(89, 11, 29) 90
(For your case: case x :: (_ :+ y) if(x == y) => true)
In case you missed the obvious:
case list @ (head :: tail) if head == list.last => true
The head::tail part is there so you don’t match on the empty list.
simply:
case head +: _ :+ last =>
for example:
scala> val items = Seq("ham", "spam", "eggs")
items: Seq[String] = List(ham, spam, eggs)
scala> items match {
| case head +: _ :+ last => Some((head, last))
| case List(head) => Some((head, head))
| case _ => None
| }
res0: Option[(String, String)] = Some((ham,eggs))
Let's look at the concepts behind this question: there is a difference between '::', '+:' and ':+':
1st Operator:
'::' - a right-associative operator that works only on Lists
scala> val a :: b :: c = List(1,2,3,4)
a: Int = 1
b: Int = 2
c: List[Int] = List(3, 4)
2nd Operator:
'+:' - also a right-associative operator, but it works on Seq, which is more general than just List.
scala> val a +: b +: c = List(1,2,3,4)
a: Int = 1
b: Int = 2
c: List[Int] = List(3, 4)
3rd Operator:
':+' - a left-associative operator; like +:, it works on any Seq, not just List
scala> val a :+ b :+ c = List(1,2,3,4)
a: List[Int] = List(1, 2)
b: Int = 3
c: Int = 4
The associativity of an operator is determined by the operator’s last character: operators ending in a colon ‘:’ are right-associative, and all other operators are left-associative.
A left-associative binary operation e1 op e2 is interpreted as e1.op(e2).
If op is right-associative, the same operation is interpreted as { val x=e1; e2.op(x) }, where x is a fresh name.
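The rule can be checked by writing the method calls out by hand. A minimal sketch with arbitrary values:

```scala
object AssocDemo {
  // Right-associative: 1 :: 2 :: Nil is parsed as 1 :: (2 :: Nil),
  // and each e1 :: e2 is evaluated as e2.::(e1)
  val sugared: List[Int] = 1 :: 2 :: Nil
  val desugared: List[Int] = Nil.::(2).::(1)

  // Left-associative: List(1) :+ 2 :+ 3 is (List(1) :+ 2) :+ 3
  val appended: List[Int] = List(1) :+ 2 :+ 3
  val appendedExplicit: List[Int] = List(1).:+(2).:+(3)

  // +: ends in ':' so it is right-associative: 0 +: List(1) is List(1).+:(0)
  val prepended: List[Int] = 0 +: List(1)
  val prependedExplicit: List[Int] = List(1).+:(0)
}
```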
Now for the answer to your question:
If you need to get the first and last elements of a list, use the following code:
scala> val firstElement +: b :+ lastElement = List(1,2,3,4)
firstElement: Int = 1
b: List[Int] = List(2, 3)
lastElement: Int = 4
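Putting it together for the original question, here is a sketch (the name firstEqualsLast is hypothetical) that compares the first and last elements and handles the single-element and empty cases explicitly. Because +: binds more tightly than :+ (precedence is decided by an operator's first character, and + ranks above :), head +: _ :+ last is read as (head +: _) :+ last, so it only matches sequences with at least two elements.

```scala
object FirstLastDemo {
  // +: and :+ here are the standard extractors (aliases in the scala package)
  def firstEqualsLast[A](xs: Seq[A]): Boolean = xs match {
    case head +: _ :+ last => head == last // two or more elements
    case Seq(_)            => true         // one element is both first and last
    case _                 => false        // empty sequence
  }
}
```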