Scala: Remove duplicated integers from a Vector of (Int, Int) tuples

I have a large Vector (about 2000 elements) of tuples of type (Int, Int), e.g.
val myVectorEG = Vector((65,61), (29,49), (4,57), (12,49), (24,98), (21,52), (81,86), (91,23), (73,34), (97,41), ...)
I want to remove the tuples whose first element (index 0) is duplicated, i.e. if a tuple (65, xx) has the same first element as another tuple (65, yy) in the vector, it should be removed.
I am able to access the elements and print them like this:
val (id1,id2) = ( allSource.foreach(i=>println(i._1)), allSource.foreach(i=>i._2))
How can I remove the duplicated integers? Or should I use a method other than foreach to access the element at index 0 of each tuple?

To remove all duplicates, first group the tuples by their first element (_._1), then collect only the groups that contain exactly one tuple, and finally flatten the result.
myVectorEG.groupBy(_._1).collect{
case (k, v) if v.size == 1 => v
}.flatten
This returns an Iterable (a List in practice), on which you can call .toVector if you need a Vector. Note that groupBy does not preserve the original order.

This does the job and preserves order (unlike other solutions) but is O(n^2) so potentially slow for 2000 elements:
myVectorEG.filter(x => myVectorEG.count(_._1 == x._1) == 1)
This is more efficient for larger vectors but still preserves order:
val keep =
myVectorEG.groupBy(_._1).collect{
case (k, v) if v.size == 1 => k
}.toSet
myVectorEG.filter(x => keep.contains(x._1))

You can use distinctBy (available since Scala 2.13) to remove duplicates; it keeps the first tuple seen for each key.
For a Vector[(Int, Int)] it looks like this:
myVectorEG.distinctBy(_._1)
Update: if you need to remove every tuple whose key is duplicated:
You can use groupBy, but this does not preserve the original order.
myVectorEG.groupBy(_._1).filter(_._2.size == 1).flatMap(_._2).toVector
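To make the difference between "keep the first per key" and "drop every duplicated key" concrete, here is a small sketch on made-up data. The object name DedupSketch and the sample values are only for illustration, and distinctBy assumes Scala 2.13 or later. The second variant counts the keys in one pass and then filters, so the original order is kept.

```scala
object DedupSketch {
  // Illustrative sample: keys 1 and 3 each occur more than once.
  val sample: Vector[(Int, Int)] = Vector((1, 10), (3, 20), (1, 30), (4, 40), (3, 50))

  // distinctBy keeps the FIRST tuple seen for each key.
  val firstPerKey: Vector[(Int, Int)] = sample.distinctBy(_._1)
  // Vector((1,10), (3,20), (4,40))

  // Count how often each key occurs...
  val counts: Map[Int, Int] =
    sample.foldLeft(Map.empty[Int, Int].withDefaultValue(0)) {
      case (acc, (k, _)) => acc.updated(k, acc(k) + 1)
    }

  // ...then keep only the tuples whose key occurs exactly once (order preserved).
  val uniqueOnly: Vector[(Int, Int)] = sample.filter { case (k, _) => counts(k) == 1 }
  // Vector((4,40))
}
```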

Another option, taking advantage of the fact that you want the list sorted at the end.
def sortAndRemoveDuplicatesByFirst[A : Ordering, B](input: List[(A, B)]): List[(A, B)] = {
  import Ordering.Implicits._

  val sorted = input.sortBy(_._1)

  @annotation.tailrec
  def loop(remaining: List[(A, B)], previous: (A, B), repeated: Boolean, acc: List[(A, B)]): List[(A, B)] =
    remaining match {
      case x :: xs =>
        if (x._1 == previous._1)
          loop(remaining = xs, previous, repeated = true, acc)
        else if (!repeated)
          loop(remaining = xs, previous = x, repeated = false, previous :: acc)
        else
          loop(remaining = xs, previous = x, repeated = false, acc)

      case Nil =>
        // Keep the last element only if its key was not repeated.
        if (repeated) acc.reverse else (previous :: acc).reverse
    }

  sorted match {
    case x :: xs =>
      loop(remaining = xs, previous = x, repeated = false, acc = List.empty)

    case Nil =>
      List.empty
  }
}
Which you can test like this:
val data = List(
1 -> "A",
3 -> "B",
1 -> "C",
4 -> "D",
3 -> "E",
5 -> "F",
1 -> "G",
0 -> "H"
)
sortAndRemoveDuplicatesByFirst(data)
// res: List[(Int, String)] = List((0,H), (4,D), (5,F))
(I used List instead of Vector to make it easy and performant to write the tail-rec algorithm)

Related

Zip two lists with diminishing lengths

I got the below problem statement that should be solved with scala:
There are two lists of elements, the first smaller than the second. For instance, list 1 has 2 elements and list 2 has 10 elements.
I need to map each element of list 1 to two elements of the second list. The elements used for the first element must not be reused for the second, i.e. each element of list 1 takes two unused elements from the second list, and the result is the list of mapped pairs together with the remaining elements of the second list.
scala> val list1 = List(1,2)
list1: List[Int] = List(1, 2)
scala> val list2 = List(3,4,5,6,7,8,9)
list2: List[Int] = List(3, 4, 5, 6, 7, 8, 9)
expected output
(List((1,3), (1,4), (2,5), (2,6)), List(7,8,9))
This is the kind of problem that I personally prefer to solve using a tail-recursive approach.
/** Zips two lists together by taking multiple elements from the second list
  * for each element of the first list.
  *
  * @param l1 The first (small) list.
  * @param l2 The second (big) list.
  * @param n The number of elements to take from the second list for each element of the first list,
  *          must be greater than zero.
  * @return A pair of the zipped list and the remaining elements of the second list,
  *         wrapped in an Option to cover the possibility that the second list was consumed before finishing.
  */
def zipWithLarger[A, B](l1: List[A], l2: List[B])(n: Int): Option[(List[(A, B)], List[B])] = {
  @annotation.tailrec
  def loop(remainingA: List[A], remainingB: List[B], count: Int, acc: List[(A, B)]): Option[(List[(A, B)], List[B])] =
    (remainingA, remainingB) match {
      case (a :: as, b :: bs) =>
        val newElement = (a, b)
        if (count == n)
          loop(remainingA = as, remainingB = bs, count = 1, newElement :: acc)
        else
          loop(remainingA, remainingB = bs, count + 1, newElement :: acc)

      case (Nil, _) =>
        Some(acc.reverse -> remainingB)

      case (_, Nil) =>
        // We consumed the second list before finishing the first one.
        None
    }

  // Ensure n is positive.
  if (n >= 1) loop(remainingA = l1, remainingB = l2, count = 1, acc = List.empty)
  else None
}
Start by repeating each element of the first list n times (here n = 2) with flatMap, then zip the result with the second collection:
val n = 2 // number of elements of list2 to pair with each element of list1
val list1 = List(1,2)
val list2 = List(3,4,5,6,7,8,9)
val zipped = list1
  .flatMap(i => List.fill(n)(i))
  .zip(list2)
val result = (zipped, list2.drop(zipped.size))
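The repeat-and-zip idea can be wrapped in a small function that takes n explicitly. This is only a sketch: zipRepeated and ZipRepeatedSketch are names I made up, and unlike the Option-returning tail-recursive version it silently truncates when the second list is too short, because zip stops at the shorter list.

```scala
object ZipRepeatedSketch {
  // Hypothetical helper: pairs each element of `small` with the next `n` elements
  // of `big` and returns the pairs together with the unused tail of `big`.
  def zipRepeated[A, B](small: List[A], big: List[B], n: Int): (List[(A, B)], List[B]) = {
    val repeated = small.flatMap(a => List.fill(n)(a)) // each element n times
    val zipped = repeated.zip(big)                     // zip stops at the shorter list
    (zipped, big.drop(zipped.size))
  }

  val result = zipRepeated(List(1, 2), List(3, 4, 5, 6, 7, 8, 9), 2)
  // (List((1,3), (1,4), (2,5), (2,6)), List(7, 8, 9))
}
```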

Scala Split Seq of Int from right when cumulative results meet the condition

I have a sequence of positive integers and need to split it from the right, so that the right-hand part is the longest run of trailing elements whose sum is less than or equal to a threshold. For example,
val seq = Seq(9,8,7,6,5,4,3,2,1)
The threshold is 10, so the result is
Seq(9,8,7,6,5) and Seq(4,3,2,1)
I tried dropWhile and scanLeft after reverse; however, the solutions are either quadratic, or linear but complicated. Our sequence may be very long, but the threshold is normally small, so very few elements from the right side will meet the condition. I am wondering if there is a better, linear way to do it.
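This will stop as soon as adding the next element would exceed the threshold. Unfortunately it uses a non-local return to break out of the fold, and it returns only the right-hand part together with its sum.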
val seq = Seq(9,8,7,6,5,4,3,2,1)
val threshold = 10
def processList(): (Seq[Int], Int) = {
seq.foldRight((Seq[Int](), 0)) {
case (elem, (acc, total)) =>
if (total + elem <= threshold) {
(elem +: acc, total + elem)
} else {
return (acc, total)
}
}
}
processList()
Looks like there's not a great way to do this with built-in methods, but you can implement it yourself:
def splitRightCumulative[A, B](xs: Seq[A])(start: B)(f: (B, A) => B, cond: B => Boolean): (Seq[A], Seq[A]) = {
  @annotation.tailrec
  def _loop(current: B, xs: Seq[A], acc: Seq[A]): (Seq[A], Seq[A]) =
    if (xs.isEmpty) {
      // Every element satisfied the condition, so the left part is empty.
      (Seq.empty, acc)
    } else {
      val next = f(current, xs.head)
      if (cond(next)) _loop(next, xs.tail, xs.head +: acc)
      else (xs.reverse, acc)
    }
  _loop(start, xs.reverse, Seq.empty)
}
val xs = List(9, 8, 7, 6, 5, 4, 3, 2, 1)
val (left, right) = splitRightCumulative(xs)(0)(_ + _, _ <= 10)
The second type parameter (B) might not be necessary if you're always accumulating the same type as what's in your collection.
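Since the question says that only a few elements on the right usually fit under the threshold, another way to sketch this is to walk a lazy reverse iterator and count how many trailing elements fit, then split once. splitFromRight and SplitFromRightSketch are made-up names, and I am assuming an indexed collection such as Vector so that reverseIterator and splitAt are cheap.

```scala
object SplitFromRightSketch {
  // Hypothetical helper: the right part is the longest suffix whose sum stays
  // <= threshold. Assumes positive integers, as in the question.
  def splitFromRight(xs: Vector[Int], threshold: Int): (Vector[Int], Vector[Int]) = {
    val suffixLength = xs.reverseIterator
      .scanLeft(0)(_ + _)        // running sums from the right, starting with 0
      .drop(1)                   // skip the initial 0
      .takeWhile(_ <= threshold) // lazy: stops at the first sum above the threshold
      .size
    xs.splitAt(xs.length - suffixLength)
  }

  val (left, right) = splitFromRight(Vector(9, 8, 7, 6, 5, 4, 3, 2, 1), 10)
  // left  = Vector(9, 8, 7, 6, 5)
  // right = Vector(4, 3, 2, 1)
}
```

Because the iterator is lazy, only the fitting suffix plus one extra element is inspected, regardless of how long the sequence is.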

How to elegantly extract range of list based on specific criteria?

I want to extract a range of elements from a list that meets the following requirements:
The first element of the range has to be the element just before an element matching a specific condition
The last element of the range has to be the element just after an element matching that condition
Example: For list (1,1,1,10,2,10,1,1,1) and condition x >= 10 I want to get (1,10,2,10,1)
This is very simple to program imperatively, but I am wondering if there is a smart, functional way to achieve it in Scala. Is there?
Keeping it in the scala standard lib, I would solve this using recursion:
def f(_xs: List[Int])(cond: Int => Boolean): List[Int] = {
def inner(xs: List[Int], res: List[Int]): List[Int] = xs match {
case Nil => Nil
case x :: y :: tail if cond(y) && res.isEmpty => inner(tail, res ++ (x :: y :: Nil))
case x :: y :: tail if cond(x) && res.nonEmpty => res ++ (x :: y :: Nil)
case x :: tail if res.nonEmpty => inner(tail, res :+ x)
case x :: tail => inner(tail, res)
}
inner(_xs, Nil)
}
scala> f(List(1,1,1,10,2,10,1,1,1))(_ >= 10)
res3: List[Int] = List(1, 10, 2, 10, 1)
scala> f(List(2,10,2,10))(_ >= 10)
res4: List[Int] = List()
scala> f(List(2,10,2,10,1))(_ >= 10)
res5: List[Int] = List(2, 10, 2, 10, 1)
Maybe there is something I did not think of in this solution, or I misunderstood something, but I think you will get the basic idea.
Good functional algorithm design practice is all about breaking complex problems into simpler ones.
The principle is called Divide and Conquer.
It's easy to extract two simpler subproblems from the problem at hand:
1. Get the list of all elements after the first matching one, preceded by that matching element and by the element before it.
2. Get the list of all elements up to the last matching one, followed by that matching element and the element after it.
The named problems are simple enough for the appropriate functions to be implemented, so no subdivision is required.
Here's the implementation of the first function:
def afterWithPredecessor
[ A ]
( elements : List[ A ] )
( test : A => Boolean )
: List[ A ]
= elements match {
case Nil => Nil
case a :: tail if test( a ) => Nil // since there is no predecessor
case a :: b :: tail if test( b ) => a :: b :: tail
case a :: tail => afterWithPredecessor( tail )( test )
}
Since the second problem can be seen as a direct inverse of the first one, it can be easily implemented by reversing the input and output:
def beforeWithSuccessor
[ A ]
( elements : List[ A ] )
( test : A => Boolean )
: List[ A ]
= afterWithPredecessor( elements.reverse )( test ).reverse
But here's an optimized version of this:
def beforeWithSuccessor
[ A ]
( elements : List[ A ] )
( test : A => Boolean )
: List[ A ]
= elements match {
case Nil => Nil
case a :: b :: tail if test( a ) =>
a :: b :: beforeWithSuccessor( tail )( test )
case a :: tail =>
beforeWithSuccessor( tail )( test ) match {
case Nil => Nil
case r => a :: r
}
}
Finally, composing the above functions together to produce the function solving your problem becomes quite trivial:
def range[ A ]( elements : List[ A ] )( test : A => Boolean ) : List[ A ]
= beforeWithSuccessor( afterWithPredecessor( elements )( test ) )( test )
Tests:
scala> range( List(1,1,1,10,2,10,1,1,1) )( _ >= 10 )
res0: List[Int] = List(1, 10, 2, 10, 1)
scala> range( List(1,1,1,10,2,10,1,1,1) )( _ >= 1 )
res1: List[Int] = List()
scala> range( List(1,1,1,10,2,10,1,1,1) )( _ == 2 )
res2: List[Int] = List(10, 2, 10)
The second test returns an empty list since the outermost elements satisfying the predicate have no predecessors (or successors).
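def range[T](elements: List[T], condition: T => Boolean): List[T] = {
  val first = elements.indexWhere(condition)
  val last = elements.lastIndexWhere(condition)
  // indexWhere returns -1 when nothing matches; without this guard slice(-2, 1) would return the first element.
  if (first < 0) Nil else elements.slice(first - 1, last + 2)
}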
scala> range[Int](List(1,1,1,10,2,10,1,1,1), _ >= 10)
res0: List[Int] = List(1, 10, 2, 10, 1)
scala> range[Int](List(2,10,2,10), _ >= 10)
res1: List[Int] = List(2, 10, 2, 10)
scala> range[Int](List(), _ >= 10)
res2: List[Int] = List()
Zip and map to the rescue
val l = List(1, 1, 1, 10, 2, 1, 1, 1)
def test (i: Int) = i >= 10
((l.head :: l) zip (l.tail :+ l.last)) zip l filter {
case ((a, b), c) => (test (a) || test (b) || test (c) )
} map { case ((a, b), c ) => c }
That should work. I only have my smartphone and am miles from anywhere I could test this, so I apologise for any typos or minor syntax errors.
Edit: works now. I hope it's obvious that my solution shuffles the list to the right and to the left to create two new lists. When these are zipped together and zipped again with the original list, the result is a list of tuples, each containing the original element and a tuple of its neighbours. This is then trivial to filter and map back to a simple list.
Making this into a more general function (and using collect rather than filter -> map)...
def filterWithNeighbours[E](l: List[E])(p: E => Boolean) = l match {
case Nil => Nil
case li if li.size < 3 => if (l exists p) l else Nil
case _ => ((l.head :: l) zip (l.tail :+ l.last)) zip l collect {
case ((a, b), c) if (p (a) || p (b) || p (c) ) => c
}
}
This is less efficient than the recursive solution but makes the test much simpler and more clear. It can be difficult to match the right sequence of patterns in a recursive solution, as the patterns often express the shape of the chosen implementation rather than the original data. With the simple functional solution, each element is clearly and simply being compared to its neighbours.
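To see what the two shifted lists and the double zip actually produce, here is a tiny sketch on a three-element list. The object name NeighbourZipSketch and the val names are mine, chosen only for illustration.

```scala
object NeighbourZipSketch {
  val l = List(1, 10, 2)

  // Position i holds the predecessor of l(i); the first element is its own predecessor.
  val shiftedRight = l.head :: l      // List(1, 1, 10, 2)
  // Position i holds the successor of l(i); the last element is its own successor.
  val shiftedLeft = l.tail :+ l.last  // List(10, 2, 2)

  // ((predecessor, successor), element); zip truncates to the shortest list.
  val withNeighbours = (shiftedRight zip shiftedLeft) zip l
  // List(((1,10),1), ((1,2),10), ((10,2),2))
}
```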

Calculating differences of subsequent elements of a sequence in scala

I would like to do almost exactly this in scala. Is there an elegant way?
Specifically, I just want the difference of adjacent elements in a sequence. For example
input = 1,2,6,9
output = 1,4,3
How about this?
scala> List(1, 2, 6, 9).sliding(2).map { case Seq(x, y, _*) => y - x }.toList
res0: List[Int] = List(1, 4, 3)
Here is one that uses recursion and works best on Lists
def differences(l:List[Int]) : List[Int] = l match {
  case a :: (rest @ b :: _) => (b - a) :: differences(rest)
  case _ => Nil
}
And here is one that should be pretty fast on Vector or Array:
def differences(a:IndexedSeq[Int]) : IndexedSeq[Int] =
a.indices.tail.map(i => a(i) - a(i-1))
Of course there is always this:
def differences(a:Seq[Int]) : Seq[Int] =
a.tail.zip(a).map { case (x,y) => x - y }
Note that only the recursive version handles empty lists without an exception.
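If the exception on empty input is a concern, one possible variation of the zip version is to use drop(1) instead of tail, since drop simply returns an empty sequence when there is nothing to drop. A minimal sketch (the object name DifferencesSketch is made up):

```scala
object DifferencesSketch {
  // drop(1) returns an empty sequence for empty input instead of throwing, unlike tail.
  // zip then pairs each element with its successor and stops at the shorter side.
  def differences(xs: Seq[Int]): Seq[Int] =
    xs.zip(xs.drop(1)).map { case (a, b) => b - a }
}
```

DifferencesSketch.differences(Seq(1, 2, 6, 9)) gives Seq(1, 4, 3), and both the empty and the single-element sequence give an empty result.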

Scala - can a lambda parameter match a tuple?

Say I have a list like
val l = List((1, "blue"), (5, "red"), (2, "green"))
and I want to filter one of them out. I can do something like
val m = l.filter(item => {
  val (n, s) = item // "unpack" the tuple here
  n != 2
})
Is there any way I can "unpack" the tuple directly in the lambda's parameter list, instead of going through this intermediate item variable?
Something like the following would be ideal, but Eclipse tells me: wrong number of parameters; expected=1
val m = l.filter( (n, s) => n != 2 )
Any help would be appreciated. I am using Scala 2.9.0.1.
This is about the closest you can get:
val m = l.filter { case (n, s) => n != 2 }
It's basically pattern matching syntax inside an anonymous PartialFunction. There are also the tupled methods on the Function object and the FunctionN traits, but they are just wrappers around this pattern matching expression.
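For completeness, here is roughly what the tupled route looks like. This is just a sketch: the names TupledSketch and notTwo are made up, and I give the two-parameter function an explicit type so the compiler does not have to infer the parameter types.

```scala
object TupledSketch {
  val l = List((1, "blue"), (5, "red"), (2, "green"))

  // An ordinary two-parameter function...
  val notTwo: (Int, String) => Boolean = (n, s) => n != 2

  // ...adapted to take a single Tuple2, via the method on Function2
  val m1 = l.filter(notTwo.tupled)
  // or via the helper on the Function object.
  val m2 = l.filter(Function.tupled(notTwo))
  // Both give List((1,blue), (5,red))
}
```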
Although Kipton has a good answer, you can actually make this even shorter:
val l = List((1, "blue"), (5, "red"), (2, "green"))
val m = l.filter(_._1 != 2)
There are a bunch of options:
for (x <- l; (n,s) = x if (n != 2)) yield x
l.collect{ case x @ (n,s) if (n != 2) => x }
l.filter{ case (n,s) => n != 2 }
l.unzip.zipped.map((n,s) => n != 2).zip // Complains that zip is deprecated
val m = l.filter( (n, s) => n != 2 )
... is a type mismatch because that lambda defines a
Function2[Int,String,Boolean] with two parameters instead of
Function1[(Int,String),Boolean] with one Tuple2[Int,String] as its parameter.
You can convert between them like this (the parameter types have to be written out, since they cannot be inferred here):
val m = l.filter( ((n: Int, s: String) => n != 2).tupled )
I've pondered the same, and came to your question today.
I'm not very fond of the partial function approaches (anything having case) since they imply that there could be more entry points for the logic flow. At least to me, they tend to blur the intention of the code. On the other hand, I really do want to go straight to the tuple fields, like you.
Here's a solution I drafted today. It seems to work, but I haven't tried it in production, yet.
object unTuple {
def apply[A, B, X](f: (A, B) => X): (Tuple2[A, B] => X) = {
(t: Tuple2[A, B]) => f(t._1, t._2)
}
def apply[A, B, C, X](f: (A, B, C) => X): (Tuple3[A, B, C] => X) = {
(t: Tuple3[A, B, C]) => f(t._1, t._2, t._3)
}
//...
}
val list = List( ("a",1), ("b",2) )
val list2 = List( ("a",1,true), ("b",2,false) )
list foreach unTuple( (k: String, v: Int) =>
println(k, v)
)
list2 foreach unTuple( (k: String, v: Int, b: Boolean) =>
println(k, v, b)
)
Output:
(a,1)
(b,2)
(a,1,true)
(b,2,false)
Maybe this turns out to be useful. The unTuple object should naturally be put aside in some tool namespace.
Addendum:
Applied to your case:
val m = l.filter( unTuple( (n:Int,color:String) =>
n != 2
))