Scala collections: map a list and carry some state?

I seem to run into this problem all the time. I want to modify some of the elements in a list, but I need to keep some state as I do it, so map doesn't work.
Here is an example :
scala> val l1 = List("a","b","c","d","e","f","b","c","e","b","a")
l1: List[String] = List(a, b, c, d, e, f, b, c, e, b, a)
I want to change the name of any duplicates, so I want to end up with this:
List(a1, b1, c1, d, e1, f, b2, c2, e2, b3, a2)
Getting the dupes is easy :
scala> val d = l1.diff(l1.distinct).distinct
d: List[String] = List(b, c, e, a)
Now I'm stuck. I made it work by converting d to a HashMap w/ a count, and writing a function to iterate over l1 and update it & the hash before recursing. Which works fine, but looks kinda ugly to me.
But I've always thought there should be a way to do w/ the collection classes.
Here is the rest of my solution which I don't like :
val m = d.map( _ -> 1).toMap
def makeIt(ms: Map[String, Int], ol: Iterator[String], res: List[String]=List[String]()) :List[String] = {
if( !ol.hasNext) return res
val no = ol.next()
val (p,nm) = ms.get(no) match {
case Some(v) => (s"$no$v", ms.updated(no,v+1))
case None => (no,ms)
}
makeIt(nm,ol,res :+ p)
}
makeIt(m,l1.iterator)
Which gives me what I want
res2: List[String] = List(a1, b1, c1, d, e1, f, b2, c2, e2, b3, a2)
I feel like I want "mapWithState" where I can just pass something along. Like Fold-ish. Maybe it exists and I just haven't found it yet?
Thanks
-------UPDATE---------
@Aluan Haddad's comment pointed me in this direction. It destroys order, which is fine for my case, but the "state" is carried by zipWithIndex. I'm looking for a more general case where the state would require some computation at each element. But for this simple case I like it:
l1.groupBy(x=>x).values.flatMap( v =>{
if( v.length <= 1 ) v else {
v.zipWithIndex.map{ case (s,i) => s"$s${i+1}"}
}
})
res7: Iterable[String] = List(e1, e2, f, a1, a2, b1, b2, b3, c1, c2, d)

The tricky part is that the "d" and "f" elements get no modification.
This is what I came up with. It's a bit more concise, code wise, but does involve multiple traversals.
val l1: List[String] = List("a","b","c","d","e","f","b","c","e","b","a")
l1.reverse.tails.foldLeft(List[String]()){
case (res, Nil) => res
case (res, hd::tl) =>
val count = tl.count(_ == hd)
if (count > 0) s"$hd${count+1}" +: res
else if (res.contains(hd+2)) (hd+1) +: res
else hd +: res
}
//res0: List[String] = List(a1, b1, c1, d, e1, f, b2, c2, e2, b3, a2)
By using tails, each element, hd, is able to see the future, tl, and the past, res.
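To make the "future" part concrete, here is a small illustration of what tails yields for a short list. The names xs and suffixes are only for this sketch; each suffix starts with the current element, followed by everything that comes after it.

```scala
val xs = List("a", "b", "c")

// tails returns an Iterator over the list and each of its successive suffixes,
// ending with the empty list (which is why the fold above needs a Nil case).
val suffixes: List[List[String]] = xs.tails.toList
// List(List(a, b, c), List(b, c), List(c), List())
```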

A simple but slow version
l1.zipWithIndex.map{ case (elem, i) =>
if (l1.count(_ == elem) == 1) {
elem
} else {
val n = {l1.take(i+1).count(_ == elem)}
s"$elem$n"
}
}
The next version is longer, less pretty, and not functional, but should be faster in the unlikely case that you are processing a very long list.
import scala.collection.mutable
def makeUniq(in: Seq[String]): Seq[String] = {
// Count occurrence of each element
val m = mutable.Map.empty[String, Int]
for (elem <- in) {
m.update(elem, m.getOrElseUpdate(elem, 0) + 1)
}
// Remove elements with a single occurrence
val dupes = m.filter(_._2 > 1)
// Apply numbering to duplicate elements
in.reverse.map(e => {
val idx = dupes.get(e) match {
case Some(i) =>
dupes.update(e, i - 1)
i.toString
case _ =>
""
}
s"$e$idx"
}).reverse
}
The code is easier if you wanted to apply a count to every element rather than just the non-unique ones.
import scala.collection.mutable
def makeUniq(in: Seq[String]): Seq[String] = {
val m = mutable.Map.empty[String, Int]
in.map{ e =>
val i = m.getOrElseUpdate(e, 0) + 1
m.update(e, i)
s"$e$i"
}
}
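As far as I know there is no "mapWithState" in the standard library, but the idea from the question can be sketched on top of foldLeft. The helper name mapWithState and its signature are my own assumption, not an existing API: the fold carries a pair of (state, results so far), and the function passed in returns the next state together with the mapped element.

```scala
// Hypothetical helper (not part of the standard library): map over a list
// while threading an immutable state value through every step.
def mapWithState[A, B, S](xs: List[A], init: S)(f: (S, A) => (S, B)): List[B] = {
  val (_, reversed) = xs.foldLeft((init, List.empty[B])) {
    case ((state, acc), x) =>
      val (next, out) = f(state, x)
      (next, out :: acc) // prepending is O(1); the list is reversed once at the end
  }
  reversed.reverse
}

val l1 = List("a", "b", "c", "d", "e", "f", "b", "c", "e", "b", "a")

// Only elements that occur more than once should receive a counter.
val dupes: Set[String] =
  l1.groupBy(identity).collect { case (k, v) if v.size > 1 => k }.toSet

// The state is a Map from element to the number of times it has been seen so far.
val renamed = mapWithState(l1, Map.empty[String, Int]) { (seen, s) =>
  if (dupes(s)) {
    val n = seen.getOrElse(s, 0) + 1
    (seen.updated(s, n), s"$s$n")
  } else (seen, s)
}
// List(a1, b1, c1, d, e1, f, b2, c2, e2, b3, a2)
```

This keeps the original order and does one pass to find the duplicates plus one pass to rename them. The state could be anything that needs computing per element, not just a counter.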

Related

Scala: Remove duplicated integers from Vector( tuples(Int,Int) , ...)

I have a large vector (about 2000 elements) consisting of many tuples, Tuple(Int,Int), i.e.
val myVectorEG = Vector((65,61), (29,49), (4,57), (12,49), (24,98), (21,52), (81,86), (91,23), (73,34), (97,41),...))
I wish to remove the repeated/duplicated integers at index 0 of every tuple, i.e. if Tuple(65, xx) is repeated as another Tuple(65, yy) inside the vector, it should be removed.
I am able to access them and print them out this way:
val (id1,id2) = ( allSource.foreach(i=>println(i._1)), allSource.foreach(i=>i._2))
How can I remove duplicate integers? Or should I use another method, rather than foreach, to access the element at index 0?
To remove all duplicates, first group by the first element of each tuple and collect only the groups that contain exactly one tuple for that particular key (_._1). Then flatten the result.
myVectorEG.groupBy(_._1).collect{
case (k, v) if v.size == 1 => v
}.flatten
This returns an Iterable (a List at runtime), on which you can call .toVector if you need a Vector.
This does the job and preserves order (unlike other solutions) but is O(n^2) so potentially slow for 2000 elements:
myVectorEG.filter(x => myVectorEG.count(_._1 == x._1) == 1)
This is more efficient for larger vectors but still preserves order:
val keep =
myVectorEG.groupBy(_._1).collect{
case (k, v) if v.size == 1 => k
}.toSet
myVectorEG.filter(x => keep.contains(x._1))
You can use distinctBy (Scala 2.13+) to remove duplicates; it keeps the first tuple seen for each key.
In the case of Vector[(Int, Int)] it will look like this:
myVectorEG.distinctBy(_._1)
Updated, if you need to remove all the duplicates:
You can use groupBy but this will rearrange your order.
myVectorEG.groupBy(_._1).filter(_._2.size == 1).flatMap(_._2).toVector
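The two readings of "remove duplicates" give different results, so here is a small side-by-side sketch on made-up data (the vector v and the value names are assumptions for illustration; distinctBy needs Scala 2.13 or later). The second variant counts keys first and then filters, which keeps the original order.

```scala
val v = Vector((65, 61), (29, 49), (65, 10), (4, 57))

// Keeps the first tuple for every key, so one (65, _) survives.
val firstPerKey = v.distinctBy(_._1)
// Vector((65,61), (29,49), (4,57))

// Count how often each key occurs, then drop every tuple whose key repeats.
val keyCounts: Map[Int, Int] =
  v.groupBy(_._1).map { case (k, group) => k -> group.size }
val uniqueKeysOnly = v.filter { case (k, _) => keyCounts(k) == 1 }
// Vector((29,49), (4,57))
```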
Another option, taking advantage that you want the list sorted at the end.
def sortAndRemoveDuplicatesByFirst[A : Ordering, B](input: List[(A, B)]): List[(A, B)] = {
import Ordering.Implicits._
val sorted = input.sortBy(_._1)
@annotation.tailrec
def loop(remaining: List[(A, B)], previous: (A, B), repeated: Boolean, acc: List[(A, B)]): List[(A, B)] =
remaining match {
case x :: xs =>
if (x._1 == previous._1)
loop(remaining = xs, previous, repeated = true, acc)
else if (!repeated)
loop(remaining = xs, previous = x, repeated = false, previous :: acc)
else
loop(remaining = xs, previous = x, repeated = false, acc)
case Nil =>
(previous :: acc).reverse
}
sorted match {
case x :: xs =>
loop(remaining = xs, previous = x, repeated = false, acc = List.empty)
case Nil =>
List.empty
}
}
Which you can test like this:
val data = List(
1 -> "A",
3 -> "B",
1 -> "C",
4 -> "D",
3 -> "E",
5 -> "F",
1 -> "G",
0 -> "H"
)
sortAndRemoveDuplicatesByFirst(data)
// res: List[(Int, String)] = List((0,H), (4,D), (5,F))
(I used List instead of Vector to make it easy and performant to write the tail-rec algorithm)

Create function to group sequence at breakpoints

I'm trying to create a function that takes two sequences and groups the elements first based on the "breakpoints" included in the second. An example:
val ls = ('a' to 'z').map(_.toString)
// IndexedSeq[String] = Vector(a, b, c, d, e, f, g, h, i, j, k, l, m, ...)
val breakpoints = Seq("c", "f", "j")
grouper(ls, breakpoints)
// Seq(Seq("a", "b"), Seq("c", "d", "e"), Seq("f", "g", "h", "i"), Seq("j", ...))
I've tried to do this with recursive calls to takeWhile and dropWhile (as I might in a language like Haskell), but as my current function doesn't use tail recursion, I receive a java.lang.StackOverflowError. Here's the function:
def grouper(strings: Seq[String], breaks: Seq[String]): Seq[Seq[String]] = strings match {
case Nil => Seq()
case s => s.takeWhile(breaks.contains(_)) +: grouper(s.dropWhile(breaks.contains(_)), breaks)
}
Is there a better way to approach this?
You're on the right track. takeWhile and dropWhile can be replaced by span.
def grouper(xs: Seq[String], breaks: Seq[String]): Seq[Seq[String]] = breaks match{
case Nil => Seq(xs)
case h::t => {
val split = xs.span(x => x != h);
split._1 +: grouper(split._2, t);
}
}
scala> grouper(ls, breakpoints)
res5: Seq[Seq[String]] = List(Vector(a, b), Vector(c, d, e), Vector(f, g, h, i),
Vector(j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z))
From the API on span: Note: c span p is equivalent to (but possibly more efficient than) (c takeWhile p, c dropWhile p), provided the evaluation of the predicate p does not cause any side-effects.
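A quick way to convince yourself of that equivalence is to compare both forms on a small input. The predicate p and the sample vector below are only for this sketch.

```scala
val xs = Vector("a", "b", "c", "d", "e")
val p: String => Boolean = _ != "c"

// One traversal: the longest prefix satisfying p, and the remainder.
val (prefix, rest) = xs.span(p)

// Two traversals that should give the same pair for a side-effect-free p.
val viaTwoPasses = (xs.takeWhile(p), xs.dropWhile(p))
```

Here prefix is Vector(a, b) and rest is Vector(c, d, e), which is the split the recursive grouper relies on at each breakpoint.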
For this kind of problem, I always prefer to write my own tail-recursive function, operating on Lists.
import Ordering.Implicits._
def group[A : Ordering](data: List[A], breakPoints: List[A]): List[List[A]] = {
def takeUntil(list: List[A], breakPoint: A): (List[A], List[A]) = {
@annotation.tailrec
def loop(remaining: List[A], acc: List[A]): (List[A], List[A]) =
remaining match {
case x :: xs if (x < breakPoint) =>
loop(remaining = xs, x :: acc)
case _ =>
(acc.reverse, remaining)
}
loop(remaining = list, acc = List.empty)
}
@annotation.tailrec
def loop(remainingElements: List[A], remainingBreakPoints: List[A], acc: List[List[A]]): List[List[A]] =
remainingBreakPoints match {
case breakPoint :: remainingBreakPoints =>
val (group, remaining) = takeUntil(remainingElements, breakPoint)
loop(
remainingElements = remaining,
remainingBreakPoints,
group :: acc
)
case Nil =>
(remainingElements :: acc).reverse
}
loop(
remainingElements = data.sorted,
remainingBreakPoints = breakPoints.sorted,
acc = List.empty
)
}
You can use it like this:
group(data = ('a' to 'z').toList, breakPoints = List('c', 'f', 'j'))
//res: List[List[Char]] = List(
// List('a', 'b'),
// List('c', 'd', 'e'),
// List('f', 'g', 'h', 'i'),
// List('j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z')
// )
This function always generates a list of length = length(breakPoints) + 1.
If there are no more elements, it generates empty lists.
(You can edit the code for your concrete requirements.)
You can try a slightly different approach:
def grouper(strings: Seq[String], breaks: Seq[String]): Seq[Seq[String]] = {
var i = 0
(for(x <- strings) yield {if (breaks.contains(x)) {i=i+1}; (x,i)})
.groupBy(_._2).map(_._2.map(_._1)).toList
}
grouper(ls,breakpoints).foreach(println(_))
Vector(f, g, h, i)
Vector(c, d, e)
Vector(j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z)
Vector(a, b)
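As the output shows, groupBy does not keep the groups in order. A possible variation on the same "start a new group at each breakpoint" idea, without the var and with order preserved, is a foldLeft that appends to the last group. The name grouperFold is made up for this sketch, and it treats breaks as a set, so the order of the breakpoints themselves does not matter.

```scala
def grouperFold(strings: Seq[String], breaks: Seq[String]): Seq[Seq[String]] = {
  val breakSet = breaks.toSet
  // The accumulator always holds at least one (possibly empty) group.
  strings.foldLeft(Vector(Vector.empty[String])) { (groups, x) =>
    if (breakSet(x) && groups.last.nonEmpty)
      groups :+ Vector(x)                 // a breakpoint opens a new group
    else
      groups.init :+ (groups.last :+ x)   // otherwise extend the current group
  }
}

val letters = ('a' to 'j').map(_.toString)
val grouped = grouperFold(letters, Seq("c", "f", "j"))
// Vector(Vector(a, b), Vector(c, d, e), Vector(f, g, h, i), Vector(j))
```

Since foldLeft is iterative, this should also avoid the StackOverflowError from the question on long inputs.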

Combining the elements of 2 lists

Assume we have two lists :
val l1=List("a","b","c")
val l2 = List("1","2","3")
What I want is : List("a1", "b2", "c3") that is, adding the nth element of l1 with the nth element of l2
A way to achieve it is :
(l1 zip l2).map (c => {c._1+c._2})
I just wonder if one could achieve it with an Applicative. I tried :
(l1 |@| l2) { _+ _ }
but it gives all the combinations :
List(a1, a2, a3, b1, b2, b3, c1, c2, c3)
Any idea?
Thank you
Benoit
You cannot do that with strict lists, so instead use lazy lists, i.e. streams. You have to define the Applicative[Stream] instance as shown below. (You'll find it in the Haskell standard library under the name ZipList.)
scala> val s1 = Stream("a", "b", "c")
s1: scala.collection.immutable.Stream[java.lang.String] = Stream(a, ?)
scala> val s2 = Stream("1", "2", "3")
s2: scala.collection.immutable.Stream[java.lang.String] = Stream(1, ?)
scala> implicit object StreamApplicative extends Applicative[Stream] {
| def pure[A](a: => A) = Stream.continually(a)
| override def apply[A, B](f: Stream[A => B], xs: Stream[A]): Stream[B] = (f, xs).zipped.map(_ apply _)
| }
defined module StreamApplicative
scala> (s1 |@| s2)(_ + _)
res101: scala.collection.immutable.Stream[java.lang.String] = Stream(a1, ?)
scala> .force
res102: scala.collection.immutable.Stream[java.lang.String] = Stream(a1, b2, c3)
The reason this cannot be done with strict lists is because it is impossible to define a pure on them that satisfies the applicative laws.
As an aside, Scala lets you do this more concisely than the code you have used in OP:
scala> (l1, l2).zipped.map(_ + _)
res103: List[java.lang.String] = List(a1, b2, c3)
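The zip-style applicative can also be sketched without scalaz, which may make it clearer why pure has to be an infinite sequence: a finite pure would truncate whatever it is zipped with. The function names pureZip and apZip are made up for this sketch, and it assumes Scala 2.13+, where LazyList replaces Stream.

```scala
// pure: repeat the value forever, so zipping never cuts the other side short.
def pureZip[A](a: A): LazyList[A] = LazyList.continually(a)

// apply: pair functions with arguments position by position.
def apZip[A, B](fs: LazyList[A => B], xs: LazyList[A]): LazyList[B] =
  fs.zip(xs).map { case (f, x) => f(x) }

val s1 = LazyList("a", "b", "c")
val s2 = LazyList("1", "2", "3")

// Lift a curried two-argument function and apply it to both lists in turn.
val concat: String => String => String = a => b => a + b
val combined = apZip(apZip(pureZip(concat), s1), s2).toList
// List(a1, b2, c3)

// Identity law: applying pure(identity) leaves the list unchanged.
val unchanged = apZip(pureZip((s: String) => s), s1).toList
```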
The answer is that you can't achieve this with an applicative as far as I can see. The applicative for list will apply the function to all combinations, as you have found out. Not great for what you want but awesome for stuff like creating cartesian products.
A slightly less verbose method might use Tuple2W.fold supplied by scalaz:
(l1 zip l2).map (_ fold (_ + _))

Cartesian product of two lists

Given a map where a digit is associated to several characters
scala> val conversion = Map("0" -> List("A", "B"), "1" -> List("C", "D"))
conversion: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] =
Map(0 -> List(A, B), 1 -> List(C, D))
I want to generate all possible character sequences based on a sequence of digits. Examples:
"00" -> List("AA", "AB", "BA", "BB")
"01" -> List("AC", "AD", "BC", "BD")
I can do this with for comprehensions
scala> val number = "011"
number: java.lang.String = 011
Create a sequence of possible characters per index
scala> val values = number map { case c => conversion(c.toString) }
values: scala.collection.immutable.IndexedSeq[List[java.lang.String]] =
Vector(List(A, B), List(C, D), List(C, D))
Generate all the possible character sequences
scala> for {
| a <- values(0)
| b <- values(1)
| c <- values(2)
| } yield a+b+c
res13: List[java.lang.String] = List(ACC, ACD, ADC, ADD, BCC, BCD, BDC, BDD)
Here things get ugly and it will only work for sequences of three digits. Is there any way to achieve the same result for any sequence length?
The following suggestion does not hard-code one generator per digit in a for-comprehension. I don't think that is a good idea after all because, as you noticed, you'd be tied to a certain length of your cartesian product. Instead it recurses over the list of lists.
scala> def cartesianProduct[T](xss: List[List[T]]): List[List[T]] = xss match {
| case Nil => List(Nil)
| case h :: t => for(xh <- h; xt <- cartesianProduct(t)) yield xh :: xt
| }
cartesianProduct: [T](xss: List[List[T]])List[List[T]]
scala> val conversion = Map('0' -> List("A", "B"), '1' -> List("C", "D"))
conversion: scala.collection.immutable.Map[Char,List[java.lang.String]] = Map(0 -> List(A, B), 1 -> List(C, D))
scala> cartesianProduct("01".map(conversion).toList)
res9: List[List[java.lang.String]] = List(List(A, C), List(A, D), List(B, C), List(B, D))
Why not tail-recursive?
Note that the recursive function above is not tail-recursive. This isn't a problem, because xss will be short unless it contains a lot of singleton lists: the size of the result grows exponentially with the number of non-singleton elements of xss.
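If the recursion depth is still a concern, the same product can be sketched as a fold over the list of lists. The name product is an assumption of mine, not from the answer above; it relies on List.foldRight, which the standard library implements iteratively for List.

```scala
def product[T](xss: List[List[T]]): List[List[T]] =
  // Start from the product of zero lists (one empty combination) and, working
  // from the right, prepend every element of the current list to every
  // combination built so far.
  xss.foldRight(List(List.empty[T])) { (xs, acc) =>
    for (x <- xs; rest <- acc) yield x :: rest
  }

val conversion = Map('0' -> List("A", "B"), '1' -> List("C", "D"))

val sequences = product("011".map(conversion).toList).map(_.mkString)
// List(ACC, ACD, ADC, ADD, BCC, BCD, BDC, BDD)
```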
I could come up with this:
val conversion = Map('0' -> Seq("A", "B"), '1' -> Seq("C", "D"))
def permut(str: Seq[Char]): Seq[String] = str match {
case Seq() => Seq.empty
case Seq(c) => conversion(c)
case Seq(head, tail @ _*) =>
val t = permut(tail)
conversion(head).flatMap(pre => t.map(pre + _))
}
permut("011")
I just did that as follows and it works
def cross(a:IndexedSeq[Tree], b:IndexedSeq[Tree]) = {
a.map (p => b.map( o => (p,o))).flatten
}
Never mind the Tree type I am dealing with; it works for arbitrary collections too.

Scala - can a lambda parameter match a tuple?

So say I have some list like
val l = List((1, "blue"), (5, "red"), (2, "green"))
And then I want to filter one of them out. I can do something like
val m = l.filter(item => {
val (n, s) = item // "unpack" the tuple here
n != 2
})
Is there any way I can "unpack" the tuple as the parameter to the lambda directly, instead of having this intermediate item variable?
Something like the following would be ideal, but Eclipse tells me "wrong number of parameters; expected=1":
val m = l.filter( (n, s) => n != 2 )
Any help would be appreciated - using 2.9.0.1
This is about the closest you can get:
val m = l.filter { case (n, s) => n != 2 }
It's basically pattern matching syntax inside an anonymous PartialFunction. There are also the tupled methods in Function object and traits, but they are just a wrapper around this pattern matching expression.
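For comparison, here is a small sketch of the pattern-matching form next to the two tupled variants mentioned above. The value names are made up for illustration. Note that the two-argument lambdas need explicit parameter types in Scala 2, since the compiler has nothing to infer them from before tupled is applied.

```scala
val l = List((1, "blue"), (5, "red"), (2, "green"))

// Pattern-matching anonymous function: destructures the tuple in place.
val viaCase = l.filter { case (n, _) => n != 2 }

// Function.tupled turns a two-argument function into one taking a Tuple2.
val viaFunctionTupled = l.filter(Function.tupled((n: Int, s: String) => n != 2))

// The tupled method on Function2 does the same thing.
val viaMethodTupled = l.filter(((n: Int, s: String) => n != 2).tupled)
```

All three should produce List((1,blue), (5,red)).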
Although Kipton has a good answer, you can actually make this shorter:
val l = List((1, "blue"), (5, "red"), (2, "green"))
val m = l.filter(_._1 != 2)
There are a bunch of options:
for (x <- l; (n,s) = x if (n != 2)) yield x
l.collect{ case x @ (n,s) if (n != 2) => x }
l.filter{ case (n,s) => n != 2 }
l.unzip.zipped.map((n,s) => n != 2).zip // Complains that zip is deprecated
val m = l.filter( (n, s) => n != 2 )
... is a type mismatch because that lambda defines a
Function2[Int,String,Boolean] with two parameters instead of
Function1[(Int,String),Boolean] with one Tuple2[Int,String] as its parameter.
You can convert between them like this:
val m = l.filter( ((n: Int, s: String) => n != 2).tupled )
I've pondered the same, and came to your question today.
I'm not very fond of the partial function approaches (anything having case) since they imply that there could be more entry points for the logic flow. At least to me, they tend to blur the intention of the code. On the other hand, I really do want to go straight to the tuple fields, like you.
Here's a solution I drafted today. It seems to work, but I haven't tried it in production, yet.
object unTuple {
def apply[A, B, X](f: (A, B) => X): (Tuple2[A, B] => X) = {
(t: Tuple2[A, B]) => f(t._1, t._2)
}
def apply[A, B, C, X](f: (A, B, C) => X): (Tuple3[A, B, C] => X) = {
(t: Tuple3[A, B, C]) => f(t._1, t._2, t._3)
}
//...
}
val list = List( ("a",1), ("b",2) )
val list2 = List( ("a",1,true), ("b",2,false) )
list foreach unTuple( (k: String, v: Int) =>
println(k, v)
)
list2 foreach unTuple( (k: String, v: Int, b: Boolean) =>
println(k, v, b)
)
Output:
(a,1)
(b,2)
(a,1,true)
(b,2,false)
Maybe this turns out to be useful. The unTuple object should naturally be put aside in some tool namespace.
Addendum:
Applied to your case:
val m = l.filter( unTuple( (n:Int,color:String) =>
n != 2
))