Append auto-incrementing suffix to duplicated elements of a List - scala

Given the following list:
val l = List("A", "A", "C", "C", "B", "C")
How can I add an auto-incrementing suffix to every element so that I end up with a list containing no duplicates, like the following (the ordering doesn't matter):
List("A0", "A1", "C0", "C1", "C2", "B0")

I worked it out myself just after writing this question:
val l = List("A", "A", "C", "C", "B", "C")
l.groupBy(identity) // Map(A->List(A,A),C->List(C,C,C),B->List(B))
.values.flatMap(_.zipWithIndex) // List((A,0),(A,1),(C,0),(C,1),(C,2),(B,0))
.map{ case (str, i) => s"$str$i"}
If there is a better solution (using foldLeft maybe) please let me know
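Since a foldLeft version was asked about, here is a minimal sketch of one. The name suffixDuplicates is made up for illustration. It threads an immutable Map of per-element counts through the fold and, unlike the groupBy version, keeps the elements in their original order.

```scala
// Hypothetical helper: appends a per-element running counter as a suffix.
def suffixDuplicates(xs: List[String]): List[String] = {
  val (_, reversed) =
    xs.foldLeft((Map.empty[String, Int], List.empty[String])) {
      case ((seen, acc), x) =>
        // Number of times x has been seen so far (0 for the first occurrence)
        val n = seen.getOrElse(x, 0)
        // Prepending is O(1); the list is reversed once at the end
        (seen.updated(x, n + 1), s"$x$n" :: acc)
    }
  reversed.reverse
}

val suffixed = suffixDuplicates(List("A", "A", "C", "C", "B", "C"))
// suffixed == List("A0", "A1", "C0", "C1", "B0", "C2")
```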

A straightforward single-pass way:
import scala.collection.mutable

def transformList(list: List[String]): List[String] = {
  // Number of occurrences seen so far for each element
  val buf: mutable.Map[String, Int] = mutable.Map.empty
  list.map { x =>
    val i = buf.getOrElseUpdate(x, 0)
    buf.put(x, i + 1)
    s"$x$i"
  }
}
transformList(List("A", "A", "C", "C", "B", "C"))

Perhaps not the most readable solution, but...
def appendCount(l: List[String]): List[String] = {
// Since we're doing zero-based counting, we need to use `getOrElse(e, -1) + 1`
// to indicate a first-time element count as 0.
val counts =
l.foldLeft(Map[String, Int]())((acc, e) =>
acc + (e -> (acc.getOrElse(e, -1) + 1))
)
val (appendedList, _) =
l.foldRight((List.empty[String], counts)){ case (e, (li, m)) =>
// Prepend the element with its count to the accumulated list.
// Decrement that element's count within the map of element counts
(s"$e${m(e)}" :: li, m + (e -> (m(e) - 1)))
}
appendedList
}
The idea here is that you create a count of each element in the list. You then iterate from the back of the list of original values and append the count to the value while decrementing the count map.
You need to define a helper here because foldRight will require both the new List[String] and the counts as an accumulator (and, as such, will return both). You'll just ignore the counts at the end (they'll all be -1 anyway).
I'd say your way is probably clearer. You'll need to benchmark to see which is faster if that's a concern.

Related

Iterator of repeated words in a file

Suppose, I'm writing a function to find "repeated words" in a text file. For example, in aaa aaa bb cc cc bb dd repeated words are aaa and cc but not bb, because two bb instances don't appear next to each other.
The function receives an iterator and returns iterator like that:
def foo(in: Iterator[String]): Iterator[String] = ???
foo(Iterator("aaa", "aaa", "bb", "cc", "cc", "bb")) // Iterator("aaa", "cc")
foo(Iterator("a", "a", "a", "b", "c", "b")) // Iterator("a")
How would you write foo ? Note that the input is huge and all words do not fit in memory (but the number of repeated words is relatively small).
P.S. I would like also to enhance foo later to return also positions of the repeated words, the number of repetitions, etc.
UPDATE:
OK then. Let me specify a bit more precisely what you want:
input      | expected
           |
a          |
aa         | a
abc        |
aabc       | a
aaabbbbbbc | ab
aabaa      | aa
aabbaa     | aba
Is that right? If so, this is a working solution. I'm not sure about performance, but at least it is lazy (it doesn't load everything into memory).
// Assumes the iterator contains no nulls.
def foo[T >: Null](it: Iterator[T]) = {
  (Iterator(null) ++ it).sliding(3, 1).collect {
    case x @ Seq(a, b, c) if b == c && a != b => c
  }
}
We need this ugly Iterator(null) ++ because we are looking at windows of 3 elements and need a way to detect that the first two elements of the input are the same.
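To make the role of the leading null concrete, here is a small sketch (the input words are just an illustration) that prints out the windows produced by sliding(3, 1). Without the null, the only window for this input would be (a, a, b), which the pattern `b == c && a != b` does not match, so the repetition at the very start would be missed.

```scala
val words = List("a", "a", "b")

// Windows seen by the collect { ... } step when a null is prepended
val windows: List[List[String]] =
  (Iterator(null: String) ++ words.iterator)
    .sliding(3, 1)
    .map(_.toList)
    .toList
// windows == List(List(null, "a", "a"), List("a", "a", "b"))
// The first window matches: a = null, b = "a", c = "a".
```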
This is a pure implementation, and it has some advantages over the imperative ones (e.g. in other answers). The most important one is that it is lazy:
import scala.util.Random

// Infinite iterator!
val it = Iterator.iterate('a')(s => (s + (if (Random.nextBoolean()) 1 else 0)).toChar)
// Consumes only as much input as is needed to produce these 10 items,
// so it should not blow up.
foo(it).take(10)
// An imperative implementation will blow up in this situation.
fooImp(it).take(10)
Here are all the implementations from this and other posts in this topic:
https://scalafiddle.io/sf/w5yozTA/15
WITH INDEXES AND POSITIONS
In a comment you asked whether it would be easy to add the number of repeated words and their indices. I thought about it for a while and came up with something like this. I'm not sure it performs well, but it should be lazy (i.e. it should work for big files).
/** Returns an Iterator that replaces each run of consecutive equal items with (item, index, count).
    It covers all items from the original iterator. */
def pack[T >: Null](it: Iterator[T]) = {
  // Two nulls, one for each sliding(...)
  (Iterator(null: T) ++ it ++ Iterator(null: T))
    .sliding(2, 1).zipWithIndex
    // Keep only the windows where the item changes
    .filter { case (x, _) => x(0) != x(1) }
    // The difference between two kept indices is the length of the run
    .sliding(2, 1).collect {
      case Seq((a, idx1), (b, idx2)) => (a(1), idx1, idx2 - idx1)
    }
}
def foo[T >: Null](it: Iterator[T]) = pack(it).filter(_._3 > 1)
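If the "no nulls in the iterator" assumption is a concern, the same idea can be sketched with Option as the padding value instead of null. This is only an illustrative variant under that assumption, and packOpt is a made-up name, not part of the answer above.

```scala
// Hypothetical null-free variant of pack: pads with None instead of null.
def packOpt[T](it: Iterator[T]): Iterator[(T, Int, Int)] = {
  val padded: Iterator[Option[T]] =
    Iterator[Option[T]](None) ++ it.map(Some(_)) ++ Iterator[Option[T]](None)
  padded
    .sliding(2, 1).zipWithIndex
    // Keep only the windows where the item changes
    .filter { case (w, _) => w(0) != w(1) }
    // Two consecutive change points delimit one run of equal items
    .sliding(2, 1)
    .collect { case Seq((w, start), (_, end)) => (w(1).get, start, end - start) }
}

val runs = packOpt(Iterator("a", "a", "b")).toList
// runs == List(("a", 0, 2), ("b", 2, 1))
val repeated = packOpt(Iterator("a", "a", "b")).filter(_._3 > 1).toList
// repeated == List(("a", 0, 2))
```

Each tuple is (item, index of the first occurrence in the run, length of the run), so filtering on the third component keeps only the repeated words.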
OLD ANSWER (BEFORE THE QUESTION WAS UPDATED)
Another (simpler) solution could be something like this:
import scala.collection.immutable._
// A def, so a fresh iterator is created each time we print it.
def it = Iterator("aaa", "aaa", "bb", "cc", "cc", "bb", "dd", "dd", "ee", "ee", "ee", "ee", "ee", "aaa", "aaa", "ff", "ff", "zz", "gg", "aaa", "aaa")
// Yep... this is the whole implementation :)
def foo(it: Iterator[String]) = it.sliding(2, 1).collect { case Seq(a, b) if a == b => a }
println(foo(it).toList) // doesn't care about duplicates
// List(aaa, cc, dd, ee, ee, ee, ee, aaa, ff, aaa)
println(foo(it).toSet) // throws away duplicates but doesn't keep the order
// Set(cc, aaa, ee, ff, dd)
println(foo(it).to[ListSet]) // throws away duplicates and keeps the order (Scala 2.12 syntax; on 2.13+ use .to(ListSet))
// ListSet(aaa, cc, dd, ee, ff)
// Oh... and keep the result at least 5 items long while testing.
// Small immutable Sets (up to 4 elements) behave a bit differently: they keep insertion order,
// so test with somewhat bigger sequences :)
https://scalafiddle.io/sf/w5yozTA/1
Here is a solution with an Accumulator:
case class Acc(word: String = "", count: Int = 0, index: Int = 0)
def foo(in: Iterator[String]) =
in.zipWithIndex
.foldLeft(List(Acc())) { case (Acc(w, c, i) :: xs, (word: String, index)) =>
if (word == w) // keep counting
Acc(w, c + 1, i) :: xs
else
Acc(word, 1, index) :: Acc(w, c, i) :: xs
}.filter(_.count > 1)
.reverse
val it = Iterator("aaa", "aaa", "bb", "cc", "cc", "bb", "dd", "aaa", "aaa", "aaa", "aaa")
foo(it)
This returns List(Acc(aaa,2,0), Acc(cc,2,3), Acc(aaa,4,7)).
It also handles the case where the same word appears in more than one repeated group.
And you get the index of the first occurrence of each group as well as the count.
Let me know if you need more explanation.
Here's a solution that uses only the original iterator. No intermediate collections. So everything stays completely lazy and is suitable for very large input data.
def foo(in: Iterator[String]): Iterator[String] =
Iterator.unfold(in.buffered){ itr => // <--- Scala 2.13
def loop :Option[String] =
if (!itr.hasNext) None
else {
val str = itr.next()
if (!itr.hasNext) None
else if (itr.head == str) {
while (itr.hasNext && itr.head == str) itr.next() //remove repeats
Some(str)
}
else loop
}
loop.map(_ -> itr)
}
testing:
val it = Iterator("aaa", "aaa", "aaa", "bb", "cc", "cc", "bb", "dd")
foo(it) // Iterator("aaa", "cc")
//pseudo-infinite iterator
val piIt = Iterator.iterate(8)(_+1).map(_/3) //2,3,3,3,4,4,4,5,5,5, etc.
foo(piIt.map(_.toString)) //3,4,5,6, etc.
It's somewhat more complex than the other answers, but it uses relatively little additional memory, and it is probably faster.
def repeatedWordsIndex(in: Iterator[String]): java.util.Iterator[String] = {
val initialCapacity = 4096
val res = new java.util.ArrayList[String](initialCapacity) // or mutable.Buffer or mutable.Set, if you want Scala
var prev: String = null
var next: String = null
var prevEquals = false
while (in.hasNext) {
next = in.next()
if (next == prev) {
if (!prevEquals) res.add(prev)
prevEquals = true
} else {
prevEquals = false
}
prev = next
}
res.iterator // may need deduplication if the same word is repeated in several places
}
You could traverse the collection using foldLeft with its accumulator being a Tuple of Map and String to keep track of the previous word for the conditional word counts, followed by a collect, as shown below:
def foo(in: Iterator[String]): Iterator[String] =
in.foldLeft((Map.empty[String, Int], "")){ case ((m, prev), word) =>
val count = if (word == prev) m.getOrElse(word, 0) + 1 else 1
(m + (word -> count), word)
}._1.
collect{ case (word, count) if count > 1 => word }.
iterator
foo(Iterator("aaa", "aaa", "bb", "cc", "cc", "bb", "dd")).toList
// res1: List[String] = List("aaa", "cc")
To also capture the repeated-word counts and indexes, just index the collection and apply a similar tactic for the conditional word count:
def bar(in: Iterator[String]): Map[(String, Int), Int] =
in.zipWithIndex.foldLeft((Map.empty[(String, Int), Int], "", 0)){
case ((m, pWord, pIdx), (word, idx)) =>
val idx1 = if (word == pWord) idx min pIdx else idx
val count = if (word == pWord) m.getOrElse((word, idx1), 0) + 1 else 1
(m + ((word, idx1) -> count), word, idx1)
}._1.
filter{ case ((_, _), count) => count > 1 }
bar(Iterator("aaa", "aaa", "bb", "cc", "cc", "bb", "dd", "cc", "cc", "cc"))
// res2: Map[(String, Int), Int] = Map(("cc", 7) -> 3, ("cc", 3) -> 2, ("aaa", 0) -> 2)
UPDATE:
As per the revised requirement, to minimize memory usage, one approach would be to keep the Map to a minimal size by removing elements of count 1 (which would be the majority if few words are repeated) on-the-fly during the foldLeft traversal. Method baz below is a revised version of bar:
def baz(in: Iterator[String]): Map[(String, Int), Int] =
(in ++ Iterator("")).zipWithIndex.
foldLeft((Map.empty[(String, Int), Int], (("", 0), 0), 0)){
case ((m, pElem, pIdx), (word, idx)) =>
val sameWord = word == pElem._1._1
val idx1 = if (sameWord) idx min pIdx else idx
val count = if (sameWord) m.getOrElse((word, idx1), 0) + 1 else 1
val elem = ((word, idx1), count)
val newMap = m + ((word, idx1) -> count)
if (sameWord) {
(newMap, elem, idx1)
} else
if (pElem._2 == 1)
(newMap - pElem._1, elem, idx1)
else
(newMap, elem, idx1)
}._1.
filter{ case ((word, _), _) => word != "" }
baz(Iterator("aaa", "aaa", "bb", "cc", "cc", "bb", "dd", "cc", "cc", "cc"))
// res3: Map[(String, Int), Int] = Map(("aaa", 0) -> 2, ("cc", 3) -> 2, ("cc", 7) -> 3)
Note that the dummy empty String appended to the input collection is to ensure that the last word gets properly processed as well.

How to get all possible partitions for a list in Scala

I have a list of strings, e.g. List("A", "B", "C"). I would like to get all the possible partitions of it in Scala. The result I expect is:
def func(list: List[String]): List[List[List[String]]] = {
  ??? // some operations
}
In: func(List("A", "B", "C"))
Out:
[
[["A"], ["B"], ["C"]],
[["A", "B"], ["C"]],
[["A", "C"], ["B"]],
[["B", "C"], ["A"]],
[["A", "B", "C"]],
]
This is a solution using Set:
def partitions[T](seq: TraversableOnce[T]): Set[Set[Set[T]]] = {
def loop(set: Set[T]): Set[Set[Set[T]]] =
if (set.size < 2) {
Set(Set(set))
} else {
set.subsets.filter(_.nonEmpty).flatMap(sub =>
loop(set -- sub).map(_ + sub - Set.empty)
).toSet
}
loop(seq.toSet)
}
Using Set makes the logic easier, but it does remove duplicate values if they are present in the original list. The same logic can be used for List, but you need to implement the set-like operations such as subsets.
Just for reference, here is an implementation using List which will preserve duplicates in the input list.
def partitions[T](list: List[T]): List[List[List[T]]] =
list match {
case Nil | _ :: Nil => // 0/1 elements
List(List(list))
case head :: tail => // 2+ elements
partitions(tail).flatMap(part => {
val joins =
part.indices.map(i =>
part.zipWithIndex.map { case (p, j) =>
if (i == j) {
head +: p
} else {
p
}
}
)
(List(head) +: part) +: joins
})
}
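A quick way to sanity-check either implementation on inputs without duplicates is to compare the number of partitions it returns with the Bell numbers (1, 1, 2, 5, 15, 52 for 0 to 5 elements), which count the partitions of a set. The helper below is a small sketch using the Bell triangle; bellNumbers is a made-up name for illustration.

```scala
// Hypothetical helper: the first n Bell numbers (B0, B1, ...), via the Bell triangle.
def bellNumbers(n: Int): List[BigInt] =
  Iterator
    .iterate(Vector(BigInt(1))) { row =>
      // Each new row starts with the last entry of the previous row;
      // every further entry adds the entry above-left to its left neighbour.
      row.scanLeft(row.last)(_ + _)
    }
    .map(_.head) // the first entry of row k is the k-th Bell number
    .take(n)
    .toList

val firstSix = bellNumbers(6)
// firstSix == List(1, 1, 2, 5, 15, 52)
```

For the three-element example in the question, the expected count is therefore 5, which matches the five partitions listed there.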

Fold left to create an immutable list

I am trying to create a list of strings and then concatenate them using mkString:
import scala.collection.mutable.ListBuffer
// StringUtils is assumed to come from Apache Commons Lang

List("a", "b1")
  .foldLeft(ListBuffer.empty[String]) { (a, l) =>
    if (StringUtils.isNotBlank(l))
      a += "abc" + l
    else a
  }
  .mkString(";")
Output
abca;abcb1
I want to use an immutable list instead.
Solution tried
List("a", "b1").
foldLeft(List[String]())((b,a) => b:+"abc"+a).mkString(";")
I can add the empty check, but can this be refactored further to get rid of the if and else?
List("a", "b1","","c2").
foldLeft(List[String]())((b,a) =>
if (StringUtils.isNotBlank(a))
b:+"abc"+a
else b
).mkString(";")
Can anyone help?
List("a", "b1").foldLeft("") { case (acc, el) => acc + el }
You're slightly misusing foldLeft. The key thing to remember is that you pass in a seed value and a function that takes an accumulator and the "current element"; the result of each step is fed in as the accumulator for the next step.
Reading from the top:
- Take my list, List("a", "b1").
- Start from the empty string "" as the accumulator.
- For every element in the list, call the function with the "current" value of the accumulator.
- In the above case, concatenate the "current" element to the existing accumulator.
- Pass the result to the next step as the seed value.
There's no += as in your example, because you're not mutating anything. Instead, the return value of the "current step" becomes the initial accumulator value for the next step; it's all immutable.
In effect:
- Step 0: acc = "", el = "a", so you get "" + "a" = "a" (this is the value of acc at the next stage)
- Step 1: acc = "a", el = "b1", so you get "a" + "b1" = "ab1"
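The intermediate accumulator values can be inspected with scanLeft, which takes the same arguments as foldLeft but returns every step instead of only the last one. A minimal sketch:

```scala
// scanLeft returns the seed followed by the accumulator after each step
val steps = List("a", "b1").scanLeft("")((acc, el) => acc + el)
// steps == List("", "a", "ab1")

// foldLeft returns only the final accumulator, i.e. the last element of steps
val folded = List("a", "b1").foldLeft("")((acc, el) => acc + el)
// folded == "ab1"
```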
It's also worth noting that the empty string "" is the zero (identity) element for string concatenation, so there's no value in checking for empty.
For your specific example:
List("a", "b1").foldLeft("") { case (acc, el) =>
  if (el.isEmpty) acc else acc + "abc" + el
} // "abcaabcb1" (no ";" separator between the elements)
In your case, collect is probably better:
l.collect {
case s if s.nonEmpty => "abc" + s
} mkString ";"
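As a usage sketch of the collect approach on the four-element list from the question: the guard below uses trim.nonEmpty so that whitespace-only strings are skipped as well, which is only a rough stand-in for StringUtils.isNotBlank (an assumption, since the two do not treat every Unicode whitespace character identically).

```scala
val l = List("a", "b1", "", "  ", "c2")

// Keep non-blank strings, prefix them, then join with ";"
val joined = l.collect { case s if s.trim.nonEmpty => "abc" + s }.mkString(";")
// joined == "abca;abcb1;abcc2"
```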

Returning values from an inner loop in Scala, use a function instead?

I'm attempting to return a value from an inner loop. I could create an outer list
and populate it within the inner loop, as suggested in the comments below, but this does
not feel very functional. Is there a function I can use to achieve this?
The type of the loop/inner loop is currently Unit but I would like it to be of type List[Int] or some similar collection type.
val data = List(Seq(1, 2, 3, 4), Seq(1, 2, 3, 4))
//val list : List
for(d <- data){
for(d1 <- data){
//add the result to the val list defined above
distance(d , d1)
}
}
def distance(s1 : Seq[Int], s2 : Seq[Int]) = {
s1.zip(s2).map(t => t._1 + t._2).sum
}
val list = for (x <- data; y <- data) yield distance(x, y)
will do what you want, yielding:
List(20, 20, 20, 20)
The above desugared is equivalent to:
data.flatMap { x => data.map { y => distance(x, y) } }
The trick is to not nest for-comprehensions because that way you'll only ever get nested collections; to get a flat collection from a conceptually nested iteration, you need to make sure flatMap gets used.
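The difference can be seen side by side in this small sketch, which reuses data and distance from the question; the names nested and flat are just for illustration.

```scala
val data = List(Seq(1, 2, 3, 4), Seq(1, 2, 3, 4))

def distance(s1: Seq[Int], s2: Seq[Int]): Int =
  s1.zip(s2).map(t => t._1 + t._2).sum

// Nested for-comprehensions: map inside map, so the result is nested too
val nested: List[List[Int]] =
  for (x <- data) yield for (y <- data) yield distance(x, y)
// nested == List(List(20, 20), List(20, 20))

// One for-comprehension with two generators: flatMap + map, so the result is flat
val flat: List[Int] =
  for (x <- data; y <- data) yield distance(x, y)
// flat == List(20, 20, 20, 20)
```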

Combining multiple Lists of arbitrary length

I am looking for an approach to join multiple Lists in the following manner:
ListA a b c
ListB 1 2 3 4
ListC + # * § %
..
..
..
Resulting List: a 1 + b 2 # c 3 * 4 § %
In words: the elements are taken in sequential order, starting with the first list, and combined into the resulting list. There can be an arbitrary number of input lists of varying length.
I tried multiple approaches with variants of zip and sliding iterators, but none worked, and in particular none handled the varying list lengths. There has to be an elegant way in Scala ;)
val lists = List(ListA, ListB, ListC)
lists.flatMap(_.zipWithIndex).sortBy(_._2).map(_._1)
It's pretty self-explanatory. It just zips each value with its position on its respective list, sorts by index, then pulls the values back out.
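Spelled out on the lists from the question (with the numbers written as strings so that all lists share one element type, which is an assumption made for this sketch), the intermediate and final values look like this. The approach relies on sortBy being a stable sort, so elements with the same index stay in list order.

```scala
val listA = List("a", "b", "c")
val listB = List("1", "2", "3", "4")
val listC = List("+", "#", "*", "§", "%")
val lists = List(listA, listB, listC)

// Pair every element with its position inside its own list
val indexed: List[(String, Int)] = lists.flatMap(_.zipWithIndex)
// indexed starts with ("a", 0), ("b", 1), ("c", 2), ("1", 0), ...

// Stable sort by position, then drop the positions again
val interleaved: List[String] = indexed.sortBy(_._2).map(_._1)
// interleaved == List("a", "1", "+", "b", "2", "#", "c", "3", "*", "4", "§", "%")
```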
Here's how I would do it:
class ListTests extends FunSuite {
test("The three lists from his example") {
val l1 = List("a", "b", "c")
val l2 = List(1, 2, 3, 4)
val l3 = List("+", "#", "*", "§", "%")
// All lists together
val l = List(l1, l2, l3)
// Max length of a list (to pad the shorter ones)
val maxLen = l.map(_.size).max
// Wrap the elements in Option and pad with None
val padded = l.map { list => list.map(Some(_)) ++ Stream.continually(None).take(maxLen - list.size) }
// Transpose
val trans = padded.transpose
// Flatten the lists then flatten the options
val result = trans.flatten.flatten
// Voilà
assert(List("a", 1, "+", "b", 2, "#", "c", 3, "*", 4, "§", "%") === result)
}
}
Here's an imperative solution if efficiency is paramount:
def combine[T](xss: List[List[T]]): List[T] = {
val b = List.newBuilder[T]
var its = xss.map(_.iterator)
while (!its.isEmpty) {
its = its.filter(_.hasNext)
its.foreach(b += _.next)
}
b.result
}
You can use padTo, transpose, and flatten to good effect here:
lists.map(_.map(Some(_)).padTo(lists.map(_.length).max, None)).transpose.flatten.flatten
Here's a small recursive solution.
def flatList(lists: List[List[Any]]) = {
def loop(output: List[Any], xss: List[List[Any]]): List[Any] = (xss collect { case x :: xs => x }) match {
case Nil => output
case heads => loop(output ::: heads, xss.collect({ case x :: xs => xs }))
}
loop(List[Any](), lists)
}
And here is a simple streams approach which can cope with an arbitrary sequence of sequences, each of potentially infinite length.
def flatSeqs[A](ssa: Seq[Seq[A]]): Stream[A] = {
  // Each stream element holds the heads of all sequences that are still non-empty
  def seqs(xss: Seq[Seq[A]]): Stream[Seq[A]] = xss collect { case xs if !xs.isEmpty => xs.head } match {
    case Nil => Stream.empty
    case heads => heads #:: seqs(xss collect { case xs if !xs.isEmpty => xs.tail })
  }
  seqs(ssa).flatten
}
Here's something short but not exceedingly efficient:
def heads[A](xss: List[List[A]]) = xss.map(_.splitAt(1)).unzip
def interleave[A](xss: List[List[A]]) = Iterator.
iterate(heads(xss)){ case (_, tails) => heads(tails) }.
map(_._1.flatten).
takeWhile(! _.isEmpty).
flatten.toList
Here's a recursive solution that's O(n). The accepted solution (using sort) is O(n log n). Some testing I've done suggests the second solution using transpose is also O(n log n) due to the implementation of transpose. The use of reverse below looks suspicious (since it's an O(n) operation itself), but you can convince yourself that it is neither called too often nor on overly large lists.
def intercalate[T](lists: List[List[T]]) : List[T] = {
def intercalateHelper(newLists: List[List[T]], oldLists: List[List[T]], merged: List[T]): List[T] = {
(newLists, oldLists) match {
case (Nil, Nil) => merged
case (Nil, zss) => intercalateHelper(zss.reverse, Nil, merged)
case (Nil::xss, zss) => intercalateHelper(xss, zss, merged)
case ( (y::ys)::xss, zss) => intercalateHelper(xss, ys::zss, y::merged)
}
}
intercalateHelper(lists, List.empty, List.empty).reverse
}