Anagram Recursion Scala - scala

This is the question that I tried to write a code for.
Consider a recursive algorithm that takes two strings s1 and s2 as input and checks if these strings are the anagram of each other, hence if all the letters contained in the former appear in the latter the same number of times, and vice versa (i.e. s2 is a permutation of s1).
Example:
if s1 = ”elevenplustwo” and s2 = ”twelveplusone” the output is true
if s1 = ”amina” and s2 = ”minia” the output is false
Hint: consider the first character c = s1(0) of s1 and the rest of r =s1.substring(1, s1.size) of s1. What are the conditions that s2 must (recursively) satisfy with respect to c and r?
And this is the piece of code I wrote to solve this problem. The problem is that the code works perfectly when there is no repetition of characters in the strings. For example, it works just fine for amin and mina. However, when there is repetition, for example, amina and maina, then it does not work properly.
How can I solve this issue?
import scala.collection.mutable.ArrayBuffer
object Q32019 extends App {
def anagram(s1:String, s2:String, indexAr:ArrayBuffer[Int]):ArrayBuffer[Int]={
if(s1==""){
return indexAr
}
else {
val c=s1(0)
val s=s1.substring(1,s1.length)
val ss=s2
var count=0
for (i<-0 to s2.length-1) {
if(s2(i)==c && !indexAr.contains(s2.indexOf(c))) {
indexAr+=i
}
}
anagram(s,s2,indexAr)
}
indexAr
}
var a="amin"
var b="mina"
var c=ArrayBuffer[Int]()
var d=anagram(a,b,c)
println(d)
var check=true
var i=0
while (i<a.length && check){
if (d.contains(i) && a.length==b.length) check=true
else check=false
i+=1
}
if (check) println("yes they are anagram")
else println("no, they are not anagram")
}

The easiest way is probably to sort both strings and just compare them:
def areAnagram(str1: String, str2: String): Boolean =
str1.sorted == str2.sorted
println(areAnagram("amina", "anima")) // true
println(areAnagram("abc", "bcc")) // false
Other one is more "natural". Two strings are anagrams if they have the same count of each character.
So you make two Map[Char, Int] and compare them:
import scala.collection.mutable
def areAnagram(str1: String, str2: String): Boolean = {
val map1 = mutable.Map.empty[Char, Int].withDefaultValue(0)
val map2 = mutable.Map.empty[Char, Int].withDefaultValue(0)
for (c <- str1) map1(c) += 1
for (c <- str2) map2(c) += 1
map1 == map2
}
There is also another version of second solution with Arrays probably, if you know the chars are only ASCII ones.
Or some other clever algorithm, IDK.
EDIT: One recursive solution could be to remove the first char of str1 from str2. The rests of both strings must be anagrams also.
E.g. for ("amina", "niama") first you throw out an a from both, and you get ("mina", "nima"). Those 2 strings must also be anagrams, by definition.
def areAnagram(str1: String, str2: String): Boolean = {
if (str1.length != str2.length) false
else if (str1.isEmpty) true // end recursion
else {
val (c, r1) = str1.splitAt(1)
val r2 = str2.replaceFirst(c, "") // remove c
areAnagram(r1, r2)
}
}

When you calculate anagrams you can take advantage of property of XOR operation, which says, that if you xor two same numbers you'd get 0.
Since characters in strings are essentially just numbers, you could run xor over all characters of both strings and if result is 0, then these strings are anagrams.
You could iterate over both strings using loop, but if you want to use recursion, I would suggest, that you convert your string to lists of chars.
Lists allow efficient splitting between first element (head of list) and rest (tail of list). So solution would go like this:
Split list to head and tail for both lists of chars.
Run xor over characters extracted from heads of lists and previous result.
Pass tails of list and result of xoring to the next recursive call.
When we get to the end of lists, we just return true is case result of xoring is 0.
Last optimalization we can do is short-curcuiting with false whenever strings with different lengths are passed (since they never could be anagrams anyway).
Final solution:
def anagram(a: String, b: String): Boolean = {
//inner function doing recursion, annotation #tailrec makes sure function is tail-recursive
#tailrec
def go(a: List[Char], b: List[Char], acc: Int): Boolean = { //using additional parameter acc, allows us to use tail-recursion, which is safe for stack
(a, b) match {
case (x :: xs, y :: ys) => //operator :: splits list to head and tail
go(xs, ys, acc ^ x ^ y) //because we changed string to lists of chars, we can now efficiently access heads (first elements) of lists
//we get first characters of both lists, then call recursively go passing tails of lists and result of xoring accumulator with both characters
case _ => acc == 0 //if result of xoring whole strings is 0, then both strings are anagrams
}
}
if (a.length != b.length) { //we already know strings can't be anagrams, because they've got different size
false
} else {
go(a.toList, b.toList, 0)
}
}
anagram("twelveplusone", "elevenplustwo") //true
anagram("amina", "minia") //false

My suggestion: Don't over-think it.
def anagram(a: String, b: String): Boolean =
if (a.isEmpty) b.isEmpty
else b.contains(a(0)) && anagram(a.tail, b diff a(0).toString)

Related

Recursive sorting in Scala including an exit condition

I am attempting to sort a List of names using Scala, and I am trying to learn how to do this recursively. The List is a List of Lists, with the "element" list containing two items (lastName, firstName). My goal is to understand how to use recursion to sort the names. For the purpose of this post my goal is just to sort by the length of lastName.
If I call my function several times on a small sample list, it will successfully sort lastName by length from shortest to longest, but I have not been able to construct a satisfactory exit condition using recursion. I have tried variations of foreach and other loops, but I have been unsuccessful. Without a satisfactory exit condition, the recursion just continues forever.
import scala.collection.mutable.ListBuffer
import scala.annotation.tailrec
val nameListBuffer = new ListBuffer[List[String]]
val source = Source.fromFile("shortnames.txt")
val lines = source.getLines()
for (line <- lines) {
nameListBuffer += line.split(" ").reverse.toList
}
#tailrec
def sorting(x: ListBuffer[List[String]]): Unit = {
for (i <- 0 until ((x.length)-1)) {
var temp = x(i)
if (x(i)(0).length > x(i+1)(0).length) {
x(i) = x(i+1)
x(i+1) = temp
}
}
var continue = false
while (continue == false) {
for (i <- 0 until ((x.length)-1)) {
if (x(i)(0).length <= x(i+1)(0).length) {
continue == false//want to check ALL i's before evaluating
}
else continue == true
}
}
sorting(x)
}
sorting(nameListBuffer)
Sorry about the runtime complexity it's basically an inefficient bubble sort at O(n^4) but the exit criteria - focus on that. For tail recursion, the key is that the recursive call is to a smaller element than the preceding recursive call. Also, keep two arguments, one is the original list, and one is the list that you are accumulating (or whatever you want to return, it doesn't have to be a list). The recursive call keeps getting smaller until eventually you can return what you have accumulated. Use pattern matching to catch when the recursion has ended, and then you return what you were accumulating. This is why Lists are so popular in Scala, because of the Nil and Cons subtypes and because of operators like the :: can be handled nicely with pattern matching. One more thing, to be tail recursive, the last case has to make a recursive or it won't run.
import scala.collection.mutable.ListBuffer
import scala.annotation.tailrec
// I did not ingest from file I just created the test list from some literals
val dummyNameList = List(
List("Johanson", "John"), List("Nim", "Bryan"), List("Mack", "Craig")
, List("Youngs", "Daniel"), List("Williamson", "Zion"), List("Rodgersdorf", "Aaron"))
// You can use this code to populate nameList though I didn't run this code
val source = Source.fromFile("shortnames.txt")
val lines = source.getLines()
val nameList = {
for (line <- lines) yield line.split(" ").reverse.toList
}.toList
println("\nsorted:")
sortedNameList.foreach(println(_))
//This take one element and it will return the lowest element from the list
//of the other argument.
#tailrec
private def swapElem(elem: List[String], listOfLists: List[List[String]]): List[String] = listOfLists match {
case Nil => elem
case h::t if (elem(0).length > h(0).length) => swapElem(h, t)
case h::t => swapElem(elem, t)
}
//If the head is not the smallest element, then swap out the element
//with the smallest element of the list. I probably could have returned
// a tuple it might have looked nicer. It just keeps iterating though until
// there is no elements
#tailrec
private def iterate(listOfLists: List[List[String]], acc: List[List[String]]): List[List[String]] = listOfLists match {
case h::Nil => acc :+ h
case h::t if (swapElem(h, t) != h) => iterate(h :: t.filterNot(_ == swapElem(h, t)), acc :+ swapElem(h, t))
case h::t => iterate(t, acc :+ swapElem(h, t))
}
val sortedNameList = iterate(nameList, List.empty[List[String]])
println("\nsorted:")
sortedNameList.foreach(println(_))
sorted:
List(Nim, Bryan)
List(Mack, Craig)
List(Youngs, Daniel)
List(Johanson, John)
List(Williamson, Zion)
List(Rodgersdorf, Aaron)

Scala - how to loop around list in functional way

I need to iterate over a list and compare the previous element in the list to current. Being from traditional programming background, I used for loop with a variable to hold the previous value but with Scala I believe there should be a better way to do it.
for (item <- dataList)
{
// Code to compare previous list element
prevElement = item
}
The code above works and give correct results but I wanted to explore the functional way of writing the same code.
There are a number of different possible solutions to this. Perhaps the most generic is to use a helper function:
// This example finds the minimum value (the hard way) in a list of integers. It's
// not looking at the previous element, as such, but you can use the same approach.
// I wanted to make the example something meaningful.
def findLowest(xs: List[Int]): Option[Int] = {
#tailrec // Guarantee tail-recursive implementation for safety & performance.
def helper(lowest: Int, rem: List[Int]): Int = {
// If there are no elements left, return lowest found.
if(rem.isEmpty) lowest
// Otherwise, determine new lowest value for next iteration. (If you need the
// previous value, you could put that here.)
else {
val newLow = Math.min(lowest, rem.head)
helper(newLow, rem.tail)
}
}
// Start off looking at the first member, guarding against an empty list.
xs match {
case x :: rem => Some(helper(x, rem))
case _ => None
}
}
In this particular case, this can be replaced by a fold.
def findLowest(xs: List[Int]): Option[Int] = xs match {
case h :: _ => Some(xs.fold(h)((b, m) => Math.min(b, m)))
case _ => None
}
which can be simplified further to just:
def findLowest(xs: List[Int]): Option[Int] = xs match {
case h :: _ => Some(xs.foldLeft(h)(Math.min))
case _ => None
}
In this case, we're using the zero argument of the fold operation to store the minimum value.
If this seems too abstract, can you post more details about what you would like to do in your loop, and I can address that in my answer?
Update: OK, so having seen your comment about the dates, here's a more specific version. I've used DateRange (which you will need to replace with your actual type) in this example. You will also have to determine what overlap and merge need to do. But here's your solution:
// Determine whether two date ranges overlap. Return true if so; false otherwise.
def overlap(a: DateRange, b: DateRange): Boolean = { ... }
// Merge overlapping a & b date ranges, return new range.
def merge(a: DateRange, b: DateRange): DateRange = { ... }
// Convert list of date ranges into another list, in which all consecutive,
// overlapping ranges are merged into a single range.
def mergeDateRanges(dr: List[DateRange]): List[DateRange] = {
// Helper function. Builds a list of merged date ranges.
def helper(prev: DateRange, ret: List[DateRange], rem: List[DateRange]):
List[DateRange] = {
// If there are no more date ranges to process, prepend the previous value
// to the list that we'll return.
if(rem.isEmpty) prev :: ret
// Otherwise, determine if the previous value overlaps with the current
// head value. If it does, create a new previous value that is the merger
// of the two and leave the returned list alone; if not, prepend the
// previous value to the returned list and make the previous value the
// head value.
else {
val (newPrev, newRet) = if(overlap(prev, rem.head)) {
(merge(prev, rem.head), ret)
}
else (rem.head, prev :: ret)
// Next iteration of the helper (I missed this off, originally, sorry!)
helper(newPrev, newRet, rem.tail)
}
}
// Start things off, trapping empty list case. Because the list is generated by pre-pending elements, we have to reverse it to preserve the order.
dr match {
case h :: rem => helper(h, Nil, rem).reverse // <- Fixed bug here too...
case _ => Nil
}
}
If you're looking to just compare the two sequential elements and get the boolean result:
val list = Seq("one", "two", "three", "one", "one")
val result = list.sliding(2).collect { case Seq(a,b) => a.equals(b)}.toList
The previous will give you this
List(false, false, false, true)
An alternative to the previous one is that you might want to get the value:
val result = list.sliding(2).collect {
case Seq(a,b) => if (a.equals(b)) Some(a) else None
case _ => None
}.toList.filter(_.isDefined).map(_.get)
This will give the following:
List(one)
But if you're just looking to compare items in list, an option would be to do something like this:
val list = Seq("one", "two", "three", "one")
for {
item1 <- list
item2 <- list
_ <- Option(println(item1 + ","+ item2))
} yield ()
This results in:
one,one
one,two
one,three
one,one
two,one
two,two
two,three
two,one
three,one
three,two
three,three
three,one
one,one
one,two
one,three
one,one
If you are not mutating the list, and are only trying to compare two elements side-by-side.
This can help you. It's functional.
def cmpPrevElement[T](l: List[T]): List[(T,T)] = {
(l.indices).sliding(2).toList.map(e => (l(e.head), l(e.tail.head)))
}

Contains() acting different on List and TreeSet

I've run into a weird case where the contains() function seems to work differently between a List and a TreeSet in Scala and I am not sure why or how to resolve it.
I've created a class called DataStructure for sake of brevity. It contains two elements: a coordinate pair (i, j) and an Int. (It's more complicated than that, but in this MWE, this is what it looks like) It has a custom comparator that will sort by the Int, and I have overridden hashCode and equals so that two elements containing the same coordinate pair (i, j) are treated as equal regardless of the Int.
When I put an instance of DataStructure into both a List and a TreeSet, the program has no problem finding exact matches. However, when checking for a new element that has the same coordinate pair, but different Int, the List.contains returns true while TreeSet.contains returns false. Why does this happen and how can I resolve it?
This is my code reduced to a minimum working example:
Class DataStructure
package foo
class DataStructure(e1: (Int, Int), e2: Int) extends Ordered[DataStructure] {
val coord: (Int, Int) = e1
val cost: Int = e2
override def equals(that: Any): Boolean = {
that match {
case that: DataStructure => if (that.coord.hashCode() == this.coord.hashCode()) true else false
case _ => false
}}
override def hashCode(): Int = this.coord.hashCode()
def compare(that: DataStructure) = {
if (this.cost == that.cost)
0
else if (this.cost > that.cost)
-1 //reverse ordering
else
1
}
}
Driver program
package runtime
import foo.DataStructure
import scala.collection.mutable.TreeSet
object Main extends App {
val ts = TreeSet[DataStructure]()
val a = new DataStructure((2,2), 2)
val b = new DataStructure((2,3), 1)
ts.add(a)
ts.add(b)
val list = List(a, b)
val listRes = list.contains(a) // true
val listRes2 = list.contains(new DataStructure((2,2), 0)) // true
val tsRes = ts.contains(a) // true
val tsRes2 = ts.contains(new DataStructure((2,2), 0)) // FALSE!
println("list contains exact match: " + listRes)
println("list contains match on first element: " + listRes2)
println("TreeSet contains exact match: " + tsRes)
println("TreeSet contains match on first element: " + tsRes2)
}
Output:
list contains exact match: true
list contains match on first element: true
TreeSet contains exact match: true
TreeSet contains match on first element: false
Almost certainly List.contains is checking equals for each element to find a match, whereas TreeSet.contains is walking the tree and using compare to find a match.
Your problem is that your compare is not consistent with your equals. I don't know why you're doing that, but don't:
https://www.scala-lang.org/api/current/scala/math/Ordered.html
"It is important that the equals method for an instance of Ordered[A] be consistent with the compare method."

How to yield a single element from for loop in scala?

Much like this question:
Functional code for looping with early exit
Say the code is
def findFirst[T](objects: List[T]):T = {
for (obj <- objects) {
if (expensiveFunc(obj) != null) return /*???*/ Some(obj)
}
None
}
How to yield a single element from a for loop like this in scala?
I do not want to use find, as proposed in the original question, i am curious about if and how it could be implemented using the for loop.
* UPDATE *
First, thanks for all the comments, but i guess i was not clear in the question. I am shooting for something like this:
val seven = for {
x <- 1 to 10
if x == 7
} return x
And that does not compile. The two errors are:
- return outside method definition
- method main has return statement; needs result type
I know find() would be better in this case, i am just learning and exploring the language. And in a more complex case with several iterators, i think finding with for can actually be usefull.
Thanks commenters, i'll start a bounty to make up for the bad posing of the question :)
If you want to use a for loop, which uses a nicer syntax than chained invocations of .find, .filter, etc., there is a neat trick. Instead of iterating over strict collections like list, iterate over lazy ones like iterators or streams. If you're starting with a strict collection, make it lazy with, e.g. .toIterator.
Let's see an example.
First let's define a "noisy" int, that will show us when it is invoked
def noisyInt(i : Int) = () => { println("Getting %d!".format(i)); i }
Now let's fill a list with some of these:
val l = List(1, 2, 3, 4).map(noisyInt)
We want to look for the first element which is even.
val r1 = for(e <- l; val v = e() ; if v % 2 == 0) yield v
The above line results in:
Getting 1!
Getting 2!
Getting 3!
Getting 4!
r1: List[Int] = List(2, 4)
...meaning that all elements were accessed. That makes sense, given that the resulting list contains all even numbers. Let's iterate over an iterator this time:
val r2 = (for(e <- l.toIterator; val v = e() ; if v % 2 == 0) yield v)
This results in:
Getting 1!
Getting 2!
r2: Iterator[Int] = non-empty iterator
Notice that the loop was executed only up to the point were it could figure out whether the result was an empty or non-empty iterator.
To get the first result, you can now simply call r2.next.
If you want a result of an Option type, use:
if(r2.hasNext) Some(r2.next) else None
Edit Your second example in this encoding is just:
val seven = (for {
x <- (1 to 10).toIterator
if x == 7
} yield x).next
...of course, you should be sure that there is always at least a solution if you're going to use .next. Alternatively, use headOption, defined for all Traversables, to get an Option[Int].
You can turn your list into a stream, so that any filters that the for-loop contains are only evaluated on-demand. However, yielding from the stream will always return a stream, and what you want is I suppose an option, so, as a final step you can check whether the resulting stream has at least one element, and return its head as a option. The headOption function does exactly that.
def findFirst[T](objects: List[T], expensiveFunc: T => Boolean): Option[T] =
(for (obj <- objects.toStream if expensiveFunc(obj)) yield obj).headOption
Why not do exactly what you sketched above, that is, return from the loop early? If you are interested in what Scala actually does under the hood, run your code with -print. Scala desugares the loop into a foreach and then uses an exception to leave the foreach prematurely.
So what you are trying to do is to break out a loop after your condition is satisfied. Answer here might be what you are looking for. How do I break out of a loop in Scala?.
Overall, for comprehension in Scala is translated into map, flatmap and filter operations. So it will not be possible to break out of these functions unless you throw an exception.
If you are wondering, this is how find is implemented in LineerSeqOptimized.scala; which List inherits
override /*IterableLike*/
def find(p: A => Boolean): Option[A] = {
var these = this
while (!these.isEmpty) {
if (p(these.head)) return Some(these.head)
these = these.tail
}
None
}
This is a horrible hack. But it would get you the result you wished for.
Idiomatically you'd use a Stream or View and just compute the parts you need.
def findFirst[T](objects: List[T]): T = {
def expensiveFunc(o : T) = // unclear what should be returned here
case class MissusedException(val data: T) extends Exception
try {
(for (obj <- objects) {
if (expensiveFunc(obj) != null) throw new MissusedException(obj)
})
objects.head // T must be returned from loop, dummy
} catch {
case MissusedException(obj) => obj
}
}
Why not something like
object Main {
def main(args: Array[String]): Unit = {
val seven = (for (
x <- 1 to 10
if x == 7
) yield x).headOption
}
}
Variable seven will be an Option holding Some(value) if value satisfies condition
I hope to help you.
I think ... no 'return' impl.
object TakeWhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else
seq(seq.takeWhile(_ == null).size)
}
object OptionLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T], index: Int = 0): T = if (seq.isEmpty) null.asInstanceOf[T] else
Option(seq(index)) getOrElse func(seq, index + 1)
}
object WhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else {
var i = 0
def obj = seq(i)
while (obj == null)
i += 1
obj
}
}
objects iterator filter { obj => (expensiveFunc(obj) != null } next
The trick is to get some lazy evaluated view on the colelction, either an iterator or a Stream, or objects.view. The filter will only execute as far as needed.

Easiest way to decide if List contains duplicates?

One way is this
list.distinct.size != list.size
Is there any better way? It would have been nice to have a containsDuplicates method
Assuming "better" means "faster", see the alternative approaches benchmarked in this question, which seems to show some quicker methods (although note that distinct uses a HashSet and is already O(n)). YMMV of course, depending on specific test case, scala version etc. Probably any significant improvement over the "distinct.size" approach would come from an early-out as soon as a duplicate is found, but how much of a speed-up is actually obtained would depend strongly on how common duplicates actually are in your use-case.
If you mean "better" in that you want to write list.containsDuplicates instead of containsDuplicates(list), use an implicit:
implicit def enhanceWithContainsDuplicates[T](s:List[T]) = new {
def containsDuplicates = (s.distinct.size != s.size)
}
assert(List(1,2,2,3).containsDuplicates)
assert(!List("a","b","c").containsDuplicates)
You can also write:
list.toSet.size != list.size
But the result will be the same because distinct is already implemented with a Set. In both case the time complexity should be O(n): you must traverse the list and Set insertion is O(1).
I think this would stop as soon as a duplicate was found and is probably more efficient than doing distinct.size - since I assume distinct keeps a set as well:
#annotation.tailrec
def containsDups[A](list: List[A], seen: Set[A] = Set[A]()): Boolean =
list match {
case x :: xs => if (seen.contains(x)) true else containsDups(xs, seen + x)
case _ => false
}
containsDups(List(1,1,2,3))
// Boolean = true
containsDups(List(1,2,3))
// Boolean = false
I realize you asked for easy and I don't now that this version is, but finding a duplicate is also finding if there is an element that has been seen before:
def containsDups[A](list: List[A]): Boolean = {
list.iterator.scanLeft(Set[A]())((set, a) => set + a) // incremental sets
.zip(list.iterator)
.exists{ case (set, a) => set contains a }
}
#annotation.tailrec
def containsDuplicates [T] (s: Seq[T]) : Boolean =
if (s.size < 2) false else
s.tail.contains (s.head) || containsDuplicates (s.tail)
I didn't measure this, and think it is similar to huynhjl's solution, but a bit more simple to understand.
It returns early, if a duplicate is found, so I looked into the source of Seq.contains, whether this returns early - it does.
In SeqLike, 'contains (e)' is defined as 'exists (_ == e)', and exists is defined in TraversableLike:
def exists (p: A => Boolean): Boolean = {
var result = false
breakable {
for (x <- this)
if (p (x)) { result = true; break }
}
result
}
I'm curious how to speed things up with parallel collections on multi cores, but I guess it is a general problem with early-returning, while another thread will keep running, because it doesn't know, that the solution is already found.
Summary:
I've written a very efficient function which returns both List.distinct and a List consisting of each element which appeared more than once and the index at which the element duplicate appeared.
Note: This answer is a straight copy of the answer on a related question.
Details:
If you need a bit more information about the duplicates themselves, like I did, I have written a more general function which iterates across a List (as ordering was significant) exactly once and returns a Tuple2 consisting of the original List deduped (all duplicates after the first are removed; i.e. the same as invoking distinct) and a second List showing each duplicate and an Int index at which it occurred within the original List.
Here's the function:
def filterDupes[A](items: List[A]): (List[A], List[(A, Int)]) = {
def recursive(remaining: List[A], index: Int, accumulator: (List[A], List[(A, Int)])): (List[A], List[(A, Int)]) =
if (remaining.isEmpty)
accumulator
else
recursive(
remaining.tail
, index + 1
, if (accumulator._1.contains(remaining.head))
(accumulator._1, (remaining.head, index) :: accumulator._2)
else
(remaining.head :: accumulator._1, accumulator._2)
)
val (distinct, dupes) = recursive(items, 0, (Nil, Nil))
(distinct.reverse, dupes.reverse)
}
An below is an example which might make it a bit more intuitive. Given this List of String values:
val withDupes =
List("a.b", "a.c", "b.a", "b.b", "a.c", "c.a", "a.c", "d.b", "a.b")
...and then performing the following:
val (deduped, dupeAndIndexs) =
filterDupes(withDupes)
...the results are:
deduped: List[String] = List(a.b, a.c, b.a, b.b, c.a, d.b)
dupeAndIndexs: List[(String, Int)] = List((a.c,4), (a.c,6), (a.b,8))
And if you just want the duplicates, you simply map across dupeAndIndexes and invoke distinct:
val dupesOnly =
dupeAndIndexs.map(_._1).distinct
...or all in a single call:
val dupesOnly =
filterDupes(withDupes)._2.map(_._1).distinct
...or if a Set is preferred, skip distinct and invoke toSet...
val dupesOnly2 =
dupeAndIndexs.map(_._1).toSet
...or all in a single call:
val dupesOnly2 =
filterDupes(withDupes)._2.map(_._1).toSet
This is a straight copy of the filterDupes function out of my open source Scala library, ScalaOlio. It's located at org.scalaolio.collection.immutable.List_._.
If you're trying to check for duplicates in a test then ScalaTest can be helpful.
import org.scalatest.Inspectors._
import org.scalatest.Matchers._
forEvery(list.distinct) { item =>
withClue(s"value $item, the number of occurences") {
list.count(_ == item) shouldBe 1
}
}
// example:
scala> val list = List(1,2,3,4,3,2)
list: List[Int] = List(1, 2, 3, 4, 3, 2)
scala> forEvery(list) { item => withClue(s"value $item, the number of occurences") { list.count(_ == item) shouldBe 1 } }
org.scalatest.exceptions.TestFailedException: forEvery failed, because:
at index 1, value 2, the number of occurences 2 was not equal to 1 (<console>:19),
at index 2, value 3, the number of occurences 2 was not equal to 1 (<console>:19)
in List(1, 2, 3, 4)