Contains() acting different on List and TreeSet - scala

I've run into a weird case where the contains() function seems to work differently between a List and a TreeSet in Scala and I am not sure why or how to resolve it.
I've created a class called DataStructure for sake of brevity. It contains two elements: a coordinate pair (i, j) and an Int. (It's more complicated than that, but in this MWE, this is what it looks like) It has a custom comparator that will sort by the Int, and I have overridden hashCode and equals so that two elements containing the same coordinate pair (i, j) are treated as equal regardless of the Int.
When I put an instance of DataStructure into both a List and a TreeSet, the program has no problem finding exact matches. However, when checking for a new element that has the same coordinate pair, but different Int, the List.contains returns true while TreeSet.contains returns false. Why does this happen and how can I resolve it?
This is my code reduced to a minimum working example:
Class DataStructure
package foo
class DataStructure(e1: (Int, Int), e2: Int) extends Ordered[DataStructure] {
val coord: (Int, Int) = e1
val cost: Int = e2
override def equals(that: Any): Boolean = {
that match {
case that: DataStructure => if (that.coord.hashCode() == this.coord.hashCode()) true else false
case _ => false
}}
override def hashCode(): Int = this.coord.hashCode()
def compare(that: DataStructure) = {
if (this.cost == that.cost)
0
else if (this.cost > that.cost)
-1 //reverse ordering
else
1
}
}
Driver program
package runtime
import foo.DataStructure
import scala.collection.mutable.TreeSet
object Main extends App {
val ts = TreeSet[DataStructure]()
val a = new DataStructure((2,2), 2)
val b = new DataStructure((2,3), 1)
ts.add(a)
ts.add(b)
val list = List(a, b)
val listRes = list.contains(a) // true
val listRes2 = list.contains(new DataStructure((2,2), 0)) // true
val tsRes = ts.contains(a) // true
val tsRes2 = ts.contains(new DataStructure((2,2), 0)) // FALSE!
println("list contains exact match: " + listRes)
println("list contains match on first element: " + listRes2)
println("TreeSet contains exact match: " + tsRes)
println("TreeSet contains match on first element: " + tsRes2)
}
Output:
list contains exact match: true
list contains match on first element: true
TreeSet contains exact match: true
TreeSet contains match on first element: false

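List.contains checks equals against each element to find a match, whereas TreeSet.contains walks the tree and uses compare to find a match; it never consults equals or hashCode. In your example the new element has cost 0, so compare never returns 0 against the stored elements (costs 2 and 1), and the lookup fails.
Your problem is that your compare is not consistent with your equals. I don't know why you're doing that, but don't: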
https://www.scala-lang.org/api/current/scala/math/Ordered.html
"It is important that the equals method for an instance of Ordered[A] be consistent with the compare method."

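One way to make the two agree, as a minimal sketch: order the set by the same field that equals uses. The class name Node and the ordering byCoord are hypothetical stand-ins for the question's DataStructure, and this assumes the set does not need to be sorted by cost.

```scala
import scala.collection.mutable

// Hypothetical stand-in for DataStructure: equality depends on coord only.
class Node(val coord: (Int, Int), val cost: Int) {
  override def equals(that: Any): Boolean = that match {
    case n: Node => n.coord == coord
    case _       => false
  }
  override def hashCode(): Int = coord.hashCode()
}

// Ordering by coordinate only, so compare returns 0 exactly when equals is true.
val byCoord: Ordering[Node] = Ordering.by((n: Node) => n.coord)

val ts = mutable.TreeSet.empty[Node](byCoord)
ts += new Node((2, 2), 2)
ts += new Node((2, 3), 1)

// Same coordinate, different cost: found, because the ordering ignores cost.
val found = ts.contains(new Node((2, 2), 0))
val missing = ts.contains(new Node((9, 9), 0))
```

If the cost ordering is also needed, a separate structure sorted by cost could be kept alongside this one; that is a design choice the question does not settle.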
Related

Anagram Recursion Scala

This is the question that I tried to write a code for.
Consider a recursive algorithm that takes two strings s1 and s2 as input and checks if these strings are the anagram of each other, hence if all the letters contained in the former appear in the latter the same number of times, and vice versa (i.e. s2 is a permutation of s1).
Example:
if s1 = "elevenplustwo" and s2 = "twelveplusone" the output is true
if s1 = "amina" and s2 = "minia" the output is false
Hint: consider the first character c = s1(0) of s1 and the rest of r =s1.substring(1, s1.size) of s1. What are the conditions that s2 must (recursively) satisfy with respect to c and r?
And this is the piece of code I wrote to solve this problem. The problem is that the code works perfectly when there is no repetition of characters in the strings. For example, it works just fine for amin and mina. However, when there is repetition, for example, amina and maina, then it does not work properly.
How can I solve this issue?
import scala.collection.mutable.ArrayBuffer
object Q32019 extends App {
def anagram(s1:String, s2:String, indexAr:ArrayBuffer[Int]):ArrayBuffer[Int]={
if(s1==""){
return indexAr
}
else {
val c=s1(0)
val s=s1.substring(1,s1.length)
val ss=s2
var count=0
for (i<-0 to s2.length-1) {
if(s2(i)==c && !indexAr.contains(s2.indexOf(c))) {
indexAr+=i
}
}
anagram(s,s2,indexAr)
}
indexAr
}
var a="amin"
var b="mina"
var c=ArrayBuffer[Int]()
var d=anagram(a,b,c)
println(d)
var check=true
var i=0
while (i<a.length && check){
if (d.contains(i) && a.length==b.length) check=true
else check=false
i+=1
}
if (check) println("yes they are anagram")
else println("no, they are not anagram")
}
The easiest way is probably to sort both strings and just compare them:
def areAnagram(str1: String, str2: String): Boolean =
str1.sorted == str2.sorted
println(areAnagram("amina", "anima")) // true
println(areAnagram("abc", "bcc")) // false
Another approach is more "natural": two strings are anagrams if they have the same count of each character.
So you build two Map[Char, Int] instances and compare them:
import scala.collection.mutable
def areAnagram(str1: String, str2: String): Boolean = {
val map1 = mutable.Map.empty[Char, Int].withDefaultValue(0)
val map2 = mutable.Map.empty[Char, Int].withDefaultValue(0)
for (c <- str1) map1(c) += 1
for (c <- str2) map2(c) += 1
map1 == map2
}
There is probably also a variant of the second solution that uses an Array of counts instead of Maps, if you know the chars are ASCII only.
Or some other clever algorithm, IDK.
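The array-based variant mentioned above could be sketched like this. The name areAnagramAscii is hypothetical, and the code assumes every character is 7-bit ASCII (code below 128); anything else would throw an ArrayIndexOutOfBoundsException.

```scala
// Hypothetical helper; assumes all characters have codes below 128.
def areAnagramAscii(str1: String, str2: String): Boolean =
  if (str1.length != str2.length) false
  else {
    val counts = new Array[Int](128)
    for (c <- str1) counts(c.toInt) += 1 // count up for str1
    for (c <- str2) counts(c.toInt) -= 1 // count down for str2
    counts.forall(_ == 0)                // all zero: same characters, same multiplicities
  }

val same = areAnagramAscii("elevenplustwo", "twelveplusone")
val different = areAnagramAscii("amina", "minia")
```

A single counts array is enough because counting up for one string and down for the other leaves all zeros only when the character counts match.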
EDIT: One recursive solution is to remove the first char of str1 from str2. The remainders of both strings must then also be anagrams.
E.g. for ("amina", "niama") you first throw out an a from both, and you get ("mina", "nima"). Those 2 strings must also be anagrams, by definition.
def areAnagram(str1: String, str2: String): Boolean = {
  if (str1.length != str2.length) false
  else if (str1.isEmpty) true // end recursion
  else {
    val (c, r1) = str1.splitAt(1)
    // replaceFirst takes a regex, so quote c in case it is a metacharacter such as "." or "*"
    val r2 = str2.replaceFirst(java.util.regex.Pattern.quote(c), "") // remove one occurrence of c
    // if c was not in str2, r2 keeps its length and the length check above fails on the next call
    areAnagram(r1, r2)
  }
}
When checking for anagrams you can take advantage of a property of the XOR operation: XORing a number with itself gives 0.
Since characters in strings are essentially just numbers, you can XOR all characters of both strings together; if the strings are anagrams, the result is 0. Note that the converse does not hold: "aa" and "bb" also XOR to 0 without being anagrams, so a result of 0 is a necessary condition rather than a proof.
You could iterate over both strings using a loop, but if you want to use recursion, I would suggest converting the strings to lists of chars.
Lists allow efficient splitting into the first element (head of the list) and the rest (tail of the list). So the solution goes like this:
Split both lists of chars into head and tail.
XOR the characters extracted from the heads with the previous result.
Pass the tails and the XOR result to the next recursive call.
When we get to the end of the lists, we return true if the XOR result is 0.
The last optimization we can do is short-circuiting with false whenever strings of different lengths are passed (since they could never be anagrams anyway).
Final solution:
import scala.annotation.tailrec

def anagram(a: String, b: String): Boolean = {
  // inner function doing the recursion; the @tailrec annotation makes the compiler verify that it is tail-recursive
  @tailrec
  def go(a: List[Char], b: List[Char], acc: Int): Boolean = { // the extra parameter acc allows tail recursion, which is stack-safe
    (a, b) match {
      case (x :: xs, y :: ys) => // the :: pattern splits a list into head and tail
        // XOR the accumulator with both first characters, then recurse on the tails
        go(xs, ys, acc ^ x ^ y)
      case _ => acc == 0 // anagrams always XOR to 0 (necessary, but not sufficient: "aa" and "bb" also give 0)
    }
  }
  if (a.length != b.length) { // strings of different sizes can't be anagrams
    false
  } else {
    go(a.toList, b.toList, 0)
  }
}
anagram("twelveplusone", "elevenplustwo") //true
anagram("amina", "minia") //false
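A small check, offered as a sketch, shows why an XOR result of 0 should be treated with care: two pairs of equal characters cancel out even when the strings differ. The helper xorOfChars is hypothetical and only folds XOR over the character codes.

```scala
// Hypothetical helper: XOR of all character codes in a string.
def xorOfChars(s: String): Int = s.foldLeft(0)((acc, c) => acc ^ c)

// XOR over both strings together, as the approach above does.
val xorResult = xorOfChars("aa" + "bb")

// Comparing sorted strings tells the two apart.
val sortedEqual = "aa".sorted == "bb".sorted
```

Here xorResult is 0 although "aa" and "bb" are not anagrams, so the XOR test is best used as a quick pre-filter in front of an exact check such as sorting or counting.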
My suggestion: Don't over-think it.
def anagram(a: String, b: String): Boolean =
if (a.isEmpty) b.isEmpty
else b.contains(a(0)) && anagram(a.tail, b diff a(0).toString)

What updates the inherited Map of PrefixMap?

Running the PrefixMap example from the book Programming in Scala, 3rd edition, from the chapter The Architecture of Scala Collections, I don't understand what updates the inherited Map of PrefixMap when calling update.
Here is the code:
import collection._
class PrefixMap[T]
extends mutable.Map[String, T]
with mutable.MapLike[String, T, PrefixMap[T]] {
val id: Long = PrefixMap.nextId
var suffixes: immutable.Map[Char, PrefixMap[T]] = Map.empty
var value: Option[T] = None
def get(s: String): Option[T] =
if (s.isEmpty) value
else suffixes get s(0) flatMap (_.get(s substring 1))
def withPrefix(s: String): PrefixMap[T] =
if (s.isEmpty) this
else {
val leading = s(0)
suffixes get leading match {
case None =>
suffixes = suffixes + (leading -> empty)
case _ =>
}
val ret = suffixes(leading) withPrefix (s substring 1)
println("withPrefix: ends with: id="+this.id+", size="+this.size+", this="+this)
ret
}
override def update(s: String, elem: T) = {
println("update: this before withPrefix: id="+this.id+", size="+this.size+", return="+this)
val pm = withPrefix(s)
println("update: withPrefix returned to update: id="+pm.id+", size="+pm.size+", return="+pm)
println("===> update: this after withPrefix and before assignment to pm.value : id="+this.id+", size="+this.size+", return="+this)
pm.value = Some(elem)
println("===> update: this after assinment to pm.value: id="+this.id+", size="+this.size+", return="+this)
}
override def remove(s: String): Option[T] =
if (s.isEmpty) { val prev = value; value = None; prev }
else suffixes get s(0) flatMap (_.remove(s substring 1))
def iterator: Iterator[(String, T)] =
(for (v <- value.iterator) yield ("", v)) ++
(for ((chr, m) <- suffixes.iterator;
(s, v) <- m.iterator) yield (chr +: s, v))
def += (kv: (String, T)): this.type = { update(kv._1, kv._2); this }
def -= (s: String): this.type = { remove(s); this }
override def empty = new PrefixMap[T]
}
object PrefixMap {
var ids: Long = 0
def nextId: Long = { PrefixMap.ids+=1; ids }
}
object MyApp extends App {
val pm = new PrefixMap[Int]
pm.update("a", 0)
println(pm)
}
The output is:
update: this before withPrefix: id=1, size=0, return=Map()
withPrefix: ends with: id=1, size=0, this=Map()
update: withPrefix returned to update: id=2, size=0, return=Map()
===> update: this after withPrefix and before assignment to pm.value : id=1, size=0, return=Map()
===> update: this after assinment to pm.value: id=1, size=1, return=Map(a -> 0)
Map(a -> 0)
So the question is: how it is possible that the line with "pm.value = Some(elem)" in the update method causes the inherited Map of PrefixMap to be updated with (a -> 0)?
It is not clear what you mean by "inherited Map of PrefixMap". Map is a trait which if you are coming from the Java world is similar to interface. It means that Map on its own doesn't hold any value, it just specifies contract and provides some default implementation of various convenience methods via "core" methods (the ones you implement in your PrefixMap).
As to how this whole data structure works, you should imagine this PrefixMap implementation as a "tree". Logically, each edge carries a single char (of the prefix sequence), and each node potentially holds a value that corresponds to the string created by accumulating all chars on the way from the root to that node.
So if you have a Map with the key-value pair "ab" -> 12, the tree is a single chain: the root, an edge a to a node with no value, and an edge b to a node holding 12.
If you then add "ac" -> 123, the a node gains a second child: an edge c to a node holding 123.
Finally, if you add "a" -> 1, the shape stays the same and the a node itself now holds the value 1.
An important observation here is that if you take the "a" node as a root, what you'll be left with is a valid prefix tree with all strings shortened by that "a" prefix.
Physically the layout is a bit different:
There is the root node which is PrefixMap[T] which is Map[String,T] from the outside, and also a node for an empty string key.
Internal nodes which are value + suffixes i.e. optional value and merged list of children nodes with their corresponding characters on the edge into a Map[Char, PrefixMap[T]]
As you can see, the update implementation effectively finds a node with the withPrefix call and then assigns the value to it. So what does the withPrefix method do? Although it is implemented recursively, it might be easier to think about it in an iterative way. From this point of view, it iterates over the characters of the given String one by one and navigates through the tree, creating missing nodes; see
case None =>
suffixes = suffixes + (leading -> empty)
and finally returns the node corresponding to the whole String (i.e. this in case the deepest recursive s.isEmpty)
Method get implementation is actually quite similar to the withPrefix: it recursively iterates over given string and navigates through the tree but it is simpler because it doesn't have to create missing nodes. Because children nodes are actually also stored in a Map its get method returns Option the same way PrefixMap should return Option. So you can just use flatMap and it will work OK if there is no such child node at some level.
Finally iterator creates its iterator as a union of
the value.iterator (luckily Option in Scala implements iterator that returns just 1 or 0 elements depending on whether there is a value or not)
all iterators of all the children nodes just adding its own character as a prefix to their keys.
So when you do
val pm = new PrefixMap[Int]
pm.update("a", 0)
println(pm)
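update creates the node(s) in the tree and stores the value. And pm.toString actually uses iterator to build the string representation, so it iterates over the tree, collecting the values from the non-empty value Options in all the nodes. That is why assigning pm.value on a child node changes what the root prints: the root holds no separate copy of the entries.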
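To see the same effect in isolation, here is a stripped-down sketch. The class Trie is hypothetical: it keeps only value, a children map, get, update and iterator, and does not extend mutable.Map as the book's PrefixMap does.

```scala
// Hypothetical simplified trie, not the book's PrefixMap.
class Trie[T] {
  var value: Option[T] = None
  var children: Map[Char, Trie[T]] = Map.empty

  // Walk down the tree along s, creating missing nodes, and return the last node.
  def withPrefix(s: String): Trie[T] =
    if (s.isEmpty) this
    else {
      val child = children.getOrElse(s.head, new Trie[T])
      children = children + (s.head -> child)
      child.withPrefix(s.tail)
    }

  // Only the node at the end of the path is mutated.
  def update(s: String, elem: T): Unit = withPrefix(s).value = Some(elem)

  def get(s: String): Option[T] =
    if (s.isEmpty) value
    else children.get(s.head).flatMap(_.get(s.tail))

  // The root's entries are computed from the nodes every time; nothing is stored twice.
  def iterator: Iterator[(String, T)] =
    value.iterator.map(v => ("", v)) ++
      children.iterator.flatMap { case (c, t) =>
        t.iterator.map { case (k, v) => (c.toString + k, v) }
      }
}

val root = new Trie[Int]
root.update("ab", 12)
root.update("a", 1)
val entries = root.iterator.toList.sortBy(_._1)
```

After the two updates, iterating from root yields both "a" and "ab", even though update only ever assigned value on descendant nodes.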

Beginner for loop in Scala: How do I declare a generic element?

I'm new to Scala and am having trouble with a simple generic for-loop declaration, where one instance of my class, FinSet[T], is "unionized" with another instance of FinSet[T], other. Here is my current implementation of U (short for Union):
def U(other:FinSet[T]) = {
var otherList = other.toList
for(otherElem <- 0 until otherList.length){
this.+(otherElem)
}
this
}
When attempting to compile, I receive this error:
error: type mismatch;
found   : otherElem.type (with underlying type Int)
required: T
this.+(otherElem)
This is in class ListSet[T], which is an extension of the abstract class FinSet[T]. Both are shown here:
abstract class FinSet[T] protected () {
/* returns a list consisting of the set's elements */
def toList:List[T]
/* given a value x, it returns a new set consisting of x
and all the elements of this (set)
*/
def +(x:T):FinSet[T]
/* given a set other, it returns the union of this and other,
i.e., a new set consisting of all the elements of this and
all the elements of other
*/
def U(other:FinSet[T]):FinSet[T]
/* given a set other, it returns the intersection of this and other,
i.e., a new set consisting of all the elements that occur both
in this and in other
*/
def ^(other:FinSet[T]):FinSet[T]
/* given a set other, it returns the difference of this and other,
i.e., a new set consisting of all the elements of this that
do not occur in other
*/
def \(other:FinSet[T]):FinSet[T]
/* given a value x, it returns true if and only if x is an element of this
*/
def contains(x: T):Boolean
/* given a set other, it returns true if and only if this is included
in other, i.e., iff every element of this is an element of other
*/
def <=(other:FinSet[T]):Boolean =
false // replace this line with your implementation
override def toString = "{" ++ (toList mkString ", ") ++ "}"
// overrides the default definition of == (an alias of equals)
override def equals(other:Any):Boolean = other match {
// if other is an instance of FinSet[T] then ...
case o:FinSet[T] =>
// it is equal to this iff it includes and is included in this
(this <= o) && (o <= this)
case _ => false
}
}
And here, ListSet:
class ListSet[T] private (l: List[T]) extends FinSet[T] {
def this() = this(Nil)
// invariant: elems is a list with no repetitions
// storing all of the set's elements
private val elems = l
private def add(x:T, l:List[T]):List[T] = l match {
case Nil => x :: Nil
case y :: t => if (x == y) l else y :: add(x, t)
}
val toList =
elems
def +(x: T) =
this.toList.+(x)
def U(other:FinSet[T]) = {
var otherList = other.toList
for(otherElem <- 0 until otherList.length){
this.+(otherElem)
}
this
}
def ^(other:FinSet[T]) =
this
def \(other:FinSet[T]) =
this
def contains(x:T) =
false
}
Am I missing something obvious here?
In your for loop you are assigning Ints to otherElem (x until y produces a Range, which iterates over the Ints from x up to, but not including, y), not members of otherList. What you want is something like:
def U(other:FinSet[T]) = {
for(otherElem <- other.toList){
this.+(otherElem)
}
this
}
EDIT:
Curious, given your definitions of FinSet and ListSet (which I didn't see until after giving my initial answer), you ought to have some other issues with the above code (+ returns a List, not a FinSet, and you don't capture the result of using + anywhere, so your final return value of this ought to just return the original value of the set - unless you are not using the standard Scala immutable List class? If not, which class are you using here?). If you are using the standard Scala immutable List class, then here is an alternative to consider:
def U(other:FinSet[T]) = new ListSet((this.toList ++ other.toList).distinct)
In general, it looks a bit like you are going to some trouble to produce mutable versions of the data structures you are interested in. I strongly encourage you to look into immutable data structures and how to work with them - they are much nicer and safer to work with once you understand the principles.
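As a rough sketch of the immutable style, the union can be written as a fold in which every + returns a new set and the result is passed along rather than discarded. SimpleSet is a hypothetical, pared-down stand-in for ListSet, not the class from the question.

```scala
// Hypothetical minimal immutable set backed by a duplicate-free List.
final class SimpleSet[T](val toList: List[T]) {
  // Returns a new set; this is never modified.
  def +(x: T): SimpleSet[T] =
    if (toList.contains(x)) this else new SimpleSet(toList :+ x)

  // Fold over the other set's elements, threading the growing result through.
  def U(other: SimpleSet[T]): SimpleSet[T] =
    other.toList.foldLeft(this)((acc, elem) => acc + elem)
}

val empty = new SimpleSet[Int](Nil)
val a = empty + 1 + 2
val b = empty + 2 + 3
val union = (a U b).toList
```

The key difference from the loop in the question is that the value returned by + is kept: each step of the fold hands the new set to the next step.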

Polish notation evaluate function

I am new to Scala and I am having a hard time defining, or more likely translating from Ruby, code that evaluates calculations written in Polish notation,
e.g. (+ 3 2) or (- 4 (+ 3 2))
I successfully parse the string into the form ArrayBuffer(+, 3, 2) or ArrayBuffer(-, 4, ArrayBuffer(+, 3, 2)).
The problem starts when I try to define a recursive eval function, which simply takes an ArrayBuffer as its argument and returns an Int (the result of the evaluated expression).
IN THE BASE CASE:
I want to simply check whether the 2nd element is an instanceOf[Int] and the 3rd element is an instanceOf[Int], then evaluate them together (depending on the operator sign, the 1st element) and return an Int.
However, if any of the elements is another ArrayBuffer, I simply want to reassign that element to the value returned by a recursive call to eval, like:
Storage(2) = eval(Storage(2)). (** that's why I am using a mutable ArrayBuffer **)
The error which I get is:
scala.collection.mutable.ArrayBuffer cannot be cast to java.lang.Integer
I am of course not looking for copy-and-paste answers, but for some advice and observations.
Constructive Criticism fully welcomed.
****** This is the testing code I am using only for the addition ******
def eval(Input: ArrayBuffer[Any]):Int = {
if(ArrayBuffer(2).isInstaceOf[ArrayBuffer[Any]]) {
ArrayBuffer(2) = eval(ArrayBuffer(2))
}
if(ArrayBuffer(3).isInstaceOf[ArrayBuffer[Any]]) {
ArrayBuffer(3) = eval(ArrayBuffer(3))
}
if(ArrayBuffer(2).isInstaceOf[Int] && ArrayBuffer(3).isInstanceOf[Int]) {
ArrayBuffer(2).asInstanceOf[Int] + ArrayBuffer(3).asInstanceOf[Int]
}
}
A few problems with your code:
ArrayBuffer(2) means "construct an ArrayBuffer with one element: 2". Nowhere in your code are you referencing your parameter Input. You would need to replace instances of ArrayBuffer(2) with Input(2) for this to work.
ArrayBuffer (like all collections in Scala) is 0-indexed, so if you want to access the second thing in the collection, you would write input(1).
If you leave the final if there, the compiler will complain, since your function won't always return an Int: if the input contained something unexpected, that last if would evaluate to false, and you have no else to fall back to.
Here's a direct rewrite of your code, fixing these issues:
def eval(input: ArrayBuffer[Any]):Int = {
if(input(1).isInstanceOf[ArrayBuffer[Any]])
input(1) = eval(input(1).asInstanceOf[ArrayBuffer[Any]])
if(input(2).isInstanceOf[ArrayBuffer[Any]])
input(2) = eval(input(2).asInstanceOf[ArrayBuffer[Any]])
input(1).asInstanceOf[Int] + input(2).asInstanceOf[Int]
}
(Note also that variable names, like input, should start with a lowercase letter.)
That said, the procedure of replacing entries in your input with their evaluations is probably not the best route because it destroys the input in the process of evaluating. You should instead write a function that takes the ArrayBuffer and simply recurses through it without modifying the original.
You'll want your eval function to check for specific cases. Here's a simple implementation as a demonstration:
def eval(e: Seq[Any]): Int =
e match {
case Seq("+", a: Int, b: Int) => a + b
case Seq("+", a: Int, b: Seq[Any]) => a + eval(b)
case Seq("+", a: Seq[Any], b: Int) => eval(a) + b
case Seq("+", a: Seq[Any], b: Seq[Any]) => eval(a) + eval(b)
}
So you can see that for the simple case of (+ arg1 arg2), there are 4 cases. In each case, if the argument is an Int, we use it directly in the addition. If the argument itself is a sequence (like ArrayBuffer), then we recursively evaluate it before adding. Notice also that Scala's case syntax lets you pattern match on types, so you can skip the isInstanceOf and asInstanceOf stuff.
Now, there are definitely style improvements you'd want to make down the line (like using Either instead of Any and not hard-coding the "+"), but this should get you on the right track.
And here's how you would use it:
scala> eval(Seq("+", 3, 2))
res0: Int = 5
scala> eval(Seq("+", 4, Seq("+", 3, 2)))
res1: Int = 9
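The style improvement hinted at above (a typed representation instead of Any, and no hard-coded "+") could look roughly like this. It uses a small sealed trait rather than Either, and the names Expr, Num, Op and evalExpr are hypothetical, not from the question.

```scala
// Hypothetical typed expression tree replacing ArrayBuffer[Any].
sealed trait Expr
final case class Num(value: Int) extends Expr
final case class Op(symbol: String, left: Expr, right: Expr) extends Expr

// Recursively evaluate the tree without mutating it.
def evalExpr(e: Expr): Int = e match {
  case Num(v)          => v
  case Op("+", l, r)   => evalExpr(l) + evalExpr(r)
  case Op("-", l, r)   => evalExpr(l) - evalExpr(r)
  case Op("*", l, r)   => evalExpr(l) * evalExpr(r)
  case Op("/", l, r)   => evalExpr(l) / evalExpr(r)
  case Op(other, _, _) => throw new IllegalArgumentException(s"unknown operator: $other")
}

// (- 4 (+ 3 2))
val result = evalExpr(Op("-", Num(4), Op("+", Num(3), Num(2))))
```

With a typed tree the four Int/Seq combinations collapse into one case per operator, because both operands are always an Expr.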
Now, if you want to really take advantage of Scala features, you could use an Eval extractor:
object Eval {
  def unapply(e: Any): Option[Int] = {
    e match {
      case i: Int => Some(i)
      case Seq("+", Eval(a), Eval(b)) => Some(a + b)
      case _ => None // anything else is not an expression we can evaluate
    }
  }
}
And you'd use it like this:
scala> val Eval(result) = 2
result: Int = 2
scala> val Eval(result) = ArrayBuffer("+", 2, 3)
result: Int = 5
scala> val Eval(result) = ArrayBuffer("+", 2, ArrayBuffer("+", 2, 3))
result: Int = 7
Or you could wrap it in an eval function:
def eval(e: Any): Int = {
val Eval(result) = e
result
}
Here is my take on right to left stack-based evaluation:
def eval(expr: String): Either[Throwable, Int] = {
import java.lang.NumberFormatException
import scala.util.control.Exception._
def int(s: String) = catching(classOf[NumberFormatException]).opt(s.toInt)
val symbols = expr.replaceAll("""[^\d\+\-\*/ ]""", "").split(" ").toSeq
allCatch.either {
val results = symbols.foldRight(List.empty[Int]) {
(symbol, operands) => int(symbol) match {
case Some(op) => op :: operands
case None => val x :: y :: ops = operands
val result = symbol match {
case "+" => x + y
case "-" => x - y
case "*" => x * y
case "/" => x / y
}
result :: ops
}
}
results.head
}
}

Easiest way to decide if List contains duplicates?

One way is this
list.distinct.size != list.size
Is there any better way? It would have been nice to have a containsDuplicates method
Assuming "better" means "faster", see the alternative approaches benchmarked in this question, which seems to show some quicker methods (although note that distinct uses a HashSet and is already O(n)). YMMV of course, depending on specific test case, scala version etc. Probably any significant improvement over the "distinct.size" approach would come from an early-out as soon as a duplicate is found, but how much of a speed-up is actually obtained would depend strongly on how common duplicates actually are in your use-case.
If you mean "better" in that you want to write list.containsDuplicates instead of containsDuplicates(list), use an implicit:
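// an implicit class avoids the reflective call that "new { ... }" would need (define it inside an object or in the REPL)
implicit class ContainsDuplicatesOps[T](s: List[T]) {
  def containsDuplicates: Boolean = s.distinct.size != s.size
}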
assert(List(1,2,2,3).containsDuplicates)
assert(!List("a","b","c").containsDuplicates)
You can also write:
list.toSet.size != list.size
But the result will be the same because distinct is already implemented with a Set. In both cases the time complexity should be O(n): you must traverse the list, and insertion into a hash-based Set is effectively O(1).
I think this would stop as soon as a duplicate is found, and it is probably more efficient than distinct.size, since I assume distinct keeps a set as well:
@annotation.tailrec
def containsDups[A](list: List[A], seen: Set[A] = Set[A]()): Boolean =
list match {
case x :: xs => if (seen.contains(x)) true else containsDups(xs, seen + x)
case _ => false
}
containsDups(List(1,1,2,3))
// Boolean = true
containsDups(List(1,2,3))
// Boolean = false
I realize you asked for easy and I don't know that this version is, but finding a duplicate is also finding whether there is an element that has been seen before:
def containsDups[A](list: List[A]): Boolean = {
list.iterator.scanLeft(Set[A]())((set, a) => set + a) // incremental sets
.zip(list.iterator)
.exists{ case (set, a) => set contains a }
}
@annotation.tailrec
def containsDuplicates [T] (s: Seq[T]) : Boolean =
if (s.size < 2) false else
s.tail.contains (s.head) || containsDuplicates (s.tail)
I haven't measured this, but I think it is similar to huynhjl's solution and a bit simpler to understand.
It returns early if a duplicate is found, so I checked the source of Seq.contains to see whether that returns early as well: it does.
In SeqLike, 'contains (e)' is defined as 'exists (_ == e)', and exists is defined in TraversableLike:
def exists (p: A => Boolean): Boolean = {
var result = false
breakable {
for (x <- this)
if (p (x)) { result = true; break }
}
result
}
I'm curious how to speed things up with parallel collections on multiple cores, but I guess early return is a general problem there: another thread will keep running because it doesn't know that the solution has already been found.
Summary:
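I've written a function which returns both List.distinct and a List consisting of each element which appeared more than once, together with the index at which each duplicate appeared. It traverses the input only once, although the accumulator._1.contains lookup it performs for every element is itself linear in the number of distinct elements seen so far.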
Note: This answer is a straight copy of the answer on a related question.
Details:
If you need a bit more information about the duplicates themselves, like I did, I have written a more general function which iterates across a List (as ordering was significant) exactly once and returns a Tuple2 consisting of the original List deduped (all duplicates after the first are removed; i.e. the same as invoking distinct) and a second List showing each duplicate and an Int index at which it occurred within the original List.
Here's the function:
def filterDupes[A](items: List[A]): (List[A], List[(A, Int)]) = {
def recursive(remaining: List[A], index: Int, accumulator: (List[A], List[(A, Int)])): (List[A], List[(A, Int)]) =
if (remaining.isEmpty)
accumulator
else
recursive(
remaining.tail
, index + 1
, if (accumulator._1.contains(remaining.head))
(accumulator._1, (remaining.head, index) :: accumulator._2)
else
(remaining.head :: accumulator._1, accumulator._2)
)
val (distinct, dupes) = recursive(items, 0, (Nil, Nil))
(distinct.reverse, dupes.reverse)
}
And below is an example which might make it a bit more intuitive. Given this List of String values:
val withDupes =
List("a.b", "a.c", "b.a", "b.b", "a.c", "c.a", "a.c", "d.b", "a.b")
...and then performing the following:
val (deduped, dupeAndIndexs) =
filterDupes(withDupes)
...the results are:
deduped: List[String] = List(a.b, a.c, b.a, b.b, c.a, d.b)
dupeAndIndexs: List[(String, Int)] = List((a.c,4), (a.c,6), (a.b,8))
And if you just want the duplicates, you simply map across dupeAndIndexs and invoke distinct:
val dupesOnly =
dupeAndIndexs.map(_._1).distinct
...or all in a single call:
val dupesOnly =
filterDupes(withDupes)._2.map(_._1).distinct
...or if a Set is preferred, skip distinct and invoke toSet...
val dupesOnly2 =
dupeAndIndexs.map(_._1).toSet
...or all in a single call:
val dupesOnly2 =
filterDupes(withDupes)._2.map(_._1).toSet
This is a straight copy of the filterDupes function out of my open source Scala library, ScalaOlio. It's located at org.scalaolio.collection.immutable.List_._.
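If the linear contains lookup on the accumulated list becomes a concern for long inputs, one possible variation (a sketch, not part of ScalaOlio) tracks the elements seen so far in a Set. The name filterDupesWithSet is hypothetical, and it assumes the elements have a sensible hashCode and equals.

```scala
// Hypothetical variant of filterDupes that uses a Set for membership checks.
def filterDupesWithSet[A](items: List[A]): (List[A], List[(A, Int)]) = {
  val start = (Set.empty[A], List.empty[A], List.empty[(A, Int)])
  val (_, distinctRev, dupesRev) =
    items.zipWithIndex.foldLeft(start) {
      case ((seen, distinct, dupes), (item, index)) =>
        if (seen(item)) (seen, distinct, (item, index) :: dupes) // repeat: record it with its index
        else (seen + item, item :: distinct, dupes)              // first occurrence: keep it
    }
  // Both lists were built by prepending, so restore the original order.
  (distinctRev.reverse, dupesRev.reverse)
}

val sample = List("a.b", "a.c", "b.a", "b.b", "a.c", "c.a", "a.c", "d.b", "a.b")
val (dedupedSample, dupesSample) = filterDupesWithSet(sample)
```

For the sample list this produces the same two lists as the example above; the trade-off is the extra Set held during the fold.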
If you're trying to check for duplicates in a test then ScalaTest can be helpful.
import org.scalatest.Inspectors._
import org.scalatest.Matchers._
forEvery(list.distinct) { item =>
withClue(s"value $item, the number of occurences") {
list.count(_ == item) shouldBe 1
}
}
// example:
scala> val list = List(1,2,3,4,3,2)
list: List[Int] = List(1, 2, 3, 4, 3, 2)
scala> forEvery(list.distinct) { item => withClue(s"value $item, the number of occurences") { list.count(_ == item) shouldBe 1 } }
org.scalatest.exceptions.TestFailedException: forEvery failed, because:
at index 1, value 2, the number of occurences 2 was not equal to 1 (<console>:19),
at index 2, value 3, the number of occurences 2 was not equal to 1 (<console>:19)
in List(1, 2, 3, 4)