Scala Set Hashcode - scala

Assume we have three sets of strings in Scala. One has elements A,B,C. Two has elements B,C,D. And Three has elements J,K,I.
My first question is, is there any way that the hashcodes for any two of these sets could be the same?
My second question is, if I add D to One and A to Two to get new Sets One.n and Two.n, are the hashcodes for One.n and Two.n the same?

Question 1) In general yes, entirely possible. A hash code is a 32-bit Int, while a Set can hold arbitrarily many elements, so hash codes cannot be unique (although collisions are rare in practice).
Question 2) Why not try it?
scala> val One = collection.mutable.Set[String]("A", "B", "C")
One: scala.collection.mutable.Set[String] = Set(A, B, C)
scala> One.hashCode
res3: Int = 1491157345
scala> val Two = collection.mutable.Set[String]("B", "C", "D")
Two: scala.collection.mutable.Set[String] = Set(B, D, C)
scala> Two.hashCode
res4: Int = -967442916
scala> One += "D"
res5: One.type = Set(A, B, D, C)
scala> Two += "A"
res6: Two.type = Set(B, D, A, C)
scala> One.hashCode
res7: Int = -232075924
scala> Two.hashCode
res8: Int = -232075924
So, yes they are, as you might expect, since you would expect the == method to be true for these two instances.

Sets which are equal and don't have anything strange inside them (i.e. anything with an unstable hash code, or where the hash code is inconsistent with equals) should have equal hash codes. If this is not true, and the sets are the same type of set, it is a bug and should be reported. If the sets are different types of sets, it may or may not be a bug to have different hash codes (but in any case it should agree with equals). I am not aware of any cases where different set implementations are not equal (e.g. even mutable BitSet agrees with immutable Set), however.
So:
1. hashCode is never guaranteed to be unique, but it should be well-distributed in that the probability of collisions should be low
2. hashCode of sets should always be consistent with equals (as long as everything you put in the set has hashCode consistent with equals) in that equal sets have equal hash codes. (The converse is not true because of point (1).)
3. Sets care only about the identity of the contents, not the order of addition to the set (that's the point of having a set instead of, say, a List)
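The points above can be checked with a small sketch. It assumes a JVM, where "Aa" and "BB" are a well-known String.hashCode collision, and it assumes the standard library derives a set's hash code only from the hash codes of its elements (true of the implementations I have looked at, but not something the API promises). All value names are made up for illustration.

```scala
// Equal sets built in different insertion orders.
val one = Set("A", "B", "C", "D")
val two = Set("D", "C", "B", "A")
val equalSetsAgree = one == two && one.hashCode == two.hashCode

// "Aa" and "BB" have the same String.hashCode on the JVM (both 2112).
val stringsCollide = "Aa".hashCode == "BB".hashCode

// Two unequal sets whose elements collide; under the assumption above,
// their hash codes should collide as well.
val x = Set("Aa")
val y = Set("BB")
val distinctButSameHash = x != y && x.hashCode == y.hashCode

println(s"equal sets agree: $equalSetsAgree")
println(s"strings collide: $stringsCollide")
println(s"distinct sets, same hash: $distinctButSameHash")
```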

Related

What's the difference between a set and a mapping to boolean?

In Scala, I sometimes use a Map[A, Boolean], sometimes a Set[A]. There's really not much difference between these two concepts, and an implementation might use the same data structure to implement them. So why bother to have Sets? As I said, this question occurred to me in connection with Scala, but it would arise in any programming language whose library implements a Set abstraction.
There are some specific convenience methods defined on Set (intersect, diff and more). Not a big deal, but often useful.
My first thoughts are two:
efficiency: if you only want to signal presence, why bother with a flag that can be either true or false?
meaning: a set is about the existence of something, a map is about a correlation between a key and value (generally speaking); these two ideas are quite different and should be used accordingly to simplify reading and understanding the code
Furthermore, the semantics of application change:
val map: Map[String, Boolean] = Map("hello" -> true, "world" -> false)
val set: Set[String] = Set("hello")
map("hello") // true
map("world") // false
map("none!") // throws NoSuchElementException
set("hello") // true
set("world") // false
set("none!") // false
Without having to actually store an extra pair to indicate absence (not to mention the boolean that actually indicates such absence).
Sets are very good to indicate the presence of something, which makes them very good for filtering:
val map = (0 until 10).map(_.toString).zipWithIndex.toMap
val set = (3 to 5).map(_.toString).toSet
map.filterKeys(set) // keeps pairs 3rd through 5th
Maps, in terms of processing collections, are good to indicate relationships, which makes them very good for collecting:
set.collect(map) // more or less equivalent as above, but only values are returned
You can read more about using collections as functions to process other collections here.
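As a rough, self-contained sketch of the "collections as functions" idea: a Set[A] can stand in for an A => Boolean predicate, and a Map[K, V] can stand in for a PartialFunction[K, V]. The names prices and wanted are invented for this example.

```scala
val prices: Map[String, Int] = Map("apple" -> 3, "pear" -> 5, "plum" -> 2)
val wanted: Set[String] = Set("pear", "plum", "kiwi")

// A Set[String] is a String => Boolean, so it can be passed as a predicate.
val wantedNames: List[String] = List("apple", "pear", "kiwi").filter(wanted)

// Keep only the map entries whose key is in the set.
val wantedPrices: Map[String, Int] =
  prices.filter { case (name, _) => wanted(name) }

// A Map is a partial function: collect keeps the values of the keys it defines.
val found: Set[Int] = wanted.collect(prices)

// Map#get avoids the exception that map("kiwi") would throw.
val missing: Option[Int] = prices.get("kiwi")
```

Filtering with an explicit pattern match is used here instead of filterKeys because filterKeys returns a lazy view in some Scala versions.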
There are several reasons:
1) It is easier to think about and work with a data structure that holds only single elements, as opposed to one that maps each element to a dummy true.
For example, it is easier to convert a list to a Set than to a Map:
scala> val myList = List(1,2,3,2,1)
myList: List[Int] = List(1, 2, 3, 2, 1)
scala> myList.toSet
res9: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
scala> myList.map(x => (x, true)).toMap
res1: scala.collection.immutable.Map[Int,Boolean] = Map(1 -> true, 2 -> true, 3 -> true)
2) As Kombajn zbożowy pointed out, Sets have additional helper methods, union, intersect, diff, subsetOf.
3) Since a Set does not map each element to a dummy value, its size in memory is smaller; this is more noticeable for small keys.
Having said the above, not all languages have Set data structure, Go for example does not.
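To make point 2 concrete, here is a short sketch of the helper methods mentioned above, next to what the same intersection looks like with a Map[Int, Boolean]. The values a, b, ma and mb are illustrative only.

```scala
val a = Set(1, 2, 3, 4)
val b = Set(3, 4, 5)

val unionAB     = a union b          // elements in either set
val intersectAB = a intersect b      // elements in both sets
val diffAB      = a diff b           // elements of a that are not in b
val isSubset    = Set(3, 4) subsetOf a

// The same intersection expressed with Map[Int, Boolean] needs more ceremony.
val ma: Map[Int, Boolean] = a.map(_ -> true).toMap
val mb: Map[Int, Boolean] = b.map(_ -> true).toMap
val bothViaMaps: Set[Int] = ma.keySet.filter(k => mb.getOrElse(k, false))
```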

How do I perform set theory minus operation between two lists in Scala?

I have the following case class
case class Cart(userId: Int, ProductId :Int, SellerId:Int, Qty: Int)
I have the following lists :
val mergedCart :List[Cart]= List(Cart(900,1,1,2),Cart(900,2,2,2),Cart(901,3,3,2),Cart(901,2,2,2),Cart(901,1,1,2),Cart(900,4,2,1))
val userCart:List[Cart] = List(Cart(900,1,1,2),Cart(900,2,2,2),Cart(900,4,2,1))
val guestCart:List[Cart] = List(Cart(901,3,3,2),Cart(901,2,2,2),Cart(901,1,1,2))
val commonCart = List(Cart(900,2,2,4), Cart(900,1,1,4))
My requirement is that I have to get the following list as the output:
List(Cart(900,2,2,4),Cart(900,1,1,4),Cart(901,3,3,2),Cart(900,4,2,1))
The final list should have the common objects from userCart and guestCart based on the ProductId,SellerId combination and the quantity of both the objects get added. Then, the other objects present in userCart and guestCart which do not match the common objects should also be present in the final list in the output.
I am new to Scala and I am not able to solve this, kindly help me with this code.
If you don't care about ordering in resulting list (so basically your result is a Set) , it's as simple as that:
def sum(a: Cart, b: Cart) = {
//require(a.userId == b.userId)
a.copy(Qty = a.Qty + b.Qty)
}
(userCart ++ guestCart)
.groupBy(x => x.ProductId -> x.SellerId)
.mapValues(_.reduce(sum _))
.values
.toList //toSet is more appropriate here
Results:
List(Cart(900,4,2,1), Cart(900,2,2,4), Cart(900,1,1,4), Cart(901,3,3,2))
(!) Be aware that I just took the first userId in case of a collision (see the sum function). However, this preserves the priority of users over guests, if that's what is implied.
Being represented as a Set, this result equals to your requirement:
scala> val mRes = List(Cart(900,4,2,1), Cart(900,2,2,4), Cart(900,1,1,4), Cart(901,3,3,2))
mRes: List[Cart] = List(Cart(900,4,2,1), Cart(900,2,2,4), Cart(900,1,1,4), Cart(901,3,3,2))
scala> val req = List(Cart(900,2,2,4),Cart(900,1,1,4),Cart(901,3,3,2),Cart(900,4,2,1))
req: List[Cart] = List(Cart(900,2,2,4), Cart(900,1,1,4), Cart(901,3,3,2), Cart(900,4,2,1))
scala> mRes.toSet == req.toSet
res17: Boolean = true
Explanations:
++ concatenates two lists
groupBy groups values by a key function (like x.ProductId -> x.SellerId, which is equivalent to the tuple (x.ProductId, x.SellerId) in your case). It preserves order inside each group, but the groups themselves aren't ordered - that's why the order of the resulting list is undefined. The operator returns Map[Key, List[Value]], in your case Map[(Int, Int), List[Cart]]
mapValues iterates over lists with carts
reduce inside mapValues reduces List with carts by summing carts using sum function
I didn't have to reattach objects with unique (x.ProductId, x.SellerId) as they were represented just as lists with one element, so reduce function didn't touch them - it just returned first (and only) element.
a.copy(Qty = ...) makes a copy of a with a modified Qty field. In our case I take the left element as a template, so elements that come first in (userCart ++ guestCart) have higher priority when the userId is chosen.
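The steps explained above can be put together as a self-contained sketch. It is a variation, not the exact code from this answer: the case class is renamed to a hypothetical CartItem with lower-case field names, the result is kept as a Set because the group order is undefined, and .values.map is used in place of mapValues, which is deprecated in newer Scala versions.

```scala
final case class CartItem(userId: Int, productId: Int, sellerId: Int, qty: Int)

// Keep the left item as the template and add up the quantities.
def merge(a: CartItem, b: CartItem): CartItem = a.copy(qty = a.qty + b.qty)

val userCart  = List(CartItem(900, 1, 1, 2), CartItem(900, 2, 2, 2), CartItem(900, 4, 2, 1))
val guestCart = List(CartItem(901, 3, 3, 2), CartItem(901, 2, 2, 2), CartItem(901, 1, 1, 2))

val merged: Set[CartItem] =
  (userCart ++ guestCart)                                  // user items come first
    .groupBy(item => (item.productId, item.sellerId))      // one group per product/seller pair
    .values
    .map(_.reduce(merge))                                  // single-element groups pass through unchanged
    .toSet
```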
Answering the headline's question about subtracting two sets:
scala> Set(1,2,3,4) - 4
res16: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
scala> Set(1,2,3,4) -- Set(3,4)
res15: scala.collection.immutable.Set[Int] = Set(1, 2)
If elements of sets are instances of case classes (given that hashCode/equals methods weren't overridden) - it would compare all fields in order to check equality between two elements.
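A small sketch of that behaviour, using a hypothetical Item case class: an element is removed only when every field matches. The last line shows one way to get the same effect on a List, by using a Set as the predicate for filterNot.

```scala
final case class Item(productId: Int, sellerId: Int, qty: Int)

val all = Set(Item(1, 1, 2), Item(2, 2, 2), Item(3, 3, 2))

// Item(3, 3, 9) differs from Item(3, 3, 2) in qty, so nothing matches it.
val remaining = all -- Set(Item(2, 2, 2), Item(3, 3, 9))

// "List minus set": keep the list's order, drop whatever the set contains.
val listMinus = List(Item(1, 1, 2), Item(2, 2, 2)).filterNot(Set(Item(2, 2, 2)))
```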
There is a theoretical connection between the groupBy solution and set theory. First, you can easily notice that my solution is representable with SQL's GROUP BY + AGGREGATE (groupBy with a reduce-catamorphism in Scala). SQL is mostly based on relational algebra, which in turn is partially based on set theory, so here it is.
P.S. Field/value/variable names in Scala should always start with a lowercase letter by convention. A leading capital letter means a constant.

Scala Sets Immutability

scala> var immSet = Set("A", "B")
immSet: scala.collection.immutable.Set[String] = Set(A, B)
scala> immSet += "C"
scala> println(immSet)
Set(A, B, C)
I wonder, what is the advantage I am getting by allowing var to be used with an immutable Set? Am I not losing immutability in this case?
What is the advantage I am getting by allowing var to be used with an immutable Set?
I would say this can mainly cause confusion. The fact that you're using a var allows you to overwrite the variable, but the Set itself doesn't change; the operation allocates a new set with the additional value "C". But since you're using a var, the previous Set is now no longer referenced, unless you've referenced it somewhere else higher up the stack:
scala> var firstSet = Set("A", "B")
firstSet: scala.collection.immutable.Set[String] = Set(A, B)
scala> var secondSet = firstSet
secondSet: scala.collection.immutable.Set[String] = Set(A, B)
scala> firstSet += "C"
scala> firstSet
res6: scala.collection.immutable.Set[String] = Set(A, B, C)
scala> secondSet
res7: scala.collection.immutable.Set[String] = Set(A, B)
Because secondSet still points to the Set originally assigned to firstSet, we don't see the update reflected there. I think using a val adds clarity that the underlying Set is immutable as well as the variable pointing to it. When you use a val, the compiler will yell if you attempt to reassign, forcing you to realize that a new collection is being created.
Regarding immutability, we need to divide this into two. There is the immutability of the Set, and there is the immutability of the variable pointing to that Set, these are two different things. You lose the latter with this approach.
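A minimal sketch of the two kinds of immutability, with made-up names: a val holding an immutable Set forces every "update" into a new binding, while a val holding a mutable Set keeps the binding fixed but lets every alias observe the change.

```scala
import scala.collection.mutable

// Immutable set behind a val: adding an element yields a new set.
val base = Set("A", "B")
val extended = base + "C"        // base itself is untouched

// Mutable set behind a val: the binding never changes, the contents do.
val shared = mutable.Set("A", "B")
val alias = shared               // same object, second name
shared += "C"                    // visible through alias as well

println(s"base = $base, extended = $extended, alias = $alias")
```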
Read @YuvalItzchakov's answer if you want to understand the basic difference between an immutable collection defined as a var and a mutable collection defined as a val. I'll concentrate on practical aspects of both approaches.
First of all, both approaches imply mutability. If you want to stay "pure functional" you should avoid either of them.
Now, if you want mutable collection, what is the best way? Short answer, it depends.
Performance. Mutable collections are usually faster than their immutable counterparts. It means, that if your mutable variable is somehow contained (for example, doesn't escape private method), it may be better to use val c = MutableCollection(). Most of the Scala's methods in Collections API internally use mutable collections.
Thread safety. Value of immutable collection is always thread safe. You can send it to another thread and don't think about visibility and concurrent changes.
var a = ImmutableCol()
otherThreadProcessor.process(a)
a += 1 // otherThread will still have previous value
On the other hand, if you want to modify collection from multiple threads, better use Java's concurrent collection API.
Code clarity. Imagine you have some function that takes collection as an argument and then modifies it in some ways. If collection is mutable, then, after function returns, collection, passed as an argument, will stay modified.
def recImmutable(a:Set[Int]): Unit = {
var b = a
b += 4
}
val a = Set(2,3)
recImmutable(a)
println(a)
// prints Set(2, 3)
import scala.collection.mutable
def recMutable(a:mutable.Set[Int]): Unit = {
var b = a
b += 4
}
val b = mutable.Set(2,3)
recMutable(b)
println(b)
// prints Set(2, 3, 4)

Set, contains that returns match rather than boolean

I went to the effort to create myself a class whereby I have an equals method that defines an equivalence relation for which the partition sets don't have size == 1.
The class takes an absolute path and a root (also a path); these "relative paths" are equivalent if their paths relative to their roots are the same. I have two sets of these, and I have ensured that all the elements in each individual set have the same root. So, according to my logic, there should be zero or one element(s) in the second set that == any given element in the first set.
But now I realize that I don't have the nice O(1) lookup I wanted as the Set.contains() method only returns a boolean, not the element it found!! Is there a method or collection I'm not aware of that will give me the O(1) behaviour I'm looking for (i.e. an O(1) lookup on equals, returning the equal element.)
Redefining equality to mean something else than equality of all elements is almost always a bad idea. Scala sets assume that if two things are equal, they are interchangeable. This is not the case in your approach.
I think you will have to use a Map[T, T] instead of a Set[T] to do what you want.
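A rough sketch of the Map[T, T] idea, assuming a hypothetical Key class whose equality deliberately ignores one field. Each element is stored as both key and value, so looking up an equal probe hands back the stored instance; for hash-based maps that lookup is expected to take effectively constant time.

```scala
// Hypothetical element type: equality and hash code look at `id` only.
final class Key(val id: Int, val tag: String) {
  override def equals(other: Any): Boolean = other match {
    case that: Key => id == that.id
    case _         => false
  }
  override def hashCode: Int = id.hashCode
}

val stored = List(new Key(3, "first"), new Key(8, "second"), new Key(7, "third"))

// Index every element under itself.
val index: Map[Key, Key] = stored.map(k => k -> k).toMap

// Returns the stored element that equals the probe, if there is one.
def lookup(probe: Key): Option[Key] = index.get(probe)

val hit  = lookup(new Key(8, "probe")).map(_.tag)
val miss = lookup(new Key(1, "probe"))
```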
This must surely have been asked before, but if so, I'm not able to find it.
There is a hole in the API here, as far as I can see.
Perhaps somebody can do better, but the best workaround I can find is:
def lookup[T](s: Set[T], x: T): Option[T] =
s.intersect(Set(x)).headOption
Let's take it for a spin. First define a case class that carries extra information that doesn't affect equality:
scala> case class C(x: Int)(val y: Int)
defined class C
scala> C(3)(5) == C(3)(6)
res4: Boolean = true
Now let's try a test case:
scala> val s = Set(C(3)(5), C(8)(2), C(7)(6))
s: scala.collection.immutable.Set[C] = Set(C(3), C(8), C(7))
scala> lookup(s, C(8)(99)).map(_.y)
res6: Option[Int] = Some(2)
Looks good.
As for whether it's O(1), it appears to me from perusing https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/immutable/HashSet.scala that it is.

How can I idiomatically "remove" a single element from a list in Scala and close the gap?

Lists are immutable in Scala, so I'm trying to figure out how I can "remove" - really, create a new collection - that element and then close the gap created in the list. This sounds to me like it would be a great place to use map, but I don't know how to get started in this instance.
Courses is a list of strings. I need this loop because I actually have several lists that I will need to remove the element at that index from (I'm using multiple lists to store data associated across lists, and I'm doing this by simply ensuring that the indices will always correspond across lists).
for (i <- 0 until courses.length) {
  if (input == courses(i)) {
    //I need a map call on each list here to remove that element
    //this element is not guaranteed to be at the front or the end of the list
  }
}
Let me add some detail to the problem. I have four lists that are associated with each other by index; one list stores the course names, one stores the time the class begins in a simple int format (ie 130), one stores either "am" or "pm", and one stores the days of the classes by int (so "MWF" evals to 1, "TR" evals to 2, etc). I don't know if having multiple lists is the best or the "right" way to solve this problem, but these are all the tools I have (first-year comp sci student that hasn't programmed seriously since I was 16). I'm writing a function to remove the corresponding element from each list, and all I know is that 1) the indices correspond and 2) the user inputs the course name. How can I remove the corresponding element from each list using filterNot? I don't think I know enough about each list to use higher order functions on them.
This is the use case of filter:
scala> List(1,2,3,4,5)
res0: List[Int] = List(1, 2, 3, 4, 5)
scala> res0.filter(_ != 2)
res1: List[Int] = List(1, 3, 4, 5)
You want to use map when you are transforming all the elements of a list.
To answer your question directly, I think you're looking for patch, for instance to remove element with index 2 ("c"):
List("a","b","c","d").patch(2, Nil, 1) // List(a, b, d)
where Nil is what we're replacing it with, and 1 is the number of elements to replace.
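Applied to lists that correspond by index, this could look like the sketch below: find the index of the course once, then patch every list at that index. Only two of the four lists are shown, and the sample values are made up.

```scala
val courses = List("AI", "Compilers", "Networks")
val times   = List(130, 900, 1100)
val input   = "Compilers"

// indexOf returns -1 when the course is not present.
val i = courses.indexOf(input)

// Patch only when something was found; otherwise leave both lists alone.
val (newCourses, newTimes) =
  if (i >= 0) (courses.patch(i, Nil, 1), times.patch(i, Nil, 1))
  else (courses, times)
```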
But, if you do this:
I have four lists that are associated with each other by index; one
list stores the course names, one stores the time the class begins in
a simple int format (ie 130), one stores either "am" or "pm", and one
stores the days of the classes by int
you're going to have a bad time. I suggest you use a case class:
case class Course(name: String, time: Int, ampm: String, day: Int)
and then store them in a Set[Course]. (Storing time and days as Ints isn't a great idea either - have a look at java.util.Calendar instead.)
First a few sidenotes:
List is not an index-based structure. All index-oriented operations on it take linear time. For index-oriented algorithms Vector is a much better candidate. In fact, if your algorithm requires indexes, it's a sure sign that you're not really exploiting Scala's functional capabilities.
map serves for transforming a collection of items "A" to the same collection of items "B" using a passed in transformer function from a single "A" to single "B". It cannot change the number of resulting elements. Probably you've confused map with fold or reduce.
To answer on your updated question
Okay, here's a functional solution, which works effectively on lists:
val (resultCourses, resultTimeList, resultAmOrPmList, resultDateList)
= (courses, timeList, amOrPmList, dateList)
.zipped
.filterNot(_._1 == input)
.unzip4
But there's a catch. I actually came to be quite astonished to find out that functions used in this solution, which are so basic for functional languages, were not present in the standard Scala library. Scala has them for 2 and 3-ary tuples, but not the others.
To solve that you'll need to have the following implicit extensions imported.
implicit class Tuple4Zipped
[ A, B, C, D ]
( val t : (Iterable[A], Iterable[B], Iterable[C], Iterable[D]) )
extends AnyVal
{
def zipped
= t._1.toStream
.zip(t._2).zip(t._3).zip(t._4)
.map{ case (((a, b), c), d) => (a, b, c, d) }
}
implicit class IterableUnzip4
[ A, B, C, D ]
( val ts : Iterable[(A, B, C, D)] )
extends AnyVal
{
def unzip4
= ts.foldRight((List[A](), List[B](), List[C](), List[D]()))(
(a, z) => (a._1 +: z._1, a._2 +: z._2, a._3 +: z._3, a._4 +: z._4)
)
}
This implementation requires Scala 2.10 as it utilizes the new effective Value Classes feature for pimping the existing types.
I have actually included these in a small extensions library called SExt. Once your project depends on it, you can get them by simply adding an import sext._ statement.
Of course, if you want you can just compose these functions directly into the solution:
val (resultCourses, resultTimeList, resultAmOrPmList, resultDateList)
= courses.toStream
.zip(timeList).zip(amOrPmList).zip(dateList)
.map{ case (((a, b), c), d) => (a, b, c, d) }
.filterNot(_._1 == input)
// element types follow the four lists described in the question
.foldRight((List[String](), List[Int](), List[String](), List[Int]()))(
(a, z) => (a._1 +: z._1, a._2 +: z._2, a._3 +: z._3, a._4 +: z._4)
)
Removing and filtering List elements
In Scala you can filter the list to remove elements.
scala> val courses = List("Artificial Intelligence", "Programming Languages", "Compilers", "Networks", "Databases")
courses: List[java.lang.String] = List(Artificial Intelligence, Programming Languages, Compilers, Networks, Databases)
Let's remove a couple of classes:
courses.filterNot(p => p == "Compilers" || p == "Databases")
You can also use remove but it's deprecated in favor of filter or filterNot.
If you want to remove by an index you can associate each element in the list with an ordered index using zipWithIndex. So, courses.zipWithIndex becomes:
List[(java.lang.String, Int)] = List((Artificial Intelligence,0), (Programming Languages,1), (Compilers,2), (Networks,3), (Databases,4))
To remove the second element from this you can refer to the index in the Tuple with courses.zipWithIndex.filterNot(_._2 == 1) which gives the list:
res8: List[(java.lang.String, Int)] = List((Artificial Intelligence,0), (Compilers,2), (Networks,3), (Databases,4))
Lastly, another tool is to use indexWhere to find the index of an arbitrary element.
courses.indexWhere(_ contains "Languages")
res9: Int = 1
Re your update
I'm writing a function to remove the corresponding element from each
lists, and all I know is that 1) the indices correspond and 2) the
user inputs the course name. How can I remove the corresponding
element from each list using filterNot?
Similar to Nikita's update you have to "merge" the elements of each list. So courses, meridiems, days, and times need to be put into a Tuple or class to hold the related elements. Then you can filter on an element of the Tuple or a field of the class.
Combining corresponding elements into a Tuple looks as follows with this sample data:
val courses = List("Artificial Intelligence", "Programming Languages", "Compilers", "Networks", "Databases")
val meridiems = List("am", "pm", "am", "pm", "am")
val times = List("100", "1200", "0100", "0900", "0800")
val days = List("MWF", "TTH", "MW", "MWF", "MTWTHF")
Combine them with zip:
val zipped = courses zip days zip times zip meridiems
zipped: List[(((java.lang.String, java.lang.String), java.lang.String), java.lang.String)] = List((((Artificial Intelligence,MWF),100),am), (((Programming Languages,TTH),1200),pm), (((Compilers,MW),0100),am), (((Networks,MWF),0900),pm), (((Databases,MTWTHF),0800),am))
This abomination flattens the nested Tuples to a Tuple. There are better ways.
zipped.map(x => (x._1._1._1, x._1._1._2, x._1._2, x._2)).toList
A nice list of tuples to work with.
List[(java.lang.String, java.lang.String, java.lang.String, java.lang.String)] = List((Artificial Intelligence,MWF,100,am), (Programming Languages,TTH,1200,pm), (Compilers,MW,0100,am), (Networks,MWF,0900,pm), (Databases,MTWTHF,0800,am))
Finally we can filter based on course name using filterNot. e.g. filterNot(_._1 == "Networks")
List[(java.lang.String, java.lang.String, java.lang.String, java.lang.String)] = List((Artificial Intelligence,MWF,100,am), (Programming Languages,TTH,1200,pm), (Compilers,MW,0100,am), (Databases,MTWTHF,0800,am))
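One of the "better ways" to flatten the nested tuples is to pattern match on their shape instead of chaining ._1 accessors. The sketch below assumes the same kind of sample data (all strings) and then splits the filtered rows back into separate lists with the standard unzip3, dropping the fourth column for brevity.

```scala
val courses   = List("Artificial Intelligence", "Compilers", "Networks")
val days      = List("MWF", "MW", "MWF")
val times     = List("100", "0100", "0900")
val meridiems = List("am", "am", "pm")

// Name each component of the nested tuple, then rebuild it flat.
val rows: List[(String, String, String, String)] =
  (courses zip days zip times zip meridiems).map {
    case (((course, day), time), meridiem) => (course, day, time, meridiem)
  }

// Remove every row for the given course name.
val kept = rows.filterNot(_._1 == "Networks")

// Back to parallel lists (three of the four columns).
val (keptCourses, keptDays, keptTimes) =
  kept.map { case (course, day, time, _) => (course, day, time) }.unzip3
```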
The answer I am about to give might be overstepping what you have been taught so far in your course, so if that is the case I apologise.
Firstly, you are right to question whether you should have four lists - fundamentally, it sounds like what you need is an object which represents a course:
/**
 * Represents a course.
 * @param name the human-readable descriptor for the course
 * @param timeOfDay the time of day as an integer equivalent to
 * 12 hour time, i.e. 1130
 * @param meridiem the half of the day that the time corresponds
 * to: either "am" or "pm"
 * @param days an encoding of the days of the week the classes run.
 */
case class Course(name : String, timeOfDay : Int, meridiem : String, days : Int)
with which you may define an individual course
val cs101 =
Course("CS101 - Introduction to Object-Functional Programming",
1000, "am", 1)
There are better ways to define this type (better representations of 12-hour time, a clearer way to represent the days of the week, etc), but I won't deviate from your original problem statement.
Given this, you would have a single list of courses:
val courses = List(cs101, cs402, bio101, phil101)
And if you wanted to find and remove all courses that matched a given name, you would write:
val courseToRemove = "PHIL101 - Philosophy of Beard Ownership"
courses.filterNot(course => course.name == courseToRemove)
Equivalently, using the underscore syntactic sugar in Scala for function literals:
courses.filterNot(_.name == courseToRemove)
If there was the risk that more than one course might have the same name (or that you are filtering based on some partial criteria using a regular expression or prefix match) and that you only want to remove the first occurrence, then you could define your own function to do that:
def removeFirst(courses : List[Course], courseToRemove : String) : List[Course] =
  courses match {
    case Nil => Nil
    case head :: tail if head.name == courseToRemove => tail
    case head :: tail => head :: removeFirst(tail, courseToRemove)
  }
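As an alternative sketch, the same "remove only the first match" behaviour can be written without explicit recursion by splitting the list at the first matching course with span. The function name removeFirstByName and the sample courses are invented here; the Course class is repeated so the snippet stands on its own.

```scala
final case class Course(name: String, timeOfDay: Int, meridiem: String, days: Int)

def removeFirstByName(courses: List[Course], name: String): List[Course] = {
  // `before` holds everything ahead of the first match;
  // `fromMatch` starts with the match, or is empty if there is none.
  val (before, fromMatch) = courses.span(_.name != name)
  before ++ fromMatch.drop(1)
}

val sample = List(
  Course("CS101", 1000, "am", 1),
  Course("PHIL101", 200, "pm", 2),
  Course("PHIL101", 900, "am", 1)
)

val afterRemoval = removeFirstByName(sample, "PHIL101")
val unchanged    = removeFirstByName(sample, "BIO101")
```

Because span stops at the first match, the later PHIL101 entry is kept, and a name that does not occur leaves the list as it was.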
Use ListBuffer, which is a mutable list similar to a Java list:
val l = scala.collection.mutable.ListBuffer("a","b" ,"c")
print(l) //ListBuffer(a, b, c)
l.remove(0)
print(l) //ListBuffer(b, c)
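For completeness, a short sketch of removing by value as well as by index, and of converting back to an immutable List when the mutation is done. The buffer contents are made up.

```scala
import scala.collection.mutable.ListBuffer

val buf = ListBuffer("a", "b", "c", "b")

buf -= "b"                   // removes only the first "b"
val removed = buf.remove(0)  // removes by index and returns the removed element
val asList = buf.toList      // immutable snapshot of what is left
```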