Scala efficient set inclusion detection

Scala efficient set inclusion detection - scala

Let a collection of tuples where the first item is a set, for instance
val xs = Seq(
((1 to 5).toSet ++ Set(9), "apple"),
((15 to 17).toSet, "pear"),
((21 to 30).toSet, "grape"))
Given a value x:Int, how to efficiently identify the second item ? (The real use case includes thousands of sets.)
For val x = 22 the result would be Some("grape"), for val x = 19 the result would be None.
Note Values in each set are not necessarily consecutive.
Note Sets do not overlap (any sets intersection proves empty).

Depends on your use case, but given you're concerned with efficiency, I assume you're going to do a lot of lookups.
I also assume you use one xs, and lookup in that a lot of times.
Preprocess xs into a map of Int->String
val xsMap = (xs flatMap { case (s, v) => s.map((_,v))}).toMap[Int, String]
Then it's trivial (and O(1)) to look up elements
xsMap.get(22) //> res0: Option[String] = Some(grape)
xsMap.get(19) //> res1: Option[String] = None

What about:
s.find(_._1.contains(11)).map(_._2)

Related

Getting the largest integer value from many lists

I want to get the maximum integer value of a bunch of lists.
How can I do this? Keep in mind, some of the lists maybe empty.
I tried something but I was getting:
java.lang.UnsupportedOperationException: empty.max
So I just have a bunch of lists like:
val l1 = List.empty
val l2 = List(1,2,3)
val l3 = List(4,5,6)
val l4 = List(10)
I am doing this currently:
(l1 ++ l2 ++ l3).max

The max may not exist if all the lists are empty, so we can model the result as an Option[Int].
Here's a simple way of doing it:
val max: Option[Int] = List(l1, l2, l3, l4).flatten match {
case Nil => None
case list => Some(list.max)
}
Performing an operation on a List if not empty is a common use case, so there's an ad-hoc combinator that you can use alternatively, reduceOption, as suggested by Jean Logeart's answer:
If you're into one-liners, you can do:
val max: Option[Int] = List(l1, l2, l3, l4).flatten.reduceOption(_ max _)
although I would prefer the first (more verbose) solution, as I personally find it easier to read.
If instead you want to have a default result, you can fold over the flattened List starting with your default:
val max: Int = List(l1, l2, l3, l4).flatten.foldLeft(0)(_ max _) // 0 or any default
or alternatively, just prepend a 0 to your original solution
val max = (0 :: l1 ++ l2 ++ l3).max

If all the lists can be empty:
val max: Option[Int] = Seq(l1, l2, l3, l4).flatten.reduceOption(_ max _)

Pretty much all of the other answers are using flatten or flatMap to create an intermediate list. If all of your lists are quite large, that's needless memory overhead. My solution uses iterators to avoid the extra allocation in the middle.
val list = List(l1, l2, l3, l4)
val max = list.iterator.flatMap(_.iterator).reduceOption(_ max _)
As pointed out in a comment, the .flatMap(_.iterator) can actually be replaced by a flatten. Since it's being called on an iterator, the result is another iterator, rather than a complete list.

If you are running into an exception where ALL of the lists are empty, then this will solve that:
(0 :: l1 ++ l2 ++ l3).max
Assuming you just want to default to 0 if they are all empty.

Here is a way you can use Options and a try/catch to find the max:
scala> val l = List(List.empty, List(1,2,3), List(4,5,6), List(10))
l: List[List[Int]] = List(List(), List(1, 2, 3), List(4, 5, 6), List(10))
scala> l.flatMap(x => try{ Some(x.max) } catch {case _ => None}).max
res0: Int = 10
In light of the comments below: don't use exceptions for control flow. I would recommend using Gabriele Petronella's solution.

How to explain that "Set(someList : _*)" results the same as "Set(someList).flatten"

I found a piece of code I wrote some time ago using _* to create a flattened set from a list of objects.
The real line of code is a bit more complex and as I didn't remember exactly why was that there, took a bit of experimentation to understand the effect, which is actually very simple as seen in the following REPL session:
scala> val someList = List("a","a","b")
someList: List[java.lang.String] = List(a, a, b)
scala> val x = Set(someList: _*)
x: scala.collection.immutable.Set[java.lang.String] = Set(a, b)
scala> val y = Set(someList).flatten
y: scala.collection.immutable.Set[java.lang.String] = Set(a, b)
scala> x == y
res0: Boolean = true
Just as a reference of what happens without flatten:
scala> val z = Set(someList)
z: scala.collection.immutable.Set[List[java.lang.String]] = Set(List(a, a, b))
As I can't remember where did I get that idiom from I'd like to hear about what is actually happening there and if there is any consequence in going for one way or the other (besides the readability impact)
P.S.: Maybe as an effect of the overuse of underscore in Scala language (IMHO), it is kind of difficult to find documentation about some of its use cases, specially if it comes together with a symbol commonly used as a wildcard in most search engines.

_* is for expand this collection as if it was written here literally, so
val x = Set(Seq(1,2,3,4): _*)
is the same as
val x = Set(1,2,3,4)
Whereas, Set(someList) treats someList as a single argument.
To lookup funky symbols, you could use symbolhound

Scala lazy elements in iterator

Does anyone know how to create a lazy iterator in scala?
For example, I want to iterate through instantiating each element. After passing, I want the instance to die / be removed from memory.
If I declare an iterator like so:
val xs = Iterator(
(0 to 10000).toArray,
(0 to 10).toArray,
(0 to 10000000000).toArray)
It creates the arrays when xs is declared. This can be proven like so:
def f(name: String) = {
val x = (0 to 10000).toArray
println("f: " + name)
x
}
val xs = Iterator(f("1"),f("2"),f("3"))
which prints:
scala> val xs = Iterator(f("1"),f("2"),f("3"))
f: 1
f: 2
f: 3
xs: Iterator[Array[Int]] = non-empty iterator
Anyone have any ideas?
Streams are not suitable because elements remain in memory.
Note: I am using an Array as an example, but I want it to work with any type.

Scala collections have a view method which produces a lazy equivalent of the collection. So instead of (0 to 10000).toArray, use (0 to 10000).view. This way, there will be no array created in the memory. See also https://stackoverflow.com/a/6996166/90874, https://stackoverflow.com/a/4799832/90874, https://stackoverflow.com/a/4511365/90874 etc.

Use one of Iterator factory methods which accepts call-by-name parameter.
For your first example you can do one of this:
val xs1 = Iterator.fill(3)((0 to 10000).toArray)
val xs2 = Iterator.tabulate(3)(_ => (0 to 10000).toArray)
val xs3 = Iterator.continually((0 to 10000).toArray).take(3)
Arrays won't be allocated until you need them.
In case you need different expressions for each element, you can create separate iterators and concatenate them:
val iter = Iterator.fill(1)(f("1")) ++
Iterator.fill(1)(f("2")) ++
Iterator.fill(1)(f("3"))

Extract elements from one list that aren't in another

Simply, I have two lists and I need to extract the new elements added to one of them.
I have the following
val x = List(1,2,3)
val y = List(1,2,4)
val existing :List[Int]= x.map(xInstance => {
if (!y.exists(yInstance =>
yInstance == xInstance))
xInstance
})
Result :existing: List[AnyVal] = List((), (), 3)
I need to remove all other elements except the numbers with the minimum cost.

Pick a suitable data structure, and life becomes a lot easier.
scala> x.toSet -- y
res1: scala.collection.immutable.Set[Int] = Set(3)
Also beware that:
if (condition) expr1
Is shorthand for:
if (condition) expr1 else ()
Using the result of this, which will usually have the static type Any or AnyVal is almost always an error. It's only appropriate for side-effects:
if (condition) buffer += 1
if (condition) sys.error("boom!")

retronym's solution is okay IF you don't have repeated elements that and you don't care about the order. However you don't indicate that this is so.
Hence it's probably going to be most efficient to convert y to a set (not x). We'll only need to traverse the list once and will have fast O(log(n)) access to the set.
All you need is
x filterNot y.toSet
// res1: List[Int] = List(3)
edit:
also, there's a built-in method that is even easier:
x diff y
(I had a look at the implementation; it looks pretty efficient, using a HashMap to count ocurrences.)

The easy way is to use filter instead so there's nothing to remove;
val existing :List[Int] =
x.filter(xInstance => !y.exists(yInstance => yInstance == xInstance))

val existing = x.filter(d => !y.exists(_ == d))
Returns
existing: List[Int] = List(3)

Scala: How do I use fold* with Map?

I have a Map[String, String] and want to concatenate the values to a single string.
I can see how to do this using a List...
scala> val l = List("te", "st", "ing", "123")
l: List[java.lang.String] = List(te, st, ing, 123)
scala> l.reduceLeft[String](_+_)
res8: String = testing123
fold* or reduce* seem to be the right approach I just can't get the syntax right for a Map.

Folds on a map work the same way they would on a list of pairs. You can't use reduce because then the result type would have to be the same as the element type (i.e. a pair), but you want a string. So you use foldLeft with the empty string as the neutral element. You also can't just use _+_ because then you'd try to add a pair to a string. You have to instead use a function that adds the accumulated string, the first value of the pair and the second value of the pair. So you get this:
scala> val m = Map("la" -> "la", "foo" -> "bar")
m: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(la -> la, foo -> bar)
scala> m.foldLeft("")( (acc, kv) => acc + kv._1 + kv._2)
res14: java.lang.String = lalafoobar
Explanation of the first argument to fold:
As you know the function (acc, kv) => acc + kv._1 + kv._2 gets two arguments: the second is the key-value pair currently being processed. The first is the result accumulated so far. However what is the value of acc when the first pair is processed (and no result has been accumulated yet)? When you use reduce the first value of acc will be the first pair in the list (and the first value of kv will be the second pair in the list). However this does not work if you want the type of the result to be different than the element types. So instead of reduce we use fold where we pass the first value of acc as the first argument to foldLeft.
In short: the first argument to foldLeft says what the starting value of acc should be.
As Tom pointed out, you should keep in mind that maps don't necessarily maintain insertion order (Map2 and co. do, but hashmaps do not), so the string may list the elements in a different order than the one in which you inserted them.

The question has been answered already, but I'd like to point out that there are easier ways to produce those strings, if that's all you want. Like this:
scala> val l = List("te", "st", "ing", "123")
l: List[java.lang.String] = List(te, st, ing, 123)
scala> l.mkString
res0: String = testing123
scala> val m = Map(1 -> "abc", 2 -> "def", 3 -> "ghi")
m: scala.collection.immutable.Map[Int,java.lang.String] = Map((1,abc), (2,def), (3,ghi))
scala> m.values.mkString
res1: String = abcdefghi