How to explain that "Set(someList : _*)" gives the same result as "Set(someList).flatten" - scala

I found a piece of code I wrote some time ago that uses _* to create a flattened set from a list of objects.
The real line of code is a bit more complex, and since I didn't remember exactly why it was there, it took a bit of experimentation to understand the effect, which is actually very simple, as the following REPL session shows:
scala> val someList = List("a","a","b")
someList: List[java.lang.String] = List(a, a, b)
scala> val x = Set(someList: _*)
x: scala.collection.immutable.Set[java.lang.String] = Set(a, b)
scala> val y = Set(someList).flatten
y: scala.collection.immutable.Set[java.lang.String] = Set(a, b)
scala> x == y
res0: Boolean = true
Just as a reference of what happens without flatten:
scala> val z = Set(someList)
z: scala.collection.immutable.Set[List[java.lang.String]] = Set(List(a, a, b))
As I can't remember where I got that idiom from, I'd like to know what is actually happening there and whether choosing one form over the other has any consequences (besides the impact on readability).
P.S.: Maybe as an effect of the overuse of the underscore in the Scala language (IMHO), it is rather difficult to find documentation about some of its use cases, especially when it is combined with a symbol that most search engines treat as a wildcard.

_* means "expand this collection as if its elements were written here literally, as separate arguments to a varargs parameter", so
val x = Set(Seq(1,2,3,4): _*)
is the same as
val x = Set(1,2,3,4)
whereas Set(someList) treats someList as a single argument and therefore builds a one-element set containing the whole list.
To look up funky symbols, you could use symbolhound.
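To make the difference concrete, here is a minimal sketch that can be pasted into the REPL. The object name VarargsDemo and the count method are hypothetical, added only to show a user-defined varargs parameter next to Set.apply; toSet is shown as a third way to get the same result when you already have a collection.

```scala
object VarargsDemo {
  // Hypothetical varargs method: inside the body, `items` is a Seq[String].
  def count(items: String*): Int = items.length

  val someList = List("a", "a", "b")

  // `: _*` passes the list's elements as the individual varargs arguments.
  val expanded: Set[String] = Set(someList: _*)

  // Without the ascription the whole list is one argument.
  val nested: Set[List[String]] = Set(someList)

  // Flattening the one-element set of lists gives the same elements again.
  val viaFlatten: Set[String] = nested.flatten

  // The direct conversion, with no varargs involved.
  val viaToSet: Set[String] = someList.toSet
}
```

All three flattened forms are equal as sets, so the choice is mostly about readability; Set(someList).flatten just builds an intermediate Set[List[String]] first.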

Related

How to get a Long typed product of a Seq[Int] in Scala?

Suppose I have val s: Seq[Int] and would like to get the product of all its elements. The value is guaranteed to be greater than Int.MaxValue but less than Long.MaxValue, so I want the result to be a Long.
It seems I cannot use product/foldLeft/reduceLeft because Long and Int are unrelated types, so I would need to write a for-loop myself. Is there a decent way to achieve this?
Note: I'm just asking whether the built-in library can do this; I'm still fine with the "ugly" code below.
def product(a: Seq[Int]): Long = {
var p = 1L
for (e <- a) p = p * e
p
}
There's no need to mess about with asInstanceOf or your own loop. foldLeft works just fine
val xs = Seq(1,1000000000,1000000)
xs.foldLeft(1L)((a,e) => a*e)
//> res0: Long = 1000000000000000
How about
def product(s: Seq[Int]) = s.map(_.asInstanceOf[Long]).fold(1L)( _ * _ )
In fact, having re-read your question and learnt about the existence of product itself, you could just do:
def product(s: Seq[Int]) = s.map(_.asInstanceOf[Long]).product
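For comparison, here is a small sketch of the same ideas using toLong, which is the usual way to widen an Int. The object and method names are made up for illustration. The fold works without any conversion because Long has a * overload that accepts an Int.

```scala
object LongProduct {
  // Widen each element first, then use the built-in product on Seq[Long].
  def viaMap(s: Seq[Int]): Long = s.map(_.toLong).product

  // Start the fold from 1L so the accumulator is a Long; Long * Int yields Long.
  def viaFold(s: Seq[Int]): Long = s.foldLeft(1L)(_ * _)

  // Same as viaMap, but through an iterator to avoid building an intermediate Seq.
  def viaIterator(s: Seq[Int]): Long = s.iterator.map(_.toLong).product
}
```

Calling product directly on the Seq[Int] would multiply in Int arithmetic and silently overflow, which is why the widening has to happen before (or during) the multiplication.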

Scala efficient set inclusion detection

Consider a collection of tuples where the first item is a set, for instance
val xs = Seq(
((1 to 5).toSet ++ Set(9), "apple"),
((15 to 17).toSet, "pear"),
((21 to 30).toSet, "grape"))
Given a value x: Int, how can I efficiently find the second item of the tuple whose set contains x? (The real use case includes thousands of sets.)
For val x = 22 the result would be Some("grape"), for val x = 19 the result would be None.
Note: values in each set are not necessarily consecutive.
Note: sets do not overlap (the intersection of any two sets is empty).
It depends on your use case, but given that you're concerned with efficiency, I assume you're going to do a lot of lookups.
I also assume you build xs once and look things up in it many times.
Preprocess xs into a Map[Int, String]:
val xsMap = (xs flatMap { case (s, v) => s.map((_,v))}).toMap[Int, String]
Then looking up elements is trivial (and effectively O(1)):
xsMap.get(22) //> res0: Option[String] = Some(grape)
xsMap.get(19) //> res1: Option[String] = None
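Put together, the preprocessing approach might look like the sketch below. The names SetLookup, index, lookup and lookupLinear are illustrative only. The linear variant is included for contrast: it needs no preprocessing but checks the sets one by one on every call.

```scala
object SetLookup {
  val xs: Seq[(Set[Int], String)] = Seq(
    ((1 to 5).toSet ++ Set(9), "apple"),
    ((15 to 17).toSet, "pear"),
    ((21 to 30).toSet, "grape"))

  // One-time preprocessing: every member of every set points at its label.
  // This relies on the sets being disjoint, as stated in the question.
  val index: Map[Int, String] =
    xs.flatMap { case (s, label) => s.map(_ -> label) }.toMap

  // Hash lookup in the prebuilt index.
  def lookup(x: Int): Option[String] = index.get(x)

  // No preprocessing: test each set in turn and stop at the first hit.
  def lookupLinear(x: Int): Option[String] =
    xs.collectFirst { case (s, label) if s.contains(x) => label }
}
```

The index costs memory proportional to the total number of set members, so it pays off when the same xs is queried many times.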
What about:
xs.find(_._1.contains(x)).map(_._2)

Why the variation in operators?

Long time lurker, first time poster.
In Scala, I'm looking for advantages as to why it was preferred to vary operators depending on type. For example, why was this:
Vector(1, 2, 3) :+ 4
determined to be an advantage over:
Vector(1, 2, 3) + 4
Or:
4 +: Vector(1,2,3)
over:
Vector(4) + Vector(1,2,3)
Or:
Vector(1,2,3) ++ Vector(4,5,6)
over:
Vector(1,2,3) + Vector(4,5,6)
So, here we have :+, +:, and ++ when + alone could have sufficed. I'm new at Scala, and I'll succumb. But, this seems unnecessary and obfuscated for a language that tries to be clean with its syntax.
I've done quite a few Google and Stack Overflow searches and have only found questions about specific operators, and about operator overloading in general, but no background on why it was necessary to split +, for example, into multiple variations.
FWIW, I could overload the operators using implicit classes, such as below, but I imagine that would only cause confusion (and tsk-tsks) from experienced Scala programmers using/reading my code.
object AddVectorDemo {
implicit class AddVector(vector : Vector[Any]) {
def +(that : Vector[Any]) = vector ++ that
def +(that : Any) = vector :+ that
}
def main(args : Array[String]) : Unit = {
val u = Vector(1,2,3)
val v = Vector(4,5,6)
println(u + v)
println(u + v + 7)
}
}
Outputs:
Vector(1, 2, 3, 4, 5, 6)
Vector(1, 2, 3, 4, 5, 6, 7)
The answer requires a surprisingly long detour through variance. I'll try to make it as short as possible.
First, note that you can add anything to an existing Vector:
scala> Vector(1)
res0: scala.collection.immutable.Vector[Int] = Vector(1)
scala> res0 :+ "fish"
res1: scala.collection.immutable.Vector[Any] = Vector(1, fish)
Why can you do this? Well, if B extends A and we want to be able to use Vector[B] where Vector[A] is called for, we need to allow Vector[B] to add the same sorts of things that Vector[A] can add. But everything extends Any, so we need to allow addition of anything that Vector[Any] can add, which is everything.
Making Vector and most other non-Set collections covariant is a design decision, but it's what most people expect.
Now, let's try adding a vector to a vector.
scala> res0 :+ Vector("fish")
res2: scala.collection.immutable.Vector[Any] = Vector(1, Vector(fish))
scala> res0 ++ Vector("fish")
res3: scala.collection.immutable.Vector[Any] = Vector(1, fish)
If we only had one operation, +, we wouldn't be able to specify which one of these things we meant. And we really might mean to do either. They're both perfectly sensible things to try. We could try to guess based on types, but in practice it's better to just ask the programmer to explicitly say what they mean. And since there are two different things to mean, there need to be two ways to ask.
Does this come up in practice? With collections of collections, yes, all the time. For example, using your + operator:
scala> Vector(Vector(1), Vector(2))
res4: Vector[Vector[Int]] = Vector(Vector(1), Vector(2))
scala> res4 + Vector(3)
res5: Vector[Any] = Vector(Vector(1), Vector(2), 3)
That's probably not what I wanted.
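With the standard operators the intent is spelled out at the call site. The following sketch (the object and value names are mine) repeats the nested-vector case with :+, ++ and +:, and the type annotations show what the compiler infers in each case.

```scala
object AppendVsConcat {
  val base: Vector[Vector[Int]] = Vector(Vector(1), Vector(2))

  // :+ appends its argument as ONE element, so the element type is preserved.
  val appended: Vector[Vector[Int]] = base :+ Vector(3)

  // ++ appends the ELEMENTS of its argument; here that mixes Vector[Int] with Int,
  // so the element type widens to Any.
  val concatenated: Vector[Any] = base ++ Vector(3)

  // +: prepends one element; the colon always faces the collection.
  val prepended: Vector[Vector[Int]] = Vector(0) +: base
}
```

A single overloaded + would have to pick between the first two results by looking at types, which is exactly the guess the explicit operators avoid.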
It's a fair question, and I think it has a lot to do with legacy code and Java compatibility. Scala copied Java's + for String concatenation, which has complicated things.
This + allows us to do:
(new Object) + "foobar" //"java.lang.Object@5bb90b89foobar"
So what should we expect if we had + for List and we did List(1) + "foobar"? One might expect List(1, "foobar") (of type List[Any]), just like we get if we use :+, but the Java-inspired String-concatenation overload would complicate this, since the compiler would fail to resolve the overload.
Odersky even once commented:
One should never have a + method on collections that are covariant in their element type. Sets and maps are non-variant, that's why they can have a + method. It's all rather delicate and messy. We'd be better off if we did not try to duplicate Java's + for String concatenation. But when Scala got designed the idea was to keep essentially all of Java's expression syntax, including String +. And it's too late to change that now.
There is some discussion (although in a different context) on the answers to this similar question.

Pattern for chaining together calls that take in Options

I'm finding that I often have to chain together functions that work on an Option and return a different Option that look something like this:
if(foo.isDefined) someFunctionReturningOption(foo.get) else None
Is there a cleaner way to do this? This pattern gets quite verbose with more complicated variables.
I'm seeing it a fair bit in form handling code that has to deal with optional data. It'll insert None if the value is None or some transformation (which could potentially fail) if there is some value.
This is very much like the ?. operator proposed for C#.
You can use flatMap:
foo.flatMap(someFunctionReturningOption(_))
Or in a for-comprehension:
for {
f <- foo
r <- someFunctionReturningOption(f)
} yield r
The for-comprehension is preferred when chaining several of these functions together; it desugars to nested flatMap calls (with a final map for the yield).
There are a lot of options (pun intended), but a for-comprehension is, I guess, the most convenient for chains:
for {
x <- xOpt
y <- someFunctionReturningOption(x)
z <- anotherFunctionReturningOption(y)
} yield z
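As a runnable illustration of such a chain, here is a sketch with two made-up steps, parse and reciprocal, standing in for someFunctionReturningOption and anotherFunctionReturningOption. Both forms below should behave identically, since the for-comprehension is just sugar for the flatMap chain.

```scala
import scala.util.Try

object OptionChain {
  // Hypothetical step 1: parsing may fail on non-numeric input.
  def parse(s: String): Option[Int] = Try(s.trim.toInt).toOption

  // Hypothetical step 2: the reciprocal is undefined for zero.
  def reciprocal(n: Int): Option[Double] =
    if (n == 0) None else Some(1.0 / n)

  // For-comprehension: stops at the first None and returns None.
  def viaFor(raw: Option[String]): Option[Double] =
    for {
      s <- raw
      n <- parse(s)
      r <- reciprocal(n)
    } yield r

  // The same chain written with flatMap directly.
  def viaFlatMap(raw: Option[String]): Option[Double] =
    raw.flatMap(parse).flatMap(reciprocal)
}
```

The for-comprehension tends to read better once a later step needs values from more than one earlier step, because all the bound names stay in scope.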
You're looking for flatMap:
foo.flatMap(someFunctionReturningOption)
This fits into the general monadic structure: flatMap takes a function that itself returns the wrapped type and flattens the result, so flatMap on an Option yields an Option (just as flatMap on Seq[T] returns a Seq).
Option also supports map(), so when x is an Option[Int] this construct:
if (x.isDefined)
  Some("number %d".format(x.get))
else
  None
is easier to write as:
x map (i => "number %d".format(i))
map will keep None unmodified, but it will apply the function you pass to it to the contained value, if there is one, and wrap the result back into an Option. For example, note how x gets converted to a string message below, but y gets passed along as None:
scala> val x: Option[Int] = Some(3)
x: Option[Int] = Some(3)
scala> val y: Option[Int] = None
y: Option[Int] = None
scala> x map (i => "number %d".format(i))
res0: Option[String] = Some(number 3)
scala> y map (i => "number %d".format(i))
res1: Option[String] = None
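The choice between map and flatMap comes down to what the function returns. In the sketch below, describe mirrors the example above, while half is a hypothetical function that itself returns an Option; mapping it produces a nested Option, which is the sign that flatMap was the one you wanted.

```scala
object MapVsFlatMap {
  // Plain result (String): map wraps it back into an Option.
  def describe(x: Option[Int]): Option[String] =
    x.map(i => "number %d".format(i))

  // Hypothetical function that can fail on its own.
  def half(i: Int): Option[Int] =
    if (i % 2 == 0) Some(i / 2) else None

  // map with an Option-returning function nests the result.
  val nested: Option[Option[Int]] = Some(4).map(half)

  // flatMap applies the function and flattens in one step.
  val flat: Option[Int] = Some(4).flatMap(half)
}
```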

Extract elements from one list that aren't in another

Simply, I have two lists and I need to extract the new elements added to one of them.
I have the following
val x = List(1,2,3)
val y = List(1,2,4)
val existing :List[Int]= x.map(xInstance => {
if (!y.exists(yInstance =>
yInstance == xInstance))
xInstance
})
Result: existing: List[AnyVal] = List((), (), 3)
I need to get rid of everything except the numbers, at minimal cost.
Pick a suitable data structure, and life becomes a lot easier.
scala> x.toSet -- y
res1: scala.collection.immutable.Set[Int] = Set(3)
Also beware that:
if (condition) expr1
Is shorthand for:
if (condition) expr1 else ()
Using the result of this, which will usually have the static type Any or AnyVal, is almost always an error. It's only appropriate for side effects:
if (condition) buffer += 1
if (condition) sys.error("boom!")
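When the goal is to keep some elements and drop the rest, it is simpler to use an operation that never produces a placeholder value in the first place. The sketch below shows three standard-library ways to do that on the lists from the question; the object and value names are just for illustration.

```scala
object KeepMatching {
  val x = List(1, 2, 3)
  val y = List(1, 2, 4)

  // filter decides per element whether to keep it; nothing is mapped to ().
  val viaFilter: List[Int] = x.filter(i => !y.contains(i))

  // collect combines the test and the mapping in one partial function.
  val viaCollect: List[Int] = x.collect { case i if !y.contains(i) => i }

  // flatMap with an Option per element: None contributes nothing to the result.
  val viaFlatMap: List[Int] =
    x.flatMap(i => if (y.contains(i)) None else Some(i))
}
```

Each of these keeps the static type List[Int], unlike the map with a one-armed if in the question.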
retronym's solution is okay if you don't have repeated elements and you don't care about the order. However, you don't indicate that this is so.
Hence it's probably going to be most efficient to convert y to a set (not x). We'll only need to traverse the list x once and will have fast access to the set (O(log(n)) at worst, and effectively constant time for the default hash-based Set).
All you need is
x filterNot y.toSet
// res1: List[Int] = List(3)
edit:
also, there's a built-in method that is even easier:
x diff y
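(I had a look at the implementation; it looks pretty efficient, using a HashMap to count occurrences. Note that diff is a multiset difference: each element of y removes at most one matching occurrence from x.)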
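The three approaches only agree when x has no duplicates. A small sketch with repeated elements shows where they diverge; the input lists here are my own, chosen to expose the difference, and are not the ones from the question.

```scala
object ListDifference {
  val x = List(1, 2, 2, 3, 3)
  val y = List(1, 2, 4)

  // Set difference: duplicates and ordering of x are lost.
  val asSet: Set[Int] = x.toSet -- y

  // Membership filter: every occurrence of anything in y is dropped,
  // while order and remaining duplicates are kept.
  val ySet: Set[Int] = y.toSet
  val filtered: List[Int] = x.filterNot(ySet)

  // Multiset difference: each element of y cancels one occurrence in x.
  val multiset: List[Int] = x.diff(y)
}
```

Here asSet is Set(3), filtered is List(3, 3) and multiset is List(2, 3, 3), so which one is "right" depends on whether repeated elements in x are meaningful.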
The easy way is to use filter instead of map, so there's nothing to remove afterwards:
val existing :List[Int] =
x.filter(xInstance => !y.exists(yInstance => yInstance == xInstance))
val existing = x.filter(d => !y.exists(_ == d))
Returns
existing: List[Int] = List(3)