How to extract sequence elements wrapped in Option in Scala? - scala

I am learning Scala and struggling with an Option[Seq[String]] object I need to process. There is a small sequence of strings, Seq("hello", "Scala", "!"), which I need to filter with the condition charAt(0).isUpper.
Doing it on a plain val arr = Seq("hello", "Scala", "!") is as easy as arr.filter(_.charAt(0).isUpper). However, doing the same on Option(Seq("hello", "Scala", "!")) won't work directly, since it seems you need to call .getOrElse on it first. But even then, how can you apply the condition?
arr.filter(_.getOrElse(false).charAt(0).isUpper) is wrong. I have tried many variants, and searching Stack Overflow didn't help either, so I am wondering whether this is possible at all. Is there an idiomatic way to handle Option-wrapped values in Scala?

If you want to apply f: X => Y to a value x of type X, you write f(x).
If you want to apply f: X => Y to a value ox of type Option[X], you write ox.map(f).
You seem to already know what you want to do with the sequence, so just put the appropriate f into map.
Example:
val ox = Option(Seq("hello", "Scala", "!"))
ox.map(_.filter(_(0).isUpper)) // Some(Seq("Scala"))

You can just call foreach or map on the Option itself. With arr being the Option: arr.map(seq => seq.filter(_.charAt(0).isUpper))

You can use pattern matching for that case:
Option(Seq("hello", "Scala", "!")) match {
case Some(x) => x.filter(_.charAt(0).isUpper)
case None => Seq()
}
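To put the approaches above side by side, here is a minimal sketch. The object name OptionSeqDemo and the helper startsUpper are made up for illustration; startsUpper uses headOption instead of charAt(0) so that an empty string does not throw.

```scala
object OptionSeqDemo {
  val maybeWords: Option[Seq[String]] = Option(Seq("hello", "Scala", "!"))
  val nothing: Option[Seq[String]] = None

  // Hypothetical helper: true if the first character exists and is upper case.
  def startsUpper(s: String): Boolean = s.headOption.exists(_.isUpper)

  // Stay inside the Option: the result is still an Option[Seq[String]].
  val kept: Option[Seq[String]] = maybeWords.map(_.filter(startsUpper))

  // Leave the Option: fall back to an empty Seq when there is no value.
  val flat: Seq[String] = maybeWords.getOrElse(Seq.empty).filter(startsUpper)

  // A None simply yields an empty result.
  val fromNone: Seq[String] = nothing.toSeq.flatten.filter(startsUpper)
}
```

Which form to pick mostly depends on whether the caller still needs to know that the value was absent (keep the Option) or just wants a sequence (getOrElse).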

Related

Instantiating objects on both sides of assignment operator in Scala; how does it work

I want to understand the mechanism behind the following line:
val List(x) = Seq(1 to 10)
What is the name of this mechanism? Is this the same as typecasting, or is there something else going on? (Tested in Scala 2.11.12.)
The mechanism is called Pattern Matching.
Here is the official documentation: https://docs.scala-lang.org/tour/pattern-matching.html
This works also in for comprehensions:
for{
People(name, firstName) <- peoples
} yield s"$firstName $name"
To your example:
val List(x) = Seq(1 to 10)
x is the content of that List, in your case the Range 1 to 10 (you have a list with one element).
If the list really had more than one element, the pattern would not match and an exception would be thrown:
val List(x) = (1 to 10).toList // throws scala.MatchError at runtime
So the correct pattern matching would be:
val x::xs = (1 to 10).toList
Now x is the first element (head) and xs the rest (tail).
I suspect that your problem is actually the expression
Seq(1 to 10)
This does not create a sequence of 10 elements, but a sequence containing a single Range object. So when you do this
val List(x) = Seq(1 to 10)
x is assigned to that Range object.
If you want a List of numbers, do this:
(1 to 10).toList
The pattern List(x) will only match if the expression on the right is an instance of List containing a single element. It will not match an empty List or a List with more than one element.
In this case it happens to work because Seq(...) (the Seq.apply factory method) actually returns an instance of List.
This technique is called object destructuring. Haskell provides a similar feature. Scala uses pattern matching to achieve this.
The method used in this case is Seq#unapplySeq:
https://www.scala-lang.org/api/current/scala/collection/Seq.html
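As a rough illustration of how unapplySeq drives such patterns, here is a sketch with a hand-written extractor. Words and describe are hypothetical names, not part of the standard library.

```scala
// Hypothetical extractor: splits a string on single spaces.
object Words {
  def unapplySeq(s: String): Option[Seq[String]] = Some(s.split(" ").toSeq)
}

object ExtractorDemo {
  // The number of sub-patterns must fit the number of extracted elements.
  def describe(s: String): String = s match {
    case Words(only)          => s"one word: $only"
    case Words(first, second) => s"two words: $first, $second"
    case Words(first, _*)     => s"many words, starting with $first"
    case _                    => "no words"
  }
}
```

Patterns like List(x) or Seq(a, b) work the same way, with the collection's companion object playing the role of Words.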
You can think of
val <pattern> = <value>
<next lines>
as
<value> match {
case <pattern> =>
<next lines>
}
The only exception is when <pattern> is just a variable, or a variable with a type annotation; then it is an ordinary value definition and no matching takes place.
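The behaviour described in these answers can be checked with a small sketch. It assumes Scala 2.x as in the question; the object and value names are invented for the example, and Try is only there to capture the failure instead of crashing.

```scala
import scala.util.Try

object ValPatternDemo {
  val nested: Seq[Range] = Seq(1 to 10)      // one element: the Range itself
  val numbers: List[Int] = (1 to 10).toList  // ten elements

  // A single-element pattern succeeds on the one-element sequence.
  val Seq(only) = nested

  // A head/tail pattern fits the ten-element list.
  val first :: rest = numbers

  // A single-element pattern against ten elements fails at runtime.
  val failed: Try[Int] = Try { val List(x) = numbers; x }
}
```

Here failed holds a Failure wrapping a scala.MatchError, which is the exception a non-matching val pattern throws.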

Putting two placeholders inside flatMap in Spark Scala to create Array

I am applying flatMap on a scala array and create another array from it:
val x = sc.parallelize(Array(1,2,3,4,5,6,7))
val y = x.flatMap(n => Array(n,n*100,42))
println(y.collect().mkString(","))
1,100,42,2,200,42,3,300,42,4,400,42,5,500,42,6,600,42,7,700,42
But I am trying to use placeholder "_" in the second line of the code where I create y in the following way:
scala> val y = x.flatMap(Array(_,_*100,42))
<console>:26: error: wrong number of parameters; expected = 1
val y = x.flatMap(Array(_,_*100,42))
^
Which is not working. Could someone explain what to do in such cases if I want to use placeholder?
In Scala, each placeholder in a lambda stands for a separate parameter, so the number of placeholders determines the number of parameters of the lambda.
So the last line is expanded as
val y = x.flatMap((x1, x2) => Array(x1, x2*100, 42))
Long story short, you can't use a placeholder to refer twice to the same element.
You have to use named parameters in this case.
val y = x.flatMap(x => Array(x, x*100, 42))
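The same rule can be tried without Spark. This is a sketch on a plain List (an assumption made to keep it self-contained; an RDD would be used the same way), with invented names.

```scala
object PlaceholderDemo {
  val xs = List(1, 2, 3)

  // One placeholder, one parameter: _ * 100 expands to n => n * 100.
  val scaled = xs.map(_ * 100)

  // Two placeholders, two parameters: _ + _ expands to (a, b) => a + b.
  val total = xs.reduce(_ + _)

  // Using the same element twice needs a named parameter.
  val expanded = xs.flatMap(n => List(n, n * 100, 42))
}
```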
You can only use the _ placeholder once per parameter. (In your case, the function passed to flatMap takes a single argument, but two placeholders tell the compiler to expect two arguments, which is not going to work.)
If the elements of x were pairs (in the question they are plain Ints, where the named-parameter form above applies), then
val y = x.flatMap(i => Array(i._1, i._2*100,42))
should do the trick.
val y = x.flatMap { case (i1, i2) => Array(i1, i2*100,42) }
should also work (and is probably more readable).

What's the reasoning behind adding the "case" keyword to Scala?

Apart from:
case class A
... case which is quite useful?
Why do we need to use case in match? Wouldn't:
x match {
y if y > 0 => y * 2
_ => -1
}
... be much prettier and concise?
Or why do we need to use case when a function takes a tuple? Say, we have:
val z = List((1, -1), (2, -2), (3, -3)).zipWithIndex
Now, isn't:
z map { case ((a, b), i) => a + b + i }
... way uglier than just:
z map (((a, b), i) => a + b + i)
...?
First, as we know, a single case can contain several statements with no separator other than a line break, like:
x match {
case y if y > 0 => y * 2
println("test")
println("test2") // these 3 statements belong to the same "case"
}
If case were not required, the compiler would need some other way to know where one case ends and the next begins.
For example:
x match {
y if y > 0 => y * 2
_ => -1
}
How would the compiler know whether _ => -1 belongs to the first case or starts the next one?
Moreover, how would the compiler know that the => sign does not introduce a function literal but the code for the current case?
The compiler would certainly need some syntax for isolating the cases
(curly braces, or anything else), for example:
x match {
{y if y > 0 => y * 2}
{_ => -1} // confusing with literal function notation
}
And surely the solution Scala currently provides, the case keyword, is a lot more readable and understandable than a separator such as the curly braces in my example.
Adding to @Mik378's answer:
When you write this: (a, b) => something, you are defining an anonymous Function2 - a function that takes two parameters.
When you write this: case (a, b) => something, you are defining an anonymous PartialFunction that takes one parameter and matches it against a pair.
So you need the case keyword to differentiate between these two.
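A short sketch of that difference, using invented names and the list from the question:

```scala
object CaseLiteralDemo {
  // Without case: a function of two parameters (Function2).
  val add: (Int, Int) => Int = (a, b) => a + b

  // With case: a function of ONE parameter that is matched against a pair.
  val addPair: ((Int, Int)) => Int = { case (a, b) => a + b }

  // The same syntax can also define a PartialFunction with a guard.
  val doublePositive: PartialFunction[Int, Int] = { case y if y > 0 => y * 2 }

  val z = List((1, -1), (2, -2), (3, -3)).zipWithIndex
  val sums = z.map { case ((a, b), i) => a + b + i }
}
```

Note that add takes two arguments while addPair takes a single tuple, so add(1, 2) and addPair((1, 2)) are the corresponding calls.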
The second issue, anonymous functions that avoid the case, is a matter of debate:
https://groups.google.com/d/msg/scala-debate/Q0CTZNOekWk/z1eg3dTkCXoJ
Also: http://www.scala-lang.org/old/node/1260
For the first issue, the choice is whether you allow a block or an expression on the RHS of the arrow.
In practice, I find that shorter case bodies are usually preferable, so I can certainly imagine your alternative syntax resulting in crisper code.
Consider one-line methods. You write:
def f(x: Int) = 2 * x
Then you need to add a statement. I don't know whether the IDE is able to add the braces automatically.
def f(x: Int) = { val res = 2*x ; res }
That seems no worse than requiring the same syntax for case bodies.
To review, a case clause is case Pattern Guard => body.
Currently, body is a block, or a sequence of statements and a result expression.
If body were an expression, you'd need braces for multiple statements, like a function.
I don't think => results in ambiguities since function literals don't qualify as patterns, unlike literals like 1 or "foo".
One snag might be: { case foo => ??? } is a "pattern matching anonymous function" (SLS 8.5). Obviously, if the case is optional or eliminated, then { foo => ??? } is ambiguous. You'd have to distinguish case clauses for anon funs (where case is required) and case clauses in a match.
One counter-argument for the current syntax is that, in an intuition deriving from C, you always secretly hope that your match will compile to a switch table. In that metaphor, the cases are labels to jump to, and a label is just the address of a sequence of statements.
The alternative syntax might encourage a more inlined approach:
x match {
C => c(x)
D => d(x)
_ => ???
}
@inline def c(x: X) = ???
//etc
In this form, it looks more like a dispatch table, and the match body recalls the Map syntax, Map(a -> 1, b -> 2), that is, a tidy simplification of the association.
One of the key aspects of code readability is the words that grab your attention. For example,
return grabs your attention when you see it because you know that it is such a decisive action (breaking out of the function and possibly sending a value back to the caller).
Another example is break--not that I like break, but it gets your attention.
I would agree with @Mik378 that case in Scala is more readable than the alternatives. Besides the compiler confusion he mentions, it gets your attention.
I am all for concise code, but there is a line between concise and illegible. I will gladly make the trade of 4n characters (where n is the number of cases) for the substantial readability that I get in return.

What does `var @ _*` signify in Scala

I'm reviewing some Scala code trying to learn the language. Ran into a piece that looks like the following:
case x if x startsWith "+" =>
val s: Seq[Char] = x
s match {
case Seq('+', rest @ _*) => r.subscribe(rest.toString){ m => }
}
In this case, what exactly is rest @ _* doing? I understand this is a pattern match for a Sequence, but I'm not exactly understanding what that second parameter in the Sequence is supposed to do.
Was asked for more context so I added the code block I found this in.
If you have come across _* before in the form of applying a Seq as varargs to some method/constructor, eg:
val myList = List(args: _*)
then this is the "unapply" (more specifically, search for "unapplySeq") version of this: take the sequence and convert back to a "varargs", then assign the result to rest.
x @ p matches the pattern p and binds the result of the whole match to x. This pattern matches a Seq containing '+' followed by any number (*) of unnamed elements (_) and binds rest to a Seq of those elements.
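Here is a minimal sketch of the binder in isolation, assuming Scala 2.x syntax as in the question. stripPlus is a hypothetical helper, and it uses mkString to turn the bound elements back into a String.

```scala
object BinderDemo {
  // Hypothetical helper: returns the text after a leading '+', if there is one.
  def stripPlus(s: String): Option[String] =
    s.toSeq match {
      // '+' must come first; rest is bound to all remaining characters.
      case Seq('+', rest @ _*) => Some(rest.mkString)
      case _                   => None
    }
}
```

For example, stripPlus("+topic") gives Some("topic"), while a string without the leading '+' gives None.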

How to use Scala reduceLeft on case classes?

I understand how to use reduceLeft on simple lists of integers, but attempts to use it on case class objects fail.
Assume I have:
case class LogMsg(time:Int, cat:String, msg:String)
val cList = List(LogMsg(1,"a", "bla"), LogMsg(2,"a", "bla"), LogMsg(4,"b", "bla"))
and I want to find the largest difference in time between LogMsgs.
I want to do something like:
cList.reduceLeft((a,b) => (b.time - a.time))
which of course doesn't work.
The first iteration of reduceLeft compares the first two elements, which are both of type LogMsg. After that it compares the next element (LogMsg) with the result of the first iteration (Int).
Do I just have the syntax wrong or should I be doing this another way?
I'd probably do something like this:
(cList, cList.tail).zipped.map((a, b) => b.time - a.time).max
You'll need to check beforehand that cList has at least 2 elements.
reduceLeft can't be used to return the largest difference, because it always returns the type of the List you're reducing, i.e. LogMsg in this case, and you're asking for an Int.
My try:
cList.sliding(2).map(t => t(1).time - t(0).time).max
Another one that came into my mind: since LogMsg is a case class, we can take advantage of pattern matching:
cList.sliding(2).collect {
case List(LogMsg(a, _, _), LogMsg(b, _, _)) => b - a
}.max
I would recommend using foldLeft, which is like reduceLeft but lets you supply an initial value for the result.
val head::tail = cList
tail.foldLeft((head.time, 0)) ((a,b) => (b.time, math.max(a._2,b.time-a._1)))._2
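Since the answers above all need at least two elements (max fails on an empty collection, and sliding(2) on a one-element list yields a single window of size 1), here is a sketch of a variant that returns an Option instead. maxGap is a hypothetical name; the zip-based pairing is one possible choice, not the only one.

```scala
object MaxGapDemo {
  case class LogMsg(time: Int, cat: String, msg: String)

  val cList = List(LogMsg(1, "a", "bla"), LogMsg(2, "a", "bla"), LogMsg(4, "b", "bla"))

  // Largest difference between consecutive messages, or None if there are fewer than two.
  def maxGap(msgs: List[LogMsg]): Option[Int] = {
    // Pair each message with its successor; zip stops at the shorter list.
    val gaps = msgs.zip(msgs.drop(1)).map { case (a, b) => b.time - a.time }
    if (gaps.isEmpty) None else Some(gaps.max)
  }
}
```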