What does `var # _*` signify in Scala - scala

I'm reviewing some Scala code trying to learn the language. Ran into a piece that looks like the following:
case x if x startsWith "+" =>
val s: Seq[Char] = x
s match {
case Seq('+', rest # _*) => r.subscribe(rest.toString){ m => }
}
In this case, what exactly is rest # _* doing? I understand this is a pattern match for a Sequence, but I'm not exactly understanding what that second parameter in the Sequence is supposed to do.
Was asked for more context so I added the code block I found this in.

If you have come across _* before in the form of applying a Seq as varargs to some method/constructor, eg:
val myList = List(args: _*)
then this is the "unapply" (more specifically, search for "unapplySeq") version of this: take the sequence and convert back to a "varargs", then assign the result to rest.

x # p matches the pattern p and binds the result of the whole match to x. This pattern matches a Seq containing '+' followed by any number (*) of unnamed elements (_) and binds rest to a Seq of those elements.

Related

How to consecutive and non-consecutive list in scala?

val keywords = List("do", "abstract","if")
val resMap = io.Source
.fromFile("src/demo/keyWord.txt")
.getLines()
.zipWithIndex
.foldLeft(Map.empty[String,Seq[Int]].withDefaultValue(Seq.empty[Int])){
case (m, (line, idx)) =>
val subMap = line.split("\\W+")
.toSeq //separate the words
.filter(keywords.contains) //keep only key words
.groupBy(identity) //make a Map w/ keyword as key
.mapValues(_.map(_ => idx+1)) //and List of line numbers as value
.withDefaultValue(Seq.empty[Int])
keywords.map(kw => (kw, m(kw) ++ subMap(kw))).toMap
}
println("keyword\t\tlines\t\tcount")
keywords.sorted.foreach{kw =>
println(kw + "\t\t" +
resMap(kw).distinct.mkString("[",",","]") + "\t\t" +
resMap(kw).length)
}
This code is not mine and i don't own it ... .using for study purpose. However, I am still learning and I am stuck at implement consecutive to nonconsecutive list, such as the word "if" is in many line and when three or more consecutive line numbers appear then they should be written with a dash in between, e.g. 20-22, but not 20, 21, 22. How can I implement? I just wanted to learn this.
output:
keyword lines count
abstract [1] 1
do [6] 1
if [14,15,16,17,18] 5
But I want the result to be such as [14-18] because word "if" is in line 14 to 18.
First off, I'll give the customary caution that SO isn't meant to be a place to crowdsource answers to homework or projects. I'll give you the benefit of the doubt that this isn't the case.
That said, I hope you gain some understanding about breaking down this problem from this suggestion:
your existing implementation has nothing in place to understand if the int values are indeed consecutive, so you are going to need to add some code that sorts the Ints returned from resMap(kw).distinct in order to set yourself up for the next steps. You can figure out how to do this.
you will then need to group the Ints by their consecutive nature. For example, if you have (14,15,16,18,19,20,22) then this really needs to be further grouped into ((14,15,16),(18,19,20),(22)). You can come up with your algorithm for this.
map over the outer collection (which is a Seq[Seq[Int]] at this point), having different handling depending on whether or not the length of the inside Seq is greater than 1. If greater than one, you can safely call head and tail to get the Ints that you need for rendering your range. Alternatively, you can more idiomatically make a for-comprehension that composes the values from headOption and tailOption to build the same range string. You said something about length of 3 in your question, so you can adjust this step to meet that need as necessary.
lastly, now you have Seq[String] looking like ("14-16","18-20","22") that you need to join together using a mkString call similar to what you already have with the square brackets
For reference, you should get further acquainted with the Scaladoc for the Seq trait:
https://www.scala-lang.org/api/2.12.8/scala/collection/Seq.html
Here's one way to go about it.
def collapseConsecutives(nums :Seq[Int]) :List[String] =
nums.foldRight((nums.last, List.empty[List[Int]])) {
case (n, (prev,acc)) if prev-n == 1 => (n, (n::acc.head) :: acc.tail)
case (n, ( _ ,acc)) => (n, List(n) :: acc)
}._2.map{ ns =>
if (ns.length < 3) ns.mkString(",") //1 or 2 non-collapsables
else s"${ns.head}-${ns.last}" //3 or more, collapsed
}
usage:
println(kw + "\t\t" +
collapseConsecutives(resMap(kw).distinct).mkString("[",",","]") + "\t\t" +
resMap(kw).length)

Instantiating objects on both sides of assignment operator in Scala; how does it work

I want to understand the mechanism behind the following line:
val List(x) = Seq(1 to 10)
What is the name of this mechanism? Is this the same as typecasting, or is there something else going on? (Tested in Scala 2.11.12.)
The mechanism is called Pattern Matching.
Here is the official documentation: https://docs.scala-lang.org/tour/pattern-matching.html
This works also in for comprehensions:
for{
People(name, firstName) <- peoples
} yield s"$firstName $name"
To your example:
val List(x) = Seq(1 to 10)
x is the content of that List - in your case Range 1 to 10 (You have a list with one element).
If you have really a list with more than one element, that would throw an exception
val List(x) = (1 to 10).toList // -> ERROR: undefined
So the correct pattern matching would be:
val x::xs = (1 to 10).toList
Now x is the first element (head) and xs the rest (tail).
I suspect that your problem is actually the expression
Seq(1 to 10)
This does not create a sequence of 10 elements, but a sequence containing a single Range object. So when you do this
val List(x) = Seq(1 to 10)
x is assigned to that Range object.
If you want a List of numbers, do this:
(1 to 10).toList
The pattern List(x) will only match if the expression on the right is an instance of List containing a single element. It will not match an empty List or a List with more than one element.
In this case it happens to work because the constructor for Seq actually returns an instance of List.
This technique is called object destructuring. Haskell provides a similar feature. Scala uses pattern matching to achieve this.
The method used in this case is Seq#unapplySeq:
https://www.scala-lang.org/api/current/scala/collection/Seq.html
You can think of
val <pattern> = <value>
<next lines>
as
<value> match {
case <pattern> =>
<next lines>
}
This doesn't happen only when <pattern> is just a variable or a variable with a type.

How to extract sequence elements wrapped in Option in Scala?

I am learning Scala and struggling with Option[Seq[String]] object I need to process. There is a small array of strings Seq("hello", "Scala", "!") which I need to filter against charAt(0).isUpper condition.
Doing it on plain val arr = Seq("hello", "Scala", "!") is as easy as arr.filter(_.charAt(0).isUpper). However, doing the same on Option(Seq("hello", "Scala", "!")) won't work since you need to call .getOrElse on it first. But even then how can you apply the condition?
arr.filter(_.getOrElse(false).charAt(0).isUpper is wrong. I have tried a lot of variants and searching stackoverflow didn't help either and I am wondering if this is at all possible. Is there an idiomatic way to handle Option wrapped cases in Scala?
If you want to apply f: X => Y to a value x of type X, you write f(x).
If you want to apply f: X => Y to a value ox of type Option[X], you write ox.map(f).
You seem to already know what you want to do with the sequence, so just put the appropriate f into map.
Example:
val ox = Option(Seq("hello", "Scala", "!"))
ox.map(_.filter(_(0).isUpper)) // Some(Seq("Scala"))
You can just call foreach or map on the option, i.e. arr.map(seq => seq.filter(_.charAt(0).isUpper))
You can use a pattern matching for that case as
Option(Seq("hello", "Scala", "!")) match {
case Some(x) => x.filter(_.charAt(0).isUpper)
case None => Seq()
}

How to refer Spark RDD element multiple times using underscore notation?

How to refer Spark RDD element multiple times using underscore notations.
For example I need to convert RDD[String] to RDD[(String, Int)]. I can create anonymous function using function variables but I would like to do this using Underscore notation. How I can achieve this.
PFB sample code.
val x = List("apple", "banana")
val rdd1 = sc.parallelize(x)
// Working
val rdd2 = rdd1.map(x => (x, x.length))
// Not working
val rdd3 = rdd1.map((_, _.length))
Why does the last line above not work?
An underscore or (more commonly) a placeholder syntax is a marker of a single input parameter. It's nice to use for simple functions, but can get tricky to get right with two or more.
You can find the definitive answer in the Scala language specification's Placeholder Syntax for Anonymous Functions:
An expression (of syntactic category Expr) may contain embedded underscore symbols _ at places where identifiers are legal. Such an expression represents an anonymous function where subsequent occurrences of underscores denote successive parameters.
Note that one underscore references one input parameter, two underscores are for two different input parameters and so on.
With that said, you cannot use the placeholder twice and expect that they'll reference the same input parameter. That's not how it works in Scala and hence the compiler error.
// Not working
val rdd3 = rdd1.map((_, _.length))
The above is equivalent to the following:
// Not working
val rdd3 = rdd1.map { (a: String, b: String) => (a, b.length)) }
which is clearly incorrect as map expects a function of one input parameter.

Pattern Matching Array of Bytes

This fails to compile:
Array[Byte](...) match { case Array(0xFE.toByte, 0xFF.toByte, tail # _* ) => tail }
I can, however, compile this:
val FE = 0xFE.toByte
val FF = 0xFF.toByte
bytes match { Array( FE, FF, tail # _* ) => tail }
Why does the first case fail to compile but not the second case?
Matching as currently implemented in Scala does not allow you to generate expressions inside your match statement: you must either have an expression or a match, but not both.
So, if you assign the values before you get into the match, all you have is a stable identifier, and the match works. But unlike almost everywhere else in Scala, you can't substitute an expression like 0xFE.toByte for the val FE.
Type inference will work, so you could have written
xs match { case Array(-2, -1, tail) => tail }
since it knows from the type of xs that the literals -2 and -1 must be Byte values. But, otherwise, you often have to do just what you did: construct what you want to match to, and assign them to vals starting with an upper-case letter, and then use them in the match.
There is no exceedingly good reason why it must have been this way (though one does have to solve the problem of when to do variable assignment and when to test for equality and when to do another level of unapply), but this is the way it (presently) is.
Because what you see in the match is not a constructor but extractor. Extractors extracts data into variables provided. That is why the second case works - you provide variables there. By the way, you did not have to create them in advance.
More about extractors at http://docs.scala-lang.org/tutorials/tour/extractor-objects.html