Meaning of exclamation mark in zipAll(s).takeWhile(!_._2.isEmpty) - scala

What is the exclamation mark doing in (!_._2.isEmpty)?
As in :
def startsWith[A](s: Stream[A]): Boolean =
zipAll(s).takeWhile(!_._2.isEmpty) forAll {
case (h,h2) => h == h2
}
taken from Stream.
Is it just negation?
If yes, why is no space required between ! and _?
Isn't !_ interpreted as a method name?
Can method names contain or start with !?

It is just negation. Expanding the definition by replacing the _ with a more verbose name might make this more obvious.
def startsWith[A](s: Stream[A]): Boolean =
zipAll(s).takeWhile(!_._2.isEmpty) forAll {
case (h,h2) => h == h2
}
can be rewritten as
def startsWith[A](s: Stream[A]): Boolean =
zipAll(s).takeWhile( element => !element._2.isEmpty) forAll {
case (h,h2) => h == h2
}
._2 is just the second item in a tuple. In this case each element of the stream is a pair of items (referenced later as h and h2), so you could also rewrite this by unpacking the element into a pair of values as
def startsWith[A](s: Stream[A]): Boolean =
zipAll(s).takeWhile{ element =>
val (h, h2) = element
!h2.isEmpty
} forAll {
case (h,h2) => h == h2
}
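A minimal sketch of the same equivalence on a plain List, assuming the pairs have the (Option, Option) shape that the book's zipAll produces. The names pairs, short and verbose are made up for illustration, and List.zipAll from the standard library stands in for the Stream method:

```scala
// Build pairs of Options; the shorter side is padded with None.
val pairs: List[(Option[Int], Option[Int])] =
  List(1, 2, 3).map(Option(_)).zipAll(List(1, 2).map(Option(_)), None, None)

// Placeholder form: `!` negates the result of `_._2.isEmpty`.
val short = pairs.takeWhile(!_._2.isEmpty)

// Same function with a named parameter.
val verbose = pairs.takeWhile(element => !element._2.isEmpty)
```

Both calls stop at the first pair whose second component is None.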

Is it just negation?
Yes
If yes, why is no space required between ! and _?
Because the grammar allows it.
Isn't !_ interpreted as a method name?
No, because . binds more tightly than !, so the expression is parsed as !(_._2.isEmpty)
Moreover, !_ is not even a valid method name (again, specified in the grammar, see below)
Can method names contain or start with !?
Yes, but not freely. Here are the rules for identifier naming, straight from the language specification:
There are three ways to form an identifier. First, an identifier can start with a letter
which can be followed by an arbitrary sequence of letters and digits. This may be
followed by underscore ‘_’ characters and another string composed of either letters
and digits or of operator characters. Second, an identifier can start with an operator
character followed by an arbitrary sequence of operator characters. The preceding
two forms are called plain identifiers. Finally, an identifier may also be formed by an
arbitrary string between back-quotes (host systems may impose some restrictions
on which strings are legal for identifiers). The identifier then is composed of all
characters excluding the backquotes themselves.
As usual, a longest match rule applies.
(The Scala Language Specification, Version 2.9, Chapter 1.1)
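As an illustration of the second form (identifiers made only of operator characters), here is a small sketch. The Flag class and its members are hypothetical and exist only for this example. It also shows that prefix ! on a user-defined type is resolved through a method named unary_!:

```scala
final case class Flag(on: Boolean) {
  // Prefix `!` on a Flag is looked up as the method `unary_!`.
  def unary_! : Flag = Flag(!on)

  // `!!` is a legal method name: it consists only of operator characters.
  def !!(other: Flag): Flag = Flag(on != other.on)
}

// Whitespace after the prefix operator is optional.
val negatedNoSpace = !Flag(true)
val negatedWithSpace = ! Flag(true)

// Infix use of the operator-named method.
val mixed = Flag(true) !! Flag(false)
```

A name such as !_ mixes an operator character with an underscore, which neither plain form permits, so it could only be written between back-quotes.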

Related

Scala case matching with cons. Where does this variable h come from?

I'm reading some Scala problems and I see this:
def last[A](l: List[A]): A = l match {
case h :: Nil => h
case _ :: tail => last(tail)
case _ => throw new NoSuchElementException
}
I understand the basics of the cons operator. But where does the h come from?
In that top case h, I can see how we're saying in the event that there is Nil at the end of the list, return h, which would be the last element of the list. But where is h even defined?
The h is defined right there in the case statement.
Scala's pattern matching has pretty concise syntax that can take a beat to get used to. It mixes a few things that look very similar:
If you include a literal value, the object you're matching on must be equal to that value. Example: case 1 :: tail => ... matches only lists that start with 1.
If you include an _, that matches anything. Example: case _ :: tail => ...
If you include a new variable name, that matches anything, and assigns what it matched to that variable within the scope of the corresponding body. Example: case h :: tail => ... which matches the exact same inputs as the previous example but also assigns the first element to h within the "..." section.
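The three kinds of pattern above can be put side by side in one match. This is only a sketch; the function name describe and the returned strings are invented for illustration:

```scala
def describe(l: List[Int]): String = l match {
  // Literal: the head must be equal to 1.
  case 1 :: tail => s"starts with 1, then $tail"
  // Wildcard: any head, but nothing is bound to it.
  case _ :: Nil => "one element, value ignored"
  // New variable names: h and tail are defined right here.
  case h :: tail => s"head $h bound, tail $tail"
  case Nil => "empty"
}
```

For instance, describe(List(7, 8)) reaches the third case, where h is 7 and tail is List(8).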
h is an identifier that represents a single element of type A and is defined in the case clause itself: the letter "h" is sensible in this context because it clearly means "head." However, you can use whatever legal Scala identifier you want:
def last[A](l: List[A]): A = l match {
case baconWrappedShrimp :: Nil => baconWrappedShrimp
case _ :: tail => last(tail)
case _ => throw new NoSuchElementException
}
Scala pattern matching allows you to write temp variables in the cases.
The first case covers a list with exactly one element, h :: Nil (that is, List(h)). Note that the '::' operator prepends a single element to a list, unlike the ':::' operator, which concatenates two lists.
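A short sketch of the difference between the two operators, using made-up value names:

```scala
// `::` prepends one element to an existing list.
val single = 1 :: Nil               // List(1)
val prepended = 0 :: List(1, 2)     // List(0, 1, 2)

// `:::` concatenates two lists.
val joined = List(0) ::: List(1, 2) // List(0, 1, 2)

// As a pattern, `x :: Nil` matches only one-element lists.
def isSingleton[A](l: List[A]): Boolean = l match {
  case _ :: Nil => true
  case _ => false
}
```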

Scala syntax strangeness with :: and requiring lower case

Is this supposed to happen?
scala> val myList = List(42)
myList: List[Int] = List(42)
scala> val s2 :: Nil = myList
s2: Int = 42
scala> val S2 :: Nil = myList
<console>:8: error: not found: value S2
val S2 :: Nil = myList
^
It appears to be case sensitive. Bug or 'feature'?
It is case-sensitive. In a match pattern, an identifier beginning with a capital letter (or quoted by backticks) is treated as a reference to a defined value, not as a new binding.
This catches a lot of people by surprise, and it isn't entirely obvious from reading the Scala language specification. The most relevant bits are “variable patterns” ...
A variable pattern x is a simple identifier which starts with a lower case letter. It matches any value, and binds the variable name to that value.
... and “stable identifier patterns”:
To resolve the syntactic overlap with a variable pattern, a stable identifier pattern may not be a simple name starting with a lower-case letter.
Related questions:
Why does pattern matching in Scala not work with variables?
Scala pattern matching with lowercase variable name
How to pattern match into an uppercase variable?
Feature :)
A val definition with :: on the left-hand side is a form of pattern matching. In Scala, identifiers beginning with a lowercase letter are treated as new variables to be bound by the match. Identifiers starting with an uppercase letter (or enclosed in backticks) refer to existing values that are used as part of the pattern to match.
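To make the distinction concrete, here is a sketch with three matches on the same kind of list. The names Answer, answer and the via... functions are hypothetical:

```scala
val myList = List(42)
val Answer = 42 // uppercase: usable directly as a stable identifier pattern
val answer = 42 // lowercase: needs backticks to be compared rather than bound

def viaUppercase(xs: List[Int]): String = xs match {
  case Answer :: Nil => "equals Answer" // compares the head with Answer
  case _ => "something else"
}

def viaBackticks(xs: List[Int]): String = xs match {
  case `answer` :: Nil => "equals answer" // compares the head with answer
  case _ => "something else"
}

def viaBinding(xs: List[Int]): String = xs match {
  case s2 :: Nil => s"bound s2 = $s2" // binds a fresh variable, matches any head
  case _ => "something else"
}
```

Only the last function accepts List(7); the first two test for equality with an existing value.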

Pattern Matching Array of Bytes

This fails to compile:
Array[Byte](...) match { case Array(0xFE.toByte, 0xFF.toByte, tail @ _* ) => tail }
I can, however, compile this:
val FE = 0xFE.toByte
val FF = 0xFF.toByte
bytes match { case Array( FE, FF, tail @ _* ) => tail }
Why does the first case fail to compile but not the second case?
Matching as currently implemented in Scala does not allow you to generate expressions inside your match statement: you must either have an expression or a match, but not both.
So, if you assign the values before you get into the match, all you have is a stable identifier, and the match works. But unlike almost everywhere else in Scala, you can't substitute an expression like 0xFE.toByte for the val FE.
Type inference will work, so you could have written
xs match { case Array(-2, -1, tail) => tail }
since it knows from the type of xs that the literals -2 and -1 must be Byte values. But, otherwise, you often have to do just what you did: construct what you want to match to, and assign them to vals starting with an upper-case letter, and then use them in the match.
There is no exceedingly good reason why it must have been this way (though one does have to solve the problem of when to do variable assignment and when to test for equality and when to do another level of unapply), but this is the way it (presently) is.
Because what you see in the match is not a constructor but an extractor. Extractors extract data into the variables provided. That is why the second case works: you provide variables there. By the way, you did not have to create them in advance.
More about extractors at http://docs.scala-lang.org/tutorials/tour/extractor-objects.html
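Putting the working variant together, a minimal sketch might look like the following. The function name stripMarker and the Option return type are assumptions added for illustration; FE and FF are the stable identifiers from the question:

```scala
// Expressions such as 0xFE.toByte are not allowed inside a pattern,
// so they are evaluated first and bound to uppercase vals.
val FE = 0xFE.toByte
val FF = 0xFF.toByte

def stripMarker(bytes: Array[Byte]): Option[Seq[Byte]] = bytes match {
  // FE and FF are compared by equality; `tail @ _*` binds the remaining elements.
  case Array(FE, FF, tail @ _*) => Some(tail)
  case _ => None
}
```

Newer Scala 3 releases prefer the spelling tail* for the sequence binder, but the form used in the question is kept here.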

Apache-Spark : What is map(_._2) shorthand for?

I read a project's source code, found:
val sampleMBR = inputMBR.map(_._2).sample
inputMBR is a tuple.
The definition of map is:
map[U: ClassTag](f: T => U): RDD[U]
It seems that map(_._2) is shorthand for map(x => x._2).
Can anyone tell me the rules for this shorthand?
The _ syntax can be a bit confusing. When _ is used on its own it represents an argument of the anonymous function. So if we are working on pairs:
map(_._2 + _._2) would be shorthand for map((x, y) => x._2 + y._2). When _ is used as part of a function name (or value name) it has no special meaning. In this case x._2 returns the second element of a tuple (assuming x is a tuple).
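A small sketch of the placeholder rule on an ordinary List (value names are illustrative; no Spark is needed to see the behaviour). Each standalone _ stands for a separate parameter, in order:

```scala
val pairs = List((1, "a"), (2, "b"), (3, "c"))

// One placeholder: a one-argument function.
val short = pairs.map(_._2)
val explicit = pairs.map(x => x._2)

// Two placeholders: a two-argument function.
val add: (Int, Int) => Int = _ + _
val addExplicit: (Int, Int) => Int = (x, y) => x + y

// reduce expects a two-argument function, so `_ + _` fits.
val total = pairs.map(_._1).reduce(_ + _)
```

Because map expects a one-argument function, an expression with two standalone placeholders would not type-check there.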
collection.map(_._2) emits a second component of the tuple. Example from pure Scala (Spark RDDs work the same way):
scala> val zipped = (1 to 10).zip('a' to 'j')
zipped: scala.collection.immutable.IndexedSeq[(Int, Char)] = Vector((1,a), (2,b), (3,c), (4,d), (5,e), (6,f), (7,g), (8,h), (9,i), (10,j))
scala> val justLetters = zipped.map(_._2)
justLetters: scala.collection.immutable.IndexedSeq[Char] = Vector(a, b, c, d, e, f, g, h, i, j)
The two underscores in '_._2' are different.
The first '_' is the placeholder for the anonymous function's parameter; the second, in '_2', is part of the name of a member of the tuple case class.
Something like:
case class Tuple3[T1, T2, T3](_1: T1, _2: T2, _3: T3)
{...}
I have found the solution.
First, the underscore here is a placeholder.
To make a function literal even more concise, you can use underscores
as placeholders for one or more parameters, so long as each parameter
appears only one time within the function literal.
See more about underscore in Scala at What are all the uses of an underscore in Scala?.
The first '_' refers to the element being mapped. Since that element is a tuple, you can call any of the tuple's methods on it, and one of those methods is '_2'. So the expression transforms each input into its second component.

Parser combinator grammar not yielding correct associativity

I am working on a simple expression parser. However, given the parser combinator declarations below, I can't get my tests to pass: a right-associative tree keeps popping up.
def EXPR:Parser[E] = FACTOR ~ rep(SUM|MINUS) ^^ {case a~b => (a /: b)((acc,f) => f(acc))}
def SUM:Parser[E => E] = "+" ~ EXPR ^^ {case "+" ~ b => Sum(_, b)}
def MINUS:Parser[E => E] = "-" ~ EXPR ^^ {case "-" ~ b => Diff(_, b)}
I've been debugging this for hours. I hope someone can help me figure out why it's not coming out right.
"5-4-3" would yield a tree that evaluates to 4 instead of the expected -2.
What is wrong with the grammar above?
I don't work with Scala but do work with F# parser combinators and also needed associativity with infix operators. While I am sure you can do 5-4 or 2+3, the problem comes in with a sequence of two or more such operators of the same precedence and operator, i.e. 5-4-2 or 2+3+5. The problem won't show up with addition as (2+3)+5 = 2+(3+5) but (5-4)-2 <> 5-(4-2) as you know.
See: Monadic Parser Combinators 4.3 Repetition with meaningful separators. Note: The separators are the operators such as "+" and "*" and not whitespace or commas.
See: Functional Parsers Look for the chainl and chainr parsers in section 7. More parser combinators.
For example, an arithmetical expressions, where the operators that
separate the subexpressions have to be part of the parse tree. For
this case we will develop the functions chainr and chainl. These
functions expect that the parser for the separators yields a function
(!);
The function f should operate on an element and a list of tuples, each
containing an operator and an element. For example, f(e0; [(⊕1; e1);
(⊕2; e2); (⊕3; e3)]) should return ((e0 ⊕1 e1) ⊕2 e2) ⊕3 e3. You may
recognize a version of foldl in this (albeit an uncurried one), where
a tuple (⊕; y) from the list and intermediate result x are combined
applying x ⊕ y.
You need a fold function in the semantic parser, i.e. the part that converts the tokens from the syntactic parser into the output of the parser. In your code I believe it is this part.
{case a~b => (a /: b)((acc,f) => f(acc))}
Sorry I can't do better as I don't use Scala.
"-" ~ EXPR ^^ {case "-" ~ b => Diff(_, b)}
for 5-4-3, it expands to
Diff(5, 4-3)
which is
Diff(5, Diff(4, 3))
however, what you need is:
Diff(Diff(5, 4), 3)
// for 5 + 4 - 3 it should be
Diff(Sum(5, 4), 3)
You need to involve a stack.
It seems using "+" ~ EXPR made the answer incorrect. It should have been FACTOR instead.
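The effect of that change can be sketched without the parser library. The snippet below assumes a small AST (Num and eval are hypothetical additions; Sum and Diff follow the question) and hand-builds what rep(SUM | MINUS) would return for "5-4-3" once each operator parser consumes only a FACTOR:

```scala
sealed trait E
final case class Num(n: Int) extends E
final case class Sum(l: E, r: E) extends E
final case class Diff(l: E, r: E) extends E

def eval(e: E): Int = e match {
  case Num(n) => n
  case Sum(l, r) => eval(l) + eval(r)
  case Diff(l, r) => eval(l) - eval(r)
}

// With "-" ~ FACTOR, each step is a function waiting for its left operand.
val first: E = Num(5)
val steps: List[E => E] = List(Diff(_, Num(4)), Diff(_, Num(3)))

// The fold from the question, written with foldLeft: builds Diff(Diff(5, 4), 3).
val leftAssoc: E = steps.foldLeft(first)((acc, f) => f(acc))

// With "-" ~ EXPR, the right operand swallows the rest of the input instead.
val rightAssoc: E = Diff(Num(5), Diff(Num(4), Num(3)))
```

Evaluating leftAssoc gives -2 and rightAssoc gives 4, which matches the behaviour described in the question. The fold itself was already left-associative; the recursion into EXPR on the right-hand side is what skewed the tree.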