Coming from other programming languages, I would expect the following statements to all produce the same result. Can someone explain why the parentheses make a difference here?
PS D:\> "ab", "cd"
ab
cd
PS D:\> "a"+"b" , "c"+"d"
ab cd
PS D:\> "a"+"b" , ("c"+"d")
ab cd
PS D:\> ("a"+"b"), "c"+"d"
ab
c
d
PS D:\> ("a"+"b"), ("c"+"d")
ab
cd
PS D:\>
The applicable rules are as follows:
The , is a unary or binary operator that creates an array of the operand(s), and it has higher precedence than the + operator.
The + operator can be used for numerical addition, string concatenation, and array concatenation. The data type of the left operand determines which type of operation is performed. If possible, the right operand is cast as the data type of the left operand; if not possible, an error is thrown. *
Where precedence is equal, evaluation occurs left to right.
Arrays are cast as strings by listing the elements separated by spaces.
Parentheses, as you'd expect, have higher precedence than any operator, and cause the enclosed expression to be evaluated before anything outside the parentheses.
The default output of an array object is a list of elements, one on each line.
* That explains seemingly inconsistent results such as that "foo" + 1 evaluates to foo1, but 1 + "foo" gives you an error.
So, let's take a look at your test cases, and analyze what happens following the logic explained above.
Case I:
"ab", "cd"
This one is very straightforward: the , operator creates an array of two strings, ab and cd.
Case II:
"a" + "b", "c" + "d"
This one seems counter-intuitive at first glance; the key is to realize that , is evaluated before +. Here's how the result is obtained, step by step:
1. "b", "c" is evaluated first (due to ,'s precedence over +), creating an array of the elements b and c.
2. Next, the + operator is applied to "a" and the array created in step 1. Since the left operand "a" is a string, a string concatenation is performed.
3. To concatenate a string with an array, the array is cast as a string. An array of b and c becomes the string b c.
4. Concatenating a with b c yields ab c.
5. Finally, ab c is concatenated with d to produce the final result, ab cd.
Case III:
"a"+"b" , ("c"+"d")
This yields the same result by coincidence; the evaluation occurs differently:
1. ("c"+"d") is evaluated first, due to the parentheses. That's a plain vanilla string concatenation, which results in the string cd.
2. Next, the , operator is applied to "b" and the string from step 1, creating an array of the elements b and cd. (This is evaluated before "a" + "b" because , has higher precedence than +.)
3. Then, the + operator is applied to "a" and the array created in step 2. Since the left operand is a string, a string concatenation is performed.
4. The array of b and cd is cast as the string b cd.
5. The strings a and b cd are concatenated, resulting in the final output ab cd.
Case IV:
("a" + "b"), "c" + "d"
This is the most interesting test case, and probably the most counter-intuitive, because it yields results that may appear inconsistent. Here's how it's evaluated:
1. Similar to Case III step 1, ("a" + "b") is evaluated first, producing the string ab.
2. The , is applied to the string from step 1 and "c", creating an array of the elements ab and c.
3. The + operator is applied to the array from step 2 and the next token, "d". This time, the left operand is an array, so an array concatenation is performed.
4. The string operand "d" is cast as an array of one element.
5. The two arrays are concatenated into an array of the elements ab, c, and d.
Case V:
("a" + "b"), ("c" + "d")
By now, it should be easy to see what happens here:
1. ("a" + "b") is evaluated as the string ab.
2. ("c" + "d") is evaluated as the string cd.
3. The , is applied to the strings from steps 1 and 2, creating an array of ab and cd.
You can see the precedence table with the command
Get-Help about_Operator_Precedence
(Help doc names can be tab-completed, so to save typing, that's help about_opTABTAB)
I have two examples of foldLeft that I cannot really grasp the logic.
First example:
val donuts: List[String] = List("Plain", "Strawberry", "Glazed")
println(donuts.foldLeft("")((acc, curr) => s" $acc, $curr Donut ")) // acc for accumulated, curr for current
This will give
, Plain Donut , Strawberry Donut , Glazed Donut
Does not foldLeft accumulate value that was earlier?
I was expecting the result to be
, Plain Donut , Plain Strawberry Donut , Strawberry Glazed Donut
Second example:
def multi(num:Int,num1:Int):Int=num*10
(1 until 3).foldLeft(1)(multi)
Can someone explain what is happening in every step? I cannot see how this can become 100. Also, the second argument multi is a function. Why does it not take any input variables (num and num1)?
Does not foldLeft accumulate value that was earlier?
Yes, it does. It takes the result of the previous iteration and the current element and produces a new result (using the binary operator you provided). For the first example, the steps are as follows (note that each step also prepends one more space):
the starting value of the accumulator is ""
acc = "", curr = "Plain" -> " , Plain Donut "
acc = " , Plain Donut ", curr = "Strawberry" -> "  , Plain Donut , Strawberry Donut "
acc = "  , Plain Donut , Strawberry Donut ", curr = "Glazed" -> "   , Plain Donut , Strawberry Donut , Glazed Donut "
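If it helps to see those intermediate accumulators rather than trace them by hand, scanLeft takes the same arguments as foldLeft but keeps every step. This is only a small sketch; the object name DonutFoldTrace and the value name steps are made up for the illustration.

```scala
object DonutFoldTrace {
  val donuts: List[String] = List("Plain", "Strawberry", "Glazed")

  // scanLeft applies the same binary function as foldLeft,
  // but returns the initial value plus every intermediate accumulator.
  val steps: List[String] =
    donuts.scanLeft("")((acc, curr) => s" $acc, $curr Donut ")

  def main(args: Array[String]): Unit =
    // Brackets make the leading and trailing spaces visible.
    steps.foreach(s => println(s"[$s]"))
}
```

The last element of steps is exactly what foldLeft returns; the earlier elements are the acc values fed into each subsequent call.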
For the second example, the current value is simply ignored: multi can be rewritten as def multi(acc:Int, curr:Int):Int = acc*10, where curr is never used. So foldLeft simply multiplies the starting value (1) by 10 n times, where n is the number of elements in the sequence (here (1 until 3).length, which is 2), giving 1 * 10 * 10 = 100.
Why does it not take any input variables (num and num1)?
foldLeft is a function which accepts a function. It accepts a generic function which in turn accepts two parameters and returns a result of the same type as the first parameter (op: (B, A) => B, where B is the result type and A is the sequence element type). multi matches this definition when B == A == Int, and it is passed to foldLeft, which internally provides the input variables at each step.
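To make the "foldLeft supplies the arguments" point concrete, here is a small sketch that passes multi by name, passes the equivalent explicit lambda, and records the intermediate values with scanLeft. The object and value names (MultiFold, viaMethodName, viaLambda, steps) are just for this illustration.

```scala
object MultiFold {
  // Same definition as in the question: the second parameter is ignored.
  def multi(num: Int, num1: Int): Int = num * 10

  // Passing the method name: the compiler turns it into a function (Int, Int) => Int.
  val viaMethodName: Int = (1 until 3).foldLeft(1)(multi)

  // The same call with the arguments that foldLeft supplies written out.
  val viaLambda: Int = (1 until 3).foldLeft(1)((acc, curr) => multi(acc, curr))

  // Initial value followed by the accumulator after each element (1 and 2).
  val steps: List[Int] = (1 until 3).scanLeft(1)(multi).toList

  def main(args: Array[String]): Unit = {
    println(viaMethodName)
    println(steps)
  }
}
```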
A left fold needs three things: a list to work on, an initial value, and a function.
That function combines each element of the list with the current accumulated value. The accumulated value starts as the initial value; after that, it is the result of the previous function application.
Summing a list is a great toy example.
val nums = List(1, 2, 3, 4)
val total = nums.foldLeft(0)((a, b) => a + b)
total is 10.
That initial value has to pass all of the state you need. Currently, you pass the accumulated string. What you should be passing is a tuple giving your function extra information to work with: the accumulated string, and a string representing the previous element in the list.
Your result would look something like:
("Plain donut, ...", "Glazed")
Then you just need to select out the accumulated string and discard the second element.
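A rough sketch of that tuple-accumulator idea follows. The exact formatting (where commas and spaces go, and how the first element is treated) is my guess at what the question wanted, and the names DonutPairs, text and last are hypothetical.

```scala
object DonutPairs {
  val donuts: List[String] = List("Plain", "Strawberry", "Glazed")

  // The accumulator is a pair: (string built so far, previous element).
  val (text, last): (String, String) =
    donuts.foldLeft(("", "")) { case ((acc, prev), curr) =>
      // For the first element there is no previous one to prepend.
      val label = if (prev.isEmpty) curr else s"$prev $curr"
      (s"$acc, $label Donut ", curr)
    }

  def main(args: Array[String]): Unit =
    println(text) // the second component is simply discarded
}
```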
What does the following code do?
var pair: RDD[(Long,Long)] = sparkContext.parallelize(Array((3L, 4L)))
Does the variable "pair" consist of a single element/pair? The types of 3 and 4 are Long, not Int.
It creates an Array[Tuple2[Long,Long]] with one element (3L, 4L) (aka Tuple2(3L,4L)).
This could also be written (perhaps more readably) as Array(3L -> 4L).
Also note that it's better not to use a mutable type like Array (prefer Seq or List).
Let's decompose it:
Array(el1, el2, el3, ...) creates a new Array containing the given elements. In our case, there is a single element in the array: (3L,4L).
Writing (a,b) creates a Tuple2 value containing a and b of types A and B (not necessarily the same types). The Tuple2[A,B] type can also be written as (A,B) for clarity.
In our case, a=3L and b=4L. You might be confused by the L suffix after literals. It is there so that the compiler can differentiate between 3 as an Int and 3L as a Long. Scala also supports other such suffixes, like f or d for explicitly declaring Float or Double literals.
So you're indeed creating an Array[Tuple2[Long,Long]], also expressed as Array[(Long,Long)], which goes into your RDD.
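The decomposition can be checked without Spark at all. The sketch below only builds the local collection that would be handed to parallelize; the names TupleArrayDemo, pairs, withArrow and asSeq are invented for the example.

```scala
object TupleArrayDemo {
  // One element, and that element is a Tuple2[Long, Long].
  val pairs: Array[(Long, Long)] = Array((3L, 4L))

  // The arrow syntax builds the same tuple.
  val withArrow: Array[(Long, Long)] = Array(3L -> 4L)

  // An immutable alternative to Array.
  val asSeq: Seq[(Long, Long)] = Seq((3L, 4L))

  def main(args: Array[String]): Unit = {
    println(pairs.length)
    println(pairs(0))
  }
}
```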
I can merge two lists together as follows:
List(1,2,3) ::: List(4,5,6)
and the result is:
res2: List[Int] = List(1, 2, 3, 4, 5, 6)
The operator ::: is right-associative. What does that mean?
In math, right associative is:
5 + ( 5 - ( 2 * 3 ) )
Right associative means the operator (in our case, the ::: method) is applied on the right operand while using the left operand as the argument. This means that the actual method invocation is done like this:
List(4,5,6).:::(List(1,2,3))
Since ::: prepends the list, the result is List(1,2,3,4,5,6).
In the most general sense, right-associative means that if you don't put any parentheses, they will be assumed to be on the right:
a ::: b ::: c == a ::: (b ::: c)
whereas left-associative operators (such as +) would have
a + b + c == (a + b) + c
However, according to the spec (6.12.3 Infix Operations),
A left-associative binary operation e1 op e2 is interpreted as e1.op(e2). If op is right-associative, the same operation is interpreted as { val x=e1; e2.op(x) }, where x is a fresh name.
So right-associative operators in Scala are treated as methods of their right operand, with their left operand as the parameter (as explained in @Yuval's answer).
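Both meanings (the method is invoked on the right operand, and unparenthesized chains group to the right) can be checked directly. A minimal sketch, with made-up names RightAssocDemo, a, b and c:

```scala
object RightAssocDemo {
  val a = List(1, 2, 3)
  val b = List(4, 5, 6)
  val c = List(7)

  // Infix form and the method call it stands for: ::: is invoked on b.
  val infix: List[Int] = a ::: b
  val explicit: List[Int] = b.:::(a)

  // Without parentheses, the chain groups to the right.
  val chained: List[Int] = a ::: b ::: c
  val grouped: List[Int] = a ::: (b ::: c)

  def main(args: Array[String]): Unit = {
    println(infix)
    println(chained)
  }
}
```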
I am trying to write a simple function that takes a list of pairs of integers (representing a graph) and returns a list of integers: all the nodes in the graph.
E.g. if the input is [(1,2), (3,4), (5,6), (1,5)]
the output should be [1,2,3,4,5,6,1,5]
The function simply returns the list of nodes; values in the returned list may repeat, as above.
I wrote the following function
fun listofnodes ((x:int,y:int)::xs) = if xs=nil then [x::y] else [[x::y]@listofnodes(xs)]
stdIn:15.12-15.18 Error: operator and operand don't agree [tycon mismatch]
operator domain: int * int list
operand: int * int
in expression:
x :: y.
I am not able to figure out what is wrong.
First of all, you should know what each operator does:
:: puts an individual element in front of an existing list, so that: 1::2::3::[] = [1,2,3]
@ puts two lists together, so that: [1,2] @ [3,4] = [1,2,3,4]
You can also use :: with a list on the left, but then the right side has to be a list of lists, and the result is a list of lists:
[1,2] :: [[3,4]] = [[1,2],[3,4]]
So in [x::y], the :: expects y to be a list, but y is an int. That is exactly the mismatch the compiler reports: the operator wants int * int list and got int * int.
Also, you shouldn't use an if expression to check for the end of the list; instead, you can use patterns, like this:
fun listofnodes [] = []
| listofnodes ((x,y)::xs) = x :: y :: listofnodes(xs);
The first pattern handles the end of the list: once the final tuple has been extracted, xs is bound to an empty list, the function calls itself with it, and the result is an empty list onto which all the elements are consed. So [(1,2), (3,4), (5,6), (1,5)] evaluates like this:
1 :: 2 :: 3 :: 4 :: 5 :: 6 :: 1 :: 5 :: [] = [1,2,3,4,5,6,1,5].
you could also make it like this:
fun listofnodes [] = []
| listofnodes ((x,y)::xs) = [x,y] @ listofnodes(xs);
This way you make a small two-element list out of each tuple, and then merge all these small lists into one big list. You don't really need the empty list at the end here, but it's the only way of ensuring that the recursion stops at the end of the list, and you have to put something on the other side of the equals sign. It evaluates like this:
[1,2] @ [3,4] @ [5,6] @ [1,5] @ [] = [1,2,3,4,5,6,1,5].
Also, you annotate your x and y as ints, but you don't really have to. If you don't, the function gets the type " ('a * 'a) list -> 'a list ", which just means that it works for all input types, including ints (as long as the tuple doesn't contain conflicting types, like a char and an int).
I'm guessing you know this, but in case you don't: what you call pairs, such as (1,2), are called tuples.
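For readers more at home in Scala than SML, the two definitions above transliterate almost line for line (:: is the same cons, and ::: plays the role of @). This is only an illustrative translation, not part of the original SML answer; the names GraphNodes, listOfNodes and listOfNodesAppend are invented here.

```scala
object GraphNodes {
  // Cons version: put x and y in front of the result for the rest of the list.
  def listOfNodes[A](edges: List[(A, A)]): List[A] = edges match {
    case Nil            => Nil
    case (x, y) :: rest => x :: y :: listOfNodes(rest)
  }

  // Append version: build a two-element list per tuple and concatenate.
  def listOfNodesAppend[A](edges: List[(A, A)]): List[A] = edges match {
    case Nil            => Nil
    case (x, y) :: rest => List(x, y) ::: listOfNodesAppend(rest)
  }

  def main(args: Array[String]): Unit =
    println(listOfNodes(List((1, 2), (3, 4), (5, 6), (1, 5))))
}
```

The type parameter A mirrors the 'a in the inferred SML type: both components of each tuple must have the same type, but it need not be Int.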
When programming in Scala, I do more and more functional stuff. However, when using infix notation it is hard to tell when you need parentheses and when you don't.
For example the following piece of code:
def caesar(k:Int)(c:Char) = c match {
case c if c isLower => ('a'+((c-'a'+k)%26)).toChar
case c if c isUpper => ('A'+((c-'A'+k)%26)).toChar
case _ => c
}
def encrypt(file:String,k:Int) = (fromFile(file) mkString) map caesar(k)_
The (fromFile(file) mkString) needs parentheses in order to compile. When they are removed, I get the following error:
Caesar.scala:24: error: not found: value map
def encrypt(file:String,k:Int) = fromFile(file) mkString map caesar(k)_
^
one error found
mkString obviously returns a string, on which (by implicit conversion, AFAIK) I can use the map function.
Why does this particular case need parentheses? Is there a general guideline on when and why you need them?
This is what I put together for myself after reading the spec:
Any method which takes a single parameter can be used as an infix operator: a.m(b) can be written a m b.
Any method which does not require a parameter can be used as a postfix operator: a.m can be written a m.
For instance a.##(b) can be written a ## b and a.! can be written a!
Postfix operators have lower precedence than infix operators, so foo bar baz means foo.bar(baz) while foo bar baz bam means (foo.bar(baz)).bam and foo bar baz bam bim means (foo.bar(baz)).bam(bim).
Also given a parameterless method m of object a, a.m.m is valid but a m m is not as it would parse as exp1 op exp2.
Because there is a version of mkString that takes a single parameter, it will be seen as an infix operator in fromFile(file) mkString map caesar(k)_. There is also a version of mkString that takes no parameter, which can be used as a postfix operator:
scala> List(1,2) mkString
res1: String = 12
scala> List(1,2) mkString "a"
res2: String = 1a2
Sometimes, by adding a dot in the right location, you can get the precedence you need, e.g. fromFile(file).mkString map { }
And all that precedence thing happens before typing and other phases, so even though list mkString map function makes no sense as list.mkString(map).function, this is how it will be parsed.
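The parsing rules above can be checked on a small list without reading a file. The sketch below deliberately avoids postfix notation (which needs a language import in recent Scala versions) and sticks to infix and dot forms; the names InfixParsing, dotted, infix, chained and explicit are just for the example.

```scala
object InfixParsing {
  // a.m(b) and a m b are the same call.
  val dotted: String = List(1, 2).mkString("a")
  val infix: String = List(1, 2) mkString "a"

  // A chain of infix calls groups to the left:
  // (List(1, 2) mkString "a") map (_.toUpper)
  val chained: String = List(1, 2) mkString "a" map (_.toUpper)
  val explicit: String = List(1, 2).mkString("a").map(_.toUpper)

  def main(args: Array[String]): Unit = {
    println(infix)
    println(chained)
  }
}
```

This also shows why the original expression fails: in fromFile(file) mkString map caesar(k)_, the parser takes map as the argument of the one-parameter mkString, exactly as "a" is taken here.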
The Scala reference mentions (6.12.3: Prefix, Infix, and Postfix Operations)
In a sequence of consecutive type infix operations t0 op1 t1 op2 . . .opn tn, all operators op1, . . . , opn must have the same associativity.
If they are all left-associative, the sequence is interpreted as (. . . (t0 op1 t1) op2 . . .) opn tn.
In your case, 'map' isn't a term for the operator 'mkString', so you need grouping (with parentheses around 'fromFile(file) mkString').
Actually, Matt R comments:
It's not really an associativity issue, more that "Postfix operators always have lower precedence than infix operators. E.g. e1 op1 e2 op2 is always equivalent to (e1 op1 e2) op2". (Also from 6.12.3)
huynhjl's answer (upvoted) gives more details, and Mark Bush's answer (also upvoted) points to "A Tour of Scala: Operators" to illustrate that "Any method which takes a single parameter can be used as an infix operator".
Here's a simple rule: never use postfix operators. If you do, put the whole expression ending with the postfix operator inside parentheses.
In fact, starting with Scala 2.10.0, using a postfix operator will generate a warning by default.
For good measure, you might want to move the postfix operator out, and use dot notation for it. For example:
(fromFile(file)).mkString map caesar(k)_
Or, even more simply,
fromFile(file).mkString map caesar(k)_
On the other hand, pay attention to methods where you can supply empty parentheses to turn them into infix:
fromFile(file) mkString () map caesar(k)_
The spec doesn't make it clear, but my experience and experimentation have shown that the Scala compiler will always try to treat method calls using infix notation as infix operators. Even though your use of mkString is postfix, the compiler tries to interpret it as infix and so tries to interpret "map" as its argument. All uses of postfix operators must either be immediately followed by an expression terminator or be written with "dot" notation for the compiler to see them as such.
You can get a hint of this (although it's not spelled out) in A Tour of Scala: Operators.
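Putting the advice from this thread together, here is a sketch of the cipher written with dot notation only. It takes the text as a String instead of reading a file so that it stays self-contained, and it uses an explicit lambda instead of caesar(k)_; both are my simplifications, and the object name CaesarDemo is made up.

```scala
object CaesarDemo {
  // Same logic as the question's caesar, with explicit method calls.
  def caesar(k: Int)(c: Char): Char = c match {
    case c if c.isLower => ('a' + ((c - 'a' + k) % 26)).toChar
    case c if c.isUpper => ('A' + ((c - 'A' + k) % 26)).toChar
    case _              => c
  }

  // Dot notation leaves no doubt about which method receives which argument.
  def encrypt(text: String, k: Int): String = text.map(c => caesar(k)(c))

  def main(args: Array[String]): Unit =
    println(encrypt("abc XYZ", 3))
}
```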