I'm new to Scala and I cannot find out what is causing this error, I have searched similar topics but unfortunately, none of them worked for me. I've got a simple code to find the line from some README.md file with the most words in it. The code I wrote is:
val readme = sc.textFile("/PATH/TO/README.md")
readme.map(lambda line :len(line.split())).reduce(lambda a, b: a if (a > b) else b)
and the error is:
Name: Compile Error
Message: <console>:1: error: ')' expected but '(' found.
readme.map(lambda line :len(line.split()) ).reduce( lambda a, b: a
if (a > b) else b ) ^
<console>:1: error: ';' expected but ')' found.
readme.map(lambda line :len(line.split()) ).reduce( lambda a, b: a
if (a > b) else b ) ^
Your code isn't valid Scala.
I think what you might be trying to do is to determine the largest number of words on a single line in a README file using Spark. Is that right? If so, then you likely want something like this:
val readme = sc.textFile("/PATH/TO/README.md")
readme.map(_.split(' ').length).reduce(Math.max)
That last line uses some argument abbreviations. This alternative version is equivalent, but a little more explicit:
readme.map(line => line.split(' ').length).reduce((a, b) => Math.max(a, b))
The map function converts an RDD of Strings (each line in the file) into an RDD of Ints (the number of words on a single line, delimited - in this particular case - by spaces). The reduce function then returns the largest value of its two arguments - which will ultimately result in a single Int value representing the largest number of elements on a single line of the file.
After re-reading your question, it seems that you might want to know the line with the most words, rather than how many words are present. That's a little trickier, but this should do the trick:
readme.map(line => (line.split(' ').length, line)).reduce((a, b) => if(a._1 > b._1) a else b)._2
Now map creates an RDD of a tuple of (Int, String), where the first value is the number of words on the line, and the second is the line itself. reduce then retains whichever of its two tuple arguments has the larger integer value (._1 refers to the first element of the tuple). Since the result is a tuple, we then use ._2 to retrieve the corresponding line (the second element of the tuple).
I'd recommend you read a good book on Scala, such as Programming in Scala, 3rd Edition, by Odersky, Spoon & Venners. There's also some tutorials and an overview of the language on the main Scala language site. Coursera also has some free Scala training courses that you might want to sign up for.
Related
The Q Tips book (Nick Psaris) shows the
following function (Chapter 10):
q)merge:`time xdesc upsert
As it is stated, it corresponds to function composition. I see the pattern: the
expression supplies a function that takes both arguments for upsert and then
uses its result to feed time xdesc. However the syntax feels weird, since
I would expect upsert to be the second argument of the xdesc invocation.
Aiming at simplifying the expression, I could see that the very same scenario
applies here:
q)f:1+*
q)f[2;3]
7
If we show its result, we can clearly see that f behaves as expected:
q)f
+[1]*
However, If we slightly modify the function, the meaning of the expression is
completely different:
q)g:+[1;]*
q)g[2;3]
'rank
[0] g[2;3]
^
In fact, +[1;] is passed as first argument to the * operator instead,
leading us to a rank error:
q)g
*[+[1;]]
I could also notice that the pattern breaks when the first function is
"monadic":
q)h:neg *
q)h[2;3]
'rank
[0] h[2;3]
^
Also here:
q)i:neg neg
'type
[0] i:neg neg
^
At this point, my intuition is that this pattern only applies when we are
interested on composing dyadic standard (vs user-defined) operators that exploit infix
notation. Am I getting it right? Is this syntactic sugar actually more general? Is there any
documentation where the pattern is fully described? Thanks!
There are some documented ways to achieve what you wish:
https://code.kx.com/q/ref/apply/#composition
You can create a unary train using #
q)r:neg neg#
q)r 1
1
https://code.kx.com/q/ref/compose/
You can use ' to compose a unary value with another of rank >=1
q)f:('[1+;*])
q)f[2;3]
7
Likely the behaviour you are seeing is not officially there to be exploited by users in q so should not be relied upon. This link may be of interest:
https://github.com/quintanar401/DCoQ
From KX: https://code.kx.com/q/ref/value/ says, when x is a list, value[x] will be result of evaluating list as a parse tree.
Q1. In code below, I understand (A) is a parse tree, given below definition. However, why does (B) also work? Is ("+";3;4) a valid parse tree?
q)value(+;3;4) / A
7
q)value("+";3;4) / B
7
q)eval(+;3;4) / C
7
q)eval("+";3;4) / D
'length
[0] eval("+";3;4)
Any other parse tree takes a form of a list, of which the first item
is a function and the remaining items are its arguments. Any of these
items can be parse trees. https://code.kx.com/q/basics/parsetrees/
Q2. In below code, value failed to return the result of what I think is a valid parse tree, but eval works fine, recursively evaluating the tree. Does this mean the topmost description is wrong?
q)value(+;3;(+;4;5))
'type
[0] value(+;3;(+;4;5))
^
q)eval(+;3;(+;4;5))
12
Q3. In general then, how do we choose whether to use value or eval?
put simply the difference between eval and value is that eval is specifically designed to evaluate parse trees, whereas value works on parse trees among other operations it does. For example value can be used to see the non-keyed values of dictionaries, or value strings, such as:
q)value"3+4"
7
Putting this string instead into the eval, we simply get the string back:
q)eval"3+4"
"3+4"
1 Following this, the first part of your question isn't too bad to answer. The format ("+";3;4) is not technically the parsed form of 3+4, we can see this through:
q)parse"3+4"
+
3
4
The good thing about value in this case is that it is valuing the string "+" into a the operator + and then valuing executing the parse tree. eval cannot understand the string "+" as this it outside the scope of the function. Which is why A, B and C work but not D.
2 In part two, your parse tree is indeed correct and once again we can see this with the parse function:
q)parse"3+(4+5)"
+
3
(+;4;5)
eval can always be used if your parse tree represents a valid statement to get the result you want. value will not work on all parse tree's only "simple" ones. So the nested list statement you have here cannot be evaluated by value.
3 In general eval is probably the best function of choice for evaluating your parse trees if you know them to be the correct parse tree format, as it can properly evaluate your statements, even if they are nested.
This question already has answers here:
toList on Range with suffix notation causes type mismatch
(2 answers)
Closed 5 years ago.
I am learning postfix unary operators in Scala.
The following can not compile:
val result = 43 toString
println(result)
However if I add one empty line inbetween the two lines, the code compile and produces right output:
val result = 43 toString
println(result)
What is the difference between these two segments?
BTW, I did not add "import scala.language.postfixOps".
Perhaps the issue is clearer if we use some other operator instead of toString.
// This parses as `List(1,2,3,4) ++ List(4,5,6)`
List(1,2,3,4) ++
List(4,5,6)
Basically, in order to make the above work, while also allowing things like foo ? (a postfix operator), Scala needs to know when it is OK to stop expecting a second argument (and accept that the expression is a postfix operator).
Its solution is give up on finding a second argument if there is an interceding new line.
I was trying to split a string and keep the empty strings. Fortunately i found a promising solution which gave me my expected results as following REPL session depicts:
scala> val test = ";;".split(";",-1)
test: Array[String] = Array("", "", "")
I was curious what the second parameter actually does and dived into the scala documentation but found nothing except this:
Also inside the REPL interpreter i only get the following information:
scala> "asdf".split
TAB
def split(String): Array[String]
def split(String, Int): Array[String]
Question
Does anybody have an alternate source of documentation for such badly documented parameters?
Or can someone explain what this 2dn parameter does on this specific function?
This is the same split from java.lang.String, which as it so happens, has better documentation:
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
I'm learning Scala and lift at the same time and I got stuck on understanding the syntax used to inintialize the SiteMap in the Boot.scala:
val entries = Menu(Loc("Home", "/", "Home")) ::
Menu(Loc("Foo", "/badger", "Foo")) ::
Menu(Loc("Directory Foo", "/something/foo", "Directory Foo")) :: Nil
LiftRules.setSiteMap(SiteMap(entries:_*))
What exactly is the meaning of the SiteMap parameter?
I see that the value entries is a list of Menu. What is the colon, underscore, star?
At first I thought it is a method on the List, but I am unable to find such definition...
OK, after my colleague mentioned to me, that he encountered this secret incantation in the Programming in Scala book, I did a search in my copy and found it described in Section 8.8 Repeated parameters. (Though you need to search with space between the colon and underscore :-/ ) There is a one sentence to explain it as:
... append the array argument with a colon and an _* symbol, like this:
scala> echo(arr: _*)
This notation tells the compiler to pass each element of arr as its own argument to echo, rather than all of it as a single argument.
I find the description offered here more helpful.
So x: _* is like a type declaration that tells the compiler to treat x as repeated parameter (aka variable-length argument list — vararg).