The expression prints itself in unexpected order - scala

When I log a message like this:
val idList = getIdList
log info s"\n\n-------idList: ${idList foreach println}"
It shows me:
1
2
3
4
5
-------idList: ()
That makes sense because foreach returns Unit. But why does it print the list of id first? idList is already evaluated in the previous line (if that's the cause)!
And how can I make it print in the expected order, after idList:?

This is because the log string doesn't evaluate to what you expect. It evaluates to:
\n\n -------idList: ()
However, the members of the list appear in the output stream as a side effect, due to the println call in the string interpolation.
EDIT: since clarification was requested by the OP, what happens is that the output comes from two sources:
${idList foreach println} evaluates to (), since foreach returns Unit (as does println itself).
However, you can see the elements printed out, because when the string interpolation is evaluated, println is being called. And println prints all the elements into the output stream.
In other words:
//line with log.info() reached, starts evaluating string before method call
1 //println from foreach
2 //println from foreach
3 //println from foreach
4 //println from foreach
5 //println from foreach
//string argument log.info() evaluated from interpolation
-------idList: () //log info prints the resultant string
To solve your problem, modify the expression in the interpolated string to actually return the correct string, e.g.:
log info s"\n\n-------idList: ${idList.mkString("\n")}"
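The ordering can be reproduced without a logger. The following is a minimal sketch, assuming two hypothetical helpers (record and logInfo, neither of which is in the original code) that append to a buffer instead of printing, so the sequence of events can be inspected afterwards:

```scala
import scala.collection.mutable.ListBuffer

// Collects events in the order they happen, instead of using a real logger.
val events = ListBuffer.empty[String]

// Hypothetical stand-in for println: records the element and returns Unit.
def record(x: Int): Unit = events += s"side effect: $x"

// Hypothetical stand-in for log.info: records the finished message.
def logInfo(msg: String): Unit = events += s"logged: $msg"

val idList = List(1, 2, 3)

// The argument string is built before logInfo runs, so the side effects come first
// and the interpolated value is the Unit value, rendered as ().
logInfo(s"idList: ${idList.foreach(record)}")

// mkString has no side effect; it returns the text that ends up in the message.
logInfo(s"idList: ${idList.mkString(", ")}")

events.foreach(println)
```

The first three events are the side effects, followed by "logged: idList: ()", and only the mkString version puts the ids inside the logged message.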

Interpolation works in the following way:
evaluate all arguments
substitute their results into resulting string
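The two steps above follow from how the s interpolator is rewritten into an ordinary method call on StringContext. A rough sketch of that equivalence (the val names are only for illustration):

```scala
val idList = List(1, 2, 3)

// What you write:
val sugared = s"idList: ${idList.mkString(", ")}"

// Roughly what it is rewritten to: the embedded expression becomes a plain
// method argument, so it is evaluated before s assembles the final string.
val desugared = StringContext("idList: ", "").s(idList.mkString(", "))

println(sugared)
println(sugared == desugared)
```

Because the embedded expression is just an argument, any side effect it has (such as println) happens before the resulting string exists.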

println returns Unit and prints to standard output as a side effect. Use mkString instead, which returns a String:
log info s"\n\n-------idList: ${idList.mkString("(", ", ", ")")}"

As pointed out by @TheTerribleSwiftTomato, you need to give an expression that returns a value and has no side effects. So simply do it like this:
val idList = getIdList
log info s"\n\n-------idList: ${idList mkString " "}"
For example, this works for me:
val idList = List(1, 2, 3, 4, 5)
println(s"\n\n-------idList: ${idList mkString " "}")
Output:
-------idList: 1 2 3 4 5

Related

Reading multiple integers from line in text file

I am using Scala and reading input from the console. I am able to regurgitate the strings that make up each line, but if my input has the following format, how can I access each integer within each line?
2 2
1 2 2
2 1 1
Currently I just regurgitate the input back to the console using
object Main {
def main(args: Array[String]): Unit = {
for (ln <- io.Source.stdin.getLines) println(ln)
//how can I access each individual number within each line?
}
}
And I need to compile this project like so:
$ scalac main.scala
$ scala Main <input01.txt
2 2
1 2 2
2 1 1
A reasonable algorithm would be:
for each line, split it into words
parse each word into an Int
An implementation of that algorithm:
io.Source.stdin.getLines // for each line...
.flatMap(
_.split("""\s+""") // split it into words
.map(_.toInt) // parse each word into an Int
)
The result of this expression will be an Iterator[Int]; if you want a Seq, you can call toSeq on that Iterator (if there's a reasonable chance there will be more than 7 or so integers, it's probably worth calling toVector instead). It will blow up with a NumberFormatException if there's a word which isn't an integer. You can handle this a few different ways... if you want to ignore words that aren't integers, you can:
import scala.util.Try
io.Source.stdin.getLines
.flatMap(
_.split("""\s+""")
.flatMap(word => Try(word.toInt).toOption) // skip words that aren't integers
)
The following will give you a flat sequence of numbers (an Iterator[Int]).
val integers = (
for {
line <- io.Source.stdin.getLines
number <- line.split("""\s+""").map(_.toInt)
} yield number
)
As you can read here, some care must be taken when parsing the numbers.
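One case worth handling: a line with leading whitespace produces an empty first word when split on \s+, and "".toInt throws a NumberFormatException. A minimal sketch, using a made-up input string in place of stdin and a hypothetical helper named parseInts:

```scala
import scala.util.Try

// Sample input held in a string instead of stdin, purely for illustration.
val input = "2 2\n  1 2 2\n2 x 1"

// Hypothetical helper: returns the integers on one line, skipping anything else.
def parseInts(line: String): List[Int] =
  line.trim                                    // drop leading/trailing whitespace
    .split("""\s+""")                          // split into words
    .toList
    .flatMap(word => Try(word.toInt).toOption) // empty or non-numeric words become None

val numbers: List[Int] = input.split("\n").toList.flatMap(parseInts)

println(numbers)
```

Here the non-numeric word x is silently dropped. If bad input should be reported instead, keep the Try values and inspect the failures rather than calling toOption.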

Perl's foreach with string argument

Perldoc only says that a foreach loop "iterates over a normal list value" (https://perldoc.perl.org/perlsyn.html#Foreach-Loops), but I sometimes see it used with string arguments, as in the following examples:
foreach (`curl example.com 2>/dev/null`) {
# iterates 50 times
}
foreach ("foo\nbar\nbaz") {
# iterates just 1 time. Why?
}
Is the behavior of passing a string like this defined? Separately, why the disparate results from passing the string returned by a backticked command, and a literal string, as in the example?
In scalar context, backticks return a single scalar containing all the output of the enclosed command. But foreach (...) evaluates the backticks in list context, which separates the output into a list with one line per element.
The question revolves around the context, a critical concept for many things in Perl.
The foreach loop needs a list to iterate over, so it imposes the list context to build the list values you saw mentioned in docs. The list may be formed with literals, qw(a b c), and may have one element; this is your second example, where one string is given, forming the one-element list that is iterated over.
The list can also come from an expression, that is evaluated in the list context; this is your first example. Many operations yield different returns based on context, and qx is such an operator as explained in mob's answer. This is something to note and be careful with. An expression may also return a single value regardless of context; then it is simply used to populate the list.
From perldoc -f qx:
In list context, returns a list of lines (however you've defined lines with $/ or $INPUT_RECORD_SEPARATOR), or an empty list if the command failed.
From perldoc perlsyn:
Compound statements
[...]
LABEL foreach VAR (LIST) BLOCK
A string is not a list. If you want to iterate over the characters in a string, you'll need to split it first:
foreach my $character (split('', "foo\nbar\nbaz")) { print "$character\n" }
I think you might be confusing Perl with Python:
>>> for c in "foo\nbar\nbaz":
... print c
...
f
o
.... remainder deleted ....
a
z
>>>
As the other answers point out, backticks/qx{} return a list of output lines from the executed command in list context.

Can't print RDD content with take() action

When I try to print RDD content with the first() action, I can print it with a foreach loop. But with the take() action, it doesn't print the content.
Using first():
myRDD.first().foreach(println)
1
2013-07-25 00:00:00.0
11599
CLOSED
Using take():
myRDD.take(5).foreach(println)
[Ljava.lang.String;@23a5818e
[Ljava.lang.String;@4715ae33
[Ljava.lang.String;@9fc9f91
[Ljava.lang.String;@1fac1d5c
[Ljava.lang.String;@108a46d6
I expected the same kind of output as with first(), but I get different output.
I assume your RDD is of type org.apache.spark.rdd.RDD[Array[String]]. In that case the return type of the first method is Array[String] and the foreach(println) prints the elements of the first string array in the RDD.
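But the return type of take(5) is Array[Array[String]], so foreach(println) prints each of the 5 inner arrays. Printing an array uses Java's default toString, which shows the type and a hash code rather than the contents.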
To get the same output for first and take(5) either use
println(myRDD.first())
myRDD.take(5).foreach(println)
or
myRDD.first().foreach(println)
myRDD.take(5).foreach(_.foreach(println))
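The same behaviour can be seen without Spark. This is a minimal sketch in which rows is a plain-Scala stand-in for what take would return; the row values are made up for illustration, and mkString is one possible way to get a single readable line per row:

```scala
// Stand-in for the Array[Array[String]] returned by take; the values are invented.
val rows: Array[Array[String]] = Array(
  Array("1", "2013-07-25 00:00:00.0", "11599", "CLOSED"),
  Array("2", "2013-07-25 00:00:00.0", "256", "PENDING_PAYMENT")
)

// println on an array uses Java's default toString: type name, '@', hash code.
rows.foreach(println)

// Rendering each inner array with mkString gives one readable line per row.
val rendered: Array[String] = rows.map(_.mkString(","))
rendered.foreach(println)
```

With an actual RDD, the equivalent would be myRDD.take(5).map(_.mkString(",")).foreach(println).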

Invoking print() with list.foreach in Scala is printing Nil

I am new to Scala and learning the language constructs. Using print() with list.foreach() also prints Nil or "()" in the console. Is this expected, or am I missing something here?
Code Snippet:
val oneTwo = "one"::"two"::Nil
println(oneTwo.foreach(s=> print(s+" ")))
o/p: one two ()
You have an extra println.
oneTwo.foreach(s=> print(s+" "))
Prints the contents of the list - "one two".
The println you have outside prints out the return value of the foreach call, which is Unit (not Nil - that's a completely different beast), represented in Scala as ().
To just output the list elements,
oneTwo.foreach(s=> print(s+" "))
would suffice. Now you put another print around that, so you say "and then print whatever oneTwo.foreach(s=> print(s+" ")) evaluates to".
The return type of foreach is Unit, so it returns the only value of that type, the unit value ().
So what you see is the list elements printed by the print in the foreach, and then the outer print prints the result of the foreach. Does that make sense?
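Binding the result of foreach to a name makes the two steps visible. A small sketch (the names result and joined are only for illustration):

```scala
val oneTwo = "one" :: "two" :: Nil

// foreach runs print for each element and then returns the unit value.
val result: Unit = oneTwo.foreach(s => print(s + " "))

// Printing that result is what adds "()" after the elements.
println(result)

// To get the text itself as a value, build a String instead.
val joined: String = oneTwo.mkString(" ")
println(joined)
```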

Spark / Scala Split

I have this code:
rdd.map(_.split("-")).filter(row => { ... })
when I do row.length on:
This-is-a-test----on-split--
This-is-a-test-------
the output is 9 and 4 respectively. Trailing empty tokens are not counted. What is the workaround here if I want both outputs to be 10?
You can accomplish what you want by passing -1 as limit parameter to split like this:
rdd.map(_.split("-", -1)).filter(row => { ... })
Btw, the expected result is 11, not 10: if you keep empty tokens and your string ends with the delimiter, there is one more empty token after that final delimiter. You can see this for more information.
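The counts can be checked on plain strings, without an RDD. A quick sketch using the two sample lines from the question (the val names are only for illustration):

```scala
val a = "This-is-a-test----on-split--"
val b = "This-is-a-test-------"

// With no limit, split drops trailing empty strings.
val defaultCounts = (a.split("-").length, b.split("-").length)

// A negative limit keeps them, including the one after the final delimiter.
val keptCounts = (a.split("-", -1).length, b.split("-", -1).length)

println(defaultCounts) // (9,4)
println(keptCounts)    // (11,11)
```

Each string contains 10 delimiters, which is why keeping every token yields 11 in both cases.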