Reading multiple integers from line in text file - scala

I am using Scala and reading input from the console. I am able to regurgitate the strings that make up each line, but if my input has the following format, how can I access each integer within each line?
2 2
1 2 2
2 1 1
Currently I just regurgitate the input back to the console using
object Main {
  def main(args: Array[String]): Unit = {
    for (ln <- io.Source.stdin.getLines) println(ln)
    // how can I access each individual number within each line?
  }
}
And I need to compile this project like so:
$ scalac main.scala
$ scala Main <input01.txt
2 2
1 2 2
2 1 1

A reasonable algorithm would be:
for each line, split it into words
parse each word into an Int
An implementation of that algorithm:
io.Source.stdin.getLines      // for each line...
  .flatMap(
    _.split("""\s+""")        // split it into words
      .map(_.toInt)           // parse each word into an Int
  )
The result of this expression is an Iterator[Int]. If you want a Seq, you can call toSeq on that Iterator (if there's a reasonable chance there will be more than 7 or so integers, it's probably worth calling toVector instead). It will throw a NumberFormatException if any word isn't an integer, including the empty string that split produces for a blank line or for leading whitespace. You can handle this a few different ways; if you want to ignore words that aren't integers, you can:
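import scala.util.Try
io.Source.stdin.getLines
  .flatMap(
    _.split("""\s+""")
      // Try(_.toInt) would wrap a function rather than the parsed value, so name the argument
      .flatMap(word => Try(word.toInt).toOption)
  )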
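If you need to keep the numbers grouped by line (for instance, to treat the first line differently from the rest), a minimal sketch along the same lines might look like the following. The names ParseLines and parseLine and the hard-coded sample input are assumptions made for illustration; in the real program the lines would come from io.Source.stdin.getLines.

```scala
import scala.util.Try

object ParseLines {
  // Parse one line into its integers, dropping any token that is not a valid Int
  // (this also drops the empty token produced by blank lines or leading whitespace).
  def parseLine(line: String): Vector[Int] =
    line.trim.split("\\s+").toVector.flatMap(word => Try(word.toInt).toOption)

  def main(args: Array[String]): Unit = {
    // Stand-in for stdin: the sample input from the question.
    val sample = "2 2\n1 2 2\n2 1 1"
    val rows: Vector[Vector[Int]] = sample.split("\n").iterator.map(parseLine).toVector
    rows.foreach(row => println(row.mkString(" ")))
  }
}
```

Each element of rows is then one line's integers, so rows(1)(2) would be the third number on the second line.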

The following will give you a flat list of numbers.
val integers = (
  for {
    line <- io.Source.stdin.getLines
    number <- line.split("""\s+""").map(_.toInt)
  } yield number
)
As you can read here, some care must be taken when parsing the numbers.

Related

String interpolation of variable value

I want the variable value to be processed by string interpolation.
val temp = "1 to 10 by 2"
println(s"$temp")
output expected:
inexact Range 1 to 10 by 2
but getting
1 to 10 by 2
Is there any way to get this done?
EDIT
The normal case for using StringContext is:
$> s"${1 to 10 by 2}"
inexact Range 1 to 10 by 2
This returns the Range from 1 to 10 with a step of 2.
String interpolation won't evaluate the contents of a variable as code, so is there a way I can do something like
$> val temp = "1 to 10 by 2"
$> s"${$temp}" //hypothetical
such that the interpreter will evaluate this as
s"${$temp}" => s"${1 to 10 by 2}" => Range from 1 to 10 by step of 2 = {1,3,5,7,9}
By setting a string value to temp you are doing just that - creating a flat String. If you want this to be actual code, then you need to drop the quotes:
val temp = 1 to 10 by 2
Then you can print the results:
println(s"$temp")
This will print the following output string:
inexact Range 1 to 10 by 2
This is the toString(...) output of a variable representing a Range. If you want to print the actual results of the 1 to 10 by 2 computation, you need to do something like this:
val resultsAsString = temp.mkString(",")
println(resultsAsString)
> 1,3,5,7,9
or even this (watch out: here the curly brackets { } are used not for string interpolation but simply as normal string characters):
println(s"{$resultsAsString}")
> {1,3,5,7,9}
Edit
If what you want is to actually interpret/compile Scala code on the fly (not recommended though - for security reasons, among others), then you may be interested in this:
https://ammonite.io/ - Ammonite, Scala scripting
In any case, to interpret your code from a String, you may try using this:
https://docs.scala-lang.org/overviews/repl/embedding.html
See these lines:
import javax.script.ScriptEngineManager
val scripter = new ScriptEngineManager().getEngineByName("scala")
scripter.eval("""println("hello, world")""")
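If the strings always follow the shape "start to end" or "start to end by step", evaluating arbitrary code may be unnecessary: the text can be parsed directly into a Range. The following is only a sketch under that assumption; RangeFromString, parse, and the accepted grammar are hypothetical and not part of any library.

```scala
object RangeFromString {
  // Accepts "<start> to <end>" with an optional " by <step>".
  // Digits are capped at 9 so that toInt cannot overflow.
  private val RangePattern =
    """\s*(-?\d{1,9})\s+to\s+(-?\d{1,9})(?:\s+by\s+(-?\d{1,9}))?\s*""".r

  def parse(text: String): Option[Range] = text match {
    // The optional group is null when " by <step>" is absent.
    case RangePattern(start, end, null) =>
      Some(start.toInt to end.toInt)
    // A step of zero is rejected because Range does not allow it.
    case RangePattern(start, end, step) if step.toInt != 0 =>
      Some(start.toInt to end.toInt by step.toInt)
    case _ =>
      None
  }
}
```

With this, RangeFromString.parse("1 to 10 by 2").map(_.mkString(",")) yields Some("1,3,5,7,9"), while anything that does not match the pattern yields None instead of being executed.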

print format adjust decimal point

I would like to print a lot of numbers between -1 and 1 and need them to be aligned by the decimal point.
What I get with %2.2f is:
val (a, b) = (0.38, -0.38); println (f"${a}%2.2f\n${b}%2.2f ")
0,38
-0,38
What I would like to get is:
 0,38
-0,38
Is there an elegant solution?
What you can do is add the -+ flags before the width, like so:
scala> val (a, b) = (0.38, -0.38); println (f"${a}%-+2.2f\n${b}%-+2.2f")
+0.38
-0.38
a: Double = 0.38
b: Double = -0.38
You will get the + before the number though.
EDIT:
If you know the maximum width of the numbers (the first number in %n.m is the minimum total width of the output, including the sign and the decimal point), you can do this:
scala> printf("%5.2f", a);
 0.38
scala> printf("%5.2f", b);
-0.38
Although there is already an accepted answer, I'll add one more for future reference. Scala's f"" string interpolator actually uses the Java formatting infrastructure, and in the Java documentation you may find the following flag:
' ' '\u0020' Requires the output to include a single extra space ('\u0020') for non-negative values.
So you might actually want to use it. Here is an example that shows the difference:
val arr = Array(0.38, -0.38, 10.38, -10.38, 123.38, -123.38)
println("Without space:")
arr.foreach(a => println(f"${a}%6.2f"))
println("----------------")
println("With space:")
arr.foreach(a => println(f"${a}% 6.2f"))
which produces the following output:
Without space:
  0,38
 -0,38
 10,38
-10,38
123,38
-123,38
----------------
With space:
  0,38
 -0,38
 10,38
-10,38
 123,38
-123,38
Note the difference for 123.38/-123.38, i.e. the case where the number is wider than the field (an "overflow"): with the space flag the positive and negative values still line up.
The solution is trivial: the first number does not indicate the digits before the dot but the total width, and a width that is too short does not produce an error message. So for 2 digits after the dot, plus the dot, plus one digit in front and an optional minus sign, I need a width of 5 in total, and then it works:
val (a, b) = (0.38, -0.38); println (f"${a}%5.2f\n${b}%5.2f ")
 0,38
-0,38
And no, a plus sign is not an option.
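The outputs in this thread show both 0,38 and 0.38 because the decimal separator depends on the default locale. As a small sketch of the width-5 approach with the locale pinned, the following uses formatLocal with Locale.ROOT so the result always contains a dot; AlignDecimals and aligned are names invented for this example.

```scala
import java.util.Locale

object AlignDecimals {
  // Width 5 = optional minus sign + one leading digit + decimal point + two decimals.
  // Locale.ROOT fixes the decimal separator to '.' regardless of the system locale.
  def aligned(x: Double): String = "%5.2f".formatLocal(Locale.ROOT, x)

  def main(args: Array[String]): Unit =
    Seq(0.38, -0.38).foreach(x => println(aligned(x)))
}
```

Values outside the range -1 to 1 with more digits would need a correspondingly larger width to stay aligned.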

How to extract number from string column?

My requirement is to retrieve the order number from the comment column; the order number always starts with R. It should be added to the table as a new column.
Input data:
code,id,mode,location,status,comment
AS-SD,101,Airways,hyderabad,D,order got delayed R1657
FY-YT,102,Airways,Delhi,ND,R7856 package damaged
TY-OP,103,Airways,Pune,D,Order number R5463 not received
Expected output:
AS-SD,101,Airways,hyderabad,D,order got delayed R1657,R1657
FY-YT,102,Airways,Delhi,ND,R7856 package damaged,R7856
TY-OP,103,Airways,Pune,D,Order number R5463 not received,R5463
I have tried it in spark-sql, the query I am using is given below:
val r = sqlContext.sql("select substring(comment, PatIndex('%[0-9]%',comment, length(comment))) as number from A")
However, I'm getting the following error:
org.apache.spark.sql.AnalysisException: undefined function PatIndex; line 0 pos 0
You can use regexp_extract, which has the definition:
def regexp_extract(e: Column, exp: String, groupIdx: Int): Column
(R\\d{4}) means R followed by 4 digits. You can easily accommodate any other case by using a suitable regex:
df.withColumn("orderId", regexp_extract($"comment", "(R\\d{4})" , 1 )).show
+-----+---+-------+---------+------+--------------------+-------+
| code| id|   mode| location|status|             comment|orderId|
+-----+---+-------+---------+------+--------------------+-------+
|AS-SD|101|Airways|hyderabad|     D|order got delayed...|  R1657|
|FY-YT|102|Airways|    Delhi|    ND|R7856 package dam...|  R7856|
|TY-OP|103|Airways|     Pune|     D|Order number R546...|  R5463|
+-----+---+-------+---------+------+--------------------+-------+
You can use a udf function as follows:
import org.apache.spark.sql.functions._
def extractString = udf((comment: String) => comment.split(" ").filter(_.startsWith("R")).head)
df.withColumn("newColumn", extractString($"comment")).show(false)
Here the comment column is split on spaces and filtered for the words that start with R; head takes the first such word (and throws an exception if there is none).
Updated
To ensure that the returned string is an order number, i.e. it starts with R and the rest of it is numeric, you can add an additional filter:
import scala.util.Try
def extractString = udf((comment: String) => comment.split(" ").filter(x => x.startsWith("R") && Try(x.substring(1).toDouble).isSuccess).head)
You can edit the filter according to your need.
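Because head throws when a comment contains no matching word, it may be worth modelling the missing case explicitly. The following plain-Scala sketch (no Spark involved) shows the matching logic on its own; OrderNumber, extract, and the pattern "R followed by one or more digits" are assumptions made for this example.

```scala
object OrderNumber {
  // "R" followed by one or more digits, delimited by word boundaries.
  private val OrderPattern = """\bR\d+\b""".r

  // Returns the first order number in the comment, or None when the comment
  // is null or contains no order number.
  def extract(comment: String): Option[String] =
    Option(comment).flatMap(c => OrderPattern.findFirstIn(c))
}
```

For the sample rows, OrderNumber.extract("Order number R5463 not received") gives Some("R5463"), and a comment without an order number gives None rather than an exception.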

Count filtered records in scala

As I am new to Scala, this problem might look very basic to everyone.
I have a file called data.txt which contains the following:
xxx.lss.yyy23.com-->mailuogwprd23.lss.com,Hub,12689,14.98904563,1549
xxx.lss.yyy33.com-->mailusrhubprd33.lss.com,Outbound,72996,1.673717588,1949
xxx.lss.yyy33.com-->mailuogwprd33.lss.com,Hub,12133,14.9381027,664
xxx.lss.yyy53.com-->mailusrhubprd53.lss.com,Outbound,72996,1.673717588,3071
I want to split each line and find records based on the number in the host name (e.g. xxx.lss.yyy23.com):
val data = io.Source.fromFile("data.txt").getLines().map { x => (x.split("-->"))}.map { r => r(0) }.mkString("\n")
which gives me
xxx.lss.yyy23.com
xxx.lss.yyy33.com
xxx.lss.yyy33.com
xxx.lss.yyy53.com
This is how I am trying to count the matching records:
data.count { x => x.contains("33")}
How do I get the count of records that do not contain 33?
The following will give you the number of lines that contain "33":
data.split("\n").count(a => a.contains("33"))
What you have above isn't working because data is a single String: your previous statement concatenates the results with mkString, using a newline as the separator, so count iterates over individual characters rather than lines. You need to split data into an array of strings again before running collection operations like count on it.
The following will work for getting the lines that do not contain "33":
data.split("\n").count(a => !a.contains("33"))
You simply need to negate the contains operation in this case.
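An alternative is to skip mkString altogether and keep the host names as a collection, so that count and partition work per record. This is a minimal sketch with the sample rows inlined; CountHosts and hosts are names made up for the example, and in the real program the rows would come from io.Source.fromFile("data.txt").getLines().

```scala
object CountHosts {
  // Sample rows copied from data.txt in the question.
  val lines: Seq[String] = Seq(
    "xxx.lss.yyy23.com-->mailuogwprd23.lss.com,Hub,12689,14.98904563,1549",
    "xxx.lss.yyy33.com-->mailusrhubprd33.lss.com,Outbound,72996,1.673717588,1949",
    "xxx.lss.yyy33.com-->mailuogwprd33.lss.com,Hub,12133,14.9381027,664",
    "xxx.lss.yyy53.com-->mailusrhubprd53.lss.com,Outbound,72996,1.673717588,3071"
  )

  // Keep the part before "-->" for each row, as a collection of strings.
  def hosts(rows: Seq[String]): Seq[String] = rows.map(_.split("-->")(0))

  def main(args: Array[String]): Unit = {
    // partition splits the hosts into those that match and those that do not.
    val (with33, without33) = hosts(lines).partition(_.contains("33"))
    println(s"with 33: ${with33.size}, without 33: ${without33.size}")
  }
}
```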

Need the best way to iterate a file returning batches of lines as XML

I'm looking for the best way to process a file in which, based on the contents, I combine certain lines into XML and return the XML.
e.g. Given
line 1
line 2
line 3
line 4
line 5
I may want the first call to return
<msg>line 1, line 2</msg>
and a subsequent call to return
<msg>line 5, line 4</msg>
skipping line 3 for uninteresting content and exhausting the input stream. (Note: the <msg> tags will always contain contiguous lines but the number and organization of those lines in the XML will vary.) If you'd like some criteria for choosing lines to include in a message, assume odd line #s combine with the following four lines, even line #s combine with the following two lines, mod(10) line #s combine with the following five lines, skip lines that start with '#'.
I was thinking I should implement this as an iterator so I can just do
<root>{ for (m <- messages(inputstream)) yield m }</root>
Is that reasonable? If so, how best to implement it? If not, how best to implement it? :)
Thanks
This answer provided my solution: How do you return an Iterator in Scala?
I tried the following but there appears to be some sort of buffer issue and lines are skipped between calls to Log.next.
class Log(filename: String) {
  val src = io.Source.fromFile(filename)
  var node: Node = null
  def iterator = new Iterator[Node] {
    def hasNext: Boolean = {
      for (line <- src.getLines()) {
        // ... do stuff ...
        if (null != node) return true
      }
      src.close()
      false
    }
    def next = node
  }
}
There might be a more Scala-like way to do it, and I'd like to see it, but this is my solution to move forward for now.
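One possible direction is to obtain the line iterator once and build the batches by transforming it, so no state is shared between calls to hasNext and next. The sketch below makes several assumptions: the batching rule (drop lines starting with '#', then pair up the rest) is a simplified stand-in for the real criteria, the names Messages, batches, and toMsg are invented, and the messages are built as plain strings without XML escaping rather than as scala.xml nodes.

```scala
object Messages {
  // Hypothetical batching rule: skip lines starting with '#',
  // then group the remaining lines two at a time. Evaluation stays lazy.
  def batches(lines: Iterator[String]): Iterator[Seq[String]] =
    lines.filterNot(_.startsWith("#")).grouped(2)

  // Render one batch as a <msg> element (as a String, with no escaping).
  def toMsg(batch: Seq[String]): String =
    batch.mkString("<msg>", ", ", "</msg>")

  def main(args: Array[String]): Unit = {
    val input = Iterator("line 1", "line 2", "# comment", "line 3")
    batches(input).map(toMsg).foreach(println)
  }
}
```

With a file, the input would be the result of a single call to getLines(), and the source would be closed once the resulting iterator is exhausted.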