In Scala, how to stop reading lines from a file as soon as a criterion is accomplished? - scala

Reading lines in a foreach loop, a function looks for a value by a key in a CSV-like structured text file. After a specific line is found, it is senseless to continue reading lines looking for something there. How to stop as there is no break statement in Scala?

Scala's Source class is lazy. You can read chars or lines using takeWhile or dropWhile and the iteration over the input need not proceed farther than required.

To expand on Randall's answer. For instance if the key is in the first column:
val src = Source.fromFile("/etc/passwd")
val iter = src.getLines().map(_.split(":"))
// print the uid for Guest
iter.find(_(0) == "Guest") foreach (a => println(a(2)))
// the rest of iter is not processed
src.close()

Previous answers assumed that you want to read lines from a file, I assume that you want a way to break for-loop by demand.
Here is solution
You can do like this:
breakable {
for (...) {
if (...) break
}
}

Related

find line number in an unstructured file in scala

Hi guys I am parsing an unstructured file for some key words but i can't seem to easily find the line number of what the results I am getiing
val filePath:String = "myfile"
val myfile = sc.textFile(filePath);
var ora_temp = myfile.filter(line => line.contains("MyPattern")).collect
ora_temp.length
However, I not only want to find the lines that contains MyPatterns but I want more like a tupple (Mypattern line, line number)
Thanks in advance,
You can use ZipWithIndex as eliasah pointed out in a comment (with probably the most succinct way to do this using the direct tuple accessor syntax), or like so using pattern matching in the filter:
val matchingLineAndLineNumberTuples = sc.textFile("myfile").zipWithIndex().filter({
case (line, lineNumber) => line.contains("MyPattern")
}).collect

Count filtered records in scala

As I am new to scala ,This problem might look very basic to all..
I have a file called data.txt which contains like below:
xxx.lss.yyy23.com-->mailuogwprd23.lss.com,Hub,12689,14.98904563,1549
xxx.lss.yyy33.com-->mailusrhubprd33.lss.com,Outbound,72996,1.673717588,1949
xxx.lss.yyy33.com-->mailuogwprd33.lss.com,Hub,12133,14.9381027,664
xxx.lss.yyy53.com-->mailusrhubprd53.lss.com,Outbound,72996,1.673717588,3071
I want to split the line and find the records depending upon the numbers in xxx.lss.yyy23.com
val data = io.Source.fromFile("data.txt").getLines().map { x => (x.split("-->"))}.map { r => r(0) }.mkString("\n")
which gives me
xxx.lss.yyy23.com
xxx.lss.yyy33.com
xxx.lss.yyy33.com
xxx.lss.yyy53.com
This is what I am trying to count the exact value...
data.count { x => x.contains("33")}
How do I get the count of records who does not contain 33...
The following will give you the number of lines that contain "33":
data.split("\n").count(a => a.contains("33"))
The reason what you have above isn't working is that you need to split data into an array of strings again. Your previous statement actually concatenates the result into a single string using newline as a separator using mkstring, so you can't really run collection operations like count on it.
The following will work for getting the lines that do not contain "33":
data.split("\n").count(a => !a.contains("33"))
You simply need to negate the contains operation in this case.

no dot between functions in map

I have the following code:
object testLines extends App {
val items = Array("""a-b-c d-e-f""","""a-b-c th-i-t""")
val lines = items.map(_.replaceAll("-", "")split("\t"))
print(lines.map(_.mkString(",")).mkString("\n"))
}
By mistake i did not put a dot between replaceAll and split but it worked.
By contrary when putting a dot between replaceAll and split i got an error
identifier expected but ';' found.
Implicit conversions found: items =>
What is going on?
Why does it work without a dot but is not working with a dot.
Update:
It works also with dot. The error message is a bug in the scala ide. The first part of the question is still valid
Thanks,
David
You have just discovered that Operators are methods. x.split(y) can also be written x split y in cases where the method is operator-like and it looks nicer. However there is nothing stopping you putting either side in parentheses like x split (y), (x) split y, or even (x) split (y) which may be necessary (and is a good idea for readability even if not strictly necessary) if you are passing in a more complex expression than a simple variable or constant and need parentheses to override the precedence.
With the example code you've written, it's not a bad idea to do the whole thing in operator style for clarity, using parentheses only where the syntax requires and/or they make groupings more obvious. I'd probably have written it more like this:
object testLines extends App {
val items = Array("a-b-c d-e-f", "a-b-c th-i-t")
val lines = items map (_ replaceAll ("-", "") split "\t")
print(lines map (_ mkString ",") mkString "\n")
}

find whether a string is substring of other string in SML NJ

In SML NJ, I want to find whether a string is substring of another string and find its index. Can any one help me with this?
The Substring.position function is the only one I can find in the basis library that seems to do string search. Unfortunately, the Substring module is kind of hard to use, so I wrote the following function to use it. Just pass two strings, and it will return an option: NONE if not found, or SOME of the index if it is found:
fun index (str, substr) = let
val (pref, suff) = Substring.position substr (Substring.full str)
val (s, i, n) = Substring.base suff
in
if i = size str then
NONE
else
SOME i
end;
Well you have all the substring functions, however if you want to also know the position of it, then the easiest is to do it yourself, with a linear scan.
Basically you want to explode both strings, and then compare the first character of the substring you want to find, with each character of the source string, incrementing a position counter each time you fail. When you find a match you move to the next char in the substring as well without moving the position counter. If the substring is "empty" (modeled when you are left with the empty list) you have matched it all and you can return the position index, however if the matching suddenly fail you have to return back to when you had the first match and skip a letter (incrementing the position counter) and start all over again.
Hope this helps you get started on doing this yourself.

Need the best way to iterate a file returning batches of lines as XML

I'm looking for the best way to process a file in which, based on the contents, i combine certain lines into XML and return the XML.
e.g. Given
line 1
line 2
line 3
line 4
line 5
I may want the first call to return
<msg>line 1, line 2</msg>
and a subsequent call to return
<msg>line 5, line 4</msg>
skipping line 3 for uninteresting content and exhausting the input stream. (Note: the <msg> tags will always contain contiguous lines but the number and organization of those lines in the XML will vary.) If you'd like some criteria for choosing lines to include in a message, assume odd line #s combine with the following four lines, even line #s combine with the following two lines, mod(10) line #s combine with the following five lines, skip lines that start with '#'.
I was thinking I should implement this as an iterator so i can just do
<root>{ for (m <- messages(inputstream)) yield m }</root>
Is that reasonable? If so, how best to implement it? If not, how best to implement it? :)
Thanks
This answer provided my solution: How do you return an Iterator in Scala?
I tried the following but there appears to be some sort of buffer issue and lines are skipped between calls to Log.next.
class Log(filename:String) {
val src = io.Source.fromFile(filename)
var node:Node = null
def iterator = new Iterator[Node] {
def hasNext:Boolean = {
for (line <- src.getLines()) {
// ... do stuff ...
if (null != node) return true
}
src.close()
false
}
def next = node
}
There might be a more Scala-way to do it and i'd like to see it but this is my solution to move forward for now.