Parse a log file with Scala

I am trying to parse a text file. My input file looks like this:
ID: 12343-7888
Name: Mary, Bob, Jason, Jeff, Suzy
Harry, Steve
Larry, George
City: New York, Portland, Dallas, Kansas City
Tampa, Bend
Expected output would be:
"12343-7888"
"Mary, Bob, Jason, Jeff, Suzy, Harry, Steve, Larry, George"
"New York, Portland, Dallas, Kansas City, Tampa, Bend"
Note that the "Name" and "City" fields have new lines or returns in them. I have the code below, but it is not working: the second line of code places each character on its own line. Plus, I am having trouble grabbing only the data from the field, i.e. returning just the actual names without the "Name: " prefix. I am also looking to put quotes around each returned field.
Can you help fix up my problems?
import scala.io.Source
val lines = Source.fromFile("/filesdata/logfile.text").getLines().toList
val record = lines.dropWhile(line => !line.startsWith("Name: ")).takeWhile(line => !line.startsWith("Address: ")).flatMap(_.split(",")).map(_.trim()).filter(_.nonEmpty).mkString(", ")
val finalResults = record.map(s => "\"" + s + "\"").mkString(",\n")
How can I get my results that I am looking for?

SHORT ANSWER
A two-liner that produces a string that looks exactly as you specified:
println(lines.map{line => if(line.trim.matches("[a-zA-Z]+:.*"))
("\"\n\"" + line.split(":")(1).trim) else (", " + line.trim)}.mkString.drop(2) + "\"")
LONG ANSWER
Why try to solve something in one line, if you can achieve the same thing in 94?
(That's the exact opposite of the usual slogan when working with Scala collections, but the input was sufficiently messy that I found it worthwhile to actually write out some of the intermediate steps. Maybe that's just because I've bought a nice new keyboard recently...)
val input = """ID: 12343-7888
Name: Mary, Bob, Jason, Jeff, Suzy
Harry, Steve
Larry, George
City: New York, Portland, Dallas, Kansas City
Tampa, Bend
ID: 567865-676
Name: Alex, Bob
Chris, Dave
Evan, Frank
Gary
City: Los Angeles, St. Petersburg
Washington D.C., Phoenix
"""
case class Entry(id: String, names: List[String], cities: List[String])
def parseMessyInput(input: String): List[Entry] = {
  // just a first rough approximation of the structure of the input
  sealed trait MessyInputLine { def content: String }
  case class IdLine(content: String) extends MessyInputLine
  case class NameLine(content: String) extends MessyInputLine
  case class UnlabeledLine(content: String) extends MessyInputLine
  case class CityLine(content: String) extends MessyInputLine
  val lines = input.split("\n").toList
  // a helper function for checking whether a line starts with a label
  def tryParseLabeledLine
    (label: String, line: String)
    (cons: String => MessyInputLine)
  : Option[MessyInputLine] = {
    if (line.startsWith(label + ":")) {
      Some(cons(line.drop(label.size + 1)))
    } else {
      None
    }
  }
  val messyLines: List[MessyInputLine] = for (line <- lines) yield {
    (
      tryParseLabeledLine("Name", line){NameLine(_)} orElse
      tryParseLabeledLine("City", line){CityLine(_)} orElse
      tryParseLabeledLine("ID", line){IdLine(_)}
    ).getOrElse(UnlabeledLine(line))
  }
  /** Combines the content of the first line with the content
    * of all unlabeled lines, until the next labeled line or
    * the end of the list is hit. Returns the content of
    * the first few lines and the list of the remaining lines.
    */
  def readUntilNextLabel(messyLines: List[MessyInputLine])
  : (List[String], List[MessyInputLine]) = {
    messyLines match {
      case Nil => (Nil, Nil)
      case h :: t => {
        val (unlabeled, rest) = t.span {
          case UnlabeledLine(_) => true
          case _ => false
        }
        (h.content :: unlabeled.map(_.content), rest)
      }
    }
  }
  /** Glues multiple lines into entries */
  def combineToEntries(messyLines: List[MessyInputLine]): List[Entry] = {
    if (messyLines.isEmpty) Nil
    else {
      val (idContent, namesCitiesRest) = readUntilNextLabel(messyLines)
      val (namesContent, citiesRest) = readUntilNextLabel(namesCitiesRest)
      val (citiesContent, rest) = readUntilNextLabel(citiesRest)
      val id = idContent.head.trim
      val names = namesContent.map(_.split(",").map(_.trim).toList).flatten
      val cities = citiesContent.map(_.split(",").map(_.trim).toList).flatten
      Entry(id, names, cities) :: combineToEntries(rest)
    }
  }
  // invoke recursive function on the entire input
  combineToEntries(messyLines)
}
// how to use
val entries = parseMessyInput(input)
// output
for (Entry(id, names, cities) <- entries) {
  println(id)
  println(names.mkString(", "))
  println(cities.mkString(", "))
}
Output:
12343-7888
Mary, Bob, Jason, Jeff, Suzy, Harry, Steve, Larry, George
New York, Portland, Dallas, Kansas City, Tampa, Bend
567865-676
Alex, Bob, Chris, Dave, Evan, Frank, Gary
Los Angeles, St. Petersburg, Washington D.C., Phoenix
You probably could write it down in one line, sooner or later. But if you write dumb code consisting of many simple intermediate steps, you don't have to think that hard, and there are no obstacles large enough to get stuck.
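The long answer prints the fields without the quotes the question asked for. As a point of comparison, here is a shorter sketch using a different technique, `foldLeft`, that attaches each unlabeled line to the most recent labeled field and then wraps each field in quotes. It assumes that every label is made of letters followed by a colon and that continuation lines never start that way; `LogFields`, `groupFields` and `quoted` are hypothetical names.

```scala
object LogFields {
  // Assumption: a labeled line looks like "Name: Mary, Bob" (letters, then a colon).
  private val Labeled = """([A-Za-z]+):\s*(.*)""".r

  /** Groups each labeled line with the unlabeled continuation lines that follow it.
    * Returns (label, values) pairs in input order.
    */
  def groupFields(lines: List[String]): List[(String, List[String])] = {
    val reversed = lines.map(_.trim).filter(_.nonEmpty).foldLeft(List.empty[(String, List[String])]) {
      case (acc, Labeled(label, rest)) =>
        (label, splitValues(rest)) :: acc // a label starts a new field
      case ((label, values) :: tail, continuation) =>
        (label, values ++ splitValues(continuation)) :: tail // extend the current field
      case (Nil, orphan) =>
        throw new IllegalArgumentException(s"Continuation line before any label: $orphan")
    }
    reversed.reverse // the fold prepends, so restore input order
  }

  private def splitValues(s: String): List[String] =
    s.split(",").map(_.trim).filter(_.nonEmpty).toList

  /** Renders each field's values as one quoted, comma-separated string. */
  def quoted(fields: List[(String, List[String])]): List[String] =
    fields.map { case (_, values) => "\"" + values.mkString(", ") + "\"" }
}
```

For the question's sample input this yields one quoted string per field, in input order; joining them with `mkString("\n")` gives the layout shown in the question.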

Related

Scala: How to create a loop/conditional statement that allows you to only get the same word to be compared then loop?

Is there a way to set it so that you can create a loop/conditional statement that allows you to get the word you're looking for and only loop through there?
For example, a csv file:
name, year, language
kyle, 1998, english
kyle, 2011, english
kyle, 1879, french
george, 1978, spanish
zoe, 2000, english
So when looking for kyle, it'll only get the values of kyle and not the other names?
Expected Return:
kyle, 1998, english
kyle, 2011, english
kyle, 1879, french
Sorry if this is simple, but I am new to Scala and not quite sure.
Step 1: Read the file
import scala.io.Source
import scala.util.Using
Using(Source.fromFile(filename)) { source =>
  // the Source is closed when this block ends, so the lines must be consumed in here
  val lines =
    source.getLines()
}
Step 2: Process each line.
We will parse them into case class instances and drop any corrupt lines:
final case class Record(name: String, year: Int, language: String)
object Record {
  def parseLine(line: String): Option[Record] =
    line.split(',').toList match {
      case nameRaw :: yearRaw :: languageRaw :: Nil =>
        yearRaw.trim.toIntOption.map { year =>
          Record(
            name = nameRaw.trim,
            year = year,
            language = languageRaw.trim.toLowerCase
          )
        }
      case _ =>
        None
    }
}
val records =
  lines.flatMap(Record.parseLine)
Step 3: Filter the records by the predicate you want:
val namedKyle: Record => Boolean =
  record => record.name.toLowerCase == "kyle"
val validRecords =
  records.filter(namedKyle)
Step 4: Materialize the results in a strict collection such as List:
val result =
  validRecords.toList
Step 5: Combine everything together:
import scala.io.Source
import scala.util.{Try, Using}
def readValidRecords(filename: String)(predicate: Record => Boolean): Try[List[Record]] =
  Using(Source.fromFile(filename)) { source =>
    source
      .getLines()
      .flatMap(Record.parseLine)
      .filter(predicate)
      .toList
  }
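To try Steps 2 to 5 without a file on disk, the same pipeline can be run over `Source.fromString`. This is a sketch assuming Scala 2.13 or later (for `Using` and `toIntOption`); `RecordReader.readValidRecordsFrom` is a hypothetical helper that accepts any `Source` instead of a file name. Note that the CSV header line is dropped automatically, because "year" does not parse as an `Int`.

```scala
import scala.io.Source
import scala.util.{Try, Using}

final case class Record(name: String, year: Int, language: String)

object Record {
  def parseLine(line: String): Option[Record] =
    line.split(',').toList match {
      case nameRaw :: yearRaw :: languageRaw :: Nil =>
        yearRaw.trim.toIntOption.map(year => Record(nameRaw.trim, year, languageRaw.trim.toLowerCase))
      case _ => None // wrong number of columns
    }
}

object RecordReader {
  // Hypothetical helper: same pipeline as readValidRecords, but for any Source.
  // `open` is by-name so Using can catch errors thrown while opening it.
  def readValidRecordsFrom(open: => Source)(predicate: Record => Boolean): Try[List[Record]] =
    Using(open) { source =>
      source.getLines().flatMap(Record.parseLine).filter(predicate).toList
    }
}
```

Keeping the source as a by-name parameter means the same helper works for `Source.fromFile(filename)` in production and `Source.fromString(...)` in a quick test.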

Parse CSV and add only matching rows to List functionally in Scala

I am reading a CSV file in Scala.
Person is a case class:
case class Person(name: String, address: String)
def getData(path: String, existingName: String): List[Person] = {
  Source.fromFile("my_file_path").getLines.drop(1).map(l => {
    val data = l.split("|", -1).map(_.trim).toList
    val personName = data(0)
    if (personName.equalsIgnoreCase(existingName)) {
      val address = data(1)
      Person(personName, address)
      // here I want to add to list
    }
    else
      Nil
      // here return empty list of zero length
  }).toList()
}
I want to achieve this functionally in scala.
Here's the basic approach to what I think you're trying to do.
case class Person(name:String, address:String)
def getData(path:String, existingName:String) :List[Person] = {
  val recordPattern = raw"\s*(?i)($existingName)\s*\|\s*(.*)".r.unanchored
  io.Source.fromFile(path).getLines.drop(1).collect {
    case recordPattern(name,addr) => Person(name, addr.trim)
  }.toList
}
This doesn't close the file reader or report the error if the file can't be opened, which you really should do, but we'll leave that for a different day.
Update: added file closing and error handling via Using (Scala 2.13).
import scala.util.{Using, Try}
case class Person(name:String, address:String)
def getData(path:String, existingName:String) :Try[List[Person]] =
  Using(io.Source.fromFile(path)){ file =>
    val recordPattern = raw"\s*(?i)($existingName)\s*\|\s*([^|]*)".r
    file.getLines.drop(1).collect {
      case recordPattern(name,addr) => Person(name, addr.trim)
    }.toList
  }
updated update
OK. Here's a version that:
reports the error if the file can't be opened
closes the file after it's been opened and read
ignores unwanted spaces and quote marks
is pre-2.13 compiler friendly
import scala.util.Try
case class Person(name:String, address:String)
def getData(path:String, existingName:String) :List[Person] = {
  val recordPattern =
    raw"""[\s"]*(?i)($existingName)["\s]*\|[\s"]*([^"|]*).*""".r
  val file = Try(io.Source.fromFile(path))
  val res = file.fold(
    err => {println(err); List.empty[Person]},
    _.getLines.drop(1).collect {
      case recordPattern(name,addr) => Person(name, addr.trim)
    }.toList)
  file.foreach(_.close())
  res
}
And here's how the regex works:
[\s"]* there might be spaces or quote marks
(?i) matching is case-insensitive
($existingName) match and capture this string (1st capture group)
["\s]* there might be spaces or quote marks
\| there will be a bar character
[\s"]* there might be spaces or quote marks
([^"|]*) match and capture everything that isn't quote or bar
.* ignore anything that might come thereafter
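One caveat with interpolating `existingName` directly into the pattern: characters such as `.` or `(` in a name would be read as regex syntax. A minimal sketch that escapes the name with `java.util.regex.Pattern.quote` and exercises the same pattern on in-memory lines instead of a file; `PersonMatcher` and its methods are hypothetical names.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

case class Person(name: String, address: String)

object PersonMatcher {
  /** Same shape of pattern as above, but the name is matched literally. */
  def recordPattern(existingName: String): Regex =
    ("""[\s"]*(?i)(""" + Pattern.quote(existingName) + """)["\s]*\|[\s"]*([^"|]*).*""").r

  /** Keeps only lines whose first column equals existingName (case-insensitively). */
  def matching(lines: Iterator[String], existingName: String): List[Person] = {
    val pattern = recordPattern(existingName)
    // A Regex used as an extractor must match the whole line, hence the trailing .*
    lines.collect { case pattern(name, addr) => Person(name, addr.trim) }.toList
  }
}
```

A header line such as `name|address` is simply not matched here, so the sketch does not need `drop(1)`.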
You were not very clear about what the problem with your approach was, but this should do the trick (it stays very close to what you have):
import scala.io.Source
def getData(path: String, existingName: String): List[Person] = {
  val source = Source.fromFile(path)
  val lst = source.getLines.drop(1).flatMap(l => {
    // "\\|" escapes the bar: split takes a regex, and a bare "|" would split between every character
    val data = l.split("\\|", -1).map(_.trim).toList
    val personName = data.head
    if (personName.equalsIgnoreCase(existingName)) {
      val address = data(1)
      Option(Person(personName, address))
    }
    else
      Option.empty
  }).toList
  source.close()
  lst
}
We read the file line by line. For each line we extract personName from the first CSV field; if it's the one we are looking for we return an (Option) Person, otherwise None (Option.empty). Because we use flatMap, the empty Options are discarded, which avoids having to return Nil.
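The flatMap-over-Option idiom can be seen in isolation on an in-memory list. In this sketch the file is replaced by a hypothetical `rows` parameter and `PersonFilter.matchingPeople` is a hypothetical name; the separator is written `"\\|"` because `String.split` takes a regular expression.

```scala
case class Person(name: String, address: String)

object PersonFilter {
  def matchingPeople(rows: List[String], existingName: String): List[Person] =
    rows.flatMap { row =>
      // "\\|" escapes the bar so it is treated as a literal separator
      row.split("\\|", -1).map(_.trim).toList match {
        case name :: address :: _ if name.equalsIgnoreCase(existingName) =>
          Some(Person(name, address)) // a Some contributes one element
        case _ =>
          None // a None contributes nothing, so the row disappears
      }
    }
}
```

Matching on `name :: address :: _` also means rows with fewer than two columns are skipped instead of throwing on `data(1)`.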

replace list element with another and return the new list

I am kind of stuck with this, and I know this is a bloody simple question :(
I have a case class such as:
case class Students(firstName: String, lastName: String, hobby: String)
I need to return a new list but change the value of hobby based on Student name. For example:
val classToday = List(Students("John","Smith","Nothing"))
Say if student name is John I want to change the hobby to Soccer so the resulting list should be:
List(Students("John","Smith","Soccer"))
I think this can be done via map? I have tried:
classToday.map(x => if (x.firstName == "John") "Soccer" else x)
This will just replace firstName with Soccer which I do not want, I tried setting the "True" condition to x.hobby == "Soccer" but that does not work.
I think there is a simple solution to this :(
The lambda function in map has to return a Students value again, not just "Soccer". For example, if you had to replace everyone's hobbies with "Soccer", this is not right:
classToday.map(x => "Soccer")
What you want is the copy function:
classToday.map(x => x.copy(hobby = "Soccer"))
Or for the original task:
classToday.map(x => if (x.firstName == "John") x.copy(hobby = "Soccer") else x)
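Because `copy` builds a new instance and `map` builds a new list, the original list is left untouched. A small sketch of that, using the `Students` class from the question; the `Hobbies.setHobby` helper is a hypothetical name.

```scala
case class Students(firstName: String, lastName: String, hobby: String)

object Hobbies {
  // Returns a new list: matching students are replaced by an updated copy,
  // everyone else is passed through unchanged.
  def setHobby(students: List[Students], firstName: String, hobby: String): List[Students] =
    students.map(s => if (s.firstName == firstName) s.copy(hobby = hobby) else s)
}
```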
You can use pattern-matching syntax to pretty up this type of transformation.
val newList = classToday.map {
  case s @ Students("John", _, _) => s.copy(hobby = "Soccer")
  case s => s
}
I suggest making it more generic: you can create a map from first names to hobbies.
For example:
val firstNameToHobby = Map("John" -> "Soccer", "Brad" -> "Basketball")
And use it as follows:
case class Students(firstName: String, lastName: String, hobby: String)
val classToday = List(Students("John","Smith","Nothing"), Students("Brad","Smith","Nothing"))
// students without an entry in the map keep their current hobby
val result = classToday.map(student => student.copy(hobby = firstNameToHobby.getOrElse(student.firstName, student.hobby)))
// result = List(Students(John,Smith,Soccer), Students(Brad,Smith,Basketball))
It would be better to create a mapping from each student's firstName to a hobby; then you can use it like this:
scala> val hobbies = Map("John" -> "Soccer", "Messi" -> "Soccer", "Williams" -> "Cricket")
hobbies: scala.collection.immutable.Map[String,String] = Map(John -> Soccer, Messi -> Soccer, Williams -> Cricket)
scala> case class Student(firstName: String, lastName: String, hobby: String)
defined class Student
scala> val students = List(Student("John", "Smith", "Nothing"), Student("Williams", "Lopez", "Nothing"), Student("Thomas", "Anderson", "Nothing"))
students: List[Student] = List(Student(John,Smith,Nothing), Student(Williams,Lopez,Nothing), Student(Thomas,Anderson,Nothing))
scala> students.map(student => student.copy(hobby = hobbies.getOrElse(student.firstName, "Nothing")))
res2: List[Student] = List(Student(John,Smith,Soccer), Student(Williams,Lopez,Cricket), Student(Thomas,Anderson,Nothing))

Scala populate a case class for each list item

I have a csv file of countries and a CountryData case class
Example data from file:
Denmark, Europe, 1.23, 7.89
Australia, Australia, 8.88, 9.99
Brazil, South America, 7.77,3.33
case class CountryData(country: String, region: String, population: Double, economy: Double)
I can read in the file and split, etc to get
List(Denmark, Europe, 1.23, 7.89)
List(Australia, Australia, 8.88, 9.99)
List(Brazil, South America, 7.77,3.33)
How can I now populate a CountryData case class for each list item?
I've tried:
for (line <- Source.getLines.drop(1)) {
  val splitInput = line.split(",", -1).map(_.trim).toList
  val country = splitInput(0)
  val region = splitInput(1)
  val population = splitInput(2)
  val economy = splitInput(3)
  val dataList: List[CountryData]=List(CountryData(country,region,population,economy))
}
But that doesn't work because it's not reading the val, it sees it as a string 'country' or 'region'.
It is not clear where exactly your issue is: is it about Double vs. String, or about the List being created inside the loop? Still, something like this will probably work:
import scala.io.Source
case class CountryData(country: String, region: String, population: Double, economy: Double)
object CountryDataMain extends App {
  // the leading "\n" produces an empty first line, standing in for the header that drop(1) skips
  val src = "\nDenmark, Europe, 1.23, 7.89\nAustralia, Australia, 8.88, 9.99\nBrazil, South America, 7.77,3.33"
  val list = Source.fromString(src).getLines().drop(1).map(line => {
    val splitInput = line.split(",", -1).map(_.trim).toList
    val country = splitInput(0)
    val region = splitInput(1)
    val population = splitInput(2)
    val economy = splitInput(3)
    CountryData(country, region, population.toDouble, economy.toDouble)
  }).toList
  println(list)
}
I would use Scala pattern matching, e.g.:
import scala.util.Try
// CountryData with optional numbers, because you may not have data for all countries
case class CountryData(country: String, region: String, population: Option[Double], economy: Option[Double])
def doubleOrNone(str: String): Option[Double] =
  Try(str.trim.toDouble).toOption // None if the text is not a valid number
def parseCountryLine(line: String): Option[CountryData] = {
  line.split(",").map(_.trim).toVector match {
    case Vector(countryName, region, population, economy) =>
      Some(CountryData(countryName, region, doubleOrNone(population), doubleOrNone(economy)))
    case _ =>
      println(s"Error parsing line:\n$line")
      None
  }
}
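If the fields should stay plain `Double`s, as in the question's `CountryData`, a middle ground is to skip malformed lines rather than store `Option`s. This sketch assumes Scala 2.13+ (for `toDoubleOption`) and combines the two optional numbers with a for-comprehension; `CountryParser` is a hypothetical name.

```scala
import scala.io.Source

case class CountryData(country: String, region: String, population: Double, economy: Double)

object CountryParser {
  /** Parses "country, region, population, economy".
    * None if the column count is wrong or either number does not parse.
    */
  def parseLine(line: String): Option[CountryData] =
    line.split(",", -1).map(_.trim).toList match {
      case country :: region :: populationRaw :: economyRaw :: Nil =>
        for {
          population <- populationRaw.toDoubleOption
          economy    <- economyRaw.toDoubleOption
        } yield CountryData(country, region, population, economy)
      case _ => None
    }

  // flatMap drops the lines that parseLine rejected
  def parseAll(source: Source): List[CountryData] =
    source.getLines().flatMap(parseLine).toList
}
```

Whether to drop bad rows silently, log them, or keep them as `Option` fields depends on how incomplete the real data is expected to be.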

How do I convert an arbitrary number of items to objects when using parser combinators in Scala?

I have been playing with Scala's combinators and parsers, and have a question that may be too elementary (apologies if it is). I have written it out in this code to make the question easy to understand and my question is at the end.
import scala.util.parsing.combinator._
// First, I create a case class
case class Human(fname: String, lname: String, age: Int)
// Now, I create a parser object
object SimpleParser extends RegexParsers {
  def fname: Parser[String] = """[A-Za-z]+""".r ^^ {_.toString}
  def lname: Parser[String] = """[A-Za-z]+""".r ^^ {_.toString}
  def age: Parser[Int] = """[1-9][0-9]{0,2}""".r ^^ {_.toInt}
  def person: Parser[Human] = fname ~ lname ~ age ^^ {case f ~ l ~ a => Human(f, l, a)}
  // Now, I need to read a list of these, not just one.
  // How do I do this? Here is what I tried, but can't figure out what goes inside
  // the curly braces to the right of ^^
  // def group: Parser[List[Human]] = rep(person) ^^ {}
  // def group: Parser[List[Human]] = (person)+ ^^ {}
}
// Here is what I did to test things:
val s1 = "Bilbo Baggins 123"
val r = SimpleParser.parseAll(SimpleParser.person, s1)
println("First Name: " + r.get.fname)
println("Last Name: " + r.get.lname)
println("Age: " + r.get.age)
// So that worked well; I could read these things into an object and examine the object,
// and can do things with the object now.
// But how do I read either of these into, say, a List[Human] or some collection?
val s2 = "Bilbo Baggins 123 Frodo Baggins 40 John Doe 22"
val s3 = "Bilbo Baggins 123; Frodo Baggins 40; John Doe 22"
If there is something very obvious I missed please let me know. Thanks!
You were very close. For the space-separated version, rep is all you need:
lazy val people: Parser[List[Human]] = rep(person)
And for the version with semicolons, you can use repsep:
lazy val peopleWithSemicolons: Parser[List[Human]] = repsep(person, ";")
Note that in both cases rep and repsep already return the result you want, so there's no need to map over the result with ^^. The same is true for fname and lname: the regular expression is implicitly converted into a Parser[String], so mapping _.toString over it doesn't actually change anything.
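Putting the pieces together, a minimal sketch of the whole parser with both list variants. It assumes the scala-parser-combinators module is on the classpath (it has been a separate library since Scala 2.11); the `//> using dep` line is a Scala CLI directive and the version number in it is an assumption to adjust for your build. `PeopleParser` is a hypothetical name, and the redundant `^^ {_.toString}` mappings are dropped as explained above.

```scala
//> using dep org.scala-lang.modules::scala-parser-combinators:2.3.0
import scala.util.parsing.combinator.RegexParsers

case class Human(fname: String, lname: String, age: Int)

object PeopleParser extends RegexParsers {
  // A Regex is implicitly converted to Parser[String], so no ^^ is needed here
  def fname: Parser[String] = """[A-Za-z]+""".r
  def lname: Parser[String] = """[A-Za-z]+""".r
  def age: Parser[Int] = """[1-9][0-9]{0,2}""".r ^^ (_.toInt)
  def person: Parser[Human] = fname ~ lname ~ age ^^ { case f ~ l ~ a => Human(f, l, a) }

  // Whitespace-separated people: rep already yields List[Human]
  def people: Parser[List[Human]] = rep(person)

  // Semicolon-separated people: repsep consumes and drops the separators
  def peopleWithSemicolons: Parser[List[Human]] = repsep(person, ";")
}
```

`RegexParsers` skips whitespace before each token by default, which is why neither variant needs an explicit whitespace parser between people.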