Using extractors to parse text files - scala

I'm trying to improve a CSV parsing routine and feel that extractors could be useful here but can't figure them out. Suppose there's a file with user ids and emails:
1,alice#alice.com
2,bob#bob.com
3,carol#carol.com
If the User class is defined as case class User(id: Int, email: String) everything is pretty easy with something like
lines map { line =>
line split "," match {
case Array(id, email) => User(id.toInt, email)
}
}
What I don't understand is how to deal with the case where User class can have complex properties e.g
case class Email(username: String, host: string)
case class User(id: Int, email: Email)

You probably want to use a regular expression to extract the contents of the email address. Maybe something like this:
val lines = Vector(
"1,alice#alice.com",
"2,bob#bob.com",
"3,carol#carol.com")
case class Email(username: String, host: String)
case class User(id: Int, email: Email)
val EmailRe = """(.*)#(.*\.com)""".r // substitute a real email address regex here
lines.map { line =>
line.split(",") match {
case Array(id, EmailRe(uname, host)) => User(id.toInt, Email(uname, host))
}
}

Here you go, an example using a custom Extractor.
// 1,Alice,21212,Baltimore,MD" -> User(1, Alice, Address(21212, Baltimore, MD))
Define a custom Extractor that creates the objects out of given String:
object UserExtractor {
def unapply(s: String) : Option[User] = try {
Some( User(s) )
}
catch {
// bettor handling of bad cases
case e: Throwable => None
}
}
Case classes to hold the data with a custom apply on Comapnion object on User:
case class Address(code: String, cit: String, county: String)
case class User(id: Int, name: String, address: Address)
object User {
def apply(s: String) : User = s.split(",") match {
case Array(id, name, code, city, county) => User(id.toInt, name, Address(code, city, county) )
}
}
Unapplying on a valid string (in the example valid means the correct number of fields).
"1,Alice,21212,Baltimore,MD" match { case UserExtractor(u) => u }
res0: User = User(1,Alice,Address(21212,Baltimore,MD))
More tests could be added with more custom apply methods.

I'd use a single RegexExtractor :
val lines = List(
"1,alice#alice.com",
"2,bob#bob.com",
"3,carol#carol.com"
)
case class Email(username: String, host: String)
case class User(id: Int, email: Email)
val idAndEMail = """^([^,]+),([^#]+)#(.+)$""".r
and define a function that transforms a line to the an User :
def lineToUserWithMail(line: String) : Option[User] =
idAndEMail.findFirstMatchIn(line) map {
case userWithEmail(id,user,host) => User(id.toInt, Email(user,host) )
}
Applying the function to all lines
lines flatMap lineToUserWithMail
//res29: List[User] = List(User(1,Email(alice,alice.com)), User(2,Email(bob,bob.com)), User(3,Email(carol,carol.com)))
Alternatively you could implement custom Extractors on the case classe by adding an unnapply Method. But for that case it wouldn't be worth the pain.
Here is an example for unapply
class Email(user:String, host:String)
object Email {
def unapply(s: String) : Option[(String,String)] = s.split("#") match {
case Array(user, host) => Some( (user,host) )
case _ => None
}
}
"bob#bob.com" match {
case Email(u,h) => println( s"$u , $h" )
}
// prints bob , bob.com
A word of warning on using Regex to parse CSV-data. It's not as easy as you might think, i would recommend to use a CSV-Reader as http://supercsv.sourceforge.net/ which handles some nasty edge cases out of the box.

Related

Scala generic case class with optional field

I have the following generic case class to model a resources in an HTTP API (we use Akka HTTP):
case class Job[Result](
id: String,
result: Option[Result] = None,
userId: String,
)
and want to specify multiple, named, variations of a Job, were some of them provide a result, while others don't:
case class FooResult(data: String)
type FooJob = Job[FooResult]
// BarJob does not have any result, thus result should be None
case class BarJob = Job[/*what do to?*/]
my question is, is there any way to define Job as a generic case class where the type parameter only needs to be specified when the field result is Some ? What I would like to do is something like:
// result is by default None in Job, so I don't want to specify any type here
case class BarJob = Job
Or perhaps there's better ways to do this, rather than using type aliases?
One option is to use base traits for Jobs with and without results:
trait Job {
def id: String
def userId: String
}
trait JobWithResult[T] extends Job {
def result: T
}
case class FooResult(data: String)
case class FooJob(id: String, userId: String, result: FooResult) extends JobWithResult[FooResult]
case class BarJob(id: String, userId: String) extends Job
You can then use match on a Job to see whether it has a result or not.
job match {
case FooJob(id, userId, foo) => println(s"FooJob with result $foo")
case j: JobWithResult[_] => println(s"Other job with id ${j.id} with result")
case j: Job => println(s"Job with id {$j.id} without result")
}
This assumes that the result is not actually optional.
As pointed out in the comments, the Option is "unnecessary".
I submit, it's not so much "unnecessary" as indicative ... of a way for you to implement what you want without repetitive declarations.
case class Job[Result](id: String, userId: String, result: Option[Result] = None)
object Job {
def void(id: String, userId: String) = Job[Nothing](id, userId)
def apply[R](id: String, userId: String, result: R) = Job(id, userId, Option(result))
}
type FooJob = Job[FooResult]
type VoidJob = Job[Nothing]

Play Slick: How to fetch selected fields from a DB table in play-slick 3.0.0?

I am new to play framework. I am working on a play slick based application where I want to fetch a list of objects from DB which will contains some selected fields. For fetching all the fields I am using following code:
case class Mail(txID: String,
timeStamp: Long,
toUserID: String,
mailContent: String,
mailTemplateFileName: String,
fromID: String,
toID: String
)
def getLogFromIDFuture(userID: String): Future[Option[List[Mail]]] = cache.getOrElseUpdate[Option[List[Mail]]](userID) {
val resultingUsers = db.run(mailsData.filter(x => x.toUserID === userID).result)
val res = Await.result(resultingUsers, Duration.Inf)
res.map(t => t) match {
case t if t.nonEmpty =>
Future(Some(t.toList))
case _ => Future(None)
}
}
So my question is how to fetch only timeStamp, toUserID, mailContent, fromID, toID fields as the list of objects like UserMessage(timeStamp: Long, toUserID: String, mailContent: String, fromID: String, toID: String). I tried searching about this but didn't get any convincing answers.
Like I said in the comment you can do this:
def getLogFromIDFuture(userID: String): Future[Option[List[UserMessage]]] = cache.getOrElseUpdate[Option[List[Mail]]](userID) {
val resultingUsers = db.run(mailsData.filter(x => x.toUserID === userID).map(entry =>(entry.timeStamp, entry.toUserID, entry.mailContent, entry.fromID, entry.toID))
.result)// here you have the tuple of things
// add the mapping of the tuple to the UserMessage
val res = Await.result(resultingUsers, Duration.Inf)
res.map(t => t) match {
case t if t.nonEmpty =>
Future(Some(t.toList))
case _ => Future(None)
}
}
You can get rid of that Await.result
resultingUsers.map( match {
case t if t.nonEmpty =>
Some(t.toList)
case _ => None
}
)
Hope it helps.

Scala reduce on Case class with keys

Below is the scenario of 2 case classes that I have -
case class Pack(name: String, age: Int, dob: String)
object Pack{
def getPack(name:String, age: Int, dob:String): Pack = {
Pack(name,age,dob)
}
}
case class NewPack(name: String, pack: (List[Pack]))
object NewPack{
def getNewPackList(data: List[Pack]): List[NewPack] = {
val newData = for(x <- data )yield(x.name,List(x))
val newPackData = for(x <- newData)yield(NewPack(x._1,x._2))
newPackData
}
}
val someData = List(Pack.getPack("x",12,"day1"), Pack.getPack("y",23,"day2"),Pack.getPack("x",34,"day3") )
val somePackData = NewPack.getNewPackList(someData)
the values of someData and somePackData is like this -
someData: List[Pack] = List(Pack(x,12,day1), Pack(y,23,day2), Pack(x,34,day3))
somePackData: List[NewPack] = List(NewPack(x,List(Pack(x,12,day1))), NewPack(y,List(Pack(y,23,day2))), NewPack(x,List(Pack(x,34,day3))))
Now I want to get the final data in the below mentioned format. Could anyone suggest me the better way of doing that.
finalData: List[NewPack] = List(NewPack(x,List(Pack(x,12,day1),Pack(x,34,day3)),NewPack(y,List(Pack(y,23,day2))))
Here is small simplification.
case class Pack(name: String, age: Int, dob: String)
case class NewPack(name: String, pack: List[Pack])
object NewPack{
def getNewPackList(data: List[Pack]): List[NewPack] =
data.map{case pack#Pack(name, _, _) => NewPack(name, List(pack))}
}
Note that in general you don't need additional factory method for case class. If you would like to present something tricky later, you can always define custom apply in your companion object.
Additionally if you were intending to group packs by name you can easily define something like
def newPacksGrouped(data: List[Pack]): List[NewPack] =
data.groupBy(_.name).map{case (name, packs) => NewPack(name, packs)}.toList

Break flatMap after first Some occurence

abstract class Node(id: Long, name: String) {
def find(id: Long) = if (this.id == id) Some(this) else None
}
case class Contact(id: Long, name: String, phone: String) extends Node(id, name)
case class Group(id: Long, name: String, entries: Node*) extends Node(id, name) {
override def find(id: Long): Option[Node] = {
super.find(id) orElse entries.flatMap(e => e.find(id)).headOption
}
}
val tree =
Group(0, "Root"
, Group(10, "Shop", Contact(11, "Alice", "312"))
, Group(20, "Workshop"
, Group(30, "Tyres", Contact(31, "Bob", "315"), Contact(32, "Greg", "319"))
, Contact(33, "Mary", "302"))
, Contact(1, "John", "317"))
println(tree.find(32))
Tree data is built from Contacts and Groups (w/ sub-Groups and Contacts). I want to find a node with specific id. Currently I traverse Group's members using:
entries.flatMap(e => e.find(id)).headOption
but it isn't optimal because I check all child entries instead of break upon first finding.
I'd appreciate your help in magic Scala Wold. Thanks.
You want collectFirst, which will select the first matching element and wrap it in Some, or None if it's not found. You can also turn entries into a view, to make the evaluation lazy.
entries.view.map(_.find(id)).collectFirst { case Some(node) => node }
It will work with your original code, as well:
entries.view.flatMap(_.find(id)).headOption
Another way to approach this problem could be providing traverse support for the data structure. Overall it's a tree so it can be traversed easily. Please check the code below:
sealed abstract class Node(val id: Long, val name: String) extends Traversable[Node] {
def foreach[U](f:Node => U) = this match {
case x: Contact => f(x)
case x: Group => f(x); x.entries.foreach(_ foreach f)
}
}
case class Contact(override val id: Long, override val name: String, phone: String) extends Node(id, name) {
override def toString = (id, name, phone).toString
}
case class Group(override val id: Long, override val name: String, entries: Node*) extends Node(id, name) {
override def toString = (id, name).toString + entries.map(_.toString).mkString
}
val tree =
Group(0, "Root"
, Group(10, "Shop", Contact(11, "Alice", "312"))
, Group(20, "Workshop"
, Group(30, "Tyres", Contact(31, "Bob", "315"), Contact(32, "Greg", "319"))
, Contact(33, "Mary", "302"))
, Contact(1, "John", "317"))
println(tree) // (0,Root)(10,Shop)(11,Alice,312)(20,Workshop)(30,Tyres)(31,Bob,315)(32,Greg,319)(33,Mary,302)(1,John,317)
println(tree find {_.id == 32}) //Some((32,Greg,319))
println(tree map {_.name }) //List(Root, Shop, Alice, Workshop, Tyres, Bob, Greg, Mary, John)
The good thing is that now you can use of all the benefits of Traversable[Node] trait. The problem on the other hand is that I had to override toString method in case classes. So I guess there is still room for improvement.

How to avoid using null?

Suppose, I have my domain object named "Office":
case class Office(
id: Long,
name: String,
phone: String,
address: String
) {
def this(name: String, phone: String, address: String) = this(
null.asInstanceOf[Long], name, phone, address
)
}
When I create new Office:
new Office("officeName","00000000000", "officeAddress")
I don't specify id field becouse I don't know it. When I save office (by Anorm) I now id and do that:
office.id = officeId
So. I know that using null is non-Scala way. How to avoid using null in my case?
UPDATE #1
Using Option.
Suppose, something like this:
case class Office(
id: Option[Long],
name: String,
phone: String,
address: String
) {
def this(name: String, phone: String, address: String) = this(
None, name, phone, address
)
}
And, after saving:
office.id = Option(officeId)
But what if I need to find something by office id?
SomeService.findSomethingByOfficeId(office.id.get)
Does it clear? office.id.get looks not so good)
UPDATE #2
Everyone thanks! I've got new ideas from your answers! Greate thanks!
Why not declare the id field as a Option? You should really avoid using null in Scala. Option is preferable since it is type-safe and plays nice with other constructs in the functional paradigm.
Something like (I haven't tested this code):
case class Office(
id: Option[Long],
name: String,
phone: String,
address: String
) {
def this(name: String, phone: String, address: String) = this(
None, name, phone, address
)
}
Just make the id field an Option[Long]; once you have that, you can use it like this:
office.id.map(SomeService.findSomethingByOfficeId)
This will do what you want and return Option[Something]. If office.id is None, map() won't even invoke the finder method and will immediately return None, which is what you want typically.
And if findSomethingByOfficeId returns Option[Something] (which it should) instead of just Something or null/exception, use:
office.id.flatMap(SomeService.findSomethingByOfficeId)
This way, if office.id is None, it will, again, immediately return None; however, if it's Some(123), it will pass that 123 into findSomethingByOfficeId; now if the finder returns a Some(something) it will return Some(something), if however the finder returns None, it will again return None.
if findSomethingByOfficeId can return null and you can't change its source code, wrap any calls to it with Option(...)—that will convert nulls to None and wrap any other values in Some(...); if it can throw an exception when it can't find the something, wrap calls to it with Try(...).toOption to get the same effect (although this will also convert any unrelated exceptions to None, which is probably undesirable, but which you can fix with recoverWith).
The general guideline is always avoid null and exceptions in Scala code (as you stated); always prefer Option[T] with either map or flatMap chaining, or using the monadic for syntactic sugar hiding the use of map and flatMap.
Runnable example:
object OptionDemo extends App {
case class Something(name: String)
case class Office(id: Option[Long])
def findSomethingByOfficeId(officeId: Long) = {
if (officeId == 123) Some(Something("London")) else None
}
val office1 = Office(id = None)
val office2 = Office(id = Some(456))
val office3 = Office(id = Some(123))
println(office1.id.flatMap(findSomethingByOfficeId))
println(office2.id.flatMap(findSomethingByOfficeId))
println(office3.id.flatMap(findSomethingByOfficeId))
}
Output:
None
None
Some(Something(London))
For a great introduction to Scala's rather useful Option[T] type, see http://danielwestheide.com/blog/2012/12/19/the-neophytes-guide-to-scala-part-5-the-option-type.html.
When using id: Option[Long] , extract the option value for instance with
if (office.id.isDefined) {
val Some(id) = office.id
SomeService.findSomethingByOfficeId(id)
}
or perhaps for instance
office.id match {
case None => Array()
case Some(id) => SomeService.findSomethingByOfficeId(id)
}
Also you can define case classes and objects as follows,
trait OId
case object NoId extends OId
case class Id(value: Long) extends OId
case class Office (
id: OId = NoId,
name: String,
phone: String,
address: String
)
Note that by defaulting id for example to NoId , there is no need to declare a call to this. Then
val office = Office (Id(123), "name","phone","addr")
val officeNoId = Office (name = "name",phone="phone",address="addr")
If the id member is defined last, then there is no need to name the member names,
val office = Office ("name","phone","addr")
office: Office = Office(name,phone,addr,NoId)
As of invoking (neatly) a method,
office.id match {
case NoId => Array()
case Id(value) => SomeService.findSomethingByOfficeId(value)
}
I prefer more strong restriction for object Id property:
trait Id[+T] {
class ObjectHasNoIdExcepthion extends Throwable
def id : T = throw new ObjectHasNoIdExcepthion
}
case class Office(
name: String,
phone: String,
address: String
) extends Id[Long]
object Office {
def apply(_id : Long, name: String, phone: String, address: String) =
new Office(name, phone, address) {
override def id : Long = _id
}
}
And if I try to get Id for object what is not stored in DB, I get exception and this mean that something wrong in program behavior.
val officeWithoutId =
Office("officeName","00000000000", "officeAddress")
officeWithoutId.id // Throw exception
// for-comprehension and Try monad can be used
// update office:
for {
id <- Try { officeWithoutId.id }
newOffice = officeWithoutId.copy(name = "newOffice")
} yield OfficeDAL.update(id, newOffice)
// find office by id:
for {
id <- Try { officeWithoutId.id }
} yield OfficeDAL.findById(id)
val officeWithId =
Office(1000L, "officeName","00000000000", "officeAddress")
officeWithId.id // Ok
Pros:
1) method apply with id parameter can be incapsulated in DAL logic
private[dal] def apply (_id : Long, name: String, ...
2) copy method always create new object with empty id (safty if you change data)
3) update method is safety (object not be overridden by default, id always need to be specified)
Cons:
1) Special serealization/deserealization logic needed for store id property (json for webservices, etc)
P.S.
this approach is good if you have immutable object (ADT) and store it to DB with id + object version instead object replace.