Converting MongoCursor to JSON

Using Casbah, I query Mongo.
val mongoClient = MongoClient("localhost", 27017)
val db = mongoClient("test")
val coll = db("test")
val results: MongoCursor = coll.find(builder)
var matchedDocuments = List[DBObject]()
for (result <- results) {
  matchedDocuments = matchedDocuments :+ result
}
Then, I convert the List[DBObject] into JSON via:
val jsonString: String = buildJsonString(matchedDocuments)
Is there a better way to convert from "results" (MongoCursor) to JSON (JsValue)?
private def buildJsonString(list: List[DBObject]): Option[String] = {
  def go(list: List[DBObject], json: String): Option[String] = list match {
    case Nil => Some(json)
    case x :: xs if (json == "") => go(xs, x.toString)
    case x :: xs => go(xs, json + "," + x.toString)
    case _ => None
  }
  go(list, "")
}

Assuming you want implicit conversion (like in flavian's answer), the easiest way to join the elements of your list with commas is:
private implicit def buildJsonString(list: List[DBObject]): String =
  list.mkString(",")
Which is basically the answer given in Scala: join an iterable of strings
If you want to include the square brackets to properly construct a JSON array you'd just change it to:
list.mkString("[", ",", "]") // punctuation madness
However, if you'd actually like to get Play JsValue elements, as you seem to indicate in the original question, then you could do:
list.map { x => Json.parse(x.toString) }
Which should produce a List[JsValue] instead of a String. However, if you're just going to convert it back to a string again when sending a response, then it's an unneeded step.
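If what you ultimately need is a single JSON array rather than a List[JsValue], a minimal sketch (assuming Play's play.api.libs.json is on the classpath) would wrap the parsed documents in a JsArray and only stringify at the edge:

import play.api.libs.json._

// Parse each DBObject's JSON text, wrap the results in a JsArray,
// and stringify only when building the response.
val jsValues: List[JsValue] = matchedDocuments.map(doc => Json.parse(doc.toString))
val jsonArray: JsValue = JsArray(jsValues)
val jsonString: String = Json.stringify(jsonArray)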

Related

Exception handling in Spark Scala UDF

def parse_values(value: String) = {
  val values = value.split(",").map(_.trim)
  values.foldLeft(Array[(Int, Double)]()) {
    case (acc, present) =>
      val Array(k, v) = present.split(",")(0).split(":")
      acc :+ (k.trim.toInt, v.trim.toDouble)
  }
}
I am currently using the above UDF to parse a column of strings into an array of keys and values, e.g. "50:63.25,100:58.38" to [[50, 63.25], [100, 58.38]].
In some cases, the string is "\N" and I am unable to parse the column value.
If the string is "\N" then I should return an empty array. Can anyone help me handle this exception or add a new case? I am new to Spark/Scala.
Error: scala.MatchError: [Ljava.lang.String;@497cb6a9 (of class [Ljava.lang.String;)
You need to check that the resulting Array has two elements. You need a pattern match like this to avoid that parse error:
def parse_values(value: String) = {
  val values = value.split(",").map(_.trim)
  values.foldLeft(Array[(Int, Double)]()) {
    case (acc, present) =>
      val Array(k, v) = present.split(",")(0).split(":") match {
        case Array(_) => Array("0", "0.0")
        case arr => arr
      }
      acc :+ (k.trim.toInt, v.trim.toDouble)
  }
}
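If the requirement is really "return an empty array for "\N"" rather than a placeholder pair, an alternative sketch (assuming Spark SQL's udf; the column name below is made up):

import org.apache.spark.sql.functions.udf

// Returns an empty array for "\N" (or null) and silently skips malformed pairs.
def parseValues(value: String): Array[(Int, Double)] =
  if (value == null || value.trim == "\\N") Array.empty[(Int, Double)]
  else value.split(",").map(_.trim).flatMap { pair =>
    pair.split(":") match {
      case Array(k, v) => Some((k.trim.toInt, v.trim.toDouble))
      case _           => None
    }
  }

val parseValuesUdf = udf(parseValues _)
// df.withColumn("parsed", parseValuesUdf($"rawColumn"))  // "rawColumn" is a hypothetical column name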

Scala alternative to series of if statements that append to a list?

I have a Seq[String] in Scala, and if the Seq contains certain Strings, I append a relevant message to another list.
Is there a more 'scalaesque' way to do this, rather than a series of if statements appending to a list like I have below?
val result = new ListBuffer[Err]()
val malformedParamNames = // A Seq[String]
if (malformedParamNames.contains("$top")) result += IntegerMustBePositive("$top")
if (malformedParamNames.contains("$skip")) result += IntegerMustBePositive("$skip")
if (malformedParamNames.contains("modifiedDate")) result += FormatInvalid("modifiedDate", "yyyy-MM-dd")
...
result.toList
If you want to use some of Scala's collection sugar, I would use:
sealed trait Err
case class IntegerMustBePositive(msg: String) extends Err
case class FormatInvalid(msg: String, format: String) extends Err
val malformedParamNames = Seq[String]("$top", "aa", "$skip", "ccc", "ddd", "modifiedDate")
val result = malformedParamNames.map { v =>
  v match {
    case "$top" => Some(IntegerMustBePositive("$top"))
    case "$skip" => Some(IntegerMustBePositive("$skip"))
    case "modifiedDate" => Some(FormatInvalid("modifiedDate", "yyyy-MM-dd"))
    case _ => None
  }
}.flatten
result.toList
Be warned: if you ask for the Scala-esque way of doing things, there are many possibilities.
The map function combined with flatten can be simplified by using flatMap:
sealed trait Err
case class IntegerMustBePositive(msg: String) extends Err
case class FormatInvalid(msg: String, format: String) extends Err
val malformedParamNames = Seq[String]("$top", "aa", "$skip", "ccc", "ddd", "modifiedDate")
val result = malformedParamNames.flatMap {
  case "$top" => Some(IntegerMustBePositive("$top"))
  case "$skip" => Some(IntegerMustBePositive("$skip"))
  case "modifiedDate" => Some(FormatInvalid("modifiedDate", "yyyy-MM-dd"))
  case _ => None
}
result
The most 'scala-esque' version I can think of while keeping it readable would be:
val map = scala.collection.immutable.ListMap(
  "$top" -> IntegerMustBePositive("$top"),
  "$skip" -> IntegerMustBePositive("$skip"),
  "modifiedDate" -> FormatInvalid("modifiedDate", "yyyy-MM-dd"))

val result = for {
  (k, v) <- map
  if malformedParamNames contains k
} yield v

// or
val result2 = map.filterKeys(malformedParamNames.contains).values.toList
Benoit's is probably the most scala-esque way of doing it, but depending on who's going to be reading the code later, you might want a different approach.
// Some type definitions omitted
val malformations = Seq[(String, Err)](
  ("$top", IntegerMustBePositive("$top")),
  ("$skip", IntegerMustBePositive("$skip")),
  ("modifiedDate", FormatInvalid("modifiedDate", "yyyy-MM-dd"))
)
If you need a list and the order is significant:
val result = (malformations.foldLeft(List.empty[Err]) { (acc, pair) =>
  if (malformedParamNames.contains(pair._1)) {
    pair._2 +: acc // prepend to the list for faster performance
  } else acc
}).reverse // and reverse since we were prepending
If the order isn't significant (although if the order's not significant, you might consider wanting a Set instead of a List):
val result = (malformations.foldLeft(Set.empty[Err]) { (acc, pair) =>
  if (malformedParamNames.contains(pair._1)) {
    acc + pair._2
  } else acc
}).toList // omit the .toList if you're OK with just a Set
If the predicates in the repeated ifs are more complex/less uniform, then the type for malformations might need to change, as they would if the responses changed, but the basic pattern is very flexible.
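For the simple membership checks in the question, the same fold can also be collapsed into a single collect (a sketch, not taken from the answers above):

// Keeps the order of `malformations` and drops the entries that were not reported.
val result: List[Err] =
  malformations.collect { case (name, err) if malformedParamNames.contains(name) => err }.toList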
In this solution we define a list of mappings that pair your IF condition with its THEN statement, then iterate over the input list and apply the changes where they match.
//                   IF       THEN
case class Operation(matcher: String, action: String)

def processInput(input: List[String]): List[String] = {
  val operations = List(
    Operation("$top", "integer must be positive"),
    Operation("$skip", "skip value"),
    Operation("$modify", "modify the date")
  )
  input.flatMap { in =>
    operations.find(_.matcher == in).map { _.action }
  }
}
println(processInput(List("$skip","$modify", "$skip")));
A breakdown:

operations.find(_.matcher == in)  // find an operation in our
                                  // list matching the input we are
                                  // checking. Returns Some or None

  .map { _.action }               // if Some, replace the input with the action
                                  // if None, do nothing

input.flatMap { in =>             // inputs are processed, converted
                                  // to Some(action) or None, and the
                                  // flatMap drops the Nones and unwraps
                                  // the Somes, returning just the strings.

Idiomatic way of extracting known key-> val pairs from a string

Example context:
An HTTP Response with a body as follows:
key1=val1&key2=val2&key3=val3.
The names of the keys are always known.
Currently the extraction is done with regex:
val params = response split ("""&""") map { _.split("""=""") } map { el => { el(0) -> el(1) } } toMap;
Is there a simpler way of pattern matching the response for specific params?
I think using split is probably going to be the fastest/simplest solution here. You're not doing any advanced parsing, so using parser combinators or regex capture groups seems a little overkill.
However, when you have complex expressions involving multiple calls to map, filter, etc., it's usually an indicator that you can clean things up with a for-comprehension:
val response = "key1=val1&key2=val2&key3=val3"
val params = (for {
  x <- response split "&"
  Array(k, v) = x split "="
} yield k -> v).toMap
You can use parser combinators here for most flexibility and robustness (i.e., handle failed parsing):
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.CharSequenceReader

// On Scala 2.11+ this needs the scala-parser-combinators module.
object Parser extends RegexParsers with App {
  def lit: Parser[String] = "[^=&]+".r

  def pair: Parser[(String, String)] = lit ~ "=" ~ lit ^^ {
    case key ~ "=" ~ value => key -> value
  }

  def parse: Parser[Seq[(String, String)]] = repsep(pair, "&")

  val response = "key1=val1&key2=val2&key3=val3"
  val params = parse(new CharSequenceReader(response)).get.toMap
  println(params)
}
You can use regexp as a matcher like this:
val r = "([^=]+)=([^=]+)".r
def toKv(s:String) = s match {
case r(k,v) => (k,v)
case _ => throw InvalidFormatException
}
So, for your case it would look like:
response split ("&") map (toKv)
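If a malformed pair should be skipped rather than throwing, a small variation (a sketch, not part of the original answer) returns an Option and lets flatMap drop the failures:

val r = "([^=&]+)=([^=&]+)".r

def toKvOpt(s: String): Option[(String, String)] = s match {
  case r(k, v) => Some(k -> v)
  case _       => None
}

val params = response.split("&").flatMap(toKvOpt).toMap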

Scala: how to traverse stream/iterator collecting results into several different collections

I'm going through a log file that is too big to fit into memory and collecting two types of expressions. What is a better functional alternative to my iterative snippet below?
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
  val lines: Iterator[String] = io.Source.fromFile(file).getLines()
  val logins: mutable.Map[String, String] = new mutable.HashMap[String, String]()
  val errors: mutable.ListBuffer[(String, String)] = mutable.ListBuffer.empty

  for (line <- lines) {
    line match {
      case errorPat(date, ip) => errors.append((ip, date))
      case loginPat(date, user, ip, id) => logins.put(ip, id)
      case _ => ""
    }
  }

  errors.toList.map(line => (logins.getOrElse(line._1, "none") + " " + line._1, line._2))
}
Here is a possible solution:
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
  val lines = Source.fromFile(file).getLines
  val (err, log) = lines.collect {
    case errorPat(inf, ip) => (Some((ip, inf)), None)
    case loginPat(_, _, ip, id) => (None, Some((ip, id)))
  }.toList.unzip
  val ip2id = log.flatten.toMap
  err.collect { case Some((ip, inf)) => (ip2id.getOrElse(ip, "none") + " " + ip, inf) }
}
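One caveat that applies to the question and to the answers alike: Source.fromFile opens a file handle that is never closed. A sketch of the usual guard:

val source = io.Source.fromFile(file)
try {
  val lines = source.getLines()
  // ... consume `lines` here, before the source is closed ...
} finally source.close()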
Corrections:
1) removed unnecessary type declarations
2) tuple deconstruction instead of ugly ._1
3) left fold instead of mutable accumulators
4) used more convenient operator-like methods :+ and +
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
  val lines = io.Source.fromFile(file).getLines()

  val (logins, errors) =
    ((Map.empty[String, String], Seq.empty[(String, String)]) /: lines) {
      case ((loginsAcc, errorsAcc), next) =>
        next match {
          case errorPat(date, ip) => (loginsAcc, errorsAcc :+ (ip -> date))
          case loginPat(date, user, ip, id) => (loginsAcc + (ip -> id), errorsAcc)
          case _ => (loginsAcc, errorsAcc)
        }
    }

  // more concise equivalent of
  // errors.toList.map { case (ip, date) => (logins.getOrElse(ip, "none") + " " + ip) -> date }
  for ((ip, date) <- errors.toList)
    yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
I have a few suggestions:
Instead of a pair/tuple, it's often better to use your own class. It gives meaningful names to both the type and its fields, which makes the code much more readable.
Split the code into small parts. In particular, try to decouple pieces of code that don't need to be tied together. This makes your code easier to understand, more robust, less prone to errors and easier to test. In your case it'd be good to separate producing your input (lines of a log file) and consuming it to produce a result. For example, you'd be able to make automatic tests for your function without having to store sample data in a file.
As an example and exercise, I tried to make a solution based on Scalaz iteratees. It's a bit longer (it includes some auxiliary code for IteratorEnumerator) and probably overkill for the task, but perhaps someone will find it helpful.
import java.io._;
import scala.util.matching.Regex
import scalaz._
import scalaz.IterV._

object MyApp extends App {

  // A type for the result. Having names keeps things
  // clearer and shorter.
  type LogResult = List[(String, String)]

  // Represents a state of our computation. Not only does it
  // give a name to the data, we can also put functions that
  // modify the state here. This nicely separates what we're
  // computing from how.
  sealed case class State(
    logins: Map[String, String],
    errors: Seq[(String, String)]
  ) {
    def this() = {
      this(Map.empty[String, String], Seq.empty[(String, String)])
    }

    def addError(date: String, ip: String): State =
      State(logins, errors :+ (ip -> date));

    def addLogin(ip: String, id: String): State =
      State(logins + (ip -> id), errors);

    // Produce the final result from accumulated data.
    def result: LogResult =
      for ((ip, date) <- errors.toList)
        yield (logins.getOrElse(ip, "none") + " " + ip) -> date
  }

  // An iteratee that consumes lines of our input. Based
  // on the given regular expressions, it produces an
  // iteratee that parses the input and uses State to
  // compute the result.
  def logIteratee(errorPat: Regex, loginPat: Regex): IterV[String, List[(String, String)]] = {

    // Consumes a single line.
    def consume(line: String, state: State): State =
      line match {
        case errorPat(date, ip) => state.addError(date, ip);
        case loginPat(date, user, ip, id) => state.addLogin(ip, id);
        case _ => state
      }

    // The core of the iteratee. Every time we consume a
    // line, we update our state. When done, compute the
    // final result.
    def step(state: State)(s: Input[String]): IterV[String, LogResult] =
      s(el = line => Cont(step(consume(line, state))),
        empty = Cont(step(state)),
        eof = Done(state.result, EOF[String]))

    // Return the iteratee waiting for its first input.
    Cont(step(new State()));
  }

  // Converts an iterator into an enumerator. This should
  // probably be moved into Scalaz.
  // Adapted from scalaz.ExampleIteratee
  implicit val IteratorEnumerator = new Enumerator[Iterator] {
    @annotation.tailrec def apply[E, A](e: Iterator[E], i: IterV[E, A]): IterV[E, A] = {
      val next: Option[(Iterator[E], IterV[E, A])] =
        if (e.hasNext) {
          val x = e.next();
          i.fold(done = (_, _) => None, cont = k => Some((e, k(El(x)))))
        } else
          None;
      next match {
        case None => i
        case Some((es, is)) => apply(es, is)
      }
    }
  }

  // main ---------------------------------------------------
  {
    // Read a file as an iterator of lines:
    // val lines: Iterator[String] =
    //   io.Source.fromFile("test.log").getLines();

    // Create our testing iterator:
    val lines: Iterator[String] = Seq(
      "Error: 2012/03 1.2.3.4",
      "Login: 2012/03 user 1.2.3.4 Joe",
      "Error: 2012/03 1.2.3.5",
      "Error: 2012/04 1.2.3.4"
    ).iterator;

    // Create an iteratee.
    val iter = logIteratee("Error: (\\S+) (\\S+)".r,
                           "Login: (\\S+) (\\S+) (\\S+) (\\S+)".r);

    // Run the iteratee against the input
    // (the enumerator is implicit).
    println(iter(lines).run);
  }
}

Scala: Print separators when using output stream

When we need an array of strings to be concatenated, we can use the mkString method:
val concatenatedString = listOfString.mkString
However, when we have a very long list of strings, building the concatenated string may not be a good choice. In this case, it would be more appropriate to print directly to an output stream. Writing to an output stream is simple:
listOfString.foreach(outstream.write _)
However, I don't know a neat way to append separators. One thing I tried was looping with an index:
var i = 0
for (str <- listOfString) {
  if (i != 0) outstream.write(", ")
  outstream.write(str)
  i += 1
}
This works, but it is too wordy. Although I could write a function that encapsulates the code above, I want to know whether the Scala API already has a function that does the same thing.
Thank you.
Here is a function that does what you want in a slightly more elegant way:
def commaSeparated(list: List[String]): Unit = list match {
  case List()  =>
  case List(a) => print(a)
  case h :: t  =>
    print(h + ", ")
    commaSeparated(t)
}
The recursion avoids mutable variables.
To make it even more functional style, you can pass in the function that you want to use on each item, that is:
def commaSeparated(list: List[String], func: String => Unit): Unit = list match {
  case List()  =>
  case List(a) => func(a)
  case h :: t  =>
    func(h + ", ")
    commaSeparated(t, func)
}
And then call it by:
commaSeparated(mylist, outstream.write _)
I believe what you want is the overloaded definitions of mkString.
Definitions of mkString:
scala> val strList = List("hello", "world", "this", "is", "bob")
strList: List[String] = List(hello, world, this, is, bob)
def mkString: String
scala> strList.mkString
res0: String = helloworldthisisbob
def mkString(sep: String): String
scala> strList.mkString(", ")
res1: String = hello, world, this, is, bob
def mkString(start: String, sep: String, end: String): String
scala> strList.mkString("START", ", ", "END")
res2: String = STARThello, world, this, is, bobEND
EDIT
How about this?
scala> strList.view.map(_ + ", ").foreach(print) // or .iterator.map
hello, world, this, is, bob,
Not good for parallelized code, but otherwise:
val it = listOfString.iterator
it.foreach{x => print(x); if (it.hasNext) print(' ')}
Here's another approach which avoids the var:
listOfString.zipWithIndex.foreach { case (s, i) =>
  if (i != 0) outstream write ","
  outstream write s
}
Self Answer:
I wrote a function that encapsulates the code in the original question:
implicit def withSeparator[S >: String](seq: Seq[S]) = new {
  def withSeparator(write: S => Any, sep: String = ",") = {
    var i = 0
    for (str <- seq) {
      if (i != 0) write(sep)
      write(str)
      i += 1
    }
    seq
  }
}
You can use it like this:
listOfString.withSeparator(print _)
The separator can also be assigned:
listOfString.withSeparator(print _, ",\n")
Thank you to everyone who answered me. What I wanted was something concise and not too slow, and the implicit function withSeparator looks like the thing I wanted, so I am accepting my own answer to this question. Thank you again.
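A side note that is not part of the original thread: on Scala 2.10+ the same extension is usually written as an implicit class rather than an implicit def returning a structural type. A minimal sketch:

implicit class WithSeparator[S >: String](seq: Seq[S]) {
  def withSeparator(write: S => Any, sep: String = ","): Seq[S] = {
    seq.zipWithIndex.foreach { case (s, i) =>
      if (i != 0) write(sep)
      write(s)
    }
    seq
  }
}

listOfString.withSeparator(print _, ",\n")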