I'm learning Squeryl and trying to understand the 'using' syntax but can't find documentation on it.
In the following example two databases are created, A contains the word Hello, and B contains Goodbye. The intention is to query the contents of A, then append the word World and write the result to B.
Expected console output is Inserted Message(2,HelloWorld)
object Test {
def main(args: Array[String]) {
Class.forName("org.h2.Driver");
import Library._
val sessionA = Session.create(DriverManager.getConnection(
"jdbc:h2:file:data/dbA","sa","password"),new H2Adapter)
val sessionB = Session.create(DriverManager.getConnection(
"jdbc:h2:file:data/dbB","sa","password"),new H2Adapter)
using(sessionA){
drop; create
myTable.insert(Message(0,"Hello"))
}
using(sessionB){
drop; create
myTable.insert(Message(0,"Goodbye"))
}
using(sessionA){
val results = from(myTable)(s => select(s))//.toList
using(sessionB){
results.foreach(m => {
val newMsg = m.copy(msg = (m.msg+"World"))
myTable.insert(newMsg)
println("Inserted "+newMsg)
})
}
}
}
case class Message(val id: Long, val msg: String) extends KeyedEntity[Long]
object Library extends Schema { val myTable = table[Message] }
}
As it stands, the code prints Inserted Message(2,GoodbyeWorld), unless the toList is added on the end of the val results line.
Is there some way to bind the results query to use sessionA even when evaluated inside the using(sessionB)? This seems preferable to using toList to force the query to evaluate and store the contents in memory.
Update
Thanks to Dave Whittaker's answer, the following snippet fixes it without resorting to 'toList' and corrects my understanding of both 'using' and the running of queries.
val results = from(myTable)(s => select(s))
using(sessionA){
results.foreach(m => {
val newMsg = m.copy(msg = (m.msg+"World"))
using(sessionB){myTable.insert(newMsg)}
println("Inserted "+newMsg)
})
}
First off, I apologize for the lack of documentation. The using() construct is a new feature that is only available in SNAPSHOT builds. I actually talked to Max about some of the documentation issues for early adopters yesterday and we are working to fix them.
There isn't a way that I can think of to bind a specific Session to a Query. Looking at your example, it looks like an easy work around would be to invert your transactions. When you create a query, Squeryl doesn't actually access the DB, it just creates an AST representing the SQL to be performed, so you don't need to issue your using(sessionA) at that point. Then, when you are ready to iterate over the results you can wrap the query invocation in a using(sessionA) nested within your using(sessionB). Does that make sense?
Related
I'm having some issues when trying to execute a class function inside a "dataframe.foreach" function. My custom class is persisting the data into a DynamoDB table.
What happens is that if I have the following code, it won't work and will raise a "Null Pointer Exception" that points to the line of code where the "writer.writeRow(r)" is executed:
object writeToDynamoDB extends App {
val df: DataFrame = ...
val writer: DynamoDBWriter = new DDBWriter(...)
df
.foreach(
r => writer.writeRow(r)
)
}
If I use the same code, but having the code inside a code block or an if clause, it will work:
object writeToDynamoDB extends App {
val df: DataFrame = ...
if(true) {
val writer: DynamoDBWriter = new DDBWriter(...)
df
.foreach(
r => writer.writeRow(r)
)
}
}
I guess it has something to do with the variable scope. Even in IntelliJ the color of the variable is purple + Italic in the first case and "regular" grey in the second case. I read about it, and we have the method, field and local scope in Scala, but I'm can't relate that with what I'm trying to do.
Some questions after this introduction:
Can anyone explain why does Scala and/or Spark have this behaviour?
The solution here is to put some code inside a function, code block
or a "fake" if clause as far as I know. Is there any possible issue regarding Spark properties (shuffles, etc)?
Is there any other way to do this type of operations?
Hope I was clear.
Thanks in advance.
Regards
As said above, your issue is caused by delayed initialization when using the App trait. Spark docs strongly discourage that:
Note that applications should define a main() method instead of extending scala.App. Subclasses of scala.App may not work correctly.
The reason can be found in the Javadocs of the App trait itself:
It should be noted that this trait is implemented using the DelayedInit functionality, which means that fields of the object will not have been initialized before the main method has been executed.
This basically means that writer is still uninitialized (so null) by the time the closure passed to foreach is created.
If you put respective code into a block, writer becomes a local variable and is initialized at the time when the block is evaluated. That way your closure will contain the correct value of writer. In this case it doesn't matter anymore when the code is evaluated, because everything get's evaluated together.
The correct and recommended solution is to use a standard main method for your Spark applications:
object writeToDynamoDB {
def main(args: Array[String]): Unit = {
val df: DataFrame = ...
val writer: DynamoDBWriter = new DDBWriter(...)
df.foreach(r => writer.writeRow(r))
}
}
I am trying to prevent empty values being inserted into my mongoDB collection. The field in question looks like this:
MongoDB Field
"stadiumArr" : [
"Old Trafford",
"El Calderon",
...
]
Sample of (mapped) case class
case class FormData(_id: Option[BSONObjectID], stadiumArr: Option[List[String]], ..)
Sample of Scala form
object MyForm {
val form = Form(
mapping(
"_id" -> ignored(Option.empty[BSONObjectID]),
"stadiumArr" -> optional(list(text)),
...
)(FormData.apply)(FormData.unapply)
)
}
I am also using the Repeated Values functionality in Play Framework like so:
Play Template
#import helper._
#(myForm: Form[models.db.FormData])(implicit request: RequestHeader, messagesProvider: MessagesProvider)
#repeatWithIndex(myForm("stadiumArr"), min = 5) { (stadium, idx) =>
#inputText(stadium, '_label -> ("stadium #" + (idx + 1)))
}
This ensures that whether there are at least 5 values or not in the array; there will still be (at least) 5 input boxes created. However if one (or more) of the input boxes are empty when the form is submitted an empty string is still being added as value in the array, e.g.
"stadiumArr" : [
"Old Trafford",
"El Calderon",
"",
"",
""
]
Based on some other ways of converting types from/to the database; I've tried playing around with a few solutions; such as:
implicit val arrayWrite: Writes[List[String]] = new Writes[List[String]] {
def writes(list: List[String]): JsValue = Json.arr(list.filterNot(_.isEmpty))
}
.. but this isn't working. Any ideas on how to prevent empty values being inserted into the database collection?
Without knowing specific versions or libraries you're using it's hard to give you an answer, but since you linked to play 2.6 documentation I'll assume that's what you're using there. The other assumption I'm going to make is that you're using reactive-mongo library. Whether or not you're using the play plugin for that library or not is the reason why I'm giving you two different answers here:
In that library, with no plugin, you'll have defined a BSONDocumentReader and a BSONDocumentWriter for your case class. This might be auto-generated for you with macros or not, but regardless how you get it, these two classes have useful methods you can use to transform the reads/writes you have to another one. So, let's say I defined a reader and writer for you like this:
import reactivemongo.bson._
case class FormData(_id: Option[BSONObjectID], stadiumArr: Option[List[String]])
implicit val formDataReaderWriter = new BSONDocumentReader[FormData] with BSONDocumentWriter[FormData] {
def read(bson: BSONDocument): FormData = {
FormData(
_id = bson.getAs[BSONObjectID]("_id"),
stadiumArr = bson.getAs[List[String]]("stadiumArr").map(_.filterNot(_.isEmpty))
)
}
def write(formData: FormData) = {
BSONDocument(
"_id" -> formData._id,
"stadiumArr" -> formData.stadiumArr
)
}
}
Great you say, that works! You can see in the reads I went ahead and filtered out any empty strings. So even if it's in the data, it can be cleaned up. That's nice and all, but let's notice I didn't do the same for the writes. I did that so I can show you how to use a useful method called afterWrite. So pretend the reader/writer weren't the same class and were separate, then I can do this:
val initialWriter = new BSONDocumentWriter[FormData] {
def write(formData: FormData) = {
BSONDocument(
"_id" -> formData._id,
"stadiumArr" -> formData.stadiumArr
)
}
}
implicit val cleanWriter = initialWriter.afterWrite { bsonDocument =>
val fixedField = bsonDocument.getAs[List[String]]("stadiumArr").map(_.filterNot(_.isEmpty))
bsonDocument.remove("stadiumArr") ++ BSONDocument("stadiumArr" -> fixedField)
}
Note that cleanWriter is the implicit one, that means when the insert call on the collection happens, it will be the one chosen to be used.
Now, that's all a bunch of work, if you're using the plugin/module for play that lets you use JSONCollections then you can get by with just defining play json Reads and Writes. If you look at the documentation you'll see that the reads trait has a useful map function you can use to transform one Reads into another.
So, you'd have:
val jsonReads = Json.reads[FormData]
implicit val cleanReads = jsonReads.map(formData => formData.copy(stadiumArr = formData.stadiumArr.map(_.filterNot(_.isEmpty))))
And again, because only the clean Reads is implicit, the collection methods for mongo will use that.
NOW, all of that said, doing this at the database level is one thing, but really, I personally think you should be dealing with this at your Form level.
val form = Form(
mapping(
"_id" -> ignored(Option.empty[BSONObjectID]),
"stadiumArr" -> optional(list(text)),
...
)(FormData.apply)(FormData.unapply)
)
Mainly because, surprise surprise, form has a way to deal with this. Specifically, the mapping class itself. If you look there you'll find a transform method you can use to filter out empty values easily. Just call it on the mapping you need to modify, for example:
"stadiumArr" -> optional(
list(text).transform(l => l.filter(_.nonEmpty), l => l.filter(_.nonEmpty))
)
To explain a little more about this method, in case you're not used to reading the signatures in the scaladoc.
def
transform[B](f1: (T) ⇒ B, f2: (B) ⇒ T): Mapping[B]
says that by calling transform on some mapping of type Mapping[T] you can create a new mapping of type Mapping[B]. In order to do this you must provide functions that convert from one to the other. So the code above causes the list mapping (Mapping[List[String]]) to become a Mapping[List[String]] (the type did not change here), but when it does so it removes any empty elements. If I break this code down a little it might be more clear:
def convertFromTtoB(list: List[String]): List[String] = list.filter(_.nonEmpty)
def convertFromBtoT(list: List[String]): List[String] = list.filter(_.nonEmpty)
...
list(text).transform(convertFromTtoB, convertFromBtoT)
You might wondering why you need to provide both, the reason is because when you call Form.fill and the form is populated with values, the second method will be called so that the data goes into the format the play form is expecting. This is more obvious if the type actually changes. For example, if you had a text area where people could enter CSV but you wanted to map it to a form model that had a proper List[String] you might do something like:
def convertFromTtoB(raw: String): List[String] = raw.split(",").filter(_.nonEmpty)
def convertFromBtoT(list: List[String]): String = list.mkString(",")
...
text.transform(convertFromTtoB, convertFromBtoT)
Note that when I've done this in the past sometimes I've had to write a separate method and just pass it in if I didn't want to fully specify all the types, but you should be able to work from here given the documentation and type signature for the transform method on mapping.
The reason I suggest doing this in the form binding is because the form/controller should be the one with the concern of dealing with your user data and cleaning things up I think. But you can always have multiple layers of cleaning and whatnot, it's not bad to be safe!
I've gone for this (which always seems obvious when it's written and tested):
implicit val arrayWrite: Writes[List[String]] = new Writes[List[String]] {
def writes(list: List[String]): JsValue = Json.toJson(list.filterNot(_.isEmpty).toIndexedSeq)
}
But I would be interested to know how to
.map the existing Reads rather than redefining from scratch
as #cchantep suggests
I'm using Play 2 with Anorm to manage database access. A common pattern I find myself doing is this:
val (futureChecklists, jobsLookup) =
DB.withConnection { implicit connection =>
val futureChecklists = futureChecklistRepository.getAllHavingActiveTemplateAndNonNullNextRunDate()
val jobsLookup = futureChecklistJobRepository.getAllHavingActiveTemplateAndNonNullNextRunDate()
.groupBy(_.futureChecklist.id)
.withDefaultValue(List.empty)
(futureChecklists, jobsLookup)
}
Which seems kinda weird, because I have to repeat myself. It also gets a bit unruly if I have several variables I'll need in the outer scope, but I don't want to keep the connection open.
Is there an easy way to pass this information back without having to resort to using vars?
What I would like is something like:
val futureChecklists
val jobsLookup
DB.withConnection { implicit connection =>
futureChecklists = futureChecklistRepository.getAllHavingActiveTemplateAndNonNullNextRunDate()
jobsLookup = futureChecklistJobRepository.getAllHavingActiveTemplateAndNonNullNextRunDate()
.groupBy(_.futureChecklist.id)
.withDefaultValue(List.empty)
}
That way I don't have the same tuple at the beginning and end.
I am afraid there is no easy way not to duplicate the tuple declaration, but var is definitely not the way to go around it.
You're mentioning that it becomes weird and difficult with multiple variables at time which returned as a tuple. This indeed can become really tricky and error prone, especially then you end up having large N-tuples with the same parameter types. In that scenario I would consider having a dedicated contained i.e. a case class where you can reference variables by name and not by position in the tuple. The side benefit is that you can assign the whole container to a variable and reference it in the natural way.
Last but not least you don't mention much about your particular use case, but maybe it is worth considering having the 2 queries results obtained in the separate withConnection block. If you are using any collection pooling mechanism, then there is hardly any benefit having it in the same with connection block and with the separate blocks you might even get a flexibility to pararelize the DB queries using separate connections.
There are three ways that i came up with:
Return tuple immediately
val (users, posts) =
DB.withConnection { connection => (
connection.getUsers,
connection.getPosts
)}
I think this is OK for simple code and small numbers of vals. For more complex code and more vals this can be error prone. Someone can accidentally change order of elements in tuple on just one side of assignment, and assign data to wrong vals (which will be reported by compiler only if it also cause type mismatch).
Use anonymous class
val dbResult =
DB.withConnection { connection =>
new {
val users = connection.getUsers
val posts = connection.getPosts
}
}
If you like to have users and posts variables instead of dbResult.users and dbResult.posts you can:
import dbResult._
This solution is a little exotic, but it works just fine and is quite clean.
Use case class
First define case class for your return value:
case class DBResult(users: List[User], posts: List[Post])
and then use it:
val DBResult(users: List[User], posts: List[Post]) =
DB.withConnection { connection =>
DBResult(
users = connection.getUsers,
posts = connection.getPosts
)
}
This is best if you intend to reuse this case class multiple times.
Given rowParser of type RowParser[Photo], this is how you would parse a list of rows coming from a table photo, according to the code samples I have seen so far:
def getPhotos(album: Album): List[Photo] = DB.withConnection { implicit c =>
SQL("select * from photo where album = {album}").on(
'album -> album.id
).as(rowParser *)
}
Where the * operator creates a parser of type ResultSetParser[List[Photo]]. Now, I was wondering if it was equally possible to get a parser that yields a Stream (thinking that being more lazy is always better), but I only came up with this:
def getPhotos(album: Album): Stream[Photo] = DB.withConnection { implicit c =>
SQL("select * from photo where album = {album}").on(
'album -> album.id
)() collect (rowParser(_) match { case Success(photo) => photo })
}
It works, but it seems overly complicated. I could of course just call toStream on the List I get from the first function, but my goal was to only apply rowParser on rows that are actually read. Is there an easier way to achieve this?
EDIT: I know that limit should be used in the query, if the number of rows of interest is known beforehand. I am also aware that, in many cases, you are going to use the whole result anyway, so being lazy will not improve performance. But there might be a case where you save a few cycles, e.g. if for some reason, you have search criteria that you cannot or do not want to express in SQL. So I thought it was odd that, given the fact that anorm provides a way to obtain a Stream of SqlRow, I didn't find a straightforward way to apply a RowParser on that.
I ended up creating my own stream method which corresponds to the list method:
def stream[A](p: RowParser[A]) = new ResultSetParser[Stream[A]] {
def apply(rows: SqlParser.ResultSet): SqlResult[Stream[A]] = rows.headOption.map(p(_)) match {
case None => Success(Stream.empty[A])
case Some(Success(a)) => {
val s: Stream[A] = a #:: rows.tail.flatMap(r => p(r) match {
case Success(r) => Some(r)
case _ => None
})
Success(s)
}
case Some(Error(msg)) => Error(msg)
}
}
Note that the Play SqlResult can only be either Success/Error while each row can also be Success/Error. I handle this for the first row only, assuming the rest will be the same. This may or may not work for you.
You're better off making smaller (paged) queries using limit and offset.
Anorm would need some modification if you're going to keep your (large) result around in memory and stream it from there. Then the other concern would be the new memory requirements for your JVM. And how would you deal with caching on the service level? See, previously you could easily cache something like photos?page=1&size=10, but now you just have photos, and the caching technology would have no idea what to do with the stream.
Even worse, and possibly on a JDBC-level, wrapping Stream around limited and offset-ed execute statements and just making multiple calls to the database behind the scenes, but this sounds like it would need a fair bit of work to port the Stream code that Scala generates to Java land (to work with Groovy, jRuby, etc), then get it on the approved for the JDBC 5 or 6 roadmap. This idea will probably be shunned as being too complicated, which it is.
You could wrap Stream around your entire DAO (where the limit and offset trickery would happen), but this almost sounds like more trouble than it's worth :-)
I ran into a similar situation but ran into a Call Stack Overflow exception when the built-in anorm function to convert to Streams attempted to parse the result set.
In order to get around this I elected to abandon the anorm ResultSetParser paradigm, and fall back to the java.sql.ResultSet object.
I wanted to use anorm's internal classes for the parsing result set rows, but, ever since version 2.4, they have made all of the pertinent classes and methods private to their package, and have deprecated several other methods that would have been more straight-forward to use.
I used a combination of Promises and Futures to work around the ManagedResource that anorm now returns. I avoided all deprecated functions.
import anorm._
import java.sql.ResultSet
import scala.concurrent._
def SqlStream[T](sql:SqlQuery)(parse:ResultSet => T)(implicit ec:ExecutionContext):Future[Stream[T]] = {
val conn = db.getConnection()
val mr = sql.preparedStatement(conn, false)
val p = Promise[Unit]()
val p2 = Promise[ResultSet]()
Future {
mr.map({ stmt =>
p2.success(stmt.executeQuery)
Await.ready(p.future, duration.Duration.Inf)
}).acquireAndGet(identity).andThen { case _ => conn.close() }
}
def _stream(rs:ResultSet):Stream[T] = {
if (rs.next()) parse(rs) #:: _stream(rs)
else {
p.success(())
Stream.empty
}
}
p2.future.map { rs =>
rs.beforeFirst()
_stream(rs)
}
}
A rather trivial usage of this function would be something like this:
def getText(implicit ec:ExecutionContext):Future[Stream[String]] = {
SqlStream(SQL("select FIELD from TABLE")) { rs => rs.getString("FIELD") }
}
There are, of course, drawbacks to this approach, however, this got around my problem and did not require inclusion of any other libraries.
I have a document in MongoDB that looks like this:
{"_id":"asdf", "data":[
{"a":"1","b":"2"},
{"a":"3","b":"4"},
{"a":"5","b":"6"},
]}
I would like to query that object using Scala, and convert the entries in "data" into a list of case classes. After a few hours' work, I've yet to come up with something that even compiles. Can someone point me to a tutorial with this information? This tutorial hasn't been any help. I've tried every combination of nested maps, fors, foreaches, casts, and pattern matching that I can come up with.
Edit: My super-ugly but now seemingly working code is now this:
def getData(source_id:String) = {
val source = collection.findOne(MongoDBObject("_id" -> source_id)).get
val data = source.get("data").asInstanceOf[BasicDBList]
var ret:List[Data] = List()
val it = presses.iterator
while(it.hasNext) {
val item = it.next.asInstanceOf[BasicDBObject]
ret = Data(
item.get("a").asInstanceOf[String],
item.get("b").asInstanceOf[String]
) :: ret
}
ret
}
Please, someone tell me there's a better way.
As you are using case classes anyway, the easiest solution is to just use salat – it will automatically serialize/deserialize to and from a mongo connection with very little boilerplate.
A minor point, but in your code you should be able to simply map across the DBObject holding structure rather than manually mutate the ret variable:
val ret = presses.map { item => Data(…) }
you may need to call .toList if you really want a List (though you may only need Seq or Iterable)