ReactiveMongo database dump with Play Framework 2.5 - scala

I'm trying to dump my Mongo database into a JSON object, but because my queries to the database are asynchronous I'm having problems.
Each collection in my database contains user data and each collection name is a user name.
So, when I want to get all my users' data, I recover all the collection names and then loop over them to recover each collection one by one.
def databaseDump(prom: Promise[JsObject]) = {
  val res = for {
    dbUsers <- getUsers
  } yield dbUsers
  var rebuiltJson = Json.obj()
  var array = JsArray()
  res.map { users =>
    users.map { userNames =>
      if (userNames.size == 0) {
        prom failure new Throwable("Empty database")
      }
      var counter = 0
      userNames.foreach { username =>
        getUserTables(username).map { tables =>
          /* Add data to array */
          ...
          counter += 1
          if (counter == userNames.size) {
            /* Add data to new json */
            ...
            prom success rebuiltJson
          }
        }
      }
    }
  }
}
This kind of works, but sometimes the promise is successfully completed even though not all the data has been recovered yet. This is because my counter variable is not a reliable solution.
Is there a way to loop over all the users, query the database and wait for all the data to be recovered before successfully completing the promise? I tried to use a for-comprehension but didn't find a way to do it. Is there a way to dump a whole Mongo DB into one JSON object: { username : data, username : data ..} ?

The users/tables terminology was getting me confused, so I wrote a new function that dumps a database into a single JsObject.
// helper function to find all documents inside a collection c
// and return them as a single JsArray
def getDocs(c: JSONCollection)(implicit ec: ExecutionContext) =
  c.find(Json.obj()).cursor[JsObject]().jsArray()
def dumpToJsObject(db: DefaultDB)(implicit ec: ExecutionContext): Future[JsObject] = {
  // get a list of all collections in the db
  val collectionNames = db.collectionNames
  val collections = collectionNames.map(_.map(db.collection[JSONCollection](_)))
  // each entry is a tuple collectionName -> content (as JsArray)
  val namesToDocs = collections.flatMap { colls =>
    Future.sequence(colls.map(c => getDocs(c).map(c.name -> _)))
  }
  // convert to a single JsObject
  namesToDocs.map(JsObject(_))
}
I haven't tested it yet (I will do so later), but this function should at least give you the general idea. You get the list of all collections inside the database. For each collection, you perform a query to get all documents inside that collection. The list of documents is converted into a JsArray, and finally all collections are composed to a single JsObject with the collection names as keys.
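The shape of that composition can be tried without a database. The sketch below uses only the standard library: collectionNames and docsOf are hypothetical stand-ins for the ReactiveMongo calls, and fakeDb is made-up data. The point it illustrates is that Future.sequence completes only once every per-collection query has completed, so no counter or Promise is needed.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object DumpSketch {
  // Made-up data standing in for a database: collection name -> documents
  private val fakeDb: Map[String, List[String]] = Map(
    "alice" -> List("doc1", "doc2"),
    "bob"   -> List("doc3")
  )

  // Hypothetical stand-in for db.collectionNames
  def collectionNames: Future[List[String]] =
    Future(fakeDb.keys.toList.sorted)

  // Hypothetical stand-in for getDocs: all documents of one collection
  def docsOf(name: String): Future[List[String]] =
    Future(fakeDb(name))

  // One query per collection; the result completes when all of them have completed
  def dump: Future[Map[String, List[String]]] =
    for {
      names <- collectionNames
      pairs <- Future.sequence(names.map(n => docsOf(n).map(n -> _)))
    } yield pairs.toMap

  def main(args: Array[String]): Unit =
    // Blocking is only for the demo; a Play action would return the Future itself
    println(Await.result(dump, 5.seconds))
}
```

If any single query fails, the Future returned by dump fails too, which replaces the manual prom failure call from the question.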

If the goal is to write the data to an output stream (a local file or the network) through side effects, the documents can be handled one by one as the cursor is folded:
import scala.concurrent.{ ExecutionContext, Future }
import reactivemongo.bson.BSONDocument
import reactivemongo.api.{ Cursor, MongoDriver, MongoConnection }
val mongoUri = "mongodb://localhost:27017/my_db"
val driver = new MongoDriver
val maxDocs = Int.MaxValue // max per collection
// Requires to have an ExecutionContext in the scope
// (e.g. `import scala.concurrent.ExecutionContext.Implicits.global`)
def dump()(implicit ec: ExecutionContext): Future[Unit] = for {
  uri <- Future.fromTry(MongoConnection.parseURI(mongoUri))
  con = driver.connection(uri)
  dn <- Future(uri.db.get)
  db <- con.database(dn)
  cn <- db.collectionNames
  _ <- Future.sequence(cn.map { collName =>
    println(s"Collection: $collName")
    db.collection(collName).find(BSONDocument.empty). // findAll
      cursor[BSONDocument]().foldWhile({}, maxDocs) { (_, doc) =>
        // Replace println by appropriate side-effect
        Cursor.Cont(println(s"- ${BSONDocument pretty doc}"))
      }
  })
} yield ()
If using the JSON serialization pack, just replace BSONDocument with JsObject (e.g. BSONDocument.empty ~> Json.obj()).
If testing from the Scala REPL, after pasting the previous code, it can be executed as follows.
dump().onComplete {
  case result =>
    println(s"Dump result: $result")
    //driver.close()
}

Related

How to save the outcome of collection.find into Array

object ConnHelper extends Serializable {
  lazy val jedis = new Jedis("localhost")
  lazy val mongoClient = MongoClient("mongodb://localhost:27017/recommender")
}
val ratingCollection = ConnHelper.mongoClient.getDatabase(mongoConfig.db).getCollection(MONGODB_RATING_COLLECTION)
val Existratings: Observable[Option[BsonValue]] = ratingCollection
  .find(equal("userId", 1234))
  .map { item =>
    item.get("productId")
  }
The documents look like this:
{
  "id":****,
  "userId":4567,
  "productId":12345,
  "score":5.0
}
I use Scala and Mongo-Scala-driver 2.9.0 to connect to MongoDB and find the documents where the "userId" field equals 1234. I then want to save the "productId" values of those documents into an Array, but the returned value is of type Observable.
Could anyone tell me how to save the query outcome into an Array? I would appreciate it very much.
Please try a method which uses a Promise/Future structure to find the sequence of documents that match the search criteria. For example:
import org.mongodb.scala._
import org.mongodb.scala.bson._
import org.mongodb.scala.model.Filters.equal
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration.Duration
def find(search_key: String, search_value: String, collection_name: String): Seq[Document] = {
  // The application will need to wait for the find operation thread to complete
  // in order to process the returned value.
  log.debug(s"Starting database find_all operation thread")
  // Set up new client connection, database, and collection
  val _client: MongoClient = MongoClient(config_client)
  val _database: MongoDatabase = _client.getDatabase(config_database)
  val collection: MongoCollection[Document] = _database.getCollection(collection_name)
  // Set up result sequence
  var result_seq: Seq[Document] = Seq.empty
  // Set up Promise container to wait for the database operation to complete
  val promise = Promise[Boolean]
  // Start find operation thread; once the thread has finished, read resulting documents.
  collection.find(equal(search_key, search_value)).collect().subscribe((results: Seq[Document]) => {
    log.trace(s"Found operation thread completed")
    // Append found documents to the results
    result_seq = result_seq ++ results
    log.trace(s" Result sequence: $result_seq")
    promise.success(true) // set Promise container
    _client.close() // close client connection to avoid memory leaks
  })
  val future = promise.future // Promise completion result
  Await.result(future, Duration.Inf) // wait for the promise completion result
  // Return document sequence
  result_seq
}
Then you can iterate through the document sequence and pull the products into a List (better than Array).
def read: List[String] = {
  val document_seq = Database.find("userID", "1234", collection)
  // Set up an empty return map
  val return_map: mutable.Map[String, String] = mutable.Map.empty
  // Translate data from each document into Product object
  document_seq.foreach(_document => {
    return_map.put(
      _document("id").asString.getValue,
      _document("productId").asString.getValue
    )
  })
  // Convert values to list map and return
  return_map.values.toList
}
The Mongo Scala Driver uses the Observable model, which is composed of three parts.
You need to subscribe an observer to the observable. Take a look at the examples.
The fastest solution is to complete the chain with a toFuture call:
val Existratings =
  ratingCollection
    .find(equal("userId", 1234))
    .map { item =>
      item.get("productId")
    }.toFuture()
That will return a Future of a Seq of BsonValues holding the result set.
Maybe this:
val productIds = ratingCollection
  .find(equal("userId", 1234))
  .map { _.get("productId") }
  .toArray
The most direct solution to get an Array is to fold directly into one:
ratingCollection
  .find(???)
  .map { ??? }
  .foldLeft(Array.empty[Item]) { _ :+ _ }
  .head() // in order to get a Future[Array[Item]]
  .onComplete {
    case Success(values: Array[Item]) => // capture the array
    case Failure(exception) => // fail logic
  }
It's probably best to work with the Future rather than build your own Observer logic for subscription.
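Whichever variant produces the Future, getting from a Future of a Seq of Options to an Array is ordinary collection code. Here is a minimal sketch using only the standard library. fetchProductIds is a hypothetical stand-in for the toFuture() call, and plain Ints stand in for BsonValues.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ToArraySketch {
  // Hypothetical stand-in for ratingCollection.find(...).map(_.get("productId")).toFuture();
  // None models a document without a "productId" field
  def fetchProductIds(): Future[Seq[Option[Int]]] =
    Future(Seq(Some(12345), None, Some(67890)))

  // Drop the missing values and copy the rest into an Array, still inside the Future
  def productIdArray(): Future[Array[Int]] =
    fetchProductIds().map(_.flatten.toArray)

  def main(args: Array[String]): Unit = {
    // Blocking is only for the demo; in an application keep composing on the Future
    val ids = Await.result(productIdArray(), 5.seconds)
    println(ids.mkString(", "))
  }
}
```

Await.result is used here only to print the result; blocking with Duration.Inf, as in the Promise-based answer, should be a last resort.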

Slick update operation returns before object is flushed in the database

I am experiencing a scenario where when I fetch an object immediately after updating it, sometimes the result I get from the DB does not contain the most recent changes.
This has led me to think that the update thread returns before the object is actually committed in the DB. Is this expected behavior?
I would think that the update method would only return after the changes have been successfully flushed to the DB; however, it looks like this is not guaranteed.
Below is pseudo code demonstrating what I am talking about.
def processObject = {
  for {
    objectId: Option[Long] <- saveObjectInDb
    _ <- {
      // perform other synchronous business logic and then update created object details
      dao.findById(objectId.get).map { objectOption: Option[MyObject] =>
        dao.update(objectOption.get.copy(processingStep = "third-step"))
      }
    }
    mostRecentMyObject <- dao.findById(objectId.get)
  } yield mostRecentMyObject
}
Below is what my update logic looks like:
def update(myObject: MyObject): Future[Int] = {
  db.run(table.filter(_.id === myObject.id).update(myObject))
}
The problem is that you are not taking into account the inner Future returned by the update method.
Given the signature of findById:
def findById(id: Long): Future[Option[MyObject]]
the snippet:
dao.findById(objectId.get).map { objectOption: Option[MyObject] =>
  dao.update(objectOption.get.copy(processingStep = "third-step"))
}
will give an object of type Future[Future[Int]]. The outer Future completes as soon as the update has been started, not when it has finished.
You should use flatMap instead of map over the findById future, like so:
dao.findById(objectId.get).flatMap { objectOption: Option[MyObject] =>
  dao.update(objectOption.get.copy(processingStep = "third-step"))
}
This flattens to a single future (Future[Int]), so you can be sure the object is retrieved only once the update has completed.
Moreover, you can rewrite this as:
def processObject = {
  for {
    objectId: Option[Long] <- saveObjectInDb
    objectOption <- dao.findById(objectId.get)
    _ <- dao.update(objectOption.get.copy(processingStep = "third-step"))
    mostRecentMyObject <- dao.findById(objectId.get)
  } yield mostRecentMyObject
}
because, in a for-comprehension, <- is syntactic sugar for flatMap (and for map on the last generator).
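The difference in ordering can be reproduced without Slick. The following sketch uses only the standard library. FakeDao is a hypothetical in-memory stand-in whose update takes effect only when commit() is called, which makes the stale read deterministic rather than timing-dependent.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FlatMapSketch {
  // Hypothetical stand-in for the DAO: update only takes effect once commit() is called
  final class FakeDao {
    private val step = new AtomicReference("first-step")
    private val gate = Promise[Unit]()
    def find(): Future[String] = Future(step.get())
    def update(newStep: String): Future[Int] =
      gate.future.map { _ => step.set(newStep); 1 }
    def commit(): Unit = gate.success(())
  }

  // map: the Future[Int] from update is dropped, so the second find does not wait for it
  def withMap(dao: FakeDao): Future[String] =
    dao.find().map(_ => dao.update("third-step")).flatMap(_ => dao.find())

  // flatMap: the second find runs only after update has completed
  def withFlatMap(dao: FakeDao): Future[String] =
    dao.find().flatMap(_ => dao.update("third-step")).flatMap(_ => dao.find())

  def main(args: Array[String]): Unit = {
    val d1 = new FakeDao
    // The update is still pending here, yet the chain already completes
    println(Await.result(withMap(d1), 2.seconds))

    val d2 = new FakeDao
    val chained = withFlatMap(d2) // cannot complete before the update does
    d2.commit()
    println(Await.result(chained, 2.seconds))
  }
}
```

With map the chain yields the old value "first-step" although the update was never committed. With flatMap the chain stays incomplete until commit() and then yields "third-step".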

Why a Thread.sleep or closing the connection is required after waiting for a remove call to complete?

I'm again asking you to share your wisdom with me, the Scala padawan!
I'm playing with ReactiveMongo in Scala, and while I was writing a test using ScalaTest, I ran into the following issue.
First the code:
"delete" when {
  "passing an existent id" should {
    "succeed" in {
      val testRecord = TestRecord(someString)
      Await.result(persistenceService.persist(testRecord), Duration.Inf)
      Await.result(persistenceService.delete(testRecord.id), Duration.Inf)
      Thread.sleep(1000) // Why do I need this to make the test succeed?
      val thrownException = intercept[RecordNotFoundException] {
        Await.result(persistenceService.read(testRecord.id), Duration.Inf)
      }
      thrownException.getMessage should include(testRecord._id.toString)
    }
  }
}
And the read and delete methods with the code initializing connection to db (part of the constructor):
class MongoPersistenceService[R](url: String, port: String, databaseName: String, collectionName: String) {
  val driver = MongoDriver()
  val parsedUri: Try[MongoConnection.ParsedURI] = MongoConnection.parseURI("%s:%s".format(url, port))
  val connection: Try[MongoConnection] = parsedUri.map(driver.connection)
  val mongoConnection = Future.fromTry(connection)
  def db: Future[DefaultDB] = mongoConnection.flatMap(_.database(databaseName))
  def collection: Future[BSONCollection] = db.map(_.collection(collectionName))
  def read(id: BSONObjectID): Future[R] = {
    val query = BSONDocument("_id" -> id)
    val readResult: Future[R] = for {
      coll <- collection
      record <- coll.find(query).requireOne[R]
    } yield record
    readResult.recover {
      case NoSuchResultException => throw RecordNotFoundException(id)
    }
  }
  def delete(id: BSONObjectID): Future[Unit] = {
    val query = BSONDocument("_id" -> id)
    // first read then call remove. Read will throw if not present
    read(id).flatMap { (_) => collection.map(coll => coll.remove(query)) }
  }
}
So to make my test pass, I had to add a Thread.sleep right after waiting for the delete to complete. Knowing this is evil, usually punished by many lashes, I want to learn and find the proper fix here.
While trying other things, I found that instead of waiting, entirely closing the connection to the db also did the trick...
What am I misunderstanding here? Should a connection to the db be opened and closed for each call to it, rather than doing many actions like adding, removing and updating records with one connection?
Note that everything works fine when I remove the read call in my delete function. Also, by closing the connection, I mean calling close on the MongoDriver from my test and also stopping and restarting the embedded Mongo which I'm using in the background.
Thanks for helping, guys.
Warning: this is a blind guess, I have no experience with MongoDB on Scala.
You may have forgotten to flatMap.
Take a look at this bit:
collection.map(coll => coll.remove(query))
Since collection is Future[BSONCollection] per your code and remove returns Future[WriteResult] per the docs, the actual type of this expression is Future[Future[WriteResult]].
Now, you have annotated your function as returning Future[Unit]. Scala often produces Unit as a return value by throwing away possibly meaningful values, which it does in your case:
read(id).flatMap { (_) =>
  collection.map(coll => {
    coll.remove(query) // we didn't wait for removal
    ()                 // before returning unit
  })
}
So your code should probably be
read(id).flatMap(_ => collection.flatMap(_.remove(query).map(_ => ())))
Or a for-comprehension:
for {
  _ <- read(id)
  coll <- collection
  _ <- coll.remove(query)
} yield ()
You can make Scala warn you about discarded values by adding a compiler flag (assuming SBT):
scalacOptions += "-Ywarn-value-discard"

Make CRUD operations with ReactiveMongo

I have started to learn Scala recently and am trying to create a simple API using Akka HTTP and ReactiveMongo.
I have problems with simple operations. I have spent a lot of time digging through docs, official tutorials, Stack Overflow, etc. I am probably missing something very simple.
My code:
object MongoDB {
  val config = ConfigFactory.load()
  val database = config.getString("mongodb.database")
  val servers = config.getStringList("mongodb.servers").asScala
  val credentials = List(Authenticate(database, config.getString("mongodb.userName"), config.getString("mongodb.password")))
  val driver = new MongoDriver
  val connection = driver.connection(servers, authentications = credentials)
  //val db = connection.database(database)
}
Now I would like to make basic CRUD operations. I am trying different code snippets but can't get it working.
Here are some examples:
object TweetManager {
  import MongoDB._
  // taken from docs
  val collection = connection.database("test").
    map(_.collection("tweets"))
  val document1 = BSONDocument(
    "author" -> "Tester",
    "body" -> "test"
  )
  // taken from the reactivemongo tutorial; it had an extra BSONCollection parameter, but I can't find a way of getting one
  def insertDoc1(doc: BSONDocument): Future[Unit] = {
    // another try at getting the collection
    //def collection = for ( db1 <- db) yield db1.collection[BSONCollection]("tweets")
    val writeRes: Future[WriteResult] = collection.insert(doc)
    writeRes.onComplete { // Dummy callbacks
      case Failure(e) => e.printStackTrace()
      case Success(writeResult) =>
        println(s"successfully inserted document with result: $writeResult")
    }
    writeRes.map(_ => {}) // in this example, do nothing with the success
  }
}
insertDoc1(document1)
I can't do any operation on the collection. IDE gives me: "cannot resolve symbol". Compiler gives error:
value insert is not a member of scala.concurrent.Future[reactivemongo.api.collections.bson.BSONCollection]
What is the correct way of doing it?
You are trying to call the insert operation on a Future[Collection] rather than on the underlying collection (calling an operation on Future[T] rather than on T is not specific to ReactiveMongo).
It's recommended to have a look at the documentation.
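In practice this means taking the collection out of the Future with flatMap, or a for-comprehension, before calling insert on it. Below is a minimal sketch of that pattern with the standard library only. FakeCollection and its insert are hypothetical stand-ins, not ReactiveMongo's API, whose insert signature differs between versions.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object InsertSketch {
  // Hypothetical stand-in for a collection with an asynchronous insert
  final class FakeCollection(val name: String) {
    @volatile private var docs: Vector[Map[String, String]] = Vector.empty
    def insert(doc: Map[String, String]): Future[Int] =
      Future { synchronized { docs = docs :+ doc }; 1 }
    def size: Int = docs.size
  }

  val tweets = new FakeCollection("tweets")

  // Same shape as connection.database("test").map(_.collection("tweets")):
  // a Future of a collection, not a collection
  val collection: Future[FakeCollection] = Future.successful(tweets)

  def insertDoc(doc: Map[String, String]): Future[Unit] =
    for {
      coll <- collection        // unwrap the Future[FakeCollection]
      _    <- coll.insert(doc)  // insert is now called on the collection itself
    } yield ()

  def main(args: Array[String]): Unit = {
    Await.result(insertDoc(Map("author" -> "Tester", "body" -> "test")), 5.seconds)
    println(s"documents in ${tweets.name}: ${tweets.size}")
  }
}
```

The for-comprehension desugars to collection.flatMap(coll => coll.insert(doc).map(_ => ())), so the returned Future[Unit] completes only after the insert has finished.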

Higher order functions with Scala Slick for DRY goodness

I have an idea of what my data access layer with Scala Slick should look like, but I'm not sure if it's really possible.
Let's assume I have a User table which has the usual fields like id, email, password, etc.
object Users extends Table[(String, String, Option[String], Boolean)]("User") {
  def id = column[String]("id", O.PrimaryKey)
  def email = column[String]("email")
  def password = column[String]("password")
  def active = column[Boolean]("active")
  def * = id ~ email ~ password.? ~ active
}
And I wish to query them in different ways, currently the ugly way is to have a new database session, do the for comprehension and then do different if statements to achieve what I want.
e.g.
def getUser(email: String, password: String): Option[User] = {
  database withSession { implicit session: Session =>
    val queryUser = (for {
      user <- Users
      if user.email === email &&
        user.password === password &&
        user.active === true
    } //yield and map to user class, etc...
  }
def getUser(identifier: String): Option[User] = {
  database withSession { implicit session: Session =>
    val queryUser = (for {
      user <- Users
      if user.id === identifier &&
        user.active === true
    } //yield and map to user class, etc...
  }
What I would prefer is to have a private method for the query and then public methods which define queries along the lines of
type UserQuery = User => Boolean
private def getUserByQuery(whereQuery: UserQuery): Option[User] = {
database withSession { implicit session: Session =>
val queryUser = (for {
user <- Users
somehow run whereQuery here to filter
} // yield and boring stuff
}
def getUserByEmailAndPassword(email, pass){ ... define by query and call getUserByQuery ...}
getUserById(id){….}
getUserByFoo{….}
That way, the query logic is encapsulated in the relevant public functions, and the actual querying and mapping to the user object is in a reusable function that other people don't need to be concerned with.
The problem I have is refactoring the "where" bits into functions that I can pass around. Selecting them in IntelliJ and using its refactoring results in some pretty crazy typing.
Does anyone have any examples they could show of doing something close to what I am trying to achieve?
1) Wrapping queries in a def means the query statement is re-generated on every single request and, since query params are not bound, no prepared statement is passed to the underlying DBMS.
2) You're not taking advantage of composition.
Instead, if you define parameterized query vals that def query wrappers call, you can get the best of both worlds.
val uBase = for {
  u <- Users
  ur <- UserRoles if u.id is ur.UserID
} yield (u, ur)
// composition: generates prepared statement one time, on startup
val byRole = for {
  roleGroup <- Parameters[String]
  (u, ur) <- uBase
  r <- Roles if (r.roleGroup is roleGroup) && (r.id is ur.roleID)
} yield u
def findByRole(roleGroup: RoleGroup): List[User] = {
  db withSession { implicit ss: SS =>
    byRole(roleGroup.toString).list
  }
}
If you need one-off finders for a single property, use:
val byBar = Foo.createFinderBy(_.bar)
val byBaz = Foo.createFinderBy(_.baz)
Can't remember where, maybe on SO, or Slick user group, but I did see a very creative solution that allowed for multiple bound params, basically a createFinderBy on steroids. Not so useful to me though, as the solution was limited to a single mapper/table object.
At any rate, composing for-comprehensions seems to do what you're trying to do.
I have recently done something similar. One way to do this is to write a general select method which takes a predicate:
def select(where: Users.type => Column[Boolean]): Option[User] = {
  database withSession { implicit session: Session =>
    val queryUser = (for {
      user <- Users if where(user)
    } //yield and map to user class, etc...
  }
and then write the method which passes the actual predicate as a higher order function
def getUserByEmail(email: String): Option[User] = {
  select((u: Users.type) => u.*._2 === email)
}
similarly
def getActiveUserByEmail(email: String): Option[User] = {
  select((u: Users.type) => u.*._4 === true && u.*._2 === email)
}
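The same "one private runner, many public predicates" structure can be tried in plain Scala before wiring it to Slick. The sketch below is an in-memory analogy only: the User case class, the sample rows and the and combinator are assumptions made for illustration. With Slick the predicate would be a function returning Column[Boolean], as in the select method above, so that it is translated to SQL instead of being evaluated in memory.

```scala
object PredicateSketch {
  final case class User(id: String, email: String, password: Option[String], active: Boolean)

  // A query is just a predicate on User, as in the question's `type UserQuery`
  type UserQuery = User => Boolean

  // Made-up rows standing in for the User table
  private val users = List(
    User("1", "a@example.com", Some("secret"), active = true),
    User("2", "b@example.com", None, active = false)
  )

  // The single place that actually runs a query
  private def getUserByQuery(where: UserQuery): Option[User] = users.find(where)

  // Reusable building blocks
  val isActive: UserQuery = _.active
  def and(p: UserQuery, q: UserQuery): UserQuery = u => p(u) && q(u)

  // Public finders only describe the "where" part
  def getUserById(id: String): Option[User] =
    getUserByQuery(and(_.id == id, isActive))

  def getUserByEmailAndPassword(email: String, password: String): Option[User] =
    getUserByQuery(and(u => u.email == email && u.password.contains(password), isActive))

  def main(args: Array[String]): Unit = {
    println(getUserById("1"))
    println(getUserByEmailAndPassword("a@example.com", "wrong"))
  }
}
```

Keeping isActive as a separate value means the "active" condition is written once and combined into every finder, which is the duplication the question wanted to remove.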