I successfully inserted data into a mongodb database, but I don't know how to extract data out of a query. I use the default scala mongodb drive :
"org.mongodb.scala" %% "mongo-scala-driver" % "1.1.1"
The documentation seems to contains errors, by the way. This line rises a compilation error while this is copy pasted from the doc :
collection.find().first().printHeadResult()
This is how I query a collection:
collection.find()
How to convert it to a scala collection of object on which I can iterate and process ? Thanks
Yes, I agree with the compilation error. I think "collection.find().first().printHeadResult()" is not part of scala driver 1.1.1 release. The current scala driver github which uses this code is "1.2.0-SNAPSHOT" version.
You can get the results using the below code. However, you may experience some async behavior using the below code. Please refer the driver documentation.
val observable: FindObservable[Document] = collection.find();
observable.subscribe ( new Observer[Document] {
override def onNext(result: Document): Unit = println(result.toJson())
override def onError(e: Throwable): Unit = println("Failed" + e.getMessage)
override def onComplete(): Unit = println("Completed")
})
Mongo driver Observables link
This is answered from the best of my current knowledge. I spent a lot of time using casbah and I've recently switched to using the new async scala driver, so there may be more ergonomic ways to do some of this stuff that I don't know yet.
Basically you need to transform the result of the observable and then eventually, you'll probably want to turn it into something that isn't an observable so you can have synchronous code interact with it (maybe, depending on what you're doing).
In the current Mongo Scala API (2.7.0 as of writing this), you might process a list of documents like this:
coll.find(Document("head" -> 1)).map(dbo => dbo.getInteger("head"))
That takes the list of documents where head is equal to one and then applies the map function to convert it from the Document (dbo) into an Int by extracting the "head" element (note this will fall in ugly fashion if there isn't a head field or the field is not an int. There are more robust ways to get values out using get[T]).
You can find a full list of the operations that an Observable supports here:
https://mongodb.github.io/mongo-scala-driver/2.7/reference/observables/
under the list of Monadic operators.
The other part is how do you get the good stuff out the Observable because you want to do something synchronous with them. The best answer I have found so far is to dump the Observable into a Future and then calling Await.result on that.
val e = coll.find(Document("head" -> 1)).map(dbo => dbo.getInteger("head"))
val r = Await.result(e.toFuture(), Duration.Inf)
println(r)
That will print out the List[Int] that was created by evaluating the map function for each Document in the Observable.
Related
trying to upsert with the new Scala Async Driver using this code, but the DB never gets created even though this is called many times:
override def insertOrUpdateOne(user: UserNew): Future[Unit] = {
users.replaceOne(equal("_id", user._id), user)
.toFuture
.map(_ => ())
}
mongo client created fine, collection created fine, but the above causes no write to be performed in fact the db is never created (which should happen on the first write). I have other code that uses casbah and that is working synchronously.
users is the Mongo collection with no syntactic sugar from me. (edited)
toFuture seems to do the subscribe (which should trigger the write?) but it still doesn't work. I'm mixing some collections with casbah and one with the async client, the casbah ones work but not the one above using async.
Any ideas?
it should be
override def insertOrUpdateOne(user: UserNew): Future[Unit] = {
users.replaceOne(equal("_id", user._id), user, UpdateOptions().upsert(true))
.toFuture
.map(_ => ())
}
Nothing in the docs but when I read the code I see it defaults upsert to false.
BTW this allows us to ignore all the Observable and subscribe stuff and setup of callbacks. Just process the results in a Scala idiomatic someFuture.map() This makes the UsersDaonon-Mongo specific code as an interface. This in turn allows all code in the UsersDaoImplto handle Mongo specific stuff, which enables the Impl to be injected so a runtime choice can be made about which Impl to use. Mongo can be replaced without dependent code caring or knowing about the Impl.
I tried to use the Slick(3.0.2) to operate database in my project with scala.
Here is my part of code
val query = table.filter(_.username === "LeoAshin").map { user => (user.username, user.password, user.role, user.delFlg) }
val f = db.run(query.result)
How can I read the data from "f"
I have tried google the solution many times, but no answer can solved my confuse
Thanks a lot
f is a Future and there are several things you can do to get at the value, depending on just how urgently you need it. If there's nothing else you can do while you're waiting then you will just have to wait for it to complete and then get the value. Perhaps the easiest is along the following lines (from the Slick documentation):
val q = for (c <- coffees) yield c.name
val a = q.result
val f: Future[Seq[String]] = db.run(a)
f.onSuccess { case s => println(s"Result: $s") }
You could then go on to do other things that don't depend on the result of f and the result of the query will be printed on the console asynchronously.
However, most of the time you'll want to use the value for some other operation, perhaps another database query. In that case, the easiest thing to do is to use a for comprehension. Something like the following:
for (r <- f) yield db.run(q(r))
where q is a function/method that will take the result of your first query and build another one. The result of this for comprehension will also be a Future, of course.
One thing to be aware of is that if you are running this in a program that will do an exit after all the code has been run, you will need to use Await (see the Scala API) to prevent the program exiting while one of your db queries is still working.
Type of query.result is DBIO. When you call db.run, it turns into Future.
If you want to print the data, use
import scala.concurrent.ExecutionContext.Implicits.global
f.foreach(println)
To continue working with data, use f.map { case (username, password, role, delFlg) ⇒ ... }
If you want to block on the Future and get the result (if you're playing around in REPL for example), use something like
import scala.concurrent.Await
import scala.concurrent.duration._
Await.result(f, 1.second)
Bear in mind, this is not what you want to do in production code — blocking on Futures is a bad practice.
I generally recommend learning about Scala core types and Futures specifically. Slick "responsibility" ends when you call db.run.
I use mongo-scala-driver 1.0.1 that return Observable[Completed] after insertMany.
I put many documents in cycle and I need to do some action after all of it already inserted.
I can use observable.toFuture to get Seq[Future[Completed]] and then Future.sequence(...) to handle future.
But is it possible to transform Seq[Observable[Completed]] to Observable[...] in Scala? Or there is a better way to handle it?
mongo-scala-driver has it's own Observable trait org.mongodb.scala.Observable
Here is an example of how to flatten List of Observables into single Observable.
List(
Observable.interval(200 millis),
Observable.interval(400 millis),
Observable.interval(800 millis)
).toObservable.flatten.take(12).toBlocking.foreach(println(_))
You can find this and many more examples here: https://github.com/ReactiveX/RxScala/blob/0.x/examples/src/test/scala/examples/RxScalaDemo.scala
Edit
Here is what should work with mongo api, although I didn't test it.
val observables: Seq[Observable[Int]] = ???
val result: Observable[Int] = Observable(observables).flatMap(identity)
I am not getting any useful information about "how to close connection for mongodb using casbah API". Actually, I have defined multiple methods and in each method I need to establish a connection with mongodb. After working I need to close that too. I am using Scala.
one of the method like (code example in scala):
import com.mongodb.casbah.Imports._
import com.mongodb.casbah.MongoConnection
def index ={
val mongoConn = MongoConnection(configuration("hostname"))
val log = mongoConn("ab")("log")
val cursor = log.find()
val data = for {x <- cursor} yield x.getAs[BasicDBObject]("message").get
html.index(data.toList)
//mongoConn.close() <-- here i want to close the connection but this .close() is not working
}
It is unclear, from your question why exactly close is not working. Does it throw some exception, it is not compiling, or has no effect?
But since MongoConnection is a thin wrapper over com.mongodb.Mongo, you could work with underlying Mongo directly, just like in plain old Java driver:
val mongoConn = MongoConnection(configuration("hostname"))
mongoConn.underlying.close()
Actually, that's exactly, how close is implemented in casbah.
Try using .close instead. If a function doesn't have arguments in scala, you sometimes don't use parentheses after it.
EDIT: I had wrong information, edited to include correct information + link.
Passing messages around with actors is great. But I would like to have even easier code.
Examples (Pseudo-code)
val splicedList:List[List[Int]]=biglist.partition(100)
val sum:Int=ActorPool.numberOfActors(5).getAllResults(splicedList,foldLeft(_+_))
where spliceIntoParts turns one big list into 100 small lists
the numberofactors part, creates a pool which uses 5 actors and receives new jobs after a job is finished
and getallresults uses a method on a list. all this done with messages passing in the background. where maybe getFirstResult, calculates the first result, and stops all other threads (like cracking a password)
With Scala Parallel collections that will be included in 2.8.1 you will be able to do things like this:
val spliced = myList.par // obtain a parallel version of your collection (all operations are parallel)
spliced.map(process _) // maps each entry into a corresponding entry using `process`
spliced.find(check _) // searches the collection until it finds an element for which
// `check` returns true, at which point the search stops, and the element is returned
and the code will automatically be done in parallel. Other methods found in the regular collections library are being parallelized as well.
Currently, 2.8.RC2 is very close (this or next week), and 2.8 final will come in a few weeks after, I guess. You will be able to try parallel collections if you use 2.8.1 nightlies.
You can use Scalaz's concurrency features to achieve what you want.
import scalaz._
import Scalaz._
import concurrent.strategy.Executor
import java.util.concurrent.Executors
implicit val s = Executor.strategy[Unit](Executors.newFixedThreadPool(5))
val splicedList = biglist.grouped(100).toList
val sum = splicedList.parMap(_.sum).map(_.sum).get
It would be pretty easy to make this prettier (i.e. write a function mapReduce that does the splitting and folding all in one). Also, parMap over a List is unnecessarily strict. You will want to start folding before the whole list is ready. More like:
val splicedList = biglist.grouped(100).toList
val sum = splicedList.map(promise(_.sum)).toStream.traverse(_.sum).get
You can do this with less overhead than creating actors by using futures:
import scala.actors.Futures._
val nums = (1 to 1000).grouped(100).toList
val parts = nums.map(n => future { n.reduceLeft(_ + _) })
val whole = (0 /: parts)(_ + _())
You have to handle decomposing the problem and writing the "future" block and recomposing it in to a final answer, but it does make executing a bunch of small code blocks in parallel easy to do.
(Note that the _() in the fold left is the apply function of the future, which means, "Give me the answer you were computing in parallel!", and it blocks until the answer is available.)
A parallel collections library would automatically decompose the problem and recompose the answer for you (as with pmap in Clojure); that's not part of the main API yet.
I'm not waiting for Scala 2.8.1 or 2.9, it would rather be better to write my own library or use another, so I did more googling and found this: akka
http://doc.akkasource.org/actors
which has an object futures with methods
awaitAll(futures: List[Future]): Unit
awaitOne(futures: List[Future]): Future
but http://scalablesolutions.se/akka/api/akka-core-0.8.1/
has no documentation at all. That's bad.
But the good part is that akka's actors are leaner than scala's native ones
With all of these libraries (including scalaz) around, it would be really great if scala itself could eventually merge them officially
At Scala Days 2010, there was a very interesting talk by Aleksandar Prokopec (who is working on Scala at EPFL) about Parallel Collections. This will probably be in 2.8.1, but you may have to wait a little longer. I'll lsee if I can get the presentation itself. to link here.
The idea is to have a collections framework which parallelizes the processing of the collections by doing exactly as you suggest, but transparently to the user. All you theoretically have to do is change the import from scala.collections to scala.parallel.collections. You obviously still have to do the work to see if what you're doing can actually be parallelized.