I'm trying to use Slick (3.0.2) to work with a database in my Scala project.
Here is part of my code:
val query = table.filter(_.username === "LeoAshin").map { user => (user.username, user.password, user.role, user.delFlg) }
val f = db.run(query.result)
How can I read the data from f?
I have searched Google many times, but no answer has cleared up my confusion.
Thanks a lot.
f is a Future and there are several things you can do to get at the value, depending on just how urgently you need it. If there's nothing else you can do while you're waiting then you will just have to wait for it to complete and then get the value. Perhaps the easiest is along the following lines (from the Slick documentation):
val q = for (c <- coffees) yield c.name
val a = q.result
val f: Future[Seq[String]] = db.run(a)
f.onSuccess { case s => println(s"Result: $s") }
You could then go on to do other things that don't depend on the result of f and the result of the query will be printed on the console asynchronously.
However, most of the time you'll want to use the value for some other operation, perhaps another database query. In that case, the easiest thing to do is to use a for comprehension. Something like the following:
for { r <- f; r2 <- db.run(q(r)) } yield r2
where q is a function/method that takes the result of your first query and builds another one. Running the second query inside the for comprehension (rather than in the yield) keeps the result a single Future instead of a Future nested in a Future.
One thing to be aware of is that if you are running this in a program that will do an exit after all the code has been run, you will need to use Await (see the Scala API) to prevent the program exiting while one of your db queries is still working.
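A minimal sketch of that last point, using only the standard library. The hypothetical slowQuery stands in for db.run(...); it is not part of Slick. The global execution context runs on daemon threads, so without the Await at the end of main the JVM could exit before the result is printed.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AwaitAtExit {
  // Hypothetical stand-in for db.run(query.result): completes after a short delay.
  def slowQuery(): Future[Seq[String]] = Future {
    Thread.sleep(100)
    Seq("espresso", "latte")
  }

  def main(args: Array[String]): Unit = {
    val f = slowQuery()
    // Non-blocking: describe what to do with the rows once they arrive.
    val printed = f.map(rows => rows.foreach(println))
    // Block only at the very end so the JVM does not exit early.
    Await.result(printed, 5.seconds)
  }
}
```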
The type of query.result is DBIO. When you pass it to db.run, you get a Future back.
If you want to print the data, use
import scala.concurrent.ExecutionContext.Implicits.global
f.foreach(println)
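To continue working with the data, use f.map(_.map { case (username, password, role, delFlg) => ... }). The Future holds a Seq of rows, so you map over the Future first and then over the rows.
If you want to block on the Future and get the result (if you're playing around in the REPL, for example), use something like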
import scala.concurrent.Await
import scala.concurrent.duration._
Await.result(f, 1.second)
Bear in mind, this is not what you want to do in production code — blocking on Futures is a bad practice.
I generally recommend learning about Scala core types and Futures specifically. Slick "responsibility" ends when you call db.run.
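As a rough illustration of working with the result without Slick on the classpath, here is a sketch in which fetchRows is a hypothetical stand-in for db.run(query.result). The row type (three Strings and a Boolean) is an assumption; the real column types may differ.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object RowsFromFuture {
  // Assumed row shape: (username, password, role, delFlg).
  type Row = (String, String, String, Boolean)

  // Hypothetical stand-in for db.run(query.result), already completed with one row.
  def fetchRows(): Future[Seq[Row]] =
    Future.successful(Seq(("LeoAshin", "secret", "admin", false)))

  // Map over the Future, then over the rows inside it.
  def usernames(f: Future[Seq[Row]]): Future[Seq[String]] =
    f.map(rows => rows.map { case (username, _, _, _) => username })

  def main(args: Array[String]): Unit =
    // Blocking is acceptable here only because this is a throwaway demo.
    println(Await.result(usernames(fetchRows()), 1.second))
}
```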
I have been trying to understand why Scala Futures are regarded as eager and violate referential transparency. I think I understand this part reasonably. However, I have trouble understanding what this means:
(A => Unit) => Unit
With respect to a Future.
I am not sure if this is the right forum, but ELI5 answers appreciated
The reason why Future is regarded as eager (and as such violates referential transparency) is because it evaluates as soon as the value is defined. Below is the ELI5 and non-ELI5 explanation for this.
As for (A => Unit) => Unit, it's a signature for the callback-driven asynchronous computation. In a synchronous computation, you evaluate the Future[A] to A, even if it means sitting in place and waiting a long time for the evaluation to finish. But with asynchronous computation, you don't sit and wait; instead, you pass a function of type A => Unit, and you immediately get the Unit back. Later, when the computation has finished in the background and value A has been produced, function A => Unit will be applied to it. So basically you tell the Future "once you obtain A, here's what I want you to do with it", and it responds "OK, will do, here's a Unit for you, leave now and do other stuff".
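To make the shape concrete, here is a small sketch. The alias Async and the value answer are made up for illustration. Future's own foreach has essentially this shape (it takes a function and returns Unit), and a Promise is one way to turn such a callback-style computation back into a Future.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object CallbackShape {
  // An asynchronous computation of A, seen purely as "hand me a callback".
  type Async[A] = (A => Unit) => Unit

  // Hypothetical computation: produces 42 on another thread and then invokes
  // the callback. The call itself returns Unit straight away.
  def answer: Async[Int] = callback => Future(42).foreach(callback)

  // Bridge the callback shape to a Future by completing a Promise in the callback.
  def toFuture[A](run: Async[A]): Future[A] = {
    val p = Promise[A]()
    run { a => p.success(a); () }
    p.future
  }
}
```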
TBH I wouldn't overthink this signature too much because that's not what your mental model of working with Future should be. Instead, just become familiar with the notion of mapping and flatMapping. When you have a value wrapped in a Future, you shouldn't try to get that value out of the Future context because that would be a blocking synchronous operation. But what you can do is map over it and say "alright Future, I don't need this value A right now, I just want to describe a function A => B to you which turns it into another value B, and you make sure to apply it once you have the original A". And if B is wrapped in yet another Future, meaning your function is not A => B but A => Future[B], you should use flatMap instead of map. This is how you chain asynchronous operations. Imagine a database query which needs, as a parameter, something returned by the previous query.
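A short sketch of the difference, with two hypothetical lookups (findUserId and findRole) standing in for dependent database queries:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object MapVsFlatMap {
  // Hypothetical async lookups; real ones would hit a database.
  def findUserId(name: String): Future[Int] = Future.successful(name.length)
  def findRole(userId: Int): Future[String] =
    Future.successful(if (userId > 3) "admin" else "guest")

  // A => B: a plain function, so map is enough.
  def doubledId(name: String): Future[Int] = findUserId(name).map(_ * 2)

  // A => Future[B]: the next step is itself asynchronous, so use flatMap
  // (map would produce a nested Future[Future[String]]).
  def roleOf(name: String): Future[String] = findUserId(name).flatMap(findRole)
}
```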
And that's it. Somewhere at the end of the world, e.g. when you're done processing an http request and are ready to send some response payload over the wire, you will finally unwrap that future in a synchronous way (you can't send a payload if you don't know what to put in it).
Now, about referential transparency in Future:
ELI5:
Imagine you have two daughters, Anna and Betty. You tell them that their task will be to count to 20 out loud. You also tell them that Betty should start only after Anna is done. The whole process is hence expected to take about 40 seconds.
But if they evaluate their task eagerly (like Future does), as soon as you explain the task to them, they will each start counting right away. The whole process will hence last about 20 seconds.
In the context of programming, referential transparency says that you should always be able to replace (pseudocode):
// imagine >> as a pipe operator which starts the next function
// only after previous one has terminated
count(20) >> count(20)
with
anna = count(20)
betty = count(20)
anna >> betty
but that's not true in this situation because of eager evaluation (the girls start counting as soon as their task is explained to them, so in the second case the program will last only 20 seconds regardless of the pipe).
non-ELI5:
Let's prepare an execution context for Future and a function that will be evaluated. It simply sleeps for two seconds before printing "hi".
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
def f = {
  Thread.sleep(2000)
  println("hi")
}
Let's now write a for comprehension which will create two Futures one after another:
val done = for {
  f1 <- Future(f)
  f2 <- Future(f)
} yield (f1, f2)
import scala.concurrent.Await
import scala.concurrent.duration._
Await.result(done, 5000.millis)
As expected, after two seconds we'll get the first "hi" (from f1), and after additional two seconds we'll get the second "hi" (from f2).
Now let's do a small modification; we will first define two Future values, and then we'll use those in the for comprehension:
val future1 = Future(f)
val future2 = Future(f)
val done = for {
  f1 <- future1
  f2 <- future2
} yield (f1, f2)
import scala.concurrent.duration._
Await.result(done, 5000.millis)
What happens this time is that after approximately two seconds you get two simultaneous "hi" printouts. This is because both future1 and future2 started getting evaluated as soon as they were defined. By the time they got chained in the for comprehension, they were already running alongside each other on the given execution context.
This is why referential transparency is broken; normally you should be able to replace:
doStuff(foo)
with
val f = foo
doStuff(f)
without having any consequence on the behaviour of the program, but in the case of Future, as you can see above, that's not the case.
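One common way to get the sequential behaviour back while still naming the two computations is to declare them with def instead of val, so that nothing starts until the name is referenced. The sketch below assumes that approach; the names count, anna and betty are made up, and the delay is shortened to 100 ms. It records the order of events instead of printing.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object DeferredFutures {
  // Runs two counting tasks through a for comprehension and returns the event order.
  def runDeferred(): Vector[String] = {
    val lock = new Object
    var events = Vector.empty[String]
    def log(e: String): Unit = lock.synchronized { events = events :+ e }

    def count(name: String): Unit = {
      log(s"$name starts")
      Thread.sleep(100)
      log(s"$name done")
    }

    // `def` instead of `val`: each Future is created only when it is referenced.
    def anna = Future(count("anna"))
    def betty = Future(count("betty"))

    val done = for {
      _ <- anna
      _ <- betty // referenced, and therefore started, only after anna has completed
    } yield ()

    Await.result(done, 5.seconds)
    lock.synchronized(events)
  }
}
```

With val in place of def, both tasks would be submitted before the for comprehension runs, which is the eager behaviour described above.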
Say I have the following snippet
def testFailure2() = {
  val f1 = Future.failed(new Exception("ex1"))
  val f2 = Future.successful(2)
  val f3 = Future.successful(5)
  val f4 = Future.failed(new Exception("ex4"))
  val l = List(f1, f2, f3, f4)
  l
}
The return type is List[Future[Int]]. Normally I could just call Future.sequence and get a Future[List[Int]]. But in this scenario that doesn't help, because the resulting Future fails as soon as one of the elements has failed. So I want to convert this to a Future[List[Int]] that ignores the failed Futures. How do I do that?
My second question, on a similar topic: I understand filter, collect, partition, etc. on a List. In this scenario, say I wanted to partition the list into two lists:
- Failed Futures in one
- Successfully done Futures in another.
How do I do that?
One way would be to first convert all Future[Int]s to Future[Option[Int]] that always succeed (but result in None if the original future fails). Then you can use Future.sequence and then flatten the result:
def sequenceIgnoringFailures[A](xs: List[Future[A]])(implicit ec: ExecutionContext): Future[List[A]] = {
  val opts = xs.map(_.map(Some(_)).fallbackTo(Future(None)))
  Future.sequence(opts).map(_.flatten)
}
The other answer is correct: you should use a Future[List[X]] where X is something that differentiates between failure and success. It can be an Option, an Either, a Try, or whatever you want.
It seems like you're bothered by this, and I suppose it's because you were hoping to find something like:
Do all these futures in parallel, ignore the failed ones during the process
And you're given
Do all these futures, wait for everything to finish, and discard based on the result
But actually, there is no special way to express "ignore the failed ones". Something has to acknowledge each future's result since you're interested in it; otherwise starting it makes no sense in the first place. And this something has to wait for all futures to finish anyway. As such, the flag for "you can now ignore me" is indeed the Option being None, the Either being Left, or the Try being Failure. There is not, afaik, a specific flag on futures for "this result is being discarded", and I don't think Scala needs one.
So, fear not, and go for Future[List[X]], because it actually expresses what you want! :-)
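For the second question (splitting failures from successes), here is a sketch that takes Try as the X. It assumes Scala 2.12 or later, where Future.transform accepts a Try => Try function; the helper name partition is made up.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object PartitionFutures {
  // Turns every Future[A] into one that always succeeds with a Try[A],
  // then splits the collected results into (failures, successes).
  def partition[A](xs: List[Future[A]])(
      implicit ec: ExecutionContext): Future[(List[Throwable], List[A])] = {
    val attempts: List[Future[Try[A]]] = xs.map(_.transform(t => Success(t)))
    Future.sequence(attempts).map { results =>
      val failures  = results.collect { case Failure(e) => e }
      val successes = results.collect { case Success(a) => a }
      (failures, successes)
    }
  }
}
```

Applied to the list from the question, this would yield the two exceptions in one list and 2 and 5 in the other, in their original order.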
I successfully inserted data into a MongoDB database, but I don't know how to extract data from a query. I use the default Scala MongoDB driver:
"org.mongodb.scala" %% "mongo-scala-driver" % "1.1.1"
The documentation seems to contain errors, by the way. This line raises a compilation error even though it is copy-pasted from the docs:
collection.find().first().printHeadResult()
This is how I query a collection:
collection.find()
How do I convert it to a Scala collection of objects that I can iterate over and process? Thanks
Yes, I can confirm the compilation error. I think "collection.find().first().printHeadResult()" is not part of the Scala driver 1.1.1 release. The code on the driver's GitHub that uses it is at version "1.2.0-SNAPSHOT".
You can get the results using the code below. Note, however, that it runs asynchronously: the callbacks fire as results arrive, not at the point where subscribe is called. Please refer to the driver documentation.
val observable: FindObservable[Document] = collection.find()
observable.subscribe(new Observer[Document] {
  override def onNext(result: Document): Unit = println(result.toJson())
  override def onError(e: Throwable): Unit = println("Failed" + e.getMessage)
  override def onComplete(): Unit = println("Completed")
})
This is answered from the best of my current knowledge. I spent a lot of time using casbah and I've recently switched to using the new async scala driver, so there may be more ergonomic ways to do some of this stuff that I don't know yet.
Basically you need to transform the result of the observable and then eventually, you'll probably want to turn it into something that isn't an observable so you can have synchronous code interact with it (maybe, depending on what you're doing).
In the current Mongo Scala API (2.7.0 as of writing this), you might process a list of documents like this:
coll.find(Document("head" -> 1)).map(dbo => dbo.getInteger("head"))
That takes the list of documents where head is equal to one and then applies the map function to convert each Document (dbo) into an Int by extracting the "head" element (note that this will fail in ugly fashion if there isn't a head field or the field is not an int; there are more robust ways to get values out using get[T]).
You can find a full list of the operations that an Observable supports here:
https://mongodb.github.io/mongo-scala-driver/2.7/reference/observables/
under the list of Monadic operators.
The other part is how to get the good stuff out of the Observable when you want to do something synchronous with it. The best answer I have found so far is to dump the Observable into a Future and then call Await.result on that.
val e = coll.find(Document("head" -> 1)).map(dbo => dbo.getInteger("head"))
val r = Await.result(e.toFuture(), Duration.Inf)
println(r)
That will print out the Seq[Int] that was created by evaluating the map function for each Document in the Observable.
I have a service that makes use of the Scala Async library. I'm using this library primarily to time my database calls. The method that I want to test contains multiple calls to the database using the async await mechanism. A pseudo code of what I have is as below:
def myDbMethod() = async {
  val firstCall = await(call the db and get the result)
  val secondCall = await(call the db and get the result)
  val thirdCall = await(call the db and get the result)
  ...
}
In my ScalaTest unit test, I have
Await.result(myDbMethod(), 10.seconds)
I was trying to debug myDbMethod by running my unit test, which reports success even before execution reaches secondCall. I had breakpoints on all three calls to the database, but the IntelliJ debugger just exits as soon as the first call finishes. Why is this? How can I step through this method with the IntelliJ debugger?
I'm not sure that my answer will meet your expectations, but this is a known issue. The problem is that async/await is quite a complicated macro which does heavy transformations on the trees (you can check the output by enabling the -Xprint:<phase_name_after_typer> flag). Unfortunately, neither of the IDEs I work with (IntelliJ and Ensime) can debug it, and I'm not familiar enough with their internals to explain in detail why they can't.
In my experience there are no clear advantages over the native for comprehension, so you can stick with the native syntax or explicit flatMap calls, which are nicely debuggable.
This construct can be used for dependent asynchronous calls.
async / await adds some sugar to make that easier, but to formulate it by hand you can do it like this:
def dbMethodDependant: Future[T] = for {
  res1 <- firstCall
  res2 <- secondCall(res1) // each call receives the previous result
  res3 <- thirdCall(res2)
} yield res3
Await.result(dbMethodDependant, Duration.Inf)
I have the following code:
sc.parquetFile("some large parquet file with bc").registerTempTable("bcs")
sc.parquetFile("some large parquet file with imps").registerTempTable("imps")
val bcs = sc.sql("select * from bcs")
val imps = sc.sql("select * from imps")
I want to do:
bcs.map(x => wrapBC(x)).collect
imps.map(x => wrapIMP(x)).collect
but when I do this, the two jobs do not run asynchronously. I can do it with Future, like this:
val bcsFuture = Future { bcs.map(x => wrapBC(x)).collect }
val impsFuture = Future { imps.map(x => wrapIMP(x)).collect }
val result = for {
bcs <- bcsFuture
imps <- impsFuture
} yield (bcs, imps)
Await.result(result, Duration.Inf) // this returns (Array[Bc], Array[Imp])
I want to do this without Future, how can I do it?
Update: This was originally composed before the question was updated. Given those updates, I agree with @stholzm's answer to use cartesian in this case.
There is a limited number of actions that produce a FutureAction[A] for an RDD[A] and are executed in the background. These are available in the AsyncRDDActions class, and so long as you import SparkContext._, any RDD can be implicitly converted to AsyncRDDActions as needed. For your specific code example that would be:
bcs.map(x => wrapBC(x)).collectAsync
imps.map(x => wrapIMP(x)).collectAsync
In addition to evaluating the DAG up to the action in the background, the FutureAction produced has a cancel method to attempt to end processing early.
Caveat
This may not do what you think it does. If the intent is to get data from both sources and then combine them you're more likely to want to join or group the RDDs instead. For that you can look at the functions available in PairRDDFunctions, again available on RDDs through implicit conversion.
If the intention isn't to have the data graphs interact, then in my experience so far, running batches concurrently might only serve to slow both down, though that may be a consequence of how the cluster is configured. If the resource manager is set up to give each execution stage a monopoly on the cluster in FIFO order (the default in standalone and YARN modes, I believe; I'm not sure about Mesos), then the asynchronous collects will contend with each other for that monopoly, run their tasks, then contend again for the next execution stage.
Compare this to using a Future to wrap blocking calls to downstream services or database, for example, where either the resources in question are completely separate or generally have enough resource capacity to handle multiple requests in parallel without contention.
Update: I misunderstood the question. The desired result is not the cartesian product Array[(Bc, Imp)].
But I'd argue that it does not matter how long the single map calls take because as soon as you add other transformations, Spark tries to combine them in an efficient way. As long as you only chain transformations on RDDs, nothing happens on the data. When you finally apply an action then the execution engine will figure out a way to produce the requested data.
So my advice would be to not think so much about the intermediate steps and avoid collect as much as possible because it will fetch all the data to the driver program.
It seems you are building a cartesian product yourself. Try cartesian instead:
val bc = bcs.map(x => wrapBC(x))
val imp = imps.map(x => wrapIMP(x))
val result = bc.cartesian(imp).collect
Note that collect is called on the final RDD and no longer on intermediate results.
You can use union to solve this problem. For example:
val bcsAny = bcs.map(x => wrapBC(x).asInstanceOf[Any])
val impsAny = imps.map(x => wrapIMP(x).asInstanceOf[Any])
val result = (bcsAny union impsAny).collect()
val bcsResult = result collect { case bc: Bc => bc }
val impsResult = result collect { case imp: Imp => imp }
If you want to use sortBy or other operations, you can have both types extend a common trait or base class instead of using Any.