Seq of observable to observable - scala

I use mongo-scala-driver 1.0.1, whose insertMany returns an Observable[Completed].
I insert many documents in a loop and need to run some action once all of them have been inserted.
I can call observable.toFuture on each one to get a Seq[Future[Completed]] and then use Future.sequence(...) to wait for them all.
But is it possible to transform a Seq[Observable[Completed]] into a single Observable[...] in Scala? Or is there a better way to handle this?
Note that mongo-scala-driver has its own Observable trait, org.mongodb.scala.Observable.

Here is an example of how to flatten a List of Observables into a single Observable (using RxScala):
List(
Observable.interval(200 millis),
Observable.interval(400 millis),
Observable.interval(800 millis)
).toObservable.flatten.take(12).toBlocking.foreach(println(_))
You can find this and many more examples here: https://github.com/ReactiveX/RxScala/blob/0.x/examples/src/test/scala/examples/RxScalaDemo.scala
Edit
Here is what should work with the Mongo API, although I didn't test it:
val observables: Seq[Observable[Int]] = ???
val result: Observable[Int] = Observable(observables).flatMap(identity)
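The Future-based route mentioned in the question can be sketched with the standard library alone. This is a minimal sketch, not driver code: Completed and insertBatch are hypothetical stand-ins for the driver's result type and for insertMany(...).toFuture, and the global execution context is an assumption.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for the driver's Completed marker.
final case class Completed(batch: Int)

// Hypothetical stand-in for collection.insertMany(docs).toFuture.
def insertBatch(batch: Int): Future[Completed] = Future(Completed(batch))

// One Future per batch, started in a loop.
val inserts: Seq[Future[Completed]] = (1 to 5).map(insertBatch)

// Future.sequence turns Seq[Future[A]] into Future[Seq[A]];
// it completes only after every insert has completed.
val allDone: Future[Seq[Completed]] = Future.sequence(inserts)

// The follow-up action runs once, after all batches are in.
val summary: Future[String] = allDone.map(done => s"inserted ${done.size} batches")

// Blocking here is only for the demo; real code would keep composing Futures.
val message: String = Await.result(summary, 5.seconds)
println(message)
```

If any single insert fails, allDone fails as well, so the follow-up action is skipped unless the failure is handled with recover.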

Related

Update concurrent map inside a stream map on flink

I have one stream that constantly emits the latest values of some keys.
Stream A:DataStream[(String,Double)]
I have another stream that needs to read the latest value on each process call.
My approach was to introduce a ConcurrentHashMap that is updated by stream A and read by the second stream.
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._
import org.apache.flink.streaming.api.scala._
val rates = new ConcurrentHashMap[String, Double]().asScala
val streamA: DataStream[(String, Double)] = ???
streamA.map(keyWithValue => rates(keyWithValue._1) = keyWithValue._2) // rates never gets updated
rates("testKey") = 2 // this works
val streamB: DataStream[String] = ???
streamB.map { str =>
  rates(str) // rates does not contain the values of streamA at this point
  // some other functionality
}
Is it possible to update a concurrent map from a stream? Any other solution for sharing data from one stream with another is also acceptable.
The behaviour you are trying to use will not work in a distributed manner; basically, if you have parallelism > 1 it will not work. In your code rates is actually updated, but in a different instance of the parallel operator.
What you want in this case is BroadcastState, which was designed to solve exactly the issue you are facing.
In your specific use case it would look something like this:
val streamA : DataStream[(String,Double)]= ???
val streamABroadcasted = streamA.broadcast(<Your Map State Definition>)
val streamB: DataStream[String] = ???
streamB.connect(streamABroadcasted)
Then you can use a BroadcastProcessFunction to implement your logic. More on the broadcast state pattern can be found in the Flink documentation.
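Why the shared map appears never to be updated can be illustrated without Flink. The sketch below is an illustration under stated assumptions, not Flink code: shipToTask is a hypothetical helper that serializes and deserializes a value, roughly what happens to a function and the objects it captures when they are shipped to a parallel task.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper: round-trip a value through Java serialization,
// which yields an independent copy of the original object.
def shipToTask[A <: AnyRef](value: A): A = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(value)
  out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  try in.readObject().asInstanceOf[A] finally in.close()
}

// The map as created in the driver program.
val rates = new ConcurrentHashMap[String, Double]()

// The copy that an operator instance would actually see and update.
val ratesInsideTask = shipToTask(rates)
ratesInsideTask.put("testKey", 2.0)

println(rates.containsKey("testKey"))           // false: the original is untouched
println(ratesInsideTask.containsKey("testKey")) // true: only the copy changed
```

Broadcast state avoids this because the updates travel through the stream itself rather than through a captured object.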

How to implement a recursive Fibonacci sequence in Scala using FS2?

While trying to become familiar with FS2, I came across a nifty recursive implementation using the Scala collections' Stream, and thought I'd have a go at trying it in FS2:
import fs2.{Pure, Stream}
val fibs: Stream[Pure, Int] = Stream[Pure, Int](0) ++ fibs.fold[Int](1)(_ + _)
println(fibs take 10 toList) // This will hang
What is the reason this hangs in FS2, and what is the best way to get a similar, working solution?
Your issue is that Stream.fold consumes all elements of the stream, producing a single final value from the fold. Note that it only emits one element.
The recursive stream only terminates once 10 elements have been emitted (as specified by take 10). But fold emits nothing until its input ends, and its input is fibs itself, so the stream never gets past its first element and fold continues without stopping.
The simplest way to fix this is to use a combinator that emits the partial results from the fold; this is scan.
Also, FS2 can infer most of the types in this code, so you don't necessarily need as many type annotations.
The following implementation should work fine:
import fs2.{Pure, Stream}
val fibs: Stream[Pure, Int] = Stream(0) ++ fibs.scan(1)(_ + _)
println(fibs take 10 toList)
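The same fold versus scan distinction can be seen with the standard library alone. The sketch below uses LazyList (Scala 2.13 or later, an assumption) rather than FS2, so it only mirrors the idea and says nothing about FS2 internals.

```scala
// Standard-library analogue (Scala 2.13+ LazyList), not FS2 code.

// scanLeft emits every intermediate accumulator, so each new element only
// needs the elements already produced and the recursion stays productive.
lazy val fibs: LazyList[Int] = 0 #:: fibs.scanLeft(1)(_ + _)

val firstTen: List[Int] = fibs.take(10).toList
println(firstTen) // List(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

// foldLeft, like fold, yields a single value only after consuming its whole
// input, so it is fine on a finite prefix but would never return on fibs itself.
val sumOfFirstTen: Int = fibs.take(10).foldLeft(0)(_ + _)
println(sumOfFirstTen) // 88
```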

Scala mongodb : result of query as list

I successfully inserted data into a MongoDB database, but I don't know how to extract data from a query. I use the default Scala MongoDB driver:
"org.mongodb.scala" %% "mongo-scala-driver" % "1.1.1"
The documentation seems to contain errors, by the way. This line raises a compilation error even though it is copy-pasted from the docs:
collection.find().first().printHeadResult()
This is how I query a collection:
collection.find()
How do I convert it to a Scala collection of objects that I can iterate over and process? Thanks
Yes, I agree about the compilation error. I think "collection.find().first().printHeadResult()" is not part of the Scala driver 1.1.1 release. The current driver source on GitHub that uses this code is version "1.2.0-SNAPSHOT".
You can get the results using the code below. However, the callbacks run asynchronously, so output may appear after the calling code has moved on. Please refer to the driver documentation.
val observable: FindObservable[Document] = collection.find();
observable.subscribe ( new Observer[Document] {
override def onNext(result: Document): Unit = println(result.toJson())
override def onError(e: Throwable): Unit = println("Failed" + e.getMessage)
override def onComplete(): Unit = println("Completed")
})
Mongo driver Observables link
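To get from callbacks like these to a plain Scala collection, one option is to collect the onNext values and complete a Promise in onComplete. The sketch below is self-contained and does not use the driver: SimpleObserver, emitAll and collectResults are hypothetical names that only mimic the shape of the subscribe call above.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

// Hypothetical stand-in for the driver's Observer; not the real org.mongodb.scala trait.
trait SimpleObserver[T] {
  def onNext(result: T): Unit
  def onError(e: Throwable): Unit
  def onComplete(): Unit
}

// Hypothetical source that pushes a fixed list of values and then completes.
def emitAll[T](values: List[T])(observer: SimpleObserver[T]): Unit = {
  values.foreach(observer.onNext)
  observer.onComplete()
}

// Collect everything the observer receives into a Future[List[T]].
// The buffer is not synchronized, so this assumes callbacks arrive sequentially.
def collectResults[T](subscribe: SimpleObserver[T] => Unit): Future[List[T]] = {
  val promise = Promise[List[T]]()
  val buffer = ListBuffer.empty[T]
  subscribe(new SimpleObserver[T] {
    def onNext(result: T): Unit = { buffer += result; () }
    def onError(e: Throwable): Unit = { promise.tryFailure(e); () }
    def onComplete(): Unit = { promise.trySuccess(buffer.toList); () }
  })
  promise.future
}

val results: Future[List[String]] =
  collectResults[String](observer => emitAll(List("a", "b", "c"))(observer))

// Blocking with a bounded timeout, for the demo only.
val docs: List[String] = Await.result(results, 5.seconds)
println(docs)
```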
This is answered from the best of my current knowledge. I spent a lot of time using casbah and I've recently switched to using the new async scala driver, so there may be more ergonomic ways to do some of this stuff that I don't know yet.
Basically you need to transform the result of the observable and then eventually, you'll probably want to turn it into something that isn't an observable so you can have synchronous code interact with it (maybe, depending on what you're doing).
In the current Mongo Scala API (2.7.0 as of writing this), you might process a list of documents like this:
coll.find(Document("head" -> 1)).map(dbo => dbo.getInteger("head"))
That takes the documents where head is equal to one and then applies the map function to convert each Document (dbo) into an Int by extracting the "head" element (note that this will fail in ugly fashion if there isn't a head field or the field is not an int; there are more robust ways to get values out using get[T]).
You can find a full list of the operations that an Observable supports here:
https://mongodb.github.io/mongo-scala-driver/2.7/reference/observables/
under the list of Monadic operators.
The other part is how to get the good stuff out of the Observable when you want to do something synchronous with it. The best answer I have found so far is to dump the Observable into a Future and then call Await.result on that.
import scala.concurrent.Await
import scala.concurrent.duration.Duration
val e = coll.find(Document("head" -> 1)).map(dbo => dbo.getInteger("head"))
val r = Await.result(e.toFuture(), Duration.Inf) // blocks until the query has finished
println(r)
That will print out the collection of Int values that was created by evaluating the map function for each Document in the Observable.

Round-robin combining observables

I'm new to RxJava, and I've been trying to combine multiple observables in a round-robin way.
So, imagine you have three observables:
o1: --0---1------2--
o2: -4--56----------
o3: -------8---9----
Combining those in a round-robin way would give you something like:
r : --04---815-9-26-
What would be the best way to approach this?
Since it looks like RxJava, RxScala etc. pretty much share API, answer in any language should be fine. :)
Thanks,
Matija
RxJava doesn't have such an operator by default. The closest thing is using merge with well-paced sources, because it uses round-robin to collect values, but this property can't be relied upon. Why do you need this round-robin behavior?
The best bet is to implement this behavior manually. Here is an example without backpressure support.
There is an approach that is very simple to implement and does almost exactly what you want: just zip the three source observables, and then emit the three values from the zipped observable each time a new triplet arrives.
Translated to RxScala:
val o1 = Observable.just(1, 2, 3)
val o2 = Observable.just(10, 20, 30)
val o3 = Observable.just(100, 200, 300)
val roundRobinSource = Observable
.zip(Observable.just(o1, o2, o3))
.flatMap(Observable.from[Int])
roundRobinSource.subscribe(println, println)
gives you
1
10
100
2
20
200
3
30
300
Which is precisely what you want.
The problem with this approach is that it will block until a value from each of the three sources arrives, but if you're cool with that, I think this is by far the simplest solution. I'm curious, what is your use case?
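The zip-then-flatten idea, and its drawback, can be seen on plain lists as well. This is a collections analogue and not Rx code; roundRobinByZip is a hypothetical helper name.

```scala
val o1 = List(1, 2, 3)
val o2 = List(10, 20, 30)
val o3 = List(100, 200, 300)

// Zip the three sources into triplets, then flatten each triplet back out in order.
def roundRobinByZip(a: List[Int], b: List[Int], c: List[Int]): List[Int] =
  a.zip(b).zip(c).flatMap { case ((x, y), z) => List(x, y, z) }

val interleaved = roundRobinByZip(o1, o2, o3)
println(interleaved) // List(1, 10, 100, 2, 20, 200, 3, 30, 300)

// The drawback: a triplet needs a value from every source, so a source that
// runs dry (here the middle one) holds back the values of the other two.
val starved = roundRobinByZip(o1, List(10), o3)
println(starved) // List(1, 10, 100)
```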
Update, Take #2
This is actually a fun question. Here is another take, that will trade one drawback for another.
import rx.lang.scala.{Subject, Observable}
val s1 = Subject[Int]()
val s2 = Subject[Int]()
val s3 = Subject[Int]()
val roundRobinSource3 = s1.publish(po1 ⇒ s2.publish(po2 ⇒ s3.publish(po3 ⇒ {
def oneRound: Observable[Int] = po1.take(1) ++ po2.take(1) ++ po3.take(1)
def all: Observable[Int] = oneRound ++ Observable.defer(all)
all
})))
roundRobinSource3.subscribe(println, println, () ⇒ println("Completed"))
println("s1.onNext(1)")
s1.onNext(1)
println("s2.onNext(10)")
s2.onNext(10)
println("s3.onNext(100)")
s3.onNext(100)
println("s2.onNext(20)")
s2.onNext(20)
println("s1.onNext(2)")
s1.onNext(2)
println("s3.onNext(200)")
s3.onNext(200)
println("s1.onCompleted()")
s1.onCompleted()
println("s2.onCompleted()")
s2.onCompleted()
println("s3.onCompleted()")
s3.onCompleted()
println("Done...")
Gives you
s1.onNext(1)
1
s2.onNext(10)
10
s3.onNext(100)
100
s2.onNext(20)
s1.onNext(2)
2
20
s3.onNext(200)
200
s1.onCompleted()
s2.onCompleted()
s3.onCompleted()
Done...
It doesn't block and it round-robins, but... it also doesn't complete :( You could make it complete in a stateful manner using takeUntil, a Subject and doOnComplete if you need that, though.
As for the mechanism, it relies on the behavior of publish, somewhat mysterious to me, of keeping track of items already emitted. I was originally pointed to it by #lopar when he answered my own question, Implementing a turnstile-like operator with RxJava.
The behavior of publish is actually such a mystery to me that I have posted a question about it here: https://github.com/ReactiveX/RxJava/issues/2775. If you are curious, you can follow it.
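The emission order in the trace above can also be reproduced without Rx by buffering each source in its own queue and releasing values strictly in turn. This is a minimal single-threaded sketch with no completion handling or backpressure; RoundRobinBuffer is a hypothetical class, and sources 0, 1 and 2 stand for s1, s2 and s3.

```scala
import scala.collection.mutable

// Buffers values per source and releases them strictly in round-robin order.
final class RoundRobinBuffer[A](sourceCount: Int) {
  private val queues = Vector.fill(sourceCount)(mutable.Queue.empty[A])
  private var turn = 0

  // Store a value from `source`, then return every value that is now due:
  // keep emitting while the source whose turn it is has something queued.
  def offer(source: Int, value: A): List[A] = {
    queues(source).enqueue(value)
    val due = List.newBuilder[A]
    while (queues(turn).nonEmpty) {
      due += queues(turn).dequeue()
      turn = (turn + 1) % sourceCount
    }
    due.result()
  }
}

val buffer = new RoundRobinBuffer[Int](3)

// The same sequence of onNext calls as in the trace: (source index, value).
val events = List((0, 1), (1, 10), (2, 100), (1, 20), (0, 2), (2, 200))
val emitted: List[List[Int]] = events.map { case (source, value) => buffer.offer(source, value) }

println(emitted)         // List(List(1), List(10), List(100), List(), List(2, 20), List(200))
println(emitted.flatten) // List(1, 10, 100, 2, 20, 200)
```

The empty list for the fourth event corresponds to s2.onNext(20) being held back until s1 has emitted 2.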

Is there an implementation of rapid concurrent syntactical sugar in scala? eg. map-reduce

Passing messages around with actors is great. But I would like to have even easier code.
Examples (Pseudo-code)
val splicedList:List[List[Int]]=biglist.partition(100)
val sum:Int=ActorPool.numberOfActors(5).getAllResults(splicedList,foldLeft(_+_))
where partition(100) turns one big list into 100 small lists,
the numberOfActors part creates a pool that uses 5 actors and hands out a new job whenever a job is finished,
and getAllResults applies a method to the list, all done with message passing in the background. There might also be a getFirstResult that calculates the first result and stops all other threads (like cracking a password).
With Scala Parallel collections that will be included in 2.8.1 you will be able to do things like this:
val spliced = myList.par // obtain a parallel version of your collection (all operations are parallel)
spliced.map(process _) // maps each entry into a corresponding entry using `process`
spliced.find(check _) // searches the collection until it finds an element for which
// `check` returns true, at which point the search stops, and the element is returned
and the code will automatically be done in parallel. Other methods found in the regular collections library are being parallelized as well.
Currently, 2.8.RC2 is very close (this or next week), and 2.8 final will come in a few weeks after, I guess. You will be able to try parallel collections if you use 2.8.1 nightlies.
You can use Scalaz's concurrency features to achieve what you want.
import scalaz._
import Scalaz._
import concurrent.strategy.Executor
import java.util.concurrent.Executors
implicit val s = Executor.strategy[Unit](Executors.newFixedThreadPool(5))
val splicedList = biglist.grouped(100).toList
val sum = splicedList.parMap(_.sum).map(_.sum).get
It would be pretty easy to make this prettier (i.e. write a function mapReduce that does the splitting and folding all in one). Also, parMap over a List is unnecessarily strict. You will want to start folding before the whole list is ready. More like:
val splicedList = biglist.grouped(100).toList
val sum = splicedList.map(promise(_.sum)).toStream.traverse(_.sum).get
You can do this with less overhead than creating actors by using futures:
import scala.actors.Futures._
val nums = (1 to 1000).grouped(100).toList
val parts = nums.map(n => future { n.reduceLeft(_ + _) })
val whole = (0 /: parts)(_ + _())
You have to handle decomposing the problem, writing the "future" block, and recomposing the results into a final answer, but it does make executing a bunch of small code blocks in parallel easy to do.
(Note that the _() in the fold left is the apply function of the future, which means, "Give me the answer you were computing in parallel!", and it blocks until the answer is available.)
A parallel collections library would automatically decompose the problem and recompose the answer for you (as with pmap in Clojure); that's not part of the main API yet.
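The same decompose, compute and recompose steps can be sketched with scala.concurrent.Future, the standard-library successor to scala.actors.Futures. The chunk size, the global execution context and the timeout below are assumptions chosen for the demo.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Decompose: split the input into chunks of 100 elements.
val chunks: List[Seq[Int]] = (1 to 1000).grouped(100).toList

// Map: sum each chunk in its own Future, scheduled on the global thread pool.
val partialSums: List[Future[Int]] = chunks.map(chunk => Future(chunk.sum))

// Reduce: wait for all partial sums and add them up.
val total: Future[Int] = Future.sequence(partialSums).map(_.sum)

// Blocking is only for the demo; real code would keep composing the Future.
val whole: Int = Await.result(total, 10.seconds)
println(whole) // 500500
```

For the "first result wins" case from the question, Future.firstCompletedOf returns the first Future to finish, although it does not cancel the others.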
I'm not going to wait for Scala 2.8.1 or 2.9; I would rather write my own library or use another one, so I did more googling and found this: akka
http://doc.akkasource.org/actors
which has an object futures with methods
awaitAll(futures: List[Future]): Unit
awaitOne(futures: List[Future]): Future
but http://scalablesolutions.se/akka/api/akka-core-0.8.1/
has no documentation at all. That's bad.
But the good part is that akka's actors are leaner than Scala's native ones.
With all of these libraries (including scalaz) around, it would be really great if Scala itself could eventually merge them officially.
At Scala Days 2010, there was a very interesting talk by Aleksandar Prokopec (who is working on Scala at EPFL) about Parallel Collections. This will probably be in 2.8.1, but you may have to wait a little longer. I'll see if I can get the presentation itself to link here.
The idea is to have a collections framework that parallelizes the processing of collections by doing exactly what you suggest, but transparently to the user. In theory, all you have to do is change the import from scala.collections to scala.parallel.collections. You obviously still have to do the work of checking whether what you're doing can actually be parallelized.