In Slick 3.0, why is the newly-introduced `Streaming` useful? - scala

I found that Slick 3.0 introduced a new feature called streaming:
http://slick.typesafe.com/doc/3.0.0-RC1/database.html#streaming
I'm not familiar with Akka. Streaming seems to be a lazy or asynchronous value, but it isn't clear to me why it is useful or when it would be useful.
Does anyone have any ideas about this?

So let's imagine the following use case:
A "slow" client wants to get a large dataset from the server. The client sends a request to the server, which loads all the data from the database, stores it in memory and then passes it down to the client.
And here we're faced with a problem: the client doesn't handle the data as fast as we'd like => we can't release the memory => this may result in an out-of-memory error.
Reactive streams solve this problem with backpressure. We can wrap Slick's publisher in an Akka Streams Source and then "feed" it to the client via Akka HTTP.
The key thing is that this backpressure is propagated through TCP via Akka HTTP all the way down to the publisher that represents the database query.
That means we only read from the database as fast as the client can consume the data.
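For a concrete picture, here is a minimal sketch of that setup (the Coffees table, the "mydb" config entry and the port are my own assumptions, not something from the question), wrapping a Slick streaming query in an Akka Streams Source and serving it as a chunked response with Akka HTTP:

    import akka.actor.ActorSystem
    import akka.http.scaladsl.Http
    import akka.http.scaladsl.model.{ContentTypes, HttpEntity}
    import akka.http.scaladsl.server.Directives._
    import akka.stream.ActorMaterializer
    import akka.stream.scaladsl.Source
    import akka.util.ByteString
    import slick.driver.H2Driver.api._

    object StreamingCoffees extends App {
      implicit val system = ActorSystem("streaming-example")
      implicit val mat    = ActorMaterializer()

      // Hypothetical table; any Slick TableQuery works the same way
      class Coffees(tag: Tag) extends Table[String](tag, "COFFEES") {
        def name = column[String]("NAME")
        def *    = name
      }
      val coffees = TableQuery[Coffees]
      val db      = Database.forConfig("mydb") // assumes a "mydb" entry in application.conf

      val route = path("coffees") {
        get {
          // db.stream(...) returns a Reactive Streams Publisher; the client's TCP
          // backpressure is propagated through Akka HTTP and Akka Streams back to it,
          // so rows are only read as fast as the client consumes them.
          // (Some JDBC drivers also need a fetch size / .transactionally to stream for real.)
          val rows = Source.fromPublisher(db.stream(coffees.result))
          complete(HttpEntity(ContentTypes.`text/plain(UTF-8)`, rows.map(n => ByteString(n + "\n"))))
        }
      }

      Http().bindAndHandle(route, "localhost", 8080)
    }

Because the response entity is chunked and backed by the database publisher, a slow client slows down the JDBC fetching instead of forcing the server to buffer the whole result set.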
P.S. This is just one small area where reactive streams can be applied.
You can find more information here:
http://www.reactive-streams.org/
https://youtu.be/yyz7Keg1w9E
https://youtu.be/9S-4jMM1gqE

Related

Bulk insertion of data to elasticsearch via logstash with scala

I need to regularly insert large amounts of bulk data into Elasticsearch from Scala code. When googling, I found suggestions to use Logstash for high insertion rates, but Logstash doesn't have any Java libraries or an API to call, so I tried to connect to it via an HTTP client. I don't know whether sending large amounts of data over HTTP is a good approach, or whether it's better to use other approaches, for example a broker, queues, Redis, etc.
I know the latest versions of Logstash (6.x, 7.x) allow the use of a persistent queue, so the Logstash queue could be another solution, but again over the HTTP or TCP protocol.
Also note that reliability is the top priority for me, since data must not be lost and there should be a mechanism for returning a response in code so I can handle success or failure.
I would appreciate any ideas.
Update
It seems that using HTTP is robust and has an acknowledgement mechanism based on here, but if I take this approach, which HTTP client libraries in Scala are more appropriate, given that I need to send bulk data as a sequence of key/value pairs and handle the response in a non-blocking way?
It may sound like overkill, but introducing a buffering layer between your Scala code and Logstash may prove helpful, as you can get rid of heavy HTTP calls and rely on a lightweight transport protocol.
Consider adding Kafka between your Scala code and Logstash to queue the messages. Logstash can reliably process the messages from Kafka over TCP and bulk-insert them into Elasticsearch. On the other side, you can put messages into Kafka from your Scala code in batches to make the whole pipeline work efficiently.
That being said, if your volume isn't in the range of, say, 10,000 msgs/sec, you can also consider fiddling with the Logstash HTTP input plugin by tweaking its threads and running multiple Logstash processes. This avoids the complexity of adding another moving piece (Kafka) to your architecture.
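As a rough illustration of the Kafka-in-the-middle idea (the broker address, topic name and producer settings below are assumptions, not something from the question), a plain kafka-clients producer used from Scala could look like this; Logstash would then read the topic with its Kafka input plugin and bulk-index into Elasticsearch:

    import java.util.Properties

    import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}
    import org.apache.kafka.common.serialization.StringSerializer

    object LogstashFeed {
      private val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092") // assumption: local broker
      props.put("acks", "all")                         // wait for the full replica set -> no silent loss
      props.put("retries", "10")                       // retry transient broker failures
      props.put("linger.ms", "50")                     // give the producer time to batch records
      props.put("batch.size", "65536")
      props.put("key.serializer", classOf[StringSerializer].getName)
      props.put("value.serializer", classOf[StringSerializer].getName)

      private val producer = new KafkaProducer[String, String](props)

      // Sends one key/value record; the callback gives non-blocking success/failure handling
      def send(key: String, json: String): Unit =
        producer.send(new ProducerRecord("logstash-input", key, json), new Callback {
          override def onCompletion(meta: RecordMetadata, err: Exception): Unit =
            if (err != null) println(s"failed to enqueue $key: ${err.getMessage}")
            // else: the record is durably stored in Kafka and Logstash will pick it up from the topic
        })

      def close(): Unit = producer.close()
    }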

Designing a REST service with akka-http and akka-stream

I'm new to akka http & streams and would like to figure out what the most idiomatic implementation is for a REST api. Let's say I need to implement a single endpoint that:
accepts a path parameter and several query string parameters
validates the params
constructs a db query based on the params
executes the query
sends response back to client
From my research, I understand this is a good use case for akka-http, and modeling the request -> response flow seems to map well to akka-streams with several Flows, but I'd like some clarification:
Does using the akka streams library make sense here?
Is that still true if the database driver making the call does not have an async api?
Do the stream back pressure semantics still hold true if there is a Flow making a blocking call?
How is parallelism handled with the akka streams implementation? Say for example the service experiences 500 concurrent connections, does the streams abstraction simply have a pool of actors under the covers to handle each of these connections?
Edit: This answers most of the questions I had: http://doc.akka.io/docs/akka-http/current/scala/http/handling-blocking-operations-in-akka-http-routes.html
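For what it's worth, here is a minimal sketch of the pattern that page describes: run the blocking database call in a Future on a dedicated dispatcher so it doesn't starve the default one. The findUser DAO call and the "blocking-io-dispatcher" name are hypothetical and would need a matching entry in application.conf:

    import akka.actor.ActorSystem
    import akka.http.scaladsl.model.StatusCodes
    import akka.http.scaladsl.server.Directives._
    import akka.http.scaladsl.server.Route

    import scala.concurrent.Future

    class UserRoutes(system: ActorSystem) {
      // Dedicated thread pool for blocking JDBC work, kept away from the default dispatcher
      private val blockingEc = system.dispatchers.lookup("blocking-io-dispatcher")

      // Hypothetical blocking DAO call (plain JDBC, a non-async driver, ...)
      private def findUser(id: Long, active: Boolean): Option[String] = ???

      val route: Route =
        path("users" / LongNumber) { id =>                 // path parameter
          get {
            parameters("active".as[Boolean]) { active =>   // query string parameter
              // run the blocking query off the default dispatcher
              val result: Future[Option[String]] = Future(findUser(id, active))(blockingEc)
              onSuccess(result) {                          // respond when the Future completes
                case Some(user) => complete(user)
                case None       => complete(StatusCodes.NotFound)
              }
            }
          }
        }
    }

Backpressure itself doesn't break when a stage blocks; the cost is the blocked threads, which is why they should live in their own bounded dispatcher rather than the default one.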

Providing a reactive api to a database

How do I provide a Reactive Streams API for a DB that does not support streaming? Let's take DynamoDB as an example: when doing a get call, DynamoDB is going to return all the results. So even if I wrap the get call in a Source, how do I handle backpressure from the downstream stages? Also, how do I implement write calls into the DB? What will my Sink look like? Any pointers on this would be helpful.
One option is to implement your database Source using an ActorPublisher -
See: http://doc.akka.io/docs/akka/2.4.11/scala/stream/stream-integrations.html#ActorPublisher
Just mixing in this trait and implementing the command interface will give you a Reactive Streams-compliant data publisher that can handle downstream backpressure. Your publisher will get a Request message when downstream subscribers pull more data, and it has access to the currently perceived demand if it needs to actively push more data downstream. You can then plug this publisher into your Akka Streams pipeline by creating a Source from it:
Source.actorPublisher[Data](MyPublisher.props).runWith(MySink)
To deal with the fact that the underlying DB is itself NOT reactive, you would need to implement some buffering and polling logic within the ActorPublisher.
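A rough sketch of what that could look like (the Data type, the fetchPage callback and the id-based paging are made up for illustration; note that later Akka versions deprecate ActorPublisher in favour of GraphStage-based custom sources):

    import akka.actor.Props
    import akka.stream.actor.ActorPublisher
    import akka.stream.actor.ActorPublisherMessage.{Cancel, Request}

    // Hypothetical row type and a blocking "page" fetch against the non-reactive DB
    case class Data(id: Long, payload: String)

    class MyPublisher(fetchPage: (Long, Int) => Seq[Data]) extends ActorPublisher[Data] {
      private var lastSeenId = 0L

      def receive: Receive = {
        case Request(_) => deliver()       // downstream signalled demand
        case Cancel     => context.stop(self)
      }

      private def deliver(): Unit =
        if (isActive && totalDemand > 0) {
          // Poll the DB for at most `totalDemand` rows (capped), i.e. only as much
          // as the downstream has asked for -- this is where backpressure is honoured
          val batch = fetchPage(lastSeenId, math.min(totalDemand, 100L).toInt)
          batch.foreach { d =>
            onNext(d)
            lastSeenId = d.id
          }
          if (batch.isEmpty) onCompleteThenStop() // or schedule another poll for a "tail -f" style source
        }
    }

    object MyPublisher {
      def props(fetchPage: (Long, Int) => Seq[Data]): Props =
        Props(new MyPublisher(fetchPage))
    }

For the write side, the same documentation page describes ActorSubscriber, which lets a sink request a bounded number of elements and only ask for more once the current batch has been written to the DB.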

Scala Replacement for Akka Transactors

In version 2.3 Akka dropped support for transactors.
A quote from Akka's 2.0 documentation about transactors:
When you really need composable message flows across many actors updating their internal local state but need them to do that atomically in one big transaction. Might not be often but when you do need this then you are screwed without it.
What is the best way to manage STM transactions across multiple actors in the absence of transactors?
Is it a bad idea using STM in the first place?
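For context, the STM part on its own is still available via ScalaSTM, which is what Akka's STM support relied on; below is a minimal sketch with two hypothetical Refs updated atomically. What transactors added on top, and what you would have to hand-roll now, is coordinating such a transaction across several actors:

    import scala.concurrent.stm._

    object Transfer {
      // Two pieces of shared state that must change together
      val from = Ref(100L)
      val to   = Ref(0L)

      def transfer(amount: Long): Unit =
        atomic { implicit txn =>
          // Either both updates commit or neither does; conflicting transactions are retried
          from() = from() - amount
          to()   = to()   + amount
        }
    }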

akka persistence or not

The current application uses the Akka EventStream and its publish/subscribe mechanism for a use case that imports a lot of data: upon receiving the data for each row, it publishes an event, and there is a subscriber for it. This design runs the risk of losing events if something goes wrong with either the publisher or the subscriber.
I am wondering if using Akka Persistence makes sense here, for a few reasons:
1) Persist events
2) Audit history
3) Recreate a scenario with a snapshot
Note that there isn't a shared/global state in the system (generally described as a use case in almost all Akka Persistence blogs/examples).
Does Akka persistence make sense here?
If I understand your scenario correctly, I'd say no for 1), yes for 2), no for 3):
1) If the message is lost due to a problem with the pub/sub mediator (which you don't really control), it will never reach your persistent actors and therefore will never be saved in the event stream, thus never replayed.
Recorded messages can be looked up during an audit.
3) If your actors are stateless processors, what scenario are you going to recreate/save in the snapshot?
I'd suggest you can work around 1 by using a confirmation/retry mechanism in which you resend the message at regular intervals until you receive an ack from the consumer.
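A bare-bones sketch of such a resend-until-ack scheme with a plain actor and the scheduler (the RowImported/Ack protocol is made up for illustration; Akka Persistence's AtLeastOnceDelivery gives you a more complete version of this if you do end up adopting persistence):

    import akka.actor.{Actor, ActorRef, Cancellable, Props}
    import scala.concurrent.duration._

    // Hypothetical protocol: the importer resends RowImported until the consumer replies with Ack(id)
    case class RowImported(id: Long, payload: String)
    case class Ack(id: Long)

    class ReliablePublisher(consumer: ActorRef) extends Actor {
      import context.dispatcher
      private var pending = Map.empty[Long, Cancellable]

      def receive: Receive = {
        case row: RowImported =>
          // Deliver now and keep re-delivering every 2 seconds until this id is acknowledged
          val retry = context.system.scheduler.schedule(0.seconds, 2.seconds, consumer, row)
          pending += (row.id -> retry)

        case Ack(id) =>
          pending.get(id).foreach(_.cancel())
          pending -= id
      }
    }

    object ReliablePublisher {
      def props(consumer: ActorRef): Props = Props(new ReliablePublisher(consumer))
    }

Note that this gives at-least-once delivery, so the consumer needs to deduplicate by id (or otherwise be idempotent).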