Scala batch processing triggered by size or time [closed]

I'd like to batch up events for processing, with a batch triggered either by the number of events reaching a certain threshold or by a time interval expiring (whichever happens first). What should I consider? Futures? Akka? Some more special-purpose library that might exist?

Two options come to mind:
Using Akka
Using Quartz
This depends on your specific architecture, but any form of scheduling can work. You can use the Akka scheduler to run a task at regular intervals and keep your own internal queue that triggers the batch job as soon as it fills up. You can do something very similar with Quartz, though you may have to write more boilerplate in exchange for greater flexibility.
If you don't wish to bring in a fairly heavyweight library, I suppose you could implement something yourself, but you would be reinventing the wheel.
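For illustration, here is a minimal sketch of the Akka approach using a timer-driven actor (assuming Akka 2.6 classic actors; the Batcher name, its messages, and the processBatch stub are invented for this example):

    import akka.actor.{Actor, ActorSystem, Props, Timers}
    import scala.concurrent.duration._

    object Batcher {
      case class Event(payload: String)
      private case object Flush
      private case object FlushTimer

      def props(maxSize: Int, interval: FiniteDuration): Props =
        Props(new Batcher(maxSize, interval))
    }

    class Batcher(maxSize: Int, interval: FiniteDuration) extends Actor with Timers {
      import Batcher._

      private var buffer = Vector.empty[Event]

      // Fire a Flush message at a fixed interval, whether or not the buffer is full.
      timers.startTimerWithFixedDelay(FlushTimer, Flush, interval)

      def receive: Receive = {
        case e: Event =>
          buffer :+= e
          if (buffer.size >= maxSize) flush() // size threshold reached first
        case Flush =>
          flush()                             // time interval expired first
      }

      private def flush(): Unit =
        if (buffer.nonEmpty) {
          processBatch(buffer)
          buffer = Vector.empty
        }

      // Stub standing in for the real batch-processing logic.
      private def processBatch(batch: Seq[Event]): Unit =
        println(s"processing batch of ${batch.size} events")
    }

    object BatcherDemo extends App {
      val system = ActorSystem("batching")
      val batcher = system.actorOf(Batcher.props(maxSize = 100, interval = 5.seconds))
      (1 to 250).foreach(i => batcher ! Batcher.Event(s"event-$i"))
    }

It is also worth knowing that Akka Streams ships this exact size-or-time pattern as the groupedWithin operator, which emits a batch as soon as either the element count or the time window is reached, so if you are already using streams you may not need a custom actor at all.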

Related

Optimal ways to rate limit a spark stream [closed]

I have a Spark stream, sourced from blob storage, that transforms and enriches data in a batch process written in Scala 2.12. The enrichment step calls an external service to fetch additional information that is added before the data is dropped in a sink.
The web service is rate-limited, so I'm looking for ways to control the request rate to the service. These are the approaches I have thought of so far:
Use partitioning from the Spark libraries to process data in smaller chunks.
Make use of Akka Streams: load the Spark stream onto an Akka stream and throttle the requests (see the sketch below). The disadvantage of this approach is that I'll end up loading a lot of data into memory. I can still mitigate this by splitting the input into multiple smaller blob files that are processed one after another.
Look for an HTTP client library that takes care of throttling and retrying for me.
Implement some kind of circuit breaker that stops when it encounters HTTP 429 and resumes later.
What's the best way to solve this?
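For reference, the Akka Streams option might look something like this minimal sketch (assuming Akka 2.6; the enrich function and the 10-requests-per-second limit are placeholders for the real service call and its quota):

    import akka.actor.ActorSystem
    import akka.stream.scaladsl.Source
    import scala.concurrent.Future
    import scala.concurrent.duration._

    object ThrottledEnrichment extends App {
      implicit val system: ActorSystem = ActorSystem("throttle-demo")
      import system.dispatcher

      // Placeholder for the rate-limited external enrichment call.
      def enrich(record: String): Future[String] =
        Future(s"$record-enriched")

      Source(1 to 100)
        .map(i => s"record-$i")
        .throttle(10, 1.second)            // cap outgoing requests at 10 per second
        .mapAsync(parallelism = 4)(enrich) // bounded concurrency toward the service
        .runForeach(println)
        .onComplete(_ => system.terminate())
    }

The throttle stage applies backpressure upstream rather than buffering everything, which addresses the memory concern raised in the second option above.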

Queries about ReactiveX programming with swift [closed]

I have a completed project in Swift 3.0 that I want to update with the ReactiveX frameworks for Swift, i.e. RxSwift and RxCocoa.
As I'm learning reactive programming, it feels very different and new to me.
Before doing this, I have some questions in mind:
Is it worth spending time on ReactiveX?
Does it increase the performance of the application?
What do you personally think about the future of ReactiveX?
There are certain points of contention in the Rx world, I'll grant you that.
But if your previous project version did not use Rx (in any language), then chances are it's bulky.
Imagine this:
Without Rx (we need to pull data):
- you query a data structure/function/service
- a value is returned
With Rx (data is already pushed down to us; we don't need to request it separately, just subscribe):
- values are always available on subscription
Rx changes the way you look at file systems, events, etc.
They are all viewed as data streams which can be emitted using an Observable.
An observer can then receive them on subscription.
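To make the pull vs. push contrast concrete, here is a toy sketch (in Scala, matching the main question on this page; ToyObservable is invented for illustration and is not an API of RxSwift or any Rx library):

    object PushVsPull extends App {
      // Pull model: the consumer asks, and a single value comes back.
      def query(): Int = 42
      val pulled = query()

      // Push model: the consumer subscribes once; the producer emits values
      // to every subscriber whenever they become available.
      class ToyObservable[A] {
        private var subscribers = List.empty[A => Unit]
        def subscribe(onNext: A => Unit): Unit = subscribers ::= onNext
        def emit(value: A): Unit = subscribers.foreach(_(value))
      }

      val taps = new ToyObservable[String]
      taps.subscribe(event => println(s"received: $event"))
      taps.emit("tap")   // the subscriber is notified without asking again
      taps.emit("swipe")
    }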
So, it is the future, and yes, the amount of code is reduced considerably and becomes much more readable.
The learning curve is steep, but eventually you end up writing much less code (you can forget delegates altogether).
For existing projects, it will be a lot of hassle, especially if the whole team is not at the same level Rx-wise.
Performance-wise, there is no noticeable difference.
(IMHO)

What are the benefits of using a tool like Chef vs. using a makefile/shell script for deployment? [closed]

I have heard good things about Chef and was curious about all of the benefits before I devote time to learning a new tool. I'm not looking to turn this into an opinion thread; I'm looking for a list of additional features it has over a makefile/shell script.
Chef, along with Ansible, Puppet, and Salt (collectively called CAPS), is based on the principle of "describe the desired state of the system and the tool will make it happen".
A script or Makefile is generally a procedural system: run this, then run that, etc. That means you need to keep a mental model of the system from each step to the next, and if that model ever deviates from the real system (e.g., a directory whose owner you are trying to set doesn't exist), your script usually breaks.
With some tasks this is easy. yum/apt-get install, for example, are internally idempotent: you can run them every time, and if the package is already installed, they just do nothing.
CAPS systems take that principle (idempotence) and apply it to all management tasks. This has for the most part resulted in less brittle configuration management as you only need to tell the tool what the end result should look like and it will take care of figuring out the delta from the current state.
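To illustrate the idempotence principle in a few lines (a Scala sketch of the idea, not how Chef itself is implemented; ensureDirectory is an invented name):

    import java.nio.file.{Files, Paths}

    object IdempotentSetup extends App {
      // A procedural `mkdir /tmp/app` fails if the directory already exists,
      // so a script built from such steps breaks on its second run.

      // Idempotent style: describe the desired end state and act only on the
      // delta; running it twice changes nothing the second time.
      def ensureDirectory(path: String): Unit = {
        val p = Paths.get(path)
        if (!Files.exists(p)) Files.createDirectories(p)
      }

      ensureDirectory("/tmp/app") // creates the directory
      ensureDirectory("/tmp/app") // already in the desired state: does nothing
    }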

How to implement or use a WebSocket in perl? [closed]

I need three major things from Perl and don't know how to go about it:
A non-blocking WebSocket implementation, like Mojo.
The server needs to accept broadcast calls after it has started.
The server needs to be able to access data that is on a different thread.
I have tried Mojo but didn't find a way to control the port (I can live with that) and didn't figure out how to trigger events after the server has started, so I wasn't able to test whether it could handle events after the fact.
I have tried Net::WebSocket::Server, but it is blocking. I am tempted to wrap my own code around it so that it can handle non-blocking operation and shared data, as it is by far the simplest implementation and easy to modify.
I have also tried pocket.io, but it didn't have a very easy way to implement OO design and still remain thread-safe (mostly because of the Plack framework).
Does anyone have a good example of how to do this with Mojolicious or pocket.io? If not, I will just have to write my own implementation.

Akka message throttling [closed]

I am trying to implement the following scenario with Akka but am hitting heap limitations (out-of-memory errors):
A user uploads a text file (approx. 25 MB) containing around 1,000,000 lines.
After the file is uploaded, HTTP 200 OK is sent back to the client and file processing starts in the background.
Each line should be processed: saved to the database, with an external web service call made to look up the contents of the line and a database update if the lookup returns results.
Please suggest an approach/pattern.
Many thanks in advance!
There are several ways to achieve this, for example:
1) Use a bounded mailbox for some of your actors; your code that sends messages to such actors will then block if the target mailbox is full.
2) Use a work-pulling model, where some of your actors "ask" for more work when idle (see the sketch below).
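A minimal sketch of the work-pulling option with classic Akka actors (the Master/Worker names and message types are invented for illustration). Because each worker pulls one line at a time, only a handful of lines are in flight at once, which avoids mailing all 1,000,000 lines to actors up front:

    import akka.actor.{Actor, ActorRef, ActorSystem, Props}

    case object GiveMeWork
    case class Work(line: String)

    class Master(lines: Iterator[String]) extends Actor {
      def receive: Receive = {
        case GiveMeWork =>
          if (lines.hasNext) sender() ! Work(lines.next())
          // else: no work left; idle workers simply stop receiving items
      }
    }

    class Worker(master: ActorRef) extends Actor {
      override def preStart(): Unit = master ! GiveMeWork

      def receive: Receive = {
        case Work(line) =>
          process(line)       // save to DB, call the web service, etc.
          master ! GiveMeWork // pull the next line only when ready
      }

      private def process(line: String): Unit =
        println(s"processed: $line")
    }

    object WorkPullingDemo extends App {
      val system = ActorSystem("work-pulling")
      val master = system.actorOf(Props(new Master(Iterator.tabulate(1000)(i => s"line $i"))))
      (1 to 4).foreach(_ => system.actorOf(Props(new Worker(master))))
    }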