I've recently been working with reactive programming to build a server-sent events app, and I have a use case I'm not sure can be done with Spring Boot WebFlux.
I have to produce a stream of values per request, so I've created my endpoint like this:
@GetMapping(value = "/subscribe", produces = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Flux<Long> invoke() {
    // emit the value 1 every two seconds (Flux.interval is static and cannot be
    // chained here, so delayElements paces the generated stream instead)
    return Flux.fromStream(Stream.generate(() -> 1L)).delayElements(Duration.ofSeconds(2));
}
The problem is that each call to the "/subscribe" endpoint creates a thread, and I am not able to handle a large number of requests.
Note: the stream depends on the sender (a different stream per user), so hard-coding a single shared stream is not the right solution, as far as I know.
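For what it's worth, a per-user stream that does not tie up a thread per request could be sketched like this (the userId path variable and the valueFor helper are hypothetical stand-ins for the per-sender logic; Flux.interval is driven by Reactor's shared parallel scheduler rather than a dedicated thread):
import java.time.Duration;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.*;
import reactor.core.publisher.Flux;

@RestController
public class SubscribeController {

    @GetMapping(value = "/subscribe/{userId}", produces = MediaType.APPLICATION_STREAM_JSON_VALUE)
    public Flux<Long> subscribe(@PathVariable String userId) {
        // one shared scheduler drives all intervals; no thread per subscriber
        return Flux.interval(Duration.ofSeconds(2))
                   .map(tick -> valueFor(userId, tick));
    }

    // hypothetical helper computing the next value for this particular user
    private Long valueFor(String userId, Long tick) {
        return tick;
    }
}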
Let's say I have a Spring Batch remote partitioned job, i.e. I have a manager application instance that starts the job and partitions the work, and multiple workers that execute the individual partitions.
The message channel on which the partitions are sent to the workers is an ActiveMQ queue, and the Spring Integration configuration is based on JMS.
Assume I want to make sure that, in case a worker crashes in the middle of executing a partition, another worker will pick up the same partition.
I think this is where acknowledging JMS messages would come in handy: only acknowledge a message once a worker has fully completed its work on a particular partition. But it seems the message is acknowledged as soon as it is received by a worker, and in case of failures in the worker's Spring Batch steps, the message won't reappear (obviously).
Is this even possible with Spring Batch? I've tried transacted sessions too, but that doesn't really work either.
I know how to achieve this with the plain JMS API. The difficulty comes from the amount of abstraction Spring Batch adds around the messaging, and I'm unable to figure it out.
In this case, I think the best way to answer this question is to remove all these abstractions coming from Spring Batch (as well as Spring Integration), and try to see where the acknowledgment can be configured.
In a remote partitioning setup, workers are listeners on a queue where the messages coming from the manager are of type StepExecutionRequest. The most basic form of a worker in this setup is something like the following (a simplified version of StepExecutionRequestHandler, which is configured as a Spring Integration service activator when using the RemotePartitioningWorkerStepBuilder):
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.StepLocator;
import org.springframework.batch.integration.partition.StepExecutionRequest;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jms.annotation.JmsListener;
import org.springframework.messaging.Message;
import org.springframework.stereotype.Component;

@Component
public class BatchWorkerStep {

    @Autowired
    private JobRepository jobRepository;

    @Autowired
    private JobExplorer jobExplorer;

    @Autowired
    private StepLocator stepLocator;

    @JmsListener(destination = "requests")
    public void receiveMessage(final Message<StepExecutionRequest> message) {
        StepExecutionRequest request = message.getPayload();
        Long jobExecutionId = request.getJobExecutionId();
        Long stepExecutionId = request.getStepExecutionId();
        String stepName = request.getStepName();
        StepExecution stepExecution = jobExplorer.getStepExecution(jobExecutionId, stepExecutionId);
        Step step = stepLocator.getStep(stepName);
        try {
            step.execute(stepExecution);
            stepExecution.setStatus(BatchStatus.COMPLETED);
        } catch (Throwable e) {
            stepExecution.addFailureException(e);
            stepExecution.setStatus(BatchStatus.FAILED);
        } finally {
            jobRepository.update(stepExecution); // needed in a setup where the manager polls the job repository
        }
    }
}
As you can see, the JMS message acknowledgment cannot be configured on the worker side (there is no way to do it with the attributes of @JmsListener), so it has to be done somewhere else. And this is actually at the message listener container level, with DefaultJmsListenerContainerFactory#setSessionAcknowledgeMode.
Now, if you are using Spring Integration to configure the messaging middleware, you can configure the acknowledgment mode on the Spring Integration side.
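As a minimal sketch (assuming the default container factory name used by @JmsListener), the acknowledge mode could be set like this; with CLIENT_ACKNOWLEDGE, Spring's listener container acknowledges the message only after the listener method returns without throwing, so a failing or crashed worker leaves the partition message eligible for redelivery:
import javax.jms.ConnectionFactory;
import javax.jms.Session;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jms.config.DefaultJmsListenerContainerFactory;

@Configuration
public class JmsConfig {

    @Bean
    public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(ConnectionFactory connectionFactory) {
        DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
        factory.setConnectionFactory(connectionFactory);
        // acknowledge only after successful listener execution
        factory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
        return factory;
    }
}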
I am converting a monolithic application that handles CRUD events into multiple microservices, each designed as a Kafka Streams application with a specific function. The idea is to allow each microservice to scale individually.
The challenge is that the processing flows for create and update events differ: a create event has to go through several functional microservices in the pipeline (filtering, enrichment), whereas an update event does not need these processors. But to process create and update events in order, update events also have to flow through all the microservices (reading from and writing to the Kafka topics); otherwise an update event could get published even before its create event has been processed.
event publisher (create/update event) => (kafka topic) => filter => (kafka topic) => enrichment => ... => (output topic)
Is it OK to have a dummy pipe/service for the update event flow to achieve ordering, as in the sketch below?
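For reference, a minimal sketch of the pass-through idea (topic names and the CREATE prefix are hypothetical): each service in the pipeline consumes both event types from the same topic but only applies its logic to create events, so per-key ordering between a create and its later updates is preserved at every hop.
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class FilterService {

    // stands in for the real filtering/enrichment applied to create events
    static String applyCreateLogic(String value) {
        return value;
    }

    public static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> events = builder.stream("events-in");
        events
            .mapValues(value -> value.startsWith("CREATE")
                    ? applyCreateLogic(value) // create events are processed
                    : value)                  // update events just pass through
            .to("events-out");
    }
}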
Another approach: instead of designing each microservice as a Kafka Streams application, I create each microservice with a synchronous REST interface and call them from a single Kafka Streams application, which can then manage the order in which create and update events are processed.
event publisher (create/update event) => (kafka topic) => event processor => (output topic)
where the event processor invokes each functional microservice through its REST interface.
In this case, although I still get a separate microservice per function that can scale individually, invoking REST calls inside an asynchronous streaming application might hurt its scalability and cause other problems.
Any comments/suggestions on both approaches, or a recommendation for an alternative approach, would help.
Is there a way to do typical batch processing with Vert.x, i.e. provide a file or DB query as input and let each record be processed by a verticle in a non-blocking way?
In the verticle examples, a server is defined at startup, and even though multiple verticles are deployed, the server is created only once. This means the Vert.x engine has a built-in concept of a server and knows how to dispatch incoming requests to the verticles for processing.
The same happens with the event bus.
But is there a way to define a verticle with a handler for processing data from a general stream: a query, a file, etc.?
I am particularly interested in spreading data processing over cluster nodes.
One way I can think of is to execute a query the regular way and then publish the data to the event bus for processing. But that means that if I have to process a few million records, I will run out of memory. Of course I could do paging, etc., but then there is no coordination between retrieving the data and processing it.
Thanks
Andrius
If you are using the JDBC Client, you can stream the query result:
(using vertx-rx-java2)
JDBCClient client = ...;

JsonArray params = new JsonArray().add(dataCategory);

client.rxQueryStreamWithParams("SELECT * FROM data WHERE data.category = ?", params)
    .flatMapObservable(SQLRowStream::toObservable)
    .subscribe(
        (JsonArray row) -> vertx.eventBus().send("data.process", row)
    );
This way, each row is sent to the event bus. If you then have multiple verticle instances that each listen on this address, you spread the data processing across multiple threads, as in the sketch below.
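For example, a minimal sketch of the consuming side ("data.process" matches the address used above; the row handling itself is left as a stub):
import io.vertx.core.AbstractVerticle;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;
import io.vertx.core.json.JsonArray;

public class RowProcessorVerticle extends AbstractVerticle {

    @Override
    public void start() {
        vertx.eventBus().<JsonArray>consumer("data.process", message -> {
            JsonArray row = message.body();
            // process the row here
        });
    }

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        // several instances of the same verticle share the work
        vertx.deployVerticle(RowProcessorVerticle.class.getName(),
                new DeploymentOptions().setInstances(4));
    }
}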
If you are using another SQL client, have a look at its documentation; maybe it has a similar method.
I have two applications, as described below:
Spring Boot application - acts as the REST endpoint and publishes each request to a message queue (Apache Pulsar).
Heron (Storm) topology - processes the messages received from the message queue (Pulsar) and contains all the processing logic.
My requirement: I need to serve different user queries through the Spring Boot application, which emits each query to the message queue, where it is consumed by the spout. Once the spout and bolts have processed the request, a response message is published from a bolt. That response from the bolt is handled by the Spring Boot consumer, which replies to the user request. Typically as shown below:
(1) user request => (2) Spring Boot publishes to (pulsar topic) => (3) spout and bolts process => (4) bolt publishes to (pulsar topic) => (5) Spring Boot consumer replies to the user
To serve the response to the same request, I am currently caching the DeferredResult object in memory: I set a request ID on each message sent to the topology, and I maintain a key/value map from request ID to DeferredResult. When the response message arrives, I parse the request ID and set the result on the DeferredResult. (I know this is a bad design, HOW SHOULD ONE SOLVE THIS ISSUE?)
How can I serve the response back to the same request in this scenario, where the order of messages received from the topology is not sequential (each request takes its own time to process, and the producer bolt fires the response as soon as it has one)?
I'm kind of stuck with this design and not able to proceed further.
// Controller
public DeferredResult<ResponseEntity<?>> process(/* some input */) {
    DeferredResult<ResponseEntity<?>> result = new DeferredResult<>(config.getTimeout());
    CompletableFuture<String> serviceResponse = service.processAsync(inputSource);
    serviceResponse.whenComplete((response, exception) -> {
        if (!ObjectUtils.isEmpty(exception))
            result.setErrorResult(/* error */);
        else
            result.setResult(/* complete */);
    });
    return result;
}

// In the service
public CompletableFuture<String> processAsync(/* input */) {
    producer.send(input);
    CompletableFuture<String> result = new CompletableFuture<>();
    // the consumer has a listener, shown below
    // **I want to avoid the line below; how can I redesign this?**
    map.put(id, result);
    return result;
}

// In the same service, a listener reads the messages from the consumer
void consumerListener(Message msg) {
    int reqID = msg.getRequestID();
    map.get(reqID).complete(msg.getData());
}
As shown above, as soon as I get a message I grab the CompletableFuture object and set the result, which internally completes the DeferredResult and returns the response to the user.
How can I serve the response back to the same request in this scenario, where the order of messages received from the topology is not sequential (each request takes its own time to process, and the producer bolt fires the response as soon as it has one)?
It sounds like you are looking for the Correlation Identifier messaging pattern. In broad strokes, you compute or create an identifier that gets attached to the message sent to Pulsar, and arrange for Heron to copy that identifier from the request it receives to the response it sends.
Thus, when your Spring Boot component consumes messages from Pulsar at step 5, you match the correlation id to the correct HTTP request and return the result.
Using the original requestId as your correlation identifier should be fine, as far as I can tell.
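As a rough sketch (the "correlationId" property name is just a choice made here, not a Pulsar convention), the id can travel as a message property that the topology copies from the request message to the response message:
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClientException;

public class CorrelatedSender {

    public static void send(Producer<byte[]> producer, String correlationId, byte[] payload)
            throws PulsarClientException {
        producer.newMessage()
                .property("correlationId", correlationId) // travels with the message
                .value(payload)
                .send();
    }

    // on the consuming side, read the id back off the response
    public static String correlationIdOf(Message<byte[]> response) {
        return response.getProperty("correlationId");
    }
}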
To serve the response to the same request, I am currently caching the DeferredResult object in memory: I set a request ID on each message sent to the topology, and I maintain a key/value map from request ID to DeferredResult. When the response message arrives, I parse the request ID and set the result on the DeferredResult. (I know this is a bad design, HOW SHOULD ONE SOLVE THIS ISSUE?)
Ultimately, you are likely to be doing that at some level; which is to say that the consumer at step 5 is going to be using the correlation id to look up something that was stored by the producer. Trying to pass the original request across four different process boundaries is likely to end in tears.
The more general form is to store a callback, rather than a CompletableFuture, in the map; but in this case the callback probably just completes the future.
The one thing I would want to check carefully in the design: you want to be sure that the consumer at step 5 sees the future it is supposed to use before the message arrives. In other words, there should be a happens-before memory barrier somewhere to ensure that the map lookup at step 5 doesn't fail.
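A minimal sketch of that ordering (the publish method is a stand-in for the real Pulsar producer): putting the future into a ConcurrentHashMap before sending the request provides the happens-before edge, and removing it on completion avoids leaking entries.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PendingRequests {

    private final ConcurrentMap<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    public CompletableFuture<String> register(String correlationId) {
        CompletableFuture<String> result = new CompletableFuture<>();
        pending.put(correlationId, result); // register first...
        publish(correlationId);             // ...then send the request
        return result;
    }

    // called by the consumer listener when a response arrives
    public void complete(String correlationId, String data) {
        CompletableFuture<String> f = pending.remove(correlationId); // remove to avoid a leak
        if (f != null) {
            f.complete(data);
        }
    }

    private void publish(String correlationId) {
        // hypothetical: hand the request off to the message queue here
    }
}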
I'm new to Akka HTTP and Akka Streams and would like to figure out the most idiomatic implementation for a REST API. Let's say I need to implement a single endpoint that:
accepts a path parameter and several query string parameters
validates the params
constructs a db query based on the params
executes the query
sends response back to client
From my research I understand this is a good use case for Akka HTTP, and modeling the request -> response flow seems to map well to Akka Streams with several Flows, but I'd like some clarification:
Does using the akka streams library make sense here?
Is that still true if the database driver making the call does not have an async api?
Do the stream back pressure semantics still hold true if there is a Flow making a blocking call?
How is parallelism handled in the Akka Streams implementation? Say, for example, the service experiences 500 concurrent connections: does the streams abstraction simply have a pool of actors under the covers to handle each of these connections?
Edit: This answers most of the questions I had: http://doc.akka.io/docs/akka-http/current/scala/http/handling-blocking-operations-in-akka-http-routes.html
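For reference, a rough sketch of the pattern those docs describe, using the Java DSL (blockingDbCall stands in for the real JDBC query): mapAsync caps the number of in-flight calls, which preserves back pressure even over a blocking driver, while the dedicated executor keeps the blocking work off the default dispatcher.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

import akka.NotUsed;
import akka.stream.javadsl.Flow;

public class BlockingDbFlow {

    // separate thread pool so blocking calls don't starve the default dispatcher
    private static final Executor blockingPool = Executors.newFixedThreadPool(16);

    // stands in for the real blocking database call, e.g. a JDBC query
    private static String blockingDbCall(String query) {
        return "result of " + query;
    }

    public static Flow<String, String, NotUsed> dbFlow() {
        return Flow.of(String.class)
            // at most 16 queries in flight; upstream is back-pressured beyond that
            .mapAsync(16, query ->
                CompletableFuture.supplyAsync(() -> blockingDbCall(query), blockingPool));
    }
}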