Sinatra/Sidekiq logger and thread model

I want to implement a logger that attaches various metadata based on the context:
global metadata, like the build version
request-level metadata, like the path and session ID
job-level metadata, like the job name
When a model tries to log data, I want it to simply call the global logger instance and let the logger look up the current context (e.g. if the model is called from a user request, automatically attach the session ID to the log entry).
My question is: how can I implement such lookup logic? Is it safe to keep a mapping from thread ID to metadata? Does the concurrency model of Sinatra and Sidekiq guarantee that each thread handles only one request/job at a time and doesn't pick up anything else in between?
Let's say we have requestA and requestB. Thread 1 starts to process requestA and in the middle it makes an outbound HTTP request, so it is free (waiting for the response). Does Sinatra start to process requestB on thread 1 in such a case? (If so, we cannot have a threadID->metadata lookup.)
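Setting the guarantee question aside, here is a minimal sketch of the lookup mechanics in Java terms, where ThreadLocal plays the role of Ruby's Thread.current[] hash. It assumes the one-request/one-job-per-thread model the question asks about, with middleware populating the context at the start of a request/job and clearing it at the end (all names here are hypothetical):

import java.util.HashMap;
import java.util.Map;

public class ContextLogger {
    // per-thread metadata map; Ruby's Thread.current[] hash plays the same role
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // called by request/job middleware when work starts on the current thread
    public static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    // the model just calls this; the logger looks up the current context itself
    public static void log(String message) {
        System.out.println(CONTEXT.get() + " " + message);
    }

    // middleware must clear the context when the request/job finishes,
    // because pooled threads are reused for later requests/jobs
    public static void clear() {
        CONTEXT.remove();
    }
}

If each thread really does handle exactly one request/job at a time, the model never needs to know where it was called from; if that assumption breaks (e.g. with fiber- or event-loop-based servers), the context would need a key other than the thread.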


Kogito - wait until data from multiple endpoints is received

I am using Kogito with Quarkus. I have set up one DRL rule and am using a BPMN configuration. As can be seen below, currently one endpoint is exposed, which starts the process. All needed data is received from the initial request; it is then evaluated and the process goes on.
I would like to extend the workflow to have two separate endpoints. One to provide the age of the person and another to provide the name. The process must wait until all needed data is gathered before it proceeds with evaluation.
Has anybody come across a similar solution?
Technically you could use a signal or message to add more data into a process instance before you execute the rules over the entire data; see https://docs.kogito.kie.org/latest/html_single/#ref-bpmn-intermediate-events_kogito-developing-process-services.
In order to do that you need some sort of correlation between these events; otherwise, how do you know that event name 1 should be matched with event age 1? If you can keep the process instance id, then the second event can either trigger a REST endpoint for that specific process instance or send it a message via a message broker.
You can also have your own custom logic to aggregate the events and only fire a new process instance once your criteria for complete data are met. There are also plans in Kogito to extend how correlation is done, allowing for instance the use of process variables as the identifier. For example, if you had person.id as the correlation, events for the name and age of the same id would signal the same process instance. Hope this info helps.
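A rough sketch of the REST variant, assuming you kept the process instance id from the first request; the path /persons/{processInstanceId}/provideAge stands in for whatever signal endpoint Kogito actually generates for your process and is purely hypothetical:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SignalClient {
    public static void main(String[] args) throws Exception {
        String instanceId = "kept-from-the-first-request"; // correlation: the process instance id
        HttpRequest signal = HttpRequest.newBuilder(
                        URI.create("http://localhost:8080/persons/" + instanceId + "/provideAge"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"age\": 42}"))
                .build();
        HttpClient.newHttpClient().send(signal, HttpResponse.BodyHandlers.ofString());
    }
}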

How does an application server handle multiple requests to save data into a table

I have created a web application in JSF, and it has a button.
When the button is clicked, the request goes to the server side and executes the function below to save the data in a table; I am using MyBatis for this.
public void save(A a) {
    SqlSession session = null;
    try {
        session = SqlConnection.getInstance().openSession();
        TestMapper testmap = session.getMapper(TestMapper.class);
        testmap.insert(a);
        session.commit();
    } catch (Exception e) {
        // don't swallow the exception silently; at least log it
        e.printStackTrace();
    } finally {
        if (session != null) { // guard against NPE if openSession() failed
            session.close();
        }
    }
}
Now I have deployed this application on the JBoss (WildFly) application server.
As per my understanding, when multiple users try to access the application
by hitting the URL, the application server creates a thread for each user request.
For example, if 4 clients make requests, then 4 threads will be created: t1, t2, t3 and t4.
If all 4 users hit the save button at the same time, how will the save method be executed? Will t1 enter the method and execute the insert statement
to insert data into the table, followed by t2, t3 and t4, or will all 4 threads execute the insert simultaneously?
To provide some context, I will first describe two possible approaches to handling requests. In this case HTTP, but these approaches do not depend on the protocol used; the important thing is that requests come from the network and their execution requires some IO (access to the filesystem, a database, or network calls to other systems). Note that the following description contains some simplifications.
These two approaches are:
synchronous
asynchronous
In general, to process a typical HTTP request that involves DB access, at least four IO operations are needed:
the request handler needs to read the request data from the client socket
the request handler needs to write the request to the socket connected to the DB
the request handler needs to read the response from the DB socket
the request handler needs to write the response to the client socket
Let's see how this is done for both cases.
Synchronous
In this approach the server has a pool (think of a collection) of threads that are ready to serve a request.
When a request comes in, the server borrows a thread from the pool and executes the request handler in that thread.
When the request handler needs to do an IO operation, it initiates the IO operation and then waits for its completion. By "wait" I mean that the thread's execution is blocked until the IO operation completes and the data (for example, the response with the results of the SQL query) is available.
In this case concurrency, that is, processing requests for multiple clients simultaneously, is achieved by having a number of threads in the pool. IO operations are much slower than the CPU, so most of the time a thread processing a request is blocked on an IO operation, and the CPU cores can execute stages of request processing for other clients.
Note that because of the slowness of IO operations, the thread pool used for handling HTTP requests is usually fairly large. The documentation for the synchronous request-processing subsystem used in WildFly suggests about 10 threads per CPU core as a reasonable value.
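As a toy illustration of this model (not WildFly's actual implementation), a synchronous server is essentially a fixed thread pool plus handlers that block on IO inside their worker thread:

import java.io.*;
import java.net.*;
import java.util.concurrent.*;

public class SyncServer {
    public static void main(String[] args) throws IOException {
        ExecutorService pool = Executors.newFixedThreadPool(40); // e.g. ~10 threads/core on 4 cores
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket client = server.accept();   // wait for a connection
                pool.submit(() -> handle(client)); // borrow a thread from the pool
            }
        }
    }

    static void handle(Socket client) {
        try (client) {
            // 1. blocking read of the request from the client socket
            client.getInputStream().read(new byte[1024]);
            // 2+3. a real handler would now make a blocking JDBC call here;
            //      the thread sleeps until the DB responds
            // 4. blocking write of the response to the client socket
            client.getOutputStream().write("HTTP/1.1 200 OK\r\n\r\n".getBytes());
        } catch (IOException ignored) { }
    }
}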
Asynchronous
In this case the IO is handled differently. There is a small number of threads handling IO. They all work the same way, so I'll describe one of them.
Such a thread runs a loop which basically waits for events, and every time an event happens it calls the handler for that event.
The first such event is a new request. When request processing starts, the request handler is invoked from the loop that is run by one of the IO threads. The first thing the request handler does is try to read the request from the client socket. So the handler initiates the IO operation on the client socket and returns control to the caller. That means the thread is released and can process another event.
Another event happens when the IO operation that reads from the client socket has some data available. In this case the loop invokes the handler at the point where it previously returned control after initiating the IO; namely, it is resumed at the next step, which processes the input data (for example, parses HTTP parameters) and initiates a new IO operation (in this case a request to the DB socket). And again the handler releases the thread so it can handle other events (like the completion of IO operations that are part of other clients' request processing).
Given that IO operations are slow compared to the speed of the CPU itself, one thread handling IO can process a lot of requests concurrently.
Note: it is important that the request handler code never uses any blocking operation (like blocking IO), because that would steal the IO thread and prevent other requests from making progress.
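Again purely as a sketch (production servers are far more involved), the event loop can be illustrated with Java NIO: one thread multiplexes all sockets, and handlers run only when a readiness event fires:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class EventLoop {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false); // nothing in the loop may block
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select(); // wait for events on any registered socket
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {          // event: new connection/request
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {     // event: client data available
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    if (client.read(buf) > 0) {
                        // a real handler would now initiate a non-blocking DB call
                        // and resume when its completion event fires
                        client.write(ByteBuffer.wrap("HTTP/1.1 200 OK\r\n\r\n".getBytes()));
                    }
                    client.close();
                }
            }
        }
    }
}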
JSF and Mybatis
In the case of JSF and MyBatis, the synchronous approach is used. JSF uses a servlet to handle requests from the UI, and servlets are handled by the synchronous processors in WildFly. JDBC, which MyBatis uses to communicate with the DB, also uses synchronous IO, so threads are used to execute requests concurrently.
Congestions
All of the above is written with the assumption that there are no other sources of congestion. By congestion I mean a limitation on the ability of a certain component of the system to execute things in parallel.
For example, imagine that the database is configured to only allow one client connection at a time (this is not a reasonable configuration, and I'm using it only to demonstrate the idea). In this case, even if multiple threads can execute the code of the save method in parallel, all but one will be blocked at the moment they try to open a connection to the database.
Another similar example is if you are using an SQLite database. It only allows one client to write to the DB at a time. So at the point when thread A tries to execute its insert, it will be blocked if there is another thread B already executing an insert. Only after thread B commits can thread A proceed with its insert. The time A waits depends on the time it takes B to execute its request and on the number of other threads waiting to write to the same DB.
In practice, if you are using an RDBMS that scales better (like PostgreSQL, MySQL or Oracle), you will not hit this problem with a small number of connections. But it may become a problem when there is a large number of concurrent requests and the DB limits the number of client connections, or a connection pool is used to limit the number of connections on the application side. In this case, if there are already many connections to the database, new clients will wait until existing requests finish and connections are released.
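The effect is easy to reproduce: if a bounded connection pool is mimicked with a semaphore permitting one "connection", four threads calling save() concurrently still execute their inserts strictly one after another (a toy model, not real pool code):

import java.util.concurrent.*;

public class Congestion {
    static final Semaphore connections = new Semaphore(1); // pretend the DB allows 1 connection

    static void save(int user) throws InterruptedException {
        connections.acquire(); // blocks while another thread holds the connection
        try {
            System.out.println("user " + user + " inserting");
            Thread.sleep(200); // stand-in for the insert + commit
        } finally {
            connections.release();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int u = 1; u <= 4; u++) {
            int user = u;
            pool.submit(() -> { save(user); return null; });
        }
        pool.shutdown();
    }
}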

async callback using scala and play2 or spray

I have a systems design challenge that I would like to get some community feedback on.
Basic system structure:
[Client] ---HTTP-POST--> [REST Service] ---> [Queue] ---> [Processors]
[Client] POSTs json to [REST Service] for processing.
Based on the request, [REST Service] sends data to various queues, to be picked up by various processors written in various languages and running in different processes.
Work is parallelized in each processor but can still take up to 30 seconds to process. The time to process is a function of the complexity of the data and cannot be sped up.
The result cannot be streamed back to the client as it completes, because there is a final post-processing step that can only be performed once all the sub-steps are completed.
Key challenge: Once the post processing is complete, the client either needs to:
be sent the results after the client has been waiting
be notified async that the job is completed and passed an id to request the final result
Design requirements
I don't want to block the [REST Service]. It needs to take the incoming request, route the data to the appropriate queues for processing in other processes, and then be immediately available for the next incoming request.
Normally I would have used actors and/or futures/promises so the [REST Service] is not blocked while waiting for background workers to complete. The challenge here is that the workers doing the background work run in separate processes/VMs and are written in various technology stacks. In order to pass these messages between heterogeneous systems and to ensure the integrity of the request lifetime, a durable queue is being used (not in-memory message passing or RPC).
Final point of consideration: in order to scale, there are load-balanced pools of [REST Services] and [Processors]. Therefore, since the messages from the [REST Service] to the [Processor] need to be sent asynchronously via a queue (and everything runs in separate processes), there is no way to correlate the work done in a background [Processor] back to its original calling [REST Service] instance in order to return the final processed data in a promise or actor message and finally pass the response back to the original client.
So, the question is: how to make this correlation? Once all the background processing is completed, I need to get the result back to the client, either via a long-held response or via a notification (I do not want to use something like UrbanAirship, as most of the clients are browsers or other services).
I hope this is clear, if not, please ask for clarification.
Edit: Possible solution - thoughts?
I think I can pass a spray RequestContext to any actor, which can then respond back to the client (it does not have to be the original actor that received the HTTP request). If this is true, can I cache the RequestContext and then use it later to asynchronously send the response to the appropriate client when the processing is completed?
Well, it's not the best because it requires more work from your client, but it sounds like you want to implement a webhook. So:
[Client] ---POST--> [REST Service] ---> [Calculations] ---POST--> [Client webhook]
[Client] ---GET--> [REST Service] (fetch the completed data)
For explanation:
The client sends a POST request to your service. Your service then does whatever processing is necessary. Upon completion, your service sends an HTTP POST to a URL that the client has already set. With that POST data, the client then has the necessary information to make a GET request for the completed data.
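A minimal sketch of the notification leg in Java, assuming the client registered callbackUrl up front and the service exposes some /results/{jobId} endpoint for the follow-up GET (both names are hypothetical):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WebhookNotifier {
    private final HttpClient http = HttpClient.newHttpClient();

    // called once the final post-processing step has completed
    void notifyClient(String callbackUrl, String jobId) throws Exception {
        HttpRequest notify = HttpRequest.newBuilder(URI.create(callbackUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"jobId\":\"" + jobId + "\"}"))
                .build();
        http.send(notify, HttpResponse.BodyHandlers.discarding());
        // the client now issues GET /results/{jobId} against the REST service to fetch the data
    }
}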

How to design Spring Batch to avoid a long queue of requests and restart a failed job

I am writing a project that will be generating reports. It reads all requests from a database by making a REST call; based on the type of request it makes a REST call to an endpoint, and after getting the response it saves the response in an object and saves it back to the database by calling another endpoint.
I am using spring-batch to handle the batch work. So far what I came up with is a single job (reader, processor, writer) that does the whole thing. I am not sure if this is the correct design, considering:
I do not want to queue up requests if some request is taking a long time to get a response back. [not sure yet]
I do not want to hold up saving responses until all the responses are received. [using commit-interval will help]
If the job crashes for some reason, how can I restart the job? [maybe using batch-admin will help, but what are my other options?]
With chunk-oriented processing, the Reader, Processor and Writer get executed in order until the Reader has nothing left to return.
If you can read one item at a time, process it, and send it back to the endpoint that handles the persistence, this approach is handy.
If you must read ALL the information at once, the reader will get one big collection with all items and pass it to the processor. The processor will process all the items and send the result to the writer. You cannot send just a few items to the writer, so you would have to do the persistence directly from the processor, and that would be against the design.
So, as I understand this, you have two options:
Design a reader that can read one item at a time. Use the chunk-oriented processing that you already started to read one item, process it and send it back for persistence. Have a look at how other readers are implemented (like JdbcCursorItemReader).
Create a tasklet that reads the whole collection of items, processes it, and sends the items back for persistence. You can break this into different tasklets.
commit-interval only controls after how many items the transaction is committed. So it will not help you, as all the processing and persistence is done by calling REST services.
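For reference, the chunk-oriented step from option 1 might be wired up roughly like this (Request and Report are hypothetical item types, and the reader/processor/writer beans are assumed to exist elsewhere):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ReportStepConfig {

    // hypothetical item types for this sketch
    public static class Request {}
    public static class Report {}

    @Bean
    public Step reportStep(StepBuilderFactory steps,
                           ItemReader<Request> reader,
                           ItemProcessor<Request, Report> processor,
                           ItemWriter<Report> writer) {
        return steps.get("reportStep")
                .<Request, Report>chunk(10) // commit-interval: 10 items per transaction
                .reader(reader)             // reads one request at a time
                .processor(processor)       // calls the remote endpoint per item
                .writer(writer)             // saves the chunk of responses back
                .build();
    }
}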
I have figured out a design and I think it will work fine.
As for the questions that I asked, here are the answers:
Using asynchronous processors will help avoid a queue building up:
http://docs.spring.io/spring-batch/trunk/reference/html/springBatchIntegration.html#asynchronous-processors
Using commit-interval will solve it.
This thread has the answer: Spring batch: Restart a job and then start next job automatically
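The asynchronous-processor setup from that link looks roughly like this: AsyncItemProcessor hands each item to a task executor, so one slow REST call does not hold up the rest of the chunk, and AsyncItemWriter unwraps the resulting Futures before delegating (a sketch; Request and Report are hypothetical item types):

import org.springframework.batch.integration.async.AsyncItemProcessor;
import org.springframework.batch.integration.async.AsyncItemWriter;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

public class AsyncConfig {

    // hypothetical item types for this sketch
    public static class Request {}
    public static class Report {}

    AsyncItemProcessor<Request, Report> asyncProcessor(ItemProcessor<Request, Report> delegate) {
        AsyncItemProcessor<Request, Report> processor = new AsyncItemProcessor<>();
        processor.setDelegate(delegate);                          // the real REST-calling processor
        processor.setTaskExecutor(new SimpleAsyncTaskExecutor()); // each item runs on its own thread
        return processor;
    }

    AsyncItemWriter<Report> asyncWriter(ItemWriter<Report> delegate) {
        AsyncItemWriter<Report> writer = new AsyncItemWriter<>();
        writer.setDelegate(delegate); // unwraps the Futures produced by the processor
        return writer;
    }
}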

Akka Named Resource Serial Execution

I'm looking for suggestions on how to accomplish the following. My Akka application, which will be running as a cluster, will be persisting to a backend web service. Each resource I'm persisting is named, for example: A, B, C.
There will be a queue of changes for each resource, and I'm looking for an idea on how I can have a configuration which allows me to control the following:
Maximum number of REST calls in progress at any point in time (overall concurrency)
Ensure that only one REST request for each named resource is in progress
Concurrent requests are fine, as long as they are not for the same resource
The named resources are dynamic, based on records in a database
Thanks
My vision is as follows:
You need some kind of supervisor actor which maintains some state. On each request to this supervisor, check whether the resource is currently being processed. If yes, store the request in some storage/queue. If no, spawn a new actor and put that actor and the resource into the aforementioned state. On completion, remove the actor and the resource from the state. I strongly recommend having a storage/queue to temporarily save requests to the supervisor actor; that way you protect the system from being overwhelmed. To guarantee the overall concurrency requirement, you can make the supervisor's internal state bounded, and if its size is exceeded, store the request in the storage/queue.
Of course, you need some mechanism for polling this queue and making requests to the supervisor.
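One possible shape for that supervisor, sketched with Akka's classic Java API (the message types, the cap value, and the RestWorker stub are all illustrative, not a drop-in implementation):

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.Props;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class ResourceSupervisor extends AbstractActor {
    static final int MAX_IN_FLIGHT = 8;                             // overall REST concurrency cap

    private final Map<String, ActorRef> inFlight = new HashMap<>(); // resource name -> worker
    private final Deque<Change> pending = new ArrayDeque<>();       // deferred changes

    record Change(String resource, String payload) {}
    record Done(String resource) {}

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(Change.class, c -> {
                // defer if this resource is already being written or the cap is reached
                if (inFlight.containsKey(c.resource()) || inFlight.size() >= MAX_IN_FLIGHT) {
                    pending.add(c);
                } else {
                    ActorRef worker = getContext().actorOf(Props.create(RestWorker.class));
                    inFlight.put(c.resource(), worker);
                    worker.tell(c, getSelf());
                }
            })
            .match(Done.class, d -> {
                getContext().stop(inFlight.remove(d.resource()));
                // re-dispatch the first deferred change whose resource is now free
                pending.stream()
                        .filter(c -> !inFlight.containsKey(c.resource()))
                        .findFirst()
                        .ifPresent(c -> { pending.remove(c); getSelf().tell(c, getSelf()); });
            })
            .build();
    }

    // stub worker: performs the REST call for one change, then reports back
    static class RestWorker extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder().match(Change.class, c -> {
                // ... call the backend web service for c.resource() here ...
                getSender().tell(new Done(c.resource()), getSelf());
            }).build();
        }
    }
}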