Round-robin-like handling of messages with SQS (or other MQ solutions in AWS)

The context
I have an infrastructure where a server produces long-running jobs, where each job consists of logical chunks of roughly the same size, but different jobs have vastly different numbers of chunks. I have a scalable number of workers that take chunks, do the (processor-heavy) work, and return the result to the server. A worker works on only one chunk at a time.
Currently I schedule the chunks with an SQS queue: when a job is created I dump all of its chunks onto the queue and the workers take them from there. It works as a FIFO.
So to summarize what does what:
A job is a large amount of processor-intensive calculation. It consists of multiple independent chunks of roughly the same size.
A chunk is a processor-intensive calculation a worker can work on. It is independent of other chunks and can be calculated on its own, without additional context.
The server creates jobs. When a job is created, the server puts all of the job's chunks on the queue (and essentially forgets about the job).
The workers work on chunks. It does not matter which job a chunk belongs to; a worker can take any of them. When a worker has nothing to work on (it is newly created, or has finished its previous chunk), it takes the next chunk from the queue.
The problem
When a job is scheduled, all of its chunks are added to the queue, and the next job will not be started until the first job is finished. So in a scenario where job A (scheduled first) takes 4 hours and job B (scheduled second) takes 5 minutes, job B is not started for the first few hours and only finishes after about 4 hours 5 minutes. If a large job is scheduled, it effectively blocks all other calculations. The queue will look like this:
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 ... A100 B1 B2
I would like to not block the new calculations coming in but process them in a different order like:
A1 B1 A2 B2 A3 A4 A5 A6 A7 A8 A9 A10 ... A100
If a third job arrives after A1 and B1 have been picked up, it should still not be blocked:
A2 B2 C1 A3 C2 A4 C3 A5 C4 A6 A7 A8 A9 A10 ... A100
With the chunks ordered like this I can guarantee the following:
For every job, the first chunk is picked up relatively fast.
For every job there is constant perceived progress (some chunks are always being finished).
Short jobs (with few chunks) finish relatively fast.
Solutions
I know I cannot reorder an SQS queue in place, so I might have to do something like:
Change technologies; maybe some other queue in AWS supports this out of the box.
When a new job is about to be scheduled, have the server take all chunks from the queue, shuffle in the new chunks, and put everything back on the queue.
Somehow reach the intended behavior with a priority queue (maybe RabbitMQ).
Is there some easy, safe solution for this? How should I do it?
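The interleaving asked for above is just a round-robin merge over per-job chunk lists. As a minimal sketch (pure Python, no SQS involved; the job and chunk names are illustrative, not part of any API):

```python
from collections import deque

def round_robin_merge(jobs):
    """Interleave chunks from several jobs, one chunk per job per turn.

    jobs maps a job name to its ordered list of chunks."""
    pending = deque(deque(chunks) for chunks in jobs.values())
    order = []
    while pending:
        job = pending.popleft()
        order.append(job.popleft())
        if job:                      # job still has chunks: go to the back
            pending.append(job)
    return order

# A has many chunks, B only two: B's chunks no longer wait at the end
print(round_robin_merge({"A": ["A1", "A2", "A3", "A4"], "B": ["B1", "B2"]}))
# → ['A1', 'B1', 'A2', 'B2', 'A3', 'A4']
```

In practice, a common way to get the same effect without reordering anything is one queue per job plus workers that poll the queues in rotation: a newly created job gets its own queue and is picked up on the next turn.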

Related

Kafka - Long running job decreases throughput

I have three partitions and three consumers in my consumer group.
Job A, B assigned to partition1
Job C, D assigned to partition2
Job E, F assigned to partition3
Jobs C, D, E and F took less than 20 seconds each to complete, but job A is taking 30 minutes. B would take 10 seconds if executed, but it is stuck behind A. B will have to wait 30 minutes although two consumers are sitting idle.
How do I solve this so that B does not get stuck while there are consumers sitting idle?
This isn't really something Kafka can solve. Plus, you cannot have more than one consumer from the same group on any partition, so "A, B" is really just "A" from a consumer-group perspective. You'd need to decouple the blocking "processing" logic from the event consumption if you want to increase throughput, with the trade-off of skipping, duplicating, or handling events out of order.
As one example, methodA(record) does a thing and takes 30 minutes. That's completely unrelated to Kafka, as you've executed a blocking function. Without more details, you'll simply need to rewrite this to not block as long. Otherwise, do something like new Thread(() -> methodA(record)).start() (run it in the background, keep consuming, and start more threads).
Then, when processing completes, send the result to an A-completed topic, which "consumer B" then reads to run methodB(recordFromA).
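The "run it in the background and keep consuming" idea can be sketched in Python with a thread pool instead of raw threads; method_a here is a hypothetical stand-in for the slow processing step, and the records list stands in for what a poll loop would return:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def method_a(record):
    # stand-in for the slow, blocking processing step
    time.sleep(0.05)
    return record.upper()

records = ["a", "b", "c"]   # pretend these came from a consumer poll

with ThreadPoolExecutor(max_workers=3) as pool:
    # hand each record to the pool and go straight back to consuming;
    # nothing here blocks for the full processing time of a record
    futures = [pool.submit(method_a, r) for r in records]
    results = [f.result() for f in futures]  # gather when convenient

print(results)  # → ['A', 'B', 'C']
```

Note the trade-off the answer mentions: once consumption is decoupled from processing, a crash can skip or duplicate records, since the consumer may move on before processing has finished.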

Branch folding example unclear

I was learning about branch folding from a book called "Computer Organization" by Carl Hamacher (5th edition) when I came across this example:
Additional details:
Queue length here denotes the number of instructions present in the instruction queue
F, D, E and W denote the fetch, decode, execute and write stages of a pipeline respectively
The dotted lines at instructions 2,3 and 4 (I2,I3 and I4) denote that the stage the instruction currently is in is idle (i.e. waiting for the next stage to complete)
I5 is a branch instruction with branch target Ik
The pipeline starts with an instruction fetch unit which is connected to an instruction queue, which is connected to a decode/dispatch unit, which is connected to an execution unit, and finally it ends with a write stage. There exist intermediate buffers between the decode, execute and write stages.
My doubt here is: how are D3 and D5 of I3 and I5 performed in the same clock cycle despite the fact that there is (as given) only one decode unit? Furthermore, the instruction queue should be of length 2 at cycle 4; why is it still 1? Both F3 and F4 seem to be in the instruction queue and neither has been dispatched at cycle 4.

Understanding multilevel feedback queue scheduling

I'm trying to understand multilevel feedback queue scheduling and I came across the following example from William Stallings' Operating Systems: Internals and Design Principles (7th ed).
I got this process:
And the result in the book is this:
I believe I'm doing the first steps right, but when I get to process E's CPU time, my next process is B, not D as in the book example.
I can't understand whether there are n RQs and each time a process gets CPU time it is demoted to a lower-priority RQ, or whether, for example, when process A is in RQ1 and there are no processes in the lower RQs, the process is moved to that ready queue (this is how I am doing it).
Can someone explain how, in the example above, D (and not B) gets CPU time after E is processed?
The multilevel feedback algorithm always selects the first job of the highest-priority non-empty queue (here, the lowest-numbered RQ).
When job E leaves RQ1 (time 9), job D is in queue RQ2 but job B is in RQ3. Thus, D is executed. Please consider the modified figure, where the red numbers give the queue in which each job is executed.
As you can see, job B has already left RQ2 by time 9 (more precisely, it leaves RQ2 at time 6), whereas job D has just entered it.
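The selection rule above ("first job of the highest-priority non-empty queue") is short enough to write down directly. A sketch, with index 0 as the highest-priority queue and the time-9 situation from the example as test data:

```python
def select_next(ready_queues):
    """Next job under MLFQ selection, or None if every queue is empty.

    ready_queues[0] is the highest-priority queue."""
    for queue in ready_queues:
        if queue:
            return queue.pop(0)
    return None

# Situation at time 9 in the example: D just entered RQ2, B sits in RQ3
rqs = [[], ["D"], ["B"]]
print(select_next(rqs))  # → D, even though B entered its queue earlier
```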

Why is the speed of a variable-length pipeline determined by the slowest stage, and what is the total execution time of a program?

I am new to pipelining and I need some help with this statement:
The speed of the pipeline is determined by the speed of the slowest stage.
Not only that: if I am given a 5-stage pipeline whose stages take 5 ns, 10 ns, 8 ns, 7 ns and 7 ns respectively, it is said that each instruction takes 10 ns.
Can I get a clear explanation for this?
Also, suppose my program has 3 instructions I1, I2, I3, and I take the clock cycle duration to be 1 ns, such that the stages above take 5, 10, 8, 7 and 7 clock cycles respectively.
Now according to theory a snapshot of the pipeline would be -
But that gives me a total time of: number of clock cycles * clock cycle duration = 62 * 1 = 62 ns.
But according to theory the total time should be: (slowest stage) * (number of instructions) = 10 * 3 = 30 ns.
I have an idea why the slowest stage matters (each pipeline stage has to wait for it, hence one instruction completes every 10 clock cycles), but the result is inconsistent when I calculate it using clock cycles. Why this inconsistency? What am I missing?
Assume a car manufacturing process that uses two-stage pipelining: say it takes 1 day to manufacture an engine and 2 days to manufacture the rest, and you can do both stages in parallel. What is your car output rate? It should be one car per 2 days: although you finish the engine in 1 day, you have to wait another day for the rest to be done.
In your case, although the other stages finish their job in less time, you have to wait 10 ns for the whole process to complete.
Pipelining allows you to work on the "parts" of several operations at once.
I'll create a smaller example here, dropping the last 2 stages of your example: 5, 10, 8 ns
Let's take two operations:
op 1:  | S1 (0-5) |   S2 (5-15)   | S3 (15-23) |
op 2:             | S1 (5-10) | wait |   S2 (15-25)  | S3 (25-33) |
The first operation starts at time 0. As soon as it leaves stage 1 (at 5 ns), the second operation can start its first stage. However, since the stages take different amounts of time, the longest one determines the rate: the third stage can only start after the 2nd has completed (after 15 ns), and the same wait applies to the 2nd stage of the 2nd operation.
I am not sure about the source of your confusion. If one unit of your pipeline takes longer, the units behind it cannot push the pipeline ahead until that unit is finished, even though they themselves are done with their work. Like DPG said, try to look at it from the car manufacturing line example; it is one of the most common ones used to explain a pipeline. Even if the units AHEAD of the slowest unit are finished quicker, it doesn't matter, because they have to wait until the slower unit finishes its work. So yes, your pipeline executes 3 instructions in a total execution time of 30 ns.
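The waiting described above can be checked numerically. A small sketch (my own model, assuming a buffer lets an operation leave a stage as soon as it finishes) that pushes operations through stages of unequal latency and records when each one completes:

```python
def completion_times(stage_ns, n_ops):
    """Finish time of each op in a pipeline with per-stage latencies stage_ns."""
    free_at = [0] * len(stage_ns)        # when each stage next becomes free
    finished = []
    for _ in range(n_ops):
        t = 0                            # when this op is ready for the next stage
        for s, duration in enumerate(stage_ns):
            start = max(t, free_at[s])   # wait until the stage is free
            t = start + duration
            free_at[s] = t
        finished.append(t)
    return finished

done = completion_times([5, 10, 8], 2)
print(done)               # → [23, 33]
print(done[1] - done[0])  # → 10: ops finish one slowest-stage time apart
```

The gap between consecutive completions equals the slowest stage's latency, which is exactly the "one car per 2 days" point.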
Thank you all for your answers
I think I have got it clear by now.
This is what I think the answer is -
Question 1: Why is pipeline execution dependent on the slowest stage?
Clearly, from the diagram, each stage has to wait for the slowest stage to complete.
So the time after which each instruction completes is bounded by that wait.
(In my example, one instruction completes every 10 ns.)
Question 2: What is the total execution time of the program?
I wanted to know how long the particular program containing 3 instructions takes to execute,
NOT how long it takes for 3 instructions to complete once the pipeline is full, which is obviously 30 ns,
since an instruction completes every 10 ns.
Now suppose I1 is fetched into a pipeline in which 4 other instructions are already executing.
Those 4 instructions complete in 40 ns.
After that, I1, I2 and I3 complete in order in 30 ns (assuming no pipeline stalls).
This gives a total of 40 + 30 = 70 ns.
In fact, for an n-instruction program on a k-stage pipeline,
I think it is (n + k - 1) * C * T,
where C = number of clock cycles in the slowest stage
and T = clock cycle time.
Please review my understanding and tell me if I am thinking about anything wrong, so that
I can accept my own answer!
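The formula can be checked in a couple of lines (a sketch; n, k, C and T are as defined above):

```python
def total_time(n, k, C, T):
    """(n + k - 1) * C * T: total time for n instructions on a k-stage
    pipeline clocked at the slowest stage (C cycles of T ns each)."""
    return (n + k - 1) * C * T

print(total_time(n=3, k=5, C=10, T=1))  # → 70, matching the 40 + 30 above
```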

Differences (if any) between livelock and starvation in operating systems

What are the differences (if any) between starvation and livelock, or are they just synonyms? If there is a difference, could someone please provide an example?
Note: I have seen the Wikipedia article, but I am still confused.
Thanks
Livelock is a special case of resource starvation where two processes follow an algorithm for resolving a deadlock that results in a cycle of different locked states because each process is attempting the same strategy to avoid the lock.
Starvation itself can occur for one process without another process being cyclically blocked; in this case no livelock exists, just a single unfortunate process that gets no resources allocated by the scheduler.
Starvation and Livelock (from the Java docs) states:
Starvation and Livelock
Starvation and livelock are much less common a problem than deadlock, but are still problems that every designer of concurrent software is likely to encounter.
Starvation
Starvation describes a situation where a thread is unable to gain regular access to shared resources and is unable to make progress. This happens when shared resources are made unavailable for long periods by "greedy" threads. For example, suppose an object provides a synchronized method that often takes a long time to return. If one thread invokes this method frequently, other threads that also need frequent synchronized access to the same object will often be blocked.
Livelock
A thread often acts in response to the action of another thread. If the other thread's action is also a response to the action of another thread, then livelock may result. As with deadlock, livelocked threads are unable to make further progress. However, the threads are not blocked — they are simply too busy responding to each other to resume work. This is comparable to two people attempting to pass each other in a corridor: Alphonse moves to his left to let Gaston pass, while Gaston moves to his right to let Alphonse pass. Seeing that they are still blocking each other, Alphonse moves to his right, while Gaston moves to his left. They're still blocking each other, so...
Livelock
Livelock is a form of deadlock. In a deadlocked computation there is no possible execution sequence that succeeds, but in a livelocked computation there are successful computations, yet there are one or more execution sequences in which no process enters its critical section.
Example scenario (shared flags, initially c1 = 1, c2 = 1):
process P1:
    while (true) {
        nonCriticalSection;
        c1 = 0;
        while (c2 != 1) {
            c1 = 1;
            c1 = 0;
        }
        criticalSection1;
        c1 = 1;
    }
process P2:
    while (true) {
        nonCriticalSection;
        c2 = 0;
        while (c1 != 1) {
            c2 = 1;
            c2 = 0;
        }
        criticalSection2;
        c2 = 1;
    }
In this scenario, how can starvation happen? Consider, for example, this sequence:
P1 sets c1 to 0
P2 sets c2 to 0
P2 checks c1 and resets c2 to 1.
P1 completes a full cycle;
checks c2
enters critical section
resets c1
enters non-critical section
sets c1 to 0
P2 sets c2 to 0
Now the same thing can happen again and again: P1 may get another chance to execute while P2 stays stuck in its while loop. Nothing in the algorithm forces a chance to be given to P2; since we don't enforce anything, P1 may run a million times before the OS gives P2 a chance. So there are execution sequences in which P2 starves: P1 makes progress while P2 does not, and such sequences are what we call starvation.
Livelock is when both threads are stuck in their while loops, executing instructions but achieving nothing. The code above can produce a livelock that resembles a deadlock, but in a deadlock nothing executes at all, whereas in a livelock some instructions do execute; they are just not enough to let either process into its critical section.
In the pseudo-code above, a livelock arises with the following sequence of executions:
P1 sets c1 to 0.
P2 sets c2 to 0.
P1 checks c2 and remains in the loop.
P2 checks c1 and remains in the loop.
P1 resets c1 to 1.
P2 resets c2 to 1.
P1 resets c1 to 0.
P2 resets c2 to 0.
P1 and P2 remain in their while loops, executing instructions but making no progress.
Difference between deadlock and livelock
When deadlock happens, no execution happens at all. In livelock, some executions happen, but those executions are not enough to enter the critical section.
Difference between livelock and starvation
In starvation, some processes enter the critical section while others are kept out for some reason (OS scheduling, priority). In livelock, the critical section stays empty while the processes keep competing to enter it, each executing instructions but getting nowhere.
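The lockstep interleaving listed above can be replayed with Python generators, one yield per access to the shared flags (the decomposition into steps is my own; the protocol is the one from the pseudo-code):

```python
def process(my, other, flags):
    """Entry protocol from the pseudo-code; yields once per shared-flag access."""
    flags[my] = 0                               # announce intent
    yield
    while True:
        other_wants_in = (flags[other] == 0)    # read the other flag
        yield
        if not other_wants_in:
            break
        flags[my] = 1                           # back off ...
        yield
        flags[my] = 0                           # ... and try again
        yield
    yield "critical"                            # would now enter the critical section

flags = {"c1": 1, "c2": 1}
p1 = process("c1", "c2", flags)
p2 = process("c2", "c1", flags)

entered = []
for _ in range(50):                  # strict lockstep scheduling of P1 and P2
    for name, p in (("P1", p1), ("P2", p2)):
        if next(p) == "critical":
            entered.append(name)

print(entered)  # → []: both keep executing, yet neither ever enters
```

Under this lockstep schedule the critical section stays empty even though both processes keep executing instructions, which is the livelock; other schedules let one process in while the other loops, which is the starvation case.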