Currently, I am reading about Schedulers and scheduling algorithms.
I am really confused with short-term scheduler and dispatcher.
At some places, it is written that they are same. At some places, it is written that their jobs are different.
From whatever I read I concluded that - "Scheduling" of the scheduler is caused by the code associated with a hardware interrupt, or code associated with a system call. With this a mode switch from user mode to kernel mode took place. Then short-term scheduler selects a process from a queue of the available process to give it control of the CPU. The task of short-term scheduler ends here.
Now dispatcher comes into play. The dispatcher is the module that gives control of the CPU to the process selected by the short-term scheduler. This function involves the following: -Switching context -Switching to user mode -Jumping to the proper location in the user program to restart that program.
Is my understanding correct?
Suppose Process A is preempted and process B is scheduled next. What happened during the context switch ? How context data of Process
A, scheduler, dispatcher, Process B is saved and restored?
The various divisions of the process switching steps are system dependent. Operating system books like to make these steps complicated and divide the into multiple steps.
There are really only two steps:
1. pick a the new process.
2. Switch to the new process.
That last step is very simple; so simple that it is probably not worthy of being called a separate step.
Most CPUs define a structure that is usually called the Process Context Block (PBC). The PCB has a slot for every register that defines the state of the process. Switching processes can be as simple as:
SAVEPCTX pcb_address_of_current_process ; Save the state of the running process
LOADPCTX pcb_address_of_new_process ; Load the state of the other process.
REI
Some processors require more steps, like having to save floating point registers separately.
To implement round-robin algorithm, the "circular queue" is considered as the best data structure.
Which data structure is used for ready queue in Windows Operating System?
I don't know but I have found some articles which might contain the answer:
Processes, Threads, and Jobs in the Windows Operating System by Mark E. Russinovich and David A. Solomon - search for "ready queue"
Internals of Windows Thread by Mahesh Bailwal - search for "ready queue"
Google query: site:reactos.org "ready queue"
According the article #1 there are fields in the Processor Region Control Block kernel object (PRCB) called ReadySummary (Bitmask 32 bits), DeferredReadyListHead (Single linked list), DispatcherReadyListHead (Array of 32 list entries) which implement the ready queue.
You can use source code of the ReactOS Operating System (article #3) to learn more about the Windows behavior.
Which data structure is used for ready queue in operating system?
This depends on the operating system. For most modern operating system this is some sort of algorithm made to be as fair as possible. This can be a round robin, fifo, CFS (red-black tree), or some other algorithm. The ready queue may be split into priority levels.
When a process is created or unblocked, it can be appended at the back of the list/queue. It may also be left in place and skipped as long as it is blocked.
What is used also depends on weather preemption is in use or not, which it is by default in general purpose operating systems (Linux/Windows/MacOS).
The round-robin scheduler chooses the processes on a first-come, first-served basis. The first to arrive is the first to be executed. Each process is executed for a short time called time-slice. When the time-slice expires, a timer interrupt occurs. Then the first in the queue is chosen to be executed and the interrupted process is placed at the end of the queue. This FIFO feature favors the implementation of the ready queue as a FIFO queue structure.
I was going through the following lecture notes on OS :
http://williamstallings.com/Extras/OS-Notes/h2.html
What I could draw is that "A process is a stream of execution ,i.e.basically a sequence of statements and so is a thread .However , the register states of one process are independent of the register states of another process but the register states of another thread can be accessed inside a thread. For every process at least one thread is allotted or dedicated ,when a process is started the OS activities for that process are taken over by the thread ( or a thread)"
What was the rationale behind conceiving the idea of threads ? When the OS is running a particular process why do we need some intermediate like a thread between them ?
"However , the register states of one process are independent of the register states of another process but the register states of another thread can be accessed inside a thread".
Can the above statement be taken as in the code for a process we cannot access the register states of a another process but in a code for a thread we can access the register states of another thread ?
(The above question did have the substitution of process and thread by their definition as codes or sequences of streams )
P.S : The title of the question is a metaphoric one .Please forgive if it misleads in any way . :P Could I take the liberty to broaden up and ask that if
the processor generates a thread for every process what does it write in the code for a thread ?(How does the code for a thread look like ? )
Terminology - for a system with virtual memory, threads share the same virtual memory address space, while each process has it's own address space. Processes can share physical memory by having a portion of that memory shared into their virtual address spaces (but the virtual address for each process may be different even though it is the same physical memory block).
Early (1960's) instances of multi-processing were mainframes that ran multiple processes that usually did not communicate with each other. Most of this activity was for batch oriented jobs, with a stream of jobs to be run, often from a punched card reader, or in more advanced situations, from remote job entry sites, which were other computers with a few peripherals (card readers, tape drives, line printers, ... ) that communicated with the mainframe to run jobs. There were also time sharing applications, similar to servers, except in many cases, relatively dumb terminals were used to communicate with the main frame. By the 1970's, APL/SV (A Programmming Language / Share Variables) was a time sharing application / programming language that could share variables between users.
For multi-process / multi-threaded operating systems, the device drivers operate from a queue of requests (such as a file read or write). Each request to be added to a device driver queue is done similar to a context switch so there won't be conflicts between process or thread requests for I/O. Some peripherals, such as mainframe, SCSI, or ... disk drives also operated from an internal queue, and could process I/O requests out of order to reduce random access overhead.
The basic problem that drove thread was how can an application handle multiple tasks at the same time and do it in a system-independent manner?
In classical eunuchs, a process could only do one thing at a time. If you needed to handle multiple things you kicked off multiple processes.
In the olde RSX and VMS systems (and Windoze under the covers), programmers relied on software interrupts. A process could queue I/O requests to multiple devices and receive a software interrupt when the request completed, thus allowing the application to do multiple things at once.
Another approach to the multiple things at once problem was to use event queues (Windoze, X Windows).
The ADA programming language was the first (and still really the only) mainstream programming language to support threads (tasks) as a system independent way to handle these kinds of problems. DOD compliance mandates drove the creation of threads.
Originally, threads were implemented through libraries ("use threads", "many to one model"). With the rise of multiprocessor systems, there became an increased demand to be able to have threads execute in parallel on different processors. This drove the creates of kernel threads in operating systems. (Many operating systems still do not support kernel threads).
I have been learning O.S in which it is written that there are two types of Process
1) CPU Bound Processes
2) I/O Bound Processes.
and somewhere its
1)Independent Processes
2)Cooperative Processes.
same goes for Threads
1) Single Level Thread.
2) Multilevel Thread.
and
1)User Level Thread
2)Kernel Level Thread.
Now confusion is that if someone asks me about Types of Process and Thread so which ones should i tell them, from above?
Kindly Make My Concept Clear?
I shall remain thankful to you!
Processes are two types based on their types of categories. The first one which you mentioned is related to event-specific process categorization and the next categorization is based on their nature. But, if someone asks you, you should ask for more clarification as to which type of category does he/she wants the classification. If null, then you should state the first(default) category as shown below:-
Event-specific based category of process
a) CPU Bound Process: Processes that spend the majority of their time simply using the CPU (doing calculations).
b) I/O Bound Process: Processes that are associated with input/output-based activity like reading from files, etc.
Category of processes based on their nature
a) Independent Process: A process that does not need any other external factor to get triggered is an independent process.
b) Cooperative Process: A process that works on the occurrence of any event and the outcome affects any part of the rest of the system is a cooperating process.
But, Threads have got only one classification based on their nature(Single Level Thread and Multi-Level Threads).
Actually, in modern operating systems, there are two levels at which threads operate. They are system or kernel threads and user-level threads. This one is generally not the classification, though some of them freely do classify. It is a misuse.
If you've further doubts, leave a comment below.
Basically there are two types of process:
Independent process.
Cooperating process.
For execution a process should be mixer of CPU bound and I/O bound.
CPU bound: is a time process reside in processor and perform it's execution.
I/O bound: is a time in which a process perform input output operation.e.g take input from keyboard or display output in monitor.
What is a Process?
A process is a program in execution. Process is not as same as program code but a lot more than it. A process is an 'active' entity as opposed to program which is considered to be a 'passive' entity. Attributes held by process include hardware state, memory, CPU etc.
Process memory is divided into four sections for efficient working :
The Text section is made up of the compiled program code, read in from non-volatile storage when the program is launched.
The Data section is made up the global and static variables, allocated and initialized prior to executing the main.
The Heap is used for the dynamic memory allocation, and is managed via calls to new, delete, mallow, free, etc.
The Stack is used for local variables. Space on the stack is reserved for local variables when they are declared.
Category of process:
1.Independent/isolated/competing.
2.Dependent/co-operating/concurrent.
1.Independetn:Execution of one process does not effect the execution's of other process that means there is nothing common for sharing.
2.Dependent:in it process can share some deliver buffer variable ,resources,(cpu,printer).
it process can share any thing, then execution of one process can effect other.
->execution of one process can effect or get affected by the execution of process.
what is the difference between these concepts?
“Process” is well-defined; “job” and “task” are ambiguous.
Fundamentally a job/task is what work is done, while a process is how it is done, usually anthropomorphised as who does it. A job is an overall unit of work, and is composed of tasks. In practice usage is very inconsistent, and often “task” == “process”, though formally a process performs a task.
Process is a well-defined operating systems concept, as is thread: a process is an instance of a program that is being executed, and is the basic unit of resources: a process consists of or “owns” its image, execution context, memory, files, etc.; etymologically a process is the steps done by a processor. A process consists of one or more threads, which are the unit of scheduling, and consist of some subset of a process (possibly shared with other threads): execution context and perhaps more. Traditionally a thread is the unit of execution on a processor (a thread is “what is executing”), but with multi-core processors and hardware threads, some scheduling is done even at the level of a single core. There are various kinds of processes and threads, and the exact definition varies between platforms.
Job and task are today vague, ambiguous terms, especially task. A “job” often means a set of processes, while a “task” may mean a process, a thread, a process or thread, or, distinctly, a unit of work done by a process or thread.
To give an idea how confused the naming is,
Windows Task Manager manages (running) processes, while
Windows Task Scheduler schedules programs to execute in future, what is traditionally known as a job scheduler, and uses the .job extension!
The term “job” traditionally means a “piece of work” (as opposed to “occupation”), and is used as such in manufacturing, in the phrase “job production”, meaning “custom production”, where it is contrasted with batch production (many items at once, one step at a time) and flow production (many items at once, all steps at the same time, by item). Note that these distinctions have become blurred in computing, notably in the oxymoronic term “batch job”.
In computing, “job” originates in non-interactive processing on mainframes, notably in IBM’s Job Control Language for the DOS/360 and OS/360 of the mid-1960s, and formally means a “unit of work for an operating system”, which consists of steps, each of which is a request to execute a specific program. Early computers primarily did batch processing (running the same program over many input data), like census or billing, and a standard type of one-off job was compiling a program from source, which could then process batches of data. Later batch came to be applied to all non-interactive computing, whether one-off or multiple items.
In Unix shells, a “job” is the shell’s representation for a process group – a set of processes that can all be sent a signal – concretely a pipeline and its descendent processes; note that running a script starts a job, exactly as in mainframes. The job is not done until the processes complete, and a job can be stopped, resumed, or terminated, which corresponds to suspending, resuming, or terminating the processes. Thus while formally a job is distinct from the process group, this is a subtle distinction and thus people often use “job” to mean “set of processes”.
Traditional jobs (and batches) have finite input data and should complete processing, successfully or not. By contrast, when running a server, such as a web server, the input, such as a stream of requests, is unlimited (formally codata). This is analogous to flow production, and the process (or “job”) never completes, though it can be terminated or “canceled”. In a quip, “a server’s job is never done” (formally, exit status will be CANCELED, not COMPLETED/SUCCESS).
The term “step” makes sense for sequential computing – one step follows another – but once you have concurrent computing, you have a set of tasks, which do not necessarily run in a particular order, rather than a sequence of steps. The term “task” was popularized by OS/360, which featured “Multiprogramming with a Fixed number of Tasks (MFT)” and “Multiprogramming with a Variable number of Tasks (MVT)”, though in this case “task” was used synonymously with “process” or “thread”, as the basic task is “execute this program” (so the resulting process/thread performs the task), which is probably the source of the ambiguity.
Formally “multitasking” means “working on multiple tasks concurrently”, but in practice means an operating system (or virtual machine, or runtime, or individual process) “running multiple processes/threads concurrently”.
A clear distinction between tasks as work and process/threads as how the work is done is given in a
task queue, as in this diagram of a thread pool: there is a (big, potentially unlimited) queue of incoming tasks (pending), which are performed by a (small, often fixed) set of threads, each task being performed by a single thread, and each thread performing a single task at a time: the active tasks correspond to the active threads. Concretely, consider a multithreaded web server, where the tasks are “service this web page request”, and each thread fetches (from disk or memory) or renders the web page (say by a template or PHP), then returns the result.
As you can see from this last example, it is often useful to distinguish tasks from threads or processes, and in particular contexts “job” and “task” have specific meanings, though in general they are ambiguous.
Clearest is thus to avoid using “job” or “task” and instead refer to a “set of processes”, “process”, or “thread”, and for servers to refer to requests (or queries) rather than tasks.
They can be all considered the same thing, really depends on the context. A process though is usually an isolated entity that's managed by the operating system. A job is often more of an application level term or just some script that's executed to do a specific set of task(s). A task is often a part of a job - sometimes the only part.
A job is a unit of work that has been submitted by user. It is usually associated with batch systems. A batch job might be a request to run multiple programs in succession [pg 144]. However, it can be assumed that a job is a request to run a single program. Hence, depending on the context, a job can be a program (we usually assume this), or a set of programs (e.g. batch systems) [pg 8].
A process is an active entity, which requires a set of resources, including a processor and special registers to perform its function. It is a single instance of an executable program. So from here, you can see the connection between a process and a program, hence, a job.
The Linux kernel internally represents processes as tasks [pg 742].
Source: Modern Operating Systems (3rd edition) by Tanenbaum, published by Pearson Education, Inc, 2009
A task represents the execution of a single process or multiple processes on a compute node. A collection of tasks that is used to perform a computation is known as a job. Jobs are used to reserve the resources required by tasks.
source: jobs and tasks http://msdn.microsoft.com/en-us/library/bb525214%28v=vs.85%29.aspx
Well...
This might not be as clear as described here. It may very well depend of the operating system some one is dealing with.
For example when compiling a DIGITAL Equipment OSF1 kernel (also known as TruUnix64) -- when that Unix was still existant, end of the nineties, beginning of the century -- the term TASK was dedicated to the number of parallel tasks the kernel was able to handle.
It was a fixed array of tasks the kernel could perform at a given moment.
Thus it was the sum of the processes it could spawn as well as internal tasks it has to do even if not seen as processes by ps. Then it was a very low level count of actions allowed to the kernel on each NUMA node, not something accessible outside the kernel.
On the other hand a previous operating system like DEC VMS was known for having its base OS unit as a job (you interactively logged under a job) executing possibly (depending on system and account parameters and privileges) many processes at a time. An image (an executable) then occupying a process and (most of the time) multiple threads (the OS took care of multithreading by itself) at a time.
So, then, the job was not application related but really OS related.
Somewhat similarly Windows, which does not natively support fork() as a lightweight process creator, tends to create processes (using a spawn - CreateProcess - primitive that looks very much alike the one that existed onto VMS / OpenVMS 40 years ago) that are heavier that the Unix ones. Here, we have the same word (process) to describe (in term of OS) two realities that are quite different: a Windows process tends to be closer to a VMS job than a true Unix process.
As I did not configure/build any Unix Kernel since TrueUnix64, I am not able to discuss the TASK kernel parameter of a Debian or Linux OS if any. It might be interesting that someone with inner knowledge of the tasks limit of those kind of OS could explain us further on this concept in these systems.
To conclude: task, process, job, spawn, fork, thread... the more you dig into different OS, the more varieties you get and possible contradictory definitions you face.
Gilles
[non native English speaker, pardon my English].