job, task and process, what's the difference - operating-system

what is the difference between these concepts?

“Process” is well-defined; “job” and “task” are ambiguous.
Fundamentally a job/task is what work is done, while a process is how it is done, usually anthropomorphised as who does it. A job is an overall unit of work, and is composed of tasks. In practice usage is very inconsistent, and often “task” == “process”, though formally a process performs a task.
Process is a well-defined operating systems concept, as is thread: a process is an instance of a program that is being executed, and is the basic unit of resources: a process consists of or “owns” its image, execution context, memory, files, etc.; etymologically a process is the steps done by a processor. A process consists of one or more threads, which are the unit of scheduling, and consist of some subset of a process (possibly shared with other threads): execution context and perhaps more. Traditionally a thread is the unit of execution on a processor (a thread is “what is executing”), but with multi-core processors and hardware threads, some scheduling is done even at the level of a single core. There are various kinds of processes and threads, and the exact definition varies between platforms.
Job and task are today vague, ambiguous terms, especially task. A “job” often means a set of processes, while a “task” may mean a process, a thread, a process or thread, or, distinctly, a unit of work done by a process or thread.
To give an idea how confused the naming is,
Windows Task Manager manages (running) processes, while
Windows Task Scheduler schedules programs to execute in future, what is traditionally known as a job scheduler, and uses the .job extension!
The term “job” traditionally means a “piece of work” (as opposed to “occupation”), and is used as such in manufacturing, in the phrase “job production”, meaning “custom production”, where it is contrasted with batch production (many items at once, one step at a time) and flow production (many items at once, all steps at the same time, by item). Note that these distinctions have become blurred in computing, notably in the oxymoronic term “batch job”.
In computing, “job” originates in non-interactive processing on mainframes, notably in IBM’s Job Control Language for the DOS/360 and OS/360 of the mid-1960s, and formally means a “unit of work for an operating system”, which consists of steps, each of which is a request to execute a specific program. Early computers primarily did batch processing (running the same program over many input data), like census or billing, and a standard type of one-off job was compiling a program from source, which could then process batches of data. Later batch came to be applied to all non-interactive computing, whether one-off or multiple items.
In Unix shells, a “job” is the shell’s representation for a process group – a set of processes that can all be sent a signal – concretely a pipeline and its descendent processes; note that running a script starts a job, exactly as in mainframes. The job is not done until the processes complete, and a job can be stopped, resumed, or terminated, which corresponds to suspending, resuming, or terminating the processes. Thus while formally a job is distinct from the process group, this is a subtle distinction and thus people often use “job” to mean “set of processes”.
Traditional jobs (and batches) have finite input data and should complete processing, successfully or not. By contrast, when running a server, such as a web server, the input, such as a stream of requests, is unlimited (formally codata). This is analogous to flow production, and the process (or “job”) never completes, though it can be terminated or “canceled”. In a quip, “a server’s job is never done” (formally, exit status will be CANCELED, not COMPLETED/SUCCESS).
The term “step” makes sense for sequential computing – one step follows another – but once you have concurrent computing, you have a set of tasks, which do not necessarily run in a particular order, rather than a sequence of steps. The term “task” was popularized by OS/360, which featured “Multiprogramming with a Fixed number of Tasks (MFT)” and “Multiprogramming with a Variable number of Tasks (MVT)”, though in this case “task” was used synonymously with “process” or “thread”, as the basic task is “execute this program” (so the resulting process/thread performs the task), which is probably the source of the ambiguity.
Formally “multitasking” means “working on multiple tasks concurrently”, but in practice means an operating system (or virtual machine, or runtime, or individual process) “running multiple processes/threads concurrently”.
A clear distinction between tasks as work and process/threads as how the work is done is given in a
task queue, as in this diagram of a thread pool: there is a (big, potentially unlimited) queue of incoming tasks (pending), which are performed by a (small, often fixed) set of threads, each task being performed by a single thread, and each thread performing a single task at a time: the active tasks correspond to the active threads. Concretely, consider a multithreaded web server, where the tasks are “service this web page request”, and each thread fetches (from disk or memory) or renders the web page (say by a template or PHP), then returns the result.
As you can see from this last example, it is often useful to distinguish tasks from threads or processes, and in particular contexts “job” and “task” have specific meanings, though in general they are ambiguous.
Clearest is thus to avoid using “job” or “task” and instead refer to a “set of processes”, “process”, or “thread”, and for servers to refer to requests (or queries) rather than tasks.

They can be all considered the same thing, really depends on the context. A process though is usually an isolated entity that's managed by the operating system. A job is often more of an application level term or just some script that's executed to do a specific set of task(s). A task is often a part of a job - sometimes the only part.

A job is a unit of work that has been submitted by user. It is usually associated with batch systems. A batch job might be a request to run multiple programs in succession [pg 144]. However, it can be assumed that a job is a request to run a single program. Hence, depending on the context, a job can be a program (we usually assume this), or a set of programs (e.g. batch systems) [pg 8].
A process is an active entity, which requires a set of resources, including a processor and special registers to perform its function. It is a single instance of an executable program. So from here, you can see the connection between a process and a program, hence, a job.
The Linux kernel internally represents processes as tasks [pg 742].
Source: Modern Operating Systems (3rd edition) by Tanenbaum, published by Pearson Education, Inc, 2009

A task represents the execution of a single process or multiple processes on a compute node. A collection of tasks that is used to perform a computation is known as a job. Jobs are used to reserve the resources required by tasks.
source: jobs and tasks http://msdn.microsoft.com/en-us/library/bb525214%28v=vs.85%29.aspx

Well...
This might not be as clear as described here. It may very well depend of the operating system some one is dealing with.
For example when compiling a DIGITAL Equipment OSF1 kernel (also known as TruUnix64) -- when that Unix was still existant, end of the nineties, beginning of the century -- the term TASK was dedicated to the number of parallel tasks the kernel was able to handle.
It was a fixed array of tasks the kernel could perform at a given moment.
Thus it was the sum of the processes it could spawn as well as internal tasks it has to do even if not seen as processes by ps. Then it was a very low level count of actions allowed to the kernel on each NUMA node, not something accessible outside the kernel.
On the other hand a previous operating system like DEC VMS was known for having its base OS unit as a job (you interactively logged under a job) executing possibly (depending on system and account parameters and privileges) many processes at a time. An image (an executable) then occupying a process and (most of the time) multiple threads (the OS took care of multithreading by itself) at a time.
So, then, the job was not application related but really OS related.
Somewhat similarly Windows, which does not natively support fork() as a lightweight process creator, tends to create processes (using a spawn - CreateProcess - primitive that looks very much alike the one that existed onto VMS / OpenVMS 40 years ago) that are heavier that the Unix ones. Here, we have the same word (process) to describe (in term of OS) two realities that are quite different: a Windows process tends to be closer to a VMS job than a true Unix process.
As I did not configure/build any Unix Kernel since TrueUnix64, I am not able to discuss the TASK kernel parameter of a Debian or Linux OS if any. It might be interesting that someone with inner knowledge of the tasks limit of those kind of OS could explain us further on this concept in these systems.
To conclude: task, process, job, spawn, fork, thread... the more you dig into different OS, the more varieties you get and possible contradictory definitions you face.
Gilles
[non native English speaker, pardon my English].

Related

Does each system call create a process?

Does each system call create a process?
Are all functions (e.g. interrupts) of programs and operating systems executed in the form of processes?
I feel that such a large number of process control blocks, a large number of process scheduling waste a lot of resources.
Or, the kernel instruction of the system call is regarded as part of the current
process.
The short answer is - not exactly. But we have to agree on what we are going to call a "process". A process is more of an abstract idea, which encapsulates multiple instructions, each sequentially executed.
So let's start from the first question.
Does each system call create a process?
No. Each system call is the product of the currently running process, that tells the OS - "Hey OS, I need you to open this file for me, or read these here bits". In this case, the process is a bag of sequentially executed instructions, some are system calls, some are not.
Then we have.
Are all functions (e.g. interrupts) of programs and operating systems executed in the form of processes?
Well this kind of goes back to the first question. We are not considering that a system call (an operation that tell the OS to do something and works under very strict conditions) is a separate process. We will NOT see that system call execution to have its OWN process id (pid).
Then we have.
I feel that such a large number of process control blocks, a large number of process scheduling waste a lot of resources.
Well, I would say, do not underestimate your OS and the capabilities of your hardware. A modern processor with a modern OS on it, is VERY, VERY fast and more than capable of computing billions of instructions in seconds. We can't really imagine how fast that is. I wouldn't worry for optimizations on such a micro-level.
Okay, but let's dig deeper into this. What is a process exactly?
Informally, a process is a program in execution. The status of the current activity of a process is represented by a value, called the program counter, and the contents of the processor’s registers. The memory layout of a process is typically divided into multiple sections.
These sections include:
Text section.
Data section.
Heap section.
Stack section.
As a process executes, it changes state. The state of a process is defined in part by the current activity of that process. Each process is represented in the OS by a process control block (PCB), as you already mentioned.
So we can see that we treat a process as a very complicated structure that is MORE that just occupying CPU time. It has a state, storage, timing, and so on.
But because you are interested in system calls, then what are they?
For us, system calls provide an interface to the services made available by an OS. They are the way we tell the OS to do things FOR US. We know that systems execute thousands of system calls per second.
No, they don't.
The operating system uses software interrupt to execute the system call operation within the same process.
You can imagine them as a function call but they are executed with kernel privileges.

Micro scheduler for real-time kernel in embedded C applications?

I am working with time-critical applications where the microsecond counts. I am interested to a more convenient way to develop my applications using a non bare-metal approach (some kind of framework or base foundation common to all my projects).
A considered real-time operating system such as RTX, Xenomai, Micrium or VXWorks are not really real-time under my terms (or under the terms of electronic engineers). So I prefer to talk about soft-real-time and hard-real-time applications. An hard-real-time application has an acceptable jitter less than 100 ns and a heat-beat of 100..500 microseconds (tick timer).
After lots of readings about operating systems I realized that typical tick-time is 1 to 10 milliseconds and only one task can be executed each tick. Therefore the tasks take usually much more than one tick to complete and this is the case of most available operating systems or micro kernels.
For my applications a typical task has a duration of 10..100 microseconds, with few exceptions that can last for more than one tick. So any real-time operating system cannot not fulfill my requirements. That is the reason why other engineers still not consider operating system, micro or nano kernels because the way they work is too far from their needs. I still want to struggle a bit and in my case I now realize I have to consider a new category of operating system that I never heard about (and that may not exist yet). Let's call this category nano-kernel or subtick-scheduler
In such dreamed kernels I would find:
2 types of tasks:
Preemptive tasks (that run in their own memory space)
Non-preemptive tasks (that run in the kernel space and must complete in less than one tick.
Deterministic kernel scheduler (fixed duration after the ISR to reach the theoretical zero second jitter)
Ability to run multiple tasks per tick
For a better understanding of what I am looking for I made this figure below that represents the two types or kernels. The first representation is the traditional kernel. A task executes at each tick and it may interrupt the kernel with a system call that invoke a full context switch.
The second diagram shows a sub-tick kernel scheduler where multiple tasks may share the same tick interrupt. Task 1 was summoned with a maximum execution time value so it needs 2 ticks to complete. Task 2 is set with low priority, so it consumes the remaining time of each tick upon completion. Task 3 is non-preemptive so it operates on the kernel space which save some precious context switch time.
Available operating systems such as RTOS, RTAI, VxWorks or µC/OS are not fully real-time and are not suitable for embedded hard real-time applications such as motion-control where a typical cycle would last no more than 50 to 500 microseconds. By analyzing my needs I land on different topology for my scheduler were multiple tasks can be executed under the same tick interrupt. Obviously I am not the only one with this kind of need and my problem might simply be a kind of X-Y problem. So said differently I am not really looking at what I am really looking for.
After this (pretty) long introduction I can formulate my question:
What could be a good existing architecture or framework that can fulfill my requirements other than a naive bare-metal approach where everything is written sequentially around one master interrupt? If this kind of framework/design pattern exists what would it be called?
Sorry, but first of all, let me say that your entire post is completely wrong and shows complete lack of understanding how preemptive RTOS works.
After lots of readings about operating systems I realized that typical tick-time is 1 to 10 milliseconds and only one task can be executed each tick.
This is completely wrong.
In reality, a tick frequency in RTOS determines only two things:
resolution of timeouts, sleeps and so on,
context switch due to round-robin scheduling (where two or more threads with the same priority are "runnable" at the same time for a long period of time.
During a single tick - which typically lasts 1-10ms, but you can usually configure that to be whatever you like - scheduler can do hundreds or thousands of context switches. Or none. When an event arrives and wakes up a thread with sufficiently high priority, context switch will happen immediately, not with the next tick. An event can be originated by the thread (posting a semaphore, sending a message to another thread, ...), interrupt (posting a semaphore, sending a message to a queue, ...) or by the scheduler (expired timeout or things like that).
There are also RTOSes with no system ticks - these are called "tickless". There you can have resolution of timeouts in the range of nanoseconds.
That is the reason why other engineers still not consider operating system, micro or nano kernels because the way they work is too far from their needs.
Actually this is a reason why these "engineers" should read something instead of pretending to know everything and seeking "innovative" solutions to non-existing problems. This is completely wrong.
The first representation is the traditional kernel. A task executes at each tick and it may interrupt the kernel with a system call that invoke a full context switch.
This is not a feature of a RTOS, but the way you wrote your application - if a high priority task is constantly doing something, then lower priority tasks will NOT get any chance to run. But this is just because you assigned wrong priorities.
Unless you use cooperative RTOS, but if you have such high requirements, why would you do that?
The second diagram shows a sub-tick kernel scheduler where multiple tasks may share the same tick interrupt.
This is exactly how EVERY preemptive RTOS works.
Available operating systems such as RTOS, RTAI, VxWorks or µC/OS are not fully real-time and are not suitable for embedded hard real-time applications such as motion-control where a typical cycle would last no more than 50 to 500 microseconds.
Completely wrong. In every known RTOS it is not a problem to get a response time down to single microseconds (1-3us) with a chip that has clock in the range of 100MHz. So you actually can run "jobs" which are as short as 10us without too much overhead. You can even have "jobs" as short as 10ns, but then the overhead will be pretty high...
What could be a good existing architecture or framework that can fulfill my requirements other than a naive bare-metal approach where everything is written sequentially around one master interrupt? If this kind of framework/design pattern exists what would it be called?
This pattern is called preemptive RTOS. Do note that threads in RTOS are NOT executed in "tick interrupt". They are executed in standard "thread" context, and tick interrupt is only used to switch context of one thread to another.
What you described in your post is a "cooperative" RTOS, which does NOT preempt threads. You use that in systems with extremely limited resources and with low timing requirements. In every other case you use preemptive RTOS, which is capable of handling the events immediately.

How did the apples fell for threads to be conceived

I was going through the following lecture notes on OS :
http://williamstallings.com/Extras/OS-Notes/h2.html
What I could draw is that "A process is a stream of execution ,i.e.basically a sequence of statements and so is a thread .However , the register states of one process are independent of the register states of another process but the register states of another thread can be accessed inside a thread. For every process at least one thread is allotted or dedicated ,when a process is started the OS activities for that process are taken over by the thread ( or a thread)"
What was the rationale behind conceiving the idea of threads ? When the OS is running a particular process why do we need some intermediate like a thread between them ?
"However , the register states of one process are independent of the register states of another process but the register states of another thread can be accessed inside a thread".
Can the above statement be taken as in the code for a process we cannot access the register states of a another process but in a code for a thread we can access the register states of another thread ?
(The above question did have the substitution of process and thread by their definition as codes or sequences of streams )
P.S : The title of the question is a metaphoric one .Please forgive if it misleads in any way . :P Could I take the liberty to broaden up and ask that if
the processor generates a thread for every process what does it write in the code for a thread ?(How does the code for a thread look like ? )
Terminology - for a system with virtual memory, threads share the same virtual memory address space, while each process has it's own address space. Processes can share physical memory by having a portion of that memory shared into their virtual address spaces (but the virtual address for each process may be different even though it is the same physical memory block).
Early (1960's) instances of multi-processing were mainframes that ran multiple processes that usually did not communicate with each other. Most of this activity was for batch oriented jobs, with a stream of jobs to be run, often from a punched card reader, or in more advanced situations, from remote job entry sites, which were other computers with a few peripherals (card readers, tape drives, line printers, ... ) that communicated with the mainframe to run jobs. There were also time sharing applications, similar to servers, except in many cases, relatively dumb terminals were used to communicate with the main frame. By the 1970's, APL/SV (A Programmming Language / Share Variables) was a time sharing application / programming language that could share variables between users.
For multi-process / multi-threaded operating systems, the device drivers operate from a queue of requests (such as a file read or write). Each request to be added to a device driver queue is done similar to a context switch so there won't be conflicts between process or thread requests for I/O. Some peripherals, such as mainframe, SCSI, or ... disk drives also operated from an internal queue, and could process I/O requests out of order to reduce random access overhead.
The basic problem that drove thread was how can an application handle multiple tasks at the same time and do it in a system-independent manner?
In classical eunuchs, a process could only do one thing at a time. If you needed to handle multiple things you kicked off multiple processes.
In the olde RSX and VMS systems (and Windoze under the covers), programmers relied on software interrupts. A process could queue I/O requests to multiple devices and receive a software interrupt when the request completed, thus allowing the application to do multiple things at once.
Another approach to the multiple things at once problem was to use event queues (Windoze, X Windows).
The ADA programming language was the first (and still really the only) mainstream programming language to support threads (tasks) as a system independent way to handle these kinds of problems. DOD compliance mandates drove the creation of threads.
Originally, threads were implemented through libraries ("use threads", "many to one model"). With the rise of multiprocessor systems, there became an increased demand to be able to have threads execute in parallel on different processors. This drove the creates of kernel threads in operating systems. (Many operating systems still do not support kernel threads).

Difference between Job Queue,Input Queue and Ready Queue?

Could someone explain what exactly is the function of all the 3 queues and how are they different from each other? It would be great if you could also tell where exactly the queue resides (i.e Main memory or Disk). Thanks!
Edit: I want to know their function with respect to them being used for queueing processes in UNIX based Operating Systems.
Jobs and their queues are abstract concepts with many different implementations (see Wikipedia: Job queue and Wikipedia: Job scheduler) which then define their meaning. Input queue and ready queue fall into the same "abstract" category.
For example: the Windows AT command can schedule and execute jobs in the form of arbitrary OS shell command and the job queue resides almost in Wikipedia: Windows Registry which resides on disk but for performance reasons is also cached in the main memory. See http://ss64.com/nt/at.html for more details

What are the Types of Process and Thread in Operating System?

I have been learning O.S in which it is written that there are two types of Process
1) CPU Bound Processes
2) I/O Bound Processes.
and somewhere its
1)Independent Processes
2)Cooperative Processes.
same goes for Threads
1) Single Level Thread.
2) Multilevel Thread.
and
1)User Level Thread
2)Kernel Level Thread.
Now confusion is that if someone asks me about Types of Process and Thread so which ones should i tell them, from above?
Kindly Make My Concept Clear?
I shall remain thankful to you!
Processes are two types based on their types of categories. The first one which you mentioned is related to event-specific process categorization and the next categorization is based on their nature. But, if someone asks you, you should ask for more clarification as to which type of category does he/she wants the classification. If null, then you should state the first(default) category as shown below:-
Event-specific based category of process
a) CPU Bound Process: Processes that spend the majority of their time simply using the CPU (doing calculations).
b) I/O Bound Process: Processes that are associated with input/output-based activity like reading from files, etc.
Category of processes based on their nature
a) Independent Process: A process that does not need any other external factor to get triggered is an independent process.
b) Cooperative Process: A process that works on the occurrence of any event and the outcome affects any part of the rest of the system is a cooperating process.
But, Threads have got only one classification based on their nature(Single Level Thread and Multi-Level Threads).
Actually, in modern operating systems, there are two levels at which threads operate. They are system or kernel threads and user-level threads. This one is generally not the classification, though some of them freely do classify. It is a misuse.
If you've further doubts, leave a comment below.
Basically there are two types of process:
Independent process.
Cooperating process.
For execution a process should be mixer of CPU bound and I/O bound.
CPU bound: is a time process reside in processor and perform it's execution.
I/O bound: is a time in which a process perform input output operation.e.g take input from keyboard or display output in monitor.
What is a Process?
A process is a program in execution. Process is not as same as program code but a lot more than it. A process is an 'active' entity as opposed to program which is considered to be a 'passive' entity. Attributes held by process include hardware state, memory, CPU etc.
Process memory is divided into four sections for efficient working :
The Text section is made up of the compiled program code, read in from non-volatile storage when the program is launched.
The Data section is made up the global and static variables, allocated and initialized prior to executing the main.
The Heap is used for the dynamic memory allocation, and is managed via calls to new, delete, mallow, free, etc.
The Stack is used for local variables. Space on the stack is reserved for local variables when they are declared.
Category of process:
1.Independent/isolated/competing.
2.Dependent/co-operating/concurrent.
1.Independetn:Execution of one process does not effect the execution's of other process that means there is nothing common for sharing.
2.Dependent:in it process can share some deliver buffer variable ,resources,(cpu,printer).
it process can share any thing, then execution of one process can effect other.
->execution of one process can effect or get affected by the execution of process.