Is it possible to prevent children from inheriting the CPU/core affinity of the parent?

I'm particularly interested in doing this on Linux, for Java programs. There are already a few questions saying you have no control over this from Java, and some RFEs closed by Sun/Oracle.
If you have access to the source code and use a low-level language, you can certainly make the relevant system calls. However, sandboxed systems, possibly without source code available, present more of a challenge. I would have thought that a per-process tool or a kernel parameter would be able to control this from outside the parent process. This is really what I'm after.
I understand why this is the default. It looks like some versions of Windows may allow some control of this, but most do not. I was expecting Linux to allow control of it, but it seems that's not an option.

Provided you have sufficient privileges, you could simply call sched_setaffinity() before exec'ing in the child. In other words, from
if (fork() == 0)
    execl("prog", "prog", /* ..., */ (char *)NULL);
move to use
/* simple example using the taskset(1) utility rather than sched_setaffinity()
   directly; execlp() searches PATH for taskset */
if (fork() == 0)
    execlp("taskset", "taskset", "-c", "0-999999", "prog", /* ..., */ (char *)NULL);
[Of course using 999999 is not nice, but it can be substituted by a program that automatically determines the number of CPUs and resets the affinity mask as desired.]
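For reference, here is a minimal sketch of that direct-syscall variant, in which the child widens its own affinity mask to every online CPU before exec'ing; "prog" is a placeholder program name and error handling is omitted:
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

int main(void)
{
    if (fork() == 0) {
        cpu_set_t set;
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);

        /* build a mask containing every online CPU */
        CPU_ZERO(&set);
        for (long i = 0; i < ncpu; i++)
            CPU_SET(i, &set);

        /* pid 0 means "the calling process" */
        sched_setaffinity(0, sizeof set, &set);
        execl("prog", "prog", (char *)NULL);
    }
    return 0;
}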

What you could also do is change the affinity of the child from the parent, after the fork(). By the way, I'm assuming you're on Linux; some of this stuff, such as retrieving the number of cores with sysconf(), will differ across OSes and Unix flavors. The example here gets the CPU of the parent process and tries to ensure all child processes are scheduled on different cores, round-robin.
/* needed headers: #define _GNU_SOURCE, <sched.h>, <unistd.h> */
cpu_set_t mycpuset;
int numcpu, i, mycpu = 0, cpu;
pid_t pid;

/* get the number of CPUs */
numcpu = sysconf( _SC_NPROCESSORS_ONLN );

/* get our CPU */
CPU_ZERO(&mycpuset);
sched_getaffinity( getpid(), sizeof mycpuset, &mycpuset );
for( i = 0; i < numcpu; i++ )
{
    if( CPU_ISSET( i, &mycpuset ) )
    {
        mycpu = i;
        break;
    }
}
//...
cpu = mycpu;  /* start from our own CPU so the first child lands elsewhere */
while(1)
{
    //Some other stuff.....

    /* now the fork */
    if((pid = fork()) == 0)
    {
        //do your child stuff
    }
    /* Parent... can schedule child. */
    else
    {
        cpu = (cpu + 1) % numcpu;   /* (cpu = ++cpu % numcpu is undefined behavior) */
        if(cpu == mycpu)
            cpu = (cpu + 1) % numcpu;
        CPU_ZERO(&mycpuset);
        CPU_SET(cpu, &mycpuset);
        /* set processor affinity */
        sched_setaffinity( pid, sizeof mycpuset, &mycpuset );
        //any other parent stuff
    }
}

Related

How to monitor Gtk3 Event Loop latency

I would like to monitor the Gtk3 event loop latency, i.e. the time spent on each iteration of the Gtk main event loop. Basically, the idea is to run a custom function at each tick of the main event loop.
I tried g_idle_add, but the documentation is not clear on whether the callback will be invoked on every loop iteration.
Any thoughts?
Probably writing a custom GSource is your best choice.
GSource *
g_source_new (GSourceFuncs *source_funcs,
              guint         struct_size);
The size is specified to allow creating structures derived from GSource that contain additional data
You should also give it the highest priority.
I'm not sure it will be dispatched at every single iteration, but it will be prepared on every iteration. To bring your source to life you obtain context with g_main_loop_get_context and call g_source_attach.
All in all it looks like this:
typedef struct
{
    GSource parent;   /* the GSource must come first */
    int my_data;
} MySource;

static gboolean
my_prepare (GSource *source, gint *timeout_)
{
    g_message ("%" G_GINT64_FORMAT, g_get_monotonic_time ());
    *timeout_ = 0;
    ((MySource *) source)->my_data = 1;  /* note the extra parentheses around the cast */
    return TRUE;  /* ready: the source will also be dispatched */
}

/* a dispatch function is required when prepare returns TRUE */
static gboolean
my_dispatch (GSource *source, GSourceFunc callback, gpointer user_data)
{
    return G_SOURCE_CONTINUE;  /* keep the source alive */
}

static GSourceFuncs funcs = { .prepare = my_prepare, .dispatch = my_dispatch };

GSource *src = g_source_new (&funcs, sizeof (MySource));
g_source_set_priority (src, G_PRIORITY_HIGH);
g_source_attach (src, g_main_loop_get_context (loop));  /* 'loop' is your GMainLoop */
This doesn't include any cleanup.
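For completeness, tearing the source down later would look something like this (standard GLib calls):
/* remove the source from its context, then drop our reference */
g_source_destroy (src);
g_source_unref (src);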

Fork() in XV6, does the process child execute in kernel or user mode?

In XV6, when a fork() is called, does the child execute in kernel mode or user mode?
This is the fork code in XV6:
// Create a new process copying p as the parent.
// Sets up stack to return as if from system call.
// Caller must set state of returned proc to RUNNABLE.
int fork(void){
  int i, pid;
  struct proc *np;
  struct proc *curproc = myproc();

  // Allocate process.
  if((np = allocproc()) == 0){
    return -1;
  }

  // Copy process state from proc.
  if((np->pgdir = copyuvm(curproc->pgdir, curproc->sz)) == 0){
    kfree(np->kstack);
    np->kstack = 0;
    np->state = UNUSED;
    return -1;
  }
  np->sz = curproc->sz;
  np->parent = curproc;
  *np->tf = *curproc->tf;

  // Clear %eax so that fork returns 0 in the child.
  np->tf->eax = 0;

  for(i = 0; i < NOFILE; i++)
    if(curproc->ofile[i])
      np->ofile[i] = filedup(curproc->ofile[i]);
  np->cwd = idup(curproc->cwd);

  safestrcpy(np->name, curproc->name, sizeof(curproc->name));

  pid = np->pid;

  acquire(&ptable.lock);
  np->state = RUNNABLE;
  release(&ptable.lock);

  return pid;
}
I did some research, but even from the code I can't understand how it works. Understanding how it works in UNIX would also help.
The child is almost an exact copy of the parent process, except for the value of the eax register and the parent-process information, so it will execute in whichever context the parent process is in.
The fork() function here creates a new process structure by calling allocproc(), fills it with the values of the original process, and maps the same page tables.
Finally, it sets the process state to RUNNABLE, which allows the scheduler to run the new process along with the parent.
That means the actual running is performed by the scheduler, not by the fork code here.
What Sedat has written is entirely correct. The forked process, i.e. the child, will run in the same context its parent was in, i.e. either kernel or user mode.
In addition, I think what confused you were the calls made by allocproc(), like kalloc(), and attributes like kstack. These deal with setting up the new process in the system with regard to page tables and memory.
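To see the effect of clearing %eax in the trap frame, here is a minimal user-space sketch (ordinary UNIX C, not xv6 kernel code) of how the two return values distinguish parent from child:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();  /* child sees 0 (the cleared eax), parent sees the child's pid */
    if (pid == 0)
        printf("child: fork returned 0\n");
    else
        printf("parent: fork returned %d\n", (int) pid);
    return 0;
}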

Difference between sleep(1) and while(sleep(1))

I came across the following piece of code while looking at SIGCHLD handling. In the code below, 50 children are created, and the parent process waits in the SIGCHLD handler until all 50 children are destroyed.
I get the expected result if I use while(sleep(1)) at the end of main; however, if I replace it with sleep(1), the parent gets destroyed before all child processes terminate.
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>

int l = 0;

/* SIGCHLD handler. */
static void sigchld_hdl (int sig)
{
    /* Wait for all dead processes.
     * We use a non-blocking call to be sure this signal handler will not
     * block if a child was cleaned up in another part of the program. */
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        printf(" %d", l++);
    }
    printf("\nExiting from child :: %d\n", l);
}

int main (int argc, char *argv[])
{
    struct sigaction act;
    int i;

    memset (&act, 0, sizeof(act));
    act.sa_handler = sigchld_hdl;
    if (sigaction(SIGCHLD, &act, 0)) {
        perror ("sigaction");
        return 1;
    }

    /* Make some children. */
    for (i = 0; i < 50; i++) {
        switch (fork()) {
        case -1:
            perror ("fork");
            return 1;
        case 0:
            return 0;
        }
    }

    /* Wait until we get a sleep() call that is not interrupted by a signal. */
    while (sleep(1)) {
    }
    // sleep(1);
    printf("\nterminating\n");
    return 0;
}
I came across the following piece of code while looking at SIGCHLD handling. In the code below, 50 children are created, and the parent process waits in the SIGCHLD handler until all 50 children are destroyed.
No, it does not. waitpid with WNOHANG will fail if nobody has exited, and there is no guarantee that all the children exited (or will exit) during execution of the handler.
Even with a mere sleep(1) there is no guarantee any child will manage to exit, but in practice most of them will.
Sleeping is a fundamentally wrong approach here. Since you know how many children you created, you should wait for all of them to finish and that's it. For instance, you can decrement a counter of existing children each time you reap one and wait for it to reach 0.
Depending on what the real program looks like, it may be that you don't want the handler in the first place: just have the loop at the end, but without WNOHANG, as sketched below.
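A minimal sketch of that handler-free approach, assuming the same 50 forked children as in the question:
/* Sketch: reap a known number of children without a SIGCHLD handler. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

#define NCHILDREN 50

int main(void)
{
    int alive = 0;

    for (int i = 0; i < NCHILDREN; i++) {
        switch (fork()) {
        case -1:
            perror("fork");
            break;
        case 0:
            _exit(0);          /* child work goes here */
        default:
            alive++;           /* count successfully created children */
        }
    }

    while (alive > 0 && wait(NULL) > 0)  /* blocking wait, no WNOHANG */
        alive--;

    printf("all children reaped\n");
    return 0;
}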
I also have to comment about this:
/* Wait for all dead processes.
* We use a non-blocking call to be sure this signal handler will not
* block if a child was cleaned up in another part of the program. */
You can't mix a signal handler and waiting on your own. You risk snatching the process from the other code waiting for it, and what happens then?
It's a design error: fork/exit handling has to be either unified or decentralized.
From the manual page:
Return Value
Zero if the requested time has elapsed, or the number of seconds left to sleep, if the call was interrupted by a signal handler.
So I guess that without the while bit, the sleep is being interrupted by the SIGCHLD signals, hence the process ending quickly.
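In other words, to sleep for a full interval despite interruptions, re-issue sleep() with the returned remainder; a small sketch of that idiom:
#include <unistd.h>

/* Sleep for roughly 'seconds' in total, resuming after each
 * signal interruption with the unslept remainder. */
static void sleep_fully(unsigned int seconds)
{
    unsigned int left = seconds;
    while (left > 0)
        left = sleep(left);   /* returns 0 once the time has fully elapsed */
}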

Tasks behaving incorrectly in round-robin schedule

I have FreeRTOS running on a STM32F4DISCOVERY board, and I have this code:
xTaskCreate( vTask1, "Task 1", 200, NULL, 1, NULL );
xTaskCreate( vTask2, "Task 2", 200, NULL, 1, NULL );
vTaskStartScheduler();
where vTask1 is this function:
void vTask1( void *pvParameters )
{
    volatile unsigned long ul;
    for( ;; )
    {
        LED_On(0);
        for( ul = 0; ul < mainDELAY_LOOP_COUNT; ul++ )
        {
        }
        LED_On(2);
        LED_Off(0);
    }
}
vTask2 has nearly the same code:
void vTask2( void *pvParameters )
{
    const char *pcTaskName = "Task 2 is running\n";
    volatile unsigned long ul;
    for( ;; )
    {
        LED_On(3);
        LED_Off(2);
        for( ul = 0; ul < mainDELAY_LOOP_COUNT; ul++ )
        {
        }
        LED_Off(3);
    }
}
When I run the program, I see that LED0 and LED3 are always on (their switching is too fast for my eye, which is fine), and that LED2, the "shared resource", is blinking very fast.
The problem is this: when I reverse the order of the xTaskCreate calls, I get the same situation with a different blinking behavior of LED2, which is much slower.
Why would this happen, since the tasks should have equal priority and therefore follow a round-robin schedule? Shouldn't they get the same amount of time? Why is their behavior changing after only having created them in different order?
Thanks in advance.
The RTOS does not try to round-robin through the tasks, and you should not expect them to execute in any specific order. Neither of the tasks you have created has a delay in it, as iama pointed out in their comment. Instead of creating a delay by burning through no-ops in a for loop, use the delay function, vTaskDelay(); a sketch follows below. This will allow the code in your loop to execute and then yield back to the RTOS, so that the processor can run other tasks until the wait time has elapsed. If you need to synchronize work, you may want to just keep it in a single task. If one task is dependent on something done in another, you may want to use semaphores, queues, or another cross-thread communication method.
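For instance, vTask1 could yield between LED updates like this (a sketch, not a drop-in fix: the 500 ms period is an arbitrary assumption, and FreeRTOS.h/task.h are assumed to be included):
void vTask1( void *pvParameters )
{
    for( ;; )
    {
        LED_On(0);
        vTaskDelay( pdMS_TO_TICKS( 500 ) );  /* block here; other tasks run meanwhile */
        LED_On(2);
        LED_Off(0);
        vTaskDelay( pdMS_TO_TICKS( 500 ) );
    }
}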
Your code reminds me of when I was transitioning from using a while(1) loop in main to an RTOS. If you're new to using an RTOS, this guide from ST looks like a good introduction: https://www.st.com/resource/en/user_manual/dm00105262-developing-applications-on-stm32cube-with-rtos-stmicroelectronics.pdf
Also, do not rely on the delay function for fine-grained timing; use a timer-driven interrupt instead. The link above should give you a better understanding of why than I could manage in this post.
ST should also have example projects somewhere, which would be good to reference or use as a starting point for your own project.

Simplest way to process a list of items in a multi-threaded manner

I've got a piece of code that opens a data reader and, for each record (which contains a URL), downloads and processes that page.
What's the simplest way to make it multi-threaded so that, let's say, there are 10 slots which can be used to download and process pages simultaneously, and as slots become available the next rows are read, etc.?
I can't use WebClient.DownloadDataAsync.
Here's what I have tried to do, but it hasn't worked (i.e. the "worker" is never run):
using (IDataReader dr = q.ExecuteReader())
{
    ThreadPool.SetMaxThreads(10, 10);
    int workerThreads = 0;
    int completionPortThreads = 0;

    while (dr.Read())
    {
        do
        {
            ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
            if (workerThreads == 0)
            {
                Thread.Sleep(100);
            }
        } while (workerThreads == 0);

        Database.Log l = new Database.Log();
        l.Load(dr);

        ThreadPool.QueueUserWorkItem(delegate(object threadContext)
        {
            Database.Log log = threadContext as Database.Log;
            Scraper scraper = new Scraper();
            dc.Product p = scraper.GetProduct(log, log.Url, true);
            ManualResetEvent done = new ManualResetEvent(false);
            done.Set();
        }, l);
    }
}
You do not normally need to play with the max threads (I believe it defaults to something like 25 per processor for worker threads and 1000 for I/O). You might consider setting the min threads to ensure you always have a nice number available.
You don't need to call GetAvailableThreads either; you can just start calling QueueUserWorkItem and let it do all the work. Can you reproduce your problem by simply calling QueueUserWorkItem?
You could also look into the Task Parallel Library, which has helper methods to make this kind of thing more manageable and easier.