Mutually Exclusive Bitbake Recipes/Tasks - yocto

I have several recipes who's do_compile task uses a lot of memory (lots of C++ templates). If I build the recipes at the same time, they exhaust the host machine of memory and the out-of-memory killer starts killing.
I've modified BB_NUMBER_PARSE_THREADS, BB_NUMBER_THREADS, PARALLEL_MAKE, and PARALLEL_MAKEINST numerous times, but it is not feasible to pick numbers for these variables that'll work well in all situations. For example, if I set BB_NUMBER_THREADS to 1 in order to get only one of these recipes to build at a time, I end up increasing the build time a lot when there are no changes (everything can be pulled from the cache). I don't feel like those are the right solution to my problem.
Is there any way to tell bitbake to only build one of these recipe's do_compile tasks at a time, but let other recipe's tasks build normally?

It isn't quite the answer you're looking for but you could have something like:
do_compile[lockfiles] = "${WORKDIR}/mylock"
which would require the task to take and hold the lock to execute, then you can be sure only one would run at a time.

Related

Should mill tasks with non-file products produce something analogous to PathRef?

I'm using mill to build a pipeline that
cleans up a bunch of CSV files (producing new files)
loads them into a database
does more work in the database (create views, etc)
runs queries to extract some files.
Should the tasks associated with steps 2 and 3 be producing something analogous to PathRef? If so, what? They aren't producing a file on the disk but nevertheless should not be repeated unless the inputs change. Similarly, tasks associated with step 3 should run if tasks in step 2 are run again.
I see in the documentation for targets that you can return a case class and that re-evaluation depends on the .hashCode of the target's return value. But I'm not sure what to do with that information.
And a related question: Does mill hash the code in each task? It seems to be doing the right thing if I change the code for one task but not others.
A (cached) task in mill is re-run, when the build file (build.sc or it's dependencies/includes) or inputs/dependencies of that task change. Whenever you construct a PathRef, a checksum of the path content is calculated and used as hashCode. This makes it possible to detect changes and only act if anything has changed.
Of course there are exceptions, e.g. input tasks (created with T.input or T.sources) and commands (created with T.command) will always run.
It is in general a good idea to return something from a task. A simple String or Int will do, e.g. to show it in the shell with mill show myTask or post-process it later. Although I think a task running something in an external database should be implemented as a command or input task (which might check, when running, if it really needs something to do), you can also implement it as cached task. But please keep in mind, that the state will only be correct if no other process/user is changing the database in between.
That aside, You could return a current database schema/data version or a last change date in step 2. Make sure, it changes whenever the database was modified. Each change of that return value would trigger step 3 and other dependent tasks. (Of course, step 3 needs to depend on step 2.) As a bonus, you could return the same (previous/old) value in case you did not change the database, which would then avoid any later work in dependent tasks.

Architecture/Optimization of a scheduling problem on OptaPlanner

I'd like to generate a planning with OptaPlanner, with the following problem :
"Task" is the planning entity
It has a fixed duration
It has an employee (planning variable) which must match required skills
It has a workstation (planning variable) which must match required attributes
It has dependencies: some tasks must start after some others
Some tasks have a deadline
At first, I tried with chained/shadow variables, basing on the OptaPlanner Task Assignment example. On my first attempt, I kept the employee as an anchor, and let the workstation management to the solver:
I quickly saw that there was a problem with this approach. Here, Task A and Task B (and all others) have an influence the one on the other while they are not on the same chain and also the previous element of the chain has not enough information to determine the start time of the task. Also, the worst thing is that workstations changes not being tracked in this model, the solutions are just non-sense as workstations can be used several times.
To fix the problem of workstation tracking, I added workstations to the anchor and made all employee/workstation combinations
This way I know the employee and workstation on each chain. However this does not solve the problem of start time of tasks on a chain being dependent on tasks on other chains (e.g, Task A and Task D share the same workstation) and therefore Task insertions/removals shall have repercussion on other chains, which is not the spirit.
I ended giving up the idea of chained variables as it seems that their usage does not fit my problem.
So I modified my classes and the start time of tasks is now resolved with all intelligence left to OptaPlanner's solver, driven by pure drools rules and a ValueRangeProvider.
When I have only the following rules :
No employee recovery (Hard)
No workstation recovery (Hard)
Skills and attributes requirements (Hard)
Tasks with deadlines ending before deadline (Soft, could be Medium)
Tasks ending as soon as possible (sum of squared end times, Soft)
I can get quite fast a solution that seems to be the best.
However, when I add dependencies between tasks (with a hard rule going down task dependencies to see if a dependency does not end after the task starts), the complexity seems to dramatically increase so for a few dozens task, with only 2 operators, 3 workstations, a satisfying solution (without an unexpected holes between tasks) can take an hour to come, with the following parameters:
Experiment length: 10,000 min
Granularity: 1 min
I also have a TaskDifficultyComparator to help the solver place the hardest tasks before
This is very long for a few tasks and it will be far worse when inserting a notion of availability of user, I even suspect it never converges as task end time will "jump" depending on availabilities.
So my questions are:
Are there solutions leveraging chained variables that would suit my problem?
Are their any optimizations on the solver/rules/something else that could grant me a precious speed-up?

Matlab code taking a long time to run

I have a Matlab code (from a journal paper) and I'm trying to re-simulate their data.
I executed the code one week ago. I think the code is taking so long time to run. Matlab is still busy and taking 50% of my cpu.
I was wondering if the process has ended with some errors somewhere in the code. My question is:
When I see no errors, can I be sure that everything is fine with this running process? And I can wait until it is finished?
Is there any way to check which part of code is being run now ( without stopping the execution)?
Or I should stop the program and try something else?
Actually I don't want to loose this 1 week and if you think everything is fine, I would wait until the code stops.
(The authors of the paper didn't reply to my question and I don't know how long should it naturally take... They just mentioned it may take a long time to simulate the data).
Unfortunately, there is little we can do for you.
When I see no errors, can I be sure that everything is fine with this running process?
That's pretty much the definition of an error. If no error is raised, then it means that the program is still running.
Is there any way to check which part of code is being run now (without stopping the execution)?
Unfortunately no. For long-lasting execution times like that, a good developing practice is to display some information from time to time to inform the end user of the execution status.
However, if the programs produces files all along the way (like for instance at every step in an iterative simulation) you can check on your computer that the files are well-produced, and the production rate will more or less inform you on the total execution time.
For all your other questions, well, it's up to you to decide what to do (stop it or let it run). Be aware that the execution time can differ significantly from one machine to another, so the time it took on the author's machine may not be really informative to you.
In the future, I would advise you to react faster than within a week. When you launch a code that has a long execution time and see that there is no display within the first hour, you should stop it, modify it such that it regulatly displays information, and re-run it. It's better to loose one hour than one week.
Best,

CC.Net Build Server -- how to find the NUnit test that does not complete at all?

I have a test suite of ~ 1500 tests and they generally run and finish within 'reasonable time'.
Recently, however, I've changed parts of the code to use threads -- and now my builds fail from time to time by simply timing out. I imagine that a thread refuses to die and the build waits until reaching the maximum build time.
My problem is how to detect which test is causing the problem?
Can I activate some logging that shows me that a test has started/finished? I can of course be done by inserting code in every single test method - or just the fixtures, but that is A LOT of work that I'd rather avoid.
I'd suggest upgrading to NUnit 2.5 and decorating your tests with Timeout attribute, specifying maximum per-test run time. For example, you can put this in your AssemblyInfo.cs:
[assembly: NUnit.Framework.Timeout(100)]
NUnit will now launch each test in a separate thread and cancell it if it exceeds its time slot. However, this might be costly, so it's probably better to identify long-running tests and then remove assembly-level attribute in favor of test-fixture time slots. You can also override this on individual tests, assigning them more time to run.
This way you move the timeout/hang detection from CruiseControl.Net to NUnit and get information inside the report on the tasks that did not complete properly. Unfortunately there's no way for CC.Net to get this information when it has to kill the process because of timeout.

Explain the steps for db2-cobol's execution process if both are db2 -cobol programs

How to run two sub programs from a main program if both are db2-cobol programs?
My main program named 'Mainpgm1', which is calling my subprograms named 'subpgm1' and 'subpgm2' which are a called programs and I preferred static call only.
Actually, I am now using a statement called package instead of a plan and one member, both in 'db2bind'(bind program) along with one dbrmlib which is having a dsn name.
The main problem is that What are the changes affected in 'db2bind' while I am binding both the db2-cobol programs.
Similarly, in the 'DB2RUN'(run program) too.
Each program (or subprogram) that contains SQL needs to be pre-processed to create a DBRM. The DBRM is then bound into
a PLAN that is accessed by a LOAD module at run time to obtain the correct DB/2 access
paths for the SQL statements it contains.
You have gone from having all of your SQL in one program to several sub-programs. The basic process
remains the same - you need a PLAN to run the program.
DBA's often suggest that if you have several sub-programs containing SQL that you
create PACKAGES for them and then bind the PACKAGES into a PLAN. What was once a one
step process is now two:
Bind DBRM into a PACKAGE
Bind PACKAGES into a PLAN
What is the big deal with PACKAGES?
Suppose you have 50 sub-programs containing SQL. If you create
a DBRM for each of them and then bind all 50 into a PLAN, as a single operation, it is going
to take a lot of resources to build the PLAN because every SQL statement in every program needs
to be analyzed and access paths created for them. This isn't so bad when all 50 sub-programs are new
or have been changed. However, if you have a relatively stable system and want to change 1 sub-program you
end up reBINDing all 50 DBRMS to create the PLAN - even though 49 of the 50 have not changed and
will end up using exactly the same access paths. This isn't a very good apporach. It is analagous to compiling
all 50 sub-programs every time you make a change to any one of them.
However, if you create a PACKAGE for each sub-program, the PACKAGE is what takes the real work to build.
It contains all the access paths for its associated DBRM. Now if you change just 1 sub-program you
only have to rebuild its PACKAGE by rebinding a single DBRM into the PACKAGE collection and then reBIND the PLAN.
However, binding a set of PACKAGES (collection) into a PLAN
is a whole lot less resource intensive than binding all the DBRM's in the system.
Once you have a PLAN containing all of the access paths used in your program, just use it. It doesn't matter
if the SQL being executed is from subprogram1 or subprogram2. As long as you have associated the PLAN
to the LOAD that is being run it should all work out.
Every installation has its own naming conventions and standards for setting up PACKAGES, COLLECTIONS and
PLANS. You should review these with your Data Base Administrator before going much further.
Here is some background information concerning program preperation in a DB/2 environment:
Developing your Application