Speed-critical section in gcc? - compiler-optimization

I'm already using -O3, so I don't think I can do much better in general, but the generated assembly is still a tight fit for the timing requirements. Is there a way to, for example, tell avr-gcc to keep non-speed-critical code out of a speed-critical section?
if (READY)
{
    ACKNOWLEDGE();
    SETUP_PART1();
    SETUP_PART2();
    //start speed-critical section
    SYNC_TO_TIMER();
    MINIMAL_SPEED_CRITICAL_CODE();
    //end speed-critical section
    CLEANUP();
}
A tedious reading of the -O3-optimized assembly listing (.lss file) shows that SETUP_PART2(); has been reordered to come between SYNC_TO_TIMER(); and MINIMAL_SPEED_CRITICAL_CODE();. That's a problem because it adds time spent in the speed-critical section that doesn't need to exist.
I don't want to make the critical section itself any longer than it needs to be, so I'm hesitant to relax the optimization. (the same action that causes this problem may also be what makes the final version fit inside the time available) So how can I tell it that "this behavior" needs to stay pure and uncluttered, but still optimize it fully?
I've used inline assembly before, so I'm sure I could figure out again how to communicate between that and C without completely tying up or clobbering registers, etc. But I'd much rather stick with 100% C if I can, and let the compiler figure out how not to run over itself.

Not quite the direct answer that I was looking for, but I found something that works. So it's technically an XY problem, but I'll still take a direct answer to the question as stated.
Here's what I ended up doing:
1. Separate the speed-critical section into its own function, and prevent it from inlining:
void __attribute__((noinline)) _fast_function(uint8_t arg1, uint8_t arg2)
{
    //start speed-critical section
    SYNC_TO_TIMER();
    MINIMAL_SPEED_CRITICAL_CODE(arg1, arg2);
    //end speed-critical section
}
void normal_function(void)
{
    if (READY)
    {
        ACKNOWLEDGE();
        SETUP_PART1();
        SETUP_PART2(); //determines arg1, arg2
        _fast_function(arg1, arg2);
        CLEANUP();
    }
}
That keeps the speed-critical stuff together, and by itself. It also adds some overhead, but that happens OUTSIDE the fast part, where I can afford it.
2. Manually unroll some loops:
The original code was something like this:
uint8_t data[3];
uint8_t byte_num = 3;
while(byte_num)
{
    byte_num--;
    uint8_t bit_mask = 0b10000000;
    while(bit_mask)
    {
        if(data[byte_num] & bit_mask) WEIRD_OUTPUT_1();
        else WEIRD_OUTPUT_0();
        bit_mask >>= 1;
    }
}
The new code is like this:
#define WEIRD_OUTPUT(byte,bit) \
do \
{ \
if((byte) & (bit)) WEIRD_OUTPUT_1(); \
else WEIRD_OUTPUT_0(); \
} while(0)
uint8_t data[3];
WEIRD_OUTPUT(data[2],0b10000000);
WEIRD_OUTPUT(data[2],0b01000000);
WEIRD_OUTPUT(data[2],0b00100000);
WEIRD_OUTPUT(data[2],0b00010000);
WEIRD_OUTPUT(data[2],0b00001000);
WEIRD_OUTPUT(data[2],0b00000100);
WEIRD_OUTPUT(data[2],0b00000010);
WEIRD_OUTPUT(data[2],0b00000001);
WEIRD_OUTPUT(data[1],0b10000000);
WEIRD_OUTPUT(data[1],0b01000000);
WEIRD_OUTPUT(data[1],0b00100000);
WEIRD_OUTPUT(data[1],0b00010000);
WEIRD_OUTPUT(data[1],0b00001000);
WEIRD_OUTPUT(data[1],0b00000100);
WEIRD_OUTPUT(data[1],0b00000010);
WEIRD_OUTPUT(data[1],0b00000001);
WEIRD_OUTPUT(data[0],0b10000000);
WEIRD_OUTPUT(data[0],0b01000000);
WEIRD_OUTPUT(data[0],0b00100000);
WEIRD_OUTPUT(data[0],0b00010000);
WEIRD_OUTPUT(data[0],0b00001000);
WEIRD_OUTPUT(data[0],0b00000100);
WEIRD_OUTPUT(data[0],0b00000010);
WEIRD_OUTPUT(data[0],0b00000001);
In addition to dropping the loop code itself, this also replaces an array index with a direct access, and a variable with a constant, each of which offers its own speedup.
And it still barely fits in the time available. But it does fit! :)

You can try to use a memory barrier to prevent ordering code across it:
__asm volatile ("" ::: "memory");
The compiler will not move volatile accesses or memory accesses across such a barrier. It might however move other stuff across it like arithmetic that does not involve memory access.
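As a rough sketch of where such a barrier might be placed, assuming GCC or Clang: the names COMPILER_BARRIER, fake_port, setup_result and run_sequence are made up for illustration, and the volatile variable is only a stand-in for a real I/O register. This only constrains the compiler's ordering of memory accesses; it says nothing about timing.

```c
#include <stdint.h>

/* GCC/Clang-specific: an empty asm statement with a "memory" clobber.
   It emits no instructions, but the compiler may not move memory
   accesses across it. */
#define COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")

/* Hypothetical stand-in for a memory-mapped I/O register. */
static volatile uint8_t fake_port;

/* Ordinary (non-volatile) memory written by the setup code. */
static uint8_t setup_result;

static uint8_t run_sequence(uint8_t arg)
{
    /* Non-critical setup: this store must stay above the barrier. */
    setup_result = (uint8_t)(arg + 1);

    COMPILER_BARRIER();

    /* Stand-in for the speed-critical part: one volatile write. */
    fake_port = setup_result;

    COMPILER_BARRIER();

    /* Anything after this point is outside the critical part. */
    return fake_port;
}
```

Arithmetic that touches only registers can still be scheduled across the barrier, so the assembly listing is worth re-checking after adding it.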
If the compiler is reordering code, then it's a valid compilation w.r.t. the C language standard, which means that you'll have a hard time avoiding it with pure C code hacks.
If you are after speed, then consider writing WEIRD_OUTPUT(data[2],0b10000000); etc. as inline asm, or writing the whole function in assembly. This gives you definite control over the timing (not considering IRQs), whereas the C language standard does not make any statements on program timing whatsoever.


What does assignment mean to a C11 atomic?

For example,
atomic_int test(void)
{
atomic_int tmp = ATOMIC_VAR_INIT(14);
tmp = 47; // Looks like atomic_store
atomic_int mc; // Probably just uninitialised data
memcpy(&mc,&tmp,sizeof(mc)); // Probably equivalent to a copy
tmp = mc + 4; // Arithmetic
return tmp; // A copy - perhaps load then store
}
Clang is happy with all this. I've read section 7.17 of the standard, and it says a lot about the memory model and the defined functions (init, store, load etc) but doesn't say anything about the usual operations (+, = etc).
Also of interest is the behaviour of passing struct wot { atomic_int value; } to functions.
I would like to believe that assignment behaves identically to an atomic load then store using memory_order_seq_cst.
Even more optimistically, I would like to believe that struct assignment, passing to function, returning from function and even memcpy also behaves identically to carefully copying the bit pattern across under memory_order_seq_cst.
I can't find any supporting evidence for either belief in the standard though. There's definitely a chance that assignment and memcpy of atomic primitives is undefined behaviour.
How should primitive operations on atomic primitives behave?
Thanks!
Operations on objects that are _Atomic qualified (and atomic_int is just a different spelling for that) are guaranteed to have sequential consistency. You find that mentioned at the end of the semantics section for each of the operators. (And maybe the mention for assignment is missing.)
Your code is not correct in two places: initialization must use the ATOMIC_VAR_INIT macro (7.17.2.1), and memcpy is undefined (the sizes might not agree), although it will probably work on most architectures.
Also the line
tmp = mc + 4; // Arithmetic
doesn't do what your comment claims. This is not arithmetic on an atomic object, but a load followed by an ordinary addition. More interesting would be
mc += 4; // Arithmetic
which is an atomic operation with sequential consistency.
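To make the distinction concrete, here is a small sketch that sets the plain operators next to the explicit <stdatomic.h> calls they are described as corresponding to. The function names are invented for illustration, and atomic_init is used for the local variables as an assumption on my part (ATOMIC_VAR_INIT is not available in every language mode).

```c
#include <stdatomic.h>

/* Plain operators applied to an _Atomic object. */
static int implicit_ops(void)
{
    atomic_int tmp;
    atomic_init(&tmp, 14);

    tmp = 47;            /* atomic store */
    tmp += 4;            /* single atomic read-modify-write */
    int snapshot = tmp;  /* atomic load */
    return snapshot;
}

/* The same steps written with the generic functions, which use
   memory_order_seq_cst when no order is given. */
static int explicit_ops(void)
{
    atomic_int tmp;
    atomic_init(&tmp, 14);

    atomic_store(&tmp, 47);
    atomic_fetch_add(&tmp, 4);
    return atomic_load(&tmp);
}

/* tmp = tmp + 4 is a load, an ordinary addition, then a store.
   Each access is atomic, but the update as a whole is not, so
   another thread could modify tmp between the load and the store. */
static int split_update(void)
{
    atomic_int tmp;
    atomic_init(&tmp, 47);

    tmp = tmp + 4;
    return atomic_load(&tmp);
}
```

In a single thread all three give the same result; the difference only shows up when another thread writes the object concurrently.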

Swift: macro for __attribute__((section))

This is kind of a weird and un-Swift-thonic question, so bear with me.
I want to do in Swift something like the same thing I'm currently doing in Objective-C/C++, so I'll start by describing that.
I have some existing C++ code that defines a macro that, when used in an expression anywhere in the code, will insert an entry into a table in the binary at compile time. In other words, the user writes something like this:
#include "magic.h"
void foo(bool b) {
if (b) {
printf("%d\n", MAGIC(xyzzy));
}
}
and thanks to the definition
#define MAGIC(Name) \
[]{ static int __attribute__((used, section("DATA,magical"))) Name; return Name; }()
what actually happens at compile time is that a static variable named xyzzy (modulo name-mangling) is created and allocated into the special magical section of my Mach-O binary, so that running nm -m foo.o to dump the symbols shows something a lot like this:
0000000000000098 (__TEXT,__eh_frame) non-external EH_frame0
0000000000000050 (__TEXT,__cstring) non-external L_.str
0000000000000000 (__TEXT,__text) external __Z3foob
00000000000000b0 (__TEXT,__eh_frame) external __Z3foob.eh
0000000000000040 (__TEXT,__text) non-external __ZZ3foobENK3$_0clEv
00000000000000d8 (__TEXT,__eh_frame) non-external __ZZ3foobENK3$_0clEv.eh
0000000000000054 (__DATA,magical) non-external [no dead strip] __ZZZ3foobENK3$_0clEvE5xyzzy
(undefined) external _printf
Through the magic of getsectbynamefromheader(), I can then load the symbol table for the magical section, scan through it, and find out (by demangling every symbol I find) that at some point in the user's code, he calls MAGIC(xyzzy). Eureka!
I can replicate the whole second half of that workflow just fine in Swift — starting with the getsectbynamefromheader() part. However, the first part has me stumped.
Swift has no preprocessor, so spelling the magic as elegantly as MAGIC(someidentifier) is impossible. I don't want it to be too ugly, though.
As far as I know, Swift has no way to insert symbols into a given section — no equivalent of __attribute__((section)). This is okay, though, since nothing in my plan requires a dedicated section; that part just makes the second half easier.
As far as I know, the only way to get a symbol into the symbol table in Swift is via a local struct definition. Something like this:
func foo(b: Bool) -> Void {
struct Local { static var xyzzy = 0; };
println(Local.xyzzy);
}
That works, but it's a bit of extra typing, and can't be done inline in an expression (not that that'll matter if we can't make a MAGIC macro in Swift anyway), and I'm worried that the Swift compiler might optimize it away.
So, there are three questions here, all about how to make Swift do things that Swift doesn't want to do: Macros, attributes, and creating symbols that are resistant to compiler optimization.
I'm aware of #asmname but I don't think it helps me since I can already deal with demangling on my own.
I'm aware that Swift has "generics", but they seem to be closer to Java generics than to C++ templates; I don't think they can be used as a substitute for macros in this particular case.
I'm aware that the code for the Swift compiler is now open-source; I've skimmed bits of it in vain; but I can't read through all of it looking for tricks that might not even be there.
Here is the answer to your question about preprocessor (and macros).
Swift has no preprocessor, so spelling the magic as elegantly as MAGIC(someidentifier) is impossible. I don't want it to be too ugly, though.
The Swift project has a preprocessor (but, AFAIK, it is not distributed with Swift's binary).
From swift-users mailing list:
What are .swift.gyb files?
It’s a preprocessor the Swift
team wrote so that when they needed to build, say, ten nearly-identical
variants of Int, they wouldn’t have to literally copy and paste the same
code ten times. If you open one of those files, you’ll see that they’re
mainly Swift code, but with some lines of code intermixed that are written in Python.
It is not as beautiful as C macros, but, IMHO, is more powerful.
You can see the available commands by running ./swift/utils/gyb --help after cloning the Swift git repo.
$ swift/utils/gyb --help
usage, etc (TL;DR)...
Example template:
- Hello -
%{
     x = 42
     def succ(a):
         return a+1
}%
I can assure you that ${x} < ${succ(x)}
% if int(y) > 7:
%    for i in range(3):
y is greater than seven!
%    end
% else:
y is less than or equal to seven
% end
- The End. -
When run with "gyb -Dy=9", the output is
- Hello -
I can assure you that 42 < 43
y is greater than seven!
y is greater than seven!
y is greater than seven!
- The End. -
My example of GYB usage is available as a GitHub Gist.
For more complex examples look for *.swift.gyb files in #apple/swift/stdlib/public/core.

Breaking when a method returns null in the Eclipse debugger

I'm working on an expression evaluator. There is an evaluate() function which is called many times depending on the complexity of the expression processed.
I need to break and investigate when this method returns null. There are many paths and return statements.
It is possible to break on exit method event but I can't find how to put a condition about the value returned.
I got stuck in that frustration too. One can inspect (and write conditions) on named variables, but not on something unnamed like a return value. Here are some ideas (for whoever might be interested):
One could include something like evaluate() == null in the breakpoint's condition. Tests performed (Eclipse 4.4) show that in such a case, the function will be performed again for the breakpoint purposes, but this time with the breakpoint disabled. So you will avoid a stack overflow situation, at least. Whether this would be useful, depends on the nature of the function under consideration - will it return the same value at breakpoint time as at run time? (Some s[a|i]mple code to test:)
class TestBreakpoint {
int counter = 0;
boolean eval() { /* <== breakpoint here, [x]on exit, [x]condition: eval()==false */
System.out.println("Iteration " + ++counter);
return true;
}
public static void main(String[] args) {
TestBreakpoint app = new TestBreakpoint();
System.out.println("STARTED");
app.eval();
System.out.println("STOPPED");
}
}
// RESULTS:
// Normal run: shows 1 iteration of eval()
// Debug run: shows 2 iterations of eval(), no stack overflow, no stop on breakpoint
Another way to make it easier (to potentially do debugging in future) would be to have coding conventions (or personal coding style) that require one to declare a local variable that is set inside the function, and returned only once at the end. E.g.:
public MyType evaluate() {
MyType result = null;
if (conditionA) result = new MyType('A');
else if (conditionB) result = new MyType ('B');
return result;
}
Then you can at least do an exit breakpoint with a condition like result == null. However, I agree that this is unnecessarily verbose for simple functions, is a bit contrary to flow that the language allows, and can only be enforced manually. (Personally, I do use this convention sometimes for more complex functions (the name result 'reserved' just for this use), where it may make things clearer, but not for simple functions. But it's difficult to draw the line; just this morning had to step through a simple function to see which of 3 possible cases was the one fired. For today's complex systems, one wants to avoid stepping.)
Barring the above, you would need to modify your code on a case by case basis as in the previous point for the single function to assign your return value to some variable, which you can test. If some work policy disallows you to make such non-functional changes, one is quite stuck... It is of course also possible that such a rewrite could result in a bug inadvertently being resolved, if the original code was a bit convoluted, so beware of reverting to the original after debugging, only to find that the bug is now back.
You didn't say what language you were working in. If it's Java or C++ you can set a condition on a Method (or Function) breakpoint using the breakpoint properties. Here are images showing both cases.
In the Java example you would uncheck Entry and put a check in Exit.
Java Method Breakpoint Properties Dialog
C++ Function Breakpoint Properties Dialog
This is not yet supported by the Eclipse debugger; it has been filed as an enhancement request. I'd appreciate it if you voted for it.
https://bugs.eclipse.org/bugs/show_bug.cgi?id=425744

Best way to add a "forCount" control structure to Objective-C?

Adam Ko has provided a magnificent solution to this question, thanks Adam Ko.
BTW if, like me, you love the C preprocessor (the thing that handles #defines), you may not be aware there is a handy thing in Xcode: right click on the body of one of your open source files, go down near the bottom .. "Preprocess". It actually runs the preprocessor, showing you the overall "real deal" of what is going to be compiled. It's great!
This question is a matter of style and code clarity. Consider it similar to questions about subtle naming issues, or the best choice (more readable, more maintainable) among available idioms.
As a matter of course, one uses loops like this:
for(NSUInteger _i=0; _i<20; ++_i)
{
.. do this 20 times ..
}
To be clear, the effect is to do something N times. (You are not using the index in the body.)
I want to signal clearly for the reader that this is a count-based loop -- ie, the index is irrelevant and algorithmically we are doing something N times.
Hence I want a clean way to do a body N times, with no imperial entanglements or romantic commitments. You could make a macro like this:
#define forCount(N) for(NSUInteger __neverused=0; __neverused<N; ++__neverused)
and that works. Hence,
forCount(20)
{
.. do this 20 times ..
}
However, conceivably the "hidden" variable used there could cause trouble if it collided with something in the future. (Perhaps if you nested the control structure in question, among other problems.)
To be clear efficiency, etc., is not the issue here. There are already a few different control structures (while, do, etc etc) that are actually of course exactly the same thing, but which exist only as a matter of style and to indicate clearly to the reader the intended algorithmic meaning of the code passage in question. "forCount" is another such needed control structure, because "index-irrelevant" count loops are completely basic in any algorithmic programming.
Does anyone know the really, really, REALLY cool solution to this? The #define mentioned is just not satisfying, and you've thrown in a variable name that inevitably someone will step on.
Thanks!
Later...
A couple of people have asked essentially "But why do it?"
Look at the following two code examples:
for ( spaceship = 3; spaceship < 8; ++spaceship )
{
beginWarpEffectForShip( spaceship )
}
forCount( 25 )
{
addARandomComet
}
Of course the effect is utterly and dramatically different for the reader.
After all, there are already numerous (totally identical) control structures in C, where the only difference is style: that is to say, conveying content to the reader.
We all use "non-index-relative" loops ("do something 5 times") every time we touch a keyboard, it's as natural as pie.
So, the #define is an OKish solution, is there a better way to do it? Cheers
You could use blocks for that. For instance,
void forCount(NSUInteger count, void(^block)()) {
for (NSUInteger i = 0; i < count; i++) block();
}
and it could be used like:
forCount(5, ^{
// Do something in the outer loop
forCount(10, ^{
// Do something in the inner loop
});
});
Be warned that if you need to write to variables declared outside the blocks you need to specify the __block storage qualifier.
A better way is to do this to allow nested forCount structure -
#define $_TOKENPASTE(x,y) x##y
#define $$TOKENPASTE(x,y) $_TOKENPASTE(x, y)
#define $itr $$TOKENPASTE($_itr_,__LINE__)
#define forCount(N) for (NSUInteger $itr=0; $itr<N; ++$itr)
Then you can use it like this
forCount(5)
{
forCount(10)
{
printf("Hello, World!\n");
}
}
Edit:
The problem you suggested in your comment can be fixed easily. Simply change the above macro to become
#define $_TOKENPASTE(x,y) x##y
#define $$TOKENPASTE(x,y) $_TOKENPASTE(x, y)
#define UVAR(var) $$TOKENPASTE(var,__LINE__)
#define forCount(N) for (NSUInteger UVAR($itr)=0, UVAR($max)=(NSUInteger)(N); \
UVAR($itr)<UVAR($max); ++UVAR($itr))
What it does is that it reads the value of the expression you give in the parameter of forCount, and use the value to iterate, that way you avoid multiple evaluations.
One possibility would be to use dispatch_apply():
dispatch_apply(25, myQueue, ^(size_t iterationNumber) {
... do stuff ...
});
Note that this supports both concurrent and synchronous execution, depending on whether myQueue is one of the concurrent queues or a serial queue of your own creation.
To be honest, I think you're over addressing a non-issue.
If you want to iterate over an entire collection, use the Objective-C 2 style iterators; if you only want to iterate a finite number of times, just use a standard for loop. The memory space you lose from an otherwise unused integer is meaningless.
Wrapping such standard approaches up just feels unnecessary and counter-intuitive.
No, there is no cooler solution (not with Apple's GCC version anyways). The level at which C works requires you to explicitly have counters for every task that requires counting, and the language defines no way to create new control structures.
Other compilers/versions of GCC have a __COUNTER__ macro that I suppose could somehow be used with preprocessor pasting to create unique identifiers, but I couldn't figure a way to use it to declare identifiers in a useful way.
What's so unclean about declaring a variable in the for and never using it in its body anyways?
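For what it's worth, on compilers that do provide __COUNTER__ (a GCC/Clang extension, not standard C), one way it might be used is to generate the identifier once and hand it to a helper macro, so the same name can appear several times in the loop header. This is only a sketch in plain C: size_t stands in for NSUInteger, and every name here (FC_PASTE, FC_LOOP, fc_itr_, three, count_nested) is made up for illustration.

```c
#include <stddef.h>

#define FC_PASTE_(a, b) a##b
#define FC_PASTE(a, b) FC_PASTE_(a, b)

/* Helper that receives the already-generated identifier, so the loop
   header can mention it more than once. The bound is copied into a
   second variable so that n is evaluated only once. */
#define FC_LOOP(n, id) \
    for (size_t id = 0, FC_PASTE(id, _max) = (size_t)(n); \
         id < FC_PASTE(id, _max); ++id)

/* __COUNTER__ is expanded once per forCount use, giving each loop
   its own variable name (fc_itr_0, fc_itr_1, ...). */
#define forCount(n) FC_LOOP(n, FC_PASTE(fc_itr_, __COUNTER__))

/* Counts how often the loop bound expression is evaluated. */
static int bound_evaluations = 0;

static size_t three(void)
{
    ++bound_evaluations;
    return 3;
}

/* Runs a nested pair of count loops and returns the number of
   times the inner body executed. */
static int count_nested(void)
{
    int total = 0;
    forCount(three()) {
        forCount(4) {
            ++total;
        }
    }
    return total;
}
```

Because the name comes from a counter rather than __LINE__, two forCount loops on the same source line also get distinct variables.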
FYI You could combine the below code with a define, or write something for the reader to the effect of:
//Assign an integer variable to 0.
int j = 0;
do{
//do something as many times as specified in the while part
}while(++j < 20);
Why not take the name of the variable in the macro? Something like this:
#define forCount(N, name) for(NSUInteger name = 0; name < (N); name++)
Then if you wanted to nest your control structures:
forCount(20, i) {
// Do some work.
forCount(100, j) {
// Do more work.
}
}

What is the difference between forward declaration and forward reference?

What is the difference between forward declaration and forward reference?
Forward declaration is, in my head, when you declare a function that isn't yet implemented, but is this incorrect? Do you have to look at the specified situation for either declaring a case "forward reference" or "forward declaration"?
A forward declaration is the declaration of a method or variable before you implement and use it. The purpose of forward declarations is to save compilation time.
The forward declaration of a variable causes storage space to be set aside, so you can later set the value of that variable.
The forward declaration of a function is also called a "function prototype"; it is a declaration statement that tells the compiler the function's return type, its name, and the types of its parameters. Compilers for languages such as C/C++ and Pascal store declared symbols (which include functions) in a lookup table and reference them as they come across them in your code. These compilers read your code sequentially, that is, top to bottom, so if you don't forward declare, the compiler encounters a symbol that it can't find in the lookup table and raises an error because it doesn't know what the function is.
The forward declaration is a hint to the compiler that you have defined (filled out the implementation of) the function elsewhere.
For example:
int first(int x); // forward declaration of first
...
int first(int x) {
if (x == 0) return 1;
else return 2;
}
But, you ask, why don't we just have the compiler make two passes on every source file: the first one to index all the symbols inside, and the second to parse the references and look them up? According to Dan Story:
When C was created in 1972, computing resources were much more scarce
and at a high premium -- the memory required to store a complex
program's entire symbolic table at once simply wasn't available in
most systems. Fixed storage was also expensive, and extremely slow, so
ideas like virtual memory or storing parts of the symbolic table on
disk simply wouldn't have allowed compilation in a reasonable
timeframe... When you're dealing with magnetic tape where seek times
were measured in seconds and read throughput was measured in bytes per
second (not kilobytes or megabytes), that was pretty meaningful.
C++, while created almost 17 years later, was defined as a superset
of C, and therefore had to use the same mechanism.
By the time Java rolled around in 1995, average computers had enough
memory that holding a symbolic table, even for a complex project, was
no longer a substantial burden. And Java wasn't designed to be
backwards-compatible with C, so it had no need to adopt a legacy
mechanism. C# was similarly unencumbered.
As a result, their designers chose to shift the burden of
compartmentalizing symbolic declaration back off the programmer and
put it on the computer again, since its cost in proportion to the
total effort of compilation was minimal.
In Java and C#, identifiers are recognized automatically from source files and read directly from dynamic library symbols. In these languages, header files are not needed for the same reason.
A forward reference is the opposite. It refers to the use of an entity before its declaration. For example:
int first(int x) {
    if (x == 0) return 1;
    return second(x-1); // forward reference to second
}
int second(int x) {
    if (x == 0) return 0;
    return first(x-1);
}
Note that "forward reference" is used sometimes, though less often, as a synonym for "forward declaration."
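In C specifically, a forward reference like the one to second only compiles cleanly if a forward declaration precedes it. Here is a minimal sketch combining the two ideas, reusing the first and second functions from the examples; the extra prototype line is the only addition.

```c
/* Forward declaration: tells the compiler the signature of second
   before its definition appears further down. */
int second(int x);

int first(int x)
{
    if (x == 0) return 1;
    return second(x - 1);  /* forward reference, resolved by the prototype */
}

int second(int x)
{
    if (x == 0) return 0;
    return first(x - 1);   /* first is already defined above */
}
```

Without the prototype, a C99 or later compiler is expected to diagnose the call to second as a use of an undeclared function.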
From Wikipedia:
Forward Declaration
Declaration of a variable or function that is not defined yet. Its definition can be seen later on.
Forward Reference
Similar to Forward Declaration but where the variable or function appears first the definition is also in place.
Forward declarations are used to allow single-pass compilation of a language (C, Pascal).
If forward references are allowed without forward declaration (Java, C#), a two-pass compiler is required.