Is it possible to tell libFuzzer to ignore certain code?

I use libFuzzer and it's been a great experience so far.
My code under fuzzing is full of branches like this:
bool fuzzingThisFunc() {
    if (!checkSomething()) {
        fmt::printf("error log");
        return false;
    }
    ...
    return true;
}
Here fmt::printf is a function from a third-party library (http://github.com/fmtlib/fmt).
I feel like after a few iterations the fuzzer enters this function and effectively starts fuzzing all the branches inside it (as if it were using DFS instead of BFS).
I would like to add some barrier, or tell the fuzzer not to insert instrumentation into third-party libraries, so that it tries to cover only my code.
Is it possible?

libFuzzer instrumentation is applied per source file. One option is to build third-party libraries without the -fsanitize=fuzzer flag.
Check the CFLAGS passed to the configure step of these libraries and remove the flag there.
Header-only libraries typically consist of templates, which is the case for fmt. Templates are instantiated at compile time inside the translation units that use them, so they get instrumented along with your own code, and I see no simple way around that. You could find all the template arguments in use, write thunk code that instantiates the templates with those arguments, exclude that code from instrumentation, and change your calling code to use the instantiated functions, but this is very difficult.
When the code you don't want instrumented only does logging, or other work that can be skipped without changing your application's behaviour, you can compile it conditionally. The libFuzzer docs suggest using the FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION define to mark code that you don't want built for fuzzing. In the fmt case this would be:
bool fuzzingThisFunc() {
    if (!checkSomething()) {
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
        fmt::printf("error log");
#endif
        return false;
    }
    ...
    return true;
}
or modifying library code:
template <typename S, typename... Args,
          FMT_ENABLE_IF(detail::is_string<S>::value)>
inline int printf(const S& format_str, const Args&... args) {
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
    using context = basic_printf_context_t<char_t<S>>;
    return vprintf(to_string_view(format_str),
                   make_format_args<context>(args...));
#else
    return 0; // C printf returns the number of characters written; I assume the same for fmt.
#endif
}
The second approach is easier to code (one modification per excluded function), but you have to reapply the modification every time you pick up a new fmt version.
With the first approach you have to modify every call site of each excluded function.
In both cases, add -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION to the CFLAGS (CXXFLAGS for C++) used to build the fuzzing target.
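A possible middle ground, sketched here in C with fprintf standing in for fmt::printf: hide the #ifndef inside one project-local logging macro, so the call sites stay unchanged and the library itself is untouched. LOG_ERROR, check_something and fuzzing_this_func are hypothetical names made up for this illustration, not part of libFuzzer or fmt.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical project-local wrapper. In fuzzing builds (compiled with
   -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) the logging call expands to
   nothing, so the formatting code is never reached by the fuzzer. */
#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
#  define LOG_ERROR(...) ((void)0)
#else
#  define LOG_ERROR(...) fprintf(stderr, __VA_ARGS__)
#endif

/* Stand-in for the real check (assumption: negative input is invalid). */
static bool check_something(int value) {
    return value >= 0;
}

bool fuzzing_this_func(int value) {
    if (!check_something(value)) {
        LOG_ERROR("error log: bad value %d\n", value);
        return false;
    }
    return true;
}
```

The return values are the same in both build modes; only the logging side effect is dropped, which is the property the conditional compilation relies on.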

Related

Speed-critical section in gcc?

I'm already using -O3, so I don't think I can do much better in general, but the generated assembly is still a tight fit for the timing requirements. Is there a way to, for example, tell avr-gcc to keep non-speed-critical code out of a speed-critical section?
if (READY)
{
    ACKNOWLEDGE();
    SETUP_PART1();
    SETUP_PART2();
    //start speed-critical section
    SYNC_TO_TIMER();
    MINIMAL_SPEED_CRITICAL_CODE();
    //end speed-critical section
    CLEANUP();
}
A tedious reading of the -O3-optimized assembly listing (.lss file) shows that SETUP_PART2(); has been reordered to come between SYNC_TO_TIMER(); and MINIMAL_SPEED_CRITICAL_CODE();. That's a problem because it adds time spent in the speed-critical section that doesn't need to exist.
I don't want to make the critical section itself any longer than it needs to be, so I'm hesitant to relax the optimization. (the same action that causes this problem may also be what makes the final version fit inside the time available) So how can I tell it that "this behavior" needs to stay pure and uncluttered, but still optimize it fully?
I've used inline assembly before, so I'm sure I could figure out again how to communicate between that and C without completely tying up or clobbering registers, etc. But I'd much rather stick with 100% C if I can, and let the compiler figure out how not to run over itself.
Not quite the direct answer that I was looking for, but I found something that works. So it's technically an XY problem, but I'll still take a direct answer to the question as stated.
Here's what I ended up doing:
1. Separate the speed-critical section into its own function, and prevent it from inlining:
void __attribute__((noinline)) _fast_function(uint8_t arg1, uint8_t arg2)
{
    //start speed-critical section
    SYNC_TO_TIMER();
    MINIMAL_SPEED_CRITICAL_CODE(arg1, arg2);
    //end speed-critical section
}
void normal_function(void)
{
    if (READY)
    {
        ACKNOWLEDGE();
        SETUP_PART1();
        SETUP_PART2(); //determines arg1, arg2
        _fast_function(arg1, arg2);
        CLEANUP();
    }
}
That keeps the speed-critical stuff together, and by itself. It also adds some overhead, but that happens OUTSIDE the fast part, where I can afford it.
2. Manually unroll some loops:
The original code was something like this:
uint8_t data[3];
uint8_t byte_num = 3;
while(byte_num)
{
    byte_num--;
    uint8_t bit_mask = 0b10000000;
    while(bit_mask)
    {
        if(data[byte_num] & bit_mask) WEIRD_OUTPUT_1();
        else WEIRD_OUTPUT_0();
        bit_mask >>= 1;
    }
}
The new code is like this:
#define WEIRD_OUTPUT(byte,bit) \
    do \
    { \
        if((byte) & (bit)) WEIRD_OUTPUT_1(); \
        else WEIRD_OUTPUT_0(); \
    } while(0)
uint8_t data[3];
WEIRD_OUTPUT(data[2],0b10000000);
WEIRD_OUTPUT(data[2],0b01000000);
WEIRD_OUTPUT(data[2],0b00100000);
WEIRD_OUTPUT(data[2],0b00010000);
WEIRD_OUTPUT(data[2],0b00001000);
WEIRD_OUTPUT(data[2],0b00000100);
WEIRD_OUTPUT(data[2],0b00000010);
WEIRD_OUTPUT(data[2],0b00000001);
WEIRD_OUTPUT(data[1],0b10000000);
WEIRD_OUTPUT(data[1],0b01000000);
WEIRD_OUTPUT(data[1],0b00100000);
WEIRD_OUTPUT(data[1],0b00010000);
WEIRD_OUTPUT(data[1],0b00001000);
WEIRD_OUTPUT(data[1],0b00000100);
WEIRD_OUTPUT(data[1],0b00000010);
WEIRD_OUTPUT(data[1],0b00000001);
WEIRD_OUTPUT(data[0],0b10000000);
WEIRD_OUTPUT(data[0],0b01000000);
WEIRD_OUTPUT(data[0],0b00100000);
WEIRD_OUTPUT(data[0],0b00010000);
WEIRD_OUTPUT(data[0],0b00001000);
WEIRD_OUTPUT(data[0],0b00000100);
WEIRD_OUTPUT(data[0],0b00000010);
WEIRD_OUTPUT(data[0],0b00000001);
In addition to dropping the loop code itself, this also replaces an array index with a direct access, and a variable with a constant, each of which offers its own speedup.
And it still barely fits in the time available. But it does fit! :)
You can try to use a memory barrier to prevent ordering code across it:
__asm volatile ("" ::: "memory");
The compiler will not move volatile accesses or memory accesses across such a barrier. It might however move other stuff across it like arithmetic that does not involve memory access.
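As a rough sketch of where such barriers would go (GCC/Clang inline-asm syntax): timer_flag, setup_result, output_latch and run_critical are made-up stand-ins for the question's macros, not real AVR registers, and the doubling is just placeholder setup work.

```c
#include <stdint.h>

/* Compiler-only barrier: emits no instructions, but the compiler may not
   move memory accesses across it. */
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

/* Hypothetical stand-ins for hardware state used by the question's macros. */
static volatile uint8_t timer_flag = 1;  /* pretend the timer already fired */
static uint8_t setup_result;
static uint8_t output_latch;

uint8_t run_critical(uint8_t input) {
    /* Setup work: the store to setup_result must be done before the barrier. */
    setup_result = (uint8_t)(input * 2u);
    COMPILER_BARRIER();

    /* Speed-critical part: wait for the timer, then do the minimal work. */
    while (!timer_flag) { }
    output_latch = setup_result;

    COMPILER_BARRIER();
    return output_latch;
}
```

Only the memory accesses are pinned; pure register arithmetic can still drift across the barrier, which is the limitation described above.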
If the compiler is reordering code, then it's a valid compilation w.r.t. the C language standard, which means that you'll have a hard time avoiding it with pure C code hacks.
If you are after speed, then consider writing WEIRD_OUTPUT(data[2],0b10000000); etc. as inline asm, or writing the whole function in assembly. This gives you definite control over the timing (not considering IRQs), whereas the C language standard does not make any statements on program timing whatsoever.

Is there a way to declare an inline function in Swift?

I'm very new to the Swift language.
I wanted to declare an inline function just like in C++
so my func declaration looks like this:
func MyFunction(param: Int) -> Int {
...
}
and I want to do something like this:
inline func MyFunction(param: Int) -> Int {
...
}
I searched the web but didn't find anything relevant. Maybe there is no inline keyword, but is there another way to inline a function in Swift?
Swift 1.2 will include the @inline attribute, with never and __always as parameters. For more info, see here.
As stated before, you rarely need to declare a function explicitly as @inline(__always), because Swift is fairly smart about when to inline a function. Preventing a function from being inlined, however, can be necessary in some code.
Swift 4.2 added the @inlinable attribute, which helps ensure that library/framework code can be inlined by clients that link to your library. Make sure you read up on it, as there are a couple of gotchas that might make it unusable for you. It's also only for functions/methods declared public, as it's meant for libraries that want to expose inlinable code.
All credit to the answer, just summarizing the information from the link.
To make a function inline, just add @inline(__always) before the function:
@inline(__always) func myFunction() {
}
However, it's worth considering and learning about the different possibilities. There are three possible ways to inline:
sometimes - the compiler decides whether to inline the function. This is the default behavior, so you don't have to do anything! The Swift compiler might automatically inline functions as an optimization.
always - asks the compiler to always inline the function. Achieve this behavior by adding @inline(__always) before the function. Use "if your function is rather small and you would prefer your app ran faster."
never - makes sure the function is never inlined. This can be achieved by adding @inline(never) before the function. Use "if your function is quite long and you want to avoid increasing your code segment size."
I ran into an issue where I needed the @inlinable and @usableFromInline attributes, which were introduced in Swift 4.2, so I would like to share my experience.
Let me get straight to the issue: our codebase has an Analytics Facade module that links other modules.
App Target -> Analytics Facade module -> Reporting module X.
The Analytics Facade module has a function called report(_ rawReport: EventSerializable) that fires the reporting calls. This function uses an instance from reporting module X to send the reporting calls for that specific module.
The thing is, calling report(_ rawReport: EventSerializable) many times to send the reporting calls once users launch the app creates unavoidable overhead, which caused a lot of crashes for us.
Moreover, these crashes are not easy to reproduce if the Optimisation level is set to None in debug mode. In my case I was only able to reproduce them when I set the Optimisation level to Fastest, Smallest or higher.
The solution was to use @inlinable and @usableFromInline.
The @inlinable attribute exports the body of a function as part of a module's interface, making it available to the optimiser when referenced from other modules.
The @usableFromInline attribute marks an internal declaration as being part of the binary interface of a module, allowing it to be used from @inlinable code without exposing it as part of the module's source interface.
Across module boundaries, runtime generics introduce unavoidable overhead, as reified type metadata must be passed between functions, and various indirect access patterns must be used to manipulate values of generic type. For most applications, this overhead is negligible compared to the actual work performed by the code itself.
A client binary built against this framework can call those generic functions and enjoy a possible performance improvement when built with optimisations enabled, due to the elimination of abstraction overhead.
Sample Code:
@inlinable public func allEqual<T>(_ seq: T) -> Bool
    where T : Sequence, T.Element : Equatable {
    var iter = seq.makeIterator()
    guard let first = iter.next() else { return true }
    func rec(_ iter: inout T.Iterator) -> Bool {
        guard let next = iter.next() else { return true }
        return next == first && rec(&iter)
    }
    return rec(&iter)
}
More Info - Cross-module inlining and specialization

Breaking when a method returns null in the Eclipse debugger

I'm working on an expression evaluator. There is an evaluate() function which is called many times depending on the complexity of the expression processed.
I need to break and investigate when this method returns null. There are many paths and return statements.
It is possible to break on exit method event but I can't find how to put a condition about the value returned.
I got stuck in that frustration too. One can inspect (and write conditions) on named variables, but not on something unnamed like a return value. Here are some ideas (for whoever might be interested):
One could include something like evaluate() == null in the breakpoint's condition. Tests performed (Eclipse 4.4) show that in such a case, the function will be performed again for the breakpoint purposes, but this time with the breakpoint disabled. So you will avoid a stack overflow situation, at least. Whether this would be useful, depends on the nature of the function under consideration - will it return the same value at breakpoint time as at run time? (Some s[a|i]mple code to test:)
class TestBreakpoint {
    int counter = 0;
    boolean eval() { /* <== breakpoint here, [x]on exit, [x]condition: eval()==false */
        System.out.println("Iteration " + ++counter);
        return true;
    }
    public static void main(String[] args) {
        TestBreakpoint app = new TestBreakpoint();
        System.out.println("STARTED");
        app.eval();
        System.out.println("STOPPED");
    }
}
// RESULTS:
// Normal run: shows 1 iteration of eval()
// Debug run: shows 2 iterations of eval(), no stack overflow, no stop on breakpoint
Another way to make it easier (to potentially do debugging in future) would be to have coding conventions (or personal coding style) that require one to declare a local variable that is set inside the function, and returned only once at the end. E.g.:
public MyType evaluate() {
    MyType result = null;
    if (conditionA) result = new MyType('A');
    else if (conditionB) result = new MyType('B');
    return result;
}
Then you can at least do an exit breakpoint with a condition like result == null. However, I agree that this is unnecessarily verbose for simple functions, is a bit contrary to flow that the language allows, and can only be enforced manually. (Personally, I do use this convention sometimes for more complex functions (the name result 'reserved' just for this use), where it may make things clearer, but not for simple functions. But it's difficult to draw the line; just this morning had to step through a simple function to see which of 3 possible cases was the one fired. For today's complex systems, one wants to avoid stepping.)
Barring the above, you would need to modify your code on a case by case basis as in the previous point for the single function to assign your return value to some variable, which you can test. If some work policy disallows you to make such non-functional changes, one is quite stuck... It is of course also possible that such a rewrite could result in a bug inadvertently being resolved, if the original code was a bit convoluted, so beware of reverting to the original after debugging, only to find that the bug is now back.
You didn't say what language you were working in. If it's Java or C++ you can set a condition on a Method (or Function) breakpoint using the breakpoint properties. Here are images showing both cases.
In the Java example you would uncheck Entry and put a check in Exit.
[Image: Java Method Breakpoint Properties dialog]
[Image: C++ Function Breakpoint Properties dialog]
This is not yet supported by the Eclipse debugger; it has been filed as an enhancement request. I'd appreciate it if you voted for it.
https://bugs.eclipse.org/bugs/show_bug.cgi?id=425744

Dynamic arg types for a python function when embedding

I am adding to Exim an embedded python interpreter. I have copied the embedded perl interface and expect python to work the same as the long-since-coded embedded perl interpreter. The goal is to allow the sysadmin to do complex functions in a powerful scripting language (i.e. python) instead of trying to use exim's standard ACL commands because it can get quite complex to do relatively simple things using the exim ACL language.
My current code as of the time of this writing is located at http://git.exim.org/users/tlyons/exim.git/blob/9b2c5e1427d3861a2154bba04ac9b1f2420908f7:/src/src/python.c . It is working properly in that it can import the sysadmin's custom python code, call functions in it, and handle the returned values (simple return types only: int, float, or string). However, it does not yet handle values that are passed to a python function, which is where my question begins.
Python seems to require that any args I pass to the embedded python function be explicitly cast to one of int,long,double,float or string using the c api. The problem is the sysadmin can put anything in that embedded python code and in the c side of things in exim, I won't know what those variable types are. I know that python is dynamically typed so I was hoping to maintain that compliance when passing values to the embedded code. But it's not working that way in my testing.
Using the following basic super-simple python code:
def dumb_add(a,b):
    return a+b
...and the calling code from my exim ACL language is:
${python {dumb_add}{800}{100}}
In my c code below, reference counting is omitted for brevity. count is the number of args I'm passing:
pArgs = PyTuple_New(count);
for (i=0; i<count; ++i)
{
    pValue = PyString_FromString((const char *)arg[i]);
    PyTuple_SetItem(pArgs, i, pValue);
}
pReturn = PyObject_CallObject(pFunc, pArgs);
Yes, **arg is a pointer to an array of strings (two strings in this simple case). The problem is that the two values are treated as strings in the python code, so the result of that c code executing the embedded python is:
${python {dumb_add}{800}{100}}
800100
If I change the python to be:
def dumb_add(a,b):
    return int(a)+int(b)
Then the result of that c code executing the python code is as expected:
${python {dumb_add}{800}{100}}
900
My goal is that I don't want to force a python user to manually cast all of the numeric parameters they pass to an embedded python function. Instead of PyString_FromString(), if there was a PyDynamicType_FromString(), I would be ecstatic. Exim's embedded perl parses the args and does the casting automatically, I was hoping for the same from the embedded python. Can anybody suggest if python can do this arg parsing to provide the dynamic typing I was expecting?
Or if I want to maintain that dynamic typing, is my only option going to be for me to parse each arg and guess at the type to cast it to? I was really really REALLY hoping to avoid that approach. If it comes to that, I may just document "All parameters passed are strings, so if you are actually trying to pass numbers, you must cast all parameters with int(), float(), double(), or long()". However, and there is always a comma after however, I feel that approach will sour strong python coders on my implementation. I want to avoid that too.
Any and all suggestions are appreciated, aside from "make your app into a python module".
The way I ended up solving this was by finding out how many args the function expected, and exiting with an error if the number of args passed to the function didn't match. Rather than trying to synthesize missing args or simply omitting extra args, for my use case I felt it was best to enforce matching arg counts.
The args are passed to this function as an unsigned char ** arg:
int count = 0;
/* Identify and call appropriate function */
pFunc = PyObject_GetAttrString(pModule, (const char *) name);
if (pFunc && PyCallable_Check(pFunc))
{
    PyCodeObject *pFuncCode = (PyCodeObject *)PyFunction_GET_CODE(pFunc);
    /* Should not fail if pFunc succeeded, but check to be thorough */
    if (!pFuncCode)
    {
        *errstrp = string_sprintf("Can't check function arg count for %s",
                                  name);
        return NULL;
    }
    while(arg[count])
        count++;
    /* Sanity checking: Calling a python object requires to state number of
       vars being passed, bail if it doesn't match function declaration. */
    if (count != pFuncCode->co_argcount)
    {
        *errstrp = string_sprintf("Expected %d args to %s, was passed %d",
                                  pFuncCode->co_argcount, name, count);
        return NULL;
    }
    ...
}
The string_sprintf is a function within the Exim source code which also handles memory allocation, making life easy for me.
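For the other option raised in the question (parsing each arg and guessing its type), here is a minimal sketch in plain C of what the guessing step could look like. arg_kind and classify_arg are hypothetical names, not part of Exim or the Python C API; the caller would then pick the matching constructor itself (for example PyInt_FromLong, PyFloat_FromDouble or PyString_FromString in the Python 2 API used above).

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical classification of a textual argument. */
typedef enum { ARG_INT, ARG_FLOAT, ARG_STRING } arg_kind;

/* Guess how a string argument should be converted:
   - the whole string parses as a base-10 integer  -> ARG_INT
   - otherwise the whole string parses as a double -> ARG_FLOAT
   - anything else (including "" and NULL)         -> ARG_STRING */
arg_kind classify_arg(const char *s) {
    char *end;

    if (s == NULL || *s == '\0')
        return ARG_STRING;

    errno = 0;
    (void)strtol(s, &end, 10);
    if (errno == 0 && end != s && *end == '\0')
        return ARG_INT;

    errno = 0;
    (void)strtod(s, &end);
    if (errno == 0 && end != s && *end == '\0')
        return ARG_FLOAT;

    return ARG_STRING;
}
```

Note that strtol and strtod accept leading whitespace, and strtod also accepts spellings such as "inf" and "nan", so a stricter pre-check may be wanted if those should stay strings. Guessing like this also means a value such as a numeric-looking message ID silently becomes a number, which is the trade-off the question was worried about.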

Avoiding messiness with debug stuff in code

When I write something, half the effort tends to go into adding clear and concise debug output, or functionality that can be enabled/disabled when that something needs debugging.
An example of debug functionality is a downloader class where I can turn on a #define which makes it "pretend" to download the file and simply hand me back the one I have already. That way I can test to see what happens when a user downloads a file, without having to wait for the network to physically grab the file every single time. This is great functionality to have, but the code gets messier with the #ifdefs.
I eventually end up with a bunch of #defines like
// #define DEBUG_FOOMODULE_FOO
// #define DEBUG_BARMODULE_THINGAMAJIG
// ...
which are uncommented for the stuff I want to look at. The code itself turns out something like
- (void)something
{
#ifdef DEBUG_FOOMODULE_FOO
    DebugLog(@"something [x = %@]", x);
#endif
    // ...
#ifdef DEBUG_FOOMODULE_MOO
    // etc
#endif
}
This works great for writing / maintaining the code, but it does nothing for the appearance of the code.
How do people write effortless on-the-fly long-term debug "stuff" anyway?
Note: I'm not only talking about NSLogging here... I'm also talking about stuff like the pretend-download above.
I read several libraries before writing my own and saw two approaches: macro + C functions (NSLogger) or macro + Singleton (GTMLogger, Cocoa Lumberjack).
I wrote my naive implementation here using macro + singleton. I do this during runtime:
[Logger singleton].logThreshold = kDebug;
trace(@"hi %@", @"world"); // won't show
debug(@"hi %@", @"world");
You could do the same for packages instead of log levels. If I want it gone, I change the #defines. Here is the code involved:
#define trace(args...) [[Logger singleton] debugWithLevel:kTrace line:__LINE__ funcName:__PRETTY_FUNCTION__ message:args];
if (level>=logThreshold){
// ...
}
If you want something more sophisticated look into Lumberjack, it has a register class facility to toggle logging for some classes.
Having two functions and then selecting between them, either at run time or at compile time, seems like a clean approach to me. This makes it possible to have one download.c and one download_debug.c file with the same functions but different implementations. Then link with the appropriate one depending on whether you are building with -DDEBUG or not.
Otherwise, using function pointers works as well for run-time selecting functions.
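A minimal sketch of the function-pointer variant, in C. All names here (download_fn, download_real, download_fake, set_debug_download, download) are hypothetical, and the "real" implementation is only a placeholder that reports failure instead of touching the network.

```c
#include <stddef.h>
#include <string.h>

/* Common signature for both implementations: returns 0 on success. */
typedef int (*download_fn)(const char *url, char *buf, size_t buflen);

/* Placeholder for the real network download (assumption: not implemented
   in this sketch, so it simply reports failure). */
static int download_real(const char *url, char *buf, size_t buflen) {
    (void)url; (void)buf; (void)buflen;
    return -1;
}

/* Debug stand-in: "pretends" to download and hands back canned data. */
static int download_fake(const char *url, char *buf, size_t buflen) {
    static const char canned[] = "cached payload";
    (void)url;
    if (buflen < sizeof canned)
        return -1;
    memcpy(buf, canned, sizeof canned);
    return 0;
}

/* The rest of the program only ever calls download(). */
static download_fn current_download = download_real;

void set_debug_download(int enabled) {
    current_download = enabled ? download_fake : download_real;
}

int download(const char *url, char *buf, size_t buflen) {
    return current_download(url, buf, buflen);
}
```

Callers contain no #ifdefs at all; the debug behaviour is switched in one place, for example from a command-line flag or a debug menu.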
If you insist on having debug code interspersed throughout your functions, you are pretty much setting yourself up for a mess :) But you can of course break those snippets out into separate functions and then do the above (or make them no-ops as in the DLog example).
For your case, you could have separate logging macros, say MooLog and FooLog, that all conditionally compile based upon separate flags.
#ifdef DEBUG_FOO
# define FooLog(fmt, ...) NSLog((@"%s [Line %d] " fmt), __PRETTY_FUNCTION__, __LINE__, ##__VA_ARGS__);
#else
# define FooLog(...)
#endif
#ifdef DEBUG_MOO
# define MooLog(fmt, ...) NSLog((@"%s [Line %d] " fmt), __PRETTY_FUNCTION__, __LINE__, ##__VA_ARGS__);
#else
# define MooLog(...)
#endif
Your complex logic everywhere now becomes this:
- (void)something
{
    // This only gets logged if the "DEBUG_FOO" flag is set.
    FooLog(@"something [x = %@]", x);
    // This only gets logged if the "DEBUG_MOO" flag is set.
    MooLog(@"Something else [y = %@]", y);
}