For black-box analysis of the outcome of a system call, is a complete comparison of before-and-after forensic system images the right way to measure? - x86-64

I'm doing x86-64 binary obfuscation research and fundamentally one of the key challenges in the offense / defense cat and mouse game of executing a known-bad program and detecting it (even when obfuscated) is system call sequence analysis.
Put simply, obfuscation is just achieving the same effects on the system through a different sequence of instructions and memory states in order to minimize observable analysis channels. But at the end of the day, you need to execute certain system calls in a certain order to achieve certain input / output behaviors for a program.
Or do you? The question I want to study is this: Could the intended outcome of some or all system calls be achieved through different system calls? Let's say system call D, when executed 3 times consecutively, with certain parameters can be heuristically attributed to malicious behavior. If system calls A, B, and C could be found to achieve the same effect (perhaps in addition to other side-effects) desired from system call D, then it would be possible to evade kernel hooks designed to trace and heuristically analyze system call sequences.
To determine how often this system call outcome overlap exists in a given OS, I don't want to use documentation and manual analysis for a few reasons:
undocumented behavior
lots of work, repeated for every OS and even different versions
So rather, I'm interested in performing black-box analysis to fuzz system calls with various arguments and observing the effects. My problem is I'm not sure how to measure the effects. Once I execute a system call, what mechanism could I use to observe exactly which changes result from it? Is there any reliable way, aside from completely iterating over entire forensic snapshots of the machine before and after?

Related

Getting real time statistics in Omnet++

In:
https://docs.omnetpp.org/tutorials/tictoc/part5/
and
https://doc.omnetpp.org/omnetpp/manual/#sec:simple-modules:declaring-statistics
it's shown how network statistics can be processed after a simulation.
Is it possible to get network parameters dynamically?
TL;DR: Use signals (not statistics) and hook up your own simple module on these signals and compute the required statistics in that module.
You cannot access the value of #statistics in your code, and there is a reason for this as this would be an anti pattern. NED based statistics were introduced as a method to add calculations and measurements to your model without modifying your models behavior and code. This means that statistics are NOT considered part of a model, but rather they are considered as a configuration. Changing a statistics (i.e. deciding that you want to measure something else) should never change the behavior of your model. That's why the actual value of a given statistic is not exposed (easily) to the C++ code. You could dig them out, but it is highly discouraged.
Now, this does not mean that what you want to achieve is not legitimate but the actual statistics gathering must be an integral part of your model. I.e. you should not aim for using built-in statistics, but rather create an explicit statistics gathering submodule that should hook up on the necessary signals (https://doc.omnetpp.org/omnetpp/manual/#sec:simple-modules:subscribing-to-signals) and do the actual statistics computation you need in its C++ code. After that, other modules are free to access this information and make decisions based on that.

Is there any standard for supporting Lock-step processor?

I want to ask about supporting Lock-step(lockstep, lock-step) processors in SW-level.
As I know, in AUTOSAR-ASILD, Lock-step processor is used for fault torelant system as below scenario.
The input signals for a processor is copied to another processor(its Lock-step pair).
The output signals from two different processors are compared.
If two output signals are different, trap is generated.
I think that if there is generated trap, then this generated trap should be processed somewhere in SW-level.
However, I could not find any standard for this processing.
I have read some error handling in SW topics specified in AUTOSAR, but I could not find any satisfying answers.
So, my question is summarized as below.
In AUTOSAR or other standard, where is the right place that processes Lock-step trap(SW-C or RTE or BSW)?.
In AUTOSAR or other standard, what is the right action that processes Lock-step trap(RESET or ABORT)?
Thank you.
There are multiple concepts involved here, from different sources.
The ASIL levels are defined by ISO 26262. ASIL-D is the highest level and using a lockstep CPU is one of the methods typically used to achieve ASIL-D compliance for the whole system. Autosar doesn't define how you achieve ASIL-D, or any ASIL level at all. From an Autosar perspective, lockstep would be an implementation detail of the MCU driver, and Autosar doesn't require MCUs to support lockstep. How a particular lockstep implementation works (whether the outputs are compared after each instruction or not, etc.) depends on the hardware, so you can find those answers in the corresponding hardware manual.
Correspondingly, some decisions have to be made by people working on the system, including an expert on functional safety. The decision on what to do on lockstep failure is one such decision - how you react to a lockstep trap should be defined at the system level. This is also not defined by Autosar, although the most reasonable option is to reset your microcontroller after saving some information about the error.
As for where in the Autosar stack the trap should be handled, this is also an implementation decision, although the reasonable choice is for this to happen at the MCAL level - to the extent that talking about levels even makes sense here, as the trap will run in interrupt/trap context and not the normal OS task context. Typically, a trap would come with a higher priority than any interrupt, and also typically it's not possible to disable the traps in software. A trap will be handled by some routine that is registered by the OS in the same way it registers ISRs, so you'd want to configure the trap handler in whatever tool you're using for OS configuration. The lockstep trap may (again, depending on the hardware) be considered a non-recoverable trap, meaning that the trap handler should trigger a reset eventually. Calling the standard ShutdownOS() function may be reasonable.

How to implement deterministic single threaded network simulation

I read about how FoundationDB does its network testing/simulation here: http://www.slideshare.net/FoundationDB/deterministic-simulation-testing
I would like to implement something very similar, but cannot figure out how they actually did implement it. How would one go about writing, for example, a C++ class that does what they do. Is it possible to do the kind of simulation they do without doing any code generation (as they presumeably do)?
Also: How can a simulation be repeated, if it contains random events?? Each time the simulation would require to choose a new random value and thus be not the same run as the one before. Maybe I am missing something here...hope somebody can shed a bit of light on the matter.
You can find a little bit more detail in the talk that went along with those slides here: https://www.youtube.com/watch?v=4fFDFbi3toc
As for the determinism question, you're right that a simulation cannot be repeated exactly unless all possible sources of randomness and other non-determinism are carefully controlled. To that end:
(1) Generate all random numbers from a PRNG that you seed with a known value.
(2) Avoid any sort of branching or conditionals based on facts about the world which you don't control (e.g. the time of day, the load on the machine, etc.), or if you can't help that, then pseudo-randomly simulate those things too.
(3) Ensure that whatever mechanism you pick for concurrency has a mode in which it can guarantee a deterministic execution order.
Since it's easy to mess all those things up, you'll also want to have a way of checking whether determinism has been violated.
All of this is covered in greater detail in the talk that I linked above.
In the sims I've built the biggest issue with repeatability ends up being proper seed management (as per the previous answer). You want your simulations to give different results only when you supply a different seed to your random number generators than before.
After that the biggest issue I've seen seems tends to be making sure you don't iterate over collections with nondeterministic ordering. For instance, in Java, you'd use a LinkedHashMap instead of a HashMap.

How to handle the two signals depending on each other?

I read Deprecating the Observer Pattern with Scala.React and found reactive programming very interesting.
But there is a point I can't figure out: the author described the signals as the nodes in a DAG(Directed acyclic graph). Then what if you have two signals(or event sources, or models, w/e) depending on each other? i.e. the 'two-way binding', like a model and a view in web front-end programming.
Sometimes it's just inevitable because the user can change view, and the back-end(asynchronous request, for example) can change model, and you hope the other side to reflect the change immediately.
The loop dependencies in a reactive programming language can be handled with a variety of semantics. The one that appears to have been chosen in scala.React is that of synchronous reactive languages and specifically that of Esterel. You can have a good explanation of this semantics and its alternatives in the paper "The synchronous languages 12 years later" by Benveniste, A. ; Caspi, P. ; Edwards, S.A. ; Halbwachs, N. ; Le Guernic, P. ; de Simone, R. and available at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1173191&tag=1 or http://virtualhost.cs.columbia.edu/~sedwards/papers/benveniste2003synchronous.pdf.
Replying #Matt Carkci here, because a comment wouldn't suffice
In the paper section 7.1 Change Propagation you have
Our change propagation implementation uses a push-based approach based on a topologically ordered dependency graph. When a propagation turn starts, the propagator puts all nodes that have been invalidated since the last turn into a priority queue which is sorted according to the topological order, briefly level, of the nodes. The propagator dequeues the node on the lowest level and validates it, potentially changing its state and putting its dependent nodes, which are on greater levels, on the queue. The propagator repeats this step until the queue is empty, always keeping track of the current level, which becomes important for level mismatches below. For correctly ordered graphs, this process monotonically proceeds to greater levels, thus ensuring data consistency, i.e., the absence of glitches.
and later at section 7.6 Level Mismatch
We therefore need to prepare for an opaque node n to access another node that is on a higher topological level. Every node that is read from during n’s evaluation, first checks whether the current propagation level which is maintained by the propagator is greater than the node’s level. If it is, it proceed as usual, otherwise it throws a level mismatch exception containing a reference to itself, which is caught only in the main propagation loop. The propagator then hoists n by first changing its level to a level above the node which threw the exception, reinserting n into the propagation queue (since it’s level has changed) for later evaluation in the same turn and then transitively hoisting all of n’s dependents.
While there's no mention about any topological constraint (cyclic vs acyclic), something is not clear. (at least to me)
First arises the question of how is the topological order defined.
And then the implementation suggests that mutually dependent nodes would loop forever in the evaluation through the exception mechanism explained above.
What do you think?
After scanning the paper, I can't find where they mention that it must be acyclic. There's nothing stopping you from creating cyclic graphs in dataflow/reactive programming. Acyclic graphs only allow you to create Pipeline Dataflow (e.g. Unix command line pipes).
Feedback and cycles are a very powerful mechanism in dataflow. Without them you are restricted to the types of programs you can create. Take a look at Flow-Based Programming - Loop-Type Networks.
Edit after second post by pagoda_5b
One statement in the paper made me take notice...
For correctly ordered graphs, this process
monotonically proceeds to greater levels, thus ensuring data
consistency, i.e., the absence of glitches.
To me that says that loops are not allowed within the Scala.React framework. A cycle between two nodes would seem to cause the system to continually try to raise the level of both nodes forever.
But that doesn't mean that you have to encode the loops within their framework. It could be possible to have have one path from the item you want to observe and then another, separate, path back to the GUI.
To me, it always seems that too much emphasis is placed on a programming system completing and giving one answer. Loops make it difficult to determine when to terminate. Libraries that use the term "reactive" tend to subscribe to this thought process. But that is just a result of the Von Neumann architecture of computers... a focus of solving an equation and returning the answer. Libraries that shy away from loops seem to be worried about program termination.
Dataflow doesn't require a program to have one right answer or ever terminate. The answer is the answer at this moment of time due to the inputs at this moment. Feedback and loops are expected if not required. A dataflow system is basically just a big loop that constantly passes data between nodes. To terminate it, you just stop it.
Dataflow doesn't have to be so complicated. It is just a very different way to think about programming. I suggest you look at J. Paul Morison's book "Flow Based Programming" for a field tested version of dataflow or my book (once it's done).
Check your MVC knowledge. The view doesn't update the model, so it won't send signals to it. The controller updates the model. For a C/F converter, you would have two controllers (one for the F control, on for the C control). Both controllers would send signals to a single model (which stores the only real temperature, Kelvin, in a lossless format). The model sends signals to two separate views (one for C view, one for F view). No cycles.
Based on the answer from #pagoda_5b, I'd say that you are likely allowed to have cycles (7.6 should handle it, at the cost of performance) but you must guarantee that there is no infinite regress. For example, you could have the controllers also receive signals from the model, as long as you guaranteed that receipt of said signal never caused a signal to be sent back to the model.
I think the above is a good description, but it uses the word "signal" in a non-FRP style. "Signals" in the above are really messages. If the description in 7.1 is correct and complete, loops in the signal graph would always cause infinite regress as processing the dependents of a node would cause the node to be processed and vice-versa, ad inf.
As #Matt Carkci said, there are FRP frameworks that allow loops, at least to a limited extent. They will either not be push-based, use non-strictness in interesting ways, enforce monotonicity, or introduce "artificial" delays so that when the signal graph is expanded on the temporal dimension (turning it into a value graph) the cycles disappear.

Importance of knowing if a standard library function is executing a system call

Is it actually important for a programmer to know if the standard library function he/she is using is actually executing a system call? If so, why?
Intuitively I'm guessing the only importance is in knowing if the general standard function is a library function or a system call itself. In other cases, I'm guessing there isn't much of a need to know if a library functions uses internally a system call?
It is not always possible to know (for sure) if a library function wraps a system call. But in one way or another, this knowledge can help improve the portability and (or) efficiency of your program. At least in the following two cases, knowing the syscall-level behaviours of your program is helpful.
When your program is time critical. Some system calls are expensive, and the library functions that wrap them are even more expensive. Thus time-critical tasks may need to switch to equivalent functions that do not enter kernel space at all.
It is also worth noticing the vsyscall (or vdso) mechanism of linux, which accelerates some system calls (i.e. gettimeofday) through mapping their implementations into user-space memory. See this for more details.
When your program needs to be deployed to some restricted environments with system call auditing. In order for your programs to survive such environments, it could be necessary to profile your program for any potential policy violations, or perhaps less tough if you are aware of the restrictions when you wrote the program.
Sometimes it might be important, and sometimes it isn't. I don't think there's any universal answer to this question. Reasons I can think of that might be important in some contexts are: if the system call requires user permissions that the user might not have; in performance critical code a system call might be too heavyweight; if you're writing a signal-handler where most system calls are forbidden; if it might use some system resource (e.g. reading from /dev/random for every random number could use up the whole entropy pool - you'd want to know if that's going to happen every time you call rand()).