How to hide auto-generated comments? - radare2

When disassembling in Radare2, the output is decorated with random annotations of memory peeks, decimal conversions, etc., for example:
...
0000:06ea and al, 0x7f
0000:06ec cmp al, 5 ; 5
0000:06ee jne 0x712
0000:06f0 mov eax, dword [bx + 8] ; [0x8:4]=-1 ; 8
0000:06f4 mov edx, dword [bp + 0x14] ; [0x14:4]=-1 ; 20
...
I find them largely irrelevant: for example, I don't care about the value at 0x14 when that is used as a displacement rather than a fixed address. What command do I use to hide them, either globally or for a particular address?

This is possible since version 3.0. The commands are:
e asm.comments=false
e asm.usercomments=true
The former turns all comments off and the latter overrides this for user-added comments. Currently there's no finer distinction than this: you can't turn off the [0x8:4]=-1 keeping the decimal conversions only, for example.

Systemverilog expansion rule

While reviewing some code, I found something strange.
It seems to come from expression expansion and operator priority.
(I know that because "sig" is declared 'signed', $signed is unnecessary and '-sig' is the correct form, but anyway.)
reg signed [9:0] sig;
reg signed [11:0] out;
initial
begin
$monitor ("%0t] sig=%0d, out=%0h", $time, sig, out);
sig = 64;
out = $signed(-sig);
#1
out = -$signed(sig);
#1
sig = -512;
out = $signed(-sig);
#1
out = -$signed(sig);
#1
$finish;
end
The simulation result for the code above is:
0] sig=64, out=-64
2] sig=-512, out=-512
3] sig=-512, out=512
When sig=-512, I expected the 10-bit sig to be expanded to 12 bits before negation, but it was expanded after negation.
Therefore the negation of -512 was still -512, and after expansion the result was still -512.
I guess $signed() blocks expansion. Any idea what happens?
First of all, -512 and 512 have identical bit patterns in a 10-bit representation: 10 bits can only hold signed values from -512 to 511. Negating -512 in this scheme therefore behaves oddly. I was not able to find this case mentioned in the LRM, so it is probably undefined behavior.
However, it is logical to assume that, to represent the negated value of -512, simply dropping the signedness is sufficient. All commercial compilers on EDA Playground seem to do this, so the result of the unary - operator in this case is the unsigned value 512.
So in out = $signed(-(-512)), the negation operator returns the unsigned value 512, which the system function converts to signed. Therefore it gets sign-extended in out.
In out = -$signed(-512), the outermost negation operator returns the unsigned value 512 for the same reason, so no sign extension happens here.
You can make it signed again by enclosing it in yet another $signed: out = $signed(-$signed(-512))
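The two simulation results can also be reproduced with plain two's-complement arithmetic. The following is only a sketch in C, under the assumption that the argument of $signed is evaluated at its own 10-bit width, while the operand of an outer negation is first widened to the 12-bit width of out. That width rule is an assumption of this sketch, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Wrap v to `bits` bits and reinterpret the result as two's complement. */
int32_t wrap_signed(int32_t v, unsigned bits)
{
    uint32_t mask = (1u << bits) - 1u;
    uint32_t sign = 1u << (bits - 1);
    uint32_t u = (uint32_t)v & mask;
    return (int32_t)(u ^ sign) - (int32_t)sign;
}

/* Models out = $signed(-sig): negate at 10 bits, then sign-extend to 12. */
int32_t negate_then_extend(int32_t sig)
{
    return wrap_signed(wrap_signed(-sig, 10), 12);
}

/* Models out = -$signed(sig): sign-extend to 12 bits, then negate. */
int32_t extend_then_negate(int32_t sig)
{
    return wrap_signed(-wrap_signed(sig, 12), 12);
}
```

With sig = 64 both functions return -64, while with sig = -512 the first returns -512 and the second returns 512, which matches the $monitor output in the question.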

Construct a 64 bit mask register from four 16 bit ones

What is the best way to end up with a __mmask64 from four __mmask16? I just want to concatenate them. Can't seem to find a solution on the internet.
AVX-512 has hardware instructions for concatenating two mask registers, for example 2x kunpckwd instructions and one kunpckdq would do the trick here.
(Each instruction is 4 cycle latency, port 5 only, on SKX and Ice Lake. https://uops.info. But at least the 2 independent ones in the first step can mostly overlap, starting one cycle apart, limited by competition for port 5. But they won't all be ready at once anyway, if the compiler schedules the instructions that generate the 4 masks so one pair should be ready first so it can get started.)
// compiles nicely with GCC/clang/ICC. Current MSVC has major pessimizations
inline
__mmask64 set_mask64_kunpck(__mmask16 m0, __mmask16 m1, __mmask16 m2, __mmask16 m3)
{
__mmask32 md0 = _mm512_kunpackw(m1, m0); // hi, lo
__mmask32 md1 = _mm512_kunpackw(m3, m2);
__mmask64 mq = _mm512_kunpackd(md1, md0);
return mq;
}
That's your best bet if your __mmask16 values are actually in k registers, where a compiler will have them if they're the result of AVX-512 compare/test intrinsics like _mm512_cmple_epu32_mask. If they're coming from an array you generated earlier, it might be better to combine them with plain scalar stuff (see Paul's answer), instead of slowly getting them into mask registers with kmov. kmov k, mem is 3 uops for the front-end, with a scalar integer load and a kmov k, reg as back-end uops, plus an extra front-end uop for no apparent reason.
__mmask16 is just a typedef for unsigned short (in gcc/clang/ICC/MSVC) so you can simply manipulate it like an integer, and compilers will use kmov as necessary. (This can lead to pretty inefficient code if you're not careful, and unfortunately current compilers aren't smart enough to compile a shift/OR function into using kunpckwd.)
There are intrinsics like unsigned int _cvtmask16_u32 (__mmask16 a) but they're optional for current compilers that implement __mmask16 as unsigned short.
To look at compiler output for a case where __mmask16 values start out in k registers, it's necessary to write a test function that uses intrinsics to create the mask values. (Or use inline asm constraints.) The standard x86-64 calling conventions handle __mmask16 as a scalar integer, so as a function arg it's already in an integer register, not a k register.
__mmask64 test(__m256i v0, __m256i v1, __m256i v2, __m256i v3)
{
__mmask16 m0 = _mm256_movepi16_mask(v0); // clang can optimize _mm_movepi8_mask into pmovmskb eax, xmm avoiding k regs
__mmask16 m1 = _mm256_movepi16_mask(v1);
__mmask16 m2 = _mm256_movepi16_mask(v2);
__mmask16 m3 = _mm256_movepi16_mask(v3);
//return set_mask64_mmx(m0,m1,m2,m3);
//return set_mask64_scalar(m0,m1,m2,m3);
return set_mask64_kunpck(m0,m1,m2,m3);
}
With GCC and clang, that compiles to (Godbolt):
# gcc 11.1 -O3 -march=skylake-avx512
test(long long __vector(4), long long __vector(4), long long __vector(4), long long __vector(4)):
vpmovw2m k3, ymm0
vpmovw2m k1, ymm1
vpmovw2m k2, ymm2
vpmovw2m k0, ymm3 # create masks
kunpckwd k1, k1, k3
kunpckwd k0, k0, k2
kunpckdq k4, k0, k1 # combine masks
kmovq rax, k4 # use mask, in this case by returning as integer
ret
I could have used the final mask result for a blend intrinsic between two of the inputs, for example, but the compiler didn't try to avoid kunpck by doing 4x kmov (also only 1 port).
MSVC 19.29 -O2 -Gv -arch:AVX512 does a rather poor job, extracting each mask to a scalar integer register between intrinsics, like this:
; MSVC 19.29
kmovw ax, k1
movzx edx, ax
...
kmovd k3, edx
This is supremely dumb: it doesn't even use kmovw eax, k1 to zero-extend into a 32-bit register, not to mention that it doesn't realize the next kunpck only cares about the low part of its input anyway, so there was no need to kmov the data to/from an integer register at all. Later, it even emits the following, apparently not realizing that kmovd writing a 32-bit register zero-extends into the 64-bit register. (To be fair, GCC has some dumb missed optimizations like that around its __builtin_popcount intrinsic.)
; MSVC 19.29
kmovd ecx, k2
mov ecx, ecx
kmovq k1, rcx
The kunpck intrinsics do have strange prototypes, with inputs as wide as their outputs, e.g.
__mmask32 _mm512_kunpackw (__mmask32 a, __mmask32 b)
So perhaps this is tricking MSVC into manually doing the uint16_t -> uint32_t conversion by going to scalar and back, since it apparently doesn't know that vpmovw2m k3, ymm0 already zero-extends into the full k3.
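The operand order of the kunpck intrinsics (high half first, low half second) is easy to get backwards. As a rough scalar model, assuming only the low half of each input is used as described above, the concatenation can be sketched in plain C. The model_* names are hypothetical helpers for illustration, not intrinsics, and this is not how you would want to write the real thing.

```c
#include <stdint.h>

/* Scalar model of _mm512_kunpackw(a, b):
   low 16 bits of a become the high half, low 16 bits of b the low half. */
uint32_t model_kunpackw(uint32_t a, uint32_t b)
{
    return ((a & 0xFFFFu) << 16) | (b & 0xFFFFu);
}

/* Scalar model of _mm512_kunpackd(a, b), same idea with 32-bit halves. */
uint64_t model_kunpackd(uint64_t a, uint64_t b)
{
    return ((a & 0xFFFFFFFFu) << 32) | (b & 0xFFFFFFFFu);
}

/* Same call structure as set_mask64_kunpck: m0 ends up in bits 15:0. */
uint64_t model_set_mask64(uint16_t m0, uint16_t m1, uint16_t m2, uint16_t m3)
{
    uint32_t md0 = model_kunpackw(m1, m0);   /* hi, lo */
    uint32_t md1 = model_kunpackw(m3, m2);
    return model_kunpackd(md1, md0);
}
```

The upper halves of the wide inputs are simply masked off in this model, which is why zero-extending them through an integer register first buys nothing.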
You can just treat __mmask16 and __mmask64 like 16 bit and 64 bit ints, e.g.
__mmask64 set_mask64(__mmask16 m0, __mmask16 m1, __mmask16 m2, __mmask16 m3)
{
return (((__mmask64)m0) << 0)
| (((__mmask64)m1) << 16)
| (((__mmask64)m2) << 32)
| (((__mmask64)m3) << 48);
}
or perhaps:
__mmask64 set_mask64(__mmask16 m0, __mmask16 m1, __mmask16 m2, __mmask16 m3)
{
return (__mmask64)_mm_set_pi16(m3, m2, m1, m0); // _mm_set_pi16 takes the highest element first, so m0 lands in bits 15:0
}
Both of the above use scalar/SSE code. Using AVX-512 mask intrinsics will be more efficient (see @Peter's answer for better solutions).

How to display the address of the function in WinDBG for .fnret command?

I need to get the address of the function that the .fnret command in WinDBG requires.
For example, I want to get information about the return value of the apphelp!ApphelpCheckRunApp function.
First, I set a breakpoint on this function:
bp apphelp!ApphelpCheckRunApp
Then I continue execution until it breaks on that function.
After it breaks, I execute the .fnret [Address] command.
I already tried the address 77b345d5 that is displayed at the breakpoint:
Breakpoint 0 hit
eax=77b345d5 ebx=7ed320f5 ecx=7ffac000 edx=7c886920 esi=7ffac000 edi=00000018
eip=77b345d5 esp=0378ce90 ebp=0378d108 iopl=0 nv up ei pl nz ac po cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000213
appHelp!ApphelpCheckRunApp:
77b345d5 8bff mov edi,edi
but that does not seem to be what I need, because I get the following error:
^ Unknown or unsupported return type in '.fnret 77b345d5'
I also tried the return address 7c818cdf of this function from the call stack (obtained with the kb command):
ChildEBP RetAddr Args to Child
0283ce8c 7c818cdf 00000474 046bb7d0 00000000 appHelp!ApphelpCheckRunApp
but it leads to the same error.
Which WinDBG command should I use for that, and which return address will it display (in case it isn't already displayed at the breakpoint)? Will it then work properly with the .fnret or .fnret /s commands? Unfortunately, there are no examples of using them on MSDN, only the documentation.
Hoping for your help. Thanks in advance.
.fnret is only useful if you have a private PDB.
It is not useful with a public PDB, because it needs to retrieve the type information.
Here is a sample usage on compiled code with a private PDB:
0:000> x /t /v /f myst!towlower
prv func 00007ff6`74ba5f84 7 <function> myst!towlower (unsigned short)
0:000> x /t /v /f myst!toupper
prv func 00007ff6`74b91b10 2a <function> myst!toupper (int)
0:000> .fnret myst!towlower
myst!towlower (00007ff6`74ba5f84) = unsigned short 1
0:000> .fnret myst!toupper
myst!toupper (00007ff6`74b91b10) = int 0n1
It fails on a known function that returns a HANDLE when only a public (stripped) PDB is available:
0:000> .fnret KERNELBASE!CreateFileA
^ Unknown or unsupported return type in '.fnret KERNELBASE!CreateFileA'
It succeeds on a system file with a private PDB.
It casts the forced return value placed in @rax to the return type of the function, using the function's type information:
0:000> .printf "%y\n" , 0x00000001`800bace0 ; an arbitrary function
ole32!ToUnicode (00000001`800bace0)
0:000> .printf "%mu\n" , 00000001`8014c17a ; an arbitrary wide string
guageErrorPointerą“‚Š
0:000> r rax = 00000001`8014c17a << the $retreg is populated with the address of a wide string
0:000> .fnret 0x00000001`800bace0 << fnret casts the $retreg as wide string and prints the resulting widestring
ole32!ToUnicode (00000001`800bace0) = wchar_t * 0x00000001`8014c17a
"guageErrorPointer???"
OK, that command is indeed not helpful at all when using public PDBs.
I found a better solution here: How to get return value from a function in windbg?
You can find the return value by inspecting the eax register on x86 or the rax register on x64 with the r command, since integer and pointer return values are stored there. The register only holds the return value once the function has returned, so after the breakpoint hits I let the function finish (for example with gu) and then type r eax on x86 or r rax on x64. The output looks like this:
eax=[Address]
Then I display the contents of that memory address with one of the d* commands (dd, du and the other display-memory commands), like this:
du [Address]
After looking at the output, it usually becomes clear which data is returned, and also its data type (at least in most cases).
To work out which data type is used in the first place, I try different combinations of the display memory commands and the display referenced memory commands.

struct or class in assembly

I need something like a struct or class in C++.
For example, I need a class with an array, two attributes (size and len), and some functions such as append and remove.
How can I implement this in assembly with macros and procedures?
TASM supports, e.g.:
struc String ; note: without 't' at the end
size dw 100
len dw 10
data db 100 dup(0) ; DUP syntax is count DUP (value)
ends String
Gnu assembler also has a .struct directive.
The syntax for MASM is:
String STRUCT
size dw 100
len dw 10
String ENDS
Usage again from the same MASM manual:
ASSUME eax:PTR String
mov ecx, [eax].size
mov edx, [eax].len
ASSUME eax:nothing
.. or ..
mov ecx, (String PTR [eax]).size ; One can 'cast' to struct pointer
One can also access a local variable directly
mov eax, myStruct.len
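In all of these assemblers a struct is only a set of named offsets from a base address, and a "method" is a procedure that receives that base address as a parameter. The C sketch below shows the same idea for the String example: the field names come from the question, while the function names and the choice of uint16_t for dw and uint8_t for db are assumptions made for illustration. Each function corresponds to an assembly procedure that takes the struct pointer in a register.

```c
#include <stddef.h>
#include <stdint.h>

enum { STRING_CAPACITY = 100 };

typedef struct {
    uint16_t size;                  /* dw: capacity of data[]   (offset 0) */
    uint16_t len;                   /* dw: bytes currently used (offset 2) */
    uint8_t  data[STRING_CAPACITY]; /* db 100 dup(0)            (offset 4) */
} String;

/* Set up an empty string with the full capacity available. */
void string_init(String *s)
{
    s->size = STRING_CAPACITY;
    s->len = 0;
}

/* Append one byte; returns 1 on success, 0 if the buffer is full. */
int string_append(String *s, uint8_t c)
{
    if (s->len >= s->size)
        return 0;
    s->data[s->len++] = c;
    return 1;
}

/* Remove the last byte; returns 1 on success, 0 if already empty. */
int string_remove_last(String *s)
{
    if (s->len == 0)
        return 0;
    s->len--;
    return 1;
}
```

In assembly, s->len becomes something like [eax].len after an ASSUME, or an explicit base-plus-offset access, and the bounds checks become a cmp followed by a conditional jump.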
Here's a sample MASM struct from a HID interface routine that I wrote:
SP_DEVICE_INTERFACE_DATA struct
CbSize DWORD ?
ClassGuid GUID <>
Flags DWORD ?
Reserved ULONG ?
SP_DEVICE_INTERFACE_DATA ends
Structure in 8086 MASM
syntax
struct_name STRUC
var_name type ?
...
struct_name ENDS
Rules
1) It can't be initialized (initializing it results in garbage values).
2) It should be accessed using "direct addressing mode" (otherwise the access is treated as "immediate addressing mode").
Program to add two numbers:
DATA SEGMENT
FOO STRUC
A DB ?
B DB ?
SUM DW ?
FOO ENDS
DATA ENDS
CODE SEGMENT
ASSUME CS:CODE,DS:DATA
START:MOV AX,DATA
MOV DS,AX
XOR AX,AX
MOV DS:[FOO.A],0FFH
MOV DS:[FOO.B],0FFH
MOV AL,DS:[FOO.A] ;al=ff
ADD AL,DS:[FOO.B] ;al=al+ff
ADC AH,00H ;ah=ah+carry_flag(1/0)+00
MOV DS:[FOO.SUM],AX ;sum=ax
HLT ;stop
CODE ENDS
END START

Is there a way to dump the individual arguments of va_list in windbg?

Is there a way to dump the arguments in a va_list in WinDbg, given the format string and the starting address of the va_list?
I usually do this by just dumping the contents of the stack with the command dd esp (for x86) or dq rsp (for x64). Knowing the starting address of the va_list makes it a bit easier to locate the place on the stack where the vararg block begins, but usually you can either guess it or calculate it from the sizes of the regular (non-vararg) parameters of the function.
Here is an annotated example for x86. The function being called:
printf("%d %o %g %s %c", 101, 201, 301.0, "401-string", '5');
in debugger:
0:000> bp MSVCR100D!printf
0:000> g
Breakpoint 1 hit
eax=00000001 ebx=00000000 ecx=2549afc4 edx=00000000 esi=002ceeb8 edi=002cf040
eip=0ff57ee0 esp=002cee98 ebp=002cf04c iopl=0 nv up ei pl nz ac po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000212
MSVCR100D!printf:
0ff57ee0 8bff mov edi,edi
0:000> dd /c1 esp
002cee98 01365cee // return address
002cee9c 0137d6e8 // pointer to the format string "%d %o %g %s %c" --> next follows our variable arguments
002ceea0 00000065 // first vararg argument, int 101
002ceea4 000000c9 // second vararg argument, int 201
002ceea8 00000000 // third vararg argument, double 301.0, it occupies two slots in stack
002ceeac 4072d000 // third argument continues
002ceeb0 0137d70c // fourth vararg argument, pointer to string
002ceeb4 00000035 // fifth vararg argument, 8-bit character (still occupies 4 bytes in stack)
002ceeb8 25b87244
002ceebc 002cf254
002ceec0 0041c520
002ceec4 00000000
...
For other functions the picture will be very similar, because on x86 every function that takes a variable number of arguments has to follow the __cdecl calling convention, so you will find the same layout of parameters on the stack.
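The only slots in the dump that are not readable at a glance are the two that hold the double. As a small sketch, assuming the little-endian IEEE-754 layout used on x86, the two 32-bit stack slots can be glued back together in C to check the annotation. The function name is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Rebuild a double from two consecutive 32-bit stack slots.
   `low` is the slot at the lower address, `high` the one above it. */
double double_from_slots(uint32_t low, uint32_t high)
{
    uint64_t bits = ((uint64_t)high << 32) | low;
    double d;
    memcpy(&d, &bits, sizeof d);   /* reinterpret the bits without aliasing issues */
    return d;
}
```

Calling double_from_slots(0x00000000, 0x4072d000) with the values at 002ceea8 and 002ceeac gives 301.0, the third argument of the printf call. The char argument needs no such step, because it is promoted to int before being pushed, which is why '5' shows up as 00000035.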