Poor C optimization in Watcom - compiler-optimization

Poor C optimization in Watcom - compiler-optimization

I am using Watcom C compiler (wcc) version 2.0. I compile this simple code with many optimizations enabled, but the resulting ASM seems very unoptimized to me.
int test(int x) {
return x ? 1 : 2;
}
Compiling with 8086 as target, fastest possible optimizations (-otexan).
wcc test.c -i="C:\Data\Projects\WATCOM/h" -otexan -d2 -bt=dos -fo=.obj -mc
Then disassembling with:
wdis test.obj -s > test.dasm
Resulting assembler looks like this:
...
return x ? 1 : 2;
02C7 83 7E FA 00 cmp word ptr -0x6[bp],0x0000
02CB 74 03 je L$39
02CD E9 02 00 jmp L$40
02D0 L$39:
02D0 EB 07 jmp L$41
02D2 L$40:
02D2 C7 46 FE 01 00 mov word ptr -0x2[bp],0x0001
02D7 EB 05 jmp L$42
02D9 L$41:
02D9 C7 46 FE 02 00 mov word ptr -0x2[bp],0x0002
02DE L$42:
02DE 8B 46 FE mov ax,word ptr -0x2[bp]
02E1 89 46 FC mov word ptr -0x4[bp],ax
02E4 8B 46 FC mov ax,word ptr -0x4[bp]
...
These jumps look heavily un-optimized to me. I would expect less unnecessary jumps and maybe putting result directly to AX without putting it under BP location there and back (last two lines).
cmp word ptr -0x6[bp],0x0000
jz L$39
mov ax,0x0001
jmp L$40
L$39:
mov ax,0x0002
L$40:
...
Am I missing something there? Is wcc ignoring my switches for some reason? Thanks.

Problem is the -d2 switch that generates detailed debug info. It probably inserts unnecessary lines to correspond to original C file's lines (to be able to put hardware breakpoints there, maybe?). I put -d1 instead and voila:
01A0 test_:
01A0 85 C0 test ax,ax
01A2 74 04 je L$15
01A4 B8 01 00 mov ax,0x0001
01A7 C3 ret
01A8 L$15:
01A8 B8 02 00 mov ax,0x0002
01AB C3 ret

Related

Where are files marked as assume-unchanged in Git? [duplicate]

I like to modify config files directly (like .gitignore and .git/config) instead of remembering arbitrary commands, but I don't know where Git stores the file references that get passed to "git update-index --assume-unchanged file".
If you know, please do tell!

It says where in the command - git update-index
So you can't really be editing the index as it is not a text file.
Also, to give more detail on what is stored with the git update-index --assume-unchanged command, see the Using “assume unchanged” bit section in the manual

As others said, it's stored in the index, which is located at .git/index.
After some detective work, I found that it is located at the: assume valid bit of each index entry.
Therefore, before understanding what follows, you should first understand the global format of the index, as explained in my other answer.
Next, I will explain how I verified that the "assume valid" bit is the culprit:
empirically
by reading the source
Empirical
Time to hd it up.
Setup:
git init
echo a > b
git add b
Then:
hd .git/index
Gives:
00000000 44 49 52 43 00 00 00 02 00 00 00 01 54 e9 b6 f3 |DIRC........T...|
00000010 2d 4f e1 2f 54 e9 b6 f3 2d 4f e1 2f 00 00 08 05 |-O./T...-O./....|
00000020 00 de 32 ff 00 00 81 a4 00 00 03 e8 00 00 03 e8 |..2.............|
00000030 00 00 00 00 e6 9d e2 9b b2 d1 d6 43 4b 8b 29 ae |...........CK.).|
00000040 77 5a d8 c2 e4 8c 53 91 00 01 62 00 c9 a2 4b c1 |wZ....S...b...K.|
00000050 23 00 1e 32 53 3c 51 5d d5 cb 1a b4 43 18 ad 8c |#..2S<Q]....C...|
00000060
Now:
git update-index --assume-unchanged b
hd .git/index
Gives:
00000000 44 49 52 43 00 00 00 02 00 00 00 01 54 e9 b6 f3 |DIRC........T...|
00000010 2d 4f e1 2f 54 e9 b6 f3 2d 4f e1 2f 00 00 08 05 |-O./T...-O./....|
00000020 00 de 32 ff 00 00 81 a4 00 00 03 e8 00 00 03 e8 |..2.............|
00000030 00 00 00 00 e6 9d e2 9b b2 d1 d6 43 4b 8b 29 ae |...........CK.).|
00000040 77 5a d8 c2 e4 8c 53 91 80 01 62 00 17 08 a8 58 |wZ....S...b....X|
00000050 f7 c5 b3 e1 7d 47 ac a2 88 d9 66 c7 5c 2f 74 d7 |....}G....f.\/t.|
00000060
By comparing the two indexes, and looking at the global structure of the index, see that the only differences are:
byte number 0x48 (9th on line 40) changed from 00 to 80. That is our flag, the first bit of the cache entry flags.
the 20 bytes from 0x4C to 0x5F. This is expected since that is a SHA-1 over the entire index.
This has also though me that the SHA-1 of the index entry in bytes from 0x34 to 0x47 does not take into account the flags, since it did not changed between both indexes. This is probably why the flags are placed after the SHA, which only considers what comes before it.
Source code
Now let's see if that is coherent with source code of Git 2.3.
First look at the source of update-index, grep assume-unchanged.
This leads to the following line:
{OPTION_SET_INT, 0, "assume-unchanged", &mark_valid_only, NULL,
N_("mark files as \"not changing\""),
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, MARK_FLAG},
{OPTION_SET_INT, 0, "no-assume-unchanged", &mark_valid_only, NULL,
N_("clear assumed-unchanged bit"),
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG},
so the value is stored at mark_valid_only. Grep it, and find that it is only used at one place:
if (mark_valid_only) {
if (mark_ce_flags(path, CE_VALID, mark_valid_only == MARK_FLAG))
die("Unable to mark file %s", path);
return;
}
CE means Cache Entry.
By quickly inspecting mark_ce_flags, we see that:
if (mark)
active_cache[pos]->ce_flags |= flag;
else
active_cache[pos]->ce_flags &= ~flag;
So the function basically sets or unsets the CE_VALID bit, depending on mark_valid_only, which is a tri-state:
mark: --assume-unchanged
unmark: --no-assume-unchanged
do nothing: the default value 0 of the option set at {OPTION_SET_INT, 0
Next, by grepping under builtin/, we see that no other place sets the value of CE_VALID, so --assume-unchanged must be the only command that sets it.
The flag is however used in many places of the source code, which should be expected as it has many side-effects, and it is used every time like:
ce->ce_flags & CE_VALID
so we conclude that it is part of the ce_flags field of struct cache_entry.
The index is specified at cache.h because one of its functions is to be a cache for creating commits faster.
By looking at the definition of CE_VALID under cache.h and surrounding lines we have:
#define CE_STAGEMASK (0x3000)
#define CE_EXTENDED (0x4000)
#define CE_VALID (0x8000)
#define CE_STAGESHIFT 12
So we conclude that it is the very first bit of that integer (0x8000), just next to the CE_EXTENDED, which is coherent with my earlier experiment.

x86-64 encoding for MOV instruction, weird case

I disassemble ( with objdump -d ) this opcode (c7 45 fc 05 00 00 00) and get this (mov DWORD PTR [rbp-0x4],0x5). then i try to decode myself and i think it should be (mov DWORD PTR [ebp-0x4],0x5) . Why it is RBP register but not EBP register ? am i missing something ?
Here what i have try:
First i look at the mov opcode for C7 opcode.
C7 /0 iw | MOV r/m16,imm16
C7 /0 id | MOV r/m32,imm32
REX.W + C7 /0 id | MOV r/m64,imm32
so there is no REX.W prefix here, and there also no +rb,+rw,+rd,+ro here. /0 mean ModR/M byte only use r/m register, reg/opcode field can safely ignore here. so 0x45 translate to [EBP]+disp8 (using table2-2. 32-bit addressing forms with the modR/M byte in Volume 2/chapter 2 ). disp8 is 0xfc -> -4. and the rest of the opcode is (05 00 00 00) is imm32.

In 64 bit mode, the default address size is 64 bit: without the address size override prefix (that's 67h, not REX.W) the 64 bit variants of registers will be used in address calculation. That's nice, you almost always want a 64 bit address calculation in 64 bit mode, if that was not the default then there would be a lot of wasted prefixes. The REX.W prefix does not affect address size but "operation size", REX.W + C7 ... is the encoding for mov QWORD PTR [...], imm32.
So, when assembling for 64 bit mode, mov DWORD PTR [ebp-0x4],0x5 would be encoded as 67 c7 45 fc 05 00 00 00, and when disassembling c7 45 fc 05 00 00 00 for 64 bit mode it means mov DWORD PTR [RBP-0x4],0x5.
An interesting consequence of these defaults is that the shortest forms of lea use a 64 bit address but a 32 bit destination, which looks a bit wrong at first.

How do I prevent ld from adding .note.gnu.property?

I've got a toy x86 assembly program that I'm writing and compiling with as and ld:
.text
.global _start
_start:
movq $1, %rax
movq $0x7FFFFFFF, %rbx
L1:
cmp %rbx, %rax
je L2
addq $1, %rax
jmp L1
L2:
movq %rax, %rbx
movq $1, %rax
int $0x80
And then to build:
as -o test.o test.S
ld -s -o test test.o
The second step of this -- ld -- generates an additional note:
$ objdump -D test
test: file format elf64-x86-64
Disassembly of section .note.gnu.property:
00000000004000e8 <.note.gnu.property>:
4000e8: 04 00 add $0x0,%al
4000ea: 00 00 add %al,(%rax)
4000ec: 10 00 adc %al,(%rax)
4000ee: 00 00 add %al,(%rax)
4000f0: 05 00 00 00 47 add $0x47000000,%eax
4000f5: 4e 55 rex.WRX push %rbp
4000f7: 00 01 add %al,(%rcx)
4000f9: 00 00 add %al,(%rax)
4000fb: c0 04 00 00 rolb $0x0,(%rax,%rax,1)
4000ff: 00 01 add %al,(%rcx)
400101: 00 00 add %al,(%rax)
400103: 00 00 add %al,(%rax)
400105: 00 00 add %al,(%rax)
...
Disassembly of section .text:
0000000000401000 <.text>:
401000: 48 c7 c0 01 00 00 00 mov $0x1,%rax
401007: 48 c7 c3 ff ff ff 7f mov $0x7fffffff,%rbx
40100e: 48 39 d8 cmp %rbx,%rax
401011: 74 06 je 0x401019
401013: 48 83 c0 01 add $0x1,%rax
401017: eb f5 jmp 0x40100e
401019: 48 89 c3 mov %rax,%rbx
40101c: 48 c7 c0 01 00 00 00 mov $0x1,%rax
401023: cd 80 int $0x80
Is there a way to eliminate or prevent generation of the .note.gnu.property section?

I didn't find a direct flag to pass to ld to prevent this section's generation (neither -s nor -s -x work), even calling strip --strip-all doesn't remove this section, however, strip --remove-section=.note.gnu.property test did remove the section:
$ as -o test.o test.s
$ ld -o test test.o
$ objdump -D test
test: file format elf64-x86-64
Disassembly of section .note.gnu.property:
00000000004000e8 <.note.gnu.property>:
4000e8: 04 00 add $0x0,%al
4000ea: 00 00 add %al,(%rax)
4000ec: 10 00 adc %al,(%rax)
4000ee: 00 00 add %al,(%rax)
4000f0: 05 00 00 00 47 add $0x47000000,%eax
4000f5: 4e 55 rex.WRX push %rbp
4000f7: 00 01 add %al,(%rcx)
4000f9: 00 00 add %al,(%rax)
4000fb: c0 04 00 00 rolb $0x0,(%rax,%rax,1)
4000ff: 00 01 add %al,(%rcx)
400101: 00 00 add %al,(%rax)
400103: 00 00 add %al,(%rax)
400105: 00 00 add %al,(%rax)
...
Disassembly of section .text:
0000000000401000 <_start>:
401000: 48 c7 c0 01 00 00 00 mov $0x1,%rax
401007: 48 c7 c3 ff ff ff 7f mov $0x7fffffff,%rbx
000000000040100e <L1>:
40100e: 48 39 d8 cmp %rbx,%rax
401011: 74 06 je 401019 <L2>
401013: 48 83 c0 01 add $0x1,%rax
401017: eb f5 jmp 40100e <L1>
0000000000401019 <L2>:
401019: 48 89 c3 mov %rax,%rbx
40101c: 48 c7 c0 01 00 00 00 mov $0x1,%rax
401023: cd 80 int $0x80
$ strip --remove-section=.note.gnu.property test
strip: test: warning: empty loadable segment detected at vaddr=0x400000, is this intentional?
$ objdump -D test
test: file format elf64-x86-64
Disassembly of section .text:
0000000000401000 <.text>:
401000: 48 c7 c0 01 00 00 00 mov $0x1,%rax
401007: 48 c7 c3 ff ff ff 7f mov $0x7fffffff,%rbx
40100e: 48 39 d8 cmp %rbx,%rax
401011: 74 06 je 0x401019
401013: 48 83 c0 01 add $0x1,%rax
401017: eb f5 jmp 0x40100e
401019: 48 89 c3 mov %rax,%rbx
40101c: 48 c7 c0 01 00 00 00 mov $0x1,%rax
401023: cd 80 int $0x80

objcopy --remove-section .note.gnu.property test
works for me.

Mifare Desfire Wrapped Mode: How to calculate CMAC?

When using Desfire native wrapped APDUs to communicate with the card, which parts of the command and response must be used to calculate CMAC?
After successful authentication, I have the following session key:
Session Key: 7CCEBF73356F21C9191E87472F9D0EA2
Then when I send a GetKeyVersion command, card returns the following CMAC which I'm trying to verify:
<< 90 64 00 00 01 00 00
>> 00 3376289145DA8C27 9100
I have implemented CMAC algorithm according to "NIST special publication 800-38B" and made sure it is correct. But I don't know which parts of command and response APDUs must be used to calculate CMAC.
I am using TDES, so MAC is 8 bytes.

I have been looking at the exact same issue for the last few days and I think I can at least give you some pointers. Getting everything 'just so' has taken some time and the documentation from NXP (assuming you have access) is a little difficult to interpret in some cases.
So, as you probably know, you need to calculate the CMAC (and update your init vec) on transmit as well as receive. You need to save the CMAC each time you calculate it as the init vec for the next crypto operation (whether CMAC or encryption etc).
When calculating the CMAC for your example the data to feed into your CMAC algorithm is the INS byte (0x64) and the command data (0x00). Of course this will be padded etc as specified by CMAC. Note, however, that you do not calculate the CMAC across the entire APDU wrapping (i.e. 90 64 00 00 01 00 00) just the INS byte and data payload is used.
On receive you need to take the data (0x00) and the second status byte (also 0x00) and calculate the CMAC over that. It's not important in this example but order is important here. You use the response body (excluding the CMAC) then SW2.
Note that only half of the CMAC is actually sent - CMAC should yield 16 bytes and the card is sending the first 8 bytes.
There were a few other things that held me up including:
I was calculating the session key incorrectly - it is worth double checking this if things are not coming out as you'd expect
I interpreted the documentation to say that the entire APDU structure is used to calculate the CMAC (hard to read them any other way tbh)
I am still working on calculating the response from a Write Data command correctly. The command succeeds but I can't validate the CMAC. I do know that Write Data is not padded with CMAC padding but just zeros - not yet sure what else I've missed.
Finally, here is a real example from communicating with a card from my logs:
Authentication is complete (AES) and the session key is determined to be F92E48F9A6C34722A90EA29CFA0C3D12; init vec is zeros
I'm going to send the Get Key Version command (as in your example) so I calculate CMAC over 6400 and get 1200551CA7E2F49514A1324B7E3428F1 (which is now my init vec for the next calculation)
Send 90640000010000 to the card and receive 00C929939C467434A8 (status is 9100).
Calculate CMAC over 00 00 and get C929939C467434A8A29AB2C40B977B83 (and update init vec for next calculation)
The first half of our CMAC from step #4 matches the 8 byte received from the card in step #3

Sry for my English,- its terrible :) but it's not my native language. I'm Russian.
Check first MSB (7 - bit) of array[0] and then shiffting this to the left. And then XOR if MSB 7 bit was == 1;
Or save first MSB bit of array[0] and after shiffting put this bit at the end of array[15] at the end (LSB bit).
Just proof it's here:
https://www.nxp.com/docs/en/application-note/AN10922.pdf
Try this way:
Zeros <- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SessionKey <- 00 01 02 03 E3 27 64 0C 0C 0D 0E 0F 5C 5D B9 D5
Data <- 6F 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00
First u have to encrypt 16 bytes (zeros) with SesionKey;
enc_aes_128_ecb(Zeros);
And u get EncryptedData.
EncryptedData <- 3D 08 A2 49 D9 71 58 EA 75 73 18 F2 FA 6A 27 AC
Check bit 7 [MSB - LSB] of EncryptedData[0] == 1? switch i to true;
bool i = false;
if (EncryptedData[0] & 0x80){
i = true;
}
Then do Shiffting of all EncryptedData to 1 bit <<.
ShiftLeft(EncryptedData,16);
And now, when i == true - XOR the last byte [15] with 0x87
if (i){
ShiftedEncryptedData[15] ^= 0x87;
}
7A 11 44 93 B2 E2 B1 D4 EA E6 31 E5 F4 D4 4F 58
Save it as KEY_1.
Try bit 7 [MSB - LSB] of ShiftedEncryptedData[0] == 1?
i = false;
if (ShiftedEncryptedData[0] & 0x80){
i = true;
}
Then do Shiffting of all ShiftedEncryptedData to 1 bit <<.
ShiftLeft(ShiftedEncryptedData,16);
And now, when i == true - XOR the last byte [15] with 0x87
if (i){
ShiftedEncryptedData[15] ^= 0x87;
}
F4 22 89 27 65 C5 63 A9 D5 CC 63 CB E9 A8 9E B0
Save it as KEY_2.
Now we take our Data (6F 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00)
As Michael say's - pad command with 0x80 0x00...
XOR Data with KEY_2 - if command was padded, or KEY_1 if don't.
If we have more like 16 bytes (32 for example) u have to XOR just last 16 bytes.
Then encrypt it:
enc_aes_128_ecb(Data);
Now u have a CMAC.
CD C0 52 62 6D F6 60 CA 9B C1 09 FF EF 64 1A E3
Zeros <- 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SessionKey <- 00 01 02 03 E3 27 64 0C 0C 0D 0E 0F 5C 5D B9 D5
Key_1 <- 7A 11 44 93 B2 E2 B1 D4 EA E6 31 E5 F4 D4 4F 58
Key_2 <- F4 22 89 27 65 C5 63 A9 D5 CC 63 CB E9 A8 9E B0
Data <- 6F 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CMAC <- CD C0 52 62 6D F6 60 CA 9B C1 09 FF EF 64 1A E3
C/C++ function:
void ShiftLeft(byte *data, byte dataLen){
for (int n = 0; n < dataLen - 1; n++) {
data[n] = ((data[n] << 1) | ((data[n+1] >> 7)&0x01));
}
data[dataLen - 1] <<= 1;
}
Have a nice day :)

How to disassembly in Eclipse .exe file made by a C program?

I need to disassemble an executable file made from C .
My lecturer said to use GDB (I need to use a linux platform , so I'm using Eclipse Indigo
under VMWARE virtual machine) but I don't like GDB .
What I have at the moment is only a .exe file . Can I use Eclipse to disassemble that file ?
I have 6 phases that I need to decipher , the program is known as "Defusing a Binary Bomb"
. So , in order to do that I need to disassemble that file , but I can't seem to find a way to do that in Eclipse ...
Can you please explain how do I do disassembly in Eclipse ?

Why not use objdump?
$ objdump -d print_jmp_buf
print_jmp_buf: file format elf64-x86-64
Disassembly of section .init:
00000000004003f8 <_init>:
4003f8: 48 83 ec 08 sub $0x8,%rsp
4003fc: e8 7b 00 00 00 callq 40047c <call_gmon_start>
<... snip ... >
0000000000400534 <main>:
400534: 55 push %rbp
400535: 53 push %rbx
400536: 48 81 ec d8 00 00 00 sub $0xd8,%rsp
40053d: 48 89 e7 mov %rsp,%rdi
400540: e8 fb fe ff ff callq 400440 <_setjmp#plt>
400545: ba 08 00 00 00 mov $0x8,%edx
40054a: be 40 00 00 00 mov $0x40,%esi
40054f: bf 7c 06 40 00 mov $0x40067c,%edi
400554: b8 00 00 00 00 mov $0x0,%eax
400559: e8 c2 fe ff ff callq 400420 <printf#plt>
40055e: 48 89 e3 mov %rsp,%rbx
400561: 48 8d 6c 24 40 lea 0x40(%rsp),%rbp
400566: 48 8b 33 mov (%rbx),%rsi
400569: bf 83 06 40 00 mov $0x400683,%edi
40056e: b8 00 00 00 00 mov $0x0,%eax
400573: e8 a8 fe ff ff callq 400420 <printf#plt>
400578: 48 83 c3 08 add $0x8,%rbx
40057c: 48 39 eb cmp %rbp,%rbx
40057f: 75 e5 jne 400566 <main+0x32>
400581: b8 00 00 00 00 mov $0x0,%eax
400586: 48 81 c4 d8 00 00 00 add $0xd8,%rsp
40058d: 5b pop %rbx
40058e: 5d pop %rbp
40058f: c3 retq
<...snip...>

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse