mov %edi, %edi

Here is a simple C function:
long foo (unsigned a, unsigned b)
{
    return ((long)b<<32)|a;
}
Compile it with a GCC that targets x86-64, with optimizations enabled (-O2, for example), and you get the following instructions (in AT&T-style assembly):
foo:
        movq    %rsi, %rax
        mov     %edi, %edi
        salq    $32, %rax
        orq     %rdi, %rax
        ret
Pay attention to the second instruction, ‘mov %edi, %edi’. Read literally, it assigns the value of register edi to register edi. Five years ago, anybody would have agreed that this instruction does nothing, just like a nop. But on an x86-64 system, this is not the case.

In x86-64 assembly, any instruction that writes a 32-bit register also zeroes the upper 32 bits of the corresponding 64-bit register. Consequently, the effect of ‘mov %edi, %edi’ is to zero bits 32 to 63 of register rdi while leaving the lower 32 bits (i.e., register edi) unchanged.
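The zeroing rule can be modelled in plain C. The following is only a sketch: write_reg32, mov_edi_edi, foo_model and check_zeroing_model are hypothetical names invented for illustration, and the model assumes that the upper half of rdi may hold arbitrary garbage when a 32-bit argument arrives, which is presumably why the compiler emits the mov at all.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an x86-64 write to a 32-bit register: the low half
   receives the new value and the upper half of the 64-bit register is
   cleared, whatever it held before. */
uint64_t write_reg32(uint64_t old_reg64, uint32_t new_low32)
{
    (void)old_reg64;            /* the previous upper 32 bits are discarded */
    return (uint64_t)new_low32; /* zero-extension, not sign-extension */
}

/* Models `mov %edi, %edi`: read the low 32 bits of rdi and write them back. */
uint64_t mov_edi_edi(uint64_t rdi)
{
    return write_reg32(rdi, (uint32_t)rdi);
}

/* Models the body of foo, one statement per instruction. */
int64_t foo_model(uint64_t rdi, uint64_t rsi)
{
    uint64_t rax = rsi;     /* movq %rsi, %rax */
    rdi = mov_edi_edi(rdi); /* mov  %edi, %edi */
    rax <<= 32;             /* salq $32, %rax: upper bits of rsi fall off */
    rax |= rdi;             /* orq  %rdi, %rax */
    return (int64_t)rax;
}

/* Small self-check; the garbage value 0xdeadbeef is made up for the test. */
void check_zeroing_model(void)
{
    assert(mov_edi_edi(0xdeadbeef12345678ULL) == 0x12345678ULL);
    assert(foo_model(0xdeadbeef00000001ULL, 2) == 0x200000001LL);
}
```

Note that the model needs no matching clean-up for rsi: the left shift by 32 already pushes its upper half out of the register.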

One may want to rewrite it with a more intuitive ‘and’ instruction:
andq $0xffffffff, %rdi
But this does NOT assemble! The immediate operand of andq is at most 32 bits wide and is sign-extended to 64 bits, so $0x00000000ffffffff is not representable in that signed 32-bit format. 64-bit immediates are currently allowed only in mov instructions whose destination is a general-purpose register (such a mov is usually written explicitly as movabsq). So if one must use and, one needs something like this:
movl $0xffffffff, %eax
andq %rax, %rdi
Remember the zeroing rule for operations on 32-bit registers: it makes ‘movl $0xffffffff, %eax’ equivalent to ‘movabsq $0xffffffff, %rax’...
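To see why the immediate form fails while the two-instruction form works, here is a rough C model. It is a sketch under stated assumptions: sign_extend_imm32, movl_imm, clear_upper_with_and and check_imm_model are hypothetical helpers, not anything the assembler or the CPU exposes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: widen a 32-bit immediate the way a 64-bit
   instruction such as andq does, by copying bit 31 into bits 32..63. */
uint64_t sign_extend_imm32(uint32_t imm)
{
    if (imm & 0x80000000u)
        return 0xffffffff00000000ULL | imm;
    return imm;
}

/* Models `movl $imm, %eax`: a 32-bit write, so the upper half of rax
   ends up zero. */
uint64_t movl_imm(uint32_t imm)
{
    return (uint64_t)imm;
}

/* Models the pair `movl $0xffffffff, %eax` / `andq %rax, %rdi`. */
uint64_t clear_upper_with_and(uint64_t rdi)
{
    uint64_t rax = movl_imm(0xffffffffu);
    return rdi & rax;
}

void check_imm_model(void)
{
    /* As a sign-extended imm32, 0xffffffff would become an all-ones mask,
       which clears nothing. */
    assert(sign_extend_imm32(0xffffffffu) == 0xffffffffffffffffULL);
    /* Values with bit 31 clear survive the widening unchanged. */
    assert(sign_extend_imm32(0x7fffffffu) == 0x7fffffffULL);
    /* The register form yields the intended mask. */
    assert(clear_upper_with_and(0xdeadbeef12345678ULL) == 0x12345678ULL);
}
```

In this model the only 32-bit pattern that could stand in for the mask would widen to all ones, so the mask has to reach the and through a register instead.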

X86-64 assembly really is too ugly, at least in this sense...

