Linux’s vsyscall

It is obvious that querying the current time cannot be done entirely in userspace. Yet on Linux x86_64, strace records no system call at all when a program calls the time function.

Let’s disassemble glibc:
$ objdump -d /lib64/libc-2.9.so | fgrep -A5 '<time>:'
000000000008a510 <time>:
   8a510:       48 83 ec 08             sub    $0x8,%rsp
   8a514:       48 c7 c0 00 04 60 ff    mov    $0xffffffffff600400,%rax
   8a51b:       ff d0                   callq  *%rax
   8a51d:       48 83 c4 08             add    $0x8,%rsp
   8a521:       c3                      retq

It seems glibc redirects the call to code at the fixed virtual address 0xffffffffff600400. But what is there?

I then found out that this is the so-called vsyscall (virtual system call) mechanism, which Linux uses to make certain system calls as fast as possible. A vsyscall does not execute the syscall instruction, so no real system call takes place, and strace, which only sees real system-call entries and exits, ignores it.

The vsyscalls are part of the kernel, but the pages containing them are executable with userspace privileges, and they are mapped at fixed addresses in virtual memory[1].

There are currently three vsyscalls on Linux x86_64: gettimeofday, time and getcpu. Their virtual addresses are given by the VSYSCALL_ADDR macro defined in /usr/include/asm/vsyscall.h:
#ifndef _ASM_X86_VSYSCALL_H
#define _ASM_X86_VSYSCALL_H

enum vsyscall_num {
    __NR_vgettimeofday,
    __NR_vtime,
    __NR_vgetcpu,
};

#define VSYSCALL_START (-10UL << 20)
#define VSYSCALL_SIZE 1024
#define VSYSCALL_END (-2UL << 20)
#define VSYSCALL_MAPPED_PAGES 1
#define VSYSCALL_ADDR(vsyscall_nr) (VSYSCALL_START+VSYSCALL_SIZE*(vsyscall_nr))


#endif /* _ASM_X86_VSYSCALL_H */
NOTE: We do not need to use vsyscalls explicitly. The corresponding glibc wrappers (for getcpu, it’s sched_getcpu) already take advantage of them.


[1] I really hate Microsoft’s use of the term ‘virtual memory’ to refer to swap files on disk! It once confused me so much.

3 comments:

  1. I wouldn't actually call those "system calls", and I would say that they are in fact done "completely in userspace": you can do getcpu() with the CPUID instruction, and get the time with RDTSC ("read from timestamp counter"), both of which can execute in userspace (although it looks like prctl can disable RDTSC if you want to explicitly do so).

  2. rdtsc is not recommended on SMP or multi-core systems as there can be skew across individual cores. If your process is migrated from one core to another, you could actually see time move backwards! :)

    I believe gettimeofday is the recommended method and should be very fast.

  3. Microsoft is right: virtual memory is for providing more memory than is physically available. Swapping is the mechanism for achieving it.

    You could say "Virtual memory space" instead of "Virtual memory" (if you do not want to be confused).
