Linux’s vsyscall

It seems obvious that querying the current time cannot be done entirely in userspace. However, strace does not record any system call made by the time function on Linux x86_64.

Let’s disassemble glibc:
$ objdump -d /lib64/libc-2.9.so | fgrep -A5 '<time>:'
000000000008a510 <time>:
   8a510:       48 83 ec 08             sub    $0x8,%rsp
   8a514:       48 c7 c0 00 04 60 ff    mov    $0xffffffffff600400,%rax
   8a51b:       ff d0                   callq  *%rax
   8a51d:       48 83 c4 08             add    $0x8,%rsp
   8a521:       c3                      retq

It seems glibc is redirecting the function call to something fixed at virtual address 0xffffffffff600400. But what is there?

It turns out to be the so-called vsyscall (virtual system call) mechanism, which Linux uses to make certain system calls as fast as possible. A vsyscall is an ordinary function call into that fixed address; it does not execute the syscall instruction and is therefore invisible to strace.

The vsyscalls are part of the kernel, but the kernel pages containing them are executable with userspace privileges, and they are mapped at fixed addresses in virtual memory[1].

There are currently 3 vsyscalls on Linux x86_64: gettimeofday, time and getcpu. Their addresses in virtual memory can be computed with the VSYSCALL_ADDR macro defined in /usr/include/asm/vsyscall.h:
#ifndef _ASM_X86_VSYSCALL_H
#define _ASM_X86_VSYSCALL_H

enum vsyscall_num {
    __NR_vgettimeofday,
    __NR_vtime,
    __NR_vgetcpu,
};

#define VSYSCALL_START (-10UL << 20)
#define VSYSCALL_SIZE 1024
#define VSYSCALL_END (-2UL << 20)
#define VSYSCALL_MAPPED_PAGES 1
#define VSYSCALL_ADDR(vsyscall_nr) (VSYSCALL_START+VSYSCALL_SIZE*(vsyscall_nr))


#endif /* _ASM_X86_VSYSCALL_H */
NOTE: There is no need to call the vsyscalls explicitly. The corresponding glibc wrappers (sched_getcpu in the case of getcpu) already take advantage of them.


[1] I really hate Microsoft’s use of the term ‘virtual memory’ to refer to swap files on disk! It once confused me a great deal.

4 comments:

  1. I wouldn't actually call those "system calls", and I would say that they are in fact done "completely in userspace": you can do getcpu() with the CPUID instruction, and get the time with RDTSC ("read from timestamp counter"), both of which can execute in userspace (although it looks like prctl can disable RDTSC if you want to explicitly do so).

  2. Microsoft is right: virtual memory is for providing more memory than is physically available. Swapping is the mechanism for achieving it.

    You could say "virtual memory space" instead of "virtual memory" (if you don't want to be confused).
