I have just moved to a new place: http://en.chys.info.
All existing posts here have been copied over.
Linux’s vsyscall
It seems obvious that querying the current time cannot be done entirely in userspace. However, strace does not record any system call made by the time function in Linux x86_64. Let’s disassemble glibc:
```
$ objdump -d /lib64/libc-2.9.so | fgrep -A5 '<time>:'
000000000008a510 <time>:
   8a510:   48 83 ec 08             sub    $0x8,%rsp
   8a514:   48 c7 c0 00 04 60 ff    mov    $0xffffffffff600400,%rax
   8a51b:   ff d0                   callq  *%rax
   8a51d:   48 83 c4 08             add    $0x8,%rsp
   8a521:   c3                      retq
```
It seems glibc is redirecting the function call to something at the fixed virtual address 0xffffffffff600400. But what is there?
Then I found out it was the so-called vsyscall (virtual system call) mechanism, which Linux uses to make certain system calls as fast as possible. It does not involve the syscall instruction and is therefore invisible to strace.
The vsyscalls are part of the kernel, but the kernel pages containing them are executable with userspace privileges, and they are mapped at fixed addresses in virtual memory[1].
There are currently 3 vsyscalls in Linux x86_64: gettimeofday, time and getcpu. Their locations in virtual memory can be found with the VSYSCALL_ADDR macro defined in /usr/include/asm/vsyscall.h:
```c
#ifndef _ASM_X86_VSYSCALL_H
#define _ASM_X86_VSYSCALL_H

enum vsyscall_num {
	__NR_vgettimeofday,
	__NR_vtime,
	__NR_vgetcpu,
};

#define VSYSCALL_START (-10UL << 20)
#define VSYSCALL_SIZE 1024
#define VSYSCALL_END (-2UL << 20)
#define VSYSCALL_MAPPED_PAGES 1
#define VSYSCALL_ADDR(vsyscall_nr) (VSYSCALL_START+VSYSCALL_SIZE*(vsyscall_nr))

#endif /* _ASM_X86_VSYSCALL_H */
```
NOTE: We do not need to use vsyscalls explicitly. The corresponding glibc wrappers (for getcpu, it’s sched_getcpu) already take advantage of them.
[1] I really hate Microsoft’s use of the term ‘virtual memory’ to refer to swap files on disk! It once confused me a lot.
Difference between dup(0) and open("/dev/fd/0",...);
I believe APUE (2nd ed.; Sec. 3.16) is not correct.
APUE says that fd = open("/dev/fd/0", mode); is equivalent to fd = dup(0); and that mode is completely ignored. This seems to be the case in Solaris, but not in Linux. (I don’t have access to other Unices at the moment.)
A test program:
```
01 #include <unistd.h>
02 #include <fcntl.h>
03 #include <stdio.h>
04 int main ()
05 {
06     close (0);
07     printf ("%d\n", open ("a.txt", O_RDONLY)); // Should be 0
08     //int f2 = open ("/dev/fd/0", O_WRONLY);
09     int f2 = dup(0);
10     printf ("%d\n", f2);
11     write (f2, "Hello world\n", 12);
12     return 0;
13 }
```
Let’s run the program with an empty a.txt. Certainly the write call in Line 11 is going to fail, because f2 is a duplicate of descriptor 0, which was opened read-only.
Now, let’s comment out Line 9, uncomment Line 8 and try again.
First I ran it in Solaris: the write call still failed, just as APUE tells us.
Then I tried it in Linux, and it succeeded!
It seems that in Linux, open treats /dev/fd/0 as nothing but a normal symlink to a.txt, so it returns a completely new descriptor instead of a duplicate of the old one.
Let’s try it again with a shell script:
```sh
rm -f a.txt
touch a.txt
exec 0<a.txt
exec 3>/dev/fd/0
echo 'Hello world' >&3
cat a.txt
```
Run it in Linux (with DASH or BASH): both output ‘Hello world’.
Run it in Solaris (with the Bourne shell or BASH): both failed, printing nothing (Bourne shell) or failing with ‘Bad file number’ (BASH).
Conclusion:
(1) Solaris handles the /dev/fd/* files specially, as APUE tells us;
(2) Linux simply considers /dev/fd/0 a symlink to the actual file.
(I’ll try later to see how Linux handles open("/dev/fd/0", mode) when the descriptor is an anonymous pipe, a socket, or something else that a normal symlink cannot point to.)
Kernels used in the above tests:
Linux:
Linux desktop 2.6.28-gentoo #4 SMP Mon Jan 12 17:39:23 CST 2009 x86_64 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz GenuineIntel GNU/Linux
Solaris:
SunOS caesar 5.8 Generic_117350-51 sun4u sparc SUNW,Ultra-80 Solaris
gspca in Linux 2.6.27 only works with v4l2, not v4l. This can lead to problems: programs using v4l get strange pictures as well as annoying error messages.
My webcam worked with gspcav1 and Linux 2.6.26, but it failed in Linux 2.6.27 (with its in-kernel gspca drivers):
```
>>cmcapture err -1
cvsync err : Invalid argument
cmcapture: Invalid argument
```
Solution: Install libv4l, and use a command like this:
```sh
LD_PRELOAD=/usr/lib/libv4l/v4l2convert.so skype
```
Reference
Linux kernel bug #11860.
BASH’s compat31 option
Several of my bash scripts failed when I migrated from Debian to Gentoo almost a year ago, because bash interpreted commands like this one differently:
```bash
[[ "$x" =~ '^[0-9]$' ]]
```
This command succeeded in Debian when $x was a single digit, but failed in Gentoo. I had to remove the single quotes surrounding the regular expression to make it work in Gentoo.
Today I finally found the reason: I was using bash 3.1 in Debian and 3.2 in Gentoo. By default, bash 3.2 treats a quoted pattern on the right-hand side of =~ as a literal string rather than a regular expression; however, the 3.1 behavior can be restored with shopt -s compat31.
Leap year bug crashes Zune
Microsoft’s 30GB Zune players fail to work today (Dec 31).
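The problem has been identified: a bug in the Freescale firmware causes an infinite loop on the last day of a leap year. When days is exactly 366 and year is a leap year, neither branch of the loop below changes days, so the loop never terminates.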
```c
year = ORIGINYEAR; /* = 1980 */

while (days > 365) {
    if (IsLeapYear(year)) {
        if (days > 366) {
            days -= 366;
            year += 1;
        }
    } else {
        days -= 365;
        year += 1;
    }
}
```
Oops, if code this poor were found in an airplane or a medical device, it would be terrible.
Migrating to EXT4
Ext4 (formerly known as ext4dev), the successor to ext3, is marked stable in Linux kernel 2.6.28, meaning the Linux kernel team now recommends using ext4 in production.
To convert a file system from ext3 to ext4, use
tune2fs -O extents /dev/DEVand remount the file system as ext4. (Two
e2fsck
runs are recommended before and after tune2fs
.) Some documentations also include the -E test_fs
option. This is not necessary now since ext4 is no longer experimental.Finally do not forget to modify
/etc/fstab
.An ext4 file system created this way is not a “true” ext4 - the extents feature, the main advantage of ext4 comapred to ext3, is not automatically applied to old files. New files created afterwards are in the extents format.
Unlike ext3, which is 100% backward compatible with ext2, an ext4 file system can no longer be mounted as ext3 unless the extents feature is disabled. (If you want to disable extents, why not simply use ext3?)