ebuild.sh had pipelines like this:
find ..... -print0 |
while read -r -d $'\0' x; do
    # Do something with file $x
done
This makes it possible to handle any strange filenames correctly, even if the filename contains newline ('\n') or carriage return ('\r') characters. (Some other commands, including sort and xargs, have options to make the null character the delimiter for the same reason.)
Because BASH internally uses C-style strings, in which '\0' is the terminator, read -d $'\0' is essentially equivalent to read -d ''. This is why I believed read did not accept null-delimited strings. However, it turns out that BASH actually handles this correctly.
I checked BASH's source code and found the delimiter was simply determined by
delim = *list_optarg;
(bash-3.2/builtins/read.def, line 296) where list_optarg points to the argument following -d. Therefore, it makes no difference to the value of delim whether $'\0' or '' is used.
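To make the point above concrete, here is a small sketch (not taken from ebuild.sh) that creates a file whose name contains a newline and counts the names three ways. The temporary directory, the file names and the counter variables are made up for illustration, and the loops read from a process substitution so that the counters remain visible after the loop.

```shell
#!/usr/bin/env bash
# Sketch: null-delimited reading of file names that contain a newline.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.txt" "$tmpdir/with"$'\n'"newline.txt"

# $'\0' ends up as an empty string, because a C string stops at the first NUL.
nul_arg=$'\0'
printf 'length of the -d argument: %d\n' "${#nul_arg}"

# Delimiter written as $'\0'.
count_explicit=0
while IFS= read -r -d $'\0' x; do
    count_explicit=$((count_explicit + 1))
done < <(find "$tmpdir" -type f -print0)

# Delimiter written as ''.
count_empty=0
while IFS= read -r -d '' x; do
    count_empty=$((count_empty + 1))
done < <(find "$tmpdir" -type f -print0)

# For contrast: newline-delimited reading splits the odd name in two.
count_lines=0
while IFS= read -r x; do
    count_lines=$((count_lines + 1))
done < <(find "$tmpdir" -type f)

echo "explicit=$count_explicit empty=$count_empty lines=$count_lines"
rm -rf "$tmpdir"
```

Both null-delimited loops should see exactly the two files, while the newline-delimited loop reports one entry too many.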
This is fantastic, thank you. I've encountered this problem in years past, and have been struggling with it the past couple days. If only bash's "for" loop could be used this way, it would be more elegant.
Don't forget to quote the $x as in "$x" :)
Hmm, I discovered this same thing today myself, then I realized: it is still not safe! It appears to handle the null just fine, but it cannot handle trailing newlines; they are still gobbled up. Someone should fix 'read' so that it does not gobble them when the delimiter is null.
The 'readarray' command strangely does support trailing newlines (it actually keeps them in the array), but since it does not support defining an alternate delimiter (such as null) it is of no use!!!
But ultimately, it would be better if IFS supported null; then you could actually assign the values to an array without losing your context (a problem which makes 'readarray' and 'read' way less useful even if they worked properly).
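For readers on a newer Bash: versions 4.4 and later give mapfile/readarray a -d option, and an empty delimiter there means the NUL character. The sketch below assumes such a version; the sample data and the array name are invented for the example.

```shell
#!/usr/bin/env bash
# Assumes Bash 4.4 or later, where mapfile/readarray accept -d.
names=()
# -d '' splits on NUL; -t drops the trailing NUL from each element.
mapfile -d '' -t names < <(printf 'one\0two\n\0 three \0')

printf 'count=%d\n' "${#names[@]}"
# Trailing newlines and surrounding spaces survive inside the elements.
printf '<%s>\n' "${names[@]}"
```

Because the array is filled in the current shell, it is still available after the command, which seems to be the "context" problem described above.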
If you don't need to split the line, just its whole content (i.e. when reading filenames), you can use this:
IFS=;
find -print0 | while read -r -d $'\0' X
Writing it as
find -print0 | while IFS= read -r -d $'\0' X
avoids affecting $IFS outside the loop.
Very useful for reading /proc/$$/cmdline
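A minimal sketch of that use, assuming a Linux system where /proc/<pid>/cmdline stores the process arguments separated by NUL bytes. The array name args is only illustrative, and the guard skips the loop where the file does not exist.

```shell
#!/usr/bin/env bash
# Linux-specific sketch: read this shell's own command line, one argument per NUL.
args=()
if [ -r "/proc/$$/cmdline" ]; then
    while IFS= read -r -d '' arg; do
        args+=("$arg")
    done < "/proc/$$/cmdline"
fi
printf '%d argument(s) found\n' "${#args[@]}"
```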
Thanks, this post helps me much.
ReplyDeleteNice find, but sadly read still trims values, so filenames cannot have spaces at the end:
echo -e ' test \0 string ' |
{ read -rd '' s; read -rd '' x; echo "-$s- -$x-" | od -tx1z; }
0000000 2d 74 65 73 74 2d 20 2d 73 74 72 69 6e 67 2d 0a >-test- -string-.<
But sometimes there is a workaround, such as appending a sentinel character and stripping it afterwards:
find . -printf "%p.\\0" |
while read -rd '' name;
do name="${name%.}";
echo "-$name-";
done
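The trimming seen above appears to come from the default IFS rather than from the null delimiter. The following sketch compares a plain read with one that has IFS set to the empty string for that command; the variable names trimmed and kept are made up for the comparison.

```shell
#!/usr/bin/env bash
# With the default IFS, read strips leading and trailing whitespace.
# An empty IFS for the read command keeps the value untouched.
{
    read -r -d '' trimmed
    IFS= read -r -d '' kept
} < <(printf ' test \0 string \0')

# "--" stops printf from treating the leading "-" of the format as an option.
printf -- '-%s- -%s-\n' "$trimmed" "$kept"
```

With IFS= in place, the sentinel trick is only needed for tools that cannot emit a clean null-terminated list.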
Hi, you're mistaken: $'\0' will never be a valid command-line argument, because the NUL character is not valid in command lines or variables in bash. What happens here is that \0 is silently removed, and what you are actually running is:
read -r -d '' X
which read happens to interpret as splitting on the NUL character. You can try your example with $'\0' replaced by ''; it will work the same.
Remember: variables and command-line arguments can't hold NUL characters: they are silently skipped (it's the only character they can't hold). Of course, pipes support all binary data, so you can write or read NUL characters.
Oops, just saw your last paragraph about this! ;) A good lesson that I should read the entire post carefully before answering ;)
However, being explicit about it, using -d $'\0' rather than '', makes it obvious what you're expecting the delimiter to be. Readability!
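A quick way to check the claims in this thread is to push a NUL through a variable and through a pipe. This is only a sketch with made-up variable names. Newer Bash versions print a warning on stderr when a command substitution drops a NUL, so that stream is silenced here.

```shell
#!/usr/bin/env bash
# Command substitution: the NUL byte is dropped, the rest is kept.
{ value=$(printf 'a\0b'); } 2>/dev/null

# ANSI-C quoting: the string stops at the NUL, so only "a" survives.
literal=$'a\0b'

# A pipe carries all three bytes, NUL included.
bytes=$(( $(printf 'a\0b' | wc -c) ))

printf 'value=%s literal_length=%d bytes=%d\n' "$value" "${#literal}" "$bytes"
```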
if [[ -z $(read -r -p "Hi there null:" imNULL) ]]; then
    echo "true"
else
    echo "you are not nothing you are nothing not even null"
fi
Thanks for the tip! I was playing around with this some more and I have a slight possible improvement. In your example, the "while" command runs in a subshell. So if the commands try to set variables, the rest of the script won't see them. Instead, you can make the left hand side run in a subshell instead, like this:
while IFS="" read -r -d $'\0' x ; do
    echo ">>$x<<"
    last_found="$x"
done < <( find -name \*.txt -print0 )
echo "last_found = $last_found"
IFS="" stops the line being read being broken up into words.
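To see the difference side by side, this sketch counts the same two null-terminated items once through a pipeline and once through process substitution. It assumes Bash's default settings (the lastpipe option is off); the counter names are invented for the example.

```shell
#!/usr/bin/env bash
# Pipeline: the while loop runs in a subshell, so its counter is lost.
seen_pipe=0
printf 'a\0b\0' | while IFS= read -r -d '' x; do
    seen_pipe=$((seen_pipe + 1))
done

# Process substitution: the loop runs in the current shell, so the counter survives.
seen_procsub=0
while IFS= read -r -d '' x; do
    seen_procsub=$((seen_procsub + 1))
done < <(printf 'a\0b\0')

echo "pipe=$seen_pipe procsub=$seen_procsub"
```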