END{ for (x=.14; x<=.15; x+=.01) { print "x is", x; } for (y=.04; y<=.05; y+=.01) { print "y is", y; } }

What do you expect its output to be? Well, on quite a few different machines, it's:

% awk 'END{ for (x=.14; x<=.15; x+=.01) { print "x is", x; } for (y=.04; y<=.05; y+=.01) { print "y is", y;}}' /dev/null
x is 0.14
y is 0.04
y is 0.05

What explains this? (It's not awk-specific, by the way; try the equivalent C program, for instance.)
Solution hint: how are numbers represented on a computer? (I am a fan of CS322.)
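To see the hint at work, here is a minimal sketch that prints the sums involved to 17 significant digits, enough to tell neighboring double-precision values apart. The helper name show17 is made up for this illustration, and the outputs in the comments assume awk stores numbers as IEEE 754 doubles, which is the usual case.

```shell
#!/bin/sh
# show17 A B: print A + B with 17 significant digits.
# (show17 is a hypothetical helper, not part of awk.)
# LC_ALL=C keeps the decimal point a "." whatever the user's locale is.
show17() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.17g\n", a + b }'
}

show17 .14 .01   # 0.15000000000000002  (what x becomes after x += .01)
show17 .15 0     # 0.14999999999999999  (the loop bound .15 as stored)
show17 .04 .01   # 0.050000000000000003 (what y becomes after y += .01)
show17 .05 0     # 0.050000000000000003 (the loop bound .05 as stored)
```

So .14 + .01 lands just above the stored .15 and the x loop stops after one pass, while .04 + .01 lands exactly on the stored .05 and the y loop runs twice. Counting in integers and scaling, e.g. for (i=14; i<=15; i++) with x = i/100, is one way to avoid depending on how the sums round.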
echo "the unix" | awk '/[W-Z]/{print "yes"}'

where you might not expect any output. And yet, on my home linux machine,

% echo "the unix" | awk '/[W-Z]/{print "yes"}'
yes

Huh? Diagnostics:

% awk --version | head -1
GNU Awk 3.1.3

Running at Cornell on an entirely different linux machine:

% echo "the unix" | awk '/[W-Z]/{print "yes"}'
yes

Back home:

% echo "the unix" | awk '/[X-Z]/{print "yes"}'
%

Wha?!! And then, on a mac laptop:

% echo "the unix" | awk '/[W-Z]/{print "yes"}'
%

Whoa. Is it a mac vs. unix thing?
Solution: A Google search for gawk problem uppercase range yields a promising-snippet page:
http://ftp.wayne.edu/pub/gnu/Manuals/gawk-3.1.0/html_chapter/gawk_4.html

However, I get 404'ed. Luckily, there's still the cached version around (thank you, Google!):

http://216.239.51.104/search?q=cache:JAQwH7uqCbwJ:ftp.wayne.edu/pub/gnu/Manuals/gawk-3.1.0/html_chapter/gawk_4.html+gawk+problem+uppercase+range&hl=en&ct=clnk&cd=18&gl=us

So one can search around for a "clean" version, e.g.

http://www.delorie.com/gnu/docs/gawk/gawk_29.html

which is the gawk user's guide. It says:

Within a character list, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, using the locale's collating sequence and character set. For example, in the default C locale, `[a-dx-z]' is equivalent to `[abcdxyz]'. Many locales sort characters in dictionary order, and in these locales, `[a-dx-z]' is typically not equivalent to `[abcdxyz]'; instead it might be equivalent to `[aBbCcDdxXyYz]', for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value `C' [or POSIX is a possible choice, too -- LL.].

Indeed,
% echo "the unix" | awk '/[WXYZ]/{print "yes"}'
%
% echo 3 | awk '{printf("X\nx\nW\nY\nZ\n")}' | sort
W
x
X
Y
Z
%
% setenv LC_COLLATE POSIX
% echo 3 | awk '{printf("X\nx\nW\nY\nZ\n")}' | sort
W
X
Y
Z
x
% echo "the unix" | awk '/[W-Z]/{print "yes"}'
%
And this is the behavior one had hoped for ...
Phew!
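The session above uses setenv, which is csh/tcsh syntax. In a Bourne-style shell (sh, bash) the same effect can be had for a single command by prefixing it with a variable assignment. The following is only a sketch: the function name matches_upper_WZ is invented for illustration, and it uses LC_ALL (which overrides LC_COLLATE) so that the result does not depend on whatever else is set in the environment.

```shell
#!/bin/sh
# matches_upper_WZ TEXT: print "yes" if TEXT contains one of W, X, Y, Z,
# forcing the C locale for this one awk invocation only.
# (matches_upper_WZ is a hypothetical helper name.)
matches_upper_WZ() {
    printf '%s\n' "$1" | LC_ALL=C awk '/[W-Z]/ { print "yes" }'
}

matches_upper_WZ "the unix"    # prints nothing: no uppercase W-Z here
matches_upper_WZ "the uniX"    # prints: yes

# The same per-command override applied to sort gives plain ASCII order,
# with all uppercase letters ahead of the lowercase x.
printf 'X\nx\nW\nY\nZ\n' | LC_ALL=C sort
```

Setting the variable on the command line this way leaves the rest of the shell session in its original locale, which is handy when only one pipeline needs the traditional range behavior.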
Back to Lillian Lee's home page.