Discussion:
Broken silent glibc-specific assumptions uncovered by musl
Rich Felker
2013-05-17 17:37:10 UTC
Hi all,

There's been at least one request to put together a list of "silent"
application bugs uncovered by building applications against musl that
were previously used mainly or only with glibc. By silent, I mean bugs
that are not easily caught as configure- or compile-time errors, but
which cause the application to misbehave at runtime.

I'm writing down here what I can think of off-hand. This list should
probably be expanded by the community and perhaps put on the wiki.
Here's what I have so far:



Assuming that dlerror is thread-local. (POSIX previously required it
to be global; as of 2008-TC1, either behavior is allowed.)

Assuming dlclose actually unloads a library (and calls dtors), so that
a future dlopen will reset static objects to their initial state (and
re-run ctors). (POSIX leaves this implementation-defined, and
unloading is impossible to do safely in general, so robust
implementations will not do it.)
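
A minimal sketch of one way a module can avoid depending on being unloaded: it exposes an explicit init function that resets its own state, so the caller does not need dlclose plus a later dlopen to get a fresh start. The names `counter_init` and `counter_bump` are hypothetical and only illustrate the pattern.

```c
#include <assert.h>

/* Hypothetical module state. It is reset explicitly by counter_init()
 * instead of relying on the library being unloaded and its static
 * objects returning to their initial values on reload. */
struct counter_state {
    int initialized;
    unsigned long calls;
};

static struct counter_state state;

/* Safe to call any number of times; always restores the initial state. */
void counter_init(void)
{
    state.calls = 0;
    state.initialized = 1;
}

/* Returns how many times it has been called since the last init. */
unsigned long counter_bump(void)
{
    assert(state.initialized);
    return ++state.calls;
}
```

A host program would call `counter_init()` each time it "re-loads" the module, whether or not the library was actually unmapped in between.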

Making wrong assumptions about fsync and fdatasync. (I'm not familiar
with this issue so somebody else will have to fill it in.)

Calling exit from global destructors. (If an application calls exit
more than once, the behavior is undefined.)

Assuming pthread_cancel unwinds and calls destructors. (Interaction
between cancellation and C++ is undefined.)

Use of GNU extensions in regular expressions, especially
backslash-prefixed versions of ERE operators in BRE. (Undefined.)
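
As an illustration of the portable alternative (a sketch, not taken from any particular application): instead of writing a BRE such as `ab\+c`, compile the pattern as an ERE with REG_EXTENDED and use the unescaped operator. The helper name `matches_ere` is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the ERE pattern, 0 if it does not,
 * and -1 if the pattern fails to compile. */
int matches_ere(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_EXTENDED selects POSIX ERE syntax, where +, ? and | are
     * ordinary operators and need no backslash. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```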

Assuming iconv reports characters that cannot be represented in the
dest charset via EILSEQ. (This behavior is non-conforming; POSIX
requires an implementation-defined replacement and positive return
value in this case.)
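
A hedged sketch of a caller that copes with both behaviors: it treats a positive return value from iconv (the POSIX-described case) and an EILSEQ failure (the glibc behavior described above) as "not losslessly convertible". The function name `convert_strict` and its return convention are assumptions made for this example. Note that EILSEQ is also what iconv reports for invalid input, so this helper does not distinguish the two.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Converts the NUL-terminated string `in` from charset `from` to `to`.
 * Returns 0 if every character was converted exactly,
 *         1 if some character was unrepresentable (or input was invalid),
 *        -1 on any other error.
 * `out` is always NUL-terminated; its content is only meaningful
 * when the result is 0. */
int convert_strict(const char *to, const char *from,
                   const char *in, char *out, size_t outsz)
{
    assert(to && from && in && out && outsz > 0);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;          /* iconv takes a non-const pointer */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsz - 1;      /* keep room for the terminator */

    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    int saved_errno = errno;
    iconv_close(cd);
    *outp = '\0';

    if (r == (size_t)-1)
        return saved_errno == EILSEQ ? 1 : -1;

    /* A positive count means that many characters were replaced in an
     * implementation-defined, non-reversible way. */
    return r > 0 ? 1 : 0;
}
```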

Use of deprecated charset aliases with iconv_open, for example, using
"UNICODE" to mean UCS-2. (The list of charsets is
implementation-defined, but common sense dictates using the IANA
preferred MIME charset names, and especially not misleading names.)
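
For illustration, a small sketch that asks for an explicit, unambiguous charset name ("UTF-16LE") rather than an alias like "UNICODE". The helper name `utf8_to_utf16le` is hypothetical, and the sketch assumes the iconv implementation provides both "UTF-8" and "UTF-16LE".

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Converts a NUL-terminated UTF-8 string to UTF-16LE.
 * Returns the number of bytes written to `out`, or (size_t)-1 on error. */
size_t utf8_to_utf16le(const char *in, unsigned char *out, size_t outsz)
{
    assert(in != NULL && out != NULL);

    /* The name states both the encoding and the byte order. */
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;
    size_t inleft = strlen(in);
    char *outp = (char *)out;
    size_t outleft = outsz;

    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (r == (size_t)-1)
        return (size_t)-1;
    return outsz - outleft;
}
```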
Szabolcs Nagy
2013-05-18 09:18:20 UTC
Post by Rich Felker
Making wrong assumptions about fsync and fdatasync. (I'm not familiar
with this issue so somebody else will have to fill it in.)
apparently i didn't remember this correctly: it's
just an old-linux vs new-linux break:
linux used to implement fsync,O_SYNC with the
same guarantees as fdatasync,O_DSYNC, so many
applications use fsync when they actually mean
fdatasync (faster, no mtime sync)

then some filesystems started to support O_SYNC
properly with a new flag, but it took some time
to trickle down as distros often use older libc
(where the O_DSYNC and O_SYNC definition was the
same so even on new kernel the applications kept
getting the old flag, hence i remembered it as
a libc issue: when i first tried musl on some
storage application it was significantly slower
due to this difference)
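
a rough sketch of the distinction, assuming the application only needs the file data to be durable and does not care about metadata such as mtime: it calls fdatasync explicitly instead of fsync. The function name `save_data` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Writes `len` bytes to `path` and flushes the file data to storage.
 * Returns 0 on success, -1 on failure. */
int save_data(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted, retry the write */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* fdatasync flushes the data (and only the metadata needed to read
     * it back); fsync would additionally flush metadata such as mtime. */
    if (fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```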

other runtime differences:
- "%Ld" instead of "%lld" as mentioned by sh4rm4 on irc
- lfs64 problems: eg printing off_t with "%d"
- serializing abi incompatible structures with (char*) cast
- relying on some locale specific behaviour (LC_NUMERIC)
- /proc fs issue with writev in musl stdio
- relying on LD_* or other env vars for glibc or the loader
- relying on /etc/* files used by glibc or the loader
- dlopen with RTLD_LAZY
- timezone files are not yet supported in musl
- crypt sha2 with long key input
- using constructors with priority gcc extension
- relying on the random generator algorithm to be the same
- musl's err does not print __progname, it might annoy one
- musl has some stubs
- ppc double-double long double
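
for the two printf items, a small sketch of the portable forms: "%lld" for long long (the L modifier is only defined for long double conversions) and "%jd" with a cast to intmax_t for off_t, whose width depends on how the program was built. the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Formats a long long with the standard "ll" length modifier.
 * Returns the snprintf result. */
int format_long_long(char *buf, size_t size, long long value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%lld", value);
}

/* Formats an off_t without assuming its width: cast to intmax_t
 * and print with "%jd". */
int format_offset(char *buf, size_t size, off_t off)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%jd", (intmax_t)off);
}
```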
Daniel Cegiełka
2013-05-18 09:31:22 UTC
Post by Szabolcs Nagy
- musl's err does not print __progname, it might annoy one
http://git.musl-libc.org/cgit/musl/commit/?id=b4ea63856a6af3d1bcc2db12537785371ac2024c

Daniel
Rich Felker
2013-05-18 14:15:51 UTC
Post by Szabolcs Nagy
- "%Ld" instead of "%lld" as mentioned by sh4rm4 on irc
Yes.
Post by Szabolcs Nagy
- lfs64 problems: eg printing off_t with "%d"
Is this common? I would think most apps now are built with 64-bit
off_t, at least in distros, since mixing can be dangerous.
Post by Szabolcs Nagy
- serializing abi incompatible structures with (char*) cast
Have you seen examples of this?
Post by Szabolcs Nagy
- relying on some locale specific behaviour (LC_NUMERIC)
Do you mean relying on being able to request specific non-default
behavior through a hard-coded locale name? Or something else?
Post by Szabolcs Nagy
- /proc fs issue with writev in musl stdio
Yes, basically this is "assumptions about how stdio writes translate
into underlying writes to the file descriptor".

We might also add the issue that glibc incorrectly allows reads after
the EOF flag is set, and some apps might depend on this. I don't think
we've encountered any that do, but the glibc excuse for not fixing the
bug is that some might.
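
As a sketch of what an application that wants to keep reading a growing file can do without depending on that behavior: clear the stream's EOF indicator with clearerr before trying again. Both function names below are hypothetical, and `demo_follow` exists only to exercise the helper with a writer and a reader on the same file.

```c
#include <assert.h>
#include <stdio.h>

/* Reads one more character from a stream that may already have its
 * EOF indicator set. */
int fgetc_retry(FILE *f)
{
    assert(f != NULL);
    if (feof(f))
        clearerr(f);     /* reset the sticky EOF indicator first */
    return fgetc(f);
}

/* Demo: the reader hits EOF, the writer appends one byte, and the
 * reader then picks it up. Returns that byte, or EOF on failure. */
int demo_follow(const char *path)
{
    FILE *w = fopen(path, "w");
    if (w == NULL)
        return EOF;

    FILE *r = fopen(path, "r");
    if (r == NULL) {
        fclose(w);
        return EOF;
    }

    int got = EOF;
    if (fputc('a', w) != EOF && fflush(w) == 0 &&
        fgetc(r) == 'a' && fgetc(r) == EOF &&   /* reader is now at EOF */
        fputc('b', w) != EOF && fflush(w) == 0)
        got = fgetc_retry(r);

    fclose(r);
    fclose(w);
    remove(path);
    return got;
}
```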
Post by Szabolcs Nagy
- relying on LD_* or other env vars for glibc or the loader
We do support the main ones that could be used reasonably by
applications: LD_PRELOAD and LD_LIBRARY_PATH. Most of the others are
for debugging/"audit" stuff, I think.
Post by Szabolcs Nagy
- relying on /etc/* files used by glibc or the loader
Examples?
Post by Szabolcs Nagy
- dlopen with RTLD_LAZY
"Assuming that undefined function references in loaded libraries will
not produce an error as long as another library is loaded to satisfy
the reference before the first use, or the function is never used."
Post by Szabolcs Nagy
- timezone files are not yet supported in musl
Yes, this is not so much relying on a glibc bug/quirk though, just
musl being incomplete in this area.
Post by Szabolcs Nagy
- crypt sha2 with long key input
Have you seen examples of this? Or is it just theoretical?
Post by Szabolcs Nagy
- using constructors with priority gcc extension
Do you know if musl just ignores the order, or fails to run them at
all?
Post by Szabolcs Nagy
- relying on the random generator algorithm to be the same
I doubt applications directly make this assumption, but for programs
that let you generate random images/sounds/etc. and give you the seed
as a way of reproducing the same output again, seeds would not
necessarily be compatible between different systems. Such programs
really should be using their own prngs, however.
Post by Szabolcs Nagy
- musl's err does not print __progname, it might annoy one
This should be fixed now that we have __progname, but I don't think it
_breaks_ anything.
Post by Szabolcs Nagy
- musl has some stubs
Yes, this too falls under musl deficiencies, though, at least in most
cases. I wonder if anyone feels up to making a list of stubs to
discuss which ones should be de-stub-ified.
Post by Szabolcs Nagy
- ppc double-double long double
I really doubt anyone depends on this or even wants it.. but is ppc
really using double-double still? The gcc docs make it sound like they
switched to IEEE quad when they made long double 128-bit, ignoring
what IBM did, but the glibc people seem to consider it double-double.

Rich
Szabolcs Nagy
2013-05-18 22:51:47 UTC
Post by Rich Felker
Post by Szabolcs Nagy
- lfs64 problems: eg printing off_t with "%d"
Is this common? I would think most apps now are built with 64-bit
off_t, at least in distros, since mixing can be dangerous.
no, but when i checked the sabotage patches i saw a lot
of printf fixes (none of them are lfs64 related though)
and i was thinking about legitimate reasons why printf
would break and this came to mind
Post by Rich Felker
Post by Szabolcs Nagy
- serializing abi incompatible structures with (char*) cast
Have you seen examples of this?
i've seen a lot of fragile serialization code
but they did not work with obscure libc structures
so it is probably not relevant
Post by Rich Felker
Post by Szabolcs Nagy
- relying on some locale specific behaviour (LC_NUMERIC)
Do you mean relying on being able to request specific non-default
behavior through a hard-coded locale name? Or something else?
i mean behaviour of shell utils:
eg. sort -n checks for the locale specific decimal and thousand
separators so it can behave differently with glibc vs musl
in the same environment

it shouldn't break things and is not unexpected, but it is an
observable runtime behaviour difference
Post by Rich Felker
Post by Szabolcs Nagy
- relying on LD_* or other env vars for glibc or the loader
We do support the main ones that could be used reasonably by
applications: LD_PRELOAD and LD_LIBRARY_PATH. Most of the others are
for debugging/"audit" stuff, I think.
yes other ld_ stuff does not seem useful

there are many envvars used by glibc, eg now i found these:
TMPDIR (tempname)
MALLOC_* (malloc debug)
CHARSET (toutf8)
LANGUAGE (gettext)
DATEMSK (getdate)
MSGVERB (fmtmsg)
SEV_LEVEL (fmtmsg)
LOCPATH (newlocale)
_<PID>_GNU_nonoption_argv_flags_ (getopt)
POSIXLY_CORRECT (getopt, fnmatch)
ARGP_HELP_FMT (argp-help)
RESOLV_* (resolv)
RES_OPTIONS (resolv)
HOSTALIASES (resolv)
LOCALDOMAIN (resolv)
NLSPATH (catgets)
GCONV_PATH (gconv)
GETCONF_DIR (sysconf)
TZDIR (tzfile)
LIBC_FATAL_STDERR_ (libc_fatal)

none of them seem very interesting, but the list
shows some functionality that one may rely upon
(eg musl has no complex resolv thing and that
is a runtime behaviour difference)
Post by Rich Felker
Post by Szabolcs Nagy
- relying on /etc/* files used by glibc or the loader
Examples?
some paths glibc references but musl does not:

/etc/ld.so.preload (rtld)
/etc/.pwd.lock (shadow)
/etc/{host.conf,networks,protocols,..} (resolv)
/etc/gai.conf (getaddrinfo)
/dev/console (syslog)
/usr/lib/pt_chown (grantpt)
/usr/local/etc/zoneinfo (timezone)
Post by Rich Felker
Post by Szabolcs Nagy
- crypt sha2 with long key input
Have you seen examples of this? Or is it just theoretical?
theoretical

iirc solardiz said at some point that there might
be crypt based benchmarks which use larger keybuf
Post by Rich Felker
Post by Szabolcs Nagy
- using constructors with priority gcc extension
Do you know if musl just ignores the order, or fails to run them at
all?
ok, the priority is solved by the linker and musl runs them

here (i386) constructors are put into .ctors.* sections
which get sorted by the linker

on arm they are put into .init_array.*

it seems the linker and glibc support mixing these:
the order in which init things are run is

preinit_array
ctors (priority sorted by the linker)
init_array (priority sorted by the linker)

on i386 i have to explicitly request something to get
into an .init_array section, and then it will be run
by glibc but not by musl

i think musl does not support .preinit_array at all

these are probably rarely used features
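
a small sketch of the gcc extension in question, assuming a gcc or clang toolchain and a libc that runs the section the compiler emits: two constructors with explicit priorities inside one object, where the lower number is expected to run first. priorities up to 100 are reserved, so the example uses 101 and 102. the names are hypothetical.

```c
#include <assert.h>

#define MAX_EVENTS 4

/* Records the order in which the constructors actually ran. */
static int init_order[MAX_EVENTS];
static int init_count;

static void record(int id)
{
    assert(init_count < MAX_EVENTS);
    init_order[init_count++] = id;
}

/* Defined first but with the higher priority number, so it should
 * run second. */
__attribute__((constructor(102)))
static void init_second(void)
{
    record(2);
}

/* Lower priority number, so it should run first. */
__attribute__((constructor(101)))
static void init_first(void)
{
    record(1);
}

/* Returns the id of the i-th constructor that ran, or -1 if none. */
int init_event(int i)
{
    return (i >= 0 && i < init_count) ? init_order[i] : -1;
}
```

by the time main starts, `init_event(0)` and `init_event(1)` report the order; this says nothing about ordering across different shared objects.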
Post by Rich Felker
Post by Szabolcs Nagy
- ppc double-double long double
I really doubt anyone depends on this or even wants it.. but is ppc
really using double-double still? The gcc docs make it sound like they
switched to IEEE quad when they made long double 128-bit, ignoring
what IBM did, but the glibc people seem to consider it double-double.
yes, i dont think ppl depend on this but you can never know

some applications may have different behaviour under
a glibc ld128ibm/ld128ieee toolchain vs a musl ld64 toolchain
Rich Felker
2013-05-19 22:08:32 UTC
Post by Szabolcs Nagy
Post by Rich Felker
Post by Szabolcs Nagy
- relying on LD_* or other env vars for glibc or the loader
We do support the main ones that could be used reasonably by
applications: LD_PRELOAD and LD_LIBRARY_PATH. Most of the others are
for debugging/"audit" stuff, I think.
yes other ld_ stuff does not seem useful
[...]
CHARSET (toutf8)
What is toutf8? (Just curious)
Post by Szabolcs Nagy
DATEMSK (getdate)
This is POSIX and supported by musl. :)
Post by Szabolcs Nagy
MSGVERB (fmtmsg)
SEV_LEVEL (fmtmsg)
I believe these are standard too, but we presently don't have fmtmsg.
It's one of the few missing XSI interfaces.
Post by Szabolcs Nagy
RESOLV_* (resolv)
RES_OPTIONS (resolv)
HOSTALIASES (resolv)
LOCALDOMAIN (resolv)
Some of these may be desirable at some point.
Post by Szabolcs Nagy
NLSPATH (catgets)
I believe this is standard too.
Post by Szabolcs Nagy
Post by Rich Felker
Post by Szabolcs Nagy
- using constructors with priority gcc extension
Do you know if musl just ignores the order, or fails to run them at
all?
ok, the priority is solved by the linker and musl runs them
here (i386) constructors are put into .ctors.* sections
which get sorted by the linker
How does this work for dynamic linking? Is priority only respected
within a single DSO, and not between multiple DSOs?
Post by Szabolcs Nagy
on arm they are put into .init_array.*
the order in which init things are run is
preinit_array
ctors (priority sorted by the linker)
init_array (priority sorted by the linker)
on i386 i have to explicitly request something to get
into an .init_array section, and then it will be run
by glibc but not by musl
i think musl does not support .preinit_array at all
these are probably rarely used features
Yes, at some point we should probably revisit this. In addition, it
seems that the init_array stuff might eventually be used on more and
more archs, so we might need to investigate whether there's a way to
write the code for calling it in C rather than asm, and then somehow
merge the C and asm object files when generating crti.o and crtn.o...

Unfortunately, however, I'm skeptical of whether this can give
reasonable code generation that works for both PIC(PIE) and non-PIC
cases...
Post by Szabolcs Nagy
Post by Rich Felker
Post by Szabolcs Nagy
- ppc double-double long double
I really doubt anyone depends on this or even wants it.. but is ppc
really using double-double still? The gcc docs make it sound like they
switched to IEEE quad when they made long double 128-bit, ignoring
what IBM did, but the glibc people seem to consider it double-double.
yes, i dont think ppl depend on this but you can never know
some applications may have different behaviour under
a glibc ld128ibm/ld128ieee toolchain vs a musl ld64 toolchain
I think for this list, if we're going to publish it, we should focus
on glibc-specific assumptions that were actually found in practice.
Bringing in lots of theoretical ones just adds doubt to whether musl
will meet people's needs, and unless they're clearly marked and
separated from the issues we actually found, I think it makes the list
less informative -- the fact that we actually found certain types of
problems, rather than just reasoning about ones that might arise, is
in many ways the most informative aspect of the list. Of course, it
could be supplemented by an additional list of more theoretical
considerations.

Rich
Szabolcs Nagy
2013-05-20 00:17:05 UTC
Post by Rich Felker
Post by Szabolcs Nagy
CHARSET (toutf8)
What is toutf8? (Just curious)
it is in libidn
"toutf8.c --- Convert strings from system locale into UTF-8."
Post by Rich Felker
Post by Szabolcs Nagy
here (i386) constructors are put into .ctors.* sections
which get sorted by the linker
How does this work for dynamic linking? Is priority only respected
within a single DSO, and not between multiple DSOs?
i think ordering is only guaranteed within a single dso
and this is not clearly documented

ld -verbose

shows the actual linker script that merges the
relevant sections
Rich Felker
2013-05-20 00:23:19 UTC
Post by Szabolcs Nagy
Post by Rich Felker
Post by Szabolcs Nagy
CHARSET (toutf8)
What is toutf8? (Just curious)
it is in libidn
"toutf8.c --- Convert strings from system locale into UTF-8."
Oh. Speaking of which, we need to add IDN support at some point...
Post by Szabolcs Nagy
Post by Rich Felker
Post by Szabolcs Nagy
here (i386) constructors are put into .ctors.* sections
which get sorted by the linker
How does this work for dynamic linking? Is priority only respected
within a single DSO, and not between multiple DSOs?
i think ordering is only guaranteed within a single dso
and this is not clearly documented
Yes, I think that's the only approach that makes sense anyway. And it
makes the whole ctor-priority system even more ugly because it causes
program semantics to change depending on how the program is broken up
into shared libraries, counteracting all the hard work ELF did to make
dynamic linking transparent with respect to program semantics...

Rich
