Discussion:
Status towards next release (1.1.4)
Rich Felker
2014-07-12 05:10:35 UTC
I think we're pretty well on-schedule for the next release. Here's a
summary of progress so far:

- Private futex support, not committed. If we can demonstrate any
performance benefit, it can be committed, but otherwise I'm inclined
to throw it out. There's no point in adding complexity with no
evidence of benefit.

- Locale framework. Right now this is mostly just a framework and does
nothing useful.

- Byte-based C locale, not committed. As discussed previously, this is
non-essential for conforming to current standards, so I'm inclined
to omit it for now. But if there's demand for it we can consider
adding it.

- Gettext/mo file lookup core. This is not integrated with libc yet,
but tested and working.

- Openrisc (or1k) port. Stefan Kristiansson's work seems basically
complete and is in the testing phase now. I'm hoping to merge it
in the next few days.

There are several things we need to focus on now:

- The Big Bikeshed: where to find locale files? These will be somewhat
musl-specific (to the extent that no other implementation uses the
design I have in mind, though it would be easy for others to use
it), so there's no existing practice to simply adopt. The files are
not machine-specific (we'll support either endianness .mo file) so
/usr/share (or other prefix variants) is the natural base location.

- Minor coding tasks for locale. Really, this is minor. The policy of
where to find the files is a much bigger issue to work out.

- Adding non-stub public gettext API. I'd like this to happen along
with the locale work since it uses the same core operation, but it
may turn out that applications use various bloated gettext features
which we don't want in the core that libc itself uses for locale,
in which case we'd end up with two implementations.

- What to do with if_nameindex and getifaddrs? This issue has been
deferred for a couple releases now so I really want to solve it this
time.

The other items on the roadmap are all secondary and related to ports.
I'll be happy if we can just get or1k into this release, since it's a
nice way to draw some publicity for both projects (musl and openrisc).
But if there's time, I might do the bits refactoring (and other
port-related cleanup) in this release cycle once or1k is committed.

Rich
Isaac Dunham
2014-07-12 06:02:28 UTC
Post by Rich Felker
I think we're pretty well on-schedule for the next release. Here's a
<snip>
Post by Rich Felker
- Locale framework. Right now this is mostly just a framework and does
nothing useful.
- Byte-based C locale, not committed. As discussed previously, this is
non-essential for conforming to current standards, so I'm inclined
to omit it for now. But if there's demand for it we can consider
adding it.
I'd like to at least test this to see how well it works.
I just discovered that sword built with C++11 regex support dies with
complaints related to the locale:
terminate called after throwing an instance of 'std::runtime_error'
what(): locale::facet::_S_create_c_locale name not valid

...so I'm wondering whether this will improve compatibility.
(I'm not eager to go hunt down the issue right now; I expect it's
some variant of the usual locale issues.)
Post by Rich Felker
- Gettext/mo file lookup core. This is not integrated with libc yet,
but tested and working.
- Openrisc (or1k) port. Stefan Kristiansson's work seems basically
complete and is in the testing phase now. I'm hoping to merge it
in the next few days.
- The Big Bikeshed: where to find locale files? These will be somewhat
musl-specific (to the extent that no other implementation uses the
design I have in mind, though it would be easy for others to use
it), so there's no existing practice to simply adopt. The files are
not machine-specific (we'll support either endianness .mo file) so
/usr/share (or other prefix variants) is the natural base location.
/usr/share/muslnls is awkward, maybe newnls?
I don't care exactly what gets decided, but a couple issues come to mind:
-the name should NOT be .../"locale" or any other name in use on Linux
systems. otherwise parallel installs break.
-it would be nice if the end of the path is at most 6 chars, since
paths have to be stored somewhere...
(Actually, this implies that 4 chars would be ideal:
"/usr/share/" is 11 non-zero bytes, then add 4, then NUL, making 16 bytes,
which shouldn't need any padding. This is, of course, decidedly premature
optimization. ;-) )
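The byte count above can be checked with sizeof. This is only a sketch of the arithmetic; "nls4" is a made-up four-character directory name, not one proposed in the thread.

```c
#include <stddef.h>

/* "nls4" is a hypothetical 4-character name used only for counting. */
const char locale_base[] = "/usr/share/";
const char locale_dir[]  = "/usr/share/nls4";

/* Length of the base in bytes, excluding the terminating NUL. */
enum { BASE_LEN = sizeof locale_base - 1 };

/* Total storage for the full path, including the terminating NUL. */
enum { DIR_SIZE = sizeof locale_dir };
```

BASE_LEN is the "11 non-zero bytes" and DIR_SIZE is the 16-byte total.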
Post by Rich Felker
Rich
Thanks,
Isaac Dunham
Rich Felker
2014-07-12 14:26:06 UTC
Post by Isaac Dunham
Post by Rich Felker
I think we're pretty well on-schedule for the next release. Here's a
<snip>
Post by Rich Felker
- Locale framework. Right now this is mostly just a framework and does
nothing useful.
- Byte-based C locale, not committed. As discussed previously, this is
non-essential for conforming to current standards, so I'm inclined
to omit it for now. But if there's demand for it we can consider
adding it.
I'd like to at least test this to see how well it works.
I just discovered that sword built with C++11 regex support dies with
terminate called after throwing an instance of 'std::runtime_error'
what(): locale::facet::_S_create_c_locale name not valid
What musl version? (1.1.3 or git?) I doubt this has anything to do
with musl's actual locale implementation, which has essentially no
outwardly-visible behavior right now, but we can check.

If you're not using git, see if git fixes it. 1.1.3 and earlier
rejected unknown locale names (anything but C, C.UTF-8, or POSIX).
Now, any name is accepted, and unknown names are all aliases for
C.UTF-8.
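A quick way to see which of the two behaviors a given libc has is to ask setlocale directly. This is a minimal sketch; the helper names are made up, and any unknown name such as "xx_YY.UTF-8" is just a placeholder to pass to report().

```c
#include <locale.h>
#include <stdio.h>

/* Returns 1 if setlocale() accepts the name, 0 if it rejects it. */
int locale_name_accepted(const char *name)
{
    return setlocale(LC_ALL, name) != NULL;
}

/* Print one line per name so the results can be compared across libcs. */
void report(const char *name)
{
    printf("%-14s %s\n", name,
           locale_name_accepted(name) ? "accepted" : "rejected");
}
```

Per the description above, an unknown name would be reported as rejected on 1.1.3 and as accepted on current git.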
Post by Isaac Dunham
Post by Rich Felker
- The Big Bikeshed: where to find locale files? These will be somewhat
musl-specific (to the extent that no other implementation uses the
design I have in mind, though it would be easy for others to use
it), so there's no existing practice to simply adopt. The files are
not machine-specific (we'll support either endianness .mo file) so
/usr/share (or other prefix variants) is the natural base location.
/usr/share/muslnls is awkward, maybe newnls?
FWIW, on glibc it's a mix of /usr/share/locale (messages,
non-machine-specific) and /usr/lib/locale (nasty machine-specific
binary stuff for other locale categories).
Post by Isaac Dunham
-the name should NOT be .../"locale" or any other name in use on Linux
systems. otherwise parallel installs break.
Agreed. We should not use a pathname with existing precedent for an
incompatible purpose.
Post by Isaac Dunham
-it would be nice if the end of the path is at most 6 chars, since
paths have to be stored somewhere...
"/usr/share/" is 11 non-zero bytes, then add 4, then NUL, making 16 bytes,
which shouldn't need any padding. This is, of course, decidedly premature
optimization. ;-) )
Yes, I think it's premature optimization. I'd rather the name be clean
and reasonable to users than needlessly short. There's no fundamental
reason strings need padding to 16-byte boundaries anyway; if they are
padded as such, it's a toolchain issue and we should try to fix it at
the toolchain level.

Rich
Isaac Dunham
2014-07-12 19:13:54 UTC
Post by Rich Felker
Post by Isaac Dunham
I'd like to at least test this to see how well it works.
I just discovered that sword built with C++11 regex support dies with
terminate called after throwing an instance of 'std::runtime_error'
what(): locale::facet::_S_create_c_locale name not valid
What musl version? (1.1.3 or git?) I doubt this has anything to do
with musl's actual locale implementation, which has essentially no
outwardly-visible behavior right now, but we can check.
If you're not using git, see if git fixes it. 1.1.3 and earlier
rejected unknown locale names (anything but C, C.UTF-8, or POSIX).
Now, any name is accepted, and unknown names are all aliases for
C.UTF-8.
I was using Alpine's package (1.1.3 and cherry-picked fixes).
But after running git pull; ./configure; make; the new libc.so does not
fix this problem (tried with both LANG and LC_ALL set to each of C,
C.UTF-8, and POSIX).

Also, this error happens with mongodb on glibc systems where
localization isn't properly set up, so the error happens somewhere in the C++
toolchain/library stack (libstdc++ or perhaps icu?).

Thanks,
Isaac Dunham
u***@aetey.se
2014-07-12 07:24:09 UTC
Post by Rich Felker
- The Big Bikeshed: where to find locale files? These will be somewhat
musl-specific (to the extent that no other implementation uses the
design I have in mind, though it would be easy for others to use
it), so there's no existing practice to simply adopt. The files are
not machine-specific (we'll support either endianness .mo file) so
/usr/share (or other prefix variants) is the natural base location.
For me it looks like you take a wrong kind of responsibility and try to
make a decision which does not belong to a library developer.

This is an "integrator" decision, the one who knows how the library will
be used and what is the corresponding environment's policy of placing
stuff around in the file system.

In other words, as long as it is configurable, any "default" goes.
You can not know (and imho do _not_ have to pretend to) what is best or
sensible for the actual deployment.

As an "integrator" I am concerned in the following way:

- If locale is mostly static (additions or changes to locale
can be done at the same time as library recompilations/upgrades)
then a "default" placement is totally irrelevant, but I must be
able to choose the actual one at compilation time - I guess this is
expected and hence a non-issue

With the paranoia-hat on:

- if locale data is supposed to be available from more sources than the
library upstream (then potentially even with different licenses)
and/or if it is supposed to change often, then

I'd badly need a possibility to tell an application at runtime where
to look for the data (presumably via an environment variable specific
to musl).

Hope such kind of locale data is not expected to exist.

Regards,
Rune
Laurent Bercot
2014-07-12 08:44:48 UTC
Post by u***@aetey.se
For me it looks like you take a wrong kind of responsibility and try to
make a decision which does not belong to a library developer.
This is an "integrator" decision, the one who knows how the library will
be used and what is the corresponding environment's policy of placing
stuff around in the file system.
In other words, as long as it is configurable, any "default" goes.
You can not know (and imho do _not_ have to pretend to) what is best or
sensible for the actual deployment.
- If locale is mostly static (additions or changes to locale
can be done at the same time as library recompilations/upgrades)
then a "default" placement is totally irrelevant, but I must be
able to choose the actual one at compilation time - I guess this is
expected and hence a non-issue
+1 and QFT.
Policy should not be included in software, but delegated to the user
(sysadmins and distributors). There should be a reasonable default, but
configurability is a lot more important than the exact default value.
--
Laurent
Rich Felker
2014-07-12 14:55:25 UTC
Post by u***@aetey.se
Post by Rich Felker
- The Big Bikeshed: where to find locale files? These will be somewhat
musl-specific (to the extent that no other implementation uses the
design I have in mind, though it would be easy for others to use
it), so there's no existing practice to simply adopt. The files are
not machine-specific (we'll support either endianness .mo file) so
/usr/share (or other prefix variants) is the natural base location.
For me it looks like you take a wrong kind of responsibility and try to
make a decision which does not belong to a library developer.
This is an "integrator" decision, the one who knows how the library will
be used and what is the corresponding environment's policy of placing
stuff around in the file system.
In other words, as long as it is configurable, any "default" goes.
You can not know (and imho do _not_ have to pretend to) what is best or
sensible for the actual deployment.
I understand that configuring this matters for your usage case where
you're configuring ALL of the paths where configuration/data/etc. is
read from to isolate each program in its own bubble. However I don't
see any value in configuring this one location when other things (like
the place timezones are searched for) are fixed.
Post by u***@aetey.se
- If locale is mostly static (additions or changes to locale
can be done at the same time as library recompilations/upgrades)
then a "default" placement is totally irrelevant, but I must be
able to choose the actual one at compilation time - I guess this is
expected and hence a non-issue
No, the intent is that they're produced independently of musl, or at
least independently of my part of the development/maintenance process.
I don't want to be a locale maintainer. BTW, locale definitions are a
much bigger "imposing policy" issue than a standard pathname.
Post by u***@aetey.se
- if locale data is supposed to be available from more sources than the
library upstream (then potentially even with different licenses)
and/or if it is supposed to change often, then
I'd badly need a possibility to tell an application at runtime where
to look for the data (presumably via an environment variable specific
to musl).
Hope such kind of locale data is not expected to exist.
Runtime configuration of the path is a big problem for many usage
cases, possibly even if it's blocked for suid. The recent glibc
CVE-2014-0475 has me concerned and wanting to avoid any dubious
practices with how locales are searched out. This is potentially a
much bigger issue than timezones, because for timezones, invalid data
probably results in compromises no worse than a crash or information
leak. With locales, invalid data can result in full code execution
(via injection of %n into format strings, and possibly other ways).
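To illustrate the %n concern: a translated string is only dangerous when it is passed as the format argument. The sketch below is not musl code; the function names are made up. It shows a check for conversion directives and the safe way to emit a message as literal text.

```c
#include <stdio.h>

/* Returns 1 if s contains anything printf would treat as a conversion
 * (including a stray trailing '%'), 0 if it is safe as literal text.
 * "%%" is the escaped percent sign and is not a conversion. */
int has_format_directive(const char *s)
{
    for (; *s; s++) {
        if (*s != '%') continue;
        if (s[1] == '%') { s++; continue; }
        return 1;
    }
    return 0;
}

/* Emit a (possibly untrusted) translated message as literal text.
 * The message is never interpreted as a format string. */
int print_message(const char *msg, FILE *out)
{
    return fputs(msg, out);
}
```

Calling printf(msg) with a message from a malicious locale file is the case to avoid; print_message treats "%n" as two ordinary characters.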

On the other hand, runtime configuration is something I'd really like
to have, so that users can use locales that are not installed by the
system administrator and develop/test/debug locales without
installing. But this is a sufficiently big opening for environmental
state to alter the behavior of the program that I'm very concerned
about the safety of it and frustrated by the whole process...

Rich
u***@aetey.se
2014-07-12 16:29:44 UTC
Post by Rich Felker
Post by u***@aetey.se
You can not know (and imho do _not_ have to pretend to) what is best or
sensible for the actual deployment.
I understand that configuring this matters for your usage case where
you're configuring ALL of the paths where configuration/data/etc. is
read from to isolate each program in its own bubble. However I don't
see any value in configuring this one location when other things (like
the place timezones are searched for) are fixed.
Exactly! :) It is hardly tenable to hardcode the path to any database,
including the timezone one. Fortunately TZ syntax allows escaping the trap
(so actually per design it is not strictly enforced how the user may
supply the timezone information, at least according to the gnu description).
Post by Rich Felker
Post by u***@aetey.se
- If locale is mostly static (additions or changes to locale
can be done at the same time as library recompilations/upgrades)
then a "default" placement is totally irrelevant, but I must be
able to choose the actual one at compilation time - I guess this is
expected and hence a non-issue
No, the intent is that they're produced independently of musl, or at
least independently of my part of the development/maintenance process.
I don't want to be a locale maintainer. BTW, locale definitions are a
much bigger "imposing policy" issue than a standard pathname.
Then the library should not postulate nor hardcode the location, given
that the expected maintenance routines for the data are unclear.
Post by Rich Felker
Runtime configuration of the path is a big problem for many usage
cases, possibly even if it's blocked for suid. The recent glibc
CVE-2014-0475 has me concerned and wanting to avoid any dubious
practices with how locales are searched out. This is potentially a
I understand your concern about security but disallowing something at
the library level just to prevent a certain possible mode of failure of
a third party's flawed security model? This feels almost like designing
flats without windows [no pun] to prevent children from falling out.
Post by Rich Felker
much bigger issue than timezones, because for timezones, invalid data
probably results in compromises no worse than a crash or information
leak. With locales, invalid data can result in full code execution
(via injection of %n into format strings, and possibly other ways).
Allowing a user to set environment variables is giving her freedom to
control her applications iow a policy question. The low level library has
no proper knowledge to make policy decisions.

Again, I feel you assume more responsibility for musl than is due.

The policy enforcer (ssh) would fare perfectly fine - just don't list
the hypothetical MUSL_LOCALE_DIR in the variables allowed to be set,
this will end the issue. Of course the Big Brother has to properly set
the variable if locales are supposed to be available - or compile in
the path to where he stores the "approved" locale definitions. Not worse
than this and safe - unless the policy maker wants "allow all variables
except a list", which is inherently unsafe.

So this doesn't look like a security concern for musl.
Post by Rich Felker
On the other hand, runtime configuration is something I'd really like
to have, so that users can use locales that are not installed by the
system administrator and develop/test/debug locales without
installing. But this is a sufficiently big opening for environmental
state to alter the behavior of the program that I'm very concerned
about the safety of it and frustrated by the whole process...
:(

If you strongly feel for providing hardwired and immutable behaviour then
let the run-time envvar-driven choices be compile-time conditionals. This
also will save several bytes for the control freaks :) while still allowing
flexible deployment.

Most of the traditional paranoia about the code being misled by the user
comes from the role of setuid in *nix which implies hardcoded references
"as much as possible". In a setuid-free milieu (which we always have in
a distributed/global context) this is a pure nuisance.

By the way, it is easy to wrap binaries, resetting/protecting/checking
variables according to the actual purpose.
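A wrapper of that kind could be as small as the sketch below. It assumes the hypothetical MUSL_LOCALE_DIR variable mentioned earlier (musl defines no such variable) plus glibc's LOCPATH; a real wrapper would call scrub_env and then exec the wrapped binary.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Variables to drop before running the wrapped program.
 * MUSL_LOCALE_DIR is hypothetical; LOCPATH is the glibc variable. */
const char *const scrub_list[] = { "MUSL_LOCALE_DIR", "LOCPATH", NULL };

/* Remove every listed variable from the environment.
 * Returns 0 on success, -1 on the first failure. */
int scrub_env(const char *const *names)
{
    for (; *names; names++)
        if (unsetenv(*names) != 0)
            return -1;
    return 0;
}
```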

This means the extra protection in a more complete form is available
when needed, without putting it into the library and sacrificing
functionality.

Thanks for listening Rich, the decisions are yours anyway.

Regards,
Rune
Rich Felker
2014-07-12 17:00:08 UTC
Post by u***@aetey.se
Post by Rich Felker
Runtime configuration of the path is a big problem for many usage
cases, possibly even if it's blocked for suid. The recent glibc
CVE-2014-0475 has me concerned and wanting to avoid any dubious
practices with how locales are searched out. This is potentially a
I understand your concern about security but disallowing something at
the library level just to prevent a certain possible mode of failure of
a third party's flawed security model? This feels almost like designing
flats without windows [no pun] to prevent children from falling out.
Post by Rich Felker
much bigger issue than timezones, because for timezones, invalid data
probably results in compromises no worse than a crash or information
leak. With locales, invalid data can result in full code execution
(via injection of %n into format strings, and possibly other ways).
Allowing a user to set environment variables is giving her freedom to
control her applications iow a policy question. The low level library has
no proper knowledge to make policy decisions.
Again, I feel you assume more responsibility for musl than is due.
I partly agree with you here, and that's why I've raised a question on
oss-security as to whether CVE-2014-0475 was even a valid
vulnerability rather than just an ordinary non-security bug.

However, format string vulnerabilities are also a sufficiently serious
issue that extra precautions need to be taken to avoid introducing
them in situations where it might be at all non-obvious that they
could arise. This is why (see my other email in the thread spun off
this one) I'm working on a design that avoids the format string issue
entirely.

I think we'll be able to work something out where locale path is
configurable locally (per-process), or at least where absolute paths
are allowed. Of course in suid processes both need to be forbidden;
until we can be sure of what's safe, it might be necessary just to
forbid all non-builtin locales for suid (libc.secure) programs.
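The suid rule can be sketched as follows. This is not how musl implements it: the variable name MUSL_LOCALE_DIR is the hypothetical one from this thread, requiring an absolute path is an assumption, and getauxval(AT_SECURE) is Linux-specific.

```c
#include <stdlib.h>
#include <sys/auxv.h>   /* getauxval, AT_SECURE (Linux-specific) */

/* Pure policy decision: which directory, if any, may extra locales be
 * loaded from? NULL means "builtin locales only". */
const char *choose_locale_dir(int secure, const char *env_value)
{
    if (secure)
        return NULL;              /* suid/AT_SECURE: ignore the environment */
    if (!env_value || env_value[0] != '/')
        return NULL;              /* assumption: only absolute paths allowed */
    return env_value;
}

/* Apply the policy to the running process. */
const char *locale_dir_from_env(void)
{
    return choose_locale_dir(getauxval(AT_SECURE) != 0,
                             getenv("MUSL_LOCALE_DIR"));
}
```

Keeping the decision in a function with no side effects makes the suid case testable without an actual suid binary.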

Rich
u***@aetey.se
2014-07-12 17:15:29 UTC
Post by Rich Felker
Of course in suid processes both need to be forbidden;
until we can be sure of what's safe, it might be necessary just to
forbid all non-builtin locales for suid (libc.secure) programs.
+1

Rune
Weldon Goree
2014-07-13 08:46:30 UTC
Just because I figure someone should propose the most brute possible
strategy: what about storing the .mo data in the library itself? Port
the built-ins to the format, and you have a single code path for locale
access, and it doesn't involve persistent storage. If I'm understanding
your idea right and you're talking about the equivalent of
SYS_LC_MESSAGES and parts of LC_TIME and LC_COLLATE, this isn't nearly
as bloated as it sounds at first (particularly if one is putting, say, 4
locales in a given build rather than 446).

Now, obviously maintainers wouldn't like the choice of either 1 bloated
binary or 446 non-bloated binaries (or God forbid the Cartesian product
of all the possible locale combinations), and this kind of violates the
basic idea of locale that you shouldn't need to recompile software to
get it to speak French, but I just wanted to throw that idea out there.

Weldon
Rich Felker
2014-07-14 03:48:26 UTC
Post by Weldon Goree
Just because I figure someone should propose the most brute possible
strategy: what about storing the .mo data in the library itself? Port
the built-ins to the format, and you have a single code path for locale
access, and it doesn't involve persistent storage. If I'm understanding
your idea right and you're talking about the equivalent of
SYS_LC_MESSAGES and parts of LC_TIME and LC_COLLATE, this isn't nearly
as bloated as it sounds at first (particularly if one is putting, say, 4
locales in a given build rather than 446).
Now, obviously maintainers wouldn't like the choice of either 1 bloated
binary or 446 non-bloated binaries (or God forbid the Cartesian product
of all the possible locale combinations), and this kind of violates the
basic idea of locale that you shouldn't need to recompile software to
get it to speak French, but I just wanted to throw that idea out there.
Indeed, this idea violates that and many other principles:

- That support for Unicode should be cheap (your idea makes
setlocale(), which any portable program supporting non-ASCII text
needs to call, pull in a huge part of the library)

- That the person compiling the software generally has no idea what
languages the user will care about.

- That while character encodings and character identity are
essentially finished, settled matters that won't change, language
and culture are fluid. A program with locale data hard-linked into
it is basically guaranteed not only to be incomplete in the future,
but outright WRONG in the future. The best analogy I can think of
would be hard-coding timezones into the binary.

And probably many others. So I don't think this idea is viable.

Rich
Rich Felker
2014-07-14 17:55:25 UTC
Post by Rich Felker
Post by u***@aetey.se
Allowing a user to set environment variables is giving her freedom to
control her applications iow a policy question. The low level library has
no proper knowledge to make policy decisions.
Again, I feel you assume more responsibility for musl than is due.
I partly agree with you here, and that's why I've raised a question on
oss-security as to whether CVE-2014-0475 was even a valid
vulnerability rather than just an ordinary non-security bug.
See the answer from glibc's side here:

http://www.openwall.com/lists/oss-security/2014/07/14/3

They consider absolute pathnames or directory traversal outside of the
locale base to be a vulnerability, but allow the base to be overridden
via LOCPATH which also comes from the environment. To me this seems a
bit contradictory, but I _think_ the idea is that they see it as
important to accept and trust LC_* from the user even when the source
of these vars is a different privilege domain, so that a properly
localized environment can be provided to the user. (Presumably,
LOCPATH and other arbitrary non-whitelisted env vars would not be
accepted in such situations, and suid programs would not honor
LOCPATH.)

If we want to follow a similar approach, I think we should at least
consider using the same var (LOCPATH) and having the musl locale data
reside in a directory under that base, since this would avoid adding
new vars that users need to be aware of that could affect the behavior
and safety (but hopefully, we've covered all the safety issues) of
programs.

By the way, as far as absolute pathnames go, we're under no obligation
from POSIX to support them, since we do not support the POSIX
localedef system (leading / means the LC_* refers to a locale
definition built by the localedef utility). If we do decide we want to
support them, on the other hand, we should use a different syntax
so as not to overlap with the form for POSIX localedef (which we don't
support).

Rich

Matias A. Fonzo
2014-07-12 14:41:43 UTC
Hello,
Post by Rich Felker
[..]
- The Big Bikeshed: where to find locale files? These will be somewhat
musl-specific (to the extent that no other implementation uses the design I
have in mind, though it would be easy for others to use it), so there's no
existing practice to simply adopt. The files are not machine-specific
(we'll support either endianness .mo file) so
/usr/share (or other prefix variants) is the natural base location.
These already exist:
/usr/lib/locale (glibc)
/usr/share/locale - /usr/local/share/locale (most programs)
/usr/share/X11/locale (X11 programs)
/usr/share/nls (Message catalogs for Native language support)

The default for musl can be /usr/local/share/musl/locale
Rich Felker
2014-07-12 14:58:20 UTC
Post by Matias A. Fonzo
Hello,
Post by Rich Felker
[..]
- The Big Bikeshed: where to find locale files? These will be somewhat
musl-specific (to the extent that no other implementation uses the design I
have in mind, though it would be easy for others to use it), so there's no
existing practice to simply adopt. The files are not machine-specific
(we'll support either endianness .mo file) so
/usr/share (or other prefix variants) is the natural base location.
/usr/lib/locale (glibc)
/usr/share/locale - /usr/local/share/locale (most programs)
/usr/share/X11/locale (X11 programs)
/usr/share/nls (Message catalogs for Native language support)
The default for musl can be /usr/local/share/musl/locale
The location definitely shouldn't be something under /usr/local unless
it's just based on the prefix. That would only make sense for installs
in /usr/local/musl as a non-default libc for use with the wrapper, but
not for musl-based systems or deployment in various mixed
environments.

Rich
Rich Felker
2014-07-12 15:03:33 UTC
Post by Rich Felker
- The Big Bikeshed: where to find locale files? These will be somewhat
musl-specific (to the extent that no other implementation uses the
design I have in mind, though it would be easy for others to use
it), so there's no existing practice to simply adopt. The files are
not machine-specific (we'll support either endianness .mo file) so
/usr/share (or other prefix variants) is the natural base location.
One idea for this: just don't accept anything except the built-in
locales (C, C.UTF-8, POSIX) and absolute pathnames. For suid programs,
the latter could be rejected completely (the safest and probably what
we should do) or restricted to a set of reasonable paths where each
path component is checked for permissions.

Another idea is pulling the search path from /etc/musl-locale.conf or
similar. Obviously this is not the most friendly to Rune's usage case,
but it would just be one more hard-coded path to override in the
custom build, or if absolute pathnames were also accepted for locales
the support for /etc/musl-locale.conf could just be stripped out.

Rich
Rich Felker
2014-07-12 16:41:56 UTC
Post by Rich Felker
Post by Rich Felker
- The Big Bikeshed: where to find locale files? These will be somewhat
musl-specific (to the extent that no other implementation uses the
design I have in mind, though it would be easy for others to use
it), so there's no existing practice to simply adopt. The files are
not machine-specific (we'll support either endianness .mo file) so
/usr/share (or other prefix variants) is the natural base location.
One idea for this: just don't accept anything except the built-in
locales (C, C.UTF-8, POSIX) and absolute pathnames. For suid programs,
the latter could be rejected completely (the safest and probably what
we should do) or restricted to a set of reasonable paths where each
path component is checked for permissions.
Another idea is pulling the search path from /etc/musl-locale.conf or
similar. Obviously this is not the most friendly to Rune's usage case,
but it would just be one more hard-coded path to override in the
custom build, or if absolute pathnames were also accepted for locales
the support for /etc/musl-locale.conf could just be stripped out.
From a usability standpoint, I think it's desirable to have some sort
of search path, even if absolute pathnames are also supported.
Consider mixed environments where the user has something like
LANG=fr_FR.UTF-8 for glibc programs; assuming the corresponding locale
is also installed for musl, the reasonable user expectation is that
musl-linked programs also use French messages, time formatting,
collation, etc.

glibc honors the non-POSIX environment variable LOCPATH to control its
search for locales. While this is something of a consideration for
applications trying to avoid unwanted environment-influenced behavior
for security purposes or otherwise, it's not a big conformance problem
since setlocale already depends on the environment anyway (and thus
can't be called safely in parallel with modifications to the
environment, per POSIX). We could honor the same variable and just
append "/musl/" to the value (this would be nice from the standpoint
of not introducing another variable apps have to be aware of when they
want to filter it) but that's somewhat ugly since the glibc one is
intended to point to a "lib" (arch-specific) dir whereas musl's is
portable data. Using a separate variable might be preferable if we
even want to support an environment variable as a way to configure
this at runtime -- and I think doing so may be valuable since users
may want locales that are not installed by the system administrator.
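As a rough sketch of the "append /musl/ to LOCPATH" idea: the function below builds the candidate file name from a base directory and a locale name. The function name and the rule of rejecting names that contain '/' or start with '.' are assumptions made for illustration, motivated by the directory-traversal concern around CVE-2014-0475.

```c
#include <stdio.h>
#include <string.h>

/* Write "<base>/musl/<name>" into buf.
 * Returns 0 on success, -1 if an argument is empty, the name could
 * escape the base directory, or the result does not fit in buf. */
int build_locale_path(char *buf, size_t size,
                      const char *base, const char *name)
{
    if (!base || !*base || !name || !*name)
        return -1;
    /* Assumed rule: no path separators and no leading dot in the name. */
    if (strchr(name, '/') || name[0] == '.')
        return -1;
    int n = snprintf(buf, size, "%s/musl/%s", base, name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With base "/usr/lib/locale" and name "fr_FR.UTF-8" this yields "/usr/lib/locale/musl/fr_FR.UTF-8", which also shows the awkwardness noted above: portable data ends up under an arch-specific "lib" directory.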

In light of glibc CVE-2014-0475, which I'm not sure is even really a
proper "vulnerability" but rather just a complication of the standard
locale semantics that makes it hard to write secure programs without
filtering out locale vars from untrusted sources, a major goal I'd
like to pursue is minimizing the potential security impact of an
untrusted/malicious locale file. Obviously suid/AT_SECURE programs
should not even honor locale files except possibly from a hard-coded
trusted source, but ideally even programs without formally elevated
privileges -- think gitolite type setups with ssh authorized_keys --
would not yield code execution or information leak when fed a
malicious locale file.

Here are the security aspects I have in mind:

- For libc itself (obviously we can't control application use of
gettext), only translate literal strings, never printf/scanf format
strings. For dlerror this requires some refactoring of the message
strings but otherwise I think this property is easy to satisfy. The
purpose of this property is to prevent format string injection via
locales and limit the scope of bad messages to literal copying of
those messages into the program output.

- Avoid loading as a locale any file which was not intended to be a
locale. This entails checking the magic number, sanity-checking the
headers, and also doing a single gettext-type string lookup for a
key string associated with our locale file format (a specialization
of general mo files). If the key is not found, the file can be
rejected; it's probably a mo file but not one that satisfies the
needs of libc for the requested locale category. The purpose of this
check is to prevent disclosure of contents of files that were not
intended to be locales.

- During gettext lookup (binary search), validate all offsets as lying
within the address range of the mapping. The purpose of this check
is to preclude information disclosure due to reading strings from
locations outside the mapping.
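The magic-number and offset checks from the list above might look roughly like this. It is a sketch, not musl's code: it assumes the standard GNU .mo header layout (magic 0x950412de, revision, string count, original-table offset, translation-table offset), accepts either byte order, and validates everything up front rather than during the binary search. The key-string lookup from the second point is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MO_MAGIC 0x950412deu

static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) |
           ((x << 8) & 0xff0000u) | (x << 24);
}

/* Read the 32-bit word at byte offset off, byte-swapping if sw is set.
 * The caller guarantees off + 4 <= size of the buffer. */
static uint32_t rd32(const unsigned char *p, size_t off, int sw)
{
    uint32_t v;
    memcpy(&v, p + off, 4);
    return sw ? bswap32(v) : v;
}

/* Returns 1 if the buffer has a valid .mo magic and both string tables
 * and every NUL-terminated string lie inside [map, map + size). */
int mo_validate(const void *map, size_t size)
{
    const unsigned char *p = map;
    int sw;

    if (size < 28) return 0;                 /* header is 7 words */
    uint32_t magic = rd32(p, 0, 0);
    if (magic == MO_MAGIC) sw = 0;
    else if (bswap32(magic) == MO_MAGIC) sw = 1;
    else return 0;

    uint32_t n = rd32(p, 8, sw);             /* number of strings */
    uint32_t o = rd32(p, 12, sw);            /* original table offset */
    uint32_t t = rd32(p, 16, sw);            /* translation table offset */

    /* Each table holds n entries of 8 bytes and must be word-aligned. */
    if (n > size / 8 || (o | t) % 4) return 0;
    if (o > size - 8 * (size_t)n || t > size - 8 * (size_t)n) return 0;

    for (uint32_t i = 0; i < 2 * n; i++) {
        size_t ent = i < n ? o + 8 * (size_t)i : t + 8 * (size_t)(i - n);
        uint32_t len = rd32(p, ent, sw);
        uint32_t off = rd32(p, ent + 4, sw);
        /* The string and its terminating NUL must fit in the buffer. */
        if (off >= size || len >= size - off) return 0;
        if (p[off + len] != 0) return 0;
    }
    return 1;
}
```

Checking up front only helps if the data cannot change afterwards; with a shared mmap of a file someone else can write, the per-lookup checks described above are still needed.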

Obviously as long as mmap is used, there is a possibility of DoS via
file truncation and SIGBUS. I don't think it's worth trying to work
around this since the scope is limited to crashing your own programs
(or allowing someone else to crash them if you use a locale file
writable by someone else). As previously discussed for zoneinfo, one
option would be to malloc, read, and validate (rather than mmap), but
IMO this is cost-prohibitive.
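For comparison, the malloc-and-read alternative could be sketched as below. The function name is made up, and this is only meant to show where the cost comes from: every process copies the whole file onto its heap instead of sharing page-cache pages.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire stream into a heap buffer.
 * Returns the buffer (caller frees) and stores its length in *len_out,
 * or returns NULL on read or allocation failure. */
void *read_all(FILE *f, size_t *len_out)
{
    size_t cap = 4096, len = 0;
    unsigned char *buf = malloc(cap);
    if (!buf) return NULL;

    for (;;) {
        size_t n = fread(buf + len, 1, cap - len, f);
        len += n;
        if (n == 0) break;                    /* EOF or error */
        if (len == cap) {                     /* buffer full: double it */
            if (cap > (size_t)-1 / 2) { free(buf); return NULL; }
            unsigned char *tmp = realloc(buf, cap * 2);
            if (!tmp) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(f)) { free(buf); return NULL; }
    *len_out = len;
    return buf;
}
```

Once the data is in private memory, later truncation of the file cannot cause SIGBUS, and validation only has to be done once.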

Rich
u***@aetey.se
2014-07-12 17:04:45 UTC
Post by Rich Felker
Another idea is pulling the search path from /etc/musl-locale.conf or
similar. Obviously this is not the most friendly to Rune's usage case,
Thanks for the thought :)

Such a file would enforce the configuration to be strictly "one per computer".
No differences between different users on the computer would be allowed.

Hardcoding a reference to a _globally_ placed file like this (you know,
we have no local places to count with - our programs work without
bothering the local admin of an unknown distro) would be disastrous.

(Any change in the file or in the data in the paths given there would
instantly affect a potentially infinite set of users. And of course
different users have different needs, one size does not fit all.)
Post by Rich Felker
but it would just be one more hard-coded path to override in the
custom build, or if absolute pathnames were also accepted for locales
the support for /etc/musl-locale.conf could just be stripped out.
Absolute locale names can not replace using short (standard) locale
names with adjustable (not necessarily standard) databases.

Please leave a possibility to specify the directory containing the
locale definitions (or a path to search if you prefer) at run time,
per application instance - I am not aware of anything adequate besides
a dedicated environment variable.

Rune