Discussion:
[musl] Cortex-M support
Pierluigi Passaro
2018-09-13 00:39:04 UTC
Hi,
I was wondering if musl already supports Cortex-M or not.
Thanks
Regards
Pier
Christopher Friedt
2018-09-13 00:41:15 UTC
It does :) Although only through Thumb-2, so some assembly rework would be
required for Cortex-M0.

Rich Felker
2018-09-13 00:52:57 UTC
Post by Christopher Friedt
It does :) Although only through thumbv2, so there would be some assembly
rework required for cortex-m0.
There's also no fdpic ABI support yet, so it's only going to be
nonshared text. For bare-metal/pseudo-kernel or a single-program
userspace on Linux it probably doesn't matter, but if you're trying to
run a real userspace it's very inefficient. I'd like to add fdpic
soon.

Also, I think it won't work unless the kernel traps and emulates the
cp15 thread-pointer access, since we don't support the get_tls
syscall. (We can add it if needed, but the whole idea of the syscall
is silly: it's no more efficient than trapping on the kernel side, yet
supporting a switch to it makes userspace slower/heavier.)

Rich
Pierluigi Passaro
2018-09-13 01:06:52 UTC
This looks like a good starting point: I'm targeting Cortex-M4 / M7 and maybe
Cortex-R.
I'm inspecting the code and trying to get a build.
I have a few questions:

1) NOMMU support looks disabled
I'm wondering if in the file arch/arm/reloc.h, some code should be added
(or not).
Something like
   #if (__ARM_ARCH_PROFILE == 'M') || (__ARM_ARCH_PROFILE == 'R')
   #define DL_NOMMU_SUPPORT 1
   #endif

2) When trying to enable hardfp support, the build fails
- fabs tries to use the vabs.f64 assembly instruction
- sqrt tries to use the vsqrt.f64 assembly instruction
As far as I understand, vXXX.f64 instructions are only available with
single/double precision FPU, not with half precision.
I'm wondering if the assembly optimization should be wrapped by
something like
   #if ... && (__ARM_FP > 7)

I'm still trying to set up a reasonable build/test environment, but I
suppose I need some suggestions on how to proceed.
Any hints?

Thanks
Regards
Pier
Rich Felker
2018-09-13 01:24:10 UTC
Post by Pierluigi Passaro
This looks like a good starting point: I'm targeting Cortex-M4 / M7 and
maybe Cortex-R.
I'm inspecting the code and trying to get a build.
1) NOMMU support looks disabled
I'm wondering if in the file arch/arm/reloc.h, some code should be
added (or not).
Something like
   #if (__ARM_ARCH_PROFILE == 'M') || (__ARM_ARCH_PROFILE == 'R')
   #define DL_NOMMU_SUPPORT 1
   #endif
This is only going to matter if you want to do dynamic linking, which
is *really* bad without fdpic/shared-text. You'll have a whole copy of
each shared lib for each process. Once fdpic support is added and
dynamic linking makes sense, it should probably be fixed, but I'd like
to rethink some of this and make it so the dynamic linker doesn't need
to be aware of whether it's nommu-compatible.
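
For reference, the condition quoted above relies on the ACLE macro `__ARM_ARCH_PROFILE`, which ARM compilers define as the character 'A', 'R', or 'M'. The following is a minimal sketch of the same test written as ordinary C. The helper names are hypothetical and are not part of musl, and treating every R-profile core as MMU-less is the proposal's assumption rather than something the thread confirms.

```c
/* Hypothetical helpers for illustration; not part of musl. */

/* Returns 1 if the given ACLE profile letter matches the proposed
 * condition ('M' or 'R'), 0 otherwise. */
int profile_wants_nommu(int profile)
{
    return profile == 'M' || profile == 'R';
}

/* Profile letter of the target being compiled for, or 0 when the
 * compiler does not define __ARM_ARCH_PROFILE (e.g. a non-ARM host). */
int target_profile(void)
{
#ifdef __ARM_ARCH_PROFILE
    return __ARM_ARCH_PROFILE;
#else
    return 0;
#endif
}
```

Calling `profile_wants_nommu(target_profile())` gives the same answer the proposed `#if` would give at compile time, which can be handy for checking what a particular toolchain actually reports.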
Post by Pierluigi Passaro
2) When trying to enable hardfp support, the build fails
- fabs tries to use the vabs.f64 assembly instruction
- sqrt tries to use the vsqrt.f64 assembly instruction
As far as I understand, vXXX.f64 instructions are only available
with single/double precision FPU, not with half precision.
I'm wondering if the assembly optimization should be wrapped by
something like
   #if ... && (__ARM_FP > 7)
I'm still trying to set up a reasonable build/test environment, but I
suppose I need some suggestions on how to proceed.
Any hints?
Configurations where float and double are anything other than IEEE
single and double with IEEE-conforming semantics, or where long double
does not have IEEE-conforming semantics, are not supported/supportable
by musl, by intent. I would assume there's some way to configure the
compiler to offer a separate half-precision hardfloat type on an
otherwise soft-float EABI target with conforming float/double, but if
not this is a compiler deficiency that makes it impossible to use at
this time.

There were some SH4 models that also had this limitation (only
single-precision FPU) and since GCC's only profile for them redefines
double as single, rather than doing hard-single and soft-double, we
just don't support hardfloat at all on them.

In principle you could build musl as soft-float (with a softfloat
toolchain) but use a separate toolchain with half-precision hardfloat
for your applications. There would be no way to call stdlib interfaces
that take floating point arguments without a glue layer though. Use of
setjmp/longjmp might also be problematic (failure to restore float
regs) but this could possibly be mitigated.

Rich
Szabolcs Nagy
2018-09-13 09:43:27 UTC
Post by Rich Felker
Post by Pierluigi Passaro
2) When trying to enable hardfp support, the build fails
- fabs tries to use the vabs.f64 assembly instruction
- sqrt tries to use the vsqrt.f64 assembly instruction
As far as I understand, vXXX.f64 instructions are only available
with single/double precision FPU, not with half precision.
i think half precision is used incorrectly here: single is 32-bit,
half is 16-bit (arm has half precision instructions, so mixing these
can be confusing)
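
To make the distinction concrete: ACLE defines `__ARM_FP` as a bitmask in which bit 1 (0x02) means hardware half precision, bit 2 (0x04) single precision, and bit 3 (0x08) double precision. The sketch below decodes such a mask. The macro and function names are hypothetical, and the example masks in the comments are assumptions about typical FPU configurations, not values taken from this thread.

```c
/* Bit values of the ACLE __ARM_FP mask. Names are hypothetical. */
#define FP_HALF   0x02u  /* 16-bit hardware support */
#define FP_SINGLE 0x04u  /* 32-bit hardware support */
#define FP_DOUBLE 0x08u  /* 64-bit hardware support */

/* Returns 1 if the mask advertises hardware double precision. */
int fp_mask_has_double(unsigned mask)
{
    return (mask & FP_DOUBLE) != 0;
}

/* Returns 1 if the mask advertises hardware single precision. */
int fp_mask_has_single(unsigned mask)
{
    return (mask & FP_SINGLE) != 0;
}

/* Compile-time form that could guard .f64 assembly. It evaluates to 0
 * when __ARM_FP is undefined (soft-float or non-ARM targets). */
#if defined(__ARM_FP) && (__ARM_FP & 8)
#define TARGET_HAS_HW_DOUBLE 1
#else
#define TARGET_HAS_HW_DOUBLE 0
#endif
```

For the mask values ACLE defines, `__ARM_FP > 7` and `__ARM_FP & 8` select the same configurations; the bit test simply states the intent (double-precision hardware present) more directly. A single-precision-only FPU would report a mask such as 0x06, for which both tests are false.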
Post by Rich Felker
Post by Pierluigi Passaro
I'm wondering if the assembly optimization should be wrapped by
something like
   #if ... && (__ARM_FP > 7)
I'm still trying to set up a reasonable build/test environment, but I
suppose I need some suggestions on how to proceed.
Any hints?
Configurations where float and double are anything other than IEEE
single and double with IEEE-conforming semantics, or where long double
does not have IEEE-conforming semantics, are not supported/supportable
by musl, by intent. I would assume there's some way to configure the
compiler to offer a separate half-precision hardfloat type on an
otherwise soft-float EABI target with conforming float/double, but if
not this is a compiler deficiency that makes it impossible to use at
this time.
There were some SH4 models that also had this limitation (only
single-precision FPU) and since GCC's only profile for them redefines
double as single, rather than doing hard-single and soft-double, we
just don't support hardfloat at all on them.
In principle you could build musl as soft-float (with a softfloat
toolchain) but use a separate toolchain with half-precision hardfloat
for your applications. There would be no way to call stdlib interfaces
that take floating point arguments without a glue layer though. Use of
setjmp/longjmp might also be problematic (failure to restore float
regs) but this could possibly be mitigated.
note that a large part of the float code in libc is in the
math library, which expects efficient double arithmetic.
i plan to rewrite the most important single precision math
functions using double arithmetic; this gives significant
benefits on all systems except ones with single-precision-only
hw.

(for soft-float systems, using int arithmetic is best; for
single-precision-only systems, using single precision arithmetic
is. there are also many places where musl could be better
size-optimized, but musl does not have such implementation
variations, for maintainability reasons.)
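
As an illustration of what "single precision functions using double arithmetic" can look like, here is a minimal sketch. It is not musl's code: the function name is hypothetical and the degree-7 Taylor polynomial is chosen only for simplicity, being reasonable for small inputs (roughly |x| <= 0.5). The point is the structure: widen the float argument once, do all intermediate work in double, and round to float once at the end.

```c
/* Illustration only; not musl's implementation.
 * Approximates sin(x) for small |x| (roughly |x| <= 0.5) using a
 * degree-7 Taylor polynomial evaluated entirely in double. */
float sin_small_via_double(float x)
{
    double z = x;       /* widen the argument once */
    double z2 = z * z;

    /* Horner form of z - z^3/6 + z^5/120 - z^7/5040. */
    double p = z * (1.0 + z2 * (-1.0 / 6.0
                  + z2 * (1.0 / 120.0 - z2 / 5040.0)));

    return (float)p;    /* single rounding to float at the end */
}
```

On hardware with a double-precision FPU, the intermediate rounding errors stay far below float precision, so no extra care is needed to keep the result accurate. On a single-precision-only FPU, every double operation here would fall back to software, which is the exception mentioned above.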
