Post by Luca Barbato Post by Rich Felker Post by Luca Barbato Post by Rich Felker
If we want to achieve an alignment of 8, the above definition is
wrong; it will no longer have alignment 8 once the bug is fixed.
However I'm not convinced it's the right thing to do. Defining it as 8
is tightening malloc's contract to always return 8-byte-aligned memory
(note that it presently returns at least 16-byte alignment anyway, but
this is an implementation detail that's not meant to be observable,
not part of the interface contract).
Shouldn't the current natural alignment be 32 for AVX and 16 for SSE?
I'm not sure how wasteful it would be, but it would surely be a boon for
the applications I'm mostly involved with.
If you're working with data that needs additional alignment, you have
That's the annoying part: the largest register is 32 bytes in those
And it will keep getting larger. Obviously changing the definition of
types and/or the ABI again and again is not the solution. The solution
is requesting the alignment you want.
BTW note that in the case of audio and video, depending on which
sample you start at, your data will not be aligned even if the start
of the buffer is aligned (think video filters working on a sub-image,
for example). So in cases like that your code should just handle the
misaligned head/tail parts separately. Note that GCC (and AFAIK
clang/llvm) already do this for you with -O3 and a -march that
supports vector ops.
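
As a rough illustration of handling the misaligned head and tail separately, here is a minimal sketch in C. The names brighten, add_sat and VEC_ALIGN are hypothetical, and the 32-byte width is only an assumption matching the AVX case discussed above. The body loop works on whole aligned blocks, which is the part a vectorizing compiler can be expected to turn into vector ops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_ALIGN 32 /* assumed vector width in bytes (AVX-sized) */

/* Hypothetical per-sample operation: saturating add on 8-bit samples. */
static uint8_t add_sat(uint8_t v, uint8_t delta)
{
    unsigned s = (unsigned)v + delta;
    return s > 255 ? 255 : (uint8_t)s;
}

/* Apply add_sat to n samples starting at p; p may be arbitrarily aligned. */
static void brighten(uint8_t *p, size_t n, uint8_t delta)
{
    size_t i = 0;

    /* Head: samples before the first VEC_ALIGN boundary, done one by one. */
    size_t head = (size_t)(-(uintptr_t)p & (VEC_ALIGN - 1));
    if (head > n)
        head = n;
    for (; i < head; i++)
        p[i] = add_sat(p[i], delta);

    /* Body: whole blocks, each starting on a VEC_ALIGN boundary. */
    size_t body_end = i + ((n - i) / VEC_ALIGN) * VEC_ALIGN;
    assert(i == n || (uintptr_t)(p + i) % VEC_ALIGN == 0);
    for (; i < body_end; i += VEC_ALIGN)
        for (size_t j = 0; j < VEC_ALIGN; j++)
            p[i + j] = add_sat(p[i + j], delta);

    /* Tail: leftover samples after the last whole block. */
    for (; i < n; i++)
        p[i] = add_sat(p[i], delta);
}
```

With this shape, calling brighten(buf + 3, 70, 200) on a sub-range works no matter how buf itself was allocated.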
Post by Luca Barbato Post by Rich Felker
to use aligned_alloc (C11), posix_memalign (POSIX), or memalign
(legacy). Just assuming the result of malloc will be aligned beyond
the alignment requirements of any standard type is unsafe.
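
A minimal sketch of requesting the alignment explicitly with C11 aligned_alloc follows. The helper names alloc_aligned and is_aligned are hypothetical. The size is rounded up because C11 asks for the size to be a multiple of the alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns at least `size` bytes aligned to `align`,
 * which must be a power of two. Release the result with free(). */
static void *alloc_aligned(size_t align, size_t size)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Reject zero sizes and sizes that would overflow when rounded up. */
    if (size == 0 || size > SIZE_MAX - (align - 1))
        return NULL;

    /* Round size up to a multiple of align, as C11 aligned_alloc expects. */
    size_t rounded = (size + align - 1) & ~(align - 1);
    return aligned_alloc(align, rounded);
}

/* Nonzero if p is aligned to `align` (a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

For example, alloc_aligned(32, 100) asks for a buffer suitable for 32-byte vector loads without relying on whatever alignment malloc happens to give.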
We already do that, obviously, with the additional fun of not having a
realloc that matches those functions on most platforms.
This is really a minor limitation. realloc cannot realistically be
expected to grow an object in-place in most cases; the only common
exception is when you're working with new memory at the top of the
heap and there's nothing else making small allocations that might get
placed right after your buffer that needs to grow.
In particular, if you're using a sane growth strategy (geometric),
almost all calls to realloc are likely to move the buffer, so you're
just as well off calling malloc (or aligned_alloc) and free yourself.
The main exception to this might be for HUGE buffers where realloc can
use mremap with MREMAP_MAYMOVE.
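
To make the "allocate and free yourself" approach concrete, here is a minimal sketch of a realloc substitute for aligned buffers. The name aligned_grow is hypothetical. It assumes the caller tracks the old size, and that the caller picks the new size geometrically (for instance by doubling).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: move an aligned buffer to a new aligned block of at
 * least new_size bytes, preserving min(old_size, new_size) bytes.
 * On failure returns NULL and leaves the old buffer untouched. */
static void *aligned_grow(void *old, size_t old_size, size_t new_size,
                          size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Reject zero sizes and sizes that would overflow when rounded up. */
    if (new_size == 0 || new_size > SIZE_MAX - (align - 1))
        return NULL;

    /* C11 aligned_alloc expects a size that is a multiple of align. */
    size_t rounded = (new_size + align - 1) & ~(align - 1);
    void *p = aligned_alloc(align, rounded);
    if (!p)
        return NULL;

    if (old)
        memcpy(p, old, old_size < new_size ? old_size : new_size);
    free(old); /* free(NULL) is a no-op */
    return p;
}
```

With geometric growth the copy cost stays amortized constant per appended byte, so little is lost compared to a realloc that would have moved the buffer anyway.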
Post by Luca Barbato
Having the memory functions return 32-byte-aligned memory and a means to
probe for it would simplify a lot of code.
I think it would complicate the code because now you'd have two cases
to maintain, the case for implementations that always give excessive
alignment and the case for ones that don't. And one code path would
likely bitrot and have bugs.
In any case, the overhead would be undesirable. If/when I make some
improvements to malloc and its strategy for returning memory for use
by other processes (freeing commit charge), I'm also hoping to drop
the granularity on 64-bit platforms from 32 down to 16 or maybe even
smaller. There's really no need to store a size_t in the headers for
chunks which are only used for allocation sizes up to 128k/256k. This
kind of thing potentially makes a big difference for bloated OO/C++
apps that are allocating tons of tiny objects like linked list nodes
that are just 3 pointers (next/prev/data) and short strings.