Rich Felker
2018-04-20 20:09:04 UTC
I've talked on and off about musl's malloc needing to be overhauled or
replaced, and gaining the ability to experiment with that is one of
the motivations for making malloc interposable/replaceable. Here's a
brief write-up of what's wrong that needs to be addressed:
The main benefit of musl's malloc vs the standard dlmalloc algorithms
it's based on is the fine-grained locking. As long as there are binned
free chunks of various sizes available, threads calling malloc only
contend for a lock when they're requesting allocations of the same or
similar size. This works out well under artificial random loads; I'm
not sure how much it helps on typical real-world loads.
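
To make that concrete, here's a minimal sketch of the per-size-class
locking idea. This is not musl's actual code; the names (bins,
bin_index, try_alloc_from_bin) and the use of plain pthread mutexes
are just illustrative:

/* Minimal sketch of per-size-class bin locking; illustrative only, not
 * musl's implementation.  Names and the use of pthread mutexes are
 * assumptions made for the example. */
#include <pthread.h>
#include <stddef.h>

#define NBINS 64

struct chunk { struct chunk *next; size_t size; };

struct bin {
    pthread_mutex_t lock;   /* musl uses its own lightweight lock */
    struct chunk *head;     /* free chunks in this size class */
};

static struct bin bins[NBINS];  /* locks assumed zero-initialized */

/* Crude size-to-bin mapping; the real allocator uses finer classes. */
static int bin_index(size_t n)
{
    int i = 0;
    while (n > 32 && i < NBINS - 1) { n >>= 1; i++; }
    return i;
}

/* Fast path: only the bin for the requested size class is locked, so
 * threads allocating different sizes never contend here. */
static struct chunk *try_alloc_from_bin(size_t n)
{
    struct bin *b = &bins[bin_index(n)];
    struct chunk *c;
    pthread_mutex_lock(&b->lock);
    c = b->head;
    if (c) b->head = c->next;
    pthread_mutex_unlock(&b->lock);
    return c;   /* NULL: caller must split a larger chunk or grow the heap */
}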
In any case, the fine-grained locking also created a problem I didn't
anticipate when I designed it: when allocating memory that involves
splitting a free chunk, or when freeing memory that will get merged
with an adjacent free chunk, the operation as a whole is not atomic;
rather, a large free chunk is consumed, possibly emptying the bin it
lies in, split or merged, then returned either to the same or a
different bin. By saying this operation is non-atomic, I mean other
threads see the intermediate state where the large free chunk has been
consumed but not yet returned, and when this happens, they
unnecessarily split another free chunk or expand the heap rather than
waiting for it to be returned. This yields bad, potentially
catastrophic, fragmentation.
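
Continuing the hypothetical sketch above, the split path shows where
the window opens; split() and alloc_by_splitting() are made-up helpers
for illustration, not functions from malloc.c:

/* Sketch of the non-atomic split path.  Between the two locked regions
 * the large chunk belongs to no bin, so another thread asking for that
 * size class sees the bin as empty. */
static struct chunk *split(struct chunk *c, size_t n)
{
    /* hypothetical helper: carve n bytes off the front, return the
     * rest; header and alignment details are omitted. */
    struct chunk *rem = (struct chunk *)((char *)c + n);
    rem->size = c->size - n;
    c->size = n;
    return rem;
}

static struct chunk *alloc_by_splitting(size_t n, int big_i)
{
    struct bin *big = &bins[big_i];
    struct chunk *c, *rem;
    struct bin *rb;

    pthread_mutex_lock(&big->lock);
    c = big->head;                    /* take a large free chunk */
    if (!c) { pthread_mutex_unlock(&big->lock); return 0; }
    big->head = c->next;              /* the bin may now look empty */
    pthread_mutex_unlock(&big->lock);

    /* window: a concurrent malloc of a large size finds this bin empty
     * and needlessly splits some other chunk or expands the heap,
     * instead of waiting for the remainder to come back below. */

    rem = split(c, n);
    rb = &bins[bin_index(rem->size)];
    pthread_mutex_lock(&rb->lock);
    rem->next = rb->head;             /* remainder re-binned; window closes */
    rb->head = rem;
    pthread_mutex_unlock(&rb->lock);
    return c;
}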
The malloc side already partially mitigates this with the "pretrim"
function, which is used whenever the leftover part after splitting
would remain in the same bin it was already in. In particular this
covers splitting small allocations off a large "top of heap" zone, but
it doesn't help with splitting small chunks. And the free side is much
worse -- large free zones momentarily disappear every time something
adjacent to them is freed.
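
Roughly, the pretrim idea amounts to something like the following on
the sketch above; again the names are made up and this is not the
actual pretrim() logic:

/* Sketch of a pretrim-style fast path: if the leftover would land back
 * in the same bin, trim the chunk in place under that one lock so it
 * never leaves the bin and no empty state is exposed.  Header and
 * alignment details are omitted. */
static struct chunk *alloc_with_pretrim(size_t n, int big_i)
{
    struct bin *big = &bins[big_i];
    struct chunk *c, *out;

    pthread_mutex_lock(&big->lock);
    c = big->head;
    if (c && c->size > n && bin_index(c->size - n) == big_i) {
        c->size -= n;                               /* shrink in place */
        out = (struct chunk *)((char *)c + c->size);
        out->size = n;
        pthread_mutex_unlock(&big->lock);
        return out;                                 /* bin links never touched */
    }
    pthread_mutex_unlock(&big->lock);
    return 0;   /* fall back to the non-atomic split path */
}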
In order to overcome this fully while maintaining the fine-grained
locking, we'd need the ability to hold multiple locks concurrently,
which would require a protocol for lock acquisition order to avoid
deadlock. But there may be cheap mitigations that could be made in
free, like the one on the malloc side. For instance it might be
possible to use a simpler protocol whereby we could always hold the
lock for bin
63, thereby avoiding exposing a state where large free zones are
unavailable. If there's a way to make this work, moving the binmap
masking for newly-emptied bins from unbin() to unlock_bin() would
ensure that malloc doesn't "see" a "temporary empty" state.
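
If that can be made to work, the change would look very roughly like
this; binmap, unbin() and unlock_bin() mirror the real names, but the
bodies and the pending_clear flag are just a guess for illustration,
not a patch:

#include <stdint.h>

/* Sketch of deferring the binmap clear.  pending_clear is an assumption
 * of this sketch; it is only touched while the bin's lock is held. */
static uint64_t binmap;            /* bit i set means bin i is non-empty */
static int pending_clear[NBINS];

static void unbin(struct chunk *c, int i)
{
    /* ... unlink c from bins[i] ... */
    if (!bins[i].head)
        pending_clear[i] = 1;      /* note the emptiness, don't publish it */
}

static void unlock_bin(int i)
{
    if (pending_clear[i] && !bins[i].head) {
        binmap &= ~(1ULL << i);    /* the real code clears this atomically */
        pending_clear[i] = 0;
    }
    pthread_mutex_unlock(&bins[i].lock);
}

The point is just that the binmap bit for a large bin stays set for
the whole operation, so a concurrent malloc blocks briefly on the bin
lock instead of concluding the bin is empty and splitting or expanding
elsewhere.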
In the long term, I think the whole malloc design we're using now is
wrong, and should be replaced, but until the time comes to actually do
that, it may make sense to try to improve some of the worst
properties, or even to reduce or eliminate the granularity of the
locking if it proves not to be very valuable on real-world loads.
Rich