Discussion:
[musl] musl: about malloc 'expand heap' issue
zhangwentao (M)
2018-10-30 11:11:07 UTC
Permalink
Hi all,
I am using musl in my project and I found an issue about the malloc function in musl:

Issue Description:
* When in muti-threads environment, malloc/free are called in high concurrency<http://dict.cn/high%20concurrency>.

Malloc:
Will find 'struct bin' from bitmap(without lock), and allocate memory from the bin (with lock).

Free:
Will merge the chunk together if the free memory is 'connected' to the existing chunk.

? It will remove the old chunk first then combine the chunk to a larger one.

? After merge operation done, insert the chunk to the bin list.

? Each of the chunk operation is locked while merging, but the whole steps aren't within a lock.

So here is the issue:

1. There is only one chunk in largest bin list, and Free is on process, just remove the largest bins chunk from bin, the bitmap(mal.binmap) on that bit will be zero.

2. A malloc comes, the bitmap is zero, and goes to expand heap. (Actually there is enough memories in process)

3. Free operation goes on, and put the merged big chunk to bins.

But in operation 2, the process has expand heap.

If we have a loop on step 1-3, the process will expand heap frequently.
So it will cost more Virtual Memory (of course, physical memory would be freed by calling '__madvise' if the chunk is big enough)

In my environment , we do not have that much virtual memory. I think stop expand heap would a better choice.

Do you have plan to fix it ??

THANKS
Best Regard
Rich Felker
2018-10-30 15:00:39 UTC
Permalink
Post by zhangwentao (M)
Hi all,
* When in muti-threads environment, malloc/free are called in high concurrency<http://dict.cn/high%20concurrency>.
Will find 'struct bin' from bitmap(without lock), and allocate memory from the bin (with lock).
Will merge the chunk together if the free memory is 'connected' to the existing chunk.
? It will remove the old chunk first then combine the chunk to a larger one..
? After merge operation done, insert the chunk to the bin list.
? Each of the chunk operation is locked while merging, but the whole steps aren't within a lock.
1. There is only one chunk in largest bin list, and Free is on process, just remove the largest bins chunk from bin, the bitmap(mal.binmap) on that bit will be zero.
2. A malloc comes, the bitmap is zero, and goes to expand heap. (Actually there is enough memories in process)
3. Free operation goes on, and put the merged big chunk to bins.
But in operation 2, the process has expand heap.
If we have a loop on step 1-3, the process will expand heap frequently.
So it will cost more Virtual Memory (of course, physical memory would be freed by calling '__madvise' if the chunk is big enough)
In my environment , we do not have that much virtual memory. I think stop expand heap would a better choice.
Do you have plan to fix it ??
This is a known issue, and intended to be fixed in the complete
redesign of malloc. Fixing it right in the current design seems to
impose significant performance costs that I thought were equivalent
to, or worse than, just using one global lock. However if it's causing
major problems I may be able to make a quick fix that's not too
expensive -- I'll take a look again today or tomorrow.

Rich
zhangwentao (M)
2018-10-31 01:19:11 UTC
Permalink
Hi

Now we are using a global lock to solve the issue. And as you said the performance maybe cost too much.
And if you have solution about this and the fix is not that expensive, that's great.
If you finish it, would you send the patch to me ?

Thanks a lot~
From Wentao


-----邮件原件-----
发件人: Rich Felker [mailto:***@aerifal.cx] 代表 Rich Felker
发送时间: 2018年10月30日 23:01
收件人: ***@lists.openwall.com
主题: Re: [musl] musl: about malloc 'expand heap' issue
Post by zhangwentao (M)
Hi all,
* When in muti-threads environment, malloc/free are called in high concurrency<http://dict.cn/high%20concurrency>.
Will find 'struct bin' from bitmap(without lock), and allocate memory from the bin (with lock).
Will merge the chunk together if the free memory is 'connected' to the existing chunk.
? It will remove the old chunk first then combine the chunk to a larger one..
? After merge operation done, insert the chunk to the bin list.
? Each of the chunk operation is locked while merging, but the whole steps aren't within a lock.
1. There is only one chunk in largest bin list, and Free is on process, just remove the largest bins chunk from bin, the bitmap(mal.binmap) on that bit will be zero.
2. A malloc comes, the bitmap is zero, and goes to expand heap. (Actually there is enough memories in process)
3. Free operation goes on, and put the merged big chunk to bins.
But in operation 2, the process has expand heap.
If we have a loop on step 1-3, the process will expand heap frequently.
So it will cost more Virtual Memory (of course, physical memory would be freed by calling '__madvise' if the chunk is big enough)
In my environment , we do not have that much virtual memory. I think stop expand heap would a better choice.
Do you have plan to fix it ??
This is a known issue, and intended to be fixed in the complete
redesign of malloc. Fixing it right in the current design seems to
impose significant performance costs that I thought were equivalent
to, or worse than, just using one global lock. However if it's causing
major problems I may be able to make a quick fix that's not too
expensive -- I'll take a look again t
Rich Felker
2018-10-31 03:44:36 UTC
Permalink
Post by zhangwentao (M)
Hi
Now we are using a global lock to solve the issue. And as you said the performance maybe cost too much.
And if you have solution about this and the fix is not that expensive, that's great.
If you finish it, would you send the patch to me ?
If I get a solution that's acceptable for upstream, it will be
committed in git master and I'll follow up on this thread to mention
it.

Unfortunately, looking again at where the spurious empty-bin situation
happens (as a result of alloc_rev/_fwd during free), I don't see an
easy fix that's not costly. The best we can do might be something like
this:

In malloc, only use the per-bin lock to obtain a chunk if the
exact-sized bin is non-empty. If it's empty, take a global lock
(free_lock, but its semantics might need to be adjusted somewhat) to
ensure a consistent view of what larger free chunks are available for
splitting (which might end up revealing that a chunk of the exact
desired bin was actually available). Ideally this will not have much
impact on malloc performance under conditions where lots of free
chunks are available (heavy load).

Rich
Post by zhangwentao (M)
-----邮件原件-----
发送时间: 2018年10月30日 23:01
主题: Re: [musl] musl: about malloc 'expand heap' issue
Post by zhangwentao (M)
Hi all,
* When in muti-threads environment, malloc/free are called in high concurrency<http://dict.cn/high%20concurrency>.
Will find 'struct bin' from bitmap(without lock), and allocate memory from the bin (with lock).
Will merge the chunk together if the free memory is 'connected' to the existing chunk.
? It will remove the old chunk first then combine the chunk to a larger one..
? After merge operation done, insert the chunk to the bin list.
? Each of the chunk operation is locked while merging, but the whole steps aren't within a lock.
1. There is only one chunk in largest bin list, and Free is on process, just remove the largest bins chunk from bin, the bitmap(mal.binmap) on that bit will be zero.
2. A malloc comes, the bitmap is zero, and goes to expand heap. (Actually there is enough memories in process)
3. Free operation goes on, and put the merged big chunk to bins.
But in operation 2, the process has expand heap.
If we have a loop on step 1-3, the process will expand heap frequently.
So it will cost more Virtual Memory (of course, physical memory would be freed by calling '__madvise' if the chunk is big enough)
In my environment , we do not have that much virtual memory. I think stop expand heap would a better choice.
Do you have plan to fix it ??
This is a known issue, and intended to be fixed in the complete
redesign of malloc. Fixing it right in the current design seems to
impose significant performance costs that I thought were equivalent
to, or worse than, just using one global lock. However if it's causing
major problems I may be able to make a quick fix that's not too
expensive -- I'll take a look again today or tomorrow.
Rich
Loading...