Should we implement https://github.com/GrapheneOS/hardened_malloc/? It is a hardened memory allocator that, according to the GrapheneOS team, will increase the security of Fedora, and it can be ported to Linux as well. We need to look at it.
martin luther wrote:
Should we implement https://github.com/GrapheneOS/hardened_malloc/? It is a hardened memory allocator that, according to the GrapheneOS team, will increase the security of Fedora, and it can be ported to Linux as well. We need to look at it.
There are several questions that come up:

* Against what exact threats does this protect? Use-after-free? Heap buffer overflow? Others?
* How does it relate to _FORTIFY_SOURCE? Can they be used together? (If not, it might actually reduce rather than increase the security of Fedora.)
* How does it perform, both in terms of speed and memory consumption (overhead)? Better or worse than the glibc malloc? (If it is much worse than the glibc malloc, it is not going to be a suitable default for Fedora.)
* How does it compare to the glibc malloc in terms of quality-of-implementation issues, such as that realloc should avoid copying the whole block whenever an in-place resize is possible?
* Can hardening be added to the existing glibc malloc implementation, or is a complete rewrite such as the suggested one really needed?
* How do you suggest it get used distro-wide instead of the glibc implementation? Upstream's suggestion is to link it as an additional dynamic shared object, so the order of linking is important, and you also have to take care to link it into all applications (and there are lots of build systems out there). The alternative, I suppose, would be to modify glibc.
Kevin Kofler
On 8/13/22 08:04, Kevin Kofler via devel wrote:
martin luther wrote:
Should we implement https://github.com/GrapheneOS/hardened_malloc/? It is a hardened memory allocator that, according to the GrapheneOS team, will increase the security of Fedora, and it can be ported to Linux as well. We need to look at it.
CCing Daniel Micay who wrote hardened_malloc.
There are several questions that come up:

* Against what exact threats does this protect? Use-after-free? Heap buffer overflow? Others?
* How does it relate to _FORTIFY_SOURCE? Can they be used together? (If not, it might actually reduce rather than increase the security of Fedora.)
* How does it perform, both in terms of speed and memory consumption (overhead)? Better or worse than the glibc malloc? (If it is much worse than the glibc malloc, it is not going to be a suitable default for Fedora.)
* How does it compare to the glibc malloc in terms of quality-of-implementation issues, such as that realloc should avoid copying the whole block whenever an in-place resize is possible?
* Can hardening be added to the existing glibc malloc implementation, or is a complete rewrite such as the suggested one really needed?
* How do you suggest it get used distro-wide instead of the glibc implementation? Upstream's suggestion is to link it as an additional dynamic shared object, so the order of linking is important, and you also have to take care to link it into all applications (and there are lots of build systems out there). The alternative, I suppose, would be to modify glibc.
Kevin Kofler
On 8/13/22, Demi Marie Obenour wrote:
On 8/13/22, Kevin Kofler via devel wrote:
martin luther wrote:
Should we implement https://github.com/GrapheneOS/hardened_malloc/? It is a hardened memory allocator that, according to the GrapheneOS team, will increase the security of Fedora, and it can be ported to Linux as well. We need to look at it.
CCing Daniel Micay who wrote hardened_malloc.
There are several questions that come up: [[snip]]
It seems to me that hardened_malloc could increase working set and RAM desired by something like 10% compared to glibc for some important workloads, such as Fedora re-builds. From page 22 of [1] (attached here; 203KB), the graph of number of requests versus requested size shows that blocks of size <= 128 were requested tens to thousands of times more often than all the rest.
For sizes from 0 through 128, the "Size classes" section of README.md of [2] documents worst-case internal fragmentation (in "slabs") of 93.75% to 11.72%. That seems too high. Where are actual measurements for workloads such as Fedora re-builds?
(Also note that the important special case of malloc(0), which is analogous to (gensym) of Lisp and is implemented internally as malloc(1), consumes 16 bytes and has a fragmentation of 93.75% for both glibc and hardened_malloc. The worst fragmentation happens for *every* call to malloc(0), which occurred about 800,000 times in the sample. Yikes!)
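A minimal C harness along these lines can check such numbers empirically (an illustrative sketch, not part of the original measurements; it samples VmRSS from /proc/self/status around 800,000 malloc(0) calls, and can be run with and without LD_PRELOAD=libhardened_malloc.so to compare allocators):
-----
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the resident set size, in kB, from /proc/self/status. */
static long vm_rss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            sscanf(line + 6, "%ld", &kb);
            break;
        }
    }
    fclose(f);
    return kb;
}

int main(void)
{
    enum { N = 800000 };      /* roughly the count cited from [1] */
    static void *keep[N];     /* keep pointers live so nothing is freed */
    long before = vm_rss_kb();
    for (int i = 0; i < N; i++)
        keep[i] = malloc(0);
    long after = vm_rss_kb();
    printf("VmRSS before %ld kB, after %ld kB, delta %ld kB\n",
           before, after, after - before);
    return keep[0] == keep[N - 1]; /* keep the array observable */
}
-----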
[1] https://blog.linuxplumbersconf.org/2016/ocw/system/presentations/3921/origin... [2] https://github.com/GrapheneOS/hardened_malloc/
On Mon, Aug 15, 2022 at 07:39:46PM -0700, John Reiser wrote:
On 8/13/22, Demi Marie Obenour wrote:
On 8/13/22, Kevin Kofler via devel wrote:
martin luther wrote:
Should we implement https://github.com/GrapheneOS/hardened_malloc/? It is a hardened memory allocator that, according to the GrapheneOS team, will increase the security of Fedora, and it can be ported to Linux as well. We need to look at it.
CCing Daniel Micay who wrote hardened_malloc.
There are several questions that come up: [[snip]]
It seems to me that hardened_malloc could increase working set and RAM desired by something like 10% compared to glibc for some important workloads, such as Fedora re-builds. From page 22 of [1] (attached here; 203KB), the graph of number of requests versus requested size shows that blocks of size <= 128 were requested tens to thousands of times more often than all the rest.
It has far less fragmentation than glibc malloc. It also has far lower metadata overhead since there are no headers on allocations and only a few bits consumed per small allocation. glibc has over 100% metadata overhead for 16 byte allocations while for hardened_malloc it's a very low percentage. Of course, you need to compare with slab allocation quarantines and slab allocation canaries disabled in hardened_malloc.
For sizes from 0 through 128, the "Size classes" section of README.md of [2] documents worst-case internal fragmentation (in "slabs") of 93.75% to 11.72%. That seems too high. Where are actual measurements for workloads such as Fedora re-builds?
Internal fragmentation here means fragmentation caused by size class rounding. There is no way to have size classes that aren't multiples of 16 due to that being required by the x86_64 and arm64 ABIs. glibc has over 100% overhead for 16 byte allocations due to header metadata and other metadata. It definitely isn't lighter for those compared to a modern slab allocator.
There's a 16 byte alignment requirement for malloc on x86_64 and arm64 so there's no way to have any size classes between the initial multiples of 16.
Slab allocation canaries are an optional hardened_malloc feature adding 8 byte random canaries to the end of allocations, which in many cases will increase the size class if there isn't room within the padding. Slab allocation quarantines are another optional feature which require dedicating substantial memory to avoiding reuse of allocations.
You should compare without the optional features enabled as a baseline because glibc doesn't have any of those security features, and the baseline hardened_malloc design is far more secure.
(Also note that the important special case of malloc(0), which is analogous to (gensym) of Lisp and is implemented internally as malloc(1), consumes 16 bytes and has a fragmentation of 93.75% for both glibc and hardened_malloc. The worst fragmentation happens for *every* call to malloc(0), which occurred about 800,000 times in the sample. Yikes!)
malloc(0) is not implemented as malloc(1) in hardened_malloc and does not use any memory for the data, only the metadata, which is a small percentage of the allocation size even for 16 byte allocations since there is only slab metadata for the entire slab and bitmaps to track which slots are used. There are no allocation headers.
Doing hundreds of thousands of malloc(0) allocations only uses a few bytes of memory in hardened_malloc. Each allocation requires a bit in the bitmap and each slab of 256x 16 byte allocations (4096 byte slab) has slab metadata. All the metadata is in a dedicated metadata region.
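To make that layout concrete, here is a minimal sketch of out-of-line slab bitmap metadata (illustrative only, not hardened_malloc's actual code): a 4096-byte slab holding 256 x 16-byte slots, with a 256-bit used-slot bitmap living in a separate metadata region rather than in headers on the allocations.
-----
#include <stddef.h>
#include <stdint.h>

#define SLAB_SIZE 4096
#define SLOT_SIZE 16
#define SLOTS     (SLAB_SIZE / SLOT_SIZE)   /* 256 slots per slab */

/* Lives in a dedicated metadata region, away from the slab itself. */
struct slab_metadata {
    uint64_t bitmap[SLOTS / 64];  /* 256 bits = 32 bytes per 4096-byte slab */
};

/* Find a free slot, mark it used, and return its address in the slab. */
static void *slab_alloc(struct slab_metadata *m, unsigned char *slab_base)
{
    for (size_t word = 0; word < SLOTS / 64; word++) {
        uint64_t bits = m->bitmap[word];
        if (bits != UINT64_MAX) {
            unsigned bit = (unsigned)__builtin_ctzll(~bits); /* lowest 0 bit */
            m->bitmap[word] |= (uint64_t)1 << bit;
            return slab_base + (word * 64 + bit) * SLOT_SIZE;
        }
    }
    return NULL; /* slab is full */
}
-----
(A real allocator also randomizes slot selection, tracks free counts, and so on; the point is only that the per-allocation cost here is one bit, not a header.)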
I strongly recommend reading all the documentation thoroughly:
https://github.com/GrapheneOS/hardened_malloc/blob/main/README.md
hardened_malloc is oriented towards security and provides a bunch of important security properties unavailable with glibc malloc. It also has lower fragmentation and, with the optional security features disabled, lower memory usage for large processes, especially over time. If you enable the slab quarantines, that's going to use a lot of memory. If you enable slab canaries, you give up some of the memory usage reduction from not having per-allocation metadata headers. Neither of those features exists in glibc malloc, jemalloc, etc., so it's not really fair to enable the optional security features for hardened_malloc and compare with allocators without them.
Slab allocation quarantines in particular inherently require a ton of memory in order to delay reuse of allocations for as long as is feasible. This pairs well with zero-on-free plus a write-after-free check based on zero-on-free, since if any non-zero write occurs while the allocation is quarantined/freed, it will be detected before the allocation is reused. As long as zero-on-free is enabled, which it is even in the sample light configuration, all memory is known to be zeroed at allocation time, which is how the write-after-free check works.

All of these things are supplementary optional features, NOT the core security features. The core security features are the baseline design: no inline metadata, entirely separate, fully statically reserved address space regions, each with its own high entropy random base, for all allocation metadata (1 region) and for each size class (each has a separate region), absolutely never reusing address space between the regions, etc. It provides very substantial security benefits over a completely non-hardened allocator with a legacy, easy-to-exploit design such as glibc malloc, which has only a few mostly non-working sanity checks.
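The zero-on-free / write-after-free-check interplay can be sketched in a few lines (a simplified illustration of the mechanism described above, not the actual implementation):
-----
#include <stdlib.h>
#include <string.h>

/* Zero-on-free: the slot is zeroed when freed, then sits in quarantine. */
static void slot_free(void *p, size_t size)
{
    memset(p, 0, size);
}

/* Write-after-free check: before the slot is handed out again, verify it
 * is still all zeroes; any non-zero byte means something wrote to the
 * memory while it was quarantined/freed. */
static void *slot_reuse(unsigned char *p, size_t size)
{
    for (size_t i = 0; i < size; i++)
        if (p[i] != 0)
            abort(); /* write-after-free detected */
    return p;        /* memory is known-zeroed at allocation time */
}
-----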
There are other approaches which take a middle ground, but hardened_malloc is focused on security first, with very low fragmentation (dramatically lower than glibc) and also lower memory usage for large processes when slab allocation quarantines are disabled and especially when slab canaries are also disabled. Try using hardened_malloc for something like the Matrix Synapse server with the light configuration (slab allocation quarantine not used) and compare to glibc malloc. It uses far less memory and unlike glibc malloc doesn't end up 'leaking' tons of memory over time from fragmentation. Disable slab canaries and try again, it will be even lower, although probably not particularly noticeably.
If you choose to use the very memory expensive slab quarantine feature which is not enabled in the standard light configuration, that's your choice.
Also, hardened_malloc doesn't use a thread cache, for security reasons. It invalidates many of the security properties. If you compare to glibc malloc in the light configuration with tcache disabled in glibc malloc, it will compare well, and hardened_malloc can scale better when given enough arenas. If you want to make the substantial security sacrifices required for a traditional thread cache, then I don't think hardened_malloc makes sense, which is why it doesn't include the option to do thread caching even though it'd be easy to implement. It may one day include the option to do thread-batched allocation, but it isn't feasible to do it for deallocation without losing a ton of the strong security properties.
On 8/26/22 12:22, Daniel Micay via devel wrote:
Also, hardened_malloc doesn't use a thread cache, for security reasons. It invalidates many of the security properties. If you compare to glibc malloc in the light configuration with tcache disabled in glibc malloc, it will compare well, and hardened_malloc can scale better when given enough arenas. If you want to make the substantial security sacrifices required for a traditional thread cache, then I don't think hardened_malloc makes sense, which is why it doesn't include the option to do thread caching even though it'd be easy to implement. It may one day include the option to do thread-batched allocation, but it isn't feasible to do it for deallocation without losing a ton of the strong security properties.
I'm an upstream glibc developer, but I've tried to remove my bias here and present the facts as they are for the existing heap-based allocator that is in use by the distributions today and why it's hard to change.
(1) Pick your own allocator vs. use the default.
We allow any end user to make those choices by interposing the final allocator with an allocator of their choice depending on specific workload criteria. This means that distributions don't have a strong incentive to change system allocators unless they are making a strategic change in their core values or vision for the distribution (like Graphene OS makes for security).
At the ELF level we make sure that we can interpose a new allocator, and we work carefully to ensure that newer features at the compiler level can be supported incrementally (_FORTIFY_SOURCE=3 and __builtin_dynamic_object_size) by newer allocators.
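For readers unfamiliar with the mechanism: ELF symbol interposition is what makes the "pick your own allocator" model work. A minimal (hypothetical) counting interposer looks like this; build it as a shared object and run any program under it with LD_PRELOAD:
-----
/* cc -shared -fPIC -o libcount.so count.c   (add -ldl on glibc < 2.34)
 * LD_PRELOAD=./libcount.so ls                                          */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

static void *(*real_malloc)(size_t);
static unsigned long calls;

void *malloc(size_t n)
{
    /* NB: glibc's dlsym can itself allocate; a production interposer
     * has to bootstrap around that. Kept simple here. */
    if (!real_malloc)
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    calls++;
    return real_malloc(n);
}

__attribute__((destructor))
static void report(void)
{
    fprintf(stderr, "malloc called %lu times\n", calls);
}
-----
A complete interposer must also cover free, calloc, realloc, and the aligned-allocation entry points, which is exactly why getting a replacement linked into all applications across all build systems is the hard part.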
In summary: If the "good enough" allocator doesn't meet your requirements, then you can use one of the alternatives.
(2) Switching the default vs. improving the default.
It is arguably lower TCO for all distributions using glibc to improve glibc's malloc. Some improvements can't be made, but some buy enough benefit that there is no strong reason to change allocators.
For example:
- jemalloc/tcmalloc used a fast per-thread cache; glibc implemented fast per-thread caching in 2.26 (2017) (DJ Delorie's work).
- Chromium started using safe-linking pointer hardening; glibc implemented safe-linking pointer hardening for fastbins and tcache (2020) (Eyal Itkin's work).
Next steps for glibc's malloc are probably:
- Improve internal fragmentation [1]
- Round-robin arena assignment, with uniform arena assignment as a goal.
- Provide a packed arena for sub-16-byte allocations to improve utilization. We have seen some C++ workloads/frameworks that create trillions of 13-byte objects (see the arithmetic sketch below).
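As a back-of-the-envelope illustration of that last point (the 32-byte figure is an assumption based on glibc's minimum 64-bit chunk size, not a measurement from this thread):
-----
#include <stdio.h>

int main(void)
{
    double request = 13.0;   /* the 13-byte C++ objects mentioned above */
    double slot16  = 16.0;   /* smallest ABI-aligned size class */
    double chunk32 = 32.0;   /* assumed minimum glibc chunk on 64-bit */

    /* Pure size-class rounding: 3 of every 16 bytes are wasted. */
    printf("16-byte size class:  %5.2f%% waste\n",
           100.0 * (slot16 - request) / slot16);     /* 18.75 */

    /* Header-based chunk: the waste more than triples. */
    printf("32-byte glibc chunk: %5.2f%% waste\n",
           100.0 * (chunk32 - request) / chunk32);   /* 59.38 */
    return 0;
}
-----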
(3) Requirements vs. change.
While Facebook/BSD (jemalloc), Google (tcmalloc), Microsoft (mimalloc) have very good allocators, issues seen with those allocators can be more difficult to correct because of the impact those changes have on wider workloads beyond distribution workloads.
For example, if GrapheneOS, with its own goals, and Fedora, with its own goals, had a conflict of interest over the direction of the allocator, e.g. cost vs. security, what kind of choice would the hardened_malloc maintainers make?
Upstream glibc has largely been aligned with traditional distribution requirements for a long time, and continues to be aligned with the notion of a "general purpose" distribution via the contributors and deep network of developers in the distributions: https://sourceware.org/glibc/wiki/MAINTAINERS#Distribution_Maintainers
---
The combination of (1), (2) and (3) mean that for general purpose distributions the choice of staying with glibc's malloc means having an ecosystem of distributions that are using the same allocator and benefit from wide application testing and development and support when required.
It would be easier to approach glibc upstream and convince them that the default allocator in glibc should be replaced with hardened_alloc or jemalloc or tcmalloc or mimalloc...
On Sat, Aug 27, 2022 at 9:14 AM Carlos O'Donell carlos@redhat.com wrote:
(2) Switching the default vs. improving the default.
A third option (or maybe it's an improvement to the default?), since the choice of allocators seems to come up consistently, could be to seriously consider the idea of making it easier to switch allocators (likely not a trivial project).
However to echo what Timothée said, there's value in packaging hardened_malloc for Fedora to make it available to users. It's much too early to start talking about switching defaults IMO.
Sid
On Mon, Aug 15, 2022 at 07:39:46PM -0700, John Reiser wrote:
On 8/13/22, Demi Marie Obenour wrote:
On 8/13/22, Kevin Kofler via devel wrote:
martin luther wrote:
Should we implement https://github.com/GrapheneOS/hardened_malloc/? It is a hardened memory allocator that, according to the GrapheneOS team, will increase the security of Fedora, and it can be ported to Linux as well. We need to look at it.
CCing Daniel Micay who wrote hardened_malloc.
There are several questions that come up: [[snip]]
It seems to me that hardened_malloc could increase working set and RAM desired by something like 10% compared to glibc for some important workloads, such as Fedora re-builds. From page 22 of [1] (attached here; 203KB), the graph of number of requests versus requested size shows that blocks of size <= 128 were requested tens to thousands of times more often than all the rest.
In the lightweight configuration, hardened_malloc uses substantially less memory for small allocations than glibc malloc.
None of the GrapheneOS or hardened_malloc developers or project members has proposed that Fedora switch to hardened_malloc, but it would reduce rather than increase memory usage if you used it without the slab quarantine features. Slab canaries use extra memory too, but the overhead is lower than glibc's metadata overhead. The sample lightweight configuration still uses slab canaries.
If you bolted on a jemalloc-style array-based thread cache, or a problematic TCMalloc-style one as was copied for glibc, then you would be able to get comparable performance and better scalability than glibc malloc, but that is outside the scope of what hardened_malloc is intended to provide. We aren't trying to serve that niche in hardened_malloc.

That does not mean glibc malloc is well suited to being the chosen allocator. That really can't be justified for any technical reasons. If you replaced glibc malloc with jemalloc, the only people who would be unhappy are people who care about the loss of ASLR bits from chunk alignment, which, if you make the chunks small enough and configure ASLR properly, really doesn't matter on 64-bit. I can't think of a case where glibc malloc would be better than jemalloc with small chunk sizes when using either 4k pages with a 48-bit address space or larger pages. glibc malloc's overall design is simply not competitive anymore, and it wastes tons of memory from both metadata overhead and fragmentation. I can't really understand what justification there would be for not replacing it outright with a more modern design and adding the necessary additional APIs required for that, as we did ourselves for our own security-focused allocator.
For sizes from 0 through 128, the "Size classes" section of README.md of [2] documents worst-case internal fragmentation (in "slabs") of 93.75% to 11.72%. That seems too high. Where are actual measurements for workloads such as Fedora re-builds?
The minimum alignment is 16 bytes. glibc malloc has far more metadata overhead, internal and external fragmentation than hardened_malloc in reality. It has headers on allocations, rounds to much less fine grained bucket sizes and fragments all the memory with the traditional dlmalloc style approach. There was a time when that approach was a massive improvement over past ones but that time was the 90s, not 2022.
(Also note that the important special case of malloc(0), which is analogous to (gensym) of Lisp and is implemented internally as malloc(1), consumes 16 bytes and has a fragmentation of 93.75% for both glibc and hardened_malloc. The worst fragmentation happens for *every* call to malloc(0), which occurred about 800,000 times in the sample. Yikes!)
glibc malloc has headers, giving it more than 100% pure overhead for a 16 byte allocation. It cannot do finer grained rounding than we do for 16 through 128 bytes, and sticking headers on allocations makes it far worse. It also gets even worse with aligned allocations, such as the common 64 byte aligned allocations: slab allocation means any allocation up to the page size already has its natural alignment (64 byte alignment for 64 byte allocations, 128 byte for 128 byte, 256 byte for 256 byte, etc.).
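That natural-alignment property is easy to probe empirically (a quick hypothetical test; note that the malloc contract on these ABIs only guarantees 16-byte alignment, so anything larger is allocator-specific behavior, not something portable code may rely on):
-----
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t sizes[] = { 16, 64, 128, 256 };
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        void *p = malloc(sizes[i]);
        uintptr_t a = (uintptr_t)p;
        size_t align = 1;
        /* The lowest set bit of the address is its effective alignment. */
        while (p && (a & align) == 0 && align < 4096)
            align <<= 1;
        printf("malloc(%zu) -> effective alignment %zu\n", sizes[i], align);
        free(p);
    }
    return 0;
}
-----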
0 byte doesn't really make sense to compare because in hardened_malloc it's a pointer to non-allocated pages with PROT_NONE memory protection.
I think that the first steps here would be to:
- package it in Fedora
- write a documentation page on how to use it (the quick docs may be a good place: https://docs.fedoraproject.org/en-US/quick-docs/)
- do a lot of testing and benchmarks to get memory and performance numbers for each major Fedora use case (workstation, server, IoT, etc.)
Here is one end-to-end performance measurement of using hardened_malloc.
sudo sh -c "echo 1 >/proc/sys/vm/drop_caches"
/usr/bin/time rpmbuild -bc kernel-5.15.11-100.fc34.spec >rpmbuild.out 2>&1
For glibc, the result was:
19274.30user 2522.87system 1:49:06elapsed 332%CPU (0avgtext+0avgdata 3389052maxresident)k
148504inputs+217900040outputs (18221major+1005715216minor)pagefaults 0swaps

For the same task, but preceded by:
export LD_PRELOAD=/usr/lib64/libhardened_malloc.so
the result was:
26108.73user 4805.55system 2:22:43elapsed 360%CPU (0avgtext+0avgdata 1881564maxresident)k
586704inputs+217900504outputs (31876major+1848825755minor)pagefaults 0swaps
So compared to glibc-2.33-21.fc34.x86_64, hardened_malloc used:
1.3 times as much wall clock (8563 / 6546 in seconds)
1.35 times as much user CPU (26108 / 19274)
1.9 times as much sys CPU (4805 / 2522)
The environment was a physical machine running Fedora (kernel 5.17.12-100.fc34.x86_64):
Intel Core i5-6500 @ 3.2GHz (4 CPUs, 4 cores, 256kB L2 cache per core, 6MB L3 shared)
32GB DDR4 RAM
/usr ext4 on SSD, /data ext4 on 4TB spinning commodity hard drive
In the .spec, I changed to:
%define make_opts -j4
so that much of the compiling ran 4 jobs in parallel. /usr/bin/top showed minimal use of swap space: 4MB.
hardened_malloc required (as documented in its README.md):
-----
/etc/sysctl.d/hardened_malloc.conf
# (Fedora 5.17.12) default is 65530 (2**16 - 6);
# libhardened_malloc suggests 1048576 (2**20);
# we choose 1048570 (2**20 - 6)
vm.max_map_count = 1048570
-----
else the job crashed:
BTF .btf.vmlinux.bin.o
memory exhausted
The libhardened_malloc source code version was: commit 72fb3576f568481a03076c62df37984f96bfdfeb of Tue Aug 16 07:47:26 2022 -0400
Bottom line opinion: hardened_malloc's added security against exploit by malware costs too much. I will not choose hardened_malloc for this task.
Adding Daniel for awareness.
Regards. Pablo
On Wed, Aug 31, 2022, 16:09, John Reiser jreiser@bitwagon.com wrote:
Here is one end-to-end performance measurement of using hardened_malloc.
sudo sh -c "echo 1 >/proc/sys/vm/drop_caches"
/usr/bin/time rpmbuild -bc kernel-5.15.11-100.fc34.spec >rpmbuild.out 2>&1
For glibc, the result was:
19274.30user 2522.87system 1:49:06elapsed 332%CPU (0avgtext+0avgdata 3389052maxresident)k
148504inputs+217900040outputs (18221major+1005715216minor)pagefaults 0swaps

For the same task, but preceded by:
export LD_PRELOAD=/usr/lib64/libhardened_malloc.so
the result was:
26108.73user 4805.55system 2:22:43elapsed 360%CPU (0avgtext+0avgdata 1881564maxresident)k
586704inputs+217900504outputs (31876major+1848825755minor)pagefaults 0swaps
So compared to glibc-2.33-21.fc34.x86_64, hardened_malloc used:
1.3 times as much wall clock (8563 / 6546 in seconds)
1.35 times as much user CPU (26108 / 19274)
1.9 times as much sys CPU (4805 / 2522)
The environment was a physical machine running Fedora (kernel 5.17.12-100.fc34.x86_64):
Intel Core i5-6500 @ 3.2GHz (4 CPUs, 4 cores, 256kB L2 cache per core, 6MB L3 shared)
32GB DDR4 RAM
/usr ext4 on SSD, /data ext4 on 4TB spinning commodity hard drive
In the .spec, I changed to:
%define make_opts -j4
so that much of the compiling ran 4 jobs in parallel. /usr/bin/top showed minimal use of swap space: 4MB.
hardened_malloc required (as documented in its README.md):
-----
/etc/sysctl.d/hardened_malloc.conf
# (Fedora 5.17.12) default is 65530 (2**16 - 6);
# libhardened_malloc suggests 1048576 (2**20);
# we choose 1048570 (2**20 - 6)
vm.max_map_count = 1048570
-----
else the job crashed:
BTF .btf.vmlinux.bin.o
memory exhausted
The libhardened_malloc source code version was: commit 72fb3576f568481a03076c62df37984f96bfdfeb of Tue Aug 16 07:47:26 2022 -0400
Bottom line opinion: hardened_malloc's added security against exploit by malware costs too much. I will not choose hardened_malloc for this task.
On Wed, Aug 31, 2022 at 05:59:42PM +0200, Pablo Mendez Hernandez wrote:
Adding Daniel for awareness.
Why was the heavyweight rather than lightweight configuration used? Why compare with all the expensive optional security features enabled? Even the lightweight configuration has 2 of the optional security features enabled: slab canaries and full zero-on-free. Both of those should be disabled to measure the baseline performance. Using the heavyweight configuration means having large slab allocation quarantines and not just zero-on-free but checking that data is still zeroed on allocation (which more than doubles the cost), slot randomization and multiple other features. It just doesn't make sense to turn security up to 11 with optional features and then present that as if it's the performance offered.
I'm here to provide clarifications about my project and to counter incorrect beliefs about it. I don't think it makes much sense for Fedora to use it as a default allocator but the claims being made about memory usage and performance are very wrong. I already responded and provided both concise and detailed explanations. I don't know what these nonsense measurements completely disregarding all that are meant to demonstrate.
It's a huge hassle for me to respond here because I have no interest in this list and don't want to be subscribed to it. I didn't propose that Fedora uses it and don't think it makes sense for Fedora. At the same time I already explained that glibc malloc is ALSO a very bad choice in detail. Linux distributions not willing to sacrifice much for security would be better served by using jemalloc with small chunk sizes on 64 bit operating systems. ASLR is too low entropy on 32 bit to afford the sacrifice of a few bits for chunk alignment though. It can be configured with extra sanity checks enabled and with certain very non-essential features disabled to provide a better balance of security vs. performance. The defaults are optimized for long running server processes. It's very configurable, including by individual applications.
hardened_malloc builds both a lightweight and heavyweight library itself. The lightweight library still has the optional slab allocation canary and full zero-on-free features enabled. Both those should be disabled to truly measure the baseline cost. None of those optional features is provided by glibc malloc. None of them is needed to get the benefits of hardened_malloc's 100% out-of-line metadata, 100% invalid free detection, entirely separate never reused address space regions for all allocator metadata and each slab allocation size class (which covers up to 128k by default), virtual memory quarantines + random guards for large allocations, etc. etc.
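The "statically reserved, never reused regions" point builds on ordinary virtual memory primitives; here is a toy sketch of reserving address space and retiring it without ever recycling it (illustrative C, not hardened_malloc's code):
-----
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    const size_t region_size = (size_t)1 << 30; /* 1 GiB reservation */

    /* Reserve: PROT_NONE mappings commit no memory; any access faults. */
    void *region = mmap(NULL, region_size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    /* Commit one 4 KiB slab's worth of pages inside the region. */
    if (mprotect(region, 4096, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect"); return 1;
    }
    ((char *)region)[0] = 1; /* now usable */

    /* "Free": make the pages inaccessible again instead of unmapping,
     * so the address range is never recycled for unrelated data and a
     * stale pointer dereference faults instead of corrupting memory. */
    mprotect(region, 4096, PROT_NONE);
    puts("reserved, committed, retired; address space never reused");
    return 0;
}
-----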
The optional security features are optional because they're expensive. That's the point of building both a sample lightweight and heavyweight configuration by default. Lightweight configuration is essentially the recommended configuration if you aren't willing to make more significant sacrifices for security. It's not the highest performance configuration it offers, just a reasonable compromise.
Slab allocation canaries slightly increase memory usage. Slab allocation quarantines (disabled in lightweight configuration, which is built by default) greatly increase memory usage, especially with the default configuration. The whole point of quarantines is that they delay reuse of the memory and since these are slab allocations within slabs the memory gets held onto.
If you wanted to measure the baseline performance, then you'd do as I suggested and measure with all the optional features disabled (disable at least the 2 optional features enabled in the light configuration) and compare that to both glibc malloc and glibc malloc with tcache disabled.
I explained previously that hardened_malloc could provide an array-based thread cache as an opt-in feature, but currently it isn't done because it inherently reduces security. No more 100% reliable detection of all invalid frees and a lot more security properties lost. Also hardly makes sense to have optional features like quarantines and slot randomization underneath unless the thread caches are doing the same thing.
As I said previously, if you compare hardened_malloc with optional features disabled to glibc malloc with tcache disabled, it performs as well and has much lower fragmentation and lower metadata overhead. If you stick a small array-based thread cache onto hardened_malloc, then it can perform as well as glibc with much larger freelist-based thread caches since it has a different approach to scaling with jemalloc-style arenas.
On 9/5/22 21:02, Daniel Micay via devel wrote:
On Wed, Aug 31, 2022 at 05:59:42PM +0200, Pablo Mendez Hernandez wrote:
Adding Daniel for awareness.
Why was the heavyweight rather than lightweight configuration used? Why compare with all the expensive optional security features enabled?
The default configuration was used. " ...; make" produces out/libhardened_malloc.so and no other shared library.
Even the lightweight configuration has 2 of the optional security features enabled: slab canaries and full zero-on-free. Both of those should be disabled to measure the baseline performance. Using the heavyweight configuration means having large slab allocation quarantines and not just zero-on-free but checking that data is still zeroed on allocation (which more than doubles the cost), slot randomization and multiple other features. It just doesn't make sense to turn security up to 11 with optional features and then present that as if it's the performance offered.
The use case is a builder and distributor of software packages to a large, diverse audience. There is concern about the possibility of malware attacking the build process, a "supply-chain attack". Of course there are other protections already in place, but the possibility of better protection is reasonable to investigate. A network search revealed a dearth of end-to-end performance measurements, and/or comparisons based on actual data.
I'm here to provide clarifications about my project and to counter incorrect beliefs about it. I don't think it makes much sense for Fedora to use it as a default allocator but the claims being made about memory usage and performance are very wrong. I already responded and provided both concise and detailed explanations. I don't know what these nonsense measurements completely disregarding all that are meant to demonstrate.
I reported an actual measurement and comparison of two allocators using commonly-available tools and a documented, repeatable methodology. The choice of which two allocators is reasonable for the use case.
[[snip]]
Bottom line opinion: hardened_malloc ... costs too much.
Attempting to be constructive: Psychologically, I might be willing to pay a "security tax" of something like 17%, partly on the basis of similarity to the VAT rate (Value Added Tax) in some parts of the developed world.
On Wed, Aug 31, 2022 at 10:19:51AM -0700, John Reiser wrote:
Bottom line opinion: hardened_malloc ... costs too much.
Attempting to be constructive: Psychologically, I might be willing to pay a "security tax" of something like 17%, partly on the basis of similarity to the VAT rate (Value Added Tax) in some parts of the developed world.
The comparison is being done incorrectly. Since hardened_malloc builds both a lightweight and heavyweight library by default, and since I already explained this and that the lightweight library still has optional security features enabled, it doesn't seem to have been done in good faith. My previous posts where I provided both concise and detailed information explaining differences and the approach were ignored. Why is that?
As I said previously, hardened_malloc has a baseline very hardened allocator design. It also has entirely optional, expensive security features layered on top of that. I explained in detail that some of those features have a memory cost. Slab allocation canaries have a small memory cost and slab allocation quarantines have a very large memory cost especially with the default configuration. Those expensive optional features each have an added performance cost too.
Measuring with 100% of the expensive optional features enabled and trying to portray the performance of the allocator solely based on that is simply incredibly misleading and disregards all of my previous posts in the thread.
hardened_malloc builds both a lightweight and heavyweight library by default. The lightweight library still has 2 of the optional security features enabled. None of the optional security features is provided by glibc malloc and if you want to compare the baseline performance then none of those should be enabled for a baseline comparison.
Take the light configuration, disable slab allocation canaries and full zero-on-free, and there you go.
I also previously explained that hardened_malloc does not include a thread cache for security reasons inherent to the concept of a thread cache. An array-based thread cache with out-of-line metadata would still hurt security, but would be a more suitable approach than a free list compromising the otherwise complete lack of inline metadata.
Compare hardened_malloc with the optional security features disabled to glibc malloc and also to glibc malloc with tcache disabled. It's easy enough to stick a thread cache onto hardened_malloc and if there was demand for that I could implement it in half an hour. At the moment, the current users of hardened_malloc don't want to make the sacrifice of losing 100% reliable detection of invalid frees along with the many other benefits lost by doing that.
On Mon, 2022-09-05 at 22:45 -0400, Daniel Micay via devel wrote:
The comparison is being done incorrectly. Since hardened_malloc builds both a lightweight and heavyweight library by default, and since I already explained this and that the lightweight library still has optional security features enabled, it doesn't seem to have been done in good faith. My previous posts where I provided both concise and detailed information explaining differences and the approach were ignored. Why is that?
I agree. I decided to do a more fair test myself (I'm quite interested in hardened_malloc). First, I downloaded the source RPM for my current kernel:
dnf download --source kernel-5.19.6-200.fc36.x86_64
Then made both heavy and light variants:
sysctl -p /etc/sysctl.d/hardened_malloc.conf
make VARIANT=light
Set up the chroot:
mock -r fedora-36-x86_64 --init
Create our SRPM:
mock -r fedora-36-x86_64 --buildsrpm --spec kernel.spec --sources $PWD --resultdir $PWD
Now do the compilations:
cp out-light/libhardened_malloc.so .
./preload.sh /usr/bin/time mock -r fedora-36-x86_64 --rebuild kernel-5.19.6-200.fc36.src.rpm >light.out 2>&1
/usr/bin/time mock -r fedora-36-x86_64 --rebuild kernel-5.19.6-200.fc36.src.rpm >no_preload.out 2>&1
On 9/5/22 19:45, Daniel Micay wrote:
On Wed, Aug 31, 2022 at 10:19:51AM -0700, John Reiser wrote:
Bottom line opinion: hardened_malloc ... costs too much.
Attempting to be constructive: Psychologically, I might be willing to pay a "security tax" of something like 17%, partly on the basis of similarity to the VAT rate (Value Added Tax) in some parts of the developed world.
The comparison is being done incorrectly. Since hardened_malloc builds both a lightweight and heavyweight library by default,
That claim is false. The Makefile for commit 72fb3576f568481a03076c62df37984f96bfdfeb on Tue Aug 16 07:47:26 2022 -0400 (which is the HEAD of the trunk) begins
=====
VARIANT := default

ifneq ($(VARIANT),)
CONFIG_FILE := config/$(VARIANT).mk
include config/$(VARIANT).mk
endif

ifeq ($(VARIANT),default)
SUFFIX :=
else
SUFFIX := -$(VARIANT)
endif

OUT := out$(SUFFIX)
=====
and builds only one library, namely $OUT/libhardened_malloc$SUFFIX.so, which for the case of "no options specified" is out/libhardened_malloc.so.
It would be better for external perception if the name "libhardened_malloc.so" were changed to something like "libhardened_malloc-strong.so". Having both -strong and -light versions built every time would highlight the difference, force the user to decide, and encourage the analysis that is required to make an informed choice.
and since I
already explained this and that the lightweight library still has optional security features enabled, it doesn't seem to have been done in good faith. My previous posts where I provided both concise and detailed information explaining differences and the approach were ignored. Why is that?
As I said previously, hardened_malloc has a baseline very hardened allocator design. It also has entirely optional, expensive security features layered on top of that. I explained in detail that some of those features have a memory cost. Slab allocation canaries have a small memory cost and slab allocation quarantines have a very large memory cost especially with the default configuration. Those expensive optional features each have an added performance cost too.
Measuring with 100% of the expensive optional features enabled and trying to portray the performance of the allocator solely based on that is simply incredibly misleading and disregards all of my previous posts in the thread.
I measured the result of building and using with the default options. Unpack the source, use "as-is" with no adjustment, no tweaking, no tuning. If the default source is not appropriate to use as widely as implied by the name "malloc" (with no prefix and no suffix on the subroutine name), then the package is not suitable for general use. Say so immediately at the beginning of the README.md: "This software is not suitable for widespread general use, unless adjusted according to the actual use cases."
hardened_malloc builds both a lightweight and heavyweight library by default. The lightweight library still has 2 of the optional security features enabled. None of the optional security features is provided by glibc malloc and if you want to compare the baseline performance then none of those should be enabled for a baseline comparison.
Take the light configuration, disable slab allocation canaries and full zero-on-free, and there you go.
I reported an end-to-end measurement and comparison based on data. Where have you reported actual end-to-end measurements and comparisons?
[[snip]]
On Wed, Sep 07, 2022 at 08:39:56AM -0700, John Reiser wrote:
On 9/5/22 19:45, Daniel Micay wrote:
On Wed, Aug 31, 2022 at 10:19:51AM -0700, John Reiser wrote:
Bottom line opinion: hardened_malloc ... costs too much.
Attempting to be constructive: Psychologically, I might be willing to pay a "security tax" of something like 17%, partly on the basis of similarity to the VAT rate (Value Added Tax) in some parts of the developed world.
The comparison is being done incorrectly. Since hardened_malloc builds both a lightweight and heavyweight library by default,
That claim is false.
You're not following the official approach to packaging and installing hardened_malloc. It has 2 official build configurations and packaging that's done properly includes both. We don't currently define other configurations, but we could define a 'lightest' one too.
I've given both concise and detailed explanations here, which you've gone out of the way to ignore.
The Makefile for commit 72fb3576f568481a03076c62df37984f96bfdfeb
on Tue Aug 16 07:47:26 2022 -0400 (which is the HEAD of the trunk) begins
VARIANT := default

ifneq ($(VARIANT),)
CONFIG_FILE := config/$(VARIANT).mk
include config/$(VARIANT).mk
endif

ifeq ($(VARIANT),default)
SUFFIX :=
else
SUFFIX := -$(VARIANT)
endif

OUT := out$(SUFFIX)

and builds only one library, namely $OUT/libhardened_malloc$SUFFIX.so, which for the case of "no options specified" is out/libhardened_malloc.so.
It would be better for external perception if the name "libhardened_malloc.so" were changed to something like "libhardened_malloc-strong.so". Having both -strong and -light versions built every time would highlight the difference, force the user to decide, and encourage the analysis that is required to make an informed choice.
The 2 default configurations are not the only choices. The light configuration still has full zero-on-free and canaries enabled.
If we felt like matching or even exceeding glibc malloc performance on microbenchmarks, we could add an optional thread cache and a performance configuration, but that's not the point of the project at all, and glibc malloc is not a high performance allocator. hardened_malloc can provide similar performance with all optional features disabled vs. glibc malloc with tcache disabled. If hardened_malloc had array-based thread caching added (free lists would lose even the very basic 100% out-of-line metadata security property), then with optional features disabled it would be comparable to the default glibc malloc configuration. We've already done extensive testing. There's no thread cache included because it simply isn't within the scope of the project: it's a hardened allocator, and a thread cache bypasses hardening and makes invalid free detection, randomization, quarantines, and other features not work properly. It has been tested with a thread cache; we know the impact of it. I don't think it makes sense to use it with one.
already explained this and that the lightweight library still has optional security features enabled, it doesn't seem to have been done in good faith. My previous posts where I provided both concise and detailed information explaining differences and the approach were ignored. Why is that?
As I said previously, hardened_malloc has a baseline very hardened allocator design. It also has entirely optional, expensive security features layered on top of that. I explained in detail that some of those features have a memory cost. Slab allocation canaries have a small memory cost and slab allocation quarantines have a very large memory cost especially with the default configuration. Those expensive optional features each have an added performance cost too.
Measuring with 100% of the expensive optional features enabled and trying to portray the performance of the allocator solely based on that is simply incredibly misleading and disregards all of my previous posts in the thread.
I measured the result of building and using with the default options. Unpack the source, use "as-is" with no adjustment, no tweaking, no tuning. If the default source is not appropriate to use as widely as implied by the name "malloc" (with no prefix and no suffix on the subroutine name), then the package is not suitable for general use. Say so immediately at the beginning of the README.md: "This software is not suitable for widespread general use, unless adjusted according to the actual use cases."
The hardened_malloc project is perfectly suitable for general purpose use and heavily preferring security over both performance and memory usage for one of the 2 default configurations doesn't make it any less general purpose. The chosen compromises do not impact whether or not it is a general purpose allocator. Both default configurations are suitable for general purpose, widespread use. GrapheneOS has used the non-light configuration since hardened_malloc was introduced. It performs much better than the previous OpenBSD malloc port. It also performs better than the also general purpose musl malloc-ng heavily focused on low memory usage and low fragmentation. That's a general purpose allocator too and is not any less suitable for widespread usage either.
I've added bitwagon.com to the block list for mail.grapheneos.org based on your continued dishonest spin and misinformation. Zero interest in having any further communication. You should consider it a ban from the rest of the GrapheneOS community too. If you ever end up wanting that to be undone you'll need to make up for your behavior here.
hardened_malloc builds both a lightweight and heavyweight library by default. The lightweight library still has 2 of the optional security features enabled. None of the optional security features is provided by glibc malloc and if you want to compare the baseline performance then none of those should be enabled for a baseline comparison.
Take the light configuration, disable slab allocation canaries and full zero-on-free, and there you go.
I reported an end-to-end measurement and comparison based on data. Where have you reported actual end-to-end measurements and comparisons?
We've published posts with detailed information and performance / memory usage comparisons. We've provided detailed information of the cost of the different features and 2 sample configurations based on different balances between memory usage / performance vs. security. The light configuration is not the lightest hardened_malloc configuration, as I've explained previously, and the optional features it enables are not done by jemalloc, glibc malloc, etc. A direct comparison would be to compare hardened_malloc with all optional features disabled vs. glibc malloc with tcache disabled, which is still comparing with a very hardened design with 100% out-of-line metadata, no reuse of address space between size classes and metadata, etc. It is a fair, direct comparison though.
If you want thread caching, which you presumably don't if you want even basic hardening, that's easy enough to build on top of hardened_malloc. Maybe I'll add optional thread caching and a predefined performance configuration simply to make it easier to debunk misinformation like what you're spreading. I see no reason for someone to want to use that though. If you want a performance-oriented allocator, you aren't going to be using hardened_malloc, and you also aren't going to be using glibc malloc either, which is not high performance, scalable or low fragmentation and only has being a default with the 2nd most widely used Linux libc going for it.
I'm not interested in engaging with a troll any further. I'll likely add bitwagon.com to the mail.grapheneos.org block list if you push any more disingenuous spin and misinformation. Not a problem with me.