Hans de Goede <j.w.r.degoede <at> hhs.nl> writes:
Hi,
I just read this interesting article on LWN: http://lwn.net/Articles/106214/ (LWN subscribers only)
Yay :)
This talks about things like:
1. Stack Smash Protection
2. PAX (an alternative to Exec Shield)
3. Position Independent Executables
Stack Smash Protection sounds like a cool feature to me. I don't know what the performance impact is, but as a developer, even if it is too slow to use by default, I would love to have it integrated into the gcc shipped by Fedora to make debugging easier.
I use it.
The performance impact isn't noticeable, but it's highly variable. Its theoretical maximum is something like 8%, and it drops as functions get bigger or when fewer functions are protected.
You can use a heuristic (-fstack-protector) to protect only functions with a local character array, or just protect everything (-fstack-protector-all). With the heuristic, you're unlikely to run into many protected functions.
My best guess, looking at the design (not the code), is that it takes about 4 instructions as a base cost to protect a function, plus one more per passed argument, based on the reasoning below.
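To make the heuristic concrete, here is a minimal sketch of the two kinds of function it distinguishes. The names copy_name and add are made up for illustration, and the exact rule (for instance, how large the array has to be) may differ between SSP/GCC versions, so treat this as an assumption rather than a description of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Has a local character array: the kind of function the
 * -fstack-protector heuristic is described as guarding. */
size_t copy_name(const char *src)
{
    char buf[64];

    /* Bounded copy, always NUL-terminated. */
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return strlen(buf);
}

/* No local array: the heuristic would presumably skip this one;
 * only -fstack-protector-all would guard it. */
int add(int a, int b)
{
    return a + b;
}
```

Built with -fstack-protector, only copy_name should pick up the guard; built with -fstack-protector-all, both would.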
SSP rearranges variables at compile time. This produces no runtime overhead.
SSP protects a function with a local char[] using a __guard value. This would require expanding the stack frame. The best way is to allocate the entire stack frame at once, so I assume the GCC devs are smart and that this produces 0 extra instructions. Next, you need to check the GOT for __guard (I'm assuming O(1), so one instruction), and use MOV to copy __guard to local_guard (1 insn).
At return in a protected function, __guard must be checked with CMP. If it's fine, then JE past the code that calls __stack_smash_handler(). Two more instructions. This totals 4 for a __guard protection on a function.
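Written out by hand, the guard check I'm describing looks roughly like the sketch below. Everything prefixed demo_ is a made-up stand-in (for __guard, __stack_smash_handler() and the stack frame), and the "overflow" is simulated inside a struct so the example stays well-defined C; the real thing is emitted by the compiler, not written like this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for __guard. */
static unsigned long demo_guard = 0x5a5a5a5aUL;
static int demo_smash_detected;

/* Hypothetical stand-in for __stack_smash_handler(); the real one
 * aborts the program instead of setting a flag. */
static void demo_smash_handler(void)
{
    demo_smash_detected = 1;
}

/* Models a stack frame: the buffer sits below the local guard copy. */
struct demo_frame {
    char buf[16];
    unsigned long local_guard;
};

int demo_protected_copy(const char *src, size_t len)
{
    struct demo_frame f;

    demo_smash_detected = 0;
    f.local_guard = demo_guard;       /* prologue: load guard, store copy */

    if (len > sizeof f)               /* keep the simulated overflow in-bounds */
        len = sizeof f;
    memcpy(&f, src, len);             /* len > 16 runs past buf into the guard */

    if (f.local_guard != demo_guard)  /* epilogue: CMP ... */
        demo_smash_handler();         /* ... and the call that JE skips */
    return demo_smash_detected;
}
```

A copy that fits in buf returns 0; one that runs over the whole frame clobbers local_guard and returns 1.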
To protect passed arguments, the stack frame has to be made bigger (no overhead?). Each argument then needs one MOV, addressed as an offset from some register (1 insn per argument).
I'm guessing at internals, but I think most of this is possible. I'm not sure about the O(1) GOT lookup. I'm probably wrong there.
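Putting my guess into numbers: 4 instructions of base cost plus 1 per argument. The helper names below are made up, and the counts are only my estimate from the design, not measurements.

```c
/* Estimated extra instructions for one protected function:
 * 4 for the guard (GOT load, MOV, CMP, JE) plus 1 MOV per argument. */
unsigned ssp_extra_insns(unsigned nargs)
{
    return 4u + nargs;
}

/* Extra instructions relative to the size of the function body.
 * body_insns is a hypothetical instruction count for the function. */
double ssp_relative_overhead(unsigned body_insns, unsigned nargs)
{
    if (body_insns == 0)
        return 0.0;  /* nothing to compare against */
    return (double)ssp_extra_insns(nargs) / (double)body_insns;
}
```

This is why the impact drops as functions get bigger: the added cost is fixed per call, so a 2-argument function of 50 instructions pays 6/50, while one of 500 instructions pays 6/500.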
PAX uses tricks to get a non-executable stack, and assigns random addresses to PIE executables, which Fedora already has in the form of Exec Shield, good! But if I understand it correctly PAX does more, for example it also makes data pages non-executable; this might be something worth looking into.
PaX makes a strict separation between writable and executable memory. It also has more accurate NX emulation on x86. Ingo has admitted in recent LKML posts that PaX is competitive with Exec Shield:
(but no doubt PaX is fine and protects against exploits at least as effectively as (and in some cases more effectively than) exec-shield, so you've definitely not made a bad choice.)
It's also older and still actively developed (older and inactive is bad, older and active is good, younger and active is not quite as good unless your developer is quite a lot more competent on the subject).
SEGMEXEC on x86 splits the address space in half to emulate an NX bit; I've never seen this cause a problem in anything. PAGEEXEC used to use kernel-assisted MMU walking, which can be very high overhead depending on memory access patterns; now it uses the same method Exec Shield uses, but falls back to kernel-assisted MMU walking if that fails (due to mprotect()ing a higher address with PROT_EXEC).
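As a rough picture of the SEGMEXEC split, assuming the usual 3 GB of user address space on x86: the lower half holds data, and executable mappings are mirrored into the upper half. The DEMO_ constants and function names below are made up for illustration and are not the identifiers PaX uses.

```c
#include <stdint.h>

/* Assumption: standard 3 GB user address space on 32-bit x86. */
#define DEMO_TASK_SIZE 0xC0000000u
#define DEMO_HALF      (DEMO_TASK_SIZE / 2u)  /* 0x60000000 */

/* Nonzero if addr falls in the lower (data) half. */
int demo_in_data_half(uint32_t addr)
{
    return addr < DEMO_HALF;
}

/* Where an executable mapping at data_addr would be mirrored
 * in the upper (code) half. */
uint32_t demo_code_mirror(uint32_t data_addr)
{
    return data_addr + DEMO_HALF;
}
```

The cost of the split is that each process gets half the usable address space, which is the part I've never seen cause a problem.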
PIE we already have, good!
Feh, your PIE is a joke. My ENTIRE SYSTEM is PIE, save for what won't compile PIE. I don't use Fedora, but last I heard, only a few programs were PIE.
The overhead of PIC (based on nbyte-bench) is something like 0.99002% on x86, and 0.02% on amd64. There's another caveat: -fomit-frame-pointer usually gives a -5% overhead (i.e. it removes overhead and thus programs use less CPU). That gain is lost with PIC on x86; on amd64 there is no effect (ok, 0.01%).
That being said, you have to understand that libraries are ALL PIC. You lose NOTHING in libraries by going to PIE. All plug-ins to anything (except gimp?), all encoder and decoder libraries, libtheora, libogg, libvorbis, libvorbisenc, liblame, libmad, libzlib and libbzip2, ALL YOUR HEAVY LIFTING is done in libraries.
I haven't profiled it, but I'm fairly sure that any large amounts of CPU are typically spent in PIC code anyway. I'm pretty certain that the real-world system-wide impact of PIE is vanishingly low, because the actual executable load module spends so little time on the CPU compared with the libraries it uses.
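The argument is just a weighted average: libraries are already PIC, so going PIE only adds cost to the time spent in the executable itself. A minimal sketch follows; the function name is made up, and the time fraction you feed it is a guess until somebody profiles.

```c
/* Estimated system-wide slowdown from building executables as PIE.
 * exe_time_fraction: share of CPU time spent in the executable's own
 *                    code rather than in (already PIC) libraries.
 * pic_overhead:      relative cost of PIC code, e.g. 0.0099002 for
 *                    the x86 figure quoted above. */
double system_pie_overhead(double exe_time_fraction, double pic_overhead)
{
    return exe_time_fraction * pic_overhead;
}
```

For example, if a program spent 10% of its time in its own code (an unmeasured, illustrative figure), the x86 number above would work out to roughly 0.1% overall.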
Regards,
Hans