On 04/01/2013 04:58 AM, Adam Jackson wrote:
> On Fri, 2013-03-29 at 10:48 -0700, John Reiser wrote:
>> -fPIE code is larger and takes longer to execute. The cost varies from minimal (< 2%) in many cases to 10% or more for "non-dynamic" arrays on i686.
>
> Citation needed.
ftp://ftp.inf.ethz.ch/doc/tech-reports/7xx/766.pdf which is cited by the FESCO ticket https://fedorahosted.org/fesco/ticket/1104#comment:11
It's also easy to see the mechanism:

$ cat foo.c
extern int a[];

void foo(int j) { a[j]=j; }
$ gcc -m32 -fPIE -O -S foo.c
$ cat foo.s   # edited for brevity
foo:    # 25 bytes; about 15 cycles (incl. 3*3 cycles data cache fetch latency)
        call    __x86.get_pc_thunk.cx
        addl    $_GLOBAL_OFFSET_TABLE_, %ecx
        movl    4(%esp), %eax
        movl    a@GOT(%ecx), %edx
        movl    %eax, (%edx,%eax,4)
        ret
$ gcc -m32 -O -S foo.c
$ cat foo.s   # edited for brevity
foo:    # 12 bytes; about 6 cycles (incl. 1*3 cycles data cache fetch latency)
        movl    4(%esp), %eax
        movl    %eax, a(,%eax,4)
        ret
$
-fPIE forces an additional level of run-time indirection, which often costs around 13 bytes (CALL + ADD + fetch from GOT, minus the 32-bit displacement d32 that the final store no longer encodes: 5 + 6 + 6 - 4 = 13 in the listing above) and 2 to 5 cycles (fetch @GOT and cache latency). Some of the cost might be shared with other nearby uses, but scarcity of registers often inhibits sharing or requires spill code.
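The extra indirection can also be modelled roughly at the C level. This is only a sketch: `got_slot_a` is a hypothetical stand-in for the GOT entry that the linker and loader really manage, the function names are made up for illustration, and `a` is defined locally (rather than `extern`) so the file compiles on its own.

```c
#include <assert.h>

#define N 8

/* Defined here so the sketch is self-contained; the original foo.c
 * declared it as "extern int a[];". */
int a[N];

/* Hypothetical stand-in for the GOT slot of "a". The volatile qualifier
 * keeps the compiler from folding the pointer back into a constant, so
 * the load stays visible in the generated code. */
static int *volatile got_slot_a = a;

/* Non-PIE flavor: the address of "a" is a link-time constant. */
void foo_direct(int j)
{
    a[j] = j;
}

/* PIE flavor (modelled): first fetch the address of "a" from a table,
 * then index through it. This is the additional memory access. */
void foo_indirect(int j)
{
    int *base = got_slot_a;   /* extra load, like movl a@GOT(%ecx), %edx */
    base[j] = j;
}

/* Both variants must store the same values; only the cost differs. */
void check_variants_agree(void)
{
    for (int j = 0; j < N; j++) {
        foo_direct(j);
        assert(a[j] == j);
        a[j] = -1;            /* reset so the second store is observable */
        foo_indirect(j);
        assert(a[j] == j);
    }
}
```

Compiling with something like `gcc -O -S` and comparing the two functions should show one more load in `foo_indirect`. It does not reproduce the CALL + ADD that real -fPIE code on i686 needs to find the GOT in the first place, so it understates the true cost.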
>> -fPIE for Thumb mode on ARM is particularly painful.
>
> Citation needed.
The same code above applies. Thumb mode has no double indexing, so an explicit ADD is required. Registers are still in short supply; HI registers (>=8) have dedicated usage or restricted access. Also, the range of the offset in base_register+offset addressing mode is severely restricted, which often requires more explicit ADDs.
--