IMPORTANT: GCC 6 generates position independent executables by default on Linux

Ronie Salgado
 
Hello,

I was debugging a strange crash when calling sqrt via a Lowcode instruction in the interpreter. I tracked it down to currentBytecode, which is stored in register EBX, having a very large value. When debugging the generated assembly code with GDB, I noticed that GCC was generating position independent code and using EBX for a call without spilling and restoring its value.

After some googling, it seems that position independent executable (PIE) generation was turned on by default in GCC 6 ( https://www.open-mesh.org/issues/304 ). To disable PIE, the sources have to be compiled with -fno-pie and linked with -no-pie.
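A quick way to check whether a toolchain compiles as PIE by default is to look at the macros GCC predefines: __PIE__ for -fpie/-fPIE and __PIC__ for any position independent code. The sketch below assumes GCC (or a compiler that defines the same macros); the function names are made up for illustration, and it only reflects compile-time flags, not whether the final link used -pie.

```c
/* Hypothetical helper: reports how this translation unit was compiled,
   based on the macros GCC predefines for -fpie/-fPIE and -fpic/-fPIC. */
const char *pie_mode(void)
{
#if defined(__PIE__)
    return "PIE";
#elif defined(__PIC__)
    return "PIC (not PIE)";
#else
    return "position-dependent";
#endif
}

/* Hypothetical helper: returns 1 when compiling i386 position independent
   code, where calls through the PLT expect the GOT address in EBX, so a
   variable pinned to EBX can be overwritten around such calls. */
int pic_calls_need_ebx(void)
{
#if defined(__i386__) && defined(__PIC__)
    return 1;
#else
    return 0;
#endif
}
```

Built with the default flags on Ubuntu 16.10, pie_mode() would be expected to return "PIE"; rebuilding with -fno-pie should make it return "position-dependent".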

Best regards,
Ronie

Re: IMPORTANT: GCC 6 generates position independent executables by default on Linux

Ronie Salgado
 
Correction: this is not caused by GCC but by Ubuntu 16.10. The same happens with GCC 5.



Re: IMPORTANT: GCC 6 generates position independent executables by default on Linux

Ben Coman
 
On Tue, Feb 21, 2017 at 1:23 PM, Ronie Salgado <[hidden email]> wrote:

>
> Correction: this is not because of GCC, but because of Ubuntu 16.10. The same happens with GCC 5
>
> 2017-02-21 0:35 GMT-03:00 Ronie Salgado <[hidden email]>:
>> By googling, it seems that position independent executable generation was turned on GCC 6 by default ( https://www.open-mesh.org/issues/304 ). To disable PIE, we have to compile the sources with -fno-pie and link with the -no-pie options.

Would that only be applicable to 32-bit?
To familiarise myself with these concepts I found this a good
explanation of Position Independent Code...
* http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries
which says it "... will explain only how PIC works on x86, picking
this older architecture specifically because (unlike x64) it wasn't
designed with PIC in mind, so implementing PIC on it is a bit trickier
... Some non-Intel architectures like SPARC64 force PIC-only code for
shared libraries, and many others (for example, ARM) include
IP-relative addressing modes to make PIC more efficient. Both are true
for the successor of x86, the x64 architecture."

and the sister article on Load Time Relocation...
* http://eli.thegreenplace.net/2011/08/25/load-time-relocation-of-shared-libraries/
says "... some modern systems (such as x86-64) no longer support
load-time relocation."

and the bottom of this article describes why building non-PIC shared libraries on 64-bit requires -mcmodel=large.
* http://eli.thegreenplace.net/2011/11/11/position-independent-code-pic-in-shared-libraries-on-x64

cheers -ben

Re: IMPORTANT: GCC 6 generates position independent executables by default on Linux

Holger Freyther
In reply to this post by Ronie Salgado
 

> On 21 Feb 2017, at 10:35, Ronie Salgado <[hidden email]> wrote:
>
> Hello,
>
> I was debugging a strange crash when calling sqrt via a Lowcode instruction in the interpreter, which I tracked to currentBytecode stored in register(EBX), having a very large value. When debugging the generated assembly code with GDB, I noticed that GCC was generating position independent code and using EBX for doing a call without spilling/unspilling its value.

Can you elaborate on the misbehavior? According to the ABI[1], EBX is a local register; how could GCC know that EBX has been used for something else?

holger

[1] http://www.sco.com/developers/devspecs/abi386-4.pdf

Re: IMPORTANT: GCC 6 generates position independent executables by default on Linux

Ronie Salgado
 
Hello,

> Would that only be applicable to 32-bit?

Perhaps. I have not tested 64-bit Linux without forcing -fno-pie and -no-pie in my build.

The default option on Ubuntu 16.10 is not -fPIC but -fpie, which allows Address Space Layout Randomization (a technique that mitigates exploits based on buffer overflows and return-oriented programming) to be applied to the code of the executable itself. Unlike shared libraries, executables are usually not compiled as position independent code, even on x86_64. On x86_64, the difference between PIC and non-PIC code is not only the use of RIP-relative addressing, but also the use of the GOT and PLT tables for calls. Relocation of position-dependent executable code happens at link time, long before load time.

> Can you elaborate on the misbehavior? As of the ABI[1] EBX is a local register, can GCC know that EBX has been used for something else?

spursrc/vm/gcc3x-cointerp.c:

sqInt
interpret(void)
{   DECL_MAYBE_SQ_GLOBAL_STRUCT
    register sqInt currentBytecode CB_REG;
...

platforms/unix/vm/sqGnu.h:

#elif defined(__i386__)
# define IP_REG __asm__("%esi")
# define SP_REG __asm__("%edi")
# if (__GNUC__ > 2) || ((__GNUC__ == 2) && (__GNUC_MINOR__ >= 95))
#   define CB_REG __asm__("%ebx")
# else
#   define CB_REG /* avoid undue register pressure */
# endif
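One possible source-level guard, sketched here as a hypothetical variant of the i386 branch above rather than the VM's actual fix (the fix discussed in this thread is building with -fno-pie and -no-pie), is to pin currentBytecode to EBX only when the code is not position independent. Also note that GCC documents local register variables as only guaranteed when used as operands of extended asm, so the compiler may reuse the register elsewhere. CB_REG_IN_EBX and next_bytecode are names invented for this sketch.

```c
/* Hypothetical guard: pin currentBytecode to EBX only for non-PIC i386
   builds, because PLT calls in i386 PIC code expect the GOT address in EBX. */
#if defined(__i386__) && !defined(__PIC__) && \
    ((__GNUC__ > 2) || ((__GNUC__ == 2) && (__GNUC_MINOR__ >= 95)))
# define CB_REG __asm__("%ebx")
# define CB_REG_IN_EBX 1
#else
# define CB_REG /* PIC or not i386: let the compiler choose a register */
# define CB_REG_IN_EBX 0
#endif

/* Hypothetical stand-in for interpret(): the declaration compiles whether
   or not CB_REG expands to a register pin. */
long next_bytecode(long bytecode)
{
    register long currentBytecode CB_REG;
    currentBytecode = bytecode + 1;
    return currentBytecode;
}
```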


GDB session:

(gdb) list interpret
2604    /*    If stacklimit is zero then the stack pages have not been initialized. */
2605   
2606        /* StackInterpreter>>#interpret */
2607    sqInt
2608    interpret(void)
2609    {   DECL_MAYBE_SQ_GLOBAL_STRUCT
2610        register sqInt currentBytecode CB_REG;
2611        sqInt extA;
2612        sqInt extB;
2613        sqInt lkupClassTag;
Source lines 25313-25315 of gcc3x-cointerp.c:

25313 result14 = sqrt(value17);
25314 /* begin internalPushFloat32: */
25315 nativeSP = (nativeStackPointerIn(localFP)) - BytesPerOop;

(gdb) break 25313
Breakpoint 1 at 0x55c77: file /home/ronie/projects/osvm-lowcode-clean/spurlowcodesrc/vm/gcc3x-cointerp.c, line 25313.
(gdb) break 25314
Breakpoint 2 at 0x55c9b: file /home/ronie/projects/osvm-lowcode-clean/spurlowcodesrc/vm/gcc3x-cointerp.c, line 25314.
(gdb) break 25315
Note: breakpoint 2 also set at pc 0x55c9b.
Breakpoint 3 at 0x55c9b: file /home/ronie/projects/osvm-lowcode-clean/spurlowcodesrc/vm/gcc3x-cointerp.c, line 25315.
(gdb) run
Starting program: /home/ronie/projects/osvm-lowcode-clean/products/debug/phcoglowcodelinuxht/lib/pharo/5.0-201702210706-LowcodeFixup/pharo
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0xf728db40 (LWP 4716)]

Thread 1 "pharo" hit Breakpoint 1, interpret () at /home/ronie/projects/osvm-lowcode-clean/spurlowcodesrc/vm/gcc3x-cointerp.c:25313
25313                                result14 = sqrt(value17);
(gdb) print currentBytecode
$1 = 504
(gdb) info registers
eax            0xf778f009    -143069175
ecx            0x5678ec88    1450765448
edx            0xf778f009    -143069175
ebx            0x1f8    504
esp            0xfffc2540    0xfffc2540
ebp            0xfffc6558    0xfffc6558
esi            0x594e702d    1498312749
edi            0xfffcb390    -216176
eip            0x565aac77    0x565aac77 <interpret+214836>
eflags         0x282    [ SF IF ]
cs             0x23    35
ss             0x2b    43
ds             0x2b    43
es             0x2b    43
fs             0x0    0
gs             0x63    99
(gdb) continue
Continuing.

Thread 1 "pharo" hit Breakpoint 2, interpret () at /home/ronie/projects/osvm-lowcode-clean/spurlowcodesrc/vm/gcc3x-cointerp.c:25315
25315                                nativeSP = (nativeStackPointerIn(localFP)) - BytesPerOop;
(gdb) info registers
eax            0xf778f009    -143069175
ecx            0x5678ec88    1450765448
edx            0x127f    4735
ebx            0x5678ec88    1450765448
esp            0xfffc2540    0xfffc2540
ebp            0xfffc6558    0xfffc6558
esi            0x594e702d    1498312749
edi            0xfffcb390    -216176
eip            0x565aac9b    0x565aac9b <interpret+214872>
eflags         0x282    [ SF IF ]
cs             0x23    35
ss             0x2b    43
ds             0x2b    43
es             0x2b    43
fs             0x0    0
gs             0x63    99
(gdb) print currentBytecode
$2 = 1450765448
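The session also shows that the VM binary is a PIE: GDB set breakpoint 1 at 0x55c77 before the program ran, but it was hit at 0x565aac77, so the executable was relocated by 0x565aac77 - 0x55c77 = 0x56555000 (breakpoint 2 gives the same offset: 0x565aac9b - 0x55c9b). GDB disables address randomization by default, which is why the offset is predictable under the debugger. The same load offset can be read from inside a process with glibc's dl_iterate_phdr(), whose first reported object is the main program. This is a sketch assuming glibc on Linux; executable_load_bias is a made-up helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* dl_iterate_phdr() is a GNU extension */
#endif
#include <link.h>
#include <stddef.h>

/* dl_iterate_phdr() callback: the first object reported is the main
   executable, so store its load bias and stop the iteration. */
static int first_object_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    *(ElfW(Addr) *)data = info->dlpi_addr;
    return 1; /* a non-zero return value stops the iteration */
}

/* Returns the executable's load bias: 0 for a position-dependent executable
   loaded at its link-time address, non-zero for a relocated PIE. */
unsigned long executable_load_bias(void)
{
    ElfW(Addr) bias = 0;
    dl_iterate_phdr(first_object_cb, &bias);
    return (unsigned long)bias;
}
```

An executable linked with -no-pie should report 0, while a PIE reports a non-zero bias that changes between runs when ASLR is enabled.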

GDB's "layout asm" view at line 25313 shows the generated code:

B+> 0x565aac77 <interpret+214836>   flds   -0x1d6c(%ebp)
    0x565aac7d <interpret+214842>   sub    $0x8,%esp
    0x565aac80 <interpret+214845>   lea    -0x8(%esp),%esp
    0x565aac84 <interpret+214849>   fstpl  (%esp)
    0x565aac87 <interpret+214852>   mov    -0x4008(%ebp),%ebx
    0x565aac8d <interpret+214858>   call   0x56570840


Of special importance is the instruction mov -0x4008(%ebp),%ebx. It loads EBX with the GOT address expected by the PLT entry for sqrt (the call on the next line), and this is where the value of currentBytecode held in EBX is destroyed.

Best regards,
Ronie

2017-02-21 22:35 GMT-03:00 Holger Freyther <[hidden email]>:


> On 21 Feb 2017, at 10:35, Ronie Salgado <[hidden email]> wrote:
>
> Hello,
>
> I was debugging a strange crash when calling sqrt via a Lowcode instruction in the interpreter, which I tracked to currentBytecode stored in register(EBX), having a very large value. When debugging the generated assembly code with GDB, I noticed that GCC was generating position independent code and using EBX for doing a call without spilling/unspilling its value.

Can you elaborate on the misbehavior? As of the ABI[1] EBX is a local register, can GCC know that EBX has been used for something else?

holger

[1] http://www.sco.com/developers/devspecs/abi386-4.pdf


Re: IMPORTANT: GCC 6 generates position independent executables by default on Linux

Holger Freyther
 

> On 22 Feb 2017, at 14:58, Ronie Salgado <[hidden email]> wrote:


Dear Ronie,


> GDB layout asm on the line 25313 shows the generated code.
>
> B+> 0x565aac77 <interpret+214836>   flds   -0x1d6c(%ebp)
>     0x565aac7d <interpret+214842>   sub    $0x8,%esp
>     0x565aac80 <interpret+214845>   lea    -0x8(%esp),%esp
>     0x565aac84 <interpret+214849>   fstpl  (%esp)
>     0x565aac87 <interpret+214852>   mov    -0x4008(%ebp),%ebx
>     0x565aac8d <interpret+214858>   call   0x56570840
>
>
> Of special importance, is the instruction: mov    -0x4008(%ebp),%ebx . this is the PLT entry for sqrt, and this is where ebx with the currentBytecode is destroyed.

I tried to reproduce it, but I think I don't generate enough register pressure.

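#include <math.h>
#include <stddef.h>

/* Attempted reproducer: keep op in EBX and call sqrt() inside the loop.
   i386 only; compile with e.g. gcc -m32 -O2 -fpie -S and inspect the asm. */
int interpret(int *ops, const size_t num_ops)
{
        register int op __asm__("%ebx");
        size_t off = 0;
        volatile double sink = 0.0; /* keeps the sqrt() call from being optimized away */

        while (off < num_ops) {
                op = ops[off];
                switch (op) {
                case 1:
                case 2:
                        sink = sqrt(op + num_ops);
                        break;
                default:
                        break;
                }
                off += 1;
        }
        return (int)sink; /* the original version fell off the end of a non-void function */
}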


Can you think of a way to get closer to the interpreter? Is it using computed goto? If there is a reproducer, I am happy to open a bug with the GCC project and try to bring it to a resolution.

holger