Discussion:
[Bug tools/23673] New: TEST ./tests/backtrace-dwarf fails on s390x in 0.174 release
mliska at suse dot cz
2018-09-17 10:59:47 UTC
https://sourceware.org/bugzilla/show_bug.cgi?id=23673

Bug ID: 23673
Summary: TEST ./tests/backtrace-dwarf fails on s390x in 0.174 release
Product: elfutils
Version: unspecified
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: tools
Assignee: unassigned at sourceware dot org
Reporter: mliska at suse dot cz
CC: elfutils-devel at sourceware dot org
Target Milestone: ---

The following test case fails:

$ ./tests/backtrace-dwarf
0x3ffbd840622 raise
0x3ffbd823ce2 abort
./tests/backtrace-dwarf: dwfl_thread_getframes: no error

Fortunately I have access to an s390x machine, so I can help with debugging.

The binary is built with GCC 8.1.1.
--
You are receiving this mail because:
You are on the CC list for the bug.
mark at klomp dot org
2018-09-17 11:41:52 UTC

--- Comment #1 from Mark Wielaard <mark at klomp dot org> ---
Note that we have an s390x fedora buildbot worker that also uses GCC 8.1.1:
https://builder.wildebeest.org/buildbot/#/workers/5
That one is green.

So I suspect it is either a different binutils or glibc (the above buildbot
worker has glibc 2.27 and binutils 2.29.1) or different build/CFLAGS/defaults.

mliska at suse dot cz
2018-09-17 11:45:34 UTC

--- Comment #2 from Martin Liska <mliska at suse dot cz> ---
$ ld --version
GNU ld (GNU Binutils; openSUSE:Factory:zSystems) 2.31

$ /lib64/libc.so.6
GNU C Library (GNU libc) stable release version 2.28 (git 3c03baca37fd).

mark at klomp dot org
2018-09-17 19:44:20 UTC

--- Comment #3 from Mark Wielaard <mark at klomp dot org> ---
It does seem to work correctly on Fedora 29 with gcc 8.2, binutils 2.31 and
glibc 2.28:

https://kojipkgs.fedoraproject.org//packages/elfutils/0.174/1.fc29/data/logs/s390x/build.log

PASS: run-backtrace-dwarf.sh

So it is probably some difference in default/build flags.

mliska at suse dot cz
2018-09-18 07:38:50 UTC

--- Comment #4 from Martin Liska <mliska at suse dot cz> ---
Created attachment 11257
--> https://sourceware.org/bugzilla/attachment.cgi?id=11257&action=edit
openSUSE build log

I'm attaching my build log. In general, I guess the following flags are used:

-std=gnu99 -Wall -Wshadow -Wformat=2 -Wold-style-definition -Wstrict-prototypes
-Wlogical-op -Wduplicated-cond -Wnull-dereference -Wimplicit-fallthrough=5
-Werror -Wunused -Wextra -Wstack-usage=262144 -fPIC -O2 -g -m64
-fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables
-fasynchronous-unwind-tables -g

mark at klomp dot org
2018-09-18 15:21:49 UTC

Mark Wielaard <mark at klomp dot org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |INVALID

--- Comment #5 from Mark Wielaard <mark at klomp dot org> ---
We reviewed this on irc and came to the surprising conclusion that this was
caused by ptrace TRACEME failing with EPERM. That is really odd. But not a bug
in elfutils IMHO.

mliska at suse dot cz
2018-09-19 03:14:00 UTC

Martin Liska <mliska at suse dot cz> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |UNCONFIRMED
Resolution|INVALID |---

--- Comment #6 from Martin Liska <mliska at suse dot cz> ---
I've just played with that and I made a mistake: a program can't use ptrace on
itself while it is being run under gdb. That is what causes the EPERM errno.
So the issue is still valid in my opinion.

mliska at suse dot cz
2018-09-19 03:19:14 UTC

Martin Liska <mliska at suse dot cz> changed:

What |Removed |Added
----------------------------------------------------------------------------
Summary|TEST ./tests/backtrace-dwarf fails on s390x in 0.174 release |TEST ./tests/backtrace-dwarf fails on s390x in at least 0.173

--- Comment #7 from Martin Liska <mliska at suse dot cz> ---
Note that it's not related to 0.174. I can see it in 0.173 as well, so as Mark
mentioned it depends on glibc, binutils, etc.

ldv at sourceware dot org
2018-09-19 03:59:47 UTC

--- Comment #8 from Dmitry V. Levin <ldv at sourceware dot org> ---
If a process is not being traced and PTRACE_TRACEME fails with EPERM, then it
must be a kernel issue.
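
For reference, the documented case in which PTRACE_TRACEME does return EPERM is
a process that is already being traced, which is what happens when the test is
started under gdb. A minimal sketch in C that reproduces this with a second
PTRACE_TRACEME in the same child. The helper name second_traceme_errno and the
exit code 200 are made up for this illustration and are not part of elfutils:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of elfutils.  Forks a child that calls
   PTRACE_TRACEME twice and reports what the second call did.
   Returns the errno of the second call (0 if it unexpectedly succeeded),
   or -1 if the experiment could not run at all (fork failed, or the
   first PTRACE_TRACEME was already refused, e.g. by a sandbox or
   because a debugger is attached).  */
int
second_traceme_errno (void)
{
  pid_t pid = fork ();
  if (pid == -1)
    return -1;

  if (pid == 0)
    {
      /* First request: normally succeeds, the parent becomes the tracer.  */
      if (ptrace (PTRACE_TRACEME, 0, NULL, NULL) != 0)
        _exit (200);

      /* Second request: the child is already traced at this point.  */
      errno = 0;
      if (ptrace (PTRACE_TRACEME, 0, NULL, NULL) == 0)
        _exit (0);
      _exit (errno);
    }

  /* Without extra ptrace options the child's exit is reported normally.  */
  int status;
  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;
  int code = WEXITSTATUS (status);
  return code == 200 ? -1 : code;
}
```

On a typical Linux system second_traceme_errno () is expected to return EPERM.
A return value of -1 only says that the experiment could not be run in that
environment.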

mliska at suse dot cz
2018-09-19 04:19:41 UTC

--- Comment #9 from Martin Liska <mliska at suse dot cz> ---
Hm, on x86_64 (on trunk) I see all tests OK, but:

$ ./backtrace-dwarf
backtrace-dwarf: backtrace-dwarf.c:146: main: Assertion `errno == 0' failed.
0x7ffff7a4f08b raise
0x7ffff7a384e9 abort
0x7ffff7a383c1 __assert_fail_base.cold.0
0x7ffff7a476f2 __assert_fail
0x40135a main

which should not happen. On my machine I see errno == 2.

I would expect the test will fail with:

diff --git a/tests/backtrace-dwarf.c b/tests/backtrace-dwarf.c
index e1eb4928..273d2b5e 100644
--- a/tests/backtrace-dwarf.c
+++ b/tests/backtrace-dwarf.c
@@ -143,8 +143,8 @@ main (int argc __attribute__ ((unused)), char **argv)
       abort ();
     case 0:;
       long l = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
-      assert (errno == 0);
-      assert (l == 0);
+      if (errno != 0 || l != 0)
+        return -1;
       cleanup_13_main ();
       abort ();
     default:

but it's still fine, while:
./backtrace-dwarf
backtrace-dwarf: backtrace-dwarf.c:159: main: Assertion `WIFSTOPPED (status)'
failed.
Aborted (core dumped)

That said, the test looks very fragile to me.
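
The stale errno == 2 (ENOENT on Linux) is consistent with how errno works: a
successful call is not required to reset it, so errno is only meaningful right
after a call has reported failure through its return value. A small sketch of
the difference, with hypothetical helper names and a path that is assumed not
to exist:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo.  Returns the errno value that is still visible
   after a successful call that follows an unrelated failed one.  */
int
errno_after_successful_call (void)
{
  /* A failing call leaves ENOENT behind (path assumed not to exist).  */
  int fd = open ("/nonexistent-dir-for-demo/file", O_RDONLY);
  if (fd != -1)
    close (fd);

  /* A successful call does not set errno back to 0.  */
  (void) getpid ();
  return errno;
}

/* The more robust pattern: check the return value first and consult
   errno only when the call reports failure.
   Returns 0 on success, otherwise the errno of the failed open.  */
int
open_error (const char *path)
{
  int fd = open (path, O_RDONLY);
  if (fd == -1)
    return errno;
  close (fd);
  return 0;
}
```

An assert (errno == 0) placed after a successful ptrace call can therefore
fire because of an earlier, unrelated failure, which is why checking the
return value is the safer test.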

ldv at sourceware dot org
2018-09-19 10:32:32 UTC

Dmitry V. Levin <ldv at sourceware dot org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |ldv at sourceware dot org

--- Comment #10 from Dmitry V. Levin <ldv at sourceware dot org> ---
I'd suggest the following change to enhance error diagnostics:

diff --git a/tests/backtrace-dwarf.c b/tests/backtrace-dwarf.c
index 35f25ed6..3a22db31 100644
--- a/tests/backtrace-dwarf.c
+++ b/tests/backtrace-dwarf.c
@@ -143,9 +143,8 @@ main (int argc __attribute__ ((unused)), char **argv)
     case -1:
       abort ();
     case 0:;
-      long l = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
-      assert (errno == 0);
-      assert (l == 0);
+      if (ptrace (PTRACE_TRACEME, 0, NULL, NULL))
+        _exit(errno ?: -1);
       cleanup_13_main ();
       abort ();
     default:
@@ -155,10 +154,12 @@ main (int argc __attribute__ ((unused)), char **argv)
   errno = 0;
   int status;
   pid_t got = waitpid (pid, &status, 0);
-  assert (errno == 0);
-  assert (got == pid);
-  assert (WIFSTOPPED (status));
-  assert (WSTOPSIG (status) == SIGABRT);
+  if (got != pid)
+    error (1, errno, "waitpid returned %d", got);
+  if (!WIFSTOPPED (status))
+    error (1, 0, "unexpected wait status %u", status);
+  if (WSTOPSIG (status) != SIGABRT)
+    error (1, 0, "unexpected signal %u", WSTOPSIG (status));
 
   Dwfl *dwfl = pid_to_dwfl (pid);
   dwfl_getthreads (dwfl, thread_callback, NULL);
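
One detail of _exit (errno ?: -1) worth keeping in mind: only the low 8 bits
of the value reach the parent, so -1 is observed as 255, while a small errno
such as EPERM (1 on Linux) is observed unchanged. A minimal sketch with a
hypothetical helper, observed_exit_status, that is not part of the test:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that calls _exit (code) and return
   the exit status the parent observes, or -1 on error.  */
int
observed_exit_status (int code)
{
  pid_t pid = fork ();
  if (pid == -1)
    return -1;
  if (pid == 0)
    _exit (code);

  int status;
  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;

  /* WEXITSTATUS yields only the low 8 bits of the child's exit code.  */
  return WEXITSTATUS (status);
}
```

The same mechanism carries the exit status 77 that the test harness reports
for skipped tests.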

mliska at suse dot cz
2018-09-19 10:50:10 UTC

--- Comment #11 from Martin Liska <mliska at suse dot cz> ---
With the suggested patch I see the following in test-suite.log on s390x:

[ 86s] + cat tests/test-suite.log
[ 86s] ==========================================
[ 86s] elfutils 0.174: tests/test-suite.log
[ 86s] ==========================================
[ 86s]
[ 86s] # TOTAL: 202
[ 86s] # PASS: 194
[ 86s] # SKIP: 7
[ 86s] # XFAIL: 0
[ 86s] # FAIL: 1
[ 86s] # XPASS: 0
[ 86s] # ERROR: 0
[ 86s]
[ 86s] .. contents:: :depth: 2
[ 86s]
[ 86s] SKIP: run-addr2line-i-demangle-test.sh
[ 86s] ======================================
[ 86s]
[ 86s] demangler unsupported
[ 86s] SKIP run-addr2line-i-demangle-test.sh (exit status: 77)
[ 86s]
[ 86s] SKIP: run-backtrace-data.sh
[ 86s] ===========================
[ 86s]
[ 86s] /home/abuild/rpmbuild/BUILD/elfutils-0.174/tests/backtrace-data: Unwinding not supported for this architecture
[ 86s] data: arch not supported
[ 86s] SKIP run-backtrace-data.sh (exit status: 77)
[ 86s]
[ 86s] FAIL: run-backtrace-dwarf.sh
[ 86s] ============================
[ 86s]
[ 86s] 0x3ffbda40622 raise
[ 86s] 0x3ffbda23ce2 abort
[ 86s] /home/abuild/rpmbuild/BUILD/elfutils-0.174/tests/backtrace-dwarf: dwfl_thread_getframes: no error
[ 86s] dwarf: no main
[ 86s] FAIL run-backtrace-dwarf.sh (exit status: 1)
[ 86s]
[ 86s] SKIP: run-backtrace-native-core.sh
[ 86s] ==================================
[ 86s]
[ 86s] No core.12202 file generated
[ 86s] SKIP run-backtrace-native-core.sh (exit status: 77)
[ 86s]
[ 86s] SKIP: run-backtrace-native-core-biarch.sh
[ 86s] =========================================
[ 86s]
[ 86s] No core.12218 file generated
[ 86s] SKIP run-backtrace-native-core-biarch.sh (exit status: 77)
[ 86s]
[ 86s] SKIP: run-backtrace-demangle.sh
[ 86s] ===============================
[ 86s]
[ 86s] demangler unsupported
[ 86s] SKIP run-backtrace-demangle.sh (exit status: 77)
[ 86s]
[ 86s] SKIP: run-stack-demangled-test.sh
[ 86s] =================================
[ 86s]
[ 86s] demangler unsupported
[ 86s] SKIP run-stack-demangled-test.sh (exit status: 77)
[ 86s]
[ 86s] SKIP: run-lfs-symbols.sh
[ 86s] ========================
[ 86s]
[ 86s] LFS testing is irrelevent on this system
[ 86s] SKIP run-lfs-symbols.sh (exit status: 77)
[ 86s]

ldv at sourceware dot org
2018-09-19 05:31:03 UTC

--- Comment #12 from Dmitry V. Levin <ldv at sourceware dot org> ---
(In reply to Martin Liska from comment #11)
[...]
Post by mliska at suse dot cz
[ 86s] FAIL: run-backtrace-dwarf.sh
[ 86s] ============================
[ 86s]
[ 86s] 0x3ffbda40622 raise
[ 86s] 0x3ffbda23ce2 abort
dwfl_thread_getframes: no error
[ 86s] dwarf: no main
[ 86s] FAIL run-backtrace-dwarf.sh (exit status: 1)
This doesn't look like a PTRACE_TRACEME failing with EPERM, abort() has
actually been invoked by the tracee.

mliska at suse dot cz
2018-09-19 11:09:15 UTC

--- Comment #13 from Martin Liska <mliska at suse dot cz> ---
(In reply to Dmitry V. Levin from comment #12)
Post by ldv at sourceware dot org
(In reply to Martin Liska from comment #11)
[...]
Post by mliska at suse dot cz
[ 86s] FAIL: run-backtrace-dwarf.sh
[ 86s] ============================
[ 86s]
[ 86s] 0x3ffbda40622 raise
[ 86s] 0x3ffbda23ce2 abort
dwfl_thread_getframes: no error
[ 86s] dwarf: no main
[ 86s] FAIL run-backtrace-dwarf.sh (exit status: 1)
This doesn't look like a PTRACE_TRACEME failing with EPERM, abort() has
actually been invoked by the tracee.
Agreed. The question is how to debug that. Any ideas?

mark at klomp dot org
2018-09-19 12:44:31 UTC

--- Comment #14 from Mark Wielaard <mark at klomp dot org> ---
The test case does use assert and abort too much. How about we extend Dmitry's
patch to get rid of them all? (The only abort that should be there is the one
in cleanup-13.c.)

diff --git a/tests/backtrace-dwarf.c b/tests/backtrace-dwarf.c
index 35f25ed..498416f 100644
--- a/tests/backtrace-dwarf.c
+++ b/tests/backtrace-dwarf.c
@@ -16,7 +16,6 @@
    along with this program. If not, see <http://www.gnu.org/licenses/>. */
 
 #include <config.h>
-#include <assert.h>
 #include <inttypes.h>
 #include <stdio_ext.h>
 #include <locale.h>
@@ -141,13 +140,18 @@ main (int argc __attribute__ ((unused)), char **argv)
   switch (pid)
   {
     case -1:
-      abort ();
+      perror ("fork failed");
+      exit (-1);
     case 0:;
       long l = ptrace (PTRACE_TRACEME, 0, NULL, NULL);
-      assert (errno == 0);
-      assert (l == 0);
+      if (l != 0)
+        {
+          perror ("PTRACE_TRACEME failed");
+          exit (-1);
+        }
       cleanup_13_main ();
-      abort ();
+      printf ("cleanup_13_main returned, impossible...\n");
+      exit (-1);
     default:
       break;
   }
@@ -155,16 +159,20 @@ main (int argc __attribute__ ((unused)), char **argv)
   errno = 0;
   int status;
   pid_t got = waitpid (pid, &status, 0);
-  assert (errno == 0);
-  assert (got == pid);
-  assert (WIFSTOPPED (status));
-  assert (WSTOPSIG (status) == SIGABRT);
+  if (got != pid)
+    error (1, errno, "waitpid returned %d", got);
+  if (!WIFSTOPPED (status))
+    error (1, 0, "unexpected wait status %u", status);
+  if (WSTOPSIG (status) != SIGABRT)
+    error (1, 0, "unexpected signal %u", WSTOPSIG (status));
 
   Dwfl *dwfl = pid_to_dwfl (pid);
-  dwfl_getthreads (dwfl, thread_callback, NULL);
+  if (dwfl_getthreads (dwfl, thread_callback, NULL) == -1)
+    error (1, 0, "dwfl_getthreads: %s", dwfl_errmsg (-1));
 
   /* There is an exit (0) call if we find the "main" frame, */
-  error (1, 0, "dwfl_getthreads: %s", dwfl_errmsg (-1));
+  printf ("dwfl_getthreads returned, main not found\n");
+  exit (-1);
 }
 
 #endif /* ! __linux__ */
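
The three status checks in the patch follow the usual order for decoding a
waitpid status. A rough standalone sketch of that decoding, with hypothetical
names (classify_status, sample_status). To stay independent of ptrace it stops
the child with SIGSTOP and waits with WUNTRACED, whereas the real test sees a
ptrace stop with SIGABRT:

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum child_state { CHILD_EXITED, CHILD_SIGNALED, CHILD_STOPPED, CHILD_OTHER };

/* Classify a waitpid status.  *detail receives the exit code or the
   signal number, depending on the state.  */
enum child_state
classify_status (int status, int *detail)
{
  if (WIFEXITED (status))
    {
      *detail = WEXITSTATUS (status);
      return CHILD_EXITED;
    }
  if (WIFSIGNALED (status))
    {
      *detail = WTERMSIG (status);
      return CHILD_SIGNALED;
    }
  if (WIFSTOPPED (status))
    {
      *detail = WSTOPSIG (status);
      return CHILD_STOPPED;
    }
  *detail = 0;
  return CHILD_OTHER;
}

/* Hypothetical helper: run a child that either exits with code 3
   (stop_itself == 0) or stops itself with SIGSTOP (stop_itself != 0).
   Returns the raw wait status, or -1 on error.  */
int
sample_status (int stop_itself)
{
  pid_t pid = fork ();
  if (pid == -1)
    return -1;
  if (pid == 0)
    {
      if (stop_itself)
        raise (SIGSTOP);
      _exit (3);
    }

  int status;
  pid_t got = waitpid (pid, &status, WUNTRACED);
  if (stop_itself)
    {
      /* Clean up the stopped child so it does not linger.  */
      kill (pid, SIGKILL);
      waitpid (pid, NULL, 0);
    }
  return got == pid ? status : -1;
}
```

Checking WIFSTOPPED before reading WSTOPSIG matters because WSTOPSIG is only
meaningful for a stopped child.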

mliska at suse dot cz
2018-09-21 08:16:19 UTC

--- Comment #15 from Martin Liska <mliska at suse dot cz> ---
Thanks Mark, I applied the patch but I still see the same output. For now I'm
leaving it, as I'm not that interested in s390x ;)

mark at klomp dot org
2018-09-21 09:06:33 UTC

--- Comment #16 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Martin Liska from comment #15)
Post by mliska at suse dot cz
Thanks Mark, I installed the patch but I see still the same.
The output was exactly the same? That is surprising. So there is no additional
output that explains which failure path was taken? I would have expected at
least a message about the dwfl_getthreads call.
Post by mliska at suse dot cz
For now, I'm
leaving that, I'm not so much interested in s390x ;)
Understood if it is too much work to track down. We have other s390x setups
that seem fine. But I still don't fully understand the issue.

mliska at suse dot cz
2018-09-21 09:20:52 UTC

--- Comment #17 from Martin Liska <mliska at suse dot cz> ---
(In reply to Mark Wielaard from comment #16)
Post by mark at klomp dot org
(In reply to Martin Liska from comment #15)
Post by mliska at suse dot cz
Thanks Mark, I installed the patch but I see still the same.
The output was exactly the same? That is surprising. So there is no
additional output that explains which failure path was taken? I would have
expected at least a message about the dwfl_getthreads call.
Yes:

$ ./backtrace-dwarf
0x3ff8a9c0622 raise
0x3ff8a9a3ce2 abort
./backtrace-dwarf: dwfl_thread_getframes: no error

It looks like the child correctly triggers the assert.
Post by mark at klomp dot org
Post by mliska at suse dot cz
For now, I'm
leaving that, I'm not so much interested in s390x ;)
Understood if it is too much work to track down. We have other s390x setups
that seems fine. But I still don't fully understand the issue.

mark at klomp dot org
2018-09-21 11:37:57 UTC

--- Comment #18 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Martin Liska from comment #17)
Post by mliska at suse dot cz
(In reply to Mark Wielaard from comment #16)
Post by mark at klomp dot org
(In reply to Martin Liska from comment #15)
Post by mliska at suse dot cz
Thanks Mark, I installed the patch but I see still the same.
The output was exactly the same? That is surprising. So there is no
additional output that explains which failure path was taken? I would have
expected at least a message about the dwfl_getthreads call.
$ ./backtrace-dwarf
0x3ff8a9c0622 raise
0x3ff8a9a3ce2 abort
./backtrace-dwarf: dwfl_thread_getframes: no error
Looks that child correctly triggers assert.
Aha, ok, yes, I missed that dwfl_getthreads just calls
dwfl_thread_getframes (there is only one thread) and this does indeed not find
the main frame. I'll tweak the testcase a bit more to make it show that.

But we now know for sure that it isn't the testframe infrastructure failing,
but that the unwinder really seems to not unwind through abort and so doesn't
find main. Still don't know what is happening though.

michael.hudson at canonical dot com
2018-10-16 00:13:32 UTC

Michael Hudson-Doyle <michael.hudson at canonical dot com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |michael.hudson at canonical dot com

--- Comment #19 from Michael Hudson-Doyle <michael.hudson at canonical dot com> ---
I see a similar looking failure on arm64 on Ubuntu 18.10:


https://launchpadlibrarian.net/391377304/buildlog_ubuntu-cosmic-arm64.elfutils_0.170-0.5_BUILDING.txt.gz

I've gdb-ed this to the point that the key difference between a working system
(Ubuntu 18.04) and the failing one is that libc.so.6 has a lot more entries in
.eh_frame_hdr in the failing system. On 18.04 it fails to find a fde for
abort() (or raise, I think) and unwinds using .debug_frame and that succeeds.
On 18.10 it finds a fde for both raise and abort but fails to successfully
unwind past abort using it. I don't know either why the newer libc.so.6 has a
bigger eh_frame_hdr (it is glibc 2.28 vs 2.27 but also built with newer gcc and
binutils) or why unwinding using eh_frame info fails.

mark at klomp dot org
2018-10-17 20:41:16 UTC

--- Comment #20 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Michael Hudson-Doyle from comment #19)
Post by michael.hudson at canonical dot com
https://launchpadlibrarian.net/391377304/buildlog_ubuntu-cosmic-arm64.elfutils_0.170-0.5_BUILDING.txt.gz
So, if possible could you build with current git or 0.174 + the patch from
comment #14 or commit 69d6e67eee30c483ba53a8e1da1b3568033e3dde?
Post by michael.hudson at canonical dot com
I've gdb-ed this to the point that the key difference between a working
system (Ubuntu 18.04) and the failing one is that libc.so.6 has a lot more
entries in .eh_frame_hdr in the failing system. On 18.04 it fails to find a
fde for abort() (or raise, I think) and unwinds using .debug_frame and that
succeeds. On 18.10 it finds a fde for both raise and abort but fails to
successfully unwind past abort using it. I don't know either why the newer
libc.so.6 has a bigger eh_frame_hdr (it is glibc 2.28 vs 2.27 but also built
with newer gcc and binutils) or why unwinding using eh_frame info fails.
In principle the .eh_frame and .debug_frame should provide the same CFI,
although encoded slightly differently. Maybe there is a difference? You should
be able to find both with eu-readelf --debug-dump=frame

michael.hudson at canonical dot com
2018-10-18 02:18:25 UTC

--- Comment #21 from Michael Hudson-Doyle <michael.hudson at canonical dot com> ---
(In reply to Mark Wielaard from comment #20)
Post by mark at klomp dot org
(In reply to Michael Hudson-Doyle from comment #19)
Post by michael.hudson at canonical dot com
https://launchpadlibrarian.net/391377304/buildlog_ubuntu-cosmic-arm64.elfutils_0.170-0.5_BUILDING.txt.gz
So, if possible could you build with current git or 0.174 + the patch from
comment #14 or commit 69d6e67eee30c483ba53a8e1da1b3568033e3dde?
Oh hmm current git passes! Sorry for the noise.

Oh and obviously f881459ffc95b6fad51aa055a158ee14814073aa fixes this (somehow I
failed to read the git log correctly and had to bisect to find it but there's
no real excuse for that).
Post by mark at klomp dot org
Post by michael.hudson at canonical dot com
I've gdb-ed this to the point that the key difference between a working
system (Ubuntu 18.04) and the failing one is that libc.so.6 has a lot more
entries in .eh_frame_hdr in the failing system. On 18.04 it fails to find a
fde for abort() (or raise, I think) and unwinds using .debug_frame and that
succeeds. On 18.10 it finds a fde for both raise and abort but fails to
successfully unwind past abort using it. I don't know either why the newer
libc.so.6 has a bigger eh_frame_hdr (it is glibc 2.28 vs 2.27 but also built
with newer gcc and binutils) or why unwinding using eh_frame info fails.
In principle the .eh_frame and .debug_frame should provide the same CFI,
although encoded slightly differently. Maybe there is a difference? You
should be able to find both with eu-readelf --debug-dump=frame
I wrote most of what follows while waiting for the test run above to complete
but for the record...

So something I forgot to mention is that the newer glibc has no .debug_frame
(not even in the /usr/lib/debug file that has the other debug data). So in a
sense the fact that elfutils is trying to unwind using eh_frame and not trying
the debug_frame data at all is actually not relevant here.

That said, here is the debug_frame CFI from libc in the working environment:

[ 3d28] FDE length=36 cie=[ 3d18]
CIE_pointer: 15640
initial_location: +0x0000000000033760 <abort>
address_range: 0x228

Program:
advance_loc 1 to 0x4
def_cfa_offset 320
offset r29 (x29) at cfa-320
offset r30 (x30) at cfa-312
advance_loc 2 to 0xc
def_cfa_register r29 (x29)
advance_loc 1 to 0x10
offset r19 (x19) at cfa-304
offset r20 (x20) at cfa-296

And here is the eh_frame CFI from the libc that fails:

[ 2b08] FDE length=28 cie=[ 0]
CIE_pointer: 11020
initial_location: +0x00000000000207d8 <abort> (offset: 0x207d8)
address_range: 0x214 (end offset: 0x209ec)

Program:
advance_loc 1 to 0x207dc
def_cfa_offset 320
offset r29 (x29) at cfa-320
offset r30 (x30) at cfa-312
advance_loc 4 to 0x207ec
offset r19 (x19) at cfa-304
offset r20 (x20) at cfa-296
nop
nop

I guess it's the lack of the def_cfa_register r29 in the eh_frame data that is
making the difference.
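
To make that comparison concrete, here is a rough sketch in C that replays
only the CFA-related instructions from the two dumps above and reports the CFA
rule in effect at a given offset inside abort. It is not how elfutils
evaluates CFI. The type and function names are hypothetical, the initial rule
CFA = sp (r31) + 0 is an assumption about the CIE (which is not shown above),
and only the instructions quoted in the dumps are modeled:

```c
#include <stddef.h>

enum cfi_kind { ADVANCE_TO, DEF_CFA_OFFSET, DEF_CFA_REGISTER, OTHER };

struct cfi_op { enum cfi_kind kind; unsigned long arg; };

struct cfa_rule { unsigned long reg; unsigned long offset; };

/* Replay OPS and return the CFA rule that applies at function-relative
   offset PC.  Instructions after an advance to a location beyond PC do
   not apply yet.  */
struct cfa_rule
cfa_rule_at (const struct cfi_op *ops, size_t n, unsigned long pc)
{
  /* Assumed initial rule from the CIE: CFA = r31 (sp) + 0.  */
  struct cfa_rule rule = { 31, 0 };
  for (size_t i = 0; i < n; i++)
    switch (ops[i].kind)
      {
      case ADVANCE_TO:
        if (ops[i].arg > pc)
          return rule;
        break;
      case DEF_CFA_OFFSET:
        rule.offset = ops[i].arg;
        break;
      case DEF_CFA_REGISTER:
        rule.reg = ops[i].arg;
        break;
      case OTHER:               /* Register save rules, nop: ignored here.  */
        break;
      }
  return rule;
}

/* The .debug_frame program for abort, as function-relative offsets.  */
const struct cfi_op debug_frame_abort[] = {
  { ADVANCE_TO, 0x4 }, { DEF_CFA_OFFSET, 320 }, { OTHER, 0 }, { OTHER, 0 },
  { ADVANCE_TO, 0xc }, { DEF_CFA_REGISTER, 29 },
  { ADVANCE_TO, 0x10 }, { OTHER, 0 }, { OTHER, 0 },
};

/* The .eh_frame program for abort (0x207dc and 0x207ec minus 0x207d8).  */
const struct cfi_op eh_frame_abort[] = {
  { ADVANCE_TO, 0x4 }, { DEF_CFA_OFFSET, 320 }, { OTHER, 0 }, { OTHER, 0 },
  { ADVANCE_TO, 0x14 }, { OTHER, 0 }, { OTHER, 0 }, { OTHER, 0 }, { OTHER, 0 },
};
```

Under these assumptions, deep inside abort the .debug_frame program gives
CFA = x29 + 320 while the .eh_frame program gives CFA = sp + 320. The two
agree only as long as sp has not moved since x29 was set up, which would fit
the guess above but does not prove it.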

mark at klomp dot org
2018-10-18 06:27:30 UTC

--- Comment #22 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Michael Hudson-Doyle from comment #21)
Post by michael.hudson at canonical dot com
(In reply to Mark Wielaard from comment #20)
Post by mark at klomp dot org
(In reply to Michael Hudson-Doyle from comment #19)
Post by michael.hudson at canonical dot com
https://launchpadlibrarian.net/391377304/buildlog_ubuntu-cosmic-arm64.elfutils_0.170-0.5_BUILDING.txt.gz
So, if possible could you build with current git or 0.174 + the patch from
comment #14 or commit 69d6e67eee30c483ba53a8e1da1b3568033e3dde?
Oh hmm current git passes! Sorry for the noise.
Oh and obviously f881459ffc95b6fad51aa055a158ee14814073aa fixes this
Cool. So this is different from the s390x issue.
Which we sadly don't yet understand.

But if that happens again on s390x an inspection of the CFI and whether it
comes from .eh_frame or .debug_frame might be helpful.

mliska at suse dot cz
2018-11-16 13:33:03 UTC

--- Comment #23 from Martin Liska <mliska at suse dot cz> ---
Just for the record, as of version 0.175 the test works fine on all targets I
can test (including s390x).

mark at klomp dot org
2018-11-16 08:35:25 UTC

Mark Wielaard <mark at klomp dot org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |WORKSFORME

--- Comment #24 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Martin Liska from comment #23)
Post by mliska at suse dot cz
Just for the record, as of version 0.175 the test works fine on all targets
I can test (including s390x).
Let's close this for now. It can be reopened if we have a new test failure.