Hello All, Not sure if it's only me, but every time I try to build Erlang on F-30 or Rawhide it takes forever to complete. It all started relatively recently (maybe a month or two ago). See the following logs for examples.
* https://koji.fedoraproject.org/koji/taskinfo?taskID=36131019
* https://koji.fedoraproject.org/koji/taskinfo?taskID=36131065 (s390x task, two days and still work-in-progress)
Does anyone experience the same issue?
It's likely big-endian emulation running on little-endian machines that is killing performance. I also have some time-sensitive package tests failing on s390x.
On Thu, 11 Jul 2019 10:11:40 +0000 (UTC) Philip Kovacs via devel devel@lists.fedoraproject.org wrote:
It's likely the big endian emulation running on little endian machines which is killing performance. I also have some time sensitive package tests failing on s390x.
There is no emulation in place; all builds are native. So it could be a weirdness in the VM, or the hosts for the builders being overloaded.
I'm running a local rebuild in rawhide now, so will see how fast this goes.
Dan
With best regards, Peter Lemenkov.
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
On Thu, 11 Jul 2019 12:33:58 +0200 Dan Horák dan@danny.cz wrote:
I'm running a local rebuild in rawhide now, so will see how fast this goes.
And it took ~1 hour, so the builder VM likely deserves a reboot.
Dan
On Thu, 11 Jul 2019 at 06:47, Philip Kovacs via devel < devel@lists.fedoraproject.org> wrote:
It's likely the big endian emulation running on little endian machines which is killing performance. I also have some time sensitive package tests failing on s390x.
I am a bit confused by what you are saying; could you restate? Compilation on s390x happens on native hardware. Or do you mean that the Erlang VM is little-endian and the big-endian s390x has to emulate it?
We have been having problems with the s390x systems lately due to multiple reasons outside of infrastructure's control. Kevin Fenzi has been working on trying to get a replacement set of builders in place over the last week to try and deal with this. I do not know if your build got stuck on a builder which was transitioning or something similar.
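For reference (nothing builder-specific): byte order only affects a host's in-memory representation, and code that decodes with an explicit byte order behaves identically everywhere, which is why native big-endian hardware needs no emulation layer. A quick generic check:

```python
import struct
import sys

# Host byte order: 'big' on s390x, 'little' on x86_64 and aarch64.
host_order = sys.byteorder

# Decoding with an explicit big-endian format (">I") gives the same
# result on every host, big- or little-endian alike.
value = struct.unpack(">I", b"\x00\x00\x00\x2a")[0]

print(host_order, value)
```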
On 7/11/19 4:13 AM, Stephen John Smoogen wrote:
We have been having problems with the s390x systems lately due to multiple reasons outside of infrastructure's control. Kevin Fenzi has been working on trying to get a replacement set of builders in place over the last week to try and deal with this. I do not know if your build got stuck on a builder which was transitioning or something similar.
So, yeah, part of this is my fault. :(
If you look at https://koji.fedoraproject.org/koji/taskinfo?taskID=36131065 you will note that there are 4 buildroots listed. That means the build was restarted 4 times. Likely this happened when I was installing/updating/shuffling builders around there. Sorry about that.
That said, it still should have finished long ago.
The current state of s390x builders:
buildvm-s390x-01 to 14 are all the existing Z/VM instances we have had for a long while. Likely we are going to keep them for a while until we are sure the kvm instances are reliable and happy.
buildvm-s390x-15 to 24 are new kvm instances. They have a higher 'weight' than the others and from my testing are much faster.
Your build seems to be stuck, perhaps in tests?
I've freed it (again) and it's now running on buildvm-s390x-22 (a kvm instance). Let's see how it does there. In the meantime I will update/reboot the Z/VM instances.
kevin
Hello All! Builds have started getting stuck again. Right now I'm experiencing this issue with RabbitMQ for F-30 and F-31:
* https://koji.fedoraproject.org/koji/taskinfo?taskID=36457376
* https://koji.fedoraproject.org/koji/taskinfo?taskID=36457345
On 7/25/19 3:04 AM, Peter Lemenkov wrote:
Hello All! It started to get stuck again. Right now I'm experiencing this issue with RabbitMQ for F-30 and F-31:
So, yeah.
|-kojid,31279 /usr/sbin/kojid --fg --force-lock --verbose
  `-mock,31584 -tt /usr/libexec/mock/mock -r koji/f30-build-16961487-1222718 --old-chroot --no-clean --target s390x ...
    `-rpmbuild,32205 -bb --target s390x --nodeps /builddir/build/SPECS/rabbitmq-server.spec
      `-sh,32237 -e /var/tmp/rpm-tmp.GwzEQt
        `-make,32238 -j4 VERSION=3.7.16 V=1
          `-sh,32318 -c...
            `-make,2112 -C /builddir/build/BUILD/rabbitmq-server-3.7.16/deps/amqp10_client IS_DEP=1
              `-make,2237 --no-print-directory app-build
                `-beam.smp,2302 -sbtu -A0 -- -root /usr/lib64/erlang -progname erl -- -home /builddir -- ...
                  |-{beam.smp},2303
                  |-{beam.smp},2304
                  |-erl_child_setup,2305 1024
                  |-{beam.smp},2306
                  |-{beam.smp},2307
                  |-{beam.smp},2308
                  |-{beam.smp},2309
                  |-{beam.smp},2310
                  |-{beam.smp},2311
                  |-{beam.smp},2312
                  |-{beam.smp},2313
                  |-{beam.smp},2314
                  |-{beam.smp},2315
                  |-{beam.smp},2316
                  |-{beam.smp},2317
                  |-{beam.smp},2318
                  |-{beam.smp},2319
                  |-{beam.smp},2320
                  |-{beam.smp},2321
                  |-{beam.smp},2322
                  |-{beam.smp},2323
                  |-{beam.smp},2324
                  `-{beam.smp},2325
When I strace process 2302:
strace: Process 2302 attached with 23 threads
[pid 2324] ppoll([{fd=12, events=POLLIN|POLLRDNORM}], 1, NULL, NULL, 8 <unfinished ...>
[pid 2320] futex(0x3ff58800550, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2321] futex(0x3ff58800590, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2319] futex(0x3ff58800510, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2318] futex(0x3ff588004d0, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2317] futex(0x3ff58800490, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2316] futex(0x3ff58800450, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2315] futex(0x3ff58800410, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2313] futex(0x3ff58800390, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2312] futex(0x3ff58800350, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2308] restart_syscall(<... resuming interrupted syscall_0xfffffffffffffdfc ...> <unfinished ...>
[pid 2303] read(14, <unfinished ...>
[pid 2302] select(0, NULL, NULL, NULL, NULL <unfinished ...>
[pid 2309] restart_syscall(<... resuming interrupted select ...> <unfinished ...>
[pid 2323] futex(0x3ff58800610, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2322] futex(0x3ff588005d0, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2314] futex(0x3ff588003d0, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2311] futex(0x3ff58800310, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2310] futex(0x3ff588002d0, FUTEX_WAIT_PRIVATE, 4294967295, NULL <unfinished ...>
[pid 2306] restart_syscall(<... resuming interrupted syscall_0xfffffffffffffdfc ...> <unfinished ...>
[pid 2304] futex(0x2aa3d9af520, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 2307] timerfd_settime(11, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=0}}, NULL) = 0
[pid 2325] epoll_wait(4, <unfinished ...>
[pid 2307] futex(0x3ff588001d0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 2306] <... restart_syscall resumed>) = 0
[pid 2307] fcntl(2, F_GETFL <unfinished ...>
[pid 2306] timerfd_settime(11, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=23, tv_nsec=941692107}}, <unfinished ...>
And then... it starts going again. So something was stuck, and strace unstuck it?
So, it looks like some odd signal thing on s390x?
Not sure, but perhaps we should file a bug and try to track it down?
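For what it's worth, restart_syscall is what strace shows when signal delivery interrupts a blocking syscall and the kernel restarts it, which would fit the "strace unstuck it" observation: attaching delivers signals to the stopped threads. A minimal, generic sketch of that interrupt-and-restart mechanism (nothing s390x- or Erlang-specific; whether it explains the wedged builds is exactly what a bug report would need to establish):

```python
import os
import select
import signal
import time

fired = []
signal.signal(signal.SIGALRM, lambda signum, frame: fired.append(signum))

# A pipe nobody writes to, so select() blocks until its timeout.
r, w = os.pipe()

# Deliver SIGALRM 0.2s into a 1s select(): the blocked syscall is
# interrupted, the handler runs, and the call is transparently
# restarted with a recomputed timeout (PEP 475 behavior in Python,
# analogous to the restart_syscall lines in a kernel-level trace).
signal.setitimer(signal.ITIMER_REAL, 0.2)

start = time.monotonic()
ready = select.select([r], [], [], 1.0)
elapsed = time.monotonic() - start

print(fired, ready, round(elapsed, 1))
```

The handler fires mid-wait, yet the select() still runs for the full second, showing the restart rather than an early return.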
kevin
Hello All,
I've just been hit by this again - lockups on s390x. Looks like I have a 100% reproducer: just try to build Erlang and it will get stuck eventually.
* https://koji.fedoraproject.org/koji/taskinfo?taskID=37327589
Where should I open a ticket? Bugzilla.redhat.com or somewhere else?
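As an aside, while this gets sorted out, hangs like this can be spotted mechanically by watching whether a build's log keeps growing. This is a generic sketch with a hypothetical is_stalled helper, not an existing koji or mock feature:

```python
import os
import tempfile
import time

def is_stalled(path, timeout, now=None):
    """Return True if `path` has not been modified for more than `timeout` seconds."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > timeout

# Demo with a throwaway "build log".
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("building...\n")
    log = f.name

fresh = is_stalled(log, timeout=60)                           # just written: not stalled
stale = is_stalled(log, timeout=60, now=time.time() + 3600)   # as if 1h passed: stalled

print(fresh, stale)
os.unlink(log)
```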
On 8/28/19 2:13 PM, Peter Lemenkov wrote:
Where should I open a ticket? Bugzilla.redhat.com or somewhere else?
I'm really not fully sure. ;(
Perhaps file a kernel bug and get the s390x folks involved?
kevin
On Fri, 30 Aug 2019 11:19:37 -0700 Kevin Fenzi kevin@scrye.com wrote:
Perhaps a kernel bug and get the s390x folks involved?
I've already replied in https://pagure.io/releng/issue/8711 - if F-30 builds are OK and the builders are the same, then it should be a change in F-31+, perhaps glibc. I'm going to give it a try locally.
Dan
On Wed, 2019-08-28 at 23:13 +0200, Peter Lemenkov wrote:
I've just got hit by this again - lockups on s390. Looks like I have 100% reproducer (just try to build Erlang and it will stuck eventually).
Maybe you should cancel this build and start a new one; this build [1] seems stuck and stopped. Sometimes that can happen.
[1] https://koji.fedoraproject.org/koji/watchlogs?taskID=37327589