I was hitting a hang when doing ostree installs, which the ever-awesome Brian Lane tracked down to a generic udev-related hang that affects any kickstart installation.
The problem is udevadm settle vs network namespaces. A variety of systemd services now use PrivateNetwork= which ends up creating a new network namespace, and that "eats" udev events from the perspective of the system udev.
This means that udev will hang waiting for a sequence number it won't see.
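Until the lost events are fixed, the wait can at least be given an explicit bound. A minimal sketch, assuming `udevadm settle --timeout=` is available on the installer image; the function name and the injectable `runner` parameter are hypothetical and not blivet's actual API. This only bounds the wait; it does not recover the missing events.

```python
import subprocess


def settle_with_timeout(timeout=30, runner=subprocess.call):
    """Run `udevadm settle`, giving up after `timeout` seconds.

    Returns True if udevadm exited with status 0 (queue drained),
    False on a timeout or any other non-zero exit status.
    `runner` is a hypothetical hook so the call can be replaced in tests.
    """
    # --timeout=SECONDS is a documented option of `udevadm settle`.
    cmd = ["udevadm", "settle", "--timeout=%d" % timeout]
    return runner(cmd) == 0
```

A caller would then log or retry on a False return rather than blocking the installation.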
I talked to Kay about this and he bumped it up on his priority list; he says it's obviously a systemd bug since they're both in the same codebase now. In the meantime we may need to figure out a workaround.
Options:
1) Postprocess all systemd units in lorax and remove PrivateNetwork= (I think this might work)
2) Replace "udev_settle()" calls in blivet with "sleep 1m" or something
3) Investigate a better API than udev settle; it seems to me we could more intelligently wait for just storage for example.
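Option 1 could be sketched roughly as follows. This is only an illustration of the postprocessing idea, not lorax code: the function name is hypothetical, and it works on the text of a single unit file rather than walking the image tree.

```python
def strip_private_network(unit_text):
    """Return unit file text with PrivateNetwork= assignments removed.

    Commented-out lines and every other setting are left untouched.
    """
    kept = []
    for line in unit_text.splitlines(keepends=True):
        # Compare only the key, so "PrivateNetwork = yes" matches too.
        key = line.split("=", 1)[0].strip()
        if "=" in line and key == "PrivateNetwork":
            continue
        kept.append(line)
    return "".join(kept)
```

Applying this to every unit under the image root would be the remaining (assumed) step.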
Any other ideas?
But yeah, just a generic heads up that kickstart installation with rawhide seems to trigger this reliably.
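For option 3, one very rough approximation of "wait for just storage" is to wait for the specific device node the caller needs instead of for the whole udev queue. The sketch below is an assumption-laden illustration, not an existing blivet or udev API: the function name and parameters are hypothetical, and a node existing does not guarantee udev has finished processing that device.

```python
import os
import time


def wait_for_device(path, timeout=30.0, interval=0.1):
    """Poll until `path` exists or `timeout` seconds have passed.

    Returns True if the path appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, a caller that has just created a partition could wait for that partition's node only, and fall back to a full settle if this returns False.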
On Fri, 2014-04-11 at 19:09 +0000, Colin Walters wrote:
Options:
1) Postprocess all systemd units in lorax and remove PrivateNetwork= (I think this might work)
2) Replace "udev_settle()" calls in blivet with "sleep 1m" or something
3) Investigate a better API than udev settle; it seems to me we could more intelligently wait for just storage for example.
4) Investigate a better implementation of what udev does. That would mean quite a lot of work, but it would prevent many issues all over the system.
Other than that, I believe that we shouldn't do any workarounds for something that a new systemd+udev version broke. If any package seriously breaks the system, it should be reverted or fixed ASAP. It's not other packages' (and their maintainers') responsibility to work around those issues.
On Mon, 2014-04-14 at 09:58 +0200, Vratislav Podzimek wrote:
Other than that, I believe that we shouldn't do any workarounds for something that a new systemd+udev version broke. If any package seriously breaks the system, it should be reverted or fixed ASAP. It's not other packages' (and their maintainers') responsibility to work around those issues.
Exactly, such a severe regression should be fixed at the source ASAP, not hacked around in the rest of the system.
On Mon, Apr 14, 2014 at 12:58 AM, Vratislav Podzimek vpodzime@redhat.com wrote:
Other than that, I believe that we shouldn't do any workarounds for something that a new systemd+udev version broke. If any package seriously breaks the system, it should be reverted or fixed ASAP.
It is fixed now apparently:
http://cgit.freedesktop.org/systemd/systemd/commit/?id=9ea28c55a2488e6cd4a44...
It's not other packages' (and their maintainers') responsibility to work around those issues.
I don't think the issue is as black-and-white as that. Sometimes it's worthwhile to carry fixes in two places, and in this case it's a completely safe one-liner.
anaconda-devel@lists.stg.fedoraproject.org