In the current systemd implementation, a nofail mount does not block local-fs.target, which means our kdump.sh (in dracut-pre-pivot.service) can't wait for a nofail mount. So kdump.sh could run earlier than the nofail mount happens.
For the short term, let's stop passing nofail to mount. As for sysroot.mount, since we have explicitly specified that we wait for it, "nofail" isn't a problem.
Signed-off-by: WANG Chao chaowang@redhat.com
---
 mkdumprd | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mkdumprd b/mkdumprd
index 84f1e18..6e9dc47 100644
--- a/mkdumprd
+++ b/mkdumprd
@@ -103,7 +103,9 @@ to_mount() {
     _t=$(findmnt -k -f -n -r -o TARGET,FSTYPE $_dev)
     _o=$(findmnt -k -f -n -r -o OPTIONS $_dev)
     _o=${_o/#ro/rw} #mount fs target as rw in 2nd kernel
-    _o="${_o},nofail" #with nofail set, systemd won't block for mount failure
+    # "nofail" mount could be run later than kdump.sh. So we don't pass nofail
+    # for short term.
+    #_o="${_o},nofail" #with nofail set, systemd won't block for mount failure
     _mntopts="$_t $_o"
     #for non-nfs _dev converting to use udev persistent name
     if [ -b "$_s" ]; then
On Tue, Apr 08, 2014 at 01:15:26PM +0800, WANG Chao wrote:
In the current systemd implementation, a nofail mount does not block local-fs.target, which means our kdump.sh (in dracut-pre-pivot.service) can't wait for a nofail mount. So kdump.sh could run earlier than the nofail mount happens.
For the short term, let's stop passing nofail to mount. As for sysroot.mount, since we have explicitly specified that we wait for it, "nofail" isn't a problem.
Signed-off-by: WANG Chao chaowang@redhat.com
Chao,
I see that we are passing rootflags=nofail. What's the effect of that?
We also need to specify in the changelog the flip side of the patch. That is, in case of failure we now probably will not get control, and I think systemd can put us in rescue mode.
I talked to Lennart and he was open to the idea of rescue being replaced by something else. I will send him a mail to implement that. After that I am hoping that we can replace systemd rescue with something kdump-specific so that we get control in case of failure and then we can run our policies.
Thanks Vivek
On 04/08/14 at 10:01am, Vivek Goyal wrote:
On Tue, Apr 08, 2014 at 01:15:26PM +0800, WANG Chao wrote:
In the current systemd implementation, a nofail mount does not block local-fs.target, which means our kdump.sh (in dracut-pre-pivot.service) can't wait for a nofail mount. So kdump.sh could run earlier than the nofail mount happens.
For the short term, let's stop passing nofail to mount. As for sysroot.mount, since we have explicitly specified that we wait for it, "nofail" isn't a problem.
Signed-off-by: WANG Chao chaowang@redhat.com
Chao,
I see that we are passing rootflags=nofail. What's the effect of that?
Same effect as on other mounts. But since we explicitly wait for sysroot.mount in dracut-pre-pivot.service, we shouldn't be worried about sysroot.mount. rootflags=nofail works as expected.
We also need to specify in the changelog the flip side of the patch. That is, in case of failure we now probably will not get control, and I think systemd can put us in rescue mode.
No, we disable dropping to shell. So we hang in case of such failure.
I talked to Lennart and he was open to the idea of rescue being replaced by something else. I will send him a mail to implement that. After that I am hoping that we can replace systemd rescue with something kdump-specific so that we get control in case of failure and then we can run our policies.
Until we have such a facility in systemd, do you think we should remove "nofail"? Or should we just leave it as it is, because removing nofail will turn a failure into a hang?
Thanks WANG Chao
Thanks Vivek
On Wed, Apr 09, 2014 at 12:35:01AM +0800, WANG Chao wrote:
On 04/08/14 at 10:01am, Vivek Goyal wrote:
On Tue, Apr 08, 2014 at 01:15:26PM +0800, WANG Chao wrote:
In the current systemd implementation, a nofail mount does not block local-fs.target, which means our kdump.sh (in dracut-pre-pivot.service) can't wait for a nofail mount. So kdump.sh could run earlier than the nofail mount happens.
For the short term, let's stop passing nofail to mount. As for sysroot.mount, since we have explicitly specified that we wait for it, "nofail" isn't a problem.
Signed-off-by: WANG Chao chaowang@redhat.com
Chao,
I see that we are passing rootflags=nofail. What's the effect of that?
Same effect as on other mounts. But since we explicitly wait for sysroot.mount in dracut-pre-pivot.service, we shouldn't be worried about sysroot.mount. rootflags=nofail works as expected.
Sorry I did not get this. So how does rootflags=nofail work? We will wait for root to show up before we go with pre-pivot hooks?
We also need to specify in the changelog the flip side of the patch. That is, in case of failure we now probably will not get control, and I think systemd can put us in rescue mode.
No, we disable dropping to shell. So we hang in case of such failure.
If we always hang, what was the point of disabling dropping to shell?
I see that emergency shell is invoked by dracut directly. I think in those cases it will return immediately and dracut script will continue even after failure.
So the question I have is: can we drop in another file, say module-emergency-handling, and have the emergency shell call that? Kdump can then drop in the module-emergency-handling file, or create it as a link to kdump-error-handling, and we can handle the error.
IOW, once dracut has encountered the failure, is there any point in continuing further and then expecting to drop into the kdump module from the pre-pivot hook?
I think if we error out early, I guess root might not be available and I think that's fine. There are so many places things can go wrong and we can't guarantee that root is available as backup target.
Second place of failure is from systemd. I see there are two emergency services rescue.service and dracut-emergency.service. They both call /bin/emergency-shell. So to me if we fix /bin/emergency-shell to call /bin/module-emergency-shell that would automatically make sure that system will not hang and kdump will get control after failure?
Though I am not sure who calls dracut-emergency.service. What's the dependency tree here.
I talked to Lennart and he was open to the idea of rescue being replaced by something else. I will send him a mail to implement that. After that I am hoping that we can replace systemd rescue with something kdump-specific so that we get control in case of failure and then we can run our policies.
Until we have such a facility in systemd, do you think we should remove "nofail"? Or should we just leave it as it is, because removing nofail will turn a failure into a hang?
I think we have no choice but to remove "nofail"; otherwise we will see kdump failures, as the target might not be mounted. I am not sure whether this problem is limited to non-root targets or not.
And then fix the error handling path.
Thanks Vivek
On 04/08/14 at 05:25pm, Vivek Goyal wrote:
On Wed, Apr 09, 2014 at 12:35:01AM +0800, WANG Chao wrote:
On 04/08/14 at 10:01am, Vivek Goyal wrote:
On Tue, Apr 08, 2014 at 01:15:26PM +0800, WANG Chao wrote:
In the current systemd implementation, a nofail mount does not block local-fs.target, which means our kdump.sh (in dracut-pre-pivot.service) can't wait for a nofail mount. So kdump.sh could run earlier than the nofail mount happens.
For the short term, let's stop passing nofail to mount. As for sysroot.mount, since we have explicitly specified that we wait for it, "nofail" isn't a problem.
Signed-off-by: WANG Chao chaowang@redhat.com
Chao,
I see that we are passing rootflags=nofail. What's the effect of that?
Same effect as on other mounts. But since we explicitly wait for sysroot.mount in dracut-pre-pivot.service, we shouldn't be worried about sysroot.mount. rootflags=nofail works as expected.
Sorry I did not get this. So how does rootflags=nofail work? We will wait for root to show up before we go with pre-pivot hooks?
Sorry I didn't make myself clear.
rootflags=nofail has the following effect on sysroot.mount:
- initrd-root-fs.target "Wants" sysroot.mount
W/o nofail:
- initrd-root-fs.target "Requires" sysroot.mount
- sysroot.mount is started "Before" initrd-root-fs.target
In both cases, dracut-pre-pivot.service is started "After" sysroot.mount.
We also need to specify in the changelog the flip side of the patch. That is, in case of failure we now probably will not get control, and I think systemd can put us in rescue mode.
No, we disable dropping to shell. So we hang in case of such failure.
If we always hang, what was the point of disabling dropping to shell?
At the very beginning, we disabled dropping to shell at arbitrary points of the boot process, because we want to get into kdump.sh and do whatever error handling is needed within kdump.sh itself.
For example, if the user specifies "default reboot" and we don't disable the shell, we will drop into the shell instead of taking the "reboot" action.
And given that the shell is disabled, we introduced "nofail" to solve the hang issue when running into a disk failure.
I see that emergency shell is invoked by dracut directly. I think in those cases it will return immediately and dracut script will continue even after failure.
A disk failure would cause a mount unit failure. A mount unit failure would cause local-fs.target never to be reached. local-fs.target never being reached would cause dracut-pre-pivot.service never to be started.
However, if "nofail" is specified for the mount unit, local-fs.target would only "Wants" the mount unit, rather than "Requires" it as in "fail" mode. That is, local-fs.target would still be reached in case of a "nofail" mount failure, but would never be reached in case of a "fail" mount failure.
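The difference between the two modes can be sketched as a toy model in shell. This is not systemd code: dep_kind and target_reached are hypothetical helper names, and the model only encodes the rule described above (nofail turns "Requires" into "Wants", and a failed required unit keeps the target from being reached).

```shell
#!/bin/sh
# Toy model only: these names are hypothetical, not part of systemd or kdump.

# Print the dependency type a target gets on a mount unit,
# given the mount's comma-separated option string.
dep_kind() {
    case ",$1," in
        *,nofail,*) echo Wants ;;
        *)          echo Requires ;;
    esac
}

# Return 0 if the target can still be reached.
# $1: dependency type (Wants|Requires), $2: mount result (ok|failed)
target_reached() {
    if [ "$1" = Requires ] && [ "$2" = failed ]; then
        return 1
    fi
    return 0
}

# Compare both option strings for a mount that fails.
for opts in "rw,relatime" "rw,relatime,nofail"; do
    kind=$(dep_kind "$opts")
    if target_reached "$kind" failed; then
        echo "$opts: $kind, target reached despite mount failure"
    else
        echo "$opts: $kind, target blocked by mount failure"
    fi
done
```

The model deliberately ignores ordering, which is the other half of the problem: a "Wants" dependency alone says nothing about when the mount is attempted.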
So the question I have is: can we drop in another file, say module-emergency-handling, and have the emergency shell call that? Kdump can then drop in the module-emergency-handling file, or create it as a link to kdump-error-handling, and we can handle the error.
IOW, once dracut has encountered the failure, is there any point in continuing further and then expecting to drop into the kdump module from the pre-pivot hook?
I think if we error out early, I guess root might not be available and I think that's fine. There are so many places things can go wrong and we can't guarantee that root is available as backup target.
Second place of failure is from systemd. I see there are two emergency services rescue.service and dracut-emergency.service. They both call /bin/emergency-shell. So to me if we fix /bin/emergency-shell to call /bin/module-emergency-shell that would automatically make sure that system will not hang and kdump will get control after failure?
This proposal makes more sense to me. We can implement our own error handler and override the default one provided by dracut or systemd. This would give us more flexibility and scalability. We can do whatever we like depending on the failure type, the user-specified failsafe action and other factors.
The only question is how we implement it. I think an alternative approach is to create a new emergency service and put it under /etc/systemd/ or /run/systemd/, so that it takes precedence over the default one under /usr/lib/systemd/.
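A minimal sketch of that override idea, assuming the unit is named emergency.service and that a handler script exists at /usr/bin/kdump-error-handler.sh (both the handler path and the unit contents are assumptions made for illustration). It relies on systemd preferring units in /etc/systemd/system and /run/systemd/system over those in /usr/lib/systemd/system.

```shell
#!/bin/sh
# Sketch only: the handler path and unit contents are assumptions.

# Write an emergency.service override into unit directory $1
# that runs handler $2 instead of the stock emergency shell.
write_emergency_override() {
    unit_dir=$1
    handler=$2
    mkdir -p "$unit_dir" || return 1
    cat > "$unit_dir/emergency.service" <<EOF
[Unit]
Description=Kdump emergency handler (illustrative)
DefaultDependencies=no

[Service]
Type=oneshot
ExecStart=$handler
EOF
}

# Demo against a scratch directory; inside the kdump initramfs the
# target directory would be something like /run/systemd/system.
demo_dir=$(mktemp -d)
write_emergency_override "$demo_dir/system" /usr/bin/kdump-error-handler.sh
cat "$demo_dir/system/emergency.service"
```

Whether the override is best installed at initramfs build time or generated at boot is left open here.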
In addition, as you said, Lennart is likely going to facilitate such replacing. We can wait until that happens and then make the decision.
Though I am not sure who calls dracut-emergency.service. What's the dependency tree here.
I talked to Lennart and he was open to the idea of rescue being replaced by something else. I will send him a mail to implement that. After that I am hoping that we can replace systemd rescue with something kdump-specific so that we get control in case of failure and then we can run our policies.
Until we have such a facility in systemd, do you think we should remove "nofail"? Or should we just leave it as it is, because removing nofail will turn a failure into a hang?
I think we have no choice but to remove "nofail"; otherwise we will see kdump failures, as the target might not be mounted. I am not sure whether this problem is limited to non-root targets or not.
Yes, it's limited to non-root mount.
Thanks WANG Chao
And then fix the error handling path.
Thanks Vivek
On Wed, Apr 09, 2014 at 03:03:49PM +0800, WANG Chao wrote: [..]
Sorry I did not get this. So how does rootflags=nofail work? We will wait for root to show up before we go with pre-pivot hooks?
Sorry I didn't make myself clear.
rootflags=nofail has the following effect on sysroot.mount:
- initrd-root-fs.target "Wants" sysroot.mount
W/o nofail:
- initrd-root-fs.target "Requires" sysroot.mount
- sysroot.mount is started "Before" initrd-root-fs.target
In both cases, dracut-pre-pivot.service is started "After" sysroot.mount.
OK, so rootflags=nofail will change "Requires=sysroot.mount" to "Wants=sysroot.mount" in initrd-root-fs.target.
I think that's perfect. IIUC, changing it to Wants= will mean that we will wait for sysroot.mount to activate, and if activation fails, initrd-root-fs.target will still be activated. That's the behavior we want in kdump.
So why should we get rid of rootflags=nofail?
We also need to specify in the changelog the flip side of the patch. That is, in case of failure we now probably will not get control, and I think systemd can put us in rescue mode.
No, we disable dropping to shell. So we hang in case of such failure.
If we always hang, what was the point of disabling dropping to shell?
At the very beginning, we disabled dropping to shell at arbitrary points of the boot process, because we want to get into kdump.sh and do whatever error handling is needed within kdump.sh itself.
Yes, and we thought that by not dropping into the shell we would continue with processing and ultimately reach kdump. But I think that's a very bad way to handle errors. Once an error has occurred we should have a direct way to jump into the kdump error handler.
For example, if the user specifies "default reboot" and we don't disable the shell, we will drop into the shell instead of taking the "reboot" action.
And given that the shell is disabled, we introduced "nofail" to solve the hang issue when running into a disk failure.
What is "hang"? Are you defining "hang" as dropping into shell? I am not able to understand what will happen if we don't pass "nofail".
I see that emergency shell is invoked by dracut directly. I think in those cases it will return immediately and dracut script will continue even after failure.
A disk failure would cause a mount unit failure. A mount unit failure would cause local-fs.target never to be reached. local-fs.target never being reached would cause dracut-pre-pivot.service never to be started.
dracut-pre-pivot.service has After= dependency on sysroot.mount.
After=initrd.target initrd-parse-etc.service sysroot.mount
So if sysroot.mount fails, dracut-pre-pivot should still be started? Are you sure that it does not get started if sysroot.mount fails?
However, if "nofail" is specified for the mount unit, local-fs.target would only "Wants" the mount unit, rather than "Requires" it as in "fail" mode.
I am not sure what the logic is behind converting Requires= to Wants= with nofail. If by default we had "Wants=" dependencies on all mount units, then local-fs.target would be reached even if some mount failed.
But systemd folks might not like this idea as they might have other reasons for why they are using Requires= by default.
That is, local-fs.target would still be reached in case of a "nofail" mount failure, but would never be reached in case of a "fail" mount failure.
Got it. By default local-fs.target has Requires= dependencies and nofail converts that into Wants= dependency and that helps in our case.
[..]
So the question I have is: can we drop in another file, say module-emergency-handling, and have the emergency shell call that? Kdump can then drop in the module-emergency-handling file, or create it as a link to kdump-error-handling, and we can handle the error.
IOW, once dracut has encountered the failure, is there any point in continuing further and then expecting to drop into the kdump module from the pre-pivot hook?
I think if we error out early, I guess root might not be available and I think that's fine. There are so many places things can go wrong and we can't guarantee that root is available as backup target.
Second place of failure is from systemd. I see there are two emergency services rescue.service and dracut-emergency.service. They both call /bin/emergency-shell. So to me if we fix /bin/emergency-shell to call /bin/module-emergency-shell that would automatically make sure that system will not hang and kdump will get control after failure?
This proposal makes more sense to me. We can implement our own error handler and override the default one provided by dracut or systemd. This would give us more flexibility and scalability. We can do whatever we like depending on the failure type, the user-specified failsafe action and other factors.
The only question is how we implement it. I think an alternative approach is to create a new emergency service and put it under /etc/systemd/ or /run/systemd/, so that it takes precedence over the default one under /usr/lib/systemd/.
In addition, as you said, Lennart is likely going to facilitate such replacing. We can wait until that happens and then make the decision.
Lennart asked me to send a mail to him. I am not sure what exactly he was planning to do. Before I send a mail to him I want to be sure that we understand the problem well and know what we want to do.
Actually, dropping an overriding emergency service in /run/systemd/ sounds reasonable. Can you give it a try and see if it works?
We also need to see if calling kdump directly from error path is working or not.
I think we probably can't implement "mount_root_run_init" logic in this path as failure has occurred.
I think we will have to keep our default actions simple and minimal. Boot path is complicated and now dracut and systemd control it completely. We will not have too much of flexibility w.r.t error handling.
Thanks Vivek
On 04/09/14 at 11:34am, Vivek Goyal wrote:
On Wed, Apr 09, 2014 at 03:03:49PM +0800, WANG Chao wrote: [..]
Sorry I did not get this. So how does rootflags=nofail work? We will wait for root to show up before we go with pre-pivot hooks?
Sorry I didn't make myself clear.
rootflags=nofail has the following effect on sysroot.mount:
- initrd-root-fs.target "Wants" sysroot.mount
W/o nofail:
- initrd-root-fs.target "Requires" sysroot.mount
- sysroot.mount is started "Before" initrd-root-fs.target
In both cases, dracut-pre-pivot.service is started "After" sysroot.mount.
OK, so rootflags=nofail will change "Requires=sysroot.mount" to "Wants=sysroot.mount" in initrd-root-fs.target.
I think that's perfect. IIUC, changing it to Wants= will mean that we will wait for sysroot.mount to activate, and if activation fails, initrd-root-fs.target will still be activated. That's the behavior we want in kdump.
So why should we get rid of rootflags=nofail?
No, we don't. I was explaining why we only remove nofail for non-root file system and keep rootflags=nofail ...
We also need to specify in the changelog the flip side of the patch. That is, in case of failure we now probably will not get control, and I think systemd can put us in rescue mode.
No, we disable dropping to shell. So we hang in case of such failure.
If we always hang, what was the point of disabling dropping to shell?
At the very beginning, we disabled dropping to shell at arbitrary points of the boot process, because we want to get into kdump.sh and do whatever error handling is needed within kdump.sh itself.
Yes, and we thought that by not dropping into the shell we would continue with processing and ultimately reach kdump. But I think that's a very bad way to handle errors. Once an error has occurred we should have a direct way to jump into the kdump error handler.
For example, if the user specifies "default reboot" and we don't disable the shell, we will drop into the shell instead of taking the "reboot" action.
And given that the shell is disabled, we introduced "nofail" to solve the hang issue when running into a disk failure.
What is "hang"? Are you defining "hang" as dropping into shell? I am not able to understand what will happen if we don't pass "nofail".
By "hang", I mean systemd stops running any services: because a certain target isn't reached, all the services that need to run after this target are blocked. Hence every service that is about to run gets stuck and the system hangs.
I see that emergency shell is invoked by dracut directly. I think in those cases it will return immediately and dracut script will continue even after failure.
A disk failure would cause a mount unit failure. A mount unit failure would cause local-fs.target never to be reached. local-fs.target never being reached would cause dracut-pre-pivot.service never to be started.
dracut-pre-pivot.service has After= dependency on sysroot.mount.
After=initrd.target initrd-parse-etc.service sysroot.mount
So if sysroot.mount fails, dracut-pre-pivot should still be started? Are you sure that it does not get started if sysroot.mount fails?
dracut-pre-pivot still gets started because dracut-pre-pivot.service doesn't "Requires=" sysroot.mount. So no matter if sysroot.mount fails or not, as long as it's started, dracut-pre-pivot will run.
However, if "nofail" is specified for the mount unit, local-fs.target would only "Wants" the mount unit, rather than "Requires" it as in "fail" mode.
I am not sure what the logic is behind converting Requires= to Wants= with nofail. If by default we had "Wants=" dependencies on all mount units, then local-fs.target would be reached even if some mount failed.
"Wants=" does not order the start sequence of the units.
- W/ "nofail", local-fs.target "Wants=" the mount unit.
- W/o "nofail", local-fs.target "Requires=" the mount unit and the mount unit runs "Before=" local-fs.target.
So w/ "nofail", local-fs.target can be reached no matter what. But local-fs.target being reached doesn't mean that all mount units have been started, i.e. that all /etc/fstab entries are mounted.
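Because reaching local-fs.target does not guarantee that a nofail mount has completed, a script that depends on the mount would have to check for it explicitly. A rough sketch of such a check follows; is_mounted and wait_for_mount are hypothetical helpers, not existing kdump.sh functions, and polling a mounts table is only one possible approach.

```shell
#!/bin/sh
# Sketch only: helper names and the polling approach are assumptions.

# Return 0 if mount point $1 appears in mounts table $2
# (defaults to /proc/mounts; the second field is the mount point).
is_mounted() {
    awk -v t="$1" '$2 == t { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Poll once per second until $1 is mounted or $2 seconds have passed.
# $3 optionally names an alternative mounts table (useful for testing).
wait_for_mount() {
    target=$1
    remaining=${2:-30}
    while ! is_mounted "$target" "$3"; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

A caller might use it as `wait_for_mount "$dump_mount_point" 10 || echo "dump target not mounted"`, where dump_mount_point is a placeholder for wherever the dump target is expected.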
But systemd folks might not like this idea as they might have other reasons for why they are using Requires= by default.
Because "fail" mode is the default mode. That is, any failure would trigger an error handler (emergency.service).
If the user wants "nofail" mode, a weak dependency, "Wants", is the best choice, because the mount failure wouldn't block local-fs.target from being reached.
As we've discussed in the systemd-devel thread, the preferable way to handle this is to remove "nofail" and create our own error handler.
That is, local-fs.target would still be reached in case of a "nofail" mount failure, but would never be reached in case of a "fail" mount failure.
Got it. By default local-fs.target has Requires= dependencies and nofail converts that into Wants= dependency and that helps in our case.
[..]
So the question I have is: can we drop in another file, say module-emergency-handling, and have the emergency shell call that? Kdump can then drop in the module-emergency-handling file, or create it as a link to kdump-error-handling, and we can handle the error.
IOW, once dracut has encountered the failure, is there any point in continuing further and then expecting to drop into the kdump module from the pre-pivot hook?
I think if we error out early, I guess root might not be available and I think that's fine. There are so many places things can go wrong and we can't guarantee that root is available as backup target.
Second place of failure is from systemd. I see there are two emergency services rescue.service and dracut-emergency.service. They both call /bin/emergency-shell. So to me if we fix /bin/emergency-shell to call /bin/module-emergency-shell that would automatically make sure that system will not hang and kdump will get control after failure?
This proposal makes more sense to me. We can implement our own error handler and override the default one provided by dracut or systemd. This would give us more flexibility and scalability. We can do whatever we like depending on the failure type, the user-specified failsafe action and other factors.
The only question is how we implement it. I think an alternative approach is to create a new emergency service and put it under /etc/systemd/ or /run/systemd/, so that it takes precedence over the default one under /usr/lib/systemd/.
In addition, as you said, Lennart is likely going to facilitate such replacing. We can wait until that happens and then make the decision.
Lennart asked me to send a mail to him. I am not sure what exactly he was planning to do. Before I send a mail to him I want to be sure that we understand the problem well and know what we want to do.
Actually, dropping an overriding emergency service in /run/systemd/ sounds reasonable. Can you give it a try and see if it works?
We also need to see if calling kdump directly from error path is working or not.
Sounds good to me. We can make use of the existing error handling code.
I think we probably can't implement "mount_root_run_init" logic in this path as failure has occurred.
We don't implement it now. As I remember, RHEL 6 had such a "default" option, but we have moved on.
I think we will have to keep our default actions simple and minimal. Boot path is complicated and now dracut and systemd control it completely. We will not have too much of flexibility w.r.t error handling.
I think jumping directly to our kdump.sh is a good idea. We can extend the current kdump.sh to do some reasonable checking: if the error is trivial and the dump target is ready, then we dump. Otherwise we do the default action.
What do you think?
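The decision described above could look roughly like the following. This is only a sketch: dump_target_ready and choose_action are hypothetical names, and treating "ready" as "a writable directory" is an assumption, since the thread does not define what counts as a trivial error or a ready target.

```shell
#!/bin/sh
# Sketch only: function names and the readiness check are assumptions,
# not the actual kdump.sh code.

# A dump target counts as ready here if it is a writable directory.
dump_target_ready() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Decide what to do after an earlier boot failure.
# $1: dump target directory, $2: configured default action (e.g. reboot)
# Prints "dump" or the default action; a real handler would act on it.
choose_action() {
    if dump_target_ready "$1"; then
        echo dump
    else
        echo "$2"
    fi
}
```

Keeping the decision separate from the action makes it easier to exercise the logic outside a crash kernel.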
On Thu, Apr 10, 2014 at 01:30:48PM +0800, WANG Chao wrote:
[..]
dracut-pre-pivot.service has After= dependency on sysroot.mount.
After=initrd.target initrd-parse-etc.service sysroot.mount
So if sysroot.mount fails, dracut-pre-pivot should still be started? Are you sure that it does not get started if sysroot.mount fails?
dracut-pre-pivot still gets started because dracut-pre-pivot.service doesn't "Requires=" sysroot.mount. So no matter if sysroot.mount fails or not, as long as it's started, dracut-pre-pivot will run.
If dracut-pre-pivot will run even if sysroot.mount fails, why did we introduce "nofail" to begin with (for non-root targets)?
I don't see any Requires= dependencies in dracut-pre-pivot.service. Everything is Wants= and there are some ordering directives with After=.
That means even if file system mounting failed, dracut-pre-pivot should run. That means kdump should get control.
If that's the case, why did we introduce "nofail" and why do we think that removing "nofail" will hang if a failure happens?
Given the fact that currently the emergency shell does not do anything in the second kernel, I think kdump should still run even if a failure happens. Can we test it?
Given the current situation, I think even without "nofail", kdump.sh should be reached. If that's the case, we don't have a problem at all?
[..]
I think we will have to keep our default actions simple and minimal. Boot path is complicated and now dracut and systemd control it completely. We will not have too much of flexibility w.r.t error handling.
I think jumping directly to our kdump.sh is a good idea. We can extend the current kdump.sh to do some reasonable checking: if the error is trivial and the dump target is ready, then we dump. Otherwise we do the default action.
What do you think?
Yep. Either we can create a separate kdump handler or service (kdump-error-handler.service) and that service can take care of calling kdump.sh, or we can hook this into the emergency handler and force it to call kdump-error-handler.sh, which in turn can call kdump.sh.
So how about starting simple? Can you write a patch where the emergency shell first checks if a kdump error handler is present and calls it (instead of returning and not doing anything)? If that works, we can just post a dracut patch and we will not need anything from systemd?
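A rough sketch of such a check, under stated assumptions: emergency_hook is a hypothetical stand-in for the relevant part of dracut's emergency shell, and the handler path /usr/bin/kdump-error-handler.sh is invented for illustration.

```shell
#!/bin/sh
# Sketch only: the handler path and function name are assumptions,
# not the actual dracut emergency shell implementation.

KDUMP_ERROR_HANDLER=${KDUMP_ERROR_HANDLER:-/usr/bin/kdump-error-handler.sh}

# Run the kdump error handler if one is installed; otherwise return
# without doing anything, as happens today when the shell is disabled.
emergency_hook() {
    if [ -x "$KDUMP_ERROR_HANDLER" ]; then
        "$KDUMP_ERROR_HANDLER" "$@"
        return $?
    fi
    echo "no kdump error handler installed, continuing" >&2
    return 0
}
```

With this shape, a module that does not install a handler sees no change in behavior.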
Thanks Vivek
On 04/10/14 at 10:05am, Vivek Goyal wrote:
On Thu, Apr 10, 2014 at 01:30:48PM +0800, WANG Chao wrote:
[..]
dracut-pre-pivot.service has After= dependency on sysroot.mount.
After=initrd.target initrd-parse-etc.service sysroot.mount
So if sysroot.mount fails, dracut-pre-pivot should still be started? Are you sure that it does not get started if sysroot.mount fails?
dracut-pre-pivot still gets started because dracut-pre-pivot.service doesn't "Requires=" sysroot.mount. So no matter if sysroot.mount fails or not, as long as it's started, dracut-pre-pivot will run.
If dracut-pre-pivot will run even if sysroot.mount fails, why did we introduce "nofail" to begin with (for non-root targets)?
Sorry, I made a mistake here last time.
W/o "nofail", a sysroot.mount failure causes initrd-root-fs.target never to be reached (or become active). initrd.target isn't reached because of its dependency on initrd-root-fs.target. dracut-pre-pivot.service runs "After" initrd.target. So when initrd.target isn't reached, dracut-pre-pivot appears to spin waiting for initrd.target. And that's when the so-called "hang" happens.
I don't see any Requires= dependencies in dracut-pre-pivot.service. Everything is Wants= and there are some ordering directives with After=.
Exactly. "After" enforces that dracut-pre-pivot waits until those specific targets are reached or the mount or service units have been started.
That means even if file system mounting failed, dracut-pre-pivot should run. That means kdump should get control.
dracut-pre-pivot would wait because it runs "After" these targets, which are not reached because of a "fail" mode sysroot.mount failure.
If that's the case, why did we introduce "nofail" and why do we think that removing "nofail" will hang if a failure happens?
Explained above.
Given the fact that currently the emergency shell does not do anything in the second kernel, I think kdump should still run even if a failure happens. Can we test it?
I've tested "fail" mode before and I did it again today. It did hang like before:
dracut-initqueue[207]: Warning: Could not boot.
dracut-initqueue[207]: Warning: Not dropping to emergency shell, because /lib/dracut/no-emergency-shell exists.
[ OK ] Started dracut initqueue hook.
       Mounting /sysroot...
[ OK ] Reached target Remote File Systems (Pre).
[ OK ] Reached target Remote File Systems.
[FAILED] Failed to mount /sysroot.
See 'systemctl status sysroot.mount' for details.
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Reload Configuration from the Real Root.
You can see that initrd-root-fs.target ("Initrd Root File System") is not active, because it "Requires" sysroot.mount and sysroot.mount fails.
In "nofail" mode, by contrast, initrd-root-fs.target only "Wants" sysroot.mount. "Wants" is weaker than "Requires": it means initrd-root-fs.target wants sysroot.mount to be started but does not really care whether sysroot.mount succeeds or not.
And that's the reason why, in the sysroot.mount failure case, dracut-pre-pivot can get started in "nofail" mode and cannot in "fail" mode.
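When reading a console log like the one above, it can help to filter out everything except the failure markers. A small sketch, with failed_boot_lines as a hypothetical helper name; it only assumes the "[FAILED]" and "[DEPEND]" prefixes that appear in the log.

```shell
#!/bin/sh
# Sketch only: the helper name is an assumption.

# Read a boot log on stdin and print only the lines systemd
# marked as failed units or failed dependencies.
failed_boot_lines() {
    grep -E '^\[(FAILED|DEPEND)\]'
}

# Feed it a few lines taken from the log in this message.
failed_boot_lines <<'EOF'
[ OK ] Reached target Remote File Systems.
[FAILED] Failed to mount /sysroot.
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Reload Configuration from the Real Root.
EOF
```

The "[DEPEND]" lines are the ones that show which units were dropped because of the failed mount.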
Given the current situation, I think even without "nofail", kdump.sh should be reached. If that's the case, we don't have a problem at all?
[..]
I think we will have to keep our default actions simple and minimal. Boot path is complicated and now dracut and systemd control it completely. We will not have too much of flexibility w.r.t error handling.
I think jumping directly to our kdump.sh is a good idea. We can extend the current kdump.sh to do some reasonable checking: if the error is trivial and the dump target is ready, then we dump. Otherwise we do the default action.
What do you think?
Yep. Either we can create a separate kdump handler or service (kdump-error-handler.service), and that service can take care of calling kdump.sh, or we can hook this into the emergency handler and force it to call kdump-error-handler.sh, which in turn can call kdump.sh.
To be honest, I'm a bit confused. Are you saying that we can do either of the following?
- create kdump handler or service, replacing the current dracut/systemd error handler
- hook into current dracut/systemd error handler or service, and call our handler which in turn would call kdump.sh
Personally I like the first idea: a new kdump handler or service gives us a more flexible way to control the context. Hooking seems hacky.
Anyway, this isn't decided yet. I'll try both ways.
So how about starting simple. Can you write a patch where emergency shell first checks if kdump error handler is present and calls it (instead of returning and not doing anything). If that works, we can just post a dracut patch and we will not need anything from systemd?
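A rough sketch of what such a check could look like. This is an illustration only, not the dracut patch: the function name emergency_hook and the default path in KDUMP_ERROR_HANDLER are assumptions, since the thread does not say where the handler would be installed.

```shell
#!/bin/bash
# Hypothetical handler location; the thread does not name a path.
KDUMP_ERROR_HANDLER=${KDUMP_ERROR_HANDLER:-/usr/bin/kdump-error-handler.sh}

# Sketch of the proposed check in the emergency path: if a kdump error
# handler is installed and executable, run it and pass on its exit status.
# Otherwise keep the current behaviour and return without doing anything.
emergency_hook() {
    if [ -x "$KDUMP_ERROR_HANDLER" ]; then
        "$KDUMP_ERROR_HANDLER" "$@"
        return $?
    fi
    echo "no kdump error handler found, nothing to do" >&2
    return 0
}
```

The handler itself would then decide whether to call kdump.sh or fall back to the default action, so the emergency path only needs this one check.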
I'll dive deeper and give it a try.
Thanks WANG Chao
On Fri, Apr 11, 2014 at 02:10:40PM +0800, WANG Chao wrote:
On 04/10/14 at 10:05am, Vivek Goyal wrote:
On Thu, Apr 10, 2014 at 01:30:48PM +0800, WANG Chao wrote:
[..]
dracut-pre-pivot.service has After= dependency on sysroot.mount.
After=initrd.target initrd-parse-etc.service sysroot.mount
So if sysroot.mount fails, dracut-pre-pivot should still be started? Are you sure that it does not get started if sysroot.mount fails?
dracut-pre-pivot still gets started because dracut-pre-pivot.service doesn't "Requires=" sysroot.mount. So no matter if sysroot.mount fails or not, as long as it's started, dracut-pre-pivot will run.
If dracut-pre-pivot will run even if sysroot.mount fails, why did we introduce "nofail" to begin with (for non-root targets)?
Sorry, I made a mistake here last time.
W/o "nofail", a sysroot.mount failure causes initrd-root-fs.target to never be reached (or become active). initrd.target isn't reached because of its dependency on initrd-root-fs.target. dracut-pre-pivot.service runs "After" initrd.target. So when initrd.target isn't reached, dracut-pre-pivot appears to spin, waiting for initrd.target. And that's when the so-called "hang" happens.
This is strange. So there is no error propagation mechanism?
There is no such thing as "error" for targets. Let's say target foo.target Requires=bar.service. Say bar.service fails activation. That means foo.target will *never* be activated.
Now if some service, say abc.service, says After=foo.target, will it hang indefinitely?
I thought that if a target cannot be reached, then we should propagate that error upwards.
Thanks Vivek
On 04/14/14 at 04:23pm, Vivek Goyal wrote:
On Fri, Apr 11, 2014 at 02:10:40PM +0800, WANG Chao wrote:
On 04/10/14 at 10:05am, Vivek Goyal wrote:
On Thu, Apr 10, 2014 at 01:30:48PM +0800, WANG Chao wrote:
[..]
dracut-pre-pivot.service has After= dependency on sysroot.mount.
After=initrd.target initrd-parse-etc.service sysroot.mount
So if sysroot.mount fails, dracut-pre-pivot should still be started? Are you sure that it does not get started if sysroot.mount fails?
dracut-pre-pivot still gets started because dracut-pre-pivot.service doesn't "Requires=" sysroot.mount. So no matter if sysroot.mount fails or not, as long as it's started, dracut-pre-pivot will run.
If dracut-pre-pivot will run even if sysroot.mount fails, why did we introduce "nofail" to begin with (for non-root targets)?
Sorry, I made a mistake here last time.
W/o "nofail", a sysroot.mount failure causes initrd-root-fs.target to never be reached (or become active). initrd.target isn't reached because of its dependency on initrd-root-fs.target. dracut-pre-pivot.service runs "After" initrd.target. So when initrd.target isn't reached, dracut-pre-pivot appears to spin, waiting for initrd.target. And that's when the so-called "hang" happens.
This is strange. So there is no error propagation mechanism?
I'm not sure. But I think that's probably because the emergency shell service is disabled, so we hang instead of dropping to a shell.
There is no such thing as "error" for targets. Let's say target foo.target Requires=bar.service. Say bar.service fails activation. That means foo.target will *never* be activated.
Yep.
Now if some service, say abc.service, says After=foo.target, will it hang indefinitely?
Yep, from what I observed.
I thought that if a target cannot be reached, then we should propagate that error upwards.
We disabled the emergency shell in the first place. But if we define our own error handler service, I think it would be triggered.
Thanks WANG Chao
On Thu, Apr 17, 2014 at 01:05:32PM +0800, WANG Chao wrote:
[..]
We disabled the emergency shell in the first place. But if we define our own error handler service, I think it would be triggered.
Ok, so let us give that a try. Let us define our own error handler which can be invoked from the emergency shell, do the error handling there, and see what happens.
Thanks Vivek
On Tue, Apr 08, 2014 at 01:15:26PM +0800, WANG Chao wrote:
In the current systemd implementation, a nofail mount will not block local-fs.target, which means our kdump.sh (in dracut-pre-pivot.service) can't wait for a nofail mount, and kdump.sh could run earlier than the nofail mount happens.
For the short term, let's stop passing nofail to mount. As for sysroot.mount, since we have explicitly specified to wait for it, "nofail" isn't a problem.
Signed-off-by: WANG Chao chaowang@redhat.com
I think we should take this patch in while Chao is working on sorting out the error handling path.
With this patch we should be able to dump to non-root file systems. What will be broken temporarily is "default" action handling.
Chao's patches to fix default action handling should be in soon.
Acked-by: Vivek Goyal vgoyal@redhat.com
Thanks Vivek
 mkdumprd | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mkdumprd b/mkdumprd
index 84f1e18..6e9dc47 100644
--- a/mkdumprd
+++ b/mkdumprd
@@ -103,7 +103,9 @@ to_mount() {
     _t=$(findmnt -k -f -n -r -o TARGET,FSTYPE $_dev)
     _o=$(findmnt -k -f -n -r -o OPTIONS $_dev)
     _o=${_o/#ro/rw} #mount fs target as rw in 2nd kernel
-    _o="${_o},nofail" #with nofail set, systemd won't block for mount failure
+    # "nofail" mount could be run later than kdump.sh. So we don't pass nofail
+    # for short term.
+    #_o="${_o},nofail" #with nofail set, systemd won't block for mount failure
     _mntopts="$_t $_o"
     #for non-nfs _dev converting to use udev persistent name
     if [ -b "$_s" ]; then
-- 1.8.5.3
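
For reference, the effect of the patch on the generated mount options can be sketched in isolation. This is an illustration, not code from mkdumprd: the function name build_mount_opts and its second argument are made up here, and the sample option string is an assumption. Only the two _o assignments mirror to_mount().

```shell
#!/bin/bash
# build_mount_opts OPTIONS [ADD_NOFAIL]
# Hypothetical helper that mirrors the option handling in to_mount():
# a leading "ro" becomes "rw", and "nofail" is appended only when the
# second argument is "yes" (the behaviour before this patch).
build_mount_opts() {
    local _o=$1 add_nofail=${2:-no}
    _o=${_o/#ro/rw}          # "#" anchors the match at the start of the string
    if [ "$add_nofail" = "yes" ]; then
        _o="${_o},nofail"
    fi
    echo "$_o"
}

build_mount_opts "ro,relatime,seclabel" yes   # before the patch
build_mount_opts "ro,relatime,seclabel"       # after the patch
```

The first call prints the options with ",nofail" appended, the second prints them without it, which is what systemd will see for non-root dump targets once the patch is applied.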
kexec mailing list kexec@lists.fedoraproject.org https://lists.fedoraproject.org/mailman/listinfo/kexec