Re: WARNING: CPU: 1 PID: 0 at kernel/time/tick-broadcast.c:668 tick_broadcast_oneshot_control+0x17d/0x190()


poma

10 Feb 2014

12:59 p.m.

On 10.02.2014 11:06, Thomas Gleixner wrote:
> On Mon, 10 Feb 2014, poma wrote:
> > ...
> > [ 83.558551] [<ffffffff81025b17>] amd_e400_idle+0x87/0x130
>
> So this seems to happen only on AMD machines which use that e400 idle mode. I have no idea at the moment what's wrong there. I'll find one of those machines and try to reproduce.
>
> Thanks,
>
> tglx

Thanks for your response! :) https://bugzilla.redhat.com/show_bug.cgi?id=1031296#c24

poma


Stanislaw Gruszka

11 Feb 2014

2:23 a.m.


On Mon, Feb 10, 2014 at 07:59:39PM +0100, poma wrote:
> On 10.02.2014 11:06, Thomas Gleixner wrote:
> > On Mon, 10 Feb 2014, poma wrote:
> > > ...
> > > [ 83.558551] [<ffffffff81025b17>] amd_e400_idle+0x87/0x130
> >
> > So this seems to happen only on AMD machines which use that e400 idle mode. I have no idea at the moment what's wrong there. I'll find one of those machines and try to reproduce.

I tried to debug that warning as well. Even though I found a machine with the right family and model number, the hardware C1E bug does not occur there, so I hacked the kernel to always use amd_e400_idle (and removed the AMD-specific rdmsr instructions so that it does not crash). That makes the issue 100% reproducible on suspend/resume.

It happens when a CPU becomes idle and calls CLOCK_EVT_NOTIFY_BROADCAST_ENTER, but an interrupt triggers on that CPU before CLOCK_EVT_NOTIFY_BROADCAST_EXIT. The IRQ is handled by the hrtimer code, which wants to switch to hres and calls:

tick_switch_to_oneshot() -> ... -> tick_broadcast_setup_oneshot()

Since we already have a proper handler there, the last procedure clears tick_broadcast_oneshot_mask, but tick_broadcast_pending_mask stays set. The next time amd_e400_idle calls CLOCK_EVT_NOTIFY_BROADCAST_ENTER, the warning triggers.

I came up with the patch below, which also clears the pending mask, but perhaps oneshot_mask should not be cleared in tick_broadcast_setup_oneshot(), or should be cleared only conditionally, or some other solution is needed. Anyway, the patch makes the warning go away on my hacked setup; I was waiting for test results on real C1E hardware.

Thanks,
Stanislaw

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 43780ab..98977a5 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -756,6 +756,7 @@ out:
 static void tick_broadcast_clear_oneshot(int cpu)
 {
 	cpumask_clear_cpu(cpu, tick_broadcast_oneshot_mask);
+	cpumask_clear_cpu(cpu, tick_broadcast_pending_mask);
 }
 
 static void tick_broadcast_init_next_event(struct cpumask *mask,

Thomas Gleixner

10:07 a.m.


On Tue, 11 Feb 2014, Stanislaw Gruszka wrote:
> On Mon, Feb 10, 2014 at 07:59:39PM +0100, poma wrote:
> > On 10.02.2014 11:06, Thomas Gleixner wrote:
> > > On Mon, 10 Feb 2014, poma wrote:
> > > > ...
> > > > [ 83.558551] [<ffffffff81025b17>] amd_e400_idle+0x87/0x130
> > >
> > > So this seems to happen only on AMD machines which use that e400 idle mode. I have no idea at the moment what's wrong there. I'll find one of those machines and try to reproduce.
>
> I tried to debug that warning as well. Even though I found a machine with the right family and model number, the hardware C1E bug does not occur there, so I hacked the kernel to always use amd_e400_idle (and removed the AMD-specific rdmsr instructions so that it does not crash). That makes the issue 100% reproducible on suspend/resume.

It's also reproducible on cpu online/offline.

> It happens when a CPU becomes idle and calls CLOCK_EVT_NOTIFY_BROADCAST_ENTER, but an interrupt triggers on that CPU before CLOCK_EVT_NOTIFY_BROADCAST_EXIT. The IRQ is handled by the hrtimer code, which wants to switch to hres and calls:
>
> tick_switch_to_oneshot() -> ... -> tick_broadcast_setup_oneshot()
>
> Since we already have a proper handler there, the last procedure clears tick_broadcast_oneshot_mask, but tick_broadcast_pending_mask stays set. The next time amd_e400_idle calls CLOCK_EVT_NOTIFY_BROADCAST_ENTER, the warning triggers.
>
> I came up with the patch below, which also clears the pending mask, but perhaps

Fun. I came up with the exact same solution independent of you and I tested it on real C1E contaminated hardware.

> oneshot_mask should not be cleared in tick_broadcast_setup_oneshot(), or should be cleared only conditionally, or some other solution is

We can do it unconditionally. It creates consistent state in all corner cases.

There are other solutions to the problem, but they would need a major rework of the broadcast code. I so wish that this mess had never been necessary at all ...

Thanks,

tglx

Stanislaw Gruszka

10:27 a.m.


...

> > I came up with the patch below, which also clears the pending mask, but perhaps
>
> Fun. I came up with the exact same solution independent of you and I tested it on real C1E contaminated hardware.
>
> > oneshot_mask should not be cleared in tick_broadcast_setup_oneshot(), or should be cleared only conditionally, or some other solution is
>
> We can do it unconditionally. It creates consistent state in all corner cases.
>
> There are other solutions to the problem, but they would need a major rework of the broadcast code. I so wish that this mess had never been necessary at all ...

Thomas, please post/apply patch, which you think is the most appropriate.

Thanks Stanislaw


kernel@lists.fedoraproject.org
