Continued from the 'background and summary' email: https://lists.fedoraproject.org/archives/list/desktop@lists.fedoraproject.or...
Adding Hans and Matthew and Lennart to cc.
Questions I have are:
- WG is considering dropping creation of swap partitions by default, in favor of swap-on-ZRAM. Any concerns? (We do know it's not possible to use a ZRAM device for hibernation, but the kernel will look to another swap device for contiguous free space to write out a hibernation image.)
- What's the status of s2idle in the kernel?
- What sort of work is needed outside the kernel to properly support s2idle, or is this predominantly kernel work? Microsoft documents on Modern Standby suggest minimal application effort. [1]
- Prospect of kernel support to separate swap and hibernation partitions (and/or swap files)? Or systemd method of creating then activating swapfiles on demand?
- Prospect of hibernation supported with UEFI Secure Boot?
- Is hibernation a better fallback than poweroff, given the significant reliability differential? Why? Poweroff is universal, hibernation isn't. What's the argument that a non-universally available fallback is better than the universal fallback?
- What are the implications of hibernation if Fedora will move to measured boot? (I'm not sure how mainstream that function is expected to be, or whether it's a use-case-specific opt-in.)
- There's some anecdotal evidence users are disabling UEFI Secure Boot, possibly quite a lot [2]. Does there need to be an effort at making the signing of user-built kernels and modules easier? Can it be made easier? I don't know whether custom or out-of-tree modules are a significant motive for disabling SB, vs other explanations.
- A systemd TODO makes me wonder: Does anyone have (corroborating) data on the reliability of firmware or battery reporting, when the system, and thus the battery, are under significant load? [3] I've discussed with a reliable source that on 2+ year old hardware, the vast majority of batteries are effectively broken and aren't likely to report anything reliably if they are under significant load, in particular waking from S3. By anything, I mean time remaining, power remaining, and current power consumption rate. Would s2idle instead of S3 make this more reliable?
- It doesn't sound like S1 is really used at all, even though kernel docs say it's supported as shallow/standby. (?) Is it more or less reliable than S3?
- I'm inclined to think we should mimic what hardware vendors, Microsoft, Apple, and Google (with Chromebooks and Android) have been doing for a while: faster boots, and S0 low power idle - and skip the things making devs and users crazy. But I invite a persuasive contrary argument.
- Any other questions?
[1] https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/...
[2] https://twitter.com/hughsient/status/1225826488903249920
[3] see line "beef up hibernation" https://github.com/systemd/systemd/blob/master/TODO
Hi,
I'm providing answers to the questions which I believe I have some insight on below.
On 2/10/20 5:11 AM, Chris Murphy wrote:
Continued from the 'background and summary' email: https://lists.fedoraproject.org/archives/list/desktop@lists.fedoraproject.or...
Adding Hans and Matthew and Lennart to cc.
Questions I have are:
- WG is considering dropping creation of swap partitions by default,
in favor of swap-on-ZRAM. Any concerns? (We do know it's not possible to use a ZRAM device for hibernation, but the kernel will look to another swap device for contiguous free space to write out a hibernation image.)
- What's the status of s2idle in the kernel?
It mostly works as a replacement for S3 suspend, but we do not really have support for tricks like continuing to play music (using a DSP to decode MP3) while keeping everything else powered down, or keeping e.g. a large download going. These more advanced features are not really being worked on AFAIK.
- What sort of work is needed outside the kernel to properly support
s2idle, or is this predominantly kernel work? Microsoft documents on Modern Standby suggest minimal application effort. [1]
As a "suspend" replacement, no work is needed outside of the kernel. If we want to move into the realm of "connected standby", then AFAIK work will be needed, but what is needed has not been scoped out.
- Prospect of kernel support to separate swap and hibernation
partitions (and/or swap files)? Or systemd method of creating then activating swapfiles on demand?
- Prospect of hibernation supported with UEFI Secure Boot?
- Is hibernation a better fallback than poweroff, given the
significant reliability differential? Why? Poweroff is universal, hibernation isn't. What's the argument that a non-universally available fallback is better than the universal fallback?
- What are the implications of hibernation if Fedora will move to
measured boot? (I'm not sure how mainstream that function is expected to be, or whether it's a use-case-specific opt-in.)
- There's some anecdotal evidence users are disabling UEFI Secure
Boot, possibly quite a lot [2]. Does there need to be an effort at making the signing of user-built kernels and modules easier? Can it be made easier? I don't know whether custom or out-of-tree modules are a significant motive for disabling SB, vs other explanations.
The problem is that disabling secure boot is a lot easier than enrolling your own keys. Granted, we could make generating and signing with your own key easier, but users still end up fighting with the firmware UI to get their own key enrolled.
A better question is why people are disabling secure boot (outside of people running their own kernels, like me). Some possible answers outside of our control:
1) VirtualBox hypervisor kernel modules
2) nvidia binary driver
The question is are there other reasons where we could do better so that people do not feel the need to disable secure boot?
- A systemd TODO makes me wonder: Does anyone have (corroborating)
data on the reliability of firmware or battery reporting, when the system, and thus the battery, are under significant load? [3] I've discussed with a reliable source that on 2+ year old hardware, the vast majority of batteries are effectively broken and aren't likely to report anything reliably if they are under significant load, in particular waking from S3. By anything, I mean time remaining, power remaining, and current power consumption rate. Would s2idle instead of S3 make this more reliable?
- It doesn't sound like S1 is really used at all, even though kernel
docs say it's supported as shallow/standby. (?) Is it more or less reliable than S3?
I don't think I've ever seen hardware where the ACPI tables claim S1 is supported. S1 and S2 were a cute idea when the spec was written, but in practice they are never used (AFAIK).
- I'm inclined to think we should mimic what hardware vendors,
Microsoft, Apple, and Google (with Chromebooks and Android) have been doing for a while: faster boots, and S0 low power idle - and skip the things making devs and users crazy. But I invite a persuasive contrary argument.
Ack. As long as we are trying to run on hardware designed for Windows, the kernel-side answer to getting things to work (at least somewhat) reliably has always been to figure out what Windows does and mimic it. Note that a lot of work has been done on S0 low power idle support upstream; with recent kernels that should work reasonably well.
Regards,
Hans
On Mon, Feb 10, 2020 at 10:51 AM Hans de Goede hdegoede@redhat.com wrote:
A better question is why people are disabling secure boot (outside of people running their own kernels, like me). Some possible answers outside of our control:
- VirtualBox hypervisor kernel modules
- nvidia binary driver
The question is are there other reasons where we could do better so that people do not feel the need to disable secure boot?
I disable SB so that I can use hibernation. Suspend to RAM is unreliable on my desktop (no idea why, probably misbehaving hardware) and hibernation is my only option to not lose any in-progress work. And it's also faster and more convenient than starting all apps again, placing them into workspaces, loading all pinned browser tabs, etc.
(Now having some regret that I started two threads :P it seemed logical at the time.)
https://lore.kernel.org/linux-mm/20191226220205.128664-2-semenzato@google.co...
Wow.
Let's say someone implements paging out anything above 50% RAM usage to swap. Let's further say someone implements signed hibernation images to support UEFI Secure Boot. Why couldn't an attacker target the unsigned swap contents?
The reason, at least what I found out, is that the kernel first copies the whole of the in-use memory into free memory (ouch) before writing it to the swap partition. So if you have 16 GB RAM, you can't hibernate if you use more than 8 GB, and therefore an 8 GB swap partition (fully unoccupied) is enough for you. I have tested this multiple times, and it works for me exactly as written. So, actually, a swap device sized 1:1 with RAM is already overkill (unless you use more than 50% of the swap size with just regular usage), and a 0.5:1 ratio would be perfectly fine if you made sure that swap got used just for hibernation.
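A rough feasibility check matching this description can be sketched as follows (a sketch only; the ~50% threshold and the 0.5x compression ratio are assumptions taken from this discussion, not kernel constants):

```python
GIB = 2**30

def can_hibernate(ram_bytes, used_bytes, free_swap_bytes, compress_ratio=0.5):
    """Rough hibernation feasibility check per the description above:
    the kernel first copies in-use memory into free RAM, so used memory
    must stay under ~50% of RAM, and the (compressed) image must then
    fit into free swap. compress_ratio is a guess, not a guarantee."""
    if used_bytes > ram_bytes / 2:
        return False  # not enough free RAM to build the snapshot
    image_estimate = used_bytes * compress_ratio
    return image_estimate <= free_swap_bytes

# 16 GB RAM with 6 GB used and 8 GB of free swap: the snapshot fits
print(can_hibernate(16 * GIB, 6 * GIB, 8 * GIB))   # True
# 16 GB RAM with 10 GB used: over the ~50% threshold, hibernate aborts
print(can_hibernate(16 * GIB, 10 * GIB, 8 * GIB))  # False
```

This also illustrates why a 0.5:1 swap:RAM ratio suffices when swap is used only for hibernation: the snapshot can never exceed half of RAM.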
Where I came up with 2:1 is from anaconda/blivet code: anaconda/pyanaconda/storage/utils.py:642: :param bool hibernation: calculate swap size big enough for hibernation https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/storage/utils...
Note on line 673 it actually could be 3x RAM, if --hibernation were used, but this flag isn't used on Fedora Workstation so this computation never gets used. And yet there is a 'resume=UUID' boot parameter included. Why is this boot parameter set as if we're supporting hibernation out of the box?
-- Chris Murphy
On Mon, Feb 10, 2020 at 1:42 PM Chris Murphy lists@colorremedies.com wrote:
https://lore.kernel.org/linux-mm/20191226220205.128664-2-semenzato@google.co...
Wow.
Summarizes the "hibernation image creation only works if you haven't used more than 50% RAM" problem. https://marc.info/?l=linux-kernel&m=157177497015315
And all I can say is wow yet again...
On Mon, Feb 10, 2020 at 9:43 PM Chris Murphy lists@colorremedies.com wrote:
Where I came up with 2:1 is from anaconda/blivet code: anaconda/pyanaconda/storage/utils.py:642: :param bool hibernation: calculate swap size big enough for hibernation
https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/storage/utils...
Note on line 673 it actually could be 3x RAM, if --hibernation were used, but this flag isn't used on Fedora Workstation so this computation never gets used.
I guess that's because if you search the Internets, the swap recommendations are all over the place, so some median numbers were picked :-) I had no idea about the hibernation limitation either, until the web browsers started to eat 4+GBs RAM... :-)
The problem is that you don't know how much swap the system will use during regular usage, so you don't know how much headroom you need on top of the 0.5x mem size needed for hibernation. I think nowadays a new formula could be devised, something like `swap size = 0.5x mem size + 2 GB`. I think that's plenty for regular usage (even SSDs are too slow to be used as RAM, not to mention it will wear them out quickly) and it should allow for hibernation most of the time.
And yet there is a 'resume=UUID' boot parameter included. Why is this boot parameter set as if we're supporting hibernation out of the box?
Probably because of this? https://bugzilla.redhat.com/show_bug.cgi?id=1206936 https://github.com/rhinstaller/anaconda/pull/1360
And yes, we currently support hibernation out of the box, and it works. If you don't have SecureBoot enabled, and if you use non-GNOME or know how to call it from GNOME.
On Tue, Feb 11, 2020 at 3:00 AM Kamil Paral kparal@redhat.com wrote:
On Mon, Feb 10, 2020 at 9:43 PM Chris Murphy lists@colorremedies.com wrote:
Where I came up with 2:1 is from anaconda/blivet code: anaconda/pyanaconda/storage/utils.py:642: :param bool hibernation: calculate swap size big enough for hibernation https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/storage/utils...
Note on line 673 it actually could be 3x RAM, if --hibernation were used, but this flag isn't used on Fedora Workstation so this computation never gets used.
I guess that's because if you search the Internets, the swap recommendations are all over the place, so some median numbers were picked :-) I had no idea about the hibernation limitation either, until the web browsers started to eat 4+GBs RAM... :-)
The problem is that you don't know how much swap the system will use during regular usage, so you don't know how much headroom you need on top of the 0.5x mem size needed for hibernation. I think nowadays a new formula could be devised, something like `swap size = 0.5x mem size + 2 GB`. I think that's plenty for regular usage (even SSDs are too slow to be used as RAM, not to mention it will wear them out quickly) and it should allow for hibernation most of the time.
I don't think that's enough to be reliable.
Example system: 32G RAM, all of it used, plus 2G of page outs (into the swap device).
+ 2G already paged out to swap
+ 16GB needs to be paged out to swap, to free up enough memory to create the hibernation image
+ 8-16GB for the (compressed) hibernation image to be written to a *contiguous* range within swap
That's 26-34G needed for the swap device. Since the swap device is shared for pages and hibernation image, the actual size isn't knowable until the approximate time hibernation is called.
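The arithmetic above can be made concrete with a small sketch (the figures and the 0.5-1.0x compression range come from the example; proposed_swap is the "0.5x mem size + 2 GB" formula quoted earlier):

```python
GIB = 2**30

def proposed_swap(ram):
    # the quoted suggestion: swap size = 0.5x mem size + 2 GB
    return ram // 2 + 2 * GIB

def worst_case_needed(ram, paged_out):
    """Swap needed in the worst case sketched above: existing page-outs,
    plus RAM/2 paged out to get under the ~50% threshold, plus the
    compressed image (somewhere between 0.5x and 1.0x of RAM/2)."""
    base = paged_out + ram // 2
    return base + ram // 4, base + ram // 2

ram = 32 * GIB
lo, hi = worst_case_needed(ram, paged_out=2 * GIB)
print(lo // GIB, hi // GIB)        # 26 34
print(proposed_swap(ram) // GIB)   # 18 -- well short of the worst case
```

The gap between 18G and 26-34G is the point: a fixed formula cannot anticipate how much swap is already occupied at the moment hibernation is requested.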
As Bastien noted in the pagure/workstation issue: https://pagure.io/fedora-workstation/issue/121#comment-620831
I've started a followup thread upstream to get more information about all of this. Kamil, can you do a brief write up of your use case or reproduce steps? I want upstream to know that your case is real world, and not some already known contrived case. You can either reply here and I'll reference it on linux-mm@ or you can post directly to this thread if you prefer. Thanks.
https://lore.kernel.org/linux-mm/CAA25o9TvFMEJnF45NFVqAfdxzKy5umzHHVDs+SCxrC...
And yet there is a 'resume=UUID' boot parameter included. Why is this boot parameter set as if we're supporting hibernation out of the box?
Probably because of this? https://bugzilla.redhat.com/show_bug.cgi?id=1206936 https://github.com/rhinstaller/anaconda/pull/1360
Right, but what I mean is that anaconda is not called with --hibernation. The higher partition requirements for hibernation are tied to the --hibernation flag, but the resume= boot parameter insertion is not. It seems to me those things should both be tied to --hibernation, that whether --hibernation is used should be an edition-specific decision, and that it should be explicit, because otherwise we're just setting users up for misadventure.
And yes, we currently support hibernation out of the box, and it works. If you don't have SecureBoot enabled, and if you use non-GNOME or know how to call it from GNOME.
It definitely doesn't work in all cases, and we don't have any evidence that it works in most cases. I have a laptop that 100% of the time fails to resume from hibernation because the hibernation image is considered corrupt by the kernel. I don't know if it's corrupted during image write-out, or read-in. S4 relies on ACPI, and isn't even reliable on Windows or macOS 100% of the time. And as you've experienced, S3 certainly isn't reliable either.
Whereas S0 low power mode is reliable, which is why so much effort is going there. Just leave ACPI and the (logic board) firmware out of consideration.
I would rather hibernation work. But I don't think it's OK to, by default, create huge swap partitions when this very clearly is not at all reliable as evidenced by your own experience, which requires either luck or esoteric knowledge to get the active page amount below some ~50% threshold in order for hibernation image creation to succeed.
-- Chris Murphy
On Tue, Feb 11, 2020 at 11:48 PM Chris Murphy lists@colorremedies.com wrote:
On Tue, Feb 11, 2020 at 3:00 AM Kamil Paral kparal@redhat.com wrote:
On Mon, Feb 10, 2020 at 9:43 PM Chris Murphy lists@colorremedies.com
wrote:
Where I came up with 2:1 is from anaconda/blivet code: anaconda/pyanaconda/storage/utils.py:642: :param bool hibernation: calculate swap size big enough for hibernation
https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/storage/utils...
Note on line 673 it actually could be 3x RAM, if --hibernation were used, but this flag isn't used on Fedora Workstation so this computation never gets used.
I guess that's because if you search the Internets, the swap
recommendations are all over the place, so some median numbers were picked :-) I had no idea about the hibernation limitation either, until the web browsers started to eat 4+GBs RAM... :-)
The problem is that you don't know how much swap the system will use
during regular usage, so you don't know how much headroom you need on top of the 0.5x mem size needed for hibernation. I think nowadays a new formula could be devised, something like `swap size = 0.5x mem size + 2 GB`. I think that's plenty for regular usage (even SSDs are too slow to be used as RAM, not to mention it will wear them out quickly) and it should allow for hibernation most of the time.
I don't think that's enough to be reliable.
Example system: 32G RAM, all of it used, plus 2G of page outs (into the swap device).
- 2G already paged out to swap
- 16GB needs to be paged out to swap, to free up enough memory to
create the hibernation image
If I understood the kernel discussions correctly, there's currently no simple and reliable mechanism to achieve this (moving the excess memory to swap), so I wouldn't count it like this. If you have more than 50% of memory occupied, you're out of luck: hibernation will be a no-op (it would be nice if it returned some user-friendly GUI message instead of only a system journal entry). If you are under 50% memory utilization but your swap space is insufficient, the hibernation will also be aborted (this time not immediately, but only after it compresses the memory and finds out it doesn't fit into the free swap space). Again, some visible error message would be nice. It sounds like there are too many cases where the functionality fails, and that's true, but that's what we have, and I often consider it better to close a few extra tabs than to power off. But it's annoying and far from polished, yes.
- 8-16GB for the (compressed) hibernation image to be written to a
*contiguous* range within swap
That's 26-34G needed for the swap device. Since the swap device is shared for pages and hibernation image, the actual size isn't knowable until the approximate time hibernation is called.
Windows, AFAIK, pre-allocates the hibernation image and doesn't share it with swap space. It's very inefficient regarding disk size, but it improves reliability, obviously.
Btw, you have a very good remark that the memory image gets compressed. So for the hibernation image, you can know its maximum size, but you can't know the actual size: that depends on how well your memory compresses. And that's why you can often hibernate even with quite a small swap - it just compressed well.
As Bastien noted in the pagure/workstation issue: https://pagure.io/fedora-workstation/issue/121#comment-620831
I've started a followup thread upstream to get more information about all of this. Kamil, can you do a brief write up of your use case or reproduce steps? I want upstream to know that your case is real world, and not some already known contrived case. You can either reply here and I'll reference it on linux-mm@ or you can post directly to this thread if you prefer. Thanks.
https://lore.kernel.org/linux-mm/CAA25o9TvFMEJnF45NFVqAfdxzKy5umzHHVDs+SCxrC...
Hmm, what exactly do you want to know? :) Why I use hibernation?
On my desktop:
* Due to some hardware/firmware flakiness I can't suspend to RAM, because I sometimes get random memory corruption on resume (note: this is not faulty hardware in the usual sense; the problem occurs *only* when suspending to RAM, everything is super reliable otherwise). So I hibernate. I don't want to log in and open all the windows every time I return to the computer (which would be the case if I powered it off). Even if I didn't have flaky hardware, I'd sometimes want to suspend to RAM and sometimes hibernate, depending on whether I want to turn off the power strip at that time (e.g. when leaving the flat for a few days, during a big storm, or when having other appliances in the same power strip and wanting to save power).
On my laptop:
* I use suspend to RAM, and it works well. But occasionally the power drain is surprisingly high during suspend (perhaps a firmware issue) and the battery can be depleted overnight. Usually it can last 2-3 days, which is fine during the work week, but it might be risky over the weekend. That requires you to keep track of the battery level when suspending (especially on Fridays), or simply power it off. I don't want to say it's a problem, it's not, just not as comfortable as on Windows. A hybrid sleep would work wonders here: you simply suspend, and it hibernates after a few hours automatically. However, GNOME doesn't actually allow me to configure it, and it even overrides systemd configs. I'd ideally want to pick the action that suits my current need (suspend, hybrid sleep, hibernate), with a configurable default for closing the lid.
On my parents'/wife's laptop:
* This is the most difficult use case. They can't and will not think about battery level and how suspend works when closing the laptop lid. They expect the system to behave intelligently. If they come back to the laptop in a few days and it's drained to 0%, so that it won't even turn on and all the progress in opened applications has been lost, that's a big failure. They don't consider it their fault ("I should have powered it off instead") but the system's fault. After all, this situation doesn't happen on Windows. For these regular users, I believe hybrid sleep is a necessity. Every time I install Linux for a new person, I have to note that they need to be much more careful about power management and using suspend.
And yet there is a 'resume=UUID' boot parameter included. Why is this boot parameter set as if we're supporting hibernation out of the box?
Probably because of this? https://bugzilla.redhat.com/show_bug.cgi?id=1206936 https://github.com/rhinstaller/anaconda/pull/1360
Right but anaconda is not called with --hibernation is what I mean. The higher partition requirements for hibernation are tied to the --hibernation flag, but the resume= boot parameter insertion is not. And it seems to me those things should be tied to --hibernation, and then also whether --hibernation is used should be an edition specific decision, and that it should be explicit because otherwise we're just setting users up for misadventure.
I somewhat disagree. The --hibernation flag just affects swap size. A larger swap makes it a bit more likely that hibernation will work, yes, but that doesn't mean it can't work with a smaller swap. The resume= parameter is required in all cases, whether you target a system with a large or a medium swap size by default. Also, don't forget these are just default values; users can change them during partitioning. It would be sad if a user intentionally configured a large swap because she knows she wants to hibernate, and it didn't work just because the resume= parameter was missing, right? If some edition wants to prevent hibernation (which alone sounds like a bad idea), they can do it better by not offering the GUI option, or in the extreme case by somehow overriding the "systemctl hibernate" functionality (let's not do that). If you allow hibernation but break resume (by omitting resume=), the user has just lost some data.
And yes, we currently support hibernation out of the box, and it works.
If you don't have SecureBoot enabled, and if you use non-GNOME or know how to call it from GNOME.
It definitely doesn't work in all cases, and we don't have any evidence that it works in most cases.
Yes, I meant it works in general, except for all the usual hardware issues of course (which are many, yes; I myself have ~50% success rate with my past hardware).
I have a laptop that 100% of the time fails to resume from hibernation because the hibernation image is considered corrupt by the kernel. I don't know if it's corrupted during image write-out, or read-in. S4 relies on ACPI, and isn't even reliable on Windows or macOS 100% of the time. And as you've experienced, S3 certainly isn't reliable either.
Whereas S0 low power mode is reliable, which is why so much effort is going there. Just leave ACPI and the (logic board) firmware out of consideration.
I would rather hibernation work. But I don't think it's OK to, by default, create huge swap partitions when this very clearly is not at all reliable as evidenced by your own experience, which requires either luck or esoteric knowledge to get the active page amount below some ~50% threshold in order for hibernation image creation to succeed.
Yes, and I have no objections to lowering the default swap size (and thus making hibernation very unlikely to work) for these very reasons. But I'd like to keep it "functional" (except hardware/firmware issues) out of the box if the user decides to create a larger swap during installation (for this use case or some other).
On Thu, Feb 13, 2020 at 9:32 AM Kamil Paral kparal@redhat.com wrote:
On Tue, Feb 11, 2020 at 11:48 PM Chris Murphy lists@colorremedies.com wrote:
Kamil, can you do a brief write up of your use case or reproduce steps? I want upstream to know that your case is real world, and not some already known contrived case. You can either reply here and I'll reference it on linux-mm@ or you can post directly to this thread if you prefer. Thanks.
https://lore.kernel.org/linux-mm/CAA25o9TvFMEJnF45NFVqAfdxzKy5umzHHVDs+SCxrC...
Hmm, what exactly do you want to know? :) Why I use hibernation?
Reproduce steps for hibernation failing. I don't know if it's as easy as "Launch Firefox, and load 34 pages of bbc.com, then try to hibernate with 'systemctl hibernate'" - and voila, not enough memory error.
Basically demonstrate to upstream that this is busted in the kernel, if it's their intention that the portion of unevictable pages should be swapped out so there's enough room for hibernate to succeed. And if not, then what's the facility that needs to be invented (systemd and DE coordination?) for it to succeed?
On Thu, Feb 13, 2020 at 11:16 PM Chris Murphy lists@colorremedies.com wrote:
On Thu, Feb 13, 2020 at 9:32 AM Kamil Paral kparal@redhat.com wrote:
On Tue, Feb 11, 2020 at 11:48 PM Chris Murphy lists@colorremedies.com
wrote:
Kamil, can you do a brief write up of your use case or reproduce steps? I want upstream to know that your case is real world, and not some already known contrived case. You can either reply here and I'll reference it on linux-mm@ or you can post directly to this thread if you prefer. Thanks.
https://lore.kernel.org/linux-mm/CAA25o9TvFMEJnF45NFVqAfdxzKy5umzHHVDs+SCxrC...
Hmm, what exactly do you want to know? :) Why I use hibernation?
Reproduce steps for hibernation failing. I don't know if it's as easy as "Launch Firefox, and load 34 pages of bbc.com, then try to hibernate with 'systemctl hibernate'" - and voila, not enough memory error.
Yes, in my experience it is that easy. If I look into htop before hibernating and I'm over 50% usage (not counting buffers/cache; I think those get flushed before hibernating), the hibernation doesn't succeed. Often it is just a matter of closing some browser tabs so that I fall under the 50% threshold, and hibernation works fine.

I'm sure there has to be some tool to allocate X amount of memory so that this is easier to play with than relying on a web browser, but I don't know it (it should be trivial to write for anyone who knows C, though). But I found out you can use GIMP to upscale an image to a ridiculous size, and it will even tell you beforehand in a warning dialog how much memory it will consume.

This is what I see if the memory snapshot can't be created (over 50% memory used):
Feb 14 12:03:26 titan systemd[1]: Starting Hibernate...
Feb 14 12:03:26 titan systemd-sleep[3424]: Suspending system...
Feb 14 12:03:26 titan kernel: PM: hibernation entry
Feb 14 12:03:26 titan rtkit-daemon[713]: Successfully made thread 3423 of process 1482 (/usr/bin/pulseaudio) owned by '1000' RT at priority 5.
Feb 14 12:03:26 titan rtkit-daemon[713]: Supervising 4 threads of 2 processes of 1 users.
Feb 14 12:03:28 titan kernel: Filesystems sync: 0.019 seconds
Feb 14 12:03:28 titan kernel: Freezing user space processes ... (elapsed 0.001 seconds) done.
Feb 14 12:03:28 titan kernel: OOM killer disabled.
Feb 14 12:03:28 titan kernel: PM: Marking nosave pages: [mem 0x00000000-0x00000fff]
Feb 14 12:03:28 titan kernel: PM: Marking nosave pages: [mem 0x00058000-0x00058fff]
Feb 14 12:03:28 titan kernel: PM: Marking nosave pages: [mem 0x0009f000-0x000fffff]
Feb 14 12:03:28 titan kernel: PM: Marking nosave pages: [mem 0xca0d8000-0xca0d8fff]
Feb 14 12:03:28 titan kernel: PM: Marking nosave pages: [mem 0xca0e8000-0xca0e9fff]
Feb 14 12:03:28 titan kernel: PM: Marking nosave pages: [mem 0xca106000-0xca106fff]
Feb 14 12:03:28 titan kernel: PM: Marking nosave pages: [mem 0xca66e000-0xca674fff]
Feb 14 12:03:28 titan kernel: PM: Marking nosave pages: [mem 0xcb151000-0xcb6a9fff]
Feb 14 12:03:28 titan kernel: PM: Marking nosave pages: [mem 0xd15c7000-0xd160bfff]
Feb 14 12:03:28 titan kernel: PM: Marking nosave pages: [mem 0xddd9c000-0xdde33fff]
Feb 14 12:03:28 titan kernel: PM: Marking nosave pages: [mem 0xdde80000-0xdf7fefff]
Feb 14 12:03:28 titan kernel: PM: Marking nosave pages: [mem 0xdf800000-0xffffffff]
Feb 14 12:03:28 titan kernel: PM: Basic memory bitmaps created
Feb 14 12:03:28 titan kernel: PM: Preallocating image memory...
Feb 14 12:03:28 titan kernel: PM: Basic memory bitmaps freed
Feb 14 12:03:28 titan kernel: OOM killer enabled.
Feb 14 12:03:28 titan kernel: Restarting tasks ... done.
Feb 14 12:03:28 titan kernel: PM: hibernation exit
Feb 14 12:03:28 titan systemd-sleep[3424]: Failed to suspend system. System resumed again: Cannot allocate memory
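The memory-allocation tool mentioned above is indeed trivial. A minimal sketch in Python (names and usage are hypothetical; bytearray is used because it writes the pages, not just reserves them):

```python
"""Tiny memory hog: allocate and touch a given number of GiB so that
hibernation behaviour near the ~50% threshold can be tested without
relying on a web browser."""
import time

def allocate(gib: float) -> bytearray:
    # bytearray both allocates and zero-fills the buffer, so the pages
    # are actually resident rather than merely reserved by overcommit.
    return bytearray(int(gib * 2**30))

def hold(gib: float, seconds: float = 3600.0) -> None:
    """Allocate the memory and keep a reference alive while sleeping,
    leaving time to run 'systemctl hibernate' in another terminal."""
    hog = allocate(gib)
    print(f"holding {len(hog)} bytes; waiting {seconds} s")
    time.sleep(seconds)

# Example (hypothetical): hold(8) occupies 8 GiB for an hour.
```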
On Fri, Feb 14, 2020 at 4:06 AM Kamil Paral kparal@redhat.com wrote:
On Thu, Feb 13, 2020 at 11:16 PM Chris Murphy lists@colorremedies.com wrote:
Reproduce steps for hibernation failing. I don't know if it's as easy as "Launch Firefox, and load 34 pages of bbc.com, then try to hibernate with 'systemctl hibernate'" - and voila, not enough memory error.
Yes, in my experience it is that easy. If I look into htop before hibernating and I'm over 50% usage (not counting buffers/cache
I'm not familiar with htop. If I run it without options, I see on the top left my CPUs followed by "Mem" which seems to be the same thing as /proc/meminfo's "Active(file):" value (if I divide by 1MiB). Is your 50% usage based on this "Mem" reporting?
There isn't a corresponding value in top, the "Mem ... Used" value is totally different there.
Feb 14 12:03:28 titan kernel: PM: hibernation exit Feb 14 12:03:28 titan systemd-sleep[3424]: Failed to suspend system. System resumed again: Cannot allocate memory
This is a systemd error, without a corresponding kernel message. So is it really a kernel complaint or problem? Looking at systemd ./src/sleep/sleep.c
220         r = write_state(&f, states);
221         if (r < 0)
222                 log_struct_errno(LOG_ERR, r,
223                                  "MESSAGE_ID=" SD_MESSAGE_SLEEP_STOP_STR,
224                                  LOG_MESSAGE("Failed to suspend system. System resumed again: %m"),
225                                  "SLEEP=%s", arg_verb);
I don't see where "Cannot allocate memory" comes from in systemd, but in the kernel:
./Documentation/admin-guide/sysctl/vm.rst:864: "fork: Cannot allocate memory"
which is in https://www.kernel.org/doc/html/v5.5/admin-guide/sysctl/vm.html under user_reserve_kbytes
Default values on Fedora:
$ cat /proc/sys/vm/user_reserve_kbytes
131072
$ cat /proc/sys/vm/overcommit_memory
0
Hmm. Anyway, I'm not sure what's running out of memory or why. Right before this, hibernation exit comes from
./power/hibernate.c:781: pr_info("hibernation exit\n");
Looks like it got to here:
732         error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
733         if (error || freezer_test_done)
734                 goto Free_bitmaps;
*shrug*
It definitely didn't get to writing out the hibernation image.
On Fri, Feb 14, 2020 at 9:40 PM Chris Murphy lists@colorremedies.com wrote:
Yes, in my experience it is that easy. If I look into htop before
hibernating and I'm over 50% usage (not counting buffers/cache
I'm not familiar with htop. If I run it without options, I see on the top left my CPUs followed by "Mem" which seems to be the same thing as /proc/meminfo's "Active(file):" value (if I divide by 1MiB). Is your 50% usage based on this "Mem" reporting?
There isn't a corresponding value in top, the "Mem ... Used" value is totally different there.
Well, you got me. I'm not too knowledgeable about memory counting on Linux. I simply look at the green portion of the Mem line in htop (should be the same as the Mem->Used: value) :-) I always thought it was the same value you can see in `free -h` under the "used" column, but now I see it's somewhat different, no idea why. Either way, when the used memory value is /roughly/ above 50%, hibernation consistently fails; if it is below, it consistently works. I never did any experiments to figure out whether the threshold is exactly 50%. But it roughly matched the kernel threads I found (and that you found) saying 50% of memory needs to be free in order to make a memory snapshot.
Feb 14 12:03:28 titan kernel: PM: hibernation exit
Feb 14 12:03:28 titan systemd-sleep[3424]: Failed to suspend system. System resumed again: Cannot allocate memory
This is a systemd error, without a corresponding kernel message. So is it really a kernel complaint or problem? Looking at systemd ./src/sleep/sleep.c
220                 r = write_state(&f, states);
221                 if (r < 0)
222                         log_struct_errno(LOG_ERR, r,
223                                          "MESSAGE_ID=" SD_MESSAGE_SLEEP_STOP_STR,
224                                          LOG_MESSAGE("Failed to suspend system. System resumed again: %m"),
225                                          "SLEEP=%s", arg_verb);
I don't see where "Cannot allocate memory" comes from in systemd, but in the kernel:
./Documentation/admin-guide/sysctl/vm.rst:864: "fork: Cannot allocate memory", which is in https://www.kernel.org/doc/html/v5.5/admin-guide/sysctl/vm.html under user_reserve_kbytes.
Default values on Fedora:
$ cat /proc/sys/vm/user_reserve_kbytes
131072
$ cat /proc/sys/vm/overcommit_memory
0
Hmm. Anyway, I'm not sure what's running out of memory or why. Right before this, hibernation exit comes from
./power/hibernate.c:781: pr_info("hibernation exit\n");
Looks like it got to here:
732         error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
733         if (error || freezer_test_done)
734                 goto Free_bitmaps;
*shrug*
It definitely didn't get to writing out the hibernation image.
Well that would match the expectation that it can't create the memory snapshot, no? When I have more than 50% of memory occupied, it can't make another copy of the used memory, and that's why it can't write the hibernation image (compressing it in the process), and instead it returns with an error. This was my assumption and it seems trivial to reproduce (on my machines), but the low-level debugging needs to be done by someone better than me. I'm happy to help with the reproduction steps, if needed, of course.
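The rule of thumb described above can be sketched as a toy check (this is only the thread's heuristic, not the kernel's actual preallocation logic): with vm.swappiness=0 nothing gets paged out first, so the in-RAM snapshot copy only fits when free memory is at least as large as used memory.

```python
def snapshot_can_preallocate(mem_total_kib: int, used_kib: int) -> bool:
    """Heuristic from this thread: the snapshot needs roughly as much
    free RAM as there is data to copy, i.e. used <= ~50% of RAM."""
    free_kib = mem_total_kib - used_kib
    return free_kib >= used_kib

# 9 GiB used of 16 GiB: fails, matching the 'Cannot allocate memory' case
print(snapshot_can_preallocate(16 * 2**20, 9 * 2**20))  # False
# 6 GiB used of 16 GiB: succeeds
print(snapshot_can_preallocate(16 * 2**20, 6 * 2**20))  # True
```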
On Mon, Feb 17, 2020 at 2:53 AM Kamil Paral kparal@redhat.com wrote:
Well, you got me. I'm not too knowledgeable about memory counting on Linux. I simply look at the green portion of the Mem line in htop (should be the same as Mem->Used: value) :-) I always thought it was the same value you can see in `free -h` under "used" column, but now I see it's somewhat different, no idea why. Either way, when the used memory value is /roughly/ above 50%, hibernation consistently fails, if it is below, it consistently works. I never did any experiments to figure out whether the threshold is exactly 50%. But it roughly matched the kernel threads I found (and that you found) saying 50% memory needs to be free in order to make a memory snapshot.
As confusing as it is, /proc/meminfo is the most reliable source. I'm having too much difficulty with hibernation [1] on bare metal, so I'm going to try to find a reproducer in a VM.
But maybe you can beat me to it: capture /proc/meminfo prior to a hibernation attempt and again after it fails (used memory above 50%); then use whatever technique you normally use to get used memory below 50%, capture another /proc/meminfo, successfully hibernate and resume, then take a fourth /proc/meminfo.
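A possible capture helper for those snapshots (the script and its output directory are my own invention, not something from the thread):

```shell
#!/bin/sh
# Save a timestamped copy of /proc/meminfo; run once before each
# hibernation attempt and once after resume/failure to collect the
# four snapshots requested above.
set -eu
outdir="${1:-/tmp/meminfo-snapshots}"
mkdir -p "$outdir"
snap="$outdir/meminfo-$(date +%Y%m%d-%H%M%S)"
cat /proc/meminfo > "$snap"
echo "saved $snap"
```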
-- Chris Murphy
[1]
My test: Fedora Workstation 31, laptop with 8G RAM, 8G swap partition, fill up memory using Firefox tabs pointed to various websites, and then I followed [1] to issue two commands:
# echo reboot > /sys/power/disk
# echo disk > /sys/power/state
I experience twice as many failures as successes. Curiously, the successes show pageout does happen. Before hibernate there is no swap in-use, but after resume ~2GiB swap is in-use and RAM usage is about 50%.
The (entry) failures never fail gracefully. Last log entry indicates hibernation entry. Screen goes blank, heat and fans increase, and this is an indefinite state (I forced power off after 15 minutes). Not fail safe at all.
On Wed, Feb 19, 2020 at 2:12 PM Chris Murphy lists@colorremedies.com wrote:
But maybe you can beat me to it: capture /proc/meminfo prior to a hibernation attempt and again after it fails (used memory above 50%); then use whatever technique you normally use to get used memory below 50%, capture another /proc/meminfo, successfully hibernate and resume, then take a fourth /proc/meminfo.
Hibernation succeeds with qemu-kvm, using 'virsh edit uefivm' such that:
<pm>
  <suspend-to-mem enabled='yes'/>
  <suspend-to-disk enabled='yes'/>
</pm>
Memory pretty close to 90% full, anon pages consume 73%, perhaps a couple hundred MiB swap is used. And upon:
# echo reboot > /sys/power/disk
# echo disk > /sys/power/state
The system appears to page out to swap (indirect observation based on significant swap device utilization after resume, and that same amount being written to the raw backing file for the VM), produce a hibernation image, reboot, and resume. Not flawless, the guest qxl kernel driver gets mad, but it seems non-fatal. The resume very quickly brings me back to the desktop, and restores ssh sessions - without passphrases required. (Likely a consequence of neither systemd nor GNOME initiating hibernation.)
Thus, I can't reproduce the problem where > 50% RAM use results in hibernation failure. Two possibilities:
A. Can you first get it to fail using the usual way you instigate hibernation and then try with the above two commands? I wonder if this limitation you're running into is a systemd one, and if so, whether it's a bug or exists for some reason. You might lose data for all I know, so take precautions.
B. I wonder if the > 50% RAM used problem only begins once a certain amount of memory is installed, e.g. some kind of maximum amount of anon page-outs(?). I've asked on linux-mm@
On Wed, Feb 19, 2020 at 8:11 PM Chris Murphy lists@colorremedies.com wrote:
Thus, I can't reproduce the problem where > 50% RAM use results in hibernation failure. Two possibilities:
A. Can you first get it to fail using the usual way you instigate hibernation and then try with the above two commands? I wonder if this limitation you're running into is a systemd one, and if so, whether it's a bug or exists for some reason. You might lose data for all I know, so take precautions.
I tried to hibernate with systemctl hibernate, with RAM full and some swap used. Still works.
On Wed, Feb 19, 2020 at 10:13 PM Chris Murphy lists@colorremedies.com wrote:
My test: Fedora Workstation 31, laptop with 8G RAM, 8G swap partition, fill up memory using Firefox tabs pointed to various websites, and then I followed [1] to issue two commands:
# echo reboot > /sys/power/disk
# echo disk > /sys/power/state
I experience twice as many failures as successes. Curiously, the successes show pageout does happen. Before hibernate there is no swap in-use, but after resume ~2GiB swap is in-use and RAM usage is about 50%.
Sigh. Turns out this is "my" mistake. 🤦 Hibernation apparently gets affected by sysctl value vm.swappiness, in my case vm.swappiness=0. When the value is zero, the hibernation never swaps out the extra memory over 50% and therefore can't hibernate. When I set it to any positive value (including 1), it works as you described. And all those people on kernel mailing lists probably also used vm.swappiness=0 and didn't realize. This might even be a kernel bug, because the documentation doesn't specify this should affect hibernation behavior, and I'd expect it _should_ affect only live system usage and not hibernation. But I can't really tell.
I'm sorry for having confused this discussion :-/
In case it's interesting, my testing approach was to open up GIMP, open a picture, and enlarge it to 40000 px wide, which takes over 8 GB RAM. In total, I then have about 9 GB RAM usage out of 16 GB total. Then I issued "systemctl hibernate". With vm.swappiness=0, there's the out of memory error I already posted before. I tested that the same occurs when I directly instruct the kernel to hibernate:

# echo disk > /sys/power/state
-bash: echo: write error: Cannot allocate memory
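For anyone without a conveniently huge GIMP image at hand, similar memory pressure can be approximated with a small allocator (a hypothetical test helper; the 64 MiB here is deliberately tiny, scale it up to actually fill RAM before running "systemctl hibernate"):

```python
# Hypothetical helper: hold N MiB of anonymous memory to push "used"
# past the ~50% threshold before attempting hibernation.
def fill_memory(fill_mib: int) -> list[bytearray]:
    chunks = []
    for _ in range(fill_mib):
        # non-zero fill so the pages are really backed (and compress
        # worse than all-zero pages would)
        chunks.append(bytearray(b"\xaa" * (1 << 20)))
    return chunks

hog = fill_memory(64)  # demonstration size only
print(sum(len(c) for c in hog) // (1 << 20))  # 64 (MiB held)
```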
When I switch to vm.swappiness=1 (or the default 60), I can hibernate just fine, and I resume with ~6GB RAM in memory and ~3GB in swap. If it is still relevant, I can provide exact numbers from /proc/meminfo. But I guess now that we see it doesn't affect people by default, it's no longer that important. This also invalidates most of my previous suggestions about ideal swap size. OTOH I'm very happy that you proved me wrong and I discovered this, because now I can again hibernate even when my memory is quite full.
On Thu, Feb 20, 2020 at 10:33 AM Kamil Paral kparal@redhat.com wrote:
On Wed, Feb 19, 2020 at 10:13 PM Chris Murphy lists@colorremedies.com wrote:
My test: Fedora Workstation 31, laptop with 8G RAM, 8G swap partition, fill up memory using Firefox tabs pointed to various websites, and then I followed [1] to issue two commands:
# echo reboot > /sys/power/disk
# echo disk > /sys/power/state
I experience twice as many failures as successes. Curiously, the successes show pageout does happen. Before hibernate there is no swap in-use, but after resume ~2GiB swap is in-use and RAM usage is about 50%.
Sigh. Turns out this is "my" mistake. 🤦 Hibernation apparently gets affected by sysctl value vm.swappiness, in my case vm.swappiness=0. When the value is zero, the hibernation never swaps out the extra memory over 50% and therefore can't hibernate. When I set it to any positive value (including 1), it works as you described. And all those people on kernel mailing lists probably also used vm.swappiness=0 and didn't realize. This might even be a kernel bug, because the documentation doesn't specify this should affect hibernation behavior, and I'd expect it _should_ affect only live system usage and not hibernation. But I can't really tell.
I'm sorry for having confused this discussion :-/
Nope, it's a major clue! This comes up all the time with performance complaints about swap, oh hey throw this vm.swappiness=0 spaghetti at the wall. It's entirely plausible this is the origin of this 50% business. I'll ask about it on linux-mm@. And I think you're correct to point out that this needs to be a documented consequence of using this value.
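Until that's documented (or fixed), anyone who wants swapping strongly discouraged without losing hibernation could use 1 rather than 0, per Kamil's result. A sketch of a persistent drop-in (the file name is arbitrary; the sysctl.d mechanism itself is standard):

```ini
# /etc/sysctl.d/99-swappiness.conf
# vm.swappiness=0 was observed in this thread to prevent hibernation
# from paging memory out; 1 still avoids swapping under normal load
# but allowed hibernation to succeed in Kamil's tests.
vm.swappiness = 1
```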
But how in the world did you get suspicious of your custom vm.swappiness value? :D
In case it's interesting, my testing approach was to open up GIMP, open a picture, and enlarge it to 40000 px wide, which takes over 8 GB RAM. In total, I then have about 9 GB RAM usage out of 16 GB total. Then I issued "systemctl hibernate". With vm.swappiness=0, there's the out of memory error I already posted before. I tested that the same occurs when I directly instruct the kernel to hibernate:

# echo disk > /sys/power/state
-bash: echo: write error: Cannot allocate memory
When I switch to vm.swappiness=1 (or the default 60), I can hibernate just fine, and I resume with ~6GB RAM in memory and ~3GB in swap. If it is still relevant, I can provide exact numbers from /proc/meminfo. But I guess now that we see it doesn't affect people by default, it's no longer that important. This also invalidates most of my previous suggestions about ideal swap size. OTOH I'm very happy that you proved me wrong and I discovered this, because now I can again hibernate even when my memory is quite full.
I'm vaguely curious whether it's the combination of vm.swappiness=0 and /proc/meminfo AnonPages being > 50% of MemTotal that results in this failure. However...
Even in the idealized VM environment, I've had a couple failures just after hibernation entry where the VM hangs indefinitely and has to be killed off. I didn't have a serial console setup to see if the problem can be captured via virsh console.
I think the biggest two sticking points for this issue:
A. Most x86_64 users have hardware with Secure Boot enabled out of the box; and Secure Boot and hibernation are mutually exclusive for the foreseeable future. https://pagure.io/fedora-workstation/issue/121#comment-627418
B. Before any work could progress on application state saving capability, as an alternative to hibernation, a cultural shift needs to happen. https://pagure.io/fedora-workstation/issue/121#comment-627281
Perhaps it's true that wishful thinking about hibernation encourages a reluctance to admit the application's role in automatically state saving. But hibernation isn't a data preserving function as much as it's a convenience. It can't save your data if hibernation fails entry or resume. It can't save your data with Secure Boot on. It can't save your data in event of a crash or power failure. And it can't save your data if you forget to save your data.
I think we should have a funeral for hibernation, and symbolically move on, in order to accept the reality that reliable user data and application state saving isn't going to happen as a system level function. Application developers have to do some of this or it isn't going to happen.
If we have Secure Boot compatible hibernation support in 5 years, what's achieved? None of the "it can't" listed above are addressed. Still.
And in everyone's defense, Microsoft and Apple struggled with this for nearly 20 years before they more or less gave up on the idea, and their modern apps now do mostly save state.
-- Chris Murphy
On Thu, Feb 20, 2020 at 9:28 PM Chris Murphy lists@colorremedies.com wrote:
Nope, it's a major clue! This comes up all the time with performance complaints about swap, oh hey throw this vm.swappiness=0 spaghetti at the wall. It's entirely plausible this is the origin of this 50% business. I'll ask about it on linux-mm@. And I think you're correct to point out that this needs to be a documented consequence of using this value.
But how in the world did you get suspicious of your custom vm.swappiness value? :D
Once you told me it works for you, it had to be something about my system. And there are not that many places where I changed some default values related to swapping...
Even in the idealized VM environment, I've had a couple failures just after hibernation entry where the VM hangs indefinitely and has to be killed off. I didn't have a serial console setup to see if the problem can be captured via virsh console.
I saw the same problem (the system not taking any input, having a black/frozen screen, its CPU going wild, and never powering off) even on bare metal. But it occurs extremely rarely. I assume it's some kind of firmware bug or a race condition in the kernel. Sometimes, after a hard poweroff and reboot, the system actually resumed! So the hibernation image was saved fine, just the power-off got stuck. At other times, it did not resume. It's inconvenient, but very rare for me. (I had much more trouble with suspend-to-RAM on my Thinkpad T480s, which often got automatically resumed a few minutes after suspend, until I figured out I needed to disable XHC in /proc/acpi/wakeup. The hibernation woes were golden compared to this.)
On Thu, Feb 20, 2020 at 10:33 AM Kamil Paral kparal@redhat.com wrote:
On Wed, Feb 19, 2020 at 10:13 PM Chris Murphy lists@colorremedies.com
wrote:
My test: Fedora Workstation 31, laptop with 8G RAM, 8G swap partition, fill up memory using Firefox tabs pointed to various websites, and then I followed [1] to issue two commands:
# echo reboot > /sys/power/disk
# echo disk > /sys/power/state
I experience twice as many failures as successes. Curiously, the successes show pageout does happen. Before hibernate there is no swap in-use, but after resume ~2GiB swap is in-use and RAM usage is about 50%.
Sigh. Turns out this is "my" mistake. 🤦 Hibernation apparently gets
affected by sysctl value vm.swappiness, in my case vm.swappiness=0. When the value is zero, the hibernation never swaps out the extra memory over 50% and therefore can't hibernate. When I set it to any positive value (including 1), it works as you described. And all those people on kernel mailing lists probably also used vm.swappiness=0 and didn't realize. This might even be a kernel bug, because the documentation doesn't specify this should affect hibernation behavior, and I'd expect it _should_ affect only live system usage and not hibernation. But I can't really tell.
It can still fail with vm.swappiness=60
https://lore.kernel.org/linux-mm/CAA25o9Q=36fiYHtbpcPPmGEPnORm2ZM7MfqRcsvNxs...
And I thought the mystery was resolved. Fun.
So, if you want to debug this further, here are my /proc/meminfo with vm.swappiness=0: https://pastebin.com/raw/Vb9RYGW2 https://pastebin.com/raw/q03XmRNA https://pastebin.com/raw/0D5VY2Em https://pastebin.com/raw/HJCFssEq
I wonder if the fellow Fedora
contributor's workload has a lot of file pages, so that discarding them is enough for the image allocator to succeed. In that case "sync; echo 1 > /proc/sys/vm/drop_caches" would be a better way of achieving the same result.
This is not the case; the scenario above works/fails the same even if I drop caches beforehand.
Here are memory printouts with vm.swappiness=1: https://pastebin.com/raw/AK7K209J https://pastebin.com/raw/3drsEQ7E
I also tried hibernating with almost completely full RAM (16 GB RAM mostly full, 16 GB swap empty), and not just slightly over half, and it worked perfectly (with vm.swappiness=1). After resume, there were half the pages stored in swap and the other half in RAM. So I don't really know why it doesn't work for Luigi with vm.swappiness=60.
On Thu, Feb 20, 2020 at 10:33 AM Kamil Paral kparal@redhat.com wrote:
On Wed, Feb 19, 2020 at 10:13 PM Chris Murphy lists@colorremedies.com wrote:
My test: Fedora Workstation 31, laptop with 8G RAM, 8G swap partition, fill up memory using Firefox tabs pointed to various websites, and then I followed [1] to issue two commands:
# echo reboot > /sys/power/disk
# echo disk > /sys/power/state
I experience twice as many failures as successes. Curiously, the successes show pageout does happen. Before hibernate there is no swap in-use, but after resume ~2GiB swap is in-use and RAM usage is about 50%.
Sigh. Turns out this is "my" mistake. 🤦 Hibernation apparently gets affected by sysctl value vm.swappiness, in my case vm.swappiness=0. When the value is zero, the hibernation never swaps out the extra memory over 50% and therefore can't hibernate. When I set it to any positive value (including 1), it works as you described. And all those people on kernel mailing lists probably also used vm.swappiness=0 and didn't realize. This might even be a kernel bug, because the documentation doesn't specify this should affect hibernation behavior, and I'd expect it _should_ affect only live system usage and not hibernation. But I can't really tell.
It can still fail with vm.swappiness=60
https://lore.kernel.org/linux-mm/CAA25o9Q=36fiYHtbpcPPmGEPnORm2ZM7MfqRcsvNxs...
On Thu, Feb 13, 2020, at 11:31 AM, Kamil Paral wrote:
On Tue, Feb 11, 2020 at 11:48 PM Chris Murphy lists@colorremedies.com wrote:
On Tue, Feb 11, 2020 at 3:00 AM Kamil Paral kparal@redhat.com wrote:
On Mon, Feb 10, 2020 at 9:43 PM Chris Murphy lists@colorremedies.com wrote:
Where I came up with 2:1 is from anaconda/blivet code: anaconda/pyanaconda/storage/utils.py:642: :param bool hibernation: calculate swap size big enough for hibernation https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/storage/utils...
Note on line 673 it actually could be 3x RAM, if --hibernation were used, but this flag isn't used on Fedora Workstation so this computation never gets used.
I guess that's because if you search the Internets, the swap recommendations are all over the place, so some median numbers were picked :-) I had no idea about the hibernation limitation either, until the web browsers started to eat 4+GBs RAM... :-)
The problem is that you don't know how much swap the system will use during regular usage, so you don't know how much headroom you need on top of the 0.5x mem size needed for hibernation. I think nowadays a new formula could be devised, something like `swap size = 0.5x mem size + 2 GB`. I think that's plenty for regular usage (even SSDs are too slow to be used as RAM, not to mention it will wear them out quickly) and it should allow for hibernation most of the time.
I don't think that's enough to be reliable.
Example system: 32G RAM, all of it used, plus 2G of page outs (into the swap device).
- 2G already paged out to swap
- 16GB needs to be paged out to swap, to free up enough memory to
create the hibernation image
If I understood the kernel discussions correctly, currently there's no simple and reliable mechanism to achieve this (move the excess memory to swap). So I wouldn't count it like this. If you have more than 50% memory occupied, you're out of luck; hibernation will be a no-op (it would be nice if it returned some user-friendly GUI message outside of the system journal). If you are under 50% memory utilization but your swap space is insufficient, the hibernation will also be aborted (this time not immediately, but only after it compresses the memory and finds out it doesn't fit into the free swap space). Again, some visible error message would be nice. It sounds like there are too many cases where the functionality fails, and that's true, but that's what we have, and often I consider it better to close a few extra tabs than having to power off. But it's annoying and far from polished, yes.
- 8-16GB for the (compressed) hibernation image to be written to a
*contiguous* range within swap
That's 26-34G needed for the swap device. Since the swap device is shared for pages and hibernation image, the actual size isn't knowable until the approximate time hibernation is called.
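As a quick sanity check of that arithmetic (GiB throughout; the 2:1 best-case compression ratio is the assumption from earlier in the thread), and comparing it against the 0.5x + 2 GB formula proposed above:

```python
ram = 32                     # GiB of RAM, all in use
already_swapped = 2          # GiB already paged out
must_page_out = ram / 2      # free half of RAM so the snapshot copy fits
image_min = ram / 4          # remaining 16 GiB compressed ~2:1
image_max = ram / 2          # remaining 16 GiB, incompressible worst case

print(already_swapped + must_page_out + image_min)  # 26.0
print(already_swapped + must_page_out + image_max)  # 34.0
print(0.5 * ram + 2)  # 18.0 -- the proposed default falls well short
```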
Windows, AFAIK, pre-allocates the hibernation image and doesn't share it with swap space. It's very inefficient regarding disk size, but it improves reliability, obviously.
Btw, you have a very good remark that the memory image gets compressed. So for hibernation image, you can know its max size, but you can't know the optimal size - that depends on how well your memory gets compressed. And that's why you can often hibernate even with quite small swap - it just compressed well.
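That observation can be made concrete (the ratios below are purely illustrative; the real ratio depends entirely on the memory contents):

```python
def image_size_gib(used_gib: float, compression_ratio: float) -> float:
    """Hibernation image size if memory compresses at the given ratio
    (compressed/original, e.g. 0.5 for 2:1)."""
    return used_gib * compression_ratio

# 8 GiB of in-use memory at various made-up compression ratios: the
# image may need anywhere from a fraction of that to the full amount
# of contiguous free swap.
for ratio in (0.25, 0.5, 1.0):
    print(ratio, image_size_gib(8, ratio))
```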
As Bastien noted in the pagure/workstation issue: https://pagure.io/fedora-workstation/issue/121#comment-620831
I've started a followup thread upstream to get more information about all of this. Kamil, can you do a brief write up of your use case or reproduce steps? I want upstream to know that your case is real world, and not some already known contrived case. You can either reply here and I'll reference it on linux-mm@ or you can post directly to this thread if you prefer. Thanks.
https://lore.kernel.org/linux-mm/CAA25o9TvFMEJnF45NFVqAfdxzKy5umzHHVDs+SCxrC...
Hmm, what exactly do you want to know? :) Why I use hibernation?
On my desktop:
- Due to some hardware/firmware flaky-ness I can't suspend to RAM,
because I sometimes get random memory corruption on resume (note: this is not faulty hardware in the usual sense, the problem occurs *only* when suspending to RAM, everything is super reliable otherwise). So I hibernate. I don't want to log in and open all the windows every time I return to the computer (that would be the case if I powered it off). Even if I didn't have flaky hardware, I'd want sometimes to suspend to RAM and sometimes hibernate, depending on whether I want to turn off the power strip at that time (e.g. when leaving the flat for a few days, during a big storm, when having other appliances in the same power strip and wanting to save power, etc).
On my laptop:
- I use suspend to RAM, and it works well. But occasionally the power
drain is surprisingly high during suspend (perhaps a firmware issue) and the battery can be depleted overnight. Usually it can last 2-3 days, which is fine during the work week, but it might be risky over the weekend. It requires you to keep track of the battery level when suspending (especially on Fridays), or simply power it off. I don't want to say it's a problem, it's not, just not as comfortable as on Windows. A hybrid sleep would work wonders here. You simply suspend it and it hibernates after a few hours automatically. However, GNOME doesn't actually allow me to configure it and it even overrides systemd configs. I'd ideally want to pick the right action that suits my current need (suspend, hybrid sleep, hibernate) with a configurable default for closing the lid.
This exactly. My current laptop burns 6% battery per hour while suspended, which means it's dead in under a day. My use case for hibernation is having my personal laptop in my bag all day while I work on company equipment, and during my commute. (I do use it during e.g., lunch, etc.) Resume speed is not critical.
My machine has a ton of memory and is usually under 50% usage, and if not, I'll just shut down some of the VMs I might have running.
And yet there is a 'resume=UUID' boot parameter included. Why is this boot parameter set as if we're supporting hibernation out of the box?
Probably because of this? https://bugzilla.redhat.com/show_bug.cgi?id=1206936 https://github.com/rhinstaller/anaconda/pull/1360
Right but anaconda is not called with --hibernation is what I mean. The higher partition requirements for hibernation are tied to the --hibernation flag, but the resume= boot parameter insertion is not. And it seems to me those things should be tied to --hibernation, and then also whether --hibernation is used should be an edition specific decision, and that it should be explicit because otherwise we're just setting users up for misadventure.
I somewhat disagree. The --hibernation flag just affects swap size. Larger swap makes it a bit more likely that hibernation will work, yes. But that doesn't mean it wouldn't work even with a smaller swap. The resume= parameter is required in all cases, whether you target a system that has large swap or medium swap size by default. Also, don't forget these are just default values; the users can change them during partitioning. It would be sad if the user intentionally configured large swap because she knows she wants to hibernate, and it wouldn't work just because the resume= parameter was missing, right? If some edition wants to prevent hibernation (that alone sounds like a bad idea), they can do it better by not offering the GUI option, or in the extreme case by somehow overriding the "systemctl hibernate" functionality (let's not do that). If you allow hibernation but break the resume (by omitting resume=), then the user has just lost some data.
This exactly happened to me because somehow the resume= option didn't get added or got lost. Eventually, I figured how to type it when booting, and later figured how to add it persistently.
V/r, James Cassell
On Thu, Feb 13, 2020 at 9:32 AM Kamil Paral kparal@redhat.com wrote:
If I understood the kernel discussions correctly, currently there's no simple and reliable mechanism to achieve this (move the excess memory to swap).
It's not clear what conditions they expect this to work automatically, or if they expect some other facility to do it (like memcgroups).
Windows, AFAIK, pre-allocates the hibernation image and doesn't share it with swap space. It's very inefficient regarding disk size, but it improves reliability, obviously.
hiberfil.sys and pagefile.sys are dynamically sized. hiberfil.sys is actually a generic image based on the current kernel and kernel drivers, and is used both for resuming from hibernation and for fast boot. There's a swapfile.sys that contains saved application state information, so that apps can quick-launch. The implementation is a bit different on macOS, but the functionality is analogous.
If hibernation innovation isn't going anywhere on linux, including no plan to have a Secure Boot compatible implementation of hibernation - it's not long term sustainable anyway. And maybe then the glib, glibc, and desktop folks, can have a serious discussion about how to make it possible or easier for applications to save state.
On my parents'/wife's laptop:
- This is the most difficult use case. They can't and will not think about battery level and how suspend works when closing the laptop lid. They expect the system to behave intelligently. If they come back to the laptop in a few days and it's drained to 0% so that it won't even turn on and all the progress in opened applications has been lost, that's a big failure. They don't consider it their fault ("I should have powered it off instead"), but the system fault. After all, this situation doesn't happen on Windows. For these regular users, I believe that hybrid sleep is a necessity. Every time I install Linux to a new person, I have to note that they need to be much more careful about power management and using suspend.
And I advocate for their position completely. It's not their fault. The present functionality is inadequate.
But also, Secure Boot is thus far considered a higher order necessity than s2h (suspend-to-hibernate), there's also a release blocking criterion for SB.
I somewhat disagree. The --hibernation flag just affects swap size. Larger swap makes it a bit more likely that hibernation will work, yes. But that doesn't mean it wouldn't work even with a smaller swap. The resume= parameter is required in all cases, whether you target a system that has large swap or medium swap size by default. Also, don't forget these are just default values; the users can change them during partitioning. It would be sad if the user intentionally configured large swap because she knows she wants to hibernate, and it wouldn't work just because the resume= parameter was missing, right? If some edition wants to prevent hibernation (that alone sounds like a bad idea), they can do it better by not offering the GUI option, or in the extreme case by somehow overriding the "systemctl hibernate" functionality (let's not do that). If you allow hibernation but break the resume (by omitting resume=), then the user has just lost some data.
I mean insofar as default/automatic partitioning, where the tentative idea is, no swap partition, and use swap-on-ZRAM instead. But if that seems dramatic, it's effectively the case already with UEFI Secure Boot systems - the resume parameter is pointless, and these users get a power off when the battery reaches a low threshold. That's data loss. Right? It's not sustainable long term, in my view, to keep ignoring both the lack of any meaningful hibernation support and users' data being tossed on a poweroff.
I know user data loss upon battery low threshold does not happen on Windows, macOS, and Chromebooks. Out of the box it's preserved.
Yes, and I have no objections to lowering the default swap size (and thus making hibernation very unlikely to work) for these very reasons. But I'd like to keep it "functional" (barring hardware/firmware issues) out of the box if the user decides to create a larger swap during installation (for this use case or some other).
If we're not going to take hibernation functionality seriously anyway, I question carving out so much space from people's systems by default. I have no complaint about custom partitioning making a swap partition by default at 1:1 ratio or whatever is discovered to be most reliable without being wasteful, and also setting a resume parameter.
-- Chris Murphy
On Fri, Feb 14, 2020 at 7:55 AM Chris Murphy lists@colorremedies.com wrote:
I mean insofar as default/automatic partitioning, where the tentative idea is, no swap partition, and use swap-on-ZRAM instead. But if that seems dramatic,
Not my decision :-) The only concern I have is e.g. when playing a game: the game might want to allocate as much memory as possible in order to speed itself up (cached objects, textures, etc.), but due to memory compression it might actually be slower than allocating a smaller amount (which wouldn't trigger zram usage). And the game of course doesn't know this. But it's hard to judge; for general workflows it's definitely better than swapping to disk.
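For reference, the swap-on-ZRAM idea amounts to a small config for zram-generator; a minimal sketch follows. The values are illustrative, and the file is written to a temp dir here rather than its real home at /etc/systemd/zram-generator.conf:

```shell
# Sketch of a zram-generator configuration for swap-on-ZRAM.
# Values are illustrative; the real file would be
# /etc/systemd/zram-generator.conf.
set -e
TMP=$(mktemp -d)
cat > "$TMP/zram-generator.conf" <<'EOF'
[zram0]
# Cap the zram device at half of RAM, up to 4 GiB:
zram-size = min(ram / 2, 4096)
compression-algorithm = zstd
EOF
grep '^zram-size' "$TMP/zram-generator.conf"
```

With something like this in place, the generator creates a /dev/zram0 swap device at boot, and no on-disk swap partition is needed for the common paging case.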
it's effectively the case already with UEFI Secure Boot systems - the resume parameter is pointless, these users get a power off when the battery reaches a low threshold. That's data loss.
Yes, but at least they get notified beforehand. That helps a little.
If we're not going to take hibernation functionality seriously anyway, I question carving out so much space from people's systems by default. I have no complaint about custom partitioning making a swap partition by default at 1:1 ratio or whatever is discovered to be most reliable without being wasteful, and also setting a resume parameter.
I agree.
On Mon, Feb 10, 2020 at 4:12 AM Chris Murphy lists@colorremedies.com wrote:
Continued from the 'background and summary' email: https://lists.fedoraproject.org/archives/list/desktop@lists.fedoraproject.or...
Adding Hans and Matthew and Lennart to cc.
Questions I have are:
- WG is considering dropping creation of swap partitions by default,
in favor of swap-on-ZRAM. Any concerns? (We do know it's not possible to use a ZRAM device for hibernation, but the kernel will look to another swap device for contiguous free space to write out a hibernation image.)
- What's the status of s2idle in the kernel?
- What sort of work is needed outside the kernel to properly support
s2idle, or is this predominantly kernel work? Microsoft documents on Modern Standby suggest minimal application effort. [1]
A lot of the functionality is implemented in a non-standard extension to ACPI called PEP: https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/using-peps-...
- Prospect of kernel support to separate swap and hibernation
partitions (and/or swap files)? Or systemd method of creating then activating swapfiles on demand?
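On the systemd side, a swap file can already be described by a .swap unit and activated on demand. Below is an illustrative sketch: the unit is written to a temp dir here, whereas the real one would live in /etc/systemd/system, and creating the swap file itself requires root (shown in comments):

```shell
# Sketch: a systemd swap unit for an on-demand swap file.
# The swap file itself would be created first (as root), e.g.:
#   fallocate -l 4G /var/swapfile && chmod 0600 /var/swapfile && mkswap /var/swapfile
set -e
TMP=$(mktemp -d)
cat > "$TMP/var-swapfile.swap" <<'EOF'
[Unit]
Description=On-demand swap file

[Swap]
What=/var/swapfile
# Negative priority so a zram device (if present) is preferred at runtime:
Priority=-2

[Install]
WantedBy=swap.target
EOF
grep '^What=' "$TMP/var-swapfile.swap"
```

Note the unit name must be the escaped form of the swap file path (var-swapfile.swap for /var/swapfile). The missing piece the question is really about is creating and activating such a file automatically, only when hibernation is requested.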
- Prospect of hibernation supported with UEFI Secure Boot?
- Is hibernation a better fallback than poweroff, given the
significant reliability differential? Why? Poweroff is universal, hibernation isn't. What's the argument that a non-universally available fallback is better than the universal fallback?
- What are the implications of hibernation if Fedora will move to
measured boot? (I'm not sure how mainstream that function is expected to be, or it's use case specific opt-in.)
There's still a lot of work to be done there, but at least for IoT we're looking at being able to opt in to this soon. I suspect it will be quite a bit more work for something like Workstation.
- There's some anecdotal evidence users are disabling UEFI Secure
Boot, possibly quite a lot [2]. Does there need to be an effort at making the signing of user-built kernels and modules easier? Can it be made easier? I don't know whether custom or out-of-tree modules are a significant motive for disabling SB, vs. other explanations.
I don't think we have a breakdown between distros there. A lot of distros, such as Debian, didn't support it until very recently; distros like Gentoo and Arch still don't, and so there are a lot of docs in those communities where the first instruction is to disable it.
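For context on what "making signing easier" would have to improve on, today's manual machine-owner-key workflow looks roughly like the sketch below. Key file names are illustrative; the enrollment and signing steps need root and the kernel-devel sign-file helper, so they're shown as comments:

```shell
# Sketch of today's manual MOK workflow for signing out-of-tree modules.
# Key file names are illustrative.
set -e
TMP=$(mktemp -d)
# 1. Generate a machine owner key (MOK):
openssl req -new -x509 -newkey rsa:2048 -nodes -days 3650 \
  -subj "/CN=Local module signing/" \
  -keyout "$TMP/MOK.priv" -outform DER -out "$TMP/MOK.der"
# 2. Enroll it so shim trusts it (prompts for a password, confirmed
#    in the MOK manager at next boot) -- needs root, shown for reference:
#   mokutil --import "$TMP/MOK.der"
# 3. Sign the module with the kernel's sign-file helper:
#   /usr/src/kernels/$(uname -r)/scripts/sign-file sha256 \
#     "$TMP/MOK.priv" "$TMP/MOK.der" mymodule.ko
# Sanity check that the key material was generated:
test -s "$TMP/MOK.der" && echo "MOK key generated"
```

The reboot-and-enroll step in particular is the kind of friction that plausibly pushes people toward just disabling SB instead.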
- A systemd TODO makes me wonder: Does anyone have (corroborating)
data on the reliability of firmware or battery reporting, when the system, and thus the battery, are under significant load? [3] I've heard from a reliable source that on 2+ year old hardware, the vast majority of batteries are effectively broken and aren't likely to report anything reliably if they are under significant load, in particular waking from S3. By anything, I mean time remaining, power remaining, and current power consumption rate. Would s2idle instead of S3 make this more reliable?
- It doesn't sound like S1 is really used at all, even though kernel
docs say it's supported as shallow/standby. (?) Is it more or less reliable than S3?
- I'm inclined to think we should mimic what hardware vendors,
Microsoft, Apple, and Google (with Chromebooks and Android) have been doing for a while: faster boots, and S0 low power idle - and skip the things making devs and users crazy. But I invite a persuasive contrary argument.
- Any other questions?
[1] https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/...
[2] https://twitter.com/hughsient/status/1225826488903249920
[3] see line "beef up hibernation" https://github.com/systemd/systemd/blob/master/TODO
-- Chris Murphy