Hi, This is yet another follow-up for this thread: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
Basics: "zswap" compresses swap and uses a defined memory pool as a cache, with spill over (still compressed) going into a conventional swap partition. The memory pool doesn't appear as a separate block device. A conventional swap partition on a drive is required. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Docume...
"swap on ZRAM" A ZRAM device appears as a block device, and is effectively a compressed RAM disk. It's common for this to be the exclusive swap device, of course it is volatile so in that configuration your system can't hibernate. But it's also possible to use swap priority in fstab to cause the ZRAM device to be used with higher priority, and a conventional swap partition on a drive with a lower priority. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Docume...
What they do: Either strategy can help avoid swap thrashing, by moderating the transition from exclusively RAM based work, to heavy swapping on disk. In my testing, the most aggressive memory starved workloads still result in an unresponsive system. Neither is a complete solution; they really seem to just be moderators that kick the can down the road. But I do think it's an improvement, especially in the incidental swap use case, where the transition from memory to swap isn't noticeable.
Which is better? I don't know. Seriously, that's what all of my testing has come down to. A user won't likely notice the difference. Both dynamically allocate memory to their "memory pools" on demand. But otherwise, they really are two very different implementations. Regardless, Fedora Workstation, and probably even Fedora Server, should use one of them by default out of the box.
IoT folks are already using swap on ZRAM by default, in lieu of a disk based swap partition. And Anaconda folks are doing the same for low memory devices when the installer is launched. I've been using zswap on Fedora Workstation edition on my laptop, and Fedora Server on an Intel NUC, for maybe two years (earlier this summer I switched both of them to swap on ZRAM to compare).
How are they different? There are several "swap on ZRAM" implementations. The zram package in Fedora right now is what the IoT folks are using; it installs a systemd service unit that sets up the ZRAM block device, runs mkswap on it, and then swapon, during system startup. Simple.
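A rough sketch of what such a unit amounts to (hypothetical, not the actual packaged unit):

[Unit]
Description=Swap on ZRAM
DefaultDependencies=no
Before=swap.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=/usr/sbin/modprobe zram
ExecStartPre=/usr/bin/sh -c 'echo 4G > /sys/block/zram0/disksize'
ExecStart=/usr/sbin/mkswap /dev/zram0
ExecStartPost=/usr/sbin/swapon /dev/zram0
ExecStop=/usr/sbin/swapoff /dev/zram0

[Install]
WantedBy=swap.target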
The ideal scenario is to get everyone on the same page, and so far it looks like systemd's zram-generator, built in Rust, meets all the requirements. That needs to be confirmed, but also right now there's a small problem: it's not working. So we kinda need someone familiar with Rust and systemd to take this on, if we want to use the same thing everywhere. https://github.com/systemd/zram-generator/issues/4
Whereas zswap is set up using boot parameters, which we could have the installer set, contingent on a conventional swap partition being created:

zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20 zswap.zpool=zbud
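On an already installed system, those can be added with grubby, e.g. (values as above; adjust to taste):

$ sudo grubby --update-kernel=ALL --args="zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20 zswap.zpool=zbud"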
Zswap upstream tells me they're close to dropping the experimental status, hopefully by the end of the summer. It might be a bit longer before they're as confident with zpool type z3fold.
Hackfest anyone?
----- Original Message -----
From: "Chris Murphy" lists@colorremedies.com To: "Development discussions related to Fedora" devel@lists.fedoraproject.org Sent: Friday, August 30, 2019 9:55:52 PM Subject: swap on ZRAM, zswap, and Rust was: Better interactivity in low-memory situations
Basics: "zswap" compresses swap and uses a defined memory pool as a cache, with spill over (still compressed) going into a conventional swap partition. The memory pool doesn't appear as a separate block device. A conventional swap partition on a drive is required. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Docume...
"swap on ZRAM" A ZRAM device appears as a block device, and is effectively a compressed RAM disk. It's common for this to be the exclusive swap device, of course it is volatile so in that configuration your system can't hibernate. But it's also possible to use swap priority in fstab to cause the ZRAM device to be used with higher priority, and a conventional swap partition on a drive with a lower priority. https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Docume...
Just a slight addition to this comparison - AFAIK there is a difference in how zswap and zram handle the in-RAM swap device becoming full and how they make use of the swap device on the hard drive.
If the zswap memory pool becomes full, zswap will, according to the docs, free up space in RAM by moving the least recently used pages to the disk, so that the "hot" pages stay in RAM and new pages can be placed there.
In comparison, AFAIK there is no such mechanism for zram; the priority value simply determines which swap device is used first, and once it becomes full, new pages simply go to the next swap device with lower priority. Please correct me if I am completely wrong and the Linux swap allocation algorithm actually moves pages between swap devices based on priority. :)
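For reference, the priority arrangement mentioned above would look something like this in fstab (device names are examples):

/dev/zram0  none  swap  defaults,pri=100  0 0
/dev/sda3   none  swap  defaults,pri=10   0 0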
Zswap upstream tells me they're close to dropping the experimental status, hopefully by the end of the summer. It might be a bit longer before they're as confident with zpool type z3fold.
Indeed, I've had issues with stability in the past when I tried the z3fold option, but no issue with the default values in the last ~year, so it really seems to be ready with the default values.
On Fri, Aug 30, 2019 at 1:55 PM Chris Murphy lists@colorremedies.com wrote:
Hi, This is yet another follow-up for this thread: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
(Benchmarks are fraught with peril, and synthetic benchmarks even more so, but at least their bias is obvious rather than obscured behind cutesy attempts to simulate an environment no one has.)
This old bash fork bomb example fails (Fedora 31; earlier versions not tested):

$ :(){ :|:& };:
[ 1765.728408] cgroup: fork rejected by pids controller in /user.slice/user-1000.slice/session-3.scope
So use 'munch' instead: https://gist.github.com/n3rve/7897c8ce1e17c22dc17a1df1b4e645f4 (kernel 5.2.13)
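For reference, a memory eater like munch boils down to something like this (a hypothetical sketch; the actual gist may differ):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    size_t mb = 0;
    for (;;) {
        char *p = malloc(1024 * 1024);   /* grab 1 MiB at a time */
        if (p == NULL)
            break;                       /* allocation failed; usually OOM kill happens first */
        memset(p, 0, 1024 * 1024);       /* touch the pages; zero-fill compresses very well */
        printf("Allocated %zu MB\n", ++mb);
    }
    return 0;
}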
$ time ./munch
Measuring the time it takes to fill all memory and swap. I'm waiting about 1 minute between each run, with no other activity happening during that time.
The typical outcome is: "Allocated 14729 MB Killed"
8GiB RAM, 8GiB swap on SSD plain partition
1. 0m43s 2. 0m47s 3. 0m46s
now enable zswap, lz4/zbud, with a 20% pool
1. 0m10s 2. 0m10s 3. 0m11s
now disable all of that (zswap and the swap on SSD) and enable swap on ZRAM at a 1:1 ratio with installed RAM, using lz4
1. 0m11s 2. 0m10s 3. 0m11s
-----------
In the 2nd case, with zswap, swapon does in fact report 8GiB used just before the kill. That's a little confusing, because zswap compresses both the memory pool and what spills over to the swap partition. Do I have more swap available because it's compressed? That doesn't seem to be the case as reported by 'free' or 'vmstat'. Do I use less swap on disk? swapon says no. And yet the speed at which it gets through the swap partition suggests the data is already fully compressed and isn't generating many writes (it's a synthetic test, so I assume all zeros, and thus highly compressible).
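One way to probe what zswap is actually doing is its debugfs stats (assuming debugfs is mounted in the usual place):

$ sudo grep -r . /sys/kernel/debug/zswap/

stored_pages is the number of compressed pages held in the RAM pool, pool_total_size is the RAM the pool occupies in bytes, and written_back_pages counts pages evicted from the pool to the disk swap device.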
So, it's vaguely interesting.
Slightly more interesting: a comparison to swapfiles on the same SSD, per file system.
Btrfs 1. 0m50s 2. 0m46s 3. 0m53s
Ext4 1. 1m18s 2. 1m2s 3. 1m6s
XFS 1. 0m48s 2. 1m2s 3. 1m1s
Btrfs had a disadvantage in that it was not a new file system; rather, it was substantially used, with many file system resizes. The ext4 and XFS file systems were created just for the test, on partitions that had 'blkdiscard' issued beforehand. It's a requirement to use chattr +C (nodatacow) on Btrfs, which implies no checksumming and no compression. Anyway, they aren't ridiculously out of line with using a plain partition.
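For reference, the Btrfs swapfile procedure comes down to this (size is an example):

$ sudo truncate -s 0 /swapfile
$ sudo chattr +C /swapfile
$ sudo fallocate -l 8G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile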
So plausibly someone could create a systemd generator that dynamically creates and destroys swapfiles. And also creates a hibernation file just before hibernating. Not sure about encryption for any of that though.
Hi,
thank you for all the testing and comparisons between different approaches. It looks really interesting.
The ideal scenario is to get everyone on the same page, and so far it looks like systemd's zram-generator, built in Rust, meets all the requirements. That needs to be confirmed, but also right now there's a small problem: it's not working. So we kinda need someone familiar with Rust and systemd to take this on, if we want to use the same thing everywhere. https://github.com/systemd/zram-generator/issues/4
For a while, the only feedback I had for zram-generator was from people interested in Rust. It's great that somebody is giving it a go ;)
I think the report in that issue is a slight exaggeration — IIUC, this failure only occurs if zram-generator.conf is created and systemctl daemon-reload and systemctl start swap.target are called on a running system. After a reboot, things would still work. Obviously, it would be better to handle this case too. I pushed some commits to the master branch now that close all four open issues, and this case should now be handled too. If anything is wrong, please report it here or in bugzilla.
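For context, the configuration is a small ini-style file, along these lines (illustrative values; see the example file in the repo for the exact key names in the current version):

# /etc/systemd/zram-generator.conf
[zram0]
memory-limit = 2048   # only create the device on machines with up to 2 GiB of RAM
zram-fraction = 0.5   # size the ZRAM device at half of available RAM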
I'll tag a new version with those changes in a few days if nothing else pops up.
Zbyszek
PS. I had a really surprising failure mode: on a VM with 2GB RAM (as shown by free), the generator was doing nothing and simply exiting with no error. It turns out that the machine had "maximum allocation" bigger than "current allocation", and for a brief moment during boot /proc/meminfo would report more memory. Took me a while to figure this one out.
Zbyszek,
Do you have any advice on how to assess 'swap on ZRAM' versus 'zswap' by default for Fedora Workstation? They're really too similar from a user point of view; I think it really comes down to the technical arguments.
1a. 'swap on ZRAM' compresses only that which goes to the ZRAM device.
1b. zswap compresses everything, whether it goes to the memory pool or to swap on disk.

2a. 'swap on ZRAM' must be configured to give priority to the ZRAM device; once full, disk swap (if present) is used.
2b. zswap anticipates the future usage of data, favoring the memory or disk swap locations accordingly.
They both appear equally easy to enable by default for clean installs and upgrades.
I'd say 'swap on ZRAM' is well suited for the cases where there's no existing swap partition, and low memory devices. Whereas zswap is better suited for average to higher end systems, where the main goal is swap avoidance, but where zswap can help moderate the worst performance effects of the transition to disk based swap.
It seems premature to drop the creation of a swap partition at installation time. I think that'd be unexpected by most users, and it might have consequences beyond the (unsupported) hibernation use case.
So my assessment, at this point, would be to recommend zswap for Fedora Workstation. Likely using zbud/lz4. Maybe by Fedora 33 there will be more confidence and testing done on z3fold.
Hi Chris,
Does zswap actually keep the data compressed when the DRAM-based swap is full, and it writes to the spill-over non-volatile swap device?
I'm not an expert on this at all; however, my understanding was that zswap must decompress the data before it writes to the backing swap. But perhaps I am misunderstanding the purpose of zswap_writeback_entry()[1] and/or what it does.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/z...
On Wed, Sep 18, 2019 at 6:57 PM Tom Seewald tseewald@gmail.com wrote:
Hi Chris,
Does zswap actually keep the data compressed when the DRAM-based swap is full, and it writes to the spill-over non-volatile swap device?
I'm not an expert on this at all; however, my understanding was that zswap must decompress the data before it writes to the backing swap. But perhaps I am misunderstanding the purpose of zswap_writeback_entry()[1] and/or what it does.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/z...
I don't know. But based on the tests I mention upthread, I'm not sure how uncompressed pages are being swapped to disk. But then those times don't account for even a 2:1 compression ratio, which is the best that zbud/lz4 can achieve.
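For anyone who wants to dig in, the effective compression ratio can be derived from the debugfs stats mentioned upthread (assuming 4 KiB pages and debugfs mounted):

$ cd /sys/kernel/debug/zswap
$ echo "scale=2; $(cat stored_pages) * 4096 / $(cat pool_total_size)" | bc

And written_back_pages should show whether anything is actually reaching the disk at all during these tests.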
-- Chris Murphy