Technical Spec, better upgrade/rollback control

Chris Murphy

20 Feb 2014

11:46 p.m.

The Workstation PRD includes "better upgrade/rollback control" under plans & policies.

"• Better upgrade/rollback control If there are any problems with an upgrade or an upgrade breaks a configuration script we want to offer an easy way for users to roll-back such upgrades and changes."

The Technical Specification doesn't address this requirement. And I'm also not finding anything in the list archives about it.

I'm aware of three possible candidates for implementing snapshots and rollbacks:

a.) Roller Derby Project: It lists a dependency on LVM Thin Provisioning, which is a new option in Fedora 20. I'm uncertain if it's considered stable for production use or not. Also uncertain is if the project is adaptable for Btrfs, although it seems likely. https://fedorahosted.org/roller-derby/ http://fedoraproject.org/wiki/Changes/Rollback https://lists.fedoraproject.org/pipermail/devel-announce/2013-July/001204.ht...

b.) Snapper: Lists a dependency on either LVM Thin Provisioning, or Btrfs. Snapper+Btrfs is the most mature and actively maintained option at this time. It's presently used in openSUSE, by default when the file system is Btrfs, for at least a couple of years. https://github.com/openSUSE/snapper/blob/master/README http://snapper.io/ http://snapper.io/faq.html

c. ) Gnome: Richard Hughes has expressed interest in making this happen within Gnome. As far as I know it doesn't yet exist, but is expected to depend on either Btrfs or LVM Thin Provisioning snapshots.

Am I missing any others?

It seems to me the PRD requirement necessitates some assessment of file systems, the various snapshotting/rollback strategies and software, before the spec can detail an implementation. Should the facts as they're presently known be included in the spec in the meantime, with a TDB status?

Chris Murphy

Richard Hughes

21 Feb 2014

2:54 a.m.

On 21 February 2014 05:46, Chris Murphy lists@colorremedies.com wrote:

...

I'm aware of three possible candidates for implementing snapshots and rollbacks:

Yes, sorry about this, I have an email I've been meaning to send you for a few weeks now.

...

LVM Thin Provisioning

Now that the kernel doesn't explode when thinp runs out of physical space, I'm happier considering this.

...

b.) Snapper: Lists a dependency on either LVM Thin Provisioning, or Btrfs. Snapper+Btrfs is the most mature and actively maintained option at this time. It's presently used in openSUSE, by default when the file system is Btrfs, for at least a couple of years. https://github.com/openSUSE/snapper/blob/master/README http://snapper.io/ http://snapper.io/faq.html

I helped package snapper in Fedora and have been using it on and off for a few months on btrfs. btrfs has been stable for me, but suffers hugely when trying to use a VM on top of it without chattr +C.

...

c. ) Gnome: Richard Hughes has expressed interest in making this happen within Gnome. As far as I know it doesn't yet exist, but is expected to depend on either Btrfs or LVM Thin Provisioning snapshots.

I was intending to use snapper, but the real problems are that we also want to snapshot /boot, might not want LVM/btrfs for performance/stability reasons, and actually want some things to be excluded from the snapshot, e.g. the systemd journal. I'm really open to ideas here, as you know much more about all this than I do. Thanks.

Richard.

Christian Schaller

3:36 a.m.

----- Original Message -----

...

From: "Chris Murphy" lists@colorremedies.com
To: "Discussions about development for the Fedora desktop" desktop@lists.fedoraproject.org
Sent: Friday, February 21, 2014 6:46:01 AM
Subject: Technical Spec, better upgrade/rollback control

The Workstation PRD includes "better upgrade/rollback control" under plans & policies.

"• Better upgrade/rollback control If there are any problems with an upgrade or an upgrade breaks a configuration script we want to offer an easy way for users to roll-back such upgrades and changes."

The Technical Specification doesn't address this requirement. And I'm also not finding anything in the list archives about it.

I'm aware of three possible candidates for implementing snapshots and rollbacks:

a.) Roller Derby Project: It lists a dependency on LVM Thin Provisioning, which is a new option in Fedora 20. I'm uncertain if it's considered stable for production use or not. Also uncertain is if the project is adaptable for Btrfs, although it seems likely. https://fedorahosted.org/roller-derby/ http://fedoraproject.org/wiki/Changes/Rollback https://lists.fedoraproject.org/pipermail/devel-announce/2013-July/001204.ht...

b.) Snapper: Lists a dependency on either LVM Thin Provisioning, or Btrfs. Snapper+Btrfs is the most mature and actively maintained option at this time. It's presently used in openSUSE, by default when the file system is Btrfs, for at least a couple of years. https://github.com/openSUSE/snapper/blob/master/README http://snapper.io/ http://snapper.io/faq.html

c. ) Gnome: Richard Hughes has expressed interest in making this happen within Gnome. As far as I know it doesn't yet exist, but is expected to depend on either Btrfs or LVM Thin Provisioning snapshots.

Am I missing any others?

It seems to me the PRD requirement necessitates some assessment of file systems, the various snapshotting/rollback strategies and software, before the spec can detail an implementation. Should the facts as they're presently known be included in the spec in the meantime, with a TDB status?

I think most times when this feature has been discussed it has been in the context of btrfs. I think the current text in the technical specification doesn't go deeper on the subject partly because we are all a little on the fence about when we will feel confident enough about btrfs to dare propose it as the default filesystem for the desktop. I spoke with the btrfs developer at Red Hat during a conference on the US West Coast towards the end of last year, and he thought that btrfs was ready for the desktop use case, although he was not ready to recommend it for server use due to the (small) risk of data corruption. The argument being that if, let's say, you need to roll back to a half-day-old snapshot due to data corruption once a year on a desktop, that is probably fine because a desktop's local data is usually slow moving, while on a database server that is not really an option. That said, I think this item has stalled a bit due to uncertainty about whether that once-a-year occurrence is really fine. (On the other hand, it is not as if the current options are 100% risk free either.)

Christian

drago01

3:57 a.m.

On Fri, Feb 21, 2014 at 10:36 AM, Christian Schaller cschalle@redhat.com wrote:

...

I think most times when this feature has been discussed it has been in the context of btrfs. I think the current text in the technical specification doesn't go deeper on the subject partly because we are all a little on the fence about when we will feel confident enough about btrfs to dare propose it as the default filesystem for the desktop. I spoke with the btrfs developer at Red Hat during a conference on the US West Coast towards the end of last year, and he thought that btrfs was ready for the desktop use case, although he was not ready to recommend it for server use due to the (small) risk of data corruption.

Sorry, but I disagree here ... data corruption is not acceptable on a desktop system either. A server usually has multiple backups (if administered correctly) while a desktop most of the time has none.

Colin Walters

6:53 a.m.

On Fri, Feb 21, 2014 at 12:46 AM, Chris Murphy lists@colorremedies.com wrote:

...

The Workstation PRD includes "better upgrade/rollback control" under plans & policies.

"• Better upgrade/rollback control If there are any problems with an upgrade or an upgrade breaks a configuration script we want to offer an easy way for users to roll-back such upgrades and changes."

The Technical Specification doesn't address this requirement. And I'm also not finding anything in the list archives about it.

I'm aware of three possible candidates for implementing snapshots and rollbacks:

There's also OSTree - it's a more invasive vision, as it forces *every upgrade* to be atomic, and at present, you need to reboot. But on the plus side, every upgrade is atomic =) And by virtue of being constantly used, it's also constantly tested in real world scenarios.

A lot of "rollback" tools as below badly suffer from the fact that they're only used "in anger" when something went wrong.

I haven't talked much here about using OSTree for the traditional user-owns-machine "workstation" case - as it's near the end of my multi-year plan. It requires rpm/dnf to be aware of OSTree underneath.

Currently though, I think OSTree works very well for replication-based workstation deployments. This is the case where you run a corporate IT shop, and you have a "corporate standard build" of the OS preloaded with apps, and no ability for users to install apps individually, or change the OS.

Stephen Gallagher

6:54 a.m.

On 02/21/2014 12:46 AM, Chris Murphy wrote:

...

The Workstation PRD includes "better upgrade/rollback control" under plans & policies.

"• Better upgrade/rollback control If there are any problems with an upgrade or an upgrade breaks a configuration script we want to offer an easy way for users to roll-back such upgrades and changes."

The Technical Specification doesn't address this requirement. And I'm also not finding anything in the list archives about it.

I'm aware of three possible candidates for implementing snapshots and rollbacks:

a.) Roller Derby Project: It lists a dependency on LVM Thin Provisioning, which is a new option in Fedora 20. I'm uncertain if it's considered stable for production use or not. Also uncertain is if the project is adaptable for Btrfs, although it seems likely. https://fedorahosted.org/roller-derby/ http://fedoraproject.org/wiki/Changes/Rollback https://lists.fedoraproject.org/pipermail/devel-announce/2013-July/001204.ht...

The Roller Derby project is one that's being developed by Colin Walters and myself, though it's still pre-alpha and temporarily on hiatus while other more immediate priorities have our attention. I'd recommend against relying on this for the immediate future, but as a long-term solution, we hope to make it the best answer. And yes, BTRFS support will be included. It wasn't done in the PoC simply because BTRFS isn't generally considered safe for production use yet.

...

b.) Snapper: Lists a dependency on either LVM Thin Provisioning, or Btrfs. Snapper+Btrfs is the most mature and actively maintained option at this time. It's presently used in openSUSE, by default when the file system is Btrfs, for at least a couple of years. https://github.com/openSUSE/snapper/blob/master/README http://snapper.io/ http://snapper.io/faq.html

Snapper is nice, but it's lacking a few features (namely the ability to recover data if you have to jump backwards in time). It's probably an acceptable short-term solution if we need coarse rollback immediately.

...

c. ) Gnome: Richard Hughes has expressed interest in making this happen within Gnome. As far as I know it doesn't yet exist, but is expected to depend on either Btrfs or LVM Thin Provisioning snapshots.

I'd strongly recommend that Richard get aboard the Roller Derby project so we can focus our efforts.

...

Am I missing any others?

It seems to me the PRD requirement necessitates some assessment of file systems, the various snapshotting/rollback strategies and software, before the spec can detail an implementation. Should the facts as they're presently known be included in the spec in the meantime, with a TDB status?

I should note that the PRD is not generally intended to be a guidepost for the very next version of Fedora. It's acceptable to work towards those goals over several releases.

Josh Boyer

7:40 a.m.

On Fri, Feb 21, 2014 at 4:36 AM, Christian Schaller cschalle@redhat.com wrote:

...

I think most times when this feature has been discussed it has been in the context of btrfs. I think the current text in the technical specification doesn't go deeper on the subject partly because we are all a little on the fence about when we will feel confident enough about btrfs to dare propose it as the default filesystem for the desktop. I spoke with the btrfs developer at Red Hat during a conference on the US West Coast towards the end of last year, and he thought that btrfs was ready for the desktop use case, although he was not ready to recommend it for server use due to the (small) risk of data corruption. The argument being that if, let's say, you need to roll back to a half-day-old snapshot due to data corruption once a year on a desktop, that is probably fine because a desktop's local data is usually slow moving, while on a database server that is not really an option. That said, I think this item has stalled a bit due to uncertainty about whether that once-a-year occurrence is really fine. (On the other hand, it is not as if the current options are 100% risk free either.)

I discussed btrfs with some of our FS experts at DevConf a couple weeks ago, and then further via email after. I'm not convinced it's ready to be the default FS for any product in Fedora yet. I'm hoping that I can get some/one of these experts to attend Flock this year and give a talk on btrfs. Where it's at, what it needs to be the default fs, etc. We may wind up with a feature-reduced btrfs option in the not too distant future being viable (e.g. no multi-device spanning, no RAID).

I realize btrfs is something people are really excited about and really want, but I'm not willing to let hype or "mostly" working features swing our decision. People have been living without fs-rollback for years, and I think they can wait a bit longer. The last thing we need is to get bad hype because people start losing their data if we force the issue.

josh

Matthias Clasen

7:52 a.m.

On Fri, 2014-02-21 at 08:40 -0500, Josh Boyer wrote:

...

I discussed btrfs with some of our FS experts at DevConf a couple weeks ago, and then further via email after. I'm not convinced it's ready to be the default FS for any product in Fedora yet. I'm hoping that I can get some/one of these experts to attend Flock this year and give a talk on btrfs. Where it's at, what it needs to be the default fs, etc. We may wind up with a feature-reduced btrfs option in the not too distant future being viable (e.g. no multi-device spanning, no RAID).

We may want to reword the 'file system' section I just put in the tech spec, then.

...

I realize btrfs is something people are really excited about and really want, but I'm not willing to let hype or "mostly" working features swing our decision. People have been living without fs-rollback for years, and I think they can wait a bit longer. The last thing we need is to get bad hype because people start losing their data if we force the issue.

Tbh, from where I stand, people were excited about btrfs a few years ago, but the excitement has waned. You can hold your breath only for so long. Time to either push it over the hump this year, or give up and move on. Isn't SUSE using btrfs by default now?

Colin Walters

7:58 a.m.

On Fri, Feb 21, 2014 at 7:53 AM, Colin Walters walters@verbum.org wrote:

...

There's also OSTree - it's a more invasive vision, as it forces *every upgrade* to be atomic, and at present, you need to reboot. But on the plus side, every upgrade is atomic =) And by virtue of being constantly used, it's also constantly tested in real world scenarios.

I should also mention that it actually works today. See: http://rpm-ostree.cloud.fedoraproject.org/#/ for demo VM images to download (and if you're brave, you can install on bare metal).

Also, the video from devconf.cz is online now:

http://www.youtube.com/watch?v=Hy0ZEHPXJ9Q

Josh Boyer

8:04 a.m.

On Fri, Feb 21, 2014 at 8:52 AM, Matthias Clasen mclasen@redhat.com wrote:

...

On Fri, 2014-02-21 at 08:40 -0500, Josh Boyer wrote:

...
I discussed btrfs with some of our FS experts at DevConf a couple weeks ago, and then further via email after. I'm not convinced it's ready to be the default FS for any product in Fedora yet. I'm hoping that I can get some/one of these experts to attend Flock this year and give a talk on btrfs. Where it's at, what it needs to be the default fs, etc. We may wind up with a feature-reduced btrfs option in the not too distant future being viable (e.g. no multi-device spanning, no RAID).

We may want to reword the 'file system' section I just put in the tech spec, then.

I'll look it over soon. I've been trying to play catch up on kernel bugs this week.

...

...
I realize btrfs is something people are really excited about and really want, but I'm not willing to let hype or "mostly" working features swing our decision. People have been living without fs-rollback for years, and I think they can wait a bit longer. The last thing we need is to get bad hype because people start losing their data if we force the issue.

Tbh, from where I stand, people were excited about btrfs a few years ago, but the excitement has waned. You can hold your breath only for so long. Time to either push it over the hump this year, or give up and move on. Isn't SUSE using btrfs by default now?

SLES or openSUSE? For SLES, I think they have it as an option, but in a very reduced mode as I described above. I think it's an option in openSUSE 13.x, but it isn't the default because there were things they didn't think were production ready. So they basically match Fedora in terms of defaults. I do think they promote and work on snapper and btrfs more than we do though.

josh

Jaroslav Reznik

8:08 a.m.

----- Original Message -----

...

On Fri, Feb 21, 2014 at 4:36 AM, Christian Schaller cschalle@redhat.com wrote:

...
I think most times when this feature has been discussed it has been in the context of btrfs. I think the current text in the technical specification doesn't go deeper on the subject partly because we are all a little on the fence about when we will feel confident enough about btrfs to dare propose it as the default filesystem for the desktop. I spoke with the btrfs developer at Red Hat during a conference on the US West Coast towards the end of last year, and he thought that btrfs was ready for the desktop use case, although he was not ready to recommend it for server use due to the (small) risk of data corruption. The argument being that if, let's say, you need to roll back to a half-day-old snapshot due to data corruption once a year on a desktop, that is probably fine because a desktop's local data is usually slow moving, while on a database server that is not really an option. That said, I think this item has stalled a bit due to uncertainty about whether that once-a-year occurrence is really fine. (On the other hand, it is not as if the current options are 100% risk free either.)

I discussed btrfs with some of our FS experts at DevConf a couple weeks ago, and then further via email after. I'm not convinced it's ready to be the default FS for any product in Fedora yet. I'm hoping that I can get some/one of these experts to attend Flock this year and give a talk on btrfs. Where it's at, what it needs to be the default fs, etc. We may wind up with a feature-reduced btrfs option in the not too distant future being viable (e.g. no multi-device spanning, no RAID).

And it's not only about the filesystem but also other parts, in our case mostly the installer. There's some btrfs support, but it still has some limitations, and for F20 we were even thinking about how to show it as a tech preview instead of a feature we block on (same for LVM thinp). It's always good to have more testing from brave users, but we should be conservative in what we show as a supported option in the installer...

Jaroslav

Michael Catanzaro

9:01 a.m.

On Fri, 2014-02-21 at 09:04 -0500, Josh Boyer wrote:

...

SLES or openSUSE? For SLES, I think they have it as an option, but in a very reduced mode as I described above. I think it's an option in openSUSE 13.x, but it isn't the default because there were things they didn't think were production ready. So they basically match Fedora in terms of defaults. I do think they promote and work on snapper and btrfs more than we do though.

It's been decided that openSUSE 13.2, to be released in November, will be the first to use btrfs by default. I *think* this decision was made because SUSE Linux Enterprise Server 12 (to be released late this year) is going to default to btrfs as well, but I'm not sure where I read that, or if I really did.

They have disabled several features that they feel are less stable! Large thread: http://lists.opensuse.org/opensuse-factory/2013-09/msg00029.html

Chris Murphy

2:47 p.m.

On Feb 20, 2014, at 10:46 PM, Chris Murphy lists@colorremedies.com wrote:

...

Should the facts as they're presently known be included in the spec in the meantime, with a TDB status?

It's often helpful to get the acronym correct: TBD=To Be Determined. The main point of the thread was just to make sure this PRD item isn't totally missing in action on the technical spec, even if implementation details and timing are uncertain.

My subsequent responses on this thread will likely go down the rabbit hole…

Chris Murphy

3:50 p.m.

On Feb 21, 2014, at 1:54 AM, Richard Hughes hughsient@gmail.com wrote:

...

On 21 February 2014 05:46, Chris Murphy lists@colorremedies.com wrote:

...
LVM Thin Provisioning

Now that the kernel doesn't explode when thinp runs out of physical space, I'm happier considering this.

Yeah. I'm personally not as interested in LVM thinp for the general use case because it's overly complicated with layers and obscure commands. You have to understand LVM pretty well to grok LVM thinp. That's not the case with Btrfs. But I think both should be considered in parity when it comes to designing a snapshot/rollback strategy that works well regardless of what's backing it.

...

btrfs has been stable for me, but suffers hugely when trying to use a VM on top of it without chattr +C.

This is expected. It's an open question whether apps like virt-manager, Boxes, etc. should set the +C (No_COW) file attribute on containing directories when on Btrfs, or whether the currently active work on its autodefrag option will obviate +C, and when that will become a default mount option.
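For readers who want to try the directory-level approach discussed here, the following is a minimal sketch, assuming the VM images live in a directory on Btrfs. The path is a hypothetical default and the `run` and `set_nocow` helpers are illustrative names. The attribute is set with `chattr`, and it only applies to files created in the directory after it is set; existing image files that already contain data are not converted. The script prints the commands by default instead of running them.

```shell
#!/bin/sh
# Sketch: mark a VM image directory No_COW on Btrfs.
# IMAGE_DIR is a hypothetical path; adjust it for your system.
IMAGE_DIR="${IMAGE_DIR:-/var/lib/libvirt/images}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

set_nocow() {
    # Files created in the directory afterwards inherit the C attribute.
    run chattr +C "$1"
    # List the directory's own attributes to confirm the flag.
    run lsattr -d "$1"
}

set_nocow "$IMAGE_DIR"
```

Run it with `DRY_RUN=0` as root to apply the change for real.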

...

I was intending to use snapper, but the real problems are that we also want to snapshot /boot, might not want LVM/btrfs for performance/stability reasons, and actually want some things to be excluded from the snapshot, e.g. the systemd journal.

I think all of those have fairly straightforward workarounds already, e.g. create a log subvolume (or LV), mount it at /var/log, and then never snapshot it or roll it back. Another case is /home. It means rollbacks are conditional, and there is no single kind of rollback. Rolling back /home doesn't imply rolling back the system. Rolling back the system doesn't mean rolling back /home. Those two seem very basic to distinguish between. There may be others. And there may be better ways to achieve them than fs-level rollback. Btrfs does provide file-level snapshots, so it can be rather granular if there's demand for it.

The /boot concern has possibly two workarounds: retain more kernels, and don't keep too many (or too old) system-level snapshots.
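The separate-subvolume workaround can be sketched roughly as follows. This is an illustration under stated assumptions, not a recommended layout: the mount point `TOP`, the subvolume names, and the helper functions are all hypothetical. It relies on the fact that a Btrfs snapshot does not cross subvolume boundaries, so a snapshot of the system subvolume leaves logs and /home out. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: keep logs and /home out of system snapshots on Btrfs.
# TOP is a hypothetical mount point of the top-level subvolume.
TOP="${TOP:-/mnt/btrfs-top}"
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

create_layout() {
    # One subvolume per rollback domain. Snapshots stop at subvolume
    # boundaries, so a snapshot of "root" contains neither "home"
    # nor "var_log".
    for sv in root home var_log; do
        run btrfs subvolume create "$TOP/$sv"
    done
    run mkdir -p "$TOP/snapshots"
}

snapshot_system() {
    # Read-only snapshot of the system subvolume only.
    run btrfs subvolume snapshot -r "$TOP/root" "$TOP/snapshots/root-$1"
}

# Matching fstab entries would use the subvol= mount option, e.g.
#   UUID=<uuid>  /         btrfs  subvol=root     0 0
#   UUID=<uuid>  /home     btrfs  subvol=home     0 0
#   UUID=<uuid>  /var/log  btrfs  subvol=var_log  0 0

create_layout
snapshot_system pre-upgrade
```

With this layout, rolling back the system and rolling back /home are separate operations on separate subvolumes.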

Chris Murphy

3:51 p.m.

On Feb 21, 2014, at 5:53 AM, Colin Walters walters@verbum.org wrote:

...

There's also OSTree - it's a more invasive vision, as it forces *every upgrade* to be atomic, and at present, you need to reboot.

I don't see this as so invasive: Btrfs and LVM thinp snapshots are COW and writes are atomic anyway. Btrfs would perhaps write less since only changed blocks are written, while with LVM thinp it's a chunk.

The snapper, yum plugin (and the manually performed user) convention is to take a snapshot that is then set aside for safe keeping and rollback; and it is the active parent tree that's upgraded. If the upgrade implodes, the current parent tree is hosed, and if it succeeds you have a modified active system that suggests a reboot is needed sooner rather than later.
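The convention described above, where a snapshot is set aside and the live tree is upgraded, looks roughly like this when done by hand with snapper's pre/post snapshot pairs. This is a sketch that assumes a snapper config named "root" already exists for /; the function name is hypothetical, and in dry-run mode the snapshot number is a placeholder `N` because nothing is actually created.

```shell
#!/bin/sh
# Sketch: "snapshot, then upgrade the live tree" with snapper.
# Assumes a snapper config named "root" already exists for /.
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

upgrade_with_snapshots() {
    if [ "$DRY_RUN" = "1" ]; then
        pre="N"   # placeholder; a real run gets the number from snapper
        echo "+ snapper -c root create --type pre --print-number --description pre-upgrade"
    else
        pre="$(snapper -c root create --type pre --print-number --description pre-upgrade)"
    fi
    # The active tree is modified in place; the pre snapshot is the fallback.
    run yum -y update
    run snapper -c root create --type post --pre-number "$pre" --description post-upgrade
}

upgrade_with_snapshots
# To revert the changes later: snapper -c root undochange <pre>..<post>
```

Note that the running system is the thing being modified here, which is exactly the property the rest of this thread questions.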

But I see no reason why an implementation couldn't update the snapshot instead of the active parent. If it fails, then clean up the snapshot. If it succeeds, reboot when convenient. Isn't this the OSTree convention? Create a new tree and update it, not the active tree?

Chris Murphy

Colin Walters

6:21 p.m.

On Fri, Feb 21, 2014 at 4:51 PM, Chris Murphy lists@colorremedies.com wrote:

...

I don't see this as so invasive: Btrfs and LVM thinp snapshots are COW and writes are atomic anyway. Btrfs would perhaps write less since only changed blocks are written, while with LVM thinp it's a chunk.

While a conversation about the efficiency details of BTRFS vs LVM thinp writes versus OSTree hardlinks is interesting, it's also not very important in the big picture.

...

The snapper, yum plugin (and the manually performed user) convention is to take a snapshot that is then set aside for safe keeping and rollback; and it is the active parent tree that's upgraded.

Yes. The downside of this is that it doesn't solve the issue where updates actively break running processes, like say the default web browser of the "default offering".

I think it's *utterly insane* to ship an update system that does that. I won't continue to attach my name to one that does - instead, I will get around at some point to researching and implementing *completely race-free* live upgrades only *where possible*.

I'm pretty sure I'll have such a subset of live upgrades on top of a known safe and tested basis working much faster than anyone can try to retrofit complete safety on top of yum/rpm + snapshot tool.

...

But I see no reason why an implementation couldn't update the snapshot instead of the active parent. If it fails, then clean up the snapshot. If it succeeds, reboot when convenient. Isn't this the OSTree convention? Create a new tree and update it, not the active tree?

Correct. There is an important aspect here, which is that doing all of this safely needs a system which can do *coordinated* changes to both /boot and the root filesystem. It needs something aware of bootloaders, kernels, as well as higher level concepts such as the semantics of /etc and /var. OSTree is such a system.

Projects like snapper and the yum fs-snapshot plugin are basically glued to the side of rpm (or dpkg) - they aren't in a position to enforce reliable semantics.

For example, even though the yum fs-snapshot plugin is linked into yum, it isn't ultimately in control of kernel installation - that's just a shell script forked off in a %posttrans. OSTree, when installing a kernel (or making any other change at all), will cleanly handle ENOSPC and unwind, leaving your current system completely untouched. You could then provision new space, and retry the upgrade.
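The "unwind and leave the current system untouched" behavior can be illustrated with a toy model. This is not OSTree's code, only a sketch of the general stage-then-switch pattern under simple assumptions: each deployment is a directory, the running one is named by a `current` symlink, and the directory names and `deploy` function are hypothetical. A failed copy (ENOSPC included) removes the partial tree and never touches `current`.

```shell
#!/bin/sh
# Sketch: stage an upgrade beside the running tree and unwind on failure.
# Directory names are hypothetical; this is not OSTree's implementation.

# deploy SRC ROOT NAME: copy SRC into ROOT/NAME, then repoint ROOT/current.
deploy() {
    src="$1"; root="$2"; name="$3"
    staging="$root/$name.staging"

    # Any failure while staging (ENOSPC included) ends up here:
    # remove the partial tree and leave ROOT/current alone.
    if ! cp -a "$src" "$staging" 2>/dev/null; then
        rm -rf "$staging"
        echo "deploy of $name failed; current tree untouched" >&2
        return 1
    fi

    mv "$staging" "$root/$name"
    # Build the new symlink under a temporary name, then rename it over
    # the old one. The rename is atomic; mv -T is a GNU coreutils option.
    ln -s "$name" "$root/current.new"
    mv -T "$root/current.new" "$root/current"
}

# Demo in a throwaway directory.
demo="$(mktemp -d)"
mkdir -p "$demo/src-v1" "$demo/src-v2" "$demo/sysroot"
echo v1 > "$demo/src-v1/release"
echo v2 > "$demo/src-v2/release"

deploy "$demo/src-v1" "$demo/sysroot" v1
deploy "$demo/src-v2" "$demo/sysroot" v2
deploy "$demo/missing" "$demo/sysroot" v3 || true   # simulated failure
cat "$demo/sysroot/current/release"                  # prints v2
```

After the simulated failure, `current` still points at v2 and the older v1 tree remains available as a rollback target.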

Chris Murphy

24 Feb 2014

12:11 a.m.

On Feb 21, 2014, at 5:21 PM, Colin Walters walters@verbum.org wrote:

...

On Fri, Feb 21, 2014 at 4:51 PM, Chris Murphy lists@colorremedies.com wrote:

...
The snapper, yum plugin (and the manually performed user) convention is to take a snapshot that is then set aside for safe keeping and rollback; and it is the active parent tree that's upgraded.

Yes. The downside of this is that it doesn't solve the issue where updates actively break running processes, like say the default web browser of the "default offering".

I think it's *utterly insane* to ship an update system that does that. I won't continue to attach my name to one that does - instead, I will get around at some point to researching and implementing *completely race-free* live upgrades only *where possible*.

I'm pretty sure I'll have such a subset of live upgrades on top of a known safe and tested basis working much faster than anyone can try to retrofit complete safety on top of yum/rpm + snapshot tool.

I just tried creating rw btrfs snapshots, mounting them and bind mounting /dev /proc et al in a chroot and doing a dnf update from a clean F20 to current, then fixing up the grub.cfg and fstab, rebooting, and it sorta mostly appears to work. [1][2] Maybe this is better/safer done in a systemd container.
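The experiment described above can be sketched as a script. Everything here is an assumption for illustration: the top-level subvolume is taken to be mounted at `TOP`, the system subvolume is taken to be named `root`, and the function and variable names are hypothetical. The grub.cfg and fstab fix-ups mentioned above are not covered. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: upgrade a writable snapshot in a chroot and leave the
# running root alone. All paths and names are hypothetical.
TOP="${TOP:-/mnt/btrfs-top}"   # top-level subvolume mounted here
NEW="${NEW:-root-next}"        # name of the snapshot to upgrade
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

upgrade_snapshot() {
    run btrfs subvolume snapshot "$TOP/root" "$TOP/$NEW" || return 1
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$TOP/$NEW/$fs" || return 1
    done

    if run chroot "$TOP/$NEW" dnf -y update; then
        status=0   # keep the snapshot; boot entries and fstab still need fixing
    else
        status=1
    fi

    for fs in sys proc dev; do
        run umount "$TOP/$NEW/$fs"
    done
    # Failed upgrade: discard the snapshot; the running system never changed.
    [ "$status" -eq 0 ] || run btrfs subvolume delete "$TOP/$NEW"
    return "$status"
}

upgrade_snapshot
```

The design point is that a failed update costs only a deleted snapshot, and a successful one takes effect at the next convenient reboot.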

...

...
But I see no reason why an implementation couldn't update the snapshot instead of the active parent. If it fails, then clean up the snapshot. If it succeeds, reboot when convenient. Isn't this the OSTree convention? Create a new tree and update it, not the active tree?

Correct. There is an important aspect here, which is that doing all of this safely needs a system which can do *coordinated* changes to both /boot and the root filesystem.

Yes. Snapper on openSUSE is doing this already on Btrfs. I'm not sure how it's dealt with on LVM thinp since /boot has to be outside LVM thinp because while GRUB groks conventional LVM, it doesn't get thinp yet. GRUB does understand /boot on Btrfs, but Fedora's grubby has a problem with it [1]. I've also been making /var/log a separate subvolume making it immune to rootfs snapshots and rollbacks.

...

It needs something aware of bootloaders, kernels, as well as higher level concepts such as the semantics of /etc and /var. OSTree is such a system.

Yes. I started some conversations about this on the grub-devel list and there seems a tentative way to get it to dynamically show snapshots rather than always having to modify grub.cfg. Also they really don't want /boot/grub to be snapshot/rolled back because it's not good for core.img to be out of sync with grub modules (not dissimilar to the kernel on /boot being disconnected from /lib/modules).

Is there a good chance of optimizing OSTree to use LVM thinp and Btrfs snapshots instead of hardlinks, while still being in charge of the proper semantic enforcement?

...

Projects like snapper and the yum fs-snapshot plugin are basically glued to the side of rpm (or dpkg) - they aren't in a position to enforce reliable semantics.

Yes. I also don't think there is just one kind of "rollback", since there can be different contexts. A user rolling back their /home doesn't mean rolling back any other user's, or the system. Conversely, rolling back the system doesn't mean rolling back user /home or logs or some other things.

...

For example, even though the yum fs-snapshot plugin is linked into yum, it isn't ultimately in control of kernel installation - that's just a shell script forked off in a %posttrans. OSTree, when installing a kernel (or making any other change at all), will cleanly handle ENOSPC and unwind, leaving your current system completely untouched. You could then provision new space, and retry the upgrade.

Sounds pretty neat.

Chris Murphy

[1] grubby doesn't update grub.cfg when /boot is on a (nested) Btrfs subvolume. https://bugzilla.redhat.com/show_bug.cgi?id=864198

[2] A bunch of /usr/sbin binaries including sshd and bash have the wrong SELinux label, unconfined_u:object_r:bin_t:s0, and so they fail to work unless I relabel or boot with enforcing=0.

Colin Walters

6:29 a.m.

On Mon, Feb 24, 2014 at 1:11 AM, Chris Murphy lists@colorremedies.com wrote:

...

Yes. Snapper on openSUSE is doing this already on Btrfs. I'm not sure how it's dealt with on LVM thinp since /boot has to be outside LVM thinp because while GRUB groks conventional LVM, it doesn't get thinp yet. GRUB does understand /boot on Btrfs, but Fedora's grubby has a problem with it [1]. I've also been making /var/log a separate subvolume making it immune to rootfs snapshots and rollbacks.

Note for OSTree, /var/lib/rpm -> /usr/share/rpm (it's also immutable). Same for /var/lib/yum.

...

Is there a good chance of optimizing OSTree to use LVM thinp and Btrfs snapshots instead of hardlinks, while still being in charge of the proper semantic enforcement?

Note OSTree already today uses BTRFS_IOC_CLONE if on btrfs for implementing the separate copies of /etc. (Actually this happens via the generic g_file_copy() since https://git.gnome.org/browse/glib/commit/?id=5eba9784979e0b723c05a45cf767046... )

Beyond that though - because for OSTree, /usr is immutable, there isn't really a big advantage of thinp or btrfs snapshots. Just try this right now on your laptop:

# Once for cold cache performance
time cp -al /usr /usr.copy
# And once for hot cache
time cp -al /usr /usr.copy2

For me (and this is a real-world RHEL7 system with a 5.1G /usr):

[root@localhost /]# time cp -al usr usr.copy
real 0m5.199s
user 0m0.220s
sys 0m2.849s
[root@localhost /]# time cp -al usr usr.copy2
real 0m2.245s
user 0m0.166s
sys 0m2.049s

That's really fast enough for the use cases I envision, for now. Obviously FS/block snapshots have other advantages beyond being instant - for example, they don't incur lots of scattered writes to bump the refcounts of inodes. But many systems already have that happening periodically to a lesser degree with the default of relatime anyways.
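To make concrete what `cp -al` produces, here is a small sketch on a scratch directory rather than /usr. It assumes GNU cp and stat, and every path in it is temporary and made up for illustration.

```shell
# Sketch: a hardlink "copy" shares inodes with the original tree.
# Assumes GNU cp and stat; all paths are scratch paths.
demo=$(mktemp -d)
mkdir "$demo/usr"
echo "payload" > "$demo/usr/file"

# -a recurses and preserves attributes, -l hard-links files instead of copying data.
cp -al "$demo/usr" "$demo/usr.copy"

# Both names refer to the same inode, so no file data was duplicated.
orig_inode=$(stat -c %i "$demo/usr/file")
copy_inode=$(stat -c %i "$demo/usr.copy/file")
link_count=$(stat -c %h "$demo/usr/file")
echo "inodes: $orig_inode $copy_inode, link count: $link_count"
```

Only directories and directory entries are created, which is why the copy is quick; the cost that remains is the per-inode link count update mentioned above.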

Where FS/block snapshots become *necessary* is if you have *uncontrolled writes* to /usr. For example, with OSTree's hardlink model, I cannot allow arbitrary rpm %post code to run. Each one has to be carefully audited to break hardlinks via "write new copy, rename" instead of doing edits in place.

This is necessary to allow a story for local software installation. We don't need to do it though for the "pure replication" model where *no* RPM %post runs on client systems - it all happens on the build server.

This replication model is where OSTree is strongest right now and where the traditional package model is weakest, so I have been mainly emphasizing it.

That said, doing this careful auditing of RPM %post and in general laying the foundations for a package-like system on top of OSTree is very much in the long term plans.
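The "write new copy, rename" pattern mentioned above can be sketched as follows. This is an illustration on scratch paths, assuming GNU mktemp and mv; it is not taken from any real %post script.

```shell
# Sketch: why in-place edits are unsafe on a hardlink farm, and how
# "write new copy, rename" avoids the problem. Paths are scratch paths.
work=$(mktemp -d)
mkdir "$work/old" "$work/new"
echo "v1" > "$work/old/app.conf"
ln "$work/old/app.conf" "$work/new/app.conf"   # both trees share one inode

# Unsafe: an in-place write goes through the shared inode and changes both trees.
# echo "v2" > "$work/new/app.conf"

# Safe: write the new content to a temporary file in the same directory,
# then rename it over the target. The rename swaps the directory entry only,
# so the old tree keeps the original inode and content.
tmp=$(mktemp "$work/new/app.conf.XXXXXX")
echo "v2" > "$tmp"
mv -f "$tmp" "$work/new/app.conf"

old_content=$(cat "$work/old/app.conf")
new_content=$(cat "$work/new/app.conf")
echo "old tree: $old_content, new tree: $new_content"
```

A script that used the commented-out redirect instead would silently modify the tree that is supposed to be the rollback target, which is the reason each %post would need auditing.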

...

Yes, and I also don't think of "rollback" as one single thing, since there can be different contexts. A user rolling back their /home doesn't mean rolling back any other user's /home, or the system. Conversely, rolling back the system doesn't mean rolling back user /home, logs, or some other things.

Definitely. OSTree doesn't touch /home (note this is now /var/home) - and so it makes a lot of sense to still have something that's more like a backup system. Particularly a backup system that knew to take a backup before OSTree upgrades.

That's where using BTRFS or thinp in *combination* with OSTree is really nice - that total freedom to do whatever you want at the block layer means you can choose to have /home (/var/home) on a separate partition and do thinp snapshots of it. Or use BTRFS's per-subvolume RAID to say you want RAID0 for /, and RAID1 for /home.

To answer your question in another way then - I'll definitely be fast to take advantage of any new APIs added by the storage layer to *transparently* make things better for OSTree. But I don't want to mandate any particular partition layout or FS/block level layout, because I think it takes away too much administrator flexibility.

Chris Murphy

13 Mar

11:16 p.m.

On Feb 24, 2014, at 5:29 AM, Colin Walters walters@verbum.org wrote:

...

On Mon, Feb 24, 2014 at 1:11 AM, Chris Murphy lists@colorremedies.com wrote:

...
Is there a good chance of optimizing OSTree to use LVM thinp and Btrfs snapshots instead of hardlinks, while still being in charge of the proper semantic enforcement?

Note OSTree already today uses BTRFS_IOC_CLONE if on btrfs for implementing the separate copies of /etc. (Actually this happens via the generic g_file_copy() since https://git.gnome.org/browse/glib/commit/?id=5eba9784979e0b723c05a45cf767046... )

Beyond that though - because for OSTree, /usr is immutable, there isn't really a big advantage of thinp or btrfs snapshots. Just try this right now on your laptop:

# Once for cold cache performance
time cp -al /usr /usr.copy
# And once for hot cache
time cp -al /usr /usr.copy2

For me (and this is a real-world RHEL7 system with a 5.1G /usr):

[root@localhost /]# time cp -al usr usr.copy
real 0m5.199s
user 0m0.220s
sys 0m2.849s
[root@localhost /]# time cp -al usr usr.copy2
real 0m2.245s
user 0m0.166s
sys 0m2.049s

That's really fast enough for the use cases I envision, for now.

How about hard link deletion time? When a tree is to be discarded, we may be talking about tens of thousands of hardlinks being unlinked, right? Snapshot deletion is nearly instant (some background clean-up does happen).

What about permissions/SELinux policy updates, or relabeling? A hard link can't have different permissions/context than the file it points to. If an update requires a file metadata update, then I'm guessing that to preserve the original state of a tree, this would require creating a copy of the file rather than a hard link?

If so, that brings up a thread I started on the Fedora security list. The gist is asking whether there's a security risk/concern if old binaries with vulnerabilities are persistently available. I'm not sure where OSTree moves old trees, or how obscure the location is. Currently, the way yum-plugin-fs-snapshot and snapper behave, Btrfs snapshots are placed inside the parent, so those old binaries are always available, albeit in a different path. But this could be changed, and it's also possible to mount the old tree (snapshot) with the noexec or nosuid mount option to avoid most (all?) of this concern.

https://lists.fedoraproject.org/pipermail/security/2014-February/001748.html

And yet another topic, loosely related to the needed tree switching semantics and booting. There is a thread on Discoverable Partitions Spec on the systemd list: http://lists.freedesktop.org/archives/systemd-devel/2014-March/017677.html

and later becomes a thread on both systemd and btrfs lists starting here: http://www.spinics.net/lists/linux-btrfs/msg32361.html

I'm kinda liking the part of this being self-describing, usable by bootloaders and systemd, rather than utilities like OSTree, snapper, and so on, having to become familiar with and responsible for updating myriad bootloader configuration scripts, and updating fstab properly. But you know more about these pitfalls so this is mostly a heads up to see if you have some opinions on whether the main two suggestions are better or worse than what we have to deal with now.

Chris Murphy

Colin Walters

14 Mar

7:29 a.m.

On Fri, Mar 14, 2014 at 12:16 AM, Chris Murphy lists@colorremedies.com wrote:

...

How about hard link deletion time? When a tree is to be discarded, we may be talking about tens of thousands of hardlinks being unlinked, right? Snapshot deletion is nearly instant (some background clean-up does happen).

Yep, just try the reverse operation:

cp -al /usr /usr2
time rm -rf /usr2

It's ~8s for me on XFS+LVM on Samsung SSD, which I'm happy enough with. When we get to the point where we're consistently debating update performance and not reliability, I'll be happy.

...

What about permissions/SELinux policy updates, or relabeling? A hard link can't have different permissions/context than the file it points to. If an update requires a file metadata update, then I'm guessing that to preserve the original state of a tree, this would require creating a copy of the file rather than a hard link?

Yes...but the OSTree client and server don't "know" that it's a copy. OSTree is object-based, not delta-based. Every time any RPM changes, we redo a full install, and relabel on the server side, then commit that.

(I do this because I demand there is *zero difference* between a fresh install and an upgrade - this removes a lot of potential package failure modes)

So a permission change will show up as a new object in objects/$checksum.

This does mean that if a file changes permission or SELinux label in an RPM, clients redownload the entire file content. Honestly, this doesn't happen very often, and most individual files are small anyways. Static deltas will address this as well.
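As a rough illustration of why a permission change yields a new object, the sketch below derives an identity from the mode bits plus the content. object_id is a hypothetical helper, and the hash input here is deliberately simplified; OSTree's real object format and checksum input are different.

```shell
# Sketch: an object identity that covers metadata as well as content.
# object_id is a hypothetical helper, not OSTree's actual checksum scheme.
object_id() {
    # Hash the permission bits followed by the file content.
    { stat -c '%a' "$1"; cat "$1"; } | sha256sum | cut -d' ' -f1
}

obj=$(mktemp)
echo "same bytes" > "$obj"
chmod 644 "$obj"
id_before=$(object_id "$obj")
chmod 755 "$obj"
id_after=$(object_id "$obj")

# The content is unchanged, but the identity differs, so a store keyed
# by this checksum would hold two separate objects.
echo "$id_before"
echo "$id_after"
```

With a store keyed this way, a client that already has the old object still has to fetch the new one in full, which matches the redownload behaviour described above.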

...

If so that brings up this thread I brought up on the Fedora security list. The gist is asking whether there's a security risk/concern if old binaries with vulnerabilities are persistently available.

Yep, this came up before: https://bugzilla.gnome.org/show_bug.cgi?id=722984

...

And yet another topic, loosely related to the needed tree switching semantics and booting. There is a thread on Discoverable Partitions Spec on the systemd list: http://lists.freedesktop.org/archives/systemd-devel/2014-March/017677.html

Discoverable Partitions is pretty much orthogonal to OSTree.

...

I'm kinda liking the part of this being self-describing, usable by bootloaders and systemd, rather than utilities like OSTree, snapper, and so on, having to become familiar with and responsible for updating myriad bootloader configuration scripts

OSTree already uses the BLS: http://www.freedesktop.org/wiki/Specifications/BootLoaderSpec/

Once syslinux and u-boot gain awareness of it, then the bootloader-specific code can be dropped from the OSTree core.

...

and updating fstab properly.

OSTree doesn't touch fstab - it writes a new bootloader configuration with a new ostree= argument that points to the correct root.
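For readers who haven't seen one, here is a sketch of what a Boot Loader Specification style entry carrying an ostree= argument might look like, generated from a small shell helper. write_bls_entry and every value passed to or written by it (title, kernel and initrd names, root device, deployment path) are hypothetical placeholders, not what OSTree actually writes.

```shell
# Sketch: write a BLS-style entry whose kernel command line carries an
# ostree= argument. write_bls_entry and all values are hypothetical.
write_bls_entry() {
    entries_dir=$1   # for example <boot>/loader/entries
    name=$2          # entry file name without .conf
    deploy_root=$3   # value for the ostree= argument

    mkdir -p "$entries_dir"
    cat > "$entries_dir/$name.conf" <<EOF
title Example OS ($name)
version 1
linux /vmlinuz-example
initrd /initramfs-example.img
options root=/dev/example rw ostree=$deploy_root
EOF
}
```

Under this scheme, each deployment gets its own entry, so selecting the previous root is a matter of booting the older entry; fstab stays as it is.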

...

But you know more about these pitfalls so this is mostly a heads up to see if you have some opinions on whether the main two suggestions are better or worse than what we have to deal with now.

Ah...which "main two" suggestions and what precisely is "what we have to deal with now"? Are we talking about traditional yum+rpm? yum but on BTRFS? yum on BTRFS but writing to a new snapshot, and then editing the bootloader configuration to make that new snapshot available on the next boot?

The most useful thing I think is to compare *complete systems* (such as the latter vs OSTree). It gets hard to discuss individual pieces of technology that can be combined in many ways.

Alberto Ruiz

17 Mar

7:38 a.m.

FWIW, you don't need to edit the bootloader to point to a new snapshot; you can tell btrfs which snapshot to use by default at mount time.
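A sketch of how that could look with the btrfs command-line tool. The helper subvol_id_for_path and the snapshot path are made up for illustration; `btrfs subvolume list` and `btrfs subvolume set-default` are the real subcommands, and the parsing assumes the usual "ID ... path ..." listing format with no spaces in the path.

```shell
# Sketch: find a snapshot's subvolume ID in `btrfs subvolume list` output.
# subvol_id_for_path is a hypothetical helper; paths with spaces are not handled.
subvol_id_for_path() {
    # Input lines look like: ID 257 gen 10 top level 5 path snapshots/root-1
    awk -v want="$1" '$1 == "ID" && $NF == want { print $2 }'
}

# Intended use, as root on a Btrfs system (not run here):
#   id=$(btrfs subvolume list / | subvol_id_for_path snapshots/root-1)
#   btrfs subvolume set-default "$id" /
```

One caveat: an explicit subvol= or subvolid= mount option, including one passed via rootflags on the kernel command line, takes precedence over the default, so this only avoids bootloader edits if the boot configuration doesn't pin a subvolume.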



participants (11)

Alberto Ruiz
Chris Murphy
Christian Schaller
Colin Walters
drago01
Jaroslav Reznik
Josh Boyer
Matthias Clasen
Michael Catanzaro
Richard Hughes
Stephen Gallagher