= Proposed Self Contained Change: Anaconda LVM RAID =
https://fedoraproject.org/wiki/Changes/AnacondaLVMRAID
Change owner(s):
* Vratislav Podzimek (Anaconda/Blivet) <vpodzime AT redhat DOT com>
* Heinz Mauelshagen (LVM) <heinzm AT redhat DOT com>
Use LVM RAID instead of LVM on top of MD RAID in the Anaconda installer.
== Detailed Description ==
Currently, when a user chooses LVM (or Thin LVM) partitioning in the Custom Spoke and then sets a RAID level for the VG, Anaconda (and Blivet) create an MD RAID device which is used as a PV for the VG. With this change we are going to use LVM RAID directly instead. That means that all the LVs in that VG will be RAID LVs with the specified RAID level. LVM RAID provides the same functionality as MD RAID (it shares the same kernel code) with better flexibility and additional features expected in the future.
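For illustration only, the difference in the resulting storage stack is roughly the following (device names, sizes and the VG name are made up; these are not the exact calls Anaconda/Blivet make):

  # today: an MD array is created and used as the single PV
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
  pvcreate /dev/md0
  vgcreate fedora /dev/md0
  lvcreate -L 20G -n root fedora

  # with this change: the partitions themselves are the PVs and every LV is a RAID LV
  pvcreate /dev/sda2 /dev/sdb2
  vgcreate fedora /dev/sda2 /dev/sdb2
  lvcreate --type raid1 -m 1 -L 20G -n root fedora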
== Scope ==
* Proposal owners:
-- Blivet developers: Support creation of LVM RAID in a similar way as LVM on top of MD RAID. (Creation of RAID LVs is already supported.)
-- Anaconda developers: Use the new way to create LVM RAID instead of creating LVM on top of MD RAID.
-- LVM developers: LVM RAID already has all features required by this change.
* Other developers: N/A (not a System Wide Change)
* Release engineering:
* List of deliverables: N/A (not a System Wide Change)
* Policies and guidelines:
* Trademark approval:
On Tuesday, 31 January 2017 at 13:13, Jan Kurik wrote:
== Detailed Description ==
Currently, when a user chooses LVM (or Thin LVM) partitioning in the Custom Spoke and then sets a RAID level for the VG, Anaconda (and Blivet) create an MD RAID device which is used as a PV for the VG. With this change we are going to use LVM RAID directly instead. That means that all the LVs in that VG will be RAID LVs with the specified RAID level. LVM RAID provides the same functionality as MD RAID (it shares the same kernel code) with better flexibility and additional features expected in the future.
I'd like to see (a link to) a more comprehensive discussion of the purported advantages of LVM RAID over LVM on MD RAID here.
Regards, Dominik
On Tue, Jan 31, 2017 at 6:52 AM, Dominik 'Rathann' Mierzejewski dominik@greysector.net wrote:
I'd like to see (a link to) a more comprehensive discussion of the purported advantages of LVM RAID over LVM on MD RAID here.
If the user never interacts with the storage stack, it's a wash.
Otherwise, the advantage is that the RAID level is an LV attribute, set at the time the LV is created, which means LVs can have different RAID levels and are resizable within the VG they belong to. So a VG with three disks can have a raid0 LV, a raid1 LV, and a raid5 LV - and they're all resizable.
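For example, something along these lines (VG/LV names and sizes made up):

  # one VG on three PVs, each LV with its own RAID level
  lvcreate --type raid0 -i 3 -L 10G -n scratch vg00
  lvcreate --type raid1 -m 1 -L 10G -n home    vg00
  lvcreate --type raid5 -i 2 -L 10G -n data    vg00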
I think the resize benefit is minor because only Btrfs has online shrink, ext4 can only shrink offline, and XFS doesn't support shrink. Mitigating this means leaving some unused space in the VG (and on all PVs).
As for drawbacks, as a practical matter very few people are familiar with managing LVM RAIDs with LVM tools. While it uses md kernel code, it uses LVM metadata, not mdadm metadata, so mdadm cannot be used at all to manage them.
On Tue, Jan 31, 2017 at 7:33 AM, Chris Adams linux@cmadams.net wrote:
How do LVM RAID volumes get tested? There's a regular cron job for testing MD RAID volumes, but I'm not aware of something like that for LVM RAID.
I'm not aware of an upstream cron job for this; nor one in Fedora.
'echo check > /sys/block/mdX/md/sync_action' works for either mdadm or LVM RAID, as it's kernel code doing the scrub; but LVM does have a command for it: 'lvchange --syncaction {check|repair} vg/raid_lv'
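A rough sketch of a manual scrub with the LVM tooling, assuming a hypothetical vg00/raid_lv:

  lvchange --syncaction check vg00/raid_lv
  # watch progress and the result
  lvs -o +raid_sync_action,raid_mismatch_count vg00/raid_lv
  # only if mismatches were found and need fixing
  lvchange --syncaction repair vg00/raid_lv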
On 01/31/2017 12:02 PM, Chris Murphy wrote:
I'm not aware of an upstream cron job for this; nor one in Fedora.
'echo check > /sys/block/mdX/md/sync_action' works for either mdadm or LVM RAID, as it's kernel code doing the scrub; but LVM does have a command for it: 'lvchange --syncaction {check|repair} vg/raid_lv'
There is something in Fedora that does this. I don't know whether it's automatic in the kernel or there's a cron job. But my md RAID arrays get scanned every once in a while and I didn't do anything to set that up.
On 02/07/2017 12:27 PM, Samuel Sieb wrote:
On 01/31/2017 12:02 PM, Chris Murphy wrote:
On Tue, Jan 31, 2017 at 7:33 AM, Chris Adams linux@cmadams.net wrote:
How do LVM RAID volumes get tested? There's a regular cron job for testing MD RAID volumes, but I'm not aware of something like that for LVM RAID.
I'm not aware of an upstream cron job for this; nor one in Fedora.
'echo check > /sys/block/mdX/md/sync_action' works for either mdadm or LVM RAID, as it's kernel code doing the scrub; but LVM does have a command for it 'lvchange --syncaction {check|repair) vg/raid_lv'
There is something in Fedora that does this. I don't know whether it's automatic in the kernel or there's a cron job. But my md RAID arrays get scanned every once in a while and I didn't do anything to set that up.
It's /etc/cron.d/raid-check from mdadm.
On Tue, 7 Feb 2017 12:27:20 -0800 Samuel Sieb samuel@sieb.net wrote:
There is something in Fedora that does this. I don't know whether it's automatic in the kernel or there's a cron job. But my md RAID arrays get scanned every once in a while and I didn't do anything to set that up.
% rpm -qf /etc/cron.d/raid-check
mdadm-3.4-2.fc25.x86_64
% cat /etc/cron.d/raid-check
# Run system wide raid-check once a week on Sunday at 1am by default
0 1 * * Sun root /usr/sbin/raid-check
/etc/sysconfig/raid-check has the config. I think by default it's not set to do anything, but perhaps something updates this on install.
kevin
My opinion of the change is limited to whether gnome-shell will still notify the user of device failures; if not, or if unknown, then I think the change should be rejected, as the lack of device failure notifications in the DE is a regression that outweighs the benefit.
Chris Murphy
On Tue, 2017-02-07 at 14:50 -0700, Chris Murphy wrote:
My opinion of the change is limited to whether gnome-shell will still notify the user of device failures; if not, or if unknown, then I think the change should be rejected, as the lack of device failure notifications in the DE is a regression that outweighs the benefit.
Any idea how this is working right now? AFAICT, there's no mechanism for this other than a) running 'mdadm --monitor' or b) watching the journal for log messages of a particular format.
We could implement a similar thing for LVM RAID.
Once upon a time, Vratislav Podzimek vpodzime@redhat.com said:
Any idea how this is working right now? AFAICT, there's no mechanism for this other than a) running 'mdadm --monitor' or b) watching the journal for log messages of a particular format.
We could implement a similar thing for LVM RAID.
Well, all the world is not GNOME or any DE for that matter; mdadm has the mdmonitor service (enabled by default, I believe) that will send email about MD RAID failures. This is perfectly functional for a server, so there should be some functionally equivalent replacement for LVM RAID.
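For reference, I believe the existing MD path is wired up roughly like this (paths shown only as illustration):

  # mdmonitor mails whatever MAILADDR points to in mdadm.conf
  grep MAILADDR /etc/mdadm.conf
  # the service itself is basically a wrapper around 'mdadm --monitor --scan'
  systemctl status mdmonitor.service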
On Tue, 2017-02-21 at 07:52 -0600, Chris Adams wrote:
Well, all the world is not GNOME or any DE for that matter; mdadm has the mdmonitor service (enabled by default, I believe) that will send email about MD RAID failures. This is perfectly functional for a server, so there should be some functionally equivalent replacement for LVM RAID.
Sure thing, could you please file an RFE/bug report at
https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper
Thanks!
Once upon a time, Jan Kurik jkurik@redhat.com said:
LVM RAID provides the same functionality as MD RAID (it shares the same kernel code) with better flexibility and additional features expected in the future.
How do LVM RAID volumes get tested? There's a regular cron job for testing MD RAID volumes, but I'm not aware of something like that for LVM RAID.
On Jan 31, 2017 6:34 AM, "Chris Adams" linux@cmadams.net wrote:
How do LVM RAID volumes get tested? There's a regular cron job for testing MD RAID volumes, but I'm not aware of something like that for LVM RAID.
For that matter, how do you administer it? For MD RAID, there's mdadm. Does LVM RAID have a fully-featured equivalent?
On Tue, 2017-01-31 at 07:05 -0800, Andrew Lutomirski wrote:
For that matter, how do you administer it? For MD RAID, there's mdadm. Does LVM RAID have a fully-featured equivalent?
Yes, it's administered with the LVM commands as described in 'man lvmraid'. I understand that some guide comparing the common 'mdadm' commands with 'lvm' commands could be useful though (CC'ing LVM guys).
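A few examples of what that looks like in practice (VG/LV and device names made up; lvmraid(7) has the full picture):

  # show RAID sub-LVs, sync state and which PVs they sit on
  lvs -a -o name,segtype,sync_percent,devices vg00
  # after a PV failure: add a spare PV, then rebuild the broken leg
  vgextend vg00 /dev/sdc1
  lvconvert --repair vg00/raid_lv
  # or proactively move a leg off a suspect device
  lvconvert --replace /dev/sdb1 vg00/raid_lv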
On Tue, 2017-01-31 at 08:33 -0600, Chris Adams wrote:
How do LVM RAID volumes get tested? There's a regular cron job for testing MD RAID volumes, but I'm not aware of something like that for LVM RAID.
Already answered by Chris Murphy, I believe. There's no cron job for LVM RAID specifically right now, but it should be easy to add one.
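Something as trivial as this sketch would probably do (LV name made up, and it would need to iterate over all RAID LVs in practice):

  # /etc/cron.d/lvm-raid-check
  0 1 * * Sun root /usr/sbin/lvchange --syncaction check vg00/raid_lv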
I would like the "User Experience" section to be fleshed-out a bit more. Currently it says "There should be no visible change for non-expert users. Expert users could make use of the new LVM RAID's features."
I think, though, there's plenty of middle ground here: users who are not experts in LVM RAID yet, but who would like to have data redundancy and need to manage it.
The Documentation section mentions the installation guide, which is great, but what about documentation for replacing disks or adding to the set?
Once upon a time, Matthew Miller mattdm@fedoraproject.org said:
I would like the "User Experience" section to be fleshed-out a bit more. Currently it says "There should be no visible change for non-expert users. Expert users could make use of the new LVM RAID's features."
At a minimum, it would be nice to have a list of common tasks, especially in a comparison format (e.g. "Replace a failed drive" with MD and LVM commands). Maybe this already exists somewhere and could just be referenced.
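A rough side-by-side sketch for that particular task (all device and VG/LV names made up):

  # MD RAID: fail and remove the member, add the replacement
  mdadm /dev/md0 --fail /dev/sdb1
  mdadm /dev/md0 --remove /dev/sdb1
  mdadm /dev/md0 --add /dev/sdc1

  # LVM RAID: add a replacement PV, rebuild, drop the dead PV
  vgextend vg00 /dev/sdc1
  lvconvert --repair vg00/raid_lv
  vgreduce --removemissing vg00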
I think one thing that's a bit of an annoyance is that LVM RAID is done at the LV layer, so you have to remember to set it up for every LV you create, vs. MD RAID where the whole VG will have RAID (this can be useful in some situations, but probably not the common case). This should be pointed out in the release notes IMHO.
Also: does this apply to /boot partition RAID 1? IIRC that didn't work with LVM RAID at one time.
On 01/31/2017 03:44 PM, Chris Adams wrote:
Also: does this apply to /boot partition RAID 1? IIRC that didn't work with LVM RAID at one time.
This is important: mdraid should still be available in Anaconda and not be entirely replaced by LVM RAID, not only for the /boot partition but also for a UEFI boot partition on mdraid 1.
On Tue, 2017-01-31 at 15:52 -0400, Robert Marcano wrote:
This is important: mdraid should still be available in Anaconda and not be entirely replaced by LVM RAID, not only for the /boot partition but also for a UEFI boot partition on mdraid 1.
My understanding is that this only applies to "LVM on RAID" layouts. If you create /boot in anaconda and set the device type to "Software RAID" it will be created and managed with mdadm. It's only if you set a RAID level on an LVM volume group that you will be affected by this change.
Change owners/sponsors, please correct me if I'm wrong.
David
On Tue, Jan 31, 2017 at 12:52 PM, Robert Marcano robert@marcanoonline.com wrote:
This is important: mdraid should still be available in Anaconda and not be entirely replaced by LVM RAID, not only for the /boot partition but also for a UEFI boot partition on mdraid 1.
Yeah, mdadm raid1 for EFI System Partitions is broken eight ways to Sunday and is explicitly not recommended by upstream. It's an accepted, hideously bad hack only because the alternative is more work than anyone has been willing to volunteer the time to do correctly. The best production way of doing this is firmware RAID (i.e. imsm metadata, which mdadm supports and which likewise uses md kernel support).
On Tue, 2017-01-31 at 15:52 -0400, Robert Marcano wrote:
This is important: mdraid should still be available in Anaconda and not be entirely replaced by LVM RAID, not only for the /boot partition but also for a UEFI boot partition on mdraid 1.
That's not going to happen. This change only concerns the scenario where the user chooses to have LVM on top of RAID (IOW, sets a RAID level for their VG).
On Tue, 2017-01-31 at 13:44 -0600, Chris Adams wrote:
At a minimum, it would be nice to have a list of common tasks, especially in a comparison format (e.g. "Replace a failed drive" with MD and LVM commands). Maybe this already exists somewhere and could just be referenced.
I'm not aware of any such document. CC'ing LVM guys to correct me if I'm wrong.
I think one thing that's a bit of an annoyance is that LVM RAID is done at the LV layer, so you have to remember to set it up for every LV you create, vs. MD RAID where the whole VG will have RAID (this can be useful in some situations, but probably not the common case). This should be pointed out in the release notes IMHO.
It's a good idea to mention this in the release notes, thanks! However, a linear LV can simply be converted to a RAID LV later, so at least it's not fatal.
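For example (hypothetical names), turning an existing linear LV into a two-way mirror:

  lvconvert --type raid1 -m 1 vg00/home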
Also: does this apply to /boot partition RAID 1? IIRC that didn't work with LVM RAID at one time.
/boot on LVM is not supported right now, so this does not concern /boot at all.
On Tue, 2017-01-31 at 14:28 -0500, Matthew Miller wrote:
I would like the "User Experience" section to be fleshed-out a bit more. Currently it says "There should be no visible change for non-expert users. Expert users could make use of the new LVM RAID's features."
I think, though, there's plenty of middle ground here: users who are not experts in LVM RAID yet, but who would like to have data redundancy and need to manage it.
Sure, for them we need some nice overview of common 'mdadm' commands and their 'lvm' equivalents. CC'ing LVM guys here.
The Documentation section mentions the installation guide, which is great, but what about documentation for replacing disks or adding to the set?
ditto
On Tue, 2017-01-31 at 13:13 +0100, Jan Kurik wrote:
= Proposed Self Contained Change: Anaconda LVM RAID = https://fedoraproject.org/wiki/Changes/AnacondaLVMRAID
- Release engineering:
Please ensure upgrades of systems using MD RAID are properly tested.
My server at home broke on upgrading to Fedora 22 (#1201962), and also on upgrading to Fedora 20 before that (IIRC). This implies that even when MD RAID was still being used by default, upgrades weren't very well-tested. With a move away from MD to LVM RAID, I'm concerned that things will only get worse. So let's please ensure that we have proper test coverage for existing systems.
On Wed, Feb 1, 2017 at 4:55 AM, David Woodhouse dwmw2@infradead.org wrote:
Please ensure upgrades of systems using MD RAID are properly tested.
Please help test upgrades if there's a layout you want to work. The only way anything gets tested is if someone does the test.
My server at home broke on upgrading to Fedora 22 (#1201962), and also on upgrading to Fedora 20 before that (IIRC).
That's a while ago, the system upgrade method is different now. At least on workstation it's using systemd offline update to do the major version upgrade, same as minor updates. So if it's able to do minor updates without breaking, it should be able to do the major one successfully. Whether there may be a bug that prevents a successfully upgraded system from booting - well that's what testing is for.
So let's please ensure that we have proper test coverage for existing systems.
Please describe your proposal for ensuring proper test coverage, in particular if you personally aren't able to test what is by definition a custom layout?
Please don't drop me from Cc when replying. I know the list has a misguided setup, but mailers can be configured to ignore that. Thanks.
http://david.woodhou.se/reply-to-list.html
On Wed, 2017-02-01 at 12:13 -0700, Chris Murphy wrote:
That's a while ago, the system upgrade method is different now. At least on workstation it's using systemd offline update to do the major version upgrade, same as minor updates. So if it's able to do minor updates without breaking, it should be able to do the major one successfully. Whether there may be a bug that prevents a successfully upgraded system from booting - well that's what testing is for.
I'm not sure the upgrade method matters, does it? In both cases I think it was changes to dracut and the way raid was assembled (perhaps moving from automatically in the kernel to doing it in userspace, or vice versa).
Please describe your proposal for ensuring proper test coverage, in particular if you personally aren't able to test what is by definition a custom layout?
Nono, this *isn't* a custom layout. It's a fairly standard RAID setup. But if we change the defaults, then I suppose that retrospectively *makes* it a "custom layout"... at least in the sense that we can reasonably expect it to keep breaking every release or two :(
On Thu, Feb 2, 2017 at 3:38 AM, David Woodhouse dwmw2@infradead.org wrote:
I'm not sure the upgrade method matters, does it? In both cases I think it was changes to dracut and the way raid was assembled (perhaps moving from automatically in the kernel to doing it in userspace, or vice versa).
The upgrade method does matter because fedup had its own dracut variant. Systemd offline updates (used also by upgrades) uses the existing initramfs prior to the upgrade and a special, minimalist, boot target. So if the system boots normally before the update or upgrade, it should boot and assemble fine for the offline update. It could fail following a successful upgrade however - if there's some new previously undiscovered bug.
Nono, this *isn't* a custom layout. It's a fairly standard RAID setup.
All RAID setups are custom layouts insofar as they're only created with custom partitioning. There is only one default layout used by the installer.
But if we change the defaults, then I suppose that retrospectively *makes* it a "custom layout"... at least in the sense that we can reasonably expect it to keep breaking every release or two :(
It suggests it's not getting enough testing and asking someone else to test it probably will have no effect. Again, if there's a particular layout you want to make sure is working, you need to test it, or maybe write up an openQA test for it, so it can get automatically tested by a bot.
On 2 February 2017 9:06:32 pm GMT+00:00, Chris Murphy lists@colorremedies.com wrote:
The upgrade method does matter because fedup had its own dracut variant. Systemd offline updates (used also by upgrades) uses the existing initramfs prior to the upgrade and a special, minimalist, boot target. So if the system boots normally before the update or upgrade, it should boot and assemble fine for the offline update. It could fail following a successful upgrade however - if there's some new previously undiscovered bug.
Fedup has not existed for a few releases now. The upgrades are handled by dnf alone.
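I.e. roughly (the target release number is just an example):

  dnf install dnf-plugin-system-upgrade
  dnf system-upgrade download --releasever=26
  dnf system-upgrade reboot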
Dennis
Actually, I've got a concern about this feature.
I'm not sure what gnome-shell depends on for monitoring mdadm arrays, but I know it will put up a notification if an mdadm member becomes faulty; because I've seen this notification. Maybe it's getting this information from udisksd? In any case, I'm wondering whether the user will still be notified of device failures in gnome-shell with this change?
Chris Murphy
On Wed, 2017-02-01 at 12:25 -0700, Chris Murphy wrote:
Actually, I've got a concern about this feature.
I'm not sure what gnome-shell depends on for monitoring mdadm arrays, but I know it will put up a notification if an mdadm member becomes faulty; because I've seen this notification. Maybe it's getting this information from udisksd? In any case, I'm wondering whether the user will still be notified of device failures in gnome-shell with this change?
I'm not seeing any code taking care of MD RAID device monitoring in udisks2. So it has to be somewhere else. Any ideas where to look? gvfs maybe?
On Tue, Feb 21, 2017 at 6:34 AM, Vratislav Podzimek vpodzime@redhat.com wrote:
I'm not seeing any code taking care of MD RAID device monitoring in udisks2. So it has to be somewhere else. Any ideas where to look? gvfs maybe?
I'm not sure. With Fedora 25 this notification functionality is not happening. I'm not sure what or when the regression happened, or even whether my testing is flawed. I did this in a virt-manager VM: removing the virtual block device and then booting, I get no notification. If I boot with both devices and inside the VM I do:
# echo 1 > /sys/block/sdb/device/delete
I see complaints in kernel messages indicating the array is now degraded, and udisks does pick up this fact:
[ 80.911812] localhost.localdomain udisksd[1393]: Unable to resolve /sys/devices/virtual/block/md127/md/dev-sdb2/block symlink
[ 80.912156] localhost.localdomain udisksd[1393]: Unable to resolve /sys/devices/virtual/block/md127/md/dev-sdb2/block symlink
[ 80.912284] localhost.localdomain udisksd[1393]: Unable to resolve /sys/devices/virtual/block/md126/md/dev-sdb1/block symlink
[ 80.912414] localhost.localdomain udisksd[1393]: Unable to resolve /sys/devices/virtual/block/md126/md/dev-sdb1/block symlink
But still no notification in GNOME Shell. I don't know if there's something GNOME Shell is expecting to get additionally from udisksd, if so it might actually be a storaged regression. In that case the solution might need to be abstracted from GNOME Shell in storaged anyway, because why reinvent this particular wheel in the DE? We have degradedness notifications needed for mdadm, LVM, and Btrfs (and even ZFS if we're asking for unicorns).
I guess I need to try this with Fedora 24 and see if it might be a storaged regression.
Tested Fedora 24 and Fedora 23, there's no notification with either one of those. I have no idea if this is a VM thing. Or if it's a regression.
Maybe the notification was depending on smartd rather than mdadm. Kinda need someone who knows more about how GNOME Shell handles faulty devices - how it's intended to work at least.
Chris Murphy
On Tue, 2017-02-21 at 14:29 -0700, Chris Murphy wrote:
Tested Fedora 24 and Fedora 23, there's no notification with either one of those. I have no idea if this is a VM thing. Or if it's a regression.
Maybe the notification was depending on smartd rather than mdadm. Kinda need someone who knows more about how GNOME Shell handles faulty devices - how it's intended to work at least.
SMART-signaled failures are propagated/signaled by udisksd.
On 02/22/2017 09:37 AM, Vratislav Podzimek wrote:
SMART-signaled failures are propagated/signaled by udisksd.
SMART is very good but not conclusive: we've seen RAID failures where disks just die or become unresponsive, without anything wrong indicated in SMART. Backblaze did a series of hard drive reliability reports that document that as well; they are a great read if you haven't seen them: https://www.backblaze.com/blog/?s=reliability
On Wed, Feb 22, 2017 at 8:12 AM, Przemek Klosowski przemek.klosowski@nist.gov wrote:
SMART is very good but not conclusive: we've seen RAID failures where disks just die or become unresponsive, without anything wrong indicated in SMART. Backblaze did a series of hard drive reliability reports that document that as well; they are a great read if you haven't seen them: https://www.backblaze.com/blog/?s=reliability
I'm pretty certain the notification I got from GNOME Shell was related to the array itself. That suggests it's not a smartd initiated notification. But at the moment I can't reproduce this by deleting an array member device using sysfs. The array does go degraded, there are numerous kernel messages as such, but no GNOME notification.
On Tue, 2017-02-21 at 12:32 -0700, Chris Murphy wrote:
I'm not sure. With Fedora 25 this notification functionality is not happening. I'm not sure what or when the regression happened, or even whether my testing is flawed. I did this in a virt-manager VM: removing the virtual block device and then booting, I get no notification. If I boot with both devices and inside the VM I do:
# echo 1 > /sys/block/sdb/device/delete
I see complaints in kernel messages indicating the array is now degraded, and udisks does pick up this fact:
[ 80.911812] localhost.localdomain udisksd[1393]: Unable to resolve /sys/devices/virtual/block/md127/md/dev-sdb2/block symlink
[ 80.912156] localhost.localdomain udisksd[1393]: Unable to resolve /sys/devices/virtual/block/md127/md/dev-sdb2/block symlink
[ 80.912284] localhost.localdomain udisksd[1393]: Unable to resolve /sys/devices/virtual/block/md126/md/dev-sdb1/block symlink
[ 80.912414] localhost.localdomain udisksd[1393]: Unable to resolve /sys/devices/virtual/block/md126/md/dev-sdb1/block symlink
Yeah, but that's not really figuring out that RAID is degraded. These are just error messages generated by udisksd still thinking the RAID is okay. So no signal emitted by udisksd here.
But still no notification in GNOME Shell. I don't know if there's something GNOME Shell is expecting to get additionally from udisksd, if so it might actually be a storaged regression. In that case the solution might need to be abstracted from GNOME Shell in storaged anyway, because why reinvent this particular wheel in the DE? We have degradedness notifications needed for mdadm, LVM, and Btrfs (and even ZFS if we're asking for unicorns).
Yeah, this needs some unified, generic solution. My suggestion was to use journal for this with some special key-value pair indicating that the message is actually a storage failure. That should be easy to do now that we have structured logging. And it could benefit from all the existing mechanisms developed for logging (outside of VM, log gathering,...).
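A rough sketch of what I mean (STORAGE_FAILURE is a made-up field name, nothing consumes it today):

  # a monitoring daemon could emit a structured entry like this
  printf 'MESSAGE=RAID LV vg00/raid_lv is degraded\nSTORAGE_FAILURE=1\n' | logger --journald
  # and anything (GNOME Shell, a mail hook, ...) could match on the field
  journalctl STORAGE_FAILURE=1 -o verbose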
I guess I need to try this with Fedora 24 and see if it might be a storaged regression.
That's very unlikely.