Hi,
Is anyone else seeing problems with ext4 on F16? I recently moved my install onto a new hard disc (which reports SMART as fine after a full test) by making a new ext4 filesystem (previously ext3) and then rsyncing across. I'm now seeing occasional system 'freezes': not hard freezes, but programs can fail to start or can hang. Yesterday the desktop stopped completely, but I was able to switch to a virtual console. dmesg reports errors about failing to write the journal and remounting read-only. On reboot an fsck is required, which finds some fixable problems. Obviously I'm wondering whether it's the OS or the new disc. I was planning an F18 install soon, which might help with one problem but not the other. I don't know if it's the 'ext4 data corruption bug' or whether that's been fixed. http://www.phoronix.com/scan.php?page=news_item&px=MTIxNDQ
On 01/22/2013 12:08 PM, Ian Malone wrote:
[...]
Try running memtest for a few hours. If the machine is not completely stable (e.g. bad RAM), your rsync copy could have corrupted random bits in executable files and created more instability than before (when maybe it was not noticeable).
Also try rsyncing some data several times with -c (compare by checksum). If files magically change for no reason, the RAM (or CPU, power supply, ...) is not OK.
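The repeated-checksum idea can be sketched in shell. This is a minimal sketch, not rsync itself: it swaps in sha256sum (coreutils) to hash each file in a source tree and in its copy, and prints the files whose hashes differ. The function name changed_files and its directory arguments are made up for illustration. With rsync, `rsync -rcn --itemize-changes SRC/ DST/` does a comparable checksum-only dry run.

```shell
#!/bin/sh
# changed_files SRC DST
# Print every regular file under SRC whose SHA-256 differs from the file
# at the same relative path under DST (or which is missing from DST).
# Hypothetical helper; assumes sha256sum from coreutils is installed.
changed_files() {
    src=$1
    dst=$2
    (cd "$src" && find . -type f) | while IFS= read -r f; do
        if [ ! -f "$dst/$f" ]; then
            printf '%s\n' "$f"
            continue
        fi
        a=$(sha256sum < "$src/$f")
        b=$(sha256sum < "$dst/$f")
        [ "$a" = "$b" ] || printf '%s\n' "$f"
    done
}
```

Run it a few times over a large directory that nothing is writing to. An empty result every time is what healthy hardware should give; a list that changes from run to run would point at RAM, CPU or power rather than the filesystem.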
On 22 January 2013 11:32, Roberto Ragusa mail@robertoragusa.it wrote:
[...]
Thanks, I can try memtest for stability issues, though I'd be surprised if changes potentially introduced by rsync could affect the target filesystem stability in this way. Should be able to run an rpm verify as well to check for damage like that.
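A sketch of that rpm check, assuming the usual `rpm -Va` output in which each line starts with a flag string such as `S.5....T.`, the third character is `5` when a file's digest differs from the package database, and a `c` in the next column marks a config file. digest_failures is a made-up helper name; it keeps only non-config files whose contents changed, which is what random bit damage from a bad copy would look like.

```shell
#!/bin/sh
# digest_failures: read `rpm -Va` output on stdin and print the lines for
# non-config files whose digest no longer matches the RPM database.
# Hypothetical helper; assumes the flag string is the first field.
digest_failures() {
    awk 'length($1) >= 8 && substr($1, 3, 1) == "5" && $2 != "c"'
}

# Usage (slow, it reads every packaged file; best run as root):
#   rpm -Va | digest_failures
```

Config files are skipped because they are expected to differ from the packaged version on any system that has been configured.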
Ian Malone wrote:
[...]
I have multiple TB running on FC16 + ext4 and have not seen any problems of any kind with the filesystem. I do occasionally see a hang. This happens with rsync if you copy something large to a machine with significant memory: at the end rsync does a sync(), I believe (if it were doing fsync() I think the pain would be spread out), and the system has to empty the I/O buffers to disk. A typical single disk will write at 50-100MB/s, so if you have 4GB or more of data backed up in the buffers, it will get slow. There are disk tuning tips you can use to spread the pain, but in many cases they make copying small files slower.
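To put rough numbers on that stall: the kernel reports how much written data is still waiting in memory on the `Dirty:` line of /proc/meminfo (in kB), and dividing by the disk's write speed gives a ballpark for how long a sync() could block. The sketch below is only that arithmetic; dirty_mb and flush_seconds are illustrative names, and the 50-100MB/s figures are the ones quoted above.

```shell
#!/bin/sh
# dirty_mb [MEMINFO]: print the amount of dirty (not yet written) page
# cache in MB, read from /proc/meminfo or a file in the same format.
dirty_mb() {
    awk '/^Dirty:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# flush_seconds MB RATE: whole seconds needed to write MB megabytes at
# RATE MB/s (integer division, so the result is rounded down).
flush_seconds() {
    echo $(( $1 / $2 ))
}

# Usage: flush_seconds "$(dirty_mb)" 50
```

For 4GB (4096MB) of buffered data this works out to roughly 80 seconds at 50MB/s and 40 seconds at 100MB/s, which is long enough to look like a freeze.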
If you have a lot of small files, your journal may be full of incomplete transactions. If so, the solution is to make the journal larger; I don't know whether modifying the journal after the fact will change that, or even whether it's the problem. I have played with putting the journal on a faster device: a 10k rpm drive, an SSD, or memory (yes, just for testing; I know it's unsafe). A faster journal makes things faster with many small files.
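On the after-the-fact question: e2fsprogs can drop and recreate the journal of an ext3/ext4 filesystem with tune2fs, provided the filesystem is unmounted (for the root filesystem that means working from a live CD). The sketch below only prints the commands rather than running them; journal_resize_plan, the device name and the size are placeholders, and whether a bigger journal helps in this case is untested.

```shell
#!/bin/sh
# journal_resize_plan DEVICE SIZE_MB
# Print (do not run) the steps to recreate the journal of an unmounted
# ext3/ext4 filesystem with a new size in megabytes.
# Hypothetical helper; DEVICE and SIZE_MB are supplied by the caller.
journal_resize_plan() {
    dev=$1
    size=$2
    cat <<EOF
e2fsck -f $dev
tune2fs -O ^has_journal $dev
tune2fs -J size=$size $dev
e2fsck -f $dev
EOF
}

# Usage: journal_resize_plan /dev/sdXN 256
```

Printing the plan first is deliberate: removing the journal is not something to script blindly on a disk that is already misbehaving.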
Lots of thoughts for you to check against your particular problems.
On 22 January 2013 17:28, Bill Davidsen davidsen@tmr.com wrote:
[...]
Is there any way to check the journal for incomplete transactions? Except immediately after a crash, the partition comes up clean in an e2fsck from a live CD. The machine made three clean passes through memtest, so I don't think that's the issue.
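One possible indicator: an ext3/ext4 filesystem whose journal still holds transactions to replay carries the needs_recovery flag on the "Filesystem features" line printed by `dumpe2fs -h`. A mounted filesystem shows the flag too, so it only means something when checked unmounted, e.g. from the live CD before e2fsck runs. journal_needs_recovery below is a made-up helper that looks for the flag in that output.

```shell
#!/bin/sh
# journal_needs_recovery: read `dumpe2fs -h DEVICE` output on stdin and
# succeed (exit 0) if the needs_recovery feature flag is set.
# Hypothetical helper name; the flag name is the one e2fsprogs prints.
journal_needs_recovery() {
    grep -q '^Filesystem features:.*needs_recovery'
}

# Usage, on an unmounted filesystem (device name is a placeholder):
#   dumpe2fs -h /dev/sdXN 2>/dev/null | journal_needs_recovery \
#       && echo "journal has transactions to replay"
```

For a closer look, `debugfs -R logdump /dev/sdXN` dumps the journal contents themselves, though the output takes some knowledge of the journal format to interpret.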
On 22 January 2013 11:08, Ian Malone ibmalone@gmail.com wrote:
[...]
After a clean install of F18 I'm still having problems, though the symptoms are different. dmesg shows "READ FPDMA QUEUED" and "skipping hard reset" type errors. When this happens the entire desktop pauses for a while, then recovers. The exact behaviour seems to have changed over a number of kernel updates (most notably, the system now seems to recover stably after it happens). It's either a hardware problem (though I've tried changing cables and the hard-disc mounting, and the port was working with the previous drive) or a kernel one (possibly a buggy disc NCQ implementation that needs a workaround). I will file a bug, but thought I should follow up here.
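If NCQ is the suspect, one test (an assumption, not something confirmed in this thread) is to turn it off and see whether the errors stop. libata exposes the queue depth per disk in sysfs, and writing 1 to it disables NCQ for that disk until reboot; the kernel parameter libata.force=noncq does the same at boot time. The helper ncq_depth and its optional sysfs-root argument are illustrative.

```shell
#!/bin/sh
# ncq_depth DISK [SYSFS_ROOT]: print the current queue depth of DISK
# (e.g. sda). A value of 1 means NCQ is effectively off; with NCQ active
# the value is larger (commonly 31). SYSFS_ROOT defaults to /sys.
ncq_depth() {
    cat "${2:-/sys}/block/$1/device/queue_depth"
}

# Usage (as root; sda is a placeholder for the affected disk):
#   ncq_depth sda
#   echo 1 > /sys/block/sda/device/queue_depth   # disable NCQ until reboot
#   dmesg | grep -c 'READ FPDMA QUEUED'          # count the errors so far
```

If the error count stops growing with the depth at 1, that narrows things down to the NCQ path (disc firmware, controller or driver) without saying which of those is at fault.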
On 4 Mar 2013 at 13:21, Ian Malone wrote:
Date sent: Mon, 4 Mar 2013 13:21:03 +0000
Subject: Re: fedora 16, ext4 corruption
From: Ian Malone ibmalone@gmail.com
To: Community support for Fedora users users@lists.fedoraproject.org
[...]
I don't know if it is a similar problem, but I had a Fedora 16 machine where a full smartctl scan found no errors, yet the smartctl error count kept climbing. I replaced the disk and still got the errors. I put the old disk in another system: no new problems. I did the same with the new disk: again no problems. Finally I put a PCI SATA controller in the machine and hooked the disk to it, and there were no more errors.
I am assuming it is a problem with the onboard controller, but it is not clear what the problem is, or why there is no error other than the smartctl count.
smartctl -a /dev/sda | grep "ATA Error Count"
The count got up to 2108, but since I changed to the controller card no further issues have been reported and the number has stayed the same.
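To track that number over time rather than eyeballing it, the count can be pulled out of the smartctl output. A small sketch, assuming the line has the form "ATA Error Count: N" as in the grep above; ata_error_count is a made-up helper name and the log file name is a placeholder.

```shell
#!/bin/sh
# ata_error_count: read `smartctl -a DEVICE` output on stdin and print the
# number from the "ATA Error Count:" line. Prints nothing if the line is
# absent, as is typically the case when the error log is empty.
ata_error_count() {
    sed -n 's/^ATA Error Count: *\([0-9][0-9]*\).*/\1/p'
}

# Usage (as root; appends one dated sample per run):
#   echo "$(date +%F) $(smartctl -a /dev/sda | ata_error_count)" >> ata-errors.log
```

A count that keeps rising on one controller and stays flat on another is the same evidence described above, just written down.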
Michael D. Setzer II - Computer Science Instructor
Guam Community College Computer Center
mailto:mikes@kuentos.guam.net
mailto:msetzerii@gmail.com
http://www.guam.net/home/mikes
Guam - Where America's Day Begins
G4L Disk Imaging Project maintainer
http://sourceforge.net/projects/g4l/