[Added CCs for similar ecryptfs warning] On Thu 23-09-10 12:38:49, Andrew Morton wrote:
This started appearing for me on v2.6.36-rc5-49-gc79bd89; it did not happen on v2.6.36-rc5-33-g1ce1e41, probably because it does not have commit 692ebd17c2905313fff3c504c249c6a0faad16ec which introduces the warning. [...] device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342 /dev/mapper/vg_cesarbinspiro-lv_home SELinux: initialized (dev dm-3, type btrfs), uses xattr ------------[ cut here ]------------ WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d() Hardware name: Inspiron N4010 Dirtiable inode bdi default != sb bdi btrfs Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hda_intel iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev snd_seq snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl32 btusb cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi dell_laptop snd pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_page_alloc rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd aes_x86_64 aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan] Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8 Call Trace: [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48 [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177 [<ffffffff81127168>] touch_atime+0x107/0x12a [<ffffffff81122384>] ? filldir+0x0/0xd0 [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4 [<ffffffff8112270b>] sys_getdents+0x81/0xd1 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Thanks for the report. These bdi pointers are a mess. As Chris pointed out, btrfs forgets to properly initialize inode->i_mapping.backing_dev_info for directories and special inodes and thus these were previously attached to default_backing_dev_info which probably isn't what Chris would like to see. I've also got a similar report for ecryptfs which also does not initialize inode->i_mapping.backing_dev_info although it sets sb->s_bdi and thus again its inodes get filed to default_backing_dev_info lists. Quick search seems to reveal that other filesystems using handcrafted bdi's get it wrong as well and thus their inodes end up in the default_backing_dev_info lists which is generally undesirable (this was happening already before my patch, my code just started complaining about that). That suggests that we should probably handle such cases in a more generic way by changing the code in inode_init_always(). The patch below makes at least btrfs happy for me... Could you maybe test it? Thanks.
Honza
On Tue, Sep 28, 2010 at 12:25:48AM +0200, Jan Kara wrote:
[Added CCs for similar ecryptfs warning] On Thu 23-09-10 12:38:49, Andrew Morton wrote:
This started appearing for me on v2.6.36-rc5-49-gc79bd89; it did not happen on v2.6.36-rc5-33-g1ce1e41, probably because it does not have commit 692ebd17c2905313fff3c504c249c6a0faad16ec which introduces the warning. [...] device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342 /dev/mapper/vg_cesarbinspiro-lv_home SELinux: initialized (dev dm-3, type btrfs), uses xattr ------------[ cut here ]------------ WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d() Hardware name: Inspiron N4010 Dirtiable inode bdi default != sb bdi btrfs Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hda_intel iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev snd_seq snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl32 btusb cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi dell_laptop snd pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_page_alloc rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd aes_x86_64 aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan] Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8 Call Trace: [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48 [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177 [<ffffffff81127168>] touch_atime+0x107/0x12a [<ffffffff81122384>] ? filldir+0x0/0xd0 [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4 [<ffffffff8112270b>] sys_getdents+0x81/0xd1 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Thanks for the report. These bdi pointers are a mess. As Chris pointed out, btrfs forgets to properly initialize inode->i_mapping.backing_dev_info for directories and special inodes and thus these were previously attached to default_backing_dev_info which probably isn't what Chris would like to see.
There's no actual writeback for these, so it works fine for btrfs either way.
I've also got a similar report for ecryptfs which also does not initialize inode->i_mapping.backing_dev_info although it sets sb->s_bdi and thus again its inodes get filed to default_backing_dev_info lists. Quick search seems to reveal that other filesystems using handcrafted bdi's get it wrong as well and thus their inodes end up in the default_backing_dev_info lists which is generally undesirable (this was happening already before my patch, my code just started complaining about that). That suggests that we should probably handle such cases in a more generic way by changing the code in inode_init_always(). The patch below makes at least btrfs happy for me... Could you maybe test it? Thanks.
Christoph had a slightly different plan for this, I've cc'd him and kept the patch below for comment.
-chris
Honza
-- Jan Kara jack@suse.cz SUSE Labs, CR
From 29f60c2b08ff9637a10439d1513805835ddcc746 Mon Sep 17 00:00:00 2001 From: Jan Kara jack@suse.cz Date: Mon, 27 Sep 2010 23:56:48 +0200 Subject: [PATCH] bdi: Initialize inode->i_mapping.backing_dev_info to sb->s_bdi
Currently, we initialize inode->i_mapping.backing_dev_info to the bdi of device sb->s_bdev points to. However there is quite a big number of filesystems that do not set sb->s_bdev (because they do not have one) but do set sb->s_bdi. These filesystems would generally benefit from setting inode->i_mapping.backing_dev_info to their s_bdi because otherwise their inodes would point to default_backing_dev_info and thus dirty inode tracking would happen there. So change inode initialization code to use sb->s_bdi if it is available.
Signed-off-by: Jan Kara jack@suse.cz
fs/inode.c | 22 ++++++++++++++-------- 1 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c index 8646433..e415be4 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -172,15 +172,21 @@ int inode_init_always(struct super_block *sb, struct inode *inode) mapping->writeback_index = 0;
/*
* If the block_device provides a backing_dev_info for client
* inodes then use that. Otherwise the inode share the bdev's
* backing_dev_info.
* If the filesystem provides a backing_dev_info for client inodes
*/* then use that. Otherwise inodes share default_backing_dev_info.
- if (sb->s_bdev) {
struct backing_dev_info *bdi;
bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
mapping->backing_dev_info = bdi;
- if (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) {
/*
* Catch cases where filesystem might be bitten by using s_bdi
* instead of sb->s_bdev. Can be removed in 2.6.38.
*/
if (sb->s_bdev) {
struct backing_dev_info *bdi =
sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
WARN(bdi != sb->s_bdi, "s_bdev bdi %s != s_bdi %s\n",
bdi->name, sb->s_bdi->name);
}
} inode->i_private = NULL; inode->i_mapping = mapping;mapping->backing_dev_info = sb->s_bdi;
-- 1.6.4.2
On Mon 27-09-10 18:54:52, Chris Mason wrote:
On Tue, Sep 28, 2010 at 12:25:48AM +0200, Jan Kara wrote:
[Added CCs for similar ecryptfs warning] On Thu 23-09-10 12:38:49, Andrew Morton wrote:
This started appearing for me on v2.6.36-rc5-49-gc79bd89; it did not happen on v2.6.36-rc5-33-g1ce1e41, probably because it does not have commit 692ebd17c2905313fff3c504c249c6a0faad16ec which introduces the warning. [...] device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342 /dev/mapper/vg_cesarbinspiro-lv_home SELinux: initialized (dev dm-3, type btrfs), uses xattr ------------[ cut here ]------------ WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d() Hardware name: Inspiron N4010 Dirtiable inode bdi default != sb bdi btrfs Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hda_intel iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev snd_seq snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl32 btusb cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi dell_laptop snd pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_page_alloc rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd aes_x86_64 aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan] Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8 Call Trace: [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48 [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177 [<ffffffff81127168>] touch_atime+0x107/0x12a [<ffffffff81122384>] ? filldir+0x0/0xd0 [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4 [<ffffffff8112270b>] sys_getdents+0x81/0xd1 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Thanks for the report. These bdi pointers are a mess. As Chris pointed out, btrfs forgets to properly initialize inode->i_mapping.backing_dev_info for directories and special inodes and thus these were previously attached to default_backing_dev_info which probably isn't what Chris would like to see.
There's no actual writeback for these, so it works fine for btrfs either way.
Ah, OK. I see you do all the writing already at inode dirtying time so you don't really care what happens after the dirtying for inodes without page cache. So it's really just a cosmetic issue for btrfs.
I've also got a similar report for ecryptfs which also does not initialize inode->i_mapping.backing_dev_info although it sets sb->s_bdi and thus again its inodes get filed to default_backing_dev_info lists. Quick search seems to reveal that other filesystems using handcrafted bdi's get it wrong as well and thus their inodes end up in the default_backing_dev_info lists which is generally undesirable (this was happening already before my patch, my code just started complaining about that). That suggests that we should probably handle such cases in a more generic way by changing the code in inode_init_always(). The patch below makes at least btrfs happy for me... Could you maybe test it? Thanks.
Christoph had a slightly different plan for this, I've cc'd him and kept the patch below for comment.
OK, thanks.
Honza
From 29f60c2b08ff9637a10439d1513805835ddcc746 Mon Sep 17 00:00:00 2001 From: Jan Kara jack@suse.cz Date: Mon, 27 Sep 2010 23:56:48 +0200 Subject: [PATCH] bdi: Initialize inode->i_mapping.backing_dev_info to sb->s_bdi
Currently, we initialize inode->i_mapping.backing_dev_info to the bdi of device sb->s_bdev points to. However there is quite a big number of filesystems that do not set sb->s_bdev (because they do not have one) but do set sb->s_bdi. These filesystems would generally benefit from setting inode->i_mapping.backing_dev_info to their s_bdi because otherwise their inodes would point to default_backing_dev_info and thus dirty inode tracking would happen there. So change inode initialization code to use sb->s_bdi if it is available.
Signed-off-by: Jan Kara jack@suse.cz
fs/inode.c | 22 ++++++++++++++-------- 1 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c index 8646433..e415be4 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -172,15 +172,21 @@ int inode_init_always(struct super_block *sb, struct inode *inode) mapping->writeback_index = 0;
/*
* If the block_device provides a backing_dev_info for client
* inodes then use that. Otherwise the inode share the bdev's
* backing_dev_info.
* If the filesystem provides a backing_dev_info for client inodes
*/* then use that. Otherwise inodes share default_backing_dev_info.
- if (sb->s_bdev) {
struct backing_dev_info *bdi;
bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
mapping->backing_dev_info = bdi;
- if (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) {
/*
* Catch cases where filesystem might be bitten by using s_bdi
* instead of sb->s_bdev. Can be removed in 2.6.38.
*/
if (sb->s_bdev) {
struct backing_dev_info *bdi =
sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
WARN(bdi != sb->s_bdi, "s_bdev bdi %s != s_bdi %s\n",
bdi->name, sb->s_bdi->name);
}
} inode->i_private = NULL; inode->i_mapping = mapping;mapping->backing_dev_info = sb->s_bdi;
-- 1.6.4.2
On Mon, 2010-09-27 at 18:54 -0400, Chris Mason wrote:
On Tue, Sep 28, 2010 at 12:25:48AM +0200, Jan Kara wrote:
[Added CCs for similar ecryptfs warning] On Thu 23-09-10 12:38:49, Andrew Morton wrote:
This started appearing for me on v2.6.36-rc5-49-gc79bd89; it did not happen on v2.6.36-rc5-33-g1ce1e41, probably because it does not have commit 692ebd17c2905313fff3c504c249c6a0faad16ec which introduces the warning. [...] device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342 /dev/mapper/vg_cesarbinspiro-lv_home SELinux: initialized (dev dm-3, type btrfs), uses xattr ------------[ cut here ]------------ WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d() Hardware name: Inspiron N4010 Dirtiable inode bdi default != sb bdi btrfs Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hda_intel iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev snd_seq snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl32 btusb cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi dell_laptop snd pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_page_alloc rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd aes_x86_64 aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan] Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8 Call Trace: [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48 [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177 [<ffffffff81127168>] touch_atime+0x107/0x12a [<ffffffff81122384>] ? filldir+0x0/0xd0 [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4 [<ffffffff8112270b>] sys_getdents+0x81/0xd1 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Thanks for the report. These bdi pointers are a mess. As Chris pointed out, btrfs forgets to properly initialize inode->i_mapping.backing_dev_info for directories and special inodes and thus these were previously attached to default_backing_dev_info which probably isn't what Chris would like to see.
There's no actual writeback for these, so it works fine for btrfs either way.
Side note: every time inode is marked as dirty, we wake up a bdi thread or the default bdi thread. So if we have inodes which do not need write-back, we should never mark them as dirty.
On Tue 28-09-10 10:05:49, Artem Bityutskiy wrote:
On Mon, 2010-09-27 at 18:54 -0400, Chris Mason wrote:
On Tue, Sep 28, 2010 at 12:25:48AM +0200, Jan Kara wrote:
[Added CCs for similar ecryptfs warning] On Thu 23-09-10 12:38:49, Andrew Morton wrote:
This started appearing for me on v2.6.36-rc5-49-gc79bd89; it did not happen on v2.6.36-rc5-33-g1ce1e41, probably because it does not have commit 692ebd17c2905313fff3c504c249c6a0faad16ec which introduces the warning. [...] device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342 /dev/mapper/vg_cesarbinspiro-lv_home SELinux: initialized (dev dm-3, type btrfs), uses xattr ------------[ cut here ]------------ WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d() Hardware name: Inspiron N4010 Dirtiable inode bdi default != sb bdi btrfs Modules linked in: ipv6 kvm_intel kvm uinput arc4 ecb snd_hda_codec_intelhdmi snd_hda_codec_realtek iwlagn snd_hda_intel iwlcore snd_hda_codec uvcvideo snd_hwdep mac80211 videodev snd_seq snd_seq_device v4l1_compat snd_pcm atl1c v4l2_compat_ioctl32 btusb cfg80211 snd_timer i2c_i801 bluetooth iTCO_wdt dell_wmi dell_laptop snd pcspkr wmi dcdbas shpchp iTCO_vendor_support soundcore snd_page_alloc rfkill joydev microcode btrfs zlib_deflate libcrc32c cryptd aes_x86_64 aes_generic xts gf128mul dm_crypt usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan] Pid: 1073, comm: find Not tainted 2.6.36-rc5+ #8 Call Trace: [<ffffffff8104d0e4>] warn_slowpath_common+0x85/0x9d [<ffffffff8104d19f>] warn_slowpath_fmt+0x46/0x48 [<ffffffff811308b7>] inode_to_bdi+0x62/0x6d [<ffffffff81131b48>] __mark_inode_dirty+0xd0/0x177 [<ffffffff81127168>] touch_atime+0x107/0x12a [<ffffffff81122384>] ? filldir+0x0/0xd0 [<ffffffff8112259b>] vfs_readdir+0x8d/0xb4 [<ffffffff8112270b>] sys_getdents+0x81/0xd1 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Thanks for the report. These bdi pointers are a mess. As Chris pointed out, btrfs forgets to properly initialize inode->i_mapping.backing_dev_info for directories and special inodes and thus these were previously attached to default_backing_dev_info which probably isn't what Chris would like to see.
There's no actual writeback for these, so it works fine for btrfs either way.
Side note: every time inode is marked as dirty, we wake up a bdi thread or the default bdi thread. So if we have inodes which do not need write-back, we should never mark them as dirty.
Are you sure? I think we wake up the thread only when it's the first dirty inode for the bdi... And a side side note ;): It's harder not to dirty the inode than it seems. E.g. btrfs (or similarly ext3) add the new inode data to the journal already at inode dirty time but still they need to track that the transaction carrying the inode is still uncommitted. Thus the inode *is* dirty in some sense. Only it does not need any writeout to happen...
Honza
Here is the patch that I already proposed a while ago. I've tested xfstests on btrfs and xfstests to make sure the btrfs issue is fixed, and I've also tested the original dirtying of device files issue and I/O operations on block device files to test the special case in the patch.
--- From: Christoph Hellwig hch@lst.de Subject: [PATCH] writeback: always use sb->s_bdi for writeback purposes
We currently use struct backing_dev_info for various different purposes. Originally it was introduced to describe a backing device which includes an unplug and congestion function and various bits of readahead information and VM-relevant flags. We're also using for tracking dirty inodes for writeback.
To make writeback properly find all inodes we need to only access the per-filesystem backing_device pointed to by the superblock in ->s_bdi inside the writeback code, and not the instances pointeded to by inode->i_mapping->backing_dev which can be overriden by special devices or might not be set at all by some filesystems.
Long term we should split out the writeback-relevant bits of struct backing_device_info (which includes more than the current bdi_writeback) and only point to it from the superblock while leaving the traditional backing device as a separate structure that can be overriden by devices.
The one exception for now is the block device filesystem which really wants different writeback contexts for it's different (internal) inodes to handle the writeout more efficiently. For now we do this with a hack in fs-writeback.c because we're so late in the cycle, but in the future I plan to replace this with a superblock method that allows for multiple writeback contexts per filesystem.
Signed-off-by: Christoph Hellwig hch@lst.de
Index: linux-2.6/fs/fs-writeback.c =================================================================== --- linux-2.6.orig/fs/fs-writeback.c 2010-09-29 16:58:41.750557721 +0900 +++ linux-2.6/fs/fs-writeback.c 2010-09-29 17:11:35.040557719 +0900 @@ -72,22 +72,10 @@ int writeback_in_progress(struct backing static inline struct backing_dev_info *inode_to_bdi(struct inode *inode) { struct super_block *sb = inode->i_sb; - struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;
- /* - * For inodes on standard filesystems, we use superblock's bdi. For - * inodes on virtual filesystems, we want to use inode mapping's bdi - * because they can possibly point to something useful (think about - * block_dev filesystem). - */ - if (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) { - /* Some device inodes could play dirty tricks. Catch them... */ - WARN(bdi != sb->s_bdi && bdi_cap_writeback_dirty(bdi), - "Dirtiable inode bdi %s != sb bdi %s\n", - bdi->name, sb->s_bdi->name); - return sb->s_bdi; - } - return bdi; + if (strcmp(sb->s_type->name, "bdev") == 0) + return inode->i_mapping->backing_dev_info; + return sb->s_bdi; }
static void bdi_queue_work(struct backing_dev_info *bdi,
On Wed 29-09-10 10:19:36, Christoph Hellwig wrote:
From: Christoph Hellwig hch@lst.de Subject: [PATCH] writeback: always use sb->s_bdi for writeback purposes
...
The one exception for now is the block device filesystem which really wants different writeback contexts for it's different (internal) inodes to handle the writeout more efficiently. For now we do this with a hack in fs-writeback.c because we're so late in the cycle, but in the future I plan to replace this with a superblock method that allows for multiple writeback contexts per filesystem.
Another exception I know about is mtd_inodefs filesystem (drivers/mtd/mtdchar.c).
Signed-off-by: Christoph Hellwig hch@lst.de
Index: linux-2.6/fs/fs-writeback.c
--- linux-2.6.orig/fs/fs-writeback.c 2010-09-29 16:58:41.750557721 +0900 +++ linux-2.6/fs/fs-writeback.c 2010-09-29 17:11:35.040557719 +0900 @@ -72,22 +72,10 @@ int writeback_in_progress(struct backing static inline struct backing_dev_info *inode_to_bdi(struct inode *inode) { struct super_block *sb = inode->i_sb;
struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info;
/*
* For inodes on standard filesystems, we use superblock's bdi. For
* inodes on virtual filesystems, we want to use inode mapping's bdi
* because they can possibly point to something useful (think about
* block_dev filesystem).
*/
if (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) {
/* Some device inodes could play dirty tricks. Catch them... */
WARN(bdi != sb->s_bdi && bdi_cap_writeback_dirty(bdi),
"Dirtiable inode bdi %s != sb bdi %s\n",
bdi->name, sb->s_bdi->name);
return sb->s_bdi;
}
return bdi;
- if (strcmp(sb->s_type->name, "bdev") == 0)
return inode->i_mapping->backing_dev_info;
- return sb->s_bdi;
So at least here you'd need also add a similar exception for "mtd_inodefs". Because of these exeptions I've chosen the (sb->s_bdi && sb->s_bdi != &noop_backing_dev_info) check rather than your exception based check. All in all I don't care much what ends up in the kernel as it's just a temporary solution... Also I've added the warning to catch situations where inodes would get filed to a different backing device after the patch. So far the reported warnings were harmless but still I'm more comfortable when it's there because otherwise we can so easily miss some device-driver-invented filesystem like mtd_inodefs which would break silently after the change...
Honza
On Wed, Sep 29, 2010 at 02:18:08PM +0200, Jan Kara wrote:
On Wed 29-09-10 10:19:36, Christoph Hellwig wrote:
From: Christoph Hellwig hch@lst.de Subject: [PATCH] writeback: always use sb->s_bdi for writeback purposes
...
The one exception for now is the block device filesystem which really wants different writeback contexts for it's different (internal) inodes to handle the writeout more efficiently. For now we do this with a hack in fs-writeback.c because we're so late in the cycle, but in the future I plan to replace this with a superblock method that allows for multiple writeback contexts per filesystem.
Another exception I know about is mtd_inodefs filesystem (drivers/mtd/mtdchar.c).
No, it's not. MTD only has three different backing_dev_info instances which have different flags in the mapping-relevant portion of the backing_dev.
So at least here you'd need also add a similar exception for "mtd_inodefs".
No. For one thing we don't need any exception for correctnes alone - even the block device variant would work fine with the default case. But for mtd specificly we don't need an exception for performance either given that there are no per-device bdis in mtd.
On Wed 29-09-10 16:10:06, Christoph Hellwig wrote:
On Wed, Sep 29, 2010 at 02:18:08PM +0200, Jan Kara wrote:
On Wed 29-09-10 10:19:36, Christoph Hellwig wrote:
From: Christoph Hellwig hch@lst.de Subject: [PATCH] writeback: always use sb->s_bdi for writeback purposes
...
The one exception for now is the block device filesystem which really wants different writeback contexts for it's different (internal) inodes to handle the writeout more efficiently. For now we do this with a hack in fs-writeback.c because we're so late in the cycle, but in the future I plan to replace this with a superblock method that allows for multiple writeback contexts per filesystem.
Another exception I know about is mtd_inodefs filesystem (drivers/mtd/mtdchar.c).
No, it's not. MTD only has three different backing_dev_info instances which have different flags in the mapping-relevant portion of the backing_dev.
In the end I agree I was probably wrong but it's not that simple ;)
So at least here you'd need also add a similar exception for "mtd_inodefs".
No. For one thing we don't need any exception for correctnes alone - even the block device variant would work fine with the default case.
Here I don't agree. If you don't have some kind of exception, sb->s_bdi for both "block" and "mtd_inodefs" filesystems points to noop_backing_dev_info and you get no writeback for that one. So it isn't just a performance issue but also a correctness one.
Regarding mtd_inodefs I now looked in more detail what MTD actually does and it seems to me that MTD device inodes do not seem to carry any cached state that flusher threads could write back. So returning noop_backing_dev_info might be the right thing for them after all... (added David Woodhouse and MTD list to CC so that they can shout if it's not the case). Coming to this conclusion, I'm happy with your patch going in as is... Honza
On Thu, Sep 30, 2010 at 01:38:07AM +0200, Jan Kara wrote:
No. For one thing we don't need any exception for correctnes alone - even the block device variant would work fine with the default case.
Here I don't agree. If you don't have some kind of exception, sb->s_bdi for both "block" and "mtd_inodefs" filesystems points to noop_backing_dev_info and you get no writeback for that one. So it isn't just a performance issue but also a correctness one.
Indeed - for internal filesystems that require writeback the change causes trouble if they haven't registered a s_bdi. But for all user visible filesystems that doesn't happen as we require s_bdi for sync or even unmounts to work.
Em 27-09-2010 19:25, Jan Kara escreveu:
[Added CCs for similar ecryptfs warning] On Thu 23-09-10 12:38:49, Andrew Morton wrote:
[...] device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342 /dev/mapper/vg_cesarbinspiro-lv_home SELinux: initialized (dev dm-3, type btrfs), uses xattr ------------[ cut here ]------------ WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d() Hardware name: Inspiron N4010 Dirtiable inode bdi default != sb bdi btrfs
That suggests that we should probably handle such cases in a more generic way by changing the code in inode_init_always(). The patch below makes at least btrfs happy for me... Could you maybe test it? Thanks.
Applied on top of v2.6.36-rc5-151-g32163f4, running it right now. The warning messages no longer happen, and everything seems to be working fine.
Tested-by: Cesar Eduardo Barros cesarb@cesarb.net
On Mon 27-09-10 20:55:45, Cesar Eduardo Barros wrote:
Em 27-09-2010 19:25, Jan Kara escreveu:
[Added CCs for similar ecryptfs warning] On Thu 23-09-10 12:38:49, Andrew Morton wrote:
[...] device fsid 44d595920ddedfa-3ece6b56e80f689e devid 1 transid 22342 /dev/mapper/vg_cesarbinspiro-lv_home SELinux: initialized (dev dm-3, type btrfs), uses xattr ------------[ cut here ]------------ WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x62/0x6d() Hardware name: Inspiron N4010 Dirtiable inode bdi default != sb bdi btrfs
That suggests that we should probably handle such cases in a more generic way by changing the code in inode_init_always(). The patch below makes at least btrfs happy for me... Could you maybe test it? Thanks.
Applied on top of v2.6.36-rc5-151-g32163f4, running it right now. The warning messages no longer happen, and everything seems to be working fine.
Tested-by: Cesar Eduardo Barros cesarb@cesarb.net
Great, thanks for testing.
Honza
kernel@lists.fedoraproject.org