I asked on AskFedora and got no response. Hoping this list is more active. https://ask.fedoraproject.org/en/question/104010/nfs-showing-bad-data-f24/
I have an f24 workstation (nfs client). It is updated regularly. The server is f19, so no updates there.
A problem started in the last few weeks. When I examine a file on the server it looks OK. The file is updated every minute (collecting some stats) and 'tail' shows the added lines as they arrive.
Examining the same file from the workstation is initially OK, but then, as the file grows, I get binary zeroes at the end of the file. The same with tail, less, vi, etc. The amount of zeroes seems to be the size of the extra data appended to the file. 'ls' shows the actual (full) size of the file. It looks like the utilities see the correct size but bad data is delivered at the tail.
After a few minutes the full data is showing but then, as the file grows, again I get zeroes for a few minutes.
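A quick way to show the symptom is to count NUL bytes at the tail of the file. A sketch only: here a made-up local file (/tmp/stats.log) stands in for the NFS-mounted one, with the zero tail simulated.

```shell
# Sketch: /tmp/stats.log is a hypothetical stand-in for the NFS-mounted file.
# Simulate what the client sees: real lines followed by a run of binary zeroes.
printf 'ts=10:00 val=1\nts=10:01 val=2\n' > /tmp/stats.log
head -c 30 /dev/zero >> /tmp/stats.log

# 'ls' reports the full size, zeroes included, just as it does over nfs:
ls -l /tmp/stats.log

# Count how many of the trailing bytes are NUL; non-zero means a bad tail:
nuls=$(tail -c 64 /tmp/stats.log | tr -cd '\0' | wc -c)
echo "trailing NUL bytes: $nuls"
```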
Where do I look next?
TIA
On Sun, 16 Apr 2017 17:13:19 +1000 Eyal Lebedinsky fedora@eyal.emu.id.au wrote:
I asked on AskFedora and got no response. Hoping this list is more active. https://ask.fedoraproject.org/en/question/104010/nfs-showing-bad-data-f24/
It *is* Easter weekend.
[trim]
I don't know nfs, but this sounds like a buffering issue. On the server, the update occurs immediately, so you see the data. On your workstation, nfs knows an update has occurred by the size, but hasn't sent the data yet, waiting for enough to accumulate, so it displays it as zeros. Just a wild guess that seems to fit your experience.
On 17/04/17 01:21, stan wrote:
On Sun, 16 Apr 2017 17:13:19 +1000 Eyal Lebedinsky fedora@eyal.emu.id.au wrote:
I asked on AskFedora and got no response. Hoping this list is more active.
It *is* Easter weekend.
And it is. I asked over a week ago though, on 7/Apr.
[trim]
On 17/04/17 23:00, InvalidPath wrote:
[trim]
Have you tried rolling back that last nfs update?
I did now. It made no difference :-(
If the server is running f19 then it's old already. You won't lose anything by rolling back.
Yes it is very old, but there was no problem until a few weeks back.
I should probably try older kernels though:
4.10.8-100.fc24.x86_64 broken (my latest)
4.10.6-100.fc24.x86_64 broken
4.9.17-100.fc24.x86_64 good
I see 4.10.9-100.fc24.x86_64 is available but I will not update, so as not to lose the good kernel (only the last three are kept).
I need to find where fedora bug reports go and see if this issue was reported, or maybe even fixed.
[later] Now logged as https://bugzilla.redhat.com/show_bug.cgi?id=1442797
cheers
On 17/04/17 01:31, InvalidPath wrote:
[trim]
What does your export config look like?
server (.7) ===========
$ cat /etc/exports
/data 192.168.3.0/24(rw,async)
$ sudo exportfs -v
/data 192.168.3.0/24(rw,async,wdelay,root_squash,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)
client (.4) ===========
$ cat /etc/fstab
files:/data /data-e7 nfs soft 0 0
$ mount
files:/data on /data-e7 type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.3.4,local_lock=none,addr=192.168.3.7)
On 17/04/17 01:31, InvalidPath wrote:
[trim]
What does your export config look like?
I should emphasize that this only started recently, either through an update:
Mar 30 23:14:38 DEBUG ---> Package nfs-utils.x86_64 1:1.3.4-1.rc3.fc24 will be upgraded
Mar 30 23:14:38 DEBUG ---> Package nfs-utils.x86_64 1:1.3.4-2.rc3.fc24 will be an upgrade
or some change on the client (I am not aware of any).
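If the nfs-utils upgrade is the suspect, winding it back looks roughly like this (a sketch of the usual dnf workflow on f24; which older version dnf picks depends on what is still in the repos):

```shell
# Check which nfs-utils build is installed now:
rpm -q nfs-utils

# Roll back to the previous version available in the repositories
# (requires root; here that would be 1:1.3.4-1.rc3.fc24 per the log above):
sudo dnf downgrade nfs-utils
```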
On 16/04/17 17:13, Eyal Lebedinsky wrote:
I asked on AskFedora and got no response. Hoping this list is more active. https://ask.fedoraproject.org/en/question/104010/nfs-showing-bad-data-f24/
I last reported the problem (on 17/04) on redhat bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1442797 and have had no reaction since. I updated the report with a test script and also verified I have the same problem using the latest f25 kernel (4.11.3-100.fc25). See the details there.
Is the bugzilla active at all?
Does anyone have a server running an old kernel (3.8, 3.9) that they can test against? I would like to confirm that this is not just my own setup having the issue.
TIA
If the machine mounting the file and doing the tail has already read the last block of the file, new data is then appended within that block, and, because of the rate the data is coming in, the timestamp on the file does not change, then the nfs client host will not know that the last block has changed and will not re-read it (it is already in cache).
If it is this bug/feature, nfs has worked this way pretty much forever, and at a larger scale too: with two hosts each writing every other block, if the timestamp does not change then each node sees the other's blocks as empty because of its cache, at least until the timestamp changes from what it knows it wrote. The trick my previous job implemented was to make sure the timestamp on the file moved ahead at least one second, so that the clients knew the file had changed. But if tail is actively reading the file while things are being written into it, I don't see a way that could work well.
What you are describing sounds like a variant of this issue.
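The timestamp-granularity point can be sketched with a toy model. Everything here is hypothetical (a local file and a fake "client" that only re-reads when the mtime it cached has moved); it just mimics what an mtime-trusting cache would do when two writes land in the same second.

```shell
# Toy model: the "client" re-reads only when the cached mtime changes.
f=/tmp/nfs_cache_demo.txt
echo "first" > "$f"
touch -d '2017-04-16 10:00:00' "$f"   # pin a whole-second mtime
cached_mtime=$(stat -c %Y "$f")       # client caches the attributes...
cached_data=$(cat "$f")               # ...and the file content

echo "second" >> "$f"                 # "server" appends new data
touch -d '2017-04-16 10:00:00' "$f"   # same second: mtime does not move

if [ "$(stat -c %Y "$f")" = "$cached_mtime" ]; then
    # mtime unchanged, so the client keeps serving its now-stale cache
    echo "stale: $cached_data"
else
    echo "fresh: $(cat "$f")"
fi
```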
On Fri, Jun 2, 2017 at 5:36 AM, Eyal Lebedinsky fedora@eyal.emu.id.au wrote:
[trim]
On 02/06/17 21:48, Roger Heflin wrote:
[trim]
Thanks Roger,
Interesting, though I wonder why it worked very well until the latest kernel series (4.10/4.11), which started showing the problem. Looks like a new "feature" to me.
BTW, the server is also the time server and the two are well synchronised. When a zero block shows up it can take a minute or two before the real data shows up. I use 'less' to view the file and hit refresh (Shift-G); soon a line of zeroes comes along, and I keep refreshing for a few minutes until the good data shows.
When I originally noticed the problem (a monitoring script started showing garbage), the monitored file was updated once a minute, and it needed to be updated two or three times before the real data was exported, which I consider rather a long time for a file to present wrong content (over nfs).
Maybe there is an export (or mount) option I can use?
Also, I could not find a reference to this problem when I investigated the issue initially, and so I assumed it was my setup. But the server (f19) has had no updates or changes for a long while. It is clearly the new kernels exposing this, and I tested more than one client machine to verify that they also show the issue.
Eyal
On 06/02/2017 05:07 AM, Eyal Lebedinsky wrote:
[trim]
Maybe there is an export (or mount) option I can use?
Newer kernels use NFSv4 by default. I can't remember what F19 uses natively or if it has issues with NFSv4 clients (it may not really implement NFSv4 properly or improperly negotiates protocol changes). You might try forcing NFSv3 mounts and see if that clears the problem.
You may want to look at the "noac" option on the clients as well as the "acregmin", "acregmax", "acdirmin", "acdirmax" and "actimeo" values (see "man 5 nfs"). Defaults and such have changed with different kernels and perhaps there's some incompatibility.
--
Rick Stevens, Systems Engineer, AllDigital    ricks@alldigital.com
AIM/Skype: therps2   ICQ: 226437340   Yahoo: origrps2
"Cuteness can be overcome through sufficient bastardry" --Mark 'Kamikaze' Hughes
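Applied on the client at mount time, those options would look something like this. A sketch only, reusing the export and mountpoint from earlier in the thread; whether either actually helps here is untested:

```shell
# Disable attribute caching entirely (correctness over performance):
mount -t nfs -o soft,noac files:/data /data-e7

# Or keep caching but cap the attribute-cache lifetime at one second:
mount -t nfs -o soft,actimeo=1 files:/data /data-e7
```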
On 03/06/17 03:13, Rick Stevens wrote:
[trim]
Newer kernels use NFSv4 by default. I can't remember what F19 uses natively or if it has issues with NFSv4 clients (it may not really implement NFSv4 properly or improperly negotiates protocol changes). You might try forcing NFSv3 mounts and see if that clears the problem.
Hi Rick,
Good call. I set the mount option 'nfsvers=3' and the problem went away. The server's kernel 3.9 probably did not implement v4 all that well.
To be sure, I mounted one fs as nfsvers=3 and another as the default (mount says 4.1) and the problem does not show on the first but does show on the second.
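For anyone landing here later, the working configuration amounts to the following (a sketch using the paths from earlier in the thread, not the exact commands used):

```shell
# /etc/fstab entry forcing NFSv3 on the client:
#   files:/data  /data-e7  nfs  soft,nfsvers=3  0 0

# Equivalent one-off mount:
mount -t nfs -o soft,nfsvers=3 files:/data /data-e7
```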
Thanks Eyal
On 06/02/2017 04:51 PM, Eyal Lebedinsky wrote:
[trim]
Good call. I set the mount option 'nfsvers=3' and the problem went away.
Glad you got it sorted out. There are a lot of changes between V3 and V4 (permissions being a BIG one). The caching mechanisms and record/file locking are others that can bite you (V4 does the latter quite a bit better than V3).