Hi, this is an issue in Copr: https://bugzilla.redhat.com/show_bug.cgi?id=1268192 This happens rarely, but it is not the first report, so I should address it somehow.
Google turns up: http://serverfault.com/questions/338439/ssh-sessions-terminate-abruptly-with... https://ask.openstack.org/en/question/25306/slow-network-speed-between-vm-an...
I hesitate to turn off checksumming, but switching off TCP segmentation offload (TSO) brings a small improvement: over several measurements, the time to transfer a Fedora ISO dropped from 15.2 sec to 14.6 sec. However, I was unable to reproduce the packet corruption.
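For reference, a minimal sketch of how the offload state could be inspected and switched off with ethtool. The interface name eth0, the helper name tso_state and the APPLY guard are assumptions for illustration, not something taken from the Copr playbooks. A change made this way does not persist across reboots.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k` output on stdin and print the
# value of the top-level "tcp-segmentation-offload" feature line.
tso_state() {
    awk -F': *' '$1 == "tcp-segmentation-offload" { print $2; exit }'
}

# IFACE is an assumption; substitute the VM's real interface name.
IFACE="${IFACE:-eth0}"

# Only touch the NIC when explicitly asked (APPLY=yes), so that merely
# sourcing this file changes nothing.
if [ "${APPLY:-no}" = "yes" ]; then
    ethtool -k "$IFACE" | tso_state   # show the current state
    ethtool -K "$IFACE" tso off       # switch TSO off (not persistent)
    ethtool -k "$IFACE" | tso_state   # confirm the change
fi
```

Run as root with APPLY=yes to change the setting; without it, only the parsing helper is defined.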
The question is: should I disable TSO on Copr machines only, or should I disable it in the general VM spin-up playbook for all our VMs?
And this wiki: https://www.rdoproject.org/Using_GRE_tenant_networks#Offloading suggests turning it off on physical hosts too. Not sure why.
I welcome your comments.
Mirek
-------- Forwarded message --------
Subject: [Bug 1268192] New: Rsync fails with "Corrupted MAC on input. Disconnecting: Packet corrupt"
Date: Fri, 02 Oct 2015 05:27:49 +0000
From: bugzilla@redhat.com
To: msuchy@redhat.com
https://bugzilla.redhat.com/show_bug.cgi?id=1268192
Bug ID: 1268192
Summary: Rsync fails with "Corrupted MAC on input. Disconnecting: Packet corrupt"
Product: Copr
Component: backend
Assignee: msuchy@redhat.com
Reporter: redhatbugzilla@kyl191.net
Description of problem: copr builds are building successfully, but the final rsync fails with an error message.
Version-Release number of selected component (if applicable):
How reproducible: Random, occurred twice on Fedora Rawhide, once on EPEL 7, but I only did 5 builds.
Steps to Reproduce:
1. Start a copr build, possibly only on EPEL7/Fedora Rawhide
2. Wait
3. Build might be marked as failed despite rpm packages being in the folder afterwards.
Actual results: Some copr builds fail inexplicably in the middle of an rsync job.
Expected results: Rsync is successful
Additional info: Affected builds: https://copr-be.cloud.fedoraproject.org/results/kyl191/nginx-pagespeed/fedor...
https://copr-be.cloud.fedoraproject.org/results/kyl191/nginx-pagespeed/epel-...
https://copr-be.cloud.fedoraproject.org/results/kyl191/nginx-pagespeed/fedor...
On 2 October 2015 at 01:44, Miroslav Suchý msuchy@redhat.com wrote:
The question is: should I disable TSO on Copr machines only, or should I disable it in the general VM spin-up playbook for all our VMs?
I would turn it off on Copr machines only. If other systems see problems, it can be hard to realize "oh, that is happening on all boxes" until late in the game. If we know we have isolated the change to one set of systems, that is better.
And this wiki: https://www.rdoproject.org/Using_GRE_tenant_networks#Offloading suggests turning it off on physical hosts too. Not sure why.
That is if we are using GRE in the networks. Are we? If we are it does make sense because the GRE in the kernel relies on dealing with an 'uncorrupted' packet which offloading does.
infrastructure mailing list infrastructure@lists.fedoraproject.org http://lists.fedoraproject.org/postorius/infrastructure@lists.fedoraproject....
On Fri, 2 Oct 2015 09:32:08 -0600 Stephen John Smoogen smooge@gmail.com wrote:
On 2 October 2015 at 01:44, Miroslav Suchý msuchy@redhat.com wrote:
The question is: should I disable TSO on Copr machines only, or should I disable it in the general VM spin-up playbook for all our VMs?
I would turn it off on Copr machines only. If other systems see problems, it can be hard to realize "oh, that is happening on all boxes" until late in the game. If we know we have isolated the change to one set of systems, that is better.
Yeah, I agree. Start as small as possible and move to more machines from there if it doesn't solve the issues.
Also, I wonder if this is worth reporting as a bug so we could someday get a fix in openstack packages?
And this wiki: https://www.rdoproject.org/Using_GRE_tenant_networks#Offloading suggests turning it off on physical hosts too. Not sure why.
That is if we are using GRE in the networks. Are we? If we are it does make sense because the GRE in the kernel relies on dealing with an 'uncorrupted' packet which offloading does.
Yeah, I think neutron does use gre tunnels... but I am not fully clear how.
kevin
On 3.10.2015 at 19:02, Kevin Fenzi wrote:
On Fri, 2 Oct 2015 09:32:08 -0600 Stephen John Smoogen smooge@gmail.com wrote:
I would turn it off on Copr machines only. If other systems see problems, it can be hard to realize "oh, that is happening on all boxes" until late in the game. If we know we have isolated the change to one set of systems, that is better.
Yeah, I agree. Start as small as possible and move to more machines from there if it doesn't solve the issues.
OK. I will use this for Copr machines only for now.
That is if we are using GRE in the networks. Are we? If we are it does make sense because the GRE in the kernel relies on dealing with an 'uncorrupted' packet which offloading does.
Yeah, I think neutron does use gre tunnels... but I am not fully clear how.
Yes, it uses GRE:
$ grep CONFIG_NEUTRON_OVS_TENANT_NETWORK_TYPE ./files/fedora-cloud/pakstack-controller-answers.txt
CONFIG_NEUTRON_OVS_TENANT_NETWORK_TYPE=gre
# grep tenant_network_types /etc/neutron/plugin.ini
tenant_network_types = gre
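If this check ever needs to be scripted, for example to apply the offload change only where GRE is in use, the grep above could be wrapped roughly as follows. The function name tenant_network_type is hypothetical, and the sketch assumes the setting appears as a plain "key = value" line, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of tenant_network_types from the
# ini-style file given as the first argument.
tenant_network_type() {
    awk -F' *= *' '$1 == "tenant_network_types" { print $2; exit }' "$1"
}

# Example use, assuming the path from the message above:
#   [ "$(tenant_network_type /etc/neutron/plugin.ini)" = "gre" ] && echo "GRE in use"
```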