So, last weekend we rebooted bvirthost09 and db-koji01.
It helped somewhat. Database dumps are back to a reasonable few hours.
However, it still has high load and occasional alerts, and now it's also sometimes causing builders to stop talking to the hub. (They time out and just stop checking in.)
I've asked the netapp folks to check whether they can see any problems with the iscsi lun that guest is on, but they say they are not aware of any issues.
I do see some packet dropping on db-koji01:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 52:54:00:06:90:a4 brd ff:ff:ff:ff:ff:ff
    RX: bytes       packets    errors  dropped  overrun  mcast
    65269562146     369724151  0       128685   0        0
    TX: bytes       packets    errors  dropped  carrier  collsns
    395224051163    377221287  0       0        0        0
My only ideas at this point:
a) run another postgresql vacuum analyze. Perhaps the first one made some poor choices and another one would make things happier. In any case it shouldn't make things any worse.
b) Switch the network card on db-koji01 to e1000 instead of virtio-net. This really shouldn't be needed, but perhaps we are hitting some weird virtio-net bug. This would require a short outage. (Rough commands for both are sketched below.)
c) Some other brilliant idea. ;)
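For reference, (a) and (b) would look roughly like the following; the database name and the exact interface XML are assumptions:

  # (a) on db-koji01, as the postgres user (assuming the database is named "koji"):
  vacuumdb --analyze --verbose koji

  # (b) on bvirthost09, during a short outage window:
  virsh shutdown db-koji01
  virsh edit db-koji01    # change <model type='virtio'/> to <model type='e1000'/> on the interface
  virsh start db-koji01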
kevin
Hi!
I had a similar issue some time ago on VMware. Try turning off the hardware offloading in the VM with ethtool - no outage needed. Maybe it helps... I think it was something like ethtool -k and then turning off rx/tx/gso.
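Roughly something like this, I think (interface name assumed):

  ethtool -k eth0                                          # show the current offload settings
  ethtool -K eth0 rx off tx off gso off tso off gro off    # turn the offloads off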
-of (mobile)
On Tue, 14 Apr 2015 20:49:28 +0200 Oliver Falk oliver@linux-kernel.at wrote:
Hi!
I had a similar issue some time ago on VMware. Try turning off the hardware offloading in the VM with ethtool - no outage needed. Maybe it helps... I think it was something like ethtool -k and then turning off rx/tx/gso.
I tried playing with this a bit yesterday, but it didn't seem to matter much. ;(
There is actually not all that much i/o going on. It's mostly cpu and lots of context switches, etc.
I guess I will fire off a vacuum analyze (since that shouldn't hurt anything or cause any problems) and just wait until we are out of freeze to schedule an outage and try a bunch of the more invasive things.
kevin
Hi!
Well, it was worth a try I think... Sorry to hear it didn't help.
Do the drops increase? What kernel is the machine running?
-of (mobile)
On Wed, 15 Apr 2015 18:20:29 +0200 Oliver Falk oliver@linux-kernel.at wrote:
Hi!
Well, it was worth a try I think... Sorry to hear it didn't help.
Do the drops increase?
Nope, seemed to be pretty steady. A few packets every few seconds.
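(For the record, something like this is enough to see whether the dropped counter is moving; eth0 assumed:

  watch -d -n 5 'ip -s link show eth0'
)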
What kernel is the machine running?
Both guest and host are running the latest rhel7 kernel: 3.10.0-229.1.2.el7.x86_64
I did find:
http://bugs.centos.org/view.php?id=8030
but I have no idea whether that has been addressed or whether it's the same thing we are seeing. It does look kind of similar, though.
kevin
ok. Next thing... I notice that some of the queries hitting us particularly hard are from spiders.
Back in the puppet days, koji disallowed all spidering.
I never have seen useful search results from it.
So, I'd like to block all the spiders and see if that helps with the load issues.
robots.txt would have:
User-agent: *
Disallow: /
+1s?
kevin

diff --git a/roles/koji_hub/files/kojiweb.conf b/roles/koji_hub/files/kojiweb.conf
index 86abd2e..8334274 100644
--- a/roles/koji_hub/files/kojiweb.conf
+++ b/roles/koji_hub/files/kojiweb.conf
@@ -6,6 +6,8 @@ KeepAlive On
 Alias /koji "/usr/share/koji-web/scripts/wsgi_publisher.py"
 #(configuration goes in /etc/kojiweb/web.conf)
 
+Alias /robots.txt /var/www/html/robots.txt
+
 <Directory "/usr/share/koji-web/scripts/">
    Options ExecCGI
    SetHandler wsgi-script
diff --git a/roles/koji_hub/tasks/main.yml b/roles/koji_hub/tasks/main.yml
index 0b6cd82..a5fc795 100644
--- a/roles/koji_hub/tasks/main.yml
+++ b/roles/koji_hub/tasks/main.yml
@@ -171,6 +171,13 @@
   notify: restart httpd
   when: env != "staging"
 
+- name: koji robots.txt config
+  copy: src=robots.txt dest=/var/www/html/robots.txt
+  tags:
+  - config
+  - koji_hub
+  notify: restart httpd
+
 - name: kojira log dir
   file: dest=/var/log/kojira owner=root group=root mode=0750 state=directory
   tags:
+1 from me, though you might need to add a <Directory /var/www/html> with allow from all.
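Something like this, presumably (exact directives depend on the httpd version):

  <Directory /var/www/html>
      # httpd 2.4 syntax; on 2.2 it would be "Order allow,deny" plus "Allow from all"
      Require all granted
  </Directory>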
+100 from me.