So, last weekend we rebooted bvirthost09 and db-koji01.
It helped somewhat. Database dumps are back to a reasonable few hours.
However, it still has high load and occasional alerts, and now it's also sometimes causing builders to stop talking to the hub. (They time out and just stop checking in.)
I've asked the netapp folks to check whether they can see any problems with the iscsi lun that guest is on, but they say they are not aware of any issues.
I do see some packet dropping on db-koji01:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 52:54:00:06:90:a4 brd ff:ff:ff:ff:ff:ff
    RX: bytes       packets    errors  dropped  overrun  mcast
    65269562146     369724151  0       128685   0        0
    TX: bytes       packets    errors  dropped  carrier  collsns
    395224051163    377221287  0       0        0        0
My only ideas at this point:
a) run another postgresql vacuum analyze. Perhaps the first one made some poor choices and another one would make things happier. In any case it shouldn't make things any worse.
b) Switch the network card on db-koji01 to e1000 instead of virtio-net. This really shouldn't be needed, but perhaps we are hitting some weird virtio-net bug. This would require a short outage. (Rough commands for both are sketched below.)
c) Some other brilliant idea. ;)
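For reference, (a) and (b) would look roughly like the following; the database name and the exact interface XML are assumptions:

  # (a) on db-koji01, as the postgres user (assuming the database is named "koji"):
  vacuumdb --analyze --verbose koji

  # (b) on bvirthost09, during a short outage window:
  virsh shutdown db-koji01
  virsh edit db-koji01    # change <model type='virtio'/> to <model type='e1000'/> on the interface
  virsh start db-koji01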
kevin
Hi!
I had a similar issue some time ago on VMware. Try turning off the hardware offloading in the VM with ethtool - no outage needed. Maybe it helps... I think it was something like ethtool -k and then turning off rx/tx/gso.
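Roughly something like this, I think (interface name assumed):

  ethtool -k eth0                                          # show the current offload settings
  ethtool -K eth0 rx off tx off gso off tso off gro off    # turn the offloads off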
-of (mobile)
On Tue, 14 Apr 2015 20:49:28 +0200 Oliver Falk oliver@linux-kernel.at wrote:
Hi!
I had a similar issue some time ago on VMware. Try turning off the hardware offloading in the VM with ethtool - no outage needed. Maybe it helps... I think it was something like ethtool -k and then turning off rx/tx/gso.
I tried playing with this a bit yesterday, but it didn't seem to matter much. ;(
There is actually not all that much i/o going on. It's mostly cpu and lots of context switches, etc.
I guess I will fire off a vacuum analyze (since that shouldn't hurt anything or cause any problems) and just wait until we are out of freeze to schedule an outage and try a bunch of the more invasive things.
kevin
Hi!
Well, it was worth a try I think... Sorry to hear it didn't help.
Do the drops increase? What kernel is the machine running?
-of (mobile)
On Wed, 15 Apr 2015 18:20:29 +0200 Oliver Falk oliver@linux-kernel.at wrote:
Hi!
Well, it was worth a try I think... Sorry to hear it didn't help.
Do the drops increase?
Nope, seemed to be pretty steady. A few packets every few seconds.
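(For the record, something like this is enough to see whether the dropped counter is moving; eth0 assumed:

  watch -d -n 5 'ip -s link show eth0'
)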
What kernel is the machine running?
Both guest and host are running the latest rhel7 kernel: 3.10.0-229.1.2.el7.x86_64
I did find:
http://bugs.centos.org/view.php?id=8030
but I have no idea whether that has been addressed or whether it's the same thing we are seeing. It does look kind of similar, though.
kevin
ok. Next thing... I notice that some of the queries hitting us particularly hard are from spiders.
Back in the puppet days, koji disallowed all spidering.
I never have seen useful search results from it.
So, I'd like to block all the spiders and see if that helps with the load issues.
robots.txt would have:
User-agent: *
Disallow: /
+1s?
kevin

diff --git a/roles/koji_hub/files/kojiweb.conf b/roles/koji_hub/files/kojiweb.conf
index 86abd2e..8334274 100644
--- a/roles/koji_hub/files/kojiweb.conf
+++ b/roles/koji_hub/files/kojiweb.conf
@@ -6,6 +6,8 @@ KeepAlive On
 Alias /koji "/usr/share/koji-web/scripts/wsgi_publisher.py"
 #(configuration goes in /etc/kojiweb/web.conf)
 
+Alias /robots.txt /var/www/html/robots.txt
+
 <Directory "/usr/share/koji-web/scripts/">
    Options ExecCGI
    SetHandler wsgi-script
diff --git a/roles/koji_hub/tasks/main.yml b/roles/koji_hub/tasks/main.yml
index 0b6cd82..a5fc795 100644
--- a/roles/koji_hub/tasks/main.yml
+++ b/roles/koji_hub/tasks/main.yml
@@ -171,6 +171,13 @@
   notify: restart httpd
   when: env != "staging"
 
+- name: koji robots.txt config
+  copy: src=robots.txt dest=/var/www/html/robots.txt
+  tags:
+  - config
+  - koji_hub
+  notify: restart httpd
+
 - name: kojira log dir
   file: dest=/var/log/kojira owner=root group=root mode=0750 state=directory
   tags:
+1 from me, though you might need to add a <Directory /var/www/html> with allow from all.
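Something like this, presumably (exact directives depend on the httpd version):

  <Directory /var/www/html>
      # httpd 2.4 syntax; on 2.2 it would be "Order allow,deny" plus "Allow from all"
      Require all granted
  </Directory>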
+100 from me.