On 06/27/2017 07:47 AM, Miroslav Suchý wrote:
On 26.6.2017 at 18:50, Kevin Fenzi wrote:
Greetings.
I've seen various retrace/faf issues of late, so I thought I would collect them into an email and see if you all could take a look and solve them. :)
Thank you for bringing it up.
- retrace02.qa.fedoraproject.org has a 100% full disk.
retrace02 is used just for staging/development, so it is not a big issue. But I am working on it right now. Should be resolved by EOB. .... Resolved now. :)
Thanks!
That does bring up one more issue: You are using firewalld there and aren't allowing our nagios/nrpe. I added a rule to allow port 5666/tcp. You might also add this upstream/ansible.
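For reference, a rule like the one described above could be expressed roughly as follows. This is only a sketch: the exact rule that was added is not shown in this thread, so the use of `--permanent`, the default zone, and the helper names `nrpe_firewall_commands` and `apply_commands` are all assumptions made for illustration.

```python
import subprocess

# Port named in the message above (nagios/nrpe).
NRPE_PORT = "5666/tcp"


def nrpe_firewall_commands(port=NRPE_PORT):
    """Build the firewall-cmd invocations that would open the port.

    Assumption: the rule is made permanent in the default zone and
    firewalld is then reloaded so the rule takes effect.
    """
    return [
        ["firewall-cmd", "--permanent", f"--add-port={port}"],
        ["firewall-cmd", "--reload"],
    ]


def apply_commands(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False (needs root)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default, so nothing is changed on the host.
    apply_commands(nrpe_firewall_commands())
```

Keeping the command list separate from the code that runs it makes it easy to review what would change before applying it, and the same list could be carried over into an ansible task later.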
- retrace01.qa.fedoraproject.org is almost constantly alerting on swap being full. Not sure what to do about this, but perhaps we could add more swap or somehow limit it to use only memory for normal jobs?
A few months ago I set postgresql to use more aggressive caching, so that is the main culprit for consuming so much memory. I can easily lower it by a few percent. But... I see right now that there is 16 GB of swap and 8 GB of it is free. And total available memory is 16 GB: 8 GB of free swap plus 8 GB of kernel buffers/cache. So when do you see those errors, and what are the exact numbers in those alerts?
retrace01.qa.fedoraproject.org
Looks like it alerted just a few min ago:

Swap (Notifications for this service have been disabled)
  CRITICAL  06-27-2017 14:15:24  0d 0h 11m 8s  3/3  SWAP CRITICAL - 7% free (1011 MB out of 16383 MB)

Swap-Is-Low (Notifications for this service have been disabled)
  CRITICAL  06-27-2017 14:15:03  0d 0h 11m 29s  4/4  SWAP CRITICAL - 7% free (1002 MB out of 16383 MB)
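To compare these numbers with the "8 GB free" figure mentioned earlier, here is a small sketch that pulls the values out of the alert text and works out how much swap was in use. The regular expression and the helper names `parse_swap_alert` and `swap_summary` are illustrative assumptions, not part of nagios; the percentage computed here may differ slightly from the one in the alert, depending on how the check rounds.

```python
import re

# Status text copied from the two alerts quoted above.
ALERTS = [
    "SWAP CRITICAL - 7% free (1011 MB out of 16383 MB)",
    "SWAP CRITICAL - 7% free (1002 MB out of 16383 MB)",
]

# Matches "<pct>% free (<free> MB out of <total> MB)".
PATTERN = re.compile(r"(\d+)% free \((\d+) MB out of (\d+) MB\)")


def parse_swap_alert(text):
    """Return (reported_pct, free_mb, total_mb) from a swap status line."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError(f"unrecognized alert text: {text!r}")
    reported_pct, free_mb, total_mb = (int(g) for g in match.groups())
    return reported_pct, free_mb, total_mb


def swap_summary(text):
    """Derive used swap and the free percentage from the raw MB values."""
    reported_pct, free_mb, total_mb = parse_swap_alert(text)
    return {
        "reported_pct_free": reported_pct,
        "free_mb": free_mb,
        "used_mb": total_mb - free_mb,
        "computed_pct_free": 100.0 * free_mb / total_mb,
    }


if __name__ == "__main__":
    for alert in ALERTS:
        s = swap_summary(alert)
        print(
            f"free {s['free_mb']} MB, used {s['used_mb']} MB, "
            f"{s['computed_pct_free']:.1f}% free (alert says {s['reported_pct_free']}%)"
        )
```

At the time of these alerts roughly 15 GB of the 16 GB swap was in use, so usage evidently swings quite a bit between the alert times and the moment when 8 GB was seen free.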
I will investigate the remaining issues tomorrow.
Great, thanks.
kevin