The koji builders don't check back in automatically[1] if they've lost a connection to the host. I put this script together in an attempt to fix it, and thought I'd post it here before sticking it on the builders. The basic premise is to check whether the builder has checked in within the last 5 minutes (should be waaay more than enough), unless the box is under high load, in which case the window is 15 minutes. That might be overkill.
I'd like to run this check via cron every 5 minutes on each builder. Anyone have any suggested fixes, or objections to me running this?
-Mike
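
For readers without the attached script, the rule described above can be sketched roughly as follows. This is not Mike's script, only a minimal illustration of the 5-minute / 15-minute logic. The load threshold (HIGH_LOAD) and the way the last check-in time is obtained are assumptions; the thread does not specify either.

```python
import os
import time

NORMAL_WINDOW = 5 * 60    # seconds a builder may go without checking in
LOADED_WINDOW = 15 * 60   # more lenient window when the box is busy
HIGH_LOAD = 4.0           # assumption: the thread gives no load threshold


def allowed_silence(load_1min, high_load=HIGH_LOAD):
    """Return how long (in seconds) the builder may stay silent."""
    if load_1min >= high_load:
        return LOADED_WINDOW
    return NORMAL_WINDOW


def needs_restart(last_checkin, now, load_1min, high_load=HIGH_LOAD):
    """True if the last check-in is older than the allowed window."""
    return (now - last_checkin) > allowed_silence(load_1min, high_load)


def check_now(last_checkin):
    """Evaluate the rule against the current time and 1-minute load average.

    last_checkin is a Unix timestamp; where it comes from (a log file,
    the hub, etc.) is left to the caller because the thread does not say.
    """
    load_1min = os.getloadavg()[0]
    return needs_restart(last_checkin, time.time(), load_1min)
```

A cron entry running every 5 minutes would call something like check_now() and restart the builder daemon only when it returns True.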
On Saturday 16 February 2008, Mike McGrath wrote:
> The koji builders don't check back in automatically[1] if they've lost a connection to the host. I put this script together in an attempt to fix it, and thought I'd post it here before sticking it on the builders. The basic premise is to check whether the builder has checked in within the last 5 minutes (should be waaay more than enough), unless the box is under high load, in which case the window is 15 minutes. That might be overkill.
> I'd like to run this check via cron every 5 minutes on each builder. Anyone have any suggested fixes, or objections to me running this?
> -Mike
Looks fine to me. It's pretty conservative, but that's not a bad place to start. I think we should have something similar for the hub that checks whether it can talk to the db and, if the hub is throwing 500 errors, restarts apache.
Dennis
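
A rough sketch of the hub check suggested above, for illustration only. The function names, the returned action strings, and the choice to report a database failure rather than restart apache are all assumptions; the thread only says the check should test database connectivity and restart apache on 500 errors.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still available.
        return err.code


def hub_action(db_ok, status):
    """Decide what to do given db reachability and the hub's HTTP status.

    Assumption: if the database is unreachable, restarting apache would
    not help, so that case is only reported.
    """
    if not db_ok:
        return "db_unreachable"
    if 500 <= status <= 599:
        return "restart_apache"
    return "ok"
```

The caller would supply db_ok from whatever database probe is available and pass the hub's URL to http_status().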
2008/2/16 Mike McGrath mmcgrath@redhat.com:
> The koji builders don't check back in automatically[1] if they've lost a connection to the host. I put this script together in an attempt to fix it, and thought I'd post it here before sticking it on the builders. The basic premise is to check whether the builder has checked in within the last 5 minutes (should be waaay more than enough), unless the box is under high load, in which case the window is 15 minutes. That might be overkill.
> I'd like to run this check via cron every 5 minutes on each builder. Anyone have any suggested fixes, or objections to me running this?
> -Mike
The script looks fine. Out of curiosity, why would you run it as a cron job instead of via nagios?
---Brett.
On Tue, 19 Feb 2008, brett lentz wrote:
> 2008/2/16 Mike McGrath mmcgrath@redhat.com:
>> The koji builders don't check back in automatically[1] if they've lost a connection to the host. I put this script together in an attempt to fix it, and thought I'd post it here before sticking it on the builders. The basic premise is to check whether the builder has checked in within the last 5 minutes (should be waaay more than enough), unless the box is under high load, in which case the window is 15 minutes. That might be overkill.
>> I'd like to run this check via cron every 5 minutes on each builder. Anyone have any suggested fixes, or objections to me running this?
>> -Mike
> The script looks fine. Out of curiosity, why would you run it as a cron job instead of via nagios?
We could very well do that, actually. The main problem is that, right now at least, we don't have nagios set up to actually take any action when an event happens. Might be something our sysadmin-noc team can look at. Any takers?
-Mike
infrastructure@lists.fedoraproject.org