Hi Ho Everybody!
There will likely be a short period over the next couple of days when Zabbix will be unable to report etc. (manual DB updates).
I'll also be taking a moment to 'restructure' (for want of a better word) the way we monitor external hosts etc. One of the reasons we went for this self-punishment was to get a feel for how it compares to Nagios. I think it has performed great, and we'll be able to orphan off Cacti at the same time :).
Now here is the important bit:
We have a variety of applications that have recently had things added/altered/moved etc., or that were simply never added to Nagios, so here is your challenge:
If you run/work on/do something with the Infrastructure that meets any of the following criteria:
 * Is seen by the public (fas, etc)
 * Can cause problems to the normal routine (i.e. rawhide builds etc - did it succeed?)
 * Is important in some other way
 * Has a nice statistic that people might want to know/track...
THEN PLEASE... let us know...
What we need to know is:
 * How can such a thing be monitored?
   - Open ports/service, number of processes, age of a file, running a command and checking the output, running a custom script (to make it easier for us, if you can create such a script it'd be helpful) etc etc etc
 * How often would it need to be checked?
 * What does 'failure' mean wrt the check (if one exists - statistics don't need this)
 * How can such a 'failure' be fixed automatically (ditto for above)...
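To give a rough idea of the kind of check script that would help, here is a minimal Python sketch covering three of the check types listed above (open port, age of a file, running a command and checking the output). All function names, the timeouts and the exit-code mapping are assumptions made for illustration, not something we already run. The 0/2/3 codes follow the Nagios plugin convention (OK/CRITICAL/UNKNOWN); how a script like this would be wired into Zabbix is left open.

```python
import os
import socket
import subprocess
import time

# Exit codes borrowed from the Nagios plugin convention (an assumption
# about how results would be reported, not a decided format).
OK, CRITICAL, UNKNOWN = 0, 2, 3


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def file_age_seconds(path, now=None):
    """Return seconds since the file was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def command_output_contains(argv, expected, timeout=30):
    """Run a command; True if it exits 0 and stdout contains `expected`."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.SubprocessError):
        # Command missing, not executable, or timed out.
        return False
    return result.returncode == 0 and expected in result.stdout


def check_file_fresh(path, max_age_seconds, now=None):
    """Return (exit_code, message) describing whether the file is recent."""
    try:
        age = file_age_seconds(path, now)
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN: cannot stat {path}: {exc}"
    if age > max_age_seconds:
        return CRITICAL, f"CRITICAL: {path} is {age:.0f}s old"
    return OK, f"OK: {path} is {age:.0f}s old"
```

For example, check_file_fresh("/path/to/some/file", 86400) would report CRITICAL once the file is more than a day old; both the path and the one-day threshold are placeholders, so please tell us the real values for your service.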
Then we can add them all together, stir the pot and be happy happy happy.
Be extravagant too: while we mightn't want to implement every single check you suggest, you might think of something that has been forgotten...
(sysadmin-noc: I still need to work out the best way of scaling this, but I think I've nearly got it, and a SOP will be written when it's final)
- Nigel
On Sat, 18 Oct 2008, Nigel Jones wrote:
(sysadmin-noc: I still need to work out the best way of scaling this, but I think I've nearly got it, and a SOP will be written when it's final)
Would you guys like a monitoring component in our ticketing system?
-Mike
On Fri, 2008-10-17 at 22:16 -0500, Mike McGrath wrote:
Would you guys like a monitoring component in our ticketing system?
Good idea, it now exists as 'Monitoring'.
sysadmin-noc people, don't reassign these (leave them as is) for now.
- Nigel
On Sat, 18 Oct 2008, Nigel Jones wrote:
Good idea, it now exists as 'Monitoring'.
sysadmin-noc people, don't reassign these (leave them as is) for now.
Should we assume that all of what is being monitored with nagios will automatically be entered into zabbix or should I start opening tickets?
-Mike
On Fri, 2008-10-24 at 13:21 -0500, Mike McGrath wrote:
Should we assume that all of what is being monitored with nagios will automatically be entered into zabbix or should I start opening tickets?
-Mike
Tickets might help serve as a good reminder so that nothing gets overlooked or forgotten, and as a way to farm out smaller pieces of the work to anyone new who's looking for a place to start helping out.
---Brett.
"You must have an IQ of at least half a million." -- Popeye
infrastructure@lists.fedoraproject.org