In looking at our resources, marchant was wondering what our top ten things are that we would run like crazy to the computer for at 3am. It turned out to be a lot harder to do with just 10, so this is the first pass of stuff.
Low Priority Servers:

Download Services:
  serverbeach1.fedoraproject.org [ rename ]
  torrent01.fedoraproject.org

Staging:
  app01.stg.phx2.fedoraproject.org
  app02.stg.phx2.fedoraproject.org
  db01.stg.phx2.fedoraproject.org
  fas01.stg.phx2.fedoraproject.org
  koji01.stg.phx2.fedoraproject.org
  noc01.stg.phx2.fedoraproject.org
  pkgs01.stg.phx2.fedoraproject.org
  proxy01.stg.phx2.fedoraproject.org
  releng01.stg.phx2.fedoraproject.org
  value01.stg.phx2.fedoraproject.org

Public Test:
  publictest01.fedoraproject.org
  publictest02.fedoraproject.org
  publictest03.fedoraproject.org
  publictest04.fedoraproject.org
  publictest05.fedoraproject.org
  publictest06.fedoraproject.org
  publictest07.fedoraproject.org
  publictest08.fedoraproject.org
  publictest09.fedoraproject.org
  publictest10.fedoraproject.org
  fakefas01.fedoraproject.org

Voice:
  asterisk02.fedoraproject.org
  asterisk1.fedoraproject.org [ rename ]

Releng:
  cvs01.phx2.fedoraproject.org
  ppc04.phx2.fedoraproject.org
  ppc05.phx2.fedoraproject.org
  ppc06.phx2.fedoraproject.org
  ppc07.phx2.fedoraproject.org
  ppc08.phx2.fedoraproject.org
  ppc09.phx2.fedoraproject.org
  ppc10.phx2.fedoraproject.org
  ppc12.phx2.fedoraproject.org
  x86-01.phx2.fedoraproject.org
  x86-02.phx2.fedoraproject.org
  x86-03.phx2.fedoraproject.org
  x86-04.phx2.fedoraproject.org
  x86-05.phx2.fedoraproject.org
  x86-06.phx2.fedoraproject.org
  x86-07.phx2.fedoraproject.org
  x86-09.phx2.fedoraproject.org
  x86-10.phx2.fedoraproject.org
  x86-11.phx2.fedoraproject.org
  x86-12.phx2.fedoraproject.org
  x86-13.phx2.fedoraproject.org
  x86-14.phx2.fedoraproject.org
  x86-15.phx2.fedoraproject.org
  x86-16.phx2.fedoraproject.org
  x86-17.phx2.fedoraproject.org
  x86-18.phx2.fedoraproject.org
  x86-19.phx2.fedoraproject.org
  x86-20.phx2.fedoraproject.org

QA:
  retrace01.fedoraproject.org
  autoqa01
  qa01-08

Hosted Services:
  people01.fedoraproject.org
  smtp-mm01.fedoraproject.org
  smtp-mm02.fedoraproject.org
  smtp-mm03.fedoraproject.org

Web:
  value01.phx2.fedoraproject.org
  value02.phx2.fedoraproject.org

Not Our Stuff:
  cnode01.fedoraproject.org
  dhcp02.c.fedoraproject.org

Medium Priority:

Backups:
  backup02.fedoraproject.org

Download Services:
  download01.phx2.fedoraproject.org
  download02.phx2.fedoraproject.org
  download03.phx2.fedoraproject.org
  download04.phx2.fedoraproject.org
  download05.phx2.fedoraproject.org
  secondary01.phx2.fedoraproject.org [ recommission soon?]

Hosted Services:
  collab1.fedoraproject.org [ email for lists.fp.o/gobby]
  collab2.fedoraproject.org [ email for lists.fp.o/gobby]
  hosted1.fedoraproject.org
  hosted2.fedoraproject.org

Noc Services:
  dhcp01.phx2.fedoraproject.org

Releng:
  spin01.phx2.fedoraproject.org
  bnfs01.phx2.fedoraproject.org

Virtualization Hardware:
  bodhost01.fedoraproject.org
  serverbeach3.fedoraproject.org [rename]
  serverbeach4.fedoraproject.org [rename]
  serverbeach5.fedoraproject.org [rename]
  tummy1.fedoraproject.org [rename]
  internetx01.fedoraproject.org
  osuosl1.fedoraproject.org

Web Servers:
  app01.phx2.fedoraproject.org
  app02.phx2.fedoraproject.org
  app03.phx2.fedoraproject.org
  app04.phx2.fedoraproject.org
  app07.phx2.fedoraproject.org
  memcached01.phx2.fedoraproject.org
  memcached02.phx2.fedoraproject.org
  app05.fedoraproject.org
  app6.fedoraproject.org [ needs to be renamed ]
  proxy01.phx2.fedoraproject.org
  proxy02.fedoraproject.org
  proxy04.fedoraproject.org
  proxy07.fedoraproject.org
  proxy3.fedoraproject.org [rename]
  proxy5.fedoraproject.org [rename]
  proxy6.fedoraproject.org [rename]

High Priority:

Application Servers:
  fas01.phx2.fedoraproject.org
  fas02.phx2.fedoraproject.org
  fas03.phx2.fedoraproject.org

Database Servers:
  db01.phx2.fedoraproject.org
  db02.phx2.fedoraproject.org
  db03.phx2.fedoraproject.org

Backups:
  backup01.phx2.fedoraproject.org

NOC services:
  bastion01.phx2.fedoraproject.org
  bastion02.phx2.fedoraproject.org
  log01.phx2.fedoraproject.org
  noc01.phx2.fedoraproject.org
  noc02.fedoraproject.org
  ns02.fedoraproject.org
  ns03.phx2.fedoraproject.org
  ns04.phx2.fedoraproject.org
  ns1.fedoraproject.org [rename]
  puppet01.phx2.fedoraproject.org

Releng:
  compose-x86-01.phx2.fedoraproject.org
  koji01.phx2.fedoraproject.org
  koji02.phx2.fedoraproject.org
  kojipkgs01.phx2.fedoraproject.org
  nfs01.phx2.fedoraproject.org
  pkgs01.phx2.fedoraproject.org
  releng01.phx2.fedoraproject.org
  releng02.phx2.fedoraproject.org
  relepel01.phx2.fedoraproject.org
  sign-bridge01.phx2.fedoraproject.org
  sign-vault01.phx2.fedoraproject.org

Virtualization Hardware:
  bvirthost01.phx2.fedoraproject.org
  bxen01.phx2.fedoraproject.org
  bxen02.phx2.fedoraproject.org
  bxen03.phx2.fedoraproject.org
  bxen04.phx2.fedoraproject.org
  virthost01.phx2.fedoraproject.org
  virthost02.phx2.fedoraproject.org
  virthost13.phx2.fedoraproject.org
  xen03.phx2.fedoraproject.org
  xen04.phx2.fedoraproject.org
  xen05.phx2.fedoraproject.org
  xen07.phx2.fedoraproject.org
  xen09.phx2.fedoraproject.org
  xen10.phx2.fedoraproject.org
  xen11.phx2.fedoraproject.org
  xen12.phx2.fedoraproject.org
  xen14.phx2.fedoraproject.org
  xen15.phx2.fedoraproject.org
  serverbeach2.fedoraproject.org [ns1]
  ibiblio01.fedoraproject.org [ns2]
  telia1.fedoraproject.org [noc02]

Web services:
  bapp01.phx2.fedoraproject.org
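For anyone who wants to use the tiers programmatically, here is a rough sketch of a lookup. It carries only a handful of hosts from the lists above, and the names `PRIORITY`, `tier_of` and `should_page_at_3am`, as well as the "only high priority wakes someone up" policy, are assumptions for illustration rather than anything agreed on in this thread.

```python
# Hypothetical sketch: a small sample of hosts from the lists above, by tier.
PRIORITY = {
    "high": [
        "db01.phx2.fedoraproject.org",
        "fas01.phx2.fedoraproject.org",
        "koji01.phx2.fedoraproject.org",
    ],
    "medium": [
        "backup02.fedoraproject.org",
        "app01.phx2.fedoraproject.org",
    ],
    "low": [
        "torrent01.fedoraproject.org",
        "publictest01.fedoraproject.org",
    ],
}


def tier_of(host, tiers=PRIORITY):
    """Return the tier name a host is listed under, or "unknown"."""
    for tier, hosts in tiers.items():
        if host in hosts:
            return tier
    return "unknown"


def should_page_at_3am(host):
    """Assumed policy: only high priority hosts justify a 3am page."""
    return tier_of(host) == "high"
```

A host that is not in the table comes back as "unknown", which is probably worth flagging on its own since it means the list and reality have drifted.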
Stephen John Smoogen smooge@gmail.com wrote:
> [...]
Does the nagios stage environment operate in an equivalent manner to prod such that testing nagios 3 in stage for these systems would accurately reflect prod? I assume that there are specific monitors for each of these systems that would need to be exercised? I can only imagine what that list will look like...
On Fri, Mar 4, 2011 at 17:07, Gareth Marchant gareth@litehaus.net wrote:
> Does the nagios stage environment operate in an equivalent manner to prod such that testing nagios 3 in stage for these systems would accurately reflect prod? I assume that there are specific monitors for each of these systems that would need to be exercised? I can only imagine what that list will look like...
Staging should be 1:1 with production. However, the list I created and the current nagios configurations may not agree at all :).
infrastructure mailing list infrastructure@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/infrastructure
On Fri, 04 Mar 2011 19:07:53 -0500 Gareth Marchant gareth@litehaus.net wrote:
> Does the nagios stage environment operate in an equivalent manner to prod such that testing nagios 3 in stage for these systems would accurately reflect prod? I assume that there are specific monitors for each of these systems that would need to be exercised? I can only imagine what that list will look like...
https://admin.stg.fedoraproject.org/nagios/
You can see that it can't reach/monitor a lot of the things that the real instance does. The stg env just doesn't have access to all the things it would need outside it.
kevin
Kevin Fenzi kevin@scrye.com wrote:
> [...]
How about devices? I am sure there are routers, switches, gateways, firewalls and maybe storage hardware monitored by nagios that are high priority/highly critical and worthy of test?
How deeply should testing go or, put another way, how much go-live risk can be tolerated? Should a gap analysis of stage environment to production be performed prior to making a nagios test plan? I am not sure how rigorously structured this upgrade plan should be!
On Fri, 04 Mar 2011 20:31:05 -0500 Gareth Marchant gareth@litehaus.net wrote:
> How about devices? I am sure there are routers, switches, gateways, firewalls and maybe storage hardware monitored by nagios that are high priority/highly critical and worthy of test?
Well, most of the routers/switches/gateways are not under our control. They are controlled by whatever facility we have machines in. Monitoring of gateways is mostly done via monitoring the VPNs we use between sites.

There is some storage backend stuff in phx2 that should probably be monitored.
> How deeply should testing go or, put another way, how much go-live risk can be tolerated? Should a gap analysis of stage environment to production be performed prior to making a nagios test plan? I am not sure how rigorously structured this upgrade plan should be!
Yeah, not sure either. ;)
I think monitoring could be improved, but it's hard to do that all at once. One possible plan would be to spin up a new nocXX in production and get it to the point where everything is showing green on its monitoring before we retire noc01. The downside is that we might have to give this new machine/IP access to more things to be able to monitor them, and we would be double monitoring things during the transition. On the plus side, we could check the two against each other to make sure we were monitoring everything we were before and that it was ok.
Of course some services would have to be migrated all at once. (zodbot, dhcp, tftp, meetbot httpd).
Just a thought.
kevin
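The "check them against each other" step could be sketched roughly like this. It assumes each nagios instance's checks can be exported as (host, service) pairs with a current state; how that export happens is not covered here, and the function names and sample data are hypothetical.

```python
def compare_checks(old_checks, new_checks):
    """Compare the (host, service) pairs monitored by two nagios instances."""
    old, new = set(old_checks), set(new_checks)
    return {
        "missing_on_new": sorted(old - new),  # monitored before, not on nocXX
        "extra_on_new": sorted(new - old),    # only monitored on nocXX
    }


def state_mismatches(old_state, new_state):
    """Return checks both instances run but report different states for.

    Both arguments map (host, service) -> state string such as "OK".
    """
    shared = old_state.keys() & new_state.keys()
    return sorted(key for key in shared if old_state[key] != new_state[key])


# Hypothetical sample data, not taken from the real configuration.
noc01_state = {("db01", "ping"): "OK", ("db01", "postgres"): "OK",
               ("app01", "http"): "OK"}
nocxx_state = {("db01", "ping"): "OK", ("app01", "http"): "CRITICAL",
               ("app01", "ssh"): "OK"}

diff = compare_checks(noc01_state, nocxx_state)
disagree = state_mismatches(noc01_state, nocxx_state)
```

An empty `missing_on_new` and an empty `disagree` list would be one way to define "everything is showing green" before retiring noc01.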
On Fri, Mar 4, 2011 at 18:31, Gareth Marchant gareth@litehaus.net wrote:
> How about devices? I am sure there are routers, switches, gateways, firewalls and maybe storage hardware monitored by nagios that are high priority/highly critical and worthy of test?
We don't control 99.999% of them and have no access to them beyond pinging them. In many ways our infrastructure is very much a "cloud". We have systems but everything else is outsourced :).

The storage hardware we can monitor is pretty much the EqualLogics that releng has. Everything else we get through closed, firewalled-off networks.
> How deeply should testing go or, put another way, how much go-live risk can be tolerated? Should a gap analysis of stage environment to production be performed prior to making a nagios test plan? I am not sure how rigorously structured this upgrade plan should be!
If gap analysis or other items are itches you like to scratch, we can work them into version 2 of the test plan(s). It would be a good training exercise for people to see how it's done (as I only know it from consultants who were not doing it right according to the next set of consultants). If they are things you would not touch with a 10 foot pole, I have no desire to make a volunteer spend time on them.

Our go-live risk tolerance is pretty high, as we have done upgrades with no test plan for 6-7 years now. The goal here is to start from something a bit more complex than "does the web page have errors? No? Then we are good." because we have grown to be more complex and end up with 4-8 hour periods of "well darn, I completely forgot that."

So I expect that we will have many lessons learned after each upgrade, so that we can say "we will add this to testing next time" and then be able to do so. I guess what I am saying is let's do enough that it fits on an iPad web page the first time and make it more complex as we go.
My general philosophy for people volunteering time on Fedora is:
Rule 1: Do good work for others as you would want them to do for you.
Rule 2: Have Fun
Rule 3: Keep true to Freedom, Friends, First, and Features without breaking 1 or 2.
So don't stress over the test plan if it misses a bunch of stuff. [I am saying this out loud because I usually get stressed over such stuff and have to remind myself :).] My main hope is to learn how to do our stuff better incrementally.
I hope this helps better outline what we need to start with. If a deadline would work better, I would like to have Nagios be ready to go live by the first of April. What do we need to have noc01.stg tested by March 28th?
On Sat, 2011-03-05 at 16:06 -0700, Stephen John Smoogen wrote:
> [...]
Perfect, this is exactly the philosophical viewpoint I was hoping to get. "Test plan" means different things to different people! Fortunately the only itch I have to scratch is covered in "Rule 1."
I will expand the basic plan I put together before. I think that expanding it just enough to cover the obvious stuff is sufficient based on what I think I am hearing?
For example:
1. Test the nagios system, for example exercise nagios services to verify clean start/stop/restart, bounce the server to verify nagios comes online without intervention and perhaps have several individuals hit the nagios web interface while services restart to validate things operate in an expected manner.
2. Turn down various services on various hosts and verify proper notification, start with one or two services and progress to turning off large(r) quantities of services simultaneously.
3. Test notification facilities, not sure exactly how mail alerts are configured, but might be worthwhile to test broken smtp connectivity to validate secondary alert functions like a fallback smtp connection or text alerts?
I will pad this basic list with some actual tasks, and would be happy to hear other people's input and suggestions for items 1, 2 & 3 above.
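To make item 3 a little more concrete, here is a minimal sketch of the fallback behaviour such a test would be exercising: try the primary relay, and move on to the next one if the connection fails. This is not how nagios itself sends mail; the function name, the relay list and the 10 second timeout are all assumptions for illustration.

```python
import smtplib
from email.message import EmailMessage


def send_with_fallback(message, relays, connect=smtplib.SMTP):
    """Try each SMTP relay in order; return the host that accepted the mail.

    `connect` is injectable so the fallback path can be tested without a
    real mail server (an assumption of this sketch, not a nagios feature).
    """
    errors = []
    for host in relays:
        try:
            with connect(host, timeout=10) as smtp:
                smtp.send_message(message)
            return host
        except (OSError, smtplib.SMTPException) as exc:
            # Remember the failure and fall through to the next relay.
            errors.append((host, exc))
    raise RuntimeError(f"all relays failed: {errors}")
```

A test for "broken smtp connectivity" then comes down to making the first relay unreachable and checking that the alert still goes out through the second one.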
Is nagios 3 in stg the result of an in-place upgrade from nagios 2? Should the essentials of the upgrade procedure be documented in order to be replayed when the time comes in prod?
On Mar 5, 2011, at 8:51 PM, Gareth Marchant wrote:
> Is nagios 3 in stg the result of an in-place upgrade from nagios 2? Should the essentials of the upgrade procedure be documented in order to be replayed when the time comes in prod?
noc02 (nagios external) is running nagios 3, and I did think of one thing that we should take a look at. In notifications from noc02, it does not mention that the error originated from noc02 anymore. I have no clue why.
noc01.stg is a machine that was built as EL6 and is running with my preliminary nagios puppet module, in puppet/modules/nagios/* ... so config stuff should be editable in that directory.
As I've said numerous times, I am fairly confident that our nagios config will work perfectly fine in 3; I'm more worried about things like meetbot logs and zodbot. Let's get those tested on noc01.stg and have /everything/ working so we can stick to smooge's deadline.
On Sat, 5 Mar 2011 21:00:25 -0500 Ricky Elrod codeblock@elrod.me wrote:
> [...]
I know zodbot was tested (we changed its nick on noc01.stg, had it join, and tested it out some). I don't know that we have a full testing plan, but it should work fine I would think.
Not sure how to test meetbot logs, but they are just static html, so I would think they would work just fine too.
kevin
Ricky
Assuming you send notifications as email, a simple fix to tell that a nagios notification came from a specific server is to edit the comment (GECOS) field in the password file for the account nagios is running as, usually nagios.
noc01 /etc/passwd:
nagios:x:100:100:nagios-noc1:/home/nagios:/sbin/nologin

noc02 /etc/passwd:
nagios:x:100:100:nagios-noc02:/home/nagios:/sbin/nologin

email would show

nagios-noc1 <nagios at noc01>
nagios-noc02 <nagios at noc02>
of course you could change your 'notification' template too..
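As a rough illustration of why the passwd edit shows up in the mail, the sketch below pulls the comment field out of a passwd entry and builds the From header a local mailer would typically produce. Whether a given mailer actually uses the comment field this way depends on its configuration, so treat that as an assumption; the helper names are hypothetical.

```python
from email.utils import formataddr


def passwd_fields(line):
    """Split one /etc/passwd entry into its seven colon-separated fields."""
    fields = line.strip().split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return fields


def from_header(passwd_line, hostname):
    """Build a From header using the comment field as the display name."""
    fields = passwd_fields(passwd_line)
    user, comment = fields[0], fields[4]
    return formataddr((comment, f"{user}@{hostname}"))


entry = "nagios:x:100:100:nagios-noc02:/home/nagios:/sbin/nologin"
header = from_header(entry, "noc02")
# header == "nagios-noc02 <nagios@noc02>"
```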
I am new to the list and looking to assist with fedora-infrastructure, and thought I would start with an area I know (nagios). Hopefully this is helpful and starts me on my way to actually knowing how fpo nagios is set up, so that at some later date I can provide a more viable answer.
I am jbass29503 on #fedora-admin
JB
On Sun, Mar 6, 2011 at 3:57 PM, Kevin Fenzi kevin@scrye.com wrote:
> [...]
On Sun, 6 Mar 2011 23:57:17 -0500 Bassford John jbass29503@gmail.com wrote:
> [...]
> of course you could change your 'notification' template too..
Yeah, either would work. It might be nice to see easily which one it came from.
> am new to the list and looking to assist with fedora-infrastructure, and thought I would start with an area I know (nagios), hopefully this is helpful and starts me on my way of actually knowing how fpo nagios is setup , so at some later date I could provide a more viable answer.
Excellent and welcome. ;)
> am jbass29503 on #fedora-admin
Feel free to ask questions or add comments there as you like...
kevin