nagios and ansible - infrastructure - Fedora mailing-lists

17 Jun 2013


I've been thinking about how we should handle nagios in the ansible
world.
Our current nagios config in puppet has a number of issues:
1. it's a bit cumbersome b/c you edit nagios independent of adding the
host's config
2. when you remove a host the nagios config doesn't automatically go
away
3. the fqdn/vpn hostname thing between noc01 and noc02 is kinda a giant
pain in the ass
4. dependencies between Networks->vhosts->guests are manual and
irritating to maintain.
I'm open to suggestions about how to maintain all of this. Here are
some ideas I've tinkered with:
a. we stop putting nagios configs in the config mgmt system entirely
and put them in another repo - like with dns. That doesn't make 1 or 2
any better - but it could allow us to script our way to making 3 and 4
much better
b. we make populating nagios configs for hosts or services a
function of playbooking the host/group creation. So all of the nagios
configs get put in place when you add the host. That doesn't solve 2
or 4 but maybe it does handle 1 and 3 a bit.
c. we make the nagios configs generate from the host inventory data
that ansible can retrieve. It will require us to define a series of
additional variables per host or per group. So when you add a new host
you'll need to wait for a cron run or an ansible run against our nagios
hosts to get them to see the new hosts. With enough effort I think we
can tackle all of 1, 2, 3 and 4 by generating THE whole set of nagios
configs that way and rsyncing them over using the ansible-rsync module
(or just rsync).
  The problem with this one is that it seems like an all-or-nothing
  scenario - we need to drive ALL of our nagios configs off of this or
  none at all. With that in mind it seems like we would need to define
  hosts as part of ansible even if they are still being managed by
  puppet. That's extra work but I think it is work we'd have to do
  eventually.
So (c) would be something like this:
  - take the list of hosts - for each one look for a vmhost (or check
    whether it is a cloud instance) - make that a dep
  - look for a datacenter - make that a dep
  - look for a vpn cert - make that a dep
  - and on up the chain.
  - look for any special service definitions that we'd be managing
    manually
  - put all of the hosts definitions in one big file so changing out
    that file can be idempotent
  - put service definitions in individual files - but have the files
    rsynced over with --delete so removing one gets removed on the
    nagios side, too
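
A minimal sketch of the host-definition step in (c), just to make the idea concrete. Everything named here is an assumption: the inventory is a plain dict standing in for whatever ansible can retrieve, the `vmhost` and `datacenter` variable names are made up, and it assumes a datacenter is itself represented as a nagios host object so it can be listed under `parents`. Required host directives (address, check settings, templates) are left out, and the vpn cert and special service definitions are not handled.

```python
# Hypothetical inventory: host name -> per-host variables.
# "vmhost" and "datacenter" are illustrative variable names only.
INVENTORY = {
    "virthost01": {"datacenter": "dc1"},
    "app01": {"vmhost": "virthost01", "datacenter": "dc1"},
    "proxy05": {"datacenter": "dc2"},
}


def parents_for(hostvars):
    """Return the nearest dependency: the vmhost if set, else the datacenter."""
    if hostvars.get("vmhost"):
        return [hostvars["vmhost"]]
    if hostvars.get("datacenter"):
        return [hostvars["datacenter"]]
    return []


def render_hosts(inventory):
    """Render every host definition into one string.

    Hosts are emitted in sorted order so the same inventory always
    produces byte-identical output, which keeps swapping the file
    out idempotent.
    """
    blocks = []
    for name in sorted(inventory):
        lines = ["define host {", f"    host_name  {name}"]
        parents = parents_for(inventory[name])
        if parents:
            lines.append("    parents    " + ",".join(parents))
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_hosts(INVENTORY), end="")
```

Since the output only depends on the inventory, removing a host from the inventory removes it from the generated file on the next run, which is the behaviour wanted for issue 2.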
Anyone have an option D we should think about? I'd like to hear about
more options.
Thanks,
-sv