After last week's meeting, Mike asked me to look into a solution for log management, reporting and alerting. As the new guy, I happily jumped in, so here's a short status report on what I found and what some of the options are. I would appreciate any input.
First of all, there are a lot of different projects trying to deliver this functionality. Many of them, though, are either dead, commercial, and/or mainly aimed at parsing Apache logfiles.
I am assuming we want an open source tool, so, skipping the commercial products like Splunk and skipping the dead projects, it boils down to the following as the most promising. If I turn out to be missing some great project, please tell me and I'll look into it.
1. octopussy (http://www.8pussy.org/dokuwiki/doku.php) Log aggregation tool, providing a one-stop web UI to view all logs for a bunch of servers, complete with alerting to email, Jabber or Nagios, creation of graphs, etc. Imports data into a MySQL database. The downsides are that its developer base is pretty narrow, it is relatively complex to configure, and it is Debian-centric in both documentation and available packaging. On the other hand, octopussy is what comes closest to Splunk in the open source world that I know of. Sadly, all other Splunk-like apps are commercial. Last release: December 2009; last commit a couple of days ago.
2. logreport / lire (http://www.logreport.org/) Log parser that runs from cron or manually; parses logs from many different applications and generates HTML, text or PDF reports that can optionally be mailed to people. Slightly odd curses configuration frontend. Works by importing log files into what is called the 'DLF store' and renders periodic reports from that data. Can receive logfiles over mail. The downside is that development is pretty slow, even though it is backed by a foundation (about one release per year; the last one was in March 2009). Low mailinglist traffic. Does not do alerting or real-time parsing. DLF has SQLite as a backend, afaict, and I'm not sure how well that scales in the long run. Last release: March 2009; no activity on the mailinglist or in CVS since then.
3. epylog (https://fedorahosted.org/epylog/) Has some fans in the Infra group already, it seems. Modular log parser, run from cron. Stores offsets for already parsed files, so the next run can start where the previous one left off. Generates HTML reports in a configurable location or sends them out by mail. Custom report delivery methods can be configured. Does not do alerting or real-time parsing. Hosted on Fedorahosted, written in Python (the other two apps are mainly in Perl). Very easy to write custom modules for. Last release: none recently, but Subversion is active.
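For what it's worth, the offset trick epylog uses is easy to picture. Here's a rough Python sketch of the idea; the state-file layout and function names are mine for illustration, not epylog's actual internals:

```python
# Sketch of the offset-tracking idea: remember how far into each logfile
# we got, so the next cron run only reads the lines added since then.
import json
import os

def read_new_lines(logfile, state):
    """Return the lines appended to logfile since the previous run.

    `state` maps logfile paths to byte offsets and is updated in place.
    """
    offset = state.get(logfile, 0)
    if os.path.getsize(logfile) < offset:
        offset = 0  # file shrank: assume it was rotated, start over
    with open(logfile) as f:
        f.seek(offset)
        lines = f.readlines()
        state[logfile] = f.tell()
    return lines

def load_state(state_file):
    """Load the offset map, or start fresh if it doesn't exist yet."""
    try:
        with open(state_file) as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}

def save_state(state_file, state):
    """Persist the offset map for the next run."""
    with open(state_file, "w") as f:
        json.dump(state, f)
```

Running this from cron with a persisted state file gives you the "pick up where we left off" behavior without re-parsing whole logs every night.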
Of the three, only epylog is already packaged for Fedora, and Octopussy is the only one that does real-time log parsing and alerting.
What should be the next step? I probably need to do some more testing, but before I do that, let me know what you think is important in a log monitoring and reporting application. Any specific tests would you like to see done for these apps?
Let me know what you think.
Maxim Burgerhout maxim@wzzrd.com ---------------- GPG Fingerprint EB11 5E56 E648 9D99 E8EF 05FB C513 6FD4 1302 B48A
On Tue, 12 Jan 2010, Maxim Burgerhout wrote:
> - octopussy (http://www.8pussy.org/dokuwiki/doku.php)
> Log aggregation tool, providing a one-stop web UI to view all logs for a bunch of servers, complete with alerting to email, Jabber or Nagios, creation of graphs, etc. Imports data into a MySQL database. The downsides are that its developer base is pretty narrow, it is relatively complex to configure, and it is Debian-centric in both documentation and available packaging. On the other hand, octopussy is what comes closest to Splunk in the open source world that I know of. Sadly, all other Splunk-like apps are commercial. Last release: December 2009; last commit a couple of days ago.
octopussy is mostly perl and xml it looks like. My main concern with it is that it seems to have only one contributor. Might be worth setting up to look at though I'm not so sure we need real-time analysis.
> What should be the next step? I probably need to do some more testing, but before I do that, let me know what you think is important in a log monitoring and reporting application. Any specific tests you would like to see done for these apps?
Personally I'd like to get general metrics from the logs and a list of the errors / warnings that we care about. The problem is we never really know the format of some of the errors we get. We recently got some memory errors on fedorahosted and no one noticed until we happened to log in and see them.
I think I like the idea of a single nightly report that is easy to read through. The trick is figuring out what should be in that report I guess.
What are others using for log analysis?
-Mike
On Tue, Jan 12, 2010 at 08:29:51AM -0600, Mike McGrath wrote:
> What are others using for log analysis?
Splunk :)
But in the non-commercial realm, there's a lot of stuff listed here [1]. In pre-Splunk days, we were using swatch [2] quite heavily. It's not pretty to configure, but it did its job. I wouldn't be surprised if there are some Python-ish tools out there that do the same.
We used it in tandem with syslog-ng (which we still use) and a FIFO.
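In case anyone wants to picture it: a swatchrc stanza looks roughly like this, going from memory, so double-check the docs before trusting the syntax. The pattern, address and subject are made-up examples:

```
# watch for kernel OOM messages, print them and mail an alert
watchfor /Out of memory/
    echo
    mail addresses=admin\@example.com,subject=OOM_alert
```

Point swatch at the FIFO that syslog-ng writes to and it reacts as lines come in.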
Ray
[1] http://www.loganalysis.org/log-parsers-generic/
[2] http://www.oit.ucsb.edu/~eta/swatch/swatch.html
On Tue, Jan 12, 2010 at 15:29, Mike McGrath mmcgrath@redhat.com wrote:
> octopussy is mostly perl and xml it looks like. My main concern with it is that it seems to have only one contributor. Might be worth setting up to look at, though I'm not so sure we need real-time analysis.
OK, if real-time analysis is not a hard requirement, octopussy becomes a lot less attractive. Doing things in real time is one of its key features.
> Personally I'd like to get general metrics from the logs and list errors / warnings that we would care about. The problem is we never really know the format of some errors we get. We had recently gotten some memory errors from fedorahosted and no one noticed it until we happened to log in and see it.
Either of the other two options (lire and epylog) can do this, as every log line that doesn't match any specific rule gets printed in the daily report. I think this gives epylog the edge, because it is already in Fedora. We might need to write some custom modules for it, but as it's just Python, I reckon that'll be relatively easy. We can start off with the built-in modules and then create custom ones as time goes by and the need arises.
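To make that concrete, the "everything unmatched ends up in the report" idea is basically the following. The patterns here are invented examples, not epylog's actual built-in rules:

```python
# Split a batch of log lines into "known noise" and "everything else";
# the leftovers are what would land in the daily report, so unexpected
# errors (like those memory errors) surface even without a rule for them.
import re

KNOWN_NOISE = [
    re.compile(r"sshd\[\d+\]: Accepted publickey"),
    re.compile(r"CRON\[\d+\]: \(root\) CMD"),
]

def split_unmatched(lines):
    """Return (matched_count, unmatched_lines) for a batch of log lines."""
    matched = 0
    unmatched = []
    for line in lines:
        if any(p.search(line) for p in KNOWN_NOISE):
            matched += 1
        else:
            unmatched.append(line)
    return matched, unmatched
```

A custom module would then just be a new pattern list plus some formatting for its section of the report.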
Is building a central logserver an option at all, btw?
We could also use the 'swatch' program Ray mentioned or something like it to receive alerts and then epylog / lire / something else to generate the daily reports.
Maxim
On Tue, 12 Jan 2010, Maxim Burgerhout wrote:
> Is building a central logserver an option at all, btw?
>
> We could also use the 'swatch' program Ray mentioned or something like it to receive alerts and then epylog / lire / something else to generate the daily reports.
In a former life here is what I did:
I used syslog-ng to merge sets of logs into common locations, using the same format/structure as /var/log on any system.
so I had:
/var/log/profiles/webservers/....
/var/log/profiles/appservers/....
/var/log/profiles/mxes/...
etc. etc.
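Roughly, the syslog-ng side of that looked something like this; the host pattern and paths here are illustrative, not the exact configs:

```
# accept syslog from the network and sort webserver logs into their tree
source s_net { udp(ip(0.0.0.0) port(514)); };
filter f_web { host("^web[0-9]+"); };
destination d_web {
    file("/var/log/profiles/webservers/$HOST/messages");
};
log { source(s_net); filter(f_web); destination(d_web); };
```

One filter/destination/log pair per profile gets you the layout above.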
Then I used epylog to generate html output of each of the above every day so we could sift them properly.
And I used sec (http://simple-evcorr.sourceforge.net/) to do on-the-fly event notification. When something specific came into syslog-ng, it would spawn an sec job which would send alerts to Nagios via NRPE.
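A sec rule for that looks more or less like this; the pattern and the alert command are made-up examples, not the real configs:

```
# fire an alert when the OOM killer shows up in syslog
type=Single
ptype=RegExp
pattern=kernel: Out of memory: Killed process (\d+)
desc=OOM killer fired for pid $1
action=shellcmd /usr/local/bin/notify-nagios "OOM, pid $1"
```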
I can probably obtain those configs if they would be handy.
-sv
On Tue, 12 Jan 2010, Maxim Burgerhout wrote:
> Is building a central logserver an option at all, btw?
We have one already.
-Mike
On Tue, Jan 12, 2010 at 08:29:51AM -0600, Mike McGrath wrote:
> What are others using for log analysis?
Logcheck was recommended to me once. It seems there are no proper releases, but development is active in git: http://logcheck.org/index.html
Regards,
Till
infrastructure@lists.fedoraproject.org