Hi there! I've had a system up and running for a while that I've been thinking of open-sourcing. It overlaps heavily with crash-catcher/ABRT, so I was wondering whether it would make sense to try to share code.
The system I have focuses mainly on back-end collection and correlation, with very little (practically nothing) on the front-end, which is where crash-catcher seems to put most of its focus.
The system I have is called "coroner". It has agents (called "constables") that run on hosts looking for trouble. If a constable finds an incident, it reports the incident to the coroner. Based on central configuration, the coroner may ask the constable to investigate further by running an autopsy, the results of which are posted back up to the coroner. The coroner is responsible for correlating incidents based on forensics (e.g. a fingerprint provided by the constable, which is an MD5 hash of a parsed C/C++ stack trace). The coroner provides a web interface that allows data mining of the incidents and lets application owners configure actions to take (e.g. "email me a daily digest of any incidents", "send the incident data to a morgue for archival", "delete all core dumps", "delete all but 3 core dumps", etc.). The web interface can also show which applications core-dump the most, along with other boring statistics :).
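To give a feel for the fingerprinting idea, here's a rough Python sketch. It is not the actual coroner code (that's Perl plus a little C), and the frame-parsing regex, the depth of 8, and the names `parse_frames`/`fingerprint` are made up for illustration. The idea is to strip addresses and arguments from each gdb-style frame, keep the top few function names, and MD5 them, so the same crash correlates even when addresses differ between hosts or runs.

```python
import hashlib
import re

# Hypothetical pattern for gdb-style backtrace lines, e.g.
#   "#0  0x00007f00 in raise () from /lib64/libc.so.6"
#   "#2  main () at main.c:10"
# Captures the function name; addresses and arguments are discarded.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def parse_frames(backtrace: str) -> list:
    """Return the function names from a gdb-style backtrace, top frame first."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group(1))
    return frames


def fingerprint(backtrace: str, depth: int = 8) -> str:
    """MD5 hex digest of the top `depth` function names (illustrative depth)."""
    frames = parse_frames(backtrace)[:depth]
    return hashlib.md5("\n".join(frames).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    bt = (
        "#0  0x00007f00 in raise () from /lib64/libc.so.6\n"
        "#1  0x00401136 in crash_me (p=0x0) at main.c:5\n"
        "#2  0x00401150 in main () at main.c:10\n"
    )
    print(parse_frames(bt))
    print(fingerprint(bt))
```

Hashing only function names (not offsets or line numbers) trades precision for robustness: rebuilds and ASLR don't split one bug into many incidents, at the cost of occasionally merging distinct crashes that share a call path.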
The system understands core dumps, Linux and Solaris crash dumps, and custom incident data (e.g. we have a tool we use when a process is hung that straces it, attaches gdb and runs a variety of other probes; that data can be collected in the same way as a core dump).
Obviously much of this makes sense within an organisation but may not make sense in a distributed world such as crash-catcher's typical use-case. However, there may be enough overlap that we could share APIs or something along those lines.
The "watching for incidents" code that I have is very clunky, and crash-catcher appears to have a far better and more configurable way of finding problems. Other parts of the coroner are also "clunky" and could very well do with rewriting. It's all written in Perl with a little bit of C (for parsing core files quickly without having to fire up gdb). It also needs to be disentangled from some of our own internal systems, which currently makes it unready for immediate deployment in other organisations. However, it's a start.
Is there interest in exploring some kind of collaboration?