Hi there! I've had a system up and running for a while that I was thinking
of converting to open-source and it has a large overlap with
crash-catcher/ABRT so I was wondering if it would make sense to try
and share code.
The system I have mainly focuses on the back-end collection and correlation
and very little (practically nothing) on the front-end (i.e. where
crash-catcher seems to have most of its focus).
The system I have is called "coroner". It has agents (called constables)
that run on hosts looking for trouble. If they find an incident, the
constable reports the incident to the coroner. Based on central
configuration, the coroner may request that the constable investigate
further by running an autopsy, the results of which are posted back up to
the coroner. The coroner takes responsibility for correlating incidents
based on forensics (e.g. a fingerprint provided by the constable, which is a
parsed MD5 representation of a C/C++ stacktrace). The coroner provides a web
interface that allows data mining of the incidents and allows owners of
applications to configure actions to take (e.g. "email me a daily digest of
any incidents", "send the incident data to a morgue for archival",
all core dumps", "delete all but 3 core-dumps", etc). The web interface
also shows which applications core-dump the most and other boring statistics.
The system understands different coredumps, Linux and Solaris crashdumps,
and custom incident data (e.g. we have a tool we use when a process is hung
that straces and attaches gdb and does a variety of other probes - that data
can be collected in the same way as a coredump).
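
To make the fingerprint idea above a bit more concrete, here is a rough
sketch in Python. It is not the actual coroner code (which is Perl); the
gdb-style frame format, the choice to hash only function names, the default
depth of 5 frames and the names parse_frames/fingerprint are all assumptions
made purely for illustration:

```python
import hashlib
import re

# Assumed gdb-style frame line, e.g.
#   "#1  0x00007f3b in main () at main.c:5"
# The address part is optional; only the function name is captured.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(?P<func>[^\s(]+)")


def parse_frames(stacktrace: str) -> list:
    """Return the function names found in a stacktrace, top frame first."""
    frames = []
    for line in stacktrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group("func"))
    return frames


def fingerprint(stacktrace: str, depth: int = 5) -> str:
    """Hash the top `depth` function names so that crashes differing only
    in addresses or argument values get the same fingerprint."""
    frames = parse_frames(stacktrace)[:depth]
    return hashlib.md5("\n".join(frames).encode("utf-8")).hexdigest()
```

With something like this, two incidents whose stacks differ only in
addresses or argument values hash to the same value, which is what lets
them be correlated centrally.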
Obviously much of this makes sense within an organisation but may not make
sense in a distributed world such as the typical use-case for crash-catcher.
However, it's possible that there may be sufficient overlap that we could
share APIs or some such.
The "watching for incidents" code that I have is very clunky and
crash-catcher appears to have far and away a better and more configurable
way of finding problems. There are other parts of the coroner that are also
"clunky" and could very well do with rewriting. It's all written in Perl
with a little bit of C (for parsing core files quickly without having to
fire up gdb). It's also in need of disentanglement from some internal
systems of our own that make it unready for immediate deployment in other
organisations. However, it's a start.
Is there interest in exploring some kind of collaboration?