All,
I've been using RHQ for the past couple of days to integrate our application. For a newbie like me, RHQ was very easy to learn and explore (the UI is very intuitive), and I was able to get it working with my application very quickly. Thanks to you guys for coming up with such a great application.
Now I need to demonstrate that RHQ can raise a medium-priority alert to a given list of users if connectivity to an external interface fails 3 times within the past 5 minutes. To demonstrate this, I have instrumented my code and exposed a JMX attribute for this value. The attribute is incremented every time we get a connectivity timeout with the external interface.
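For reference, a counter like the one described can be exposed as a standard JMX MBean. This is a minimal sketch, assuming the platform MBeanServer is used; the names `ConnectivityJmx`, `ConnectivityMonitor`, the attribute `TimeoutCount` and the ObjectName `com.example:type=ConnectivityMonitor` are hypothetical, not taken from the application above.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical example: ConnectivityJmx, ConnectivityMonitor and the ObjectName
// below are illustrative names, not part of RHQ or the original application.
class ConnectivityJmx {

    // Standard MBean interface. JMX requires it to be public and named after the
    // implementation class plus "MBean"; the getter becomes attribute "TimeoutCount".
    public interface ConnectivityMonitorMBean {
        long getTimeoutCount();
    }

    public static class ConnectivityMonitor implements ConnectivityMonitorMBean {
        // Cumulative count of timeouts; it is never reset.
        private final AtomicLong timeoutCount = new AtomicLong();

        // Called by the application whenever the external interface times out.
        public void recordTimeout() {
            timeoutCount.incrementAndGet();
        }

        @Override
        public long getTimeoutCount() {
            return timeoutCount.get();
        }
    }

    // Registers the monitor with the platform MBeanServer so that a JMX client
    // (for example a JMX-based RHQ plugin) can read the attribute.
    static ObjectName register(ConnectivityMonitor monitor) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example:type=ConnectivityMonitor");
        server.registerMBean(monitor, name);
        return name;
    }
}
```

Because this attribute only ever grows, a threshold rule such as "value >= 3" keeps matching once it has been reached.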
RHQ was able to fire an alert when this counter reached 3, but since my application never resets the value, the alert condition never recovers. For example, after 3 connection timeouts, connectivity to the external interface resumed, yet the counter stays at 3; the next time RHQ collects the metrics, it sees that the value still matches the rule and fires the alert again. What is the best way to handle this kind of situation?
Ideally, if RHQ could measure the difference between two successive collections, it could plot a trend and recognize that connectivity has resumed after some time. Can RHQ collect metrics this way?
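The difference-between-collections idea can be illustrated independently of RHQ. The sketch below uses a hypothetical `CounterDelta` helper (not an RHQ API) that turns successive samples of a cumulative counter into per-interval increases; whether RHQ itself can compute this is the open question here. It assumes that a counter which goes down has been reset, for example by an application restart.

```java
// Hypothetical helper (not an RHQ API): converts successive samples of a
// cumulative counter into the increase observed since the previous sample.
class CounterDelta {
    private Long previous; // null until the first sample arrives

    long next(long current) {
        long delta;
        if (previous == null) {
            delta = 0;                   // no baseline yet
        } else if (current < previous) {
            delta = current;             // assumed counter reset, e.g. application restart
        } else {
            delta = current - previous;  // growth since the last collection
        }
        previous = current;
        return delta;
    }
}
```

Fed the samples 0, 3, 3, 3, it yields 0, 3, 0, 0, so a rule on the delta would match once and then stop matching after the timeouts stop.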
Any suggestions are welcome.
Thanks,
Sarat Kumar
Sarat,
Thanks
Just send a 1 when the connection is not available and a 0 otherwise. Then define an alert that triggers on value > 0.5, and define a dampening rule for N occurrences in X minutes.
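To make the suggestion concrete, here is a simplified, hypothetical model for illustration only; it is not RHQ's alerting code. `connectivityGauge` is the 0/1 value the application would expose instead of a cumulative counter, and `evaluate` mimics a condition of value > 0.5 dampened to N matches within X minutes. Clearing the window after the alert fires is an assumption made here so that the same failures do not fire the alert twice.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified, hypothetical model of "alert when value > 0.5, dampened to
// N occurrences within X minutes". It is NOT RHQ's implementation.
class DampenedAlertModel {
    private final int requiredOccurrences;               // N
    private final long windowMillis;                      // X minutes, in milliseconds
    private final Deque<Long> matchTimes = new ArrayDeque<>();

    DampenedAlertModel(int requiredOccurrences, long windowMinutes) {
        this.requiredOccurrences = requiredOccurrences;
        this.windowMillis = windowMinutes * 60_000L;
    }

    // The value the application would expose instead of a cumulative counter:
    // 1 while the external interface is unreachable, 0 otherwise.
    static int connectivityGauge(boolean connected) {
        return connected ? 0 : 1;
    }

    // Evaluates one collected sample and returns true if the alert fires.
    boolean evaluate(long timestampMillis, double value) {
        if (value > 0.5) {
            matchTimes.addLast(timestampMillis);
        }
        // Forget matches that are older than the dampening window.
        while (!matchTimes.isEmpty() && timestampMillis - matchTimes.peekFirst() > windowMillis) {
            matchTimes.removeFirst();
        }
        if (matchTimes.size() >= requiredOccurrences) {
            matchTimes.clear(); // assumption: start a fresh window after firing
            return true;
        }
        return false;
    }
}
```

With N = 3 and X = 5, failures collected at minutes 0, 2 and 3 fire the alert at minute 3, while failures 6 minutes apart never do, because each match ages out of the window before the next one arrives.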
Hope that helps,
Heiko
rhq-users@lists.stg.fedorahosted.org