Hi,
I am in the middle of installing a new environment based on RHQ 4.3 and I'm having a lot of trouble with agents not gathering metrics.
What seems to happen is that the agent autodiscovers everything correctly but does not appear to schedule metric gathering correctly so no data is received by the server. After a lot of messing around I have narrowed it down to a few obvious symptoms.
When running the agent interactively on one of our problematic servers, the inventory command shows all the resources in the tree like this:
Resource[id=12030, type=Memory Pool, key=java.lang:name=PS Perm Gen,type=Memory Pool, name=PS Perm Gen, parent=Memory Subsystem] (sync=NEW, state=STARTED, avail=UNKNOWN, sched=0/0)
Does anybody know why the state is always sync=NEW?
We also have some problematic agents where the inventory command shows sync=SYNCHRONIZED but the sched section still says 0/0. If you then go into the RHQ web interface and disable and then re-enable a schedule on one of the metrics for the resource, the sched number becomes 1/1.
On some agents this is very intermittent. For example one platform has 7 mounted filesystems and about half had sched=0/0 and the other half had the correct number of enabled and disabled metrics.
Any idea where to investigate next or what can cause metric schedules to fail to download to the agent correctly?
As a note the agents are talking to the server via SSL and the database for RHQ is a 2 node Oracle RAC cluster and there are 2 RHQ servers. I don't know if these factors make a difference but thought I should mention it.
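For anyone who wants to check a large inventory listing for the symptom described above, here is a minimal sketch that flags resources whose sched counters are both zero. It assumes the inventory output has been saved as text and that every resource line ends with a status group in the same shape as the sample line above. The class name SchedCheck and the method unscheduled are made up for illustration and are not part of RHQ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchedCheck {
    // Matches the trailing status group of an inventory line, e.g.
    // "(sync=NEW, state=STARTED, avail=UNKNOWN, sched=0/0)".
    // The layout is inferred from the single sample line in this thread.
    private static final Pattern STATUS = Pattern.compile(
        "\\(sync=(\\w+),\\s*state=(\\w+),\\s*avail=(\\w+),\\s*sched=(\\d+)/(\\d+)\\)\\s*$");

    /** Returns the lines whose two sched counters are both zero. */
    public static List<String> unscheduled(List<String> inventoryLines) {
        List<String> result = new ArrayList<>();
        for (String line : inventoryLines) {
            Matcher m = STATUS.matcher(line);
            if (m.find()
                    && Integer.parseInt(m.group(4)) == 0
                    && Integer.parseInt(m.group(5)) == 0) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            // Sample line taken from the agent output quoted above.
            "Resource[id=12030, type=Memory Pool, key=java.lang:name=PS Perm Gen,type=Memory Pool, "
                + "name=PS Perm Gen, parent=Memory Subsystem] "
                + "(sync=NEW, state=STARTED, avail=UNKNOWN, sched=0/0)",
            // Hypothetical line, invented only to show a resource that is not flagged.
            "Resource[id=1, type=File System, key=/, name=/, parent=host] "
                + "(sync=SYNCHRONIZED, state=STARTED, avail=UP, sched=1/1)");
        for (String line : unscheduled(lines)) {
            System.out.println("No schedules: " + line);
        }
    }
}
```

Running it over the output from several agents should make it easier to see whether the affected resources share a type or a parent.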
Thanks
Steve Millidge www.c2b2.co.uk
Just replying to my own post.
I think we have tracked it down to a ConcurrentModificationException being thrown during InventoryManager.synchInventory.
Apologies for the hand-typed stack trace but I can't get data off the box easily.
HashMap$KeyIterator.next, line 828
InventoryManager.mergeResource, line 2708
InventoryManager.mergeResource, line 2709
InventoryManager.mergeUnknownResources, line 2610
InventoryManager.synchInventory, line 1041
The exception is easily reproduced by running the command inventory -s from the agent interactive prompt.
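For readers unfamiliar with this exception, the sketch below shows the general failure mode that a trace ending in HashMap$KeyIterator.next points to: a HashMap is structurally modified while its key set is being iterated. This is not the RHQ InventoryManager code, and whether the modification in RHQ came from the same thread or from another one is not known from this thread. The class CmeSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class CmeSketch {
    /**
     * Adds a new entry for every key while iterating the live key set.
     * Returns true if a ConcurrentModificationException was thrown.
     */
    public static boolean unsafeMerge(Map<String, Integer> resources) {
        try {
            for (String key : resources.keySet()) {
                // Structural modification during iteration: the iterator
                // detects it on its following call to next().
                resources.put(key + "-child", 0);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /**
     * Same merge, but iterates over a snapshot of the keys so the map
     * can be modified freely. Returns the number of entries added.
     */
    public static int safeMerge(Map<String, Integer> resources) {
        int added = 0;
        for (String key : new ArrayList<>(resources.keySet())) {
            if (resources.putIfAbsent(key + "-child", 0) == null) {
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new HashMap<>();
        first.put("a", 1);
        first.put("b", 2);
        System.out.println("unsafe merge threw CME: " + unsafeMerge(first));

        Map<String, Integer> second = new HashMap<>();
        second.put("a", 1);
        second.put("b", 2);
        System.out.println("safe merge added: " + safeMerge(second));
    }
}
```

Iterating over a copy is only one way to avoid the problem. If several threads touch the map, a concurrent collection or explicit locking would be needed instead, and the bug report referenced later in this thread is the place to look for what RHQ actually changed.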
I will try RHQ 4.4 later today.
Thanks
Steve Millidge www.c2b2.co.uk
-----Original Message----- From: "smillidge@c2b2.co.uk" smillidge@c2b2.co.uk Sender: rhq-users-bounces@lists.fedorahosted.org Date: Sun, 13 May 2012 08:12:53 To: rhq-users@lists.fedorahosted.org Reply-To: rhq-users@lists.fedorahosted.org Subject: Problem with Agent not gathering metrics
_______________________________________________ rhq-users mailing list rhq-users@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/rhq-users
FYI:
https://bugzilla.redhat.com/show_bug.cgi?id=794790
Code was added to fix the CCME incidents. If you are still seeing this in the latest RHQ release, write up a BZ and reference that one and provide whatever stack traces you can provide in your new BZ.
Thanks, John Mazz
Thanks John, we upgraded to 4.4 and it seems to be better. We are not getting this problem with 4.4.
Steve Millidge C2B2 The Leading Independent Middleware Experts. T: 08450 539457 |M: 07920 100626 |W: www.c2b2.co.uk | E: smillidge@c2b2.co.uk --------------------------------------------------------------------------------------------------------------- C2B2 Consulting Limited, Unit 33, Malvern Hills Science Park, Geraldine Road, Malvern, Worcestershire, WR14 3SZ Registered in England and Wales: 4563419, Registered Office: Ardendale, Old Hollow, Malvern, Worcestershire
-----Original Message----- From: rhq-users-bounces@lists.fedorahosted.org [mailto:rhq-users-bounces@lists.fedorahosted.org] On Behalf Of John Mazzitelli Sent: 14 May 2012 16:24 To: rhq-users@lists.fedorahosted.org Subject: Re: Problem with Agent not gathering metrics