We currently have the following configuration for our RHQ server:
Production RHQ Server 1 (Node A)
Production RHQ Server 2 (Node B)
* Both nodes connected to the Oracle backend database
* Storage nodes on each instance, clustered
BCP RHQ Server (Node C - different location)
* Storage node not in cluster
We have an F5 URL that sits in front of all the nodes and can point to any of them in the following scenarios:
* Prod Node A fails: traffic directed to Prod Node B
* Prod Nodes A and B fail: traffic directed to BCP Node C
When an RHQ agent on a client Windows box is configured with the F5 GTM URL as the RHQ Server, the agent connects to Prod Node A (as that node is the primary active node for the F5 to point to). If I bring down both Prod nodes, I would expect the F5 URL to connect the agent to the BCP Node C RHQ server. Instead, it never connects to BCP; its logs show it trying to connect to both Prod nodes and failing. Is this due to the storage nodes in Prod being clustered (my assumption)? Why wouldn't the F5 URL direct the agent to connect to BCP?
Node C is also connected to the same Oracle backend DB I assume. That has to be the case for your scenario to work at all.
I don't know what you mean by the third storage node not in first cluster (A and B) - as I understand it, all the RHQ storage nodes must be clustered together. I don't know if you can have storage nodes in separate clusters (A and B in one cluster, C in its own cluster). That just doesn't seem like it would work.
Go to your Administration>Agents page and look at your agent's failover list (it shows all the servers the agent will try to connect to when it needs to fail over). It should show all 3. Or, you can go to your agent's data/ directory and look in the failover-list.dat file; that's the list of the server endpoints it will try when it needs to fail over. Again, all three servers should be in there.
As for the F5 URL stuff, I've never tried it and don't know how it would work in an RHQ environment. I don't know anyone else who has tried it either. Maybe someone on the list here can chime in if they know.
rhq-users mailing list rhq-users@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/rhq-users
If I understand this correctly, the reason is this. The agent initially contacts *any server* but is not immediately serviced by that server. It is only provided with a "failover list" of servers it should use. It then tries to connect to the first server in that list (the "primary"). Failing contact, it will then "fail over" to the next HA server in the ordered list of HA servers. Eventually it will always try to return to its primary server, if possible.
The failover list will contain entries for only the registered HA servers, in this case A and B.
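The behaviour described above can be illustrated with a small Python sketch. This is not RHQ code: `ServerEntry`, `pick_server`, and the host names are hypothetical, and the real agent also handles retries and periodic return to the primary. The sketch only shows why a server that is absent from the ordered list is never tried, even when it is up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ServerEntry:
    """One failover-list entry (illustrative name, not an RHQ class)."""
    host: str
    port: int


def pick_server(failover_list: List[ServerEntry],
                is_reachable: Callable[[ServerEntry], bool]) -> Optional[ServerEntry]:
    """Return the first reachable entry, trying the primary (index 0) first."""
    for entry in failover_list:
        if is_reachable(entry):
            return entry
    return None  # every listed server is down; unlisted servers are never tried


# Hypothetical hosts: only the registered HA servers (A and B) are in the list.
failover_list = [ServerEntry("node-a.example.com", 7080),
                 ServerEntry("node-b.example.com", 7080)]

# Both Prod nodes are down; Node C is up but was never added to the list.
down = {"node-a.example.com", "node-b.example.com"}
chosen = pick_server(failover_list, lambda s: s.host not in down)
print(chosen)  # None
```

Because the primary is always tried first, calling `pick_server` again once Node A is back selects Node A, which loosely mirrors the "return to primary" behaviour.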
On 8/11/2014 1:40 PM, barry.barnett@wellsfargo.com wrote:
[...]
Oh, I misunderstood. Is that third server not actually registered as an RHQ Server in the HA Cloud? That would be bad. What Jay said :)
Why wouldn’t the agent that points to the F5 GTM URL connect to whichever server the F5 directs it to? So you’re saying the BCP RHQ server's storage node has to join the Prod storage node cluster for this failover to BCP to work, even with the F5 GTM URL being used?
Regards,
Barry
From: rhq-users-bounces@lists.fedorahosted.org [mailto:rhq-users-bounces@lists.fedorahosted.org] On Behalf Of Jay Shaughnessy Sent: Monday, August 11, 2014 2:07 PM To: rhq-users@lists.fedorahosted.org Subject: Re: F5 GTM - RHQ Agent configuration issue
[...]
RHQ Agents communicate only with RHQ Servers. The data the agents collect is reported to a server, potentially processed in various ways (for alerting, for example), and then stored. Today, availability data, events and more are stored in the [Oracle] RDB. Metric data is stored in the storage cluster you've created. Agents never communicate directly with the storage cluster. The storage cluster knows only about storing and fetching data. It has no concept of agents and all the associated things that go with that, like authorization, communication, etc.
So, RHQ Servers are one thing and they connect to the RDB and the RHQ Storage Node cluster. They also service agents. Agents only communicate with an RHQ Server, and not necessarily the server they initially contacted, but rather the server they are told to contact in order to distribute agent load.
I'm sorry, I don't know about F5, but basically the agents are not dealing with F5 after perhaps the initial server contact. They are connecting to a server via the RHQ comm layer, directly.
On 8/11/2014 2:17 PM, barry.barnett@wellsfargo.com wrote:
[...]
> Why wouldn’t the agent that points to the F5 GTM URL connect to whichever server the F5 directs it to? So you’re saying the BCP RHQ server's storage node has to join the Prod storage node cluster for this failover to BCP to work, even with the F5 GTM URL being used?
Don't confuse the storage node configuration with the agent configuration. They are two different things. The agent's failover list is determined when a new server is added to the RHQ HA environment, and it is shared with the agent the next time the agent connects to the server or when the agent periodically asks for its failover list (which happens every hour by default).
The storage cluster stuff is handled independently of (and differently from) that. I don't know much about the storage node cluster config; someone else would have to chime in there about storage nodes.
When the agent needs to connect to a server, it looks up the hostname and port of the server to use (the data you see in failover-list.dat) and uses the proper JBoss/Remoting protocol (typically either servlet or sslservlet). So the URL it tries to connect to will be some JBoss/Remoting URL like "servlet://server-hostname:7080/jboss-remoting-servlet-invoker/ServerInvokerServlet".
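To make the point concrete, here is a minimal Python sketch of how such a URL is assembled from a hostname, a port, and a transport name. The function `remoting_url` is a hypothetical helper, not part of the agent; only the URL shape is taken from the example quoted above. Whatever name the agent was configured with initially, the host portion of later connections comes from the failover list entry.

```python
def remoting_url(host: str, port: int, transport: str = "servlet") -> str:
    """Build a connection URL in the form quoted above (a sketch, not agent code)."""
    return (f"{transport}://{host}:{port}"
            "/jboss-remoting-servlet-invoker/ServerInvokerServlet")


# Using the same placeholder hostname and port as the example above.
url = remoting_url("server-hostname", 7080)
print(url)
# servlet://server-hostname:7080/jboss-remoting-servlet-invoker/ServerInvokerServlet
```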
I don't know how you are telling the agent to use this F5 GTM URL you refer to, but I have a feeling the agent isn't using the URL you think it is using. But that's just a guess. Turn on agent debug, look at the debug messages it spews when trying to make connections during its failover, and see what it's trying to connect to.
After I install the agent, I configure it by issuing the following on Windows:
rhq-agent.bat --cleanconfig
When I enter this command, it asks me for the agent name, host server, RHQ-Server name, RHQ-Server Port, etc. I enter the F5 URL as the name of the destination RHQ Server, and its port under the RHQ-Server Port. I thought the agent would then use that to point to wherever the F5 was pointing at that instant. If the F5 pointed to another RHQ server, then I thought that agent instance would also point there.
I'm guessing that when I configured this originally, it did use the F5 URL, which was connected to Prod Node A at the time. The agent was then passed the failover list only from Prod, and now uses that to connect. Could I manually edit this list to add the BCP server? If so, how do I do this?
-----Original Message----- From: rhq-users-bounces@lists.fedorahosted.org [mailto:rhq-users-bounces@lists.fedorahosted.org] On Behalf Of John Mazzitelli Sent: Monday, August 11, 2014 3:47 PM To: rhq-users@lists.fedorahosted.org Subject: Re: F5 GTM - RHQ Agent configuration issue
[...]
I found the failover-list.dat file in the rhq-agent/data directory. Can I make the change to this file directly? Do I need to make the change on every agent I just added (44 of them) to point to the RHQ Servers, or is there a way to push out the additional hostname to the failover list of all agents?
-----Original Message----- From: Barnett, Barry F Sent: Tuesday, August 12, 2014 8:33 AM To: rhq-users@lists.fedorahosted.org Subject: RE: F5 GTM - RHQ Agent configuration issue
[...]
Would I simply change the HA parameter in the rhq-server.properties file to use the F5 URL as opposed to the prod host names?
-----Original Message----- From: Barnett, Barry F Sent: Tuesday, August 12, 2014 8:39 AM To: 'rhq-users@lists.fedorahosted.org'; mazz@redhat.com Subject: RE: F5 GTM - RHQ Agent configuration issue
[...]
I do not believe you can do what you want.
First, you *could* manually change the failover-list.dat file, but:
a) The agent will regenerate that file from information it gets from the server whenever it connects to a server, or on the hour. The agent periodically asks the server if its failover list has changed. You could turn that off by setting the rhq.agent.primary-server-switchover-check-interval-msecs preference to 0 (see the comments for that setting in agent-configuration.xml), but the agent will still regenerate the file whenever it connects to a server, either at startup or at failover to another server. I can't tell you how things will work if you set that to 0. If you add a server to the HA environment, you'd need to reconnect all your agents to get that list to update, which is why the comments for that setting say you shouldn't set it to 0 unless you only have a single server in your HA environment.
b) That file doesn't have the URL; it's just the hostname and ports (secure and unsecure port). The agent uses JBoss/Remoting for its communications API and it uses JBoss/Remoting protocols (like servlet or sslservlet, for example). But if the hostname and port are all you need to go over your F5 redirector, that may be enough. I don't know. But you would then have to contend with a).
The failover list is generated from the server public endpoints. See the UI page Administration/Server and look at your servers. See the public endpoint information (hostname, secure port, port)? That's the information that goes in the agent failover lists. I don't think you can change those to a common F5 URL because that might cause servers not to start up properly and/or it might violate DB constraints in the RHQ_SERVER table. I don't know, I've never tried it, but I would think bad things are going to happen.
In short, I can't think of a way to do what you want. It may be possible by doing out-of-the-ordinary things like manually editing failover-list.dat, or setting the public endpoints of all your servers to be the same and turning off the switchover check interval, but all of those have possible side effects or might not even be possible at all. So I can't say how it would work.
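As a rough illustration of points a) and b), the following Python sketch models the failover list as being derived from the registered servers' public endpoints and overwritten on the agent at every connect. Everything here is hypothetical (`Agent`, `build_failover_list`, the host names, and the 7443 secure port); it is a toy model of the behaviour described, not the agent's implementation.

```python
from typing import Dict, List, Tuple

# (hostname, port, secure port), mirroring the public endpoint fields in the UI
Endpoint = Tuple[str, int, int]


def build_failover_list(registered: Dict[str, Endpoint]) -> List[Endpoint]:
    """Server side: one entry per registered server's public endpoint."""
    return list(registered.values())


class Agent:
    """Toy agent that keeps a local copy of the failover list."""

    def __init__(self) -> None:
        self.failover_list: List[Endpoint] = []

    def connect(self, registered: Dict[str, Endpoint]) -> None:
        # The local copy is replaced by whatever the server hands out.
        self.failover_list = build_failover_list(registered)


# Hypothetical endpoints; only A and B are registered in the HA environment.
registered = {
    "A": ("node-a.example.com", 7080, 7443),
    "B": ("node-b.example.com", 7080, 7443),
}
node_c: Endpoint = ("node-c.example.com", 7080, 7443)

agent = Agent()
agent.connect(registered)
agent.failover_list.append(node_c)      # manual edit of the local list
agent.connect(registered)               # startup, failover, or hourly check
print(node_c in agent.failover_list)    # False: the manual edit is gone
```

In this model the only durable way to get Node C into the list is to add it to `registered`, which corresponds to registering it as a server in the same HA environment.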
I recommend reading the following wiki pages that talk about agent communication, registration, failover, etc.
https://docs.jboss.org/author/display/RHQ/High+Availability#HighAvailability...
https://docs.jboss.org/author/display/RHQ/Communications+Configuration
https://docs.jboss.org/author/display/RHQ/Agent+Registration
Do you think if we add the BCP storage node to the cluster, so that the storage nodes are all replicated, and then also have BCP use the same backend Oracle DB as production, then the agent would pick up on the new BCP server in the mix?
-----Original Message----- From: rhq-users-bounces@lists.fedorahosted.org [mailto:rhq-users-bounces@lists.fedorahosted.org] On Behalf Of John Mazzitelli Sent: Tuesday, August 12, 2014 9:21 AM To: rhq-users@lists.fedorahosted.org Subject: Re: F5 GTM - RHQ Agent configuration issue
I do not believe you can do what you want.
First, you *could* manually change the failover.dat file, but:
a) the agent will regenerate that file from information it gets from the server whenever it connects to a server or on the hour. The agent periodically asks the server if its failover list changed; you could turn that off by setting the rhq.agent.primary-server-switchover-check-interval-msecs preference to 0 (see the comments for that setting in agent-configuration.xml), but the agent will still regenerate the file whenever it connects to a server, either at startup or at failover to another server. I can't tell you how things will work if you set that to 0. If you add a server to the HA environment, you'd need to reconnect all your agents to get that list to update, which is why the comments for that setting say you shouldn't set it to 0 unless you only have a single server in your HA environment.
b) that file doesn't have the URL - it's just the hostname and ports (secure and unsecure port). The agent uses JBoss/Remoting for its communications API and it uses JBoss/Remoting protocols (like servlet or sslservlet for example). But if the hostname and port are all you need to go over your F5 redirector, that may be enough. I don't know. But you would then have to contend with a).
The failover list is generated from the server public endpoint - see the UI page Administration/Server and look at your servers - see the public endpoint information? Hostname, secure port, port? That's the information that goes in the agent failover lists. I don't think you can change those to a common F5 URL because that might cause servers not to start up properly and/or it might cause DB constraint violations in the RHQ_SERVER table. I don't know, I've never tried it, but I would think bad things are going to happen.
In short, I can't think of a way to do what you want. It may be possible by doing out-of-the-ordinary things like manually editing failover.dat or setting the public endpoints of all your servers to be the same and turning off the switchover check interval, but all of those have possible side effects or might not even be possible at all. So I can't say how it would work.
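A toy model of point a) may help show why a hand edit doesn't stick. This is a sketch under assumptions, not RHQ code: the Agent class, its method names, and the node names are all made up; the only things taken from the discussion are that the list is replaced on every connect and that a check interval of 0 turns off only the hourly refresh.

```python
class Agent:
    """Toy stand-in for an agent and its failover file (hypothetical names)."""

    def __init__(self, check_interval_msecs=3_600_000):
        # One hour by default; 0 disables the periodic check.
        self.check_interval_msecs = check_interval_msecs
        self.failover_list = []

    def connect(self, server_list):
        # Startup or failover: the list from the server replaces the local one.
        self.failover_list = list(server_list)

    def periodic_check(self, server_list):
        # Skipped entirely when the interval is 0; otherwise refreshes the list.
        if self.check_interval_msecs == 0:
            return False
        self.failover_list = list(server_list)
        return True


server_view = ["node-a", "node-b"]       # what the servers know about
agent = Agent(check_interval_msecs=0)    # hourly check turned off
agent.connect(server_view)

agent.failover_list.append("node-c")     # the hand edit
agent.periodic_check(server_view)        # no-op, so the edit survives for now
print(agent.failover_list)               # ['node-a', 'node-b', 'node-c']

agent.connect(server_view)               # any restart or failover
print(agent.failover_list)               # ['node-a', 'node-b']
```

So even with the periodic check off, the edit only lasts until the next connect.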
I recommend reading the following wiki pages that talk about agent communication, registration, failover, etc.
https://docs.jboss.org/author/display/RHQ/High+Availability#HighAvailability-FailoverLists
https://docs.jboss.org/author/display/RHQ/Communications+Configuration
https://docs.jboss.org/author/display/RHQ/Agent+Registration
----- Original Message -----
Would I simply change the HA parm in the rhq-server.properties file to use the F5 URL as opposed to the prod host names?
-----Original Message----- From: Barnett, Barry F Sent: Tuesday, August 12, 2014 8:39 AM To: 'rhq-users@lists.fedorahosted.org'; mazz@redhat.com Subject: RE: F5 GTM - RHQ Agent configuration issue
I found the failover-list.dat file in the rhq-agent/data directory. Can I make the change to this file directly? Do I need to make the change on every agent I just added (44 of them) to point to the RHQ Servers, or is there a way to push out the additional hostname to the failover list of all agents?
-----Original Message----- From: Barnett, Barry F Sent: Tuesday, August 12, 2014 8:33 AM To: rhq-users@lists.fedorahosted.org Subject: RE: F5 GTM - RHQ Agent configuration issue
After I install the agent, I configure it by issuing the following on Windows:
rhq-agent.bat --cleanconfig
When I enter this command, it will ask me for the agent name, host server, RHQ-Server name, RHQ-Server Port, etc. I enter the name of the destination RHQ Server as the F5 URL, and its port under the RHQ-Server Port. I thought the Agent would then use that to point to wherever the F5 was pointing to at that instant in time. If the F5 pointed to another RHQ server, then I thought that agent instance would also then point there.
I'm guessing that when I configured this originally, it did use the F5 URL, which was connected to Prod Node A at the time. And then the agent was passed the failover list only from Prod, and now uses that to connect. Could I manually edit this list to add the BCP server? If so, how do I do this?
-----Original Message----- From: rhq-users-bounces@lists.fedorahosted.org [mailto:rhq-users-bounces@lists.fedorahosted.org] On Behalf Of John Mazzitelli Sent: Monday, August 11, 2014 3:47 PM To: rhq-users@lists.fedorahosted.org Subject: Re: F5 GTM - RHQ Agent configuration issue
Why wouldn’t the agent that points to the F5 GTM URL point to wherever the F5 wants to have it communicate with? So you’re saying the BCP RHQ server storage node has to join the Prod storage node cluster for this failover to BCP to work, even with the F5 GTM URL being used?
Don't confuse the storage node configuration with the agent configuration. Two different things. The agent's failover list is determined when a new server is added to the RHQ HA environment and shared with the agent the next time the agent connects to the server or when the agent periodically asks for its failover list (which happens every hour by default).
The storage cluster stuff is handled independently/differently from that. I don't know much about the storage node cluster config; someone else would have to chime in there about storage nodes.
On 8/12/2014 11:02 AM, Jay Shaughnessy wrote:
It really has nothing to do with Storage Nodes. It has only to do with RHQ Servers and Agents. The RHQ Agents talk only to RHQ Servers. They contact those servers via the host:port information configured for the *servers*, and stored in the failover list. I don't see how manipulating the failover list can help you, as it already contains all of the possible RHQ Servers, and already orders them in a fashion such that agent load is evenly distributed amongst the available RHQ Servers.
When you use --cleanconfig and specify the RHQ-Server name and RHQ-Server Port, the agent *only* uses that server information to make first contact with a server. It then gets the initial failover list from that first server contact. It then drops that connection and establishes a new server connection based on the failover list.
Only the RHQ Servers communicate with the storage cluster, as part of the back-end repository of data. The storage clusters are completely hidden from the agents.
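The first-contact behavior described above can be sketched as a small simulation. It is only a model of the reasoning, with made-up names throughout (the F5 hostname, the node names, and both functions are assumptions), and it assumes Node C is not registered in the same HA environment as A and B.

```python
def bootstrap(setup_host, resolve, lists_by_server):
    # The name typed at setup is used once: it reaches whichever real server
    # it resolves to right now, and that server hands back its failover list.
    first_contact = resolve.get(setup_host, setup_host)
    return list(lists_by_server[first_contact])


def next_server(failover_list, up):
    # After bootstrap only failover-list entries are tried, in order.
    for host in failover_list:
        if host in up:
            return host
    return None


F5 = "rhq-f5.example.com"  # hypothetical F5 GTM name
# Node C is not part of the HA environment here, so A and B never list it.
lists = {"node-a": ["node-a", "node-b"], "node-b": ["node-b", "node-a"]}

agent_list = bootstrap(F5, resolve={F5: "node-a"}, lists_by_server=lists)
print(agent_list)                                   # ['node-a', 'node-b']
print(next_server(agent_list, up={"node-c"}))       # None: F5 is never asked again

# If Node C were installed against the same DB, the list would include it.
ha_list = ["node-a", "node-b", "node-c"]
print(next_server(ha_list, up={"node-c"}))          # node-c
```

In this model it makes no difference where the F5 points once both Prod nodes are down, because the agent is no longer using the name it was given at setup.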
So if we add another RHQ Server to that server mix? If we do that, I'd imagine we'd want the storage nodes to replicate to all servers that are 'paired', no?
Do you think if we add the BCP storage node to the cluster, so that the storage nodes are all replicated, and then also have BCP use the same backend Oracle DB as production, then the agent would pick up on the new BCP server in the mix?
On 8/12/2014 11:09 AM, John Mazzitelli wrote:
What Jay said. This agent/server/failover-list discussion has nothing to do with storage nodes. That's a completely different issue that would need to be addressed :)
Ok, how do we add an existing RHQ server to the 2 Prod servers already in place?
On 8/12/2014 11:42 AM, John Mazzitelli wrote:
Just install another server, pointing to the same backend DB as the other servers. Everything will be done automatically. Installing a server that talks to the same DB tells the new server to add itself to the HA environment, and it takes care of things.
Here are the docs on the HA stuff: https://docs.jboss.org/author/display/RHQ/High+Availability
And yes, when you install new Storage Nodes, they must be clustered with all the others. The rhqctl installer should handle everything for you when you install a new storage node. I'm not an expert in the storage node stuff; any questions about that would have to be answered by others.
Right, adding a new server to the HA cluster is almost exactly like installing the first server. You can just follow https://docs.jboss.org/author/display/RHQ/RHQ+Server+Installation.