The mirrorlists are falling over: haproxy keeps marking app servers as down, and some requests are getting HTTP 503 Server Temporarily Unavailable responses. This happens every 10 minutes, for 2-3 minutes at a time, as several thousand EC2 instances request the mirrorlist again.
For reference, we're seeing a spike of over 2000 simultaneous requests across our 6 proxy and 4 app servers every 10 minutes, dropping back down to under 20 simultaneous requests in between.
Trying out several things.
1) increase the number of mirrorlist WSGI processes on each app server from 45 to 100. This is the maximum number of simultaneous mirrorlist requests that each server can serve. I've tried this value on app01, and even with this many processes the mirrorlist_server back end (which fork()s on each connection) keeps humming right along. I think this is safe. If we go much beyond this, though, the app servers will start to swap, which we must avoid. We can watch for swapping, and if it starts, lower this value somewhat. The value was 6 just a few days ago, which wasn't working either.
This gives us 400 slots to work with on the app servers.
2) try limiting the number of connections from each proxy server to each app server to 25. Right now we're seeing a maximum of between 60 and 135 simultaneous requests from each proxy server to each app server. Requests beyond those 25 will be queued by haproxy and then served as app server instances become available. I did this on proxy03, and it really helped the app servers and kept them humming. There were still some longish response times (some >30 seconds).
We're still oversubscribing the app server slots here, but oddly not by as much as you'd think, because proxy03 is taking 40% of the incoming requests by itself for some reason.
3) bump the haproxy timeout up to 60 seconds. 5 seconds (the global default) is way too low when we get the spikes. It was causing haproxy to think app servers were down and start sending their load to the other app servers, which would then overload, and then haproxy would start sending to the first backup server, and so on. Let's be nicer: if during a spike it takes 60 seconds to get an answer, or to be told HTTP 503, so be it.
4) have haproxy use all the backup servers when all the app servers are marked down. Right now it sends all the requests to a single backup server, and if that's down, all of them to the next backup server, and so on. We know one server can't handle the load (even 4 really can't), so let's not overload a single backup either.
5) the default mirrorlist_server listen backlog is only 5, meaning that at most 5 WSGI clients get queued up if all the children are busy. To handle spikes, bump that to 300 (though the kernel limits it to 128 by default). The code already tried to set 300, but a bug kept that setting from ever taking effect.
6) fix a bug so that mirrorlist_server no longer ignores SIGCHLD. It's amazing this ever worked in the first place. This should resolve the problem where mirrorlist_server slows down and its memory grows over time.
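A rough capacity check for items 1 and 2, sketched in Python. It assumes each of the 6 proxies can hold up to maxconn connections to each of the 4 app servers, that each single-threaded WSGI process (threads=1) serves one request at a time, and that the spike spreads evenly; `capacity` is an illustrative helper, not part of MirrorManager.

```python
def capacity(app_servers, wsgi_processes, proxies, maxconn, spike):
    """Back-of-the-envelope capacity check (illustrative helper, not MirrorManager code)."""
    # Each WSGI daemon process has threads=1, so it serves one request at a time.
    slots = app_servers * wsgi_processes
    # Each proxy may hold up to `maxconn` connections to each app server.
    backend_conns = proxies * app_servers * maxconn
    # Assuming an even spread, anything beyond the per-server caps waits in haproxy's queues.
    queued = max(0, spike - backend_conns)
    return {
        "slots": slots,
        "backend_conns": backend_conns,
        "oversubscription": backend_conns / slots,
        "queued_in_haproxy": queued,
    }


print(capacity(app_servers=4, wsgi_processes=100, proxies=6, maxconn=25, spike=2000))
# {'slots': 400, 'backend_conns': 600, 'oversubscription': 1.5, 'queued_in_haproxy': 1400}
```

With these numbers haproxy may offer about 1.5 connections per WSGI slot, which is the oversubscription mentioned in item 2; an uneven split like proxy03's 40% share changes the real figures.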
diff --git a/modules/haproxy/files/haproxy.cfg b/modules/haproxy/files/haproxy.cfg
index 6e538ed..5a6fda0 100644
--- a/modules/haproxy/files/haproxy.cfg
+++ b/modules/haproxy/files/haproxy.cfg
@@ -43,15 +43,17 @@ listen fp-wiki 0.0.0.0:10001
 
 listen mirror-lists 0.0.0.0:10002
     balance hdr(appserver)
-    server app1 app1:80 check inter 5s rise 2 fall 3
-    server app2 app2:80 check inter 5s rise 2 fall 3
-    server app3 app3:80 check inter 5s rise 2 fall 3
-    server app4 app4:80 check inter 5s rise 2 fall 3
-    server app5 app5:80 backup check inter 10s rise 2 fall 3
-    server app6 app6:80 backup check inter 10s rise 2 fall 3
-    server app7 app7:80 check inter 5s rise 2 fall 3
-    server bapp1 bapp1:80 backup check inter 5s rise 2 fall 3
+    timeout connect 60s
+    server app1 app1:80 check inter 5s rise 2 fall 3 maxconn 25
+    server app2 app2:80 check inter 5s rise 2 fall 3 maxconn 25
+    server app3 app3:80 check inter 5s rise 2 fall 3 maxconn 25
+    server app4 app4:80 check inter 5s rise 2 fall 3 maxconn 25
+    server app5 app5:80 backup check inter 10s rise 2 fall 3 maxconn 25
+    server app6 app6:80 backup check inter 10s rise 2 fall 3 maxconn 25
+    server app7 app7:80 check inter 5s rise 2 fall 3 maxconn 25
+    server bapp1 bapp1:80 backup check inter 5s rise 2 fall 3 maxconn 25
     option httpchk GET /mirrorlist
+    option allbackups
 
 listen pkgdb 0.0.0.0:10003
     balance hdr(appserver)
diff --git a/modules/mirrormanager/files/mirrorlist-server.conf b/modules/mirrormanager/files/mirrorlist-server.conf
index fd7cf98..482f7af 100644
--- a/modules/mirrormanager/files/mirrorlist-server.conf
+++ b/modules/mirrormanager/files/mirrorlist-server.conf
@@ -7,7 +7,7 @@ Alias /publiclist /var/lib/mirrormanager/mirrorlists/publiclist/
     ExpiresDefault "modification plus 1 hour"
 </Directory>
 
-WSGIDaemonProcess mirrorlist user=apache processes=45 threads=1 display-name=mirrorlist maximum-requests=1000
+WSGIDaemonProcess mirrorlist user=apache processes=100 threads=1 display-name=mirrorlist maximum-requests=1000
 
 WSGIScriptAlias /metalink /usr/share/mirrormanager/mirrorlist-server/mirrorlist_client.wsgi
 WSGIScriptAlias /mirrorlist /usr/share/mirrormanager/mirrorlist-server/mirrorlist_client.wsgi
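Item 4 changes which backup servers take traffic once every normal server is marked down. The toy Python model below follows the behavior described in item 4 and in haproxy's documentation for `option allbackups` (by default only the first operational backup receives traffic), assuming "first" means first in configuration order. It ignores weights, health-check timing and the `hdr(appserver)` balancing; `Server`, `eligible` and `with_down` are illustrative names, not haproxy internals.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    backup: bool
    up: bool = True


# Server order and backup flags copied from the mirror-lists section of haproxy.cfg.
MIRROR_LISTS = [
    Server("app1", False), Server("app2", False), Server("app3", False),
    Server("app4", False), Server("app5", True), Server("app6", True),
    Server("app7", False), Server("bapp1", True),
]


def eligible(servers, allbackups=False):
    """Simplified model: which servers may receive traffic right now."""
    active = [s.name for s in servers if not s.backup and s.up]
    if active:
        return active  # backups are ignored while any normal server is up
    backups = [s.name for s in servers if s.backup and s.up]
    if allbackups:
        return backups  # `option allbackups`: spread load over every live backup
    return backups[:1]  # default: the first operational backup gets everything


def with_down(servers, *names):
    """Copy of `servers` with the named servers additionally marked DOWN."""
    return [Server(s.name, s.backup, s.up and s.name not in names) for s in servers]


outage = with_down(MIRROR_LISTS, "app1", "app2", "app3", "app4", "app7")
print(eligible(outage))                   # ['app5']
print(eligible(outage, allbackups=True))  # ['app5', 'app6', 'bapp1']
```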
From 45d401446bfecba768fdf4f26409bf291172f7bc Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch@dell.com>
Date: Mon, 10 May 2010 15:23:57 -0500
Subject: [PATCH 1/2] mirrorlist_server: set request_queue_size earlier

While the docs say that request_queue_size can be a per-instance value, in reality it's used during ForkingUnixStreamServer __init__, meaning it needs to override the default class attribute instead.
Moving this up means that connections aren't blocking after about 5 are already running (default), and mirrorlist_client can now connect in ~200us like one would expect, rather than seconds or tens of seconds like we were seeing when lots of clients (say, 40+) were connecting simultaneously.
---
 mirrorlist-server/mirrorlist_server.py |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mirrorlist-server/mirrorlist_server.py b/mirrorlist-server/mirrorlist_server.py
index 8825a1a..2ade357 100755
--- a/mirrorlist-server/mirrorlist_server.py
+++ b/mirrorlist-server/mirrorlist_server.py
@@ -725,6 +725,7 @@ def sighup_handler(signum, frame):
     signal.signal(signal.SIGHUP, sighup_handler)
 
 class ForkingUnixStreamServer(ForkingMixIn, UnixStreamServer):
+    request_queue_size = 300
     def finish_request(self, request, client_address):
         signal.signal(signal.SIGHUP, signal.SIG_IGN)
         BaseServer.finish_request(self, request, client_address)
@@ -815,7 +816,6 @@ def main():
     signal.signal(signal.SIGHUP, sighup_handler)
     signal.signal(signal.SIGCHLD, signal.SIG_IGN)
     ss = ForkingUnixStreamServer(socketfile, MirrorlistHandler)
-    ss.request_queue_size = 300
     ss.serve_forever()
 
     try:
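Why the class attribute matters can be reproduced with Python 3's socketserver (the 2010 code used Python 2's SocketServer, which is assumed here to be structured the same way): TCPServer.__init__ calls server_activate(), which calls listen(self.request_queue_size), so assigning the attribute on the instance after construction no longer affects the backlog. The subclass names and the socket path below are made up for the demo. Separately, as item 5 notes, the kernel may cap whatever backlog is requested.

```python
import os
import socketserver
import tempfile


class RecordingServer(socketserver.UnixStreamServer):
    """Remembers the backlog that server_activate() hands to listen()."""

    def server_activate(self):
        self.backlog_used = self.request_queue_size
        super().server_activate()


class ClassAttrServer(RecordingServer):
    # The approach of PATCH 1/2: override the class attribute so __init__ already sees it.
    request_queue_size = 300


def backlog_for(server_cls, set_on_instance=False):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mirrorlist.sock")  # hypothetical socket path
        # bind() and listen() both run inside __init__ (bind_and_activate=True).
        srv = server_cls(path, socketserver.BaseRequestHandler)
        if set_on_instance:
            srv.request_queue_size = 300  # the old code: too late, listen() already ran
        try:
            return srv.backlog_used
        finally:
            srv.server_close()


print(backlog_for(RecordingServer))                        # 5
print(backlog_for(RecordingServer, set_on_instance=True))  # 5
print(backlog_for(ClassAttrServer))                        # 300
```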
On Tue, May 11, 2010 at 02:48:23PM -0500, Matt Domsch wrote:
The mirrorlists are falling over: haproxy keeps marking app servers as down, and some requests are getting HTTP 503 Server Temporarily Unavailable responses. This happens every 10 minutes, for 2-3 minutes at a time, as several thousand EC2 instances request the mirrorlist again.
For reference, we're seeing a spike of over 2000 simultaneous requests across our 6 proxy and 4 app servers every 10 minutes, dropping back down to under 20 simultaneous requests in between.
Trying out several things.
1) increase the number of mirrorlist WSGI processes on each app server from 45 to 100. This is the maximum number of simultaneous mirrorlist requests that each server can serve. I've tried this value on app01, and even with this many processes the mirrorlist_server back end (which fork()s on each connection) keeps humming right along. I think this is safe. If we go much beyond this, though, the app servers will start to swap, which we must avoid. We can watch for swapping, and if it starts, lower this value somewhat. The value was 6 just a few days ago, which wasn't working either.
This gives us 400 slots to work with on the app servers.
This seems okay as a temporary measure, but we won't want it as a permanent fix unless we can get more RAM for the app servers or separate app servers for the mirrorlist processes.
The reason is that running close to swap leaves us no room to grow the other services if they need it, to increase the mirrorlist processes if we need even more slots, or to add new services.
+1
2) try limiting the number of connections from each proxy server to each app server to 25. Right now we're seeing a maximum of between 60 and 135 simultaneous requests from each proxy server to each app server. Requests beyond those 25 will be queued by haproxy and then served as app server instances become available. I did this on proxy03, and it really helped the app servers and kept them humming. There were still some longish response times (some >30 seconds).
We're still oversubscribing the app server slots here, but oddly not by as much as you'd think, because proxy03 is taking 40% of the incoming requests by itself for some reason.
This does seem like a good thing to try and then decide if we want it permanently.
+1
3) bump the haproxy timeout up to 60 seconds. 5 seconds (the global default) is way too low when we get the spikes. It was causing haproxy to think app servers were down and start sending their load to the other app servers, which would then overload, and then haproxy would start sending to the first backup server, and so on. Let's be nicer: if during a spike it takes 60 seconds to get an answer, or to be told HTTP 503, so be it.
60 seconds seems a bit long when something does happen to a single server that should take it out of rotation for a bit. We aren't likely to be deliberately taking down app servers during the change freeze, but it's probably not a good idea to stay this high in the long run. Something to do for now, then tweak after the release?
+1
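For comparison with the 60-second figure, the time haproxy needs to pull a failed server out of rotation via health checks (and to put it back) can be estimated from the `inter`, `rise` and `fall` values on each `server` line. A rough Python sketch, assuming checks are spaced exactly `inter` apart and ignoring check timeouts, jitter and the `fastinter`/`downinter` options; `check_window` is an illustrative name.

```python
from typing import Tuple


def check_window(inter_s: float, rise: int, fall: int) -> Tuple[float, float]:
    """Rough seconds for haproxy to mark a server DOWN, and later UP again.

    Approximation: `fall` consecutive failed checks, or `rise` consecutive
    successful ones, each spaced `inter_s` seconds apart.
    """
    return inter_s * fall, inter_s * rise


print(check_window(5, rise=2, fall=3))   # (15, 10)  app1-app4, app7, bapp1
print(check_window(10, rise=2, fall=3))  # (30, 20)  app5, app6
```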
4) have haproxy use all the backup servers when all the app servers are marked down. Right now it sends all the requests to a single backup server, and if that's down, all of them to the next backup server, and so on. We know one server can't handle the load (even 4 really can't), so let's not overload a single backup either.
+1
5) the default mirrorlist_server listen backlog is only 5, meaning that at most 5 WSGI clients get queued up if all the children are busy. To handle spikes, bump that to 300 (though the kernel limits it to 128 by default). The code already tried to set 300, but a bug kept that setting from ever taking effect.
+1
6) fix a bug so that mirrorlist_server no longer ignores SIGCHLD. It's amazing this ever worked in the first place. This should resolve the problem where mirrorlist_server slows down and its memory grows over time.
+1
From d82f20b10c755e5ce40d67ca7ea4a6dba9e37d34 Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch@dell.com>
Date: Mon, 10 May 2010 23:56:09 -0500
Subject: [PATCH 2/2] mirrorlist_server: don't ignore SIGCHLD

Amazing that this ever worked in the first place. Ignoring SIGCHLD causes the parent's active_children list to grow without bound. This is also probably the cause of our long-term memory size growth. The parent really needs to catch SIGCHLD in order to do its reaping.
---
 mirrorlist-server/mirrorlist_server.py |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mirrorlist-server/mirrorlist_server.py b/mirrorlist-server/mirrorlist_server.py
index 2ade357..0de7132 100755
--- a/mirrorlist-server/mirrorlist_server.py
+++ b/mirrorlist-server/mirrorlist_server.py
@@ -814,7 +814,6 @@ def main():
     open_geoip_databases()
     read_caches()
     signal.signal(signal.SIGHUP, sighup_handler)
-    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
     ss = ForkingUnixStreamServer(socketfile, MirrorlistHandler)
     ss.serve_forever()
+1 to implementation
-Toshio
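The SIGCHLD fix in PATCH 2/2 can be illustrated with a small Linux experiment in Python 3. When SIGCHLD is set to SIG_IGN, the kernel reaps exited children itself, so a later os.waitpid() on them fails with ECHILD (ChildProcessError). The patch's explanation is consistent with how Python 2's SocketServer.ForkingMixIn collected children: it presumably removed a pid from active_children only after os.waitpid() returned that pid, so with SIGCHLD ignored those pids never left the list. `reap_one_child` is an illustrative name.

```python
import os
import signal


def reap_one_child(ignore_sigchld: bool) -> str:
    """Fork a child that exits at once, then try to collect it with os.waitpid()."""
    previous = signal.getsignal(signal.SIGCHLD)
    signal.signal(signal.SIGCHLD, signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately
        try:
            reaped, _status = os.waitpid(pid, 0)
            return "reaped" if reaped == pid else "unexpected"
        except ChildProcessError:
            # With SIGCHLD ignored the kernel discards the exit status itself, so a
            # reaper that only forgets pids returned by waitpid() never forgets this one.
            return "ECHILD"
    finally:
        signal.signal(signal.SIGCHLD, previous if previous is not None else signal.SIG_DFL)


print(reap_one_child(ignore_sigchld=False))  # reaped
print(reap_one_child(ignore_sigchld=True))   # ECHILD (on Linux)
```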
On Tue, 11 May 2010, Toshio Kuratomi wrote:
+1 to all of these.
-Mike
On Tue, May 11, 2010 at 02:48:23PM -0500, Matt Domsch wrote:
The mirrorlists are falling over: haproxy keeps marking app servers as down, and some requests are getting HTTP 503 Server Temporarily Unavailable responses. This happens every 10 minutes, for 2-3 minutes at a time, as several thousand EC2 instances request the mirrorlist again.
For reference, we're seeing a spike of over 2000 simultaneous requests across our 6 proxy and 4 app servers every 10 minutes, dropping back down to under 20 simultaneous requests in between.
Trying out several things.
1) increase the number of mirrorlist WSGI processes on each app server from 45 to 100. This is the maximum number of simultaneous mirrorlist requests that each server can serve. I've tried this value on app01, and even with this many processes the mirrorlist_server back end (which fork()s on each connection) keeps humming right along. I think this is safe. If we go much beyond this, though, the app servers will start to swap, which we must avoid. We can watch for swapping, and if it starts, lower this value somewhat. The value was 6 just a few days ago, which wasn't working either.
This gives us 400 slots to work with on the app servers.
I don't have to do this anymore. I found the source of the problem: the CPU on the app servers was at 100% utilization, due to another bug in MirrorManager's handling of user input. With that fixed, we don't need nearly as many workers.
2) try limiting the number of connections from each proxy server to each app server to 25. Right now we're seeing a maximum of between 60 and 135 simultaneous requests from each proxy server to each app server. Requests beyond those 25 will be queued by haproxy and then served as app server instances become available. I did this on proxy03, and it really helped the app servers and kept them humming. There were still some longish response times (some >30 seconds).
We're still oversubscribing the app server slots here, but oddly not by as much as you'd think, because proxy03 is taking 40% of the incoming requests by itself for some reason.
I'm not going to do these either.
3) bump the haproxy timeout up to 60 seconds. 5 seconds (the global default) is way too low when we get the spikes. It was causing haproxy to think app servers were down and start sending their load to the other app servers, which would then overload, and then haproxy would start sending to the first backup server, and so on. Let's be nicer: if during a spike it takes 60 seconds to get an answer, or to be told HTTP 503, so be it.
I did bump the timeout to 30 seconds instead of 60. We'll see how that works.
4) have haproxy use all the backup servers when all the app servers are marked down. Right now it sends all the requests to a single backup server, and if that's down, all of them to the next backup server, and so on. We know one server can't handle the load (even 4 really can't), so let's not overload a single backup either.
Done.
5) the default mirrorlist_server listen backlog is only 5, meaning that at most 5 WSGI clients get queued up if all the children are busy. To handle spikes, bump that to 300 (though the kernel limits it to 128 by default). The code already tried to set 300, but a bug kept that setting from ever taking effect.
Will do.
6) fix a bug so that mirrorlist_server no longer ignores SIGCHLD. It's amazing this ever worked in the first place. This should resolve the problem where mirrorlist_server slows down and its memory grows over time.
Will do.
The root cause of the CPU utilization by mirrorlist_client.wsgi was that mirrorlist_server.py couldn't deal with malformed query strings such as arch=i386%80%E2, and was crashing. mirrorlist_client.wsgi would then spin forever waiting for a response from the server that would never come. This patch addresses the malformed input.
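One common way a client ends up spinning like this is a read loop that treats an empty recv() as "no data yet" and retries; whether mirrorlist_client.wsgi does exactly that is an assumption. A minimal Python 3 sketch of a read loop that instead treats an empty read as "server gone" and gives up after a timeout, assuming (hypothetically) that a reply ends when the server closes its end of the Unix socket; the real client protocol may frame replies differently, and `read_reply` is an illustrative name.

```python
import socket


def read_reply(sock: socket.socket, timeout: float = 5.0) -> bytes:
    """Read a reply until the server closes its end, giving up after `timeout` idle seconds."""
    sock.settimeout(timeout)
    chunks = []
    while True:
        try:
            data = sock.recv(4096)
        except socket.timeout:
            raise TimeoutError("mirrorlist_server did not answer in time") from None
        if not data:
            # b"" means the peer closed the socket. Retrying here instead would spin at 100% CPU.
            if not chunks:
                raise ConnectionError("mirrorlist_server closed the connection without replying")
            break
        chunks.append(data)
    return b"".join(chunks)


# Demo with a connected pair of Unix sockets standing in for client and server.
client, server = socket.socketpair()
server.sendall(b"# mirrorlist reply")
server.close()
print(read_reply(client))  # b'# mirrorlist reply'
client.close()
```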
From 621af2882e984459a23d1e7af3ef1854ea6f0ba1 Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch@dell.com>
Date: Tue, 11 May 2010 22:57:19 -0500
Subject: [PATCH 1/2] mirrorlist_client: sanitize input into UTF-8

Users may put all sorts of odd URL-escaped characters onto the URLs. When this happens, the mirrorlist_server.py child process would crash trying to write the byte string, with escaped '\x80' characters inside it, into the error report header. When the child crashes, mirrorlist_client.wsgi would spin forever eating 100% CPU waiting for a response from the server that would never come.
This patch sanitizes the input query args into a byte string that contains only valid UTF-8. This fixes this particular source of crashes in the server and the subsequent hang of the client. It would still be good if the client could time out or otherwise recognize when the server is never coming back.
---
 mirrorlist-server/mirrorlist_client.wsgi |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mirrorlist-server/mirrorlist_client.wsgi b/mirrorlist-server/mirrorlist_client.wsgi
index 9dbdf7a..dd6d1d9 100755
--- a/mirrorlist-server/mirrorlist_client.wsgi
+++ b/mirrorlist-server/mirrorlist_client.wsgi
@@ -83,6 +83,12 @@ def request_setup(request):
     pathinfo = request.environ['PATH_INFO']
     if scriptname == '/metalink' or pathinfo == '/metalink':
         d['metalink'] = True
+
+    for k, v in d.iteritems():
+        try:
+            d[k] = unicode(v, 'utf8', 'ignore').encode('utf8')
+        except:
+            pass
     return d
 
 def accept_encoding_gzip(request):
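The effect of the sanitizing loop in this patch on the query string from the bug report can be checked with a Python 3 analogue. Python 3 has no `unicode()` builtin, so the sketch decodes and re-encodes bytes instead; `sanitize` and `sanitize_args` are illustrative names, and treating query values as raw bytes is an assumption about how the arguments arrive.

```python
from urllib.parse import unquote_to_bytes


def sanitize(value: bytes) -> bytes:
    """Drop every byte sequence that is not valid UTF-8."""
    # Same idea as the patch's unicode(v, 'utf8', 'ignore').encode('utf8').
    return value.decode("utf-8", "ignore").encode("utf-8")


def sanitize_args(args: dict) -> dict:
    """Apply sanitize() to every query argument value."""
    return {key: sanitize(value) for key, value in args.items()}


raw = unquote_to_bytes("i386%80%E2")
print(raw)                           # b'i386\x80\xe2'
print(sanitize_args({"arch": raw}))  # {'arch': b'i386'}
```

Valid UTF-8 passes through unchanged; only the stray `\x80` byte and the truncated `\xe2` sequence are dropped. Using "replace" instead of "ignore" would keep a visible U+FFFD marker in their place.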
On Tue, May 11, 2010 at 11:20:58PM -0500, Matt Domsch wrote:
On Tue, May 11, 2010 at 02:48:23PM -0500, Matt Domsch wrote:
The mirrorlists are falling over - haproxy keeps marking app servers as down, and some requests are getting HTTP 503 Server Temporarily Unavailable responses. This happens every 10 minutes, for 2-3 minutes, as several thousand EC3 instances request the mirrorlist again.
For reference, we're seeing a spike of over 2000 simultaneous requests across our 6 proxy and 4 app servers, occuring every 10 minutes, dropping back down to under 20 simultaneous requests inbetween.
Trying out several things.
increase number of mirrorlist WSGI processes on each app server from 45 to 100. This is the maximum number of simultaneous mirrorlist requests that each server can serve. I've tried this value on app01, and running this many still keeps the mirrorlist_server back end (which fork()s on each connection) humming right along. I think this is safe. Increasing much beyond this though, the app servers will start to swap, which we must avoid. We can watch the swapping, and if it starts, lower this value somewhat. The value was 6 just a few days ago, which wasn't working either.
This gives us 400 slots to work with on the app servers.
I don't have to do this anymore. I found the source of the problem (the CPU on the app servers was at 100% utilization, due to another bug in MM's handling of user input). Fixing that, we don't need nearly as many workers.
try limiting the number of connections from each proxy server to each app server, to 25 per. Right now we're seeing a max of between 60 and 135 simultaneous requests from each proxy server to each app server. All those over 25 will get queued by haproxy and then served as app server instances become available. I did this on proxy03, and it really helped out the app servers and kept them humming. There were still some longish response times (some >30 seconds).
We're still oversubscribing app server slots here though, but oddly, not by as much as you'd think, as proxy03 is taking 40% of the incoming requests itself for some reason.
I'm not going to do these either.
- bump the haproxy timeout up to 60 seconds. 5 seconds (the global default) is way too low when we get the spikes. This was causing haproxy to think app servers were down, and start sending load to the other app servers, which would then overload, and then start sending to the first backup server, ... Let's be nicer. If during a spike it takes 60 seconds to get an answer, or be told HTTP 503, so be it.
I did bump the timeout to 30 seconds instead of 60. We'll see how that works.
- have haproxy use all the backup servers when all the app servers are marked down. Right now it sends all the requests to a single backup server, and if that's down, all to the next backup server, etc. We know one server can't handle the load (even 4 aren't really), so don't overload a single backup either.
Done.
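The difference between the old and new backup behaviour can be modelled as a server-selection function. This is a hypothetical sketch, not haproxy's code; in haproxy the "use every healthy backup" behaviour is what the "option allbackups" setting enables. The function choose_pool and the backup server names are made up for the example.

```python
from itertools import cycle


def choose_pool(primaries, backups, use_all_backups):
    """Return the servers that should receive traffic right now.

    primaries and backups map server name -> True if its health check passes.
    """
    live_primaries = [name for name, up in primaries.items() if up]
    if live_primaries:
        return live_primaries
    live_backups = [name for name, up in backups.items() if up]
    if use_all_backups:
        # New behaviour: spread the load across every healthy backup.
        return live_backups
    # Old behaviour: everything goes to the first healthy backup.
    return live_backups[:1]


primaries = {"app01": False, "app02": False, "app03": False, "app04": False}
backups = {"backup1": True, "backup2": True, "backup3": True}

old = choose_pool(primaries, backups, use_all_backups=False)
new = choose_pool(primaries, backups, use_all_backups=True)
print(old)  # ['backup1']
print(new)  # ['backup1', 'backup2', 'backup3']

# Round-robin over the chosen pool for the next few requests.
rr = cycle(new)
print([next(rr) for _ in range(4)])  # ['backup1', 'backup2', 'backup3', 'backup1']
```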
- the default mirrorlist_server listen backlog is only 5, meaning that at most 5 WSGI clients get queued up if all the children are busy. To handle spikes, bump that to 300 (though it's limited by the kernel to 128 by default). This was the intent, but the code was buggy.
Will do.
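A minimal sketch of raising the listen backlog, assuming a server built on Python's socketserver module with a forking Unix-socket server. The thread doesn't show mirrorlist_server's code, so the class and handler names here are hypothetical. In socketserver, request_queue_size defaults to 5 and is passed to listen() when the server is activated; on Linux the kernel silently caps it at net.core.somaxconn (128 by default at the time, as noted above; newer kernels default higher).

```python
import os
import socketserver
import tempfile


class OkHandler(socketserver.StreamRequestHandler):
    """Placeholder handler; a real server would answer mirrorlist queries."""

    def handle(self):
        self.wfile.write(b"ok\n")


class ForkingUnixServer(socketserver.ForkingMixIn, socketserver.UnixStreamServer):
    # Passed to listen() by server_activate(); the socketserver default is 5.
    request_queue_size = 300


def kernel_backlog_cap(path="/proc/sys/net/core/somaxconn"):
    """Return the kernel's listen() backlog cap on Linux, or None if unknown."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    sock_path = os.path.join(tempfile.mkdtemp(), "mirrorlist_server.sock")
    server = ForkingUnixServer(sock_path, OkHandler)  # binds and listens
    requested = server.request_queue_size
    cap = kernel_backlog_cap()
    effective = requested if cap is None else min(requested, cap)
    print("requested backlog:", requested)
    print("effective backlog:", effective)
    server.server_close()
    os.unlink(sock_path)  # UnixStreamServer does not remove its socket file
```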
- bug fix to mirrorlist_server to not ignore SIGCHLD. Amazing this ever worked in the first place. This should resolve the problem where mirrorlist_server slows down and memory grows over time.
Will do.
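The thread doesn't show the SIGCHLD change itself. As background, here is a small Python 3 sketch of why ignoring SIGCHLD matters to a forking server that tracks its children. On Linux, setting SIGCHLD to SIG_IGN makes the kernel reap exited children automatically, so waitpid() never reports them to the parent (it fails with ECHILD instead). Any per-child bookkeeping the parent keeps is then never cleaned up, which is one plausible route to the slowdown and memory growth described above, though the thread doesn't spell out the mechanism. reap_children and fork_and_reap are hypothetical helpers, and the code is POSIX-only.

```python
import os
import signal
import time


def reap_children():
    """Collect every child that has already exited; return their pids.

    Non-blocking: stops as soon as no exited child is waiting to be reaped.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left (or the kernel already reaped them)
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append(pid)
    return reaped


def fork_and_reap(disposition, wait=0.5):
    """Fork a child that exits at once; return (child pid, pids we reaped)."""
    old = signal.signal(signal.SIGCHLD, disposition)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately
        reaped = []
        deadline = time.monotonic() + wait
        while pid not in reaped and time.monotonic() < deadline:
            reaped += reap_children()
            time.sleep(0.01)
        return pid, reaped
    finally:
        signal.signal(signal.SIGCHLD, old)  # restore the previous disposition


if __name__ == "__main__":
    pid, reaped = fork_and_reap(signal.SIG_DFL)
    print(pid in reaped)   # True: the parent saw the child exit
    pid, reaped = fork_and_reap(signal.SIG_IGN)
    print(pid in reaped)   # False on Linux: the exit is never reported
```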
The root cause of the CPU utilization by mirrorlist_client.wsgi was that mirrorlist_server.py couldn't deal with malformed arch=i386%80%E2 style query strings, and was crashing. mirrorlist_client.wsgi would then spin forever waiting for the server to respond, which it never would. This patch addresses the malformed input.
From 621af2882e984459a23d1e7af3ef1854ea6f0ba1 Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch@dell.com>
Date: Tue, 11 May 2010 22:57:19 -0500
Subject: [PATCH 1/2] mirrorlist_client: sanitize input into UTF-8
Users may put all sorts of odd URL-escaped characters onto the URLs. When this happens, mirrorlist_server.py child thread would crash trying to write the byte string with escaped '\x80' characters inside it into the error report header. When the child crashes, mirrorlist_client.wsgi would spin forever eating 100% CPU waiting for a response from the server that would never come.
This patch sanitizes the input query args into a byte string that only contains UTF-8 characters. This fixes this particular source of crash in the server and subsequent hang of the client. It would still be good if the client could time out or otherwise recognize when the server is never coming back.
 mirrorlist-server/mirrorlist_client.wsgi |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mirrorlist-server/mirrorlist_client.wsgi b/mirrorlist-server/mirrorlist_client.wsgi
index 9dbdf7a..dd6d1d9 100755
--- a/mirrorlist-server/mirrorlist_client.wsgi
+++ b/mirrorlist-server/mirrorlist_client.wsgi
@@ -83,6 +83,12 @@ def request_setup(request):
     pathinfo = request.environ['PATH_INFO']
     if scriptname == '/metalink' or pathinfo == '/metalink':
         d['metalink'] = True
+
+    for k, v in d.iteritems():
+        try:
+            d[k] = unicode(v, 'utf8', 'ignore').encode('utf8')
+        except:
+            pass
     return d
 
 def accept_encoding_gzip(request):
-- 
1.7.0.1
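Here is the same sanitizing step written for Python 3, where bytes.decode('utf-8', 'ignore') plays the role of the patch's unicode(v, 'utf8', 'ignore'). The query string arch=i386%80%E2 comes from the thread; sanitize_utf8 and sanitize_args are hypothetical helper names. Unlike the patch, this sketch checks for non-bytes values explicitly instead of relying on a bare except, which in the Python 2 code presumably skips values such as the boolean metalink flag.

```python
from urllib.parse import unquote_to_bytes


def sanitize_utf8(value):
    """Drop any bytes that are not part of a valid UTF-8 sequence.

    Non-bytes values (such as a boolean metalink flag) pass through unchanged.
    """
    if not isinstance(value, bytes):
        return value
    return value.decode("utf-8", "ignore").encode("utf-8")


def sanitize_args(args):
    """Return a copy of the query-argument dict with every value sanitized."""
    return {key: sanitize_utf8(value) for key, value in args.items()}


raw = unquote_to_bytes("i386%80%E2")
print(raw)  # b'i386\x80\xe2'
print(sanitize_args({"arch": raw, "metalink": True}))
# {'arch': b'i386', 'metalink': True}
```

A strict raw.decode("utf-8") on the same input raises UnicodeDecodeError, the kind of failure the patch is guarding against.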
Second patch will address the lack of a timeout:
From 4976394a2f988843baf1d3e490dcc5dd3e74b1ea Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch@dell.com>
Date: Tue, 11 May 2010 23:14:15 -0500
Subject: [PATCH 2/2] mirrorlist_client: add 60sec timeout to reading from the server

 mirrorlist-server/mirrorlist_client.wsgi |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/mirrorlist-server/mirrorlist_client.wsgi b/mirrorlist-server/mirrorlist_client.wsgi
index dd6d1d9..cc4416c 100755
--- a/mirrorlist-server/mirrorlist_client.wsgi
+++ b/mirrorlist-server/mirrorlist_client.wsgi
@@ -10,8 +10,10 @@ from string import zfill, atoi, strip, replace
 from paste.wsgiwrappers import *
 import gzip
 import cStringIO
+from datetime import datetime, timedelta
 
 socketfile = '/var/run/mirrormanager/mirrorlist_server.sock'
+request_timeout = 60 # seconds
 
 def get_mirrorlist(d):
     try:
@@ -37,9 +39,10 @@ def get_mirrorlist(d):
         readlen = len(resultsize)
     resultsize = atoi(resultsize)
 
+    expiry = datetime.utcnow() + timedelta(seconds=request_timeout)
     readlen = 0
     p = ''
-    while readlen < resultsize:
+    while readlen < resultsize and datetime.utcnow() < expiry:
         p += s.recv(resultsize - readlen)
         readlen = len(p)
-- 
1.7.0.1
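The timeout in this patch bounds how long the read loop runs, but two gaps remain. If the server's child has crashed and closed the socket, recv() returns an empty string immediately on every call, so the loop can still spin at full CPU, just for at most 60 seconds instead of forever. And if the server hangs without closing the socket, a blocking recv() never returns, so the expiry check is never reached. A Python 3 sketch of one alternative: a socket-level timeout (socket.settimeout) so recv() itself cannot block past the deadline, plus treating an empty read as "the server went away". recv_exactly, read_response, ServerGoneError and the 10-digit ASCII length prefix are assumptions for illustration, not MirrorManager's actual code or protocol.

```python
import socket
import time


class ServerGoneError(Exception):
    """The server closed the connection before sending a full response."""


def recv_exactly(sock, size, timeout=60.0):
    """Read exactly `size` bytes, or raise within about `timeout` seconds."""
    deadline = time.monotonic() + timeout
    chunks = []
    received = 0
    while received < size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise socket.timeout("timed out waiting for the server")
        sock.settimeout(remaining)  # recv() raises socket.timeout past this
        chunk = sock.recv(size - received)
        if not chunk:
            # An empty read means EOF: the peer closed the socket. A loop
            # that ignores this keeps calling recv() and spins on the CPU.
            raise ServerGoneError(f"got {received} of {size} bytes before EOF")
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)


def read_response(sock, timeout=60.0, prefix_len=10):
    """Read a response framed by a fixed-width ASCII length (hypothetical)."""
    deadline = time.monotonic() + timeout
    size = int(recv_exactly(sock, prefix_len, timeout))
    remaining = max(deadline - time.monotonic(), 0.0)
    return recv_exactly(sock, size, remaining)


if __name__ == "__main__":
    client, server = socket.socketpair()
    server.sendall(b"0000000005hello")
    print(read_response(client, timeout=1.0))  # b'hello'
    server.close()
    try:
        recv_exactly(client, 1, timeout=1.0)
    except ServerGoneError as exc:
        print("server gone:", exc)  # server gone: got 0 of 1 bytes before EOF
    client.close()
```

Raising instead of returning a short buffer lets the WSGI app answer with an HTTP error right away, rather than passing a truncated response to whatever parses it next.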
I plan to hotfix both mirrorlist_client.wsgi and mirrorlist_server.py on all the app servers with the patches above.
+1
Thanks, Matt!
-Toshio
On Tue, 11 May 2010, Matt Domsch wrote:
The root cause of the CPU utilization by mirrorlist_client.wsgi was that mirrorlist_server.py couldn't deal with malformed arch=i386%80%E2 style query strings, and was crashing. mirrorlist_client.wsgi would then spin forever waiting for the server to respond, which it never would. This patch addresses the malformed input.
+1?
+1
-sv
infrastructure@lists.fedoraproject.org