New subject: [PATCH] haproxy & mirrorlist processes (root cause, more fixes)

11 May 2010

The mirrorlists are falling over - haproxy keeps marking app servers
as down, and some requests are getting HTTP 503 Service Unavailable
responses.  This happens every 10 minutes, for 2-3 minutes at a time,
as several thousand EC2 instances request the mirrorlist again.

For reference, we're seeing a spike of over 2000 simultaneous requests
across our 6 proxy and 4 app servers, occurring every 10 minutes and
dropping back down to under 20 simultaneous requests in between.

I'm trying out several things:

1) increase the number of mirrorlist WSGI processes on each app server
   from 45 to 100.  This is the maximum number of simultaneous
   mirrorlist requests that each server can serve.  I've tried this
   value on app01, and running this many still keeps the
   mirrorlist_server back end (which fork()s on each connection)
   humming right along.  I think this is safe.  Going much beyond
   this, though, the app servers will start to swap, which we must
   avoid.  We can watch for swapping and, if it starts, lower this
   value somewhat.  The value was 6 just a few days ago, which wasn't
   working either.

   This gives us 400 slots (4 app servers x 100 processes) to work
   with on the app servers.
2) try limiting the number of connections from each proxy server to
   each app server to 25.  Right now we're seeing a maximum of
   between 60 and 135 simultaneous requests from each proxy server to
   each app server.  Any requests over 25 will get queued by haproxy
   and then served as app server instances become available.  I did
   this on proxy03, and it really helped out the app servers and kept
   them humming.  There were still some longish response times (some
   >30 seconds).

   We're still oversubscribing app server slots here (6 proxies x 25
   connections is 150, against 100 processes per app server), but
   oddly, not by as much as you'd think, as proxy03 is taking 40% of
   the incoming requests itself for some reason.
3) bump the haproxy timeout up to 60 seconds.  5 seconds (the global
   default) is way too low when we get the spikes.  This was causing
   haproxy to think app servers were down, and start sending load to
   the other app servers, which would then overload, and then start
   sending to the first backup server, ...  Let's be nicer.  If during
   a spike it takes 60 seconds to get an answer, or be told HTTP 503,
   so be it.
4) have haproxy use all the backup servers when all the app servers
   are marked down.  Right now it sends all the requests to a single
   backup server, and if that's down, all to the next backup server,
   etc.  We know one server can't handle the load (even 4 aren't
   really), so don't overload a single backup either.
5) the default mirrorlist_server listen backlog is only 5, meaning
   that at most 5 WSGI clients get queued up if all the children are
   busy.  To handle spikes, bump that to 300 (though it's limited by
   the kernel to 128 by default).  This was the intent all along, but
   the code was buggy: it set the value after the socket was already
   listening.
6) bug fix to mirrorlist_server to not ignore SIGCHLD.  Amazing this
   ever worked in the first place.  This should resolve the problem
   where mirrorlist_server slows down and memory grows over time.
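
A quick sanity check of the arithmetic in items 1 and 2, as a small
Python sketch.  The function and key names are made up for
illustration; the numbers are the ones quoted above.  It assumes the
worst case, where every proxy hits its per-server connection limit at
the same moment.

```python
def capacity_summary(proxies, app_servers, wsgi_processes, maxconn):
    """Worst-case slot arithmetic for the proxy -> app server fan-in."""
    # Total requests the app servers can serve at once (item 1)
    total_slots = app_servers * wsgi_processes
    # Most connections haproxy will open to one app server at once (item 2)
    max_inflight_per_app = proxies * maxconn
    return {
        "total_slots": total_slots,
        "max_inflight_per_app": max_inflight_per_app,
        # Above 1.0 means an app server can be sent more than it can serve
        "oversubscription": max_inflight_per_app / wsgi_processes,
    }


summary = capacity_summary(proxies=6, app_servers=4,
                           wsgi_processes=100, maxconn=25)
print(summary)
```

Anything beyond the per-server limit waits in haproxy's queue rather
than piling onto Apache, which is the point of item 2.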

diff --git a/modules/haproxy/files/haproxy.cfg b/modules/haproxy/files/haproxy.cfg
index 6e538ed..5a6fda0 100644
--- a/modules/haproxy/files/haproxy.cfg
+++ b/modules/haproxy/files/haproxy.cfg
@@ -43,15 +43,17 @@ listen  fp-wiki 0.0.0.0:10001
 listen  mirror-lists 0.0.0.0:10002
     balance hdr(appserver)
-    server  app1 app1:80 check inter 5s rise 2 fall 3
-    server  app2 app2:80 check inter 5s rise 2 fall 3
-    server  app3 app3:80 check inter 5s rise 2 fall 3
-    server  app4 app4:80 check inter 5s rise 2 fall 3
-    server  app5 app5:80 backup check inter 10s rise 2 fall 3
-    server  app6 app6:80 backup check inter 10s rise 2 fall 3
-    server  app7 app7:80 check inter 5s rise 2 fall 3
-    server  bapp1 bapp1:80 backup check inter 5s rise 2 fall 3
+    timeout connect 60s
+    server  app1 app1:80 check inter 5s rise 2 fall 3 maxconn 25
+    server  app2 app2:80 check inter 5s rise 2 fall 3 maxconn 25
+    server  app3 app3:80 check inter 5s rise 2 fall 3 maxconn 25
+    server  app4 app4:80 check inter 5s rise 2 fall 3 maxconn 25
+    server  app5 app5:80 backup check inter 10s rise 2 fall 3 maxconn 25
+    server  app6 app6:80 backup check inter 10s rise 2 fall 3 maxconn 25
+    server  app7 app7:80 check inter 5s rise 2 fall 3 maxconn 25
+    server  bapp1 bapp1:80 backup check inter 5s rise 2 fall 3 maxconn 25
     option  httpchk GET /mirrorlist
+    option  allbackups
 listen  pkgdb 0.0.0.0:10003
     balance hdr(appserver)
diff --git a/modules/mirrormanager/files/mirrorlist-server.conf b/modules/mirrormanager/files/mirrorlist-server.conf
index fd7cf98..482f7af 100644
--- a/modules/mirrormanager/files/mirrorlist-server.conf
+++ b/modules/mirrormanager/files/mirrorlist-server.conf
@@ -7,7 +7,7 @@ Alias /publiclist /var/lib/mirrormanager/mirrorlists/publiclist/
         ExpiresDefault "modification plus 1 hour"
 </Directory>
-WSGIDaemonProcess mirrorlist user=apache processes=45 threads=1 display-name=mirrorlist maximum-requests=1000
+WSGIDaemonProcess mirrorlist user=apache processes=100 threads=1 display-name=mirrorlist maximum-requests=1000
 WSGIScriptAlias /metalink /usr/share/mirrormanager/mirrorlist-server/mirrorlist_client.wsgi
 WSGIScriptAlias /mirrorlist /usr/share/mirrormanager/mirrorlist-server/mirrorlist_client.wsgi
...
From 45d401446bfecba768fdf4f26409bf291172f7bc Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch@dell.com>
Date: Mon, 10 May 2010 15:23:57 -0500
Subject: [PATCH 1/2] mirrorlist_server: set request_queue_size earlier

While the docs say that request_queue_size can be a per-instance
value, in reality it's used during ForkingUnixStreamServer __init__,
meaning it needs to override the default class attribute instead.
Moving this up means that connections aren't blocking after about 5
are already running (default), and mirrorlist_client can now connect
in ~200us like one would expect, rather than seconds or tens of
seconds like we were seeing when lots of (say, 40+) clients were
connecting simultaneously.
---
 mirrorlist-server/mirrorlist_server.py |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mirrorlist-server/mirrorlist_server.py b/mirrorlist-server/mirrorlist_server.py
index 8825a1a..2ade357 100755
--- a/mirrorlist-server/mirrorlist_server.py
+++ b/mirrorlist-server/mirrorlist_server.py
@@ -725,6 +725,7 @@ def sighup_handler(signum, frame):
     signal.signal(signal.SIGHUP, sighup_handler)
 class ForkingUnixStreamServer(ForkingMixIn, UnixStreamServer):
+    request_queue_size = 300
     def finish_request(self, request, client_address):
         signal.signal(signal.SIGHUP, signal.SIG_IGN)
         BaseServer.finish_request(self, request, client_address)
@@ -815,7 +816,6 @@ def main():
     signal.signal(signal.SIGHUP, sighup_handler)
     signal.signal(signal.SIGCHLD, signal.SIG_IGN)
     ss = ForkingUnixStreamServer(socketfile, MirrorlistHandler)
-    ss.request_queue_size = 300
     ss.serve_forever()
 try:
-- 
1.7.0.1
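
To illustrate why patch 1/2 moves the assignment, here is a minimal
sketch using Python 3's socketserver module (the patch itself targets
the Python 2 SocketServer module, which behaves the same way in this
respect as far as I can tell).  RecordingServer, LargeBacklogServer
and backlog_seen_at_listen are hypothetical names for this demo only,
not part of mirrorlist_server.

```python
import os
import socketserver
import tempfile


class RecordingServer(socketserver.UnixStreamServer):
    """Remembers the backlog value in effect when listen() is called."""

    def server_activate(self):
        # server_activate() runs inside __init__ and calls
        # socket.listen(self.request_queue_size)
        self.backlog_used = self.request_queue_size
        super().server_activate()


class LargeBacklogServer(RecordingServer):
    # Class attribute: already visible while __init__ is running
    request_queue_size = 300


def backlog_seen_at_listen(server_cls, late_override=None):
    """Build a server on a throwaway socket; return the backlog it used."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        server = server_cls(path, socketserver.BaseRequestHandler)
        try:
            if late_override is not None:
                # Mimics the old code: set after the socket is listening
                server.request_queue_size = late_override
            return server.backlog_used
        finally:
            server.server_close()


if __name__ == "__main__":
    print(backlog_seen_at_listen(RecordingServer, late_override=300))
    print(backlog_seen_at_listen(LargeBacklogServer))
```

Even with the class attribute set to 300, the kernel still caps the
effective backlog (128 by default, as noted in item 5 above).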


From d82f20b10c755e5ce40d67ca7ea4a6dba9e37d34 Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch@dell.com>
Date: Mon, 10 May 2010 23:56:09 -0500
Subject: [PATCH 2/2] mirrorlist_server: don't ignore SIGCHLD

Amazing that this ever worked in the first place.  Ignoring SIGCHLD
causes the parent's active_children list to grow without bound.  This
is also probably the cause of our long-term memory size growth.  The
parent really needs to catch SIGCHLD in order to do its reaping.
---
 mirrorlist-server/mirrorlist_server.py |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mirrorlist-server/mirrorlist_server.py b/mirrorlist-server/mirrorlist_server.py
index 2ade357..0de7132 100755
--- a/mirrorlist-server/mirrorlist_server.py
+++ b/mirrorlist-server/mirrorlist_server.py
@@ -814,7 +814,6 @@ def main():
     open_geoip_databases()
     read_caches()
     signal.signal(signal.SIGHUP, sighup_handler)
-    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
     ss = ForkingUnixStreamServer(socketfile, MirrorlistHandler)
     ss.serve_forever()

-- 
1.7.0.1
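
For anyone wondering why ignoring SIGCHLD matters for patch 2/2, here
is a minimal sketch of the underlying behavior on Linux.  The function
name child_can_be_reaped is made up for this demo.  The assumption,
based on my reading of ForkingMixIn, is that its child bookkeeping
only removes a pid once waitpid() succeeds for it.

```python
import os
import signal


def child_can_be_reaped(ignore_sigchld):
    """Fork a child that exits at once; report whether waitpid() reaps it."""
    disposition = signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL
    previous = signal.signal(signal.SIGCHLD, disposition)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately, skipping cleanup handlers
        try:
            reaped_pid, _status = os.waitpid(pid, 0)
            return reaped_pid == pid
        except ChildProcessError:
            # ECHILD: the kernel already discarded the child, so the
            # parent has nothing left to reap
            return False
    finally:
        # Restore whatever disposition was in place before the demo
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    print("default disposition:", child_can_be_reaped(False))
    print("SIGCHLD ignored:    ", child_can_be_reaped(True))
```

With SIGCHLD ignored, waitpid() never hands back the child's pid, so
a parent that tracks children by pid has no way to drop them from its
list.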



-- 
Matt Domsch
Technology Strategist
Dell | Office of the CTO
