So we've finally hit the tipping point where mod_cache is no longer behaving correctly, so I've been looking at alternatives. For those not familiar with the current setup (in order of processes), it goes:
httpd(proxy) -> haproxy(proxy) -> httpd(app)
The first two processes both run on the proxy servers; haproxy is our balancer, which sends requests on to httpd.
I've been looking at a better proxy solution. I initially pushed back against varnish because it would complicate the environment, and this will too, but since apache isn't cutting it I figured a slow, incremental change is the best approach. So what I'm proposing is this:
httpd(proxy) -> varnish(proxy) -> haproxy(proxy) -> httpd(app)
So, a couple of reasons why I'm choosing this design, especially since, in theory, varnish can completely replace both httpd and haproxy in that picture.
First, being incremental makes this a very non-intrusive change. We're literally installing the varnish package, deploying a single varnish config file, and altering port settings in the httpd configs. This will be easy to revert and troubleshoot.
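As a sketch of how small that single varnish config file could be, a minimal pass-through VCL would look something like this (the haproxy address and port here are illustrative assumptions, not our actual settings):

```
# default.vcl -- minimal pass-through config; host/port are assumptions
backend haproxy {
    .host = "127.0.0.1";
    .port = "10001";
}
# With a single backend declared, varnish sends everything to it by
# default and caches whatever the backend's response headers allow.
```

Reverting would then really be just pointing the httpd port settings back at haproxy and stopping varnishd.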
Second, replacing haproxy: varnish's load balancing is pretty primitive right now. It can do health checks, but only at the host level, which means we'd have to create a check definition for every host × every service. Which is just pretty nasty.
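To illustrate the host-vs-service problem: a varnish backend probe (VCL 2.x syntax; hostnames and check URLs below are hypothetical) checks one URL per backend, so covering every service on every host means one backend definition per (host, service) pair:

```
backend app1_wiki {
    .host = "app1.example.org";   # hypothetical app host
    .port = "80";
    .probe = { .url = "/wiki/"; .interval = 5s; }
}
backend app1_smolt {
    .host = "app1.example.org";   # same host, separate check per service
    .port = "80";
    .probe = { .url = "/smolt/"; .interval = 5s; }
}
# ...and again for app2, app3, ... across every farm
```

With haproxy, by contrast, the per-service checks live in one listen/backend section per farm, which is why it stays in the picture.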
Third, replacing httpd is also a bit complex. We use a lot of apache features for things like redirects, compression, etagging, static file serving (like fedoraproject.org), and the big one: ssl.
So anyway, varnish's caching abilities are FAR superior to httpd's, and not just in terms of speed. The people who have suggested this in the past (warren and daMaestro come to mind) were right: it can do a lot. So, interested parties, go over the tech docs; let's learn it and find out what features we may want.
For now, though, I'm working on getting it into puppet in staging; we can run it there for a little while and then move it to production. The nice thing here is that we can integrate things very slowly, starting with smolt or the wiki and then adding others, since it's just a port change. Haproxy listens on a different port for each farm, but varnish only listens on one. Since all of our applications are in their own namespace (/wiki vs /smolt-wiki, for example), that makes this transition smooth and easy. Hurray for good architecture!
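Since each app lives in its own URL namespace, the per-farm routing inside varnish can be a simple switch in vcl_recv. Backend names and prefixes below are illustrative (each would need a matching backend declaration pointing at that farm's haproxy port); apps not yet migrated simply aren't listed:

```
sub vcl_recv {
    if (req.url ~ "^/wiki") {
        set req.backend = haproxy_wiki;   # haproxy port for the wiki farm
    } elsif (req.url ~ "^/smolt") {
        set req.backend = haproxy_smolt;  # haproxy port for the smolt farm
    }
}
```

Adding another app to varnish is then one more backend declaration plus one more prefix match.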
So, questions, comments, concerns? Anyone think this is a bad idea? Speak up!
-Mike
On Thu, Jul 22, 2010 at 10:36:20AM -0500, Mike McGrath wrote:
So we've finally hit that tipping point in mod_cache where it's not quite behaving correctly. So I've been looking at alternatives. For those not familiar with the current setup (in order of processes) it goes:
httpd(proxy) -> haproxy(proxy) -> httpd(app)
The first two apps are both on the proxy servers, haproxy is our balancer that sends it to httpd.
I've been looking at a better proxy solution. I initially pushed back against varnish because it would complicate the environment, and this will too, but since apache isn't cutting it I figured a slow, incremental change is the best approach. So what I'm proposing is this:
httpd(proxy) -> varnish(proxy) -> haproxy(proxy) -> httpd(app)
+1. I agree with the reasoning and like the simplicity of being able to move varnish left or right in the above, if/when it's capable of serving those needs as well.
Mike McGrath wrote:
I've been looking at a better proxy solution. I initially pushed back against varnish because it would complicate the environment, and this will too, but since apache isn't cutting it I figured a slow, incremental change is the best approach. So what I'm proposing is this:
httpd(proxy) -> varnish(proxy) -> haproxy(proxy) -> httpd(app)
So, a couple of reasons why I'm choosing this design, especially since, in theory, varnish can completely replace both httpd and haproxy in that picture.
I do not have all that much positive experience wrt. Varnish's efficiency. Have you researched any other alternatives?
I'm reading that the proxy functionality is first and foremost implemented for caching, and second for l/b?
In other architecture designs (large or heavily loaded rails deployments, for example), the concept of assets is introduced. Would such functionality serve a purpose?
-- Jeroen
On Fri, 23 Jul 2010, Jeroen van Meeuwen wrote:
Mike McGrath wrote:
I've been looking at a better proxy solution. I initially pushed back against varnish because it would complicate the environment, and this will too, but since apache isn't cutting it I figured a slow, incremental change is the best approach. So what I'm proposing is this:
httpd(proxy) -> varnish(proxy) -> haproxy(proxy) -> httpd(app)
So, a couple of reasons why I'm choosing this design, especially since, in theory, varnish can completely replace both httpd and haproxy in that picture.
I do not have all that much positive experience wrt. Varnish's efficiency. Have you researched any other alternatives?
Define efficiency: load times, disk space, cpu usage?
I'm reading that the proxy functionality is first and foremost implemented for caching, and second for l/b?
Actually I'd reverse that, first and foremost for load balancing and redundancy. Next about caching + geo location.
In other architecture designs (large or heavily loaded rails deployments, for example), the concept of assets is introduced. Would such functionality serve a purpose?
I've not used assets before, what's the scoop?
-Mike
Mike McGrath wrote:
On Fri, 23 Jul 2010, Jeroen van Meeuwen wrote:
Mike McGrath wrote:
I've been looking at a better proxy solution. I initially pushed back against varnish because it would complicate the environment, and this will too, but since apache isn't cutting it I figured a slow, incremental change is the best approach. So what I'm proposing is this:
httpd(proxy) -> varnish(proxy) -> haproxy(proxy) -> httpd(app)
So, a couple of reasons why I'm choosing this design, especially since, in theory, varnish can completely replace both httpd and haproxy in that picture.
I do not have all that much positive experience wrt. Varnish's efficiency. Have you researched any other alternatives?
Define efficiency: load times, disk space, cpu usage?
Admittedly I'm not an expert in Varnish, but we run it over at Fedora Unity (behind apache, as a caching engine for plone/zope). Jonathan Steffan has a lot more expertise on Varnish; I let him know about this thread in case he would be willing to let us pick his brain. We saw pages that kept presenting their cached version after editing, article locks shown where in fact there were no locks, that sort of thing.
I suppose sinking one's teeth into it would suffice to resolve such problems, but for our deployment Varnish bloated the system more than we had in mind relative to how much it reduced the load on the backend application. This, in our case, had to do with memory consumption more than with anything else.
In other architecture designs (large or heavily loaded rails deployments, for example), the concept of assets is introduced. Would such functionality serve a purpose?
I've not used assets before, what's the scoop?
With mod_passenger for Apache, *everything* you hit on the VirtualHost goes through the Passenger stack. It has learned not to parse the contents of /images/, so to speak, yet the client still locks a thread and makes the rather bloated httpd+passenger run in circles, even if only for a little while, per "asset".
With assets (and I'm not sure that term is accurate across the board), a bunch of smart Ruby-on-Rails webservers (with mod_rails/mod_rack and such) no longer serve any content that is 1) static or 2) generated to be cached and not subject to any access control. The latter would, in the case of Ruby on Rails, apply to haml/sass content, but in the case of Fedora would (maybe) apply to the mirror lists. These assets can then be hosted on a lightweight, stupid, and efficient webserver; even without caching, that would off-load the backend application servers considerably.
-- Jeroen
PS. Sorry, this had been sitting in my Drafts way too long...
On Wed, Jul 28, 2010 at 03:36, Jeroen van Meeuwen kanarip@kanarip.com wrote:
Mike McGrath wrote:
On Fri, 23 Jul 2010, Jeroen van Meeuwen wrote:
Mike McGrath wrote:
I've been looking at a better proxy solution. I initially pushed back against varnish because it would complicate the environment, and this will too, but since apache isn't cutting it I figured a slow, incremental change is the best approach. So what I'm proposing is this:
httpd(proxy) -> varnish(proxy) -> haproxy(proxy) -> httpd(app)
So, a couple of reasons why I'm choosing this design, especially since, in theory, varnish can completely replace both httpd and haproxy in that picture.
I do not have all that much positive experience wrt. Varnish's efficiency. Have you researched any other alternatives?
Define efficiency: load times, disk space, cpu usage?
Admittedly I'm not an expert in Varnish, but we run it over at Fedora Unity
I, too, have only slightly touched Varnish, just getting it to cache things. I cached one site that was rather slow to respond for everyone (it has since been taken down), and I saw a fair improvement with the cached version. The stack was Apache up front with Varnish behind it (at least speaking in terms of my machine).
(behind apache, as a caching engine for plone/zope). Jonathan Steffan has a lot more expertise on Varnish - I let him know about this thread in case he would be willing to allow us to pick his brain. Pages remained presenting their cached version after editing, article locks are shown where in fact there were no locks, that sorta thing.
I suppose sinking one's teeth into it would suffice to resolve such problems, but for our deployment Varnish bloated the system more than we had in mind relative to how much it reduced the load on the backend application. This, in our case, had to do with memory consumption more than with anything else.
Varnish can be told not to use memory for caching, and that's how I've used it; 1GB doesn't go a long way when you've got 64-bit Apache HTTPd.
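For reference, that "don't use memory" mode corresponds to varnishd's file storage backend; a sketch of the invocation (listen port, path, and size are illustrative, not anyone's actual settings):

```
# mmap a file-backed cache instead of malloc'd memory; the kernel's
# page cache still decides which objects stay resident in RAM
varnishd -a :6081 -s file,/var/lib/varnish/cache.bin,4G
```

This keeps varnish's own footprint bounded while still letting hot objects be served from memory via the page cache.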
In other architecture designs (large or heavily loaded rails deployments, for example), the concept of assets is introduced. Would such functionality serve a purpose?
I've not used assets before, what's the scoop?
With mod_passenger for Apache, *everything* you hit on the VirtualHost goes through the Passenger stack. It has learned not to parse the contents of
I didn't know that. Then again, my rails apps don't get traffic at all.
/images/, so to speak, yet the client still locks a thread and makes the rather bloated httpd+passenger run in circles, even if only for a little while, per "asset".
With assets (and I'm not sure that term is accurate across the board), a bunch of smart Ruby-on-Rails webservers (with mod_rails/mod_rack and such) no longer serve any content that is 1) static or 2) generated to be cached and not subject to any access control. The latter would, in the case of Ruby on Rails, apply to haml/sass content, but in the case of Fedora would (maybe) apply to the mirror lists. These assets can then be hosted on a lightweight, stupid, and efficient webserver; even without caching, that would off-load the backend application servers considerably.
This definition of assets is most definitely correct. I tend to think of them as anything that is used frequently and changed rarely: HTML generated by cron jobs or changed manually by people, CSS, JavaScript, images of any kind.
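The offload being described can be sketched in Apache+Passenger terms. PassengerEnabled is a real mod_passenger directive; the paths below are illustrative, not a real deployment:

```
# httpd.conf excerpt: serve the asset tree directly from disk
# instead of routing those requests through the Passenger stack
Alias /images/ /srv/app/public/images/
<Location /images/>
    PassengerEnabled off
</Location>
```

Going one step further, pointing a lightweight standalone server at the same tree takes even the static-file work off the app-serving httpd entirely.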
Darren VanBuren
==================
http://theoks.net/
On Wed, Jul 28, 2010 at 5:26 PM, Darren VanBuren onekopaka@gmail.com wrote:
On Wed, Jul 28, 2010 at 03:36, Jeroen van Meeuwen kanarip@kanarip.com wrote:
Mike McGrath wrote:
On Fri, 23 Jul 2010, Jeroen van Meeuwen wrote:
Mike McGrath wrote:
I've been looking at a better proxy solution. I initially pushed back against varnish because it would complicate the environment, and this will too, but since apache isn't cutting it I figured a slow, incremental change is the best approach. So what I'm proposing is this:
httpd(proxy) -> varnish(proxy) -> haproxy(proxy) -> httpd(app)
So, a couple of reasons why I'm choosing this design, especially since, in theory, varnish can completely replace both httpd and haproxy in that picture.
I do not have all that much positive experience wrt. Varnish's efficiency. Have you researched any other alternatives?
If the content you are trying to cache is uncacheable, it really doesn't matter what tech you use. But if it is cacheable, varnish does the job better than any other alternative out there.
Varnish can be told not to use memory for caching, and that's how I've used it; 1GB doesn't go a long way when you've got 64-bit Apache HTTPd.
It ends up in virtual memory anyhow; serving from disk is too slow. You probably have graphs showing the usage today?
Cheers,
Stein Ove Rosseland
On Wed, Jul 28, 2010 at 10:31, Stein Ove Rosseland so.rosseland@gmail.com wrote:
On Wed, Jul 28, 2010 at 5:26 PM, Darren VanBuren onekopaka@gmail.com wrote:
On Wed, Jul 28, 2010 at 03:36, Jeroen van Meeuwen kanarip@kanarip.com wrote:
Mike McGrath wrote:
On Fri, 23 Jul 2010, Jeroen van Meeuwen wrote:
Mike McGrath wrote:
I've been looking at a better proxy solution. I initially pushed back against varnish because it would complicate the environment, and this will too, but since apache isn't cutting it I figured a slow, incremental change is the best approach. So what I'm proposing is this:
httpd(proxy) -> varnish(proxy) -> haproxy(proxy) -> httpd(app)
So, a couple of reasons why I'm choosing this design, especially since, in theory, varnish can completely replace both httpd and haproxy in that picture.
I do not have all that much positive experience wrt. Varnish's efficiency. Have you researched any other alternatives?
If the content you are trying to cache is uncacheable, it really doesn't matter what tech you use. But if it is cacheable, varnish does the job better than any other alternative out there.
Varnish can be told not to use memory for caching, and that's how I've used it; 1GB doesn't go a long way when you've got 64-bit Apache HTTPd.
It ends up in virtual memory anyhow; serving from disk is too slow. You probably have graphs showing the usage today?
Cheers,
Stein Ove Rosseland
The site I was caching is long since dead, and therefore the caching system has also been removed.
Darren L. VanBuren
=====================
http://theoks.net/
infrastructure@lists.fedoraproject.org