-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Wed, 21 Mar 2012 10:08:38 -0400 seth vidal skvidal@fedoraproject.org wrote:
On Tue, 20 Mar 2012 21:38:13 -0500 Dennis Gilmore dennis@ausil.us wrote:
Today there is no way to do an anaconda install on any arm system, though hopefully we will have that for deployment.
I would hope so. :)
Probably we would be adding 100-300 systems. Not only do we need to consider overloading puppet, but also logging and monitoring. I guess it's more a question of how we scale our infrastructure from, at a guess, ~100 nodes today to 3 to 4 times that.
Centrally logging the builders is probably unnecessary. Especially if we're bouncing them all the time.
I think it could be useful for capacity planning and detecting when things go bad(TM). I wouldn't cry if we do not have it.
Honestly, we could do this instead of the monthly updates: just rebuild them.
Sure - but I'm thinking of the emergency "oh look at that nightmare" updates.
I'm ok with that; I'm pretty sure fas will scale to the extra boxes. Do we drop monitoring of the builders? What about collectd etc.?
Collectd - off. We're not gaining much by having that punish the syslog server. We can monitor the builders w/o needing all of the copious info that collectd provides.
fas I'm not very worried about - though I suspect a couple of things will change w/how we get the dbs onto the hosts.
The main issue is that today we are not 100% sure how we will install arm boxes. How do we deal with all the non-puppet-related systems?
I think, if the playbooks are working well, we can use ansible to do this.
We also need to look into how we can better scale koji itself. When we go from 20 to 200+ builders we need to make sure that the load doesn't cause koji to fall over.
okay - but I think that's more something for the kojidevs than fedora infra?
Not really. It's not that koji itself won't scale, but that we will likely need to look at load balancing again, or look at an internal hub or two. Each builder checks in every 10 seconds to see if there is anything to do. All state and everything else is stored in the db, so adding multiple hubs that read and write to the db is ok. But I want to make sure that 300 hosts checking in, plus all the public traffic for koji, get handled gracefully.
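As a back-of-the-envelope sketch of the check-in load described above: the 10 second interval and the builder counts come from this thread, while the function name and the even spread across hubs are assumptions for illustration only. It ignores public web traffic and the cost of each request.

```python
def checkin_rate(builders: int, interval_s: float = 10.0, hubs: int = 1) -> float:
    """Average check-in requests per second seen by each hub.

    Assumes every builder polls once per interval and that load is
    spread evenly across hubs (an assumption, not measured behaviour).
    """
    return builders / interval_s / hubs


if __name__ == "__main__":
    for n in (20, 200, 300):
        print(f"{n} builders, 1 hub: {checkin_rate(n):.1f} check-ins/sec")
    print(f"300 builders, 2 hubs: {checkin_rate(300, hubs=2):.1f} check-ins/sec per hub")
```

The steady-state number is small; the concern is more what happens when those check-ins coincide with heavy public traffic or a mass rebuild.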
All the arm boxes will have management consoles, but today I'm not 100% sure how access to them would work. We would also need to deploy fedora for any arm based systems.
We also need to reconsider networking. Today the storage network and the builder network are /24's, so we could use 253 nodes each. I suspect we will go over that on the build network. We could leave the arm builders off the storage network; it is really only needed for createrepo. But we may need to look at expanding kojipkgs to more nodes, or increasing its network throughput with multiple bonded gig network ports.
Think of a mass rebuild with 100 or 200 buildroots initialising at once. It will stress our resources on all levels. But the flexibility of so many nodes could allow us to deploy solid solutions to scale, and show that fedora is still the leader in open infrastructure and sets industry best practices.
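To make the /24 limit concrete, here is a minimal sketch using Python's standard ipaddress module. The 10.0.0.0 prefixes are placeholders rather than the real builder network, and reserving one address for a gateway is an assumption that happens to match the 253 figure above.

```python
import ipaddress


def nodes_available(cidr: str, reserved: int = 1) -> int:
    """Addresses left for nodes on an IPv4 subnet.

    Subtracts the network and broadcast addresses, plus `reserved`
    addresses (assumed here: one gateway). Not meant for /31 or /32.
    """
    net = ipaddress.ip_network(cidr)
    return net.num_addresses - 2 - reserved


if __name__ == "__main__":
    # Placeholder prefixes, not the actual Fedora networks.
    for cidr in ("10.0.0.0/24", "10.0.0.0/23", "10.0.0.0/22"):
        print(f"{cidr}: {nodes_available(cidr)} nodes")
```

Under these assumptions, widening the build network to a /23 would leave room for 300 builders with plenty to spare.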
So one thing I'm not sure I understand - why would we need so many arm builders? Is it b/c there are so many more arm archs so there will need to be more pkgs built?
Two reasons why we will be looking at so many. First, hardware and software floating point are incompatible, so builders that are building hardware floating point only build hardware floating point, and the same for software floating point. Second, while we are looking at quad core 1.5GHz-2.0GHz builders with 4GB RAM to start with, they are still not quite as powerful as their x86 counterparts. Since they are low power, 3-10 watts per node as opposed to 200-300 watts for the existing builders, I want to err on the side of too many rather than not enough and have people complain that they have to wait for an arm builder. Realistically, mass rebuilds are when it will be most noticeable. At a minimum I want at least double the number of x86 nodes for each arch, so ~80 total. I do have on my list of things to come up with some reporting from arm koji and primary koji to see what the average build time is, knowing that what we will deploy will be faster than what we have today.
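The sizing above can be sketched as simple arithmetic. The count of 20 x86 builders is an assumption inferred from the "20 to 200+ builders" remark earlier in the thread, and the function and constant names are made up for illustration; the wattage ranges are the ones quoted above.

```python
X86_BUILDERS = 20                      # assumption: inferred from "20 to 200+ builders"
ARM_FLOAT_ABIS = ("hardfp", "softfp")  # the two incompatible float ABIs


def min_arm_builders(x86: int, abis, factor: int = 2) -> int:
    """At least `factor` times the x86 node count for each arm ABI."""
    return x86 * factor * len(abis)


def power_range(nodes: int, watts_low: int, watts_high: int):
    """Total draw in watts as a (low, high) pair."""
    return nodes * watts_low, nodes * watts_high


if __name__ == "__main__":
    arm = min_arm_builders(X86_BUILDERS, ARM_FLOAT_ABIS)
    print("minimum arm builders:", arm)
    print("arm draw (W):", power_range(arm, 3, 10))
    print("x86 draw (W):", power_range(X86_BUILDERS, 200, 300))
```

Even at four times the node count, the arm builders in this sketch draw well under the power of the existing x86 builders, which is why erring on the side of too many is cheap.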
Dennis