Greetings.
I thought I would give a quick status update on our private cloud work (which skvidal has been doing. Thanks skvidal! )
Our hardware is all in and working. Our network is up and working. We have a test instance of eucalyptus up and running with a pair of machines.
Short term:
I'd like to test out Openstack on another 3 nodes or so. It's come a ways since we evaluated it last.
We need to test more with the admin/command line tools.
We need to figure out how we want to setup groups/users/etc.
We need to repave everything and re-install it in a controlled and documented manner.
Outstanding questions:
Policy:
I figured we would start out with a small group of folks with access and expand based on feedback and capacity.
https://fedoraproject.org/wiki/Infrastructure_private_cloud has some use cases we thought of.
Questions I would love feedback on:
What expectation do we want on reboots? They can go down at any time, or 'we will try and let you know if we want to reboot things' or we plan on doing a maint window every X and your instances WILL be rebooted?
What timeframe should we tell people they can use instances? Do we want to kill them after some specific time? Note that if we want to use this for dev instances, we may want to at least snapshot before taking down.
What sort of policy do we want on "Fedora relatedness" for instances? I don't think we want to offer general instances for people, but how to explain the line? Do we want to specifically forbid any uses?
What ports do we want to allow folks to use? Anything? 80/443/22 only?
How about persistent data storage? We promise to keep data for X timeframe? We make no promises? We keep as long as we have storage available?
I think we should have a very broad 'catch all' at the end of the policy allowing us to refuse service to anyone for any reason, allowing us to shutdown instances that cause problems. Or should we word that more narrowly?
How often do we want to update images? Say we have a Fedora 17 image for folks, would we want to update it daily with updates? weekly? Just when we feel like it? When security bugs affect ssh ? When security issues affect the kernel?
Any other policy related questions folks can think of?
kevin
On Wed, 29 Aug 2012, Kevin Fenzi wrote:
Greetings.
I thought I would give a quick status update on our private cloud work (which skvidal has been doing. Thanks skvidal! )
Our hardware is all in and working. Our network is up and working. We have a test instance of eucalyptus up and running with a pair of machines.
Short term:
I'd like to test out Openstack on another 3 nodes or so. It's come a ways since we evaluated it last.
+1 fed-cloud02 is the other 'head' system. If we take that and 04-05 - then that should give us a base to test more with.
We need to test more with the admin/command line tools.
I've been using the command line tools exclusively for all the euca stuff. I've only used the web interface to verify that some setting changes have occurred.
We need to figure out how we want to setup groups/users/etc.
My concept at the moment is to identify groups who will repeatedly need to create instances and create an 'account' for them. Then delegate admin access on those 'accounts' to specific users.
For people who just need an instance now to test with - we do that ourselves and flag the instance as having a short life span and who it is for.
We need to repave everything and re-install it in a controlled and documented manner.
+1. right now my steps have been:
1. new machines
2. setup repos
3. setup network devices (bridging, masquerading, dns, etc)
4. install euca software
5. configure eucalyptus.conf (and for node controllers libvirt.xsl)
6. do the euca initializing/registering and running of euca-modify-properties
7. reboot and make sure everything is up.
What expectation do we want on reboots? They can go down at any time, or 'we will try and let you know if we want to reboot things' or we plan on doing a maint window every X and your instances WILL be rebooted?
I'd say users should plan for them to go down. Just like with ec2 instances.
What timeframe should we tell people they can use instances?
Ask the user but default to one working week? (5 days?)
Do we want to kill them after some specific time?
yes
Note that if we want to use this for dev instances, we may want to at least snapshot before taking down.
agreed
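To make the "default lifetime, snapshot dev boxes first" idea above concrete, here is a minimal Python sketch of the reaper logic. The instance record shape (id/launched/purpose) and the split between snapshot-then-kill and plain kill are invented for illustration; this is not a real euca API.

```python
from datetime import datetime, timedelta

DEFAULT_LIFETIME = timedelta(days=5)  # the "one working week" default from above


def instances_to_reap(instances, now, lifetime=DEFAULT_LIFETIME):
    """Return (snapshot_first, plain_kill) lists of instance ids.

    `instances` is a list of dicts with hypothetical keys:
      id, launched (datetime), purpose ('dev' instances get a snapshot
      before termination, per the note above).
    """
    snapshot_first, plain_kill = [], []
    for inst in instances:
        if now - inst["launched"] < lifetime:
            continue  # still within its allotted window
        if inst.get("purpose") == "dev":
            snapshot_first.append(inst["id"])  # preserve work before killing
        else:
            plain_kill.append(inst["id"])
    return snapshot_first, plain_kill
```

The actual snapshot and terminate calls would be wired up to whatever tooling we settle on.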
What sort of policy do we want on "Fedora relatedness" for instances? I don't think we want to offer general instances for people, but how to explain the line? Do we want to specifically forbid any uses?
not clear on this either. I think for a little while we'll have our hands full with just:
- copr builders
- random instances
- fedora qa
- fedora apps instances
What ports do we want to allow folks to use? Anything? 80/443/22 only?
So if the user has a euca 'account' then they can create their own security policy "group" which controls what can access that instance. By default I'd say 22,80,443 and ping should be sufficient for remote.
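The default above (22, 80, 443, plus ping) could live as a small table that whatever provisioning script we end up with feeds into euca. A hedged Python sketch; the rule format here is made up for illustration, not euca2ools syntax:

```python
# Default ingress rules: ssh, http, https, plus ping.
# (protocol, from_port, to_port) -- ports are None for ICMP.
DEFAULT_INGRESS = [
    ("tcp", 22, 22),      # ssh
    ("tcp", 80, 80),      # http
    ("tcp", 443, 443),    # https
    ("icmp", None, None), # ping
]


def allowed(protocol, port):
    """Check whether a (protocol, port) pair is covered by the defaults."""
    for proto, lo, hi in DEFAULT_INGRESS:
        if proto != protocol:
            continue
        if lo is None or lo <= port <= hi:
            return True
    return False
```

Keeping the defaults as data makes it easy to audit what a fresh security group opens up.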
How about persistent data storage? We promise to keep data for X timeframe? We make no promises? We keep as long as we have storage available?
and how much in total, I'd think.
I think we should have a very broad 'catch all' at the end of the policy allowing us to refuse service to anyone for any reason, allowing us to shutdown instances that cause problems. Or should we word that more narrowly?
don't we have something similar with regard to fedorapeople or fedorahosted?
How often do we want to update images? Say we have a Fedora 17 image for folks, would we want to update it daily with updates? weekly? Just when we feel like it? When security bugs affect ssh ? When security issues affect the kernel?
Updating it daily seems excessive, the user can update it on their own of course. Given a short cycle of fedora I'd say maybe a couple of times a release and try to stay relatively on top of new kernels.
Running ami-creator to generate a new image is not very difficult, though.
-sv
On Wed, 29 Aug 2012 14:03:26 -0400 (EDT) Seth Vidal skvidal@fedoraproject.org wrote:
On Wed, 29 Aug 2012, Kevin Fenzi wrote:
Greetings.
I thought I would give a quick status update on our private cloud work (which skvidal has been doing. Thanks skvidal! )
Our hardware is all in and working. Our network is up and working. We have a test instance of eucalyptus up and running with a pair of machines.
Short term:
I'd like to test out Openstack on another 3 nodes or so. It's come a ways since we evaluated it last.
+1 fed-cloud02 is the other 'head' system. If we take that and 04-05 - then that should give us a base to test more with.
Yep. I can work on that, or you can if you like.
We need to test more with the admin/command line tools.
I've been using the command line tools exclusively for all the euca stuff. I've only used the web interface to verify that some setting changes have occurred.
Cool.
We need to figure out how we want to setup groups/users/etc.
My concept at the moment is to identify groups who will repeatedly need to create instances and create an 'account' for them. Then delegate admin access on those 'accounts' to specific users.
For people who just need an instance now to test with - we do that ourselves and flag the instance as having a short life span and who it is for.
Yeah, we could use our existing ticketing system for the 'one off' type instances like that.
...snip...
I'd say users should plan for them to go down. Just like with ec2 instances.
ok.
What timeframe should we tell people they can use instances?
Ask the user but default to one working week? (5 days?)
That sounds reasonable. I am sure we will get some "Oh, I need another week" "Oh, I never got to it, give me another".
Do we have some way to tell if an instance is doing anything? I guess we could possibly log network traffic with iptables and key off that, but if it was a compute instance with no networking going on...
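One rough way to answer that idle-instance question: sample per-instance traffic counters (say, from iptables accounting rules on the head node) at two points in time and flag instances whose counters barely moved. The counter-collection side is assumed here; this only sketches the comparison, with the same caveat as above that a compute-only instance will look idle:

```python
IDLE_THRESHOLD = 64 * 1024  # bytes moved between samples; the number is guesswork


def idle_instances(sample_then, sample_now, threshold=IDLE_THRESHOLD):
    """Compare two {instance_id: total_bytes} samples and return the ids
    that moved fewer than `threshold` bytes in between.

    Caveat from the thread: a pure compute instance with no network
    traffic will look idle to this check.
    """
    idle = []
    for inst_id, then_bytes in sample_then.items():
        now_bytes = sample_now.get(inst_id, then_bytes)
        if now_bytes - then_bytes < threshold:
            idle.append(inst_id)
    return sorted(idle)
```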
Do we want to kill them after some specific time?
yes
Note that if we want to use this for dev instances, we may want to at least snapshot before taking down.
agreed
What sort of policy do we want on "Fedora relatedness" for instances? I don't think we want to offer general instances for people, but how to explain the line? Do we want to specifically forbid any uses?
not clear on this either. I think for a little while we'll have our hands full with just:
- copr builders
- random instances
- fedora qa
- fedora apps instances
Yeah. I just want people to have clear expectations that this is not to "run their blog or irc bouncer" in 24x7 forever. (Unless we do wish to allow that).
What ports do we want to allow folks to use? Anything? 80/443/22 only?
So if the user has a euca 'account' then they can create their own security policy "group" which controls what can access that instance.
ok. Just any account or an 'admin' account?
By default I'd say 22,80,443 and ping should be sufficient for remote.
Agreed.
How about persistent data storage? We promise to keep data for X timeframe? We make no promises? We keep as long as we have storage available?
and how much in total, I'd think.
Yeah, a quota type setup would be nice, but probably not implemented.
I think we should have a very broad 'catch all' at the end of the policy allowing us to refuse service to anyone for any reason, allowing us to shutdown instances that cause problems. Or should we word that more narrowly?
don't we have something similar with regard to fedorapeople or fedorahosted?
I guess it's implied...
For Fedorapeople we have:
Do not distribute anything on fedorapeople.org that Fedora itself cannot distribute for legal reasons. Nothing on the ForbiddenItems list or otherwise non-distributable by Fedora. Do not upload your private .ssh keys. While Fedora IT works hard on keeping the servers secure, break-ins will happen, and uploaded private keys can be downloaded and brute-forced easily these days. Private .ssh keys, if found during an audit, will be deleted.
How often do we want to update images? Say we have a Fedora 17 image for folks, would we want to update it daily with updates? weekly? Just when we feel like it? When security bugs affect ssh ? When security issues affect the kernel?
Updating it daily seems excessive, the user can update it on their own of course. Given a short cycle of fedora I'd say maybe a couple of times a release and try to stay relatively on top of new kernels.
Running ami-creator to generate a new image is not very difficult, though.
That's good. If we do them on some schedule (weekly or bi-weekly or whatever) they could also be a useful tool...
I would like an instance with updates as of 2012-08-29, because right after that some update blew up and I want to find out why. (as a qa type thing).
kevin
On Wed, Aug 29, 2012 at 8:03 PM, Seth Vidal skvidal@fedoraproject.org wrote:
On Wed, 29 Aug 2012, Kevin Fenzi wrote:
Greetings.
I thought I would give a quick status update on our private cloud work (which skvidal has been doing. Thanks skvidal! )
Our hardware is all in and working. Our network is up and working. We have a test instance of eucalyptus up and running with a pair of machines.
Short term:
I'd like to test out Openstack on another 3 nodes or so. It's come a ways since we evaluated it last.
+1 fed-cloud02 is the other 'head' system. If we take that and 04-05 - then that should give us a base to test more with.
+1 as well.
We need to test more with the admin/command line tools.
I've been using the command line tools exclusively for all the euca stuff. I've only used the web interface to verify that some setting changes have occurred.
We need to figure out how we want to setup groups/users/etc.
My concept at the moment is to identify groups who will repeatedly need to create instances and create an 'account' for them. Then delegate admin access on those 'accounts' to specific users.
Are you taking into account the FAS format (user, sponsor, admin) for the access level? Unless you guys don't intend to plug it into FAS. Otherwise, that sounds reasonable.
For people who just need an instance now to test with - we do that ourselves and flag the instance as having a short life span and who it is for.
We need to repave everything and re-install it in a controlled and documented manner.
+1. right now my steps have been:
+1
- new machines
- setup repos
- setup network devices (bridging, masquerading, dns, etc)
- install euca software
- configure eucalyptus.conf (and for node controllers libvirt.xsl)
- do the euca initializing/registering and running of euca-modify-properties
- reboot and make sure everything is up.
What expectation do we want on reboots? They can go down at any
time, or 'we will try and let you know if we want to reboot things' or we plan on doing a maint window every X and your instances WILL be rebooted?
I'd say users should plan for them to go down. Just like with ec2 instances.
I'd say that really depends on the purpose of the reboot/shutdown.
Any of your statements could apply. We should decide this based on whether we're facing a security issue, or whatever else forces us to act on the instance.
What timeframe should we tell people they can use instances?
Ask the user but default to one working week? (5 days?)
What does this timeframe stand for?
Do we want to kill them after some specific time?
yes
Does that mean watching for inactivity or something and then shutting the instance right down?
Note that if we want to use this for dev instances, we may want to at
least snapshot before taking down.
agreed
+1
What sort of policy do we want on "Fedora relatedness" for instances?
I don't think we want to offer general instances for people, but how to explain the line? Do we want to specifically forbid any uses?
not clear on this either. I think for a little while we'll have our hands full with just:
- copr builders
- random instances
- fedora qa
- fedora apps instances
What ports do we want to allow folks to use? Anything? 80/443/22 only?
So if the user has a euca 'account' then they can create their own security policy "group" which controls what can access that instance. By default I'd say 22,80,443 and ping should be sufficient for remote.
+1 on this default. Which leads me to ask:
Do instances aim to be accessible from outside of the fpo network?
How about persistent data storage? We promise to keep data for X
timeframe? We make no promises? We keep as long as we have storage available?
and how much in total, I'd think.
I think we should have a very broad 'catch all' at the end of the
policy allowing us to refuse service to anyone for any reason, allowing us to shutdown instances that cause problems. Or should we word that more narrowly?
don't we have something similar with regard to fedorapeople or fedorahosted?
Right, however, we're not targeting the same users nor the same use cases, right? Or are you saying we could word something based on them?
How often do we want to update images? Say we have a Fedora 17 image
for folks, would we want to update it daily with updates? weekly? Just when we feel like it? When security bugs affect ssh ? When security issues affect the kernel?
Updating it daily seems excessive, the user can update it on their own of course. Given a short cycle of fedora I'd say maybe a couple of times a release and try to stay relatively on top of new kernels.
Sounds reasonable. However, I think we should focus more on security and critical bugs affecting the instances, and not just update for the fun of it. As said, users can handle updates themselves.
Running ami-creator to generate a new image is not very difficult, though.
Additional questions:
Does this "private cloud" intend to replace the publictests.* system in place in the near future?
I may have more questions following up.
On Wed, 29 Aug 2012 20:53:06 +0200 Xavier Lamien laxathom@fedoraproject.org wrote:
...snip...
Are you taking into account the FAS format (user, sponsor, admin) for the access level? Unless you guys don't intend to plug it into FAS. Otherwise, that sounds reasonable.
I think initially we don't want to interface with FAS directly. We could revisit that, I suppose... it might be nice to have groups in FAS update the cloud permissions, but I have no idea how hard that will be.
+1 on this default. Which leads me to ask:
Do instances aim to be accessible from outside of the fpo network?
Yes. We have a class C of external IP's. Of course there may be some instances that will not need to use external ip's, but many will.
Right, however, we're not targeting the same users nor the same use cases, right? Or are you saying we could word something based on them?
Just something based on them, or related I guess.
Sounds reasonable. However, I think we should focus more on security and critical bugs affecting the instances, and not just update for the fun of it. As said, users can handle updates themselves.
Yeah, true.
Additional questions:
Does this "private cloud" intend to replace the publictests.* system in place in the near future?
Yes, we have already largely phased out public test systems in favor of $application.dev instances for development of applications.
If we can work it, I'd love for our *dev instances to move to this as well. I suspect many of them are idle a lot of the time, and it would be great to have it so a dev could just bring one up, work on it, and then snapshot/drop it.
I may have more questions following up.
Please do!
Thanks for the input.
kevin
On Wed, Aug 29, 2012 at 9:33 PM, Kevin Fenzi kevin@scrye.com wrote:
On Wed, 29 Aug 2012 20:53:06 +0200 Xavier Lamien laxathom@fedoraproject.org wrote:
...snip...
Are you taking into account the FAS format (user, sponsor, admin) for the access level? Unless you guys don't intend to plug it into FAS. Otherwise, that sounds reasonable.
I think initially we don't want to interface with FAS directly. We could revisit that, I suppose... it might be nice to have groups in FAS update the cloud permissions, but I have no idea how hard that will be.
That would be a plug-in. Writing it is just a question of time/resource availability.
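For what it's worth, the core of such a plug-in is just a diff between FAS group membership and cloud accounts. The actual FAS and euca calls are omitted, and both data shapes here are invented for illustration:

```python
def account_changes(fas_groups, cloud_accounts):
    """Compute the grants/revokes needed to make cloud accounts match FAS.

    fas_groups:     {group_name: set(usernames)} from FAS (hypothetical feed)
    cloud_accounts: {account_name: set(usernames)} currently in the cloud

    Assumes a one-to-one mapping between a FAS group and a cloud 'account'.
    """
    grants, revokes = [], []
    for group, members in fas_groups.items():
        current = cloud_accounts.get(group, set())
        grants.extend((group, u) for u in sorted(members - current))
        revokes.extend((group, u) for u in sorted(current - members))
    return grants, revokes
```

A cron-driven sync would compute this diff and then apply it through whatever account-management commands the cloud side provides.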
+1 on this default. Which leads me to ask:
Do instances aim to be accessible from outside of the fpo network?
Yes. We have a class C of external IP's. Of course there may be some instances that will not need to use external ip's, but many will.
hrm... so do we really want to let users be fully responsible for the contents of the instance? I mean, for how the content is read/written from outside, since they could open whatever ports they want (you know, security and stuff...).
Right, however, we're not targeting the same users nor the same use cases, right? Or are you saying we could word something based on them?
Just something based on them, or related I guess.
Sounds reasonable. However, I think we should focus more on security and critical bugs affecting the instances, and not just update for the fun of it. As said, users can handle updates themselves.
Yeah, true.
If we want to make users' lives easier, we could eventually manage what repositories they have access to... x_x
Additional questions:
Does this "private cloud" intend to replace the publictests.* system in place in the near future?
Yes, we have already largely phased out public test systems in favor of $application.dev instances for development of applications.
If we can work it, I'd love for our *dev instances to move to this as well. I suspect many of them are idle a lot of the time, and it would be great to have it so a dev could just bring one up, work on it, and then snapshot/drop it.
Okay, now I understand the timeframe mentioned earlier. Have you guys thought of a better way to deal with instance availability/requests, to book a timeframe (say 1 month) on an instance? Something where people can see what kinds of instances are available, choose one which matches their criteria (or request one based on available profiles, which are based on available HW resources), set the timeframe needed, and receive an email with all the info to connect to it, or via a pop-up window or whatever.
I may have more questions following up.
Please do!
Thanks for the input.
On Thu, 30 Aug 2012, Xavier Lamien wrote:
Yes. We have a class C of external IP's. Of course there may be some instances that will not need to use external ip's, but many will.
hrm... so do we really want to let users be fully responsible for the contents of the instance? I mean, for how the content is read/written from outside, since they could open whatever ports they want (you know, security and stuff...).
That's the whole point. These systems are:
1. on an isolated network from the rest of our systems
2. controlled through the head node(s) of the cloud system
3. easily restricted to a small number of ports, if we choose.
However, we don't need to do that. We can just let the user do whatever and if there is a problem, terminate the instance and it is all gone.
Cheap and disposable.
If we want to make users' lives easier, we could eventually manage what repositories they have access to... x_x
We already do - but again - it doesn't matter what repos they have access to. The systems are not meant for permanence.
Okay, now I understand the timeframe mentioned earlier. Have you guys thought of a better way to deal with instance availability/requests, to book a timeframe (say 1 month) on an instance? Something where people can see what kinds of instances are available, choose one which matches their criteria (or request one based on available profiles, which are based on available HW resources), set the timeframe needed, and receive an email with all the info to connect to it, or via a pop-up window or whatever.
It's not so much what kind of instances are available as what kind of resources they need.
Right now here are the vm types I defined in euca:

vm type     cpu   ram   disk
c1.medium    1     512    10
m1.small     1    1024    10
m1.large     2    1024    20
m1.xlarge    4    4096    40
c1.xlarge    8    8192    40
Those are flexible, of course, but I think they cover a pretty good range for most cases.
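Matching a request against that table is simple enough to sketch in Python; the type list mirrors the table above, while the "smallest adequate type wins" policy is my own assumption about how we'd allocate:

```python
# (name, cpus, ram_mb, disk_gb) straight from the vm-type table above.
VM_TYPES = [
    ("c1.medium", 1, 512, 10),
    ("m1.small", 1, 1024, 10),
    ("m1.large", 2, 1024, 20),
    ("m1.xlarge", 4, 4096, 40),
    ("c1.xlarge", 8, 8192, 40),
]


def smallest_fit(cpus, ram_mb, disk_gb):
    """Return the first (smallest) vm type satisfying the request, or None.

    Relies on VM_TYPES being ordered from smallest to largest.
    """
    for name, c, r, d in VM_TYPES:
        if c >= cpus and r >= ram_mb and d >= disk_gb:
            return name
    return None
```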
-sv
On Wed, Aug 29, 2012 at 01:33:21PM -0600, Kevin Fenzi wrote:
On Wed, 29 Aug 2012 20:53:06 +0200 Xavier Lamien laxathom@fedoraproject.org wrote:
Does this "private cloud" intend to replace the publictests.* system in place in the near future?
Yes, we have already largely phased out public test systems in favor of $application.dev instances for development of applications.
If we can work it, I'd love for our *dev instances to move to this as well. I suspect many of them are idle a lot of the time, and it would be great to have it so a dev could just bring one up, work on it, and then snapshot/drop it.
Yeah, I was going to say that some of these make a lot of sense for things that are easily reproducible (copr builders, for instance), while not making sense for things where someone might be using it longer term (pkgdb01.dev, where the dev is using it as the primary box to do development on).
But the development boxes aren't utilized 100% of the time so if we can:
1) snapshot the data so that there isn't a setup cost
2) let the dev bring the instance up on their own
then we should be able to halt the instances when we determine that they've been idle and the dev can bring them back up when they get back to working on it.
For something like pkgdb01.dev, the things that are modified are:
* apache config in /etc/httpd/conf.d/pkgdb.conf
* code checkout in /srv/dev/
* Database -- in this case postgres, so /var/lib/pgsql
  ** Note that database dump and reload may take quite a while. So to capture this data we might want to stop the database server and then snapshot the database's data files.
* Other apps have other data stores as well -- packages has a xapian db, for instance.
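Capturing that kind of dev-box state mostly comes down to archiving a known list of paths (with the database stopped first so its files are consistent). A generic Python sketch; the path list and the service start/stop handling are per-app and purely illustrative:

```python
import os
import tarfile


def snapshot_paths(paths, out_tar):
    """Tar up whichever of `paths` exist; return the list actually archived.

    For a real dev box you would stop the database first (e.g. postgres)
    so /var/lib/pgsql is consistent, then restart it after archiving.
    """
    archived = []
    with tarfile.open(out_tar, "w:gz") as tar:
        for path in paths:
            if os.path.exists(path):
                tar.add(path)
                archived.append(path)
    return archived
```

For pkgdb01.dev the path list would be the items above: the apache config, the /srv/dev checkout, and the postgres data directory.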
-Toshio