I have a few questions regarding how we plan to use our new Fedora Cloud:
1) Do we want to use Swift there? How much space should we allocate for Swift storage?
2) How should I configure /var/lib/nova/instances? This is the directory where the root filesystems, swap and ephemeral storage of instances are stored. Right now fed-cloud09 has this path on the root mount point, where 15 GB is available, but it can be extended to the whole size of the "vg_server" volume group, which is 383 GB. The fed-cloud1X machines currently have:
/dev/mapper/vg_guests-nova 3.2T 32G 3.0T 2% /var/lib/nova
This gives us an accumulated (over the controller and 6 compute nodes) storage of 18.4 TB. However, this setup does not allow live migration, because /var/lib/nova is not on a shared FS.
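To illustrate the extension mentioned above, a minimal sketch (the LV name "nova" and the filesystem type here are assumptions, not the actual layout on fed-cloud09):

  # grow a dedicated LV for /var/lib/nova into the free space of vg_server
  lvextend -l +100%FREE /dev/vg_server/nova
  xfs_growfs /var/lib/nova      # for XFS; resize2fs /dev/vg_server/nova for ext4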
So I propose that /var/lib/nova/instances is NFS-exported from fed-cloud09 to all compute nodes. This will allow live migration of VMs. However, it means that /var/lib/nova/instances on fed-cloud09 has to be big enough to host all root filesystems, swaps and ephemeral devices for all VMs in Fedora Cloud. That is 3-20 GB per instance, so 383 GB is not enough. Another disadvantage is that ephemeral devices will be on network storage and not as fast as a local device. However, I suppose we do not use ephemeral storage much (maybe not at all). I would use /dev/md127 from fed-cloud09 (2.89 TB) for /var/lib/nova. This should be enough for approximately 287 instances (at roughly 10 GB per instance), which is about the accumulated capacity of our compute nodes. However, it may slightly limit us in the future if we ever add more compute nodes.
I would suggest using /dev/md127 (2.89 TB RAID6 storage) on the fed-cloud1X machines as storage for Swift, and configuring Glance to store images in Swift. This has the advantage that Swift can be set up to store more replicas. On the other hand, it is a bit of an overkill, as we would have 3 TB+ of storage for Swift (depending on how many replicas we use), while the biggest consumer would be Glance, and Glance on fed-cloud02 only consumes 100 GB currently.
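Roughly what the NFS and Glance parts would look like (a sketch only; the export network is a placeholder and the paths/option names follow stock RDO defaults, so treat them as assumptions):

  # fed-cloud09, /etc/exports - share the instances directory with the compute nodes
  /var/lib/nova/instances  <compute-network>/24(rw,sync,no_root_squash)

  # each compute node, /etc/fstab
  fed-cloud09:/var/lib/nova/instances  /var/lib/nova/instances  nfs  defaults  0 0

  # each compute node, /etc/nova/nova.conf
  [DEFAULT]
  instances_path = /var/lib/nova/instances

  # controller, /etc/glance/glance-api.conf - keep images in Swift instead of local files
  # (the option lives in [glance_store] on current releases, [DEFAULT] on older ones)
  default_store = swift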
Which do you like most: more space for /var/lib/nova as in the current setup, but without live migration? Or smaller storage, which on the other hand allows live migration? Or do you have a better idea how to split the disks and machines among the different services?
P.S. I am putting aside normal volumes for data, which will reside on the Dell EqualLogic, which is 20 TB, and that is perfectly fine.
On Fri, 10 Apr 2015 09:01:58 +0200 Miroslav Suchý msuchy@redhat.com wrote:
I have a few questions regarding how we plan to use our new Fedora Cloud:
- Do we want to use Swift there? How much space should we allocate
for Swift storage?
I think it might be a good idea to have some Swift space set up, but I am not sure what use cases we fully have for it, so I would say it should be somewhat small. 100GB or something? This would also be backed by the equalogics? Or would it be distributed on the nodes?
- How should I configure /var/lib/nova/instances?
This is the directory where the root filesystems, swap and ephemeral storage of instances are stored. Right now fed-cloud09 has this path on the root mount point, where 15 GB is available, but it can be extended to the whole size of the "vg_server" volume group, which is 383 GB. The fed-cloud1X machines currently have:
/dev/mapper/vg_guests-nova 3.2T 32G 3.0T 2% /var/lib/nova
This gives us an accumulated (over the controller and 6 compute nodes) storage of 18.4 TB. However, this setup does not allow live migration, because /var/lib/nova is not on a shared FS.
So I propose that /var/lib/nova/instances is NFS-exported from fed-cloud09 to all compute nodes. This will allow live migration of VMs. However, it means that /var/lib/nova/instances on fed-cloud09 has to be big enough to host all root filesystems, swaps and ephemeral devices for all VMs in Fedora Cloud. That is 3-20 GB per instance, so 383 GB is not enough. Another disadvantage is that ephemeral devices will be on network storage and not as fast as a local device. However, I suppose we do not use ephemeral storage much (maybe not at all). I would use /dev/md127 from fed-cloud09 (2.89 TB) for /var/lib/nova. This should be enough for approximately 287 instances (at roughly 10 GB per instance), which is about the accumulated capacity of our compute nodes. However, it may slightly limit us in the future if we ever add more compute nodes.
I guess that would be ok. Or could we also use the equalogics for this? Make say a 5TB volume for this? That would limit volume space however.
Another option would be gluster, but that's going to add more complexity for sure.
One other option: we have another smaller equalogics we could use for instances perhaps. It has about 1/2 the space of the current one in the cloud, but that still might be better if we could use it instead of local?
I would suggest using /dev/md127 (2.89 TB RAID6 storage) on the fed-cloud1X machines as storage for Swift, and configuring Glance to store images in Swift. This has the advantage that Swift can be set up to store more replicas. On the other hand, it is a bit of an overkill, as we would have 3 TB+ of storage for Swift (depending on how many replicas we use), while the biggest consumer would be Glance, and Glance on fed-cloud02 only consumes 100 GB currently.
Yeah, we really don't have too many images. On the old/existing cloud we used the head node (fed-cloud02) and then added one of the compute nodes to cinder (fed-cloud08 I think) to add more room. But that's not the part we are limited on in the new cloud. ;)
Which do you like most: more space for /var/lib/nova as in the current setup, but without live migration? Or smaller storage, which on the other hand allows live migration? Or do you have a better idea how to split the disks and machines among the different services?
Well, I am not sure I care too much about live migrations. I can see some cases where it would be nice, but in most of those we could also just spin up a new instance and reconfigure.
Are there specific cases where live migrations would help us out a lot?
P.S. I am putting aside normal volumes for data, which will reside on the Dell EqualLogic, which is 20 TB, and that is perfectly fine.
Yeah.
kevin
On 04/10/2015 06:04 PM, Kevin Fenzi wrote:
I think it might be a good idea to have some Swift space set up, but I am not sure what use cases we fully have for it, so I would say it should be somewhat small. 100GB or something? This would also be backed by the equalogics? Or would it be distributed on the nodes?
Swift has a built-in split/replica mechanism, so I think that with this size I can steal some space from the vg_server of the nodes.
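For example, the number of replicas is just a parameter when the rings are built (the part power, port and device names below are placeholders, not our actual layout):

  # object ring with 3 replicas (part_power=10, replicas=3, min_part_hours=1)
  swift-ring-builder object.builder create 10 3 1
  swift-ring-builder object.builder add r1z1-<node-ip>:6000/<device> 100
  swift-ring-builder object.builder rebalance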
Are there specific cases where live migrations would help us out a lot?
I do not know. Cold migration lasts several minutes. I agree that we can afford it; we are not a bank or a stock operator where every outage costs a pile of money.
On Mon, 13 Apr 2015 09:56:26 +0200 Miroslav Suchý msuchy@redhat.com wrote:
On 04/10/2015 06:04 PM, Kevin Fenzi wrote:
I think it might be a good idea to have some Swift space set up, but I am not sure what use cases we fully have for it, so I would say it should be somewhat small. 100GB or something? This would also be backed by the equalogics? Or would it be distributed on the nodes?
Swift has a built-in split/replica mechanism, so I think that with this size I can steal some space from the vg_server of the nodes.
Yeah. Or perhaps somewhat bigger would make sense? 500G?
Are there specific cases where live migrations would help us out a lot?
I do not know. Cold migration lasts several minutes. I agree that we can afford it; we are not a bank or a stock operator where every outage costs a pile of money.
Yeah. The one place I thought might be nice was if we wanted to reboot a compute node to update it, but then I got to thinking, why shouldn't we also just reboot the instances too and update them as well? ;)
kevin
On 04/13/2015 03:54 PM, Kevin Fenzi wrote:
Yeah. The one place I thought might be nice was if we wanted to reboot a compute node to update it, but then I got to thinking, why shouldn't we also just reboot the instances too and update them as well? ;)
I just tried it - when I reboot a compute node, all VMs located there switch to the "Shut down" state and are not powered on automatically. A hard reset will power them on. However, I am afraid that cloud-perstisten.yml will not handle that: when the machine is unreachable, it will spin up a new one.
Cold migration works. I tested it yesterday on our instance and it is functional.
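For the record, the standard sequence with the nova client is roughly:

  nova migrate <instance>            # cold-migrate the instance to another compute node
  nova resize-confirm <instance>     # confirm once it is ACTIVE again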
On Thu, 16 Apr 2015 09:55:49 +0200 Miroslav Suchý msuchy@redhat.com wrote:
On 04/13/2015 03:54 PM, Kevin Fenzi wrote:
Yeah. The one place I thought might be nice was if we wanted to reboot a compute node to update it, but then I got to thinking, why shouldn't we also just reboot the instances too and update them as well? ;)
I just tried it - when I reboot a compute node, all VMs located there switch to the "Shut down" state and are not powered on automatically. A hard reset will power them on. However, I am afraid that cloud-perstisten.yml will not handle that: when the machine is unreachable, it will spin up a new one.
Yeah, it would need a way to look for 'shutdown' and hard power it on. Might be doable.
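Something along these lines, probably (just a sketch with the nova client; exact flags may vary by client version):

  # instances that are powered off
  nova list --all-tenants --status SHUTOFF

  # power one back on (either should do)
  nova start <instance-id>
  nova reboot --hard <instance-id>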
Cold migration works. I tested it yesterday on our instance and it is functional.
ok.
kevin