Greetings.
I've been playing around with glusterfs the last few days, and I thought I would send out a note about what I had found and ideas for how we could use it. ;)
glusterfs is a distributed filesystem. It's actually very easy to set up and manage, which is nice. ;)
You can set up a local one-node gluster volume in just a few commands:
yum install glusterfs*
service glusterd start
gluster volume create testvolume yourhostname:/testbrick
gluster volume start testvolume
mkdir /mnt/testvolume
mount -t glusterfs yourhostname:/testvolume /mnt/testvolume
Setting up multiple nodes/peers/bricks is pretty easy, as is configuring data distribution and replication.
The replication seems to work pretty transparently. I set up a 2-node cluster and kill -9'ed the gluster processes on one node; the other kept on trucking just fine, and it resynced fine after I restarted it.
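For reference, the 2-node replicated setup described above looks roughly like this; the hostnames host1/host2 and the brick path /repbrick are placeholders, not our actual machines:

```shell
# Sketch of a 2-node replicated volume; glusterd must already be
# running on both hosts. Run the peer probe and volume commands on host1.
gluster peer probe host2                        # add host2 to the trusted pool
gluster volume create repvolume replica 2 \
    host1:/repbrick host2:/repbrick             # one brick per node, 2 copies of the data
gluster volume start repvolume
mkdir /mnt/repvolume
mount -t glusterfs host1:/repvolume /mnt/repvolume
```

With replica 2, killing the gluster processes on either node leaves the mount usable, which matches the kill -9 test above.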
It also has an nfs mount ability, although it's not that good IMHO, since whatever hostname you specify in the mount becomes a single point of failure. It could, however, be a handy fallback.
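The nfs fallback mentioned above would look something like this (a sketch; gluster's built-in NFS server speaks NFSv3, and depending on the client you may need extra tcp mount options):

```shell
# Mount a gluster volume via its built-in NFS server instead of the
# fuse client. Note: 'yourhostname' here is the single point of failure.
mkdir /mnt/testvolume-nfs
mount -t nfs -o vers=3 yourhostname:/testvolume /mnt/testvolume-nfs
```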
Warts:
- iptables rules are a bit annoying to allow the nodes to talk to each other. See: http://europe.gluster.org/community/documentation/index.php/Gluster_3.2:_Ins...
- There is a geo-replication feature to let you replicate over a WAN link to a slave gluster instance or directory. However, the slave can't be a live instance; it's just for disaster recovery, and it currently requires a root ssh login with a passwordless key. Pass.
- df is a bit useless, as mounts show the space on the backing volume that you created the brick on. Unless we set up a dedicated mount for each volume, it won't really reflect the space. On the other hand, du should work fine.
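On the iptables wart above: the rules would be along these lines. The ports are my reading of the gluster 3.2 docs linked above, so treat them as an assumption to verify; the peer address 10.0.0.2 is a placeholder:

```shell
# Open the gluster ports for one peer (10.0.0.2 is a placeholder).
# 24007-24008 are glusterd management; each brick listens on 24009 and up
# (one port per brick, so widen the range to match the brick count);
# 38465-38467 are the built-in NFS server; 111 is the portmapper.
iptables -A INPUT -p tcp -s 10.0.0.2 --dport 24007:24008 -j ACCEPT
iptables -A INPUT -p tcp -s 10.0.0.2 --dport 24009:24014 -j ACCEPT
iptables -A INPUT -p tcp -s 10.0.0.2 --dport 38465:38467 -j ACCEPT
iptables -A INPUT -p tcp -s 10.0.0.2 --dport 111 -j ACCEPT
iptables -A INPUT -p udp -s 10.0.0.2 --dport 111 -j ACCEPT
```

The per-brick port range is the annoying part, since it changes as volumes are added.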
Possible uses for us:
We could look at using this for shared virt storage, which would let us move things around more easily. However, there are a number of problems with that: we would have to run it on the bare virthosts, and we would have to switch (as far as I can tell) to filesystem .img files for the virt images, which may not be as nice as lvm volumes. Also, we haven't really moved things around much in the past, and libvirt allows for migrations anyhow. So, I don't think this usage is too much win.
So, looking at sharing application-level data, I would think we would want to set up a virt on each of our virthosts (called 'glusterN' or something). Then we could make volumes and share them out to the applications that need them, with the required replication/distribution setup for that data.
What things could we put on this?
- The tracker xapian db?
- How about other databases? I'm not sure how some db's would handle it, but it could let us stop the db on one host and bring it up on another with virtually no outage. If each database is its own volume, we can move each one around pretty easily. (i.e., have 4 db servers, move all the db's to one, reboot/update the other 3, move them back, reboot the last, etc.)
- Web/static content? Right now we rsync that to all the proxies every hour. If we had a gluster volume for it, we could just build and rsync to the gluster instance. Or if the build doesn't do anything wacky, just build on there.
- Hosted and Collab data? This would be better than the current drbd setup as we could have two instances actively using the mount/data at the same time. We would need to figure out how to distribute requests tho.
- Moving forward to next year/later this year if we get a hold of a bunch of storage, how about /mnt/koji? We would need at least 2 nodes that have enough space, but then that would get us good replication and ability to survive a machine crash much easier.
- Insert your crazy idea here. What do we have that could be made better by replication/distribution in this way? Is there enough here to make it worth deploying?
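The per-volume database idea above might look like this in practice; the service, volume, and path names below (postgresql, gluster01:/db1, /var/lib/pgsql) are all placeholders for illustration, not our actual setup:

```shell
# On the host currently running the db: stop it and release the mount.
service postgresql stop
umount /var/lib/pgsql                           # the gluster volume holding this db's data

# On the host taking over: attach the same volume and start the db there.
mount -t glusterfs gluster01:/db1 /var/lib/pgsql
service postgresql start
```

Since the data never moves, the outage is just the stop/umount/mount/start window, which is the "virtually no outage" part.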
Thoughts?
kevin
On 1 February 2012 12:32, Kevin Fenzi kevin@scrye.com wrote:
Greetings.
- Web/static content? Right now we rsync that to all the proxies every hour. If we had a gluster volume for it, we could just build and rsync to the gluster instance. Or if the build doesn't do anything wacky, just build on there.
Of these items, I think this sounds the best. One of the issues we have to make sure of is how gluster deals with lossy networks. Most WAN filesystems assume that the network will be rather stable and fast, which we do not have complete say over. The second item is: how do selinux and other things work on top of gluster?
- Hosted and Collab data? This would be better than the current drbd setup as we could have two instances actively using the mount/data at the same time. We would need to figure out how to distribute requests tho.
- Moving forward to next year/later this year if we get a hold of a bunch of storage, how about /mnt/koji? We would need at least 2 nodes that have enough space, but then that would get us good replication and ability to survive a machine crash much easier.
- Insert your crazy idea here. What do we have that could be made better by replication/distribution in this way? Is there enough here to make it worth deploying?
Thoughts?
I think we need to set up a test environment at, say, osuosl, serverbeach, telia and bodhost and see how it handles traffic over various links. I am not sure we are going to get it outside/inside of RH with the current firewalls and needs.
kevin
infrastructure mailing list infrastructure@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/infrastructure
On Wed, 1 Feb 2012 18:04:56 -0700 Stephen John Smoogen smooge@gmail.com wrote:
On 1 February 2012 12:32, Kevin Fenzi kevin@scrye.com wrote:
Greetings.
- Web/static content? Right now we rsync that to all the proxies every hour. If we had a gluster volume for it, we could just build and rsync to the gluster instance. Or if the build doesn't do anything wacky, just build on there.
Of these items, I think this sounds the best. One of the issues we have to make sure of is how gluster deals with lossy networks. Most WAN filesystems assume that the network will be rather stable and fast, which we do not have complete say over.
Right. Will do some testing on slow links. Nothing is going to be 100% wonderful on a slow link tho. ;)
The second item is: how do selinux and other things work on top of gluster?
No selinux. Everything has a fuse label.
I think we need to set up a test environment at, say, osuosl, serverbeach, telia and bodhost and see how it handles traffic over various links. I am not sure we are going to get it outside/inside of RH with the current firewalls and needs.
It should pass over the vpn just fine (at least for mounts).
I'll see about setting up a test at some remote places and see how it does. Perhaps I will use serverbeach06 for one end and make a new test/dev instance somewhere in phx2 for the other (although it will need a vpn).
kevin
On 2 February 2012 13:52, Kevin Fenzi kevin@scrye.com wrote:
On Wed, 1 Feb 2012 18:04:56 -0700 Stephen John Smoogen smooge@gmail.com wrote:
On 1 February 2012 12:32, Kevin Fenzi kevin@scrye.com wrote:
Greetings.
- Web/static content? Right now we rsync that to all the proxies every hour. If we had a gluster volume for it, we could just build and rsync to the gluster instance. Or if the build doesn't do anything wacky, just build on there.
Of these items, I think this sounds the best. One of the issues we have to make sure of is how gluster deals with lossy networks. Most WAN filesystems assume that the network will be rather stable and fast, which we do not have complete say over.
Right. Will do some testing on slow links. Nothing is going to be 100% wonderful on a slow link tho. ;)
Well, there is 80% wonderful, and then there is "the website is not working at all". :)
The second item is: how do selinux and other things work on top of gluster?
No selinux. Everything has a fuse label.
Ah ok.
I think we need to set up a test environment at, say, osuosl, serverbeach, telia and bodhost and see how it handles traffic over various links. I am not sure we are going to get it outside/inside of RH with the current firewalls and needs.
It should pass over the vpn just fine (at least for mounts).
Oh I didn't think about the VPN.
I'll see about setting up a test at some remote places and see how it does. Perhaps I will use serverbeach06 for one end and make a new test/dev instance somewhere in phx2 for the other (although it will need a vpn).
kevin