... now, finally, with more 64-bit-ness!
From Ted:
I've made the first WIP release of e2fsprogs 1.42. The primary purpose is for people to test the 64-bit functionality and be confident that we didn't introduce any 32-bit regressions.
So in theory you can at least mkfs & mount a 16T fs and beyond, if you'd like to test that.
There was also enough surgery that testing it against older, smaller filesystems is welcomed too.
Thanks, -Eric
On 08/09/2011 06:45 PM, Eric Sandeen wrote:
I've made the first WIP release of e2fsprogs 1.42. The primary purpose is for people to test the 64-bit functionality and be confident that we didn't introduce any 32-bit regressions.
So in theory you can at least mkfs & mount a 16T fs and beyond, if you'd like to test that.
Isn't this just a snapshot? In that case, the package should follow the standard naming guidelines
https://fedoraproject.org/wiki/Packaging:NamingGuidelines#Pre-Release_packag...
Rahul
On 08/10/2011 07:59 AM, Rahul Sundaram wrote:
On 08/09/2011 06:45 PM, Eric Sandeen wrote:
I've made the first WIP release of e2fsprogs 1.42. The primary purpose is for people to test the 64-bit functionality and be confident that we didn't introduce any 32-bit regressions.
So in theory you can at least mkfs & mount a 16T fs and beyond, if you'd like to test that.
Isn't this just a snapshot? In that case, the package should follow the standard naming guidelines
https://fedoraproject.org/wiki/Packaging:NamingGuidelines#Pre-Release_packag...
Rahul
As far as I know, I did.
If I did something wrong, can you provide me with some details?
-Eric
On 08/10/2011 07:59 AM, Rahul Sundaram wrote:
On 08/09/2011 06:45 PM, Eric Sandeen wrote:
I've made the first WIP release of e2fsprogs 1.42. The primary purpose is for people to test the 64-bit functionality and be confident that we didn't introduce any 32-bit regressions.
So in theory you can at least mkfs & mount a 16T fs and beyond, if you'd like to test that.
Isn't this just a snapshot? In that case, the package should follow the standard naming guidelines
https://fedoraproject.org/wiki/Packaging:NamingGuidelines#Pre-Release_packag...
Rahul
The subject has upstream's name/version, but as packaged, it's:
Name: e2fsprogs
Version: 1.42
Release: 0.1.WIP.0702.fc17
which is correct, AFAICT.
-Eric
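Why a Release beginning with `0.` is the right shape for a snapshot: RPM compares version strings segment by segment, so any `0.…` release sorts below an eventual `1.…` final build. A simplified Python model of that comparison (not rpm's actual `rpmvercmp` code; it ignores the later `~` and `^` rules), applied to the Release strings from this thread; `1.fc17` is a hypothetical final-build Release:

```python
import re


def rpmvercmp(a: str, b: str) -> int:
    """Simplified RPM-style comparison: 1 if a is newer, -1 if older, 0 if equal.

    Each string is split into runs of digits or runs of letters (anything else is
    a separator), and the runs are compared pairwise. Ignores '~' and '^' handling.
    """
    if a == b:
        return 0
    segs_a = re.findall(r"[0-9]+|[A-Za-z]+", a)
    segs_b = re.findall(r"[0-9]+|[A-Za-z]+", b)
    for x, y in zip(segs_a, segs_b):
        if x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        if x.isdigit():
            xi, yi = int(x), int(y)  # int() drops leading zeros ("0702" == 702)
            if xi != yi:
                return 1 if xi > yi else -1
        elif x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the string with segments left over is newer.
    if len(segs_a) == len(segs_b):
        return 0
    return 1 if len(segs_a) > len(segs_b) else -1


snapshot = "0.1.WIP.0702.fc17"  # Release of this build
later = "0.3.WIP.0925.fc17"     # Release of the later rawhide snapshot
final = "1.fc17"                # hypothetical Release of the final 1.42 build
print(rpmvercmp(snapshot, later), rpmvercmp(later, final))
```

Both comparisons come out as -1: each snapshot is older than the next one, and all of them are older than the final build.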
On 8/9/11 8:15 AM, Eric Sandeen wrote:
... now, finally, with more 64-bit-ness!
From Ted:
I've made the first WIP release of e2fsprogs 1.42. The primary purpose is for people to test the 64-bit functionality and be confident that we didn't introduce any 32-bit regressions.
So in theory you can at least mkfs & mount a 16T fs and beyond, if you'd like to test that.
There was also enough surgery that testing it against older, smaller filesystems is welcomed too.
Thanks, -Eric
Another little heads up - a newer snapshot is built in rawhide now.
Anyone who wants to fiddle with large ext4 filesystems, have at it please!
Thanks, -Eric
On Mon, Sep 26, 2011 at 02:51:33PM -0500, Eric Sandeen wrote:
Another little heads up - a newer snapshot is built in rawhide now.
Anyone who wants to fiddle with large ext4 filesystems, have at it please!
Is there any background information to this change that I can read?
I created a 2**60 byte disk, partitioned it, and tried to create an ext4 filesystem on it, but that doesn't work:
<rescue> mke2fs -t ext4 /dev/vda1
mke2fs 1.42-WIP (25-Sep-2011)
Warning: the fs_type huge is not defined in mke2fs.conf
/dev/vda1: Cannot create filesystem with requested number of inodes while setting up superblock
<rescue> parted /dev/vda print
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 1126TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      32.8kB  1126TB  1126TB               p1
Rich.
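When reading output like this, note that parted prints sizes in decimal (SI) units by default, so its "TB" is 10^12 bytes, while qemu-img's `T` size suffix is, as far as I know, binary (2^40 bytes). A minimal Python sketch of the conversion; the helper names are invented for illustration. For example, 2^50 bytes (1 PiB) is exactly what parted displays as 1126TB, while 2^60 bytes would display as roughly 1,152,922TB.

```python
def to_decimal_tb(num_bytes: int) -> float:
    """Size in decimal terabytes (10**12 bytes), the unit parted prints as 'TB'."""
    return num_bytes / 10**12


def to_tib(num_bytes: int) -> float:
    """Size in binary tebibytes (2**40 bytes), the unit of qemu-img's 'T' suffix."""
    return num_bytes / 2**40


for exp in (50, 60):
    size = 2**exp
    print(f"2**{exp} bytes = {to_decimal_tb(size):,.0f} TB (decimal) = {to_tib(size):,.0f} TiB")
```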
On 10/3/11 1:13 PM, Richard W.M. Jones wrote:
On Mon, Sep 26, 2011 at 02:51:33PM -0500, Eric Sandeen wrote:
Another little heads up - a newer snapshot is built in rawhide now.
Anyone who wants to fiddle with large ext4 filesystems, have at it please!
Is there any background information to this change that I can read?
I created a 2**60 byte disk, partitioned it, and tried to create an ext4 filesystem on it, but that doesn't work:
heh; going for the gusto I see!
Can we maybe start with a mere 500 terabytes? ;)
<rescue> mke2fs -t ext4 /dev/vda1
mke2fs 1.42-WIP (25-Sep-2011)
Warning: the fs_type huge is not defined in mke2fs.conf
Icky, but unrelated to below, I think
/dev/vda1: Cannot create filesystem with requested number of inodes while setting up superblock
sounds like it calculated an inode ratio that put it over the 2^32 inode nr. limit.
I'll look into it.
Thanks, -Eric
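The limit referred to here comes from ext4's 32-bit inode count: mke2fs sets the number of inodes to roughly the filesystem size divided by the bytes-per-inode ratio (`inode_ratio` in mke2fs.conf, `-i` on the command line), and that number has to fit in 32 bits. A rough Python sketch of the arithmetic; it ignores mke2fs's per-group rounding, and the ratios tried (16384 and 65536) are assumptions about typical mke2fs.conf `[defaults]` and `huge` settings, not values read from this system:

```python
MAX_INODES = 2**32 - 1  # the inode count is stored in a 32-bit field


def inode_count(fs_bytes: int, inode_ratio: int) -> int:
    """Approximate inode count: one inode per inode_ratio bytes (no group rounding)."""
    return fs_bytes // inode_ratio


def min_inode_ratio(fs_bytes: int) -> int:
    """Smallest power-of-two ratio (starting at 1024) that keeps the count in 32 bits."""
    ratio = 1024
    while inode_count(fs_bytes, ratio) > MAX_INODES:
        ratio *= 2
    return ratio


for label, size in (("2**50 bytes", 2**50), ("2**60 bytes", 2**60)):
    for ratio in (16384, 65536):
        n = inode_count(size, ratio)
        print(f"{label} at inode_ratio={ratio}: {n:,} inodes, fits={n <= MAX_INODES}")
    print(f"{label}: needs inode_ratio >= {min_inode_ratio(size):,}")
```

At either size, both assumed ratios overflow the 32-bit count by a wide margin, which is consistent with the "requested number of inodes" error.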
<rescue> parted /dev/vda print
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 1126TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      32.8kB  1126TB  1126TB               p1
Rich.
On Mon, Oct 03, 2011 at 03:10:43PM -0500, Eric Sandeen wrote:
On 10/3/11 1:13 PM, Richard W.M. Jones wrote:
On Mon, Sep 26, 2011 at 02:51:33PM -0500, Eric Sandeen wrote:
Another little heads up - a newer snapshot is built in rawhide now.
Anyone who wants to fiddle with large ext4 filesystems, have at it please!
Is there any background information to this change that I can read?
I created a 2**60 byte disk, partitioned it, and tried to create an ext4 filesystem on it, but that doesn't work:
heh; going for the gusto I see!
Can we maybe start with a mere 500 terabytes? ;)
Well I started with 2**63-513, but sadly qemu has a bug that prevents me from writing to such a disk ...
https://rwmj.wordpress.com/2011/10/03/maximum-qcow2-disk-size/ https://bugs.launchpad.net/qemu/+bug/865518
<rescue> mke2fs -t ext4 /dev/vda1
mke2fs 1.42-WIP (25-Sep-2011)
Warning: the fs_type huge is not defined in mke2fs.conf
Icky, but unrelated to below, I think
/dev/vda1: Cannot create filesystem with requested number of inodes while setting up superblock
sounds like it calculated an inode ratio that put it over the 2^32 inode nr. limit.
I'll look into it.
Thanks. Let me know if/when there's anything else I can test.
Rich.
On 10/3/11 4:05 PM, Richard W.M. Jones wrote:
On Mon, Oct 03, 2011 at 03:10:43PM -0500, Eric Sandeen wrote:
On 10/3/11 1:13 PM, Richard W.M. Jones wrote:
On Mon, Sep 26, 2011 at 02:51:33PM -0500, Eric Sandeen wrote:
Another little heads up - a newer snapshot is built in rawhide now.
Anyone who wants to fiddle with large ext4 filesystems, have at it please!
Is there any background information to this change that I can read?
I created a 2**60 byte disk, partitioned it, and tried to create an ext4 filesystem on it, but that doesn't work:
heh; going for the gusto I see!
Can we maybe start with a mere 500 terabytes? ;)
Well I started with 2**63-513, but sadly qemu has a bug that prevents me from writing to such a disk ...
https://rwmj.wordpress.com/2011/10/03/maximum-qcow2-disk-size/ https://bugs.launchpad.net/qemu/+bug/865518
<rescue> mke2fs -t ext4 /dev/vda1
mke2fs 1.42-WIP (25-Sep-2011)
Warning: the fs_type huge is not defined in mke2fs.conf
Icky, but unrelated to below, I think
/dev/vda1: Cannot create filesystem with requested number of inodes while setting up superblock
sounds like it calculated an inode ratio that put it over the 2^32 inode nr. limit.
I'll look into it.
Thanks. Let me know if/when there's anything else I can test.
testing something more real-world (20T ... 500T?) might still be interesting.
I'm finding a couple issues already right up at the theoretical max limit, just at mkfs time :(
-Eric
Rich.
On Mon, Oct 03, 2011 at 04:11:28PM -0500, Eric Sandeen wrote:
testing something more real-world (20T ... 500T?) might still be interesting.
Here's my test script:
qemu-img create -f qcow2 test1.img 500T && \
guestfish -a test1.img \
  memsize 4096 : run : \
  part-disk /dev/vda gpt : mkfs ext4 /dev/vda1
The guestfish "mkfs" command translates directly to "mke2fs -t ext4" in this case.
500T: fails with the same error:
/dev/vda1: Cannot create filesystem with requested number of inodes while setting up superblock
By a process of bisection I found that I get the same error for all sizes >= 255T.
For 254T, I get:
/dev/vda1: Memory allocation failed while setting up superblock
I wasn't able to give the VM enough memory to make this succeed. I've only got 8G on this laptop. Should I need large amounts of memory to create these filesystems?
At 100T it doesn't run out of memory, but the man behind the curtain starts to show. The underlying qcow2 file grows to several gigs and I had to kill it. I need to play with the lazy init features of ext4.
Rich.
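Finding the smallest failing size by bisection, as described here, is easy to script. A minimal Python sketch: `mkfs_fails` is a hypothetical stand-in predicate (in practice it would create an image of the given size and run mke2fs on it), and the sketch assumes failures are monotonic in size, i.e. once a size fails, every larger size fails too.

```python
from typing import Callable


def smallest_failing_size(lo: int, hi: int, fails: Callable[[int], bool]) -> int:
    """Return the smallest size in [lo, hi] for which fails(size) is True.

    Requires fails(lo) to be False and fails(hi) to be True, and assumes that
    failures are monotonic in size.
    """
    if fails(lo) or not fails(hi):
        raise ValueError("need fails(lo) == False and fails(hi) == True")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # the threshold is at or below mid
        else:
            lo = mid  # the threshold is above mid
    return hi


# Hypothetical stand-in for "create a <size>T image and run mke2fs on it".
def mkfs_fails(size_tib: int) -> bool:
    return size_tib >= 255


print(smallest_failing_size(100, 500, mkfs_fails))
```

Each probe costs one image creation plus one mkfs run, so searching 100T..500T takes only about nine runs.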
On 10/3/11 5:13 PM, Richard W.M. Jones wrote:
On Mon, Oct 03, 2011 at 04:11:28PM -0500, Eric Sandeen wrote:
testing something more real-world (20T ... 500T?) might still be interesting.
Here's my test script:
qemu-img create -f qcow2 test1.img 500T && \
guestfish -a test1.img \
  memsize 4096 : run : \
  part-disk /dev/vda gpt : mkfs ext4 /dev/vda1
The guestfish "mkfs" command translates directly to "mke2fs -t ext4" in this case.
500T: fails with the same error:
/dev/vda1: Cannot create filesystem with requested number of inodes while setting up superblock
By a process of bisection I found that I get the same error for all sizes >= 255T.
For 254T, I get:
/dev/vda1: Memory allocation failed while setting up superblock
I wasn't able to give the VM enough memory to make this succeed. I've only got 8G on this laptop. Should I need large amounts of memory to create these filesystems?
At 100T it doesn't run out of memory, but the man behind the curtain starts to show. The underlying qcow2 file grows to several gigs and I had to kill it. I need to play with the lazy init features of ext4.
Rich.
Bleah. Care to use xfs? ;)
Anyway, interesting; when I tried the larger sizes I got many other problems, but never the "requested number of inodes" error.
I just created a large sparse file on xfs, and pointed mke2fs at that.
But I'm using bleeding-edge git, ~= the latest WIP snapshot (which I haven't put into rawhide yet, because it doesn't actually build for me w/o a couple patches I'd like upstream to ACK first).
-Eric
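The large-sparse-file approach needs nothing special. Extending an empty file with truncate allocates no data blocks on filesystems that support sparse files, such as XFS, and the new range reads back as zeros. A minimal Python sketch; the function name, path and size in the usage comment are hypothetical:

```python
import os


def make_sparse_file(path: str, size_bytes: int) -> None:
    """Create path as a sparse file of size_bytes (any existing file is replaced)."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)  # sets the length without writing data blocks


# Hypothetical usage, before pointing mke2fs at the file:
# make_sparse_file("/mnt/xfs/test.img", 500 * 2**40)
```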
On 10/04/2011 12:33 AM, Eric Sandeen wrote:
On 10/3/11 5:13 PM, Richard W.M. Jones wrote:
On Mon, Oct 03, 2011 at 04:11:28PM -0500, Eric Sandeen wrote:
I wasn't able to give the VM enough memory to make this succeed. I've only got 8G on this laptop. Should I need large amounts of memory to create these filesystems?
At 100T it doesn't run out of memory, but the man behind the curtain starts to show. The underlying qcow2 file grows to several gigs and I had to kill it. I need to play with the lazy init features of ext4.
Rich.
Bleah. Care to use xfs? ;)
Why do we have to use XFS? Really? Does nobody really use large filesystems on Linux? Or does nobody really use RHEL? Why doesn't e2fsprogs get more upstream support? With 2-3TB disks, the 16TB fs limit is really funny... or not so funny :-(
On 10/3/11 5:53 PM, Farkas Levente wrote:
On 10/04/2011 12:33 AM, Eric Sandeen wrote:
On 10/3/11 5:13 PM, Richard W.M. Jones wrote:
On Mon, Oct 03, 2011 at 04:11:28PM -0500, Eric Sandeen wrote:
I wasn't able to give the VM enough memory to make this succeed. I've only got 8G on this laptop. Should I need large amounts of memory to create these filesystems?
At 100T it doesn't run out of memory, but the man behind the curtain starts to show. The underlying qcow2 file grows to several gigs and I had to kill it. I need to play with the lazy init features of ext4.
Rich.
Bleah. Care to use xfs? ;)
Why do we have to use XFS? Really? Does nobody really use large filesystems on Linux? Or does nobody really use RHEL? Why doesn't e2fsprogs get more upstream support? With 2-3TB disks, the 16TB fs limit is really funny... or not so funny :-(
XFS has been proven at this scale on Linux for a very long time, is all.
But, that comment was mostly tongue in cheek.
Large filesystem support for ext4 has languished upstream for a very long time, and few in the community seemed terribly interested to test it, either.
It's all fairly late in the game for ext4, but it's finally gaining some momentum, I hope. At least, the > 16T code is in the main git branch now, and the next release will pretty well have to have the restriction lifted. As Richard found, there are sure to be a few rough edges.
Luckily nobody is really talking about deploying ext4 (or XFS for that matter) at 1024 petabytes.
Testing in the 50T range is probably reasonable now, though pushing the boundaries (or maybe well shy of those boundaries) is worth doing.
-Eric
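Both the old 16T ceiling and the "1024 petabytes" figure follow from block-number widths, assuming the usual 4 KiB block size. Without the `64bit` feature, ext4 block numbers are 32 bits wide, and ext4 extents store a 48-bit physical block number, so the theoretical maximum is 2^48 blocks, i.e. 2^60 bytes (1 EiB). A short Python check of the arithmetic:

```python
BLOCK_SIZE = 4096  # bytes; the usual ext4 block size (assumed for this example)
TIB = 2**40
PIB = 2**50


def max_fs_bytes(block_number_bits: int, block_size: int = BLOCK_SIZE) -> int:
    """Largest addressable filesystem: number of distinct block numbers * block size."""
    return (2**block_number_bits) * block_size


print(max_fs_bytes(32) // TIB, "TiB")  # 32-bit block numbers
print(max_fs_bytes(48) // PIB, "PiB")  # 48-bit extent block numbers
```

This prints 16 TiB and 1024 PiB, matching the two limits discussed in the thread.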
100T seems to work for light use.
I can create the filesystem, mount it, write files and directories and read them back, and fsck doesn't report any problems.
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        99T  129M   94T   1% /sysroot
Linux (none) 3.1.0-0.rc6.git0.3.fc16.x86_64 #1 SMP Fri Sep 16 12:26:22 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
e2fsprogs-1.42-0.3.WIP.0925.fc17.x86_64
qcow2 is very usable as a method for testing at this size.
Rich.
On 10/04/2011 01:03 AM, Eric Sandeen wrote:
On 10/3/11 5:53 PM, Farkas Levente wrote:
On 10/04/2011 12:33 AM, Eric Sandeen wrote:
On 10/3/11 5:13 PM, Richard W.M. Jones wrote:
On Mon, Oct 03, 2011 at 04:11:28PM -0500, Eric Sandeen wrote:
I wasn't able to give the VM enough memory to make this succeed. I've only got 8G on this laptop. Should I need large amounts of memory to create these filesystems?
At 100T it doesn't run out of memory, but the man behind the curtain starts to show. The underlying qcow2 file grows to several gigs and I had to kill it. I need to play with the lazy init features of ext4.
Rich.
Bleah. Care to use xfs? ;)
Why do we have to use XFS? Really? Does nobody really use large filesystems on Linux? Or does nobody really use RHEL? Why doesn't e2fsprogs get more upstream support? With 2-3TB disks, the 16TB fs limit is really funny... or not so funny :-(
XFS has been proven at this scale on Linux for a very long time, is all.
Then why does RH NOT support it on 32-bit? There are still systems that have to run on 32-bit :-(
On Tue, Oct 4, 2011 at 3:09 AM, Farkas Levente lfarkas@lfarkas.org wrote:
On 10/04/2011 01:03 AM, Eric Sandeen wrote:
On 10/3/11 5:53 PM, Farkas Levente wrote:
On 10/04/2011 12:33 AM, Eric Sandeen wrote:
On 10/3/11 5:13 PM, Richard W.M. Jones wrote:
On Mon, Oct 03, 2011 at 04:11:28PM -0500, Eric Sandeen wrote:
I wasn't able to give the VM enough memory to make this succeed. I've only got 8G on this laptop. Should I need large amounts of memory to create these filesystems?
At 100T it doesn't run out of memory, but the man behind the curtain starts to show. The underlying qcow2 file grows to several gigs and I had to kill it. I need to play with the lazy init features of ext4.
Rich.
Bleah. Care to use xfs? ;)
Why do we have to use XFS? Really? Does nobody really use large filesystems on Linux? Or does nobody really use RHEL? Why doesn't e2fsprogs get more upstream support? With 2-3TB disks, the 16TB fs limit is really funny... or not so funny :-(
XFS has been proven at this scale on Linux for a very long time, is all.
Then why does RH NOT support it on 32-bit? There are still systems that have to run on 32-bit :-(
Then you've come to the right list! We build 32-bit kernels and they have XFS included in Fedora.
josh
On 10/4/11 2:09 AM, Farkas Levente wrote:
On 10/04/2011 01:03 AM, Eric Sandeen wrote:
On 10/3/11 5:53 PM, Farkas Levente wrote:
On 10/04/2011 12:33 AM, Eric Sandeen wrote:
On 10/3/11 5:13 PM, Richard W.M. Jones wrote:
On Mon, Oct 03, 2011 at 04:11:28PM -0500, Eric Sandeen wrote:
I wasn't able to give the VM enough memory to make this succeed. I've only got 8G on this laptop. Should I need large amounts of memory to create these filesystems?
At 100T it doesn't run out of memory, but the man behind the curtain starts to show. The underlying qcow2 file grows to several gigs and I had to kill it. I need to play with the lazy init features of ext4.
Rich.
Bleah. Care to use xfs? ;)
Why do we have to use XFS? Really? Does nobody really use large filesystems on Linux? Or does nobody really use RHEL? Why doesn't e2fsprogs get more upstream support? With 2-3TB disks, the 16TB fs limit is really funny... or not so funny :-(
XFS has been proven at this scale on Linux for a very long time, is all.
Then why does RH NOT support it on 32-bit? There are still systems that have to run on 32-bit :-(
32-bit machines have a 32-bit index into the page cache; on x86, that limits us to 16T for XFS, as well. So 32-bit is really not that interesting for large filesystem use.
If you need really scalable filesystems, I'd suggest a 64-bit machine.
-Eric
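The 16T figure for 32-bit machines is the page-cache index arithmetic. A 32-bit page index can address 2^32 pages, and with the 4 KiB x86 page size that is 2^44 bytes, whatever the on-disk format allows. Spelled out in Python:

```python
PAGE_SIZE = 4096  # bytes per page on x86
INDEX_BITS = 32   # width of the page-cache index on a 32-bit kernel

max_cached_bytes = (2**INDEX_BITS) * PAGE_SIZE
print(max_cached_bytes == 2**44, max_cached_bytes // 2**40, "TiB")
```

This prints `True 16 TiB`.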
On 10/04/2011 05:30 PM, Eric Sandeen wrote:
XFS has been proven at this scale on Linux for a very long time, is all.
Then why does RH NOT support it on 32-bit? There are still systems that have to run on 32-bit :-(
32-bit machines have a 32-bit index into the page cache; on x86, that limits us to 16T for XFS, as well. So 32-bit is really not that interesting for large filesystem use.
If you need really scalable filesystems, I'd suggest a 64-bit machine.
I mean, if you support XFS and think it's better than ext4, why not support it on RHEL 32-bit?
On Tue, Oct 04, 2011 at 11:38:18PM +0200, Farkas Levente wrote:
On 10/04/2011 05:30 PM, Eric Sandeen wrote:
XFS has been proven at this scale on Linux for a very long time, is all.
Then why does RH NOT support it on 32-bit? There are still systems that have to run on 32-bit :-(
32-bit machines have a 32-bit index into the page cache; on x86, that limits us to 16T for XFS, as well. So 32-bit is really not that interesting for large filesystem use.
If you need really scalable filesystems, I'd suggest a 64-bit machine.
I mean, if you support XFS and think it's better than ext4, why not support it on RHEL 32-bit?
This is a question you should direct through Red Hat's support channels.
Rich.
On 10/05/2011 12:47 AM, Richard W.M. Jones wrote:
On Tue, Oct 04, 2011 at 11:38:18PM +0200, Farkas Levente wrote:
On 10/04/2011 05:30 PM, Eric Sandeen wrote:
XFS has been proven at this scale on Linux for a very long time, is all.
Then why does RH NOT support it on 32-bit? There are still systems that have to run on 32-bit :-(
32-bit machines have a 32-bit index into the page cache; on x86, that limits us to 16T for XFS, as well. So 32-bit is really not that interesting for large filesystem use.
If you need really scalable filesystems, I'd suggest a 64-bit machine.
I mean, if you support XFS and think it's better than ext4, why not support it on RHEL 32-bit?
This is a question you should direct through Red Hat's support channels.
I'd just like to ask Eric's opinion (he seems to be the fs person at RH :-)
On 10/05/2011 04:01 AM, Farkas Levente wrote:
On 10/05/2011 12:47 AM, Richard W.M. Jones wrote:
On Tue, Oct 04, 2011 at 11:38:18PM +0200, Farkas Levente wrote:
On 10/04/2011 05:30 PM, Eric Sandeen wrote:
XFS has been proven at this scale on Linux for a very long time, is all.
Then why does RH NOT support it on 32-bit? There are still systems that have to run on 32-bit :-(
32-bit machines have a 32-bit index into the page cache; on x86, that limits us to 16T for XFS, as well. So 32-bit is really not that interesting for large filesystem use.
If you need really scalable filesystems, I'd suggest a 64-bit machine.
I mean, if you support XFS and think it's better than ext4, why not support it on RHEL 32-bit?
This is a question you should direct through Red Hat's support channels.
I'd just like to ask Eric's opinion (he seems to be the fs person at RH :-)
Eric is our technical lead for file systems and works in the broader file system team.
What Red Hat supports is determined by lots of things - some of them technical, some of them practical.
Practically, we try to focus our testing and resources on the most common platforms for enterprise users. 32 bit is not that common and certainly not a reasonable choice for large file systems. Most new enterprise class servers are 64 bit these days and have been for years.
I can say that as Eric's manager if that helps :)
Just to be clear, this is a *Fedora* list, not a Red Hat or RHEL list, so what considerations we as a community make about what is supported in fedora are different. In Fedora, we do worry more about supporting legacy platforms so 32 bit support will go on longer. We still do have concerns about getting sufficient testing/qa resources to validate each platform.
thanks!
Ric
On 10/04/2011 01:03 AM, Eric Sandeen wrote:
Large filesystem support for ext4 has languished upstream for a very long time, and few in the community seemed terribly interested to test it, either.
Why? That's what I simply do not understand!?...
On 10/04/2011 03:12 AM, Farkas Levente wrote:
On 10/04/2011 01:03 AM, Eric Sandeen wrote:
Large filesystem support for ext4 has languished upstream for a very long time, and few in the community seemed terribly interested to test it, either.
Why? That's what I simply do not understand!?...
Very few users test anything larger than a few TBs in the Fedora/developer world. I routinely do a poll when I do talks with the audience about maximum file system size and almost never see a large number of people testing over 16 TB (what ext4/ext3 support historically). Most big file system users are in the national labs, bio sciences, etc...
There are also other ways to handle big data these days that pool together lots of little file systems (gluster, ceph, lustre, hdfs, etc).
It just takes time and testing to get better confidence, we will get to stability on ext4 soon enough at larger sizes.
Ric
On Mon, Oct 03, 2011 at 05:33:47PM -0500, Eric Sandeen wrote:
On 10/3/11 5:13 PM, Richard W.M. Jones wrote:
At 100T it doesn't run out of memory, but the man behind the curtain starts to show. The underlying qcow2 file grows to several gigs and I had to kill it. I need to play with the lazy init features of ext4.
Actually this one isn't too bad once I let it run to the finish. The qcow2 file ends up just 7.8G. I'll try mounting it etc later.
Bleah. Care to use xfs? ;)
Just playing with ext4 at the limits :-)
Rich.
On 10/03/2011 06:33 PM, Eric Sandeen wrote:
On 10/3/11 5:13 PM, Richard W.M. Jones wrote:
On Mon, Oct 03, 2011 at 04:11:28PM -0500, Eric Sandeen wrote:
testing something more real-world (20T ... 500T?) might still be interesting.
Here's my test script:
qemu-img create -f qcow2 test1.img 500T && \
guestfish -a test1.img \
  memsize 4096 : run : \
  part-disk /dev/vda gpt : mkfs ext4 /dev/vda1
...
At 100T it doesn't run out of memory, but the man behind the curtain starts to show. The underlying qcow2 file grows to several gigs and I had to kill it. I need to play with the lazy init features of ext4.
Rich.
Bleah. Care to use xfs? ;)
Why not btrfs? I am testing a 24TB physical server and ext4 creation took forever while btrfs was almost instant. I understand it's still experimental (I hear storing virtual disk images on btrfs still has unresolved performance problems), but VM disk storage should be fine. FWIW I have been using btrfs as my /home at home for some time now; so far so good.
On 10/04/2011 07:19 PM, Przemek Klosowski wrote:
On 10/03/2011 06:33 PM, Eric Sandeen wrote:
On 10/3/11 5:13 PM, Richard W.M. Jones wrote:
On Mon, Oct 03, 2011 at 04:11:28PM -0500, Eric Sandeen wrote:
testing something more real-world (20T ... 500T?) might still be interesting.
Here's my test script:
qemu-img create -f qcow2 test1.img 500T && \
guestfish -a test1.img \
  memsize 4096 : run : \
  part-disk /dev/vda gpt : mkfs ext4 /dev/vda1
...
At 100T it doesn't run out of memory, but the man behind the curtain starts to show. The underlying qcow2 file grows to several gigs and I had to kill it. I need to play with the lazy init features of ext4.
Rich.
Bleah. Care to use xfs? ;)
Why not btrfs? I am testing a 24TB physical server and ext4 creation took forever while btrfs was almost instant. I understand it's still experimental (I hear storing virtual disk images on btrfs still has unresolved performance problems), but VM disk storage should be fine. FWIW I have been using btrfs as my /home at home for some time now; so far so good.
Creating an XFS file system is also a matter of seconds (both xfs and btrfs do dynamic inode allocation).
Note that ext4 has a new feature that allows inodes to be initialized in the background, so you will see much quicker mkfs.ext4 times as well :)
ric
On 10/4/11 6:53 PM, Ric Wheeler wrote:
On 10/04/2011 07:19 PM, Przemek Klosowski wrote:
On 10/03/2011 06:33 PM, Eric Sandeen wrote:
On 10/3/11 5:13 PM, Richard W.M. Jones wrote:
On Mon, Oct 03, 2011 at 04:11:28PM -0500, Eric Sandeen wrote:
testing something more real-world (20T ... 500T?) might still be interesting.
Here's my test script:
qemu-img create -f qcow2 test1.img 500T && \
guestfish -a test1.img \
  memsize 4096 : run : \
  part-disk /dev/vda gpt : mkfs ext4 /dev/vda1
...
At 100T it doesn't run out of memory, but the man behind the curtain starts to show. The underlying qcow2 file grows to several gigs and I had to kill it. I need to play with the lazy init features of ext4.
Rich.
Bleah. Care to use xfs? ;)
Why not btrfs? I am testing a 24TB physical server and ext4 creation took forever while btrfs was almost instant. I understand it's still experimental (I hear storing virtual disk images on btrfs still has unresolved performance problems), but VM disk storage should be fine. FWIW I have been using btrfs as my /home at home for some time now; so far so good.
Creating an XFS file system is also a matter of seconds (both xfs and btrfs do dynamic inode allocation).
Note that ext4 has a new feature that allows inodes to be initialized in the background, so you will see much quicker mkfs.ext4 times as well :)
right; for large ext4 fs use (or testing), try
# mkfs.ext4 -E lazy_itable_init=1 /dev/blah
this will cause it to skip inode table initialization, and speed up mkfs a LOT. It'll also keep sparse test images smaller.
IMHO this should probably be made default above a certain size.
The tradeoff is that inode table initialization happens in kernelspace, post-mount - with efforts made to do it in the background, and not impact other I/O too much.
-Eric
ric
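Why skipping inode-table initialization matters so much at this scale: ext4 preallocates its inode tables at mkfs time (unlike XFS and btrfs, which allocate inodes dynamically), so a non-lazy mkfs has to zero roughly (size / inode_ratio) × inode_size bytes. A back-of-the-envelope Python sketch for a 24 TB filesystem like the one mentioned above, assuming inode_ratio 16384 and 256-byte inodes (common mke2fs.conf defaults, but check your own) and ignoring per-group rounding:

```python
def inode_table_bytes(fs_bytes: int, inode_ratio: int = 16384, inode_size: int = 256) -> int:
    """Approximate bytes of inode tables mkfs zeroes when lazy_itable_init is off."""
    inodes = fs_bytes // inode_ratio
    return inodes * inode_size


size = 24 * 10**12  # 24 TB in decimal units, as disk vendors quote capacity
print(f"{inode_table_bytes(size) / 10**9:.0f} GB of inode tables")
```

Under these assumptions that is about 375 GB of zeroes written before mkfs can finish, which lazy_itable_init defers to the kernel after mount.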
On 10/5/11 9:58 AM, Eric Sandeen wrote:
On 10/4/11 6:53 PM, Ric Wheeler wrote:
...
Note that ext4 has a new feature that allows inodes to be initialized in the background, so you will see much quicker mkfs.ext4 times as well :)
right; for large ext4 fs use (or testing), try
# mkfs.ext4 -E lazy_itable_init=1 /dev/blah
this will cause it to skip inode table initialization, and speed up mkfs a LOT. It'll also keep sparse test images smaller.
IMHO this should probably be made default above a certain size.
The tradeoff is that inode table initialization happens in kernelspace, post-mount - with efforts made to do it in the background, and not impact other I/O too much.
Sorry, Lukas reminds me that this should already be the default mode, with a new enough kernel and new enough e2fsprogs. Rawhide should meet those criteria.
-Eric
ric
On Wed, Oct 05, 2011 at 10:42:37AM -0500, Eric Sandeen wrote:
On 10/5/11 9:58 AM, Eric Sandeen wrote:
On 10/4/11 6:53 PM, Ric Wheeler wrote:
...
Note that ext4 has a new feature that allows inodes to be initialized in the background, so you will see much quicker mkfs.ext4 times as well :)
right; for large ext4 fs use (or testing), try
# mkfs.ext4 -E lazy_itable_init=1 /dev/blah
this will cause it to skip inode table initialization, and speed up mkfs a LOT. It'll also keep sparse test images smaller.
IMHO this should probably be made default above a certain size.
The tradeoff is that inode table initialization happens in kernelspace, post-mount - with efforts made to do it in the background, and not impact other I/O too much.
Sorry, Lukas reminds me that this should already be the default mode, with a new enough kernel and new enough e2fsprogs. Rawhide should meet those criteria.
lazy_itable_init is always on by default now?
Rich.
On 10/05/2011 05:42 PM, Eric Sandeen wrote:
right; for large ext4 fs use (or testing), try
# mkfs.ext4 -E lazy_itable_init=1 /dev/blah
this will cause it to skip inode table initialization, and speed up mkfs a LOT. It'll also keep sparse test images smaller.
IMHO this should probably be made default above a certain size.
The tradeoff is that inode table initialization happens in kernelspace, post-mount - with efforts made to do it in the background, and not impact other I/O too much.
Sorry, Lukas reminds me that this should already be the default mode, with a new enough kernel and new enough e2fsprogs. Rawhide should meet those criteria.
Yes, I know it's not a RHEL list, but does it still work on RHEL 6?
On Wed, Oct 05, 2011 at 09:58:59AM -0500, Eric Sandeen wrote:
right; for large ext4 fs use (or testing), try
# mkfs.ext4 -E lazy_itable_init=1 /dev/blah
this will cause it to skip inode table initialization, and speed up mkfs a LOT. It'll also keep sparse test images smaller.
IMHO this should probably be made default above a certain size.
You almost preempted my question: Could I use this for every ext4 filesystem I make? Honestly from a virt / libguestfs p.o.v. it sounds like something we should always do.
The tradeoff is that inode table initialization happens in kernelspace, post-mount - with efforts made to do it in the background, and not impact other I/O too much.
Is there a circumstance where this is bad? I'm thinking perhaps a case where you create a filesystem and immediately try to populate it with lots of files.
Rich.
On 10/5/11 12:54 PM, Richard W.M. Jones wrote:
On Wed, Oct 05, 2011 at 09:58:59AM -0500, Eric Sandeen wrote:
right; for large ext4 fs use (or testing), try
# mkfs.ext4 -E lazy_itable_init=1 /dev/blah
this will cause it to skip inode table initialization, and speed up mkfs a LOT. It'll also keep sparse test images smaller.
IMHO this should probably be made default above a certain size.
You almost preempted my question: Could I use this for every ext4 filesystem I make? Honestly from a virt / libguestfs p.o.v. it sounds like something we should always do.
Yes, and sorry for the earlier confusion - it should be on by default for newer kernels + e2fsprogs.
The tradeoff is that inode table initialization happens in kernelspace, post-mount - with efforts made to do it in the background, and not impact other I/O too much.
Is there a circumstance where this is bad? I'm thinking perhaps a case where you create a filesystem and immediately try to populate it with lots of files.
I do have some concerns about that. I think it will take some careful benchmarking to know for sure whether it is an issue.
Commit bfff68738f1cb5c93dab1114634cea02aae9e7ba has a good summary of how it all works, and what some impact on early fs operations may be:
We do not disturb regular inode allocations in any way, it just do not care whether the inode table is, or is not zeroed. But when zeroing, we have to skip used inodes, obviously. Also we should prevent new inode allocations from the group, while zeroing is on the way. For that we take write alloc_sem lock in ext4_init_inode_table() and read alloc_sem in the ext4_claim_inode, so when we are unlucky and allocator hits the group which is currently being zeroed, it just has to wait.
-Eric
Rich.
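A toy Python model of the locking the commit message describes. It is not the kernel code: `RWLock` and `BlockGroup` are invented for illustration, `alloc_sem` only mirrors the name used in the commit, and the real per-group bookkeeping is much richer. Python's standard library has no readers-writer lock, hence the small one here. The zeroing pass takes the lock exclusively and skips in-use inodes; allocators take it shared, so they only wait when they hit a group that is being zeroed.

```python
import threading


class RWLock:
    """Minimal readers-writer lock: many readers or one writer (no writer preference)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self) -> None:
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class BlockGroup:
    """Toy block group: an inode table plus an in-use map, guarded by alloc_sem."""

    def __init__(self, n_inodes: int) -> None:
        self.alloc_sem = RWLock()
        self.bitmap_lock = threading.Lock()   # serializes allocators among themselves
        self.table = [b"garbage"] * n_inodes  # uninitialized on-disk contents
        self.used = [False] * n_inodes

    def claim_inode(self, ino: int, contents: bytes) -> None:
        # Allocator: shared hold, so it blocks only while zeroing is in progress.
        self.alloc_sem.acquire_read()
        try:
            with self.bitmap_lock:
                self.used[ino] = True
                self.table[ino] = contents
        finally:
            self.alloc_sem.release_read()

    def init_inode_table(self) -> None:
        # Background zeroing: exclusive hold, and never touch inodes already in use.
        self.alloc_sem.acquire_write()
        try:
            for ino, in_use in enumerate(self.used):
                if not in_use:
                    self.table[ino] = b"\0" * 8
        finally:
            self.alloc_sem.release_write()


group = BlockGroup(4)
group.claim_inode(1, b"file-a")  # allocated before the lazy zeroing ran
group.init_inode_table()
print(group.table)
```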
On 10/05/2011 01:19 AM, Przemek Klosowski wrote:
On 10/03/2011 06:33 PM, Eric Sandeen wrote:
On 10/3/11 5:13 PM, Richard W.M. Jones wrote:
On Mon, Oct 03, 2011 at 04:11:28PM -0500, Eric Sandeen wrote:
testing something more real-world (20T ... 500T?) might still be interesting.
Here's my test script:
qemu-img create -f qcow2 test1.img 500T && \
guestfish -a test1.img \
  memsize 4096 : run : \
  part-disk /dev/vda gpt : mkfs ext4 /dev/vda1
...
At 100T it doesn't run out of memory, but the man behind the curtain starts to show. The underlying qcow2 file grows to several gigs and I had to kill it. I need to play with the lazy init features of ext4.
Rich.
Bleah. Care to use xfs? ;)
Why not btrfs? I am testing a 24TB physical server and ext4 creation took forever while btrfs was almost instant. I understand it's still experimental (I hear storing virtual disk images on btrfs still has unresolved performance problems), but VM disk storage should be fine. FWIW I have been using btrfs as my /home at home for some time now; so far so good.
IMHO, if the btrfs maintainer said it's not ready for prime time, then it's not ready for production use.
devel@lists.stg.fedoraproject.org