Hi,
Tools such as ls or stat report the size of a directory. Of course it is not the content size.
stat -c %s /home/sergio/.config
6550
What does 6550 mean in btrfs context?
Thanks in advance!
I know next to nothing about btrfs, but on the ext family of filesystems, directories are technically just special files. Those files contain a simple list of directory entries (i.e. filename <-> inode number pairs). So I guess that, in the case of ext, stat reports the size of this list.
On Thu, Feb 25, 2021 at 2:21 PM Sergio Belkin <sebelk@gmail.com> wrote:
> Hi, Tools such as ls or stat report the size of a directory. Of course it is not the content size.
> stat -c %s /home/sergio/.config
> 6550
> What does 6550 mean in btrfs context?
stat -c %s <dir> reports the st_size of the directory inode, and the value is 2x the characters in all filenames in that directory. I'm not sure if the names of directories are included in the count, or just filenames.
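The "2x" figure can be checked with a small helper. This is only a sketch: twice_name_bytes is a hypothetical name, it assumes GNU find (for -printf), it counts bytes rather than characters, and it counts subdirectory names too, which the message above leaves open. Applied to the btrfs listings later in this thread, the same arithmetic gives 18 for the single file 123456789 and 61804 once longfilename1 through longfilename2000 are added, which matches the sizes reported there.

```shell
# Hypothetical helper: print twice the total byte length of the entry
# names directly inside directory $1 ("." and ".." are not counted).
# Assumes GNU find for -printf.
twice_name_bytes() {
    n=$(find "$1" -mindepth 1 -maxdepth 1 -printf '%f' | wc -c) || return 1
    echo $((n * 2))
}

# Compare with what stat reports for the same directory:
#   stat -c %s dir
#   twice_name_bytes dir
```

On filesystems other than btrfs the two numbers are not expected to agree.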
> Tools such as ls or stat report the size of a directory. Of course it is not the content size.
> stat -c %s /home/sergio/.config
> 6550
> What does 6550 mean in btrfs context?
Regardless of filesystem type, the size of a directory is the sum of the sizes of the struct linux_dirent (or linux_dirent64) for the filenames of the contained files. See the manual page "man 2 getdents".
On Thu, Feb 25, 2021 at 21:58, John Reiser (jreiser@bitwagon.com) wrote:
>> Tools such as ls or stat report the size of a directory. Of course it is not the content size.
>> stat -c %s /home/sergio/.config
>> 6550
>> What does 6550 mean in btrfs context?
> Regardless of filesystem type, the size of a directory is the sum of the sizes of the struct linux_dirent (or linux_dirent64) for the filenames of the contained files. See the manual page "man 2 getdents".
> _______________________________________________
> devel mailing list -- devel@lists.fedoraproject.org
> To unsubscribe send an email to devel-leave@lists.fedoraproject.org
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
> Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
Thanks, everyone. I thought that it had a special meaning for btrfs.
On 2/25/21 6:56 PM, John Reiser wrote:
>> Tools such as ls or stat report the size of a directory. Of course it is not the content size.
>> stat -c %s /home/sergio/.config
>> 6550
>> What does 6550 mean in btrfs context?
> Regardless of filesystem type, the size of a directory is the sum of the sizes of the struct linux_dirent (or linux_dirent64) for the filenames of the contained files. See the manual page "man 2 getdents".
That's not correct; dirents are in-memory structures, unrelated to what the [fl]stat(2) interface used by ls returns for a directory.
The size returned by stat(2) (aka ls) of a directory inode is filesystem implementation dependent, and AFAIK has no well-defined meaning. stat(2) refers to st_size only for files and symlinks, not for directories. Same w/ POSIX:
    off_t st_size
        For regular files, the file size in bytes.
        For symbolic links, the length in bytes of the pathname contained in the symbolic link.
So, as one example on ext4 - directories never shrink.
# mkdir dir
# ls -ld dir
drwxr-xr-x. 2 root root 4096 Feb 26 00:48 dir
# touch dir/123456789
# ls -ld dir
drwxr-xr-x. 2 root root 4096 Feb 26 01:00 dir
# for I in `seq 1 2000`; do touch dir/longfilename$I; done
# ls -ld dir
drwxr-xr-x. 2 root root 69632 Feb 26 00:49 dir
# rm -f dir/*
# ls -ld dir
drwxr-xr-x. 2 root root 69632 Feb 26 00:49 dir
69632 byte directory with no files in it, wheeee.
xfs is different:
# mkdir dir
# ls -ld dir
drwxr-xr-x. 2 root root 6 Feb 26 00:58 dir
# touch dir/123456789
# ls -ld dir
drwxr-xr-x. 2 root root 23 Feb 26 00:59 dir
# for I in `seq 1 2000`; do touch dir/longfilename$I; done
# ls -ld dir
drwxr-xr-x. 2 root root 65536 Feb 26 00:59 dir
# rm -f dir/*
# ls -ld dir
drwxr-xr-x. 2 root root 6 Feb 26 00:59 dir
btrfs is still different:
# mkdir dir
# ls -ld dir
drwxr-xr-x. 1 root root 0 Feb 26 01:05 dir
# touch dir/123456789
# ls -ld dir
drwxr-xr-x. 1 root root 18 Feb 26 01:06 dir
# for I in `seq 1 2000`; do touch dir/longfilename$I; done
# ls -ld dir
drwxr-xr-x. 1 root root 61804 Feb 26 01:06 dir
# rm -f dir/*
# ls -ld dir
drwxr-xr-x. 1 root root 0 Feb 26 01:06 dir
In short, "size" of a dir doesn't tell you much.
-Eric
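The experiment above can be repeated without root on whatever filesystem backs a given path. What follows is a minimal sketch, assuming GNU stat and mktemp; dir_size_experiment is a made-up name, and the numbers it prints depend entirely on the filesystem.

```shell
# Hypothetical helper: create a scratch directory under $1 (default: the
# current directory) and print its st_size at each stage of the experiment.
# Assumes GNU stat (-c %s) and mktemp.
dir_size_experiment() {
    base=${1:-.}
    d=$(mktemp -d "$base/dirsize.XXXXXX") || return 1
    echo "empty:      $(stat -c %s "$d")"
    touch "$d/123456789"
    echo "one file:   $(stat -c %s "$d")"
    i=1
    while [ "$i" -le 2000 ]; do
        : > "$d/longfilename$i"      # create an empty file
        i=$((i + 1))
    done
    echo "2001 files: $(stat -c %s "$d")"
    rm -f "$d"/*
    echo "emptied:    $(stat -c %s "$d")"
    rmdir "$d"
}
```

Running it as dir_size_experiment /some/mount/point for paths on different filesystems should show the kind of differences listed above.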
* Eric Sandeen:
> So, as one example on ext4 - directories never shrink.
> # rm -f dir/*
> # ls -ld dir
> drwxr-xr-x. 2 root root 69632 Feb 26 00:49 dir
> 69632 byte directory with no files in it, wheeee.
e2fsck -D will shrink such directories in an offline operation, I think.
Thanks, Florian
John Reiser <jreiser@bitwagon.com> wrote:
> See the manual page "man 2 getdents".
Um, which bit? I don't see anything obvious to that end.
On AFS, directories are handled as files that the filesystem downloads and parses locally. The size returned in st_size is the size of the directory content blob.
David
On 3/16/21, David Howells wrote:
> John Reiser <jreiser@bitwagon.com> wrote:
>> See the manual page "man 2 getdents".
> Um, which bit? I don't see anything obvious to that end.
On that manual page:
=====
The system call getdents() reads several linux_dirent structures from the directory referred to by the open file descriptor fd into the buffer pointed to by dirp.
[snip]
On success, the number of bytes read is returned.
=====
So the return value is related to the size of the directory; the sum of the values returned before End-Of-File should be quite close to the .st_size of the directory. If a program is walking through the directory, reading all the entries via getdents64(), then .st_size of the directory is the only thing known in advance about the total size. (Of course anything involving a directory can depend on concurrent create/delete/rename of files within the directory.)
> On AFS directories are handled as files that the filesystem downloads and parses locally. The size returned in st_size is the size of the directory content blob.
Hi,
On 16/03/2021 16:51, John Reiser wrote:
> On 3/16/21, David Howells wrote:
>> John Reiser <jreiser@bitwagon.com> wrote:
>>> See the manual page "man 2 getdents".
>> Um, which bit? I don't see anything obvious to that end.
> On that manual page:
> The system call getdents() reads several linux_dirent structures from the directory referred to by the open file descriptor fd into the buffer pointed to by dirp.
> [snip]
> On success, the number of bytes read is returned.
> =====
> So the return value is related to the size of the directory; the sum of the values returned before End-Of-File should be quite close to the .st_size of the directory. If a program is walking through the directory, reading all the entries via getdents64(), then .st_size of the directory is the only thing known in advance about the total size. (Of course anything involving a directory can depend on concurrent create/delete/rename of files within the directory.)
If you are looking for a hint on how large a buffer to allocate, then st_blksize is generally used as a hint for directory reads, or otherwise a fixed-size buffer, or a page or two. The st_size field is meaningless for directories, and you'll get all kinds of odd results depending on the filesystem that is in use, so it is best avoided.
Steve.
* Steven Whitehouse:
> If you are looking for a hint on how large a buffer to allocate, then st_blksize is generally used as a hint for directory reads, or otherwise, a fixed size buffer or a page or two. The st_size field is meaningless for directories and you'll get all kinds of odd results depending on the filesystem that is in use, so best avoided,
One caveat is that st_blksize can be unreasonably large, particularly with NFS and large configured read buffers. So maybe cap that value at 32 KiB, but then why not use 32 KiB unconditionally?
Another issue is that too small a buffer might make some directories unreadable, given that Linux does not actually enforce NAME_MAX. Some file systems support NAME_MAX Unicode codepoints in a name, which is encoded to roughly 0.75 KiB in UTF-8.
Thanks, Florian
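The buffer-size advice in this subthread could be sketched as below. dirbuf_hint is a hypothetical helper; it assumes GNU stat, whose %o format prints the I/O transfer size hint (st_blksize). The 32 KiB cap is the value floated above. The 4 KiB floor is an added assumption, chosen only so that a single long name (255 codepoints at 3 bytes each is 765 bytes, the roughly 0.75 KiB mentioned above) fits with room to spare.

```shell
# Hypothetical helper: suggest a buffer size in bytes for reading
# directory $1, based on st_blksize clamped to [4096, 32768].
# Assumes GNU stat, where %o is the I/O transfer size hint.
dirbuf_hint() {
    floor=4096     # assumption: never go below one typical page
    cap=32768      # 32 KiB cap, as suggested in the thread
    hint=$(stat -c %o "$1") || return 1
    if [ "$hint" -lt "$floor" ]; then
        hint=$floor
    fi
    if [ "$hint" -gt "$cap" ]; then
        hint=$cap
    fi
    echo "$hint"
}
```

A call such as dirbuf_hint /home/sergio/.config prints a single number; whether the clamping is worth it over a fixed 32 KiB buffer is exactly the question raised above.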
On 3/16/21 11:51 AM, John Reiser wrote:
> On 3/16/21, David Howells wrote:
>> John Reiser <jreiser@bitwagon.com> wrote:
>>> See the manual page "man 2 getdents".
>> Um, which bit? I don't see anything obvious to that end.
> On that manual page:
> The system call getdents() reads several linux_dirent structures from the directory referred to by the open file descriptor fd into the buffer pointed to by dirp.
> [snip]
> On success, the number of bytes read is returned.
But the original question was about the st_size returned by stat, which does not call getdents.
Two different numbers, which mean 2 different things.
> =====
> So the return value is related to the size of the directory; the sum of the values returned before End-Of-File should be quite close to the .st_size of the directory.
Again, that's not at all correct. Counter-example on ext4:
# stat -c %s dir
2547712
# ls -a1 dir
.
..
file
# strace -v -egetdents ls dir
getdents(3, [{d_ino=524289, d_off=4294967296, d_reclen=24, d_name=".", d_type=DT_DIR}, {d_ino=2, d_off=3358761300848251151, d_reclen=24, d_name="..", d_type=DT_DIR}, {d_ino=534290, d_off=9223372036854775807, d_reclen=24, d_name="file", d_type=DT_REG}], 32768) = 72
72 is not anywhere close to 2547712
> If a program is walking through the directory, reading all the entries via getdents64(), then .st_size of the directory is the only thing known in advance about the total size.
But it tells you nothing about how much is likely to be returned by getdents.
You should not use st_size to infer anything about the amount of data which will be returned by getdents. POSIX does not define the meaning of st_size for directories, and different filesystems can do wildly different things.
As Steve mentioned, st_blksize is your best hint for this purpose.
-Eric
John Reiser <jreiser@bitwagon.com> wrote:
> On 3/16/21, David Howells wrote:
>> John Reiser <jreiser@bitwagon.com> wrote:
>>> See the manual page "man 2 getdents".
>> Um, which bit? I don't see anything obvious to that end.
> On that manual page:
> The system call getdents() reads several linux_dirent structures from the directory referred to by the open file descriptor fd into the buffer pointed to by dirp.
> [snip]
> On success, the number of bytes read is returned.
> =====
It doesn't say anything about the size of the directory there. "Number of bytes read is returned" should be taken as how much of the user buffer was filled - information you need to know to be able to parse it.
Further, there's getdents() and there's getdents64(), and their structs are of different sizes. By your logic, st_size would have to be the number of linux_dirent structs for use with the former and the number of linux_dirent64 structs for use with the latter... And then there's readdir() as well, with its old_linux_dirent struct.
So, no, it cannot work like that.
David
On Thu, Feb 25, 2021 at 06:20:34PM -0300, Sergio Belkin wrote:
> Hi, Tools such as ls or stat report the size of a directory. Of course it is not the content size.
It depends on the filesystem. For Cephfs, directory size _is_ the content size:
% ls -lh
drwxrwxr-x.  3 zdzichu  zdzichu   90G 11-06 16:37 'Collection 1'
drwxr-xr-x. 17 root     root      11M 09-23 15:45 postfix
drwxr-xr-x.  2 cherokee cherokee  15M 11-24 00:02 var_log_cherokee
I do not know what btrfs reports as directory size.
devel@lists.stg.fedoraproject.org