I put a CF card in my linux-2.6.7 laptop and mounted it. Then I wrote some files with Japanese filenames onto it, using emacs and writing the names in UTF-8. Then I sync'd and unmounted the card.
Next, I put that card into a Japanese Windows 2000 system and was able to read those filenames. I think what happened was that the linux-2.6.7 kernel automatically converted the characters to the proper codepage (cp932, I think) before writing those files to the card.
Then I used the Japanese Windows 2000 system to create some new directories and files with Japanese names. Then I ejected the CF and brought it back to my linux-2.6.7 system, and I was able to read those new filenames with no problems.
Finally, I tried to mount the CF card on my linux-2.4.18 embedded system. Now I have problems: the filenames written by linux-2.6.7 were visible, but the filenames written by Japanese Windows 2000 were not. Why is that?
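To make the guess about conversion concrete, here is a small Python sketch of how one filename looks in each encoding that might be involved. It assumes the usual VFAT layout, in which long names are stored as UTF-16LE and only the 8.3 short alias uses the OEM codepage; the sample name and the function name `encodings` are illustrative, not taken from the card.

```python
# Sketch: byte forms of one Japanese filename in the encodings discussed.
# The sample name is illustrative only.
SAMPLE = "日本語"

def encodings(name):
    """Return the raw bytes of `name` in UTF-8, cp932 and UTF-16LE."""
    return {
        "utf8": name.encode("utf-8"),        # what emacs wrote on the Linux side
        "cp932": name.encode("cp932"),       # Japanese Windows OEM codepage
        "utf16le": name.encode("utf-16-le"), # assumed on-disk form of VFAT long names
    }

for label, raw in encodings(SAMPLE).items():
    print(label, len(raw), raw.hex())
```

If the assumption about UTF-16LE long names holds, the codepage would matter mainly for the short 8.3 aliases rather than for the long names themselves.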
Thanks, Dave
Hi David,
Can you try to mount it with codepage= parameter and see if it helps?
Leon
On Fri, 2004-09-03 at 11:25 -0700, David Wuertele wrote:
> [...]
-- Fedora-i18n-list mailing list Fedora-i18n-list@redhat.com http://www.redhat.com/mailman/listinfo/fedora-i18n-list
Leon> Can you try to mount it with codepage= parameter and see if it
Leon> helps?
I tried mounting with various codepages (cp932, utf8, cp437); all behaved identically.
In fact, I have dumped the FAT partition and analyzed it. I can't find a pattern of difference between the filenames that are readable by Linux and the filenames that aren't. I am appending those analyses: first the analysis of the FAT table for the readable filenames, and then the analysis of the FAT table for the unreadable filenames.
Dave
Hi,
This is a good comparison. A better one might be to create the same filename, such as 日本語, on Linux, dump the FAT table, and delete the file; then create the same filename on Windows, dump the FAT table, and compare the two dumps.
If that still doesn't show a pattern it may well be a bug.
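One way to carry out such a comparison is to decode the raw directory entries from each dump and diff the results. The following is only a sketch: it assumes the widely documented 32-byte FAT directory entry layout (attribute byte 0x0F marking a long-name entry, name fragments at offsets 1-10, 14-25 and 28-31 in UTF-16LE), and the function name `parse_dir` and the default codepage are my own choices, not anything from the dumps in this thread.

```python
LFN_ATTR = 0x0F  # attribute value that marks a VFAT long-name entry

def parse_dir(raw, codepage="cp932"):
    """Yield (long_name_or_None, short_name) for each file in raw directory bytes."""
    parts = []  # long-name fragments, in on-disk order (last fragment first)
    for off in range(0, len(raw) - 31, 32):
        ent = raw[off:off + 32]
        if ent[0] == 0x00:       # no more entries in this directory
            break
        if ent[0] == 0xE5:       # deleted entry: discard any pending fragments
            parts = []
            continue
        if ent[11] == LFN_ATTR:  # long-name fragment, 13 UTF-16 code units
            chars = ent[1:11] + ent[14:26] + ent[28:32]
            parts.append(chars.decode("utf-16-le", errors="replace"))
            continue
        # Ordinary 8.3 entry: the name bytes are in the OEM codepage.
        base = ent[0:8].decode(codepage, errors="replace").rstrip()
        ext = ent[8:11].decode(codepage, errors="replace").rstrip()
        short = base + ("." + ext if ext else "")
        # Fragments are stored in reverse; the name ends at the first NUL.
        long_name = "".join(reversed(parts)).split("\x00")[0] if parts else None
        parts = []
        yield long_name, short
```

Running this over the directory region of the Linux-written dump and the Windows-written dump should show whether the long-name entries, the short aliases, or both differ.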
Thanks! Leon
On Tue, 2004-09-07 at 11:20 -0700, David Wuertele wrote:
> [...]
Leon> This is good comparison. Maybe a better comparison is to make a
Leon> same filename like 日本語 on linux, dump the FAT table and
Leon> delete it, and then make same filename in Windows, dump the FAT
Leon> table and compare those two data.
Tried this. Now the data are the same, and linux-2.4.18 chokes on both. It will not return UTF-8 strings for the filenames. I just get this:
# ls -l /tmp/cf
-rwxr-xr-x 1 root 0 42 Sep 10 12:42 ______~1
-rwxr-xr-x 1 root 0 27 Sep 10 12:44 ____~1
-rwxr-xr-x 1 root 0 27 Sep 10 12:44 ____~2
-rwxr-xr-x 1 root 0 33 Sep 10 12:43 ___ats~1
-rwxr-xr-x 1 root 0 24 Sep 10 12:41 ___~1
-rwxr-xr-x 1 root 0 20 Sep 10 12:40 ascii
-rwxr-xr-x 1 root 0 24 Sep 10 12:41 ascii-~1
-rwxr-xr-x 1 root 0 31 Sep 10 12:43 atend_~1
-rwxr-xr-x 1 root 0 34 Sep 10 12:43 in___m~1
#
When I mount this same CF on a laptop running linux-2.6.5, I get this:
# ls -l flash
total 18
-rwxr-xr-x 1 root root 27 Sep 10 03:44 カタカナ
-rwxr-xr-x 1 root root 27 Sep 10 03:44 ひらがな
-rwxr-xr-x 1 root root 20 Sep 10 03:40 ascii
-rwxr-xr-x 1 root root 24 Sep 10 03:41 ascii-long-name
-rwxr-xr-x 1 root root 31 Sep 10 03:43 atend日本語
-rwxr-xr-x 1 root root 34 Sep 10 03:43 in日本語middle
-rwxr-xr-x 1 root root 24 Sep 10 03:41 日本語
-rwxr-xr-x 1 root root 33 Sep 10 03:43 日本語atstart
-rwxr-xr-x 1 root root 42 Sep 10 03:42 日本語とても長いファイル名
#
I thought that it might be the ls program that has the problem, but when I strace the ls, it looks like the characters are already bogus in the glibc layer:
open("/tmp/cf", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=16384, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
getdents64(3, /* 11 entries */, 2048) = 336
lstat64("/tmp/cf/ascii", {st_mode=S_IFREG|0755, st_size=20, ...}) = 0
lstat64("/tmp/cf/ascii-~1", {st_mode=S_IFREG|0755, st_size=24, ...}) = 0
lstat64("/tmp/cf/___~1", {st_mode=S_IFREG|0755, st_size=24, ...}) = 0
lstat64("/tmp/cf/______~1", {st_mode=S_IFREG|0755, st_size=42, ...}) = 0
lstat64("/tmp/cf/___ats~1", {st_mode=S_IFREG|0755, st_size=33, ...}) = 0
lstat64("/tmp/cf/in___m~1", {st_mode=S_IFREG|0755, st_size=34, ...}) = 0
lstat64("/tmp/cf/atend_~1", {st_mode=S_IFREG|0755, st_size=31, ...}) = 0
lstat64("/tmp/cf/____~1", {st_mode=S_IFREG|0755, st_size=27, ...}) = 0
lstat64("/tmp/cf/____~2", {st_mode=S_IFREG|0755, st_size=27, ...}) = 0
getdents64(3, /* 0 entries */, 2048) = 0
close(3) = 0
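A way to cross-check the trace without going through ls at all is to ask for the directory entries as raw bytes and look at what comes back. This is a minimal sketch, assuming a Python interpreter is available on a machine that can mount the image; `raw_names` and `describe` are hypothetical helper names. Passing a bytes path to `os.listdir` makes it return bytes names with no locale decoding applied.

```python
import os

def raw_names(path):
    """Return directory entries as raw bytes, with no locale decoding."""
    return sorted(os.listdir(os.fsencode(path)))

def describe(name):
    """Classify one raw name as 'ascii', 'utf8', or 'other' (not valid UTF-8)."""
    if all(b < 0x80 for b in name):
        return "ascii"
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return "other"
    return "utf8"

def report(path):
    """Print each raw name with its classification and hex bytes."""
    for name in raw_names(path):
        print(describe(name), name.hex(), name)
```

Calling `report("/tmp/cf")` on each system would show whether the names are already plain ASCII with underscores by the time they leave the kernel, which is what the trace above suggests.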
My guess is that the getdents64() system call is returning the wrong values, but I can't tell. This is the same regardless of what codepage or iocharset I specify:
# mount linux-cf-test.img /tmp/cf -t msdos -o loop,codepage=437
# ls /tmp/cf
______~1  ____~2    ___~1  ascii-~1  in___m~1
____~1    ___ats~1  ascii  atend_~1
# umount /tmp/cf
# mount linux-cf-test.img /tmp/cf -t msdos -o loop,codepage=932
# ls /tmp/cf
______~1  ____~2    ___~1  ascii-~1  in___m~1
____~1    ___ats~1  ascii  atend_~1
# umount /tmp/cf
# mount linux-cf-test.img /tmp/cf -t msdos -o loop,codepage=932,iocharset=utf8
# ls /tmp/cf
______~1  ____~2    ___~1  ascii-~1  in___m~1
____~1    ___ats~1  ascii  atend_~1
# mount
/dev/root on / type nfs (rw,v2,rsize=1024,wsize=1024,hard,udp,nolock,addr=192.168.5.1)
none on /dev type devfs (rw)
/proc on /proc type proc (rw)
tmpfs on /tmp type tmpfs (rw)
tmpfs on /var type tmpfs (rw)
tmpfs on /mnt/smb type tmpfs (rw)
/dev/ide/host0/bus0/target0/lun0/part1 on /mnt/flash1 type vfat (ro,noatime)
/dev/loop/0 on /tmp/cf type msdos (rw)
#
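One detail in the session above that may be worth checking is the filesystem type: the test image is mounted with -t msdos, while the card itself shows up as vfat. As far as I know, the msdos driver exposes only the 8.3 short names, and the long-name entries are read by the vfat driver, so this could account for the underscore names; nothing in this thread confirms it. The sketch below just pulls the type and options for a mount point out of `mount` output. The function name `fs_type` is hypothetical, and the regular expression assumes the "DEV on DIR type FS (OPTS)" line format shown above.

```python
import re

# Matches lines such as: /dev/loop/0 on /tmp/cf type msdos (rw)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")

def fs_type(mount_output, mount_point):
    """Return (fstype, [options]) for mount_point, or None if it is not listed."""
    found = None
    for line in mount_output.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if m and m.group(2) == mount_point:
            # Keep the last match, since later mounts shadow earlier ones.
            found = (m.group(3), m.group(4).split(","))
    return found
```

If the type comes back as msdos, remounting the same image with -t vfat and the same codepage and iocharset options would be a cheap experiment before digging further into getdents64().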
Any suggestions on how to debug this?
Thanks, Dave
i18n@lists.stg.fedoraproject.org