On Mon, 2011-09-19 at 18:00 +0200, Lennart Poettering wrote:
On Mon, 19.09.11 00:38, Lennart Poettering (mzerqung@0pointer.de) wrote:
Having used tracker on by default for a bit now, it seems that after the initial crawling, which is an expensive operation, I didn't notice any particular increase in resource usage. With indexing of removable devices turned off, this should ideally be a one-shot operation.
I'd be thankful if the impact the crawling has on the running system could be minimized via SCHED_IDLE and IOPRIO_CLASS_IDLE (gnome bz #659422).
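For reference, a minimal C sketch of what that request amounts to, assuming Linux and glibc: sched_setscheduler() with SCHED_IDLE for the CPU, and the raw ioprio_set system call (glibc has no wrapper for it) with IOPRIO_CLASS_IDLE for disk I/O. The helper names set_cpu_idle() and set_io_idle() are hypothetical, not tracker's code; the IOPRIO_* constants are copied by hand from the kernel's ioprio interface, and the idle I/O class is only honoured by I/O schedulers that implement priorities (such as CFQ).

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Constants from the kernel's ioprio interface (linux/ioprio.h), spelled out
   here because glibc provides neither a wrapper nor these macros. */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_CLASS_IDLE 3
#define IOPRIO_WHO_PROCESS 1
#define IOPRIO_PRIO_VALUE(cls, data) (((cls) << IOPRIO_CLASS_SHIFT) | (data))

/* Hypothetical helper: only run on the CPU when nothing else wants it. */
static int set_cpu_idle(void)
{
    struct sched_param param = { .sched_priority = 0 }; /* must be 0 for SCHED_IDLE */
    return sched_setscheduler(0, SCHED_IDLE, &param);   /* 0 = calling thread */
}

/* Hypothetical helper: only get disk time when the disk is otherwise idle. */
static int set_io_idle(void)
{
    return (int) syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                         IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));
}
```

On Linux both settings are per-thread, so a multithreaded indexer would apply them in each worker thread.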
Hmm, so investigating this further with strace it appears to me that tracker is trying to make use of the kernel in a way it shouldn't. Or to put this another way: our infrastructure (the Linux kernel) isn't ready for tracker yet.
The problems here have been known for a long time, but afaik nothing has been done about them yet to make things ready for tracker. I am not sure we should enable tracker before these fundamental issues are fixed. I am completely fine with enabling stuff that has known bugs, because that's how you get them fixed. But in this case I fear that the basics just don't exist, and hence tracker is fundamentally built on infrastructure that is borked. Every time people have looked into enabling tracker (or beagle, for that matter) these issues showed up, and every single time nothing happened about them, and I don't really see why these issues should have stopped mattering now.
To be more explicit:
a) tracker uses inotify recursively and creates a massive number of watches due to that (one per directory). That is both ugly and doesn't scale. Tracker apparently tries not to take up the full pool of inotify handles the system provides, but that won't help if you have more than one user on the system. The solution here should probably be fanotify, which allows proper recursive file system watches. So far fanotify has been accessible to root only, which is presumably why tracker doesn't use it. However, the solution cannot be to work around that fact by using inotify; it must be to invest the necessary kernel work to make fanotify usable from unprivileged processes.
b) We still don't have a way to detect offline modification of directories. That means detecting changes made to the home directory while offline is very expensive. btrfs now has hooks to improve the situation, but ext4 still doesn't. Does tracker at least use the btrfs hooks? (btrfs provides a log of changes to userspace, which can be used for that. Another solution would be recursive directory change timestamps.)
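To make point (a) concrete, here is a minimal C sketch of the recursive-inotify pattern, assuming nftw() for the directory walk. watch_tree() and add_watch_cb() are hypothetical names for illustration, not tracker's code. Because inotify watches are not recursive, the walker has to call inotify_add_watch() once per directory, so the number of watches grows with the size of the tree, and each one counts against the limit in /proc/sys/fs/inotify/max_user_watches (once that is exhausted, inotify_add_watch() fails with ENOSPC).

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <sys/inotify.h>
#include <sys/stat.h>

/* nftw() passes no user pointer to its callback, so the walk's state lives
   in file-scope variables (fine for a single-threaded sketch). */
static int g_inotify_fd = -1;
static long g_watch_count = 0;

/* Called once per entry; adds one inotify watch for every directory. */
static int add_watch_cb(const char *path, const struct stat *sb,
                        int typeflag, struct FTW *ftwbuf)
{
    (void) sb;
    (void) ftwbuf;
    if (typeflag != FTW_D)
        return 0; /* plain files are covered by their parent directory's watch */
    if (inotify_add_watch(g_inotify_fd, path,
                          IN_CREATE | IN_DELETE | IN_MODIFY |
                          IN_MOVED_FROM | IN_MOVED_TO) < 0)
        return -1; /* e.g. ENOSPC once max_user_watches is exhausted */
    g_watch_count++;
    return 0;
}

/* Hypothetical helper: watch every directory below root.
   Returns the number of watches added, or -1 on error. */
static long watch_tree(int inotify_fd, const char *root)
{
    g_inotify_fd = inotify_fd;
    g_watch_count = 0;
    if (nftw(root, add_watch_cb, 16, FTW_PHYS) != 0)
        return -1;
    return g_watch_count;
}
```

A real indexer would also have to add watches for directories created later, as their IN_CREATE events arrive, and would race against changes made during the initial walk; the sketch omits both.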
I'd really prefer if we fixed these fundamental issues before we enable tracker. To me it appears as if we are trying to take the second step before the first.
[ And there are a couple of other things I'd like to see changed. For example, I am pretty sure that tracker's open() calls on files should not count as accesses with regard to access time. O_NOATIME should be used here, which would reduce the amount of disk writes substantially. Also, tracker appears to take BSD locks on all files it accesses. That looks quite borked. Which other tool is it synchronizing against here? This looks insecure (because the files are often accessible to others), and since these locks are advisory only, there would need to be a strict protocol followed by everybody else accessing these files, which I guarantee you there isn't, since these are basically all the user's files. Moreover it appears tracker is mixing BSD and POSIX locks, which is dangerous due to ABBA deadlocks, in particular on NFS directories, where it will just end up in total chaos since Linux is so stupid as to "upgrade" BSD locks on NFS shares to POSIX locks on the way. In any case you should NEVER EVER use POSIX locking, since it is completely borked anyway. The locking must go. I also see a massive number of futex calls in strace, i.e. probably some mutexes thrown into the mix to make the locking problems even more interesting, which makes my toenails curl, since they are apparently contended all the time? ]
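A minimal sketch of the O_NOATIME suggestion, assuming Linux and glibc; open_for_indexing() is a hypothetical name, not tracker's code. The kernel only accepts O_NOATIME when the caller owns the file (or has CAP_FOWNER) and otherwise fails the open() with EPERM, so the sketch retries without the flag in that case. It takes no flock() or fcntl() locks.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: open a file for reading without bumping its atime.
   O_NOATIME is only allowed if the caller owns the file (or has
   CAP_FOWNER); otherwise open() fails with EPERM, so retry without it. */
static int open_for_indexing(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC | O_NOATIME);
    if (fd < 0 && errno == EPERM)
        fd = open(path, O_RDONLY | O_CLOEXEC);
    return fd;
}
```

For files owned by other users, such as those in shared directories, the fallback open updates atime as usual, subject to the mount's relatime/noatime options.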
Good stuff, Lennart.
But it would probably have much more effect at https://bugzilla.gnome.org/browse.cgi?product=tracker or https://mail.gnome.org/archives/tracker-list/
Can you post it there?