On Tue, 20.09.11 10:46, Jürg Billeter (j@bitron.ch) wrote:
On Mon, 2011-09-19 at 18:00 +0200, Lennart Poettering wrote:
a) tracker uses inotify recursively and creates a massive number of watches as a result. That is both ugly and doesn't scale. Tracker apparently tries not to take up the full pool of inotify handles the system provides, but that won't help if you have more than one user on the system. The solution here should probably be fanotify, which allows proper recursive file system watches. So far fanotify has been accessible to root only, which is presumably why tracker doesn't use it. However, the solution here cannot be to work around that fact by using inotify, but must be to invest the necessary kernel work to make fanotify useful from unprivileged processes.
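To put a rough number on the scaling concern: inotify has no recursive mode, so a recursive watcher needs one watch per directory in the tree. The sketch below only counts directories and reads the per-user limit the kernel exposes; it does not create any watches and is not tracker's code. The function names are made up for illustration.

```python
import os

def count_directories(root):
    """Count root plus every readable directory below it.

    A recursive inotify watcher needs one watch for each of them.
    Symlinked directories are not followed (os.walk default).
    """
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
    return total

def read_max_user_watches(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

Comparing count_directories(os.path.expanduser("~")) with read_max_user_watches() gives an idea of how much of one user's budget a single indexer would consume on a given machine.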
The current recursive inotify watches are definitely far from ideal. We were looking forward to fanotify entering mainline, however, last time I checked it was still missing too much functionality to even match inotify for our purposes. In addition to permission issues it did not support notification of directory events at all. If I remember correctly, this was all planned for future iterations but not considered a priority use case for fanotify. Does anyone know Eric's current plans?
Eric is an awesome, very responsive guy. You can catch him on IRC easily. Talk to him directly!
I'd really prefer it if we could fix these fundamental issues before we enable tracker. To me it appears as if we are trying to take the second step before the first.
I think you were in the room when we discussed these issues in Gran Canaria. At that point the plan was to wait for fanotify and try to convince filesystem developers of recursive directory timestamps. If I remember correctly, Matthew Garrett volunteered to talk to other kernel developers as a next step. Unfortunately, I don't think we had any follow-up discussions about recursive directory timestamps.
fanotify was delayed quite a bit, and we were told that there was nothing we could do to help and that it was way too early to start experimenting with it for tracker. Now that it is in mainline, it would probably be easier to help out, but given that it took a long time for a kernel developer to get a subset of the planned features into mainline, I haven't attempted to work on the missing features myself so far.
GC was three years ago. That is a lot of time lost by not getting the kernel fixed.
[ And there are a couple of other things I'd like to see changed. For example, I am pretty sure that tracker's open() calls to files should not be considered accesses with regard to access time. O_NOATIME should be used here, which would reduce the amount of disk writes
We already use O_NOATIME in a few extractors; however, certain libraries don't make this very easy. Not even GIO allows opening a file for reading with O_NOATIME set, as far as I can tell. Also, I don't expect the amount of atime-related writes to be very high with relatime, but I haven't measured this and could be mistaken.
GIO and all the libs you are using are open source, so prepare patches! I am quite sure you even have commit access to gio, right?
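For reference, a minimal sketch of what an O_NOATIME open looks like at the syscall level, written in Python rather than against GIO or any tracker extractor. O_NOATIME is Linux-specific, and the kernel refuses it with EPERM unless the caller owns the file or has CAP_FOWNER, so the sketch falls back to a plain open in that case. The helper name open_noatime is hypothetical.

```python
import os

def open_noatime(path):
    """Open path read-only, asking the kernel not to update its atime.

    Falls back to a plain read-only open when O_NOATIME is refused
    (file not owned by the caller) or not available on this platform.
    """
    flags = os.O_RDONLY
    noatime = getattr(os, "O_NOATIME", 0)  # 0 where the flag does not exist
    try:
        return os.open(path, flags | noatime)
    except PermissionError:
        if not noatime:
            raise  # the flag was not involved, so this is a real error
        return os.open(path, flags)
```

The returned descriptor is used like any other, for example with os.read(fd, 4096) followed by os.close(fd). An indexer crawling files owned by other users would hit the fallback path for those files.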
substantially. Also, tracker appears to BSD lock all files it accesses. That looks quite borked. Which other tool is it synchronizing against here? This looks insecure (because the files are often accessible to others), and since these locks are advisory only, there needs to be a strict protocol followed by everybody else accessing these files, which I guarantee you there isn't, since these are basically all the user's files. Moreover, it appears tracker is mixing BSD and POSIX locks, which is dangerous due to ABBA, in particular when used on NFS directories, which will just end up in total chaos since Linux is so stupid as to "upgrade" BSD locks on NFS shares to POSIX locks on the way. In any case you should NEVER EVER use POSIX locking, since it is completely borked anyway. The locking must go. I also see a massive amount of futex calls in strace, i.e. probably some mutexes thrown into the mix to make the locking problems even more interesting, which makes my fingernails roll up, since they apparently are congested all the time? ]
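The "advisory only" point can be seen in a few lines. This is a sketch of the general flock() behaviour on Linux, not a reproduction of what tracker or SQLite do: one descriptor holds an exclusive BSD lock, and a second, independently opened descriptor stands in for another tool. A tool that asks for the lock is refused; a tool that never asks simply writes. The function name is made up for illustration.

```python
import fcntl
import os

def demo_advisory_flock(path):
    """Return (lock_conflicted, bytes_written) for a second descriptor on path."""
    holder = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    other = os.open(path, os.O_RDWR)  # separate open(), so an independent lock owner
    try:
        fcntl.flock(holder, fcntl.LOCK_EX)
        try:
            # A cooperating tool asks for the lock and is turned away.
            fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
            lock_conflicted = False
        except BlockingIOError:
            lock_conflicted = True
        # A tool that never asks for the lock writes straight through it.
        bytes_written = os.write(other, b"not blocked\n")
        return lock_conflicted, bytes_written
    finally:
        os.close(other)   # closing a descriptor also drops locks held through it
        os.close(holder)
```

On a local Linux filesystem this should report that the second lock attempt conflicted while the write went through, which is why such locks only help when every program touching the file follows the same protocol.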
Can you please share your findings in a bug report? As far as I know, tracker itself doesn't use BSD or POSIX locks at all. SQLite is using file locks, but if there are issues with how SQLite is using locks, this should probably be discussed with SQLite upstream.
My findings are basically a straightforward reading of "strace -p $(pidof tracker-miner-fs)", which I ran to track down that .desktop file looping issue I reported.
As a side-note, I myself would like to see more radical changes in how user files will be stored in the future. Ideally, we would stop storing them in traditional directory hierarchies. Among other things, this would completely avoid the need for recursive directory monitoring, recursive directory timestamps, and crawling on startup. On the downside, this would require changes in many applications, although FUSE could certainly help by providing a compatibility layer. If anyone is interested in discussing or working on this, let me know.
I must say I see big value in supporting all kinds of files, not just files generated by GNOME apps. For me personally, having tracker index my source code sounds like the most exciting use of tracker of all.
Lennart