Hey, folks.
So, in F16, totem has grown a dependency on tracker, via grilo-plugins. This means tracker is now present and active in default F16 installs, where it wasn't before. This is leading to the problems that have been endemic to desktop search mechanisms ever since the first one crawled out of the fiery pits of hell - excessive resource consumption, viz:
https://lists.fedoraproject.org/pipermail/test/2011-September/102688.html https://lists.fedoraproject.org/pipermail/test/2011-September/102770.html
peter robinson reckons it would be easy to split out grilo-plugins' dependency on tracker and hence get tracker out of the default package set again, if that's desired:
https://lists.fedoraproject.org/pipermail/test/2011-September/102730.html
so, what's the plan here? is tracker's default inclusion a mistake which can be rectified as per pbrobinson's plan, or should we be trying to get tracker excessive-consumption bugs fixed?
if it's going to be in there by default, it should probably have a less aggressive indexing policy by default. it installs with its aggressiveness set to max, a fairly expansive set of search paths (including removable media), and without the config tool which lets you tweak these things, as nothing has a dep on tracker-search-tool.
Adam Williamson (awilliam@redhat.com) said:
So, in F16, totem has grown a dependency on tracker, via grilo-plugins. This means tracker is now present and active in default F16 installs, where it wasn't before. This is leading to the problems that have been endemic to desktop search mechanisms ever since the first one crawled out of the fiery pits of hell - excessive resource consumption, viz:
https://lists.fedoraproject.org/pipermail/test/2011-September/102688.html https://lists.fedoraproject.org/pipermail/test/2011-September/102770.html
peter robinson reckons it would be easy to split out grilo-plugins' dependency on tracker and hence get tracker out of the default package set again, if that's desired:
Note that gnome-documents, now a default package, also requires tracker.
Bill
On Mon, 2011-09-12 at 16:45 -0400, Bill Nottingham wrote:
Adam Williamson (awilliam@redhat.com) said:
So, in F16, totem has grown a dependency on tracker, via grilo-plugins. This means tracker is now present and active in default F16 installs, where it wasn't before. This is leading to the problems that have been endemic to desktop search mechanisms ever since the first one crawled out of the fiery pits of hell - excessive resource consumption, viz:
https://lists.fedoraproject.org/pipermail/test/2011-September/102688.html https://lists.fedoraproject.org/pipermail/test/2011-September/102770.html
peter robinson reckons it would be easy to split out grilo-plugins' dependency on tracker and hence get tracker out of the default package set again, if that's desired:
Note that gnome-documents, now a default package, also requires tracker.
oop, good catch. I did suspect it's going to become more a part of the GNOME Experience from now on, though I think I missed the bit where tracker beat zeitgeist =)
On Mon, 2011-09-12 at 14:01 -0700, Adam Williamson wrote:
On Mon, 2011-09-12 at 16:45 -0400, Bill Nottingham wrote:
Adam Williamson (awilliam@redhat.com) said:
So, in F16, totem has grown a dependency on tracker, via grilo-plugins. This means tracker is now present and active in default F16 installs, where it wasn't before. This is leading to the problems that have been endemic to desktop search mechanisms ever since the first one crawled out of the fiery pits of hell - excessive resource consumption, viz:
https://lists.fedoraproject.org/pipermail/test/2011-September/102688.html https://lists.fedoraproject.org/pipermail/test/2011-September/102770.html
peter robinson reckons it would be easy to split out grilo-plugins' dependency on tracker and hence get tracker out of the default package set again, if that's desired:
Note that gnome-documents, now a default package, also requires tracker.
oop, good catch. I did suspect it's going to become more a part of the GNOME Experience from now on, though I think I missed the bit where tracker beat zeitgeist =)
If tracker is proving to be a pain, file upstream bugs. Splitting the grilo tracker plugin to a sub-package is not the right way to fix this.
Hi Adam,
On Mon, 2011-09-12 at 14:01 -0700, Adam Williamson wrote:
oop, good catch. I did suspect it's going to become more a part of the GNOME Experience from now on, though I think I missed the bit where tracker beat zeitgeist =)
Yeah, tracker is a required dependency of gnome-documents now. As Bastien says, we should try to identify and fix possible bugs and resource issues upstream.
I also think tracker should use a different default configuration; these are the changes I think we should make:
- removable media indexing should be off by default - I don't know if this is used by the totem grilo integration, but it's not by gnome-documents. Anyway, I think the best way to approach this is tracker should just be aware of removable media and applications who make use of them should be able to have them crawled on-demand, but that needs fixing in tracker first.
- indexing should be enabled on battery - right now it's disabled. If we disable indexing on battery, applications like gnome-documents will be "locked" into the last view of the local file entries in the tracker database before you unplugged the cable, which is very ugly.
Having used tracker on by default for a bit now it seems that after the initial crawling, which is an expensive operation, I didn't notice any particular increase in resource usage. With removable devices indexing off, this should ideally be a one-shot operation.
As a side note, I also took some time to clean up the tracker spec file a bit (patches at [1]); I think the flickr miner (which doesn't use g-o-a and is not easily configurable) should be split in a separate package, disabled by default.
[1] http://people.gnome.org/~cosimoc/tracker-patches/
Cosimo
On Tue, 2011-09-13 at 23:41 -0400, Cosimo Cecchi wrote:
Hi Adam,
On Mon, 2011-09-12 at 14:01 -0700, Adam Williamson wrote:
oop, good catch. I did suspect it's going to become more a part of the GNOME Experience from now on, though I think I missed the bit where tracker beat zeitgeist =)
Yeah, tracker is a required dependency of gnome-documents now. As Bastien says, we should try to identify and fix possible bugs and resource issues upstream.
I also think tracker should use a different default configuration; these are the changes I think we should make:
- removable media indexing should be off by default - I don't know if this is used by the totem grilo integration, but it's not by gnome-documents. Anyway, I think the best way to approach this is tracker should just be aware of removable media and applications who make use of them should be able to have them crawled on-demand, but that needs fixing in tracker first.
- indexing should be enabled on battery - right now it's disabled. If we disable indexing on battery, applications like gnome-documents will be "locked" into the last view of the local file entries in the tracker database before you unplugged the cable, which is very ugly.
Having used tracker on by default for a bit now it seems that after the initial crawling, which is an expensive operation, I didn't notice any particular increase in resource usage. With removable devices indexing off, this should ideally be a one-shot operation.
As a side note, I also took some time to clean up the tracker spec file a bit (patches at [1]); I think the flickr miner (which doesn't use g-o-a and is not easily configurable) should be split in a separate package, disabled by default.
These changes sound like a good idea, and should help a lot to keep the 'tracker-induced rage' down. CC'ing the tracker package maintainer to get his ok before we enact this.
Matthias
Em Ter, 2011-09-13 às 23:41 -0400, Cosimo Cecchi escreveu:
- indexing should be enabled on battery - right now it's disabled. If we disable indexing on battery, applications like gnome-documents will be "locked" into the last view of the local file entries in the tracker database before you unplugged the cable, which is very ugly.
Any idea how much of an impact on battery life this would imply? The kernel already seems to keep introducing regressions in this area, and overall things might look quite bad compared with past releases.
-- Evandro
On 14 September 2011 16:15, Evandro Giovanini efgiovanini@gmail.com wrote:
Any idea how much of an impact on battery life this would imply?
Open up gnome-power-statistics, click "Processor" and see if tracker is keeping the CPU busy. Every wakeup means more power.
Simplifying, if tracker isn't waking, then it's not consuming battery.
Richard.
Em Qua, 2011-09-14 às 17:02 +0100, Richard Hughes escreveu:
On 14 September 2011 16:15, Evandro Giovanini efgiovanini@gmail.com wrote:
Any idea how much of an impact on battery life this would imply?
Open up gnome-power-statistics, click "Processor" and see if tracker is keeping the CPU busy. Every wakeup means more power.
Simplifying, if tracker isn't waking, then it's not consuming battery.
Richard.
I had forgotten how nice gnome-power-statistics is!
For me tracker is basically not consuming battery; I guess this option is a relic from the old days.
-- Evandro
On 09/13/2011 10:41 PM, Cosimo Cecchi wrote:
Hi Adam,
On Mon, 2011-09-12 at 14:01 -0700, Adam Williamson wrote:
oop, good catch. I did suspect it's going to become more a part of the GNOME Experience from now on, though I think I missed the bit where tracker beat zeitgeist =)
Yeah, tracker is a required dependency of gnome-documents now. As Bastien says, we should try to identify and fix possible bugs and resource issues upstream.
I also think tracker should use a different default configuration; these are the changes I think we should make:
- removable media indexing should be off by default - I don't know if this is used by the totem grilo integration, but it's not by gnome-documents. Anyway, I think the best way to approach this is tracker should just be aware of removable media and applications who make use of them should be able to have them crawled on-demand, but that needs fixing in tracker first.
- indexing should be enabled on battery - right now it's disabled. If we disable indexing on battery, applications like gnome-documents will be "locked" into the last view of the local file entries in the tracker database before you unplugged the cable, which is very ugly.
I think it would also be good to evaluate the default inotify limits and consider raising them (or decreasing the number of inotify watches Tracker uses). When running Tracker, I've had other software such as Dropbox complain that it cannot use inotify because I have too many watches registered.
- Michael
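For anyone who wants to check the limits being discussed: on Linux they are exposed as plain text files under /proc/sys/fs/inotify. The following is only a minimal sketch in Python; the function name is made up for illustration, and it simply skips any file it cannot read.

```python
from pathlib import Path

# Standard location of the inotify sysctl files on Linux.
INOTIFY_DIR = Path("/proc/sys/fs/inotify")


def read_inotify_limits(base=INOTIFY_DIR):
    """Return the inotify limits found under `base` as a dict of ints."""
    limits = {}
    for name in ("max_user_watches", "max_user_instances", "max_queued_events"):
        path = Path(base) / name
        try:
            limits[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # File missing (non-Linux, restricted sandbox) or unparsable: skip it.
            continue
    return limits
```

Calling read_inotify_limits() with no argument reads the live values. Raising them is an administrator decision (for example through the fs.inotify.max_user_watches sysctl), which is why the distribution default matters here.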
On Wed, 2011-09-14 at 11:22 -0500, Michael Ekstrand wrote:
I think it would also be good to evaluate the default inotify limits and consider raising them (or decreasing the number of inotify watches Tracker uses). When running Tracker, I've had other software such as Dropbox complain that it cannot use inotify because I have too many watches registered.
I must say I've never seen this error (despite using Dropbox), but it seems like a valid thing to investigate (ideally tracker should use something like fanotify [1] instead and be happy). Please file a bug (possibly upstream) if you encounter a similar problem again.
[1] http://lwn.net/Articles/360955/
Cosimo
On 2011-09-13 23:41:01 EDT, Cosimo Cecchi wrote:
Yeah, tracker is a required dependency of gnome-documents now. As Bastien says, we should try to identify and fix possible bugs and resource issues upstream.
I don't see why tracker or any such daemon should ever be a required package, or that one should not be able to uninstall it without taking much of Gnome with it.
I have no problems with it being used in the default gnome-shell setup,
On a related note, I had thought people's concerns that gnome-shell lacks an easy-to-find way of shutting down the computer were an overreaction. Now that it has also been removed from the GDM setup, I have changed my mind. Is the preferred method of shutting down the computer pressing the hardware's power button? What if I just want to reboot into another OS?
On Wed, 2011-09-14 at 17:23 -0400, Barry Fishman wrote:
I don't see why tracker or any such daemon should ever be a required package, or that one should not be able to uninstall it without taking much of Gnome with it.
I have no problems with it being used in the default gnome-shell setup,
Tracker provides both a store process (holding the actual DB) and a shared library for applications with APIs to access and modify the database. gnome-documents depends on the library, which of course is written under the assumption that the daemon is available. Of course you can always uninstall tracker, but that will remove core parts of the GNOME desktop as well (at least gnome-documents).
Cosimo
On Wed, Sep 14, 2011 at 6:23 PM, Barry Fishman barry_fishman@acm.org wrote:
On 2011-09-13 23:41:01 EDT, Cosimo Cecchi wrote:
Yeah, tracker is a required dependency of gnome-documents now. As Bastien says, we should try to identify and fix possible bugs and resource issues upstream.
I don't see why tracker or any such daemon should ever be a required package, or that one should not be able to uninstall it without taking much of Gnome with it.
I have no problems with it being used in the default gnome-shell setup,
On a related note, I had thought people's concerns that gnome-shell lacks an easy-to-find way of shutting down the computer were an overreaction. Now that it has also been removed from the GDM setup, I have changed my mind. Is the preferred method of shutting down the computer pressing the hardware's power button? What if I just want to reboot into another OS?
The menu for Shutdown/Restart will be added back to GDM.
-- Evandro
On Tue, 2011-09-13 at 23:41 -0400, Cosimo Cecchi wrote:
I also think tracker should use a different default configuration; these are the changes I think we should make:
Just a quick update: both my proposed changes made their way into the default tracker configuration upstream, so they will eventually reach Fedora when the new version gets released and packaged.
Cosimo
On Tue, 13.09.11 23:41, Cosimo Cecchi (ccecchi@redhat.com) wrote:
Hi Adam,
On Mon, 2011-09-12 at 14:01 -0700, Adam Williamson wrote:
oop, good catch. I did suspect it's going to become more a part of the GNOME Experience from now on, though I think I missed the bit where tracker beat zeitgeist =)
Yeah, tracker is a required dependency of gnome-documents now. As Bastien says, we should try to identify and fix possible bugs and resource issues upstream.
I also think tracker should use a different default configuration; these are the changes I think we should make:
- removable media indexing should be off by default - I don't know if this is used by the totem grilo integration, but it's not by gnome-documents. Anyway, I think the best way to approach this is tracker should just be aware of removable media and applications who make use of them should be able to have them crawled on-demand, but that needs fixing in tracker first.
- indexing should be enabled on battery - right now it's disabled. If we disable indexing on battery, applications like gnome-documents will be "locked" into the last view of the local file entries in the tracker database before you unplugged the cable, which is very ugly.
Having used tracker on by default for a bit now it seems that after the initial crawling, which is an expensive operation, I didn't notice any particular increase in resource usage. With removable devices indexing off, this should ideally be a one-shot operation.
I'd be thankful if the impact the crawling has on the running system could be minimized via SCHED_IDLE and IOPRIO_CLASS_IDLE. gnome bz #659422.
Lennart
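To make the SCHED_IDLE request concrete, here is a rough sketch in Python of launching a command under the idle CPU scheduling policy. It is not how tracker does or should do it; the function names are made up, and the command being run stands in for a hypothetical crawler. The I/O half (IOPRIO_CLASS_IDLE) needs the ioprio_set system call, which the Python standard library does not wrap, so it is left out.

```python
import os
import subprocess


def _lower_priority():
    """Runs in the child just before exec: best-effort switch to SCHED_IDLE."""
    try:
        # SCHED_IDLE requires a static priority of 0.
        os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))
    except (AttributeError, OSError):
        # Not Linux, or the kernel/sandbox refused: fall back to plain niceness.
        try:
            os.nice(19)
        except OSError:
            pass


def run_at_idle_priority(argv):
    """Run a command so that it is scheduled only when the CPU is otherwise idle."""
    return subprocess.run(argv, preexec_fn=_lower_priority).returncode
```

The policy is set in the child only, so the calling process keeps its normal priority.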
On Mon, 19.09.11 00:38, Lennart Poettering (mzerqung@0pointer.de) wrote:
Having used tracker on by default for a bit now it seems that after the initial crawling, which is an expensive operation, I didn't notice any particular increase in resource usage. With removable devices indexing off, this should ideally be a one-shot operation.
I'd be thankful if the impact the crawling has on the running system could be minimized via SCHED_IDLE and IOPRIO_CLASS_IDLE. gnome bz #659422.
Hmm, so investigating this further with strace it appears to me that tracker is trying to make use of the kernel in a way it shouldn't. Or to put this another way: our infrastructure (the Linux kernel) isn't ready for tracker yet.
The problems here have been known for a long time, but afaik nothing has been done about them yet to make things ready for tracker. I am not sure we should enable tracker before these fundamental issues are fixed. I am completely fine with enabling stuff that has known bugs because that's how you get them fixed. But in this case I fear that the basics just don't exist and hence tracker is fundamentally built on infrastructure that is borked. Every time people have looked into enabling tracker (or beagle for that matter) these issues showed up, and every single time nothing happened about them, and I am not really seeing why these issues stopped mattering now.
To be more explicit:
a) tracker uses inotify recursively and creates a massive number of watches due to that. That is both ugly and doesn't scale. Tracker apparently tries to not take up the full pool of inotify handles the system provides, but that won't help if you have more than one user on the system. The solution here should probably be fanotify, which allows proper recursive file system watches. So far fanotify has been accessible to root only, which is presumably why tracker doesn't use it. However, the solution here cannot be to work around that fact by using inotify, but must be to invest the necessary kernel work to make fanotify useful from unprivileged processes.
b) We still don't have a way to detect offline modification of directories. That means detecting changes to the home directory made offline is very expensive. btrfs now has hooks to improve the situation, but ext4 still doesn't. Does tracker at least use the btrfs hooks? (btrfs provides a log of changes to userspace, which can be used for that. Another solution would be recursive directory change timestamps.)
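To put a number on point a): inotify watches are not recursive, so a watcher that wants to follow a whole tree needs one watch per directory. The sketch below (Python, with a made-up function name) only counts directories; it does not create any watches and is not taken from tracker.

```python
import os


def count_watch_candidates(root):
    """Count the directories under `root`, including `root` itself.

    Each of them would need its own inotify watch in a recursive setup.
    """
    total = 0
    # followlinks defaults to False, so symlinked directories are not entered.
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
    return total
```

Running it on a home directory and comparing the result with fs.inotify.max_user_watches gives a rough idea of how close a recursive watcher gets to the limit.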
I'd really prefer if we could fix these fundamental issues before we enable tracker. To me it appears here as if we are trying to make the second step before the first.
[ And there are a couple of other things I'd like to see changed. For example, I am pretty sure that tracker's open() calls to files should not be considered accesses in regards to access time. O_NOATIME should be used here, which would reduce the amount of disk writes substantially. Also, tracker appears to BSD lock all files it accesses. That looks quite borked. Which other tool is it synchronizing against here? This looks insecure (because the files are often accessible to others), and since these locks are advisory only there needs to be a strict protocol followed by everybody else accessing these files, which I guarantee you there isn't since these are basically all the user's files. Moreover it appears tracker is mixing BSD and POSIX locks, which is dangerous due to ABBA, in particular when used on NFS directories, which will just end up in total chaos since Linux is so stupid as to "upgrade" BSD locks on NFS shares to POSIX locks on the way. In any case you should NEVER EVER use POSIX locking, since it is completely borked anyway. The locking must go. I also see a massive amount of futex calls in strace, i.e. probably some mutexes thrown in the mix to make the locking problems even more interesting, which makes my fingernails roll up, since they apparently are congested all the time? ]
Lennart
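As an illustration of the O_NOATIME suggestion, this is roughly what an atime-preserving read looks like from Python. It is a sketch with a made-up function name, not tracker code. The flag is Linux-specific and the kernel only honours it for the file's owner (or a process with CAP_FOWNER), hence the fallback.

```python
import os


def read_without_touching_atime(path):
    """Read a file, asking the kernel not to update its access time."""
    flags = os.O_RDONLY
    noatime = getattr(os, "O_NOATIME", 0)  # Linux-only flag; 0 elsewhere
    try:
        fd = os.open(path, flags | noatime)
    except PermissionError:
        # O_NOATIME is refused for files owned by someone else;
        # fall back to a normal open in that case.
        fd = os.open(path, flags)
    with os.fdopen(fd, "rb") as handle:
        return handle.read()
```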
On Mon, 2011-09-19 at 18:00 +0200, Lennart Poettering wrote:
On Mon, 19.09.11 00:38, Lennart Poettering (mzerqung@0pointer.de) wrote:
Having used tracker on by default for a bit now it seems that after the initial crawling, which is an expensive operation, I didn't notice any particular increase in resource usage. With removable devices indexing off, this should ideally be a one-shot operation.
I'd be thankful if the impact the crawling has on the running system could be minimized via SCHED_IDLE and IOPRIO_CLASS_IDLE. gnome bz #659422.
Hmm, so investigating this further with strace it appears to me that tracker is trying to make use of the kernel in a way it shouldn't. Or to put this another way: our infrastructure (the Linux kernel) isn't ready for tracker yet.
The problems here have been known for a long time, but afaik nothing has been done about them yet to make things ready for tracker. I am not sure we should enable tracker before these fundamental issues are fixed. I am completely fine with enabling stuff that has known bugs because that's how you get them fixed. But in this case I fear that the basics just don't exist and hence tracker is fundamentally built on infrastructure that is borked. Every time people have looked into enabling tracker (or beagle for that matter) these issues showed up, and every single time nothing happened about them, and I am not really seeing why these issues stopped mattering now.
To be more explicit:
a) tracker uses inotify recursively and creates a massive number of watches due to that. That is both ugly and doesn't scale. Tracker apparently tries to not take up the full pool of inotify handles the system provides, but that won't help if you have more than one user on the system. The solution here should probably be fanotify, which allows proper recursive file system watches. So far fanotify has been accessible to root only, which is presumably why tracker doesn't use it. However, the solution here cannot be to work around that fact by using inotify, but must be to invest the necessary kernel work to make fanotify useful from unprivileged processes.
b) We still don't have a way to detect offline modification of directories. That means detecting changes to the home directory made offline is very expensive. btrfs now has hooks to improve the situation, but ext4 still doesn't. Does tracker at least use the btrfs hooks? (btrfs provides a log of changes to userspace, which can be used for that. Another solution would be recursive directory change timestamps.)
I'd really prefer if we could fix these fundamental issues before we enable tracker. To me it appears here as if we are trying to make the second step before the first.
[ And there are a couple of other things I'd like to see changed. For example, I am pretty sure that tracker's open() calls to files should not be considered accesses in regards to access time. O_NOATIME should be used here, which would reduce the amount of disk writes substantially. Also, tracker appears to BSD lock all files it accesses. That looks quite borked. Which other tool is it synchronizing against here? This looks insecure (because the files are often accessible to others), and since these locks are advisory only there needs to be a strict protocol followed by everybody else accessing these files, which I guarantee you there isn't since these are basically all the user's files. Moreover it appears tracker is mixing BSD and POSIX locks, which is dangerous due to ABBA, in particular when used on NFS directories, which will just end up in total chaos since Linux is so stupid as to "upgrade" BSD locks on NFS shares to POSIX locks on the way. In any case you should NEVER EVER use POSIX locking, since it is completely borked anyway. The locking must go. I also see a massive amount of futex calls in strace, i.e. probably some mutexes thrown in the mix to make the locking problems even more interesting, which makes my fingernails roll up, since they apparently are congested all the time? ]
Good stuff, Lennart.
But it would probably have much more effect in https://bugzilla.gnome.org/browse.cgi?product=tracker or https://mail.gnome.org/archives/tracker-list/
Can you put it there ?
On Mon, 19.09.11 12:43, Matthias Clasen (mclasen@redhat.com) wrote:
a) tracker uses inotify recursively and creates a massive number of watches due to that. That is both ugly and doesn't scale. Tracker apparently tries to not take up the full pool of inotify handles the system provides, but that won't help if you have more than one user on the system. The solution here should probably be fanotify, which allows proper recursive file system watches. So far fanotify has been accessible to root only, which is presumably why tracker doesn't use it. However, the solution here cannot be to work around that fact by using inotify, but must be to invest the necessary kernel work to make fanotify useful from unprivileged processes.
b) We still don't have a way to detect offline modification of directories. That means detecting changes to the home directory made offline is very expensive. btrfs now has hooks to improve the situation, but ext4 still doesn't. Does tracker at least use the btrfs hooks? (btrfs provides a log of changes to userspace, which can be used for that. Another solution would be recursive directory change timestamps.)
I'd really prefer if we could fix these fundamental issues before we enable tracker. To me it appears here as if we are trying to make the second step before the first.
[ And there are a couple of other things I'd like to see changed. For example, I am pretty sure that tracker's open() calls to files should not be considered accesses in regards to access time. O_NOATIME should be used here, which would reduce the amount of disk writes substantially. Also, tracker appears to BSD lock all files it accesses. That looks quite borked. Which other tool is it synchronizing against here? This looks insecure (because the files are often accessible to others), and since these locks are advisory only there needs to be a strict protocol followed by everybody else accessing these files, which I guarantee you there isn't since these are basically all the user's files. Moreover it appears tracker is mixing BSD and POSIX locks, which is dangerous due to ABBA, in particular when used on NFS directories, which will just end up in total chaos since Linux is so stupid as to "upgrade" BSD locks on NFS shares to POSIX locks on the way. In any case you should NEVER EVER use POSIX locking, since it is completely borked anyway. The locking must go. I also see a massive amount of futex calls in strace, i.e. probably some mutexes thrown in the mix to make the locking problems even more interesting, which makes my fingernails roll up, since they apparently are congested all the time? ]
Good stuff, Lennart.
But it would probably have much more effect in https://bugzilla.gnome.org/browse.cgi?product=tracker or https://mail.gnome.org/archives/tracker-list/
Can you put it there ?
I already filed two bugs there today. And the inotify/offline modification issues the tracker folks are well aware of, because people (including me) told them that over and over again over the years.
This mail was mostly intended as a summary for fedora, and for pointing out that these core issues have not been addressed in the last years. There's no news in all of this. Maybe except for the fact that Fedora appears willing to ignore the issues that previously mattered.
Anyway, I will forward this to Jürg and Martyn, maybe they have something new to say about this.
Lennart
On Mon, 2011-09-19 at 20:58 +0200, Lennart Poettering wrote:
Maybe except for the fact that Fedora appears willing to ignore the issues that previously mattered.
It is not a Fedora issue, really. GNOME is moving to depend on tracker. Letting it linger as an optional dependency for another few years is realistically not going to help things along. Do you have any better idea for how to make progress on getting the necessary infrastructure fixes in place, other than using the stuff?
On Mon, 19.09.11 18:37, Matthias Clasen (mclasen@redhat.com) wrote:
On Mon, 2011-09-19 at 20:58 +0200, Lennart Poettering wrote:
Maybe except for the fact that Fedora appears willing to ignore the issues that previously mattered.
It is not a Fedora issue, really. GNOME is moving to depend on tracker. Letting it linger as an optional dependency for another few years is realistically not going to help things along. Do you have any better idea for how to make progress on getting the necessary infrastructure fixes in place, other than using the stuff?
Work with Eric to get fanotify fixed up for unprivileged use. fanotify is already in the kernel, so the big changes have been made. It's just a matter of building on that, and Eric is a very responsive and friendly person. The privilege issues could probably be dealt with in a minimal way already by simply adding some kind of tail-drop logic to the fanotify queues.
Lennart
-- Lennart Poettering - Red Hat, Inc.
On Mon, Sep 19, 2011 at 11:37 PM, Matthias Clasen mclasen@redhat.com wrote:
On Mon, 2011-09-19 at 20:58 +0200, Lennart Poettering wrote:
Maybe except for the fact that Fedora appears willing to ignore the issues that previously mattered.
It is not a Fedora issue, really. GNOME is moving to depend on tracker. Letting it linger as an optional dependency for another few years is realistically not going to help things along. Do you have any better idea for how to make progress on getting the necessary infrastructure fixes in place, other than using the stuff?
Correct, it's not a Fedora issue, but it's not going to help Fedora's cause if on most new installs it burns CPU and makes the end user's experience horrible. When I initially dealt with tracker while packaging both rygel and moblin, it had the same problems back in Fedora 12, and I got hate mail and, from memory, bug reports from Lennart (the bug report was Lennart, not the mail) about the issues. Upstream weren't overly interested in fixing them, so the solution was to make the dependency optional.
I like the principle behind tracker (and beagle before it), but if in the short term the problem isn't going to be fixed well enough for Fedora to be usable for the vast majority of users, we need to make it optional for F-16 and re-review it. (And yes, I'm aware initial indexing is always going to be a problem, but it needs to detect a fresh install and not grind the machine for 3 hours to learn that the current user has nothing in their home directory or on external media - that's what it did on my fresh-install, no-media netbook.) Ultimately end users won't tolerate it and will vote with their feet and go elsewhere, just like they have from Ubuntu to Fedora because they don't like Unity. We don't want all the new Power Users to leave 6 months after they've arrived.
Peter
On Mon, 2011-09-19 at 18:00 +0200, Lennart Poettering wrote:
a) tracker uses inotify recursively and creates a massive number of watches due to that. That is both ugly and doesn't scale. Tracker apparently tries to not take up the full pool of inotify handles the system provides, but that won't help if you have more than one user on the system. The solution here should probably be fanotify, which allows proper recursive file system watches. So far fanotify has been accessible to root only, which is presumably why tracker doesn't use it. However, the solution here cannot be to work around that fact by using inotify, but must be to invest the necessary kernel work to make fanotify useful from unprivileged processes.
The current recursive inotify watches are definitely far from ideal. We were looking forward to fanotify entering mainline, however, last time I checked it was still missing too much functionality to even match inotify for our purposes. In addition to permission issues it did not support notification of directory events at all. If I remember correctly, this was all planned for future iterations but not considered a priority use case for fanotify. Does anyone know Eric's current plans?
The max_user_watches limit is per inotify instance and each user can have max_user_instances inotify instances, as far as I know. I don't see how this would be an issue on multi-user systems. The default limit should be sufficient for a large part of the user base with the default tracker configuration. However, it might make sense to increase the default on distributions using tracker to be on the safe side.
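To put rough numbers on the watch-limit question, here is a minimal sketch (not tracker's code) that reads the two inotify limits from /proc/sys/fs/inotify and counts the directories a recursive monitor would have to watch. inotify(7) describes both limits as applying per real user ID. The helper names `read_inotify_limits` and `count_directories` are made up for illustration, and "one watch per directory" is an approximation of what a recursive inotify crawler needs.

```python
import os
from pathlib import Path

# Standard location of the inotify sysctls on Linux.
INOTIFY_PROC_DIR = Path("/proc/sys/fs/inotify")


def read_inotify_limits(proc_dir=INOTIFY_PROC_DIR):
    """Return the inotify limits as a dict of ints; unreadable entries are skipped."""
    limits = {}
    for name in ("max_user_watches", "max_user_instances"):
        try:
            limits[name] = int((Path(proc_dir) / name).read_text().strip())
        except (OSError, ValueError):
            continue
    return limits


def count_directories(root):
    """Count root plus every directory below it (symlinks are not followed).

    A recursive inotify monitor needs roughly one watch per directory.
    """
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
    return total


if __name__ == "__main__":
    print("limits:", read_inotify_limits())
    print("directories under home:", count_directories(Path.home()))
```

Comparing the directory count for each indexed path against max_user_watches gives a quick idea of how close a given home directory comes to the default limit.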
b) We still don't have a way to detect offline modification of directories. That means detecting changes to the home directory made offline is very expensive. btrfs now has hooks to improve the situation, but ext4 still hasn't. Does tracker at least use the btrfs hooks? (btrfs provides a log of changes to userspace, which can be used for that. Another solution is recursive directory change timestamps).
No, tracker does not have any btrfs-specific code at the moment. I'm still waiting for a btrfs fsck release to start using btrfs myself, but I'm looking forward to learning more about the btrfs facilities and how we could use them to improve tracker.
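To illustrate why offline changes are expensive to detect without filesystem help, here is a minimal sketch (an assumption about the general approach, not tracker's actual crawler) that compares stored directory mtimes against the current tree. The function names are hypothetical. The point is that every directory has to be visited and stat()ed on startup, because a change deep in the tree does not touch the timestamps of its ancestors.

```python
import os


def snapshot_dir_mtimes(root):
    """Map every directory under root (including root) to its mtime in nanoseconds."""
    snapshot = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        snapshot[dirpath] = os.stat(dirpath).st_mtime_ns
    return snapshot


def changed_directories(root, old_snapshot):
    """Return directories that are new or whose mtime differs from old_snapshot.

    A directory's mtime changes only when entries are added, removed, or
    renamed directly inside it, so the whole tree has to be walked again.
    """
    current = snapshot_dir_mtimes(root)
    return sorted(
        path for path, mtime in current.items()
        if old_snapshot.get(path) != mtime
    )
```

Note that this only finds directories whose entry lists changed; a file rewritten in place updates the file's own mtime, not its parent's, so a real crawler would have to stat the files as well. Recursive directory timestamps or a filesystem change log would let the crawl skip unchanged subtrees entirely.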
I'd really prefer if we could fix these fundamental issues before we enable tracker. To me it appears here as if we are trying to make the second step before the first.
I think you were in the room when we discussed these issues in Gran Canaria. At that point the plan was to wait for fanotify and try to convince filesystem developers of recursive directory timestamps. If I remember correctly, Matthew Garrett volunteered to talk to other kernel developers as a next step. Unfortunately, I don't think we had any follow-up discussions about recursive directory timestamps.
fanotify was delayed quite a bit, and we were told that there was nothing we can do to help and it was way too early to start experimenting with it for tracker. Now that it is in mainline, it would probably be easier to help out, but given that it took a long time for a kernel developer to get a subset of the planned features into mainline, I didn't attempt to work on the missing features myself so far.
[ And there are a couple of other things I'd like to see changed. For example, I am pretty sure that tracker's open() calls to files should not be considered accesses with regard to access time. O_NOATIME should be used here, which would reduce the amount of disk writes
We already use O_NOATIME in a few extractors; however, certain libraries don't make this very easy. Not even GIO allows opening a file for reading with O_NOATIME set, as far as I can tell. Also, I don't expect the amount of atime-related writes to be very high with relatime, but I haven't measured this and could be mistaken.
substantially. Also, tracker appears to BSD lock all files it accesses. That looks quite borked. Which other tool is it synchronizing against here? This looks insecure to do (because the files are often accessible to others), and since these locks are advisory only there needs to be a strict protocol followed by everybody else accessing these files, which I guarantee you there isn't since these are basically all the user's files. Moreover it appears tracker is mixing BSD and POSIX locks, which is dangerous due to ABBA, in particular when used on NFS directories, which will just end up in total chaos since Linux is so stupid to "upgrade" BSD locks on NFS shares to POSIX locks on the way. In any case you should NEVER EVER use POSIX locking, since it is completely borked anyway. The locking must go. I also see a massive amount of futex calls in strace, i.e. probably some mutexes thrown in the mix to make the locking problems even more interesting, which makes my fingernails roll up, since they apparently are congested all the time? ]
Can you please share your findings in a bug report? As far as I know, tracker itself doesn't use BSD or POSIX locks at all. SQLite is using file locks, but if there are issues with how SQLite is using locks, this should probably be discussed with SQLite upstream.
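For readers unfamiliar with the "advisory only" point in the quoted text, here is a minimal sketch of what it means in practice. It says nothing about what tracker or SQLite actually do; `read_despite_lock` is a hypothetical helper. It takes an exclusive BSD lock with flock() on one descriptor and then reads the same file through a second descriptor that never asks for the lock.

```python
import fcntl
import os


def read_despite_lock(path):
    """Hold an exclusive BSD lock on path and read it via a second descriptor.

    Returns (data, contended): the bytes read without taking the lock, and
    whether a second flock() attempt on the same file was refused.
    """
    holder = os.open(path, os.O_RDONLY)
    other = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(holder, fcntl.LOCK_EX)
        # A reader that ignores the locking protocol is not blocked at all.
        data = os.read(other, 4096)
        # Only a cooperating reader that also calls flock() notices the lock.
        try:
            fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
            contended = False
        except BlockingIOError:
            contended = True
        return data, contended
    finally:
        # Closing the descriptors also releases the lock.
        os.close(other)
        os.close(holder)
```

On a local Linux filesystem the read succeeds even though the lock is held, which is why such locks only help when every program touching the file follows the same protocol.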
As a side-note, I myself would like to see more radical changes in how user files will be stored in the future. Ideally, we would stop storing them in traditional directory hierarchies. Among other things, this would completely avoid the need for recursive directory monitoring, recursive directory timestamps, and crawling on startup. On the downside, this would require changes in many applications, although FUSE could certainly help provide a compatibility layer. If anyone is interested in discussing or working on this, let me know.
Regards, Jürg
On Tue, 20.09.11 10:46, Jürg Billeter (j@bitron.ch) wrote:
On Mon, 2011-09-19 at 18:00 +0200, Lennart Poettering wrote:
a) tracker uses inotify recursively and creates a massive number of watches due to that. That is both ugly and doesn't scale. Tracker apparently tries to not take up the full pool of inotify handles the system provides, but that won't help if you have more than one user on the system. The solution here should probably be fanotify, which allows proper recursive file system watches. So far fanotify has been accessible to root only, which is presumably why tracker doesn't use it. However, the solution here cannot be to work around that fact by using inotify, but must be to invest the necessary kernel work to make fanotify useful from unprivileged processes.
The current recursive inotify watches are definitely far from ideal. We were looking forward to fanotify entering mainline, however, last time I checked it was still missing too much functionality to even match inotify for our purposes. In addition to permission issues it did not support notification of directory events at all. If I remember correctly, this was all planned for future iterations but not considered a priority use case for fanotify. Does anyone know Eric's current plans?
Eric is an awesome, very responsive guy. You can catch him on IRC easily. Talk to him directly!
I'd really prefer if we could fix these fundamental issues before we enable tracker. To me it appears here as if we are trying to make the second step before the first.
I think you were in the room when we discussed these issues in Gran Canaria. At that point the plan was to wait for fanotify and try to convince filesystem developers of recursive directory timestamps. If I remember correctly, Matthew Garrett volunteered to talk to other kernel developers as a next step. Unfortunately, I don't think we had any follow-up discussions about recursive directory timestamps.
fanotify was delayed quite a bit, and we were told that there was nothing we can do to help and it was way too early to start experimenting with it for tracker. Now that it is in mainline, it would probably be easier to help out, but given that it took a long time for a kernel developer to get a subset of the planned features into mainline, I didn't attempt to work on the missing features myself so far.
GC was three years ago. A lot of time has been lost by not getting the kernel fixed.
[ And there are a couple of other things I'd like to see changed. For example, I am pretty sure that tracker's open() calls to files should not be considered accesses with regard to access time. O_NOATIME should be used here, which would reduce the amount of disk writes
We already use O_NOATIME in a few extractors; however, certain libraries don't make this very easy. Not even GIO allows opening a file for reading with O_NOATIME set, as far as I can tell. Also, I don't expect the amount of atime-related writes to be very high with relatime, but I haven't measured this and could be mistaken.
GIO and all the libs you are using are open source, so prepare patches! I am quite sure you even have commit access to gio, right?
substantially. Also, tracker appears to BSD lock all files it accesses. That looks quite borked. Which other tool is it synchronizing against here? This looks insecure to do (because the files are often accessible to others), and since these locks are advisory only there needs to be a strict protocol followed by everybody else accessing these files, which I guarantee you there isn't since these are basically all the user's files. Moreover it appears tracker is mixing BSD and POSIX locks, which is dangerous due to ABBA, in particular when used on NFS directories, which will just end up in total chaos since Linux is so stupid to "upgrade" BSD locks on NFS shares to POSIX locks on the way. In any case you should NEVER EVER use POSIX locking, since it is completely borked anyway. The locking must go. I also see a massive amount of futex calls in strace, i.e. probably some mutexes thrown in the mix to make the locking problems even more interesting, which makes my fingernails roll up, since they apparently are congested all the time? ]
Can you please share your findings in a bug report? As far as I know, tracker itself doesn't use BSD or POSIX locks at all. SQLite is using file locks, but if there are issues with how SQLite is using locks, this should probably be discussed with SQLite upstream.
My findings are basically a straightforward reading of "strace -p $(pidof tracker-miner-fs)", which I ran to track down that .desktop file looping issue I reported.
As a side-note, I myself would like to see more radical changes in how user files will be stored in the future. Ideally, we would stop storing them in traditional directory hierarchies. Among other things, this would completely avoid the need for recursive directory monitoring, recursive directory timestamps, and crawling on startup. On the downside, this would require changes in many applications, although FUSE could certainly help provide a compatibility layer. If anyone is interested in discussing or working on this, let me know.
Personally I must say I see big value in supporting all kinds of files, not just files generated by GNOME apps. For me personally, having tracker index my source code sounds like the most exciting use of tracker of all.
Lennart
On Tue, 2011-09-20 at 10:46 +0200, Jürg Billeter wrote:
fanotify was delayed quite a bit, and we were told that there was nothing we can do to help and it was way too early to start experimenting with it for tracker. Now that it is in mainline, it would probably be easier to help out, but given that it took a long time for a kernel developer to get a subset of the planned features into mainline, I didn't attempt to work on the missing features myself so far.
Looks like the 'wait for the kernel to spontaneously grow the features we need' approach is not working great here. You should make your case for what you want to see in fanotify, and push for it.
We already use O_NOATIME in a few extractors; however, certain libraries don't make this very easy. Not even GIO allows opening a file for reading with O_NOATIME set, as far as I can tell. Also, I don't expect the amount of atime-related writes to be very high with relatime, but I haven't measured this and could be mistaken.
Has a GIO feature request for this been filed?
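For reference, outside of GIO the flag under discussion is just an extra bit passed to open(2). The following is a minimal sketch, not tracker's extractor code; `read_without_atime_update` is a hypothetical name. On Linux the kernel refuses O_NOATIME with EPERM unless the caller owns the file or is privileged, so the sketch falls back to a normal open in that case.

```python
import os


def read_without_atime_update(path):
    """Read a file while asking the kernel not to update its access time."""
    noatime = getattr(os, "O_NOATIME", 0)  # Linux-only flag; 0 where it is missing
    try:
        fd = os.open(path, os.O_RDONLY | noatime)
    except PermissionError:
        # O_NOATIME is refused unless the caller owns the file or is privileged;
        # retry as a plain read, which may update atime.
        fd = os.open(path, os.O_RDONLY)
    with os.fdopen(fd, "rb") as handle:
        return handle.read()
```

An indexer reading only the user's own files would normally take the first path; the fallback matters for files owned by someone else but readable by the user.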
As a side-note, I myself would like to see more radical changes in how user files will be stored in the future. Ideally, we would stop storing them in traditional directory hierarchies. Among other things, this would completely avoid the need for recursive directory monitoring, recursive directory timestamps, and crawling on startup. On the downside, this would require changes in many applications, although FUSE could certainly help provide a compatibility layer. If anyone is interested in discussing or working on this, let me know.
I fear that this kind of 'radical' vision is not going to help make tracker successful in the short to medium term. I would really like to see tracker be successful in the use cases that it currently claims to cover before I would trust all my data to it... if ever.
Matthias
desktop@lists.stg.fedoraproject.org