Hey, I think we might have found a bug in the mirror crawler where it did not do the repomd sha256sum check if a mirror is checked via FTP. I think the crawler might still need a good bit of cleanup apart from this, but here is an initial attempt at a patch to fix this:
http://ricky.fedorapeople.org/mirrormanager/0001-Check-sha256sum-of-repomd.x...
I did some testing of this against fedora.bu.edu on bapp1 and it seemed to mark the F10/F11 repodata directories outdated as expected.
Given the mirror issues that we've been having recently, it might be worth considering live patching this until the next mirrormanager release.
Thanks, Ricky
On 2009-07-20 12:13:41 AM, Ricky Zhou wrote:
Hey, I think we might have found a bug in the mirror crawler where it did not do the repomd sha256sum check if a mirror is checked via FTP. I think the crawler might still need a good bit of cleanup apart from this, but here is an initial attempt at a patch to fix this:
http://ricky.fedorapeople.org/mirrormanager/0001-Check-sha256sum-of-repomd.x...
I just took a closer look at this with Matt, and it turns out that my extra code in this patch shouldn't be necessary (and in fact, doesn't seem to run at all). I'm going to look at testing this more on another outdated site.
Thanks, Ricky
On 2009-07-20 12:28:34 AM, Ricky Zhou wrote:
I just took a closer look at this with Matt, and it turns out that my extra code in this patch shouldn't be necessary (and in fact, doesn't seem to run at all). I'm going to look at testing this more on another outdated site.
Hi, Matt and I just spoke on IRC more, and I think we have a slightly better idea of the issue now. I think mirrormanager returning outdated mirrors might have actually been related to the mounts issue as well.
One issue that we realized was that for F10 updates, yum uses a URL similar to
http://mirrors.fedoraproject.org/mirrorlist?repo=updates-released-f$releasev...
to find the directory to fetch repodata from. However, this returns the path to pub/fedora.redhat/linux/updates/10/x86_64 on mirrors. That directory may be marked up to date (mirrormanager only checks the 10 newest files in it, and the recent timestamp issues may have made this test unreliable) even when pub/fedora.redhat/linux/updates/10/x86_64/repodata is not.
In the case of the bu mirror, we found that pub/fedora.redhat/linux/updates/10/x86_64/repodata was properly marked outdated, but pub/fedora.redhat/linux/updates/10/x86_64 was not.
Another issue that Matt mentioned is that report_mirror will tell mirrormanager to mark any directory that the site claims to have as up2date, the idea being that mirrors run rsync && report_mirror. This does seem to cause problems during mass mirror issues like this one, though, and Matt also brought up the issue that some mirrors may run report_mirror even if the rsync fails.
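As a rough illustration of the "rsync && report_mirror" idea, here is a minimal Python sketch that only reports when the sync exits cleanly. The function name sync_then_report and the injectable run parameter are made up for this example; this is not how report_mirror itself is invoked by any particular mirror.

```python
import subprocess


def sync_then_report(sync_cmd, report_cmd, run=subprocess.call):
    """Run report_cmd only if sync_cmd exits with status 0.

    Hypothetical helper: `run` takes an argument list and returns the
    exit status, which is what subprocess.call does.
    """
    if run(sync_cmd) != 0:
        # The sync failed, so the local tree may be stale: do not report.
        return False
    return run(report_cmd) == 0


# Example call (the rsync source and destination are placeholders):
# sync_then_report(["rsync", "-a", SOURCE, DEST], ["report_mirror"])
```

A cron job that joins the two commands with ";" instead of "&&" would behave like calling both unconditionally, which is the failure mode described above.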
Some issues/responses we discussed:
1) For the first issue, we need to mark an entire repository outdated if the repodata is outdated. This should start happening properly as well now that the timestamps issue is fixed, although we can do this explicitly in the code as well.
2) MirrorManager currently doesn't check timestamps, and the solution to this isn't trivial, especially with FTP, which returns directory listing data as just the text of the output. This is almost impossible to parse accurately, especially when time zones are involved and time zone data isn't even returned by FTP.
3) Perhaps it would be good to change some behavior with report_mirror. Right now, when public mirrors run it, it gives the benefit of starting to send traffic to the mirror as soon as possible after syncing, but in situations like the current one, this behavior can lead to outdated mirrors being marked up2date in MirrorManager.
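To make point 2 concrete, here is a small sketch of parsing the date fields of a Unix-style FTP listing line. It assumes the common "ls -l" layout, which FTP servers are not required to use, and the sample lines and the function name parse_list_timestamp are invented for illustration. Recent files show a time of day but no year, older files show a year but no time, and neither form carries a time zone.

```python
from datetime import datetime


def parse_list_timestamp(line, assumed_year):
    """Return (naive datetime, caveat) for one Unix-style listing line.

    Assumes nine whitespace-separated fields with the date in fields
    6 to 8, and English month abbreviations (C locale).
    """
    fields = line.split(None, 8)
    month, day, year_or_time = fields[5], fields[6], fields[7]
    if ":" in year_or_time:
        # Recent file: "Jul 19 22:41" has no year, so the caller must guess.
        stamp = datetime.strptime(
            "%s %s %s %s" % (assumed_year, month, day, year_or_time),
            "%Y %b %d %H:%M",
        )
        return stamp, "year guessed, no time zone"
    # Older file: "Nov 25  2008" has a year but no time of day.
    stamp = datetime.strptime(
        "%s %s %s" % (year_or_time, month, day), "%Y %b %d"
    )
    return stamp, "no time of day, no time zone"


recent = "-rw-r--r--    1 ftp      ftp          2375 Jul 19 22:41 repomd.xml"
older = "-rw-r--r--    1 ftp      ftp          2375 Nov 25  2008 repomd.xml"
print(parse_list_timestamp(recent, 2009))
print(parse_list_timestamp(older, 2009))
```

Even when the parse succeeds, the result is a naive local time on the server, so comparing it against the master's timestamps can be off by hours.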
Thanks, Ricky
On Mon, Jul 20, 2009 at 01:27:11 -0400, Ricky Zhou ricky@fedoraproject.org wrote:
- MirrorManager currently doesn't check timestamps, and the solution to this isn't trivial, especially with FTP, which returns directory listing data as just the text of the output. This is almost impossible to parse accurately, especially when time zones are involved and time zone data isn't even returned by FTP.
Maybe you could check a hash of the repomd.xml file? You shouldn't have to track too many different hashes.
On 2009-07-20 09:34:47 AM, Bruno Wolff III wrote:
- MirrorManager currently doesn't check timestamps, and the solution to this isn't trivial, especially with FTP, which returns directory listing data as just the text of the output. This is almost impossible to parse accurately, especially when time zones are involved and time zone data isn't even returned by FTP.
Maybe you could check a hash of the repomd.xml file? You shouldn't have to track too many different hashes.
For what it's worth, this hash checking already happens on repomd.xml files for mirrors that are crawled via HTTP, and my patch added that check to FTP mirrors as well.
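For readers unfamiliar with the check, a minimal sketch of comparing a mirror's repomd.xml against known-good SHA-256 digests follows. The function name repomd_matches and the idea of passing in a small set of acceptable digests are assumptions made for this example; it is not the crawler's actual code.

```python
import hashlib


def repomd_matches(mirror_repomd_bytes, known_sha256_digests):
    """Return True if the mirror's repomd.xml hashes to a known digest.

    known_sha256_digests is a set of hex digests taken from the master
    copy (hypothetical input; only a few recent ones should be needed).
    """
    digest = hashlib.sha256(mirror_repomd_bytes).hexdigest()
    return digest in known_sha256_digests


# Usage with stand-in content rather than a real repomd.xml:
master = b"<repomd>current</repomd>"
known = {hashlib.sha256(master).hexdigest()}
print(repomd_matches(master, known))
print(repomd_matches(b"<repomd>stale</repomd>", known))
```

Because the comparison works on file contents, it sidesteps the timestamp and time zone problems with FTP listings entirely.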
When talking on IRC with Matt, we realized that the check shouldn't be necessary at all though, since the other files in the repodata are successfully getting the repodata directory marked outdated (and we did confirm that this was happening with the last bu mirror crawl).
Overall, I think the crawling has been working fine even without the timestamp checking (apart from some issues caused by the timestamp problem we recently saw). I just wanted to mention why that was currently disabled.
As another side note, mirrormanager is currently aware of what directories are repositories:
< mdomsch> sure
< mdomsch> so, MM does know that that dir is a repository
< mdomsch> class Directory: repository = SingleJoin('Repository')
< mdomsch> but the crawler doesn't do anything special with that knowledge
< mdomsch> perhaps it should
< mdomsch> by definition, a Repository is a Directory that has a child directory named 'repodata'
< mdomsch> but the whole directory tree starting at that Directory down, is part of the repository
So all of the framework should be in place for marking an entire repository out of date if the repodata is out of date.
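A rough sketch of what that marking could look like, using a plain dict of path to up2date flag rather than MirrorManager's real Directory and Repository classes (the dict layout and the function name mark_repository_outdated are assumptions for illustration only):

```python
import posixpath


def mark_repository_outdated(up2date, repodata_dir):
    """Mark the repository containing repodata_dir, and everything
    under it, as outdated in the given {path: bool} mapping.

    Returns the repository root, i.e. the parent of 'repodata'.
    """
    repo_root = posixpath.dirname(repodata_dir.rstrip("/"))
    prefix = repo_root + "/"
    for path in up2date:
        # Only values change here, so iterating over the dict is safe.
        if path == repo_root or path.startswith(prefix):
            up2date[path] = False
    return repo_root


status = {
    "pub/fedora.redhat/linux/updates/10/x86_64": True,
    "pub/fedora.redhat/linux/updates/10/x86_64/repodata": False,
    "pub/fedora.redhat/linux/updates/10/x86_64/debug": True,
    "pub/fedora.redhat/linux/updates/10/i386": True,
}
mark_repository_outdated(
    status, "pub/fedora.redhat/linux/updates/10/x86_64/repodata"
)
print(status)
```

With this, the x86_64 directory that yum is pointed at would be marked outdated along with its repodata, while the sibling i386 repository is left untouched.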
Thanks, Ricky
infrastructure@lists.fedoraproject.org