While trying to recreate the mm2_crawler crash without the MirrorManager database as backend, I discovered that the crawler mainly uses Python's httplib to do all the HEAD requests. For the repomd.xml files, which are actually downloaded, the crawler switches to urlgrabber. Which seems to be problematic in threaded applications. Or in combination with httplib. Or something.
The easiest solution seems to be to rewrite the single urlgrabber.urlread() call to use one of the other available methods.
So, a question for the Python experts: which implementation is the "best" for downloading a single repomd.xml via either HTTP or FTP?
I would replace it with urllib2. Is that the correct replacement?
Adrian
On Fri, 10 Apr 2015 14:41:22 +0200 Adrian Reber adrian@lisas.de wrote:
> While trying to recreate the mm2_crawler crash without the MirrorManager database as backend I discovered that the crawler mainly uses python's httplib to do all the HEAD requests. For repomd.xml file, which are actually downloaded, the crawler switches to urlgrabber. Which seems to be problematic in threaded applications. Or in combination with httplib. Or something.
Ah. Great detective work!
> The easiest solution seems to be to rewrite the single urlgrabber.urlread() to use one of the other available methods.
> So a question to the python experts. Which implementation is the "best" to download a single repomd.xml via either http or ftp?
> I would replace it with urllib2. Is that the correct replacement?
I would think that, or python-requests? Not sure...
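For the python-requests alternative, a hedged sketch (function name and timeout are illustrative; one caveat worth noting is that requests only speaks HTTP(S), so ftp:// mirror URLs would still need urllib2 or similar):

```python
import requests  # third-party "python-requests" package

def urlread(url, timeout=30):
    """Fetch a single file (e.g. repomd.xml) over HTTP(S).

    requests creates a fresh session per get() call here, so there is
    no shared state between threads. It does NOT support ftp:// URLs.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn HTTP errors (404, 500, ...) into exceptions
    return response.content
```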
kevin