This is a first attempt at checking binary rpm packages for consistency. Right now it mostly checks the rpm header, to get a feeling for how many "strange" binary rpm packages might be out there. It has two modes of checking: one with stricter checks for the current Fedora Development tree, and a more relaxed one that should work for all existing rpm packages, including those of other distributions.
I'd be interested to get feedback on what output is generated for rpm add-on repositories and non-Red Hat distributions, if the script generates warning messages. At least for Fedora Core, only very few rpm tags are actually used in the rpm header.
Example usage:
    ./pyrpm.py --strict /mirror/fedora/development/i386/Fedora/RPMS/*.rpm
Checking all rpms:
    locate .rpm | xargs ./pyrpm.py
    find /mirror/linux -name "*.rpm" -type f -print0 2>/dev/null | xargs -0 ./pyrpm.py
greetings,
Florian La Roche
On Saturday 29 January 2005 22:21, Florian La Roche wrote:
> I'd be interested to get feedback on what output is generated for rpm add-on repositories and non-Red Hat distributions, if the script generates warning messages.
For http://python.org/pyvault, only
    unknown packager: PyVault Repository http://python.org/pyvault
    unknown vendor: PyVault
when the --strict flag is used.
thanks,
On Sat, 29 Jan 2005, Florian La Roche wrote:
> This is a first attempt at checking binary rpm packages for consistency. Right now it mostly checks the rpm header, to get a feeling for how many "strange" binary rpm packages might be out there. It has two modes of checking: one with stricter checks for the current Fedora Development tree, and a more relaxed one that should work for all existing rpm packages, including those of other distributions.
> I'd be interested to get feedback on what output is generated for rpm add-on repositories and non-Red Hat distributions, if the script generates warning messages. At least for Fedora Core, only very few rpm tags are actually used in the rpm header.
> Example usage:
>     ./pyrpm.py --strict /mirror/fedora/development/i386/Fedora/RPMS/*.rpm
> Checking all rpms:
>     locate .rpm | xargs ./pyrpm.py
>     find /mirror/linux -name "*.rpm" -type f -print0 2>/dev/null | xargs -0 ./pyrpm.py
Hi Florian,
I ran it on about 28000 packages. The output was mostly unknown tag values:
    unknown distribution: Dag Apt Repository for Red Hat 7.3
    unknown packager: Dries Verachtert dries@ulyssis.org
    unknown vendor: Dag Apt Repository, http://dag.wieers.com/apt/
However it also triggered a problem:
ValueError: amavisd-new-milter-2.2.0-2.0.rh8.test.i386.rpm: wrong data in rpm lead
Traceback (most recent call last):
  File "./pyrpm.py", line 676, in ?
    verifyAllRpms()
  File "./pyrpm.py", line 657, in verifyAllRpms
    rpm = verifyRpm(a, legacy)
  File "./pyrpm.py", line 583, in verifyRpm
    if rpm.readHeader():
  File "./pyrpm.py", line 308, in readHeader
    self.parseLead(leaddata)
  File "./pyrpm.py", line 110, in parseLead
    self.raiseErr("wrong data in rpm lead")
  File "./pyrpm.py", line 59, in raiseErr
    raise ValueError, "%s: %s" % (self.filename, err)
on files like:
    perl-Tk-804.026-1.rhfc1.test.i386.rpm
    amavisd-new-2.2.0-2.0.rh8.test.i386.rpm
    xpde-0.4.0-1.1.fc2.test.i386.rpm
Fortunately all of these have been renamed files where the repotag has been changed to 'test'. Something I frequently do after a package didn't go through QA but was still worth distributing.
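The thread does not show what parseLead() actually checks. A minimal sketch follows, assuming the standard 96-byte rpm lead layout, and assuming (this is a guess, not something confirmed above) that renamed files fail because the name stored in the lead no longer agrees with the file name. The function names are hypothetical and are not taken from pyrpm.py.

```python
import os
import struct

LEAD_SIZE = 96
LEAD_MAGIC = b"\xed\xab\xee\xdb"
# magic, major, minor, type, archnum, name, osnum, signature_type, reserved
LEAD_FORMAT = "!4sBBhh66shh16s"


def parse_lead(data):
    """Unpack the 96-byte rpm lead; return (major, minor, type, name)."""
    if len(data) < LEAD_SIZE:
        raise ValueError("lead too short")
    fields = struct.unpack(LEAD_FORMAT, data[:LEAD_SIZE])
    magic, major, minor, rpmtype, _arch, name = fields[:6]
    if magic != LEAD_MAGIC:
        raise ValueError("bad lead magic")
    # The name field is NUL-padded to 66 bytes.
    return major, minor, rpmtype, name.split(b"\0", 1)[0].decode("latin-1")


def lead_matches_filename(lead_name, filename):
    """Hypothetical check: the file name should start with the lead's name."""
    return os.path.basename(filename).startswith(lead_name)
```

With a lead name such as "xpde-0.4.0-1.1.fc2.dag" (an invented example), a file renamed to xpde-0.4.0-1.1.fc2.test.i386.rpm would fail this comparison, which is one way a rename alone could produce a lead error.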
After a while, when it started with kernel-module packages, I got this:
ValueError: kernel-module-ov511-2.25-0_2.4.20_20.9.dag.rh90.i686.rpm: unknown prog: ['/sbin/depmod', '-ae']
Traceback (most recent call last):
  File "./pyrpm.py", line 676, in ?
    verifyAllRpms()
  File "./pyrpm.py", line 663, in verifyAllRpms
    rrpm = RRpm(rpm)
  File "./pyrpm.py", line 509, in __init__
    (self.post, self.postprog) = rpm.getScript("postin", "postinprog")
  File "./pyrpm.py", line 415, in getScript
    self.raiseErr("unknown prog: %s" % prog)
  File "./pyrpm.py", line 59, in raiseErr
    raise ValueError, "%s: %s" % (self.filename, err)
These messages are printed for each package. The command I ran was:
find /dar/packages/ -type f -name "*.rpm" | xargs -i ./pyrpm.py --strict '{}' ; | grep -vE 'unknown (packager|vendor|distribution)' | sort | uniq -c
I stopped it after a lot of these 'errors'. Is the traceback intentional?
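The filtering and counting stages of that pipeline (grep -vE, sort, uniq -c) can also be done in a few lines of Python. This is only a sketch: the summarize() helper is hypothetical, and unlike grep it also drops empty lines.

```python
import re
import sys
from collections import Counter

# Same pattern as the grep -vE expression in the command above.
NOISE = re.compile(r"unknown (packager|vendor|distribution)")


def summarize(lines):
    """Count distinct messages, skipping the expected 'unknown ...' warnings."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line and not NOISE.search(line):
            counts[line] += 1
    return counts


if __name__ == "__main__":
    # Print "count message" pairs, sorted by message like sort | uniq -c.
    for message, count in sorted(summarize(sys.stdin).items()):
        print("%7d %s" % (count, message))
```

It would read the checker's output on standard input, for example by piping the xargs stage into it.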
Kind regards,
--   dag wieers,  dag@wieers.com,  http://dag.wieers.com/   --
[all I want is a warm bed and a kind word and unlimited power]
Hello Dag,
I've copied a newer version to http://people.redhat.com/laroche/pyrpm/ and it can now also read the cpio data part of rpm packages and has several items cleaned up. I'd be interested to hear more feedback from python experts about possible improvements. ;-)
The "--strict" option should only be used for Fedora Core development branch.
> I ran it on about 28000 packages. The output was mostly unknown tag values:
>     unknown distribution: Dag Apt Repository for Red Hat 7.3
>     unknown packager: Dries Verachtert dries@ulyssis.org
>     unknown vendor: Dag Apt Repository, http://dag.wieers.com/apt/
Yep, the content checks under "--strict" are not useful outside FC-devel.
> However it also triggered a problem:
> ValueError: amavisd-new-milter-2.2.0-2.0.rh8.test.i386.rpm: wrong data in rpm lead
> Traceback (most recent call last):
>   File "./pyrpm.py", line 676, in ?
>     verifyAllRpms()
>   File "./pyrpm.py", line 657, in verifyAllRpms
>     rpm = verifyRpm(a, legacy)
>   File "./pyrpm.py", line 583, in verifyRpm
>     if rpm.readHeader():
>   File "./pyrpm.py", line 308, in readHeader
>     self.parseLead(leaddata)
>   File "./pyrpm.py", line 110, in parseLead
>     self.raiseErr("wrong data in rpm lead")
>   File "./pyrpm.py", line 59, in raiseErr
>     raise ValueError, "%s: %s" % (self.filename, err)
> on files like:
>     perl-Tk-804.026-1.rhfc1.test.i386.rpm
>     amavisd-new-2.2.0-2.0.rh8.test.i386.rpm
>     xpde-0.4.0-1.1.fc2.test.i386.rpm
> Fortunately all of these have been renamed files where the repotag has been changed to 'test'. Something I frequently do after a package didn't go through QA but was still worth distributing.
This should also not happen without the "--strict" option.
> After a while, when it started with kernel-module packages, I got this:
> ValueError: kernel-module-ov511-2.25-0_2.4.20_20.9.dag.rh90.i686.rpm: unknown prog: ['/sbin/depmod', '-ae']
> Traceback (most recent call last):
>   File "./pyrpm.py", line 676, in ?
>     verifyAllRpms()
>   File "./pyrpm.py", line 663, in verifyAllRpms
>     rrpm = RRpm(rpm)
>   File "./pyrpm.py", line 509, in __init__
>     (self.post, self.postprog) = rpm.getScript("postin", "postinprog")
>   File "./pyrpm.py", line 415, in getScript
>     self.raiseErr("unknown prog: %s" % prog)
>   File "./pyrpm.py", line 59, in raiseErr
>     raise ValueError, "%s: %s" % (self.filename, err)
> These messages are printed for each package. The command I ran was:
> find /dar/packages/ -type f -name "*.rpm" | xargs -i ./pyrpm.py --strict '{}' ; | grep -vE 'unknown (packager|vendor|distribution)' | sort | uniq -c
> I stopped it after a lot of these 'errors'. Is the traceback intentional?
I should change the tracebacks into errors only, so that you can still enable this option and look at the items we would not like to have in FC-devel.
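Turning tracebacks into plain error messages could look roughly like the sketch below, where problems are recorded per package instead of raised. The class, its method names and the KNOWN_PROGS set are all hypothetical and are not how pyrpm.py is actually structured.

```python
class PackageChecker:
    """Hypothetical checker that records problems instead of raising."""

    # Illustrative interpreter list only; pyrpm's real list is not shown.
    KNOWN_PROGS = {"/bin/sh", "/sbin/ldconfig"}

    def __init__(self, filename):
        self.filename = filename
        self.errors = []

    def error(self, message):
        # Same "filename: message" shape as the ValueError text above.
        self.errors.append("%s: %s" % (self.filename, message))

    def check_script_prog(self, prog):
        """Return True if the script interpreter is known, else log an error."""
        if prog and prog[0] not in self.KNOWN_PROGS:
            self.error("unknown prog: %s" % prog)
            return False
        return True
```

A driver loop could then print checker.errors for each file and continue with the next package, so one odd package no longer aborts the whole run.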
Thanks for running this test. Looks like the rpm parser is stable enough for existing packages and we mostly deal with the "noise" the parser is also checking right now.
greetings,
Florian La Roche
Florian La Roche wrote:
> Hello Dag,
> I've copied a newer version to http://people.redhat.com/laroche/pyrpm/ and it can now also read the cpio data part of rpm packages and has several items cleaned up. I'd be interested to hear more feedback from python experts about possible improvements. ;-)
If python rpm experts is your goal, try the rpm-python list please.
> Thanks for running this test. Looks like the rpm parser is stable enough for existing packages and we mostly deal with the "noise" the parser is also checking right now.
OK, so you can read *.rpm format in native python. Good, even though you are not verifying signatures or digests of what you are reading.
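As an illustration of what a digest check involves, here is a minimal sketch that compares the MD5 stored in the signature header with the MD5 of everything after it (main header plus payload). It assumes the usual layout: a 96-byte lead, then a signature header with an 8-byte-aligned data store, with the digest under signature tag 1004. The function names are hypothetical, and this covers neither signatures nor any other digest type.

```python
import hashlib
import struct

LEAD_SIZE = 96
HEADER_MAGIC = b"\x8e\xad\xe8\x01"
SIGTAG_MD5 = 1004  # 16-byte MD5 over main header + payload


def read_signature_md5(data):
    """Return (stored_md5_or_None, offset_where_main_header_starts)."""
    intro = data[LEAD_SIZE:LEAD_SIZE + 16]
    magic, _reserved, nindex, hsize = struct.unpack("!4s4sII", intro)
    if magic != HEADER_MAGIC:
        raise ValueError("bad signature header magic")
    index_start = LEAD_SIZE + 16
    store_start = index_start + 16 * nindex
    stored = None
    for i in range(nindex):
        entry = data[index_start + 16 * i:index_start + 16 * (i + 1)]
        tag, _type, offset, count = struct.unpack("!IIII", entry)
        if tag == SIGTAG_MD5:
            stored = data[store_start + offset:store_start + offset + count]
    # The signature store is padded to the next 8-byte boundary.
    end = store_start + hsize + (-hsize % 8)
    return stored, end


def verify_md5(data):
    """True if the stored MD5 matches the bytes after the signature header."""
    stored, start = read_signature_md5(data)
    return stored is not None and hashlib.md5(data[start:]).digest() == stored
```

The whole file is passed in as bytes here for brevity. A real checker would stream the payload rather than hold it in memory.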
At the content level, you are attempting explicit enums with limited value sets. You can do that, but if an explicit enum for certain tags is the goal, well, that is more easily arranged in rpmbuild rather than vetting every *.rpm package built in the wild.
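For reference, an "explicit enum" check of this kind amounts to comparing a few tag values against fixed sets, which produces the "unknown packager" and "unknown vendor" lines seen earlier in the thread. The sketch below is hypothetical: the allowed values are placeholders, since the lists used for Fedora are not given here.

```python
# Placeholder value sets; the real lists are an assumption, not from pyrpm.
ALLOWED = {
    "packager": {"Example Packager"},
    "vendor": {"Example Vendor"},
}


def check_enum_tags(tags, allowed=ALLOWED):
    """Return 'unknown <tag>: <value>' warnings for values outside the sets."""
    warnings = []
    for name in sorted(allowed):
        value = tags.get(name)
        if value is not None and value not in allowed[name]:
            warnings.append("unknown %s: %s" % (name, value))
    return warnings
```

The same comparison could run at build time, which is the point made above about arranging it in rpmbuild instead of vetting packages after the fact.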
At the semantic level, pyrpm is not even close to extracting and verifying immutable header regions, nor the sort properties of tag value sets.
And existing packages are really easy to vet. Try pkgs produced by rpm-3.0.4 and earlier for some real fun. %description and %verifyscript, exercise left for pyrpm.py.
Have fun!
73 de Jeff