It's been a number of weeks since my last update, so I thought I'd let everyone know where things are at.
I've spent most of these last few weeks reworking zchunk's API to make it easier to use and more in line with what other compression tools use, and I'm mostly happy with it now. Writing a simple zchunk file can be done in a few lines of code, while reading one is also simple.
I've also added zchunk support to createrepo_c (see https://github.com/jdieter/createrepo_c), but I haven't yet created a pull request because I'm not sure if my current implementation is the best method. My current effort only zchunks primary.xml, filelists.xml and other.xml and doesn't change the sort order.
The one area of zchunk that still needs some API work is the download and chunk merge API, and I'm planning to clean that up as I add zchunk support to librepo.
Some things I'd still like to add to zchunk:
- A python API
- GPG signatures in addition to (possibly replacing) overall data checksum
- An expiry field? (I'm obviously thinking about signed repodata here)
- Tests
- More tests
- Other arch testing (it's currently only tested on x86_64)
I'd welcome any feedback or flames.
Jonathan
On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter jdieter@gmail.com wrote:
It's been a number of weeks since my last update, so I thought I'd let everyone know where things are at.
I've spent most of these last few weeks reworking zchunk's API to make it easier to use and more in line with what other compression tools use, and I'm mostly happy with it now. Writing a simple zchunk file can be done in a few lines of code, while reading one is also simple.
I've also added zchunk support to createrepo_c (see https://github.com/jdieter/createrepo_c), but I haven't yet created a pull request because I'm not sure if my current implementation is the best method. My current effort only zchunks primary.xml, filelists.xml and other.xml and doesn't change the sort order.
Fedora COPR, Open Build Service, Mageia, and openSUSE also append AppStream data to repodata to ship AppStream information. Is there a way we can incorporate this into zck rpm-md? There's been an issue for a while to support generating the AppStream metadata as part of the createrepo_c run using the libappstream-builder library[1], which may lend itself to doing this properly.
[1]: https://github.com/rpm-software-management/createrepo_c/issues/75
The one area of zchunk that still needs some API work is the download and chunk merge API, and I'm planning to clean that up as I add zchunk support to librepo.
Some things I'd still like to add to zchunk:
- A python API
- GPG signatures in addition to (possibly replacing) overall data checksum
I'd rather not lose checksums, but GPG signatures would definitely be necessary, as openSUSE needs them, and we'd definitely like to have them in Fedora[2], COPR[3], and Mageia[4].
[2]: https://pagure.io/releng/issue/133 [3]: https://bugzilla.redhat.com/show_bug.cgi?id=1373331 [4]: https://bugs.mageia.org/show_bug.cgi?id=19432
- An expiry field? (I'm obviously thinking about signed repodata here)
Do we need an expiry field if we properly process key revocation/expiration in librepo? My understanding is that the current hiccup is that we don't, and that the GPG keyring used in librepo is independent of the RPM keyring (which it shouldn't be).
On Mon, 2018-04-16 at 09:00 -0400, Neal Gompa wrote:
On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter jdieter@gmail.com wrote:
I've also added zchunk support to createrepo_c (see https://github.com/jdieter/createrepo_c), but I haven't yet created a pull request because I'm not sure if my current implementation is the best method. My current effort only zchunks primary.xml, filelists.xml and other.xml and doesn't change the sort order.
Fedora COPR, Open Build Service, Mageia, and openSUSE also append AppStream data to repodata to ship AppStream information. Is there a way we can incorporate this into zck rpm-md? There's been an issue for a while to support generating the AppStream metadata as part of the createrepo_c run using the libappstream-builder library[1], which may lend itself to doing this properly.
Is it repomd.xml that actually gets changed or primary.xml / filelists.xml / other.xml?
If it's repomd.xml, then it really shouldn't make any difference because I'm not currently zchunking it. As far as I can see, the only reason to zchunk it would be to have an embedded GPG signature once they're supported in zchunk.
The one area of zchunk that still needs some API work is the download and chunk merge API, and I'm planning to clean that up as I add zchunk support to librepo.
Some things I'd still like to add to zchunk:
- A python API
- GPG signatures in addition to (possibly replacing) overall data checksum
I'd rather not lose checksums, but GPG signatures would definitely be necessary, as openSUSE needs them, and we'd definitely like to have them in Fedora[2], COPR[3], and Mageia[4].
Fair enough. Would we want zchunk to support multiple GPG signatures or is one enough?
- An expiry field? (I'm obviously thinking about signed repodata here)
Do we need an expiry field if we properly process key revocation/expiration in librepo? My understanding is that the current hiccup is that we don't, and that the GPG keyring used in librepo is independent of the RPM keyring (which it shouldn't be).
Ah, that makes sense. Forget that idea then.
Jonathan
On Mon, Apr 16, 2018 at 12:32 PM, Jonathan Dieter jdieter@gmail.com wrote:
On Mon, 2018-04-16 at 09:00 -0400, Neal Gompa wrote:
On Mon, Apr 16, 2018 at 8:47 AM, Jonathan Dieter jdieter@gmail.com wrote:
I've also added zchunk support to createrepo_c (see https://github.com/jdieter/createrepo_c), but I haven't yet created a pull request because I'm not sure if my current implementation is the best method. My current effort only zchunks primary.xml, filelists.xml and other.xml and doesn't change the sort order.
Fedora COPR, Open Build Service, Mageia, and openSUSE also append AppStream data to repodata to ship AppStream information. Is there a way we can incorporate this into zck rpm-md? There's been an issue for a while to support generating the AppStream metadata as part of the createrepo_c run using the libappstream-builder library[1], which may lend itself to doing this properly.
Is it repomd.xml that actually gets changed or primary.xml / filelists.xml / other.xml?
If it's repomd.xml, then it really shouldn't make any difference because I'm not currently zchunking it. As far as I can see, the only reason to zchunk it would be to have an embedded GPG signature once they're supported in zchunk.
repomd.xml is being changed, so it should be fine, then. It'd be nice to be able to chunk up AppStream data eventually, though.
The one area of zchunk that still needs some API work is the download and chunk merge API, and I'm planning to clean that up as I add zchunk support to librepo.
Some things I'd still like to add to zchunk:
- A python API
- GPG signatures in addition to (possibly replacing) overall data checksum
I'd rather not lose checksums, but GPG signatures would definitely be necessary, as openSUSE needs them, and we'd definitely like to have them in Fedora[2], COPR[3], and Mageia[4].
Fair enough. Would we want zchunk to support multiple GPG signatures or is one enough?
Historically, we've used only one GPG key because that's what we do with RPMs, but technically you can specify multiple keys in a .repo file for Yum, DNF, and Zypper to use for validating packages and metadata, so it's absolutely possible to have more. If it's not too difficult, I'd suggest supporting multiple signatures.
Hello Jonathan,
On Mon, Apr 16, 2018 at 2:47 PM, Jonathan Dieter jdieter@gmail.com wrote:
It's been a number of weeks since my last update, so I thought I'd let everyone know where things are at.
I've spent most of these last few weeks reworking zchunk's API to make it easier to use and more in line with what other compression tools use, and I'm mostly happy with it now. Writing a simple zchunk file can be done in a few lines of code, while reading one is also simple.
I've also added zchunk support to createrepo_c (see https://github.com/jdieter/createrepo_c), but I haven't yet created a pull request because I'm not sure if my current implementation is the best method. My current effort only zchunks primary.xml, filelists.xml and other.xml and doesn't change the sort order.
Once it is in createrepo_c, we could try employing it in Fedora COPR.
The one area of zchunk that still needs some API work is the download and chunk merge API, and I'm planning to clean that up as I add zchunk support to librepo.
Some things I'd still like to add to zchunk:
- A python API
- GPG signatures in addition to (possibly replacing) overall data checksum
- An expiry field? (I'm obviously thinking about signed repodata here)
- Tests
- More tests
- Other arch testing (it's currently only tested on x86_64)
I'd welcome any feedback or flames.
Jonathan
_______________________________________________
infrastructure mailing list -- infrastructure@lists.fedoraproject.org
To unsubscribe send an email to infrastructure-leave@lists.fedoraproject.org
On Tue, 2018-04-17 at 09:08 +0200, Michal Novotny wrote:
Hello Jonathan,
On Mon, Apr 16, 2018 at 2:47 PM, Jonathan Dieter jdieter@gmail.com wrote:
It's been a number of weeks since my last update, so I thought I'd let everyone know where things are at.
I've spent most of these last few weeks reworking zchunk's API to make it easier to use and more in line with what other compression tools use, and I'm mostly happy with it now. Writing a simple zchunk file can be done in a few lines of code, while reading one is also simple. I've also added zchunk support to createrepo_c (see https://github.com/jdieter/createrepo_c), but I haven't yet created a pull request because I'm not sure if my current implementation is the best method. My current effort only zchunks primary.xml, filelists.xml and other.xml and doesn't change the sort order.
Once it is in createrepo_c, we could try employing it in Fedora COPR.
Ok, done. This copr currently has zchunk and createrepo_c in it. I did have to disable the python tests for createrepo_c which means I probably wouldn't use the python bindings with this release.
https://copr.fedorainfracloud.org/coprs/jdieter/zchunk/
To enable zchunk creation, run createrepo_c --zck. I've created dictionaries that are appropriate for Fedora's metadata at https://www.jdieter.net/downloads/zchunk-dicts, and they can be used with --zck-primary-dict, --zck-filelists-dict and --zck-other-dict.
To make zchunk downloads efficient, the same dictionary must be used each time metadata is generated. Dictionaries aren't mandatory, but they greatly reduce the size of the compressed metadata.
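As a rough illustration of the invocation described above, here is a minimal Python sketch that assembles the command line. The flag names (--zck, --zck-primary-dict, --zck-filelists-dict, --zck-other-dict) come from this message; that each dict flag takes a file path as a separate argument, the .zdict file names, and the directory paths are all assumptions made for the example.

```python
import shlex


def build_createrepo_command(repo_dir, dict_dir=None):
    """Build an argv list for a zchunk-enabled createrepo_c run."""
    cmd = ["createrepo_c", "--zck"]
    if dict_dir is not None:
        # Assumption: each --zck-*-dict flag takes a dictionary path as its
        # value. The "<kind>.zdict" file names are placeholders only.
        for kind in ("primary", "filelists", "other"):
            cmd += [f"--zck-{kind}-dict", f"{dict_dir}/{kind}.zdict"]
    cmd.append(repo_dir)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; pass the same dict_dir on every run so that the
    # same dictionaries are used each time metadata is generated.
    print(shlex.join(build_createrepo_command("/srv/repo", "/srv/zchunk-dicts")))
```

The resulting list could be handed to subprocess.run(); keeping the dictionary directory fixed between runs is what keeps later downloads efficient.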
Jonathan
On Tue, Apr 17, 2018 at 4:20 PM, Jonathan Dieter jdieter@gmail.com wrote:
On Tue, 2018-04-17 at 09:08 +0200, Michal Novotny wrote:
Hello Jonathan,
On Mon, Apr 16, 2018 at 2:47 PM, Jonathan Dieter jdieter@gmail.com wrote:
It's been a number of weeks since my last update, so I thought I'd let everyone know where things are at.
I've spent most of these last few weeks reworking zchunk's API to make it easier to use and more in line with what other compression tools use, and I'm mostly happy with it now. Writing a simple zchunk file can be done in a few lines of code, while reading one is also simple. I've also added zchunk support to createrepo_c (see https://github.com/jdieter/createrepo_c), but I haven't yet created a pull request because I'm not sure if my current implementation is the best method. My current effort only zchunks primary.xml, filelists.xml and other.xml and doesn't change the sort order.
Once it is in createrepo_c, we could try employing it in Fedora COPR.
Ok, done. This copr currently has zchunk and createrepo_c in it. I did have to disable the python tests for createrepo_c which means I probably wouldn't use the python bindings with this release.
https://copr.fedorainfracloud.org/coprs/jdieter/zchunk/
To enable zchunk creation, run createrepo_c --zck. I've created dictionaries that are appropriate for Fedora's metadata at https://www.jdieter.net/downloads/zchunk-dicts, and they can be used with --zck-primary-dict, --zck-filelists-dict and --zck-other-dict.
To make zchunk downloads efficient, the same dictionary must be used each time metadata is generated. Dictionaries aren't mandatory, but they greatly reduce the size of the compressed metadata.
Alright, I will deploy it on staging. But we will need to get it into Fedora's DistGit first to be able to use it on the COPR production instance afterwards... Anyway, I'm looking forward to experimenting with it.
Thank you!
Jonathan
On Tue, 2018-04-17 at 17:39 +0200, Michal Novotny wrote:
On Tue, Apr 17, 2018 at 4:20 PM, Jonathan Dieter jdieter@gmail.com wrote:
On Tue, 2018-04-17 at 09:08 +0200, Michal Novotny wrote:
Hello Jonathan,
Once it is in createrepo_c, we could try employing it in Fedora COPR.
Ok, done. This copr currently has zchunk and createrepo_c in it. I did have to disable the python tests for createrepo_c which means I probably wouldn't use the python bindings with this release.
https://copr.fedorainfracloud.org/coprs/jdieter/zchunk/
To enable zchunk creation, run createrepo_c --zck. I've created dictionaries that are appropriate for Fedora's metadata at https://www.jdieter.net/downloads/zchunk-dicts, and they can be used with --zck-primary-dict, --zck-filelists-dict and --zck-other-dict.
To make zchunk downloads efficient, the same dictionary must be used each time metadata is generated. Dictionaries aren't mandatory, but they greatly reduce the size of the compressed metadata.
Alright, I will deploy it on staging. But we will need to get it into Fedora's DistGit first to be able to use it on COPR production instance afterwards... Anyway, looking forward to start experimenting with it.
Thank you!
I'm assuming that you're referring here to getting zchunk packaged into Fedora. I'd really like to finalize the file format (we're close, but I still need a good way of storing signatures in it) and the download API before releasing it into Fedora proper.
I would recommend using the dicts mentioned above as they give me over 40% space savings for both other.xml.zck and primary.xml.zck. Do please let me know if you run into any problems.
Thanks, Jonathan
On Tue, Apr 17, 2018 at 3:05 PM, Jonathan Dieter jdieter@gmail.com wrote:
On Tue, 2018-04-17 at 17:39 +0200, Michal Novotny wrote:
On Tue, Apr 17, 2018 at 4:20 PM, Jonathan Dieter jdieter@gmail.com wrote:
On Tue, 2018-04-17 at 09:08 +0200, Michal Novotny wrote:
Hello Jonathan,
Once it is in createrepo_c, we could try employing it in Fedora COPR.
Ok, done. This copr currently has zchunk and createrepo_c in it. I did have to disable the python tests for createrepo_c which means I probably wouldn't use the python bindings with this release.
https://copr.fedorainfracloud.org/coprs/jdieter/zchunk/
To enable zchunk creation, run createrepo_c --zck. I've created dictionaries that are appropriate for Fedora's metadata at https://www.jdieter.net/downloads/zchunk-dicts, and they can be used with --zck-primary-dict, --zck-filelists-dict and --zck-other-dict.
To make zchunk downloads efficient, the same dictionary must be used each time metadata is generated. Dictionaries aren't mandatory, but they greatly reduce the size of the compressed metadata.
Alright, I will deploy it on staging. But we will need to get it into Fedora's DistGit first to be able to use it on COPR production instance afterwards... Anyway, looking forward to start experimenting with it.
Thank you!
I'm assuming that you're referring here to getting zchunk packaged into Fedora. I'd really like to finalize the file format (we're close, but I still need a good way of storing signatures in it) and the download API before releasing it into Fedora proper.
I'm looking forward to this!
I would recommend using the dicts mentioned above as they give me over 40% space savings for both other.xml.zck and primary.xml.zck. Do please let me know if you run into any problems.
Are those dictionaries Fedora specific? If so, how can other distributions generate similar ones? If not, still, how were they made? :)
On Mon, 2018-04-23 at 00:27 -0400, Neal Gompa wrote:
On Tue, Apr 17, 2018 at 3:05 PM, Jonathan Dieter jdieter@gmail.com wrote:
I'm assuming that you're referring here to getting zchunk packaged into Fedora. I'd really like to finalize the file format (we're close, but I still need a good way of storing signatures in it) and the download API before releasing it into Fedora proper.
I'm looking forward to this!
I've updated the file format to allow for multiple signatures, updated the zchunk code to recognize the existence of a signature (while still not checking it), and have released it as zchunk-0.3.0 in COPR. I've also added 32 bits of flags that we can use to extend the format in a backwards-compatible way.
The current zchunk format description is at: https://github.com/jdieter/zchunk/blob/master/zchunk_format.txt
I would recommend using the dicts mentioned above as they give me over 40% space savings for both other.xml.zck and primary.xml.zck. Do please let me know if you run into any problems.
Are those dictionaries Fedora specific? If so, how can other distributions generate similar ones? If not, still, how were they made? :)
They were generated from Fedora metadata, but they should help with any distribution's repodata. I generated them by splitting a few days' worth of metadata along package boundaries, stripping out any checksums, and then running zstd --train * on the directory containing the split metadata. The script I used is available at https://www.jdieter.net/downloads/zchunk-dicts/split.py, and I hope to write up proper instructions at some point.
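The splitting step described above could look roughly like the following Python sketch. It is not the split.py linked above, just an illustration under assumptions: packages are delimited by <package>...</package> elements, checksums appear as <checksum> elements, and the helper names and sample file naming are made up for the example.

```python
import re
from pathlib import Path

# Matches one <package ...>...</package> element; DOTALL lets "." span newlines.
PACKAGE_RE = re.compile(r"<package\b.*?</package>", re.DOTALL)
# Matches a <checksum ...>...</checksum> element so it can be dropped.
CHECKSUM_RE = re.compile(r"<checksum\b[^>]*>.*?</checksum>", re.DOTALL)


def split_packages(xml_text):
    """Return one string per <package> element, with checksum elements removed."""
    return [CHECKSUM_RE.sub("", m.group(0)) for m in PACKAGE_RE.finditer(xml_text)]


def write_samples(xml_text, out_dir):
    """Write each stripped package to its own file and return the file count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts = split_packages(xml_text)
    for index, part in enumerate(parts):
        # Hypothetical naming scheme: one numbered sample file per package.
        (out / f"package-{index:06d}.xml").write_text(part, encoding="utf-8")
    return len(parts)
```

After write_samples() has filled a directory with samples from a few days of metadata, running zstd --train * in that directory, as described above, would produce the dictionary. Checksums are stripped because they are effectively random and would only add noise to the training data.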
Jonathan
I've released zchunk-0.4.0 which has the last (hopefully) backwards-incompatible file format change. Files created by zchunk < 0.4.0 will be unreadable by 0.4.0+.
Zchunk 0.4.0 now has four bytes of flags, so, barring any bone-headed disasters in the file format, any further file format changes will be backwards-compatible.
The latest release is available here: https://github.com/jdieter/zchunk/archive/0.4.0.tar.gz
The file format is documented here: https://github.com/jdieter/zchunk/blob/master/zchunk_format.txt
A copr with the latest release (and zchunk-enabled createrepo_c) is here: https://copr.fedorainfracloud.org/coprs/jdieter/zchunk
My next step is to add zchunk support to librepo.
A quick summary of the features I wanted to add:
On Mon, 2018-04-16 at 15:47 +0300, Jonathan Dieter wrote:
- A python API
Still needs to be done.
- GPG signatures in addition to (possibly replacing) overall data checksum
Signatures have now been added to the file format in addition to the overall checksum. The current implementation can't actually read or add a signature, though.
- An expiry field? (I'm obviously thinking about signed repodata here)
As per feedback, this isn't necessary.
- Tests
- More tests
The framework is in place for this, and I have added a single test case. More to come.
- Other arch testing (it's currently only tested on x86_64)
I've built and tested on ARM, ppc64le, i686 and x86_64 and everything seems to be working just fine. I have not yet tested on aarch64.
Jonathan