Source RPM size

Florian Weimer

7 Jul 2017

4:04 p.m.

Would it concern you if the source RPM size grew to about 110 MiB?

This data would not have to be uploaded with a “fedpkg new-sources” command for every upstream import, only once per Fedora release cycle.

Thanks, Florian

Florian Weimer

7 Mar 2018

6:46 a.m.

On 07/07/2017 11:04 PM, Florian Weimer wrote:

...

Would it concern you if the source RPM size grew to about 110 MiB?

This data would not have to be uploaded with a “fedpkg new-sources” command for every upstream import, only once per Fedora release cycle.

I don't think anyone has replied to this question so far.

We can probably bring it down a bit (to ~95 MiB or thereabouts), but it obviously will grow again.

The background of my question is whether we can ship the entire upstream Git history in the SRPM, and add a Git bundle on top of that with the Fedora changes (which can contain merges from the upstream release branch, among other things). There would not be any patch files anymore, just a baseline bundle (unchanged after the upstream release) and an updates bundle.

There would have to be a tool to create the updates bundle from the original (non-dist-git) Git repository. It would also extract the glibc.spec file from the source Git repository and synthesize a Release: entry and a bit of %changelog contents (listing only those commits since the last manual %changelog update in the committed glibc.spec file).
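A minimal sketch of the two-bundle layout, assuming a hypothetical upstream release tag `glibc-2.27` and a hypothetical Fedora branch `origin/f28`; the bundle file names and the function name are made up for illustration as well. It only assembles the `git bundle create` command lines and does not run them.

```python
from typing import List


def bundle_commands(release_tag: str, fedora_branch: str) -> List[List[str]]:
    """Return argv lists for creating the baseline and updates bundles.

    The tag, branch, and file names are illustrative assumptions,
    not names used by any existing Fedora tooling.
    """
    # Full upstream history up to the release; unchanged after the release.
    baseline = ["git", "bundle", "create", "baseline.bundle", release_tag]
    # The range excludes everything reachable from the release tag, so the
    # updates bundle carries only the commits added on top of it.
    updates = ["git", "bundle", "create", "updates.bundle",
               f"{release_tag}..{fedora_branch}"]
    return [baseline, updates]


if __name__ == "__main__":
    for argv in bundle_commands("glibc-2.27", "origin/f28"):
        print(" ".join(argv))
```

On the consuming side, one would presumably clone from the baseline bundle and then fetch from the updates bundle, since the second bundle depends on commits contained in the first.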

This is peripherally related to Carlos' ABI checking efforts. Once we follow that model, we could simply put the ABI definitions as plain XML files in a subdirectory in the non-dist-git repository. Changes to the ABI definitions could be reviewed alongside other changes to the repository.

Thanks, Florian

Carlos O'Donell

11:11 a.m.

On 03/07/2018 04:46 AM, Florian Weimer wrote:

...

On 07/07/2017 11:04 PM, Florian Weimer wrote:

...
Would it concern you if the source RPM size grew to about 110 MiB?

It would not concern me.

...

...
This data would not have to be uploaded with a “fedpkg new-sources” command for every upstream import, only once per Fedora release cycle.

OK.

...

I don't think anyone has replied to this question so far.

I must have missed this, sorry, or I would have responded.

What really matters to our users is the size of the binary rpms, so something like locale-archive growing by 80 MiB is a more serious issue. Thankfully we can combat that easily because we have split out the language packs, and glibc-all-langpacks is the only thing that grew, not the default glibc package.

...

We can probably bring it down a bit (to ~95 MiB or thereabouts), but it obviously will grow again.

The background of my question is whether we can ship the entire upstream Git history in the SRPM, and add a Git bundle on top of that with the Fedora changes (which can contain merges from the upstream release branch, among other things). There would not be any patch files anymore, just a baseline bundle (unchanged after the upstream release) and an updates bundle.

Is your idea to have a stand-alone SRPM that can be self-hosting, e.g. generate its own files without external tooling?

I encourage this kind of self-sufficient hosting of the SRPM; any non-self-hosting makes things harder for us and for users.

The alternative is that we have external tooling in glibc-maintainer-scripts which takes as input all of the source trees (in git) we need to regenerate the tree, right?

...

There would have to be a tool to create the updates bundle from the original (non-dist-git) Git repository. It would also extract the glibc.spec file from the source Git repository and synthesize a Release: entry and a bit of %changelog contents (listing only those commits since the last manual %changelog update in the committed glibc.spec file).

Sounds great!

...

This is peripherally related to Carlos' ABI checking efforts. Once we follow that model, we could simply put the ABI definitions as plain XML files in a subdirectory in the non-dist-git repository. Changes to the ABI definitions could be reviewed along other changes to it.

Would we create an upstream Fedora branch that is based on the glibc release, and then layer the ABI baselines there?

Then we would work directly on the upstream Fedora branch? Sounds great.

-- Cheers, Carlos.

Florian Weimer

9 Mar 2018

10:15 a.m.

On 03/07/2018 06:11 PM, Carlos O'Donell wrote:

...

On 03/07/2018 04:46 AM, Florian Weimer wrote:

...
On 07/07/2017 11:04 PM, Florian Weimer wrote:

...
Would it concern you if the source RPM size grew to about 110 MiB?

It would not concern me.

...
...
This data would not have to be uploaded with a “fedpkg new-sources” command for every upstream import, only once per Fedora release cycle.

OK.

...
I don't think anyone has replied to this question so far.

I must have missed this, sorry, or I would have responded.

What really matters to our users is the size of the binary rpms, and so things like locale-archive growing +80MiB is a more serious issue, but we thankfully can combat that easily because we have split the language packs and glibc-all-langpacks is the only thing that grew, not the default glibc package.

...
We can probably bring it down a bit (to ~95 MiB or thereabouts), but it obviously will grow again.

The background of my question is whether we can ship the entire upstream Git history in the SRPM, and add a Git bundle on top of that with the Fedora changes (which can contain merges from the upstream release branch, among other things). There would not be any patch files anymore, just a baseline bundle (unchanged after the upstream release) and an updates bundle.

Is your idea to have a stand-alone SRPM that can be self-hosting, e.g. generate its own files without external tooling?

The goal is to have the complete development history in the SRPM, so you could use it for future development if you wanted. I would keep the actual scripts external.

The source Git repository contains a glibc.spec file. All changes to the actual SRPM glibc.spec originate from there, with the help of the sync script.

The sync script would do something like this:

* Get the head commit from the base bundle in some way (we could put that into glibc.spec if anything else fails).

* Generate a new incremental bundle from that commit to the current HEAD (or more likely the matching branch under origin/, to encourage that changes have been pushed to the source repository).

* Enforce some syntax regarding commit messages (excluding content which comes in through merges from upstream release branches for Fedora). Downstream will probably reject merges altogether, except perhaps for hotfix branches.

* Find the last commit which changed the %changelog tail in glibc.spec.

* Synthesize an ever-increasing number on top of that (something like “git log --pretty=oneline | wc -l”) and use that to generate a release number which is strictly monotonically increasing.

* Summarize the commits since the last %changelog edit, maybe with the help of some headers in the new commits. Add this to the %changelog as a synthesized entry. (From the destination Git, this will patch the existing top-most entry.)

* Summarize the commits since the last sync commit to the destination repository and put that into the commit message, including bug numbers required to get past policy receive hooks which are present in some dist-git implementations, and finally commit to the destination dist-git repository.
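The release-number and %changelog steps above can be sketched as pure functions. This assumes the commit subjects since the last manual %changelog edit have already been collected (for example with `git log --pretty=%s` over that range). The function names, the `<base>.<count>` release layout, and the packager string are hypothetical, not part of any existing sync script.

```python
from typing import List


def synthesize_release(base_release: int, commits_since_edit: int) -> str:
    """Build a Release: value that grows with every new commit.

    commits_since_edit plays the role of the `git log | wc -l` count
    (`git rev-list --count` yields the same number). The
    "<base>.<count>%{?dist}" layout is an assumption for illustration.
    """
    if commits_since_edit < 0:
        raise ValueError("commit count cannot be negative")
    return f"{base_release}.{commits_since_edit}%{{?dist}}"


def synthesize_changelog(date: str, packager: str, version_release: str,
                         subjects: List[str]) -> str:
    """Format one %changelog entry listing the given commit subjects."""
    header = f"* {date} {packager} - {version_release}"
    items = [f"- {subject}" for subject in subjects]
    return "\n".join([header] + items)


if __name__ == "__main__":
    subjects = ["Fix foo", "Update bar"]  # placeholder commit subjects
    print("Release:", synthesize_release(12, len(subjects)))
    print(synthesize_changelog("Wed Mar 14 2018",
                               "Example Packager <packager@example.org>",
                               "2.27-12.2", subjects))
```

The count only stays monotonic as long as the base release is bumped whenever the %changelog tail is edited by hand, because the commit count restarts from that commit.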

It's not completely trivial to implement this, but it's probably of higher long-term value than any polishing of the existing two-way sync scripts I've developed. 8-/

...

I encourage this kind of self-sufficient hosting of the SRPM, any non-self-hosting makes things harder for us and users.

It's self-sufficient, perhaps except for the policy check against the upstream repository (for recognizing merge commits which do not need specially formatted commit messages).

...

The alternative is that we have external tooling in glibc-maintainer-scripts which takes as input all of the source trees (in git) we need to regenerate the tree right?

I would strongly suggest keeping that part of the tooling external, so that fixes become available immediately on all branches. At least in Fedora, the script could perhaps copy itself into the source RPM automatically, but it might not be entirely straightforward to recognize whether the local version is indeed newer than the version being replaced.

...

...
This is peripherally related to Carlos' ABI checking efforts. Once we follow that model, we could simply put the ABI definitions as plain XML files in a subdirectory in the non-dist-git repository. Changes to the ABI definitions could be reviewed along other changes to it.

Would we create an upstream Fedora branch that is based on the glibc release, and then layer the ABI baselines there?

Yes, we could have an abi/ directory tree with those files there. Any ABI change would also have to update the baseline files. That may not work so well for rawhide, though.

Thanks, Florian

Carlos O'Donell

12:33 p.m.

On 03/09/2018 10:15 AM, Florian Weimer wrote:

...

On 03/07/2018 06:11 PM, Carlos O'Donell wrote:

...
Would we create an upstream Fedora branch that is based on the glibc release, and then layer the ABI baselines there?

Yes, we could have an abi/ directory tree with those files there. Any ABI change would also have to update the baseline files. That may not work so well for rawhide, though.

What would not work well?

What are the "baseline" files?

If we went with upstream branches I would assume they would look like this:

master (glibc) -> master (rawhide)

Here master (rawhide) has an abi/ directory, but it isn't verified against.

When master (glibc) freezes we start doing ABI verification on rawhide spec files.

When we branch to f29, what do we do?

* Branch from release/2.28/master to create f29.

* Merge master (rawhide) changes into f29 branch.

* Update final abi/* files.

Then I assume we need not modify the master (rawhide) abi/ files as f29 begins to deviate if we add GLIBC_PRIVATE symbols. The branch from which the git-bundle is created needs to be changed.

Cheers, Carlos.

Florian Weimer

14 Mar 2018

10:23 a.m.

On 03/09/2018 07:33 PM, Carlos O'Donell wrote:

...

On 03/09/2018 10:15 AM, Florian Weimer wrote:

...
On 03/07/2018 06:11 PM, Carlos O'Donell wrote:

...
Would we create an upstream Fedora branch that is based on the glibc release, and then layer the ABI baselines there?

Yes, we could have an abi/ directory tree with those files there. Any ABI change would also have to update the baseline files. That may not work so well for rawhide, though.

What would not work well?

What are the "baseline" files?

If we went with upstream branches I would assume they would look like this:

master (glibc) -> master (rawhide)

Here master (rawhide) has an abi/ directory, but it isn't verified against.

When master (glibc) freezes we start doing ABI verification on rawhide spec files.

Oh, I thought you wanted to do continuous verification during development.

Then the problem is that if you merge from rawhide, you need to make ABI edits at the same time, and it won't be clear which changes in the merge commit come from the upstream changes, and which are adjustments to the downstream-only (internal) ABI made while resolving (semantic) merge conflicts.

Thanks, Florian

Carlos O'Donell

10:37 a.m.

On 03/14/2018 09:23 AM, Florian Weimer wrote:

...

On 03/09/2018 07:33 PM, Carlos O'Donell wrote:

...
On 03/09/2018 10:15 AM, Florian Weimer wrote:

...
On 03/07/2018 06:11 PM, Carlos O'Donell wrote:

...
Would we create an upstream Fedora branch that is based on the glibc release, and then layer the ABI baselines there?

Yes, we could have an abi/ directory tree with those files there. Any ABI change would also have to update the baseline files. That may not work so well for rawhide, though.

What would not work well?

What are the "baseline" files?

If we went with upstream branches I would assume they would look like this:

master (glibc) -> master (rawhide)

Here master (rawhide) has an abi/ directory, but it isn't verified against.

When master (glibc) freezes we start doing ABI verification on rawhide spec files.

Oh, I thought you wanted to do continuous verification during development.

I do. I didn't want to suggest it upfront though until we'd worked a bit more with verification at the individual release phases.

...

Then the problem is that if you merge from rawhide, you need to make ABI edits at the same time, and it won't be clear which changes in the merge commit come from the upstream changes, and which are due to adjustments of downstream-only (internal) ABI adjustments as the result of resolving (semantic) merge conflicts.

I don't follow. Could you expand on this?

When and what do we merge from rawhide?

Usually we branch from rawhide and then do not make any more changes to the branch.

Cheers, Carlos.

Florian Weimer

10:38 a.m.

On 03/14/2018 04:37 PM, Carlos O'Donell wrote:

...

On 03/14/2018 09:23 AM, Florian Weimer wrote:

...
On 03/09/2018 07:33 PM, Carlos O'Donell wrote:

...
On 03/09/2018 10:15 AM, Florian Weimer wrote:

...
On 03/07/2018 06:11 PM, Carlos O'Donell wrote:

...
Would we create an upstream Fedora branch that is based on the glibc release, and then layer the ABI baselines there?

Yes, we could have an abi/ directory tree with those files there. Any ABI change would also have to update the baseline files. That may not work so well for rawhide, though.

What would not work well?

What are the "baseline" files?

If we went with upstream branches I would assume they would look like this:

master (glibc) -> master (rawhide)

Here master (rawhide) has an abi/ directory, but it isn't verified against.

When master (glibc) freezes we start doing ABI verification on rawhide spec files.

Oh, I thought you wanted to do continuous verification during development.

I do. I didn't want to suggest it upfront though until we'd worked a bit more with verification at the individual release phases.

...
Then the problem is that if you merge from rawhide, you need to make ABI edits at the same time, and it won't be clear which changes in the merge commit come from the upstream changes, and which are due to adjustments of downstream-only (internal) ABI adjustments as the result of resolving (semantic) merge conflicts.

...

I don't follow. Could you expand on this?

When and what do we merge from rawhide?

Sorry, I meant to write “when we merge changes from upstream master into rawhide”.

Thanks, Florian

Carlos O'Donell

3:56 p.m.

On 03/14/2018 09:38 AM, Florian Weimer wrote:

...

...
...
Then the problem is that if you merge from rawhide, you need to make ABI edits at the same time, and it won't be clear which changes in the merge commit come from the upstream changes, and which are due to adjustments of downstream-only (internal) ABI adjustments as the result of resolving (semantic) merge conflicts.

Sorry, I meant to write “when we merge changes from upstream master into rawhide”.

I see two cases:

(a) Upstream has an abi/ directory.

(b) Upstream does not have an abi/ directory.

I also assume that we will be rebasing rawhide against upstream master, keeping our changes ahead of the upstream master.

In the case of (a), any downstream changes to the ABI file would be made in the same individual commits that made changes to code. So during the rebase you would have to consider just one commit at a time as you reapplied them to your new rebased position.

If they were just additions, we could simplify this by moving the additions to xi:include'd files that existed only in downstream.

I have filed this RFE https://sourceware.org/bugzilla/show_bug.cgi?id=22971
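To illustrate the XInclude idea only (this is not libabigail's actual schema or behavior, which is what the RFE asks about), here is a minimal sketch using Python's standard `xml.etree.ElementInclude`. The element names, the symbol names, and the `downstream.xml` file name are all hypothetical, and the documents are kept in memory instead of on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical baseline file: shared content plus one include line.
BASELINE = """<abi xmlns:xi="http://www.w3.org/2001/XInclude">
  <symbol name="malloc"/>
  <xi:include href="downstream.xml"/>
</abi>"""

# Hypothetical downstream-only file holding just the additions.
DOWNSTREAM = """<symbol name="__example_downstream_symbol"/>"""


def load(href, parse, encoding=None):
    """Resolve include targets from memory instead of the filesystem."""
    if href == "downstream.xml" and parse == "xml":
        return ET.fromstring(DOWNSTREAM)
    raise OSError(f"unknown include target: {href}")


def merged_symbols():
    """Expand xi:include elements and list all symbol names in order."""
    root = ET.fromstring(BASELINE)
    ElementInclude.include(root, loader=load)
    return [sym.get("name") for sym in root.iter("symbol")]


if __name__ == "__main__":
    print(merged_symbols())
```

With this layout a rebase would only touch the downstream-only file for downstream additions, leaving the shared baseline free of downstream edits apart from the include line itself.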

In the case of (b) we have several options. My favorite is just to consider abi/ of rawhide to be the point at which *we* start tracking ABI, and so nothing special needs to be done until we get our work upstream and we're in situation (a), using libabigail upstream to do our ABI tracking instead of our bespoke ad-hoc symbol lists.

Cheers, Carlos.

Florian Weimer

14 May 2018

9:26 a.m.

On 03/14/2018 09:56 PM, Carlos O'Donell wrote:

...

On 03/14/2018 09:38 AM, Florian Weimer wrote:

...
...
...
Then the problem is that if you merge from rawhide, you need to make ABI edits at the same time, and it won't be clear which changes in the merge commit come from the upstream changes, and which are due to adjustments of downstream-only (internal) ABI adjustments as the result of resolving (semantic) merge conflicts.

Sorry, I meant to write “when we merge changes from upstream master into rawhide”.

I see two cases:

(a) Upstream has an abi/ directory.

(b) Upstream does not have an abi/ directory.

I also assume that we will be rebasing rawhide against upstream master, keeping our changes ahead of the upstream master.

In the case of (a), any downstream changes to the ABI file would be made in the same individual commits that made changes to code. So during the rebase you would have to consider just one commit at a time as you reapplied them to your new rebased position.

If they were just additions, we could simplify this by moving the additions to xi:include'd files that existed only in downstream.

I have filed this RFE https://sourceware.org/bugzilla/show_bug.cgi?id=22971

In the case of (b) we have several options. My favorite is just to consider abi/ of rawhide to be the point at which *we* start tracking ABI, and so nothing special needs to be done until we get our work upstream and we're in situation (a), using libabigail upstream to do our ABI tracking instead of our bespoke ad-hoc symbol lists.

I find (b) rather burdensome because the ABI update has to happen as part of the merge commit. So you have to do the merge, do a build, figure out the ABI differences, apply that to the real merge, and then build again. This is not my idea of a good development process.

Thanks, Florian

Carlos O'Donell

9:38 a.m.

On 05/14/2018 10:26 AM, Florian Weimer wrote:

...

On 03/14/2018 09:56 PM, Carlos O'Donell wrote:

...
On 03/14/2018 09:38 AM, Florian Weimer wrote:

...
...
...
Then the problem is that if you merge from rawhide, you need to make ABI edits at the same time, and it won't be clear which changes in the merge commit come from the upstream changes, and which are due to adjustments of downstream-only (internal) ABI adjustments as the result of resolving (semantic) merge conflicts.

Sorry, I meant to write “when we merge changes from upstream master into rawhide”.

I see two cases:

(a) Upstream has an abi/ directory.

(b) Upstream does not have an abi/ directory.

I also assume that we will be rebasing rawhide against upstream master, keeping our changes ahead of the upstream master.

In the case of (a), any downstream changes to the ABI file would be made in the same individual commits that made changes to code. So during the rebase you would have to consider just one commit at a time as you reapplied them to your new rebased position.

If they were just additions, we could simplify this by moving the additions to xi:include'd files that existed only in downstream.

I have filed this RFE https://sourceware.org/bugzilla/show_bug.cgi?id=22971

In the case of (b) we have several options. My favorite is just to consider abi/ of rawhide to be the point at which *we* start tracking ABI, and so nothing special needs to be done until we get our work upstream and we're in situation (a), using libabigail upstream to do our ABI tracking instead of our bespoke ad-hoc symbol lists.

I find (b) rather burdensome because the ABI update has to happen as part of the merge commit. So you have to do the merge, do a build, figure out the ABI differences, apply that to the real merge, and then build again. This is not my idea of a good development process.

What is your baseline for the reference of "burdensome"?

What would you define as "good"?

The process you describe is largely mechanical, and easy enough to automate.

Today my process is like this:

(1) Do the merge.
(2) Scratch build.
(3) Verify scratch build.
(4) Commit.
(5) Do the final build.

The ABI baseline update and merge commit both happen at (4).

If you poll the rest of the glibc team I'd argue you find the rest of us are doing something very similar. I never omit (2) and (3), and if you're going to do a scratch build, then you just do this:

(1) Do the merge.
(2) Scratch build + ABI verify mode.
(3) Verify scratch build + Verify and update ABI.
(4) Commit.
(5) Do the final build.

The number of steps is the same.

Yes, our automated sync script omits (2) and (3), but it need not, and it would just be a mechanical process of waiting a bit longer on a sync to get the results, and slightly safer in the long run?

-- Cheers, Carlos.

Florian Weimer

9:50 a.m.

On 05/14/2018 04:38 PM, Carlos O'Donell wrote:

...

The process you describe is largely mechanical, and easy enough to automate.

If everything is mechanical and easy to automate, why is it that the only person regularly releasing Fedora updates is me?

...

Today my process is like this:

(1) Do the merge. (2) Scratch build. (3) Verify scratch build. (4) Commit. (5) Do the final build.

The ABI baseline update and merge commit both happen at (4).

Do you do a scratch build on all architectures? I don't think this is feasible because it turns every update into a multi-day effort.

...

If you poll the rest of the glibc team I'd argue you find the rest of us are doing something very similar. I never omit (2) and (3), and if you're going to do a scratch build, then you just do this:

(1) Do the merge. (2) Scratch build + ABI verify mode. (3) Verify scratch build + Verify and update ABI. (4) Commit. (5) Do the final build.

The number of steps is the same.

This would need something that dumps out the ABI differences in a base64 blob (similar to what we use to exfiltrate coredumps). Is this included in your patches? I didn't see it there.

I think what you suggest requires local builds, not scratch builds in Koji. This means that one needs to reserve a machine capable of building the relevant Fedora branch. This is currently difficult to automate for various reasons.

Thanks, Florian

Carlos O'Donell

10:38 a.m.

On 05/14/2018 10:50 AM, Florian Weimer wrote:

...

On 05/14/2018 04:38 PM, Carlos O'Donell wrote:

...
The process you describe is largely mechanical, and easy enough to automate.

If everything is mechanical and easy to automate, why is it that the only person regularly releasing Fedora updates is me?

My apologies, I did not intend to make it sound like this was a trivial process. In fact the merge is difficult.

Before you arrived on the team, we just didn't do updates, because we didn't have the resources. And I hope that with your scripts, and the rest of the team we can have a schedule that works so you don't have to do that :-)

...

...
Today my process is like this:

(1) Do the merge. (2) Scratch build. (3) Verify scratch build. (4) Commit. (5) Do the final build.

The ABI baseline update and merge commit both happen at (4).

Do you do a scratch build on all architectures? I don't think this is feasible because it turns every update into a multi-day effort.

I do a scratch build on all architectures all the time.

My experience is that this does not turn every update into a multi-day effort.

Yes, you have to wait, you kick it off early, and then go do something else.

...

...
If you poll the rest of the glibc team I'd argue you find the rest of us are doing something very similar. I never omit (2) and (3), and if you're going to do a scratch build, then you just do this:

(1) Do the merge. (2) Scratch build + ABI verify mode. (3) Verify scratch build + Verify and update ABI. (4) Commit. (5) Do the final build.

The number of steps is the same.

This would need something that dumps out the ABI differences in a base64 blob (similar to what we use to exfiltrate coredumps). Is this included in your patches? I didn't see it there.

Look at patch 3 (cleaned up and matches your suggestion about SONAME verification and storage for ABI files).

There are 3 binary toggles:

- Are we verifying ABI? If so, check it, and issue an error if we don't match.

- Are we in ABI warning mode only? If so, convert errors to warnings. Build continues.

- Are we in ABI saving mode? If so, collect the result of the ABI in a distinct file and store in the rpms.

Yes, you have to alter the spec file and issue a build with an altered spec file that is in "ABI saving mode + ABI warning mode", and you'll get the best results in terms of information about the build itself.
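How the three toggles might combine can be sketched as a small decision function. The names `AbiMode` and `abi_outcome` are hypothetical, and two points are assumptions not confirmed by the patches: warning mode only has an effect while verification is on, and a failed build saves nothing because no rpms are produced.

```python
from dataclasses import dataclass


@dataclass
class AbiMode:
    verify: bool     # compare the built ABI against the stored baseline
    warn_only: bool  # downgrade mismatches from errors to warnings
    save: bool       # store the observed ABI in the resulting rpms


def abi_outcome(mode: AbiMode, abi_matches: bool) -> dict:
    """Describe what a build would do for a given toggle combination."""
    mismatch = mode.verify and not abi_matches
    build_fails = mismatch and not mode.warn_only
    return {
        "build_fails": build_fails,
        "warning": mismatch and mode.warn_only,
        # Assumption: a failed build produces no rpms to store the ABI in.
        "abi_saved": mode.save and not build_fails,
    }


if __name__ == "__main__":
    # The "ABI saving mode + ABI warning mode" combination described above.
    print(abi_outcome(AbiMode(verify=True, warn_only=True, save=True),
                      abi_matches=False))
```

Under these assumptions the saving-plus-warning combination never fails the build and always leaves the observed ABI in the rpms, which matches the goal of getting the most information out of one scratch build.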

...

I think what you suggest requires local builds, not scratch builds in Koji. This means that one needs to reserve a machine capable of building the relevant Fedora branch. This is currently difficult to automate for various reasons.

I suggest scratch builds in koji, which can be automated and which anyone outside of Red Hat can do in Fedora.

What benefit do local builds have? And if we did do local builds, we don't need Fedora, just a machine that can run a Fedora mock build?

-- Cheers, Carlos.

Florian Weimer

15 May 2018

6:44 a.m.

I've decided to try a different approach first:

https://bugzilla.redhat.com/show_bug.cgi?id=1578348
https://pagure.io/releng/issue/7498

This would eliminate the need for manual use of “git archive”, “git bundle”, or whatever is used for generating the data that goes in the source RPM. If Fedora releng decides to support this, I suggest that we structure the new patch process around that, and not custom tooling.

(This is independent of where we store the ABI data and where we perform the validation.)

Thanks, Florian

Carlos O'Donell

7:20 a.m.

On 05/15/2018 07:44 AM, Florian Weimer wrote:

...

I've decided to try a different approach first:

https://bugzilla.redhat.com/show_bug.cgi?id=1578348
https://pagure.io/releng/issue/7498

This would eliminate the need for manual use of “git archive”, “git bundle”, or whatever is used for generating the data that goes in the source RPM. If Fedora releng decides to support this, I suggest that we structure the new patch process around that, and not custom tooling.

(This is independent of where we store the ABI data and where we perform the validation.)

Agreed! This looks like a good standardized alternative to your archive/bundle idea.

-- Cheers, Carlos.

glibc@lists.fedoraproject.org