This has also been discussed in a separate thread, but I thought I would place it in a separate post in case people didn't notice.
I'm proposing the SourceURL Guideline be changed as follows:
1. Change the GitHub section to reflect Git Hosting Services, so as not to tie the guideline to a specific service
2. Include information on how to handle Git Submodules
3. Clarify when commit hash vs. Git Tag should be used
Here is the Draft document https://fedoraproject.org/wiki/User:Gbcox/PackagingDrafts/SourceURL.
Here is the diff of the draft to the original version https://fedoraproject.org/w/index.php?title=User%3AGbcox%2FPackagingDrafts%2FSourceURL&diff=416463&oldid=416169 .
Thank you in advance for your comments.
On 26/06/2015 05:45, Gerald B. Cox wrote:
This has also been discussed in a separate thread, but I thought I would place it in a separate post in case people didn't notice.
I'm proposing the SourceURL Guideline be changed as follows:
- Change GitHub section to reflect Git Hosting Services, so as not to tie the guideline to a specific service
- Include information on how to handle Git Submodules
- Clarify when commit hash vs. Git Tag should be used
Here is the Draft document https://fedoraproject.org/wiki/User:Gbcox/PackagingDrafts/SourceURL.
I strongly believe the "commit hash" method should be preferred as in the current Guidelines (and obviously not a "last solution")
About the "Git Submodules", I think this is directly conflicting with the "no bundled" Guidelines.
If a package uses a submodule, this means it uses another project, which has to be packaged separately and then used.
Example:
https://github.com/10gen-labs/mongo-php-driver-prototype/tree/master/src => dependency on libbson + libmongc
https://github.com/mongodb/mongo-c-driver/tree/master/src => dependency on libbson etc...
(Also, the tar command should use --exclude .git, and the chmod makes no sense.)
Remi.
Remi,
Thanks so much for taking the time to comment.
On Fri, Jun 26, 2015 at 7:44 AM, Remi Collet Fedora@famillecollet.com wrote:
I strongly believe the "commit hash" method should be preferred as in the current Guidelines (and obviously not a "last solution")
The order that a method appears doesn't denote a preference in and of itself. Please help me understand something. Why are you so concerned about the use of Git Tags? I have included text which clearly states that if the packager believes that re-tagging is being used, he MUST follow a specific procedure to resolve that issue. If there is a problem with the archive the checksum won't match. What is the harm if we later find that upstream did re-tag? The archive with the embedded commit information is already in the srpm. The act of re-tagging can't change that. We always know the commit hash version of that archive.
About the "Git Submodules", I think this is directly conflicting with the "no bundled" Guidelines.
The fact that upstream uses the submodule capability doesn't mean that the code within that submodule is a bundled library. If it is found to be, then of course the "No Bundled Libraries" guideline would apply.
On 26/06/2015 17:47, Gerald B. Cox wrote:
Remi,
Thanks so much for taking the time to comment.
On Fri, Jun 26, 2015 at 7:44 AM, Remi Collet Fedora@famillecollet.com wrote:
I strongly believe the "commit hash" method should be preferred as in the current Guidelines (and obviously not a "last solution")
The order that a method appears doesn't denote a preference in and of itself. Please help me understand something. Why are you so concerned about the use of Git Tags? I have included text which clearly states that if the packager believes that re-tagging is being used, he MUST follow a specific procedure to resolve that issue.
This is exactly what I named "last solution".
Your proposal switches the Guidelines from "must use hash commit"
to "could use hash commit if no other solution exists, but really you should try something else before"
I could understand a small relaxation (s/must/should/), but I can't agree with such a big change.
Moreover, I have ~100 packages which use the commit hash. It is a really clean approach, and I have never encountered any issue.
It is also very simple to retrieve. For example, for Packagist-registered packages the commit hash is the reference; see e.g. https://packagist.org/packages/phpunit/phpunit
And it is very simple to run a test build (pre- or post-release): just bump the commit value. I do it every day, often because an upstream asks me to run a test build before a release.
Furthermore, I have a lot of packages which use the "pecl" upstream archive, and I'm slowly moving them to the GitHub commit hash solution. Yes, instead of the upstream archive, because in some cases the archive doesn't include the test suite, e.g. php-pecl-mongo.
And because upstream archives are often a mess, e.g. http://news.php.net/php.pecl.dev/13044
I even have to use a "git snapshot" in some cases (this seems to be the new way to exclude tests from the archive, using .gitattributes, damned composer world...), e.g. https://github.com/Andrewsville/PHP-Token-Reflection/issues/68
So, from my past experience, I definitely endorse the current Guidelines as the correct ones.
"For a number of reasons (immutability, availability, uniqueness), you must use the full commit revision hash when referring to the sources."
Cheers, Remi.
On Fri, Jun 26, 2015 at 9:18 AM, Remi Collet Fedora@famillecollet.com wrote:
This is exactly what I named "last solution".
Your proposal switch the Guidelines from "must use hash commit"
to "could use hash commit if no other solution exists, but really you should try something else before"
First of all, the current guidelines do not say that you must use commit hash, and my text doesn't at all say what you're implying. You're reading in something that simply isn't there.
It would be helpful if you could answer the questions I previously asked. Here they are again:
Why are you so concerned about the use of Git Tags? I have included text which clearly states that if the packager believes that re-tagging is being used, he MUST follow a specific procedure to resolve that issue.
If there is a problem with the archive the checksum won't match. The archive with the embedded commit information is already in the srpm. The act of re-tagging can't change that. We always know the commit hash version of that archive. What is the harm if we later find that upstream did re-tag?
On 06/26/2015 11:18 AM, Remi Collet wrote:
And because upstream archive often are a mess, e.g. http://news.php.net/php.pecl.dev/13044
This is no different than upstream releasing tarballs to an FTP or HTTP directory with the same version yet different contents. Are you suggesting we switch every Fedora package to use a SCM commit ID instead of a release tarball? This is what I see you suggesting.
Banning the use of a legitimate form of release tarball generation because of a handful of offenders is hardly logical.
On Fri, Jun 26, 2015 at 09:33:21AM -0700, Gerald B. Cox wrote:
On Fri, Jun 26, 2015 at 9:18 AM, Remi Collet Fedora@famillecollet.com wrote:
This is exactly what I named "last solution".
Your proposal switch the Guidelines from "must use hash commit"
to "could use hash commit if no other solution exists, but really you should try something else before"
First of all, the current guidelines do not say that you must use commit hash, and my text doesn't at all say what you're implying. You're reading in something that simply isn't there.
It would be helpful if you could answer the questions I previously asked. Here they are again:
Why are you so concerned about the use of Git Tags? I have included text which clearly states that if the packager believes that re-tagging is being used, he MUST follow a specific procedure to resolve that issue.
If there is a problem with the archive the checksum won't match. The archive with the embedded commit information is already in the srpm. The act of re-tagging can't change that. We always know the commit hash version of that archive. What is the harm if we later find that upstream did re-tag?
Maybe because it kind of feels like shooting oneself in the foot: "If you think this will hurt, don't do it!" Well, no, let's just do the right thing first, the one we can rely on, and not shoot ourselves in the foot to start with :)
The risk is also ending up in a situation where a bunch of packages are using one approach, another bunch are doing something else, and yet a third bunch are doing it yet another way because of x, y, z. Tags are a nice git feature but, due to the nature of git itself, they are a moving target. Relying on them is not a wise thing to do. You may understand the pros and cons, you may know that tags are a moving target, but do not forget that we have a lot of people in the community, including packagers, who are not developers. I think having one way of doing things, and having that way be the most secure one, is better than offering multiple options left to the discretion of people who may or may not have a deep understanding of what is at stake.
Pierre
On 26/06/2015 18:33, Gerald B. Cox wrote:
On Fri, Jun 26, 2015 at 9:18 AM, Remi Collet Fedora@famillecollet.com wrote:
This is exactly what I named "last solution".
Your proposal switch the Guidelines from "must use hash commit"
to "could use hash commit if no other solution exists, but really you should try something else before"
First of all, the current guidelines do not say that you must use commit hash,
Sorry, but we probably don't read the same Guidelines.
my text doesn't at all say what you're implying. You're reading in something that simply isn't there.
Sorry, but this is exactly how I read your proposal.
It would be helpful if you could answer the questions I previously asked. Here they are again:
Why are you so concerned about the use of Git Tags?
Immutability
End of discussion for me. Feel free to play with breaking things which are not broken.
Remi.
On Fri, Jun 26, 2015 at 9:50 AM, Remi Collet Fedora@famillecollet.com wrote:
Why are you so concerned about the use of Git Tags?
Immutability
The commit hash is permanently associated with the archive. You haven't really answered my question.
On Thu, 2015-06-25 at 20:45 -0700, Gerald B. Cox wrote:
This has also been discussed in a separate thread, but I thought I would place it in a separate post in case people didn't notice.
I'm proposing the SourceURL Guideline be changed as follows:
- Change GitHub section to reflect Git Hosting Services, so as not to tie the guideline to a specific service
- Include information on how to handle Git Submodules
- Clarify when commit hash vs. Git Tag should be used
Here is the Draft document.
Here is the diff of the draft to the original version.
Thank you in advance for your comments.
I'd like to ask for one other thing: I'd like the examples to be split. I'd like one section just for GitHub (I think it deserves one), one section just for bitbucket.org if you think that is relevant, and finally one section for generic git hosting services.
Thanks,
On Fri, Jun 26, 2015 at 9:44 AM, Pierre-Yves Chibon pingou@pingoured.fr wrote:
The risk is also ending up in a situation where a bunch of packages are using one approach, another bunch are doing something else, and yet a third bunch are doing it yet another way because of x, y, z. Tags are a nice git feature but, due to the nature of git itself, they are a moving target. Relying on them is not a wise thing to do. You may understand the pros and cons, you may know that tags are a moving target, but do not forget that we have a lot of people in the community, including packagers, who are not developers. I think having one way of doing things, and having that way be the most secure one, is better than offering multiple options left to the discretion of people who may or may not have a deep understanding of what is at stake.
Git Tags are not a moving target. Just because some people are abusing them doesn't mean we ban that functionality. The Draft guideline addresses clearly what to do if you believe someone is engaging in re-tagging. The current guideline is silent.
As I mentioned previously, the commit hash is part of the generated archive. That information is never lost, regardless of what upstream does with the Git Tag.
On Fri, Jun 26, 2015 at 10:05 AM, Sérgio Basto sergio@serjux.com wrote:
I'd like to ask for one other thing: I'd like the examples to be split. I'd like one section just for GitHub (I think it deserves one), one section just for bitbucket.org if you think that is relevant, and finally one section for generic git hosting services.
I thought of that, but I was trying to keep things short and concise and didn't want to duplicate the same text over and over again. All the Git commands I use in the examples are the same among the online services (they all use Git). The difference relates to the construction of the download string. I included the two most popular services as the example.
On Fri, Jun 26, 2015 at 1:11 PM, Gerald B. Cox gbcox@bzb.us wrote:
On Fri, Jun 26, 2015 at 9:44 AM, Pierre-Yves Chibon pingou@pingoured.fr wrote:
The risk is also ending up in a situation where a bunch of packages are using one approach, another bunch are doing something else, and yet a third bunch are doing it yet another way because of x, y, z. Tags are a nice git feature but, due to the nature of git itself, they are a moving target. Relying on them is not a wise thing to do. You may understand the pros and cons, you may know that tags are a moving target, but do not forget that we have a lot of people in the community, including packagers, who are not developers. I think having one way of doing things, and having that way be the most secure one, is better than offering multiple options left to the discretion of people who may or may not have a deep understanding of what is at stake.
Git Tags are not a moving target. Just because some people are abusing them doesn't mean we ban that functionality. The Draft guideline addresses clearly what to do if you believe someone is engaging in re-tagging. The current guideline is silent.
The best practices for using git aren't going to be universally applied by all upstreams. They're just not. People make mistakes, people decide to do things their own way, and I don't think it's our responsibility or problem as Fedora packagers to police upstreams for git tagging like it is for more important things like LICENSE files. The wording of the comment for re-tagging upstreams is also pretty onerous:
# Upstream is known to engage in the practice of Re-tagging
# and was notified on dd-mm-yyyy that this is not allowed.
# Full commit hash is being used since they have not
# corrected the situation.
Yikes. Not allowed by whom? Is Fedora now the git police? Does an upstream even care what Fedora thinks of its policies? In the end, it's worth making best practice suggestions to upstream if their moving tags is causing problems for the packager, but I don't think it's worth dropping useful software from the distribution or pasting violation notices in our spec files for something as trivial as a disagreement over git best practices.
The biggest complaint I have with this draft is that it introduces a lot of extra unnecessary work for packagers. Anyone packaging software coming from a git repository first has to figure out if they're able to use tags or not, based on a very opaque criteria. How do you know that a project is moving tags? Can you audit it before you package it? How much work does that take? Further, once you do package something, how do you make sure that the tag you used in your source URL doesn't change? Do you need to check your source checksum vs upstream when making point releases? Or do you check it monthly or weekly? Is there any way to automate this process for people that maintain a lot of packages?
The current guidelines are good for a number of reasons, including:
* There is **one** way to get releases from github
* There are code snippets to copy/paste to make life easier
* The rpm source URL will ALWAYS point to the code you used to generate the RPM
I don't think your proposed guidelines are an improvement to the current guidelines. They're creating a lot of extra work and complexity just to make things look nicer in some cases.
As I mentioned previously, the commit hash is part of the generated archive. That information is never lost, regardless of what upstream does with the Git Tag.
How are you generating archives? If you generate them with a url like github.com/$USER/%{name}/archive/%{name}-%{tag}.tar.gz2, then there's no commit hash anywhere in the archive (at least that I can see.) If you're generating the archives a different way, you need to include an example of how to do so in your guideline draft. The "Git Tags" section of the draft provides no guidance for how to actually assemble a source URL from a git tag. It just says that tags can be formatted differently project to project, and then goes on to talk about tags moving.
Rich
On Fri, Jun 26, 2015 at 13:49:10 -0400, Rich Mattes richmattes@gmail.com wrote:
The biggest complaint I have with this draft is that it introduces a lot of extra unnecessary work for packagers. Anyone packaging software coming from a git repository first has to figure out if they're able to use tags or not, based on a very opaque criteria. How do you know that a project is moving tags? Can you audit it before you package it? How much work does that take? Further, once you do package something, how do you make sure that the tag you used in your source URL doesn't change? Do you need to check your source checksum vs upstream when making point releases? Or do you check it monthly or weekly? Is there any way to automate this process for people that maintain a lot of packages?
Someone used to run something that would check that the files in packages' source urls were still fetchable and hadn't changed. I haven't seen a report from this process in a long time, so it might not be run any more.
As long as github is returning the same bits when it regenerates an archive file for the same tag (when the tag doesn't change) I think using tags in URLs is good enough. (Because in theory we can automate checking for changes without getting lots of false positives.) This appears to be how things work now.
On Fri, Jun 26, 2015 at 10:49 AM, Rich Mattes richmattes@gmail.com wrote:
The best practices for using git aren't going to be universally applied by all upstreams. They're just not. People make mistakes, people decide to do things their own way, and I don't think it's our responsibility or problem as Fedora packagers to police upstreams for git tagging like it is for more important things like LICENSE files. ...
Yikes. Not allowed by whom? Is Fedora now the git police? Does an upstream even care what Fedora thinks of its policies?
ROFL...Rich, excellent point. I put that in because there had been some feedback concerning re-tagging. If we're going to do something about it I think it is more helpful to try to educate upstream than to just issue a blanket ban in Fedora. I have no problem whatsoever in changing that instruction. I do think it is helpful to put into the Spec file some type of comment that a particular upstream is re-tagging, if we believe it is worthwhile. I'm still trying to understand the real harm to Fedora if somehow a re-tagged package slips through.
The biggest complaint I have with this draft is that it introduces a lot of extra unnecessary work for packagers. Anyone packaging software coming from a git repository first has to figure out if they're able to use tags or not, based on a very opaque criteria. How do you know that a project is moving tags? Can you audit it before you package it? How much work does that take? Further, once you do package something, how do you make sure that the tag you used in your source URL doesn't change? Do you need to check your source checksum vs upstream when making point releases? Or do you check it monthly or weekly? Is there any way to automate this process for people that maintain a lot of packages?
Good point, again, I put that text in there because some people were concerned about the harm to Fedora with re-tagging. I believe this would mainly come up in the package review process, you would know if someone had re-tagged because the checksum wouldn't match when fedora-review did the check. Yes, that would only be for that point in time. It wasn't my intent to have the packager continually audit to check for re-tagging.
The current guidelines are good for a number of reasons, including
- There is **one** way to get releases from github
- There are code snippets to copy/paste to make life easier
- The rpm source URL will ALWAYS point to the code you used to generate the RPM
That still holds true in my Draft; but it makes clear that we're talking about Git, not just GitHub.
I don't think your proposed guidelines are an improvement to the current guidelines. They're creating a lot of extra work and complexity just to make things look nicer in some cases.
My intent was to clarify the current guidelines and bring them up-to-date and clear up some misleading statements.
As I mentioned previously, the commit hash is part of the generated archive. That information is never lost, regardless of what upstream does with the Git Tag.
How are you generating archives? If you generate them with a url like github.com/$USER/%{name}/archive/%{name}-%{tag}.tar.gz2, then there's no commit hash anywhere in the archive (at least that I can see.) If you're generating the archives a different way, you need to include an example of how to do so in your guideline draft.
You obtain the commit hash via:
    git get-tar-commit-id < $tar_file_name
Remember to execute this command against the tar file and not the compressed archive.
The "Git Tags" section of the draft provides no guidance for how to actually assemble a source URL from a git tag. It just says that tags can be formatted differently project to project, and then goes on to talk about tags moving.
I thought it was intuitive and didn't need an example... but I have no problem whatsoever with adding an example if folks think it would be helpful.
On Fri, 2015-06-26 at 10:17 -0700, Gerald B. Cox wrote:
On Fri, Jun 26, 2015 at 10:05 AM, Sérgio Basto sergio@serjux.com wrote:
I'd like to ask for one other thing: I'd like the examples to be split. I'd like one section just for GitHub (I think it deserves one), one section just for bitbucket.org if you think that is relevant, and finally one section for generic git hosting services.
I thought of that, but I was trying to keep things short and concise and didn't want to duplicate the same text over and over again. All the Git commands I use in the examples are the same among the online services (they all use Git). The difference relates to the construction of the download string. I included the two most popular services as the example.
Hi, I think we should consider not our own work but the packager's work; split examples would be simpler for a packager. Maybe that can happen in a second phase.
Anyway, I'd like to optimize this line. Instead of:
Source0: https://github.com/$OWNER/$PROJECT/archive/%{commit}/$PROJECT-%{commit...
use:
Source0: https://github.com/$OWNER/$PROJECT/archive/%{commit}/$PROJECT-%{shortc...
Best regards,
On Fri, 26 Jun 2015 13:06:33 -0500 Bruno Wolff III bruno@wolff.to wrote:
Someone used to run something that would check that the files in packages' source urls were still fetchable and hadn't changed. I haven't seen a report from this process in a long time, so it might not be run any more.
Yeah, that was me. I just haven't had time to do it.
I was hoping someday we could make it a taskotron check, but that hasn't happened yet.
http://www.scrye.com/~kevin/fedora/sourcecheck/
has the output of the runs and the shell script I used to download and check against the uploaded versions. Someone else could run this anytime they liked. ;)
kevin
On Fri, Jun 26, 2015 at 2:37 PM, Kevin Fenzi kevin@scrye.com wrote:
Yeah, that was me. I just haven't had time to do it.
Hey Kevin,
Maybe you can explain something to me because I'm not getting it. Why is this considered a significant issue? For example, someone downloads project-tag1 using Git. tag1 is effectively %{version}
The tar file which is downloaded is permanently associated with 40...character...tag..a
Then at sometime down the road, upstream decides oh, I made a mistake, I now want to associate project-tag1 with 40...character...tag..b (even though that is considered "insane")
In the srpm, we still have 40...character...tag..a as the commit hash associated to %{version}
The only thing I can imagine is that the next release of that package in Fedora would just increment the Release tag by 1 and leave the %{version} the same.
Why is this such a big issue within the Fedora community?
I'm re-thinking the fact I added all that text regarding re-tagging. Yes, it's bad, but as someone pointed out, we're not the Git police - and even though some people believe it is pervasive, I consider that anecdotal. Unless I can understand more about the harmful impact, I believe I'm just causing more confusion by discussing it.
On Fri, Jun 26, 2015 at 2:32 PM, Sérgio Basto sergio@serjux.com wrote:
Instead of:
Source0: https://github.com/$OWNER/$PROJECT/archive/%{commit}/$PROJECT-%{commit...
use:
Source0: https://github.com/$OWNER/$PROJECT/archive/%{commit}/$PROJECT-%{shortc...
Sergio,
You mean you don't like the full 40-character-hash? ;-)
I actually thought of that since the full 40-character hash is stored in the tar file anyway; the issue however is that GitHub doesn't care if you provide the short value or not. The file you get will be named with the full 40-character hash. Since that is the case, I left it using the %{commit}. Now, if you want to rename the file being downloaded, you can do that by appending #/$newname as shown in the Bitbucket example.
Based upon feedback I've restructured the Draft.
1. I've removed all references to re-tagging. I believe it was adding unnecessary complexity.
2. I've restructured to list the commit revision method first.
3. The Git Tag section is now better compartmentalized to make it simple to remove if required.
On Fri, 26 Jun 2015 16:11:02 -0700 "Gerald B. Cox" gbcox@bzb.us wrote:
On Fri, Jun 26, 2015 at 2:37 PM, Kevin Fenzi kevin@scrye.com wrote:
Yeah, that was me. I just haven't had time to do it.
Hey Kevin,
Maybe you can explain something to me because I'm not getting it.
Well, I was answering the question about the old script I used to run, I actually didn't say anything about tags. ;)
The sourcecheck script I ran would check the source(s) that had been uploaded to our lookaside cache against the upstream version (where the spec had a full url to the source we could download and checksum). For the cases where there was not a full url, we just skipped that package.
If the downloaded source was the same in the lookaside, great. If the source wasn't downloadable from that url it likely moved or there was a problem with upstream's site. In the final case, if the checksum differed it meant that the maintainer made a mistake uploading or upstream changed the same release after it was released.
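The outcomes described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the actual sourcecheck script: the function names, the result labels, and the choice of SHA-256 as the checksum are all assumptions made for the example.

```python
import hashlib
from typing import Optional


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a blob (checksum choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def classify_source(source_url: str,
                    lookaside_sha256: str,
                    upstream_bytes: Optional[bytes]) -> str:
    """Classify one Source entry following the cases described in the message.

    upstream_bytes is the content fetched from source_url, or None if the
    download failed. All names here are hypothetical.
    """
    if "://" not in source_url:
        return "skipped"        # spec has no full URL, nothing to compare
    if upstream_bytes is None:
        return "unavailable"    # source moved or upstream's site has a problem
    if sha256_of(upstream_bytes) == lookaside_sha256:
        return "ok"             # lookaside copy matches upstream
    return "mismatch"           # upload mistake or upstream changed the release
```

The "mismatch" result is the one that needs a human to look at it, since the script alone cannot tell a maintainer mistake from a changed upstream release.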
Why is this considered a significant issue? For example, someone downloads project-tag1 using Git. tag1 is effectively %{version}
The tar file which is downloaded is permanently associated with 40...character...tag..a
Then at some time down the road, upstream decides oh, I made a mistake, I now want to associate project-tag1 with 40...character...tag..b (even though that is considered "insane")
In the srpm, we still have 40...character...tag..a as the commit hash associated to %{version}
The only thing I can imagine is that the next release of that package in Fedora would just increment the Release tag by 1 and leave the %{version} the same.
Why is this such a big issue within the Fedora community?
Because you have multiple things with the same name.
Say upstream releases project v1 with tag1. Maintainer downloads it, builds it and sends it to users. Now upstream decides they want to move the tag on v1 to tag2.
Now you and all fedora users think v1 is tag1. Upstream thinks v1 is tag2. When all these people talk they get confused. Upstream might say: "oh, we fixed that and moved the tag", but then what does the fedora maintainer do? v1-2? What do people reporting bugs report against? How can you tell how long a security vulnerability has been out if it was in v1?
I'm re-thinking the fact I added all that text regarding re-tagging. Yes, it's bad, but as someone pointed out, we're not the Git police - and even though some people believe it is pervasive, I consider that anecdotal. Unless I can understand more about the harmful impact, I believe I'm just causing more confusion by discussing it.
I think telling upstream how to use tags isn't that great, but also I don't think we should depend on them.
I absolutely think it's fine for us to tell upstreams that releasing the same version with different content at different times is bad and that they should not do it. It causes pain for everyone, themselves included.
kevin
On Fri, Jun 26, 2015 at 9:29 PM, Kevin Fenzi kevin@scrye.com wrote:
Upstream might say: "oh, we fixed that and moved the tag", but then what does the fedora maintainer do? v1-2? What do people reporting bugs report against? How can you tell how long a security vulnerability has been out if it was in v1?
Got it. Thanks much for taking the time to explain that. I changed the draft about 30 minutes ago to take out the references to re-tagging. It was just complicating things. My main goal was to add information on submodules, make it a bit more vendor neutral, and bring it up to date as far as Git was concerned. The Git Tag section is now structured to make it easy to remove if required.
On Fri, 2015-06-26 at 17:14 -0700, Gerald B. Cox wrote:
On Fri, Jun 26, 2015 at 2:32 PM, Sérgio Basto sergio@serjux.com wrote: Instead of: Source0: https://github.com/$OWNER/$PROJECT/archive/%{commit}/$PROJECT-%{commit...
Source0: https://github.com/$OWNER/$PROJECT/archive/%{commit}/$PROJECT-%{shortcommit}.tar.gz
Sergio,
You mean you don't like the full 40-character-hash? ;-)
Yes, we can avoid file names with 40-character-hash.
I actually thought of that since the full 40-character hash is stored in the tar file anyway; the issue however is that GitHub doesn't care if you provide the short value or not. The file you get will be named with the full 40-character hash. Since that is the case, I left it using the %{commit}. Now, if you want to rename the file being downloaded, you can do that by appending #/$newname as shown in the Bitbucket example.
The only difference is the filename length. Instead of audacity-dea351aa4820efd7ce8c2254930f942a6590472b.tar.gz we will have a much shorter filename: audacity-dea351a.tar.gz
The content of the tar.gz is exactly the same and the rest of the spec doesn't change. This suggestion was inspired by [1].
[1] http://pkgs.fedoraproject.org/cgit/simarrange.git/tree/simarrange.spec
Thanks,
On Jun 26, 2015 9:30 PM, "Kevin Fenzi" kevin@scrye.com wrote:
In the final case, if the checksum differed it meant that the maintainer made a mistake uploading or upstream changed the same release after it was released.
Or somewhere between upstream and us the tarball was modified (someone hacked GitHub, someone gained commit access to upstream and then tried to cover their tracks, a malicious package maintainer on our side, etc). This is the case that we definitely want to raise warning flags about.
-Toshio
On Sat, Jun 27, 2015 at 7:32 AM, Sérgio Basto sergio@serjux.com wrote:
You mean you don't like the full 40-character-hash? ;-)
Yes, we can avoid file names with 40-character-hash.
The only difference is the filename length. Instead of audacity-dea351aa4820efd7ce8c2254930f942a6590472b.tar.gz we will have a much shorter filename: audacity-dea351a.tar.gz
The content of the tar.gz is exactly the same and the rest of the spec doesn't change. This suggestion was inspired by [1].
Sergio, I was kind of joking with my comment, I don't particularly like them either. ;-)
I don't have a problem at all with doing that; the full hash can be extracted from the tar file if one so desires. I updated the draft. The one thing I changed from your example was that I added a #/ to make it clear it was a renaming.
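To make the renaming concrete, here is a small Python sketch of how a short commit and a "#/" suffix lead to the shorter local filename. The exact URL layout is an assumption (the draft's wording is not quoted in this thread), OWNER is a placeholder, and the helper names are invented for illustration. The commit hash is the audacity one quoted above.

```python
def short_commit(commit: str, length: int = 7) -> str:
    """Return the abbreviated hash: the first `length` characters of the full hash."""
    return commit[:length]


def source_url(owner: str, project: str, commit: str) -> str:
    """Build a GitHub archive URL whose '#/' fragment names the local file.

    The URL layout is assumed for this example, not taken from the draft.
    """
    short = short_commit(commit)
    return (f"https://github.com/{owner}/{project}/archive/"
            f"{commit}.tar.gz#/{project}-{short}.tar.gz")


def local_filename(url: str) -> str:
    """The saved file is named after whatever follows the last '/' in the URL."""
    return url.rsplit("/", 1)[-1]


commit = "dea351aa4820efd7ce8c2254930f942a6590472b"
url = source_url("OWNER", "audacity", commit)
print(local_filename(url))
```

The server ignores everything after "#", so the download is unchanged; only the name of the saved file differs.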