Hi.
I have created a new tool, license-validate:
https://pagure.io/copr/license-validate/
And I packaged it for Fedora. Here is the review request:
https://bugzilla.redhat.com/show_bug.cgi?id=2035680
The goal of this tool is to validate the string in the License tag of the spec file, i.e.
License: GPLv2
         ^^^^^ this part only
It does **not** check whether the string actually agrees with the code or even with the %license file. We have `licensecheck` for that.
The Fedora package already contains the list of licenses from Licensing:Main, and you can run it as
$ license-validate -v 'GPLv1 or (MIT and BSD)'
Approved license
or
$ license-validate -v 'GPL or (MIT and BSD)'
No terminal defined for 'G' at line 1 col 1
GPL or (MIT and BSD)
...
Not a valid license string
which fails because GPL is not a valid short name.
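To make the idea concrete, here is a minimal sketch of this kind of check in plain Python. It is not the code of license-validate (which uses a Lark grammar); the `APPROVED` set is a tiny hypothetical subset, the function names are made up for illustration, and names containing spaces are not handled.

```python
import re

# Hypothetical subset of approved short names, for illustration only;
# the real tool ships the full list taken from Licensing:Main.
APPROVED = {"GPLv1", "GPLv2", "MIT", "BSD"}

# A token is a parenthesis or a run of characters without spaces/parentheses.
TOKEN_RE = re.compile(r"\s*(\(|\)|[^\s()]+)")


def tokenize(text):
    """Split an expression into parentheses, operators and license names."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_valid(text):
    """Return True if text is a well-formed expression of approved names."""
    tokens = tokenize(text)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            return False
        tok = tokens[pos]
        if tok == "(":
            pos += 1
            if not parse_expr():
                return False
            if pos >= len(tokens) or tokens[pos] != ")":
                return False
            pos += 1
            return True
        if tok in APPROVED:
            pos += 1
            return True
        return False

    def parse_expr():
        nonlocal pos
        if not parse_term():
            return False
        # Any number of "and"/"or" followed by another term.
        while pos < len(tokens) and tokens[pos] in ("and", "or"):
            pos += 1
            if not parse_term():
                return False
        return True

    return parse_expr() and pos == len(tokens)
```

With this sketch, `is_valid("GPLv1 or (MIT and BSD)")` is accepted and `is_valid("GPL or (MIT and BSD)")` is rejected, because `GPL` is not in the set. Whether mixed `and`/`or` without parentheses should be allowed is a policy choice that this sketch does not enforce.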
My next goal is to download all of Fedora's spec files, extract the License line, and run it through this script. But I am going to be offline for a few days, so anyone who wants to step into the QE shoes can do that - I will not be mad :)
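The "extract the License line" step could look roughly like the sketch below. This is only an illustration under simple assumptions: the tag sits on its own line, no macros need expanding, and conditionals are ignored. `extract_licenses` and `SAMPLE_SPEC` are hypothetical names, not part of any existing script.

```python
import re

# Matches a "License:" tag at the start of a line (case-insensitively)
# and captures the rest of that line without surrounding blanks.
LICENSE_RE = re.compile(
    r"^License[ \t]*:[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_licenses(spec_text):
    """Return every License tag value found in the given spec file text."""
    return [match.group(1) for match in LICENSE_RE.finditer(spec_text)]


# Made-up spec fragment, for illustration only.
SAMPLE_SPEC = """\
Name:    hello
Version: 1.0
License: GPLv2
%description
Example package.
"""

print(extract_licenses(SAMPLE_SPEC))
```

A spec file with subpackages can carry several License tags, which is why the function returns a list.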
Comments are welcome.
Miroslav
On Sun, Dec 26, 2021 at 10:08:24PM +0100, Miroslav Suchý wrote:
My next goal is to download all of Fedora's spec files, extract the License line, and run it through this script. But I am going to be offline for a few days, so anyone who wants to step into the QE shoes can do that
- I will not be mad :)
I would suggest holding off on that, as we are working on updating the guidelines to use SPDX identifiers (and therefore SPDX expressions).
On Sun, Dec 26, 2021 at 4:46 PM Matthew Miller mattdm@fedoraproject.org wrote:
I would suggest holding off on that, as we are working on updating the guidelines to use SPDX identifiers (and therefore SPDX expressions).
SPDX expression logic is identical to Fedora's, so that will not change. The identifiers will be changing in phases, so the tool is useful today and it is definitely worth working through now.
On 27. 12. 21 at 0:44, Neal Gompa wrote:
I would suggest holding off on that, as we are working on updating the guidelines to use SPDX identifiers (and therefore SPDX expressions).
Any ETA?
SPDX expression logic is identical to Fedora's, so that will not change. The identifiers will be changing in phases, so the tool is useful today and it is definitely worth working through now.
SPDX expressions [1] are actually slightly better, because they have a defined BNF grammar.
The strings currently used in the License tag do not follow any grammar at all. Every tool I know of treats them as plain strings.
IIRC, this [2] is the first attempt to define the grammar. If/when we move to SPDX expressions, we just update the grammar file and (as Neal pointed out) everything else will stay the same.
With SPDX expressions, we can use other tools. But e.g. `license-expression` [3] uses its own parser and is therefore long (2k lines of code), while license-validate is only 40 lines long.
[1] https://spdx.github.io/spdx-spec/v2-draft/SPDX-license-expressions/
[2] https://pagure.io/copr/license-validate/blob/main/f/grammar.lark
[3] https://github.com/nexB/license-expression
Miroslav
Neal Gompa ngompa13@gmail.com writes:
SPDX expression logic is identical to Fedora's, so that will not change.
I don't believe that's correct.
For instance, for the LGPL, SPDX uses "LGPL-2.0-only" and "LGPL-2.0-or-later", while Fedora currently uses "LGPLv2" and "LGPLv2+".
(From https://spdx.org/licenses/ and https://fedoraproject.org/wiki/Licensing:Main )
Be well, --Robbie
On Tue, Jan 4, 2022 at 2:25 PM Robbie Harwood rharwood@redhat.com wrote:
Neal Gompa ngompa13@gmail.com writes:
SPDX expression logic is identical to Fedora's, so that will not change.
I don't believe that's correct.
For instance, for the LGPL, SPDX uses "LGPL-2.0-only" and "LGPL-2.0-or-later", while Fedora currently uses "LGPLv2" and "LGPLv2+".
(From https://spdx.org/licenses/ and https://fedoraproject.org/wiki/Licensing:Main )
Those are the identifiers, not the *logic*. SPDX and Fedora both use the same boolean logic terms ("and"/"or"/"with") and support parenthetical expressions. Fedora mandates lowercase, SPDX doesn't care, but examples historically are uppercase. Fedora will retain its expression logic system, complete with lowercase terms (since that makes the expressions more readable).
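The split between identifiers and logic can be illustrated with a small sketch: the structure of the expression is left alone, and only the names and the operator case change. The mapping below contains just the two pairs quoted in this thread; `to_spdx_style` is a hypothetical helper, and names containing spaces are not handled.

```python
# Only the two pairs quoted in this thread; a real table would be much larger.
FEDORA_TO_SPDX = {
    "LGPLv2": "LGPL-2.0-only",
    "LGPLv2+": "LGPL-2.0-or-later",
}
OPERATORS = {"and", "or", "with"}


def to_spdx_style(expression):
    """Swap known identifiers and uppercase the operators; keep the structure."""
    # Pad parentheses so that a plain split() isolates them as tokens.
    padded = expression.replace("(", " ( ").replace(")", " ) ")
    out = []
    for token in padded.split():
        if token in OPERATORS:
            out.append(token.upper())
        else:
            # Unknown identifiers pass through unchanged.
            out.append(FEDORA_TO_SPDX.get(token, token))
    return " ".join(out).replace("( ", "(").replace(" )", ")")


print(to_spdx_style("LGPLv2 or (LGPLv2+ and MIT)"))
```

The parentheses and the and/or nesting come out exactly as they went in, which is the point being made about the logic staying the same.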
On Tue, Jan 04, 2022 at 02:40:06PM -0500, Neal Gompa wrote:
Those are the identifiers, not the *logic*. SPDX and Fedora both use the same boolean logic terms ("and"/"or"/"with") and support parenthetical expressions. Fedora mandates lowercase, SPDX doesn't care, but examples historically are uppercase. Fedora will retain its expression logic system, complete with lowercase terms (since that makes the expressions more readable).
One of the difficult things with the Fedora abbreviations is that tokens can have spaces in them. For example, the Apache 2.0 license in Fedora is called "ASL 2.0". This makes it really hard to work with in software.
Likewise, we have historically allowed full expressions through that contain otherwise forbidden licenses. For example, many Perl module packages use the License tag "GPL+ or Artistic" so in a way that entire expression is treated as a token.
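A small sketch of why names with spaces are awkward for software, and one possible way around it: match the longest known name at each position instead of splitting on whitespace. This is not how rpminspect or license-validate work; the `APPROVED` list and the function names are hypothetical, chosen only to mirror the two examples above.

```python
# Hypothetical approved list; "GPL+ or Artistic" is deliberately one entry.
APPROVED = ["ASL 2.0", "GPL+ or Artistic", "GPL+", "MIT"]


def naive_tokens(expression):
    """Whitespace splitting, which breaks names such as "ASL 2.0"."""
    return expression.split()


def longest_match_tokens(expression):
    """Tokenize greedily, preferring the longest approved name each time."""
    names = sorted(APPROVED, key=len, reverse=True)
    tokens, rest = [], expression.strip()
    while rest:
        for candidate in names + ["and", "or", "(", ")"]:
            if rest.startswith(candidate):
                tail = rest[len(candidate):]
                # Names and operators must end at a word boundary.
                if candidate in ("(", ")") or not tail or tail[0] in " ()":
                    tokens.append(candidate)
                    rest = tail.lstrip()
                    break
        else:
            raise ValueError(f"unrecognized text at: {rest!r}")
    return tokens


print(naive_tokens("ASL 2.0 and MIT"))
print(longest_match_tokens("ASL 2.0 and MIT"))
print(longest_match_tokens("GPL+ or Artistic or MIT"))
```

The naive split yields `ASL` and `2.0` as separate tokens, while the greedy version keeps `ASL 2.0` together and treats `GPL+ or Artistic` as a single token followed by `or` and `MIT`.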
This information is currently captured in this JSON file (not the original author, but I make use of the file):
https://github.com/rpminspect/rpminspect-data-fedora/blob/master/licenses/fe...
rpminspect's license check uses this data to validate the License tag in RPM headers based on the rules as they exist in the packaging guidelines plus the assorted expressions we have historically allowed through that would not otherwise validate.
If your License tag fails the check in rpminspect, it will report the unapproved token based on the fedora.json file it read.
All of this is to say that the ongoing effort to permit SPDX expressions in the License is to make this inspection more predictable and Fedora's License tags more useful.
Thanks,
On 04. 01. 22 at 21:03, David Cantrell wrote:
One of the difficult things with the Fedora abbreviations is that tokens can have spaces in them. For example, the Apache 2.0 license in Fedora is called "ASL 2.0". This makes it really hard to work with in software.
Likewise, we have historically allowed full expressions through that contain otherwise forbidden licenses. For example, many Perl module packages use the License tag "GPL+ or Artistic" so in a way that entire expression is treated as a token.
This information is currently captured in this JSON file (not the original author, but I make use of the file):
https://github.com/rpminspect/rpminspect-data-fedora/blob/master/licenses/fe...
rpminspect's license check uses this data to validate the License tag in RPM headers based on the rules as they exist in the packaging guidelines plus the assorted expressions we have historically allowed through that would not otherwise validate.
*nod*
The string
'GPL+ or Artistic or MIT'
is evaluated by license-validate as correct, while rpminspect reports it as a bad license.
Miroslav
On Wed, Jan 05, 2022 at 02:59:33AM +0100, Miroslav Suchý wrote:
*nod*
The string
'GPL+ or Artistic or MIT'
is evaluated by license-validate as correct, while rpminspect reports it as a bad license.
But this expression is not valid. It would be valid as
(GPL+ or Artistic) or MIT
On Mon, Jan 10, 2022 at 5:41 PM David Cantrell dcantrell@redhat.com wrote:
On Wed, Jan 05, 2022 at 02:59:33AM +0100, Miroslav Suchý wrote:
The string
'GPL+ or Artistic or MIT'
is evaluated by license-validate as correct, while rpminspect reports it as a bad license.
But this expression is not valid. It would be valid as
(GPL+ or Artistic) or MIT
That's probably splitting hairs. The "and" and "or" boolean connectives are associative, so the parentheses can be dropped without losing information. Something different would be mixing "and" and "or", then the parentheses are necessary to preserve structural information. But for a list of only-"or"-ed or only-"and"-ed licenses, no parentheses are necessary.
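The associativity argument can be shown with a small sketch that flattens nested groups of the same operator. Expressions are modelled here as nested tuples of the form `(operator, child, child, ...)`; this representation and the `flatten` helper are assumptions made for illustration only.

```python
def flatten(node):
    """Collapse nested same-operator groups, e.g. ("or", ("or", A, B), C)."""
    if isinstance(node, str):
        return node
    op, *children = node
    flat = []
    for child in map(flatten, children):
        if isinstance(child, tuple) and child[0] == op:
            # Same operator: the inner parentheses carry no information.
            flat.extend(child[1:])
        else:
            flat.append(child)
    return (op, *flat)


left = ("or", ("or", "GPL+", "Artistic"), "MIT")
right = ("or", "GPL+", ("or", "Artistic", "MIT"))
mixed = ("or", ("and", "MIT", "BSD"), "GPLv2")

print(flatten(left) == flatten(right))
print(flatten(mixed))
```

Both groupings of the or-only expression flatten to the same list, while the mixed and/or expression keeps its inner group, since dropping those parentheses would lose structure.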
Fabio
On Tue, Jan 11, 2022 at 11:32:12AM +0100, Fabio Valentini wrote:
That's probably splitting hairs. The "and" and "or" boolean connectives are associative, so the parentheses can be dropped without losing information. Something different would be mixing "and" and "or", then the parentheses are necessary to preserve structural information. But for a list of only-"or"-ed or only-"and"-ed licenses, no parentheses are necessary.
Correct when taking the tokens individually. I was basing my reply on the fact that Fedora had approved "GPL+ or Artistic" as a single token. Separately GPL+ is approved and Artistic is not approved. Again, just looking at the data as found in spec files and the approved license list.
Since this thread, this topic has been discussed and addressing the inconsistency here is part of the license data cleanup effort.
Thanks,
Neal Gompa ngompa13@gmail.com writes:
Those are the identifiers, not the *logic*. SPDX and Fedora both use the same boolean logic terms ("and"/"or"/"with") and support parenthetical expressions. Fedora mandates lowercase, SPDX doesn't care, but examples historically are uppercase. Fedora will retain its expression logic system, complete with lowercase terms (since that makes the expressions more readable).
Fine, but that's misleading in this context. Right now the tool flags (and therefore bugs have been filed for) the identifiers as well.
Be well, --Robbie
On Tue, Jan 4, 2022 at 3:10 PM Robbie Harwood rharwood@redhat.com wrote:
Fine, but that's misleading in this context. Right now the tool flags (and therefore bugs have been filed for) the identifiers as well.
Yes, and those should still happen. I'm saying that what license-validate *does* and the code written for it will be useful with SPDX identifiers or Fedora ones, because the meat of the tool (the logical parsing and handling) would be the same.
On Tue, Jan 04, 2022 at 03:21:20PM -0500, Neal Gompa wrote:
Yes, and those should still happen. I'm saying that what license-validate *does* and the code written for it will be useful with SPDX identifiers or Fedora ones, because the meat of the tool (the logical parsing and handling) would be the same.
I feel like I'm missing something, but rpminspect has been doing what license-validate does for years now. It's ready for SPDX expressions. Results show up for Fedora builds in Zuul. Or you can run it locally.
Thanks,
On 04. 01. 22 at 21:33, David Cantrell wrote:
I feel like I'm missing something, but rpminspect has been doing what license-validate does for years now. It's ready for SPDX expressions. Results show up for Fedora builds in Zuul. Or you can run it locally.
Here is my motivation:
We are automating packaging: hundreds of thousands of libraries are to be packaged automatically. But some of them have restrictive licenses. For that we need a **simple** tool.
rpminspect is a big beast for that. I learned that I can run
rpminspect-fedora -F json -T license foo.src.rpm
But I need something that can evaluate just the string.
Miroslav
On Wed, Jan 05, 2022 at 03:10:47AM +0100, Miroslav Suchý wrote:
Here is my motivation:
We are automating packaging: hundreds of thousands of libraries are to be packaged automatically. But some of them have restrictive licenses. For that we need a **simple** tool.
rpminspect is a big beast for that. I learned that I can run
rpminspect-fedora -F json -T license foo.src.rpm
But I need something that can evaluate just the string.
That's fair. I've tried to make rpminspect as simple as possible. It is just a CLI tool you install and run. What you're trying to do is not out of scope of rpminspect. I could add another CLI tool called licenseck that just checks the License tags in a given spec file.
My main concern here is you have introduced another License tag parser that produces different results and uses a different set of source data than other tools. There is an ongoing project to both normalize and centralize the License tag data. This is to replace the wiki as the source.
Neal Gompa ngompa13@gmail.com writes:
On Tue, Jan 4, 2022 at 3:10 PM Robbie Harwood rharwood@redhat.com wrote:
Fine, but that's misleading in this context. Right now the tool flags (and therefore bugs have been filed for) the identifiers as well.
Yes, and those should still happen.
I don't see why that would be the case, given that as Matthew Miller mentioned upthread the plan is to move to SPDX identifiers in the distro. That seems like useless churn and waste of maintainer time.
Be well, --Robbie
Dec 26, 2021 4:46:12 PM Matthew Miller mattdm@fedoraproject.org:
I would suggest holding off on that, as we are working on updating the guidelines to use SPDX identifiers (and therefore SPDX expressions).
I'm happy that we're moving in this direction. It's confusing to have two different sets of license identifiers to remember, and Fedora shouldn't be reinventing the wheel. Am I allowed to switch my existing packages to use the SPDX license identifiers, or should I hold off?
On 27/12/2021 01:16, Maxwell G (@gotmax23) via devel wrote:
Am I allowed to switch my existing packages to use the SPDX license identifiers, or should I hold off?
You should wait until this feature is fully accepted and merged.
Tell me when you get the rpm up so I can add it: https://release-monitoring.org/project/226644/
On 26. 12. 21 at 22:45, Matthew Miller wrote:
I would suggest holding off on that, as we are working on updating the guidelines to use SPDX identifiers (and therefore SPDX expressions).
So far the deviations are mostly typos, so an audit with the currently used short names will ease the later transition.
I have started doing the audit. All the deviations will be tracked here:
https://bugzilla.redhat.com/show_bug.cgi?id=2035991
Miroslav
On 29. 12. 21 13:42, Miroslav Suchý wrote:
So far the deviations are mostly typos, so an audit with the currently used short names will ease the later transition.
I have started doing the audit. All the deviations will be tracked here:
Wouldn't it be faster to submit pull requests instead of bugzillas when the fix is obvious?
On 27. 12. 21 at 11:33, Björn Persson wrote:
$ license-validate -v 'GPL or (MIT and BSD)'
No terminal defined for 'G' at line 1 col 1
Approximately nobody will understand "No terminal defined for 'G'". Can the error message be improved?
I know. I wish to improve it too. This is the first time I have worked with the Lark parser.
I was tempted to replace it with a vague message like "There is an error". But the error above points you to the correct place in the string, so it seems to be slightly better.
Once I learn how to improve it, I will do so. Suggestions are welcome. :)
Miroslav
I have created new tool license-validate: https://pagure.io/copr/license-validate/
I've written something relatively similar a few years back (https://github.com/suve/vrms-rpm). I took a look at the code - using a proper parser is definitely a better solution than the error-prone, manual matching happening in my program. May be a good idea for some v3.0. ;)
My personal suggestion would be to add a "line by line" mode for interactive usage, so instead of:
$ license-validate --verbose "First"
$ license-validate --verbose "Second"
one could do something like:
$ license-validate --stdin < licenses.txt
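A rough sketch of what such a line-by-line mode could do internally. The `--stdin` flag is only the suggestion made above, not an existing option; `check_lines` and the stand-in `toy_validate` are hypothetical, and a real run would pass `sys.stdin` and the tool's actual validator.

```python
import io


def check_lines(stream, validate):
    """Yield (expression, verdict) for every non-empty line of the stream."""
    for line in stream:
        expression = line.strip()
        if expression:
            yield expression, validate(expression)


def toy_validate(expression):
    """Stand-in validator, for illustration only: accepts a bare approved name."""
    return expression in {"MIT", "GPLv2"}


# In real use this would be sys.stdin instead of an in-memory buffer.
licenses = io.StringIO("MIT\nGPL\n\nGPLv2\n")
for expression, ok in check_lines(licenses, toy_validate):
    print(f"{expression}: {'approved' if ok else 'not valid'}")
```

Keeping the validator as a parameter means the single-item tool stays the building block and the batch mode is only a thin loop around it.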
A.FI.
On 29. 12. 21 at 17:22, Artur Frenszek-Iwicki wrote:
My personal suggestion would be to add a "line by line" mode for interactive usage, so instead of:
K.I.S.S. I always start with a tool that can handle one item, and then build a larger tool on top of it.
$ license-validate --verbose "First"
$ license-validate --verbose "Second"
one could do something like:
$ license-validate --stdin < licenses.txt
Here it is:
https://pagure.io/copr/license-validate/blob/main/f/download-all-fedora-lice...
https://pagure.io/copr/license-validate/blob/main/f/check-all-licenses
Miroslav