Hey folks!
Just wanted to flag up that, now the new Bodhi version has been deployed to production, critpath updates are gated on openQA test results. If any openQA test for your critpath update fails, the gating status will be marked as 'failed' and you will not be able to push it stable.
Waivers can be issued for failed tests where appropriate:
https://docs.fedoraproject.org/en-US/ci/gating/#_waive
But in almost all cases a failure indicates either a genuine bug or an opportunity to improve the test, so I'd prefer to avoid use of waivers where possible. I am trying to keep an eye on all failed tests, but if you have a blocked update and you don't understand the failure and I haven't yet commented on it, please do poke me and I'll take a look.
If a failure looks like some kind of transient issue, several folks have the power to rerun tests: myself, lruzicka, kparal, tflink, abokovoy (abbra / ab), pwhalen, and sumantrom. You can ask one of us to do it. There have been plans in the past to implement some sort of rerun request system in Bodhi but no-one's quite had the roundtuits to work it out yet; sorry about that.
Thanks everyone, please be patient with any kinks while we see how this goes :)
A "re-run tests" button should be displayed in this case if the logged in user has the power to edit the update (commit access or provenpackager). Do you mean that it doesn't work?
On Thu, 2021-05-13 at 19:29 +0000, Mattia Verga via devel wrote:
A "re-run tests" button should be displayed in this case if the logged in user has the power to edit the update (commit access or provenpackager). Do you mean that it doesn't work?
I believe that's strictly wired up to Fedora CI. It doesn't do anything for openQA.
Hello,
On Fri, May 14, 2021 at 2:19 AM Adam Williamson adamwill@fedoraproject.org wrote:
I believe that's strictly wired up to Fedora CI. It doesn't do anything for openQA.
I thought, under the hood, the button is just telling Bodhi to send the "bodhi.update.status.testing.koji-build-group.build.complete" [1] message again, so all CI systems listening should trigger? This isn't the case for openQA?
Thanks, Michal
[1]: https://apps.fedoraproject.org/datagrepper/raw?topic=org.fedoraproject.prod....
On Fri, 2021-05-14 at 06:06 +0200, Michal Srb wrote:
I thought, under the hood, the button is just telling Bodhi to send the "bodhi.update.status.testing.koji-build-group.build.complete" [1] message again, so all CI systems listening should trigger? This isn't the case for openQA?
Thanks, Michal
Ah, I didn't remember precisely what it does :D
openQA doesn't currently trigger off that message, for the fairly simple reason that it didn't exist when I wrote the openQA update triggering stuff. It triggers off Bodhi "update submitted to testing" and "update edited" messages.
I can take a look at whether I can adjust it to use that message as well/instead, I'll have to make a bit of time for it tomorrow.
On Thu, 2021-05-13 at 23:47 -0700, Adam Williamson wrote:
Ah, I didn't remember precisely what it does :D
openQA doesn't currently trigger off that message, for the fairly simple reason that it didn't exist when I wrote the openQA update triggering stuff. It triggers off Bodhi "update submitted to testing" and "update edited" messages.
I can take a look at whether I can adjust it to use that message as well/instead, I'll have to make a bit of time for it tomorrow.
Well, I looked into it a bit, and...
https://github.com/fedora-infra/bodhi/issues/4217
is where I'm at. I kind of don't want to change the openQA tests to always trigger on koji-build-group.build.complete, for a specific reason: it's sent when the update is pushed to updates-testing. This gets done in batches. So if I use that message as the trigger, openQA will sit there idle most of the time, then when an updates-testing push happens, it will try and test a *lot* of updates, all at once. It only has limited worker capacity, so some will wind up sitting in a queue for maybe several hours.
The openQA tests do not retrieve the packages from updates-testing - they pull them directly from Koji - so they don't need to wait for them to be in updates-testing. So things work quite nicely now, where we schedule the openQA tests whenever a maintainer pushes the button to submit the update or edit it; this way we don't get sudden batches of multiple updates to test at once, usually, and the work is spread out over time.
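To put rough numbers on the batching concern, here is a back-of-the-envelope sketch. The worker count, update count and per-update duration are made-up figures for illustration, not measurements from the real openQA deployment, and it assumes each update ties up exactly one worker for a fixed time (real openQA runs several jobs per update).

```python
import math


def worst_case_wait_minutes(num_updates: int, workers: int, minutes_per_update: float) -> float:
    """Return how long the last update in a burst waits before a worker is free.

    Simplifying assumption: every update occupies one worker for the same
    fixed duration, and all updates in the burst arrive at the same moment.
    """
    if workers <= 0:
        raise ValueError("need at least one worker")
    if num_updates <= 0:
        return 0.0
    # Updates are handled in waves of `workers`; the last update sits in the
    # final wave, so it waits for every earlier wave to finish.
    waves_before_last = math.ceil(num_updates / workers) - 1
    return waves_before_last * minutes_per_update


# Hypothetical batch: 40 updates land at once on 10 workers, 30 minutes each.
print(worst_case_wait_minutes(40, 10, 30))
# Hypothetical spread-out case: a single update arrives while workers are idle.
print(worst_case_wait_minutes(1, 10, 30))
```

With these invented figures the last update in the batch waits through three full waves before it even starts, while an update arriving on its own starts immediately. That is the difference between triggering on the batched push and triggering on submit/edit.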
So ideally I would like to adjust the triggering to only trigger tests on that koji-build-group.build.complete message *when it is marked as being a re-trigger request*. But I can't really do that, because...well...that bug.
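A minimal sketch of the triggering rule described above, to make the idea concrete. This is not the actual openQA scheduler code. The two submit/edit topic suffixes and the `re-trigger` field name are assumptions made for illustration; only the koji-build-group topic is quoted in this thread.

```python
from typing import Any, Mapping

# Assumed topic suffixes for "update submitted to testing" and "update edited".
SUBMIT_OR_EDIT_TOPICS = ("bodhi.update.request.testing", "bodhi.update.edit")
# Topic quoted earlier in the thread; sent when updates-testing is pushed,
# and (per the thread) re-sent when someone asks for a re-run.
RETRIGGER_TOPIC = "bodhi.update.status.testing.koji-build-group.build.complete"


def should_schedule(topic: str, body: Mapping[str, Any]) -> bool:
    """Decide whether a message should cause openQA tests to be scheduled."""
    if topic.endswith(SUBMIT_OR_EDIT_TOPICS):
        # Normal path: schedule as soon as the maintainer submits or edits.
        return True
    if topic.endswith(RETRIGGER_TOPIC):
        # Only react when the message is flagged as a re-run request.
        # "re-trigger" is a hypothetical field name for that flag.
        return bool(body.get("re-trigger", False))
    return False
```

The point of the second branch is that the batched updates-testing push would be ignored, so the queue does not fill up all at once, while an explicit re-run request would still get through.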
I've just realized that this currently doesn't work: Rawhide updates still aren't marked as critpath.
See https://github.com/fedora-infra/bodhi/issues/4177#issuecomment-841350366
On Fri, 2021-05-14 at 16:28 +0000, Mattia Verga via devel wrote:
I've just realized that this currently doesn't work: Rawhide updates still aren't marked as critpath.
See https://github.com/fedora-infra/bodhi/issues/4177#issuecomment-841350366
Oh, that's fine. This is only for stable and Branched anyway. openQA does not test Rawhide updates at present, though that's something I want to work towards this year.
So this seems like a good idea, but I notice that all tests are marked as failed until the results arrive. This leads to incorrect failure emails and incorrect UI indicating lots of test failures where none exist. Doesn't seem ready for production yet.
On Fri, 2021-05-14 at 14:40 -0500, Michael Catanzaro wrote:
So this seems like a good idea, but I notice that all tests are marked as failed until the results arrive. This leads to incorrect failure emails and incorrect UI indicating lots of test failures where none exist. Doesn't seem ready for production yet.
I don't know about emails, but the UI isn't indicating a failure, it's indicating a missing result. This *is* what it shows if you read it carefully. It doesn't say a test failed.
If emails are getting sent out in this situation we should probably fix that, but I don't think you can do much about it in Bodhi UI besides possibly tweak the representation to be a bit clearer about the difference between a failed test and a missing result. The fact that a result is missing is significant information and is the reason the test cannot be gated; it needs to be communicated.
This part isn't any different from existing gating on Fedora CI tests, AFAIK (although those may run faster and so the state exists for less time).
On Fri, May 14 2021 at 02:17:13 PM -0700, Adam Williamson adamwill@fedoraproject.org wrote:
I don't know about emails, but the UI isn't indicating a failure, it's indicating a missing result. This *is* what it shows if you read it carefully. It doesn't say a test failed.
That's incorrect, see e.g.:
https://bodhi.fedoraproject.org/updates/FEDORA-2021-72b0305521
The gating status changed to failed, then to waiting, then back to failed, then to passed. And yes, each status change triggers an email.
On Fri, 2021-05-14 at 18:20 -0500, Michael Catanzaro wrote:
That's incorrect, see e.g.:
https://bodhi.fedoraproject.org/updates/FEDORA-2021-72b0305521
The gating status changed to failed, then to waiting, then back to failed, then to passed. And yes, each status change triggers an email.
Well, yes, the *gating status* is indeed failed when the tests are not yet run. We've been around this rodeo a few times; nothing else is really possible without something like a test status equivalent to resultsdb, an execdb. But that doesn't exist.
I'm not sure why the change to 'waiting' then back to 'failed'. That part looks odd and might be something we could work on. But it's kinda expected at present that the status would go to failed at first and then passed once the test results appear. We could possibly do something kludgey in Bodhi to make it not send out emails for gating status changes right at the time of submission, or something.
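To illustrate the failed-versus-missing distinction being argued about, here is a small sketch. It is not Bodhi's actual logic. It assumes the gating decision comes back with a list of unsatisfied requirements, each carrying a `type` string; the strings used here are modeled on my understanding of Greenwave's output and should be treated as assumptions.

```python
from typing import Iterable, Mapping


def summarize_gating(unsatisfied: Iterable[Mapping[str, str]]) -> str:
    """Collapse a list of unsatisfied requirements into one display status.

    Assumed requirement types: "test-result-missing" for a result that has
    not been reported, anything else for a result that exists but is bad.
    """
    types = {req.get("type", "") for req in unsatisfied}
    if not types:
        return "passed"
    if types <= {"test-result-missing"}:
        # Nothing has actually failed; results just have not arrived.
        return "waiting"
    return "failed"
```

Note the limit of this approach: a missing result cannot tell "not run yet" apart from "will never run", which is exactly the gap that something like an execdb would fill. So "waiting" here is a guess, not a guarantee that a result will ever arrive.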
You reported my complaints as: https://github.com/fedora-infra/bodhi/issues/4219. Thanks!
I'm going to go ahead and ask that the gating be turned off until it's fixed. If it doesn't display accurate test results, it's not even close to being ready yet. Good gating is good, but bad gating is worse than no gating.
On Tue, 2021-05-18 at 08:26 -0500, Michael Catanzaro wrote:
You reported my complaints as: https://github.com/fedora-infra/bodhi/issues/4219. Thanks!
I'm going to go ahead and ask that the gating be turned off until it's fixed. If it doesn't display accurate test results, it's not even close to being ready yet. Good gating is good, but bad gating is worse than no gating.
I don't really think this is necessary? It's not really 'displaying' anything inaccurate at any point. Just not totally clear. When the tests aren't yet run it shows that some results are missing, which is true. When they are finished it shows the correct status and gates correctly. What is it that you're saying is actually not accurate?
On Tue, May 18, 2021 at 08:38:29AM -0700, Adam Williamson wrote:
I don't really think this is necessary? It's not really 'displaying' anything inaccurate at any point. Just not totally clear. When the tests aren't yet run it shows that some results are missing, which is true. When they are finished it shows the correct status and gates correctly. What is it that you're saying is actually not accurate?
I don't think it needs to be turned off, but the issue does need to be addressed. The first time I got a failure message, I had to go and investigate. Took a minute to figure out what was going on. Now that I know that it is the issue, I am ignoring those messages. I haven't seen a real failure yet on anything I submitted, but I would assume the message looks the same, and I likely won't notice until I go into bodhi to do something with the update. As for now, I think the gating is probably more valuable than the flawed output of the tool. Leave it on, but perhaps add some priority to fixing the issue?
Justin
On Tue, 2021-05-18 at 10:55 -0500, Justin Forbes wrote:
I don't think it needs to be turned off, but the issue does need to be addressed. The first time I got a failure message, I had to go and investigate. Took a minute to figure out what was going on. Now that I know that it is the issue, I am ignoring those messages. I haven't seen a real failure yet on anything I submitted, but I would assume the message looks the same, and I likely won't notice until I go into bodhi to do something with the update. As for now, I think the gating is probably more valuable than the flawed output of the tool. Leave it on, but perhaps add some priority to fixing the issue?
I'm already planning to work on it today.
On Tue, May 18, 2021 at 10:55:31AM -0500, Justin Forbes wrote:
I don't think it needs to be turned off, but the issue does need to be addressed. The first time I got a failure message, I had to go and investigate. Took a minute to figure out what was going on. Now that I know that it is the issue, I am ignoring those messages. I haven't seen a real failure yet on anything I submitted, but I would assume the message looks the same, and I likely won't notice until I go into bodhi to do something with the update.
For me, the problem is not a web interface. For me the problem is e-mail notifications. Whenever I do a build in Rawhide, I get a message that an update was submitted (I could live without this message, but still good) and then 30 minutes later I get a message that the tests failed. This message is very obtrusive:
I need to go to the web interface, and there I find that the tests did not fail; they simply have not finished yet. I do not recommend ignoring this message, because if the tests really do fail, the update will never be pushed to stable. That's not usually the case, but it has already happened to me once. In the end, that's the purpose of gating. The problem is that this message is _indistinguishable_ from a real failure.
Then after finishing the tests (a few minutes later), I get a third message that the update was pushed to stable (because all tests passed).
-- Petr
On Wed, 2021-05-19 at 22:54 +0100, Pete Walter wrote:
I waited over an hour on openQA test results that never came. Ended up waiving https://bodhi.fedoraproject.org/updates/FEDORA-2021-8cdffadc43. After a bunch of searching I found https://openqa.fedoraproject.org but there was no indication that it had even started running the tests for this update. Frustrating experience.
Sorry for the trouble. But there's obviously something odd there. The gating is not intended to be active for Rawhide updates at all, because we don't run the tests for Rawhide. It is only supposed to happen for stable and Branched. I think it must be working this way most of the time, or else Rawhide would've ground to a halt and there'd be a lot more angry people.
How did you create that update exactly? Was it from a side tag? Thanks!
On Wed, 2021-05-19 at 14:59 -0700, Adam Williamson wrote:
Sorry for the trouble. But there's obviously something odd there. The gating is not intended to be active for Rawhide updates at all, because we don't run the tests for Rawhide. It is only supposed to happen for stable and Branched. I think it must be working this way most of the time, or else Rawhide would've ground to a halt and there'd be a lot more angry people.
How did you create that update exactly? Was it from a side tag? Thanks!
This is how most Rawhide updates look:
https://bodhi.fedoraproject.org/updates/FEDORA-2021-e0ea4205ef
note it's not marked as a critpath update, and the gating status is 'ignored'...
On Wed, 2021-05-19 at 15:01 -0700, Adam Williamson wrote:
This is how most Rawhide updates look:
https://bodhi.fedoraproject.org/updates/FEDORA-2021-e0ea4205ef
note it's not marked as a critpath update, and the gating status is 'ignored'...
Hmm. So, looking into this a bit...I think it's one of those cases where things mostly "work" by mistake.
I think my intent/expectation was that Bodhi would query Greenwave with product_version "fedora-rawhide" for Rawhide updates, and the policy is crafted to not apply the gating to that version, and so everything would be hunky dory.
But, uh, I don't think that's what happens. I think Bodhi queries Greenwave for "fedora-35", because Bodhi is all set up to treat Rawhide as "Fedora 35", that's what the release is called in Bodhi. But we still don't happen to trigger the gating for *most* Rawhide updates *because they don't seem to be considered critical path updates even when they contain critical path packages*.
This gating stuff kicks in only for critpath updates (we set Bodhi to use a different 'decision_context' for critpath and non-critpath updates), because openQA only tests those. So we are just happening to not apply gating to Rawhide updates because they almost never seem to be tagged as critical path. That's kind of a fortunate accident, though. It's not how it *should* work.
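For anyone trying to picture the moving parts, here is a rough sketch of the kind of query described above. It only builds the request body and sends nothing. The field names follow my understanding of Greenwave's decision API, and the two decision context names are hypothetical placeholders for "critpath" and "non-critpath"; neither is confirmed in this thread.

```python
def decision_request(update_id: str, release_version: int, critpath: bool) -> dict:
    """Build a Greenwave-style decision query for a Bodhi update.

    Assumptions: the release is always sent as "fedora-<number>" (so Rawhide
    goes out as e.g. "fedora-35", never "fedora-rawhide"), and critpath
    updates use a separate, hypothetical decision context name.
    """
    context = (
        "bodhi_update_push_stable_critpath"  # hypothetical name
        if critpath
        else "bodhi_update_push_stable"  # hypothetical name
    )
    return {
        "product_version": f"fedora-{release_version}",
        "decision_context": context,
        "subject": [{"item": update_id, "type": "bodhi_update"}],
    }


# A Rawhide update that is not flagged critpath: the query carries the
# non-critpath context, so the openQA policy is never consulted for it.
print(decision_request("FEDORA-2021-e0ea4205ef", 35, critpath=False))
```

Seen this way, the "fortunate accident" is that the `critpath` flag is almost always false for Rawhide updates, which steers the query away from the gated context even though `product_version` alone would match.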
So I guess there are kinda two questions here:
1. Is it actually right that Rawhide auto-created updates with critpath packages in them aren't marked as critpath? If not, we should fix that.
2. How can we best sensibly tweak things so we don't gate on Rawhide updates that *do* get marked as critpath?
I'm going to think about #2 now.
On Wed, 2021-05-19 at 15:23 -0700, Adam Williamson wrote:
2. How can we best sensibly tweak things so we don't gate on Rawhide updates that *do* get marked as critpath?
I'm going to think about #2 now.
So all the clever ways I can think of to do this for now kinda suck, so I went with a dull but fairly correct (I hope) one:
https://pagure.io/fedora-infra/ansible/c/c9ee450c6a56cb0fa900ae5980d1d3d37dc...
the drawback of that is we have to remember to manually update that list of versions each time a release branches. Which will inevitably get forgotten sometime. That's why I wanted to use the wildcards. But it's the best option I can think of for now. Sorry again for the trouble.
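The trade-off between a wildcard and an explicit version list can be shown in a few lines. This is only a sketch: it assumes the policy's product_versions are matched glob-style (approximated here with Python's `fnmatch`), and the version numbers in the explicit list are illustrative rather than copied from the real policy file.

```python
from fnmatch import fnmatch
from typing import Iterable


def policy_applies(product_version: str, policy_versions: Iterable[str]) -> bool:
    """Return True if any pattern in the policy matches the product version."""
    return any(fnmatch(product_version, pattern) for pattern in policy_versions)


wildcard_policy = ["fedora-*"]                # never needs updating, but catches Rawhide
explicit_policy = ["fedora-33", "fedora-34"]  # illustrative list; must be edited at each branch

# Rawhide is queried as "fedora-35" in this scenario.
print(policy_applies("fedora-35", wildcard_policy))
print(policy_applies("fedora-35", explicit_policy))
print(policy_applies("fedora-34", explicit_policy))
```

The wildcard matches the Rawhide query and so would gate it; the explicit list does not, at the cost of someone having to add the new version by hand when the next release branches.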
On Wed, May 19, 2021 at 6:39 PM Adam Williamson adamwill@fedoraproject.org wrote:
the drawback of that is we have to remember to manually update that list of versions each time a release branches. Which will inevitably get forgotten sometime. That's why I wanted to use the wildcards. But it's the best option I can think of for now. Sorry again for the trouble.
Would it be helpful for me to add this to the QA and/or RelEng schedules for F35+? That way there's at least a written record that it needs to be done, so if we forget we can't blame Past Us. :-)
On Thu, May 20, 2021 at 10:15:18AM -0400, Ben Cotton wrote:
Would it be helpful for me to add this to the QA and/or RelEng schedules for F35+? That way there's at least a written record that it needs to be done, so if we forget we can't blame Past Us. :-)
Please do. :)
kevin
On Thu, 2021-05-20 at 08:46 -0700, Kevin Fenzi wrote:
Please do. :)
Yeah, it would probably make sense for releng as it should happen at exactly the point a release branches. Call it "Update Greenwave policy product_versions" or something. There are several other policies in that file that need updating at the same time as well as the openQA one.
On Tue, 2021-05-18 at 08:26 -0500, Michael Catanzaro wrote:
You reported my complaints as: https://github.com/fedora-infra/bodhi/issues/4219. Thanks!
I'm going to go ahead and ask that the gating be turned off until it's fixed. If it doesn't display accurate test results, it's not even close to being ready yet. Good gating is good, but bad gating is worse than no gating.
Hey again folks! It's been a while, but I wanted to note that Bodhi 5.7.1 with all my changes related to this and other issues that came up to do with update gating is now deployed in production. I checked and so far it *seems* like everything is working. The status ping-pong on update creation doesn't happen, and status is being correctly updated in response to new results. I hope you'll find it better now.
Please let me know if you see any problems. Note that openQA update tests are currently failing unpredictably due to https://pagure.io/fedora-infrastructure/issue/9234 ; I'm re-firing them as I notice them, and I will take a look at the tests and see if there's anything I can do to try and mitigate the problem (fiddle with the retries or something).