I had reason today to go poking through the ResultsDB fedmsg history:
https://apps.fedoraproject.org/datagrepper/raw?category=resultsdb
doing that makes it apparent that the pipeline is kinda flooding ResultsDB with "test results" that seem to be nothing more than "the pipeline decided not to test this package". Something like 99% of pipeline results appear to be of this type - either "ci.pipeline.allpackages-build.package.ignored" or "ci.pipeline.package.ignore". There've been 20 ci.pipeline.allpackages-build.package.ignored results filed in just the last 7 minutes, for instance, and it seems like there's a more or less constant stream of them.
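(For context, this is roughly the kind of query involved - a minimal sketch against the public datagrepper API; where exactly the testcase name lives in the message body is an assumption here:)

    import requests

    URL = "https://apps.fedoraproject.org/datagrepper/raw"
    resp = requests.get(URL, params={"category": "resultsdb", "delta": 3600,
                                     "rows_per_page": 100}, timeout=30)
    resp.raise_for_status()
    messages = resp.json()["raw_messages"]

    def testcase_name(msg):
        # ResultsDB 2.x fedmsgs carry the testcase as a dict; older ones as a string.
        tc = msg.get("testcase", "")
        return tc.get("name", "") if isinstance(tc, dict) else tc

    ignored = [m for m in messages
               if testcase_name(m["msg"]).endswith((".package.ignored", ".package.ignore"))]
    print(f"{len(ignored)} of {len(messages)} resultsdb messages in the last hour are pipeline 'ignore' results")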
Are these results really necessary or useful? ResultsDB is built to handle the load, I think, and my use case of poking through fedmsg history looking for messages from the pipeline *actually doing something* is not a super common one, but still, if this isn't really necessary, it seems like it might be a good idea to just not do it.
(Also, on the topic of why there are two topics - is one from the new pipeline and one from the old? Do we still need the old pipeline running?)
Thanks folks!
On Wed, Dec 5, 2018 at 2:38 PM Adam Williamson adamwill@fedoraproject.org wrote:
I had reason today to go poking through the ResultsDB fedmsg history:
https://apps.fedoraproject.org/datagrepper/raw?category=resultsdb
doing that makes it apparent that the pipeline is kinda flooding ResultsDB with "test results" that seem to be nothing more than "the pipeline decided not to test this package". Something like 99% of pipeline results appear to be of this type - either "ci.pipeline.allpackages-build.package.ignored" or "ci.pipeline.package.ignore". There've been 20 ci.pipeline.allpackages-build.package.ignored results filed in just the last 7 minutes, for instance, and it seems like there's a more or less constant stream of them.
Are these results really necessary or useful? ResultsDB is built to handle the load, I think, and my use case of poking through fedmsg history looking for messages from the pipeline *actually doing something* is not a super common one, but still, if this isn't really necessary, it seems like it might be a good idea to just not do it.
(Also, on the topic of why there are two topics - is one from the new pipeline and one from the old? Do we still need the old pipeline running?)
I'll let Bruno or Miroslav speak to whether we can turn off the .ignore messages, but to answer your last question, yes, there are two topics because one is the new pipeline and one is the old (atomic) pipeline. The old (atomic) pipeline has not been updated for anything past Fedora 27, so it may be the case that the old one can stop running.
Best, Johnny
Thanks folks!
Adam Williamson Fedora QA Community Monkey IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net http://www.happyassassin.net
On Wed, 2018-12-05 at 15:06 -0500, Johnny Bieren wrote:
On Wed, Dec 5, 2018 at 2:38 PM Adam Williamson adamwill@fedoraproject.org wrote:
I had reason today to go poking through the ResultsDB fedmsg history:
https://apps.fedoraproject.org/datagrepper/raw?category=resultsdb
doing that makes it apparent that the pipeline is kinda flooding ResultsDB with "test results" that seem to be nothing more than "the pipeline decided not to test this package". Something like 99% of pipeline results appear to be of this type - either "ci.pipeline.allpackages-build.package.ignored" or "ci.pipeline.package.ignore". There've been 20 ci.pipeline.allpackages-build.package.ignored results filed in just the last 7 minutes, for instance, and it seems like there's a more or less constant stream of them.
Are these results really necessary or useful? ResultsDB is built to handle the load, I think, and my use case of poking through fedmsg history looking for messages from the pipeline *actually doing something* is not a super common one, but still, if this isn't really necessary, it seems like it might be a good idea to just not do it.
(Also, on the topic of why there are two topics - is one from the new pipeline and one from the old? Do we still need the old pipeline running?)
I'll let Bruno or Miroslav speak to whether we can turn off the .ignore messages,
To be clear, I can see why the pipeline might want to *emit a fedmsg on its own topic* saying that it ignored a package. What doesn't seem to be necessary is to *submit a result to ResultsDB* which looks like this:
https://taskotron.fedoraproject.org/resultsdb/results/25608385
it's kind of a bizarre result, because it sort of looks like a test passed, but really it just means...there was no test at all. Why submit a 'test result' for a case where nothing was tested?
I mentioned fedmsgs because I was looking at the fedmsgs *emitted by ResultsDB itself* when a result is created (as that's what greenwave is using - I was fixing stuff in greenwave). But I'm not suggesting the pipeline shouldn't emit messages *on its own topic* when it ignores something, just that it shouldn't submit a "test result" to ResultsDB when it does that.
but to answer your last question, yes, there are two topics because one is the new pipeline and one is the old (atomic) pipeline. The old (atomic) pipeline has not been updated for anything past Fedora 27, so it may be the case that the old one can stop running.
F27 is EOL as of now, so it seems like that might be the case, yeah.
Thanks!
On Wed, 2018-12-05 at 12:16 -0800, Adam Williamson wrote:
To be clear, I can see why the pipeline might want to *emit a fedmsg on its own topic* saying that it ignored a package. What doesn't seem to be necessary is to *submit a result to ResultsDB* which looks like this:
https://taskotron.fedoraproject.org/resultsdb/results/25608385
it's kind of a bizarre result, because it sort of looks like a test passed, but really it just means...there was no test at all. Why submit a 'test result' for a case where nothing was tested?
I mentioned fedmsgs because I was looking at the fedmsgs *emitted by ResultsDB itself* when a result is created (as that's what greenwave is using - I was fixing stuff in greenwave). But I'm not suggesting the pipeline shouldn't emit messages *on its own topic* when it ignores something, just that it shouldn't submit a "test result" to ResultsDB when it does that.
Actually, thinking it through a little more, I can guess at a possible reason: it may be for greenwave policy purposes. Perhaps the idea is that you can set a rule like "either the pipeline tests must have passed, or the package must have been ignored"? The idea might be that this is essentially a way to tell *greenwave* (which consumes fedmsgs from resultsdb) that the pipeline did not test the package.
If so, I kinda get the idea, but it seems like a bit of a hack around a limitation in greenwave (one I've run into as well, as openQA does not test all updates, so it's hard to write a greenwave policy for update tests...), rather than 'the right way to do things'. It doesn't seem like this has been actually done, either, at least in Fedora, as Fedora's greenwave policy doesn't have the string 'ignore' in it AFAICS.
Hi
On Wed, Dec 5, 2018 at 9:23 PM Adam Williamson adamwill@fedoraproject.org wrote:
On Wed, 2018-12-05 at 12:16 -0800, Adam Williamson wrote:
To be clear, I can see why the pipeline might want to *emit a fedmsg on its own topic* saying that it ignored a package. What doesn't seem to be necessary is to *submit a result to ResultsDB* which looks like this:
https://taskotron.fedoraproject.org/resultsdb/results/25608385
it's kind of a bizarre result, because it sort of looks like a test passed, but really it just means...there was no test at all. Why submit a 'test result' for a case where nothing was tested?
I mentioned fedmsgs because I was looking at the fedmsgs *emitted by ResultsDB itself* when a result is created (as that's what greenwave is using - I was fixing stuff in greenwave). But I'm not suggesting the pipeline shouldn't emit messages *on its own topic* when it ignores something, just that it shouldn't submit a "test result" to ResultsDB when it does that.
Actually, thinking it through a little more, I can guess at a possible reason: it may be for greenwave policy purposes. Perhaps the idea is that you can set a rule like "either the pipeline tests must have passed, or the package must have been ignored"? The idea might be that this is essentially a way to tell *greenwave* (which consumes fedmsgs from resultsdb) that the pipeline did not test the package.
I am not aware of any real reason to keep the ignored-package results in ResultsDB. OSCI does not have any use case for them for now.
TBH we did not write or contribute much to the upstream resultsdb updater; the SUCCESS state is, moreover, completely misleading, yeah.
Let's just fix the upstream resultsdb listener to ignore these; I am creating a PR in:
https://pagure.io/ci-resultsdb-listener
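(For illustration only - this is not the contents of that PR, just a sketch of the sort of topic filter being proposed; the helper name and example topics are made up:)

    IGNORE_SUFFIXES = (".package.ignored", ".package.ignore")

    def should_forward_to_resultsdb(topic: str) -> bool:
        """Return True if a pipeline message should become a ResultsDB result."""
        # "The pipeline chose not to test this package" is not a test result,
        # so don't create one for it.
        return not topic.endswith(IGNORE_SUFFIXES)

    # Example topic names (illustrative):
    assert not should_forward_to_resultsdb(
        "org.centos.prod.ci.pipeline.allpackages-build.package.ignored")
    assert should_forward_to_resultsdb(
        "org.centos.prod.ci.pipeline.allpackages-build.package.running")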
Best regards, /M
If so, I kinda get the idea, but it seems like a bit of a hack around a limitation in greenwave (one I've run into as well, as openQA does not test all updates, so it's hard to write a greenwave policy for update tests...), rather than 'the right way to do things'. It doesn't seem like this has been actually done, either, at least in Fedora, as Fedora's greenwave policy doesn't have the string 'ignore' in it AFAICS.
On Thu, Dec 06, 2018 at 01:16:43PM +0100, Miroslav Vadkerti wrote:
Hi
On Wed, Dec 5, 2018 at 9:23 PM Adam Williamson adamwill@fedoraproject.org wrote:
On Wed, 2018-12-05 at 12:16 -0800, Adam Williamson wrote:
To be clear, I can see why the pipeline might want to *emit a fedmsg on its own topic* saying that it ignored a package. What doesn't seem to be necessary is to *submit a result to ResultsDB* which looks like this:
https://taskotron.fedoraproject.org/resultsdb/results/25608385
it's kind of a bizarre result, because it sort of looks like a test passed, but really it just means...there was no test at all. Why submit a 'test result' for a case where nothing was tested?
I mentioned fedmsgs because I was looking at the fedmsgs *emitted by ResultsDB itself* when a result is created (as that's what greenwave is using - I was fixing stuff in greenwave). But I'm not suggesting the pipeline shouldn't emit messages *on its own topic* when it ignores something, just that it shouldn't submit a "test result" to ResultsDB when it does that.
Actually, thinking it through a little more, I can guess at a possible reason: it may be for greenwave policy purposes. Perhaps the idea is that you can set a rule like "either the pipeline tests must have passed, or the package must have been ignored"? The idea might be that this is essentially a way to tell *greenwave* (which consumes fedmsgs from resultsdb) that the pipeline did not test the package.
I am not aware of any real reason to keep the ignored-package results in ResultsDB. OSCI does not have any use case for them for now. TBH we did not write or contribute much to the upstream resultsdb updater; the SUCCESS state is, moreover, completely misleading, yeah. Let's just fix the upstream resultsdb listener to ignore these; I am creating a PR in: https://pagure.io/ci-resultsdb-listener
The issue is greenwave: how does bodhi know if an update is waiting to be tested vs. is ignored by the system?
Pierre
On Thu, 2018-12-06 at 19:23 +0100, Pierre-Yves Chibon wrote:
On Thu, Dec 06, 2018 at 01:16:43PM +0100, Miroslav Vadkerti wrote:
Hi
On Wed, Dec 5, 2018 at 9:23 PM Adam Williamson adamwill@fedoraproject.org wrote:
On Wed, 2018-12-05 at 12:16 -0800, Adam Williamson wrote:
To be clear, I can see why the pipeline might want to *emit a fedmsg on its own topic* saying that it ignored a package. What doesn't seem to be necessary is to *submit a result to ResultsDB* which looks like this:
https://taskotron.fedoraproject.org/resultsdb/results/25608385
it's kind of a bizarre result, because it sort of looks like a test passed, but really it just means...there was no test at all. Why submit a 'test result' for a case where nothing was tested?
I mentioned fedmsgs because I was looking at the fedmsgs *emitted by ResultsDB itself* when a result is created (as that's what greenwave is using - I was fixing stuff in greenwave). But I'm not suggesting the pipeline shouldn't emit messages *on its own topic* when it ignores something, just that it shouldn't submit a "test result" to ResultsDB when it does that.
Actually, thinking it through a little more, I can guess at a possible reason: it may be for greenwave policy purposes. Perhaps the idea is that you can set a rule like "either the pipeline tests must have passed, or the package must have been ignored"? The idea might be that this is essentially a way to tell *greenwave* (which consumes fedmsgs from resultsdb) that the pipeline did not test the package.
I am not aware of any real reason to keep the ignored-package results in ResultsDB. OSCI does not have any use case for them for now. TBH we did not write or contribute much to the upstream resultsdb updater; the SUCCESS state is, moreover, completely misleading, yeah. Let's just fix the upstream resultsdb listener to ignore these; I am creating a PR in: https://pagure.io/ci-resultsdb-listener
The issue is greenwave: how does bodhi know if an update is waiting to be tested vs. is ignored by the system?
Yeah, that's what I suggested above - but it seems this wasn't really *intended* for that.
Still, if we do want to tackle that problem, we do need to come up with some ideas. It *is* a real problem, if we want to gate on tests which are not run on all packages. I suggested the possibility that Greenwave could have a 'test must pass or not have run' type of rule, but you or someone else pointed out that comes with obvious holes, like Bodhi will consider the update to be passing that rule if the test has been *scheduled* but not *finished* yet.
Sending 'ignore' results or something like that is one potential way to deal with the problem, though I can see it getting messy. For instance, openQA runs different sets of tests on different updates. I had to set up a whitelist mechanism to get the server tests run on FreeIPA-related updates that aren't on the critical path; as part of that whitelisting mechanism, I set things up so it runs *only* the server tests on those updates (it skips the desktop tests). What do we do if we want to gate on the desktop tests? openQA would have to do something like send a 'test not run' result for *every individual test it *might* have run*, or send something like an 'ignored desktop tests' result and greenwave would have to somehow be able to "understand" that.
Conceptually this approach would be sort of mildly abusing ResultsDB as a database of test *execution*, and that seems kinda the wrong design to me.
We do in fact *have* a database of test execution! It's called ExecDB:
https://pagure.io/taskotron/execdb
and it's part of Taskotron. I believe right now it's not really set up to be usable entirely independently from the rest of Taskotron like ResultsDB is (though jskladan and kparal might know better than me), but then, ResultsDB wasn't initially either. Perhaps it *could* be made more independent, along the lines of ResultsDB, and we could then have openQA and the pipeline and Autocloud and so on report test execution status to that db, just like they report results to ResultsDB.
Or, of course, if there's a similar component in the pipeline or anywhere else which could be shared by other systems, we could use that instead. Basically, I think the 'right' way to do this is to have some kind of shared record of test execution status. Then we can more reasonably have policies like "if this test is actually run for this artifact, it must pass".
Another possibility here is to have Greenwave track test execution status based on the standardized 'CI' fedmsgs; if we require that any tests we want to gate on must be run in systems that emit messages in the standardized format, it might be viable to work things that way.
I guess there's still a slightly fuzzy area around "is this test really not being run, or is the test system just down or behind or missed updating the status for some reason?", but I think that's manageable. We could have some kind of configurable 'timeout' in Greenwave by which time it assumes required tests must have been scheduled. For instance, if the rule says "test 'foobar' must pass or not be run", for the first five minutes after some new artifact shows up, that rule is unsatisfied (to give test systems time to notice the artifact and schedule the tests); after five minutes, if no test system has registered that it will be executing that test, Greenwave would count the rule as satisfied on the basis that the test is not being run. The more 'robust' approach would be to require test systems to explicitly register that they have *chosen not to run* the test, but that could be difficult to implement (in fact, I can say confidently that for openQA it *is* difficult to implement, as I can't really make the scheduler ask openQA "OK, I'm not really going to run tests for this flavor, but if I *did*, what tests *would* you run?" - that's just not a thing).
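(To illustrate the idea, a rough sketch of such a "pass, or demonstrably not run" check with a grace period - hypothetical logic, not Greenwave code, and the outcome strings are assumptions:)

    from datetime import datetime, timedelta, timezone

    GRACE_PERIOD = timedelta(minutes=5)

    def pass_or_not_run(results, artifact_created, now=None):
        """Evaluate a hypothetical "must pass or not be run" rule for one testcase.

        `results` are ResultsDB-style dicts for that testcase on one artifact.
        """
        now = now or datetime.now(timezone.utc)
        outcomes = {r["outcome"] for r in results}
        if "PASSED" in outcomes:
            return True
        if outcomes & {"FAILED", "QUEUED", "RUNNING"}:
            # Failed, or scheduled/running but not finished yet: not satisfied.
            return False
        # No record of the test at all: only count it as "not being run" once the
        # grace period for test systems to notice the artifact has expired.
        return now - artifact_created > GRACE_PERIOD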
Anyone have any thoughts?
On Thu, Dec 06, 2018 at 12:13:59PM -0800, Adam Williamson wrote:
On Thu, 2018-12-06 at 19:23 +0100, Pierre-Yves Chibon wrote:
On Thu, Dec 06, 2018 at 01:16:43PM +0100, Miroslav Vadkerti wrote:
Hi
On Wed, Dec 5, 2018 at 9:23 PM Adam Williamson adamwill@fedoraproject.org wrote:
On Wed, 2018-12-05 at 12:16 -0800, Adam Williamson wrote:
To be clear, I can see why the pipeline might want to *emit a fedmsg on its own topic* saying that it ignored a package. What doesn't seem to be necessary is to *submit a result to ResultsDB* which looks like this:
https://taskotron.fedoraproject.org/resultsdb/results/25608385
it's kind of a bizarre result, because it sort of looks like a test passed, but really it just means...there was no test at all. Why submit a 'test result' for a case where nothing was tested?
I mentioned fedmsgs because I was looking at the fedmsgs *emitted by ResultsDB itself* when a result is created (as that's what greenwave is using - I was fixing stuff in greenwave). But I'm not suggesting the pipeline shouldn't emit messages *on its own topic* when it ignores something, just that it shouldn't submit a "test result" to ResultsDB when it does that.
Actually, thinking it through a little more, I can guess at a possible reason: it may be for greenwave policy purposes. Perhaps the idea is that you can set a rule like "either the pipeline tests must have passed, or the package must have been ignored"? The idea might be that this is essentially a way to tell *greenwave* (which consumes fedmsgs from resultsdb) that the pipeline did not test the package.
I am not aware of any real reason to keep the ignored-package results in ResultsDB. OSCI does not have any use case for them for now. TBH we did not write or contribute much to the upstream resultsdb updater; the SUCCESS state is, moreover, completely misleading, yeah. Let's just fix the upstream resultsdb listener to ignore these; I am creating a PR in: https://pagure.io/ci-resultsdb-listener
The issue is greenwave: how does bodhi know if an update is waiting to be tested vs. is ignored by the system?
Yeah, that's what I suggested above - but it seems this wasn't really *intended* for that.
It very much was intended for this: back then there was no opting in via a file in dist-git, so the only way for the decision system to know whether a package is gated and the tests haven't run, vs. not gated at all, was to have the CI system say that it ignores the package.
Still, if we do want to tackle that problem, we do need to come up with some ideas. It *is* a real problem, if we want to gate on tests which are not run on all packages. I suggested the possibility that Greenwave could have a 'test must pass or not have run' type of rule, but you or someone else pointed out that comes with obvious holes, like Bodhi will consider the update to be passing that rule if the test has been *scheduled* but not *finished* yet.
I will not dispute that it can be causing problems, nor that we can improve it, just stating the history of the current situation :)
Sending 'ignore' results or something like that is one potential way to deal with the problem, though I can see it getting messy. For instance, openQA runs different sets of tests on different updates. I had to set up a whitelist mechanism to get the server tests run on FreeIPA-related updates that aren't on the critical path; as part of that whitelisting mechanism, I set things up so it runs *only* the server tests on those updates (it skips the desktop tests). What do we do if we want to gate on the desktop tests? openQA would have to do something like send a 'test not run' result for *every individual test it *might* have run*, or send something like an 'ignored desktop tests' result and greenwave would have to somehow be able to "understand" that.
Conceptually this approach would be sort of mildly abusing ResultsDB as a database of test *execution*, and that seems kinda the wrong design to me.
We do in fact *have* a database of test execution! It's called ExecDB:
But my understanding is that execdb follows the status of the running test; a package that is ignored could be approached as a result (something along the lines of "no result is a result").
and it's part of Taskotron. I believe right now it's not really set up to be usable entirely independently from the rest of Taskotron like ResultsDB is (though jskladan and kparal might know better than me), but then, ResultsDB wasn't initially either. Perhaps it *could* be made more independent, along the lines of ResultsDB, and we could then have openQA and the pipeline and Autocloud and so on report test execution status to that db, just like they report results to ResultsDB.
Or, of course, if there's a similar component in the pipeline or anywhere else which could be shared by other systems, we could use that instead. Basically, I think the 'right' way to do this is to have some kind of shared record of test execution status. Then we can more reasonably have policies like "if this test is actually run for this artifact, it must pass".
Another possibility here is to have Greenwave track test execution status based on the standardized 'CI' fedmsgs; if we require that any tests we want to gate on must be run in systems that emit messages in the standardized format, it might be viable to work things that way.
I guess there's still a slightly fuzzy area around "is this test really not being run, or is the test system just down or behind or missed updating the status for some reason?", but I think that's manageable. We could have some kind of configurable 'timeout' in Greenwave by which time it assumes required tests must have been scheduled. For instance, if the rule says "test 'foobar' must pass or not be run", for the first five minutes after some new artifact shows up, that rule is unsatisfied (to give test systems time to notice the artifact and schedule the tests); after five minutes, if no test system has registered that it will be executing that test, Greenwave would count the rule as satisfied on the basis that the test is not being run. The more 'robust' approach would be to require test systems to explicitly register that they have *chosen not to run* the test, but that could be difficult to implement (in fact, I can say confidently that for openQA it *is* difficult to implement, as I can't really make the scheduler ask openQA "OK, I'm not really going to run tests for this flavor, but if I *did*, what tests *would* you run?" - that's just not a thing).
Anyone have any thoughts?
Another approach could also be: gate everything except what packages set in dist-git to not be gated (basically an opt-out instead of an opt-in); this way there is no longer a need for a notification from the CI system about ignoring, since the decision maker will know not to look for test results. This has of course a high cost: making packagers opt out instead of opting in.
Thinking of it, since packages now opt in, maybe we do not need the ignore anymore, since the decision maker will know to gate packages only if they have asked to be gated. Which means: if there are no test results, then either the tests haven't run, in which case the package should be gated, or the tests failed to run, which should be reported to the CI pipeline admins - and the best person to do this would be the packager here as well.
Does that make sense?
I'd want to run this by Ralph or Luiz to ensure this line of thought isn't too far in the weeds, but it doesn't sound too far in the weeds for me :)
Pierre
On Fri, 2018-12-07 at 10:10 +0100, Pierre-Yves Chibon wrote:
Yeah, that's what I suggested above - but it seems this wasn't really *intended* for that.
It very much was intended for this: back then there was no opting in via a file in dist-git, so the only way for the decision system to know whether a package is gated and the tests haven't run, vs. not gated at all, was to have the CI system say that it ignores the package.
Oh, OK. From Miroslav and Vit's replies, it sounded like this (the 'ignore' results) was just a thing that no-one had really thought through much and just kinda 'happened', it wasn't consciously designed to help with this issue.
Still, if we do want to tackle that problem, we do need to come up with some ideas. It *is* a real problem, if we want to gate on tests which are not run on all packages. I suggested the possibility that Greenwave could have a 'test must pass or not have run' type of rule, but you or someone else pointed out that comes with obvious holes, like Bodhi will consider the update to be passing that rule if the test has been *scheduled* but not *finished* yet.
I will not dispute that it can be causing problems, nor that we can improve it, just stating the history of the current situation :)
OK, I wasn't clear that you were saying it actually *was* intended to solve this problem. Thanks.
Sending 'ignore' results or something like that is one potential way to deal with the problem, though I can see it getting messy. For instance, openQA runs different sets of tests on different updates. I had to set up a whitelist mechanism to get the server tests run on FreeIPA-related updates that aren't on the critical path; as part of that whitelisting mechanism, I set things up so it runs *only* the server tests on those updates (it skips the desktop tests). What do we do if we want to gate on the desktop tests? openQA would have to do something like send a 'test not run' result for *every individual test it *might* have run*, or send something like an 'ignored desktop tests' result and greenwave would have to somehow be able to "understand" that.
Conceptually this approach would be sort of mildly abusing ResultsDB as a database of test *execution*, and that seems kinda the wrong design to me.
We do in fact *have* a database of test execution! It's called ExecDB:
But my understanding is that execdb follows the status of the running test; a package that is ignored could be approached as a result (something along the lines of "no result is a result").
At this point we're getting pretty philosophical :P To me, personally, that doesn't feel right - "no result" *isn't* a result. But I can see that it might feel correct to someone else. Maybe I'm the minority here. :P
We would still need to come up with some way to cover the "ran *some* tests, skipped *some* tests" scenario if we wanted to carry on with this approach, though.
Anyone have any thoughts?
Another approach could also be: gate everything except what packages set in dist-git to not be gated (basically an opt-out instead of an opt-in); this way there is no longer a need for a notification from the CI system about ignoring, since the decision maker will know not to look for test results. This has of course a high cost: making packagers opt out instead of opting in.
Thinking of it, since packages now opt in, maybe we do not need the ignore anymore, since the decision maker will know to gate packages only if they have asked to be gated. Which means: if there are no test results, then either the tests haven't run, in which case the package should be gated, or the tests failed to run, which should be reported to the CI pipeline admins - and the best person to do this would be the packager here as well.
Does that make sense?
It makes sense for our *current* implementation of *update* gating, mostly (though it does kinda place a load on the update owner to know what tests will be run for the update, as opposed to just letting them say 'well hey that sounds like a good test to gate on, assuming it gets run' and have The System handle it if that test *doesn't* get run).
But that's only what we're doing right now, and I know some folks want us to go a lot further with update gating. What if, say, we decided we wanted to default gate updates on the 'desktop terminal' openQA test, on the basis that if that test gets run and fails, there's probably a serious problem somewhere? That test isn't run on *all* updates. We can hardly tell all packagers "you must go figure out if this test runs on your package, and opt it out of the gating if it doesn't", that seems kinda impractical to me.
Hi,
Anyone have any thoughts?
Let me try to summarize mine.
# The org.centos.prod.ci.pipeline.allpackages-build.package.ignored testcase
The problem I see here is that this testcase doesn't fit in any of the designs for Greenwave, CI Messages and/or ResultsDB, simply because "ignored" is supposed to be the outcome of the test, not the testcase itself.
In the CI Messages schema we define "not applicable" as a possible value for the status field [1]. This message is supposed to mean:
"We were trying to run a certain testcase named <testcase_name>, but there was certain logic in the test framework or pipeline which led us to skip it."
This is the result of our testcase, which we want to share with all consumers of the _testcase_, to let them do something about it. "org.centos.prod.ci.pipeline.allpackages-build.package.ignored", on the other hand, is neither a testcase nor a result. It is just an informational message on the Message Bus, and probably a redundant one, thus it shouldn't be in ResultsDB at all.
So for me it is not really a question, rather a work item: we need to find out how to fix it, and fix it. Most likely by aligning the pipeline messages to CI Messages format.
# How to use non_applicable result in gating policy
Now, given a testcase result with "not_applicable" as an outcome, can we use it in Greenwave?
Afaik, currently the PassingTestCase rule [2] only checks whether the testcase has PASS status or is waived. I think it needs to be adjusted and it needs to be configurable.
We need another rule, something like
ParametrizedTestCaseRule(testcase=TEST_CASE_NAME, outcomes=[PASS, NOT_APPLICABLE, WAIVED])
or maybe
PassingOrSkippedTestCaseRule(testcase=TEST_CASE_NAME)
Or both.
This way we can use all outcomes in our decision making process.
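(A toy illustration of what such a check might boil down to - not Greenwave's actual rule classes; names and outcome strings are assumptions:)

    SATISFYING = {"PASSED", "NOT_APPLICABLE"}

    def passing_or_skipped(testcase, results, waived=frozenset()):
        """True if the testcase is waived, passed, or reported as not applicable."""
        if testcase in waived:
            return True
        mine = [r for r in results if r["testcase"] == testcase]
        # "Latest result wins" is an assumption; real policy evaluation may differ.
        return bool(mine) and mine[-1]["outcome"] in SATISFYING

    # Example with a placeholder testcase name:
    results = [{"testcase": "some.pipeline.testcase", "outcome": "NOT_APPLICABLE"}]
    print(passing_or_skipped("some.pipeline.testcase", results))  # True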
# But should there be the "not applicable" outcome at all and should we treat it like PASS?
I think it should be possible to report results like that. There are certain use cases and extensive test suites where you can skip tests partially or temporarily, or based on certain parameters. And if a certain CI system provides this kind of flexibility, it should be able to communicate it.
But I think there shouldn't be a "treat non_applicable like PASS" approach by default. We need to clearly identify those testcases where it makes sense and use it only for this limited subset.
And the smaller the list - the better.
# How to deal with exceptions for a global Greenwave policy
I think that the better way to treat exceptions is to make them explicit. We shouldn't try to identify exceptions based on their test results, but rather have a list of them predefined and stored somewhere near the global policy itself.
To justify: CI systems, test suites and test runs can be misconfigured. We can in theory disable a certain feature on a lower level by mistake and we need an independent source of truth to verify our results against it.
And it goes again back to Greenwave. It currently provides a RemoteRule, which allows reading additional testcases from a dist-git repo on a per-project basis.
We would need a NotReallyRemoteRule which would allow overrides to the global policy on a per-project basis, using a set of rules configured additionally on the Greenwave server itself or in another centralized storage.
# ExecDB vs ResultsDB
This topic probably needs the thread on its own.
If I understood correctly, ExecDB is the database of test jobs, while ResultsDB is the database of test cases.
And we were recently discussing if it is possible to extend ResultsDB into a Results State Machine:
Imagine that we have several CI systems capable of running the same test case scenario. If we extend the ResultsDB status field with PENDING and IN PROGRESS values, we can use it as a task tracker.
1) When a new artifact gets created, we check with Greenwave which test cases we need to gate it on.
2) Then we create a "(artifact_id, testcase_id, pending)" entry in ResultsDB for each of them.
3) The CI system periodically checks if there is a "(artifact_id, testcase_id, pending)" entry in ResultsDB which is not overridden by a "(artifact_id, testcase_id, in progress)" result.
4) If it finds one, it triggers the testcase and sends the "(artifact_id, testcase_id, in progress)" message to ResultsDB.
This idea does go in the direction of the ExecDB execution tracker, but using the (artifact, testcase) pair as a primary key.
Do you think ExecDB is a better place for it?
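(To make the idea concrete, a sketch of what the "claim a pending testcase" step could look like - this is the proposal only, not an existing ResultsDB feature; the API root, payload shape and the PENDING/IN_PROGRESS outcomes are all assumptions:)

    import requests

    RESULTSDB = "https://taskotron.fedoraproject.org/resultsdb_api/api/v2.0"  # assumed API root

    def claim_pending_testcase(artifact_id, testcase):
        """If (artifact, testcase) is still 'pending', mark it 'in progress' and return True."""
        results = requests.get(
            f"{RESULTSDB}/results",
            params={"item": artifact_id, "testcases": testcase},
            timeout=30,
        ).json()["data"]
        states = {r["outcome"] for r in results}
        if "PENDING" in states and "IN_PROGRESS" not in states:
            requests.post(
                f"{RESULTSDB}/results",
                json={"testcase": {"name": testcase},
                      "outcome": "IN_PROGRESS",
                      "data": {"item": artifact_id}},
                timeout=30,
            )
            return True
        return False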
[1] https://pagure.io/fedora-ci/messages/blob/master/f/schemas/common.yaml#_106 [2] https://docs.pagure.org/greenwave/policies.html#passingtestcaserule
On Sun, 2018-12-09 at 16:19 +0100, Aleksandra Fedorova wrote:
Hi,
Anyone have any thoughts?
Let me try to summarize mine.
# The org.centos.prod.ci.pipeline.allpackages-build.package.ignored testcase
The problem I see here is that this testcase doesn't fit in any of the designs for Greenwave, CI Messages and/or ResultsDB, simply because "ignored" is supposed to be the outcome of the test, not the testcase itself.
In the CI Messages schema we define "not applicable" as a possible value for the status field [1]. This message is supposed to mean:
"We were trying to run a certain testcase named <testcase_name>, but there was certain logic in the test framework or pipeline which led us to skip it."
So, as noted upthread, I have a problem with this: it is not actually always simple for the thing sending out the message to *know* this.
Let me explain the openQA case a bit more. It does not exactly have a simple "list of test cases". Rather...well, there are these things called "job templates", which mean something like "run this 'test suite' (approximately, a test case) when tests are requested for this 'flavor' (approximately, an image type, like 'Workstation live'), 'arch' and 'version' (which can be a wildcard)".
So what our openQA scheduler code is actually doing here is approximately this. It takes an update and figures out what Fedora release it's for; that's the 'version'. It then decides which 'flavors' it is going to request openQA run the tests for (for updates, the 'flavors' are 'workstation', 'server', 'workstation-upgrade' and 'server-upgrade' - the upgrade tests are in separate flavors so they can be skipped when the update being tested is for the oldest currently-supported release, so we don't try and test upgrading from an EOL release). Then it says "hey, openQA, run the 'X' flavor tests for this update which is version 'Y'". (Currently we only run update tests on x86_64, but we'll likely add other arches at some point). It's then *openQA's* job to figure out what tests that actually means, and schedule jobs for each test.
The way the fedmsg stuff works is that, when openQA schedules a test (or starts running a test or completes a test, etc.), it sends out a message on an internal sort of message bus-y thing; I wrote a plugin which then sends out fedmsgs based on those internal messages. But of course, in this case, nothing *happens* in openQA itself, there is no event we can possibly send out a fedmsg in response to. And the scheduler only knows that it's not running tests for this or that 'flavor' and 'version' - it does not know, and cannot know, what actual tests *would have been run if it did*.
It is *possible* to solve this, I guess. My first thought about how to do that would be to actually add this feature to openQA. It would be a pretty weird API request - basically "Here is a request that looks like the one we send when we want you to run some tests. Now, we want you to explicitly **NOT** run these tests, then report exactly what it is that you didn't do". :P
Internally it'd just sort of hook into the job creation code, only it wouldn't actually do the step where it makes the created jobs 'real'; it'd just create the sort of 'prospective' jobs, send out internal events, and produce a response to the request, then just throw them away. It probably wouldn't actually be too hard to do, it'd just be a rather...odd thing to have.
This is the result of our testcase, which we want to share with all consumers of the _testcase_, to let them do something about it. "org.centos.prod.ci.pipeline.allpackages-build.package.ignored", on the other hand, is neither a testcase nor a result. It is just an informational message on the Message Bus, and probably a redundant one, thus it shouldn't be in ResultsDB at all.
So for me it is not really a question, rather a work item: we need to find out how to fix it, and fix it. Most likely by aligning the pipeline messages to CI Messages format.
This part seems fine, sure. Remember I was talking about ResultsDB results initially here, not fedmsgs, but I guess the results are being reported by something which listens to the fedmsgs and forwards them, or something like that?
# How to use non_applicable result in gating policy
Now, given a testcase result with "not_applicable" as an outcome, can we use it in Greenwave?
Afaik, currently the PassingTestCase rule [2] only checks whether the testcase has PASS status or is waived. I think it needs to be adjusted and it needs to be configurable.
We need another rule, something like
ParametrizedTestCaseRule(testcase=TEST_CASE_NAME, outcomes=[PASS, NOT_APPLICABLE, WAIVED])
or maybe
PassingOrSkippedTestCaseRule(testcase=TEST_CASE_NAME)
Or both.
This way we can use all outcomes in our decision making process.
Yes, this is approximately what I was imagining too.
# But should there be the "not applicable" outcome at all and should we treat it like PASS?
I think it should be possible to report results like that. There are certain use cases and extensive test suites where you can skip tests partially or temporarily, or based on certain parameters. And if a certain CI system provides this kind of flexibility, it should be able to communicate it.
But I think there shouldn't be a "treat non_applicable like PASS" approach by default. We need to clearly identify those testcases where it makes sense and use it only for this limited subset.
Agreed. It seems entirely reasonable that we might want to write a rule which really *is* only satisfied on PASS, on the basis that that test shouldn't ever be not run in that particular situation, and if it *isn't* run, that means something is wrong.
And the smaller the list - the better.
# How to deal with exceptions for a global Greenwave policy
I think that the better way to treat exceptions is to make them explicit. We shouldn't try to identify exceptions based on their test results, but rather have a list of them predefined and stored somewhere near the global policy itself.
To justify: CI systems, test suites and test runs can be misconfigured. We can in theory disable a certain feature on a lower level by mistake and we need an independent source of truth to verify our results against it.
And it goes again back to Greenwave. It currently provides a RemoteRule, which allows reading additional testcases from a dist-git repo on a per-project basis.
We would need a NotReallyRemoteRule which would allow overrides to the global policy on a per-project basis, using a set of rules configured additionally on the Greenwave server itself or in another centralized storage.
I'm honestly not quite sure what you're talking about here, sorry :) What's an 'exception' in this context? Are you talking about what WaiverDB does?
# ExecDB vs ResultsDB
This topic probably needs the thread on its own.
If I understood correctly, ExecDB is the database of test jobs, while ResultsDB is the database of test cases.
I don't think that's exactly it, no. ExecDB's description explains it fairly well:
"ExecDB is a database that stores the execution status of jobs running inside the Taskotron framework."
basically, it's just the bit of Taskotron where it keeps information like 'test X on item Y was scheduled', 'test X on item Y is running', 'test X on item Y is complete'.
ResultsDB is intended to be exactly what the name says: a database of test results. "Test X on item Y was run and the outcome was pass", "Test Z on item Y ran and the outcome was fail". That kinda thing. Of course such a thing winds up with a list of all the test cases for which results have been reported to it, but that's a sort of incidental detail: AFAIK, that's not one of its *intended purposes*, and it's not formally intended to be a 'source of truth' as regards what test cases "exist" in any given context, I don't think. It's really there to be: a database of test results.
And we were recently discussing if it is possible to extend ResultsDB into a Results State Machine:
Imagine that we have several CI systems capable of running the same test case scenario. If we extend the ResultsDB status field with PENDING and IN PROGRESS values, we can use it as a task tracker.
- When a new artifact gets created, we check with Greenwave which test cases we need to gate it on.
- Then we create a "(artifact_id, testcase_id, pending)" entry in ResultsDB for each of them.
- The CI system periodically checks if there is a "(artifact_id, testcase_id, pending)" entry in ResultsDB which is not overridden by a "(artifact_id, testcase_id, in progress)" result.
A technical note on this part: ResultsDB doesn't really have any concept of results "overriding" each other, this is something that has to be done on the consumer side. All ResultsDB does is store results and let you access them. Of course, you *can* easily do this on the consumer side by just filtering to the latest results, or whatever - assuming your definition of what 'result' is 'current' is a straightforward one to implement...
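(For example, a consumer can do the "latest result wins" filtering itself - a minimal sketch, assuming ResultsDB-style result dicts with testcase.name and submit_time fields:)

    def latest_per_testcase(results):
        """Map each testcase name to its most recently submitted result."""
        latest = {}
        for result in sorted(results, key=lambda r: r["submit_time"]):
            latest[result["testcase"]["name"]] = result  # later submissions win
        return latest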
- If it finds one, it triggers the testcase and sends the "(artifact_id, testcase_id, in progress)" message to ResultsDB.
This idea does go in the direction of the ExecDB execution tracker, but using the (artifact, testcase) pair as a primary key.
Do you think ExecDB is a better place for it?
My personal opinion is that this is possible but would be quite an abuse of the system, and it would be better to store this somewhere else. That's just not what it's for. But the most important opinion would I guess be Josef's, as he's much closer to this system than I am :) (And Tim's, of course, but he's still away). CCing Josef to make sure he's reading.
I haven't spent enough time to digest and understand everything that has been said here, but I can speak a bit about ResultsDB/ExecDB/etc.
The main reason for creating ExecDB was because we've had a ton of infra errors as "results" in ResultsDB, and that bothered us. We initially dumped everything into ResultsDB, so when e.g. a VM client couldn't be created, or dnf repos couldn't be reached for installing prerequisite packages, we posted it as ERROR/CRASHED/etc results in ResultsDB. It was the easiest way to access error logs, etc. After some time, our database was drowning in errors, which brought performance issues, readability issues (when trying to navigate the results in the web UI), and overall we thought these concerns should be separated - execution status vs actual test results.
Our tools and our infra reliability slowly improved, and we moved the execution tracking to ExecDB. So the only time our Taskotron tasks create a result in ResultsDB is when the execution proceeds smoothly and the test creates a proper results file (the test can opt to not create it, in which case no result is reported). But every time there's an entry in ExecDB. That brought a bit more sanity into test results management, in our view.
However, please note that our ExecDB is quite bare-bones; for example, its web UI doesn't have search functionality, and you have to know the UUID and how to construct the URL in order to access the necessary details. It's far from being easy to use for anyone but core contributors. Our goal was for package maintainers to be able to search in it and figure out why there are no results in ResultsDB for their package/test. But that never happened.
But infra and similar errors are not what is being discussed here. IIUIC you're talking about "ignored" or "nothing to report" results (the first one might be known to the scheduler, the second one might be discovered by the test). And we actually submit those to ResultsDB too, even from Taskotron tasks. As an example, see abicheck results: https://taskotron.fedoraproject.org/resultsdb/results?testcases=dist.abichec... Anything that has "no binary RPMs" or "no publicly exported ABI" in the Note is basically a "nothing to report" result. Either it was not a C program, or it didn't have public libraries. In both cases there's no need to run abicheck on them. Another example is the python-versions results: https://taskotron.fedoraproject.org/resultsdb/results?testcases=dist.python-... We run those on all packages, and it automatically gives PASSED for anything that doesn't contain Python files.
We could simply not send those results to ResultsDB (and even better, we could avoid executing those tests at all), but we'd have to make sure the package maintainers are provided with this information (that this testcase is "automatically passing" for their package) and that the gating arbiter is also aware. So of course we chose the "horribly inefficient but simple to implement" way and submit everything. There are optimizations that can be made, for example deciding whether to run C or Python tests based on the rpm filelist or rpm requires. Such code should ideally be a library that is then shared between the test system scheduler, the gating arbiter and optionally any user-oriented UI (like Bodhi). This would also introduce more points of failure, because you'd (likely) depend on additional remote services (like Koji). That's why I'm not surprised if Fedora CI sends "ignored, therefore passed" results for packages which don't have a test suite in dist-git. It's the easiest solution.
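(A toy version of the filelist-based optimization mentioned above, just to illustrate - a hypothetical helper; a real implementation would get the file list from Koji or the built RPMs:)

    def relevant_checks(filelist):
        """Pick which checks make sense for a package, from its file list."""
        checks = set()
        if any(".so" in f and "/usr/lib" in f for f in filelist):
            checks.add("dist.abicheck")
        if any(f.endswith(".py") or "site-packages" in f for f in filelist):
            checks.add("dist.python-versions")
        return checks

    print(relevant_checks(["/usr/lib64/libfoo.so.1.2", "/usr/bin/foo"]))
    # -> {'dist.abicheck'}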
Of course your proposed case with openQA's 'desktop terminal' testcase is even more problematic, because it's not run every time (as opposed to the previous examples). It's going to be interesting to figure out a way to handle this in the gating process in a robust fashion, and I'm not going to claim I have good answers for that. But I'm a firm believer that "let's wait X minutes and then consider it passed" is something we definitely shouldn't do :)
It is *possible* to solve this, I guess. My first thought about how to do that would be to actually add this feature to openQA. It would be a pretty weird API request - basically "Here is a request that looks like the one we send when we want you to run some tests. Now, we want you to explicitly **NOT** run these tests, then report exactly what it is that you didn't do". :P
Internally it'd just sort of hook into the job creation code, only it wouldn't actually do the step where it makes the created jobs 'real'; it'd just create the sort of 'prospective' jobs, send out internal events, and produce a response to the request, then just throw them away. It probably wouldn't actually be too hard to do, it'd just be a rather...odd thing to have.
So how is this different from, say, `ansible-playbook --list-tasks`? Or anything with --dry-run? I think it's very reasonable to be able to ask the scheduler "what jobs would you schedule with these input arguments?". It could be one way to make greenwave aware of which tests to require for a specific package/compose. It's somewhat inflexible because it requires greenwave to call into openQA and rely on openQA code execution, but it's better than nothing. A better approach would be to have the openQA scheduler as a standalone tool which consumes some configuration and which you can easily run locally, but that might be a serious engineering effort.
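(Something like this, conceptually - a made-up "dry run" scheduler interface, not an existing openQA or fedora_openqa API; all names are illustrative:)

    def plan_jobs(version, flavors, job_templates):
        """Return the (flavor, test_suite) pairs that *would* be scheduled."""
        return [(flavor, suite)
                for flavor in flavors
                for suite in job_templates.get((version, flavor), [])]

    # A gating arbiter could call this to learn which testcases to expect:
    templates = {("29", "server"): ["server_default_install", "server_freeipa_tests"]}
    print(plan_jobs("29", ["server", "workstation"], templates))
    # -> [('server', 'server_default_install'), ('server', 'server_freeipa_tests')]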
As a side note, this is exactly why we scrapped AutoQA and started Taskotron. The AutoQA scheduler was fully programmable, Turing-complete, and each task could decide whether to run or not based on the passed arguments. It was a massive pain to tell in advance what was going to run, and any small code error could send the whole thing tumbling down. We separated the scheduler into a standalone project (taskotron-trigger) and made the configuration yaml-based and much less powerful. But it's now much easier to see at a glance what's running and when, and the internal logic could be used as a library in a different project.
On 12/5/18 3:06 PM, Johnny Bieren wrote:
On Wed, Dec 5, 2018 at 2:38 PM Adam Williamson <adamwill@fedoraproject.org> wrote:
I had reason today to go poking through the ResultsDB fedmsg history:
https://apps.fedoraproject.org/datagrepper/raw?category=resultsdb
doing that makes it apparent that the pipeline is kinda flooding ResultsDB with "test results" that seem to be nothing more than "the pipeline decided not to test this package". Something like 99% of pipeline results appear to be of this type - either "ci.pipeline.allpackages-build.package.ignored" or "ci.pipeline.package.ignore". There've been 20 ci.pipeline.allpackages-build.package.ignored results filed in just the last 7 minutes, for instance, and it seems like there's a more or less constant stream of them.
Are these results really necessary or useful? ResultsDB is built to handle the load, I think, and my use case of poking through fedmsg history looking for messages from the pipeline *actually doing something* is not a super common one, but still, if this isn't really necessary, it seems like it might be a good idea to just not do it.
(Also, on the topic of why there are two topics - is one from the new pipeline and one from the old? Do we still need the old pipeline running?)
I'll let Bruno or Miroslav speak to whether we can turn off the .ignore messages, but to answer your last question, yes, there are two topics because one is the new pipeline and one is the old (atomic) pipeline. The old (atomic) pipeline has not been updated for anything past Fedora 27, so it may be the case that the old one can stop running.
Fedora 27 is end of life, so I think that would be the right thing to do.
On 05. 12. 18 at 20:37, Adam Williamson wrote:
I had reason today to go poking through the ResultsDB fedmsg history:
https://apps.fedoraproject.org/datagrepper/raw?category=resultsdb
doing that makes it apparent that the pipeline is kinda flooding ResultsDB with "test results" that seem to be nothing more than "the pipeline decided not to test this package". Something like 99% of pipeline results appear to be of this type - either "ci.pipeline.allpackages-build.package.ignored" or "ci.pipeline.package.ignore". There've been 20 ci.pipeline.allpackages-build.package.ignored results filed in just the last 7 minutes, for instance, and it seems like there's a more or less constant stream of them.
Are these results really necessary or useful? ResultsDB is built to handle the load, I think, and my use case of poking through fedmsg history looking for messages from the pipeline *actually doing something* is not a super common one, but still, if this isn't really necessary, it seems like it might be a good idea to just not do it.
(Also, on the topic of why there are two topics - is one from the new pipeline and one from the old? Do we still need the old pipeline running?)
Thanks folks!
I guess these ResultsDB entries are similar to the messages that get sent with the "fedmsg notification" subject. The subject itself is hilarious, and the content as well.
Vít