Hello,
I have just opened a Request for Resource ticket (#4562) for Koschei and I would like it to eventually become an official Fedora service.
Please let me know what you think about my proposal. I'm happy to answer any questions and provide more information.
[1] http://fedoraproject.org/wiki/Request_For_Resources
On Fri, 10 Oct 2014 12:00:06 +0200 Mikolaj Izdebski mizdebsk@redhat.com wrote:
Hello,
I have just opened a Request for Resource ticket (#4562) for Koschei and I would like it to eventually become an official Fedora service.
Great. I've added some pointers there to our request for resources documents, but it looks like you already found them. ;)
Please let me know what you think about my proposal. I'm happy to answer any questions and provide more information.
ok, some general questions, please excuse me if they are dumb. ;)
high level:
* How well does it keep up currently? I know you are careful not to overload koji, but I wonder if that means things like perl builds are often behind because there are so many of them?
* right now the service is opt-in right? Someone adds a group and packages in that group and then when one of them changes it scratch rebuilds the rest. Do you see a time/case when we could just make it operate on all builds? that is, build foo is made, and it just does all the things that buildrequire foo?
* The notifications of failed builds currently are via fedmsg? We should investigate adding this to FMN if it's not already there, so anyone interested could be notified via that.
todo's/ideas:
* Could this ever be a koji plugin? Or does it do too much on top of that to ever be a plugin?
* Might it be possible to run on all the broken deps packages in rawhide/branched? This would depend I guess on the compose process generating fedmsgs with those package names, but if so it could tell maintainers "hey, your package is broken in rawhide, but a simple rebuild will fix it" (or any other group that just wants to go fix them).
* boost is another group of packages I could see this being useful for. Perhaps it would be worth reaching out to the boost maintainers?
* Could this be used to scratch build packages that are ExcludeArch/ExclusiveArch with that removed? ie, to tell maintainers, "hey, you exclude arm, but it builds ok, are you sure thats fine?"
technical:
* Can this application be load balanced any? Ie, if we have two of them could they operate against the same db at the same time?
* Are there any common sysadmin tasks we need to know about with the instance? Is there any special process to start/stop/reinstall it?
* When there is koji maint, should we stop this service? How do we gracefully do that and start it again?
Thats all I can think of right now. :)
kevin
On Wed, Oct 15, 2014 at 01:31:57PM -0600, Kevin Fenzi wrote:
- How well does it keep up currently? I know you are careful not to overload koji, but I wonder if that means things like perl builds are often behind because there are so many of them?
It would be great if there was a way to quantify and monitor this during runtime with both nagios and collectd.
- The notifications of failed builds currently are via fedmsg? We should investigate adding this to FMN if it's not already there, so anyone interested could be notified via that.
The Koschei team submitted patches to FMN last week, so it should be already ready. :)
- Are there any common sysadmin tasks we need to know about with the instance? Is there any special process to start/stop/reinstall it?
I got to check it out on the cloud node and it is 4 systemd-managed services. It was pretty straightforward to control for the short amount of time I was trying to do so.
On 10/15/2014 10:10 PM, Ralph Bean wrote:
On Wed, Oct 15, 2014 at 01:31:57PM -0600, Kevin Fenzi wrote:
- How well does it keep up currently? I know you are careful not to overload koji, but I wonder if that means things like perl builds are often behind because there are so many of them?
It would be great if there was a way to quantify and monitor this during runtime with both nagios and collectd.
I must admit I'm not familiar with nagios or collectd, but we may look into this if that's needed.
On 10/15/2014 09:31 PM, Kevin Fenzi wrote:
ok, some general questions, please excuse me if they are dumb. ;)
high level:
- How well does it keep up currently? I know you are careful not to overload koji, but I wonder if that means things like perl builds are often behind because there are so many of them?
Koji has more than enough resources to sustain the current Koschei load (~3000 packages). Storage might become problematic if more packages are added (scratch build results are kept for some period of time), but we have a solution for that (see [2] or [3]). If more Koji builders are ever needed then I think it should be quite easy to add them, as long as there is budget for that.
- right now the service is opt-in right? Someone adds a group and packages in that group and then when one of them changes it scratch rebuilds the rest. Do you see a time/case when we could just make it operate on all builds? that is, build foo is made, and it just does all the things that buildrequire foo?
For now only a subset of all packages is tracked by Koschei, but the ultimate goal is to track all packages - they would be added automatically after their first build appears in Koji and removed when they are blocked. What would be left up to individuals is maintaining package groups. (One package can be in any number of groups.)
- The notifications of failed builds currently are via fedmsg? We should investigate adding this to FMN if it's not already there, so anyone interested could be notified via that.
fedmsg publishing is already operational as can be seen on [1]. FMN rule has been recently added. The new FMN is not yet in production, but in (hopefully near) future users will be able to enable email or IRC notifications for buildability status of packages they are interested in.
todo's/ideas:
- Could this ever be a koji plugin? Or does it do too much on top of that to ever be a plugin?
Koschei has its own architecture and converting it to Koji plugin would require substantial amount of work. In other words, it should be possible, but I don't see any good reason to do so.
- Might it be possible to run on all the broken deps packages in rawhide/branched? This would depend I guess on the compose process generating fedmsgs with those package names, but if so it could tell maintainers "hey, your package is broken in rawhide, but a simple rebuild will fix it" (or any other group that just wants to go fix them).
This is an interesting idea.
A similar feature was planned for the future. The idea was that Koschei could resolve runtime dependencies of all packages in addition to build dependencies. Users would be able to see whether a package is installable and, if so, see its installation size with dependencies (in terms of MB to download, MB installed size and package count). There would be graphs showing how this changes over time. (We had a similar POC service running for a few months, see [4].)
We could extend this and make Koschei resolve runtime dependencies of the successful scratch builds it runs. If a scratch build would fix a broken package in the official repo, a fedmsg would be emitted.
- boost is another group of packages I could see this being useful for. Perhaps it would be worth reaching out to the boost maintainers?
I don't know the specifics of boost packages, but we'll consider any feature request.
- Could this be used to scratch build packages that are ExcludeArch/ExclusiveArch with that removed? ie, to tell maintainers, "hey, you exclude arm, but it builds ok, are you sure thats fine?"
This would require generating a new SRPM with ExcludeArch/ExclusiveArch removed, which requires installing all build dependencies, so it should be done by Koji as buildSRPMfromSCM task. This in turn requires Koschei having ability to push to some branch in SCM or maintaining separate git repo and changing Koji policy to allow scratch builds from it. And of course this would have to be implemented in Koschei. Not impossible, but looks like a lot of work for something that could be done manually by running some script from time to time.
technical:
- Can this application be load balanced any? Ie, if we have two of them could they operate against the same db at the same time?
To answer this question I need to elaborate more about Koschei architecture. tl;dr yes, it can be load balanced well.
Koschei consists of four systemd services, a WSGI webapp and a database. The services don't need to communicate with each other - they just need access to the database and to the services they integrate with (like Koji or fedmsg). They can run on separate machines, and there can be multiple instances of some of them running concurrently.
scheduler - schedules scratch builds and submits them to Koji. Theoretically there could be many schedulers running concurrently, but this is not needed as a single scheduler should be able to handle many thousands of packages easily.
watcher - listens to fedmsg and updates database accordingly. It makes sense to have only one watcher.
polling - periodically asks Koji about the statuses of running scratch builds and packages (this is a fallback mechanism, necessary in case fedmsg message delivery fails). Only one polling service makes sense, as it is only a fallback mechanism and can be run every hour or even less often.
resolver - resolves build dependencies of all packages when new repo is generated. Dep resolution is a CPU intensive task. Depending on number of packages tracked this may take up to a few hours of CPU time (estimate made for 100,000 pkgs, I'm thinking about future here). Resolver service can be configured to use multiple threads and it should scale linearly.
reporter (WSGI webapp running in Apache httpd with mod_wsgi) - provides the web UI for users. There can be multiple webapps running behind an HTTP balancer if needed.
Database itself can be load balanced too (we are using PostgreSQL).
To sum up, all components either don't need load balancing or can be load-balanced.
- Are there any common sysadmin tasks we need to know about with the instance? Is there any special process to start/stop/reinstall it?
Installing Koschei is done by installing an RPM package from the Fedora or EPEL repositories and copying a single config file. Managing all services (starting, stopping, viewing logs etc.) is done using standard system tools (systemctl, journalctl and so on). There is nothing special to be done besides standard sysadmin stuff (updating packages, viewing logs, backing up the database and so on).
- When there is koji maint, should we stop this service? How do we gracefully do that and start it again?
In this case you can stop Koschei services that communicate with Koji (using systemctl stop koschei-<servicename>) and start them when maintenance is over. Web UI will remain functional during that time, but there will be no new build scheduled.
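The stop/start procedure for a Koji maintenance window could be scripted roughly as follows. This is only a sketch: the unit names assume the koschei-<servicename> pattern mentioned above, and the list of services treated as Koji-facing (scheduler, polling, resolver) is a guess from the service descriptions in this email, not something confirmed by Koschei's packaging. The helper names are hypothetical.

```python
import subprocess

# Assumption: these are the services that talk to Koji; adjust to the real
# unit names installed on the host.
KOJI_FACING_SERVICES = ("scheduler", "polling", "resolver")


def systemctl_commands(action, services=KOJI_FACING_SERVICES):
    """Build one 'systemctl <action> koschei-<name>' command per service."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return [["systemctl", action, "koschei-" + name] for name in services]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling run_all(systemctl_commands("stop")) before the outage and run_all(systemctl_commands("start")) afterwards prints the commands; pass dry_run=False to actually run them. The web UI is not in the list, so it stays up.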
Thats all I can think of right now. :)
I hope this pretty long email answers your questions.
[1] https://apps.fedoraproject.org/datagrepper/raw?topic=org.fedoraproject.prod....
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1130233
[3] https://fedorahosted.org/koji/ticket/284
[4] https://sochotni.fedorapeople.org/min_install/main.html
On Thu, 16 Oct 2014 11:14:01 +0200 Mikolaj Izdebski mizdebsk@redhat.com wrote:
On 10/15/2014 09:31 PM, Kevin Fenzi wrote:
ok, some general questions, please excuse me if they are dumb. ;)
high level:
- How well does it keep up currently? I know you are careful not to overload koji, but I wonder if that means things like perl builds
are often behind because there are so many of them?
Koji has more than enough resources to sustain the current Koschei load (~3000 packages). Storage might become problematic if more packages are added (scratch build results are kept for some period of time), but we have a solution for that (see [2] or [3]). If more Koji builders are ever needed then I think it should be quite easy to add them, as long as there is budget for that.
Great. I wasn't sure if it was keeping up or not. ;) Interesting idea on the immediate purge.
- right now the service is opt-in right? Someone adds a group and packages in that group and then when one of them changes it
scratch rebuilds the rest. Do you see a time/case when we could just make it operate on all builds? that is, build foo is made, and it just does all the things that buildrequire foo?
For now only a subset of all packages is tracked by Koschei, but the ultimate goal is to track all packages - they would be added automatically after their first build appears in Koji and removed when they are blocked. What would be left up to individuals is maintaining package groups. (One package can be in any number of groups.)
ok. Groups are only managed manually by maintainers?
And deps are then checked when something updates in a group and the rest of the group rebuilt? or can you explain when a scratch build is fired?
- The notifications of failed builds currently are via fedmsg? We should investigate adding this to FMN if it's not already there,
so anyone interested could be notified via that.
fedmsg publishing is already operational as can be seen on [1]. FMN rule has been recently added. The new FMN is not yet in production, but in (hopefully near) future users will be able to enable email or IRC notifications for buildability status of packages they are interested in.
Great!
todo's/ideas:
- Could this ever be a koji plugin? Or does it do too much on top of that to ever be a plugin?
Koschei has its own architecture and converting it to Koji plugin would require substantial amount of work. In other words, it should be possible, but I don't see any good reason to do so.
Fair enough.
- Might it be possible to run on all the broken deps packages in rawhide/branched? This would depend I guess on the compose process generating fedmsgs with those package names, but if so it could
tell maintainers "hey, your package is broken in rawhide, but a simple rebuild will fix it" (or any other group that just wants to go fix them).
This is an interesting idea.
A similar feature was planned for the future. The idea was that Koschei could resolve runtime dependencies of all packages in addition to build dependencies. Users would be able to see whether a package is installable and, if so, see its installation size with dependencies (in terms of MB to download, MB installed size and package count). There would be graphs showing how this changes over time. (We had a similar POC service running for a few months, see [4].)
We could extend this and make Koschei resolve runtime dependencies of the successful scratch builds it runs. If a scratch build would fix a broken package in the official repo, a fedmsg would be emitted.
Yeah, something to consider...
- boost is another group of packages I could see this being useful for. Perhaps it would be worth reaching out to the boost maintainers?
I don't know the specifics of boost packages, but we'll consider any feature request.
ok. Boost often updates once a cycle or so, and lots of dependent packages need rebuilding. If we could see which of those fail it could be helpful. But this is up to the boost maintainers I suppose.
- Could this be used to scratch build packages that are ExcludeArch/ExclusiveArch with that removed? ie, to tell
maintainers, "hey, you exclude arm, but it builds ok, are you sure thats fine?"
This would require generating a new SRPM with ExcludeArch/ExclusiveArch removed, which requires installing all build dependencies, so it should be done by Koji as buildSRPMfromSCM task. This in turn requires Koschei having ability to push to some branch in SCM or maintaining separate git repo and changing Koji policy to allow scratch builds from it. And of course this would have to be implemented in Koschei. Not impossible, but looks like a lot of work for something that could be done manually by running some script from time to time.
Yeah, agreed.
technical:
- Can this application be load balanced any? Ie, if we have two of
them could they operate against the same db at the same time?
To answer this question I need to elaborate more about Koschei architecture. tl;dr yes, it can be load balanced well.
...snip...
To sum up, all components either don't need load balancing or can be load-balanced.
ok. And I suspect this service would be ok just being one instance as well, since it's not critical. ie, if we had to update/reboot it we could just do so and it would go on, no need to keep everything up 100%?
- Are there any common sysadmin tasks we need to know about with the instance? Is there any special process to start/stop/reinstall
it?
Installing Koschei is done by installing an RPM package from the Fedora or EPEL repositories and copying a single config file. Managing all services (starting, stopping, viewing logs etc.) is done using standard system tools (systemctl, journalctl and so on). There is nothing special to be done besides standard sysadmin stuff (updating packages, viewing logs, backing up the database and so on).
Excellent.
- When there is koji maint, should we stop this service? How do we gracefully do that and start it again?
In this case you can stop Koschei services that communicate with Koji (using systemctl stop koschei-<servicename>) and start them when maintenance is over. Web UI will remain functional during that time, but there will be no new build scheduled.
Thats all I can think of right now. :)
I hope this pretty long email answers your questions.
It does indeed. Thanks for taking the time to do so. :)
kevin
----- Original Message -----
From: "Kevin Fenzi" kevin@scrye.com
To: infrastructure@lists.fedoraproject.org
Sent: Thursday, 16 October, 2014 5:51:19 PM
Subject: Re: [RFR #4562] Koschei - continuous integration in Koji
ok. Groups are only managed manually by maintainers?
Currently yes. There is a web interface for modifying them by users, but it's not enabled in production, because we haven't decided who should have the permission. If we allow everyone to manage groups, it might become quite chaotic. My idea was to have global groups representing language stacks and other types of "natural" grouping which will be managed by corresponding SIGs and then have groups created by regular users which will be namespaced with their account name and displayed separately in the UI. What do you think about that?
And deps are then checked when something updates in a group and the rest of the group rebuilt? or can you explain when a scratch build is fired?
No, groups are just a convenience for displaying packages and filtering messages; they don't affect scheduling. Packages are rebuilt when their build dependencies (including transitive ones) change, regardless of their groups. So if package foo depends on bar and bar is updated, we detect it after the Koji repo is generated and raise foo's priority. Then we submit scratch builds for packages with priority higher than X, ordered by priority, so packages with the most updates are scheduled first. X is configurable and is currently set so that one direct dependency update is enough to trigger a scratch build. Priority is also slowly raised over time.
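The priority mechanism described here can be illustrated with a small sketch. It is not Koschei's actual code: every constant, class and function name below is hypothetical, and the numbers are chosen only so that a single direct dependency update pushes a package above the threshold X, as stated above.

```python
from dataclasses import dataclass

# Illustrative values only; Koschei's real settings are not given in this thread.
DIRECT_DEP_BUMP = 30      # added when a direct build dependency changes
TRANSITIVE_DEP_BUMP = 5   # smaller bump for a transitive dependency change
TIME_BUMP_PER_HOUR = 0.1  # slow increase over time
THRESHOLD = 20            # "X": builds are submitted for priority above this


@dataclass
class Package:
    name: str
    priority: float = 0.0


def on_repo_generated(packages, changed, deps):
    """Raise the priority of packages whose build dependencies changed.

    deps maps a package name to {dependency name: distance}; distance 1
    means a direct dependency, anything larger is transitive.
    """
    for pkg in packages:
        for dep, distance in deps.get(pkg.name, {}).items():
            if dep in changed:
                bump = DIRECT_DEP_BUMP if distance == 1 else TRANSITIVE_DEP_BUMP
                pkg.priority += bump


def on_time_passed(packages, hours):
    """Slowly raise every package's priority as time goes by."""
    for pkg in packages:
        pkg.priority += TIME_BUMP_PER_HOUR * hours


def builds_to_submit(packages):
    """Packages above the threshold, highest priority first."""
    due = [p for p in packages if p.priority > THRESHOLD]
    return sorted(due, key=lambda p: p.priority, reverse=True)
```

With these numbers a package with two updated direct dependencies is scheduled ahead of one with a single update, and a package touched only through a transitive dependency waits until the time-based increase carries it over the threshold.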
ok. And I suspect this service would be ok just being one instance as well, since it's not critical. ie, if we had to update/reboot it we could just do so and it would go on, no need to keep everything up 100%?
Yes, stopping the resolver will just delay builds a bit. So even if we stop it for a few hours, nothing bad will happen.
infrastructure mailing list infrastructure@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/infrastructure
Michael Simacek
On 10/16/2014 05:51 PM, Kevin Fenzi wrote:
And deps are then checked when something updates in a group and the rest of the group rebuilt? or can you explain when a scratch build is fired?
Strictly speaking, a scratch build is scheduled when all of the following conditions are met:
* the package has the highest priority of all packages known to Koschei,
* its priority is above some threshold (configurable),
* Koji load is below some threshold (configurable, 50 % ATM),
* the number of currently running Koji tasks started by Koschei is below some threshold (also configurable, currently 50 concurrent tasks).
Package priority depends on several factors (time since last rebuild, dependency changes, package state, package importance and more). For more information about priority see [1].
From a higher-level perspective, with current settings a package will be rebuilt when:
* at least one build dependency changes, or
* one week passes since the last successful build or scratch build, or
* someone (currently only an admin) requests an immediate rebuild.
[1] https://fedoraproject.org/wiki/Koschei#Priority
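The four scheduling conditions listed above can be written down as a single check. This is a minimal sketch under stated assumptions, not Koschei's implementation: the function and parameter names are hypothetical, the priority threshold default is made up, and only the 50 % load limit and the 50-task limit come from this email.

```python
def pick_next_build(priorities, koji_load, running_tasks,
                    priority_threshold=30.0,  # hypothetical value
                    max_koji_load=0.5,        # 50 %, from the email
                    max_running_tasks=50):    # 50 tasks, from the email
    """Return the package to scratch-build next, or None.

    priorities maps package name -> current priority; koji_load is a
    fraction between 0 and 1; running_tasks counts Koji tasks already
    started by Koschei.
    """
    if not priorities:
        return None
    # Both Koji load and the task count must be below their thresholds.
    if koji_load >= max_koji_load or running_tasks >= max_running_tasks:
        return None
    # Only the package with the highest priority is considered.
    name = max(priorities, key=priorities.get)
    # Its priority must also be above the configured threshold.
    if priorities[name] <= priority_threshold:
        return None
    return name
```

For example, pick_next_build({"foo": 40, "bar": 35}, koji_load=0.2, running_tasks=10) selects foo, while the same call with koji_load=0.6 returns None because Koji is considered too busy.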