On Thu, Jun 04, 2020 at 07:02:41PM -0700, Kevin Fenzi wrote:
On Mon, Jun 01, 2020 at 04:18:35PM +0200, Adrian Reber wrote:
Our MirrorManager setup exports the current state of all mirrors every hour at :30 to a protobuf-based file, which is then used by the mirrorlist servers to answer requests from yum and dnf.
The Python script requires up to 10GB of memory and takes between 35 and 50 minutes to run. It issues a lot of SQL queries, including some really big ones that join up to 6 large MirrorManager tables.
I have rewritten this Python script in Rust, and it now needs only around 1 minute instead of 35 to 50 minutes and only 600MB of memory instead of 10GB.
Wow. nice!
I think the biggest difference is that I am doing almost no joins in my SQL queries. I download all the tables once and then loop over the downloaded tables in memory, and this seems to be massively faster.
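
To illustrate the idea of replacing SQL joins with loops over tables that were downloaded once, here is a minimal sketch. It is not the actual code: the `Host`, `Category` and `HostCategory` structs and the `hosts_by_category` function are hypothetical and much simpler than the real MirrorManager schema. It assumes the rows are already in memory and only shows the join step.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified rows. The real MirrorManager tables
// have many more columns; these names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Host {
    id: u32,
    name: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Category {
    id: u32,
    name: String,
}

// Link table connecting hosts to the categories they carry.
#[derive(Debug, Clone, PartialEq)]
struct HostCategory {
    host_id: u32,
    category_id: u32,
}

/// Joins the three tables in memory and returns, for each category name,
/// the names of the hosts that carry it.
fn hosts_by_category(
    hosts: &[Host],
    host_categories: &[HostCategory],
    categories: &[Category],
) -> HashMap<String, Vec<String>> {
    // Index each table once by primary key so every lookup is a hash lookup.
    let host_by_id: HashMap<u32, &Host> = hosts.iter().map(|h| (h.id, h)).collect();
    let category_by_id: HashMap<u32, &Category> =
        categories.iter().map(|c| (c.id, c)).collect();

    let mut result: HashMap<String, Vec<String>> = HashMap::new();
    for link in host_categories {
        // Like an inner join: links pointing at missing rows are skipped.
        if let (Some(host), Some(category)) = (
            host_by_id.get(&link.host_id),
            category_by_id.get(&link.category_id),
        ) {
            result
                .entry(category.name.clone())
                .or_default()
                .push(host.name.clone());
        }
    }
    result
}
```

Each table is read from the database with a plain SELECT and indexed once; the loop over the link table then does the work the database would otherwise do in a multi-table join.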
As the mirrorlist-server in Rust has proven to be extremely stable over the months we have been using it, I would also like to replace the mirrorlist protobuf input generation with my new Rust-based code.
I am planning to try out the new protobuf file in staging in the next few days and would then try to get my new protobuf generation program into Fedora. Once it is packaged I would discuss here whether and how we want to deploy it in Fedora's infrastructure.
Cool. You will need to hurry as staging goes off on Monday, and is back in a few weeks. :)
Then I just have to wait a bit. No problem.
Having the possibility to generate the mirrorlist input data in about a minute would significantly reduce the load on the database server and enable us to react much faster if broken protobuf data has been synced to the mirrorlist servers on the proxies.
Yeah, and I wonder if it would let us revisit the entire sequence from 'update push finished' to updated mirrorlist server.
Probably. As the new code will not run on the current RHEL 7 based mm-backend01, would it make sense to run a short-running service like this on Fedora's OpenShift? We could also create a new read-only (SELECT only) database account for this.
Adrian