We started getting some failed builds today due to koji not being able to get git checkouts from pkgs01.
$ git clone -n git://pkgs.fedoraproject.org/gnome-shell /var/lib/mock/f17-build-1342530-216385/root/tmp/scmroot/gnome-shell
fatal: read error: Connection reset by peer
Cloning into /var/lib/mock/f17-build-1342530-216385/root/tmp/scmroot/gnome-shell...
and
$ git clone -n git://pkgs.fedoraproject.org/digikam /var/lib/mock/f17-build-1342529-216385/root/tmp/scmroot/digikam
fatal: read error: Connection reset by peer
Cloning into /var/lib/mock/f17-build-1342529-216385/root/tmp/scmroot/digikam...
There were some IPs hitting pkgs01 pretty hard on checkouts, but that turns out not to be the real issue. There were a number of old stale connections lying around, which made it hit the xinetd limits much faster than normal.
I killed 46 old git upload-pack processes and 51 old stale ssh connections that were all from Feb, then restarted xinetd. This seemed to clear things up.
We may want to look at an automated script to clean up these processes?
kevin
Maybe a script that kills every SSH connection older than 3 days or so?
Regards, Luciano
2012/5/15 Kevin Fenzi kevin@scrye.com
...snip...
infrastructure mailing list infrastructure@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/infrastructure
On Tue, 15 May 2012 17:38:39 -0300 Luciano Facchinelli facchinelli.luciano@gmail.com wrote:
Maybe a script that kills every SSH connection older than 3 days or so?
Yeah, for the ssh side... the git upload-pack processes are launched by xinetd, so they would be slightly different.
kevin
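For illustration, the cleanup idea above could start from something like the Python sketch below. It is only a sketch under assumptions: it expects text in the format of `ps -eo pid,etime,args`, uses the "3 days or so" threshold suggested in the thread, and matches on the strings "git upload-pack" and "sshd:", which are guesses at how these processes show up in the process list on pkgs01. The sample lines and paths are made up. It only lists candidates and does not kill anything.

```python
import re

STALE_AFTER_SECONDS = 3 * 24 * 3600  # "older than 3 days or so" (from the thread)
# Assumed command-line substrings; adjust to what ps really shows on the host.
PATTERNS = ("git upload-pack", "sshd:")

# ps prints elapsed time as [[dd-]hh:]mm:ss
ETIME_RE = re.compile(r"^(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)$")


def etime_to_seconds(etime):
    """Convert the ps etime format, [[dd-]hh:]mm:ss, into seconds."""
    match = ETIME_RE.match(etime.strip())
    if match is None:
        raise ValueError("unrecognised etime: %r" % etime)
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_stale(ps_output, max_age=STALE_AFTER_SECONDS, patterns=PATTERNS):
    """Return (pid, age_seconds, command) for matching processes older than max_age."""
    stale = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, etime, command = fields
        age = etime_to_seconds(etime)
        if age > max_age and any(p in command for p in patterns):
            stale.append((int(pid), age, command))
    return stale


if __name__ == "__main__":
    # Made-up sample in the shape of `ps -eo pid,etime,args` output.
    sample = (
        "  PID     ELAPSED COMMAND\n"
        " 1234 89-01:05:32 git upload-pack /path/to/some/repo.git\n"
        " 2345       10:00 git upload-pack /path/to/other/repo.git\n"
        " 3456 90-00:00:00 sshd: someone [priv]\n"
    )
    for pid, age, command in find_stale(sample):
        print("%d\t%.1f days\t%s" % (pid, age / 86400.0, command))
```

In practice the real `ps -eo pid,etime,args` output would be fed to `find_stale()`, and the ssh and xinetd-launched cases could get different patterns or thresholds, since they are started differently.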
Hi,
On Tue, 2012-05-15 at 14:12 -0600, Kevin Fenzi wrote:
...snip...
There were some IPs hitting pkgs01 pretty hard on checkouts,
Was one of them coming from Hong Kong?
If so, please accept my apologies.
My company produces an EL6-based distro with some packages taken from Fedora (where we need more up-to-date stuff), so yesterday I was testing a script to automate the process of comparing the version of the packages in our Git with the ones in the Fedora Git.
The script goes as follows:

for each module in our git:
    clone the module
    get the evr
    add a git remote pointing to the Fedora git
    fetch that remote
    switch to the fedora branch we are based on
    get the evr
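As a rough illustration only (this is not the actual script), the per-module steps could map onto git and rpm commands as in the Python sketch below. Several things here are assumptions: the remote name `fedora`, the spec file being named `<module>.spec`, reading the EVR with `rpm -q --specfile`, and the `local_url` and `fedora_branch` arguments, which the caller has to supply. The Fedora URL form is the one seen in the build logs earlier in the thread.

```python
import os
import subprocess

FEDORA_GIT = "git://pkgs.fedoraproject.org/%s"  # URL form from the build logs


def evr_command(module):
    """Command printing epoch:version-release for <module>.spec (assumed file name)."""
    return ["rpm", "-q", "--specfile", module + ".spec",
            "--queryformat", "%{epoch}:%{version}-%{release}\\n"]


def comparison_steps(module, local_url, fedora_branch):
    """Return the ordered commands for one module, mirroring the steps above."""
    return [
        ["git", "clone", local_url, module],                      # clone the module
        evr_command(module),                                      # get the evr
        ["git", "remote", "add", "fedora", FEDORA_GIT % module],  # add the Fedora remote
        ["git", "fetch", "fedora"],                               # fetch that remote
        ["git", "checkout", "fedora/" + fedora_branch],           # switch to the Fedora branch
        evr_command(module),                                      # get the evr again
    ]


def run_steps(module, local_url, fedora_branch, workdir="."):
    """Run the steps: the clone in workdir, everything else inside the checkout."""
    steps = comparison_steps(module, local_url, fedora_branch)
    subprocess.check_call(steps[0], cwd=workdir)
    for command in steps[1:]:
        subprocess.check_call(command, cwd=os.path.join(workdir, module))
```

Each module costs one fetch from the Fedora git, so the load on pkgs01 grows with the number of modules and with how often the script is run.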
So I did quite a lot of fetching from the Fedora Git yesterday.
I was actually wondering whether I had any significant impact and whether there was a test environment I could hit instead of the production one, but when I arrived at work this morning I found this thread before asking.
So... Is there such an environment? Or is it just a bad idea for me to hit the Fedora git in this way, and I should find another way to check for updates?
but that turns out not to be the real issue. There were a number of old stale connections lying around, which made it hit the xinetd limits much faster than normal.
I killed 46 old git upload-pack processes and 51 old stale ssh connections that were all from Feb, then restarted xinetd. This seemed to clear things up.
I didn't do any uploads yesterday (only fetches by the aforementioned script), and I clone anonymously (to avoid the SSH overhead, not to pretend it's not me), so I'm not responsible for these ones.
But again, sorry for the trouble I may have caused.
I'll pause my work on this script for now; let me know if what I'm doing is OK for Fedora and whether I can resume it.
Thanks,
On Wed, 16 May 2012 11:07:07 +0800 Mathieu Bridon bochecha@fedoraproject.org wrote:
Was one of them coming from Hong Kong?
Not sure. ;)
If so, please accept my apologies.
I don't think it was you. (Or any specific IP.) :)
The problem was mainly that there were a bunch of stuck git processes that seemed to be counted against the limits in xinetd. It has a '50 connections per second, and if you hit that, disable for 10 seconds' setting. I think all the stale connections were counting against the 50-connection limit, so it was hitting that limit all the time...
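To make the quoted rule concrete, here is a toy Python model of a "50 connections per second, then disable for 10 seconds" limit. It is a sketch of the behaviour as described in this message, not xinetd's actual implementation, and the class and method names are made up. The rule resembles xinetd's `cps` attribute (for example `cps = 50 10`), but that mapping is an assumption, and the model does not cover how stale processes might be counted.

```python
class CpsLimiter:
    """Toy model: more than `limit` connections within one second
    disables the service for `wait` seconds."""

    def __init__(self, limit=50, wait=10):
        self.limit = limit
        self.wait = wait
        self.window_start = None   # start time of the current one-second window
        self.count = 0             # connections seen in the current window
        self.disabled_until = 0.0  # service refuses connections before this time

    def accept(self, now):
        """Return True if a connection arriving at time `now` (seconds) is accepted."""
        if now < self.disabled_until:
            return False
        if self.window_start is None or now - self.window_start >= 1.0:
            self.window_start = now
            self.count = 0
        self.count += 1
        if self.count > self.limit:
            self.disabled_until = now + self.wait
            return False
        return True


if __name__ == "__main__":
    limiter = CpsLimiter()
    # 60 connections arriving 10 ms apart, i.e. a burst well above 50 per second.
    burst = [limiter.accept(i * 0.01) for i in range(60)]
    print(sum(burst), "accepted,", burst.count(False), "rejected")
    print("retry at t=5s accepted:", limiter.accept(5.0))
    print("retry at t=11s accepted:", limiter.accept(11.0))
```

Once the limit trips, every client is refused for the whole wait period, which would be consistent with koji seeing "Connection reset by peer" on otherwise ordinary clones.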
...snip...
So... Is there such an environment? Or is it just a bad idea for me to hit the Fedora git in this way, and I should find another way to check for updates?
No, there's not... but I don't think your script caused this. ;)
I didn't do any uploads yesterday (only fetches by the aforementioned script), and I clone anonymously (to avoid the SSH overhead, not to pretend it's not me), so I'm not responsible for these ones.
The git upload-pack processes are the anonymous ones, launched via xinetd. That's git sending data out over the connection.
But again, sorry for the trouble I may have caused.
I'll pause my work on this script for now; let me know if what I'm doing is OK for Fedora and whether I can resume it.
I think it's fine to resume... and we can ping you if we see problems with it.
kevin