Hi,
For about a year now we have had a bunch of projects on github. With it, we benefit from its UI and easy access for new contributors. But github is proprietary, so not really in line with our spirit. Although git is decentralized, I think we should keep a clone of our projects on fedorahosted, just for the sake of saying that our infra does not depend on github and that if github goes down tomorrow, one can still access the sources.
With that in mind, I created ticket #4212 [1]. The idea is to run a cron job that would clone/pull all the projects under the fedora-infra group on github.
Souradeep has worked on it and we have a first cron job that we can run and that will update or clone all the repos from github.
However, currently it clones the repos without using --mirror, so the cloned repos are not bare repos. This means that if we place them under /srv/git they will sit next to the bare repos we create (for example there would be a /srv/git/fedocal (clone from github) and a /srv/git/fedocal.git (canonical git repo)). They might support git clone ssh://... but I have no idea how cgit would handle them.
I see two solutions:
* we use the current script:
  - place them in /srv/git/ or somewhere else?
  - see to place them somewhere where someone can clone them
  - see how cgit handles them
* adjust the script:
  - if the repo does not exist: git clone --mirror
  - else:
    - check remote
    - if not github in remotes: add new remote github
    - update the clone: git remote update
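The second option (adjusting the script) could be sketched roughly as follows. This is only an illustration and not the script from the ticket: the mirror directory, the function names, and the choice to decide "github in remotes" by looking at the remote URLs are all assumptions.

```python
import os
import subprocess

# Assumptions for illustration: the organisation URL is the one named in
# this thread, the mirror location is hypothetical.
GITHUB_ORG_URL = "https://github.com/fedora-infra"
MIRROR_ROOT = "/srv/git/github"


def plan_sync(name, repo_exists, remotes, mirror_root=MIRROR_ROOT):
    """Return the git commands needed to sync one project.

    `remotes` maps remote names to URLs for an already existing repo.
    """
    url = "%s/%s.git" % (GITHUB_ORG_URL, name)
    path = os.path.join(mirror_root, name + ".git")
    if not repo_exists:
        # No local copy yet: create a bare mirror of the github repo.
        return [["git", "clone", "--mirror", url, path]]
    commands = []
    if not any("github.com" in u for u in remotes.values()):
        # Existing repo without a github remote: add one named "github".
        commands.append(
            ["git", "--git-dir=" + path, "remote", "add", "github", url])
    # Fetch updates from all configured remotes.
    commands.append(["git", "--git-dir=" + path, "remote", "update"])
    return commands


def read_remotes(path):
    """Parse `git remote -v` into a {name: url} dict."""
    out = subprocess.check_output(
        ["git", "--git-dir=" + path, "remote", "-v"],
        universal_newlines=True)
    remotes = {}
    for line in out.splitlines():
        name, url = line.split()[:2]
        remotes[name] = url
    return remotes


def sync(name, mirror_root=MIRROR_ROOT):
    """Clone or update the mirror of one project."""
    path = os.path.join(mirror_root, name + ".git")
    exists = os.path.isdir(path)
    remotes = read_remotes(path) if exists else {}
    for command in plan_sync(name, exists, remotes, mirror_root):
        subprocess.check_call(command)
```

A cron job would then call `sync("fedocal")` and so on for each project name. Keeping the decision logic in `plan_sync` separate from the part that runs git makes it possible to check the plan without touching the network.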
Do you have another idea/suggestion?
Thanks, Pierre
[1] https://fedorahosted.org/fedora-infrastructure/ticket/4212
On Thu, 20 Feb 2014 12:33:39 +0100 Pierre-Yves Chibon pingou@pingoured.fr wrote:
> ...snip...
> Do you have another idea/suggestion?
I think doing this is a great idea. ;)
What about using grokmirror?
https://github.com/mricon/grokmirror
(packaged as python-grokmirror)
kevin
On Thu, Feb 20, 2014 at 08:19:22AM -0700, Kevin Fenzi wrote:
> I think doing this is a great idea. ;)
> What about using grokmirror?
> https://github.com/mricon/grokmirror
> (packaged as python-grokmirror)
> kevin
Hi guys,
I don't know our git infra yet (I'm still an apprentice), so this is a shot in the dark.
When it's about mirroring git(hub) repos then I think it's good to consider gitorious as the solution. It's FOSS and just does the job. Even bi-directionally. I think that might be interesting for you guys :)
And a working solution from Gluster guys:
http://www.gluster.org/2013/08/mirroring-into-gitorious/
On Thu, Feb 20, 2014 at 04:35:43PM +0100, Maciej Lasyk wrote:
> When it's about mirroring git(hub) repos then I think it's good to consider gitorious as the solution. It's FOSS and just does the job. Even bi-directionally. I think that might be interesting for you guys :)
> And a working solution from Gluster guys:
If I understand this correctly, they are just using the --mirror option of git clone and running a cron job to keep things in sync. So that's close to what we want, but I don't think gitorious gives us anything that we can't already do.
Interesting reading though, gives ideas :)
Pierre
On Thu, Feb 20, 2014 at 05:28:03PM +0100, Pierre-Yves Chibon wrote:
> If I understand this correctly, they are just using the --mirror option of git clone and running a cron job to keep things in sync. So that's close to what we want, but I don't think gitorious gives us anything that we can't already do.
> Interesting reading though, gives ideas :)
> Pierre
Hmm right - I gave you the wrong example. I read about grokmirror and believe that it will do the job for you :)
But I just wanted to show you that gitolite can be used in many ways. Just about mirroring: http://gitolite.com/gitolite/mirroring.html
Just check the first paragraph - it will give you the whole picture :)
On Thu, Feb 20, 2014 at 08:19:22AM -0700, Kevin Fenzi wrote:
> I think doing this is a great idea. ;)
> What about using grokmirror?
> https://github.com/mricon/grokmirror
> (packaged as python-grokmirror)
This looks really quite nice, I should have checked for it before opening the ticket. @souradeep would you like to have a look at this and see if we can set it up? It might save us some work by taking care of the mirroring part.
Pierre
Hi, Grokmirror really looks efficient and perfect for the job. @pingou, I just had a look at the project, I am currently trying to set this up.
On Thu, Feb 20, 2014 at 11:29 AM, Pierre-Yves Chibon pingou@pingoured.fr wrote:
> This looks really quite nice, I should have checked for it before opening the ticket.
Just to caution -- for grokmirror to work, the remote server needs to provide the manifest file, which github does not. There is a "grok-dumb-pull" utility in grokmirror that works for mirroring repositories that don't provide a manifest, so that will work, but it's not as efficient ( http://manpages.ubuntu.com/manpages/trusty/man1/grok-dumb-pull.1.html).
Other than that, I'm all for Fedora using grokmirror. :)
Best,
On Sat, Feb 22, 2014 at 01:17:23PM -0500, Konstantin Ryabitsev wrote:
> Just to caution -- for grokmirror to work, the remote server needs to provide the manifest file, which github does not. There is a "grok-dumb-pull" utility in grokmirror that works for mirroring repositories that don't provide a manifest, so that will work, but it's not as efficient (http://manpages.ubuntu.com/manpages/trusty/man1/grok-dumb-pull.1.html).
grok-dumb-pull seems to be the way to go in the current environment. However I have a couple of questions:
- have there been any thoughts on trying to get grokmirror included in github as one of the hooks they provide by default? This would reduce their bandwidth usage for all the projects that, like us, would like to mirror their github repos elsewhere.
- One thing that isn't clear to me is how the manifest.js is made available. Is it just provided at a specified URL or is it in fact included in the git repo?
There is one more thing I'd like to see in grokmirror, but I'll make a pull-request for it ;-)
Thanks for the feedback,
Pierre
On Tue, Feb 25, 2014 at 5:53 AM, Pierre-Yves Chibon pingou@pingoured.fr wrote:
> grok-dumb-pull seems to be the way to go in the current environment. However I have a couple of questions:
> - have there been any thoughts on trying to get grokmirror included in github as one of the hooks they provide by default? This would reduce their bandwidth usage for all the projects that, like us, would like to mirror their github repos elsewhere.
I've not approached them for anything like this, but I would imagine a single manifest file would not work at all for all of github's repositories. The largest collection we currently manage with grokmirror is 5,500 repositories and though it does admirably well, it's getting to the point where parsing/writing the manifest file is taking upwards of a second. They could probably generate a manifest file per user, but not a single manifest for all the repositories they host.
> - One thing that isn't clear to me is how the manifest.js is made available. Is it just provided at a specified URL or is it in fact included in the git repo?
It's made available outside git repositories as a simple http download. This way we can make an extremely lightweight HTTP request with an "if-newer-than" header and bail out early if the remote manifest hasn't changed.
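The "bail out early" check described here corresponds to an HTTP conditional GET, where the standard request header is If-Modified-Since and an unchanged resource is answered with 304 Not Modified. A minimal sketch, assuming a hypothetical manifest URL and hypothetical function names (this is not grokmirror's own code):

```python
import email.utils
import urllib.error
import urllib.request

# Hypothetical URL, for illustration only.
MANIFEST_URL = "https://example.org/manifest.js"


def build_request(url, last_fetch_epoch):
    """Build a GET request that is conditional on the last fetch time."""
    request = urllib.request.Request(url)
    # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    request.add_header(
        "If-Modified-Since",
        email.utils.formatdate(last_fetch_epoch, usegmt=True))
    return request


def fetch_if_changed(url, last_fetch_epoch):
    """Return the manifest bytes, or None if the server answers 304."""
    try:
        request = build_request(url, last_fetch_epoch)
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: nothing to do, bail out early.
            return None
        raise
```

When `fetch_if_changed` returns None, the mirroring run can stop without parsing anything or contacting any git repository.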
> There is one more thing I'd like to see in grokmirror, but I'll make a pull-request for it ;-)
I see it. I'll try to move on it shortly. We need a 0.4.0 out, and it's probably the last big change that would go into it.
Best,
On Tue, Feb 25, 2014 at 09:16:19AM -0500, Konstantin Ryabitsev wrote:
> I've not approached them for anything like this, but I would imagine a single manifest file would not work at all for all of github's repositories. The largest collection we currently manage with grokmirror is 5,500 repositories and though it does admirably well, it's getting to the point where parsing/writing the manifest file is taking upwards of a second. They could probably generate a manifest file per user, but not a single manifest for all the repositories they host.
Per-user or per-organization might already just do the job. Tbh, I was even thinking of a per-project manifest.
> It's made available outside git repositories as a simple http download. This way we can make an extremely lightweight HTTP request with an "if-newer-than" header and bail out early if the remote manifest hasn't changed.
So, in theory, it should be doable for github; it might be as simple as https://github.com/fedora-infra/fedocal/manifest where they have https://github.com/fedora-infra/fedocal/branches now.
Food for thought I guess :)
Thanks for the feedback!
Pierre
On Tue, Feb 25, 2014 at 9:36 AM, Pierre-Yves Chibon pingou@pingoured.fr wrote:
> Per-user or per-organization might already just do the job. Tbh, I was even thinking of a per-project manifest.
Per-project would largely defeat the purpose of grokmirror. The goal was to be able to check the status of thousands of repositories with only one REST call. If you have as many manifest files as there are projects, just doing "git pull" in each project would be about as efficient as a "http get", so might as well just run "grok-dumb-pull".
Regards, -K
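To illustrate why a single manifest is the point: once it has been downloaded, finding the repositories that need a pull is a purely local comparison. A minimal sketch, assuming a simplified manifest that maps each repository path to a dict with a "modified" timestamp. The real grokmirror manifest carries more fields, and the function name and sample data here are hypothetical.

```python
import json


def repos_needing_update(local_manifest, remote_manifest):
    """Return repo paths that are new or newer in the remote manifest."""
    stale = []
    for path, remote_info in remote_manifest.items():
        local_info = local_manifest.get(path)
        if local_info is None:
            # Repository not mirrored yet.
            stale.append(path)
        elif remote_info["modified"] > local_info["modified"]:
            # Remote copy changed since our last sync.
            stale.append(path)
    return sorted(stale)


# Made-up sample data standing in for one downloaded manifest.
remote = json.loads(
    '{"/fedocal.git": {"modified": 200},'
    ' "/other.git": {"modified": 100},'
    ' "/new.git": {"modified": 50}}')
local = {
    "/fedocal.git": {"modified": 150},
    "/other.git": {"modified": 100},
}
print(repos_needing_update(local, remote))  # ['/fedocal.git', '/new.git']
```

Only the repositories in the returned list need a git fetch. With one manifest per project, this comparison would cost one HTTP request per repository, which is the situation described above.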
infrastructure@lists.fedoraproject.org