Hey everyone, there are lots of projects going on right now, so I'm going to spend some time prioritizing them and hopefully get help on them.
The wiki:
We need to upgrade the wiki, like yesterday. We're on an unsupported branch right now.
What's the hangup? HNP is our ACL plugin. Some might remember when we said "A wiki is not a CMS"; well, now we're being screwed by it. It has been made clear to us that certain pages in the wiki must A) stay on the wiki and B) be editable by only a subset of people. HNP is not supported by newer versions of MediaWiki.
Ticket #1072
Koji:
The Koji database is borked right now. We're getting by, but it's a ticking time bomb. There was some data corruption during our backplane issues this last week. The damaged portions total no more than 160K at this point (I'm still doing other measurements), which is less than 0.0002% of the data in that database. The problem is that when I try to dump the data, pgsql fails. This is a must-fix.
Ticket #1069
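One cheap way to narrow down where the dump is dying is to dump each table separately, so a failure points at a specific table rather than the whole database. A sketch (the table names passed in would come from the Koji schema; nothing here is the actual recovery procedure):

```python
import subprocess

def dump_cmd(dbname, table, outdir="/tmp"):
    """Build a single-table pg_dump command (--table/--file are standard pg_dump options)."""
    return ["pg_dump", "--table", table,
            "--file", "%s/%s.sql" % (outdir, table), dbname]

def dump_tables(dbname, tables, outdir="/tmp"):
    """Dump each table on its own; return the list of tables whose dump failed."""
    failed = []
    for table in tables:
        if subprocess.call(dump_cmd(dbname, table, outdir)) != 0:
            failed.append(table)
    return failed
```

Running that over the full table list should leave `failed` holding only the tables touched by the corruption, which also tells us how much data we can salvage with plain dumps.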
nfs1:
NFS1's IO load is just not right. Something isn't behaving as it should, and I'm just not sure what's going on there yet. We need to do a full examination and trending of it. This involves moving cvs1 to another location and moving releng2 to xen1 to help ease some load. Additionally, we need to move kojipkgs1 to another location (probably xen1) and enable proper caching for it. We also need to finally get a valid backup of nfs1; this still hasn't happened. It's difficult to test because of the high load on the disks: backups take 4+ days, and lots of things can go wrong during that time.
Tickets: #1061, #1074, #1075, #1076.
backup2:
DR backups are a project I've been trying to get in shape. It's generally there but needs some polish, and the dr user is in place. But an audit and verification of everything we need to back up still needs to be done.
Ticket: #1077
CSRF:
CSRF is a pretty serious deal. Toshio is working on it, but I'm sure he can use some help.
Ticket: #992
FAS:
Ricky has been working on some FAS stuff, but some outstanding features must be implemented quickly, particularly as they relate to stale users. We need to get the password expiration stuff in, and we need to document and define what each user state means and how it relates to other applications.
FAS: #83
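As a strawman for the password-expiration piece, the check itself is tiny; everything here is an assumption for illustration (the 90-day window and the field names are not FAS's actual policy or schema):

```python
from datetime import date, timedelta

# Assumed policy for illustration only; not a documented FAS value.
MAX_PASSWORD_AGE = timedelta(days=90)

def password_expired(last_changed, today=None):
    """True if the password is older than the allowed maximum age."""
    if today is None:
        today = date.today()
    return today - last_changed > MAX_PASSWORD_AGE
```

The harder part, as noted above, is deciding what each user state means across applications; the check is only useful once those definitions exist.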
If you don't have access to the systems involved, we likely won't be able to sponsor and train you in time to get this stuff done. Sorry, but you can still look at the code issues mentioned above or test alternatives for the wiki.
If you have access and are working on something else, please stop that work if you're able and pick up one of the above issues instead. Even after these issues are done, I've got a good six months of work backed up before new things can start. We've exploded in size: over the last two years, the services, nodes, etc. that we support have grown more than tenfold, while our core team has only grown by a few members. We're doing better than most OSS communities out there, but we can do better.
-Mike
On Thu, 18 Dec 2008, Mike McGrath wrote:
nfs1:
NFS1's IO load is just not right. Something isn't behaving as it should, and I'm just not sure what's going on there yet. We need to do a full examination and trending of it. This involves moving cvs1 to another location and moving releng2 to xen1 to help ease some load. Additionally, we need to move kojipkgs1 to another location (probably xen1) and enable proper caching for it. We also need to finally get a valid backup of nfs1; this still hasn't happened. It's difficult to test because of the high load on the disks: backups take 4+ days, and lots of things can go wrong during that time.
Tickets: #1061, #1074, #1075, #1076.
More background on this: I know releng and some others are working to do fewer reads and writes to the NFS share. That's a valuable effort, but I'm not convinced there isn't something else going on there. The read/write speeds I've seen are just slower than I'd expect.
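To put numbers on "slower than I'd expect", here's a quick sketch you could run against the NFS mount point. Caveats: fsync only guarantees the data left the client's buffers, and the read-back may be served from the client's page cache, so treat the read figure as an upper bound; proper trending would use iostat or systemtap on the server itself.

```python
import os
import time

def measure_throughput(path, size_mb=64, block_size=1 << 20):
    """Write then read size_mb of data under `path`; return (write_MBps, read_MBps)."""
    block = b"x" * block_size
    fname = os.path.join(path, "iobench.tmp")
    try:
        start = time.time()
        with open(fname, "wb") as f:
            for _ in range(size_mb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data out of the client's buffers
        write_mbps = size_mb / (time.time() - start)

        start = time.time()
        with open(fname, "rb") as f:
            while f.read(block_size):
                pass  # sequential read-back; may be cached on the client
        read_mbps = size_mb / (time.time() - start)
    finally:
        os.unlink(fname)
    return write_mbps, read_mbps
```

Running it periodically and logging the pair would give the trend data mentioned above.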
-Mike
On Thu, Dec 18, 2008 at 3:06 PM, Mike McGrath mmcgrath@redhat.com wrote:
What's the hangup? HNP is our ACL plugin. Some might remember when we said "A wiki is not a CMS"; well, now we're being screwed by it. It has been made clear to us that certain pages in the wiki must A) stay on the wiki and B) be editable by only a subset of people. HNP is not supported by newer versions of MediaWiki.
Why do the pages have to stay on "THE" wiki? Why not set up a second wiki instance for the sensitive pages and set up inter-wiki links so that linking isn't cumbersome?
On Thu, 18 Dec 2008, Jeffrey Ollie wrote:
Why do the pages have to stay on "THE" wiki? Why not set up a second wiki instance for the sensitive pages and set up inter-wiki links so that linking isn't cumbersome?
That very well may work; G and quaid are working on something now. As for inter-wiki links, I honestly know nothing about them.
-Mike
On Thu, Dec 18, 2008 at 3:34 PM, Mike McGrath mmcgrath@redhat.com wrote:
That very well may work; G and quaid are working on something now. As for inter-wiki links, I honestly know nothing about them.
http://www.mediawiki.org/wiki/Help:Interwiki_linking#Interwiki_links http://www.mediawiki.org/wiki/Manual:Guide_to_setting_up_interwiki_linking
That's about all I know either.
On Thu, Dec 18, 2008 at 03:34:53PM -0600, Mike McGrath wrote:
That very well may work; G and quaid are working on something now. As for inter-wiki links, I honestly know nothing about them.
Nigel set up some magic for interwiki links.
On Thu, Dec 18, 2008 at 03:31:18PM -0600, Jeffrey Ollie wrote:
Why do the pages have to stay on "THE" wiki? Why not set up a second wiki instance for the sensitive pages and set up inter-wiki links so that linking isn't cumbersome?
Thanks, yes, that's one viable solution. It might be a nice, fast way to go, but it's really just a bandage (more below).
Another bandage Nigel and I just discussed is scripting a pull from non-ACL'd draft pages in the regular wiki (Legal, Packaging, etc.) and a push to fedoraproject.org/Legal, docs.fp.o/packaging-guide, and so on. This is an uglier bandage with more custom coding, I reckon.
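The pull half of that script is cheap to prototype: MediaWiki will hand back the raw wikitext of any page via index.php's action=raw. A sketch (the base URL and page names are placeholders, and the push/render half is left out):

```python
import urllib.parse
import urllib.request

def raw_url(page, base="https://fedoraproject.org/w/index.php"):
    """URL that returns the raw wikitext of `page` (MediaWiki's action=raw)."""
    return "%s?title=%s&action=raw" % (base, urllib.parse.quote(page, safe=""))

def fetch_raw(page, base="https://fedoraproject.org/w/index.php"):
    """Pull the page source; a push step would then render/copy it to the static site."""
    with urllib.request.urlopen(raw_url(page, base)) as resp:
        return resp.read().decode("utf-8")
```

A cron job fetching the draft pages and rendering them into the static sites would be the "uglier bandage" in maybe a page of code.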
Why not use a wiki like this as a long-term solution? It comes down to content management. There is a small subset of all the content we maintain that needs:
* Version control and rollback
* Automatic publish/unpublish by rules (dates, packages released, etc.)
* Workflow to ensure quality before moving live (writer <=> editor => publish)
* Nice tool to add users to fine-grained groups to have roles in managing their content without accidentally stepping on other people's content
* Nice web-based wysiwyg editor
Only the last one is a nicety; the rest are really a must for this special content. More here:
https://fedoraproject.org/wiki/CMS_solution_for_Fedora_Project_websites
- Karsten
On Thu, 2008-12-18 at 15:31 -0600, Jeffrey Ollie wrote:
Why do the pages have to stay on "THE" wiki? Why not set up a second wiki instance for the sensitive pages and set up inter-wiki links so that linking isn't cumbersome?
Sorry, gotta do soft redirects, i.e.
"This page is now on a different wiki, click here to go to it..."
It's a pain I know, but yeah, I'll follow up in a separate post.
- Nigel
Mike McGrath wrote:
CSRF:
CSRF is a pretty serious deal. Toshio is working on it, but I'm sure he can use some help.
Ticket: #992
Till brought up concerns about the decrease in usability of doing it the way I've outlined. That's certainly a valid problem; the question is whether it outweighs the benefit of mitigating the effects of programmer errors. Till didn't reply to my last message... though it might be that he just decided I was too stubborn to change rather than agreeing with me :-). If anyone sees a way to reconcile "click from email" with "prevent spoofing by default", let me know; otherwise I'm committing code soon.
If anyone wants to help code, this is a problem that is easily broken into pieces. So one person can get involved with creating our custom version of tg.url() while someone else updates the identity provider and someone else updates the BaseClient implementations.
-Toshio
Toshio Kuratomi wrote:
Till brought up concerns about the decrease in usability of doing it the way I've outlined. That's certainly a valid problem; the question is whether it outweighs the benefit of mitigating the effects of programmer errors. Till didn't reply to my last message... though it might be that he just decided I was too stubborn to change rather than agreeing with me :-). If anyone sees a way to reconcile "click from email" with "prevent spoofing by default", let me know; otherwise I'm committing code soon.
I woke up with a possible solution. Till, Luke, Ricky, and others: does this seem doable?
= Background =
A method in the TG Controller can be marked as needing a non-anonymous identity like this:
@identity.require(identity.not_anonymous())
def foo(self):
    if 'admin' in identity.current.groups:
        # Do admin stuff
    else:
        # Do normal user stuff
The presence of the @identity.require(identity.not_anonymous()) is what forces the method to redirect to the login page when a user is not logged in.
If the @identity decorator is not there and the code simply checks inside the method, it usually means the code will do different things depending on whether the user is anonymous or authenticated, rather than on which group the user belongs to.
= Addition for CSRF =
The current proposal says that when identity is referenced, we'll check the CSRF token; if the token is not present or doesn't match, we'll decide the user is anonymous. If the @identity.require() decorator is present, that check on the user's identity happens at the decorator level.
I think it would be possible for us to check whether the user has a CSRF token at this point. If not, but the tg-visit session cookie that the user sent is valid, we can redirect to a page that says "This page helps prevent CSRF spoofing. Click to continue to the _requested_resource_." The link will go to the original method but will contain the CSRF token. So if the user is in control of the browser they can click on the link and be taken to the resource using their current login session.
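In rough pseudo-TurboGears terms, the decision described above looks something like this. All the names here (`_csrf_token`, the Session class, `identify`) are stand-ins for illustration, not the actual FAS/TG identity-provider code:

```python
ANONYMOUS = object()  # sentinel for "treat this request as anonymous"

class Session:
    """Stand-in for the tg-visit session; not the real TurboGears object."""
    def __init__(self, user, csrf_token, valid=True):
        self.user = user
        self.csrf_token = csrf_token
        self.valid = valid

def confirm_url(requested_url, token):
    """The click-through link carries the token back to the original method."""
    return "%s?_csrf_token=%s" % (requested_url, token)

def identify(params, session, requested_url):
    """Token matches -> authenticated; valid login session but missing/bad
    token -> interpose the 'click to continue' page; otherwise anonymous."""
    if session.valid and params.get("_csrf_token") == session.csrf_token:
        return session.user
    if session.valid:
        return confirm_url(requested_url, session.csrf_token)
    return ANONYMOUS
```

The key property is the middle branch: a forged cross-site request arrives with the victim's valid cookie but without the token, so the attacker gets the confirmation page instead of the action, and only a real click in the victim's browser carries the token forward.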
= Things this does not do =
* We don't do an automatic redirect here because I think the browser will process that redirect whether or not JavaScript is allowed to read it. As long as the browser processes the redirect automatically despite what the same-origin policy says, we've lost the CSRF protection. (Someone should check whether the 30X status codes and <meta refresh> tags all behave this way.)
* If the method in question allows anonymous access then you will get the anonymous page rather than the CSRF redirection.
- We might be able to ameliorate this by having the login screen understand the difference between not having a tg-visit cookie and not having a CSRF token. You'd still have to click to get to the page (one click to log in, a second to return from the CSRF protection), but the login screen could display "Please click to continue" instead of forcing the user to retype their username and password and start a new session.
- We might be able to ameliorate this even further if we return enough information to tell the page that the only reason we aren't logged in is because of CSRF protection. With that, the
= Other Notes =
* This would also solve the problem of how to do SSL Client Authentication. The SSL Cert alone would take you to the login screen. You'd then click on a link (with the CSRF token embedded) to take you to the screen you want.
* Here's a drawback of putting the CSRF token in GET requests: when copying and pasting links, the user's CSRF token would be included in the pasted information. Having the CSRF token added by JavaScript when the user clicks a link would get around this, but requires that JavaScript be enabled in the browser.
-Toshio
On Thu, 2008-12-18 at 15:06 -0600, Mike McGrath wrote:
Hey everyone, there are lots of projects going on right now, so I'm going to spend some time prioritizing them and hopefully get help on them.
The wiki:
We need to upgrade the wiki, like yesterday. We're on an unsupported branch right now.
What's the hangup? HNP is our ACL plugin. Some might remember when we said "A wiki is not a CMS"; well, now we're being screwed by it. It has been made clear to us that certain pages in the wiki must A) stay on the wiki and B) be editable by only a subset of people. HNP is not supported by newer versions of MediaWiki.
FYI,
I've started a process to move these pages into Namespaces so we can use a more supported extension until we can then move these pages _again_ into a CMS.
On Thu, Dec 18, 2008 at 03:06:34PM -0600, Mike McGrath wrote:
nfs1:
NFS1's IO load is just not right. Something isn't behaving as it should, and I'm just not sure what's going on there yet. We need to do a full examination and trending of it. This involves moving cvs1 to another location and moving releng2 to xen1 to help ease some load. Additionally, we need to move kojipkgs1 to another location (probably xen1) and enable proper caching for it. We also need to finally get a valid backup of nfs1; this still hasn't happened. It's difficult to test because of the high load on the disks: backups take 4+ days, and lots of things can go wrong during that time.
Something like disktop.stp from http://sourceware.org/systemtap/wiki/ScriptsTools might be useful in finding out what is causing the load.
Also have a look at https://bugzilla.redhat.com/show_bug.cgi?id=448130 if you are using the default CFQ IO scheduler and nfs1 is used for NFS traffic, as the name suggests (it isn't just NFS performance that is affected by slice_idle, though).
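Checking which scheduler a device is actually using is a quick read from sysfs; the kernel brackets the active one, e.g. "noop anticipatory deadline [cfq]". A small sketch:

```python
def current_scheduler(text):
    """Return the active IO scheduler from a /sys/block/<dev>/queue/scheduler line."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None

def scheduler_for(dev):
    """Read the active scheduler for a block device, e.g. scheduler_for('sda')."""
    with open("/sys/block/%s/queue/scheduler" % dev) as f:
        return current_scheduler(f.read())
```

Writing a different scheduler name back to the same sysfs file switches schedulers at runtime, which makes it easy to A/B test CFQ against deadline on nfs1's disks.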
Kostas
infrastructure@lists.fedoraproject.org