https://fedoraproject.org/wiki/Changes/Power4kPageSize
== Summary ==
On ppc64le, the kernel is currently compiled for 64k page size.
This change proposes using the more common 4k page size.
Some HPC workloads may be disadvantaged slightly. Workstation users are likely to encounter fewer bugs.
Some things, like the AMD Radeon GPU drivers, firmware or related code, appear to be completely non-functional on the 64k page size. Insufficient upstream developers are testing such issues on this architecture.
== Owner ==
* Name: [[User:pocock|Daniel Pocock]]
* Email: daniel@pocock.pro
== Detailed Description ==
== Feedback ==
Discussed several times on devel, [https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/... latest here]
[https://forums.raptorcs.com/index.php/topic,248.msg1852.html discussed upstream in the Raptor forum]
== Benefit to Fedora ==
Better first impression for users of ppc64le workstations.
Users can focus on reporting ppc64le bugs without being sidetracked by page size bugs.
== Scope ==
* Proposal owners: [[DanielPocock]]
* Other developers: please volunteer by adding your name here
* Release engineering: [https://pagure.io/releng/issue/9939 #9939]
** wait for the 5.12 kernel and verify that it includes the Btrfs patches for arbitrary 4k / 64k sector size, independent of the page size
** create a kernel with 4k page size to run on the ppc64le build servers
** ensure the default kernel RPM in the distribution has 4k page size
** perform the mass rebuild running on the 4k page size
** create an installer ISO based on the revised kernel with 4k page size
* Policies and guidelines: no, as it is an arch-specific issue; most other architectures already have a 4k page size
* Trademark approval: N/A (not needed for this Change)
* Alignment with Objectives: none of the current objectives relate to this change
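The release-engineering step "ensure the default kernel RPM in the distribution has 4k page size" could be checked mechanically. The following is only a minimal sketch, assuming the kernel build configuration is available as text (for example a `/boot/config-*` file) and that the powerpc Kconfig symbols `CONFIG_PPC_4K_PAGES` and `CONFIG_PPC_64K_PAGES` select the page size. The helper name `page_size_from_config` and the sample text are hypothetical and not part of the proposal.

```python
def page_size_from_config(config_text):
    """Return the page size in bytes implied by a powerpc kernel config, or None."""
    sizes = {"CONFIG_PPC_4K_PAGES": 4096, "CONFIG_PPC_64K_PAGES": 65536}
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key in sizes and value == "y":
            return sizes[key]
    return None


# Hypothetical excerpt of a kernel config built for 4k pages.
sample = """\
# CONFIG_PPC_64K_PAGES is not set
CONFIG_PPC_4K_PAGES=y
"""
print(page_size_from_config(sample))
```

A check like this only inspects the configuration; it says nothing about whether userspace packages were rebuilt against that kernel.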
== Upgrade/compatibility impact ==
If the user has already formatted their root filesystem with Btrfs and a 64k sector size, they need to be using a Fedora kernel that supports both 4k and 64k sector sizes. This support is anticipated in kernel 5.12 and will hopefully be ready for F34 or F35 [https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...].
== User Experience ==
New GPUs are more likely to just work on this non-x86 architecture, as long as the latest firmware, mesa, llvm are also used.
Btrfs, the default filesystem, will use the sector size identical to the running kernel's page size. As the 4k page size is more common, this will ensure Btrfs filesystems created on ppc64le hosts can be used on x86 and other hosts without hassle.
== Dependencies ==
All RPMs must be rebuilt on a server running the final page size (4k)
== Contingency Plan ==
* Contingency mechanism: Prepare a kernel with the original 64k config, install it on the build server, rebuild all the packages for this architecture
* Contingency deadline: whenever the last time for a full rebuild or kernel change is possible
* Blocks release? Yes, full rebuild of all packages must be completed before release
On Fri, Feb 12, 2021 at 10:21 AM Ben Cotton bcotton@redhat.com wrote:
https://fedoraproject.org/wiki/Changes/Power4kPageSize
== Summary ==
On ppc64le, the kernel is currently compiled for 64k page size.
This change proposes using the more common 4k page size.
Some HPC workloads may be disadvantaged slightly. Workstation users are likely to encounter fewer bugs.
Some things, like the AMD Radeon GPU drivers, firmware or related code, appear to be completely non-functional on the 64k page size. Insufficient upstream developers are testing such issues on this architecture.
Just as there are many things that expect the 64K page size. I am not doing this.
Justin
== Owner ==
- Name: [[User:pocock|Daniel Pocock]]
- Email: daniel@pocock.pro
== Detailed Description ==
== Feedback ==
Discussed several times on devel, [https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/... latest here]
[https://forums.raptorcs.com/index.php/topic,248.msg1852.html discussed upstream in the Raptor forum]
== Benefit to Fedora ==
Better first impression for users of ppc64le workstations.
Users can focus on reporting ppc64le bugs without being sidetracked by page size bugs.
== Scope ==
Proposal owners: [[DanielPocock]]
Other developers: please volunteer by adding your name here
Release engineering: [https://pagure.io/releng/issue/9939 #9939]
** wait for 5.12 kernel, verify that it includes the Btrfs patches for arbitrary 4k / 64k sector size, independent of the page size
** create a kernel with 4k page size to run on the ppc64le build servers
** ensure the default kernel RPM in the distribution has 4k page size
** perform the mass rebuild running on the 4k page size
** create an installer ISO based on the revised kernel with 4k page size
- Policies and guidelines: no, as it is an arch-specific issues, most
other architectures already have a 4k page size
- Trademark approval: N/A (not needed for this Change)
- Alignment with Objectives: none of the current objectives relate to
this change
== Upgrade/compatibility impact ==
If the user has already formatted their root filesystem with Btrfs and a 64k sector size, they need to be using a Fedora kernel that supports both 4k and 64k sector sizes. This support is anticipated in kernel 5.12 and will hopefully be ready for F34 or F35 [https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...].
== User Experience ==
New GPUs are more likely to just work on this non-x86 architecture, as long as the latest firmware, mesa, llvm are also used.
Btrfs, the default filesystem, will use the sector size identical to the running kernel's page size. As the 4k page size is more common, this will ensure Btrfs filesystems created on ppc64le hosts can be used on x86 and other hosts without hassle.
== Dependencies ==
All RPMs must be rebuilt on a server running the final page size (4k)
== Contingency Plan ==
- Contingency mechanism: Prepare a kernel with the original 64k
config, install it on the build server, rebuild all the packages for this architecture
- Contingency deadline: whenever the last time for a full rebuild or
kernel change is possible
- Blocks release? Yes, full rebuild of all packages must be completed
before release
--
Ben Cotton
He / Him / His
Senior Program Manager, Fedora & CentOS Stream
Red Hat
TZ=America/Indiana/Indianapolis
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
On 12/02/2021 21:19, Justin Forbes wrote:
On Fri, Feb 12, 2021 at 10:21 AM Ben Cotton bcotton@redhat.com wrote:
https://fedoraproject.org/wiki/Changes/Power4kPageSize
== Summary ==
On ppc64le, the kernel is currently compiled for 64k page size.
This change proposes using the more common 4k page size.
Some HPC workloads may be disadvantaged slightly. Workstation users are likely to encounter fewer bugs.
Some things, like the AMD Radeon GPU drivers, firmware or related code, appear to be completely non-functional on the 64k page size. Insufficient upstream developers are testing such issues on this architecture.
Just as there are many things that expect the 64K page size. I am not doing this.
Justin
Can you please identify some of the things that expect 64k?
If the GPU drivers don't work that makes it a complete non-starter for many workstation users, or they have to compile their own kernels or obtain custom kernels from another user.
On Fri, Feb 12, 2021 at 10:16:26PM +0100, Daniel Pocock wrote:
On 12/02/2021 21:19, Justin Forbes wrote:
On Fri, Feb 12, 2021 at 10:21 AM Ben Cotton bcotton@redhat.com wrote:
https://fedoraproject.org/wiki/Changes/Power4kPageSize
== Summary ==
On ppc64le, the kernel is currently compiled for 64k page size.
This change proposes using the more common 4k page size.
Some things, like the AMD Radeon GPU drivers, firmware or related code, appear to be completely non-functional on the 64k page size. Insufficient upstream developers are testing such issues on this architecture.
Just as there are many things that expect the 64K page size. I am not doing this.
Justin
Can you please identify some of the things that expect 64k?
If the GPU drivers don't work that makes it a complete non-starter for many workstation users, or they have to compile their own kernels or obtain custom kernels from another user.
Or just fix the GPU drivers. They're open source, after all.
On 13/02/2021 09:11, Tomasz Torcz wrote:
On Fri, Feb 12, 2021 at 10:16:26PM +0100, Daniel Pocock wrote:
On 12/02/2021 21:19, Justin Forbes wrote:
On Fri, Feb 12, 2021 at 10:21 AM Ben Cotton bcotton@redhat.com wrote:
https://fedoraproject.org/wiki/Changes/Power4kPageSize
== Summary ==
On ppc64le, the kernel is currently compiled for 64k page size.
This change proposes using the more common 4k page size.
Some things, like the AMD Radeon GPU drivers, firmware or related code, appear to be completely non-functional on the 64k page size. Insufficient upstream developers are testing such issues on this architecture.
Just as there are many things that expect the 64K page size. I am not doing this.
Justin
Can you please identify some of the things that expect 64k?
If the GPU drivers don't work that makes it a complete non-starter for many workstation users, or they have to compile their own kernels or obtain custom kernels from another user.
Or just fix the GPU drivers. They're open source, after all.
The GPUs also have firmware blobs
On Sat, 13 Feb 2021 at 05:15, Daniel Pocock daniel@pocock.pro wrote:
On 13/02/2021 09:11, Tomasz Torcz wrote:
On Fri, Feb 12, 2021 at 10:16:26PM +0100, Daniel Pocock wrote:
On 12/02/2021 21:19, Justin Forbes wrote:
On Fri, Feb 12, 2021 at 10:21 AM Ben Cotton bcotton@redhat.com wrote:
https://fedoraproject.org/wiki/Changes/Power4kPageSize
== Summary ==
On ppc64le, the kernel is currently compiled for 64k page size.
This change proposes using the more common 4k page size.
Some things, like the AMD Radeon GPU drivers, firmware or related code, appear to be completely non-functional on the 64k page size. Insufficient upstream developers are testing such issues on this architecture.
Just as there are many things that expect the 64K page size. I am not doing this.
Justin
Can you please identify some of the things that expect 64k?
If the GPU drivers don't work that makes it a complete non-starter for many workstation users, or they have to compile their own kernels or obtain custom kernels from another user.
Or just fix the GPU drivers. They're open source, after all.
The GPUs also have firmware blobs
OK, I think we need to back up a whole bunch here and talk directly about what is wanted and what is going on. I am going to try to outline what I have picked up from months of this back and forth:
1. There is some sort of PowerPC workstation which is going 'to market' somewhere soon.
2. It will have high powered video cards of a PC style so will be using the same 'firmware' that would be in x86_64.
3. Those drivers expect x86_64 4k buffer sizes.
4. Daniel would like to have Fedora Linux as an option or the operating system on it.
5. They have been trying to work out through various tickets how to make this happen.
Please correct the items above if needed. The questions that I don't see having been asked are:
1. Is Fedora interested in being offered on this hardware?
2. If it is, what changes is it willing to make for that to happen?
3. Are there other people interested in helping make this happen besides Daniel?
If those have been asked and answered, I apologize for not finding things. However, this seems like someone throwing 'softballs' at an iceberg in motion in the hope that it will change course. It would be better if we answered somewhere on those 3 questions:
'Yes, and here is someone who can help you' or 'No, we aren't interested in this'
'Yes, but we need to work out ways to deal with our expected deliverables and here are the people who can help out' or 'No, we cannot make these changes because it affects our expected deliverables'
'Yes, we are all buying these XYZ systems' or 'No, most of us want this to work on our IBM Power 9 boxes with some other workload'.
On 13/02/2021 18:51, Stephen John Smoogen wrote:
On Sat, 13 Feb 2021 at 05:15, Daniel Pocock <daniel@pocock.pro> wrote:
On 13/02/2021 09:11, Tomasz Torcz wrote:
> On Fri, Feb 12, 2021 at 10:16:26PM +0100, Daniel Pocock wrote:
>> On 12/02/2021 21:19, Justin Forbes wrote:
>>> On Fri, Feb 12, 2021 at 10:21 AM Ben Cotton <bcotton@redhat.com> wrote:
>>>> https://fedoraproject.org/wiki/Changes/Power4kPageSize
>>>>
>>>> == Summary ==
>>>>
>>>> On ppc64le, the kernel is currently compiled for 64k page size.
>>>>
>>>> This change proposes using the more common 4k page size.
>>>>
>>>> Some things, like the AMD Radeon GPU drivers, firmware or related
>>>> code, appear to be completely non-functional on the 64k page size.
>>>> Insufficient upstream developers are testing such issues on this
>>>> architecture.
>>>
>>> Just as there are many things that expect the 64K page size. I am not
>>> doing this.
>>>
>>> Justin
>>
>> Can you please identify some of the things that expect 64k?
>>
>> If the GPU drivers don't work that makes it a complete non-starter for
>> many workstation users, or they have to compile their own kernels or
>> obtain custom kernels from another user.
>
> Or just fix the GPU drivers. They're open source, after all.
The GPUs also have firmware blobs
OK, I think we need to back up a whole bunch here and talk directly about what is wanted and what is going on. I am going to try to outline what I have picked up from months of this back and forth:
- There is some sort of PowerPC workstation which is going 'to market'
somewhere soon.
The workstations and motherboards are already available from Raptor: https://www.raptorcs.com/
Vikings is about to start shipping a European version: https://store.vikings.net/openpower
Many people already bought the Blackbird kits in the first production run.
- It will have high powered video cards of a PC style so will be using the same 'firmware' that would be in x86_64.
- Those drivers expect x86_64 4k buffer sizes.
This is a moving target, new GPUs arrive each year. It is like whack-a-mole, by the time people have it fixed with one model, the next model is available.
- Daniel would like to have Fedora Linux as an option or the operating
system on it.
I've been compiling kernels since the early days so I can work around this in my personal situation. I would like other Fedora users to have the best possible experience and as easily as possible.
I'll let other people answer your questions below...
- They have been trying to work out through various tickets how to make
this happen.
Please correct the items above if needed. The questions that I don't see having been asked are:
- Is Fedora interested in being offered on this hardware?
- If it is, what changes is it willing to make it happen.
- Are there other people interested in helping make this happen outside
of Daniel
If those have been asked and answered, I apologize for not finding things. However, this seems like someone throwing 'softballs' at an iceberg in motion in the hope that it will change course. It would be better if we answered somewhere on those 3 questions:
'Yes, and here is someone who can help you' or 'No, we aren't interested in this'
'Yes, but we need to work out ways to deal with our expected deliverables and here are the people who can help out' or 'No, we cannot make these changes because it affects our expected deliverables'
'Yes, we are all buying these XYZ systems' or 'No, most of us want this to work on our IBM Power 9 boxes with some other workload'.
-- Stephen J Smoogen.
Kevin,
On 2/13/2021 9:51:48 AM, Stephen John Smoogen smooge@gmail.com wrote:
On Sat, 13 Feb 2021 at 05:15, Daniel Pocock <daniel@pocock.pro> wrote:
On 13/02/2021 09:11, Tomasz Torcz wrote:
On Fri, Feb 12, 2021 at 10:16:26PM +0100, Daniel Pocock wrote:
On 12/02/2021 21:19, Justin Forbes wrote:
On Fri, Feb 12, 2021 at 10:21 AM Ben Cotton <bcotton@redhat.com> wrote:
https://fedoraproject.org/wiki/Changes/Power4kPageSize
== Summary ==
On ppc64le, the kernel is currently compiled for 64k page size.
This change proposes using the more common 4k page size.
Some things, like the AMD Radeon GPU drivers, firmware or related code, appear to be completely non-functional on the 64k page size. Insufficient upstream developers are testing such issues on this architecture.
Just as there are many things that expect the 64K page size. I am not doing this.
Justin
Can you please identify some of the things that expect 64k?
If the GPU drivers don't work that makes it a complete non-starter for many workstation users, or they have to compile their own kernels or obtain custom kernels from another user.
Or just fix the GPU drivers. They're open source, after all.
The GPUs also have firmware blobs
OK, I think we need to back up a whole bunch here and talk directly about what is wanted and what is going on. I am going to try to outline what I have picked up from months of this back and forth:
1. There is some sort of PowerPC workstation which is going 'to market' somewhere soon.
Alex Perez: Wellll, it's actually been on the market for a few years. There are several variants. I've had my Talos II Lite board since January of 2019.
2. It will have high powered video cards of a PC style so will be using the same 'firmware' that would be in x86_64.
Alex Perez: Not will, does. Right now.
3. Those drivers expect x86_64 4k buffer sizes.
Alex Perez: Some do, some don't.
4. Daniel would like to have Fedora Linux as an option or the operating system on it.
5. They have been trying to work out through various tickets how to make this happen.
Please correct the items above if needed. The questions that I don't see having been asked are:
1. Is Fedora interested in being offered on this hardware?
Alex Perez: Err, people are already running Fedora on this exact hardware. You seem to think this is not something that's happening. You're mistaken.
On Sat, 13 Feb 2021 at 14:06, Alex Perez aperez@alexperez.com wrote:
Kevin,
On 2/13/2021 9:51:48 AM, Stephen John Smoogen smooge@gmail.com wrote:
On Sat, 13 Feb 2021 at 05:15, Daniel Pocock daniel@pocock.pro wrote:
On 13/02/2021 09:11, Tomasz Torcz wrote:
On Fri, Feb 12, 2021 at 10:16:26PM +0100, Daniel Pocock wrote:
On 12/02/2021 21:19, Justin Forbes wrote:
On Fri, Feb 12, 2021 at 10:21 AM Ben Cotton bcotton@redhat.com wrote:
https://fedoraproject.org/wiki/Changes/Power4kPageSize
== Summary ==
On ppc64le, the kernel is currently compiled for 64k page size.
This change proposes using the more common 4k page size.
Some things, like the AMD Radeon GPU drivers, firmware or related code, appear to be completely non-functional on the 64k page size. Insufficient upstream developers are testing such issues on this architecture.
Just as there are many things that expect the 64K page size. I am not doing this.
Justin
Can you please identify some of the things that expect 64k?
If the GPU drivers don't work that makes it a complete non-starter for many workstation users, or they have to compile their own kernels or obtain custom kernels from another user.
Or just fix the GPU drivers. They're open source, after all.
The GPUs also have firmware blobs
OK, I think we need to back up a whole bunch here and talk directly about what is wanted and what is going on. I am going to try to outline what I have picked up from months of this back and forth:
- There is some sort of PowerPC workstation which is going 'to market'
somewhere soon.
*Alex Perez:* Wellll, it's actually been on the market for a few years. There are several variants. I've had my Talos II Lite board since January of 2019.
- It will have high powered video cards of a PC style so will be using
the same 'firmware' that would be in x86_64.
*Alex Perez:* Not will, does. Right now.
- Those drivers expect x86_64 4k buffer sizes.
*Alex Perez:* Some do, some don't.
- Daniel would like to have Fedora Linux as an option or the operating system on it.
- They have been trying to work out through various tickets how to make this happen.
Please correct the items above if needed. The questions that I don't see having been asked are:
- Is Fedora interested in being offered on this hardware?
*Alex Perez:* Err, people are already running Fedora on this exact hardware. You seem to think this is not something that's happening. You're mistaken.
From what had been said, I thought these were new systems, not the ones already on the market. Thank you both for the corrections.
The GPUs also have firmware blobs
Could you provide some links to mailing list posts or bug reports where AMD developers confirm that their GPU firmware requires 4k pages? I think having some definitive sources will make this situation more clear.
So far the only amdgpu bug report I could find that relates to 64k pages[1] is a regression, as the reporter states the driver works with the 5.4 kernel. If someone with a power9 machine is willing to bisect the issue I think that would greatly increase the odds of this bug being resolved.
On 14/02/2021 05:41, Tom Seewald wrote:
The GPUs also have firmware blobs
Could you provide some links to mailing list posts or bug reports where AMD developers confirm that their GPU firmware requires 4k pages? I think having some definitive sources will make this situation more clear.
So far the only amdgpu bug report I could find that relates to 64k pages[1] is a regression, as the reporter states the driver works with the 5.4 kernel. If someone with a power9 machine is willing to bisect the issue I think that would greatly increase the odds of this bug being resolved.
A few people mention it in the Raptor forums[2]
Right now I don't have a spare machine for reboots and for testing kernel and hardware issues like that. I try to keep the machine as stable as possible. If I can generate revenue from doing POWER9-specific work then I would be happy to have more of these machines.
Also, I don't have one of the Radeon Navi GPUs anyway; I am deferring buying one until I can get the new Radeon RX6800 XT. They were launched in November but there is still no stock here. If anybody knows how to get them for development then I can get a head start on these issues.
It's not just about me though: I'm sure that if other people buy these workstations too, that will spread the workload a lot more. Many hands make light work. As long as the number of developers on the platform is limited, we have to focus our efforts.
For example, once we go back to 4k page size, we can divide and conquer: any bugs that remain are not page size bugs and we can concentrate on fixing those first. After making good progress there, we could try 64k again.
Regards,
Daniel
2. https://forums.raptorcs.com/index.php/topic,100.msg1318.html#msg1318
A few people mention it in the Raptor forums[2]
If there are actually many other page size issues with amdgpu (or other drivers), then from what I can see the raptor/power9 community is unfortunately not reporting these problems upstream which makes it difficult for the developers to be aware of them.
Right now I don't have a spare machine for reboots and for testing kernel and hardware issues like that. I try to keep the machine as stable as possible. If I can generate revenue from doing POWER9-specific work then I would be happy to have more of these machines.
Also, I don't have one of the Radeon Navi GPUs anyway, I defer buying one until I can get the new Radeon RX6800 XT. They were launched in November but there is still no stock here. If anybody knows how to get them for development then I can get a head start on these issues.
The bug report I linked mentioned a RX Vega 56, which was released in ~2017. I don't think a Navi card is required to reproduce the problem.
It's not just about me though: I'm sure that if other people buy these workstations too, that will spread the workload a lot more. Many hands make light work. As long as the number of developers on the platform is limited, we have to focus our efforts.
For example, once we go back to 4k page size, we can divide and conquer: any bugs that remain are not page size bugs and we can concentrate on fixing those first. After making good progress there, we could try 64k again.
It's a bit concerning that the power9 community isn't already actively reporting bugs upstream and working to resolve these issues. All niche desktop platforms need at least a few people to really step up and put in work to test, report, and help fix bugs. Right now it appears that the desire to move to 4k pages is in large part motivated by a lack of people willing and able to file bug reports and assist developers (e.g. bisect). I hope I am wrong, but I have not seen anyone bisect an issue in the power9-related amdgpu bug reports. This doesn't bode well for getting desktop-related power9 issues fixed, whether or not it is due to a non-4k page size.
My fear is that this is to some degree going against the ethos of Fedora by making a change solely to route around problems upstream rather than engaging with upstream to get the actual issues resolved.
Regards,
Daniel
Hi,
On 2/13/21 10:41 PM, Tom Seewald wrote:
The GPUs also have firmware blobs
Could you provide some links to mailing list posts or bug reports where AMD developers confirm that their GPU firmware requires 4k pages? I think having some definitive sources will make this situation more clear.
So far the only amdgpu bug report I could find that relates to 64k pages[1] is a regression, as the reporter states the driver works with the 5.4 kernel. If someone with a power9 machine is willing to bisect the issue I think that would greatly increase the odds of this bug being resolved.
I can confirm that amdgpu worked with 64k pages on arm in the past with various GPUs. I haven't been testing it regularly though, but we will want it for rhel/centos which use 64k pages on aarch64 too.
[1] https://gitlab.freedesktop.org/drm/amd/-/issues/1446
On Fri, 2021-02-12 at 14:19 -0600, Justin Forbes wrote:
Some things, like the AMD Radeon GPU drivers, firmware or related code, appear to be completely non-functional on the 64k page size. Insufficient upstream developers are testing such issues on this architecture.
Just as there are many things that expect the 64K page size. I am not doing this.
I don't really find either of those to be compelling arguments, FWIW.
If we have broken code which makes assumptions about page sizes, we should fix it. Whatever assumption it makes.
We certainly *shouldn't* have assumptions like that in userspace, and we *shouldn't* need to rebuild userspace for such a change.
For a long time we had 4KiB pages on ppc32 AND 64KiB pages on ppc64, and we fixed things to use getpagesize() instead of hard-coding any value at all. It's possible that we have regressions, but we should probably fix those.
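To illustrate the point about asking the system rather than hard-coding a value, here is a minimal Python sketch. It assumes a Unix-like system where `os.sysconf("SC_PAGE_SIZE")` is available; the helper names `runtime_page_size` and `round_up_to_page` are hypothetical and chosen only for this example.

```python
import mmap
import os


def runtime_page_size():
    """Ask the running kernel for its page size instead of assuming 4096."""
    return os.sysconf("SC_PAGE_SIZE")


def round_up_to_page(nbytes, page_size):
    """Round a byte count up to a whole number of pages."""
    return -(-nbytes // page_size) * page_size


page = runtime_page_size()
# mmap.PAGESIZE is the page size as seen by Python's mmap module.
print(page, mmap.PAGESIZE, round_up_to_page(5000, page))
```

Code written this way should behave the same whether the kernel was built for 4k or 64k pages, because the alignment follows whatever the kernel reports.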
Once upon a time when we still supported the PS3, the 64KiB page size was fairly painful and inefficient there but IBM forced it because it apparently gave a significant performance win on boxes and workloads that they actually cared about. These days, I think the arguments for 64KiB are certainly no *less* compelling than they were then, when I conceded the change.
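The trade-off described here can be made concrete with some arithmetic. This is only a sketch of the general reasoning: larger pages waste more memory when a small mapping is rounded up to whole pages, but each TLB entry then covers more memory. The 5000-byte mapping and the 1024-entry TLB are illustrative assumptions, not figures for the PS3 or for any POWER processor.

```python
def wasted_bytes(mapping_size, page_size):
    """Bytes lost to rounding one mapping up to whole pages."""
    pages = -(-mapping_size // page_size)  # ceiling division
    return pages * page_size - mapping_size


def tlb_reach(entries, page_size):
    """Memory addressable without a TLB miss, for a TLB with `entries` slots."""
    return entries * page_size


# Compare a hypothetical 5000-byte mapping and a hypothetical 1024-entry TLB.
for page_size in (4096, 65536):
    print(page_size, wasted_bytes(5000, page_size), tlb_reach(1024, page_size))
```

With these assumed numbers the 64k configuration wastes about 59 KiB on the small mapping instead of about 3 KiB, while the same TLB covers 64 MiB instead of 4 MiB, which is the kind of win that matters for large-memory workloads.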
On 22/02/2021 17:55, David Woodhouse wrote:
On Fri, 2021-02-12 at 14:19 -0600, Justin Forbes wrote:
Some things, like the AMD Radeon GPU drivers, firmware or related code, appear to be completely non-functional on the 64k page size. Insufficient upstream developers are testing such issues on this architecture.
Just as there are many things that expect the 64K page size. I am not doing this.
I don't really find either of those to be compelling arguments, FWIW.
If we have broken code which makes assumptions about page sizes, we should fix it. Whatever assumption it makes.
We certainly *shouldn't* have assumptions like that in userspace, and we *shouldn't* need to rebuild userspace for such a change.
For a long time we had 4KiB pages on ppc32 AND 64KiB pages on ppc64, and we fixed things to use getpagesize() instead of hard-coding any value at all. It's possible that we have regressions, but we should probably fix those.
Once upon a time when we still supported the PS3, the 64KiB page size was fairly painful and inefficient there but IBM forced it because it apparently gave a significant performance win on boxes and workloads that they actually cared about. These days, I think the arguments for 64KiB are certainly no *less* compelling than they were then, when I conceded the change.
I feel that you underestimate the impact of the GPU driver issue.
If the GPU driver doesn't work, people can't even log in and get started.
If the GPU vendors don't test their code on ppc64le (and aarch64) then those platforms will always lag behind x86. Users will experience issues that have been fixed in the development phase for x86.
Personally, I'm not opposed to the 64k page size in principle: my concerns are about the practical issues.
If both 4k and 64k can be supported, and users can choose between installers for either page size, then the severity of the issue is reduced.
Regards,
Daniel
Daniel Pocock wrote on 2/22/21 10:41 AM:
I feel that you underestimate the impact of the GPU driver issue
If the GPU driver doesn't work, people can't even log in and get started
If the GPU vendors don't test their code on ppc64le (and aarch64) then those platforms will always lag behind x86. Users will experience issues that have been fixed in the development phase for x86.
Personally, I'm not opposed to the 64k page size in principle: my concerns are about the practical issues.
If both 4k and 64k can be supported, if users can choose between installers for either page size, then the severity of the issue is reduced
How practical would it be to simply compile the ppc64le kernel for both page sizes, and then add a non-default boot entry to GRUB for the 4k page size?
Another option could be an official ppc64le-4K Fedora Spin.
On Mon, Feb 22, 2021 at 1:53 PM Alex Perez aperez@alexperez.com wrote:
How practical or impractical would it be to simply compile the ppc64le kernel for both page sizes, and then add a non-default GRUB boot entry for the 4k page size?
Another option could be an official ppc64le-4K Fedora Spin.
Not practical at all. This is not just a simple matter of building another flavor of kernel, though even that is a lot to ask, and honestly, we have turned off flavors that had more users than I expect ppc64le workstation will have any time soon. But you still have the issue of writing special casing code for dnf at the very least. To be a full solution you would also need such code in anaconda and image building tools.
Justin
I feel that you underestimate the impact of the GPU driver issue
If the GPU driver doesn't work, people can't even log in and get started
I still do not understand why no one from the talos/ppc64le community is following up on that amdgpu regression[1] that was introduced with the 5.9 kernel. Surely the time and effort already taken recompiling kernels with 4k pages along with formulating and discussing this proposal for Fedora 35 has eclipsed spending part of an evening bisecting and replying to the amd developers? I have replied to the bug report to give some guidance on how to bisect kernel bugs as the reporter was not familiar with git. Hopefully some progress will be made in the coming days.
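For readers who have not bisected before, the idea is a binary search over the commit history between a known-good and a known-bad kernel; `git bisect` automates the bookkeeping. The sketch below only illustrates the search itself, under simplifying assumptions: the history is treated as a linear list, the commit names and the `is_bad` check are hypothetical stand-ins for "build this kernel, boot it, see whether amdgpu works", and every commit after the regression is assumed to be bad as well.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical history of 1000 commits with the regression at index 637.
commits = [f"c{i:04d}" for i in range(1000)]
tested = []


def is_bad(commit: str) -> bool:
    # Stand-in for building and booting the kernel at this commit.
    tested.append(commit)
    return int(commit[1:]) >= 637


culprit = first_bad(commits, is_bad)
print(culprit, "found after", len(tested), "test boots")
```

The point of the exercise is the cost: roughly log2(N) build-and-boot cycles for N candidate commits, rather than N.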
If the GPU vendors don't test their code on ppc64le (and aarch64) then those platforms will always lag behind x86. Users will experience issues that have been fixed in the development phase for x86.
Are there really pervasive issues with amdgpu and non-4k page sizes? If so, they are not being reported as there is only one bug open that mentions 64k pages in their bug tracker. I do not see sufficient evidence to suggest that it is not worthwhile to engage with upstream to fix real bugs. Being active participants with the upstream developers is a very good way to motivate them to care more about smaller platforms, and likewise not interacting with upstream is unfortunately a good way to decrease the mindshare of smaller desktop platforms.
On 22/02/2021 21:18, Tom Seewald wrote:
I feel that you underestimate the impact of the GPU driver issue
If the GPU driver doesn't work, people can't even log in and get started
I still do not understand why no one from the talos/ppc64le community is following up on that amdgpu regression[1] that was introduced with the 5.9 kernel. Surely the time and effort already taken recompiling kernels with 4k pages along with formulating and discussing this proposal for Fedora 35 has eclipsed spending part of an evening bisecting and replying to the amd developers? I have replied to the bug report to give some guidance on how to bisect kernel bugs as the reporter was not familiar with git. Hopefully some progress will be made in the coming days.
Personally, I have an older GPU, RX 580 Polaris series, I will only spend dev time on the AMD Navi GPU issues after AMD makes the RX 6800 XT available in my region. I simply don't have that card and I'm not going to waste money buying the original Navi card, RX 5700, when the new card will arrive imminently.
Ultimately, even if it isn't hard to bisect, it doesn't feel fair that AMD is validating their drivers work on x86 before a release but the ppc64le users have to check things after a buggy release.
If the GPU vendors don't test their code on ppc64le (and aarch64) then those platforms will always lag behind x86. Users will experience issues that have been fixed in the development phase for x86.
Are there really pervasive issues with amdgpu and non-4k page sizes? If so, they are not being reported as there is only one bug open that mentions 64k pages in their bug tracker. I do not see sufficient evidence to suggest that it is not worthwhile to engage with upstream to fix real bugs. Being active participants with the upstream developers is a very good way to motivate them to care more about smaller platforms, and likewise not interacting with upstream is unfortunately a good way to decrease the mindshare of smaller desktop platforms.
I'm all in favor of collaboration with the AMD and kernel developers
Ultimately, the only way to ensure equality across different architectures is to have upstream developers using all of these architectures throughout their development cycle.
How can we encourage greater use of ppc64le and aarch64 in those communities? While it may sound trivial, I made a post here last week about how we can help people choose the right workstation through the wiki:
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...
I estimate spending one or two hours in my own comparison of the Raptor motherboards and I hope the table allows other developers to save the same amount of time.
Regards,
Daniel
On 22/02/2021 21:18, Tom Seewald wrote:
Personally, I have an older GPU, RX 580 Polaris series, I will only spend dev time on the AMD Navi GPU issues after AMD makes the RX 6800 XT available in my region. I simply don't have that card and I'm not going to waste money buying the original Navi card, RX 5700, when the new card will arrive imminently.
There is no indication from the bug report that it requires a Navi card to reproduce. The reporter stated that they are using a RX Vega 56 which is the previous gpu generation. Why do you believe this is specific to Navi devices?
Ultimately, even if it isn't hard to bisect, it doesn't feel fair that AMD is validating their drivers work on x86 before a release but the ppc64le users have to check things after a buggy release.
Unfortunately smaller platforms will almost always get less testing than the more popular platforms, and I don't see that trend changing in the foreseeable future. This is where motivated community members need to come in. I doubt amdgpu developers even have easy access to ppc64le hardware.
I will also say that regardless of ISA there are going to be times where bisection is needed. I have personally had to bisect and report an issue with amdgpu and I am using x86 hardware. There's also a decent chance I'm going to be bisecting another amdgpu bug this evening. I am not expecting you or others to do things that I am not willing to do myself.
I'm all in favor of collaboration with the AMD and kernel developers
Ultimately, the only way to ensure equality across different architectures is to have upstream developers using all of these architectures throughout their development cycle.
It would of course be great if amd fully tested their drivers on every architecture that Linux supports, but I don't think that's currently a realistic expectation for *any* device/driver vendor. If amdgpu support is something that is important to IBM, Talos, or other members of OpenPower, then I think reaching out to developers and offering free ppc64le hardware or VM access for kernel development and testing would be an excellent start. Providing automated ppc64le build and boot testing for the amd-staging-drm-next tree would be great as well.
How can we encourage greater use of ppc64le and aarch64 in those communities? While it may sound trivial, I made a post here last week about how we can help people choose the right workstation through the wiki:
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.o...
I estimate spending one or two hours in my own comparison of the Raptor motherboards and I hope the table allows other developers to save the same amount of time.
While there's no silver bullet, reaching out to the upstream developers (e.g. via their mailing list) and having a conversation with them can't hurt. Understanding their position and what they believe would help with testing is going to be an important part of the solution.
On Mon, 22 Feb 2021 21:19:26 -0000 "Tom Seewald" tseewald@gmail.com wrote:
It would of course be great if amd fully tested their drivers on every architecture that Linux supports, but I don't think that's currently a realistic expectation for *any* device/driver vendor. If amdgpu support is something that is important to IBM, Talos, or other members of OpenPower, then I think reaching out to developers and offering free ppc64le hardware or VM access for kernel development and testing would be an excellent start. Providing automated ppc64le build and boot testing for the amd-staging-drm-next tree would be great as well.
There was such an idea within the OpenPOWER Foundation some time ago: a lab where (primarily) HW vendors would have access and could test their HW and drivers on a number of different platforms. I suppose we should revive this idea.
Dan
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
On 22/02/2021 22:43, Dan Horák wrote:
On Mon, 22 Feb 2021 21:19:26 -0000 "Tom Seewald" tseewald@gmail.com wrote:
On 22/02/2021 21:18, Tom Seewald wrote:
Personally, I have an older GPU, RX 580 Polaris series, I will only spend dev time on the AMD Navi GPU issues after AMD makes the RX 6800 XT available in my region. I simply don't have that card and I'm not going to waste money buying the original Navi card, RX 5700, when the new card will arrive imminently.
There is no indication from the bug report that it requires a Navi card to reproduce. The reporter stated that they are using a RX Vega 56 which is the previous gpu generation. Why do you believe this is specific to Navi devices?
To clarify, there are a range of different issues with different amdgpu cards
If anybody can provide tips for troubleshooting in any of these bug reports it would be very welcome.
https://gitlab.freedesktop.org/drm/amd/-/issues/1519 navi2 - RX 6900 XT
https://gitlab.freedesktop.org/drm/amd/-/issues/1446 vega RX 56 Red Dragon
https://gitlab.freedesktop.org/drm/amd/-/issues/1293 Fiji-based cards since kernel 5.7
Ultimately, even if it isn't hard to bisect, it doesn't feel fair that AMD is validating their drivers work on x86 before a release but the ppc64le users have to check things after a buggy release.
Unfortunately smaller platforms will almost always get less testing than the more popular platforms, and I don't see that trend changing in the foreseeable future. This is where motivated community members need to come in. I doubt amdgpu developers even have easy access to ppc64le hardware.
Does anybody know about programs from AMD or any other vendor to make their hardware available to volunteers in the Fedora development world?
This is even more critical now that GPU delivery is such a huge problem. Shops that receive stock are selling it at double the RRP. Volunteers won't rush to buy those cards and it will take even longer to fully support them.
Simply having a critical mass of developers with the right hardware can make a huge difference. For example, with 7 developers, maybe a given platform is always overloaded, but with 8 developers maybe the backlog of bugs is fixed faster than new bugs are discovered. The same phenomenon appears in so many domains: what some of us perceive now is like the congestion on a freeway at peak hour, when the cars come in faster than they can go out.
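To make the 7-versus-8 arithmetic concrete, here is a toy backlog model. All the numbers are invented purely for illustration (7.5 new bugs per week, one fix per developer per week, a starting backlog of 20); nothing here is measured from Fedora or amdgpu data.

```python
def backlog_after(weeks: int, developers: int,
                  new_bugs_per_week: float = 7.5,
                  fixes_per_developer: float = 1.0,
                  initial_backlog: float = 20.0) -> float:
    """Open bug count after `weeks`, under the toy assumptions above."""
    backlog = initial_backlog
    for _ in range(weeks):
        fixed = developers * fixes_per_developer
        # The backlog cannot go below zero: idle capacity is simply unused.
        backlog = max(0.0, backlog + new_bugs_per_week - fixed)
    return backlog


seven = backlog_after(52, developers=7)  # grows by 0.5 bugs every week
eight = backlog_after(52, developers=8)  # shrinks by 0.5 bugs every week
print(f"after a year: 7 devs -> {seven} open, 8 devs -> {eight} open")
```

With these made-up rates, one extra developer is the difference between a backlog that grows without bound and one that empties within the year.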
There was such an idea within the OpenPOWER Foundation some time ago: a lab where (primarily) HW vendors would have access and could test their HW and drivers on a number of different platforms. I suppose we should revive this idea.
In the SIP world, we had SIPit events for this type of testing from time to time. Having people get together can be fun and productive. Maybe it will be possible in 2022 or beyond. From a Fedora perspective, it could be interesting to align it with the release schedule.
https://www.sipforum.org/news-events/test-event-wg-overview-and-charter-sipi...
On Wed, 24 Mar 2021 16:13:08 +0100 Daniel Pocock daniel@pocock.pro wrote:
To clarify, there are a range of different issues with different amdgpu cards
If anybody can provide tips for troubleshooting in any of these bug reports it would be very welcome.
https://gitlab.freedesktop.org/drm/amd/-/issues/1519 navi2 - RX 6900 XT
https://gitlab.freedesktop.org/drm/amd/-/issues/1446 vega RX 56 Red Dragon
https://gitlab.freedesktop.org/drm/amd/-/issues/1293 Fiji-based cards since kernel 5.7
I suspect there is no other way besides bisecting on a system with the same/similar card ...
also adding another page size related issue https://gitlab.freedesktop.org/drm/amd/-/issues/1549 Polaris-based cards, starting post-5.10, confirmed in 5.11-rc2
but here the original submitter of the bug was able to identify the offending commit
Dan
On 29/03/2021 11:49, Dan Horák wrote:
I suspect there is no other way besides bisecting on a system with the same/similar card ...
also adding another page size related issue https://gitlab.freedesktop.org/drm/amd/-/issues/1549 Polaris-based cards, starting post-5.10, confirmed in 5.11-rc2
but here the original submitter of the bug was able to identify the offending commit
In #1446 the original reporter has offered to bisect, I didn't see further feedback yet.
I'd be willing to contribute a small amount of time on the RX 6800 XT if I am able to get the card and if I feel the time will be productive, for example, if somebody more familiar with it can give me some specific things to test.
I would genuinely like to have the benefits of PCIe 4 and 16GB video RAM too but the cards are simply not available here.
As the Navi 2 cards appear to be more suited to my purposes, I wasn't really interested in spending time or money on the Navi 1 cards. I'd rather just put the effort in with a Navi 2 card.
Regards,
Daniel
On Mon, 29 Mar 2021 21:33:14 +0200 Daniel Pocock daniel@pocock.pro wrote:
In #1446 the original reporter has offered to bisect, I didn't see further feedback yet.
I'd be willing to contribute a small amount of time on the RX 6800 XT if I am able to get the card and if I feel the time will be productive, for example, if somebody more familiar with it can give me some specific things to test.
I would genuinely like to have the benefits of PCIe 4 and 16GB video RAM too but the cards are simply not available here.
As the Navi 2 cards appear to be more suited to my purposes, I wasn't really interested in spending time or money on the Navi 1 cards. I'd rather just put the effort in with a Navi 2 card.
I have some good news: there is a fix for my issue, actually in 2 parts, both in the generic amdgpu memory management code. IMO it's worth retrying on the other cards as well.
https://gitlab.freedesktop.org/agd5f/linux/-/commit/fe001e70a55d0378328612be... - already in AMD's drm-next tree
https://github.com/xen0n/linux/commit/84ada72983838bd7ce54bc32f5d34ac5b5aae1... - being discussed in https://lists.freedesktop.org/archives/dri-devel/2021-March/301805.html
Dan
David Woodhouse wrote:
On Fri, 2021-02-12 at 14:19 -0600, Justin Forbes wrote:
Just as there are many things that expect the 64K page size. I am not doing this.
I don't really find either of those to be compelling arguments, FWIW.
If we have broken code which makes assumptions about page sizes, we should fix it. Whatever assumption it makes.
We certainly *shouldn't* have assumptions like that in userspace, and we *shouldn't* need to rebuild userspace for such a change.
For a long time we had 4KiB pages on ppc32 AND 64KiB pages on ppc64, and we fixed things to use getpagesize() instead of hard-coding any value at all. It's possible that we have regressions, but we should probably fix those.
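As a small illustration of the point about not hard-coding the value: the sketch below asks the running system for its page size (Python's `os.sysconf("SC_PAGE_SIZE")` exposes the same value that `getpagesize()` reports in C) and does its rounding from that. The helper names `pages_needed` and `round_up_to_page` are made up for this example.

```python
import os

# Ask the running kernel instead of assuming 4096 (Unix-only call).
PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def pages_needed(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole pages required to hold nbytes (ceiling division)."""
    return -(-nbytes // page_size)


def round_up_to_page(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Round nbytes up to the next multiple of the page size."""
    return pages_needed(nbytes, page_size) * page_size


print("this machine uses", PAGE_SIZE, "byte pages")
# The same 5000-byte request needs different amounts on different kernels:
print(round_up_to_page(5000, 4096))   # 4k pages
print(round_up_to_page(5000, 65536))  # 64k pages
```

Code written this way gives the right answer on both a 4k and a 64k kernel without a rebuild, which is the property being argued for above.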
I think that all architectures should use the same page size if it is technically possible. I do not see any valid reason for ppc64le to be special there.
Once upon a time when we still supported the PS3, the 64KiB page size was fairly painful and inefficient there but IBM forced it because it apparently gave a significant performance win on boxes and workloads that they actually cared about. These days, I think the arguments for 64KiB are certainly no *less* compelling than they were then, when I conceded the change.
The input of the CPU manufacturer is certainly valid, but it should not be the only criterion for such a decision. Portability concerns should also be considered.
The fact that 64 KiB is not even the right choice for all ppc64le machines only makes it even more questionable to make a different tradeoff there than on other platforms, even if the main counterexample is now obsolete.
Kevin Kofler
On Fri, 12 Feb 2021 11:20:21 -0500 Ben Cotton bcotton@redhat.com wrote:
https://fedoraproject.org/wiki/Changes/Power4kPageSize
== Summary ==
On ppc64le, the kernel is currently compiled for 64k page size.
This change proposes using the more common 4k page size.
Some HPC workloads may be disadvantaged slightly. Workstation users are likely to encounter fewer bugs.
Some things, like the AMD Radeon GPU drivers, firmware or related code, appear to be completely non-functional on the 64k page size. Insufficient upstream developers are testing such issues on this architecture.
As usual, there are benefits and drawbacks for both the 64k and 4k page sizes. I have been gathering some additional feedback through other channels. The resulting feedback is that 4k pages will or might hit some limitations or untested features in memory allocation algorithms, in the virtual machine management space (HPT vs radix, huge tables, etc.), in server features like PMDK, or in IO/interrupt management. On the other hand, the feedback from other distros running with 4k pages is positive. So from a desktop/workstation user perspective, going to 4k should be OK.
Tom mentions later in this thread that there is little or no activity in the community on further debugging or bisecting of the problems reported on 64k systems. I agree it's not good, but being able to provide such information to the developers requires some knowledge and hardware (and time too). Due to the limited community size, it might be difficult to get all those factors at the same time. I have some ideas for how to improve this, but the various levels of lockdowns across the world don't make things like lab access easy.
The open question is still whether we should try to keep 64k as the default, as that would allow the remaining bugs to be found, and offer a 4k kernel variant (COPR for ppc64le should be coming back soon), and similarly for the installer (a new remix/spin). After BTRFS removes the page size dependency, switching the kernels shouldn't cause any issues for users.
Dan
== Owner ==
- Name: [[User:pocock|Daniel Pocock]]
- Email: daniel@pocock.pro
== Detailed Description ==
== Feedback == Discussed several times on devel, [https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/... latest here]
[https://forums.raptorcs.com/index.php/topic,248.msg1852.html discussed upstream in the Raptor forum]
== Benefit to Fedora == Better first impression for users of ppc64le workstations.
Users can focus on reporting ppc64le bugs without being sidetracked by page size bugs.
== Scope ==
Proposal owners: [[DanielPocock]]
Other developers: please volunteer by adding your name here
Release engineering: [https://pagure.io/releng/issue/9939 #9939]
** wait for 5.12 kernel, verify that it includes the Btrfs patches for arbitrary 4k / 64k sector size, independent of the page size
** create a kernel with 4k page size to run on the ppc64le build servers
** ensure the default kernel RPM in the distribution has 4k page size
** perform the mass rebuild running on the 4k page size
** create an installer ISO based on the revised kernel with 4k page size
- Policies and guidelines: no, as it is an arch-specific issue; most
other architectures already have a 4k page size
- Trademark approval: N/A (not needed for this Change)
- Alignment with Objectives: none of the current objectives relate to
this change
== Upgrade/compatibility impact == If the user has already formatted their root filesystem with Btrfs and a 64k sector size, they need to be using a Fedora kernel that supports both 4k and 64k. This is anticipated in a future kernel release, 5.12 and will hopefully be ready for F34 or F35[https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/...].
== User Experience == New GPUs are more likely to just work on this non-x86 architecture, as long as the latest firmware, mesa, llvm are also used.
Btrfs, the default filesystem, will use the sector size identical to the running kernel's page size. As the 4k page size is more common, this will ensure Btrfs filesystems created on ppc64le hosts can be used on x86 and other hosts without hassle.
== Dependencies == All RPMs must be rebuilt on a server running the final page size (4k)
== Contingency Plan ==
- Contingency mechanism: Prepare a kernel with the original 64k
config, install it on the build server, rebuild all the packages for this architecture
- Contingency deadline: whenever the last time for a full rebuild or
kernel change is possible
- Blocks release? Yes, full rebuild of all packages must be completed
before release
-- Ben Cotton He / Him / His Senior Program Manager, Fedora & CentOS Stream Red Hat TZ=America/Indiana/Indianapolis
On Mon, Feb 15, 2021 at 6:39 PM Dan Horák dan@danny.cz wrote:
The open question is still whether we should try to keep 64k as the default, as that would allow the remaining bugs to be found, and offer a 4k kernel variant (COPR for ppc64le should be coming back soon), and similarly for the installer (a new remix/spin). After BTRFS removes the page size dependency, switching the kernels shouldn't cause any issues for users.
I think it may be instructive to look at the effect that enabling IPv6 (and going to IPv6-first networking) had on the entire ecosystem. It definitely broke things (and there remain, in the greater world, lots of things still broken when IPv6 is enabled). However, if we still used IPv4-first networking, even more would almost certainly still be broken, because no one would experience or report the issues with IPv6.
If you agree that fixing the 64K bugs is important (and I personally think it is), you need to go 64K first to get the reports, and get the fixes.
On 15/02/2021 19:47, Gary Buhrmaster wrote:
> On Mon, Feb 15, 2021 at 6:39 PM Dan Horák dan@danny.cz wrote:
>> The open question still is whether we should try to keep 64k as the default, as that would allow us to find the remaining bugs, and offer a 4k kernel variant (COPR for ppc64le should be coming back soon), and similarly for the installer (a new remix/spin). After Btrfs removes the page size dependency, switching the kernels shouldn't cause any issues for users.
> I think it may be instructive to look at the effect that enabling IPv6 (and going to IPv6-first networking) had on the entire ecosystem. It definitely broke things (and there remain, in the greater world, lots of things still broken when IPv6 is enabled). However, if we still used IPv4-first networking, even more would almost certainly still be broken, because no one would experience or report the issues with IPv6.
> If you agree that fixing the 64K bugs is important (and I personally think it is), you need to go 64K first to get the reports, and get the fixes.
The problem is that not all ppc64le bugs are related to page size.
I was recently looking at ffmpeg issues[1] that happen on any page size; they are now fixed, and the fix also resolves issues in Blender.
Going to 4k page size, we effectively drain the swamp to the half-water mark. Some bugs will go away, other bugs will still be there.
The volume of workstation bugs is actually quite intimidating. Even for somebody with a lot of experience, it takes away a certain amount of energy. Some users and maybe even some developers will spend so much time on hacks and workarounds that they have no time or energy left to report the bugs, bisect them or even fix them.
But I do agree that we can't avoid 64k indefinitely. If there is a way to support both page sizes and run unit tests for all packages on both that would be really useful. In addition to unit tests, it would be useful to have a manual check on Firefox, Thunderbird, LibreOffice, etc before each major release on 64k.
64k issues for ppc64le will also get more attention when other architectures go 64k, then we won't have all the pressure on ppc64le users. IPv6 was for every architecture so the effort was spread a lot more widely.
Regards,
Daniel
> On 15/02/2021 19:47, Gary Buhrmaster wrote:
>> On Mon, Feb 15, 2021 at 6:39 PM Dan Horák dan@danny.cz wrote:
>>> The open question still is whether we should try to keep 64k as the default, as that would allow us to find the remaining bugs, and offer a 4k kernel variant (COPR for ppc64le should be coming back soon), and similarly for the installer (a new remix/spin). After Btrfs removes the page size dependency, switching the kernels shouldn't cause any issues for users.
>> I think it may be instructive to look at the effect that enabling IPv6 (and going to IPv6-first networking) had on the entire ecosystem. It definitely broke things (and there remain, in the greater world, lots of things still broken when IPv6 is enabled). However, if we still used IPv4-first networking, even more would almost certainly still be broken, because no one would experience or report the issues with IPv6.
>> If you agree that fixing the 64K bugs is important (and I personally think it is), you need to go 64K first to get the reports, and get the fixes.
> The problem is that not all ppc64le bugs are related to page size.
Welcome to the world of non-x86 architectures.
> I was recently looking at ffmpeg issues[1] that happen on any page size; they are now fixed, and the fix also resolves issues in Blender.
> Going to 4k page size, we effectively drain the swamp to the half-water mark. Some bugs will go away, other bugs will still be there.
It doesn't drain the swamp at all, it just changes the water from one kind to another.
> The volume of workstation bugs is actually quite intimidating. Even for somebody with a lot of experience, it takes away a certain amount of energy. Some users and maybe even some developers will spend so much time on hacks and workarounds that they have no time or energy left to report the bugs, bisect them or even fix them.
At least you don't have to deal with big-endian bugs in there too, and a bunch of us who have been working on non-x86 architectures for years have no doubt solved a number of the problems already. aarch64 had a mix of 4K (Fedora) and 64K (RHEL) and we've dealt with hundreds of these already; of course, that doesn't rule out POWER-specific ones.
> But I do agree that we can't avoid 64k indefinitely. If there is a way to support both page sizes and run unit tests for all packages on both that would be really useful. In addition to unit tests, it would be useful to have a manual check on Firefox, Thunderbird, LibreOffice, etc before each major release on 64k.
Not easily. Apparently having something you can set via the kernel command line for this isn't straightforward; I started asking for that functionality for page sizes back in the early days of aarch64 and I'm still waiting.
> 64k issues for ppc64le will also get more attention when other architectures go 64k, then we won't have all the pressure on ppc64le users. IPv6 was for every architecture so the effort was spread a lot more widely.
aarch64 also has 64K page sizes, so it's already a shared problem. I've dealt with and fixed more bugs around that than I care to remember; you're not the only one who has had to deal with it.
On 16/02/2021 17:05, Peter Robinson wrote:
>> On 15/02/2021 19:47, Gary Buhrmaster wrote:
>>> On Mon, Feb 15, 2021 at 6:39 PM Dan Horák dan@danny.cz wrote:
>>>> The open question still is whether we should try to keep 64k as the default, as that would allow us to find the remaining bugs, and offer a 4k kernel variant (COPR for ppc64le should be coming back soon), and similarly for the installer (a new remix/spin). After Btrfs removes the page size dependency, switching the kernels shouldn't cause any issues for users.
>>> I think it may be instructive to look at the effect that enabling IPv6 (and going to IPv6-first networking) had on the entire ecosystem. It definitely broke things (and there remain, in the greater world, lots of things still broken when IPv6 is enabled). However, if we still used IPv4-first networking, even more would almost certainly still be broken, because no one would experience or report the issues with IPv6.
>>> If you agree that fixing the 64K bugs is important (and I personally think it is), you need to go 64K first to get the reports, and get the fixes.
>> The problem is that not all ppc64le bugs are related to page size.
> Welcome to the world of non-x86 architectures.
Welcome or welcome back... I started on the Color Computer 3 with the Motorola 6809.
>> I was recently looking at ffmpeg issues[1] that happen on any page size; they are now fixed, and the fix also resolves issues in Blender.
>> Going to 4k page size, we effectively drain the swamp to the half-water mark. Some bugs will go away, other bugs will still be there.
> It doesn't drain the swamp at all, it just changes the water from one kind to another.
My personal impression is that the combination of the Btrfs page size dependency and GPUs not working was a darker water. To get around that, I had to not only compile a kernel but also create a custom installer image with my kernel. I'm happy to share those things for other users.
>> The volume of workstation bugs is actually quite intimidating. Even for somebody with a lot of experience, it takes away a certain amount of energy. Some users and maybe even some developers will spend so much time on hacks and workarounds that they have no time or energy left to report the bugs, bisect them or even fix them.
> At least you don't have to deal with big-endian bugs in there too, and a bunch of us who have been working on non-x86 architectures for years have no doubt solved a number of the problems already. aarch64 had a mix of 4K (Fedora) and 64K (RHEL) and we've dealt with hundreds of these already; of course, that doesn't rule out POWER-specific ones.
Actually, as an upstream developer, when politics doesn't prevent me uploading packages to every distribution, I would carefully check unit test results from all architectures on both Fedora and Debian. If builds failed on a specific architecture or on big endian, I would make the effort to support it. But I understand some people spend a far greater percentage of their time on that than I do, and I'm glad so many things just work already on POWER9.
>> But I do agree that we can't avoid 64k indefinitely. If there is a way to support both page sizes and run unit tests for all packages on both that would be really useful. In addition to unit tests, it would be useful to have a manual check on Firefox, Thunderbird, LibreOffice, etc before each major release on 64k.
> Not easily. Apparently having something you can set via the kernel command line for this isn't straightforward; I started asking for that functionality for page sizes back in the early days of aarch64 and I'm still waiting.
I wasn't really thinking about a runtime option. I was thinking about two completely parallel environments, each with its own copy of userland compiled on a kernel with the corresponding page size.
Beyond the unit tests, it would also be interesting to use reproducible builds methods to compare userland binaries and see if they vary depending on the page size of the host where they were built. This could flush out more problems.
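A minimal sketch of such a comparison, assuming the same packages have been built on a 4k host and on a 64k host and unpacked into two directory trees. The function names `file_digests` and `differing_files` and the directory layout are hypothetical, chosen only for illustration; real reproducible-builds tooling does considerably more (normalising timestamps, unpacking archives, and so on).

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(tree_a: Path, tree_b: Path) -> list[str]:
    """Return relative paths that are missing from one tree or whose
    contents differ between the two trees."""
    a = file_digests(tree_a)
    b = file_digests(tree_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling `differing_files(Path("build-4k"), Path("build-64k"))` (hypothetical directory names) would list the binaries worth inspecting; an empty list means the two builds are byte-for-byte identical.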
>> 64k issues for ppc64le will also get more attention when other architectures go 64k, then we won't have all the pressure on ppc64le users. IPv6 was for every architecture so the effort was spread a lot more widely.
> aarch64 also has 64K page sizes, so it's already a shared problem. I've dealt with and fixed more bugs around that than I care to remember; you're not the only one who has had to deal with it.
Yes, and I've seen more reports from people trying that platform too, for example the recent blog post[1] about the HoneyComb.
Are there other ways we can collaborate on this, for example, with a wiki page about known 64k issues?
Does anybody want to capture anything from this thread in the wiki page for the change[2], or is there any other place where it would be useful to have a summary?
Regards,
Daniel
1. https://fedoramagazine.org/fedora-aarch64-on-the-solidrun-honeycomb-lx2k/
2. https://fedoraproject.org/wiki/Changes/Power4kPageSize