Now that pveclib https://munroesj52.github.io/ is safely integrated as a Fedora package and Fedora 31 is released, I thought it was time to offer my expertise to package developers working on vector code.
In a previous life I learned a lot about the PowerISA. As the initial contributor of GCC's PowerPC implementation of the Intel (tm) vector intrinsic headers, and as the primary implementer of pveclib, I also learned a lot about the differences and similarities, and about how GCC generally behaves when compiling vector code and intrinsics.
This was documented in the "Linux on Power Porting Guide: Vector Intrinsics" http://openpowerfoundation.org/wp-content/uploads/resources/Vector-Intrinsic...
That work continues with the implementation of pveclib and its associated documentation.
If this is of interest, or you have issues, problems, or questions about good, practical approaches to vector programming for your package, send me a note and I will try to answer.
Hello, Steven.
On Monday, 28 October 2019 at 21:23, Steven Munroe wrote: [...]
If this is of interest, or you have issues, problems, or questions about good, practical approaches to vector programming for your package, send me a note and I will try to answer.
Thanks! I'll try to remember to Cc: you on PPC related issues if you don't mind, then.
Regards, Dominik
[Posting, as this may be of general interest.]
Steven Munroe munroesj52@gmail.com writes:
Now that pveclib https://munroesj52.github.io/ is safely integrated as a Fedora package and Fedora 31 is released, I thought it was time to offer my expertise to package developers working on vector code.
Thanks. One possibility is the BLIS linear algebra library. It provides a BLAS interface, which I guess means something to you if you do SIMD implementation work. It currently has sub-optimal support on ppc64le, since it uses generic C, not tuned code. (GCC does vectorize the C -- despite the continuing doubts of the developers! -- but I don't currently have access to POWER9 for measurements to find out how it actually performs.) IBM were supposed to be contributing a tuned implementation, but that hasn't appeared, so I assume one would be welcome. There's a single micro-kernel to implement at a minimum, though it might need separate ones for POWER8 and POWER9.
The project home is https://github.com/flame/blis. Un-merged code to dispatch dynamically on POWER micro-architecture is under https://github.com/flame/blis/pull/345 but isn't necessary.
[Reasons to prefer BLIS over OpenBLAS generally are support for AVX512 on x86_64 and potentially better threaded performance https://github.com/flame/blis/blob/master/docs/Performance.md. However, lack of dynamic micro-architecture selection for POWER and ARM in the current release is a drawback, and OpenBLAS has more hand-tuning of various operations that probably help some applications.]
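[To make "generic C, not tuned code" concrete, here is a minimal sketch of what a portable GEMM micro-kernel can look like. It is not the BLIS micro-kernel interface: the function name, the MR/NR block sizes, and the panel layout are all assumptions chosen for illustration. A fixed-size loop nest of this shape is the kind of code GCC's auto-vectorizer can handle at -O3; how well the result performs on POWER9 would still have to be measured.]

```c
#include <stddef.h>

/* Hypothetical register-block sizes, chosen only for illustration. */
enum { MR = 4, NR = 4 };

/*
 * Sketch of a GEMM micro-kernel in plain C: c += a * b, where
 *   a is an MR x k panel, stored as k groups of MR contiguous values,
 *   b is a k x NR panel, stored as k groups of NR contiguous values,
 *   c is an MR x NR block, row-major with row stride ldc.
 * This only mirrors the general shape of such kernels; it is not
 * the interface any particular library uses.
 */
void ukernel_sketch(size_t k,
                    const double *restrict a,
                    const double *restrict b,
                    double *restrict c, size_t ldc)
{
    /* Accumulate in a small local block so the compiler can keep it
       in (vector) registers across the k loop. */
    double acc[MR][NR] = {{0.0}};

    for (size_t p = 0; p < k; p++) {
        for (size_t i = 0; i < MR; i++) {
            for (size_t j = 0; j < NR; j++) {
                acc[i][j] += a[p * MR + i] * b[p * NR + j];
            }
        }
    }

    /* Add the accumulated block into the output. */
    for (size_t i = 0; i < MR; i++) {
        for (size_t j = 0; j < NR; j++) {
            c[i * ldc + j] += acc[i][j];
        }
    }
}
```

[On ppc64le, building with something like "gcc -O3 -mcpu=power9 -fopt-info-vec" should report which of these loops GCC vectorized.]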
This might be interesting, but a bit vague. To be actionable I will need a little more information ....
Can you name the specific package, version, file, and function?
Does this function have a unit test (make check) and a *repeatable* performance test built into the package?
Otherwise this could turn into a bit of a snipe hunt.
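[As an illustration of what a repeatable performance test might look like, here is a minimal sketch: run the code under test several times and report the best time, which is less sensitive to scheduling noise than a single run. The names best_cpu_seconds and sample_workload are hypothetical, and clock() measures CPU time rather than wall-clock time; a real package would call its own kernel in place of the sample workload.]

```c
#include <time.h>

/* Hypothetical signature for the code under test. */
typedef void (*bench_fn)(void *ctx);

/* Stand-in workload: adds 1 + 2 + ... + 1000 to the double at ctx. */
void sample_workload(void *ctx)
{
    volatile double *sum = (volatile double *)ctx;
    for (int i = 1; i <= 1000; i++) {
        *sum += (double)i;
    }
}

/*
 * Run fn(ctx) `reps` times and return the smallest CPU time of any
 * single run, in seconds. Returns -1.0 if reps is not positive.
 */
double best_cpu_seconds(bench_fn fn, void *ctx, int reps)
{
    double best = -1.0;

    for (int r = 0; r < reps; r++) {
        clock_t t0 = clock();
        fn(ctx);
        clock_t t1 = clock();

        double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best) {
            best = secs;
        }
    }
    return best;
}
```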
On Wed, Oct 30, 2019 at 9:34 AM Dave Love loveshack@fedoraproject.org wrote:
[...]
Steven Munroe munroesj52@gmail.com writes:
This might be interesting, but a bit vague. To be actionable I will need a little more information ....
Can you name the specific package, version, file, and function?
Does this function have a unit test (make check) and a *repeatable* performance test built into the package?
Otherwise this could turn into a bit of a snipe hunt.
It sounds as though that wasn't the sort of thing you were thinking of. However, a POWER9 micro-kernel has actually just been added, so this is probably moot. If you're interested, see https://github.com/flame/blis/commit/b426f9e04e5499c6f9c752e49c33800bfaadda4c
scitech@lists.fedoraproject.org