RE-POST February: SPO600 Week 6
This post is about what I learned during my sixth week of the Software Portability and Optimization (SPO600) class.
SVE / SVE2
SVE2 is specifically designed to eliminate the need for a range of different SIMD implementations for different vector register widths.
SVE and SVE2 use vector and predicate registers:
- Vector registers are named z, e.g. z1.h (register z1 viewed as 16-bit halfword elements).
- Predicate registers are named p, e.g. p6.
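To make the role of the two register kinds more concrete, here is a minimal sketch in plain C that models, with ordinary arrays, how a predicate masks the lanes of a vector operation. Nothing in it is real SVE code: VL_LANES, make_predicate, predicated_add, and add_arrays are hypothetical names, and the lane count is fixed at 8 only for illustration, whereas on real hardware the vector length is decided by the CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane count: models a 128-bit vector of 16-bit (.h) elements. */
#define VL_LANES 8

/* Model of a "while less than" predicate: lane i is active while base + i < n. */
void make_predicate(uint8_t pred[VL_LANES], size_t base, size_t n)
{
    for (size_t i = 0; i < VL_LANES; i++)
        pred[i] = (base + i < n) ? 1 : 0;
}

/* Add a and b into dst in active lanes only; inactive lanes are left untouched. */
void predicated_add(int16_t *dst, const int16_t *a, const int16_t *b,
                    const uint8_t pred[VL_LANES])
{
    for (size_t i = 0; i < VL_LANES; i++)
        if (pred[i])
            dst[i] = (int16_t)(a[i] + b[i]);
}

/* Process n elements one "vector" at a time, without a separate scalar tail loop. */
void add_arrays(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    uint8_t pred[VL_LANES];
    for (size_t base = 0; base < n; base += VL_LANES) {
        make_predicate(pred, base, n);
        predicated_add(dst + base, a + base, b + base, pred);
    }
}
```

Calling add_arrays on 11 elements runs two passes: the first with all 8 lanes active, the second with only 3 lanes active. The same loop would work unchanged for any value of VL_LANES, which is the idea behind writing code that does not depend on one vector register width.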
Variations within a system architecture are called micro-architectural versions.
Micro-architectural Variations
- Implementation Variations
  - In-order vs. out-of-order execution.
  - May affect performance but do not affect which instructions can be executed.
  - Do not cause code compatibility issues.
- Feature Variations
  - Adding registers or new instructions.
  - Change which instructions can be executed.
  - Introduce code compatibility issues.
There are many SIMD implementations to choose from, and it is difficult for programmers to decide which one to target. Two possible solutions are:
- Library Multi-Versioning (LMV)
  - Multiple versions of the binaries and libraries are provided.
  - The file is selected at run-time by the operating system.
  - Example: glibc HWCAPS.
    - Libraries and binaries are built multiple times for various micro-architectural levels.
    - The system selects the best version supported by the current hardware.
  - Advantage: Entirely handled by the build system; no code changes.
  - Disadvantage: Creates a lot of duplication.
- Function Multi-Versioning (FMV)
  - Multiple versions of code exist within the binaries and libraries.
  - The code path is selected at run-time by a resolver function.
  - Example: GCC/glibc iFunc.
    - Multiple versions of functions.
    - Resolver functions select function versions at run-time.
  - Advantage: Can be strategically applied; causes much less duplication.
  - Disadvantage: Requires code changes.
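The FMV idea can be sketched by hand with a function pointer. This is not the iFunc mechanism itself: with GCC/glibc the function is declared with `__attribute__((ifunc("resolver_name")))` and the dynamic linker calls the resolver, while the sketch below does the selection manually on the first call so that it builds with any C compiler. All names (sum_generic, sum_wide, cpu_has_wide_simd, resolve_sum, sum_bytes) are hypothetical, and the capability probe is a stub that always answers "no".

```c
#include <stddef.h>
#include <stdint.h>

/* Version 1: straightforward loop, safe on every CPU. */
uint32_t sum_generic(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Version 2: stand-in for a SIMD-optimized variant (here just unrolled by 4). */
uint32_t sum_wide(const uint8_t *p, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += p[i];
        s1 += p[i + 1];
        s2 += p[i + 2];
        s3 += p[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += p[i];
    return s0 + s1 + s2 + s3;
}

typedef uint32_t (*sum_fn)(const uint8_t *, size_t);

/* Hypothetical capability probe; a real one would query the hardware. */
int cpu_has_wide_simd(void)
{
    return 0;
}

/* Resolver: picks one version based on the reported capability. */
sum_fn resolve_sum(int has_wide)
{
    return has_wide ? sum_wide : sum_generic;
}

/* Public entry point: resolves once, then always calls the chosen version. */
uint32_t sum_bytes(const uint8_t *p, size_t n)
{
    static sum_fn chosen = NULL;
    if (chosen == NULL)
        chosen = resolve_sum(cpu_has_wide_simd());
    return chosen(p, n);
}
```

Callers only ever see sum_bytes, so adding another version later means touching the resolver and nothing else. That is the "strategically applied" advantage from the list above, and the extra resolver code is the "requires code changes" disadvantage.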
Now accessing our servers and the code used during class:
ll /public/spo600-sve-sve2-ifunc-examples.tgz
tar xvf /public/spo600-sve-sve2-ifunc-examples.tgz
cd sve2-test
make clean
vi Makefile
In the Makefile, -march=armv8-a+sve2 specifies the target architecture: Armv8-A with the SVE2 extension.
As you may remember, our machine does not have SVE or SVE2 capabilities, but we can use emulation through qemu-aarch64 to do the job.
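One way to see the effect of the -march option from inside the code is to test the feature macros the compiler predefines. The sketch below assumes the ACLE macros __ARM_FEATURE_SVE and __ARM_FEATURE_SVE2; the function names compiled_simd_level and compiled_simd_name are hypothetical and are not part of the class examples.

```c
/* Returns 2 if this file was built for SVE2, 1 if built for SVE, 0 otherwise. */
int compiled_simd_level(void)
{
#if defined(__ARM_FEATURE_SVE2)
    return 2;
#elif defined(__ARM_FEATURE_SVE)
    return 1;
#else
    return 0;
#endif
}

/* Human-readable name for the level above. */
const char *compiled_simd_name(void)
{
    switch (compiled_simd_level()) {
    case 2:  return "SVE2";
    case 1:  return "SVE";
    default: return "no SVE";
    }
}
```

Note that this only reports what the binary was compiled for, not what the CPU running it supports. That difference is exactly why a binary built with -march=armv8-a+sve2 needs emulation on a machine without SVE2.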
In the sve2-test folder, these are the files that are available:
Let's look at the image-adjust.c file, which lets us specify a graphic image and brighten or dim each RGB channel individually:
less image-adjust.c
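As a rough idea of what "brighten or dim each channel individually" means, here is a minimal scalar sketch. It is not the code from image-adjust.c: the function names, the interleaved R,G,B buffer layout, and the use of floating-point factors are all my assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale one 8-bit channel value by factor, clamping the result to 0..255. */
uint8_t adjust_channel(uint8_t value, float factor)
{
    float scaled = (float)value * factor;
    if (scaled < 0.0f)
        return 0;
    if (scaled > 255.0f)
        return 255;
    return (uint8_t)(scaled + 0.5f);   /* round to nearest */
}

/* Adjust an interleaved RGB buffer (R,G,B,R,G,B,...) in place.
 * A factor above 1.0 brightens a channel, below 1.0 dims it. */
void adjust_rgb(uint8_t *pixels, size_t pixel_count,
                float red, float green, float blue)
{
    for (size_t i = 0; i < pixel_count; i++) {
        pixels[3 * i + 0] = adjust_channel(pixels[3 * i + 0], red);
        pixels[3 * i + 1] = adjust_channel(pixels[3 * i + 1], green);
        pixels[3 * i + 2] = adjust_channel(pixels[3 * i + 2], blue);
    }
}
```

A loop like this does the same independent operation on every byte, which is the kind of work SIMD instructions are well suited for.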
In image-adjust.c itself, the implementation is selected by compiler directives that look like the following:
cd tests/input
display bree.jpg
In the scripts subdirectory:
ll scripts
scripts/show_montage
However, in my case I cannot see the image of a cat, because the aarch64 machine currently in use does not have SVE or SVE2 capability. I am getting this instead:
From the recorded lecture:
Notice that each core has an asimd flag, meaning that it has the advanced SIMD capability.
To look at the CPU part fields:
grep "CPU part" /proc/cpuinfo
Notice that each of the CPU part fields indicates the same value.
The other machine, which has SVE and SVE2 capabilities, has 9 processors, and each core supports asimd, sve, and sve2. Looking at the CPU part fields, four cores share one value, another four share a different value, and one has a third value.
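The same comparison can be done in code instead of by eye. The following is a minimal sketch that tallies how many cores report each "CPU part" value; the struct and function names are hypothetical, and it assumes the lines look like what the grep command above prints ("CPU part", a colon, then the value).

```c
#include <stdio.h>
#include <string.h>

#define MAX_PARTS 16
#define PART_LEN  32

/* Table of distinct "CPU part" values and how many cores report each one. */
struct part_table {
    char value[MAX_PARTS][PART_LEN];
    int  cores[MAX_PARTS];
    int  used;
};

/* If line is a "CPU part : <value>" line, tally it and return 1; else return 0. */
int record_cpu_part(struct part_table *t, const char *line)
{
    char value[PART_LEN];
    const char *colon;

    if (strncmp(line, "CPU part", 8) != 0)
        return 0;
    colon = strchr(line, ':');
    if (colon == NULL || sscanf(colon + 1, " %31s", value) != 1)
        return 0;

    for (int i = 0; i < t->used; i++) {
        if (strcmp(t->value[i], value) == 0) {
            t->cores[i]++;
            return 1;
        }
    }
    if (t->used < MAX_PARTS) {          /* first time this value is seen */
        strcpy(t->value[t->used], value);
        t->cores[t->used] = 1;
        t->used++;
    }
    return 1;
}

/* Scan a cpuinfo-style file; returns the number of distinct values, or -1 on error. */
int scan_cpu_parts(const char *path, struct part_table *t)
{
    char line[256];
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    t->used = 0;
    while (fgets(line, sizeof line, fp) != NULL)
        record_cpu_part(t, line);
    fclose(fp);
    return t->used;
}
```

Calling scan_cpu_parts("/proc/cpuinfo", &table) should return 1 on a machine where every core reports the same part, and 3 on a machine like the nine-core one described above, with the per-value counts in table.cores.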
Running ./sve-width-instrinsics on the aarch64 server provided by the instructor does not work, since that server has no SVE or SVE2 capability. However, it works properly on a machine that has SVE and SVE2 capability.
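A program can also check for these capabilities at run-time before using them, instead of failing on a machine without them. Here is a minimal sketch, assuming Linux on aarch64 with a C library that provides getauxval and the HWCAP_SVE and HWCAP2_SVE2 constants; on any other platform, or if the constants are missing, the functions simply report "unknown". The function names are hypothetical, and I have not checked what these functions report under qemu-aarch64.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP, AT_HWCAP2, HWCAP_* constants */
#endif

/* Returns 1 if the kernel reports SVE, 0 if it does not, -1 if unknown here. */
int runtime_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) ? 1 : 0;
#else
    return -1;
#endif
}

/* Returns 1 if the kernel reports SVE2, 0 if it does not, -1 if unknown here. */
int runtime_has_sve2(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP2_SVE2)
    return (getauxval(AT_HWCAP2) & HWCAP2_SVE2) ? 1 : 0;
#else
    return -1;
#endif
}
```

A main function could call runtime_has_sve() first and print a friendly message when the result is not 1, rather than going on to execute SVE instructions the processor does not understand. A check like this is also the kind of test a resolver function would use to pick a function version.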
Now looking at the dummy iFunc implementation: