RE-POST February: SPO600 Week 6

This post covers what I learned during my sixth week of the Software Portability and Optimization (SPO600) class.

SVE / SVE2

SVE2 is specifically designed to eliminate the need for a range of different SIMD implementations for different vector register widths.

SVE and SVE2 use vector and predicate registers:
  • The vector registers are named z, e.g. z1.h (the .h suffix selects 16-bit halfword lanes).
  • The predicate registers are named p, e.g. p6.
Variations within a system architecture are called micro-architectural variations.

Micro-architectural Variations

  1. Implementation Variations
    • In-order vs. out-of-order execution.
    • May affect performance but do not affect which instructions can be executed.
    • Do not cause code compatibility issues.
  2. Feature Variations
    • Adding registers or new instructions.
    • Change which instructions can be executed.
    • Introduce code compatibility issues.
There are many SIMD implementations to choose from, and it is difficult for programmers to decide which one to target. Two possible solutions are:
  1. Library Multi-Versioning (LMV)
    • Providing multiple binaries and libraries.
    • The file is selected at run-time by the operating system.
      • Example: glibc HWCAPS.
      • Libraries and binaries are built multiple times for various micro-architectural levels.
      • The system selects the best version supported by the current hardware.
        • Advantage: Entirely handled by the build system / no code changes.
        • Disadvantage: Creates a lot of duplication.
  2. Function Multi-Versioning (FMV)
    • Multiple versions of code exist within the binaries and libraries.
    • The code path is selected at run-time by a resolver function.
      • Example: GCC/glibc iFunc.
      • Multiple versions of functions.
      • Resolver functions select function versions at runtime.
        • Advantage: Can be strategically applied; causes much less duplication.
        • Disadvantage: Requires code changes.
Next, I accessed our server and unpacked the code used during class:
ll /public/spo600-sve-sve2-ifunc-examples.tgz
tar xvf /public/spo600-sve-sve2-ifunc-examples.tgz



Now I have the autoifunc and sve2-test directories available, so I cleaned sve2-test and opened its Makefile:


    
cd sve2-test
make clean
vi Makefile

Notice that the compiler flags include -g (debug information) and -O3 (optimization level 3).
The -march=armv8-a+sve2 flag specifies the target architecture, here Armv8-A with SVE2.

Remember that our machine does not have SVE or SVE2 capabilities, but we can emulate them with qemu-aarch64 to do the job we want.
 
The sve2-test folder contains several files. Let's look at image-adjust.c, which lets us specify a graphic image and brighten or dim each RGB channel individually:
less image-adjust.c
The heavy lifting is done by adjust_channels.c, which has four separate implementations of the same function.

The implementation is selected by compiler directives (preprocessor conditionals) in adjust_channels.c.

To build this:
make -j17

To display the cat image, bree.jpg:
cd tests/input
display bree.jpg

To list the scripts subdirectory:
ll scripts
To show all the cat montage results:
scripts/show_montage

However, in my case I cannot see the cat images, because the aarch64 machine currently in use does not have SVE or SVE2 capability.

From the recorded lecture
cat /etc/fedora-release
cat /proc/cpuinfo
Notice that each core has the asimd flag, meaning that it has the Advanced SIMD capability.

To look at the CPU part fields:
grep "CPU part" /proc/cpuinfo

Notice that every CPU part field shows the same value, so all the cores on this machine are the same design.

The other machine, which has SVE and SVE2 capabilities, has 9 processors, and each core supports asimd, sve, and sve2. Its CPU part fields are not all the same: four cores share one value, another four share a second value, and one core has a third value.

Running ./sve-width-instrinsics on the aarch64 server provided by the instructor does not work, because that machine has no SVE or SVE2 capability. However, it works properly on a machine that has SVE and SVE2 capability.

Now looking at the dummy iFunc implementation:
    
  
cd ~/ifunc
make
./ifunc-test
To see how this iFunc capability could be automatically applied to some code:

      
cd ~/autoifunc
vi function.c
scripts/autoifunc function.c

    
vi function_ifunc.c
make clean
make
make all-test
objdump -d main | less

