RE-POST February: SPO600 Week 3 Part 1
This post covers what I learned during my third week of the Software Portability and Optimization (SPO600) class.
Compilers: Targets & Tuning
Now that I have a bit of knowledge about the 6502 machine, it is time to
learn the 64-bit assembler.
By default, the compiler picks up the characteristics of the machine being used
automatically and uses them to decide how to build the software.
There are 2 reasons why it can be better not to have the compiler pick up those
characteristics:
- We want to build cross-platform with the toolchain.
- The target is not a completely different type of system, but potentially a
different class of the same architecture.
There are 2 separate but related concepts that control what the compiler
outputs when building the software. They are the main topic of this post:
- Target
- Tuning
There are 2 options:
- Architecture flag - specifies the architecture that we want to target
  - -march=target - controls which instructions can be included in the instruction stream
- Tuning flag - specifies the machine that we want to tune for
  - -mtune=target - the compiler knows something about the system that we are building for and makes some decisions based on that. Controls the instruction tuning, not which instructions are allowed.
  - Useful for adjusting software performance to work best with the latest and greatest machine.
-m means machine
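One way to see what -march changes is to look at the feature macros that GCC and Clang predefine for the selected target. The following is a minimal sketch, assuming one of those compilers; the function name enabled_simd is made up for illustration and is not from the class material.

```c
#include <stdio.h>

/* Hypothetical helper: reports the most capable SIMD feature that the
   compiler was allowed to use when this file was built. The macros are
   predefined by GCC/Clang according to -march, so the answer is fixed at
   compile time and says nothing about the CPU the program later runs on. */
const char *enabled_simd(void) {
#if defined(__ARM_FEATURE_SVE2)
    return "SVE2";
#elif defined(__ARM_FEATURE_SVE)
    return "SVE";
#elif defined(__ARM_NEON)
    return "NEON";
#elif defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none detected";
#endif
}
```

Printing the result of enabled_simd() from a small main and rebuilding with different -march values should change the answer, while changing only -mtune should not, since tuning does not add instructions to the allowed set.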
The x86_64 architecture has continued to advance over the years, largely through
vector, or single instruction/multiple data, capabilities. In other words, the
ability to perform a single operation, such as an arithmetic, logical, or
comparison operation, on multiple pieces of data in parallel.
Single Instruction/Multiple Data (SIMD)
The latest and greatest SIMD instructions do not run on older
processors.
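To make "one operation on multiple pieces of data" concrete, here is a minimal sketch. The function name add_arrays is made up for illustration. The source is an ordinary scalar loop; whether it becomes SIMD instructions depends on the compiler, the optimization level, and the -march value, so treat this as an assumption about typical GCC or Clang behaviour rather than a guarantee.

```c
#include <stddef.h>

/* Adds two arrays element by element. Written as one addition per loop
   iteration; with optimization enabled, a compiler may turn the loop into
   SIMD instructions that add several elements at once, but only using
   instructions that the chosen -march allows. */
void add_arrays(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Comparing the assembler output (for example with the compiler's -S option) for different -march values is one way to check which instructions were actually chosen.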
As a result, it is no longer possible to build software that fits all machines.
We either risk losing 30-50% of performance by targeting older versions of an
architecture, or we break compatibility with older machines, which is a
significant problem.
Number of Registers
Another significant problem concerns the number of registers. In this course,
we use 6502 assembler, which has 3 registers available (X, Y, and A). Suppose
another machine has 5 registers and some of the code uses the 2 extra
registers. If we take the same code and run it on the 6502, the software blows
up whenever it tries to access those 2 registers, because there are no such
registers in the 6502 machine.
ARM64 and x86-64 Architecture
In class, the instructor gave us access to a 64-bit ARM machine and a
64-bit x86 machine, where the steps are found here. A couple of steps were different on my machine, which I will discuss in another post.
Commands | Descriptions |
ls <folder> | List folders and files in the current directory. Can also give a <folder> name to see the contents of that folder. |
uname -a | Information about the kernel and architecture. |
free | Shows how much RAM each machine has. |
less /proc/cpuinfo | Look at CPU info. |
clear | Clear the terminal. |
ll | AKA ls -l, which means long list. |
Table 1: List of commands that can be used on both the 64-bit ARM machine and the 64-bit x86 machine.
ARM64 Machine
It has a Cortex A72 processor.
Commands | Descriptions |
less adjust_channels | View a file that contains handwritten assembler embedded right into the middle of C code. |
make clean | Wipe out the built software. |
make | Builds the software and performs some other tasks. |
qemu-aarch64 | Emulation tool that lets us run code containing instructions the CPU does not support. |
time | Time how long a command takes to run. |
set|grep PATH | Show shell variables containing PATH, such as the library search path. |
echo $PATH | Show the search path for executables. |
Table 2: List of commands used on the ARM64 machine.
After running the make command, we can see the -march=armv8-a+sve2 option, where:
- -m - flag that specifies information about the machine
- armv8 - specifies the architecture
  - arm - family of architectures
  - v8 - the architectural level, version 8, which is the first of the 64-bit architecture levels from ARM. Note that there may be a decimal number after the 8 (for example, armv8.2-a), which means the company made some minor improvements to that architecture over the years.
- -a - the particular ARM architecture level is targeted at an application processor, as opposed to processors intended for an embedded or real-time context where ultra-fine timing control is important.
- +sve2 - additional feature which tells the compiler it is ok to use instructions from the SVE2 feature.
Therefore, all instructions that are compatible with an ARMv8 device, plus any
assembly language or machine language instructions that use the SVE2
capability, are ok to include in the software that is going to be emitted.
The make command will also run some tests:
./image-adjust4 tests/input/bree.jpg 2.0 2.0 2.0 tests/output/bree4c.jpg
The three 2.0 values are adjustment factors: the red, green, and blue channels
are each going to be doubled in brightness.
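What a per-channel adjustment factor does can be sketched in plain C. This is not the code from the class project; the names scale_channel and adjust_rgb, the interleaved RGB layout, and the clamping to 255 are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scale one 8-bit channel value by a factor and
   clamp the result to the 0-255 range so that bright pixels saturate
   instead of wrapping around. */
static uint8_t scale_channel(uint8_t value, float factor) {
    float scaled = (float)value * factor;
    if (scaled > 255.0f) return 255;
    if (scaled < 0.0f) return 0;
    return (uint8_t)scaled;
}

/* Hypothetical helper: adjust interleaved RGB pixels in place, using one
   factor per channel (assumed layout: R, G, B, R, G, B, ...). */
void adjust_rgb(uint8_t *pixels, size_t pixel_count,
                float red, float green, float blue) {
    for (size_t i = 0; i < pixel_count; i++) {
        pixels[3 * i + 0] = scale_channel(pixels[3 * i + 0], red);
        pixels[3 * i + 1] = scale_channel(pixels[3 * i + 1], green);
        pixels[3 * i + 2] = scale_channel(pixels[3 * i + 2], blue);
    }
}
```

With factors of 2.0, a pixel of (10, 100, 200) would become (20, 200, 255) in this sketch, because the blue channel saturates. A loop like this, the same multiply applied to every byte, is the kind of work that SIMD features such as SVE2 are aimed at.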
However, when running this command, the output is:
illegal instruction (core dumped)
This is because the software was built for ARMv8 with SVE2 capabilities, but
the CPU does not include SVE2, so it has no idea what those instructions are
supposed to do.
Solutions:
- Rebuild the software for just ARMv8.
- Use a software emulation tool, which steps in to handle the instructions that the CPU cannot execute.
  - Put qemu-aarch64 before the testing line. This emulation would run at 1% of the speed of the hardware.
  - Not really useful in real-life deployment. For example, early x86 machines could have a co-processor (second chip) installed to handle floating-point operations; without it, those operations were emulated in software, which was slow.
In this case, it would be ok because most of the instructions are going to
execute fine. The few instructions that use SVE2 are going to be handled in
software.
Different Ways to Build Software
- For the lowest common denominator. Pick an architecture level that is vaguely modern but not the latest and greatest that only 2% of the population own.
- In such a way that at runtime, the software figures out the capabilities of the machine. It tests the machine to check which capabilities it is able to execute and, after making that decision, chooses between different code paths:
  - This is the most common method currently being used.
  - Optimize at the library level and relax on the hardware capabilities.
  - Might have different versions of the functions and do an assessment of what the machine can do; based on that, a certain version of that particular function is used.
  - Detection of the hardware is done at the beginning so that we don't constantly do it for each different function. Then use a function pointer to get the specific version of the function that is appropriate for the machine.
  - Significant burden on the developer. Only use it for heavy-duty number crunching or data crunching (multimedia, cryptography, data compression, decompression, AI, etc.)
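The "detect once, then call through a function pointer" idea can be sketched as follows. This is a minimal illustration, not the mechanism covered in class: all function names are hypothetical, the "tuned" version is only an unrolled loop standing in for real SIMD code, and the capability check assumes GCC or Clang on x86_64 (where __builtin_cpu_supports is available) and falls back to the generic version everywhere else.

```c
#include <stddef.h>

typedef long (*sum_fn)(const int *, size_t);

/* Portable version that runs on any machine. */
static long sum_generic(const int *data, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) total += data[i];
    return total;
}

/* Stand-in for a tuned version. A real one would use SIMD instructions
   or be built with a newer -march; here it only unrolls the loop. */
static long sum_tuned(const int *data, size_t n) {
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    for (; i < n; i++) total += data[i];
    return total;
}

/* Hypothetical capability check: asks the CPU about AVX2 on x86_64 with
   GCC/Clang, and reports "no" on every other platform. */
static int machine_has_fast_simd(void) {
#if defined(__x86_64__) && defined(__GNUC__)
    return __builtin_cpu_supports("avx2");
#else
    return 0;
#endif
}

/* Set once, then reused for every later call. */
static sum_fn chosen_sum = NULL;

void init_dispatch(void) {
    chosen_sum = machine_has_fast_simd() ? sum_tuned : sum_generic;
}

long sum(const int *data, size_t n) {
    if (chosen_sum == NULL) init_dispatch();  /* detect only on first use */
    return chosen_sum(data, n);
}
```

Callers only ever see sum(); which version runs is decided the first time it is called (or earlier, if init_dispatch() is called at startup), so the detection cost is paid once rather than on every call.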
iFunc
A toolchain feature that makes the decision once, when the software initializes, and remembers that decision by setting the function pointer appropriately. There will be more information about it next week.