RE-POST February: SPO600 Week 4 Part 2

This post continues my notes on what I learned during the fourth week of the Software Portability and Optimization (SPO600) class.

Two Computer Architectures

There are two design approaches for getting good system performance from the CPU:
  1. RISC - Reduced Instruction Set Computing - AArch64
  2. CISC - Complex Instruction Set Computing - x86

RISC

Concept: Choose a very small number of instructions, optimize the hardware to execute those few instructions fast, and put a lot of intelligence in the compiler to pick and optimize the instruction stream.

At the time, this was the best path forward because it allowed high clock rates.

CISC

Concept: Make the machine capable of doing the heavy lifting, meaning that there are more complex instructions, each doing more complicated work in a single instruction. This requires less intelligence in the compiler.

Currently, every modern powerful processor includes both RISC and CISC aspects.

Instruction Set Architecture (ISA)

The design of the instruction set (or instruction sets) that a processor is going to execute. For example:
  1. Internal structure and layout of the processor (register file).
    • How many registers?
    • How wide?
    • Are they dedicated to particular uses or can any register be used for anything?
  2. Operations that the processor can perform (+, -, *, /, etc.)
  3. Instruction encoding: variable-length instructions as on the 6502, or everything encoded into a fixed-length binary value.

x86_64 Registers

16 general-purpose registers:
Register     Name
RAX          Register A Extended
RBX          Register B Extended
RCX          Register C Extended
RDX          Register D Extended
RBP          Register Base Pointer (start of stack)
RSP          Register Stack Pointer (current stack location, growing down)
RSI          Register Source Index
RDI          Register Destination Index
R8 ... R15   Remaining General-Purpose Registers

For 32-bit access, the high bits are ignored and the register names are: EAX, EBX, ECX, EDX, R8D ... R15D.
The D means double word, since a word is 16 bits.

To get the lowest 16 bits of the 64 bits, use: AX, BX, CX, DX, R8W ... R15W.

It is also possible to refer to the upper 8 bits of a 16-bit value using: AH, BH, CH, DH.

The lowest 8 bits can be referred to as AL, BL, CL, DL, R8L ... R15L.

32-bit values are typically used for graphics data, 16-bit values for audio data, and 8-bit values for character data such as UTF-8 or ASCII.
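
The naming scheme above can be pictured in C with casts and shifts. This is only an illustrative sketch: the helper names (view_eax and so on) are made up for this example, and the code mimics which bits of RAX each narrower name selects rather than touching real registers.

```c
#include <stdint.h>

/* Hypothetical helpers: each returns the bits that the named
   sub-register exposes for a given 64-bit RAX value. */
static inline uint32_t view_eax(uint64_t rax) { return (uint32_t)rax; }        /* lowest 32 bits */
static inline uint16_t view_ax(uint64_t rax)  { return (uint16_t)rax; }        /* lowest 16 bits */
static inline uint8_t  view_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 8 to 15 */
static inline uint8_t  view_al(uint64_t rax)  { return (uint8_t)rax; }         /* lowest 8 bits */
```

For example, if RAX held 0x1122334455667788, these helpers would give EAX = 0x55667788, AX = 0x7788, AH = 0x77 and AL = 0x88.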

AArch64 Registers

31 general-purpose registers plus one special register:
Register     Name
R0 ... R30   Registers 0 to 30
RSP / RZR    Register for Stack Pointer, or Register for Zero

"Register 31" can be either:
  • RSP: for instructions that can usefully adjust the stack.
  • RZR: for all other instructions. It provides a zero value when read and discards any value written.
When accessing a register in software, the assembler needs to know whether it is 64 bits or 32 bits wide.
The 64-bit registers are X0 ... X30 and XZR, while the 32-bit registers are W0 ... W30 and WZR.
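
The zero-register behaviour of register 31 can be pictured with a small C model. This is a sketch under assumptions: the regfile struct and the reg_read / reg_write helpers are invented for illustration, and they only cover the zero-register role, not the stack-pointer role.

```c
#include <stdint.h>

/* Hypothetical model: 31 real registers; index 31 acts as the zero register. */
struct regfile { uint64_t x[31]; };

/* Reading register 31 always yields zero (n must be in the range 0..31). */
static inline uint64_t reg_read(const struct regfile *rf, unsigned n) {
    return n == 31 ? 0 : rf->x[n];
}

/* Writing register 31 discards the value. */
static inline void reg_write(struct regfile *rf, unsigned n, uint64_t v) {
    if (n != 31) rf->x[n] = v;
}

/* The 32-bit W name refers to the low half of the 64-bit X register. */
static inline uint32_t reg_read_w(const struct regfile *rf, unsigned n) {
    return (uint32_t)reg_read(rf, n);
}
```

In this model, a write to index 31 has no effect and a later read of index 31 still returns 0, which is why the zero register is handy as a "throw away the result" destination or a constant-zero source.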

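The LDR_ (LoaD Register) and STR_ (STore Register) instructions accept a suffix character indicating the number of bits to be loaded or stored. The size names differ from x86 here, since a word on ARM is 32 bits:
  • H = Halfword = 16 bits
  • B = Byte = 8 bits
Without a suffix, the size follows the register used: 64 bits for an X register, 32 bits for a W register.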

Application Binary Interface (ABI)

A set of conventions for communicating with the operating system (OS) and with other pieces of code such as subroutines, methods, or procedures. The ABI is basically a decision that code follows voluntarily so that separate pieces of code can work together. It covers:
  1. Procedure calls
  2. Syscalls - ask OS to do something on your behalf
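
The two kinds of interface can be compared from C on Linux. The following is a minimal sketch, assuming a Linux system with glibc; the function names are hypothetical, and getpid was picked only because it has no side effects.

```c
#define _GNU_SOURCE          /* makes sure syscall() is declared */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Procedure call: getpid() is an ordinary C library function,
   reached through the normal procedure-call convention. */
static inline pid_t pid_via_library(void) { return getpid(); }

/* Syscall: ask the kernel directly through the generic syscall()
   entry point, passing the system call number. */
static inline pid_t pid_via_syscall(void) { return (pid_t)syscall(SYS_getpid); }
```

Both functions should report the same process ID; the difference is only in how the request reaches the OS.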

Executable and Linkable Format (ELF)

ELF is the default output format produced when building software with our compiler toolchain.

On both the AArch64 and x86_64 machines:

ll /public/spo600-assemble-lab-examples.tgz

The .tgz extension means that the file is a tar archive, created with the tar archiving program, that has been compressed with gzip.

To expand the archive:
tar xvf /public/spo600-assemble-lab-examples.tgz


    
cd spo600
cd examples
cd hello
ll
// OR
cd spo600/examples/hello


    
cd c
make clean
ll
To look at the hello.c file:
cat hello.c

To build the program:
gcc hello.c -o hello

To identify what kind of file hello is:
file hello



Both machines report a 64-bit ELF file which is executable. LSB means that the least significant byte is first (little endian). Not stripped means that there are still symbols present in the file.
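
The "64-bit" and "LSB" parts of that description come from the first bytes of the file. As a rough sketch of the kind of check involved (assuming the Linux <elf.h> header is available; the function name is hypothetical and this is not how the file command is actually implemented):

```c
#include <elf.h>      /* Linux header providing the EI_* and ELF* constants */
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer starts with a 64-bit little-endian ELF
   identification block, 0 otherwise. */
static inline int is_elf64_lsb(const unsigned char *buf, size_t len) {
    if (len < EI_NIDENT) return 0;                     /* the ident block is 16 bytes */
    if (memcmp(buf, ELFMAG, SELFMAG) != 0) return 0;   /* magic: 0x7f 'E' 'L' 'F' */
    return buf[EI_CLASS] == ELFCLASS64 &&              /* 64-bit object */
           buf[EI_DATA] == ELFDATA2LSB;                /* least significant byte first */
}
```

Reading the first 16 bytes of hello into a buffer and passing them to this function would be expected to return 1 on both machines.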

To run the file:
./hello

To look at the size of the file:
ll hello


To learn more about a particular file, use objdump:
objdump --help        // list of objdump options
objdump -h hello      // section headers of the hello file

To zoom in/out of your terminal in Windows, use CTRL + scroll up/down or CTRL + "+"/"-". To clear the terminal, use CTRL + l.


The sections that we care about are:
  • .text - the body of our program (machine language).
  • .plt - procedure linkage table, where we connect to procedures that are in shared libraries.
  • .rodata - read-only data.
It is possible to disassemble the code, which will look similar to 6502 assembly:
    
objdump -d hello | less
/<main>               // typed inside less, to search for the main function



Notice that on the AArch64 machine, all the instructions are the same length (32 bits), while on the x86 machine, some instructions are 1 byte, 3 bytes, or even 5 bytes long. In more complicated programs, it is possible to see really long instructions of up to 15 bytes, which is the maximum length of an x86 instruction.
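
One practical consequence of this difference is how easy it is to find the n-th instruction in a block of code. A small sketch (the function names and the lengths array are hypothetical, used only to illustrate the idea):

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-length ISA (AArch64): every instruction is 4 bytes, so the
   offset of instruction n is a simple multiplication. */
static inline size_t fixed_offset(size_t n) { return n * 4; }

/* Variable-length ISA (x86_64): the offset of instruction n is the sum
   of the lengths of all earlier instructions, so they have to be
   walked in order. `lengths` holds the already-decoded lengths. */
static inline size_t variable_offset(const uint8_t *lengths, size_t n) {
    size_t off = 0;
    for (size_t i = 0; i < n; i++) off += lengths[i];
    return off;
}
```

With instruction lengths of 1, 3, 5 and 7 bytes, the fourth instruction (n = 3) starts at offset 9 in the variable-length case, while in the fixed-length case it always starts at offset 12.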

To understand these instructions, different references can be consulted.
ldd hello - lists the shared libraries that the hello program needs
This shows that 3 libraries are required to run this program. They are not linked into the program itself (they are not part of the ELF file):
  1. A library that provides access to the OS from the application.
  2. The address indicates that this one is already in memory, so the existing copy of that library can be shared instead of allocating more memory to load it again.
  3. A linker library that connects the libc library to our executable.
To build more versions of this program, use the make command:

Built-in functions have been turned off (-fno-builtin), and the static version has the printf function taken from the libc library and inserted into the ELF file. The optimized version is built at optimization level 3.

After cleaning with make clean and running the make command again:

Now to look at those files, use the ll hello hello-static hello-opt command:


Notice that the hello and hello-opt file sizes are almost the same while the hello-static size is much larger.

Now to see the disassembled code again: objdump -d hello | less
As you can see, it no longer uses <puts@plt> but instead uses <printf@plt>.
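
The reason is a compiler optimization: with built-ins enabled, GCC may replace a printf call whose result is unused and whose format is a plain string ending in a newline with an equivalent puts call. The sketch below puts the two forms side by side; the message text and function names are assumptions, since hello.c is not reproduced in this post.

```c
#include <stdio.h>

/* What the source code says. With built-ins enabled, GCC may emit a
   call to puts() for this; with -fno-builtin it stays a printf() call. */
static inline void hello_printf(void) { printf("Hello World!\n"); }

/* The equivalent call the compiler may substitute:
   puts() appends the newline itself. */
static inline void hello_puts(void) { puts("Hello World!"); }
```

Both functions print the same line, which is why the substitution is safe when the return value is not used.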

To observe this software being executed, use: strace ./hello |& less

This shows us all the interactions between the OS and this piece of software.
mmap - takes a file and maps it into memory. Here it is setting up the shared library so that we can access the lib64/libc.so.6 file.
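
The same kind of mapping can be done by hand from C. This is a minimal sketch, assuming a POSIX system; the function name is hypothetical and the dynamic loader's real use of mmap is more involved (it maps several segments with different permissions).

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps `path` read-only and returns the byte at `offset`, or -1 on any
   error (including an offset past the end of the file). */
static inline int mapped_byte_at(const char *path, size_t offset) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0 || offset >= (size_t)st.st_size) {
        close(fd);
        return -1;
    }

    /* After this call the file contents appear as ordinary memory. */
    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (p == MAP_FAILED) return -1;

    int byte = p[offset];
    munmap(p, (size_t)st.st_size);
    return byte;
}
```

Running a program that uses this function under strace should show its own open, fstat-style, mmap and munmap calls in the trace, next to the ones made by the loader.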

Comparing it with: strace ./hello-static |& less

Notice that there is no mention of opening the library or of mapping it with mmap.

In hello2.c:
Notice that it uses the write() function instead of the printf() function. write() is a wrapper function, provided in the C library, around the write system call.
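
A version of the program in this style might look roughly like the sketch below. The actual hello2.c is not reproduced in this post, so the message text and the function name here are assumptions.

```c
#include <string.h>
#include <unistd.h>

/* Sends the message to the given file descriptor with write(), the
   C library wrapper around the write system call.
   Returns the number of bytes written, or -1 on error. */
static inline ssize_t say_hello(int fd) {
    const char msg[] = "Hello World!\n";
    return write(fd, msg, strlen(msg));
}
```

Calling say_hello(1) from main would send the message to standard output, since file descriptor 1 is stdout. Unlike printf, no formatting or buffering is involved, so the length has to be passed explicitly.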

objdump -d hello2 |& less:

Now using the third version: cat hello3.c

objdump -d hello3 | less:

Now let's look at this in assembler: 
cd ../assembler
cd x86_64
// OR
cd aarch64

This is where things diverge, since assembly language is platform-specific.



There are 2 versions on the x86 side because the assembler syntax was never fully standardized.



.text - places the code into the text section of the ELF file.
.global _start - very important. It identifies the global symbol, which needs to be maintained and visible in the final executable. This symbol is also used as a label.
.ascii - similar to dcb in 6502 assembly.

Use the make command:





Let's take a look: ll hello


Notice the smaller size of the executable file compared to when it was compiled from C. This time, the x86_64 executable is larger than the aarch64 one.

If we do: objdump -d hello


Notice that it is really compact compared to the output of the C compiler. The only code here is the same as what is in the hello.s file.

What stays the same as before is the number of bytes each instruction takes: aarch64 uses 4 bytes for every instruction, while on x86 the length varies. Here, most instructions take 7 bytes and the system call instructions take 2 bytes.
