Writing portable ARM64 assembly

Apr 13, 2023

An unfortunate side effect of the rising popularity of Apple’s ARM-based computers is an increase in non-portable assembly code targeting the 64-bit ARM ISA. Developers write these routines to speed up their programs on Apple’s ARM-based computers, without considering the other 64-bit ARM devices out there, such as single-board computers (SBCs) and servers running Linux or BSD.

The good news is that it is very easy to write assembly that targets Apple’s computers as well as 64-bit ARM devices running operating systems other than Darwin. It only requires knowing a few differences between the Mach-O and ELF ABIs, and which Apple-specific syntax extensions to avoid. By following the guidance in this post, you will be able to write assembly code that is portable between Apple’s toolchain, ARM’s official assembly toolchain, and the GNU toolchain.

Differences between the ELF and Mach-O ABIs

Modern UNIX systems, including Linux-based systems, largely use the ELF binary format. Darwin uses Mach-O instead, for historical reasons. This is not a requirement imposed by Apple’s use of Mach: OSFMK, the kernel that Darwin, MkLinux and OSF/1 are all based on, supports ELF binaries just fine. Apple simply chose to use the Mach-O format.

When it comes to writing assembly (or, really, linking code in general) for Darwin, the main difference to be aware of is that every symbol name visible from C is prefixed with a single underscore. For example, take a function declared in C as:

extern void unmask(const char *payload, const char *mask, size_t len);

On Darwin, the function in your assembly code must be defined as _unmask, while on ELF platforms it is simply unmask.
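If you want to confirm which prefix your toolchain applies, GCC and clang both predefine __USER_LABEL_PREFIX__, which expands to the prefix the compiler puts on C-visible symbol names: _ on Darwin and nothing on typical ELF targets. The minimal sketch below just turns it into a string; label_prefix is a hypothetical helper name, not part of any library.

```c
/* Two-level stringification: the outer macro expands its argument
 * first, so STR(__USER_LABEL_PREFIX__) yields the prefix itself
 * rather than the literal text "__USER_LABEL_PREFIX__". */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Returns the prefix the compiler adds to C-visible symbol names:
 * "_" on Darwin (Mach-O), "" on typical ELF targets such as Linux. */
const char *label_prefix(void)
{
    return STR(__USER_LABEL_PREFIX__);
}
```

Printing label_prefix() followed by unmask shows the name your assembly has to define on the current platform.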

The other major difference is that ELF symbols carry a type, for example STT_FUNC or STT_OBJECT. Mach-O has no equivalent, so the .type directive that you would use when writing assembly for ELF targets is not supported by Apple’s assembler.
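On ELF, .type records the symbol type in the low four bits of each symbol table entry’s st_info byte; the high four bits hold the binding (local, global or weak). The sketch below decodes that byte. It assumes the ELF gABI constant values reproduced here rather than including the platform’s <elf.h>, which is not available everywhere; the names ELF_STT_*, sym_type, sym_bind and sym_info are hypothetical.

```c
#include <stdint.h>

/* Symbol type values from the ELF gABI (normally provided by <elf.h>
 * as STT_NOTYPE, STT_OBJECT and STT_FUNC). */
enum elf_sym_type {
    ELF_STT_NOTYPE = 0,  /* no .type directive was given    */
    ELF_STT_OBJECT = 1,  /* .type sym, @object              */
    ELF_STT_FUNC   = 2   /* .type sym, @function            */
};

/* The symbol type lives in the low 4 bits of st_info... */
static inline unsigned sym_type(uint8_t st_info)
{
    return st_info & 0xfu;
}

/* ...and the binding (0 = local, 1 = global, 2 = weak) in the high 4 bits. */
static inline unsigned sym_bind(uint8_t st_info)
{
    return st_info >> 4;
}

/* Packs binding and type the same way ELF64_ST_INFO does. */
static inline uint8_t sym_info(unsigned bind, unsigned type)
{
    return (uint8_t)((bind << 4) | (type & 0xfu));
}
```

A .global function declared with .type unmask, @function therefore ends up with st_info 0x12 (global binding, function type), which readelf -s reports as FUNC GLOBAL.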

A brief note on platform ABIs

You also need to be aware of minor differences between the Darwin ABI and other platform ABIs. A notable example is the x18 register, which the AAPCS64 designates as a platform register: Darwin reserves it, and the kernel explicitly zeroes it on context switches in some cases. Android also reserves it, but glibc-based GNU/Linux and musl-based distributions such as Alpine do not. The portable choice is simply never to use x18 in your assembly.

Apple-specific vector mnemonics

The other main thing to watch out for is Apple’s custom mnemonics for NEON. To make NEON code less cumbersome to write, Apple introduced shorthand mnemonics that move the vector arrangement from the register operands onto the instruction itself. For example, if you are targeting Apple devices only, you might write a NEON exclusive-or instruction like so:

eor.16b v2, v2, v0

This is an Apple-specific extension to the ARM assembly syntax. The official ARM assembly manual requires the arrangement (lane count and element size) to be given on each register operand:

eor     v2.16b, v2.16b, v0.16b
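To make the arrangement concrete, the C sketch below models what eor v2.16b, v2.16b, v0.16b computes: .16b means each 128-bit register is treated as sixteen 8-bit lanes, and the exclusive-or is applied lane by lane. It is only a scalar reference model (v128_16b and eor_16b are hypothetical names), handy for checking a NEON routine’s output in a test harness; on ARM hardware, the veorq_u8 intrinsic from <arm_neon.h> corresponds to this instruction.

```c
#include <stdint.h>

/* A 128-bit NEON register viewed with the .16b arrangement:
 * sixteen unsigned 8-bit lanes. */
typedef struct {
    uint8_t b[16];
} v128_16b;

/* Scalar model of: eor vd.16b, vn.16b, vm.16b
 * Each destination lane is the XOR of the matching source lanes. */
static v128_16b eor_16b(v128_16b vn, v128_16b vm)
{
    v128_16b vd;
    for (int i = 0; i < 16; i++)
        vd.b[i] = (uint8_t)(vn.b[i] ^ vm.b[i]);
    return vd;
}
```

Because XOR is its own inverse, applying eor_16b twice with the same second operand returns the original register contents, which makes a convenient round-trip check.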

Abstracting the ABI details with some macros

The good news is that the ABI details can easily be abstracted with a few macros. As for NEON instructions, the answer is simple: stick to the syntax in the ARM manual rather than Apple’s shorthand mnemonics. Apple’s assembler accepts the standard syntax too.

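You need two macros, which you can put in a shared header if you like. Because they rely on the C preprocessor, give your assembly files a .S (capital S) extension: GCC and clang run the preprocessor on .S files, but not on .s files.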

The first macro allows you to deal with the underscore requirement of the Darwin ABI:

#ifdef __APPLE__
# define PROC_NAME(__proc) _ ## __proc
#else
# define PROC_NAME(__proc) __proc
#endif
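Because the macro relies on token pasting, you can sanity-check what it produces by stringifying the result with the C preprocessor. The STR and STR_ macros and the proc_name_unmask helper below are hypothetical additions for illustration; in an assembly file you would use PROC_NAME directly.

```c
/* The macro from above, unchanged. */
#ifdef __APPLE__
# define PROC_NAME(__proc) _ ## __proc
#else
# define PROC_NAME(__proc) __proc
#endif

/* Expand the argument first, then turn it into a string literal. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Yields "_unmask" on Darwin and "unmask" elsewhere. */
const char *proc_name_unmask(void)
{
    return STR(PROC_NAME(unmask));
}
```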

The second macro is optional, but it allows you to define the correct ELF symbol types outside of Apple’s toolchain:

#ifdef __APPLE__
/* Mach-O has no symbol types; clang targeting ELF still needs .type */
# define TYPE(__proc, __typ)
#else
# define TYPE(__proc, __typ) .type __proc, __typ
#endif

Then you just write your assembly as normal, but using these macros:

.text
.global PROC_NAME(unmask)
.align 2                    // 2^2 = 4-byte alignment for AArch64 code
TYPE(unmask, @function)     // expands to nothing on Darwin
PROC_NAME(unmask):
   ...

And that’s all there is to it. As long as you follow these guidelines, you will have assembly which is portable to any UNIX-like environment on 64-bit ARM.