Go Assembly on the arm64

If you have ever looked at Go’s assembly language, you were likely confused into giving up. Against all odds, I recently managed to implement some Go assembly that lets you call a cdecl function in a native library directly from Go on the new M1 Apple chip platform, which is arm64/POSIX based. My joy at never having to look at Go assembly again was short-lived, however, as several of my friends asked me to write something up to explain what I had learned about Go assembly, thus forcing me to relive this traumatic experience in some detail. So here it is friends, I hope it helps.

Why would you use Go Assembly?

Go assembly lets you define a function that you can call from Go that can use all of the low-level assembly operators available on your platform. This means you can avoid using CGO entirely, and avoid having to set up the heinous external cross-compiling toolchains that CGO requires. Go itself can assemble Go assembly on every platform without installing external tools.

What is Go Assembly?

plan9 was a particular set of design principles applied to recreating Unix at Bell Labs in the 80’s. Golang is essentially the plan9 re-implementation of C++, and the Go assembler is based directly on the plan9 assembler (notice the common thread from plan9 to Golang is Rob Pike). Much like plan9 itself, Golang at first seems confusing, but over time you start to see the method behind the madness. Everything is consistent in a way that C++ isn’t, and removing inconsistencies lets you write better code faster, once you get used to it.

This is not true of the Go assembler, however. If there is some underlying pattern that makes everything consistently understandable, I haven’t seen it. It’s best to understand upfront that you are marching into the mouth of madness and stop trying to force things to make sense. Just relax and accept that the underlying design decisions are probably beyond our ability to comprehend.

Try to remember HOW this works and don’t worry about WHY it is this way. Trust me on this. Take some deep breaths, and here we go.

Go assembly does not have a one-to-one correspondence with native assembly

The most confusing part about Go assembly is that it looks like native assembly, but it isn’t.

It renames almost all the registers

For example, on arm64, the main registers that the Arm spec calls X0-X7 are referred to as R0-R7 in Go assembly. You will have to check the Go assembler documentation for every platform you work on to see what the registers have been renamed to.

It renames some of the operators entirely

For example, on arm64, the branch-to-register-with-link operator BLR is renamed to CALL, but BL is not renamed. This means that for every operator you want to use, you will have to check and make sure that the Go version of that operator for your platform is the same operator.

It changes the order of some arguments to operators from native assembly, but not all of them

It’s not even as simple as changing the order from right-to-left to left-to-right. For many operators, such as the ones that assign to pairs of registers, the order of the first two arguments is the same as native assembly, but the order of the rest of the arguments is reversed!

So not only will you have to check that the operator names haven’t changed, you’ll also have to check to see how the order of the operands is different in Go assembly.

Go assembly is not meant for you

This is very clearly a tool meant for use mainly by developers on the Go language runtime itself. There are tons of useful Go assembly libraries inside Go, but they’re all restricted to being called only from inside the Go runtime.

Even though it’s an abstraction over native assembly, Go assembly is different for every platform anyway

The Go assemblers for different platforms have some things in common, but they’re all different and will have different register and operator renaming rules. This means that you will have to write a different version of your Go assembly function for every target platform that you want it to run on!

Be very careful, because the same instructions (like MOVD or CALL) might have the same names in Go assembly from one platform to the other, but they might have very different behavior! Go assembly is really totally different variants of Go assembly for every platform.

The only documentation you really have to work with are two long pages that describe the naming of registers and operator arguments for arm64 and ppc64. You can text-search on these pages for the native assembly operators you’re familiar with and figure out which Go assembly operator corresponds to them. For every other platform, the only documentation is short notes on the main page.

Go assembly files typically make heavy use of build constraints as file suffixes, so that the right Go assembly file will be automatically included when you build for a particular platform. For a typical function, you’ll want to target amd64, arm64, and maybe 386, resulting in three different implementations in the files myfunc_amd64.s, myfunc_arm64.s, and myfunc_386.s. This also means that you will need to look at the Go assembler documentation for three different platforms, check the register and operator renaming and also how the arguments have been re-ordered.

Nice things about Go Assembly

This is a short list, but there are a few nice things about Go assembly over native assembly.

By default, it automatically aligns to 16 bytes, same as arm64 requires. You can easily change alignment at any point inside a function with the PCALIGN instruction, and it will automatically align the whole function correctly for you.

You don’t need to monkey with the stack as much, because it handles some of this for you.

Go assembly adds a bunch of strange rules that you don’t have to deal with in native assembly

To illustrate this, let’s look at a simple arm64 function in Go assembly that you can call from Go that simply calls a native function using the C calling convention (cdecl) with no arguments. The Go assembly function takes one argument, which is the address of the native function to call.

TEXT ·call0(SB),4,$16-8 
  MOVD    addr+0(FP), R0
  CALL    R0
  MOVD    R0, ret+8(FP)
  RET

TEXT ·call0(SB),4,$16-8

MOVD addr+0(FP), R0

CALL R0

MOVD R0, ret+8(FP)

RET

Function Definition

Line 1 is the function definition. The good news is that the function definitions are the same for all platforms in Go assembly. They start with the TEXT statement, which says this function should be mapped into the TEXT segment, which just means that it’s executable code and not data. So far so good.

Following that, however, is the function name, which begins with the bizarre centered dot character “·“. This character is a special symbol that tells the Go assembler that this function should be exposed to Go code. It’s possible to create assembly functions without this character, but they won’t be callable from Go. I have no idea how to type this character, I have been copy-pasting it from place to place since the first Go assembly file I ever saw.

After the centered dot is the function name and the (SB) part. This is not syntax similar to function parameters in Go, this is actually part of Go assembly’s weird addressing scheme which goes SYMBOLNAME+offset(relative to register), which you’ll see more of later. Function names are always declared relative to a special make-believe register called SB for “stack base”. Functions always start at the stack base with no offset, so they’re always fn(SB), no matter how many arguments they take.

Next on Line 1, we have “,4,”. That argument to a function declaration tells the Go assembler how to set up the frame for this function call. The value 4 here means NOSPLIT, which tells the Go assembler not to add a frame preamble that lets the frame be split. The other possible values for this are described in the Go assembly documentation here, but NOSPLIT should be fine unless your function gets very large.

Finally, on Line 1, we have this “$16-8” argument. The dollar sign is meaningless. The first number is the size of this function’s frame (16 bytes) and the second number is the size of the arguments to this function, which live in the caller’s frame (8 bytes, the length of the single 64-bit pointer argument to call0). If the function is set to NOSPLIT, these values are ignored. If the function is not set to NOSPLIT, they must be correct, so I try to set them correctly in case I turn NOSPLIT off at some point. The frame size appears to include the parameters from the caller’s frame as well as the return value from the current frame, so one way of thinking about this is that the frame extends from -8 (to include the argument passed to call0) forward 16 bytes (to include the return value from call0).

Now we’re finally off the first line and we get into the actual assembly!

Parameter and Return Naming

MOVD    addr+0(FP), R0

1	MOVD addr+0(FP), R0

On the next line of assembly, we see that SYMBOLNAME+offset(relative to register) format again in a reference to another make-believe register FP (for “frame pointer”). The FP register points at the base of the current stack frame, and all references to the FP register are REQUIRED to have a symbol name associated with them or it’s a compile error. Those symbols correspond to the names of the arguments being passed in.

Go assembly is left-to-right, meaning that MOV instructions copy from the left to the right. The above line of assembly is copying the value found at 0 bytes from the base of the frame (FP), which is named “addr”, to the first arm64 register (R0).

CALL    R0

CALL R0

As mentioned previously, the CALL operator in arm64 Go assembly maps to the BLR (branch-with-link-to-register) operator in native arm64 assembly. This calls the function whose address we passed in as an argument and copied to R0.

MOVD    R0, ret+8(FP)

1	MOVD R0, ret+8(FP)

On the next line, we set up the return value. Since this is on arm64, we have to read the Parameter Passing part of the arm64 ABI to learn that function parameters are passed in on the R0-R8 registers and return values are passed back with the same registers. The first return value will come back on R0, just like the first argument was on R0. To copy this value to the last 8 bytes of our 16 byte frame, we again have to use the SYMBOLNAME+offset(relative to register) syntax. We assign it the name “ret” and copy it to an offset of +8 bytes relative to the frame pointer.

The final line is a RET instruction, which returns from the function.

Passing an Argument with CALL

Well, we’ve made it this far, let’s take a look at the call1 function, which takes two arguments from Go (the pointer to call and a single argument to pass to that function).

TEXT ·call1(SB),4,$24-16
  MOVD    addr+0(FP), R1
  MOVD    arg1+8(FP), R0 
  CALL    R1
  MOVD    R0, ret+16(FP)
  RET

TEXT ·call1(SB),4,$24-16

MOVD addr+0(FP), R1

MOVD arg1+8(FP), R0

CALL R1

MOVD R0, ret+16(FP)

RET

This is very similar to call0. Please notice that we’ve increased the frame size and size of arguments for call1 by 8 bytes each, because we’re taking one more argument from Go, which will be passed to the internal function.

Now we’re going to use R0 for the first argument to pass to the internal function, and R1 to hold the address of the internal function. So on the first line, we copy the first argument from Go (named “addr”) from the base of the frame pointer to R1. On the next line, we copy the second argument from Go (named “arg1”) from an offset of +8 from the frame pointer to R0. Then we CALL the function at the address in R1, which will use the argument we set up in R0 as its first argument. Then we copy the return result from that CALL from R0 to a value named “ret” at an offset of +16 from the frame pointer and return.

If that made sense to you, I regret to inform you that you have already gone mad. My condolences, but now you can join my support group.

Other Notes

symbolname+8(SP) is referring to a different register than 8(SP)!

Similar to the make-believe registers FP and SB, there is another make-believe Go assembly register SP for “stack pointer”. While this is a real register on some platforms, on arm64, it is emulated. What’s worse is that there is another register on arm64 called SP which is not the stack pointer.

To resolve this conflict, Go assembly does something truly confusing. When the SYMBOLNAME+offset(SP) syntax is used (required when referencing the stack pointer as well, where the names are for local variables), Go assembly interprets this as meaning the “stack pointer” SP register and not the native arm64 register named SP. However, if you use just the offset(SP) syntax, it will refer to the native arm64 register instead.

GodBolt for arm64

Using this link, you can load GodBolt in arm64 mode, where you can code in C and immediately see the arm64 native assembly that the compiler generates.

Then you can go and search for each operator from the native assembly in the Go docs for the arm64 assembler and check if they have been renamed and how the operator order has changed.

In this way, it is possible to slowly convert native assembly to Go assembly. However, you will also have to remove the alignment code that the native compiler generates, because Go assembly will do this for you. You also have to replace the native assembly that references stack values with named references to the FP and SP pointers, for function arguments and local variables respectively. It’s probably best to use this method of conversion only for certain parts of native functions and to manually write the parts that reference FP and SP.

Conclusion

Go assembly was an attempt to make a slightly more universal style of assembly back before the invention of the modern generation of platform-independent intermediate representations like LLVM-IR. I would guess that its existence as an integral part of Go’s implementation is a throwback to Go’s plan9 origins, rather than a Good Thing.

I hope that sharing my suffering and the small bit of knowledge I’ve gained will help you in getting through whatever it was that brought you to this page in the first place… and I’m quite impressed with your tolerance for utter madness if you’ve made it this far.

Stay tuned to this channel, my next post will be what I actually made with Go assembler, and it’s a doozy! Here’s a hint, it’s the Go arm64 assembly implementations of call0 up to call6.

“I find this type of hackery distasteful.” – Rob Pike, referring to my crew