The Universal Loader for Go

As promised in the previous post, Go Assembly on the arm64, I have been working on a very special project for the past couple months, and I’m very pleased to announce the Universal Loader!

This Golang library provides a consistent interface across all platforms for loading shared libraries from memory and without using CGO. When I say “all platforms”, I mean Linux, Windows, and OSX including the new M1 Apple chip.

Until someone tells me differently, I am claiming that this is the very first loader to work on the M1, Golang or not. I haven’t tried it myself yet, but it will likely also work on any POSIX system, like BSD variants, without changes. If you try this on a new platform, let me know!

Additionally, the Linux backend of the loader does not use memfd, and I believe this is the first Golang loader to do that as well. The Linux ramdisk implementation memfd is very easy to use, but it’s also relatively easy to detect.

Consistent Interface

On all platforms, this is a basic example of how to use Universal to load a shared library from memory and call an exported symbol inside it:

Just pass in your library as a byte array, call LoadLibrary on it and give it a name, then you can Call() any exported symbol in that library with whatever arguments you want.

All you have to do differently on a different platform is load the right type of library for your platform. On OSX, you would load myLibrary.dyld, on Linux, and on Windows myLibrary.DLL.

Check out the examples/ folder in the repo and the tests for more details.

Algorithms and References

The Linux and Windows implementations are very straight-forward and based on the same basic algorithms that have been widely used for years, and I used malisal’s excellent repo as a reference, with some minor changes.

For the OSX loader, I referred heavily to MalwareUnicorn’s wonderful training, but I did have to make a few updates. For one thing, dyld is not guaranteed to be the next image in memory after the base image.

Also have to give a heartful thank you to C-Sto and lesnuages, who contributed code to the Windows and OSX loaders, respectively.

Last but not least, this library makes heavy use of our own fork of the debug library, so this would not have been possible without contributions over the years from the whole Binject crew, and it’s a perfect example of the power of the tools we’ve made.

This isn’t the first tool to come out of our work in Binject, and it definitely will not be the last.

Once you get locked into a serious m*****e collection, the tendency is to push it as far as you can.

Go Assembly on the arm64

If you have ever looked at Go’s assembly language, you were likely confused into giving up. Against all odds, I recently managed to implement some Go assembly that lets you call a cdecl function in a native library directly from Go on the new M1 Apple chip platform, which is arm64/POSIX based. My joy at never having to look at Go assembly again was short-lived, however, as several of my friends asked me to write something up to explain what I had learned about Go assembly, thus forcing me to relive this traumatic experience in some detail. So here it is friends, I hope it helps.

Why would you use Go Assembly?

Go assembly lets you define a function that you can call from Go that can use all of the low-level assembly operators available on your platform. This means you can avoid using CGO entirely, and avoid having to set up the heinous external cross-compiling toolchains that CGO requires. Go itself can assemble Go assembly on every platform without installing external tools.

What is Go Assembly?

plan9 was a particular set of design principles applied to recreating Unix at Bell Labs in the 80’s. Golang is essentially the plan9 re-implementation of C++, and the Go assembler is based directly on the plan9 assembler (notice the common thread from plan9 to Golang is Rob Pike). Much like plan9 itself, Golang at first seems confusing, but over time you start to see the method behind the madness. Everything is consistent in a way that C++ isn’t, and removing inconsistencies lets you write better code faster, once you get used to it.

This is not true of the Go assembler, however. If there is some underlying pattern that makes everything consistently understandable, I haven’t seen it. It’s best to understand upfront that you are marching into the mouth of madness and stop trying to force things to make sense. Just relax and accept that the underlying design decisions are probably beyond our ability to comprehend.

Try to remember HOW this works and don’t worry about WHY it is this way. Trust me on this. Take some deep breaths, and here we go.

Go assembly does not have a one-to-one correspondence with native assembly

The most confusing part about Go assembly is that it looks like native assembly, but it isn’t.

It renames almost all the registers

For example, on arm64, the main registers that the Arm spec calls X0-X7 are referred to as R0-R7 in Go assembly. You will have to check the Go assembler documentation for every platform you work on to see what the registers have been renamed to.

It renames some of the operators entirely

For example, on arm64, the branch-to-register-with-link operator BLR is renamed to CALL, but BL is not renamed. This means that for every operator you want to use, you will have to check and make sure that the Go version of that operator for your platform is the same operator.

It changes the order of some arguments to operators from native assembly, but not all of them

It’s not even as simple as changing the order from right-to-left to left-to-right. For many operators, such as the ones that assign to pairs of registers, the order of the first two arguments is the same as native assembly, but the order of the rest of the arguments is reversed!

So not only will you have to check that the operator names haven’t changed, you’ll also have to check to see how the order of the operands is different in Go assembly.

Go assembly is not meant for you

This is very clearly a tool meant for use mainly by developers on the Go language runtime itself. There are tons of useful Go assembly libraries inside Go, but they’re all restricted to being called only from inside the Go runtime.

Even though it’s an abstraction over native assembly, Go assembly is different for every platform anyway

The Go assemblers for different platforms have some things in common, but they’re all different and will have different register and operator renaming rules. This means that you will have to write a different version of your Go assembly function for every target platform that you want it to run on!

Be very careful, because the same instructions (like MOVD or CALL) might have the same names in Go assembly from one platform to the other, but they might have very different behavior! Go assembly is really totally different variants of Go assembly for every platform.

The only documentation you really have to work with are two long pages that describe the naming of registers and operator arguments for arm64 and ppc64. You can text-search on these pages for the native assembly operators you’re familiar with and figure out which Go assembly operator corresponds to them. For every other platform, the only documentation is short notes on the main page.

Go assembly files typically make heavy use of build constraints as file suffixes, so that the right Go assembly file will be automatically included when you build for a particular platform. For a typical function, you’ll want to target amd64, arm64, and maybe 386, resulting in three different implementations in the files myfunc_amd64.s, myfunc_arm64.s, and myfunc_386.s. This also means that you will need to look at the Go assembler documentation for three different platforms, check the register and operator renaming and also how the arguments have been re-ordered.

Nice things about Go Assembly

This is a short list, but there are a few nice things about Go assembly over native assembly.

By default, it automatically aligns to 16 bytes, same as arm64 requires. You can easily change alignment at any point inside a function with the PCALIGN instruction, and it will automatically align the whole function correctly for you.

You don’t need to monkey with the stack as much, because it handles some of this for you.

Go assembly adds a bunch of strange rules that you don’t have to deal with in native assembly

To illustrate this, let’s look at a simple arm64 function in Go assembly that you can call from Go that simply calls a native function using the C calling convention (cdecl) with no arguments. The Go assembly function takes one argument, which is the address of the native function to call.

Function Definition

Line 1 is the function definition. The good news is that the function definitions are the same for all platforms in Go assembly. They start with the TEXT statement, which says this function should be mapped into the TEXT segment, which just means that it’s executable code and not data. So far so good.

Following that, however, is the function name, which begins with the bizarre centered dot character “·“. This character is a special symbol that tells the Go assembler that this function should be exposed to Go code. It’s possible to create assembly functions without this character, but they won’t be callable from Go. I have no idea how to type this character, I have been copy-pasting it from place to place since the first Go assembly file I ever saw.

After the centered dot is the function name and the (SB) part. This is not syntax similar to function parameters in Go, this is actually part of Go assembly’s weird addressing scheme which goes SYMBOLNAME+offset(relative to register), which you’ll see more of later. Function names are always declared relative to a special make-believe register called SB for “stack base”. Functions always start at the stack base with no offset, so they’re always fn(SB), no matter how many arguments they take.

Next on Line 1, we have “,4,”. That argument to a function declaration tells the Go assembler how to set up the frame for this function call. The value 4 here means NOSPLIT, which tells the Go assembler not to add a frame preamble that lets the frame be split. The other possible values for this are described in the Go assembly documentation here, but NOSPLIT should be fine unless your function gets very large.

Finally, on Line 1, we have this “$16-8” argument. The dollar sign is meaningless. The first number is the size of this function’s frame (16 bytes) and the second number is the size of the arguments to this function, which live in the caller’s frame (8 bytes, the length of the single 64-bit pointer argument to call0). If the function is set to NOSPLIT, these values are ignored. If the function is not set to NOSPLIT, they must be correct, so I try to set them correctly in case I turn NOSPLIT off at some point. The frame size appears to include the parameters from the caller’s frame as well as the return value from the current frame, so one way of thinking about this is that the frame extends from -8 (to include the argument passed to call0) forward 16 bytes (to include the return value from call0).

Now we’re finally off the first line and we get into the actual assembly!

Parameter and Return Naming

On the next line of assembly, we see that SYMBOLNAME+offset(relative to register) format again in a reference to another make-believe register FP (for “frame pointer”). The FP register points at the base of the current stack frame, and all references to the FP register are REQUIRED to have a symbol name associated with them or it’s a compile error. Those symbols correspond to the names of the arguments being passed in.

Go assembly is left-to-right, meaning that MOV instructions copy from the left to the right. The above line of assembly is copying the value found at 0 bytes from the base of the frame (FP), which is named “addr”, to the first arm64 register (R0).

As mentioned previously, the CALL operator in arm64 Go assembly maps to the BLR (branch-with-link-to-register) operator in native arm64 assembly. This calls the function whose address we passed in as an argument and copied to R0.

On the next line, we set up the return value. Since this is on arm64, we have to read the Parameter Passing part of the arm64 ABI to learn that function parameters are passed in on the R0-R8 registers and return values are passed back with the same registers. The first return value will come back on R0, just like the first argument was on R0. To copy this value to the last 8 bytes of our 16 byte frame, we again have to use the SYMBOLNAME+offset(relative to register) syntax. We assign it the name “ret” and copy it to an offset of +8 bytes relative to the frame pointer.

The final line is a RET instruction, which returns from the function.

Passing an Argument with CALL

Well, we’ve made it this far, let’s take a look at the call1 function, which takes two arguments from Go (the pointer to call and a single argument to pass to that function).

This is very similar to call0. Please notice that we’ve increased the frame size and size of arguments for call1 by 8 bytes each, because we’re taking one more argument from Go, which will be passed to the internal function.

Now we’re going to use R0 for the first argument to pass to the internal function, and R1 to hold the address of the internal function. So on the first line, we copy the first argument from Go (named “addr”) from the base of the frame pointer to R1. On the next line, we copy the second argument from Go (named “arg1”) from an offset of +8 from the frame pointer to R0. Then we CALL the function at the address in R1, which will use the argument we set up in R0 as its first argument. Then we copy the return result from that CALL from R0 to a value named “ret” at an offset of +16 from the frame pointer and return.

If that made sense to you, I regret to inform you that you have already gone mad. My condolences, but now you can join my support group.

Other Notes

symbolname+8(SP) is referring to a different register than 8(SP)!

Similar to the make-believe registers FP and SB, there is another make-believe Go assembly register SP for “stack pointer”. While this is a real register on some platforms, on arm64, it is emulated. What’s worse is that there is another register on arm64 called SP which is not the stack pointer.

To resolve this conflict, Go assembly does something truly confusing. When the SYMBOLNAME+offset(SP) syntax is used (required when referencing the stack pointer as well, where the names are for local variables), Go assembly interprets this as meaning the “stack pointer” SP register and not the native arm64 register named SP. However, if you use just the offset(SP) syntax, it will refer to the native arm64 register instead.

GodBolt for arm64

Using this link, you can load GodBolt in arm64 mode, where you can code in C and immediately see the arm64 native assembly that the compiler generates.

Then you can go and search for each operator from the native assembly in the Go docs for the arm64 assembler and check if they have been renamed and how the operator order has changed.

In this way, it is possible to slowly convert native assembly to Go assembly. However, you will also have to remove the alignment code that the native compiler generates, because Go assembly will do this for you. You also have to replace the native assembly that references stack values with named references to the FP and SP pointers, for function arguments and local variables respectively. It’s probably best to use this method of conversion only for certain parts of native functions and to manually write the parts that reference FP and SP.


Go assembly was an attempt to make a slightly more universal style of assembly back before the invention of the modern generation of platform-independent intermediate representations like LLVM-IR. I would guess that its existence as an integral part of Go’s implementation is a throwback to Go’s plan9 origins, rather than a Good Thing.

I hope that sharing my suffering and the small bit of knowledge I’ve gained will help you in getting through whatever it was that brought you to this page in the first place… and I’m quite impressed with your tolerance for utter madness if you’ve made it this far.

Stay tuned to this channel, my next post will be what I actually made with Go assembler, and it’s a doozy! Here’s a hint, it’s the Go arm64 assembly implementations of call0 up to call6.

“I find this type of hackery distasteful.” – Rob Pike, referring to my crew

Using Binject

Binject is a sweet multipart library, making up several tools for code-caving and backdooring binaries via golang. The project was originally inspired as a rewrite of the backdoor factory in go and now that it’s functional this post will show you how to use it. In this post we are going to explore how you can use the library operationally for a number of tasks. We will start with an example of using some of the command line tools included with the project for the arbitrary backdooring of files. Next we will look at using the library to backdoor a file programmatically. Finally we will use the bdf caplet with bettercap to backdoor some binaries being transmitted on the network, on-the-fly. I want to give a shout out to the homie Vyrus, as a lot of this was inspired by him but in non-public projects, so I can’t link to his stuff. I also want to give a shoutout to Awgh, as he’s been an awesome mentor and powerhouse in implementing a lot of the Binject features. Below you can see the binjection command line tool being used to backdoor an arbitrary windows PE, on Linux. In the next section we will explore some of the command line features of Binject.

Using the command line tools included with Binject is pretty straightforward; the main library Binject/binjection contains a command line interface tool that exposes all of the existing functionality for backdooring files on macOS, Windows, and Linux. Above we can see go-donut being used to turn a gscript program into position independent shellcode, then we use the binjection command line tool to backdoor a Windows PE (a .exe file), all on a Linux OS. The binjection cli tool takes 3 main command line flags, “-f” to specify the target file to backdoor, “-s” to specify a file containing your shellcode in a raw bytecode format, and “-o” specifying where to write your new backdoor file. Optionally you can give a “-l” to write the output to a logfile instead of standard out. You can also specify the injection method to use, although the tool only supports a very limited and mostly default set currently. The binjection cli tool will automatically detect the executable type and backdoor it accordingly. Another library and command line tool included with the framework is Binject/go-donut, which is essentially just a port of TheWover/donut. We can see this being used above to prepare another program to be embedded in our target executable. I really like both of these command line tools because it’s easy to cross compile them for linux or macOS, giving me a really convenient way to generate my target shellcode regardless of what OS I’m operating from. Having the entire tool chain in go allows me to easily move my tools to whatever operating system or use them all together in the same codebase. Even if you’re not familiar with go, you can just as easily compile the cli tools and script them together with something like bash or powershell. Below we can see the binjection cli tool being used to backdoor ELF executables on Linux.

Using binjection programmatically as a go library is also super simple and arguably far more useful because you can now integrate it into so many more projects. The library calls are just as straight forward, basically a single function call depending on the binary type your backdooring. Here we can see it as a standalone example for others to use. We can also see it being implemented here for Windows in Sliver, a golang based c2 framework with tons of features. We can also use binjection in gscript, although it requires this embarrassingly small shim interface. This is insanely powerful functionality to be able to ship in an implant binary, as the implant can now backdoor, already persisted, legitimate binaries on the target system. You can even break down the supporting libraries and use other parts of Binject, like Binject/debug, as a triage tool, which we demonstrate with bintriage. Finally, to bring the project full circle, Binject has been integrated with bettercap for the on-the-fly backdooring of files on the network. It currently accomplishes this using bettercap’s ARP spoofing module, the network proxy module, and a helper tool to manage the file queue, making the whole process really clean. Using the integration is easy with the Binject/backdoorfactory helper tool. Simply follow these usage instructions, which just involves installing all of the necessary prerequisite tools, and then Binject/backdoorfactory will spit out the caplet and command you need you need for bettercap. You can see a demo of all of this together in the video at the end. So now you have a pretty good idea of some different ways you can use Binject. We also encourage people to submit pull requests to the library with new injection methods or even further enumerating the executable types. There is still a lot of work to be done here but you can use the library currently to great effect.

Back to the Backdoor Factory

backdoorfactory setting up the man-in-the-middle with bettercap and injecting a binary inside of a tar.gz as it’s being downloaded by wget (courtesy of sblip)

Backdoor Factory documentation

Backdoor Factory source code

About six years ago, during a conversation with a red teamer friend of mine, I received “The Look”. You know the look I’m talking about. It’s the one that keeps you reading every PoC and threat feed and hacker blog trying to avoid. That look that says “What rock have you been under, buddy? Literally everyone already knows about this.

In this case, my transgression was my ignorance of The Backdoor Factory.

The Backdoor Factory was released by Josh Pitts in 2013 and took the red teaming world by storm. It let you set up a network man-in-the-middle attack using ettercap and then intercept any files downloaded over the web and inject platform-appropriate shellcode into them automatically.

Man-in-the-Middle Attack Using ARP Spoofing

In the days before binary signing was widely enforced and wifi security was even worse than it is now, this was BALLER. People were using this right and left to intercept installer downloads, pop boxes, and get on corpnet (via wifi) or escalate (via ARP). It was like a rap video, or that scene in Goodfellas before the shit hits the fan.

But nothing lasts forever. Operating systems made some subtle changes and entropy took over, and so the age of The Backdoor Factory came to an end. Some time later, the thing actually stopped working and red teamers sadly packed up their shit and lumbered off to the fields of Jenkins.

Fear not, gentle reader, for our tale does not end here.

For some reason, a year and change back, I found myself once again needing something very much like The Backdoor Factory and stumbled on this “end of support” announcement. Perhaps still motivated by my shameful ignorance years ago, I thought “maybe I owe this thing something for all the good times” and took a look into the code to see if something could be fixed easily.

No, no it couldn’t. Not at all. But the general design and the vast majority of the logic was all in there. It worked alongside ettercap to do ARP spoofing, then intercepted file downloads, determined what format they were, selected an appropriate shellcode if one was available, and then had a bunch of different configurable methods to inject shellcode into all binary formats.

…It’s just that it was heaps and heaps of prototype-grade Python and byte-banged files. I have heard a rumor, similar to On The Road, that the original version had been written in a single night. It clearly was going to take longer than that to port this to something maintainable, but… I mean… automatic backdooring of downloaded files! This needed to happen. This needed to be a capability that red teamers just had available in the future. Fuck entropy.

Around this time, I pitched the idea of an end-to-end rewrite to some others and we started a little group of enthusiasts.

For each of the abstract areas of functionality from the original, we made a separate Go library. The shellcode repository functions went into shellcode. The logic that handles how to inject shellcode into different binary formats went into binjection. To replace the binary parsing and writing logic, we forked the standard Golang debug library, which already parsed all binary formats, and we simply added the ability to write modified files back out.

This gives us a powerful tool to write binary analysis and modification programs in Go. All of these components work together to re-implement the original functionality of BDF, but since they’ve been broken into separate libraries, they can be re-used in other programs easily.

Finally, to replace the ailing ettercap, we used bettercap, the new Golang replacement, which supports both ARP spoofing and DNS poisoning attacks. bettercap allows for extension “caplet” scripts using an embedded Javascript interpreter, so we wrote the Binject caplet that intercepts file downloads and sends them to our local backdoorfactory service for injection via a named pipe and then passes the injected files along to the original downloader.

The flow of a file through the components of the Backdoor Factory, on its journey to infection

Injection methods have been updated to work on current OS versions for Mach-O, PE, and ELF formats, and will be much easier to maintain in the future, since they’re now written to an abstract “binary manipulation” API.

To put a little extra flair on it, we’ve added the ability to intercept archives being downloaded, decompress them on the fly, inject shellcode into any binaries inside, recompress them, and send them on. Just cuz. In the future, we’re planning on adding some extra logic to bypass signature checks on certain types of files and some other special handlers for things like RPMs.

Now you will have to provide your own shellcode, backdoorfactory only ships with some test code, but if you’re targeting Windows, I’ve also ported the Donut loader to Golang, so you can use go-donut to convert any existing Windows binary (EXE/DLL/.NET/native) to an injectable, encrypted shellcode. It even has remote stager capabilities.

We fully intend to get into a lot more detail about how to use Donut and BDF in future posts, but don’t wait for us to get it together for some vaporware future blog post that may never come… You can try it yourself right now!

Encrypted-at-Rest Virtual File-System in Go

One incredibly powerful feature of the Go language that hasn’t seen a lot of use is its native support for virtual file-systems. Most C-style code is written using the standard file operations that all work on passing an integer “file descriptor” around. To re-purpose that code to use anything other than the OS-provided file would involve changing every file operation call in all of the source and libraries involved.

A “file” in Go is just another set of interfaces that you can replace.

The cool thing about that is that all Go code ever written to do operations on files will work just as well if you pass it some other thing you made that implements the file-system interfaces. All you have to do is replace the standard file-system implementations you’re used to, mostly provided by the os package, with your new virtual file-system, and all the ReadFile, OpenFile, etc. function calls will stay the same.

By using a couple of existing Go libraries that do the heavy lifting, it’s possible to implement a simple read-write virtual file-system that will be encrypted and compressed on disk in only a couple pages of code, as you’re about to see.

The two libraries we’ll use are:

rainycape’s VFS library – The VFS provided by the standard library is read-only. This library adds the ability to write to the virtual file system, and also already supports importing the file-system from several different compressed archive formats. We will simply extend this to add encryption, but as of this writing, rainycape’s VFS library is the only working implementation of a Go VFS with write implemented, so it will be the likely starting point for any other Go VFS project you might attempt.

bencrypt – An encryption utility package we will use just for the AES encrypt/decrypt functions.

The first of our two pages of code is a function that simply AES decrypts a compressed file with a given key and passes the decrypted compressed file into the rainycape VFS library, which provides a full read-write implementation of a virtual filesystem decompressed into memory from a compressed file.

The second page of code is to write your changes back from memory to the disk. This does the inverse of the OpenAndDecrypt operation, it just zips the file-system back up to an archive and then AES encrypts it with the given key.

The final piece of code we’ll need to get around is a simple function that will AES encrypt any file. We can then make our starting file-system as a compressed archive and use this function to AES encrypt it to be loaded by OpenAndDecrypt.

That VFS interface returned by OpenAndDecrypt can be used just like the regular os and ioutil interfaces you’re used to getting from the standard library… and it will be easy to retro-fit your existing code to use an interface that could be a VFS.

Here’s a quick example of how you could encrypt a Zip file and then open it up as if it were the os or ioutil packages (because they both implement the same interface).

Now, before you go typing all this into your favorite IDE, I should point out that all of these functions have already been added to the bencrypt library, so you can just call them directly from there.


Also, be aware that this simple method will not encrypt your files once they have been loaded into system memory. They could be read out unencrypted by a debugger, saved in a swap file, or a tool like Volatility.

Final Thoughts

To retro-fit your existing code that uses the os package, you’ll need a defined interface that you can use that could be bound either to os or to vfs. I would suggest stealing this one from the go-vfs package as a starting point.

The other kind of odd thing about this, is that it’s almost entirely a side-effect of how Go handles interfaces. Membership in interfaces is decided at runtime, based on whether or not the required functions exist in a type to satisfy the interface. This lets you define an abstract interface for a file-system after the fact that matches what os already does, and create a virtual file-system that implements that interface as well. Your existing code just needs to be repointed to your new interface instead of to os and everything else will work, even though your interface was written way later than the os package. The same trick could allow replacing or hooking other lateral aspects of standard library functions, that might yield similarly interesting results.