In previous blog posts, I detailed how a Windows programmer can develop against RPC and explained why I feel Beacon Object Files (BOFs) have become cemented as a usable technique for the time being. I will complete this mini-series by converting our previous RPC POC code into a BOF.
Planning
The first step of converting our POC into a BOF is to plan our end goal. First, we determine the scope of the conversion. In this case, our POC is small, so we are not stripping functionality out of a larger project; therefore, we will be converting the entire thing.
Next, we should decide if we are coding this for a specific implant’s runner or targeting a more general audience. I would at least like to be able to run whatever we output in Cobalt Strike, our internal implant, and Sliver. This means that everything we write must target the lowest common denominator of functionality across those three. This combination introduces a couple of restrictions:
- No more than 64 unique dynamic function resolution calls can be used.
- All defaults should be handled in the BOF, as no script logic can provide defaults.
Finally, we should decide what we want the user experience to be like. I do this as a planning step for most of the software I write. Knowing how you want a user to interact with the system you are coding can steer your development. For this case, I do not want the services being checked to be hardcoded. Instead, I want the user to be able to input any service name and receive the answer as to whether it exists on a given target. It should look something like <bofname> <target> <servicename>. I would want to code 'NT Service\' as the prefix on the BOF side, given that all checks will need to start with that string.
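As a rough illustration of the prefix handling described above, the sketch below builds the full account name from whatever service name the user supplies. The function name build_service_account and the use of narrow char strings are assumptions made to keep the example portable; the real BOF would more likely build a wide string for the RPC call.

```c
#include <stddef.h>
#include <string.h>

/* Fixed prefix that every lookup starts with (taken from the text above). */
#define SERVICE_PREFIX "NT Service\\"

/*
 * Hypothetical helper: writes "NT Service\<service>" into out.
 * Returns 0 on success, or -1 if an argument is missing, the service
 * name is empty, or the result would not fit in out_len bytes.
 */
int build_service_account(const char *service, char *out, size_t out_len)
{
    size_t prefix_len = sizeof(SERVICE_PREFIX) - 1; /* exclude the NUL */
    size_t service_len;

    if (service == NULL || out == NULL)
        return -1;

    service_len = strlen(service);
    if (service_len == 0 || prefix_len + service_len + 1 > out_len)
        return -1;

    memcpy(out, SERVICE_PREFIX, prefix_len);
    /* Copy the service name together with its terminating NUL. */
    memcpy(out + prefix_len, service, service_len + 1);
    return 0;
}
```

Keeping the prefix on the BOF side means the operator only ever types the bare service name, which matches the <bofname> <target> <servicename> shape described above.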
For this blog, I will not go through full incorporation into the named implants, as a much more reasonable implementation of these techniques would use the Win32 API.
Preparing
We have decided to convert our entire POC. Up front, that is easy enough—we would simply convert every function call to a BOF equivalent. To do this we are going to first copy all the .c files out of our existing project and paste them into a new subfolder called 'BOF'. Next, we will open each file and start removing/converting code.
Another item we need to address is that a BOF consists of a single object file. While there are ways to coax the linker to join multiple object files into a single object file, I find it much easier to just #include our various dependent .c files together, such that the compiler will recognize them as one unit and output one object file.
We should also check to see if C++ code is used. If so, our easiest path is likely to convert to C-based equivalents. While it is possible to get C++ code to compile into a runnable BOF, it requires familiarity with numerous assumptions. Unless you have a compelling reason to keep the C++ code, it is best to avoid it with a BOF.
Our original POC included and depended on the WindowsRpcHelper library. For this library, we need to break out the used functions and include them directly, be that in our source itself, or in a frequently reused common .c file that we could include between projects. An example of such a library is base.c from our SA BOF repository on GitHub.
Finally, for ease of development, we will be copying over a few files from the SA repository linked above. These files include base.c, bofdefs.h and beacon.h.
Executing
Here are the steps to convert this:
- Rename MS-lsat-poc.cpp to MS-lsat-poc.c.
- Create a new file called entry.c.
- Add a new function named 'go' that will parse out our arguments.
- Add code to parse our target server and service name.
- Modify bofdefs.h to allow compilation via MSVC and mingw.
- Define printf to internal_printf.
- Remove wmain from MS-lsat-poc.c.
- Include MS-lsat-poc.c in entry.c.
- Copy the make_unicode_str & intstrlen functions out of rpc_helpers.cpp and paste them into MS-lsat-poc.c above list_names.
- Convert any library calls throughout our program to BOF equivalents. These include:
a. RpcExceptionFilter (MS-lsat-poc.c)
b. RpcStringBindingComposeW (ms-lsar.c)
c. RpcBindingFromStringBindingW (ms-lsar.c)
d. RpcStringFreeW (ms-lsar.c)
e. RpcBindingFree (ms-lsar.c)
f. NdrClientCall2 (ms-lsar_win32/x64.c)
g. Replace midl_user_alloc / user_free with intAlloc/intFree defs
- Add our bofdefs.h and include into any function using DFR.
- Attempt to compile and fix any errors. Loop until done. For me, these errors included:
a. Dbghelp conflicting with imagehlp.h in bofdefs.h. Removed dbghelp.h.
b. WINIMPM not existing under MSVC. Defined to WINBASEAPI.
c. Missing SOCKET def under MSVC. Included winsock2.h.
d. Our own addrinfo define conflicting under MSVC. Commented it out.
e. Replace __LONG32 with int on line 349 of bofdefs.h.
f. Fix up warnings by changing list_names to a void and removing double dllimports.
g. Remove _stricmp from declared intrinsic functions.
- Finally, we need to code up a basic CNA file to load this into Cobalt Strike.
Testing
We have a few items that need to be tested for stability.
First, we establish a test beacon on a lab machine, so we have somewhere to run and test our code. Once our test beacon is established and we try to execute our BOF in Cobalt Strike, we get a TON of relocation errors.
This is the result of Cobalt Strike's loader not handling the .pdata section. If we want any chance of this object file running in Cobalt Strike proper, we must eliminate the usage of that section. A fairly easy way to accomplish that task is by switching to the mingw compiler.
Said switch is non-trivial. There are a few code constructs within the RPC stubs with which mingw does not compile. However, these can be switched out. If you are interested in those changes, review the code and any other _MSC_VER defines used throughout the project.
After making that switch, we can run again and... we will have a dead beacon. Even though the code seems to be loaded, there is still some sort of relocation error. The error raised is for invalid memory access. Normally, I would accept the blame for this myself, but I happen to work with Kevin Haubris, the maintainer of COFFLoader, who helped me dig into this error a little more.
After telling Haubris about my current work, he asks me to test whether it will load in COFFLoader, but I get memory access violations again. He then digs in and realizes that additional relocation types need to be handled, which his loader currently ignores (which is what I suspect Cobalt Strike is doing, but I will leave that up to Fortra to figure out).
A fair number of changes need to be made to support everything this new object file is trying to do. Without Kevin’s work, this BOF would have been dead in the water at this point. I suspect this may be specifically related to usage of custom binding handles, as a different RPC BOF can run inside Cobalt Strike without issue.
After those changes, we can now successfully run the BOF with COFFLoader, whether compiled with mingw or Visual Studio. So, perfect, our work is done, right? Not so fast. RPC uses exceptions to raise errors to the client program. Normally, this is handled using MSVC’s extensions for SEH (RpcTryExcept, RpcExcept & RpcEndExcept). Our issue is that an object file loaded with COFFLoader is not set up so that SEH works. In practice, this means that even a simple access denied will now crash our process because we are not capable of capturing the exception.
I consulted my co-workers again about these issues. This time, Adam Todd came to the rescue. He told me to check out AddVectoredExceptionHandler and SetUnhandledExceptionFilter. Both functions allow us to catch exceptions that are otherwise unhandled and to do so in a manner that is not frame based. This way, it matters less about how we are loaded, but we are left with a different problem. Once we catch that exception, what do we do? We can manipulate the registers, but capturing and re-setting up a stack frame may be unreliable in practice.
In the end, our best plan is to start a sacrificial thread and feed any exception that occurs back to us via the error code. This way, our action is clear: simply terminate the thread with the exception as the exit code—no unwinding or stack to clean up.
Further, while using threads in a BOF is normally bad practice, it is fine in this case since we are just going to block the main thread until the sacrificial thread finishes. This also spares us from dealing with memory concurrency issues with our usage of things like internal_printf.
Review
Coding against raw RPC interfaces can already be a complex process when working under normal conditions. Moving that code into a BOF is theoretically no more complex than converting any other code to a BOF, but it comes with some unique considerations in practice. Going into this project, I did not expect to have issues with the loaders. In fairness to the various loaders, PEs have many relocation types, some of which do not typically appear, so it’s understandable they may not have been tested.
For this specific situation, we would have been much better off using the top-level Win32 API layer, as pointed out in the first post of this series. If this had been done, we would not have had to deal with the RPC stubs, and the backing libraries would have been loaded using LoadLibrary, so the potential RPC exceptions/relocations would all have been handled. This is a good reminder that there are usually multiple ways to accomplish the same goal, so take the time to review different options before committing to a particular method.