Learning to write an LLVM Pass for Code Obfuscation #1
This series of blogs is supposed to document my learning journey from a C/gcc nerd to a C++/LLVM chad. Expect this blog to be very informal, with the occasional rant, but it will document everything that I have learnt, including C++ internals, gimmicks, LLVM quirks and other references.
Setup
First things first, we need to do some basic setup, starting with fetching the source code for the LLVM project:
At the time of writing this, the latest commit is 8aafa50c7a2dfb8ca1d5cdf8980f7f2d259779f5 - in case you wanna follow along with the exact version, just do:
Next, we install some basics with:
I would recommend running everything in a tmux session because some of these compilations take a while. That being said, let’s talk about LLVM (while my code compiles in the background).
Why LLVM?
All of this started with the following set of tweets:
use clang like a real man 😼
— 5pider (@C5pider) November 26, 2025
So, all the big boys were playing with clang and I wanted to do so as well. I had played a bit with it in the past when compiling stuff in C for Mac but never really looked into it. To me it was just another compiler.
That was lesson 1: clang is technically the frontend component of the LLVM compiler for C/C++ source code. Its job is to take the source code, parse it into an AST and then lower that AST into LLVM IR.
This LLVM IR is what is of interest to us. Our goal is to write an LLVM IR pass which messes with this IR in a way that makes reverse engineering and static detection difficult, for which, unfortunately, we have to learn some C++. I plan to explore the C++ concepts as they come up.
Following the official guide
At this point, you should start following the Official Guide on how to write an LLVM pass. Follow the guide till you reach the FAQ section and then come back here.
Welcome back! I am assuming at this point you have written the HelloWorld pass, seen it in action and also written a test for it.
Okay, time to talk about the code we just wrote, starting with the header file:
First, we take a look at:
class HelloWorldPass : public OptionalPassInfoMixin<HelloWorldPass>
Let’s break it down:
class HelloWorldPass: We are declaring a new class with the name HelloWorldPass.
public OptionalPassInfoMixin<HelloWorldPass>: Here we see some C++ bullshit.
Time for a detour to learn about a couple of C++ things.
Templates
Coming from C, C++ templates were something completely new to me. They are probably the simplest of these concepts to understand.
I would recommend going through GFG’s guide for this. But to simplify it, imagine this: You want to write a C program which can compare: two numbers, two decimals or two characters. You would end up writing some code like:
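A minimal sketch of what that repetition looks like, assuming "compare" means "return the larger of the two". The function names (max_int, max_double, max_char) are made up for illustration, and the snippet is plain C that also compiles as C++:

```cpp
/* One function per data type: the bodies are identical,
   only the types in the signature change.
   NOTE: the names below are hypothetical, chosen for this sketch. */

int max_int(int a, int b) {
    return (a > b) ? a : b;
}

double max_double(double a, double b) {
    return (a > b) ? a : b;
}

char max_char(char a, char b) {
    return (a > b) ? a : b;
}
```

Calling max_int(3, 7) gives back 7, and the other two behave the same way for their own types.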
Well, we can all agree that’s a lot of repeated code. While the main function body remains identical, we have to maintain different functions just due to the nature of the data types we are dealing with. C++ attempts to solve this problem with templates. The same code in C++ would be something like:
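A sketch of the templated version, again assuming we want the larger of two values. max_of is a hypothetical name picked for this post (the standard library's own version is std::max):

```cpp
// A single generic function. The compiler stamps out one concrete
// version per type T that is actually used.
// NOTE: max_of is a hypothetical name used only for this sketch.
template <typename T>
T max_of(T a, T b) {
    return (a > b) ? a : b;
}
```

max_of(3, 7), max_of(2.5, 1.5) and max_of('a', 'z') all go through the same source code; T is deduced as int, double and char respectively.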
So, templates just help us write generic functions which can be used with any valid type T. A slightly inaccurate but easy way to think about templates is as functions that take the data type (which can be a class, a composite type and more) as an extra argument, alongside the values of that type.
Template Classes
Templates go beyond just functions - we can have template classes as well. Consider the following terrible C code:
A C++/OOPs based approach would be:
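Here is roughly what two such classes could look like. IntFunc and FloatFunc are the names used in this post, but what they store and do in this sketch (hold a value and add to it) is an assumption made purely for illustration:

```cpp
// Two near-identical classes that differ only in the data type.
// NOTE: the members and methods are assumptions for this sketch.

class IntFunc {
public:
    explicit IntFunc(int v) : value_(v) {}
    int get() const { return value_; }
    int add(int other) const { return value_ + other; }

private:
    int value_;
};

class FloatFunc {
public:
    explicit FloatFunc(float v) : value_(v) {}
    float get() const { return value_; }
    float add(float other) const { return value_ + other; }

private:
    float value_;
};
```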
But C++ allows us to optimize this further using templates. If you notice, IntFunc and FloatFunc share a lot of the same code - this is where templates come in.
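A sketch of how a class template can collapse such duplicated classes into one. Func<T> and its members are hypothetical, picked only to mirror the IntFunc/FloatFunc naming from this post:

```cpp
// One class template replaces the per-type classes.
// NOTE: Func<T> and its methods are hypothetical names for this sketch.
template <typename T>
class Func {
public:
    explicit Func(T v) : value_(v) {}
    T get() const { return value_; }
    T add(T other) const { return value_ + other; }

private:
    T value_;
};

// The old names can be kept alive as aliases of the template.
using IntFunc = Func<int>;
using FloatFunc = Func<float>;
```

IntFunc(2).add(3) and FloatFunc(1.5f).add(0.5f) now run the same source code, instantiated once for int and once for float.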
So - you can reuse the same class with different data types - and this will come in handy soon.
Mixins
First thing to know: Mixins are classes. It’s just that they are a special category of classes which have some specific properties. So we can say that:
All Mixins are classes, but not all classes are Mixins
A mixin is a class that provides a specific set of functionalities and is intended to be used in conjunction with other classes through multiple inheritance. Mixins are typically abstract or incomplete in themselves, meaning they do not represent a complete object but rather provide a modular way to add features to a class hierarchy. Think of Mixins as the ketchup of classes. You don't have ketchup on its own (hopefully), you add it on top of a hot dog to make it better. Extending this analogy, you can have the same ketchup go on a hot dog, fries or your chicken wings. While the base class of each food item remains different, our Mixin (aka ketchup) stays the same and adds to their taste (functionality) - hope that made sense.
In more technical terms, key characteristics of C++ Mixins are:
Non-Instantiable: Mixins often contain pure virtual functions (abstract methods) and may not have any concrete data members. This makes them unsuitable for standalone instantiation.
Functionality Addition: The primary purpose of mixins is to add specific behaviors or functionalities to a class without affecting its primary inheritance hierarchy.
Multiple Inheritance Utilization: Mixins are often used in conjunction with multiple inheritance, allowing a class to inherit from both a main base class and one or more mixins. This composition allows the class to have features from all its base classes.
Code Reusability: By using mixins, developers can reuse functionality across different class hierarchies without creating complex inheritance trees.
Avoiding Object Identity: Mixins do not represent a complete object model on their own. They are meant to be part of a larger class hierarchy and should not be instantiated directly.
This is an excellent example of C++ Mixins. I would highly recommend going through at least the first two examples. We will see more of this in the next example.
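To make the ketchup analogy concrete, here is a small sketch using plain multiple inheritance. Every name in it (KetchupMixin, Food, HotDog, Fries, describe) is hypothetical and exists only for this illustration:

```cpp
#include <string>

// The mixin: adds one small behaviour and is not a complete object by itself.
class KetchupMixin {
public:
    std::string topping() const { return "ketchup"; }

protected:
    // Protected constructor: only derived classes can create one.
    KetchupMixin() {}
};

// The "main" hierarchy, unrelated to the mixin.
class Food {
public:
    virtual ~Food() = default;
    virtual std::string name() const = 0;
};

// Each food inherits from its real base class AND from the mixin.
class HotDog : public Food, public KetchupMixin {
public:
    std::string name() const override { return "hot dog"; }
};

class Fries : public Food, public KetchupMixin {
public:
    std::string name() const override { return "fries"; }
};

// Works for anything that has both name() and topping().
template <typename T>
std::string describe(const T &item) {
    return item.name() + " with " + item.topping();
}
```

HotDog and Fries stay ordinary Food subclasses; the topping() behaviour comes along for free without touching the Food hierarchy.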
CRTP
C++ has this curious thing called Curiously Recurring Template Pattern (CRTP). It is a C++ idiom where a derived class inherits from a base class template, passing itself as the template argument. This technique enables static polymorphism, allowing the base class to call methods in the derived class at compile time without the performance overhead of virtual functions (VTables).
But what does that jargon mean - coming from C? Take a look at the example code:
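A sketch of how runtime polymorphism is typically hand-rolled in C, with a struct that carries a pointer to a table of function pointers. The struct layout and the names other than Shape, Square, Circle and get_area() are assumptions for this sketch; it is written in C style but also compiles as C++:

```cpp
/* Hand-rolled vtable, C style. */
struct Shape;

struct ShapeVTable {
    double (*area)(const struct Shape *self);
};

/* The generic "class": just a vtable pointer. */
struct Shape {
    const struct ShapeVTable *vtable;
};

/* "Derived classes" embed Shape as their first member. */
struct Square { struct Shape base; double side; };
struct Circle { struct Shape base; double radius; };

static double square_area(const struct Shape *self) {
    const struct Square *sq = (const struct Square *)self;
    return sq->side * sq->side;
}

static double circle_area(const struct Shape *self) {
    const struct Circle *c = (const struct Circle *)self;
    return 3.141592653589793 * c->radius * c->radius;
}

static const struct ShapeVTable SQUARE_VTABLE = { square_area };
static const struct ShapeVTable CIRCLE_VTABLE = { circle_area };

/* Knows nothing about Square or Circle: the lookup happens at runtime. */
double get_area(const struct Shape *shape) {
    return shape->vtable->area(shape);
}
```

A Square is set up as `struct Square sq = { { &SQUARE_VTABLE }, 2.0 };` and then passed around as `&sq.base`.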
We have a generic “Class” called Shape which has a vtable pointer, and a function called get_area() which takes variables of this class. Now the shape can be a Square or a Circle - the function has no knowledge of this. It is up to those two classes to implement their respective area functions.
If we have to write the same in C++ using CRTP we would write something like:
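A minimal CRTP sketch of the Shape idea. The class and method names follow the ones mentioned in this post (Shape, Square, Circle, get_area()); everything else, including the area() method name, is an assumption:

```cpp
// Base class template: takes the derived class as its template argument.
template <typename Derived>
class Shape {
public:
    double get_area() const {
        // Resolved at compile time: no vtable, no runtime lookup.
        return static_cast<const Derived *>(this)->area();
    }
};

// The derived class passes ITSELF to the base: that is the "recurring" part.
class Square : public Shape<Square> {
public:
    explicit Square(double side) : side_(side) {}
    double area() const { return side_ * side_; }

private:
    double side_;
};

class Circle : public Shape<Circle> {
public:
    explicit Circle(double radius) : radius_(radius) {}
    double area() const { return 3.141592653589793 * radius_ * radius_; }

private:
    double radius_;
};
```

Square(2.0).get_area() calls Square::area() directly. Note that Shape<Square> and Shape<Circle> are two unrelated types, so unlike the vtable version you cannot store them behind one common Shape pointer.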
So now that we see the C++ code - it makes more sense. Why do we need this? Because standard dynamic polymorphism uses virtual functions, which require runtime pointer lookups via a VTable. CRTP resolves these calls at compile time. This makes it popular in high-frequency trading (HFT) and embedded systems, where every CPU cycle matters.
So now back to LLVM. We left at the following definition:
class HelloWorldPass : public OptionalPassInfoMixin<HelloWorldPass>
From the official guide:
This creates the class for the pass with a declaration of the run() method which actually runs the pass. Inheriting from OptionalPassInfoMixin or RequiredPassInfoMixin sets up some more boilerplate so that we don’t have to write it ourselves. RequiredPassInfoMixin should be used for passes that cannot be skipped (e.g. AlwaysInlinerPass), while OptionalPassInfoMixin should be used for passes that can be skipped (e.g. optimization passes).
With the context we have of Mixins and CRTP, at this point we should be good to go. We might dive into OptionalPassInfoMixin and RequiredPassInfoMixin later if need be.
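To tie the two ideas together, here is a toy model of how a CRTP mixin can hand a pass its boilerplate. This is NOT LLVM's implementation: every Toy* name is made up, and the only thing borrowed from the guide is the idea that one mixin marks a pass as skippable and the other as required:

```cpp
#include <string>

// Toy CRTP mixin. NOT the real LLVM code: all Toy* names are hypothetical.
template <typename Derived, bool Required>
struct ToyPassInfoMixin {
    // Boilerplate the pass author no longer has to write.
    static bool isRequired() { return Required; }

    static std::string describe() {
        // CRTP: the base reaches into the derived class at compile time.
        return std::string(Derived::passName()) +
               (isRequired() ? " (required)" : " (optional)");
    }
};

template <typename Derived>
using ToyOptionalPassInfoMixin = ToyPassInfoMixin<Derived, false>;

template <typename Derived>
using ToyRequiredPassInfoMixin = ToyPassInfoMixin<Derived, true>;

// Toy stand-in for an IR function; not llvm::Function.
struct ToyFunction {
    std::string name;
};

// The pass itself only supplies its name and its run() logic.
struct ToyHelloWorldPass : ToyOptionalPassInfoMixin<ToyHelloWorldPass> {
    static const char *passName() { return "ToyHelloWorldPass"; }
    std::string run(const ToyFunction &f) const { return f.name; }
};

struct ToyAlwaysInlinerPass : ToyRequiredPassInfoMixin<ToyAlwaysInlinerPass> {
    static const char *passName() { return "ToyAlwaysInlinerPass"; }
};
```

ToyHelloWorldPass::describe() yields "ToyHelloWorldPass (optional)" without the pass ever defining describe() or isRequired() itself, which is the same flavour of convenience the real mixins give us.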
Looking at the actual implementation of the run() function (because remember - we had to implement the area function for each class ourselves?), we see:
Which is pretty simple - we just print the function names. Now go follow the rest of the Official Guide on how to write an LLVM pass till the FAQ section and come back.
Hi! You are back! So at this point you should have a basic idea of how the thing works and have written your first pass. But this doesn’t do anything useful. Also, the example in the guide runs the pass on IR code directly, which is not super useful. Ideally, we would want a clang flag or some other way to feed in our C/C++ code directly. So in the long run (as in, by the end of this series) I am planning to implement something of that sort. Let’s see how it goes! Wish me luck and see you soon!