Reverse OOP from scratch

1. Preface
Nowadays, more and more malware is written in C++ instead of pure C. The main difference between programming in C++ and in C is object-oriented programming (OOP), and OOP constructs make malware noticeably harder to reverse. In this blog, I will show you how to reverse C++ OOP code as concisely as possible, as quick and simple as instant noodles.
Materials: https://github.com/n0pex3/blog/blob/main/oop.zip
2. Analysis
2.1. Overview
To follow this blog, you need basic knowledge of OOP in C++. When source code written in OOP style is compiled into machine code, have you ever wondered whether the result differs from a program written in pure C? While analyzing a program written in OOP style, I will help you answer the following questions:
How are overridden functions in subclasses called by the program? More broadly, what does a subclass object contain when it is initialized in memory?
How is the relationship between parent and child classes represented in the program?
Finally, we will apply what we have learned when analyzing the program in IDA.
Load the attached program into the 32-bit version of IDA. The program is built from the following link. But please don't look at the source code, because when reversing malware, you don't have the source code.
2.2. Fields inside the subclass when initialized
When we load the program into IDA, we get the result shown below. We will go through each line of assembly code to see what information the object contains when the subclass is initialized in memory.

First, at address 0x004010D8, the program calls the new operator to allocate a 0x14-byte buffer. The address of this buffer is then placed in the edi and ecx registers. The program does this because edi is used as a base register: the other fields of the object are accessed by adding an offset to it. Meanwhile, ecx holds the object address because the program we are analyzing was built with Visual Studio's MSVC compiler. On 32-bit x86, MSVC uses the thiscall calling convention for member functions, including virtual functions, and thiscall always passes the address of the object (the this pointer) in ecx.

Next, we are interested in the instructions at 0x004010E7, 0x004010EE and 0x004010F4. At these addresses, the program writes vftable pointers into the object's fields at offsets 0x8, 0x0 and 0x8 again, respectively. You can think of a vftable as a table containing the addresses of the virtual functions of the parent and child classes. The functions in the table are ordered as the virtual functions appear in the source code: first those of the parent class, then those newly introduced by the child class. In other words, when class B inherits from class A, class B extends class A: an overriding function takes over its parent's slot, and new virtual functions are appended in order of appearance. Let's take a look at the example below:
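#include <cstdio>

class A {
public:
    int multiply(int x, int y) {  // not virtual: gets no vftable slot
        return x * y;
    }
    virtual void printSomething() {
        printf("%s\n", "From class A with love");
    }
};

class B : public A {
public:
    virtual void printSomething() {  // overrides A::printSomething
        printf("%s\n", "From class B with love");
    }
    virtual int sum(int x, int y) {  // new virtual function: new slot
        return x + y;
    }
};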
Class B inherits from class A, and the printSomething function in class B overrides the printSomething function in class A, so it takes over A's slot. Because the multiply function in class A is not virtual, it does not appear in the vftable. Furthermore, since class A has no sum function for B's sum to override (unlike printSomething), sum gets a new slot of its own after the inherited one. Our vftable will look like this:
| Offset | Function | Implementing class |
|---|---|---|
| 0x0 | printSomething | B |
| 0x4 | sum | B |
To call a virtual function, the program first loads the vftable pointer stored in the object, then adds the function's offset within the table and calls the address found there, as in the two calls at addresses 0x00401114 and 0x0040111A of the program shown above.

Going back to our program, according to IDA's annotations, offset 0x0 of the object holds the vftable of class Ex5 and offset 0x8 holds the vftable of class Ex4. Next, at addresses 0x004010FB and 0x00401102, the program moves the values 1 and 2 into offsets 0xC and 0x10 of the object. In the end, the object has the following layout in memory:
| Offset | Contents |
|---|---|
| 0x0 | vftable of class Ex5 |
| 0x4 | unknown |
| 0x8 | vftable of class Ex4 |
| 0xC | 1 |
| 0x10 | 2 |
If we look at the entire program, we never see the unknown field at offset 0x4 being used. This means the field is not a vftable pointer; if it were, the program would have to initialize it. Since the program never accesses it, we will treat it as a normal data field, like the fields at offsets 0xC and 0x10. From the layout of the object at initialization, you can see how it is arranged: each class part of the object starts with its vftable pointer, followed by the fields of that class, and the fields of the derived class itself come last. Since there are two vftable pointers in the object, we might infer that there are two classes: Ex5 inheriting from Ex4. However, in this case there are three classes in total. I know that thanks to RTTI.
2.3. RTTI structure in the program
Let's return to the second question from the beginning of the blog: how is the parent-child relationship between classes represented in the program, and how do I know there are three classes in total, as stated at the end of section 2.2? Go back to the vftables that the program's mov instructions write into the object at 0x004010E7, 0x004010EE and 0x004010F4. Start with the first vftable, at 0x004021E0. You will notice another DWORD right above the vftable, at 0x004021DC, which IDA annotates as const Ex4::`RTTI Complete Object Locator'. This DWORD is a pointer to the class's RTTI Complete Object Locator.

RTTICompleteObjectLocator holds information about the class, and from it we can work out the class's relationships with other classes in the program. Its structure is shown below. We are mainly interested in the pTypeDescriptor and pClassDescriptor fields: pTypeDescriptor describes the class itself, and pClassDescriptor describes its class hierarchy, that is, the classes it inherits from.
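struct TypeDescriptor;
struct RTTIClassHierarchyDescriptor;

struct RTTICompleteObjectLocator {
    DWORD signature;                                 // 0 on 32-bit builds
    DWORD offset;                                    // offset of this vftable inside the complete object
    DWORD cdOffset;                                  // constructor displacement offset
    TypeDescriptor *pTypeDescriptor;                 // describes the class itself
    RTTIClassHierarchyDescriptor *pClassDescriptor;  // describes its inheritance hierarchy
};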
The pTypeDescriptor field has type TypeDescriptor, whose name field holds the class name in decorated form (for example .?AVEx4@@). Reading the name in IDA, we see that the RTTICompleteObjectLocator referenced at 0x004021DC belongs to class Ex4.
struct TypeDescriptor
{
    const void* pVFTable;  // vftable of type_info
    void* spare;           // reserved
    char name[];           // decorated class name, e.g. ".?AVEx4@@"
};

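The pClassDescriptor field has type RTTIClassHierarchyDescriptor. Its numBaseClasses field holds the number of classes in the hierarchy, including the class itself. Its pBaseClassArray field points to the RTTI Base Class Array, an array of pointers to RTTIBaseClassDescriptor entries, one per class in the hierarchy, with the class itself first. In each entry, the pTypeDescriptor field points to the TypeDescriptor of that class, from which we get the class name. In the current case, Ex4's vftable contains a single function, and its base class array has a single entry, which describes Ex4 itself. Thus, class Ex4 does not inherit from any class.
struct RTTIClassHierarchyDescriptor {
    DWORD signature;
    DWORD attributes;       // bit 0: multiple inheritance, bit 1: virtual inheritance
    DWORD numBaseClasses;   // entries in the base class array, including the class itself
    DWORD pBaseClassArray;  // -> array of pointers to RTTIBaseClassDescriptor
};

struct PMD {
    int mdisp;  // offset of the class part inside the object
    int pdisp;  // vbtable offset, -1 if the base is not virtual
    int vdisp;  // offset inside the vbtable
};

struct RTTIBaseClassDescriptor {
    DWORD pTypeDescriptor;    // -> TypeDescriptor of this class (gives its name)
    DWORD numContainedBases;  // number of bases of this class that follow it in the array
    PMD where;                // where this class part is located inside the object
    DWORD attributes;
    DWORD pClassDescriptor;   // -> RTTIClassHierarchyDescriptor of this class
};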
Now that you have finished one class, do the same for the remaining two RTTI Complete Object Locator pointers, at 0x004021E4 and 0x004021D4. If you parse the information correctly, you will obtain the class relationship diagram shown below: class Ex5 inherits from two classes, Ex2 and Ex4.

Note that the program writes Ex4's vftable into the object at 0x004010E7, but because Ex5 inherits from Ex4, that pointer is then overwritten with Ex5's vftable for its Ex4 part. That is why you see the program write a vftable to the same location in the object twice. In contrast, the program never writes Ex2's own vftable: Ex2 shares the vftable pointer at offset 0x0 with Ex5, so its table is combined with Ex5's.
2.4. Reverse OOP in IDA
This is probably the part you have been looking forward to most. Let's summarize what we know so far. Class Ex5 inherits from two classes, in order Ex2 and Ex4. At the end of the object there are two more properties, at offsets 0xC and 0x10, which could belong to either class Ex4 or class Ex5; here we will assume they belong to Ex5. The vftable shared by Ex2 and Ex5 holds three virtual functions, while the vftable of Ex4 holds only one. Thus, we need to create the following structures in IDA.
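struct Ex4_vtbl;
struct Ex5_vtbl;

struct Ex4 {
    Ex4_vtbl *__vftable;  // offset 0x0 of the Ex4 part
};
struct Ex2 {
    Ex5_vtbl *__vftable;  // offset 0x0, shared with Ex5
    DWORD var1;           // offset 0x4, never used
};
struct Ex5 : Ex2, Ex4 {
    DWORD var2;           // offset 0xC, initialized to 1
    DWORD var3;           // offset 0x10, initialized to 2
};
struct Ex5_vtbl {
    int (__thiscall *sum)(Ex5 *this);
    void (__thiscall *Print1)(Ex5 *this);
    void (__thiscall *Print2)(Ex5 *this);
};
struct Ex4_vtbl {
    void (__thiscall *Print3)(Ex4 *this);
};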
For the struct holding each class's virtual functions, IDA expects the name nameOfClass_vtbl, and the member pointing to that table must be declared as nameOfClass_vtbl *__vftable. (Ex2 uses Ex5_vtbl because Ex2 and Ex5 share the vftable pointer at offset 0x0.) Because the functions in the vftable are virtual member functions, they use the thiscall calling convention under MSVC, and the first parameter of each function is the address of the object in memory, which is the value held in ecx as described above. We split the Ex5 layout into two smaller structs, Ex2 and Ex4, and let Ex5 inherit from both, because declaring everything directly in struct Ex5 would require two members named __vftable, which is invalid. Open the Local Types window in IDA, enter the structs above, and change the type of variable v4 to Ex5 * as shown below. If successful, you will get the results below.


Now look at function sub_401050. If you check the vftable at 0x004021E8, you can see that this function is listed in it, but the program calls it directly instead of through the vftable. To still display it as a member function call, as in the source code, we give it a mangled name. Instead of reading the documentation to learn the mangled-name syntax, we will use the Classy plugin. Click the Add button and type in the name Ex5. Then drag the Classy tab next to the Disassembly tab as shown below.

Then, go to Ex5's vftable at 0x004021E8, select the range from 0x004021E8 to 0x004021F4, and click the Set button next to "VTable: Not set" to add these functions to the Ex5 class.

Finally, rename the function to Print1, as declared in struct Ex5_vtbl, by double-clicking the second row (ID: 1) and editing it.

The final result we get is as follows.


