Tag Archives: Unreal Engine

Demystifying VMT and RTTI in Unreal Engine C++

(Cover Photo: ©1984 – ITV Granada
Jeremy Brett, “Sherlock Holmes” TV Series)

It is no secret that, due to nature of modern OOP guidelines, Unreal Engine C++ source code is full of “virtual functions”. It is also no secret that, calling a virtual function is inherently slower than calling a non-virtual function… So, what’s the catch?

A non-virtual call is simply a jump (JMP) to a compiled-in pointer. On the other hand, a virtual call requires at least an extra indexed dereference, and sometimes a fixup addition, which is stored in a lookup table known as Virtual Method Table (VMT). This is simply why a virtual call is always slower than a non-virtual call.

To avoid this overhead, compilers usually steer clear of VMT whenever the call can be resolved at compile time. However, due to complex nature of Inheritence based class structures used in modern game engines, using VMT is unavoidable in most cases.

Virtual Method Table (VMT)

An object’s VMT (also known as vtable) contains the addresses of the object’s dynamically bound methods. Method calls are performed by fetching the method’s address from the object’s virtual method table.

C++ compiler creates a separate VMT for each class. When an object is created, a virtual pointer (VPTR) to this table is added as a hidden member of this object. The compiler also generates hidden code in the constructors of each class to initialize new object’s VPTR to the address of its class’ virtual method table.

The virtual method table is same for all objects belonging to the same class, and is therefore typically shared between them. Objects belonging to type-compatible classes in an inheritance hierarchy will have virtual method tables with the same layout.

Tip: The C++ standards do not mandate exactly how dynamic dispatch must be implemented, but compilers generally use minor variations on the same basic model. The VMT is generally a good performance trade-off to achieve dynamic dispatch, but there are alternatives, such as Binary Tree Dispatch (BTD), with higher performance but different costs.

Speaking of hidden codes and VMTs in the constructors of each class, each C++ object should also contain additional information about its type. An object’s data type is a crucial information in terms of casting.

Run-Time Type Information  (RTTI)

RTTI is a feature of the C++ programming language that exposes information about an object’s data type at runtime. It can apply to simple data types, such as integers and characters, or to generic types.

Run-Time Type Information is available only for classes that are polymorphic, which means they have at least one virtual method. In practice, this is not a limitation because base classes must have a virtual destructor to allow objects of derived classes to perform proper cleanup if they are deleted from a base pointer.

RTTI is used in three main C++ language elements:

  The dynamic_cast operator: Used for conversion of polymorphic types.

  The typeid operator: Used for identifying the exact type of an object.

  The type_info class: Used to hold the type information returned by the typeid operator.

In order to perform cast-related operations listed above, RTTI heavily uses VMT. For example, given an object of a polymorphic class, a type_info object can be obtained through the use of the typeid operator.  In principle, this is a simple operation which involves finding the VMT, through that finding the most-derived class object of which the object is part, and then extracting a pointer to the type_info object from that object’s virtual function table (or equivalent).

In terms of performance, using the dynamic_cast operator is more expensive than type_info. Given a pointer to an object of a polymorphic class, a cast to a pointer to another base subobject of the same derived class object can be done using a dynamic_cast.  In principle, this operation involves finding the VMT, through that finding the most-derived class object of which the object is part, and then using type information associated with that object to determine if the conversion (cast) is allowed, and finally performing any required adjustments of the pointer.  In principle, this checking involves the traversal of a data structure describing the base classes of the most derived class.  Thus, the run-time cost of a dynamic_cast may depend on the relative positions in the class hierarchy of the two classes involved.

Tip: In the original C++ design, Bjarne Stroustrup did not include RTTI, because he thought this mechanism was often misused.

Hacking VMT and RTTI Information using UE C++

In order to gather “header” information (that contains VMT and RTTI data) in an Unreal Engine C++ class/object, I have written the following LogClassHeader() C++ function using Visual Studio 2019 version 16.3.8 for Unreal Engine version 4.23.1.

void UDebugCore::LogClassHeader(void* const pThis, size_t nSize)
{
  FString fsClassName = "NULL";
  FString fsObjectName = "NULL";
  UObject* const pCastedToUObject = (UObject*)(pThis);
  void** const pCastedToHeader = reinterpret_cast<void**>(pThis);

  if (pCastedToUObject)
  {
    fsClassName = pCastedToUObject->GetClass()->GetName();
    fsObjectName = pCastedToUObject->GetName();
  }

  if (pCastedToHeader)
  {
    for (size_t i = 0; i < nSize / sizeof(void*); i++)
    {
      MACRO_PRINTF("Pointer[%04zu] = 0x%p", i, pCastedToHeader[i])
    }
  }
}

This function has 2 input parameters:

  pThis:  Object to extract Class Header Info from
  nSize:  Size of Header

Calling the function is very easy. You simply insert your call into the constructor of the class that you would like to hack. For example, the following code gathers C++ header information of <APlayerControllerThirdPerson> class.

APlayerControllerThirdPerson::APlayerControllerThirdPerson()
{
  UDebugCore::LogClassHeader(this, sizeof(*this));
}

When you run the code, all pointers that are available (and used) in the header of <APlayerControllerThirdPerson> will be listed. And, in case you need, instance name and class type is stored in fsObjectName and fsClassName variables, as a bonus.

Pointer[0000] = 0x00007FFC8A3ECB00
Pointer[0001] = 0x0000BC5B00000231
Pointer[0002] = 0x000001648346AC00
Pointer[0003] = 0x000AAD5B000AAD5B
Pointer[0004] = 0x0000000000000000
Pointer[0005] = 0x00000164834864E0
Pointer[0006] = 0x00007FFCAAB2EF98
Pointer[0007] = 0x00007FFCA8708208
Pointer[0008] = 0x0000000F00000000
            .....
             ...
              .

So, what do these numbers mean? Well, all these pointers are addresses of the virtual functions, followed by some of the member variables. In order to understand which is which, we need to decipher the structure of the data set.

Here comes the tricky part! With each and every update to Visual Studio C++ compiler, the structure tends to get updated as well. In other words, the header structure of a C++ class changes with each major compiler update. Try to think of it as a “living organism”. As I’m typing this sentence now, a new update with a new C++ header structure may be on its way. So, it is up to you (!) to analyze what’s going on under the hood.

Good news is, we can gather <template> information about C++ class header structure from the official Microsoft patent documents! Although they are not up-to-date, I think it is a good practice to start the investigation using the source of information itself.

Here is some of the Microsoft patents which describe various parts of C++ implementation used in Visual Studio:

  US Patent #5410705: “Method for generating an object data structure layout for a class in a compiler for an object-oriented programming language”

  US Patent #5617569: “Method and system for implementing pointers to members in a compiler for an object-oriented programming language”

  US Patent #5754862: “Method and system for accessing virtual base classes”

  US Patent #5297284: “Method and system for implementing virtual functions and virtual base classes and setting a this pointer for an object-oriented programming language”

  US Patent #5371891: “Method for object construction in a compiler for an object-oriented programming language”

  US Patent #5603030: “Method and system for destruction of objects using multiple destructor functions in an object-oriented computer system”

So, what I’m offering is, good old-fashioned reverse engineering”.

– “Is such a challenge worth it?”

– “Um, yes!… If it doesn’t break you, it can make you.”

References:

  Bjarne Stroustrup, “A History of C++: 1979—1991”, p. 50 – (March 1993)

  Keith Cooper & Linda Torczon, “Engineering A Compiler”, Morgan Kaufmann, 2nd Edition – (2011)

  Microsoft Visual Studio Documentation, “C++ Run-Time Type Information” – (November 2016)

  “Technical Report on C++ Performance”, OpenRCE.org – (September 2006)

  “Reversing Microsoft Visual C++: Classes, Methods and RTTI”, OpenRCE.org – (September 2006)

  “Intel® 64 and IA-32 Architectures Optimization Reference Manual” – (April 2018)

3-bit Node Graph Architecture for Next-Gen Game Development

Speaking of my latest video game development project, yet an another milestone achieved. – Quite a tough one, indeed!

But first, please allow me to focus on some of the very basic mathematical logic definitions heavily used in software engineering, so that we can clearly understand what’s going on under the hood of a decent game development process.

Don’t worry, it’s not rocket science 😉

Some theory

All video games have gameplay mechanics based on logic. A game is “a set of story driven goals to achieve” from a programmer’s perspective.

When you open a chest, solve a puzzle or kill an enemy, you are actually triggering a logic unit that is predefined within the game code. Depending on game’s technical requirements and gameplay complexity, there can be thousands of these units forming a web of logic units.

Game programmers tend to use graph theory for defining and coding logic units. Each unit is symbolized with a simple geometric shape. A box, a circle, anything… And these units are connected to each other with links.

  “Logic units” (nodes) represent tasks that the player will perform.

  “Links” (lines) represent the relationship between the logic units.

Behaviour Analysis

A node graph architecture is almost identical to an electronic circuit. When you start executing a node graph code, you are actually branching from one component (node, in our case) to an another by the rules you’ve set for the logic units, just like electric current flowing from a resistor to a capacitor. And, as you can guess, this type of signal flow is 100% linear.

When the player accomplishes a task, the node related to that event will be “expired”. In other words, it will be dead. Expired nodes cannot be resurrected. Once they’re done, they will be ignored (skipped) during code execution, forever. – Which is unlikely in electronics! An electronic component, such as a resistor, a diode, etc. cannot be conditionally turned on/off.

Back to 2002 for a “classic” implementation: Flagger

During the “Culpa Innata” development sessions, we precisely knew that we needed a node graph architecture for handling game’s complex execution flow. Many discussions were held on the method of implementation. All members of the core management & development team were expert electric/electronics engineers with no experience in video game production [Reference], but me! As a video game programmer, my perspective towards node graph theory was naturally very different, contrary to their classical approaches. I wasn’t thinking in terms of voltage, current, etc., but focused on just one thing: optimized code execution.

Thanks to my Zilog Z80 and Motorola 68000 assembly language programming background, I offered the term “Flag” for the base logic unit (node), and teamed up with Mr. Mete Balcı for 3 weeks. In December 2002, we developed a tool called “Flagger”.

Pros and Cons

Flagger was a C++ code generator with a very handy visual interface similar to UE4’s current Blueprint approach. Using Flagger, we were able to add nodes, connect them to each other, program the logic behind the nodes/links, and even take printout of the whole node graph scenario. When the visual logic design process was over, it was just a matter of selecting “Generate C++ code” from the menu, and source code was generated within minutes.

Over the following years, Flagger evolved into a more sophisticated development tool capable of handling various scenarios. Although it was a very handy tool and saved many hours during “Culpa Innata” sessions, there were a few problems with the classical node graph theory that the implementation was based on;

  Flags were single threaded. Only one node was allowed to execute at a time. No multi-threading.

  Flags were expirable. When a task was done, related flag (node) was marked as “expired”, not deleted for the sake of logic integrity.

  Flags were not reusable. Once they were expired, there was no way of resurrecting them. – Inefficient memory usage, thanks to hundreds of expired nodes.

  Flags were heavily loaded with variables. Too many dialogue related “customized” variables were defined for special cases (exceptions). – Inefficient memory usage, once again.

  Flag execution flow wasn’t well optimized because of node-tree search algorithm. The more nodes we had, the longer it took to accomplish the search.

  Flag execution was linear. When a node was expired, the graph code was first searching for related nodes and then retriggering the whole diagram from the beginning, like an electronic circuit simulator. – Well, that was ideal for modeling a circuit, not for developing a video game!

A Modern Approach: 3-bit Worker!

13 years later, I have once again found an opportunity to dive into node graph theory, and just completed implementing a new architecture for my latest video game development project. Unlike Flagger, it is something extraordinary! It is very… atypical, unconventional, unorthodox… Well, whatever… You got it 😉

First of all, it has nothing to do with classical electric/electronic circuit theory. This time, I’m on my own, and approaching the problem as a software engineer. Everything I designed/coded is based on game requirement specifications. In other words, it is implemented with “practical usage” in mind.

  I have defined the basic logic unit (node), as a “worker”.(Due to functional similarities, I simply borrowed this term from Web Workers.)

  A worker is a background task with adjustable priority settings. It performs/responds like a hardware interrupt.

  Each worker is multi-threaded.

  Depending on conditional requirements, a worker can expire and/or live forever. If expired, it can be resurrected and/or reinitialized, while preserving its previous state. So, a worker is a 100% reusable node.

  Each worker uses only 3-bits! No additional variables, no references, nothing else. – (If necessary, a worker offers flexible architecture for additional variables. However, I find it totally unnecessary. 3-bits are more than enough!)

  Workers are object oriented. They can easily be inherited.

  Inherited workers don’t need additional logic variables. All child workers share the same 3-bit information that they inherited from their parents!

  Each worker has a time dependent linear workflow. Just like a reel-to-reel tape recorder, it can be played, paused, slowed down, accelerated, fast forwarded, rewinded, and stopped.

  Workers can be non-linearly linked to other Workers! Which means, node-tree search algorithms are no more necessary. There is no “main loop” for executing nodes! Code execution is pre-cached for optimum performance.

  Workers are optimized for event driven methodology. No matter how many concurrent active workers (threads) you have in the scene, there is practically no CPU overhead. Ideal for mobile scenarios.

  Workers are managed by “Managers”. A Manager is inherited from base Worker node. So, any worker can be assigned as a Manager.

  Workers can communicate with each other and access shared variables via Managers.

  Whole architecture is 100% platform independent. For a showcase, I’ve implemented it for Unreal Engine 4 using C++ and Blueprints. It can easily be ported to other game engines; such as Unity, CryEngine, etc.

  And, most important of all, everything is meticulously tested. – It’s working as of today 🙂

Any drawbacks?

Sure… Due to complexity of comprehending “a set of non-linearly linked time dependent linear nodes”, debugging can be a nightmare. As always, designing simplified and organized logic sets reduces potential problems. – I keep my logic sets neat and tidy 😉

So, what’s next?

Well, to be honest, since all theoretical stuff is done, I’ll switch to game content development. I am quite sure that I’ll keep on adding/removing things to my 3-bit node graph architecture. I will keep on improving it while preserving its simplicity, for sure.

“It is vain to do with more what can be done with less.” – (William of Ockham)

Unreal Engine 4 for everyone!

Thanks to Unreal community, a rumour about the NEXT BIG THING has been doing the rounds for almost a year. Well… It seems this is more than BIG!

Today, Epic announced that “Unreal Engine 4 is for everyone!”

Not kidding… Now, it is for everyone!

For $19/month (plus, 5% of gross revenue), we can have access to everything, including the Unreal Editor in ready-to-run form, and the engine’s complete C++ source code hosted on GitHub for collaborative development. – Oh My God!

You can read Tim Sweeney’s post on the Unreal Engine website for more information.

Hats off to Epic!

Epic is one of the few companies in the video game industry that really take care of what developers are saying, and I am a great fan of them because of their nice attitude towards us.

Thank you Epic!