Tag Archives: Unreal Engine

The Occult 2022.2

“Innovation Engineering” is a great influence on the growth and survival of today’s world. As a method for solving technology and business problems to innovate, adapt, and enter new markets using the expertise in emerging technologies, it is an essential tool for creative minds.

Much like every other industry in the world, video game development industry is open to new ideas and products more than ever. Actually, it has never been more critical. New technologies emerge, as do new business models. Thanks to innovation culture, video game development process has become more than classic software engineering by evolving into modern innovation engineering.

The latest release of The Occult is a meticulous combination of software and innovation engineering. In order to maximize productivity and efficiency, outside the box thinking was vital. I had to come up with creative and unorthodox problem solving methods. So, rather than waiting for a-ha! moments, I relied on cross-disciplinary research and practices for creativity, and classic engineering methodology for productivity; simply harnessing the best of both worlds… And it worked, charmingly!

The Occult version 2022.2 is released!

The Occult, a virtual gameplay programming ecosystem for Unreal Engine, is now available for both Intel and ARM architectures by offering improved cross-platform compatibility, exciting new features, enhancements, and a few bug fixes. As usual, all commercial and personal video game development projects that I am currently involved in will benefit from the new/enhanced features available in this release.

New Architecture

Multiprocessing System Architecture: User-defined number of virtual processors can perform sync/async operations at the same time. Multiprocessing read/write operations on shared code/memory are 100% thread-safe at both virtual and physical access level.

Multiple Stacks for Multiple Processors: A private stack is assigned for each vCPU, and a public stack is shared among all processors. The public stack is mainly used for level-scope variables, and can be accessed via Index/Data bus. Contrary to the public stack, private stacks do not use Occult’s bus architecture anymore. They are hardwired to parent vCPUs for exceptional low-latency access!

Virtual/Native Code Switching: Instant code switching at instruction-precision level of detail creates an opportunity to inject native C++ code into virtual Assembly code and vice versa, while creating endless possibilities for hardcore code optimization.

New Features

      • Stack based smart serialization.
      • User-defined Trigger algorithm injection architecture.
      • User-defined Pivot lookup table architecture.
      • CCR driven stack error report mechanism.
      • Blazingly fast 32-bit text/numerical Tag driven instance management.

Enhancements

      • All coprocessors (Trigger, Flagger, Pooler) are enhanced to take advantage of new multiprocessing system.
      • New speed/size optimized Flagger node graph search algorithm is written in 92 bytes (yes, ninety-two!) using inline Assembly; small enough to flatter physical cache.
      • Enhanced native Intel architecture support for PC.
      • Enhanced native ARM architecture support for Apple Silicon Macs and Nintendo Switch.

Improved Game Executable Compatibility

      • Windows 10 – (21H2)
      • Windows 11 – (22H2)
      • macOS Big Sur 11.7 – (Intel/Apple Silicon)
      • macOS Monterey 12.6 – (Apple Silicon only)
      • iOS 14
      • iOS 15
      • iOS 16
      • Nintendo Switch

Improved C++ Code Compatibility

      • Unreal Engine 4 – (4.26.2, 4.27.2)
      • Unreal Engine 5 – (5.0.3)

Improved C++ Development Environment Compatibility

      • Microsoft Visual Studio 2019 version 16.11.19
      • Microsoft Visual Studio 2022 version 17.3.5
      • Apple Xcode version 14.0.1
      • JetBrains Rider 2022.2.3

Bug Fixes

      • Macro driven Load/Save settings filename obfuscation.
      • Immediate Operand format standardization.
      • Trigger Smart 2-Way OnFound/OnNotFound exception implementation.

 

Availability: The Occult is not available for public use. It is a proprietary software.
Copyright Notice: The Occult VM architecture and all related tools are conceived, designed, implemented and owned by Mert Börü. The Occult logo is designed and crafted by Tuncay Talayman.

Credo quia absurdum!

The Occult 2022.1

In terms of productivity, efficiency, and profitability, coding via versatile range of “cross-platform” tools is a must for 21st century video game developer.

However, regarding Unreal Engine’s cross-platform features, debugging, fine-tuning and optimizing game code simultaneously on multiple platforms can sometimes be a real headache even for veteran developers. It is a complex, time-consuming and error-prone task by its very nature. And, this is where The Occult’s latest release comes into play…

The Occult version 2022.1 is released!

I am truly excited to announce the latest release of The Occult. It is now available for both Intel and ARM architectures by offering improved platform compatibility, exciting new features, various enhancements, and a few bug fixes. As of today, all commercial and personal video game development projects that I am currently involved in will benefit from the new/enhanced features available in this release.

Enhancements

      • New native ARM architecture support for Apple Silicon Macs and Nintendo Switch.
      • Enhanced native Intel architecture support for PC.
      • New speed/size optimized bytecode generation using inline Intel and ARM Assembly routines.
      • Enhanced 64-bit cross-platform visual debugger/logger for VM, Stack, and Database.
      • Enhanced cross-platform 4-way Unreal Engine logger.
      • Increased Trigger coprocessor performance.
      • New speed/size optimized C++ data structures for Trigger coprocessor: Definition, Action, and Component.
      • New level manager architecture design & implementation.
      • New VM firmware routines for asynchronous Actor/Component management.
      • Enhanced OOR driven ALU instruction set implementation.

Improved Game Executable Compatibility

      • Windows 10
      • Windows 11
      • macOS Big Sur – (Intel/Apple Silicon)
      • macOS Monterey – (Apple Silicon only)
      • iOS 14
      • iOS 15
      • Nintendo Switch

Improved C++ Code Compatibility

      • Unreal Engine 4 – (4.26.2, 4.27.2)
      • Unreal Engine 5 – (EA2)

Improved C++ Development Environment Compatibility

      • Microsoft Visual Studio 2019 version 16.11.9
      • Microsoft Visual Studio 2022 version 17.0.5
      • Apple Xcode version 13.2.1
      • JetBrains Rider for Unreal Engine 2021.3.1
      • JetBrains Rider 2022.1 – (will be released in Q1 2022)

Bug Fixes

      • Level World offset implementation.
      • Stack driven smart VM pointer architecture for object type detection.
      • Stack driven object pointer validation.

 

Availability: The Occult is not available for public use. It is a proprietary software.
Copyright Notice: The Occult VM architecture and all related tools are conceived, designed, implemented and owned by Mert Börü. The Occult logo is designed and crafted by Tuncay Talayman.

Vivre libre ou mourir!

The Occult

The Occult

A Synchronous/Asynchronous Virtual Machine Architecture
for Unreal Engine

Mert Börü
Video Game Developer
mertboru@gmail.com

Abstract

The “Occult” virtual machine architecture is a cross-platform C++ middleware designed and implemented for Unreal Engine. The primary feature of the Occult is to deliver super-optimized AAA grade games. By adding a very thin-layer on top of Unreal Engine C++ API, the Occult provides a virtual microcomputer architecture with a synchronous/asynchronous 64-bit CPU, various task specific 32-bit coprocessors, and a modern assembly language gameplay programming ecosystem for handling real-time memory/asset management and in-game abstract object/data streaming. It can be used as a standalone solution or as a supplementary tool for existing Blueprint/C++ projects.

Continue reading The Occult

Demystifying VMT and RTTI in Unreal Engine C++

(Cover Photo: ©1984 – ITV Granada
Jeremy Brett, “Sherlock Holmes” TV Series)

It is no secret that, due to nature of modern OOP guidelines, Unreal Engine C++ source code is full of “virtual functions”. It is also no secret that, calling a virtual function is inherently slower than calling a non-virtual function… So, what’s the catch?

A non-virtual call is simply a jump (JMP) to a compiled-in pointer. On the other hand, a virtual call requires at least an extra indexed dereference, and sometimes a fixup addition, which is stored in a lookup table known as Virtual Method Table (VMT). This is simply why a virtual call is always slower than a non-virtual call.

To avoid this overhead, compilers usually steer clear of VMT whenever the call can be resolved at compile time. However, due to complex nature of Inheritence based class structures used in modern game engines, using VMT is unavoidable in most cases.

Virtual Method Table (VMT)

An object’s VMT (also known as vtable) contains the addresses of the object’s dynamically bound methods. Method calls are performed by fetching the method’s address from the object’s virtual method table.

C++ compiler creates a separate VMT for each class. When an object is created, a virtual pointer (VPTR) to this table is added as a hidden member of this object. The compiler also generates hidden code in the constructors of each class to initialize new object’s VPTR to the address of its class’ virtual method table.

The virtual method table is same for all objects belonging to the same class, and is therefore typically shared between them. Objects belonging to type-compatible classes in an inheritance hierarchy will have virtual method tables with the same layout.

Tip: The C++ standards do not mandate exactly how dynamic dispatch must be implemented, but compilers generally use minor variations on the same basic model. The VMT is generally a good performance trade-off to achieve dynamic dispatch, but there are alternatives, such as Binary Tree Dispatch (BTD), with higher performance but different costs.

Speaking of hidden codes and VMTs in the constructors of each class, each C++ object should also contain additional information about its type. An object’s data type is a crucial information in terms of casting.

Run-Time Type Information  (RTTI)

RTTI is a feature of the C++ programming language that exposes information about an object’s data type at runtime. It can apply to simple data types, such as integers and characters, or to generic types.

Run-Time Type Information is available only for classes that are polymorphic, which means they have at least one virtual method. In practice, this is not a limitation because base classes must have a virtual destructor to allow objects of derived classes to perform proper cleanup if they are deleted from a base pointer.

RTTI is used in three main C++ language elements:

  The dynamic_cast operator: Used for conversion of polymorphic types.

  The typeid operator: Used for identifying the exact type of an object.

  The type_info class: Used to hold the type information returned by the typeid operator.

In order to perform cast-related operations listed above, RTTI heavily uses VMT. For example, given an object of a polymorphic class, a type_info object can be obtained through the use of the typeid operator.  In principle, this is a simple operation which involves finding the VMT, through that finding the most-derived class object of which the object is part, and then extracting a pointer to the type_info object from that object’s virtual function table (or equivalent).

In terms of performance, using the dynamic_cast operator is more expensive than type_info. Given a pointer to an object of a polymorphic class, a cast to a pointer to another base subobject of the same derived class object can be done using a dynamic_cast.  In principle, this operation involves finding the VMT, through that finding the most-derived class object of which the object is part, and then using type information associated with that object to determine if the conversion (cast) is allowed, and finally performing any required adjustments of the pointer.  In principle, this checking involves the traversal of a data structure describing the base classes of the most derived class.  Thus, the run-time cost of a dynamic_cast may depend on the relative positions in the class hierarchy of the two classes involved.

Tip: In the original C++ design, Bjarne Stroustrup did not include RTTI, because he thought this mechanism was often misused.

Hacking VMT and RTTI Information using UE C++

In order to gather “header” information (that contains VMT and RTTI data) in an Unreal Engine C++ class/object, I have written the following LogClassHeader() C++ function using Visual Studio 2019 version 16.3.8 for Unreal Engine version 4.23.1.

void UDebugCore::LogClassHeader(void* const pThis, size_t nSize)
{
  FString fsClassName = "NULL";
  FString fsObjectName = "NULL";
  UObject* const pCastedToUObject = (UObject*)(pThis);
  void** const pCastedToHeader = reinterpret_cast<void**>(pThis);

  if (pCastedToUObject)
  {
    fsClassName = pCastedToUObject->GetClass()->GetName();
    fsObjectName = pCastedToUObject->GetName();
  }

  if (pCastedToHeader)
  {
    for (size_t i = 0; i < nSize / sizeof(void*); i++)
    {
      MACRO_PRINTF("Pointer[%04zu] = 0x%p", i, pCastedToHeader[i])
    }
  }
}

This function has 2 input parameters:

  pThis:  Object to extract Class Header Info from
  nSize:  Size of Header

Calling the function is very easy. You simply insert your call into the constructor of the class that you would like to hack. For example, the following code gathers C++ header information of <APlayerControllerThirdPerson> class.

APlayerControllerThirdPerson::APlayerControllerThirdPerson()
{
  UDebugCore::LogClassHeader(this, sizeof(*this));
}

When you run the code, all pointers that are available (and used) in the header of <APlayerControllerThirdPerson> will be listed. And, in case you need, instance name and class type is stored in fsObjectName and fsClassName variables, as a bonus.

Pointer[0000] = 0x00007FFC8A3ECB00
Pointer[0001] = 0x0000BC5B00000231
Pointer[0002] = 0x000001648346AC00
Pointer[0003] = 0x000AAD5B000AAD5B
Pointer[0004] = 0x0000000000000000
Pointer[0005] = 0x00000164834864E0
Pointer[0006] = 0x00007FFCAAB2EF98
Pointer[0007] = 0x00007FFCA8708208
Pointer[0008] = 0x0000000F00000000
            .....
             ...
              .

So, what do these numbers mean? Well, all these pointers are addresses of the virtual functions, followed by some of the member variables. In order to understand which is which, we need to decipher the structure of the data set.

Here comes the tricky part! With each and every update to Visual Studio C++ compiler, the structure tends to get updated as well. In other words, the header structure of a C++ class changes with each major compiler update. Try to think of it as a “living organism”. As I’m typing this sentence now, a new update with a new C++ header structure may be on its way. So, it is up to you (!) to analyze what’s going on under the hood.

Good news is, we can gather <template> information about C++ class header structure from the official Microsoft patent documents! Although they are not up-to-date, I think it is a good practice to start the investigation using the source of information itself.

Here is some of the Microsoft patents which describe various parts of C++ implementation used in Visual Studio:

  US Patent #5410705: “Method for generating an object data structure layout for a class in a compiler for an object-oriented programming language”

  US Patent #5617569: “Method and system for implementing pointers to members in a compiler for an object-oriented programming language”

  US Patent #5754862: “Method and system for accessing virtual base classes”

  US Patent #5297284: “Method and system for implementing virtual functions and virtual base classes and setting a this pointer for an object-oriented programming language”

  US Patent #5371891: “Method for object construction in a compiler for an object-oriented programming language”

  US Patent #5603030: “Method and system for destruction of objects using multiple destructor functions in an object-oriented computer system”

So, what I’m offering is, good old-fashioned reverse engineering”.

– “Is such a challenge worth it?”

– “Um, yes!… If it doesn’t break you, it can make you.”

References:

  Bjarne Stroustrup, “A History of C++: 1979—1991”, p. 50 – (March 1993)

  Keith Cooper & Linda Torczon, “Engineering A Compiler”, Morgan Kaufmann, 2nd Edition – (2011)

  Microsoft Visual Studio Documentation, “C++ Run-Time Type Information” – (November 2016)

  “Technical Report on C++ Performance”, OpenRCE.org – (September 2006)

  “Reversing Microsoft Visual C++: Classes, Methods and RTTI”, OpenRCE.org – (September 2006)

  “Intel® 64 and IA-32 Architectures Optimization Reference Manual” – (April 2018)

3-bit Node Graph Architecture for Next-Gen Game Development

Speaking of my latest video game development project, yet an another milestone achieved. – Quite a tough one, indeed!

But first, please allow me to focus on some of the very basic mathematical logic definitions heavily used in software engineering, so that we can clearly understand what’s going on under the hood of a decent game development process.

Don’t worry, it’s not rocket science 😉

Some theory

All video games have gameplay mechanics based on logic. A game is “a set of story driven goals to achieve” from a programmer’s perspective.

When you open a chest, solve a puzzle or kill an enemy, you are actually triggering a logic unit that is predefined within the game code. Depending on game’s technical requirements and gameplay complexity, there can be thousands of these units forming a web of logic units.

Game programmers tend to use graph theory for defining and coding logic units. Each unit is symbolized with a simple geometric shape. A box, a circle, anything… And these units are connected to each other with links.

  “Logic units” (nodes) represent tasks that the player will perform.

  “Links” (lines) represent the relationship between the logic units.

Behaviour Analysis

A node graph architecture is almost identical to an electronic circuit. When you start executing a node graph code, you are actually branching from one component (node, in our case) to an another by the rules you’ve set for the logic units, just like electric current flowing from a resistor to a capacitor. And, as you can guess, this type of signal flow is 100% linear.

When the player accomplishes a task, the node related to that event will be “expired”. In other words, it will be dead. Expired nodes cannot be resurrected. Once they’re done, they will be ignored (skipped) during code execution, forever. – Which is unlikely in electronics! An electronic component, such as a resistor, a diode, etc. cannot be conditionally turned on/off.

Back to 2002 for a “classic” implementation: Flagger

During the “Culpa Innata” development sessions, we precisely knew that we needed a node graph architecture for handling game’s complex execution flow. Many discussions were held on the method of implementation. All members of the core management & development team were expert electric/electronics engineers with no experience in video game production [Reference], but me! As a video game programmer, my perspective towards node graph theory was naturally very different, contrary to their classical approaches. I wasn’t thinking in terms of voltage, current, etc., but focused on just one thing: optimized code execution.

Thanks to my Zilog Z80 and Motorola 68000 assembly language programming background, I offered the term “Flag” for the base logic unit (node), and teamed up with Mr. Mete Balcı for 3 weeks. In December 2002, we developed a tool called “Flagger”.

Pros and Cons

Flagger was a C++ code generator with a very handy visual interface similar to UE4’s current Blueprint approach. Using Flagger, we were able to add nodes, connect them to each other, program the logic behind the nodes/links, and even take printout of the whole node graph scenario. When the visual logic design process was over, it was just a matter of selecting “Generate C++ code” from the menu, and source code was generated within minutes.

Over the following years, Flagger evolved into a more sophisticated development tool capable of handling various scenarios. Although it was a very handy tool and saved many hours during “Culpa Innata” sessions, there were a few problems with the classical node graph theory that the implementation was based on;

  Flags were single threaded. Only one node was allowed to execute at a time. No multi-threading.

  Flags were expirable. When a task was done, related flag (node) was marked as “expired”, not deleted for the sake of logic integrity.

  Flags were not reusable. Once they were expired, there was no way of resurrecting them. – Inefficient memory usage, thanks to hundreds of expired nodes.

  Flags were heavily loaded with variables. Too many dialogue related “customized” variables were defined for special cases (exceptions). – Inefficient memory usage, once again.

  Flag execution flow wasn’t well optimized because of node-tree search algorithm. The more nodes we had, the longer it took to accomplish the search.

  Flag execution was linear. When a node was expired, the graph code was first searching for related nodes and then retriggering the whole diagram from the beginning, like an electronic circuit simulator. – Well, that was ideal for modeling a circuit, not for developing a video game!

A Modern Approach: 3-bit Worker!

13 years later, I have once again found an opportunity to dive into node graph theory, and just completed implementing a new architecture for my latest video game development project. Unlike Flagger, it is something extraordinary! It is very… atypical, unconventional, unorthodox… Well, whatever… You got it 😉

First of all, it has nothing to do with classical electric/electronic circuit theory. This time, I’m on my own, and approaching the problem as a software engineer. Everything I designed/coded is based on game requirement specifications. In other words, it is implemented with “practical usage” in mind.

  I have defined the basic logic unit (node), as a “worker”.(Due to functional similarities, I simply borrowed this term from Web Workers.)

  A worker is a background task with adjustable priority settings. It performs/responds like a hardware interrupt.

  Each worker is multi-threaded.

  Depending on conditional requirements, a worker can expire and/or live forever. If expired, it can be resurrected and/or reinitialized, while preserving its previous state. So, a worker is a 100% reusable node.

  Each worker uses only 3-bits! No additional variables, no references, nothing else. – (If necessary, a worker offers flexible architecture for additional variables. However, I find it totally unnecessary. 3-bits are more than enough!)

  Workers are object oriented. They can easily be inherited.

  Inherited workers don’t need additional logic variables. All child workers share the same 3-bit information that they inherited from their parents!

  Each worker has a time dependent linear workflow. Just like a reel-to-reel tape recorder, it can be played, paused, slowed down, accelerated, fast forwarded, rewinded, and stopped.

  Workers can be non-linearly linked to other Workers! Which means, node-tree search algorithms are no more necessary. There is no “main loop” for executing nodes! Code execution is pre-cached for optimum performance.

  Workers are optimized for event driven methodology. No matter how many concurrent active workers (threads) you have in the scene, there is practically no CPU overhead. Ideal for mobile scenarios.

  Workers are managed by “Managers”. A Manager is inherited from base Worker node. So, any worker can be assigned as a Manager.

  Workers can communicate with each other and access shared variables via Managers.

  Whole architecture is 100% platform independent. For a showcase, I’ve implemented it for Unreal Engine 4 using C++ and Blueprints. It can easily be ported to other game engines; such as Unity, CryEngine, etc.

  And, most important of all, everything is meticulously tested. – It’s working as of today 🙂

Any drawbacks?

Sure… Due to complexity of comprehending “a set of non-linearly linked time dependent linear nodes”, debugging can be a nightmare. As always, designing simplified and organized logic sets reduces potential problems. – I keep my logic sets neat and tidy 😉

So, what’s next?

Well, to be honest, since all theoretical stuff is done, I’ll switch to game content development. I am quite sure that I’ll keep on adding/removing things to my 3-bit node graph architecture. I will keep on improving it while preserving its simplicity, for sure.

“It is vain to do with more what can be done with less.” – (William of Ockham)

Unreal Engine 4 for everyone!

Thanks to Unreal community, a rumour about the NEXT BIG THING has been doing the rounds for almost a year. Well… It seems this is more than BIG!

Today, Epic announced that “Unreal Engine 4 is for everyone!”

Not kidding… Now, it is for everyone!

For $19/month (plus, 5% of gross revenue), we can have access to everything, including the Unreal Editor in ready-to-run form, and the engine’s complete C++ source code hosted on GitHub for collaborative development. – Oh My God!

You can read Tim Sweeney’s post on the Unreal Engine website for more information.

Hats off to Epic!

Epic is one of the few companies in the video game industry that really take care of what developers are saying, and I am a great fan of them because of their nice attitude towards us.

Thank you Epic!