Taming a Beast: Cache

(Cover Photo:  © Granger – “Lion Tamer”
The American animal tamer Clyde Beatty
performing in the 1930s.)

The processor’s caches are for the most part transparent to software. When enabled, instructions and data flow through these caches without the need for explicit software control. However, knowledge of the behavior of these caches may be useful in optimizing software performance. If not tamed wisely, these innocent cache mechanisms can certainly be a headache for novice C/C++ programmers.

First things first… Before I start with example C/C++ code showing some common pitfalls and urban caching myths that lead to hard-to-trace bugs, I would like to make sure that we are all comfortable with cache-related terms.

Terminology

In theory, a CPU cache is a very high-speed type of memory placed between the CPU and the main memory. (In practice, it actually sits inside the processor, mostly operating at the speed of the CPU.) To reduce the latency of fetching information from main memory, the cache temporarily stores some of that information, so that the next access to the same chunk of information is faster. A CPU cache can store both ‘executable instructions’ and ‘raw data’.

“… from cache, instead of going back to memory.”

When the processor recognizes that information being read from memory is cacheable, it reads an entire cache line into the appropriate cache level (L1, L2, L3, or all of them). This operation is called a cache line fill. If the memory location containing that information is still cached when the processor attempts to access it again, the processor can read the information from the cache instead of going back to memory. This operation is called a cache hit.

Hierarchical Cache Structure of the Intel Core i7 Processors

When the processor attempts to write information to a cacheable area of memory, it first checks whether a cache line for that memory location exists in the cache. If a valid cache line does exist, the processor (depending on the write policy currently in force) can write the information into the cache instead of writing it out to system memory. This operation is called a write hit. If a write misses the cache (that is, a valid cache line is not present for the area of memory being written to), the processor performs a cache line fill (write allocation). It then writes the information into the cache line and (depending on the write policy currently in force) can also write it out to memory. If the information is to be written out to memory, it is written first into the store buffer, and then from the store buffer to memory when the system bus is available.

“… cached in shared state, between multiple CPUs.”

When operating in a multi-processor system, the Intel 64 and IA-32 architectures are able to keep their internal caches consistent both with system memory and with the caches of other processors on the bus. For example, if one processor detects that another processor intends to write to a memory location that it currently has cached in shared state, it will invalidate its own cache line, forcing it to perform a cache line fill the next time it accesses that memory location. This type of internal communication between CPUs is called snooping.

And finally, the translation lookaside buffer (TLB) is a special type of cache designed to speed up address translation for virtual memory operations. It is part of the chip’s memory-management unit (MMU). The TLB keeps track of where virtual pages are stored in physical memory, speeding up ‘virtual address to physical address’ translation by caching recent page-table lookups.

So far so good… Let’s start coding, and shed some light on urban caching myths. 😉


How to Guarantee Caching in C/C++

To be honest, under normal conditions, there is absolutely no way to guarantee that a variable you define in C/C++ will be cached. CPU cache and write buffer management are outside the scope of the C/C++ language, actually.

Most programmers assume that declaring a variable as constant will automatically turn it into something cacheable!

const int nVar = 33;

As a matter of fact, doing so only tells the C/C++ compiler that the rest of the code is forbidden to modify the variable’s value, which may or may not lead to a cacheable case. By using const, you simply increase the chance of the value being cached. In most cases the compiler will manage to turn accesses to it into cache hits. However, we can never be sure about it unless we debug and trace the variable with our own eyes.


How to Guarantee No Caching in C/C++

An urban myth states that, by using volatile type qualifier, it is possible to guarantee that a variable can never be cached. In other words, this myth assumes that it might be possible to disable CPU caching features for specific C/C++ variables in your code!

volatile int nVar = 33;

Actually, defining a variable as volatile prevents the compiler from optimizing accesses to it, and forces the compiler to always refetch (read once again) the value of that variable from memory. But this may or may not prevent CPU caching: volatile has nothing to do with CPU caches and write buffers, and there is no standard support for these features in C/C++.

So, what happens if we declare the same variable without const or volatile?

int nVar = 33;

Well, in most cases, your code will be executed and cached properly. (Still not guaranteed, though.) But one thing is for sure… If you write ‘weird’ code, like the following, then you are asking for trouble!

int nVar = 33;
while (nVar == 33)
{
   . . .
}

In this case, if optimization is enabled, the C/C++ compiler may assume that nVar never changes (it is always 33), since nothing in the loop’s body modifies it, and replace the whole condition with true for the sake of optimizing the while condition.

while (true)
{
   . . .
}

A simple volatile type qualifier fixes the problem, actually.

volatile int nVar = 33;


What about Pointers?

Well, handling pointers is no different than taking care of simple integers.

Case #1:

Let’s try to evaluate the while case mentioned above once again, but this time with a Pointer.

int nVar = 33;
int *pVar = (int*) &nVar;
while (*pVar)
{
   . . .
}

In this case,

  nVar is declared as an integer with an initial value of 33,
  pVar is assigned as a Pointer to nVar,
  the value of nVar (33) is read through the pointer pVar, and this value is used as the while loop’s condition.

On the surface there is nothing wrong with this code, but if aggressive C/C++ compiler optimizations are enabled, then we might be in trouble. – Yes, some compilers are smarter than others! 😉

Due to the fact that the pointed-to value is never modified and/or accessed inside the while loop, the compiler may decide to optimize the frequently evaluated loop condition. Instead of fetching *pVar (the value of nVar) from memory each time, the compiler might decide that keeping this value in a register is a good idea. This is known as ‘software caching’.

Now, we have two problems here:

1.) Values in registers live even closer to the CPU than the hardware caches. If the software-cached copy in the register somehow goes out of sync with the original value in memory, the running code will never notice and will keep on using the stale copy. – CPU cache vs software cache. What a mess!

Tip: Is that scenario really possible?! – To be honest, no. During the compilation process, the C/C++ compiler should be clever enough to foresee that problem, if and only if *pVar is never modified in the loop’s body. However, as programmers, it is our responsibility to make sure that the compiler is given ‘properly written code’ with no ambiguous logic/data treatment. So, instead of keeping our fingers crossed and expecting miracles from the compiler, we should take complete control over the direction of our code. Before making assumptions on how our code will be compiled, we should first make sure that our code is crystal clear.

2.) Since the value of nVar is never modified, the compiler can even go one step further: because *pVar is used as a conditional statement, the check against it can be reduced to a constant Boolean value. As a result of this optimization, the code above might turn into this:

int nVar = 33;
int *pVar = (int*) &nVar;

if (*pVar)
{
   while (true)
   {
      . . .
   }
}

Both problems detailed above can be fixed by using the volatile type qualifier. Doing so prevents the compiler from optimizing *pVar, and forces it to always refetch the value from memory, rather than using a compiler-generated, software-cached copy in a register.

int nVar = 33;
volatile int *pVar = (int*) &nVar;
while (*pVar)
{
   . . .
}

Case #2:

Here comes another tricky example about Pointers.

const int nVar = 33;
int *pVar = (int*) &nVar;
*pVar = 0;

In this case,

  nVar is declared as a ‘constant’ variable,
  pVar is assigned as a Pointer to nVar,
  and, pVar is trying to change the ‘constant’ value of nVar!

Under normal conditions, no C/C++ programmer would make such a mistake, but for the sake of clarity let’s assume that we did.

If aggressive optimization is enabled, due to the fact that;

a.) the pointer variable points to a constant variable,

b.) the value of the pointer variable is never modified and/or accessed afterwards,

some compilers may assume that the pointer can be optimized away for the sake of software caching. So, despite *pVar = 0, the value of nVar may never change.

Is that all? Well, no… Here comes the worst part! The resulting value of nVar is actually compiler dependent. If you compile the code above with a bunch of different C/C++ compilers, you will notice that in some of them nVar ends up as 0, and in others as 33. Why? Simply because modifying a const object through a cast is undefined behavior in C/C++, so every compiler is free to generate whatever code it likes for ‘constant’ variables, and they do. As a result of this inconsistency, even a single constant variable can make things very complicated.

Tip: The best way to fix ‘cache oriented compiler optimization issues’, is to change the way you write code, with respect to tricky compiler specific optimizations in mind. Try to write crystal clear code. Never assume that compiler knows programming better than you. Always debug, trace, and check the output… Be prepared for the unexpected!

Fixing such brute-force compiler optimization issues is quite easy. You can get rid of the const type qualifier,

const int nVar = 33;

or, replace const with volatile type qualifier,

volatile int nVar = 33;

or, use both!

const volatile int nVar = 33;
Tip: ‘const volatile’ combination is commonly used on embedded systems, where hardware registers that can be read and are updated by the hardware, cannot be altered by software. In such cases, reading hardware register’s value is never cached, always refetched from memory.


Rule of Thumb

Using volatile is absolutely necessary in any situation where the compiler could make wrong assumptions about a variable keeping its value constant, just because a function does not change it itself. Not using volatile can create very complicated bugs, because the executed code behaves as if the value never changed – (it did, indeed).

If code that works fine, somehow fails when you;

  Use cross compilers,
  Port code to a different compiler,
  Enable compiler optimizations,
  Enable interrupts,

make sure that your compiler is NOT over-optimizing variables for the sake of software caching.

Please keep in mind that, volatile has nothing to do with CPU caches and write buffers, and there is no standard support for these features in C/C++. These are out of scope of the C/C++ language, and must be solved by directly interacting with the CPU core!


Getting Hands Dirty via Low-Level CPU Cache Control

Software-driven hardware cache management is possible. There are special ‘privileged’ assembly instructions to clean, invalidate, and flush cache(s), and to synchronize the write buffer. They can be executed directly from privileged modes. (User-mode applications can control the cache through system calls only.) Most compilers support this through built-in/intrinsic functions or inline assembly.

The Intel 64 and IA-32 architectures provide a variety of mechanisms for controlling the caching of data and instructions, and for controlling the ordering of reads/writes between the processor, the caches, and memory.

These mechanisms can be divided into two groups:

  Cache control registers and bits: The Intel 64 and IA-32 architectures define several dedicated registers and various bits within control registers and page/directory-table entries that control the caching of system memory locations in the L1, L2, and L3 caches. These mechanisms control the caching of virtual memory pages and of regions of physical memory.

  Cache control and memory ordering instructions: The Intel 64 and IA-32 architectures provide several instructions that control the caching of data, the ordering of memory reads and writes, and the prefetching of data. These instructions allow software to control the caching of specific data structures, to control memory coherency for specific locations in memory, and to force strong memory ordering at specific locations in a program.

How does it work?

The Cache Control flags and Memory Type Range Registers (MTRRs) operate hierarchically for restricting caching. That is, if the CD flag of control register 0 (CR0) is set, caching is prevented globally. If the CD flag is clear, the page-level cache control flags and/or the MTRRs can be used to restrict caching.

Tip: The memory type range registers (MTRRs) provide a mechanism for associating the memory types with physical-address ranges in system memory. They allow the processor to optimize operations for different types of memory such as RAM, ROM, frame-buffer memory, and memory-mapped I/O devices. They also simplify system hardware design by eliminating the memory control pins used for this function on earlier IA-32 processors and the external logic needed to drive them.

If there is an overlap of page-level and MTRR caching controls, the mechanism that prevents caching has precedence. For example, if an MTRR makes a region of system memory uncacheable, a page-level caching control cannot be used to enable caching for a page in that region. The converse is also true; that is, if a page-level caching control designates a page as uncacheable, an MTRR cannot be used to make the page cacheable.

In cases where there is an overlap in the assignment of the write-back and write-through caching policies to a page and a region of memory, the write-through policy takes precedence. The write-combining policy (which can only be assigned through an MTRR or the Page Attribute Table (PAT)) takes precedence over either write-through or write-back. The selection of memory types at the page level varies depending on whether PAT is being used to select memory types for pages.

Tip: The Page Attribute Table (PAT) extends the IA-32 architecture’s page-table format to allow memory types to be assigned to regions of physical memory based on linear address mappings. The PAT is a companion feature to the MTRRs; that is, the MTRRs allow mapping of memory types to regions of the physical address space, where the PAT allows mapping of memory types to pages within the linear address space. The MTRRs are useful for statically describing memory types for physical ranges, and are typically set up by the system BIOS. The PAT extends the functions of the PCD and PWT bits in page tables to allow all five of the memory types that can be assigned with the MTRRs (plus one additional memory type) to also be assigned dynamically to pages of the linear address space.


CPU Control Registers

Generally speaking, the control registers (CR0, CR1, CR2, CR3, and CR4) determine the operating mode of the processor and the characteristics of the currently executing task. These registers are 32 bits wide in all 32-bit modes and in compatibility mode. In 64-bit mode, they are expanded to 64 bits.

The MOV CRn instructions are used to manipulate the register bits. These instructions can be executed only when the current privilege level is 0.

Instruction          64-bit Mode   Legacy Mode   Description
MOV r32, CR0–CR7     N.E.          Valid         Move control register to r32.
MOV r64, CR0–CR7     Valid         N.E.          Move extended control register to r64.
MOV r64, CR8         Valid         N.E.          Move extended CR8 to r64.
MOV CR0–CR7, r32     N.E.          Valid         Move r32 to control register.
MOV CR0–CR7, r64     Valid         N.E.          Move r64 to extended control register.
MOV CR8, r64         Valid         N.E.          Move r64 to extended CR8.
Tip: When loading control registers, programs should not attempt to change the reserved bits; that is, always set reserved bits to the value previously read. An attempt to change CR4’s reserved bits will cause a general protection fault. Reserved bits in CR0 and CR3 remain clear after any load of those registers; attempts to set them have no impact.

The Intel 64 and IA-32 architectures provide the following cache-control registers and bits for use in enabling or restricting caching to various pages or regions in memory:

  CD flag (bit 30 of control register CR0): Controls caching of system memory locations. If the CD flag is clear, caching is enabled for the whole of system memory, but may be restricted for individual pages or regions of memory by other cache-control mechanisms. When the CD flag is set, caching is restricted in the processor’s caches (cache hierarchy) for the P6 and more recent processor families. With the CD flag set, however, the caches will still respond to snoop traffic. To ensure memory coherency after the CD flag is set, the caches should be explicitly flushed. For highest processor performance, both the CD and the NW flags in control register CR0 should be cleared. (Setting the CD flag for the P6 and more recent processor families modifies cache line fill and update behaviour. Also, setting the CD flag on these processors does not force strict ordering of memory accesses unless the MTRRs are disabled and/or all memory is referenced as uncached.)

  NW flag (bit 29 of control register CR0): Controls the write policy for system memory locations. If the NW and CD flags are clear, write-back is enabled for the whole of system memory, but may be restricted for individual pages or regions of memory by other cache-control mechanisms.

  PCD and PWT flags (in paging-structure entries): Control the memory type used to access paging structures and pages.

  PCD and PWT flags (in control register CR3): Control the memory type used to access the first paging structure of the current paging-structure hierarchy.

  G (global) flag in the page-directory and page-table entries: Controls the flushing of TLB entries for individual pages.

  PGE (page global enable) flag in control register CR4: Enables the establishment of global pages with the G flag.

  Memory type range registers (MTRRs): Control the type of caching used in specific regions of physical memory.

  Page Attribute Table (PAT) MSR: Extends the memory typing capabilities of the processor to permit memory types to be assigned on a page-by-page basis.

  3rd Level Cache Disable flag (bit 6 of IA32_MISC_ENABLE MSR): Allows the L3 cache to be disabled and enabled, independently of the L1 and L2 caches. (Available only in processors based on Intel NetBurst microarchitecture)

  KEN# and WB/WT# pins (Pentium processor): Allow external hardware to control the caching method used for specific areas of memory. They perform similar (but not identical) functions to the MTRRs in the P6 family processors.

  PCD and PWT pins (Pentium processor): These pins (which are associated with the PCD and PWT flags in control register CR3 and in the page-directory and page-table entries) permit caching in an external L2 cache to be controlled on a page-by-page basis, consistent with the control exercised on the L1 cache of these processors. (The P6 and more recent processor families do not provide these pins because the L2 cache is embedded in the chip package.)


How to Manage CPU Cache using Assembly Language

The Intel 64 and IA-32 architectures provide several instructions for managing the L1, L2, and L3 caches. The INVD and WBINVD instructions are privileged and operate on the L1, L2, and L3 caches as a whole. The PREFETCHh, CLFLUSH, and CLFLUSHOPT instructions and the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD) offer more granular control over caching, and are available at all privilege levels.

The INVD and WBINVD instructions are used to invalidate the contents of the L1, L2, and L3 caches. The INVD instruction invalidates all internal cache entries, then generates a special-function bus cycle that indicates that external caches also should be invalidated. The INVD instruction should be used with care. It does not force a write-back of modified cache lines; therefore, data stored in the caches and not written back to system memory will be lost. Unless there is a specific requirement or benefit to invalidating the caches without writing back the modified lines (such as, during testing or fault recovery where cache coherency with main memory is not a concern), software should use the WBINVD instruction.

In theory, the WBINVD instruction performs the following steps:

WriteBack(InternalCaches);
Flush(InternalCaches);
SignalWriteBack(ExternalCaches);
SignalFlush(ExternalCaches);
Continue;

The WBINVD instruction first writes back any modified lines in all the internal caches, then invalidates the contents of the L1, L2, and L3 caches. It ensures that cache coherency with main memory is maintained regardless of the write policy in effect (that is, write-through or write-back). Following this operation, the WBINVD instruction generates one (P6 family processors) or two (Pentium and Intel486 processors) special-function bus cycles to indicate to external cache controllers that a write-back of modified data, followed by an invalidation of the external caches, should occur. The amount of time or cycles for WBINVD to complete will vary due to the size of different cache hierarchies and other factors. As a consequence, the use of the WBINVD instruction can have an impact on interrupt/event response time.

The PREFETCHh instructions allow a program to suggest to the processor that a cache line from a specified location in system memory be prefetched into the cache hierarchy.

The CLFLUSH and CLFLUSHOPT instructions allow selected cache lines to be flushed from the cache hierarchy. These instructions give a program the ability to explicitly free up cache space, when it is known that the cached section of system memory will not be accessed in the near future.

The non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD) allow data to be moved from the processor’s registers directly into system memory without also being written into the L1, L2, and/or L3 caches. These instructions can be used to prevent cache pollution when operating on data that is going to be modified only once before being stored back into system memory. These instructions operate on data in the general-purpose, MMX, and XMM registers.


How to Disable Hardware Caching

To disable the L1, L2, and L3 caches after they have been enabled and have received cache fills, perform the following steps:

1.) Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0.)

2.) Flush all caches using the WBINVD instruction.

3.) Disable the MTRRs and set the default memory type to uncached or set all MTRRs for the uncached memory type.

The caches must be flushed (step 2) after the CD flag is set, to ensure system memory coherency. If the caches are not flushed, cache hits on reads will still occur, and data will be read from valid cache lines.
The three separate steps listed above address three distinct requirements:

a.) Discontinue new data replacing existing data in the cache,

b.) Ensure data already in the cache are evicted to memory,

c.) Ensure subsequent memory references observe UC memory type semantics. Different processor implementations of the caching control hardware may allow some variation in how software implements these three requirements.

Setting the CD flag in control register CR0 modifies the processor’s caching behaviour as indicated, but setting the CD flag alone may not be sufficient across all processor families to force the effective memory type for all physical memory to be UC nor does it force strict memory ordering, due to hardware implementation variations across different processor families. To force the UC memory type and strict memory ordering on all of physical memory, it is sufficient to either program the MTRRs for all physical memory to be UC memory type or disable all MTRRs.

Tip: For the Pentium 4 and Intel Xeon processors, after the sequence of steps given above has been executed, the cache lines containing the code between the end of the WBINVD instruction and the point where the MTRRs have actually been disabled may be retained in the cache hierarchy. Here, to remove that code from the cache completely, a second WBINVD instruction must be executed after the MTRRs have been disabled.


References:

  Richard Blum, “Professional Assembly Language”, Wrox Publishing – (2005)

  Keith Cooper & Linda Torczon, “Engineering A Compiler”, Morgan Kaufmann, 2nd Edition – (2011)

  Alexey Lyashko, “Mastering Assembly Programming”, Packt Publishing Limited – (2017)

  “Intel® 64 and IA-32 Architectures Optimization Reference Manual” – (April 2018)

  “Intel® 64 and IA-32 Architectures Software Developer’s Manual: Basic Architecture” – (November 2018)

  “Intel® 64 and IA-32 Architectures Software Developer’s Manual: Instruction Set Reference A-Z” – (November 2018)

  “Intel® 64 and IA-32 Architectures Software Developer’s Manual: System Programming Guide” – (November 2018)

  “Intel® 64 and IA-32 Architectures Software Developer’s Manual: Model-Specific Registers” – (November 2018)


Blood, Sweat, and Pixels

Nowadays, I’m reading a tiny HarperCollins book called “Blood, Sweat, and Pixels”, written by Jason Schreier.

It is a journey through ‘development hell’ – a media industry jargon for a project that remains in development (often moving between different crews, scripts, or studios) without progressing to completion. In other words, ‘a never-ending project’.

So, if you have ever wondered what it takes to be a video game developer, don’t read this book! It must be the very last introductory document you should be referring to. – Just kidding! 😉

“If I ascend up into heaven, you are there: if I make my bed in hell, behold, you are there.” – (Psalm 139:8)

Jason Schreier takes readers on a fascinating odyssey behind the scenes of video game development. Ultimately, it is a tribute to the dedicated diehards and unsung heroes who scale mountains of obstacles in their quests to create the best games imaginable.

Life is hard for video game developers. Very hard, indeed… Thanks to nice small touches and heavenly surprises, life is more bearable. This book is certainly one of them. Thank you Jason!

Back to coding… 😉

(L)egocentric day in Paris

During our recent summer holiday in Paris, my beloved wife and daughter decided to take a day off and go out for shopping without me. – What a gift! I felt very privileged to have been given back the opportunity of being a ‘freeman’, despite the fact that it was only for a few hours 😉

Against the ticking clock, I decided to feed the never-growing-up child within me and dedicate the whole day to visiting all the official LEGO shops in Paris. – Sounds crazy? Well, if you are a LEGO addict like me, then you know what I mean…

When I googled LEGO shops, I realized that most of the information available online is either misleading or outdated. After a couple of trials and errors, plus many hours wasted on the road, I managed to visit all 3 official LEGO stores in Paris.

 The LEGO Store – Les Halles

 The LEGO Store – So Ouest

 The LEGO Store – Disneyland

During the metro trip back to hotel, I promised myself to write a clear blog post about all the information that I had gathered, so that it could be useful to other LEGO fans visiting Paris.

So, here we go!

The LEGO Store – Les Halles

This is a brand-new 400 m² LEGO store established in April 2016. It is located at the center of the Forum des Halles shopping mall.

The main entrance of official LEGO store ‘Les Halles’ in Paris
Directions: Take Metro Line 4 (light purple), and stop at ‘Les Halles’ station. There is more than one exit at this station. No worries! Use whichever you like. Using the stairs and escalators, you will either find yourself in a huge underground shopping mall, or in the middle of a crowded street. In both cases, you are at the heart of the Forum des Halles shopping mall. The LEGO store is at street level (Level 0), on the left-hand side of the main entrance. It is the largest shop on this level. – (Link: Google Maps)
My daughter, Dila, is amazed by the beauty of mega Notre-Dame Cathedral construction built in LEGO bricks!

Les Halles LEGO store has a breathtaking showcase. On the left, the store welcomes you with a huge French kitchen set built in LEGO bricks. While looking at the cook, oven, pots, colourful cupcakes, and many other well-thought-out details, it is quite easy to be bewildered while dreaming in front of the showcase. When you walk to the right-hand side, you’ll notice two more mega LEGO constructions: the Notre-Dame Cathedral and the Arc de Triomphe. Though both sets demonstrate top-notch brick architecture wizardry, the cathedral construction is a truly remarkable piece of art. The amount of detail –and even humour– that goes into making this set is unreal; tiny goblins and knights walking on the roof speak for themselves 😉

When you go into the store, the first thing you’ll notice is the wall-to-wall layout of shelves. They are clearly categorized with hundreds of boxed LEGO products on them. When you are at the entrance (facing the point of sales), the Duplo products (for babies) are on the left, and the Technic series (for teenagers and adults) are on the right, which is a panoramic categorization from left to right based on age. Simple and effective.

One thing that I really loved is the location of the point of sale. An ellipse-shaped desk (with many cash registers on it) sits right in the middle of the store! No matter how crowded the shop is, you can always find a shortcut to reach the cashiers.

* This was a real lifesaver during my second visit to this store. I brought my wife and daughter with me on a Saturday afternoon, and the store was so crowded that we couldn’t walk without bumping into each other. That day, I really appreciated the wise decision of locating the point of sale in the hotspot of the store.

Last but not least, here comes the jewel in the crown: The staff members. They are simply amazing! Unlike typical salespeople, they are 100% enthusiastic about what they are selling, and specialized in various product categories. These young ladies/gentlemen are always smiling, willing to assist, and very polite.

* And, did I mention that all the French staff members are fluent in English? – Oh, yes!

I have to mention one staff member in particular; Mademoiselle Samantha. For almost half an hour, she patiently answered all my technical questions, visited the storage room (behind the store) a few times, checked the availability of hard-to-find items on my shopping list, made a phone call to one of the other official LEGO stores (So Ouest), reserved the missing items for me, and finally wrote down the directions to make sure that I’ll find my way to that shop safe and secure… Thank you very much, indeed!

The LEGO Store – So Ouest

This is a 300 m² LEGO store established in October 2012. It is located at the So Ouest shopping mall in Levallois-Perret, a commune in the northwestern suburbs of Paris. Unlike the previous LEGO store, this one is not in the center of Paris. However, if you follow my directions below, it will take approximately half an hour to get there. It’s not really far away…

So, is this store really worth visiting? Absolutely! This is a fantastic LEGO store in every way. Make sure that it is on your list.

The showcase of official LEGO store 'So Ouest' in Levallois-Perret, Paris
Directions: Take Metro Line 14 (dark purple), and stop at ‘Saint Lazare’ station. Following the ‘Île-de-France’ (Parisian region) directions and ‘SNCF Transilien’ (suburban train) icons on the signs, walk to the ‘Gare Saint Lazare’ railway station. Don’t worry, it will take 3-4 minutes to get there. Once you are at the main railway station, go up to the 2nd floor and find the ‘Île-de-France’ ticket office. Buy a ticket for line L. (Since this is a suburban line, there will be no seat numbers on your ticket.) Go to the main hall, and check for the next train on the split-flap departure display. Your destination is ‘Clichy-Levallois’ – (line L, remember?). After leaving ‘Gare Saint Lazare’, it is the 2nd station on this line. It will take approximately 10 minutes to get there. When you stop at the ‘Gare de Clichy-Levallois’ station, follow the ‘Centrum’ signs. You will find yourself at the entrance of the train station. Now, your destination is the So Ouest shopping mall! In order to get there, follow ‘Rue Jean-Jaurès’ for a minute, turn left onto ‘Rue Victor Hugo’, walk for 3 minutes, and finally turn right onto ‘Rue d’Alsace’. You’ll notice a huge shopping mall on the right-hand side of the street. That is So Ouest. Go in, take the escalator down to B1, and Voilà! – (Link: Google Maps)

Compared to the previous one, the So Ouest LEGO store has a rather modest showcase. No mega constructions to speak of, actually. However, the warm display of recently introduced LEGO sets in the showcase instantly grabs your attention and humbly welcomes you inside… A classy way of making you think, “Let’s see what they have here!” 😉

The "Pick-a-Brick Wall" at LEGO store 'So Ouest' in Levallois-Perret, Paris

Contrary to the humble first impression of the store, the product range is simply premium. Don’t let the size and modest atmosphere of the shop fool you; they have everything here for you. All products are sorted by themes. Even on your first visit to this store, it is very easy to find what you are looking for. Everything is self-explanatory.

The staff members are superb! They are very polite, always ready to assist you, and willing to speak about the products that you are interested in. Somehow, you feel that you are being taken care of, and it makes you feel comfortable. From a customer’s point of view, this is something truly beyond the dated customer relationship lessons taught in business schools. It’s really nice to know that someone is keeping an eye on you.

Speaking of the staff members, please allow me to share my amazing experience with you… As I was gazing at the recently released Porsche 911 GT3 RS Technic set, I humbly approached one of the staff members, pulled a list from my pocket, and asked him if any of the hard-to-find items on my list were available, by any chance. The gentleman cheerfully looked at me and said: “Oh, you must be the guy from Turkey! We were expecting you… Mademoiselle Samantha (from the Les Halles store) phoned an hour ago and told me about the items you are looking for. Your orders are ready, Sir!”

After the initial shock, I stuttered: “Well… Thank you!”

Thanks to Monsieur Damien, every item on my list had already been collected from the inventory room and packed. Besides being a very professional staff member, he was also a nice gentleman to talk with. His English was better than mine. For almost half an hour, we geeked out over discontinued products, the second-hand LEGO market in France, and the latest additions to my daughter’s LEGO train set collection. – A truly exceptional experience. Merci!

The LEGO Store – Disneyland

This is a huge LEGO store established in 2014. The name speaks for itself, the store is in the heart of Disneyland, Paris. Believe it or not, this is the most crowded LEGO shop I’ve ever visited in my life. Thanks to Disneyland’s reputation, this must be one of the most popular LEGO shops in Europe.

The main entrance of official LEGO store ‘Disneyland’, Paris
Directions: Take RER Line A (red), and stop at the last station, ‘Marne-la-Vallée’. This station is also known as ‘Parcs Disneyland’. (Both names are used on signs, in addition to a cute Mickey Mouse symbol.) When you leave the train, use the escalators and go upstairs. If you have your train ticket with you, pass through the turnstiles. (If you don’t have any tickets, you are stuck! No ticket offices are available nearby. You must find the ticket collector and ask for help.) Leave the station, go out, and make a U-turn to the left. Your destination is ‘The Village’ -aka ‘Disney Village’- a small artificial town where you can shop & dine. You don’t need a Disneyland ticket to get there. It’s free, and the LEGO Store is right ahead of you. – (Link: Google Maps)
My daughter, Dila, so cheerful in front of the LEGO store ‘Disneyland’, Paris

When you look from the outside, this store looks like an ordinary LEGO shop. The showcase is quite good, with a huge LEGO logo and a few 2.5D canvas paintings built in bricks. At first sight, it looks like there is nothing special in here…

However, when you go in, you realize how big the store is and immediately forget about the modest showcase. The mega LEGO structures simply knock your socks off. They are everywhere! Pete’s Dragon hanging from the ceiling, an authentic life-size reproduction of R2-D2, a magnificent “Sorcerer’s Apprentice” visual composition from “Fantasia” with Mickey wearing the blue wizard hat… These are spectacular items. Frankly, even better than the ones at the ‘Les Halles’ store!

The product range is superb, just like the other stores I have mentioned. However, stock availability is a serious problem here. I was unable to find quite a number of products which were available in the other LEGO stores, such as pencil boxes, erasers, pen sets, a bunch of recently released Technic sets, and almost all Power Functions products! When I asked about the missing items, staff members complained about ‘customer circulation vs. lack of space’. I am not quite sure if this is an acceptable excuse.

Speaking of the staff members at the LEGO Disneyland store, I have to say that they are simply the weakest link here. They are not smiling, not enjoying what they do, and keep their distance from the customers. Instead, they chat with each other by the exit. Nobody cares about you. Yep, I know that it is very difficult to manage such a huge store with such a large number of customers in it, but what I’m complaining about is more than that. When you ask a few questions, all you get is “Yes”, “No”, or “I don’t know”. Being aware of the fact that Disneyland is a place most people visit once (and don’t come back to for at least a few years), I don’t think you are welcomed as a ‘loyal customer’ here. If these staff members think that people come and go, and more will come tomorrow no matter how they treat customers, I’m afraid that is a serious threat to LEGO’s reputation. As a lifetime loyal LEGO fan, I’m truly disappointed.

Conclusion

I love Paris! This was my second visit to the romantic city, and I’m planning to come back again and again, more frequently. For my next visit, I have 2 official LEGO stores on my list that I would love to revisit: ‘Les Halles’ and ‘So Ouest’. A great shopping experience in both cases. Strongly recommended.

May the force LEGO bricks be with you! 😉

New Video Game Project: Annual Information Update 2015

December 1, 2013 marked the beginning of my new video game project. The math is simple: I have been working on it for exactly 2 years. Designing, developing and co-producing… A lot of work has been done, and much more is still in progress. All tough tasks, mostly game design related, such as the 3-bit node graph architecture. Plus, a lot of coding…

It has been a busy year, indeed. – So, what’s new?

Workflow 3.0

The most distinguishing element of this project –the optimized game development workflow– has been upgraded to version 3. This is something that I’m really proud of, simply because it is:

more cost- and time-efficient,

more artwork/cinematography oriented,

100% compatible with both old & next-gen workflows.

This year, I mostly concentrated on the last item. As we all know, the global video game industry is having a hard time trying to make a quantum leap to next-gen video games while keeping the cash flow pumping. Let’s face it, upgrading a business model while doing business is risky! You need to educate developers, reorganize teamwork and improve asset management, all while keeping an eye on ongoing projects and meeting deadlines. A kind of “make something new, but keep the business running the old-fashioned way” situation.

“…using both current and upcoming tools/assets.”

This is exactly where my upgraded workflow comes in handy. In simple terms, it is a next-gen game development workflow offering an optimized way of making games for less money/time, using both current and upcoming tools/assets. Because it is backwards compatible, a veteran game development team/company can keep its old-fashioned workflow and make a smooth transition to a next-gen video game development process using this workflow.

So far so good, but…

Why on earth is that backward compatibility thing so important? Simply because, when we say “workflow assets”, we are actually talking about human beings! People with families, children, and responsibilities.

Over the last 30 years, I have witnessed the highs and lows of the game development industry. It has always been very harsh on developers at critical moments. When a “next big thing” is in, managers start headhunting for next-gen guys. Current developers instantly turn into “old-fashioned guys” and, most of the time, get fired. The turnover is so high that most experienced video game developers hate working in-house for AAA companies. Instead, they prefer freelancing, just like me.

Frankly speaking, I upgraded my workflow to version 3 for better human resource management. The first 2 versions favoured the management and income aspects of the business. Now, this version concentrates on developers. – Yep, something for my teammates!

We don’t work in a vacuum

Our environment feeds into the work we produce, particularly when that work is creative. Every piece of “thing” in our working environment affects us. What we see, hear, touch, and even smell stimulates our creativity and, in a way, gets injected into our work.

My humble home office

So, I made a radical decision. In order to increase my productivity, I decided to split my home office activities in two. Thanks to a painstaking and backaching effort, I moved all my coding/artwork-related books, tools and computers from my mom’s house to mine. Using some modular equipment from IKEA, I built a custom table wide enough for my desktop monitor and Wacom tablet, and spent a lot of time on cabling and ergonomics. Keeping things tidy certainly served me well. As I promised my beloved wife that I would use less than 2 m² of our living room, I finally managed to create a wide open space using only 1.98 m². – Now, that is optimization 😉

Within just a few days, I noticed a positive impact on my productivity. Now, my process is crystal clear: I do all my coding/artwork at home, and music-related stuff at my mom’s house. And the bonus is, I spend less time in traffic and more with my family.

“Creativity is a gift. It doesn’t come through if the air is cluttered.” – (John Lennon)

More details

Actually, I have so many things to tell you. I would really like to say more and give you the under-the-hood (technical) details of my upcoming project… I’m afraid I can’t. Until the official announcement, there are things not meant to be known or seen by the public. Well, you know, this is how the video game business works!

So, I’ll keep you posted whenever I can…

Tonight

With the latest annual update behind me and my new video game project on track, I’m planning to open a bottle of wine and enjoy the rest of the evening with my family. I think I’ve earned it.

See you next year!

3-bit Node Graph Architecture for Next-Gen Game Development

Speaking of my latest video game development project, yet another milestone has been achieved. – Quite a tough one, indeed!

But first, please allow me to focus on some of the very basic mathematical logic definitions heavily used in software engineering, so that we can clearly understand what’s going on under the hood of a decent game development process.

Don’t worry, it’s not rocket science 😉

Some theory

All video games have gameplay mechanics based on logic. From a programmer’s perspective, a game is “a set of story-driven goals to achieve”.

When you open a chest, solve a puzzle, or kill an enemy, you are actually triggering a logic unit predefined within the game code. Depending on the game’s technical requirements and gameplay complexity, there can be thousands of these units, forming a web of logic.

Game programmers tend to use graph theory for defining and coding logic units. Each unit is symbolized by a simple geometric shape – a box, a circle, anything – and these units are connected to each other with links.

  “Logic units” (nodes) represent tasks that the player will perform.

  “Links” (lines) represent the relationship between the logic units.

Behaviour Analysis

A node graph architecture is almost identical to an electronic circuit. When you execute node graph code, you are actually branching from one component (node, in our case) to another according to the rules you’ve set for the logic units, just like electric current flowing from a resistor to a capacitor. And, as you can guess, this type of signal flow is 100% linear.

When the player accomplishes a task, the node related to that event “expires”. In other words, it is dead. Expired nodes cannot be resurrected. Once they’re done, they are ignored (skipped) during code execution, forever. – Which is unlike electronics! An electronic component, such as a resistor or a diode, cannot be conditionally turned on/off.
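To make the idea concrete, here is a minimal C++ sketch of such a “classic” node graph. The structure and names (`Node`, `Trigger`) are my own and purely illustrative: each node fires at most once, expired nodes are skipped forever, and execution walks the links linearly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A logic unit (node): it expires after firing once, and expired
// nodes are skipped forever during code execution.
struct Node {
    std::string name;
    bool expired = false;    // once true, the node is dead for good
    std::vector<int> links;  // indices of follow-up nodes in the graph
};

// Fire a node (if still alive), mark it expired, and walk its links
// linearly. Returns how many nodes actually executed.
int Trigger(std::vector<Node>& graph, int index) {
    Node& node = graph[index];
    if (node.expired) return 0;  // skipped: expired nodes never run again
    node.expired = true;
    int executed = 1;
    for (int next : node.links)  // linear, single-threaded flow
        executed += Trigger(graph, next);
    return executed;
}
```

Triggering a chain like open-chest → solve-puzzle → kill-enemy executes all three nodes once; triggering it again does nothing, because every node along the way has expired.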

Back to 2002 for a “classic” implementation: Flagger

During the “Culpa Innata” development sessions, we knew precisely that we needed a node graph architecture for handling the game’s complex execution flow. Many discussions were held on the method of implementation. All members of the core management & development team were expert electrical/electronics engineers with no experience in video game production [Reference] – except me! As a video game programmer, my perspective on node graph theory was naturally very different from their classical approaches. I wasn’t thinking in terms of voltage, current, etc.; I focused on just one thing: optimized code execution.

Thanks to my Zilog Z80 and Motorola 68000 assembly language programming background, I proposed the term “Flag” for the base logic unit (node), and teamed up with Mr. Mete Balcı for 3 weeks. In December 2002, we developed a tool called “Flagger”.

Pros and Cons

Flagger was a C++ code generator with a very handy visual interface, similar to UE4’s current Blueprint approach. Using Flagger, we were able to add nodes, connect them to each other, program the logic behind the nodes/links, and even print out the whole node graph scenario. When the visual logic design process was over, it was just a matter of selecting “Generate C++ code” from the menu, and the source code was generated within minutes.

Over the following years, Flagger evolved into a more sophisticated development tool capable of handling various scenarios. Although it was a very handy tool and saved many hours during the “Culpa Innata” sessions, there were a few problems with the classical node graph theory that the implementation was based on:

  Flags were single-threaded. Only one node was allowed to execute at a time. No multi-threading.

  Flags were expirable. When a task was done, the related flag (node) was marked as “expired”, not deleted, for the sake of logic integrity.

  Flags were not reusable. Once expired, there was no way of resurrecting them. – Inefficient memory usage, thanks to hundreds of expired nodes.

  Flags were heavily loaded with variables. Too many dialogue-related “customized” variables were defined for special cases (exceptions). – Inefficient memory usage, once again.

  Flag execution flow wasn’t well optimized, because of the node-tree search algorithm. The more nodes we had, the longer the search took.

  Flag execution was linear. When a node expired, the graph code first searched for related nodes and then retriggered the whole diagram from the beginning, like an electronic circuit simulator. – Well, that is ideal for modeling a circuit, not for developing a video game!

A Modern Approach: 3-bit Worker!

13 years later, I once again found an opportunity to dive into node graph theory, and I have just finished implementing a new architecture for my latest video game development project. Unlike Flagger, it is something extraordinary! It is very… atypical, unconventional, unorthodox… Well, whatever… You got it 😉

First of all, it has nothing to do with classical electric/electronic circuit theory. This time, I’m on my own, and approaching the problem as a software engineer. Everything I designed/coded is based on game requirement specifications. In other words, it is implemented with “practical usage” in mind.

  I have defined the basic logic unit (node) as a “worker”. (Due to functional similarities, I simply borrowed this term from Web Workers.)

  A worker is a background task with adjustable priority settings. It performs/responds like a hardware interrupt.

  Each worker is multi-threaded.

  Depending on conditional requirements, a worker can expire and/or live forever. If expired, it can be resurrected and/or reinitialized, while preserving its previous state. So, a worker is a 100% reusable node.

  Each worker uses only 3 bits! No additional variables, no references, nothing else. – (If necessary, a worker offers a flexible architecture for additional variables. However, I find it totally unnecessary. 3 bits are more than enough!)

  Workers are object oriented. They can easily be inherited.

  Inherited workers don’t need additional logic variables. All child workers share the same 3-bit information that they inherited from their parents!

  Each worker has a time-dependent linear workflow. Just like a reel-to-reel tape recorder, it can be played, paused, slowed down, accelerated, fast-forwarded, rewound, and stopped.

  Workers can be non-linearly linked to other workers! This means node-tree search algorithms are no longer necessary. There is no “main loop” for executing nodes! Code execution is pre-cached for optimum performance.

  Workers are optimized for event driven methodology. No matter how many concurrent active workers (threads) you have in the scene, there is practically no CPU overhead. Ideal for mobile scenarios.

  Workers are managed by “Managers”. A Manager is inherited from the base Worker node. So, any worker can be assigned as a Manager.

  Workers can communicate with each other and access shared variables via Managers.

  The whole architecture is 100% platform independent. As a showcase, I’ve implemented it for Unreal Engine 4 using C++ and Blueprints. It can easily be ported to other game engines, such as Unity, CryEngine, etc.

  And, most important of all, everything is meticulously tested. – It’s working as of today 🙂
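The post deliberately keeps the internals under wraps, so here is only a speculative C++ sketch of what a 3-bit worker state might look like. What the 3 bits actually encode is not specified; I am assuming three hypothetical flags – active, paused, expired – purely to illustrate how expiry, resurrection, and tape-recorder-style pausing could fit in 3 bits.

```cpp
#include <cstdint>

// Illustrative only: the real architecture's bit layout is not public.
// Here, three assumed flags are packed into the low bits of one byte.
struct Worker {
    std::uint8_t state = 0;  // only the low 3 bits are used

    static constexpr std::uint8_t kActive  = 1 << 0;
    static constexpr std::uint8_t kPaused  = 1 << 1;
    static constexpr std::uint8_t kExpired = 1 << 2;

    void Start()  { state |= kActive; state &= ~kExpired; }
    void Pause()  { state |= kPaused; }           // tape recorder: pause
    void Resume() { state &= ~kPaused; }          // tape recorder: play
    void Expire() { state |= kExpired; state &= ~kActive; }

    // Unlike a classic expirable flag, a worker can be resurrected
    // while its other bits (e.g. paused) are preserved.
    void Resurrect() { state &= ~kExpired; state |= kActive; }

    bool IsRunning() const { return (state & kActive) && !(state & kPaused); }
};
```

Because the state is a plain bitfield, an inherited (child) worker could share the same 3 bits as its parent simply by referencing the same byte, which is consistent with the “no additional logic variables” claim above.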

Any drawbacks?

Sure… Due to the complexity of comprehending “a set of non-linearly linked, time-dependent linear nodes”, debugging can be a nightmare. As always, designing simplified and organized logic sets reduces potential problems. – I keep my logic sets neat and tidy 😉

So, what’s next?

Well, to be honest, since all the theoretical work is done, I’ll switch to game content development. I am quite sure that I’ll keep adding things to (and removing things from) my 3-bit node graph architecture, and I will keep improving it while preserving its simplicity, for sure.

“It is vain to do with more what can be done with less.” – (William of Ockham)

New Video Game Project: Annual Information Update 2014

The new video game project that I started working on exactly a year ago is going great! For confidentiality reasons, I still can’t share specific details with you, but I am more than happy to say that everything is going “as planned”. – Something quite contrary to the nature of game development in general 😉

One for all, all for one

As the co-producer of the project, I have many responsibilities in addition to the usual things that I have to do: game design, story development, programming, conceptual artwork design, 3D modeling, texturing, music production, etc. Although it sounds like a one-man-army project, it actually is not.

“Only one artist takes all the responsibility…”

In order to preserve the game’s artistic style, it is quite normal that only one artist takes all the responsibility for designing & planning everything, and for making sure that things are kept/done that way. And this is exactly what I am doing nowadays. – (At some point, we will have developers and artists contributing to the project, naturally. Until that moment, everything must be “well-defined”.)

Coding

Instead of creating detailed game design documents, some game development projects begin with “conceptual coding”. Same goes for this project. Contrary to traditional game development workflow that begins with documenting the game design, I decided to start with implementing a proof of concept.

Similar to LEGO building bricks, I have been coding fundamental elements of “gameplay”. As a result of these coding sessions, I have clearly envisioned a number of next-gen features that can possibly enrich our game.

We are currently evaluating the options. When the gameplay implementation is over, I’ll go back to the game design document for sure. – (Yes, I know that it sounds a bit unorthodox, but I have my reasons. Sometimes it’s good to break old habits for the sake of creativity. In this game, I will let “gameplay” define and drive the game design!)

Spinners and Probability

Coding is all about making decisions. Getting your hands dirty in Mathematics has always been rewarding. Going back and forth between Calculus and Geometry is more than a stellar experience. Not because it makes you a better programmer, but simply because it turns you into a “wise decision maker”.

In terms of design and implementation, this game development project is full of complex decisions. Thankfully, “coding” is the glue between questions and answers. When used wisely, coding offers new ways of dealing with decisions that you derive from Mathematics, and this is exactly what I’m trying to achieve throughout this project.

Content is King!

I spent a lot of time creating a narrative hook, which I believe is the most underestimated element in today’s game design trends! With references from the 16-bit retro gaming era, I am quite sure that a well-defined hook creates a huge impact on gameplay.

“Admittedly, I had to make 7 revisions for a ‘great’ hook…”

It was a tough job. In order to fine-tune the hook, I had to rewrite it again and again. After each rewrite, I left it on my bookshelf for at least a few weeks, so that I could completely concentrate on other things. When I picked it up weeks later, I was objective enough to assess the tension and come up with fresh ideas. Each iteration added more flavour to the previous version. Admittedly, it took 7 revisions to reach a great hook, which later turned out to be “Level One”. – Worth every minute spent!

Hidden Treasure: “Workflow 2.0”

The most distinguishing element of this project is the optimized workflow that I have been working on as a side project for many years. Thanks to this workflow, our project will have the luxury of really dramatic cost savings, a more “talent-oriented” development process, and the ability to keep game design/style integrity throughout the development process.

So far, so good…

Still thousands of things to do, so I’m going back to work now.

I’ll keep you posted.

An unexpected surprise made my day!

Since the day I noticed his Star Wars, Alien and Predator sketches, I have always admired Tuncay Talayman’s artwork.

It has been a privilege –and a lot of fun– working with him during the Culpa Innata development sessions (2001-2003). Even after all those years, his continuous passion for improving his techniques and seeking new ways of artistic expression still surprises me. The portrait below is one of them 😉

What a lovely surprise… Thank you very much Tuncay!

Tuncay Talayman's portrait of Mert Börü

Kitaro’s “Symphony Live in Istanbul” CD announced

Recorded live at the Halic Congress Center in Istanbul, Turkey over two evenings in March 2014, Grammy and Golden Globe winning artist Kitaro‘s “Symphony Live in Istanbul” CD has been announced!

Commenting on the groundbreaking event, Kitaro noted “I am extremely grateful that my dream of performing in Istanbul finally came true. It was a once in a lifetime experience and in addition to my many experiences; I met a host of great people from Istanbul and the neighboring countries.  As a remembrance of my amazing music caravan and as a tribute to those I encountered along the way, I recorded this musical experience and performance as a CD.  It is my gift to everyone, in Istanbul and around the world, to experience and enjoy.”

Thanks to the once-in-a-lifetime concert experience we had in Istanbul, the Börü family is more than happy to pre-order the album 🙂

[ Börü family at Kitaro’s “Live in Istanbul” concert ]

This album includes Kitaro’s Golden Globe award-winning theme from the Oliver Stone film “Heaven & Earth”, music from his critically acclaimed Kojiki album and “Silk Road” soundtrack, as well as two compositions from his Grammy award-winning album “Thinking of You”. New material includes a previously unreleased composition, “Kokoro – (Part II)”.

Looking forward to seeing Kitaro in Istanbul, again…

The Blog of Mert Börü: Selected Works, Ongoing Projects, and Memories