Game Engine Architecture 4

May 31, 2023, 5:26 AM • Tech Notes • 88 views

1、A model of multiple semi-independent flows of control simply matches the problem better than a single flow-of-control design.

2、There are two basic ways in which concurrent threads can communicate:

• Message passing. The messages might be sent across a network, passed between processes using a pipe, or transmitted via a message queue in memory that is accessible to both sender and receiver.

• Shared memory. Two or more threads are granted access to the same block of physical memory, and can therefore operate directly on any data objects residing in that memory area. Threads within different processes can also share memory by mapping certain physical memory pages into all of the processes’ virtual address spaces.

3、Race Conditions

A race condition is defined as any situation in which the behavior of a program is dependent on timing.

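A race condition occurs when a device or system executes with improper timing and therefore produces an incorrect result. From the point of view of inter-process communication, a race condition is a situation in which two or more processes read or write shared data and the final result depends on the order in which those processes execute.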

4、Critical Races

A critical race is a race condition that has the potential to cause incorrect program behavior. Symptoms of critical races include:

• intermittent or seemingly random bugs or crashes,
• incorrect results,
• data structures that get into corrupted states,
• bugs that magically disappear when you switch to a debug build,

• bugs that are around for a while, and then go away for a few days, only to return again (usually the night before E3!),

• bugs that go away when logging (a.k.a., “printf() debugging”) is added to the program in an attempt to discover the source of the problem.

5、Data Races

A data race is a critical race condition in which two or more flows of control interfere with one another while reading and/or writing a block of shared data, resulting in data corruption.

The example referred to here is a read-modify-write (RMW) operation: a value is read from memory, modified, and written back (incrementing a shared counter is the typical case).

Data race bugs only occur when an operation on a shared object is interrupted by another operation on that same object.

Let’s use the term critical operation to refer to any operation that can possibly read or mutate one particular shared object. To guarantee that the shared object is free from data race bugs, we must ensure that none of its critical operations can interrupt one another.

When a critical operation is made uninterruptable in this manner, it is called an atomic operation. Alternatively, we can say that such an operation has the property of atomicity.

6、Invocation and Response

7、Atomicity Defined

A data race bug can occur when a critical operation is interrupted by another critical operation on the same shared object. This can happen:

• when one thread preempts another on a single core, or

• when two or more critical operations overlap across multiple cores.

8、define the atomicity of a critical operation as follows:

9、Making an Operation Atomic

How can we transform a critical operation into an atomic operation? The easiest and most reliable way to accomplish this is to use a special object called a mutex.

The OS guarantees that a mutex can only be acquired by one thread at a time.

10、Thread Synchronization Primitives

While these thread synchronization primitives are robust and relatively easy to use, they are generally quite expensive. This is because these tools are provided by the kernel. Interacting with any of them therefore requires a kernel call, which involves a context switch into protected mode. Such context switches can cost upwards of 1000 clock cycles.

11、Mutex

A mutex can be in one of two states: unlocked or locked (sometimes called released and acquired, or signaled and nonsignaled, respectively).

The name “mutex” comes from “mutual exclusion.”

If one or more other threads are asleep (blocked) waiting on the mutex, the act of signaling it causes the kernel to select one of these waiting threads and wake it up.

12、Starting with C++11, the C++ standard library exposes kernel mutexes via the class std::mutex.
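As a minimal sketch of how std::mutex turns a critical operation into an atomic one, the following guards a shared counter. The counter, the function names and the thread counts are illustrative assumptions, not code from the book.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared object and the mutex that protects it.
static int g_count = 0;
static std::mutex g_countMutex;

// The critical operation: a read-modify-write on g_count.
void IncrementShared(int times)
{
    for (int i = 0; i < times; ++i)
    {
        // lock_guard acquires the mutex here and releases it
        // automatically at the end of each loop iteration
        std::lock_guard<std::mutex> lock(g_countMutex);
        ++g_count;
    }
}

// Starts numThreads threads that each increment the counter
// `times` times, waits for them, and returns the final count.
int RunCounterDemo(int numThreads, int times)
{
    g_count = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back(IncrementShared, times);
    for (std::thread& th : threads)
        th.join();
    return g_count;
}
```

With the lock in place, RunCounterDemo(4, 10000) returns 40000. Without it the increments could interleave, which is exactly the data race described earlier.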

13、Some operating systems provide less-expensive alternatives to a mutex. For example, Microsoft Windows provides a locking mechanism known as a critical section.

When a thread first attempts to enter (lock) a critical section that is already locked by another thread, an inexpensive spin lock is used to wait until the other thread has left (unlocked) that critical section. A spin lock does not require a context switch into the kernel, making it a few thousand clock cycles cheaper than a mutex.

Linux supports a mechanism called a “futex” that acts somewhat like a critical section under Windows.

14、Condition Variable (CV)

We’d like a way to block the consumer thread (put it to sleep) while the producer does its work, and then wake it up when the data is ready to be consumed. This can be accomplished by making use of a new kind of kernel object called a condition variable (CV).

In concurrent programming, we often need to send signals between threads in order to synchronize their activities.

3） wait(). A blocking function that puts the calling thread to sleep.

4） notify(). A non-blocking function that wakes up any threads that are currently asleep waiting on the condition variable.

The sleep and wake operations are performed in an atomic way with the help of a mutex provided by the program, plus a little help from the kernel.
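The wait/notify pairing can be sketched with the C++ standard library's std::condition_variable. This is only an illustration under assumed names (g_cvReady, g_cvData, Producer, Consumer); the book's own example uses a different API.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

static std::mutex g_cvMutex;             // mutex supplied by the program
static std::condition_variable g_cv;
static bool g_cvReady = false;           // the condition being waited on
static int g_cvData = 0;

void Producer()
{
    {
        // modify the shared state while holding the mutex
        std::lock_guard<std::mutex> lock(g_cvMutex);
        g_cvData = 42;
        g_cvReady = true;
    }
    // wake up a thread that is asleep in wait()
    g_cv.notify_one();
}

int Consumer()
{
    std::unique_lock<std::mutex> lock(g_cvMutex);
    // wait() releases the mutex while asleep and re-acquires it on
    // wake-up; the predicate guards against spurious wake-ups
    g_cv.wait(lock, [] { return g_cvReady; });
    return g_cvData;
}

// Runs one consumer and one producer; returns what was consumed.
int RunCvDemo()
{
    g_cvReady = false;
    g_cvData = 0;
    int result = 0;
    std::thread consumer([&result] { result = Consumer(); });
    std::thread producer(Producer);
    producer.join();
    consumer.join();
    return result;
}
```

The consumer sleeps instead of spinning, and the predicate form of wait() means the order in which the two threads start does not matter.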

15、Semaphores

A semaphore acts like an atomic counter whose value is never allowed to drop below zero.

3） take() or wait(). If the counter value encapsulated by a given semaphore is greater than zero, this function decrements the counter and returns immediately. If its counter value is currently zero, this function blocks (puts the thread to sleep) until the semaphore’s counter rises above zero again.

4） give(), post() or signal(). Increments the encapsulated counter value by one, thereby opening up a “slot” for another thread to take() the semaphore. If a thread is currently asleep waiting on the semaphore when give() is called, that thread will wake up from its call to take() or wait().

We say that a semaphore is signaled whenever its count is greater than zero, and it is nonsignaled when its counter is equal to zero.

Producer-consumer example: this notification mechanism can be implemented using two binary semaphores. One counts how many items are in the buffer; the other counts how much room is left.

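#include <semaphore.h> // POSIX sem_t, sem_wait(), sem_post()

// Queue, Item, ProduceItem(), AddItemToQueue(),
// RemoveItemFromQueue() and ConsumeItem() are assumed
// to be defined elsewhere
Queue g_queue;
sem_t g_semUsed; // initialized to 0
sem_t g_semFree; // initialized to 1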

void* ProducerThreadSem(void*)
{
    // keep on producing forever...

    while (true)
    {
        // produce an item (can be done non-
        // atomically because it's local data)
        Item item = ProduceItem();

        // decrement the free count
        // (wait until there's room)
        sem_wait(&g_semFree);

        AddItemToQueue(&g_queue, item);

        // increment the used count
        // (notify consumer that there's data)
        sem_post(&g_semUsed);
    }
    return nullptr;
}

void* ConsumerThreadSem(void*)
{
    // keep on consuming forever...

    while (true)
    {
        // decrement the used count
        // (wait for the data to be ready)
        sem_wait(&g_semUsed);
        Item item = RemoveItemFromQueue(&g_queue);

        // increment the free count
        // (notify producer that there's room)
        sem_post(&g_semFree);

        // consume the item (can be done non-
        // atomically because it's local data)
        ConsumeItem(item);
    }
    return nullptr;
}


16、Implementing a Semaphore

#include <pthread.h>

class Semaphore
{
private:
    int m_count;
    pthread_mutex_t m_mutex;
    pthread_cond_t m_cv;

public:
    explicit Semaphore(int initialCount)
    {
        m_count = initialCount;
        pthread_mutex_init(&m_mutex, nullptr);
        pthread_cond_init(&m_cv, nullptr);
    }
    void Take()
    {
        pthread_mutex_lock(&m_mutex);

        // put the thread to sleep as long as
        // the count is zero
        while (m_count == 0)
            pthread_cond_wait(&m_cv, &m_mutex);

        --m_count;

        pthread_mutex_unlock(&m_mutex);
    }

    void Give()
    {
        pthread_mutex_lock(&m_mutex);
        ++m_count;

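        // always wake up one waiting thread (if any);
        // signaling only when the count becomes 1 can
        // lose a wake-up when several threads are waiting
        // and Give() is called twice in quick succession
        pthread_cond_signal(&m_cv);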

        pthread_mutex_unlock(&m_mutex);
    }

    // aliases for other commonly-used function names
    void Wait() { Take(); }
    void Post() { Give(); }
    void Signal() { Give(); }
    void Down() { Take(); }
    void Up() { Give(); }
    void P() { Take(); } // Dutch "proberen" = "test"
    void V() { Give(); } // Dutch "verhogen" = "increment"
};


17、Deadlock

If the dependency graph contains a cycle, there must be a deadlock.

There are four necessary and sufficient conditions for deadlock, known as the Coffman conditions:

1）Mutual exclusion

2）Hold and wait

3）No lock preemption: No one (not even the kernel) is allowed to forcibly break a lock held by a sleeping thread.

4）Circular wait.

If Resource A and Resource B were both protected by a single lock L, no circular wait could form and deadlock could not occur.

The circular wait condition can be avoided by imposing a global order to all lock-taking in the system.
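One way to impose such a global order is to always take locks in order of their addresses. The sketch below assumes a hypothetical Account type and Transfer function invented for illustration; it is one possible ordering scheme, not the book's.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical resource with its own lock.
struct Account
{
    std::mutex mutex;
    int balance = 0;
};

// Moves `amount` between two distinct accounts. Both locks are
// always taken lowest-address-first, so two threads can never
// hold one lock each while waiting for the other.
void Transfer(Account& from, Account& to, int amount)
{
    assert(&from != &to); // locking the same mutex twice would deadlock

    Account* first = &from;
    Account* second = &to;
    if (std::less<Account*>()(second, first))
        std::swap(first, second);

    std::lock_guard<std::mutex> lockFirst(first->mutex);
    std::lock_guard<std::mutex> lockSecond(second->mutex);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions; returns the sum of
// both balances, which the transfers must leave unchanged.
int RunTransferDemo()
{
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&a, &b] { for (int i = 0; i < 1000; ++i) Transfer(a, b, 1); });
    std::thread t2([&a, &b] { for (int i = 0; i < 1000; ++i) Transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

If each thread instead locked `from` first and `to` second, the two threads would acquire the locks in opposite orders and could deadlock.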

18、Livelock

Consider two threads, 1 and 2, contending over two resources A and B. Whenever a thread is unable to obtain a lock, it releases any locks it already holds and waits for a fixed timeout before trying again. If both threads use the same timeout, we can get into a situation in which the same degenerate situation simply repeats over and over. Our threads become “stuck” forever trying to resolve the conflict, and neither one ever gets a chance to do its real job. Livelock is akin to a stalemate in chess.

19、Starvation

Starvation is defined as any situation in which one or more threads fail to receive any execution time on the CPU.

20、Priority Inversion

Consider two threads, L and H, with a low and high priority, respectively. Thread L takes a mutex lock and then is preempted by H. If H attempts to take this same lock, then H will be put to sleep because L already holds the lock. This permits L to run even though it is lower priority than H—in violation of the principle that lower-priority threads should not run while a higher-priority thread is runnable.

21、Transaction-Based Algorithms

A transaction can be more precisely defined as an indivisible bundle of resources and/or operations. Threads in a concurrent system submit transaction requests to a central arbiter of some kind. A transaction either succeeds in its entirety, or it fails in its entirety (because some other thread’s transaction is actively being processed when the request arrives). If the transaction fails, its thread keeps resubmitting the transaction request until it succeeds (possibly waiting for a short time between retries).

22、Minimizing Contention

Consider a group of threads that are producing data and storing it into a central repository. Every time one of these threads attempts to store its data in the repository, it contends with all of the other threads for this shared resource.

A simple solution that can sometimes work is to give each thread its own private repository. The threads can now produce data independently of one another, with no contention.
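A rough sketch of the private-repository idea follows. The function name ProduceInParallel and the "data" being produced (plain integers) are assumptions made for illustration; the merge step at the end is one possible way to collect the results once the threads are done.

```cpp
#include <thread>
#include <vector>

// Each thread appends only to its own private vector, so no lock
// is needed while producing. The private repositories are merged
// into a central one after all producers have finished.
std::vector<int> ProduceInParallel(int numThreads, int itemsPerThread)
{
    std::vector<std::vector<int>> privateRepos(numThreads);
    std::vector<std::thread> threads;

    for (int t = 0; t < numThreads; ++t)
    {
        threads.emplace_back([&privateRepos, t, itemsPerThread] {
            for (int i = 0; i < itemsPerThread; ++i)
                privateRepos[t].push_back(t * itemsPerThread + i);
        });
    }
    for (std::thread& th : threads)
        th.join();

    // single-threaded merge: no contention here either
    std::vector<int> central;
    for (const std::vector<int>& repo : privateRepos)
        central.insert(central.end(), repo.begin(), repo.end());
    return central;
}
```

The trade-off is that the data only becomes visible in the central repository after the merge, which is acceptable only when consumers do not need it earlier.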

23、Lock-Free Concurrency

“Lock-free” refers to the practice of preventing threads from going to sleep while waiting on a resource to become available. In other words, in lock-free programming we never allow a thread to block. So perhaps the term “blocking-free” would have been more descriptive.

Lock-free programming is actually just one of a collection of non-blocking concurrent programming techniques.

1）Blocking

2）Obstruction freedom: an obstruction-free algorithm guarantees that a single thread will always complete its work in a bounded number of steps when all of the other threads in the system are suddenly suspended. No algorithm that uses a mutex lock or spin lock can be obstruction-free.

No matter when the other threads are halted, this thread can still run to completion.

Threads that use a mutex lock or spin lock cannot satisfy obstruction freedom.

3）Lock freedom: if one thread is arbitrarily suspended, all others can still make progress. Lock freedom can allow some threads to starve. In other words, certain threads might get stuck in a loop of failing and retrying their transactions indefinitely, while other threads’ transactions always succeed.

No matter when this thread is halted, the other threads are not affected.

Threads that use a mutex lock or spin lock cannot satisfy lock freedom.

4）Wait freedom: a wait-free algorithm provides all the guarantees of lock freedom, but also guarantees starvation freedom.

The term “lock-free programming” is sometimes used loosely to refer to any algorithm that avoids blocking, but technically speaking the correct term for obstruction-free, lock-free and wait-free algorithms as a whole is “non-blocking algorithm.”

23.1、Causes of Data Race Bugs

Data race bugs can be introduced into a concurrent program:

• via the interruption of one critical operation by another,

• by the instruction reordering optimizations performed by the compiler and CPU, and

• as a result of hardware-specific memory ordering semantics.

24、Atomicity by Disabling Interrupts

To prevent other threads from interrupting our operation, we could try disabling interrupts just prior to performing the operation, making sure to reenable them after the operation has been completed.

Interrupts are disabled by executing a machine language instruction (such as cli, “clear interrupt enable bit,” on an Intel x86 architecture). But this kind of instruction only affects the core that executed it.

25、Atomic Instructions

There are most certainly some machine language instructions that can never be assumed to execute atomically. Other instructions are atomic, but only when operating on certain kinds of data.

Some CPUs permit virtually any instruction to be forced to execute atomically by specifying a prefix on the instruction in assembly language. (The Intel x86 ISA’s lock prefix is one example.)

In fact, it is the existence of these atomic instructions that permits us to implement atomicity tools such as mutexes and spin locks.

26、Atomic Reads and Writes

Misaligned reads and writes usually don’t have this atomicity property. This is because in order to read or write a misaligned object, the CPU usually composes two aligned memory accesses.

27、Atomic Read-Modify-Write

All modern CPUs support concurrency by providing at least one atomic read-modify-write (RMW) instruction.

The simplest RMW instruction is known as test-and-set (TAS). Despite its name, it does not really test anything; it atomically sets a Boolean variable to 1 (true) and returns its previous value.

TAS sets the variable to true and then returns the previous value. When TAS is used to build a lock, multiple threads compete, but only one thread can observe pLock as false; all the others observe true, which guarantees that only one thread runs in the protected region at any moment.
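A TAS-based spin lock can be sketched in standard C++ with std::atomic_flag, whose test_and_set() member has the semantics just described. The class name TasSpinLock and the demo function are assumptions for illustration; the book's own listing works on a raw pLock pointer and is not reproduced here.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Spin lock built on test-and-set. test_and_set() atomically sets
// the flag to true and returns the value it held before.
class TasSpinLock
{
public:
    void Acquire()
    {
        // Only the thread that gets "false" back has won the lock;
        // every other thread gets "true" and keeps retrying.
        while (m_flag.test_and_set())
            std::this_thread::yield();
    }

    void Release()
    {
        m_flag.clear(); // set the flag back to false
    }

private:
    std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
};

// Uses the lock to protect a plain int incremented by many threads.
int RunTasDemo(int numThreads, int times)
{
    TasSpinLock lock;
    int count = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
    {
        threads.emplace_back([&lock, &count, times] {
            for (int i = 0; i < times; ++i)
            {
                lock.Acquire();
                ++count;
                lock.Release();
            }
        });
    }
    for (std::thread& th : threads)
        th.join();
    return count;
}
```

The default (sequentially consistent) memory ordering is used throughout to keep the sketch simple; a tuned implementation would likely relax it.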

28、Exchange

Some ISAs like Intel x86 offer an atomic exchange instruction. This instruction swaps the contents of two registers, or a register and a location in memory.

Used for locking, exchange serves the same purpose as TAS: pLock is set to true on every attempt, and only one thread can obtain a pLock whose previous value was false.
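The same lock can be sketched with an atomic exchange, here via std::atomic<bool>::exchange(). The class and method names are hypothetical; the compiler decides which machine instruction is actually emitted.

```cpp
#include <atomic>
#include <thread>

// Spin lock built on atomic exchange: swap "true" into the lock
// variable and look at what was there before.
class ExchangeSpinLock
{
public:
    // Returns true if this call obtained the lock.
    bool TryAcquire()
    {
        // exchange() stores true and returns the previous value;
        // a previous value of false means the lock was free
        return m_locked.exchange(true) == false;
    }

    void Acquire()
    {
        while (!TryAcquire())
            std::this_thread::yield();
    }

    void Release()
    {
        m_locked.store(false);
    }

private:
    std::atomic<bool> m_locked{false};
};
```

A second TryAcquire() on a held lock returns false but still writes true, which is harmless because the lock variable already holds true.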

29、Compare and Swap

The behavior of the CAS instruction is illustrated by the following pseudocode:

The semantics of CAS bind the subsequent write to the preceding read.

Implementing a spin lock with CAS.

Implementing an atomic increment with CAS.

On the Intel x86 ISA, the CAS instruction is called cmpxchg, and it can be emitted with Visual Studio’s _InterlockedCompareExchange() compiler intrinsic.
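The three CAS listings are not reproduced in these notes, so the following is a hedged reconstruction of the ideas using std::atomic. CasSemantics only spells out what the instruction does (it is ordinary, non-atomic code), and every name here is an assumption made for illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// What CAS does, written as plain code for illustration only.
// The real instruction performs these steps atomically.
bool CasSemantics(int* pValue, int expectedValue, int newValue)
{
    if (*pValue == expectedValue)
    {
        *pValue = newValue;
        return true;
    }
    return false;
}

// Spin lock: succeed only if the lock goes from false to true.
class CasSpinLock
{
public:
    void Acquire()
    {
        bool expected = false;
        while (!m_locked.compare_exchange_strong(expected, true))
        {
            // on failure `expected` was overwritten with the
            // current value (true), so reset it before retrying
            expected = false;
            std::this_thread::yield();
        }
    }
    void Release() { m_locked.store(false); }

private:
    std::atomic<bool> m_locked{false};
};

// Atomic increment: read, compute, then write only if nobody
// changed the value in between; otherwise retry.
void AtomicIncrementCas(std::atomic<int>& value)
{
    int oldValue = value.load();
    while (!value.compare_exchange_strong(oldValue, oldValue + 1))
    {
        // oldValue now holds the freshly observed value; loop again
    }
}

int RunCasDemo(int numThreads, int times)
{
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back([&counter, times] {
            for (int i = 0; i < times; ++i)
                AtomicIncrementCas(counter);
        });
    for (std::thread& th : threads)
        th.join();
    return counter.load();
}
```

The retry loop is the transaction-style pattern from earlier: the increment either commits in full or fails and is resubmitted.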

30、ABA Problem

31、Load Linked/Store Conditional

LL/SC can solve the ABA problem.

The load linked instruction reads the value of a memory location atomically, and also stores the address in a special CPU register known as the link register.

The store conditional instruction writes a value into the given address, but only if the address matches the contents of the link register. It returns true if the write succeeded, or false if it failed. Any write operation on the bus (including a store conditional) clears the link register to zero.

This means that an LL/SC instruction pair is capable of detecting data races, because if any write occurs between the LL and SC instructions, the SC will fail.

Here’s how we’d implement an atomic increment using LL/SC：

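CAS compares against oldValue and writes newValue only once that comparison succeeds, whereas _sc compares against the link register and writes newValue only once that comparison succeeds.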
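LL/SC instructions are not directly available in portable C++, so the sketch below is only a single-threaded model of the behavior described above: one memory word, plus a flag standing in for the link register. LlScModel and its member names are invented for illustration and do not correspond to any real intrinsic.

```cpp
#include <cstdint>

// Toy model of one memory location and one core's link register.
struct LlScModel
{
    std::int32_t memory = 0;  // the shared memory location
    bool linkValid = false;   // stands in for the link register

    // Load linked: read the value and remember the address.
    std::int32_t LoadLinked()
    {
        linkValid = true;
        return memory;
    }

    // Any other write on the bus clears the link register.
    void BusWrite(std::int32_t value)
    {
        memory = value;
        linkValid = false;
    }

    // Store conditional: write only if the link is still intact.
    bool StoreConditional(std::int32_t value)
    {
        if (!linkValid)
            return false;
        memory = value;
        linkValid = false; // an SC is itself a write on the bus
        return true;
    }
};

// Atomic increment in the LL/SC style: retry until the SC succeeds.
void AtomicIncrementLlSc(LlScModel& m)
{
    while (true)
    {
        const std::int32_t oldValue = m.LoadLinked();
        if (m.StoreConditional(oldValue + 1))
            break;
    }
}
```

In this model, if the location is written from A to B and back to A between the LL and the SC, the SC still fails because the link was cleared, whereas a value comparison as used by CAS would not notice the change.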

32、Advantages of LL/SC over CAS

First, because the SC instruction fails whenever any write is performed on the bus, an LL/SC pair is not prone to the ABA problem.

33、Strong and Weak Compare-Exchange

Because of the possibility of spurious failures of the store-conditional instruction, C++11 provides two varieties of compare-exchange: strong and weak. Strong compare-exchange “hides” spurious SC failures from the programmer, while weak compare-exchange does not.
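As a rough guide to how the two varieties tend to be used: the weak form fits naturally inside a retry loop, where a spurious failure just costs one more iteration, while the strong form suits a one-shot attempt. The two helper functions below are hypothetical examples, not taken from the book.

```cpp
#include <atomic>

// Retry loop: compare_exchange_weak may fail spuriously, but the
// loop simply tries again. Returns the new (doubled) value.
int AtomicDouble(std::atomic<int>& value)
{
    int oldValue = value.load();
    while (!value.compare_exchange_weak(oldValue, oldValue * 2))
    {
        // on failure oldValue is refreshed with the current value
    }
    return oldValue * 2;
}

// One-shot attempt: a spurious failure would wrongly report that
// the slot is taken, so the strong form is used. An owner value of
// 0 is assumed to mean "unclaimed".
bool TryClaim(std::atomic<int>& owner, int myId)
{
    int expected = 0;
    return owner.compare_exchange_strong(expected, myId);
}
```

On hardware with a native CAS instruction the two forms typically compile to the same code; the distinction matters mainly on LL/SC architectures.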

34、How Instruction Reordering Causes Concurrency Bugs

The thread synchronization primitives provided by the operating system (mutexes et al.) are carefully crafted to avoid the concurrency bugs that can be caused by instruction reordering optimizations. But now that we’re investigating how mutexes are implemented, let’s take a look at how to avoid these problems manually.

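Primitives such as mutexes avoid the problems caused by instruction reordering optimizations. The text that follows explains how to avoid reordering problems manually, without using a mutex or similar primitive.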

int32_t g_data = 0;
int32_t g_ready = 0;

void ProducerThread()
{
    // produce some data
    g_data = 42;

    // inform the consumer
    g_ready = 1;
}

void ConsumerThread()
{
    // wait for the data to be ready
    while (!g_ready)
        PAUSE();

    // consume the data
    ASSERT(g_data == 42);
}

There’s nothing to prevent the compiler or the CPU’s out-of-order execution logic from reordering the producer’s write of 1 into g_ready so that it occurs before the write of 42 into g_data. Likewise, in theory the compiler could reorder the consumer’s check that g_data is equal to 42 so that it happens before the while loop. So even though all of our reads and writes are atomic, this code may not behave reliably.

Reordering actually takes place at the machine-language level.

If a second thread were waiting for B to become zero before reading the value of A, it would cease to function correctly if this compiler optimization were to be applied.

35、Volatile in C/C++ (and Why It Doesn’t Help Us)

In C and C++, the volatile type qualifier guarantees that consecutive reads or writes of a variable cannot be “optimized away” by the compiler.

The only guarantee it provides is that the contents of a variable marked volatile won’t be cached in a register—the variable’s value will be re-read directly from memory every time it’s accessed.

Moreover, the volatile keyword in C/C++ does nothing to prevent the CPU’s out-of-order execution logic from reordering the instructions at runtime.

36、Compiler Barriers

With GCC, a compiler barrier can be inserted via some inline assembly syntax; under Microsoft Visual C++, the compiler intrinsic _ReadWriteBarrier() has the same effect.

Most function calls serve as an implicit compiler barrier. This makes sense, because the compiler doesn’t know anything about the side effects of a function call.

However, compiler barriers don’t prevent the CPU’s out-of-order execution logic from reordering instructions at runtime.

We’ll learn about a collection of machine language instructions known as memory fences, which serve as instruction reordering barriers for both the compiler and the CPU.
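A compiler barrier can be sketched as follows. The empty asm statement with a "memory" clobber is the usual GCC/Clang idiom, and _ReadWriteBarrier() is the Visual C++ intrinsic named above; the COMPILER_BARRIER macro, the variable names and ProducerWrite are assumptions made for this illustration.

```cpp
#include <cstdint>

#if defined(_MSC_VER)
    #include <intrin.h>
    #define COMPILER_BARRIER() _ReadWriteBarrier()
#else
    // Empty inline assembly that tells the compiler memory may
    // have changed, so it must not move loads/stores across it.
    #define COMPILER_BARRIER() asm volatile("" ::: "memory")
#endif

std::int32_t g_barrierData = 0;
std::int32_t g_barrierReady = 0;

void ProducerWrite()
{
    // produce some data
    g_barrierData = 42;

    // keep the compiler from moving the next store above this
    // point; this emits no instruction and does NOT constrain
    // the CPU's own out-of-order execution
    COMPILER_BARRIER();

    // inform the consumer
    g_barrierReady = 1;
}
```

Because the barrier only restrains the compiler, this alone is not enough to make the producer/consumer example correct on hardware that reorders stores.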

Original: https://www.cnblogs.com/tekkaman/p/10519474.html
Author: Tekkaman
Title: Game Engine Architecture 4
