前言
在 C++ 17 之前的标准中并不存在读写锁的对应封装,因此对于不同平台下需要分别进行实现,当然也可以采取 N2406 - Mutex, Lock, Condition Variable Rationale 的提案中的 shared_mutex 进行实现。这里采用 N2406 提案的方式使用 CAS 进行实现,记录下原子锁与内核管理的性能对比
实现流程
标准库提案
N2406 中 shared_mutex 是通过成员变量 unsigned state_ 记录读者的数量与写者的状态:
unsigned state_; |
这里将 state_ 的最高位作为了写者的标志,剩余 31 位则用于记录读者的数量,也就是说最大的读者数量为 $2^{31}-1$ ,虽然从现在的架构来说基本不会达到这个限制,但不过谁又能想到 IPV4 会耗尽呢?
N2406 中采用的是一个互斥锁+两个条件变量进行实现,这是写优先的实现策略,POSIX 中的实现和提案中的实现基本一致,同样都是避免写者饥饿的情况。因此原子的实现方式也会和提案中一致
原子锁的本质其实一种自旋锁,避免线程陷入内核态从而进行快速响应,因此读写两者加锁的流程其实十分的简单:
---
title: 读者加锁
---
stateDiagram-v2
direction LR
ReaderIncreaed: 读者计数增加
state IsWriterOccupied <>
[*] --> IsWriterOccupied : 开始自旋
IsWriterOccupied --> ReaderIncreaed: 未有写者占用
IsWriterOccupied --> IsWriterOccupied : 存在写者占用
ReaderIncreaed --> [*] : 结束自旋 成功加锁
---
title: 写者加锁
---
stateDiagram-v2
direction LR
SetWriterOccupied: 设置写者占用
state IsOtherWriterOccupied <>
state IsReadersEmpty <>
[*] --> IsOtherWriterOccupied : 开始自旋
IsOtherWriterOccupied --> IsOtherWriterOccupied : 存在其他写者占用
IsOtherWriterOccupied --> SetWriterOccupied : 不存在其他写者占用
SetWriterOccupied --> IsReadersEmpty
IsReadersEmpty --> IsReadersEmpty : 读者数量不为0
IsReadersEmpty --> [*] : 读者数量为0 结束自旋
对自旋的优化
由于自旋带来的会是一直占用 CPU 进行消耗,导致利用率过高,因此需要在合适的时候让出线程的控制权,将控制权交予内核让其他等待的线程继续执行
那如何让出控制权呢?
一般来说调用与线程相关的函数就可以,比如 yield 与 sleep,但两者又有区别,这里大致总结下区别(这里不做过多的分析,毕竟内核中的调度不是短短的一句话能够概括的)
usleep
usleep 的流程如下(方框为内核函数)
---
title: usleep 流程
---
flowchart LR
usleep(usleep) --> __nanosleep
__nanosleep(__nanosleep) --> __clock_nanosleep
__clock_nanosleep(__clock_nanosleep) --> __clock_nanosleep_time64
__clock_nanosleep_time64(__clock_nanosleep_time64) --> do_nanosleep
内核中的 do_nanosleep 会创建一个计时器交于对应的时间片管理器进行管理,这样线程就会暂停执行,也就让出了线程的空位交给其他线程继续执行
pthread_yield
pthread_yield 流程如下(方框为内核函数)
---
title: pthread_yield 流程
---
flowchart LR
pthread_yield(pthread_yield) --> sched_yield(sched_yield)
sched_yield --> do_sched_yield
内核中的 do_sched_yield 会将当前的 CPU 让步给其他任务,倘若没有其他任务等待则会退出阻塞并继续进行当前任务,但同时 man 手册中也有对应的提示:
NOTES |
sched_yield 和线程优先级有关,倘若那个时间段内只有当前线程优先级处于最高位,那么调用后不会让步给其他线程进行运行,也就是说明 yield 会让步给同优先等级的线程,同时也提到需要合适的调用,如果频繁进行调用反而会带来性能的下降
增加让出线程控制权逻辑
从 usleep 和 pthread_yield 结论上可以看出,在通常情况下在自旋锁中让出线程控制权最好的选择是 sleep ,因此对两者的逻辑增加对应的机制:
---
title: 读者加锁
---
stateDiagram-v2
direction LR
ReaderIncreaed: 读者计数增加
state IsWriterOccupied <>
[*] --> IsWriterOccupied : 开始自旋
IsWriterOccupied --> SpinLoopSleep : 写者占用
SpinLoopSleep --> IsWriterOccupied
IsWriterOccupied --> ReaderIncreaed: 写者未占用
ReaderIncreaed --> [*] : 结束自旋 成功加锁
state SpinLoopSleep {
direction LR
ThreadSleep : Sleep
state IsMaxTryCount <>
[*] --> IsMaxTryCount
IsMaxTryCount --> ThreadSleep : 到达最大尝试次数
IsMaxTryCount --> [*] : 未到达尝试最大次数
ThreadSleep --> [*]
}
---
title: 写者加锁
---
stateDiagram-v2
direction LR
IsWriterOccupied : 写者占用判断
state IsWriterOccupied {
SetWriterOccupied: 设置写者占用
state IsOtherWriterOccupied <>
[*] --> IsOtherWriterOccupied
IsOtherWriterOccupied --> SetWriterOccupied : 不存在其他写者占用
IsOtherWriterOccupied --> SpinLoop : 存在其他写者占用
SpinLoop --> IsOtherWriterOccupied
SetWriterOccupied --> [*]
SpinLoop : SpinLoopSleep
state SpinLoop {
ThreadSleep : Sleep
state IsMaxTryCount <>
[*] --> IsMaxTryCount
IsMaxTryCount --> [*] : 未到达最大尝试次数
IsMaxTryCount --> ThreadSleep : 到达最大尝试次数
ThreadSleep --> [*]
}
}
IsReaderOccupied : 读者占用判断
state IsReaderOccupied {
state IsReadersEmpty <>
[*] --> IsReadersEmpty
IsReadersEmpty --> SpinLoop_1 : 读者数量不为0
SpinLoop_1 --> IsReadersEmpty
IsReadersEmpty --> [*] : 读者数量为0
SpinLoop_1 : SpinLoopSleep
state SpinLoop_1 {
ThreadSleep_1 : Sleep
state IsMaxTryCount_1 <>
[*] --> IsMaxTryCount_1
IsMaxTryCount_1 --> [*] : 未到达最大尝试次数
IsMaxTryCount_1 --> ThreadSleep_1 : 到达最大尝试次数
ThreadSleep_1 --> [*]
}
}
[*] --> IsWriterOccupied : 开始自旋
IsWriterOccupied --> IsReaderOccupied
IsReaderOccupied --> [*] : 结束自旋 成功加锁
更进一步的优化
Intel 中的 PAUSE
除了让线程进行休眠,还有什么其他的方法呢?后面阅读 Nginx 的读写锁和自旋锁发现了一个有趣的函数 ngx_cpu_pause(),这个函数在其他平台下为空实现,其中在 x86 以及 amd64 平台均 使用了一个指令 pause :
// src/os/unix/ngx_gcc_atomic_amd64.h |
检索了下 Intel 手册中的描述:
Improves the performance of spin-wait loops. When executing a“spin-wait loop,”processors will suffer a severe performance penalty when exiting the loop because it detects a possible memory order violation. The
PAUSEinstruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that aPAUSEinstruction be placed in all spin-wait loops.
An additional function of thePAUSEinstruction is to reduce the power consumed by a processor while executing a spin loop. A processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop greatly reduces the processor’s power consumption.
这是英特尔专门为自旋提供的优化指令,手册中解释了由于在退出循环时 CPU 会检测到一个可能的内存乱序从而导致严重的性能开销,因此 pause 指令会提示 CPU 目前处于自旋状态从而避免 CPU 进行指令重排而带来的内存乱序问题,在提升性能的同时也可以降低功耗消耗。同时也提到了在早期的处理器型号中并不支持对应的指令,转而其实是等价的 rep;nop 指令,现在的 pause 的指令其实是将两个指令合并成了一个指令
那么其他平台是否也具有类似的指令?
Arm 中的 PAUSE
指令 YIELD
查了下 Arm 的官方手册,在 The YIELD instruction 中有提到:
The
YIELDinstruction provides a hint that the task performed by a thread is of low importance so that it could yield, seeYIELD. This mechanism can be used to improve overall performance in a Symmetric Multithreading (SMT) or Symmetric Multiprocessing (SMP) system.
Examples of when theYIELDinstruction might be used include a thread that is sitting in a spin-lock, or where the arbitration priority of the snoop bit in an SMP system is modified. TheYIELDinstruction permits binary compatibility between SMT and SMP systems.
TheYIELDinstruction is aNOPhint instruction.
TheYIELDinstruction has no effect in a single-threaded system, but developers of such systems can use the instruction to flag its intended use for future migration to a multiprocessor or multithreading system. Operating systems can useYIELDin places where a yield hint is wanted, knowing that it will be treated as aNOPif there is no implementation benefit.
同样 Arm 也提供了对应的指令来优化自旋,yield 指令会提示当前任务线程可以降低优先级从而让出执行空间,同时也说明了该指令其实和 nop 指令相同,而且该指令只在多线程与多进程的系统中才能发挥作用
指令 ISB
看上去,似乎选用 yield 就可以了,但是从 ARM 的 progress64 中找到了不一样的实现:
// src/arch/armv7a.h |
发现官方的实现并没有使用 yield 指令,而是使用了 isb 指令,isb 的指令手册上的内容如下:
An
ISBinstruction ensures that all instructions that come after theISBinstruction in program order are fetched from the cache or memory after theISBinstruction has completed. Using anISBensures that the effects of context-changing operations executed before theISBare visible to the instructions fetched after theISBinstruction. Examples of context-changing operations that require the insertion of anISBinstruction to ensure the effects of the operation are visible to instructions fetched after theISBinstruction are:
• Completed cache and TLB maintenance instructions.
• Changes to System registers.
Any context-changing operations appearing in program order after theISBinstruction take effect only after theISBhas been executed.
isb 指令全名叫做 Instruction Synchronization Barrier(指令同步屏障),它的作用是确保在程序顺序中位于 isb 指令之后的所有指令在 isb 指令完成之后才从缓存或内存中获取,同时也保证指令前后的上下文的可见性,对于 CPU 来说相当于刷新了流水线。官方只是解释了指令的作用并没有说明这样使用的原因,后面在 stackflow 上找到了关于 isb 的提问 assembly - Why does hint::spin_loop use ISB on aarch64?,根据回答中的 Rust 提交 [Arm64] use isb instruction instead of yield in spin loops · rust-lang/rust@c064b65 找到了对应的解释与说明:
[Arm64] use isb instruction instead of yield in spin loops |
从提交来看确实使用 isb 起到了更好的效果,当然对于实际效果需要进一步验证
最终的逻辑
根据上面的几点,不难完善最终的逻辑:
---
title: 读者加锁
---
stateDiagram-v2
direction LR
ReaderIncreaed: 读者计数增加
state IsWriterOccupied <>
[*] --> IsWriterOccupied : 开始自旋
IsWriterOccupied --> SpinLoopSleep : 写者占用
SpinLoopSleep --> IsWriterOccupied
IsWriterOccupied --> ReaderIncreaed: 写者未占用
ReaderIncreaed --> [*] : 结束自旋 成功加锁
state SpinLoopSleep {
direction LR
ThreadSleep : Sleep
ThreadDoze : Doze
state IsMaxTryCount <>
[*] --> IsMaxTryCount
IsMaxTryCount --> ThreadSleep : 到达最大尝试次数
IsMaxTryCount --> ThreadDoze : 未到达尝试最大次数
ThreadDoze --> [*]
ThreadSleep --> [*]
}
---
title: 写者加锁
---
stateDiagram-v2
direction LR
IsWriterOccupied : 写者占用判断
state IsWriterOccupied {
SetWriterOccupied: 设置写者占用
state IsOtherWriterOccupied <>
[*] --> IsOtherWriterOccupied
IsOtherWriterOccupied --> SetWriterOccupied : 不存在其他写者占用
IsOtherWriterOccupied --> SpinLoop : 存在其他写者占用
SpinLoop --> IsOtherWriterOccupied
SetWriterOccupied --> [*]
SpinLoop : SpinLoopSleep
state SpinLoop {
ThreadSleep : Sleep
ThreadDoze : Doze
state IsMaxTryCount <>
[*] --> IsMaxTryCount
IsMaxTryCount --> ThreadDoze : 未到达最大尝试次数
IsMaxTryCount --> ThreadSleep : 到达最大尝试次数
ThreadDoze --> [*]
ThreadSleep --> [*]
}
}
IsReaderOccupied : 读者占用判断
state IsReaderOccupied {
state IsReadersEmpty <>
[*] --> IsReadersEmpty
IsReadersEmpty --> SpinLoop_1 : 读者数量不为0
SpinLoop_1 --> IsReadersEmpty
IsReadersEmpty --> [*] : 读者数量为0
SpinLoop_1 : SpinLoopSleep
state SpinLoop_1 {
ThreadSleep_1 : Sleep
ThreadDoze_1 : Doze
state IsMaxTryCount_1 <>
[*] --> IsMaxTryCount_1
IsMaxTryCount_1 --> ThreadDoze_1 : 未到达最大尝试次数
IsMaxTryCount_1 --> ThreadSleep_1 : 到达最大尝试次数
ThreadDoze_1 --> [*]
ThreadSleep_1 --> [*]
}
}
[*] --> IsWriterOccupied : 开始自旋
IsWriterOccupied --> IsReaderOccupied
IsReaderOccupied --> [*] : 结束自旋 成功加锁