cs431课程小结 | jsjtxietian's blog

cs431课程小结

2025-07-15 11:06:07 #Performance #GameDev

年初给自己2025年定的技术上的目标是继续深入研究Concurrent Programming，从整理自己已经学过的开始吧。
cs431 这门课是去年年初看的，那时候正好在学rust，加上本身我对这些也感兴趣就头铁学习了一波，正好借此机会温习下课程内容，和自己学的其他东西对照起来整理下。整体来说这课还是很适合自学的，除了github仓库以外，有课件，有youtube视频。我自己也做了大部分的作业，感觉也是颇有收获。

Lock-based concurrency

Lock-based shared-memory concurrency: at any moment, a location is accessed by a single agent. Why are locks “easy” ?

Lock constrains thread interleaving to acquire/release points => No interleaving between acquire and release
Lock removes instruction reordering => Threads are executed AS IF they are the same thread.
Lock reduces nondeterminism to the minimum.

We want an easy-to-use, always-safe high-level API

LockGuard : RAII
Lock< T > (= (Lock, T)): relating a lock and a resource with a new type

Rust API

然后话题就到了Rust上，解释了下为啥是Best fit for this course: Rust’s type system based on ownership and lifetime, the essence of concurrency (The principal challenge in the era of massive parallelism is managing shared mutable states). 之后正常介绍了下Rust：

Key ideas:
- Discipline: disallowing shared mutable accesses by default
- Interior mutability: allowing them in a controlled way
Benefits:
- Statically analyzing the safety of shared mutable accesses (both for sequential and concurrent programs)
- Explicitly marking those code that needs manual inspection

接着就是介绍了下Rust有哪些safe的api: Thread & ScopedThread, Arc, Send & Sync，Lock & LockGuard，Mutex & Condvar & RwLock, 以及crossbeam的一些api。也提供了之前的课件：rust-school-2023-snu - Google Slides

Lock

接下来是一些简单的实现，包括Spinlock, ticket lock, CLH lock, MCS lock, MCS parking lock等等。

TODO

Fine-grained locking

TODO

看得出来老师很喜欢这篇paper：When Concurrency Matters: Behaviour-Oriented Concurrency | Proceedings of the ACM on Programming Languages，我准备在cs431的作业实现的部分整理。

Managing multiple locks with BoC

论文在这里：When Concurrency Matters: Behaviour-Oriented Concurrency | Proceedings of the ACM on Programming Languages

Optimistic locking

Observation: frequently, multiple operations are not conflicting with each other

Idea: optimistically assumes the success of operation, posthumously recovers

Sequence lock ( Optimistic reader-writer lock): cs431/src/lock/seqlock.rs at main · kaist-cp/cs431

Writer: almost the same with spinlock

Managing sequence number (usize) instead of lock flag (boolean)
Even: consistent state between c.s., odd: inconsistent state inside c.s.
E.g. W(0): acquires 0->1, releases 2,
W(2): acquires 2->3, releases 4, …

Reader: trying to read a consistent state (e.g. 2)

Reading sequence number @ beginning & end (should be even & same)
Req 1: W(0)’s end happens before R(2)’s beginning
Req 2: R(2) doesn’t see W(2)’s writes as far as R(2) is validated

Conclusion

Performance drawbacks of fine-grained locking:

Write operations have heavy synchronization costs due to lock ops
Read operations also writes to lock leading to cache invalidation

Alternative 1: advanced locks for read-mostly workloads: reader-writer lock (“Rust concurrency libraries”), optimistic locking, … (still unscalable for writers)

Alternative 2: “lock-free” concurrent data structures:

Pros: even better performance: Write operations have lightweight synchronization costs, Read operations perform just reads w/o cache invalidation
Cons: much (×100) more difficult than lock-based concurrency…

Lock-free concurrency

如果还要更好的性能，就要进入lock free的领域了。首先就是给定义 ( Optimistic Locking Overview )：

Lock’s limitations: no guarantee of progress
Lock freedom: one of the ongoing operation is eventually completed, the whole system’s progress is guaranteed
Wait freedom: every ongoing operation is eventually completed, example: chaoran.me/assets/pdf/wfq-ppopp16.pdf

Key idea for lock freedom是single-instruction commit. Designating an RMW instruction as an operation’s commit point: atomically reading from & writing to a location (e.g., CAS), single-instruction commit is RMW’s motivation!

Data structures

Treiber’s stack

Michael-Scott’s queue

Push:

Push 1: find the actual tail: Update the “tail” pointer if necessary
Push 2: try to append a new node: If it fails, retry from the beginning
Push 3: update the “tail” pointer:Perform a CAS from the old to the new tail. It’s okay to fail the CAS (we just want tail pointer is sufficiently recent)

Pop:

Pop 1: update the “tail” pointer if it points to the dummy node.Ensure the head does not catch up to the tail. It’s okay to fail the CAS (we just want tail pointer is sufficiently recent)
Pop 2: update the “head” pointer. Perform CAS from the old to the new head; If it fails, retry from the beginning

Sorted singly linked lists

Linked list whose nodes (consisting of key, value, next) are sorted by keys: https://github.com/kaist-cp/cs431/blob/main/src/lockfree/list.rs

Finding a node: iterating list and finding the matching node. Used in lookup, insert, delete
Deleting a node:
- Step 1: tagging its next pointer by 0x1 (logical deletion)
- Step 2: detaching it in another iteration (physical deletion)
Iteration strategies: how to deal with logically deleted nodes?
- Harris’s: detaching consecutive logically deleted nodes at once
- Harris-Michael’s: detaching logically deleted nodes individually
- Harris-Herlihy-Shavit’s (wait-free): ignore logically deleted nodes. Can be used only for lookup (not insert/delete)

Circular buffer: https://people.mpi-sws.org/~dreyer/papers/gps/paper.pdf
Hash table, binary trees (AVL/red-black), radix trees, …
Chase-Lev work-stealing deque

Garbage collection

[KAIST CS431, 2021 Fall] Garbage Collection.pptx - Google Slides

linearizability

对这个理论并没有看懂，

Relaxed-memory behaviors

Homework

cs431/homework at main · kaist-cp/cs431