This technical blog post discusses concurrent programming performance optimization in Rust, with minimal direct engagement with human rights frameworks. The content demonstrates a mild positive disposition toward Article 19 (free expression of technical knowledge) and Article 27 (participation in scientific advancement) through open publication and empirical sharing, but exhibits negative signals regarding Article 26 (education access) due to specialized jargon and accessibility barriers. Overall, the content is substantively neutral regarding human rights, addressing technical rather than rights-centered concerns.
I'd be super interested in how this compares across CPU architectures. Is there an optimization in Apple silicon that makes this bad while it'd fly on Intel/AMD CPUs?
If the implementation is task-based and each task always runs on the same virtual CPU (with as many slots as CPUs, or as the configured parallelism), I wonder if something like the following might help.
An RW lock could be implemented using an array whose length equals the number of slots, with proper padding to ensure each slot sits in its own cache line (to avoid invalidating the CPU cache when a different slot is read or written).
For a read lock: each task acquires the lock for its own slot.
For a write lock: acquire the slot locks from the leftmost slot to the rightmost. Writes can starve readers: while moving from left to right, a writer that blocks on an in-flight reader at a later slot keeps every slot it has already acquired locked.
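The slot-per-CPU idea above can be sketched in Rust. This is a minimal, hypothetical sketch (`SlotRwLock` and its methods are illustrative names), assuming each caller passes a stable slot index such as its worker ID. The 128-byte alignment is an assumed padding size meant to cover both 64- and 128-byte cache lines; `crossbeam_utils::CachePadded` exists for the same purpose.

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};
use std::sync::{Mutex, MutexGuard};

/// One lock per slot, padded so neighbouring slots never share a cache line
/// (128 bytes is an assumption, not a measured value).
#[repr(align(128))]
struct Slot(Mutex<()>);

/// Hypothetical per-slot RW lock: readers lock only their own slot,
/// writers lock every slot from left to right.
pub struct SlotRwLock<T> {
    slots: Vec<Slot>,
    data: UnsafeCell<T>,
}

// SAFETY: `data` is only reached through the guards below. A reader holds one
// slot lock and a writer holds all of them, so a writer never overlaps a
// reader or another writer. Concurrent shared reads require T: Sync.
unsafe impl<T: Send + Sync> Sync for SlotRwLock<T> {}

pub struct ReadGuard<'a, T> {
    _slot: MutexGuard<'a, ()>,
    lock: &'a SlotRwLock<T>,
}

pub struct WriteGuard<'a, T> {
    _slots: Vec<MutexGuard<'a, ()>>,
    lock: &'a SlotRwLock<T>,
}

impl<T> SlotRwLock<T> {
    pub fn new(value: T, n_slots: usize) -> Self {
        assert!(n_slots > 0, "need at least one slot");
        Self {
            slots: (0..n_slots).map(|_| Slot(Mutex::new(()))).collect(),
            data: UnsafeCell::new(value),
        }
    }

    /// Read lock: only the caller's own slot (and cache line) is written,
    /// so readers on different slots do not contend with each other.
    pub fn read(&self, slot: usize) -> ReadGuard<'_, T> {
        let guard = self.slots[slot % self.slots.len()].0.lock().unwrap();
        ReadGuard { _slot: guard, lock: self }
    }

    /// Write lock: take every slot left to right. The fixed order prevents
    /// writer/writer deadlock; slots already taken stay locked while the
    /// writer waits on an in-flight reader further to the right.
    pub fn write(&self) -> WriteGuard<'_, T> {
        let guards: Vec<MutexGuard<'_, ()>> =
            self.slots.iter().map(|s| s.0.lock().unwrap()).collect();
        WriteGuard { _slots: guards, lock: self }
    }
}

impl<T> Deref for ReadGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: we hold a slot lock, so no writer is active.
        unsafe { &*self.lock.data.get() }
    }
}

impl<T> Deref for WriteGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: we hold every slot lock.
        unsafe { &*self.lock.data.get() }
    }
}

impl<T> DerefMut for WriteGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: we hold every slot lock, so access is exclusive.
        unsafe { &mut *self.lock.data.get() }
    }
}
```

A reader calls `lock.read(worker_id)`; a writer calls `lock.write()`. The trade-off is the one described in the comment: reads touch one private cache line, while each write costs one lock acquisition per slot.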
The code examples are confusing. They show the code that takes the locks, but they don’t show any of the data structures involved. The RwLock variant clones the Arc (makes sense), but the mutex variant does not (is it hidden inside inner.get?).
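Since the data structures are not shown, here is a guess at the minimal shape being asked about. It is a hypothetical sketch, not the post's code: a `HashMap<String, Arc<V>>` behind either a `Mutex` or an `RwLock`, where both variants clone the `Arc` while holding the lock so the guard can be released right after the lookup. `MutexCache` and `RwLockCache` are illustrative names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

/// Mutex variant: every lookup, read or write, takes the same exclusive lock.
pub struct MutexCache<V> {
    inner: Mutex<HashMap<String, Arc<V>>>,
}

impl<V> MutexCache<V> {
    pub fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    pub fn insert(&self, key: &str, value: V) {
        self.inner.lock().unwrap().insert(key.to_string(), Arc::new(value));
    }

    /// Clones the Arc (a refcount bump) while the mutex is held; the guard is
    /// a temporary and is dropped once this expression finishes.
    pub fn get(&self, key: &str) -> Option<Arc<V>> {
        self.inner.lock().unwrap().get(key).cloned()
    }
}

/// RwLock variant: lookups share a read lock, inserts take the write lock.
pub struct RwLockCache<V> {
    inner: RwLock<HashMap<String, Arc<V>>>,
}

impl<V> RwLockCache<V> {
    pub fn new() -> Self {
        Self { inner: RwLock::new(HashMap::new()) }
    }

    pub fn insert(&self, key: &str, value: V) {
        self.inner.write().unwrap().insert(key.to_string(), Arc::new(value));
    }

    /// Same Arc clone, but under a shared read lock. In typical
    /// implementations, acquiring and releasing the read lock still performs
    /// atomic updates on the lock's shared state.
    pub fn get(&self, key: &str) -> Option<Arc<V>> {
        self.inner.read().unwrap().get(key).cloned()
    }
}
```

Written this way, the only difference between the variants is which lock is taken; both hold it just long enough to bump the `Arc` refcount.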
In any case, optimizing this well would require a lot more knowledge of what’s going on under the hood. What are the keys? Can the entire map be split into several maps? Can a reader hold the rwlock across multiple lookups? Is a data structure using something like RCU an option?
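One common answer to "can the map be split into several maps?" is lock striping (sharding): hash the key to pick one of N independently locked maps, so unrelated keys stop contending on the same lock and cache line. A minimal sketch, assuming keys implement `Hash + Eq`; `ShardedMap` and the shard count are illustrative, not taken from the post.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, RwLock};

/// Hypothetical sharded map: N independent RwLock<HashMap> shards.
pub struct ShardedMap<K, V> {
    shards: Vec<RwLock<HashMap<K, Arc<V>>>>,
}

impl<K: Hash + Eq, V> ShardedMap<K, V> {
    pub fn new(n_shards: usize) -> Self {
        assert!(n_shards > 0, "need at least one shard");
        Self {
            shards: (0..n_shards).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    /// Pick a shard by hashing the key; the same key always maps to the same shard.
    fn shard(&self, key: &K) -> &RwLock<HashMap<K, Arc<V>>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[(h.finish() as usize) % self.shards.len()]
    }

    pub fn insert(&self, key: K, value: V) {
        self.shard(&key).write().unwrap().insert(key, Arc::new(value));
    }

    /// Only the key's own shard is locked, so lookups of other keys proceed
    /// against different locks.
    pub fn get(&self, key: &K) -> Option<Arc<V>> {
        self.shard(key).read().unwrap().get(key).cloned()
    }

    /// Total entry count; takes each shard's read lock in turn.
    pub fn len(&self) -> usize {
        self.shards.iter().map(|s| s.read().unwrap().len()).sum()
    }
}
```

Sharding reduces contention only when accesses spread across keys; a single hot key still lands on one shard.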
This is drawing broad conclusions from a specific RW mutex implementation. Other implementations adopt techniques to make the readers scale linearly in the read-mostly case by using per-core state (the drawback is that write locks need to scan it).
There are more sophisticated techniques, such as RCU or hazard pointers, that make synchronization overhead almost negligible for readers, but they generally require designing the algorithms around them and are not drop-in replacements for a simple mutex, so a good RW mutex implementation is a reasonable default.
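The per-core reader state mentioned above can be sketched as a raw, writer-preferring lock: each reader increments a counter on its own padded cache line, and a writer raises a flag and then scans every counter until all are zero. This is a simplified illustration (spin-waiting, `SeqCst` everywhere, no RAII guards, and the caller must unlock with the same slot index it locked with), not the implementation used by folly or any other library. `PerCoreRwLock` is a hypothetical name and 128-byte padding is an assumption.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::SeqCst};

/// One reader counter per slot, padded onto its own (assumed 128-byte) line.
#[repr(align(128))]
struct ReaderSlot(AtomicUsize);

pub struct PerCoreRwLock {
    readers: Vec<ReaderSlot>,
    writer: AtomicBool,
}

impl PerCoreRwLock {
    pub fn new(n_slots: usize) -> Self {
        assert!(n_slots > 0, "need at least one slot");
        Self {
            readers: (0..n_slots).map(|_| ReaderSlot(AtomicUsize::new(0))).collect(),
            writer: AtomicBool::new(false),
        }
    }

    /// Fast path: one atomic add on this slot's line plus a load of `writer`.
    pub fn read_lock(&self, slot: usize) {
        let c = &self.readers[slot % self.readers.len()].0;
        loop {
            c.fetch_add(1, SeqCst);
            if !self.writer.load(SeqCst) {
                return; // no writer active or waiting
            }
            // A writer is active or waiting: back out, wait, then retry.
            c.fetch_sub(1, SeqCst);
            while self.writer.load(SeqCst) {
                hint::spin_loop();
            }
        }
    }

    /// Must be called with the same slot index passed to `read_lock`.
    pub fn read_unlock(&self, slot: usize) {
        self.readers[slot % self.readers.len()].0.fetch_sub(1, SeqCst);
    }

    /// The drawback: a writer has to scan every reader slot.
    pub fn write_lock(&self) {
        while self.writer.compare_exchange(false, true, SeqCst, SeqCst).is_err() {
            hint::spin_loop();
        }
        for slot in &self.readers {
            while slot.0.load(SeqCst) != 0 {
                hint::spin_loop();
            }
        }
    }

    pub fn write_unlock(&self) {
        self.writer.store(false, SeqCst);
    }
}
```

With `SeqCst` on both sides, a reader that sees `writer == false` after incrementing is guaranteed to be seen by any writer that raises the flag afterwards, so the two can never both proceed. Production code would add guards and back off or park instead of spinning.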
Take a look at crates like arc_swap if you have a read-often, write-rarely case. With it you can easily implement the RCU pattern; just be sure to read up on how to use RCU properly.
Done well, this pattern gives you nearly free reads and cheap writes, sometimes cheaper than a lock.
For frequent writes a good RWLock is often better since RCU can degrade rapidly and badly under write contention.
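Since arc_swap is a third-party crate, here is a dependency-free illustration of the read-copy-update shape using only std: readers take an `Arc` snapshot, and a serialized writer copies the current value, modifies the copy, and publishes it. Unlike arc_swap, whose loads avoid taking a lock, this stand-in still takes an `RwLock` read lock for the instant it takes to clone the `Arc`, so it shows the structure of the pattern rather than its full read-side scalability. `Snapshot` is a hypothetical name.

```rust
use std::sync::{Arc, Mutex, RwLock};

/// RCU-style holder: readers get immutable snapshots, writers publish copies.
pub struct Snapshot<T> {
    current: RwLock<Arc<T>>,
    writer: Mutex<()>, // serializes writers so no update is lost
}

impl<T: Clone> Snapshot<T> {
    pub fn new(value: T) -> Self {
        Self {
            current: RwLock::new(Arc::new(value)),
            writer: Mutex::new(()),
        }
    }

    /// Read side: the lock is held only for one refcount increment; the
    /// caller then works on its snapshot without holding any lock.
    pub fn load(&self) -> Arc<T> {
        Arc::clone(&*self.current.read().unwrap())
    }

    /// Update side: read, copy, update, publish. Old snapshots stay valid
    /// until their last reader drops them.
    pub fn update(&self, f: impl FnOnce(&mut T)) {
        let _w = self.writer.lock().unwrap();
        let mut next = (*self.load()).clone(); // copy the whole value
        f(&mut next);
        *self.current.write().unwrap() = Arc::new(next); // publish
    }
}
```

Every update copies the entire value and writers are serialized, which is one reason this pattern degrades under frequent writes, as noted above.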
Does this also apply to std::shared_mutex in C++? If so, this is a timely article; I’m in the middle of some C++ multithreading work that relies on a shared_mutex. I have some measuring to do.
Lock contention is a real issue for any multi-threaded system, and while a RW mutex is useful when you have a longer-running critical section, for something very short-lived there is still a cache-coordination cost. In many of the HashiCorp applications, we work around this by using an immutable radix tree design instead [1].
Instead of a RW mutex, you have a single writer lock. A writer acquires the lock, makes its changes, and generates a new root pointer to the tree (every update operation generates a new root, because the tree is immutable). Then we do an atomic swap from the old root to the new root. Readers do an atomic read of the current point-in-time root and perform their read operations lock-free. This is safe because the tree is immutable: readers don't need to be concerned with another thread modifying the tree concurrently, since any modification creates a new tree. This is a pattern we've standardized in a library we call MemDB [2].
This has the advantage of making reads multi-core scalable with much lower lock contention. Given we typically use Raft for distributed consensus, you only have a single writer anyway (e.g. the FSM commit thread is the only writer).
We apply this pattern to Vault, Consul, and Nomad, all of which are able to scale to many dozens of cores with a largely linear speedup in read performance.
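The single-writer, immutable-tree pattern described above can be sketched in Rust with path copying: an update rebuilds only the nodes on the path to the changed key and shares every other subtree with the previous version. This is an illustrative sketch, not MemDB: it uses an unbalanced binary search tree instead of a radix tree, and because std has no atomic `Arc` it stands in an `RwLock<Option<Arc<_>>>` for the atomic root swap (arc_swap provides a lock-free equivalent). `Store` and `insert_path_copy` are hypothetical names.

```rust
use std::sync::{Arc, Mutex, RwLock};

/// Immutable node: never modified after construction.
struct Node {
    key: u64,
    val: String,
    left: Option<Arc<Node>>,
    right: Option<Arc<Node>>,
}

type Tree = Option<Arc<Node>>;

/// Path copying: returns a new root; untouched subtrees are shared, not copied.
fn insert_path_copy(node: &Tree, key: u64, val: &str) -> Arc<Node> {
    match node {
        None => Arc::new(Node { key, val: val.to_string(), left: None, right: None }),
        Some(n) if key < n.key => Arc::new(Node {
            key: n.key,
            val: n.val.clone(),
            left: Some(insert_path_copy(&n.left, key, val)),
            right: n.right.clone(),
        }),
        Some(n) if key > n.key => Arc::new(Node {
            key: n.key,
            val: n.val.clone(),
            left: n.left.clone(),
            right: Some(insert_path_copy(&n.right, key, val)),
        }),
        // Same key: replace the value, keep both children.
        Some(n) => Arc::new(Node {
            key,
            val: val.to_string(),
            left: n.left.clone(),
            right: n.right.clone(),
        }),
    }
}

pub struct Store {
    root: RwLock<Tree>, // stand-in for an atomic pointer swap
    writer: Mutex<()>,  // the single writer lock
}

impl Store {
    pub fn new() -> Self {
        Store { root: RwLock::new(None), writer: Mutex::new(()) }
    }

    /// Readers grab the current point-in-time root.
    pub fn snapshot(&self) -> Tree {
        self.root.read().unwrap().clone()
    }

    /// After one refcount bump to pin the version, the search uses plain
    /// references and takes no locks.
    pub fn get(&self, key: u64) -> Option<String> {
        let root = self.snapshot();
        let mut cur = root.as_deref();
        while let Some(n) = cur {
            if key == n.key {
                return Some(n.val.clone());
            }
            cur = if key < n.key { n.left.as_deref() } else { n.right.as_deref() };
        }
        None
    }

    /// Writer: build a new root from the current one, then publish it.
    pub fn insert(&self, key: u64, val: &str) {
        let _w = self.writer.lock().unwrap();
        let old = self.snapshot();
        let new_root = insert_path_copy(&old, key, val);
        *self.root.write().unwrap() = Some(new_root);
    }
}
```

A reader holding an old snapshot keeps seeing that version even after later inserts, which is what makes lock-free reads safe here.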
I've observed the same behavior on AMD and Intel at $WORK. Our solution (ideal for us, since reads happen roughly 1B times more often than writes) was to pessimize writes in favour of reads and add some per-thread state to prevent cache-line sharing.
We also tossed in an A/B system, so reads aren't delayed even while writes are happening; they just get stale data (also fine for our purposes).
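The A/B scheme described above can be sketched with two copies of the data: readers use whichever copy is active, while the single writer updates the inactive copy, flips the index, then waits for stragglers on the old copy and brings it up to date. This is a hypothetical reconstruction (the commenter's actual code is not shown) that assumes updates are deterministic so both copies converge. It omits the per-thread state mentioned above, so readers still share each `RwLock`'s internal state. `AbStore` is an illustrative name.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, RwLock, TryLockError};

/// Hypothetical double-buffered store: readers may see stale data but never
/// wait for a writer; the writer pays by updating both copies.
pub struct AbStore<T> {
    bufs: [RwLock<T>; 2],
    active: AtomicUsize,
    writer: Mutex<()>,
}

impl<T: Clone> AbStore<T> {
    pub fn new(value: T) -> Self {
        Self {
            bufs: [RwLock::new(value.clone()), RwLock::new(value)],
            active: AtomicUsize::new(0),
            writer: Mutex::new(()),
        }
    }

    /// Read from the active copy. If we raced with a flip and hit the copy
    /// currently being written, reload the index and retry instead of blocking.
    pub fn read<R>(&self, f: impl Fn(&T) -> R) -> R {
        loop {
            let i = self.active.load(Ordering::Acquire);
            match self.bufs[i].try_read() {
                Ok(guard) => return f(&*guard),
                Err(TryLockError::WouldBlock) => std::hint::spin_loop(),
                Err(TryLockError::Poisoned(_)) => panic!("buffer lock poisoned"),
            }
        }
    }

    /// Deliberately pessimized write: apply `update` to both copies.
    pub fn write(&self, update: impl Fn(&mut T)) {
        let _w = self.writer.lock().unwrap(); // single writer at a time
        let old = self.active.load(Ordering::Acquire);
        let next = 1 - old;
        update(&mut *self.bufs[next].write().unwrap()); // readers are on `old`
        self.active.store(next, Ordering::Release); // new reads see the update
        // Wait for readers still on the old copy, then bring it up to date.
        update(&mut *self.bufs[old].write().unwrap());
    }
}
```

Because the writer holds at most one copy's write lock at a time, a reader that finds one copy busy can always fall back to the other after reloading the index.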
The behaviour is quite typical for any MESI-style cache coherence system (i.e. most if not all of them).
A specific microarchitecture might alleviate this a bit with lower-latency cross-core communication, but the approach (using a single naive RW lock to protect the cache) is inherently non-scalable.
I think it’s not unusual for reader-writer locks, even well-implemented ones, to end up in situations where so many readers are stacked up that writers never get a turn, or where one writer ends up holding up N readers, which scales poorly as N grows.
Wow, folly::SharedMutex is quite an example of design tradeoffs. I wonder what application the authors wanted it for where using a global array was better than a per-mutex array.
Content celebrates technical achievement and optimization in the context of building 'high-performance' software (Redstone tensor cache). The narrative frames engineering excellence and efficient systems as positive contributions to human capability, consistent with Article 27's protection of participation in cultural and scientific advancement.
FW Ratio: 60%
Observable Facts
Post details optimized systems-level performance improvements with explicit technical methodology and reproducible results.
Content is freely accessible to any reader without authentication or subscription.
Author explicitly encourages replication and verification via public tools: 'Use tools like perf or cargo-flamegraph.'
Inferences
Public sharing of systems optimization techniques enables broader participation in scientific and technical advancement.
The focus on empirical validation and peer verification aligns with Article 27's commitment to protection of scientific progress.
Content advocates for open sharing of technical knowledge and empirical findings without restriction. The author publicly shares performance benchmarks, challenges conventional wisdom, and encourages replication and verification—all core expressions of free thought and expression in technical discourse.
FW Ratio: 60%
Observable Facts
The post title challenges conventional wisdom: 'Read Locks Are Not Your Friends' directly contradicts widely-held technical assumptions.
Detailed benchmark code and results are published in full without redaction or restricted access.
The author explicitly encourages skepticism and verification: 'Profile the Hardware' and 'Use tools like perf or cargo-flamegraph.'
Inferences
The challenge to 'obvious optimizations' signals willingness to express unpopular technical truths, consistent with free expression.
Public benchmarking and replicable evidence support open scientific discourse characteristic of Article 19 principles.
The post implicitly advocates for responsible use of technical knowledge. The author emphasizes that 'obvious optimizations can backfire' and stresses the need for empirical verification through profiling, discouraging blind application of conventional wisdom and promoting thoughtful, evidence-based decision-making. This aligns with Article 29's call for duties toward the community and responsible exercise of rights.
FW Ratio: 67%
Observable Facts
Post opens: 'This is a story about how obvious optimizations can backfire'—emphasizing responsible optimization practices.
Author repeatedly advises: 'Profile the Hardware' and 'always profile your code'—promoting due diligence over assumptions.
Inferences
The cautionary framing of the technical lesson reflects a responsibility to prevent harm through premature optimization, consistent with Article 29 duties.
Content is presented in highly technical language (Rust syntax, atomic operations, cache-line mechanics) that presupposes specialized knowledge in systems programming. This limits accessibility for persons without software engineering background, creating a structural barrier to education and participation in technical discourse.
FW Ratio: 60%
Observable Facts
Content uses Rust code syntax and systems programming concepts without introductory explanation or definitions.
Visual styling uses #ebdbb2 text on #1d2021 background (WCAG 2.x contrast ratio ≈ 12:1, which exceeds the AAA threshold of 7:1).
Monospace font (JetBrains Mono) is fixed and non-resizable in the provided CSS.
Inferences
Dense technical jargon and unstated assumptions about reader knowledge exclude readers without specialized training, limiting the right to education.
Non-adaptive typography and the dark color scheme may still hinder some users with low vision, affecting equal access to information.
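The contrast figure for #ebdbb2 on #1d2021 can be recomputed from the two hex colours: convert each sRGB channel to linear light, weight the channels into a relative luminance, and take (L1 + 0.05) / (L2 + 0.05) with L1 the lighter colour. A small Rust helper following the WCAG 2.x definitions (the function names are illustrative):

```rust
/// WCAG 2.x relative luminance of an sRGB colour given as 0xRRGGBB.
fn relative_luminance(rgb: u32) -> f64 {
    let channel = |shift: u32| -> f64 {
        let c = ((rgb >> shift) & 0xFF) as f64 / 255.0;
        // sRGB -> linear light, using the WCAG 2.x threshold of 0.03928
        if c <= 0.03928 {
            c / 12.92
        } else {
            ((c + 0.055) / 1.055).powf(2.4)
        }
    };
    0.2126 * channel(16) + 0.7152 * channel(8) + 0.0722 * channel(0)
}

/// Contrast ratio (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter colour.
fn contrast_ratio(a: u32, b: u32) -> f64 {
    let (la, lb) = (relative_luminance(a), relative_luminance(b));
    let (hi, lo) = if la >= lb { (la, lb) } else { (lb, la) };
    (hi + 0.05) / (lo + 0.05)
}
```

By this computation, `contrast_ratio(0xebdbb2, 0x1d2021)` comes out at roughly 12, while pure white on pure black gives the maximum of 21.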
No privacy policy or data collection disclosures observable on-domain.
Terms of Service
—
No terms of service or usage agreement visible.
Identity & Mission
Mission
—
No explicit mission statement or values disclosure on-domain.
Editorial Code
—
No editorial standards or corrections policy observable.
Ownership
—
Author credited as individual contributor; no corporate ownership or conflict-of-interest disclosures visible.
Access & Distribution
Access Model
+0.20
Article 27
Content is freely accessible without authentication or paywalls, supporting unrestricted access to technical knowledge.
Ad/Tracking
—
No ads or tracking scripts observable on-domain; Vercel hosting may include minimal telemetry.
Accessibility
-0.15
Article 26
Fixed-width monospace font (JetBrains Mono) and dark theme (gruvbox) may impede readability for some users with visual impairments; no alt text observed for code blocks or technical diagrams.
The blog infrastructure is freely accessible without paywalls, registration barriers, or licensing fees. Code examples and benchmarks are published openly, enabling others to benefit from and build upon the technical knowledge. No access restrictions limit participation in this scientific discourse.
Content is freely accessible without authentication, registration, or paywalls. The blog structure permits unrestricted reading and sharing of technical insights, supporting unfettered dissemination of ideas.
The site's visual design employs a dark theme (gruvbox colors) with a fixed-width monospace font, potentially hindering readability for some users with visual impairments. No accessible alternatives (alt text, high-contrast mode, dyslexia-friendly fonts) are observable.