Going through the comments provides another datapoint for what I have observed as an ironclad law: the first version of any code involving tricky atomics and synchronization is always wrong. Reasoning about these things is hard for human brains.
I strongly recommend using tools when viable. In Rust a particularly easy-to-use and effective one is loom. Alloy is another good tool that scales well but requires investment. In some cases like the Vulkan memory model, Alloy files are available.
There are so many more types of semaphores than most people realize, and they're way more versatile than their humble API might make you think. I had a go at implementing Win32-style semaphores in Rust and wrote it up at length; most people coming from the POSIX world have never experienced the almost "thread pool management" levels of awesomeness you can get from such a tiny API.
I overall like this article and share your admiration for semaphores; however this part ain't so:
> there’s not much theoretical benefit to such a binary semaphore over a mutex – in fact, they’re interchangeable
A binary semaphore has a superpower over a mutex: the lock-taker ("decrementer") can be disconnected from the lock-releaser ("incrementer"). You may wait for a thing and later be notified of the thing, but the notification comes on another thread, or in a signal handler, etc. A mutex cannot be locked in one thread and unlocked somewhere else; binary semaphores handle this fine, and in this way they have something in common with condvars as well as with mutexes.
The article calls this the "ugly part": how do you handle release()? But that's the source of a semaphore's power!
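Concretely, that disconnection looks something like this. (A sketch assuming POSIX unnamed semaphores on Linux; the names `producer`/`run_handoff` and the value 42 are illustrative, not from the article.)

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t ready;          /* binary semaphore, starts at 0 */
static int shared_value;

static void *producer(void *arg) {
    (void)arg;
    shared_value = 42;       /* write the data...                  */
    sem_post(&ready);        /* ...then "increment" on this thread */
    return NULL;
}

/* The "decrement" happens on a completely different thread from the
 * "increment" -- the cross-thread handoff a mutex can't express. */
int run_handoff(void) {
    pthread_t t;
    sem_init(&ready, 0, 0);
    pthread_create(&t, NULL, producer, NULL);
    sem_wait(&ready);        /* blocks until producer posts */
    pthread_join(t, NULL);
    sem_destroy(&ready);
    return shared_value;
}
```

The waiting thread never "owns" anything it has to give back; whoever reaches `sem_post` first releases it, which is exactly the part a mutex forbids.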
You can literally use a pthread mutex as a binary semaphore, but I can't think of a good reason to do so, since pthread semaphores are async-signal-safe while pthread mutexes aren't (which is why many people do the opposite).
The only way it could fail to work that way is if the mutex kept track of the owning thread ID, which it elides as an optimization, trusting the caller to "use it wisely" instead.
Using PTHREAD_MUTEX_ERRORCHECK instead of PTHREAD_MUTEX_NORMAL (which pretty much every platform I've seen uses as the value of PTHREAD_MUTEX_DEFAULT) causes that to fail. But when's the last time you ran into a project using PTHREAD_MUTEX_ERRORCHECK? (PTHREAD_MUTEX_RECURSIVE technically has the information required to validate the unlocking thread as well, but I haven't checked whether it does so in practice.)
> There are so many more types of semaphores than most people realize
For me, a semaphore is a general way to handle states and dependencies with synchronization.
Whether it's a file named ".lock" (like a simple traffic signal) or some fancy IPC (like a four-way intersection between a heavy-traffic segment that gets the green light 99% of the time and a low-traffic one, with cameras to detect cars on the small road for the 1% of the time it's useful, and pedestrian crossings with buttons to request a red light), it boils down to the same thing: enforcing states, with synchronization.
I'll make you laugh, but recently I used environment variables (!!) as a basic semaphore, without even a .lock file!
It may be ugly, but I needed something simple that would work on Windows and Linux. It did the job!
There are plenty of types of locking mechanisms, and a semaphore is a specific kind. Sure, they might look different on different systems, or have different surfaces, but all should have an integer at their core (or at least all the ones I've seen do). Unless the term has been overloaded (sigh) and it's come to mean different things.
For the same reason they’re absent from Boost: a preference for mutexes and condition variables. From the library maintainers’ point of view, conventional semaphore techniques are just too error prone.
The link to the Boost FAQ page from the C++ document site is 404.
Semaphores are particularly treacherous on processors with weakly-ordered memory models (ARM and PowerPC notably).
Condition variables are much easier to reason about, because there are clear and unambiguous memory barriers around data being shared by sender and receiver threads; and semaphores are trivially implementable using condition variables if that's truly all you need.
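The "trivially implementable" part looks roughly like this (a sketch of a counting semaphore over a pthread mutex and condvar; the `cv_sem` names are mine):

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
    unsigned        count;
} cv_sem;

void cv_sem_init(cv_sem *s, unsigned initial) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
}

void cv_sem_post(cv_sem *s) {
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}

void cv_sem_wait(cv_sem *s) {
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)                 /* loop guards spurious wakeups */
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}
```

All of the memory ordering is supplied by the mutex, which is exactly why there's nothing left to get wrong on a weakly-ordered machine.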
On processors with weakly-ordered memory models, consecutive writes to memory by one thread may be seen as occurring in a different order by a receiving thread. On processors with a strongly-ordered memory model (Intel), you can sometimes update the state of the data controlled by the semaphore before set and after wait, relying on strong ordering of reads and writes (although it is insanely easy to make mistakes that cause terrible bugs, and infamously difficult to determine whether you have made one).
On weakly-ordered platforms this doesn't work, so you almost always have to update the state of the controlled object before set and after wait while holding a mutex. But if you're going to use both a mutex AND a semaphore, you may as well use a condition variable instead.
Bottom line: if you use a semaphore on a processor with a weakly-ordered memory model, you are almost definitely doing it wrong; and if you use one on a processor with a strongly-ordered memory model, you are still probably doing it wrong.
As to efficiency -- I can't imagine there is a meaningful difference. And even if there is, having a much higher chance of doing it right surely makes up for it.
Indeed, if your data needs to be mutex-protected, you might as well use a condition variable: it will be just as efficient and certainly more understandable.
Semaphores make sense for one-shot publishing: write the data -> signal -> wake up and read the data, where some other invariant prevents reading the data before the wait. They also work reasonably well for adding blocking waits to otherwise lock-free queues.
In POSIX, condition variables are not async signal safe, so sem_t is a good alternative if needed.
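For instance, POSIX lists sem_post among the async-signal-safe functions, so a handler can wake a waiting thread, which pthread_cond_signal must never be used for. A sketch (Linux; `wake_via_signal` is my name):

```c
#include <assert.h>
#include <semaphore.h>
#include <signal.h>

static sem_t wakeup;

static void on_sigusr1(int signo) {
    (void)signo;
    sem_post(&wakeup);   /* async-signal-safe; pthread_cond_signal is not */
}

/* Raise SIGUSR1 at ourselves; the handler posts the semaphore, so the
 * subsequent sem_wait returns immediately instead of blocking. */
int wake_via_signal(void) {
    sem_init(&wakeup, 0, 0);
    signal(SIGUSR1, on_sigusr1);
    raise(SIGUSR1);          /* handler runs before raise() returns */
    int rc = sem_wait(&wakeup);
    sem_destroy(&wakeup);
    return rc;               /* 0: we were woken by the handler */
}
```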
I don't know what the Boost FAQ said, but IME, Semaphores are often used to incorrectly implement condition variables. Given traditional semaphores it's not possible to do in a reasonably performant manner, and even with the correct extension to semaphores, the implementation is quite non-obvious.
Mutexes can't be properly used for signaling, so they are less likely to be abused in this manner.
From the linked open-std doc:
> This paper acknowledges that semaphores should be only used in situations where mutex and condition variable combination can't offer enough performance or the function notifying an event can't be blocked. Because of this, this paper proposes standardizing a semaphore interface that can be optionally implemented by vendors (this is likely to happen in embedded C++ implementations). All embedded and realtime operating systems offer semaphore operations, so this shows the importance of semaphore classes for programming non-blocking critical tasks. The proposed interface is:
In terms of "mutex and condition variable combination can't offer enough performance" unless you really know what you're doing, using semaphores instead will give you higher performance at the cost of subtle bugs.
1: Though POSIX mutexes on Linux can be used for signalling (i.e. the thread that acquires and the thread that releases a mutex need not be the same), causing portability issues with some code developed on Linux...
I don't really know how it relates to semaphores, but this was exactly what I needed last time I parallelized some computation: I had to pause starved threads and unpause all of them only after the last one got starved, triggered generation of the next data block, and moved the focus of the computation to that new block.