Why is CAS (Compare and Swap) atomic?

I know CAS is a well-known atomic operation, but I struggle to see why it must be atomic. Take the sample code below as an example. Suppose that after the check if (*accum == *dest), another thread jumps in and succeeds in modifying *dest first; control then switches back to the first thread, which proceeds to *dest = newval;. Wouldn't that lead to a failure?
Is there something I am missing? Is there some mechanism that prevents the above scenario from happening?
Any discussion would be greatly appreciated!
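bool compare_and_swap(int *accum, int *dest, int newval)
{
    if (*accum == *dest) {
        *dest = newval;       /* expected value matched: store the new value */
        return true;
    } else {
        *accum = *dest;       /* mismatch: report the value actually found */
        return false;
    }
}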

People often use non-atomic example code to describe what a CPU does atomically with a single instruction, because it's easier to see how it works (and because a single cmpxchg instruction doesn't tell you much about how it works).
The code you've shown is like that: it is not atomic, and it exists only to help you understand the operation.
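For comparison, here is a minimal Java sketch of the same idea built on a real atomic primitive. It assumes java.util.concurrent.atomic.AtomicInteger, whose compareAndSet performs the comparison and the store as one indivisible step. The class CasCounter and its methods are hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class; not taken from the question.
class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Retry loop: read, compute, then publish only if nobody changed the value in between.
    int increment() {
        while (true) {
            int seen = value.get();
            int next = seen + 1;
            if (value.compareAndSet(seen, next)) {
                return next;
            }
            // CAS failed: another thread updated the value first, so loop and retry.
        }
    }

    int get() {
        return value.get();
    }

    // Starts `threads` workers that each increment `perThread` times; returns the final count.
    static int runDemo(int threads, int perThread) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

Because a failed compareAndSet changes nothing, a thread that loses the race simply re-reads the value and tries again, so no increment is lost.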

I had this question too. This kind of interleaving cannot happen with a real CAS. The function you wrote is an abstract description of a CPU operation; the actual implementation is atomic. Search for "cmpxchg" and you will find the details.

Yes, read as ordinary code, this function has exactly the pitfall you mention. A real compare-and-swap, however, is not compiled from code like this: it maps to a single instruction such as cmpxchg (used with the lock prefix on x86), and it is the CPU, not the compiler, that executes it atomically.

As a computer science concept, compare-and-swap has to be implemented atomically because of what it is designed to do as a consensus object: https://stackoverflow.com/a/56383038/526864
"if another thread jumps in and succeed to modify the *dest first"
I think this premise is flawed, because dest cannot be allowed to change. The pseudocode should look more like this:
bool compare_and_swap(int *p, int oldval, int newval)
{
    if (*p == oldval) {
        *p = newval;
        return true;
    } else {
        return false;
    }
}
The example you provided was for a specific implementation that returns the winning process's PID to the losers and allows only a single modification of *dest:
"an election protocol can be implemented such that every process checks the result of compare_and_swap against its own PID (= newval)"
So compare-and-swap is either implemented with an atomic function or library, or uses cmpxchg as you surmised.
Do you think that these methods are special methods that directly utilize the hardware to perform atomic operations?


How to implement non chronological backtracking

I'm working on a CDCL SAT solver and I don't know how to implement non-chronological backtracking. Is this even possible with recursion, or only with an iterative approach?
What I have done so far is implement a DPLL solver that works with recursion. The big difference between DPLL and CDCL is that backtracking in the tree is not chronological. Is it even possible to implement something like this with recursion? In my opinion I have two choices at a node of the binary decision tree when one of the two paths leads to a conflict:
I try the other path, but then it would be the same as DPLL, i.e. chronological backtracking.
I return, but then I will never come back to this node.
So am I missing something here? Could it be that the only option is to implement it iteratively?
Non-chronological backtracking (or backjumping as it is usually called) can be implemented in solvers that use recursion for the variable assignments. In languages that support non-local gotos, you would typically use that method. For example in the C language you would use setjmp() to record a point in the stack and longjmp() to backjump to that point. C# has try-catch blocks, Lispy languages might have catch-throw, and so on.
If the language doesn't support non-local goto, then you can implement a substitute in your code. Instead of dpll() returning FALSE, have it return a tuple containing FALSE and the number of levels that need to be backtracked. Upstream callers decrement the counter in the tuple and return it until zero is returned.
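The counter-returning scheme from the paragraph above can be sketched in Java as follows. This is a toy that shows only the unwinding mechanism, not a SAT solver: the conflict and the target level are hard-coded where a real CDCL solver would run conflict analysis. The class BackjumpSketch, the SOLVED marker, and the trace list are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of backjumping through return values; not a real solver.
class BackjumpSketch {
    static final int SOLVED = -1;

    // Records every decision tried, e.g. "L2=true", so the jump can be observed.
    final List<String> trace = new ArrayList<>();
    private boolean conflictSeen = false;

    // Returns SOLVED, or r >= 1 meaning "the level to resume at is r levels above the callee".
    int search(int level, int maxLevel) {
        if (level == maxLevel) {
            if (!conflictSeen) {
                conflictSeen = true;
                int targetLevel = 1;        // assumption: stand-in for conflict analysis
                return level - targetLevel; // distance from here up to the target level
            }
            return SOLVED;
        }
        for (boolean value : new boolean[] {true, false}) {
            trace.add("L" + level + "=" + value);
            int result = search(level + 1, maxLevel);
            if (result == SOLVED) {
                return SOLVED;
            }
            if (result > 1) {
                return result - 1; // not the target: keep unwinding, skip the other value
            }
            // result == 1: this level is the target, so fall through and try the next value
        }
        return 1; // both values failed: ordinary chronological backtrack by one level
    }
}
```

With maxLevel = 4, the first conflict at the leaf returns 3; levels 3 and 2 pass the counter up without trying their second value, and level 1 resumes with its other value. If the top-level call returns anything other than SOLVED, the search space is exhausted.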
You can modify this to get backjumping.
private Assignment recursiveBackJumpingSearch(CSP csp, Assignment assignment) {
    Assignment result = null;
    if (assignment.isComplete(csp.getVariables())) {
        result = assignment;
    } else {
        Variable var = selectUnassignedVariable(assignment, csp);
        for (Object value : orderDomainValues(var, assignment, csp)) {
            assignment.setAssignment(var, value);
            fireStateChanged(assignment, csp);
            if (assignment.isConsistent(csp.getConstraints(var))) {
                result = recursiveBackJumpingSearch(csp, assignment);
                if (result != null) {
                    // (body missing in the source; presumably stops the loop on success)
                }
            }
            // if (result == null) ... (statement missing in the source)
        }
    }
    return result;
}

Proper usage of cache.putIfAbsent() in Cache2k

I am wondering how to work with the putIfAbsent() method when using the Cache2k cache. In the ConcurrentHashMap for example, one works with the method like this:
Set<X> set = map.get(name);
if (set == null) {
    final Set<X> value = new HashSet<X>();
    set = map.putIfAbsent(name, value);
    if (set == null) {
        set = value;   // no previous mapping: our new set was stored
    }
}
(Copied from Should you check if the map containsKey before using ConcurrentMap's putIfAbsent)
The Cache2k version returns a boolean. What does that mean, and what does it tell me when more than one thread inserts a value using the same key?
Any help would be greatly appreciated, as I am a bit unsure how to deal with this. I am using the latest version, 0.26-BETA.
putIfAbsent() is equivalent to:
if (!cache.containsKey(key)) {
    cache.put(key, value);
    return true;
} else {
    return false;
}
except that it is executed atomically.
The method returns true if the value was inserted (which implies it did not exist before). Since it is executed atomically, only the thread whose value was actually inserted gets true.
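To see what the boolean result described above means under contention, here is a small Java sketch. It does not use Cache2k itself: a ConcurrentHashMap stands in for the cache, and the helper putIfAbsent below is a hypothetical wrapper that mimics the boolean contract by checking whether the map returned a previous value.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in demo; ConcurrentHashMap replaces the cache so the sketch needs no extra library.
class PutIfAbsentRace {

    // Hypothetical helper: true only if this call actually inserted the value.
    static <K, V> boolean putIfAbsent(ConcurrentMap<K, V> map, K key, V value) {
        return map.putIfAbsent(key, value) == null;
    }

    // Lets `threads` threads race to insert under the same key; returns how many saw true.
    static int countWinners(int threads) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all threads at roughly the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (putIfAbsent(map, "key", id)) {
                    winners.incrementAndGet();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return winners.get();
    }
}
```

However many threads race, exactly one of them sees true; the others see false and can read the winner's value from the map afterwards.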
Mind that put may invoke a writer, if one is registered. The writer is not invoked if the value is already present.
The semantics of this method are identical to JCache/JSR107. This behavior might not make sense in every situation, nor is it always intuitive. See the discussion in https://github.com/jsr107/jsr107spec/issues/303.
If you like, please explain your use case or your desired cache semantics in another question, so we can discuss what the best approach might be.

.NET Rx - ReplaySubject buffer size not working

I've been using .NET Reactive Extensions to observe log events as they come in. I'm currently using a class that derives from IObservable and uses a ReplaySubject to store the logs, that way I can filter and replay the logs (for example: Show me all the Error logs, or show me all the Verbose logs) without losing the logs I've buffered.
The problem is, even though I've set a buffer size on the subject:
this.subject = new ReplaySubject<LogEvent>(10);
The memory usage of my program goes through the roof when I use OnNext to add to the observable collection in an infinite loop:
internal void WatchForNewEvents()
{
    Task.Factory.StartNew(() =>
    {
        while (true)
        {
            dynamic parameters = new ExpandoObject();
            // TODO: Add parameters for getting specific log events
            if (this.logEventRepository.GetManyHasNewResults(parameters))
                foreach (var recentEvent in this.logEventRepository.GetMany(parameters))
                    this.subject.OnNext(recentEvent); // push each new event, as described above
            // Commented this out for now to really see the memory go up
            // Thread.Sleep(1000);
        }
    });
}
Does the buffer size on ReplaySubject not work? It doesn't seem to be clearing the buffer when the buffer size is reached. Any help much appreciated!
I add subscribers like this (Is this wrong?):
public IDisposable Subscribe(IObserver<LogEvent> observer)
{
    return this.subject.Subscribe(observer);
}
...which is called like:
// Inserts into UI ListView
this.logEventObservable.Subscribe(evt => this.InsertNewLogEvent(evt));
I'm not sure if this is the definitive answer, but I suspect that you're hitting an issue because of concurrency around the scheduler you're using. The constructor you're calling on ReplaySubject looks like this:
public ReplaySubject(int bufferSize)
: this(bufferSize, TimeSpan.MaxValue, Scheduler.CurrentThread)
{ }
The Scheduler.CurrentThread worries me. Try changing it to Scheduler.ThreadPool and see if that helps.
Also, as a side note, you seem to be mixing Rx with TPL and old fashioned thread sleeping. It's usually best to avoid doing that. You could change your WatchForNewEvents code to look like this:
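dynamic parameters = new ExpandoObject();
var newEvents =
    from n in Observable.Interval(TimeSpan.FromSeconds(1.0))
    where this.logEventRepository.GetManyHasNewResults(parameters)
    from recentEvent in
        this.logEventRepository.GetMany(parameters) // operand missing in the source; taken from the question's loop
    select recentEvent;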
That's a nice compact Rx-y way of doing things.

Zookeeper barrier implementation

I am trying to implement a barrier in Zookeeper. My implementation always works when only a small number of nodes need to join to pass the barrier. However, when I test it with 100+ nodes needing to join the barrier, around 1% of the time it seems that one of the nodes misses the last watcher event and does not check whether the number of children of the barrier node has changed.
I even synchronized the process method on the watcher, but that did not change anything. Below is the code for my process method, and the logic that checks whether it needs to move forward.
Watcher process:
public BarrierWatcher(FastBarrier FastBarrier) {
    this.ofb = FastBarrier;
}
public synchronized void process(WatchedEvent event) {
    synchronized (ofb) {
        // (body missing in the source; presumably wakes the thread waiting on ofb)
    }
}
Logic to control barrier mechanism:
BarrierWatcher bw = new BarrierWatcher(this);
List<String> memberList = zk.getChildren(barrierPath, bw);
synchronized(this) {
    while (memberList.size() < numOfMembers) {
        this.wait(1000); // originally this.wait(); see below
        memberList = zk.getChildren(barrierPath, bw);
    }
}
Instead of just calling this.wait(), I had to add this.wait(1000) for the rare failure occurrence. With the 1000 ms timeout in place, it always passes the barrier once all nodes have joined. I was sure that synchronizing the process method would fix this, but it hasn't. Does anyone have experience with this, or any idea what I might be doing wrong?
You can compare your implementation with Netflix Curator (now Apache Curator), where a distributed barrier is already implemented.
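One plausible explanation, assuming the waiting thread calls wait() on the same FastBarrier object that the watcher notifies, is a lost wakeup: the first getChildren() registers the watch before the synchronized block is entered, so the event can fire and notify before the thread has started waiting. The sketch below shows the usual guarded-wait pattern in plain Java, with an in-memory counter standing in for the ZooKeeper child count. LocalBarrier and its methods are hypothetical, and this is not how Curator implements its barrier.

```java
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for the barrier node; no ZooKeeper involved.
class LocalBarrier {
    private final int parties;
    private int joined = 0;

    LocalBarrier(int parties) {
        this.parties = parties;
    }

    // Plays the role of the watcher callback: record the change and wake all waiters.
    synchronized void join() {
        joined++;
        notifyAll();
    }

    // The check and the wait share one lock, so a notification cannot slip in between them.
    synchronized void awaitAll() throws InterruptedException {
        while (joined < parties) {
            wait();
        }
    }

    // Starts `parties` threads that join and then wait; returns how many passed the barrier.
    static int runDemo(int parties) {
        LocalBarrier barrier = new LocalBarrier(parties);
        AtomicInteger passed = new AtomicInteger();
        Thread[] workers = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            workers[i] = new Thread(() -> {
                barrier.join();
                try {
                    barrier.awaitAll();
                    passed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return passed.get();
    }
}
```

Applied to the code in the question, the equivalent change would be to make the first getChildren(barrierPath, bw) call inside the synchronized block, so that the watch is registered and the condition checked while the lock is held.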

How big is the memory cost when passing a type as a parameter to a method?

For example, I try to do something like this:
- (BOOL)compare:(NSDecimal)leftOperand greaterThan:(NSDecimal)rightOperand {
    BOOL returnValue = NO;
    NSComparisonResult result = NSDecimalCompare(&leftOperand, &rightOperand);
    if (result == NSOrderedDescending) { // if the left operand is greater than the right operand
        returnValue = YES;
    }
    return returnValue;
}
But I wonder how big the memory cost is when using this wrapper. The NSDecimalCompare function takes its parameters by reference (is that the word?), but my method does not. I find that by-reference stuff hard to use. Does my method create copies of these values? Is it a waste of memory?
You'll be making copies of your NSDecimals, but they're small structs (20 bytes by my count: a 32-bit header of bit fields plus eight 16-bit mantissa words), so it is unlikely to be a significant overhead.
But is this really an issue? For example, are you calling this method many times per second? Sample your code first to see where the bottlenecks are before trying to optimize things like this. As Knuth says, "premature optimization is the root of all evil".