Advanced Spinlocks in Java: TAS, TTAS, Ticket, and MCS

In any multicore system, threads will compete for shared data. If two threads modify the same memory at the same time without coordination, the result is data corruption. To resolve these race conditions, we use mutual exclusion (mutex), in which only one thread is allowed to enter the critical section at a time.

Java provides synchronized blocks and ReentrantLock for mutual exclusion. These are blocking locks: if a thread fails to acquire the lock, it is parked. This causes a context switch, in which the OS saves the thread’s execution state and loads another task. The cost of this swap can be far higher than the work inside the critical section itself, particularly if the critical section is extremely short.

In this case, we can use a spinlock, which keeps the thread active on the CPU, repeatedly checking the lock in a tight loop instead of sleeping.

In this blog post we explore several spinning mechanisms: TAS, TTAS, Ticket, MCS, and CLH.

Test-And-Set (TAS)

The simplest spinlock is the Test-And-Set (TAS). It relies on a single atomic instruction to update the lock state. We simulate this using an AtomicBoolean which, when true, indicates the lock is held. In this approach, the thread continuously attempts to set the state to true. If the previous value was false, it means we acquired the lock.

import java.util.concurrent.atomic.AtomicBoolean;

public class TASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    public void lock() {
        while (state.getAndSet(true)) {
            Thread.onSpinWait();
        }
    }

    public void unlock() {
        state.set(false);
    }
}
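To see the lock in action, here is a quick usage sketch: four threads increment a plain int counter under the lock. The TASLockDemo class and its run() helper are illustrative names of our own, and the lock is repeated inline so the snippet compiles on its own.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class TASLockDemo {
    // Same TASLock as above, repeated here so the example is self-contained.
    static class TASLock {
        private final AtomicBoolean state = new AtomicBoolean(false);

        public void lock() {
            while (state.getAndSet(true)) {
                Thread.onSpinWait();
            }
        }

        public void unlock() {
            state.set(false);
        }
    }

    // Four threads increment a plain int 1000 times each; the spinlock
    // makes each read-modify-write exclusive, so the final count is 4000.
    public static int run() {
        TASLock lock = new TASLock();
        int[] count = {0};
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    lock.lock();
                    try {
                        count[0]++;          // critical section
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return count[0];
    }
}
```

Without the lock, the unsynchronized count[0]++ would lose updates; with it, every increment is exclusive.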

This implementation performs poorly under contention and should be avoided. Every call to getAndSet is an atomic write, even when it fails to acquire the lock. Each write forces the CPU to invalidate the cache line on all other cores, generating excessive cache coherence traffic.

Test-Test-And-Set (TTAS)

TTAS is an extension of TAS designed to reduce cache coherence traffic by checking the lock state before attempting to acquire it. In this approach, we read the value first. If the lock is held, we spin on a read-only operation. Because we are only reading, the cache line remains in the Shared state, avoiding constant invalidations. We only attempt the expensive atomic write when our local read indicates the lock is free.

import java.util.concurrent.atomic.AtomicBoolean;

public class TTASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    public void lock() {
        while (true) {
            while (state.get()) {
                Thread.onSpinWait();
            }
            if (!state.getAndSet(true)) {
                return;
            }
        }
    }

    public void unlock() {
        state.set(false);
    }
}

While TTAS performs better than TAS, it suffers from a problem called the Thundering Herd.

When the lock is released, the write to the lock variable invalidates the cached copy on every waiting core. All waiting threads then reload the cache line and attempt the atomic write at once to grab the lock. This sudden spike in cache-line contention hurts performance.

Ticket Lock

If you look closely, TAS and TTAS share a weakness: a thread could wait forever (starvation). The Ticket Lock solves this by enforcing FIFO (First-In, First-Out) fairness.

import java.util.concurrent.atomic.AtomicInteger;

public class TicketLock {
    private final AtomicInteger ticketCounter = new AtomicInteger(0);
    private final AtomicInteger currentTurn = new AtomicInteger(0);

    public void lock() {
        int ticket = ticketCounter.getAndIncrement();

        while (currentTurn.get() != ticket) {
            Thread.onSpinWait();
        }
    }

    public void unlock() {
        currentTurn.incrementAndGet();
    }
}

This approach introduces another issue: every thread spins on the same variable, currentTurn. Again, the CPU invalidates the cached copy for all waiting threads whenever the lock owner increments it. Even threads that are not next in line are forced to reload the data. With many waiting threads, this scales poorly due to excessive cache coherence traffic.

MCS and CLH: The Queue-Based Solutions

As we saw, the Ticket Lock solves starvation but still suffers from global spinning: every waiter polls the same variable. Queue-based spinlocks solve this by eliminating the shared cache-line contention.

MCS Lock (Mellor-Crummey and Scott)

The MCS Lock was the first to solve the global spinning problem. It was introduced by John M. Mellor-Crummey and Michael L. Scott in their 1991 paper, “Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors.”

The MCS Lock fixes global spinning by using an explicit linked list of waiters. A thread joins the queue by inserting its own node. The thread then spins only on a flag inside its local node. Since this variable stays in the thread’s cache, it creates almost zero contention while waiting. The lock owner passes ownership by updating the locked flag directly to the next thread’s node, instead of a cache-invalidating broadcast.
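A minimal sketch of this idea in Java might look like the following. The Node class, the ThreadLocal node reuse, and the MCSLock name are our own conventions for illustration; this is a sketch of the textbook algorithm, not a production lock.

```java
import java.util.concurrent.atomic.AtomicReference;

public class MCSLock {
    private static class Node {
        volatile boolean locked = false;
        volatile Node next = null;
    }

    private final AtomicReference<Node> tail = new AtomicReference<>(null);
    private final ThreadLocal<Node> myNode = ThreadLocal.withInitial(Node::new);

    public void lock() {
        Node node = myNode.get();
        Node pred = tail.getAndSet(node);   // atomically enqueue our node
        if (pred != null) {
            node.locked = true;
            pred.next = node;               // link so the predecessor can find us
            while (node.locked) {           // spin only on our OWN node's flag
                Thread.onSpinWait();
            }
        }
    }

    public void unlock() {
        Node node = myNode.get();
        if (node.next == null) {
            // No visible successor: try to reset the tail to empty.
            if (tail.compareAndSet(node, null)) {
                return;
            }
            // A thread is mid-enqueue; wait for it to link itself.
            while (node.next == null) {
                Thread.onSpinWait();
            }
        }
        node.next.locked = false;           // hand the lock directly to the successor
        node.next = null;                   // reset our node for reuse
    }
}
```

Note how unlock() writes only to the successor’s node, so the release touches one waiter’s cache line instead of broadcasting to all of them.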

CLH Lock (Craig, Landin, and Hagersten)

The CLH Lock is an alternative to MCS, named after Travis Craig, who described it in “Building FIFO and Priority-Queuing Spin Locks from Atomic Swap,” and Anders Landin and Erik Hagersten, who independently proposed a similar design. It also uses a queue-based approach, but instead of spinning on its own node, each thread spins on the state of its predecessor’s node. This design significantly simplifies queue management and makes it easier for threads to cancel their wait or time out, which is why a variant of it was chosen for core libraries like Java’s AQS. It still maintains the near-zero contention of MCS by keeping the spinning localized.
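A sketch of the CLH idea in Java could look like this. As with the MCS sketch, the Node class, the two ThreadLocal slots, and the CLHLock name are illustrative conventions of our own, following the textbook algorithm.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CLHLock {
    private static class Node {
        volatile boolean locked = false;
    }

    // The queue starts with a dummy unlocked node as the tail.
    private final AtomicReference<Node> tail = new AtomicReference<>(new Node());
    private final ThreadLocal<Node> myNode = ThreadLocal.withInitial(Node::new);
    private final ThreadLocal<Node> myPred = new ThreadLocal<>();

    public void lock() {
        Node node = myNode.get();
        node.locked = true;
        Node pred = tail.getAndSet(node);  // swap in our node; the old tail is our predecessor
        myPred.set(pred);
        while (pred.locked) {              // spin on the PREDECESSOR's flag
            Thread.onSpinWait();
        }
    }

    public void unlock() {
        Node node = myNode.get();
        node.locked = false;               // our successor, spinning on this node, proceeds
        myNode.set(myPred.get());          // recycle the predecessor's node for our next lock()
    }
}
```

Compared with MCS, there is no explicit next pointer and no release-time race to handle: releasing is a single write to our own node, and the implicit queue is maintained entirely by the tail swap.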

Conclusion

In this post we explored different spinlock mechanisms in Java: TAS, TTAS, Ticket, and the queue-based MCS/CLH designs. For real-world code, synchronized or ReentrantLock is still the right choice 99% of the time. Spinlocks only pay off when critical sections are extremely short. Use TAS/TTAS when contention is extremely rare, Ticket when you need strict FIFO fairness, and MCS or CLH when you face heavy contention on many cores and need maximum scalability; real systems often mix these approaches. In the next blog post, we will provide full implementations of these mechanisms and use JMH (Java Microbenchmark Harness) to benchmark them in a high-contention scenario.
