In our previous posts, TAS, TTAS, Ticket, and MCS spinlocks and TAS, TTAS, Ticket, and MCS spinlocks Part 2, we looked at various locking mechanisms in Java. While those locks are excellent for managing mutual exclusion, they share a fundamental design philosophy: they are exclusive. They treat read and write operations identically, meaning only a single thread can access the data at a time.
However, most Java applications are read-heavy. In systems like configuration caches, routing tables, or metadata stores, you might have thousands of reads for every single write. Using an exclusive lock in these scenarios is a massive waste of resources. Why should 50 threads wanting to simply look at a value have to wait in a queue behind each other?
In this post, we look at two specialized locks designed specifically for such workloads. We explore the internals of ReentrantReadWriteLock and StampedLock and finally perform a simple benchmark.
ReentrantReadWriteLock
ReentrantReadWriteLock is a specialized version of ReentrantLock that allows multiple threads to read shared state simultaneously, provided no thread is currently writing to it.
Internally, it uses a single 32-bit int state to track both types of locks: the high 16 bits hold the read lock count (how many reader threads currently hold the lock), while the low 16 bits hold the write lock hold count.
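This split can be sketched with a few bit operations. The helper names below are modeled on the lock's internals but simplified for illustration; the arithmetic is the important part:

```java
// Sketch of packing a read count and a write count into one 32-bit int,
// mirroring the high/low split described above. Names are illustrative.
public class StateLayout {
    static final int SHARED_SHIFT = 16;
    static final int EXCLUSIVE_MASK = (1 << SHARED_SHIFT) - 1; // 0x0000FFFF

    // Number of reader holds: the high 16 bits.
    static int sharedCount(int state) { return state >>> SHARED_SHIFT; }

    // Number of writer holds: the low 16 bits.
    static int exclusiveCount(int state) { return state & EXCLUSIVE_MASK; }

    public static void main(String[] args) {
        int state = 0;
        state += (1 << SHARED_SHIFT);  // one reader acquires
        state += (1 << SHARED_SHIFT);  // a second reader acquires
        System.out.println(sharedCount(state));    // 2 readers
        System.out.println(exclusiveCount(state)); // 0 writers
    }
}
```

Because both counts live in one word, a single CAS can atomically check and update the whole lock state.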
Every reader must first check the shared state to see whether a writer holds the lock. If not, it increments the high 16 bits of the state with a CAS (Compare-And-Swap) operation. This count is what tells the lock when the resource is safe to write: a writer must wait until the read count drops back to zero.
The writer thread, on the other hand, must ensure that no readers and no other writers are present. It attempts to CAS the entire 32-bit state from 0 to 1. If any readers are active (high bits) or another writer is active (low bits), the CAS fails, and the writer is forced to wait until the state is entirely clear.
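These two acquisition rules can be sketched with a plain AtomicInteger. This is a toy, non-blocking model of the state transitions only, not the real implementation, which parks waiting threads via AQS instead of returning false:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy sketch of the rules above: readers CAS the high 16 bits up,
// the writer CASes the entire state from 0 to 1.
public class ToyRWState {
    private static final int READER_UNIT = 1 << 16;
    private final AtomicInteger state = new AtomicInteger(0);

    // A reader may enter only if no writer holds the low bits.
    boolean tryReadAcquire() {
        int s = state.get();
        if ((s & 0xFFFF) != 0) return false;          // writer active
        return state.compareAndSet(s, s + READER_UNIT);
    }

    void readRelease() { state.addAndGet(-READER_UNIT); }

    // The writer requires the entire state to be zero: no readers, no writer.
    boolean tryWriteAcquire() { return state.compareAndSet(0, 1); }

    void writeRelease() { state.set(0); }
}
```

Note how every successful read acquire is itself a write to the shared state, a point that matters later when we discuss the cost of this design.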
Here is an implementation of a counter using ReentrantReadWriteLock:
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RWCounter {
    private int value;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public int get() {
        lock.readLock().lock();   // shared mode: many readers may hold this
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    public void increment() {
        lock.writeLock().lock();  // exclusive mode: waits for all readers
        try {
            value++;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
StampedLock
If you look closely at how ReentrantReadWriteLock works, you will see that there is still significant overhead, and it grows more costly as the thread count rises.
With ReentrantReadWriteLock, all threads, including readers, participate in synchronizing the shared state. Every read must increment the shared counter on lock and decrement it on unlock. These are write operations on the same cache line, which trigger cache invalidations across cores; there is a real hardware cost to that, which we discussed here. On top of this, CAS retries pile up as multiple threads try to update the state simultaneously.
These are the bottlenecks that StampedLock fixes by introducing "optimistic reads."
StampedLock's optimistic reads rely on a versioning scheme. A reader grabs the current version number (the stamp) via tryOptimisticRead(). There is no write to shared state, so there is no CAS, and the optimistic reader does not block writers. After reading the data, the reader validates the stamp against the current version. If they match, the read was clean; if they differ, a modification happened in between and the reader must retry, typically by falling back to a pessimistic read lock. This is more performant because it avoids the synchronization tax entirely on the fast path.
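The version-stamp idea can be sketched as a simplified, single-writer seqlock. The class and method names here are illustrative, not StampedLock's internals; the real lock encodes more information in its stamps and supports contended writers:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal seqlock-style sketch of optimistic-read validation.
public class ToySeqLock {
    // Start even and nonzero so that 0 can serve as the "failed" stamp.
    private final AtomicLong version = new AtomicLong(2);
    private volatile int value = 0;

    // Readers just sample the version: no CAS, no write to shared state.
    long tryOptimisticRead() {
        long v = version.get();
        return (v & 1) == 0 ? v : 0;  // odd version = write in progress
    }

    // The read was clean iff the version has not moved since the stamp.
    boolean validate(long stamp) {
        return stamp != 0 && version.get() == stamp;
    }

    // Single writer assumed for simplicity; concurrent writers would
    // need to CAS the version to odd before mutating.
    void write(int newValue) {
        version.incrementAndGet();  // odd: invalidates in-flight reads
        value = newValue;
        version.incrementAndGet();  // even again, at a new stamp
    }

    int readValue() { return value; }
}
```

A reader samples the stamp, reads the value, and then validates; any intervening write moves the version and forces a retry.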
Here is an implementation of the same counter using the optimistic path of StampedLock:
import java.util.concurrent.locks.StampedLock;

class StampedCounter {
    private int value;
    private final StampedLock lock = new StampedLock();

    public int get() {
        long stamp = lock.tryOptimisticRead(); // no CAS, just read the version
        int v = value;
        if (!lock.validate(stamp)) {           // a write intervened: fall back
            stamp = lock.readLock();           // to a pessimistic read lock
            try {
                v = value;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return v;
    }

    public void increment() {
        long stamp = lock.writeLock();
        try {
            value++;
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}
Benchmark
In this section, we run a simple benchmark to compare ReentrantReadWriteLock and StampedLock under heavy contention, and then we analyze the results.
Here is the code for the benchmark:
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(2)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
public class ReadHeavyLockBench {

    RWCounter rwCounter = new RWCounter();
    StampedCounter stampedCounter = new StampedCounter();

    @Benchmark
    @Group("rw")
    @GroupThreads(16)
    public int rw_read() {
        return rwCounter.get();
    }

    @Benchmark
    @Group("rw")
    @GroupThreads(1)
    public void rw_write() {
        rwCounter.increment();
    }

    @Benchmark
    @Group("stamped")
    @GroupThreads(16)
    public int stamped_read() {
        return stampedCounter.get();
    }

    @Benchmark
    @Group("stamped")
    @GroupThreads(1)
    public void stamped_write() {
        stampedCounter.increment();
    }
}
Run mvn clean install to build the project, then run the benchmark with:
java -jar target/benchmarks.jar ReadHeavyLockBench
Results
The benchmark numbers reveal a massive performance gap when 16 reader threads and 1 writer thread compete for the same state:
| Benchmark | Score (ops/ms) | Improvement |
| --- | --- | --- |
| ReadHeavyLockBench.rw | ~3,251 | Baseline |
| ReadHeavyLockBench.stamped | ~1,299,780 | ~400x faster |
The results show that StampedLock is nearly 400 times faster than ReentrantReadWriteLock in this workload. The reason for this enormous gap is the elimination of the synchronization tax: optimistic readers never write to the shared lock state at all.
Conclusion
As we have seen in this post and from our results, for read-heavy workloads StampedLock clearly outperforms ReentrantReadWriteLock. This makes it a strong choice for performance-critical components where speed is the primary driver, e.g. global caches, low-latency routing tables, or high-frequency metadata stores. However, it is important to note that ReentrantReadWriteLock still provides a more forgiving model for most standard read-heavy scenarios: it is simpler to use correctly, supports reentrancy, and handles moderate write contention without the need for manual validation logic.
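The reentrancy point deserves a concrete illustration. A thread can re-acquire a ReentrantReadWriteLock it already holds; StampedLock has no notion of lock ownership, so a thread that calls writeLock() twice on it would block itself forever:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Demonstrates reentrancy: the same thread re-acquires the write lock
// it already holds, and the hold count tracks the nesting depth.
public class ReentrancyDemo {
    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        lock.writeLock().lock();   // reentrant acquire: succeeds immediately
        System.out.println(lock.getWriteHoldCount()); // 2
        lock.writeLock().unlock(); // must unlock once per acquire
        lock.writeLock().unlock();
    }
}
```

If your critical sections may nest, or call into code that takes the same lock, this alone can rule out StampedLock.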



