What Is Consistent Hashing?
When today's application architects plan a distributed system, they face a fundamental challenge: how do they know which server holds a specific piece of information? Managing where session states and data reside in a highly available, horizontally scaled architecture can quickly become chaotic without a uniform approach. Often, the answer to this problem is consistent hashing, an advanced data partitioning technique that maps both data keys and server nodes onto an abstract circle known as a hash ring.
Consistent hashing ensures that when a node is added or removed from the network, only a tiny fraction of the overall data keys are relocated, rather than forcing a system-wide reshuffle. This approach has proven highly effective for everything from distributed caches and decentralized databases to shared key-value stores.
The Challenges of Data Partitioning in Distributed Architectures
Whenever a system scales horizontally across multiple servers, it quickly becomes clear that you need some sort of algorithm to decide which node owns each specific key. The most straightforward approach is modulo hashing, which can be represented in a simple mathematical formula: node = hash(key) % N, where N represents the total number of active nodes in the cluster.
While modulo hashing evenly spreads keys across a static cluster, it's not well-suited for today's dynamic cloud environments. For example, if you change the value of N, the result changes for almost every single key in the database. So, for example, if you have deployed a 4-node caching cluster that comes under heavy load and you scale up to 5 nodes with modulo hashing, roughly 80% of your existing keys will suddenly map to an entirely different node.
When the app attempts to fetch that data, it won't be where it's expected, resulting in a cache miss. This, in turn, triggers the dreaded "thundering herd," where thousands of requests bypass the empty cache and hit the primary database all at once, potentially crashing the entire system.
Why Consistent Hashing Is an Ideal Solution
As the web and its services were becoming increasingly distributed in the mid-1990s, MIT computer science professor David Karger began looking for a better algorithm that could balance loads across servers while maintaining data consistency. Karger and his colleagues first introduced the concept of consistent hashing in a 1997 paper as a solution for the early web cache fleets in use at the time.
One of the biggest breakthroughs of consistent hashing, from a mathematical perspective, is that it removes the dependence on N. Instead of relying on the total node count, it places keys onto a fixed ring, effectively decoupling the data's location from the cluster's current size.
Visualizing a Consistent Hash
To visualize this mechanism, picture the output space of a standard hash function — for instance, the integers from 0 to 2³² − 1 — bent into a continuous circle, so that the largest possible value wraps seamlessly back to 0. The routing mechanism then relies on three distinct steps:
- Node placement: The system hashes a unique identifier for each server node (such as its IP address or hostname) to plot its position as a fixed point on the ring.
- Key placement: When data needs to be saved or retrieved, the system hashes the data key to place it at the same ring position.
- Routing: To determine ownership, the system looks at the key's position and walks clockwise around the ring. The data is assigned to the very first server node it encounters.
The Role of Vnodes
For as strong a solution as this model is, there is one flaw. If you only have a handful of physical servers, their randomly hashed positions might land unevenly, leaving large gaps. A node responsible for a large gap would inherit a disproportionately large arc of the ring and, therefore, a large share of the data. The standard fix for this problem is to implement virtual nodes, commonly known as vnodes.
Instead of placing a physical server on the ring just once, the algorithm hashes each physical node to dozens or even hundreds of different positions. This effectively slices a server's share of the keyspace into many small, non-contiguous arcs. Vnodes balance data effectively, and if a physical server suffers a hardware failure, its workload is evenly distributed across the remaining cluster rather than dumping all its traffic onto a single successor.
Cluster Scaling Without Large Data Migrations
Beyond mathematical calculations, the true value of consistent hashing becomes clear when you scale a cluster. Because the routing logic relies on the fixed ring rather than a node count, the blast radius of a change is strictly contained.
So, when you add a new node, it simply claims a spot on the ring and takes over the keys in the specific arc it now covers. Every other key in the entire system stays exactly where it was. Conversely, when you remove a node (either intentionally or due to failure), its designated arc folds into the territory of the next node clockwise. Only the departed node's keys are forced to move.
Consistent hashing guarantees that data movement remains proportional to the architectural change. On average, only 1/N of the total dataset will migrate when you add or remove a node. For a distributed cache, this means the vast majority of entries survive a scaling event untouched, which is essentially impossible with modulo hashing.
Managing Clustered Data Structures with Redisson PRO
Even though consistent hashing has proven effective in today's highly distributed cloud-based applications, that doesn't mean distributed routing is easy for architects and developers to manage. If anything, it has become more complicated as apps have become larger-scale and more distributed in the microservices architecture. But instead of dealing with the calculus of consistent hashing, Java developers can use Redisson, with either Valkey or Redis as their back-end data store.
Redisson, a Valkey/Redis client for Java developers, provides familiar objects that function perfectly in a clustered mode, while the entire state of a single object (like a Map or a List) resides entirely on whichever master node owns that object's calculated hash slot.
For enterprise applications managing exceptionally large datasets, where a single object might outgrow the memory capacity of one physical server, Redisson PRO allows you to spread a single data structure across the entire cluster with its data-partitioned RClustered collections. RClusteredMap, RClusteredSet, RClusteredBitSet, and RClusteredBloomFilter automatically split their internal contents into numerous smaller partitions. Integrating Redisson PRO's custom data types into your code is as simple as using any other object:
// Entries of a single map distributed across all master nodes
RClusteredMap orders = redisson.getClusteredMap("orders");
RClusteredSet tags = redisson.getClusteredSet("tags");
orders.put("order:9001", new Order(/* ... */));
You can also apply tags to each hash directly within Redisson object declarations to strictly co-locate related distributed objects on the same physical node:
// Both objects pinned to the same slot via the {user:1001} tag
RMap profile = redisson.getMap("{user:1001}:profile");
RList sessions = redisson.getList("{user:1001}:sessions");