Distributed Rate Limiting in Java With Valkey or Redis and Spring Boot

Published on

June 18, 2026

In this era of microservices-based applications, rate limiting is a necessary tool. Whereas monolithic apps centralize as much functionality as possible, the microservices architecture is all about connecting discrete components through APIs. Yet there must be limits on these services to fend off attacks and to keep a single component from consuming too many resources. For Java developers, rate limiting in a single JVM is fairly easy: a basic counter in a ConcurrentHashMap or a Guava RateLimiter will do the trick. For distributed applications, it's not so straightforward. Once you scale out behind a load balancer, each node maintains its own isolated count. What you need is shared state that every instance can read and update atomically.

Redis and its open source fork Valkey provide the atomicity and cluster management for real distributed rate limiting in Java. Developers just need Redisson, a Java client for both Valkey and Redis that wraps their infrastructure in clean APIs so that they don't have to code everything themselves. Here's a look at how Redisson makes distributed rate limiting in Java with Valkey or Redis and Spring Boot possible.

What a Distributed Rate Limiter Has to Guarantee

The requirement is easy to state and hard to satisfy: a limit of 100 requests per minute has to mean 100 across the entire fleet, not 100 per node, and it has to keep meaning that while instances start, stop and redeploy. Nearly everything that makes distributed rate limiting harder than the single-JVM case follows from that one sentence. For the concept itself — the algorithms, and how rate limiting differs from throttling — see what rate limiting is.

Two properties do the work. The decision has to be atomic, because a check followed by a separate increment is a race that two concurrent requests will win together. And the state has to be shared, because a counter owned by one instance is a counter the other instances cannot see. Miss either and the limit becomes advisory: mostly right, occasionally several times over, and impossible to reason about during the incident where it matters.
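To make the "per fleet, not per node" requirement concrete, here is a minimal single-process sketch. It is a simulation under stated assumptions, not a distributed implementation: the three "nodes" are plain objects in one JVM, round-robin dispatch stands in for a load balancer, and an AtomicInteger stands in for the shared store. The class and method names are illustrative only.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simulation only: "nodes" are plain objects in one JVM, and an AtomicInteger
// stands in for the shared Valkey/Redis counter.
class FleetLimitDemo {
    static final int LIMIT = 100;   // intended fleet-wide limit per window
    static final int NODES = 3;     // hypothetical fleet size

    // Atomically takes one slot from the counter if it is still below the limit.
    static boolean tryAdmit(AtomicInteger counter) {
        while (true) {
            int current = counter.get();
            if (current >= LIMIT) {
                return false;                        // limit reached: reject
            }
            if (counter.compareAndSet(current, current + 1)) {
                return true;                         // check and increment applied as one step
            }
            // Another thread changed the counter in between: retry.
        }
    }

    // Every node keeps its own counter, as isolated JVMs behind a load balancer would.
    static int admittedWithLocalCounters(int requests) {
        AtomicInteger[] perNode = new AtomicInteger[NODES];
        for (int i = 0; i < NODES; i++) {
            perNode[i] = new AtomicInteger();
        }
        int admitted = 0;
        for (int r = 0; r < requests; r++) {
            if (tryAdmit(perNode[r % NODES])) {      // round-robin dispatch
                admitted++;
            }
        }
        return admitted;
    }

    // Every node consults the same counter.
    static int admittedWithSharedCounter(int requests) {
        AtomicInteger shared = new AtomicInteger();
        int admitted = 0;
        for (int r = 0; r < requests; r++) {
            if (tryAdmit(shared)) {
                admitted++;
            }
        }
        return admitted;
    }

    public static void main(String[] args) {
        System.out.println("local counters: " + admittedWithLocalCounters(600));
        System.out.println("shared counter: " + admittedWithSharedCounter(600));
    }
}
```

With 600 requests in one window, the per-node version admits 300 (100 on each of the three nodes), while the shared version admits 100. The compare-and-set loop plays the role that an atomic server-side operation plays in the distributed case.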

Why Distributed Systems Need Valkey or Redis for Rate Limiting

The main reason rate limiting is so challenging in a distributed environment is that all nodes must agree on a single count. To enforce one limit across multiple instances, they must either all consult the same authority before allowing a request or share state with one another.

An in-memory counter can't act as that authority because it isn't shared. A traditional relational database can share state, but executing an SQL query for every API call adds latency and creates lock contention (where multiple processes fight to lock a table or row before writing to the database).

Valkey and Redis are well suited to rate limiting because they execute commands on a single thread and keep their data in memory. Commands run one at a time, which makes the INCR (increment) command inherently atomic, and a round trip on a local network typically takes less than a millisecond. One Valkey/Redis key effectively becomes the single source of truth for the number of requests a client has made in the current time window. That's why a Valkey- or Redis-backed rate limiter is a common production choice for Spring Boot services, even in demanding distributed systems.

The Pitfalls of Coding Rate Limiting by Hand

Some Java developers try to code rate limiting directly into their apps. At first glance, it might seem easy to do. You just increment a per-client counter, set a 60-second TTL (time to live) on the first use, and reject requests once the count crosses your limit:

INCR rate:user:42 
EXPIRE rate:user:42 60

The problem lies in the gap between those two commands. If your process crashes between INCR and EXPIRE, you generate a key with no expiry that counts forever, locking the client out permanently. Concurrent requests can also race on the initial "first call?" check.

A better approach is to make the whole decision atomic in Valkey or Redis with a Lua script:

local current = redis.call("INCR", KEYS[1]) 
if current == 1 then 
  redis.call("EXPIRE", KEYS[1], ARGV[1]) 
end
return current

While this script closes the TTL race condition, it's only the start of a growing mountain of code for your dev team to maintain. Because fixed windows allow double bursts at boundaries, teams usually graduate to sliding windows or token buckets. This requires more Lua scripting, uncovers more edge cases, and demands more tests. Ultimately, you're left maintaining a custom rate limiting library instead of shipping features that drive business value.
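The boundary problem is easy to reproduce. The following is a minimal in-memory sketch of a fixed-window counter, with the clock passed in explicitly so the result is deterministic. It makes two simplifying assumptions: windows are aligned to the clock (the INCR plus EXPIRE version starts its window at the first request instead), and everything runs in one JVM. The class name and timestamps are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory sketch of a fixed-window counter. Time is passed in explicitly
// so the boundary case is reproducible.
class FixedWindowDemo {
    private final int limit;
    private final long windowMillis;
    private final Map<Long, Integer> countsByWindow = new HashMap<>();

    FixedWindowDemo(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    boolean tryAcquire(long nowMillis) {
        long window = nowMillis / windowMillis;            // index of the current window
        int used = countsByWindow.getOrDefault(window, 0);
        if (used >= limit) {
            return false;
        }
        countsByWindow.put(window, used + 1);
        return true;
    }

    // Sends perSide requests just before and just after a window boundary
    // and returns how many were admitted in total.
    static int admittedAroundBoundary(int limit, int perSide) {
        FixedWindowDemo limiter = new FixedWindowDemo(limit, 60_000);
        int admitted = 0;
        for (int i = 0; i < perSide; i++) {
            if (limiter.tryAcquire(59_900)) admitted++;    // 100 ms before the boundary
        }
        for (int i = 0; i < perSide; i++) {
            if (limiter.tryAcquire(60_100)) admitted++;    // 100 ms after the boundary
        }
        return admitted;
    }

    public static void main(String[] args) {
        // Nominal limit: 100 per minute. All requests arrive within 200 ms.
        System.out.println(admittedAroundBoundary(100, 100));
    }
}
```

Against a nominal limit of 100 per minute, this admits 200 requests within 200 milliseconds, because each side of the boundary gets a fresh counter.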

The Better Way: Redisson's RRateLimiter

Redisson offers Java developers a better way: a distributed rate limiter implemented as an object. RRateLimiter keeps all its state in Valkey or Redis and runs the atomic logic for you, ensuring that one limit holds across every JVM that shares it:

RedissonClient redisson = Redisson.create(config);

RRateLimiter limiter = redisson.getRateLimiter("api:global");
// 100 permits per 1 second, shared across every instance
limiter.trySetRate(RateType.OVERALL, 100, 1, RateIntervalUnit.SECONDS);

if (limiter.tryAcquire(1)) {
    // handle the request
} else {
    // over the limit — reject
}

Here, tryAcquire() returns true or false immediately, while the tryAcquire(permits, timeout, unit) overload waits up to the given timeout for permits to become available. Keep in mind that RRateLimiter enforces an average rate, meaning that after an idle stretch it can briefly admit a burst above the nominal limit while it catches up. If you need strict limits, use GCRA, which is covered below.

Scope Limits: PER_CLIENT vs. OVERALL

RRateLimiter's RateType argument decides how the limit is scoped, with two options: PER_CLIENT and OVERALL.

When you use RateType.OVERALL, the limit is shared across every Redisson client connected to that key. If you set 100 permits/second, your whole fleet collectively gets 100/second. This is the right choice for protecting a shared downstream resource, like a database or a third-party API with its own quota.

Conversely, RateType.PER_CLIENT means each Redisson client instance gets its own quota. With 100 permits/second across three instances, aggregate throughput can reach 300 permits per second. This is right for per-node capacity limits rather than a global ceiling.

While you might assume PER_CLIENT means "per end user," that's not the case. To limit individual users or IP addresses, you must give each one its own named limiter:

RRateLimiter perUser = redisson.getRateLimiter("api:rate:" + clientId); 
perUser.trySetRate(RateType.OVERALL, 20, 1, RateIntervalUnit.MINUTES);

Here, OVERALL is correct: you want that user's 20 requests per minute enforced cluster-wide, and it is the key, not the RateType, that makes the limit "per user."

Using GCRA for Precision Burst Control

The Generic Cell Rate Algorithm (GCRA) is widely regarded as the gold standard for controlling bursts. Instead of counting requests in a given time window, GCRA tracks the theoretical arrival time of the next allowed request. This gives a sustained rate plus a defined burst with no boundary spikes.
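To show what tracking a theoretical arrival time means, here is a textbook-style, single-JVM sketch. It is not the Redis command or Redisson's implementation, and its burst semantics are an assumption: here, burst is the number of extra requests admitted on top of the one that is due at the sustained rate. The class name, millisecond units, and explicit clock parameter are illustrative choices.

```java
// Textbook GCRA sketch for a single JVM. Not the Redis command or RGcra.
class GcraSketch {
    private final long emissionIntervalMillis;  // T: spacing between requests at the sustained rate
    private final long toleranceMillis;         // tau: how far ahead of schedule a caller may run
    private long theoreticalArrivalMillis = 0;  // TAT: when the next request is due

    // Assumes intervalMillis is evenly divisible by rate, to keep the arithmetic exact.
    GcraSketch(int burst, int rate, long intervalMillis) {
        this.emissionIntervalMillis = intervalMillis / rate;
        this.toleranceMillis = burst * emissionIntervalMillis;
    }

    // Returns 0 if the request is admitted, otherwise the milliseconds to wait before retrying.
    synchronized long tryAcquire(long nowMillis) {
        long tat = Math.max(theoreticalArrivalMillis, nowMillis);
        long aheadOfSchedule = tat - nowMillis;
        if (aheadOfSchedule > toleranceMillis) {
            return aheadOfSchedule - toleranceMillis;      // rejected: state is left unchanged
        }
        theoreticalArrivalMillis = tat + emissionIntervalMillis;
        return 0;
    }

    public static void main(String[] args) {
        GcraSketch limiter = new GcraSketch(2, 4, 1000);   // 4 per second, burst of 2
        for (int i = 1; i <= 4; i++) {
            System.out.println("t=0ms request " + i + " -> wait " + limiter.tryAcquire(0) + " ms");
        }
        System.out.println("t=250ms request -> wait " + limiter.tryAcquire(250) + " ms");
    }
}
```

With 4 requests per second and a burst of 2, three requests at time zero are admitted, the fourth is told to wait 250 ms, and a request arriving at 250 ms is admitted. There is no window to reset, so no boundary at which the allowance doubles.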

RGcra is Redisson's object for leveraging Redis's GCRA capabilities. The object stores state in a single Redis key shared across all threads and instances, is thread-safe, and runs on top of Redis's native GCRA command.

Note, however, that the GCRA command is available only in Redis 8.8 or later, and is unavailable on Valkey. If you run Valkey rather than Redis, stick with RRateLimiter.

To use RGcra, you pass the burst capacity, replenishment rate, and interval straight to tryAcquire(). There is no separate setup call, and it returns a GcraResult describing the decision:

RGcra gcra = redisson.getGcra("api:gcra:" + clientId); 

// tryAcquire(burst, rate, interval): 4 tokens/sec sustained, plus a burst of 2 
GcraResult result = gcra.tryAcquire(2, 4, Duration.ofSeconds(1)); 

if (result.isLimited()) { 
    long retryAfter = result.getRetryAfterSeconds();     // when to retry 
    long fullBurst  = result.getFullBurstAfterSeconds(); // when burst fully refills 
    // reject with 429 + Retry-After: retryAfter 
} else {
    // proceed 
} 

// weighted request — acquire 3 tokens at once:
GcraResult batch = gcra.tryAcquire(2, 4, Duration.ofSeconds(1), 3);

The isLimited() method tells you whether the request was rejected. getRetryAfterSeconds() gives the wait before it would succeed, returning -1 when not limited. Finally, getFullBurstAfterSeconds() reports when full burst capacity is restored. This is exactly the data you need for accurate response headers.
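As a small illustration of turning that data into a response header, the sketch below builds a Retry-After header from a limited flag and a retry-after value in seconds. The helper class is hypothetical and deliberately takes plain values rather than a GcraResult, so it does not depend on any particular client version. Rounding the wait up to at least one second is a design choice of this sketch, not something the source prescribes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps a rate limit decision to HTTP response headers.
class RateLimitHeaders {
    // retryAfterSeconds follows the convention described above: -1 means "not limited".
    static Map<String, String> forDecision(boolean limited, long retryAfterSeconds) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (limited) {
            // Sketch choice: never tell a rejected client to retry in under one second.
            long wait = Math.max(1, retryAfterSeconds);
            headers.put("Retry-After", Long.toString(wait));
        }
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(forDecision(true, 3));    // header present for a rejected request
        System.out.println(forDecision(false, -1));  // no headers for an admitted request
    }
}
```

A rejected request would be answered with status 429 plus these headers. An admitted request gets none.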

Building a Spring Boot @RateLimit Annotation

The cleanest way to apply Redisson's distributed rate limiting in a web app is to use a custom annotation in aspect-oriented programming (AOP) style. Here is a reusable pattern for a Spring Boot rate limit annotation API:

@Target(ElementType.METHOD) 
@Retention(RetentionPolicy.RUNTIME) 
public @interface RateLimit { 
    int permits() default 20; 
    int seconds() default 60; 
    RateType type() default RateType.OVERALL; // OVERALL = cluster-wide; PER_CLIENT = per instance 
} 


@Aspect
@Component
public class RateLimitAspect {
    private final RedissonClient redisson;

    public RateLimitAspect(RedissonClient redisson) {
        this.redisson = redisson;
    }

    @Around("@annotation(rateLimit)")
    public Object enforce(ProceedingJoinPoint pjp, RateLimit rateLimit) throws Throwable {
        // currentClientId() and TooManyRequestsException are application-defined.
        String clientId = currentClientId(); // e.g. API key or remote IP
        String key = "rl:" + pjp.getSignature().toShortString() + ":" + clientId;
        Duration window = Duration.ofSeconds(rateLimit.seconds());

        RRateLimiter limiter = redisson.getRateLimiter(key);

        // trySetRate only applies the configuration if none has been set for this key yet.
        // The 4th arg is keepAlive: the limiter is deleted once it sits idle for a
        // full window, so per-user-per-endpoint keys don't pile up in Redis forever.
        limiter.trySetRate(rateLimit.type(), rateLimit.permits(), window, window);

        if (!limiter.tryAcquire(1)) {
            throw new TooManyRequestsException(rateLimit.seconds());
        }
        return pjp.proceed();
    }
}
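The aspect relies on two pieces the listing does not define: TooManyRequestsException and currentClientId(). A minimal sketch of both might look like the following. Everything here is hypothetical: the exception simply matches the constructor call used in the aspect, and ClientIds.resolve is one possible way to pick an identifier, assuming the API key and remote address have already been read from the request. The classes are package-private to keep the sketch in one file; in an application each would normally live in its own file.

```java
// Hypothetical exception matching the constructor call in the aspect.
class TooManyRequestsException extends RuntimeException {
    private final int retryAfterSeconds;

    TooManyRequestsException(int retryAfterSeconds) {
        super("Rate limit exceeded; retry within " + retryAfterSeconds + " seconds");
        this.retryAfterSeconds = retryAfterSeconds;
    }

    int getRetryAfterSeconds() {
        return retryAfterSeconds;
    }
}

// Hypothetical helper: prefer an API key, fall back to the remote address.
class ClientIds {
    static String resolve(String apiKey, String remoteAddr) {
        if (apiKey != null && !apiKey.isBlank()) {
            return "key:" + apiKey.trim();   // prefix keeps key-based and IP-based ids apart
        }
        return "ip:" + remoteAddr;
    }

    public static void main(String[] args) {
        System.out.println(resolve("abc123", "10.0.0.1"));
        System.out.println(resolve(null, "10.0.0.1"));
    }
}
```

In a Spring Boot service, the exception would typically be mapped to an HTTP 429 response, for example with @ResponseStatus(HttpStatus.TOO_MANY_REQUESTS) on the class or an @ExceptionHandler method that also sets a Retry-After header from getRetryAfterSeconds().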

Distributed Rate Limiting and More With Redisson PRO

While rate limiting in a single instance might be fairly straightforward, it gets much more complicated in a distributed system. RRateLimiter is the answer for distributed Java apps, including those you build in the Spring Boot framework. And RRateLimiter is just one of a multitude of features found in Redisson PRO that allow Java developers to leverage the full power of Valkey and Redis. To learn more, take a look at the feature comparison between Redisson and Redisson PRO. For the full API, see the Redisson rate limiter docs and getting-started guide.