Distributed Rate Limiting in Java With Valkey or Redis and Spring Boot
In this era of microservices-based applications, rate limiting is a necessary tool. Whereas monolithic apps centralize as much functionality as possible, the microservices architecture is all about connecting discrete components through APIs. Yet there must be limits on these services to fend off attacks and to keep a single component from consuming too many resources. For Java developers, rate limiting in a single JVM is pretty easy: a basic counter in a ConcurrentHashMap or a Guava RateLimiter will do the trick. For distributed applications, it's not so straightforward. Once you scale behind a load balancer, each node maintains its own isolated count, so a limit of 100 requests per second quietly becomes 100 per second per node. What you need is shared state that every instance can read and update atomically.
Redis and its open source fork Valkey provide the atomicity and cluster management for real distributed rate limiting in Java. Developers just need Redisson PRO, a Java client for both Valkey and Redis that wraps their infrastructure in clean APIs so that they don't have to code everything themselves. Here's a look at how Redisson PRO makes distributed rate limiting in Java with Valkey or Redis and Spring Boot possible.
What is Rate Limiting and Why Does it Matter?
First, let's talk a bit more about what rate limiting is and why it's a mandatory layer in the modern enterprise architecture.
You can think of rate limiting as a traffic cop for your APIs. It restricts the number of requests a user, IP address, or application can make within a specific timeframe. For developers, it's the ultimate defense mechanism for enterprise apps that contain sensitive data and must maintain 24/7 global uptime.
Rate limiting protects downstream databases from buckling under sudden traffic spikes and stops malicious actors from brute-forcing login endpoints. It prevents "noisy neighbor" scenarios in which a single aggressive tenant consumes all available server resources, degrading performance for everyone else. This all matters because, in cloud environments where infrastructure auto-scales to meet demand, an unmitigated traffic spike — whether for good reasons like going viral or bad ones like a DDoS attack — can lead to poor performance, unexpected downtime, or cloud billing overages. A robust rate limiter is necessary not just for the stability it provides but also to protect company assets.
Why Distributed Systems Need Valkey or Redis for Rate Limiting
The main reason rate limiting is so challenging in a distributed environment is that every node must agree on the same count. To enforce a single limit across multiple instances, they must either all consult the same authority before allowing a request or share state with one another.
An in-memory counter can't act as that authority because it isn't shared. A traditional relational database can share state, but executing an SQL query for every API call adds latency and creates lock contention (where multiple processes fight to lock a table or row before writing to the database).
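To make the first point concrete, here is a minimal single-process simulation of what happens when each node keeps its own counter. It is not Redis or Redisson code; the class name LocalCounterDemo, the round-robin "load balancer," and the single fixed window are assumptions made purely for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: simulates nodes that each keep a private in-memory counter. */
final class LocalCounterDemo {

    /** One "node" with its own per-client counter for a single time window. */
    static final class Node {
        private final int limit;
        private final ConcurrentHashMap<String, AtomicInteger> counts = new ConcurrentHashMap<>();

        Node(int limit) {
            this.limit = limit;
        }

        /** Returns true while this node alone has seen at most `limit` requests from the client. */
        boolean tryAcquire(String clientId) {
            int seen = counts.computeIfAbsent(clientId, k -> new AtomicInteger()).incrementAndGet();
            return seen <= limit;
        }
    }

    /** Sends `requests` calls round-robin across `nodeCount` nodes and counts how many are admitted. */
    static int admitted(int nodeCount, int limit, int requests) {
        Node[] nodes = new Node[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            nodes[i] = new Node(limit);
        }
        int ok = 0;
        for (int i = 0; i < requests; i++) {
            if (nodes[i % nodeCount].tryAcquire("user:42")) {
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("1 node:  " + admitted(1, 100, 1000)); // 100
        System.out.println("3 nodes: " + admitted(3, 100, 1000)); // 300
    }
}
```

With one node, the intended limit of 100 holds. With three nodes, the same client gets 300 requests through, because no node can see what the others have already admitted.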
Valkey and Redis are well suited to rate limiting because they execute commands on a single thread and operate entirely in memory. Commands run one at a time, which makes the INCR (increment) command inherently atomic, and round trips within the same network typically complete in under a millisecond. One Valkey/Redis key effectively becomes the single source of truth for the number of requests a client has made in the current time window. That's why Valkey and Redis are a common foundation for production-grade rate limiting in Spring Boot services, even in demanding distributed systems.
The Pitfalls of Coding Rate Limiting by Hand
Some Java developers try to code rate limiting directly into their apps. At first glance, it might seem easy to do. You just increment a per-client counter, set a 60-second TTL (time to live) on the first use, and reject requests once the count crosses your limit:
INCR rate:user:42
EXPIRE rate:user:42 60
The problem lies in the gap between those two commands. If your process crashes between INCR and EXPIRE, you generate a key with no expiry that counts forever, locking the client out permanently. Concurrent requests can also race on the initial "first call?" check.
A better approach is to make the whole decision atomic in Valkey or Redis with a Lua script:
local current = redis.call("INCR", KEYS[1])
if current == 1 then
    redis.call("EXPIRE", KEYS[1], ARGV[1])
end
return current
While this script closes the TTL race condition, it's only the start of a growing mountain of code for your dev team to maintain. Because fixed windows allow double bursts at boundaries, teams usually graduate to sliding windows or token buckets. This requires more Lua scripting, uncovers more edge cases, and demands more tests. Ultimately, you're left maintaining a custom rate limiting library instead of shipping features that drive business value.
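The double-burst problem is easy to reproduce. The following is a minimal in-memory sketch, not Redis code: it mimics the INCR-per-window logic with a map and an explicit clock so the boundary effect can be observed. The class name FixedWindowDemo and the chosen timestamps are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: an in-memory fixed-window counter driven by an explicit clock. */
final class FixedWindowDemo {
    private final int limit;
    private final long windowMillis;
    private final Map<Long, Integer> countsByWindow = new HashMap<>();

    FixedWindowDemo(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Admits the request if the window containing nowMillis still has capacity. */
    boolean tryAcquire(long nowMillis) {
        long window = nowMillis / windowMillis;                 // index of the current window
        int used = countsByWindow.merge(window, 1, Integer::sum); // same role as INCR on a per-window key
        return used <= limit;
    }

    /** Fires `perBurst` requests just before and just after the 60 s boundary. */
    static int admittedAroundBoundary(int limit, int perBurst) {
        FixedWindowDemo limiter = new FixedWindowDemo(limit, 60_000);
        int ok = 0;
        for (int i = 0; i < perBurst; i++) {
            if (limiter.tryAcquire(59_900)) ok++; // last 100 ms of window 0
        }
        for (int i = 0; i < perBurst; i++) {
            if (limiter.tryAcquire(60_100)) ok++; // first 100 ms of window 1
        }
        return ok;
    }

    public static void main(String[] args) {
        // A "100 per minute" limit admits 200 requests within roughly 200 ms.
        System.out.println(admittedAroundBoundary(100, 100)); // 200
    }
}
```

Each window on its own respects the limit, yet a client that times its traffic around the boundary gets twice the nominal rate in a fraction of a second. Sliding windows and token buckets exist to close that gap.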
The Better Way: Redisson PRO's RRateLimiter
Redisson PRO offers Java developers a better way: a distributed rate limiter implemented as an object. RRateLimiter keeps all its state in Redis and runs the atomic logic for you, ensuring that one limit holds across every JVM that shares it:
RedissonClient redisson = Redisson.create(config);
RRateLimiter limiter = redisson.getRateLimiter("api:global");
// 100 permits per 1 second, shared across every instance
limiter.trySetRate(RateType.OVERALL, 100, 1, RateIntervalUnit.SECONDS);
if (limiter.tryAcquire(1)) {
    // handle the request
} else {
    // over the limit: reject
}
Here, tryAcquire() returns true or false immediately, while the tryAcquire(permits, timeout, unit) overload waits up to the specified bound for permits to become available. Keep in mind that RRateLimiter enforces an average rate, meaning that after an idle stretch, it can briefly admit a burst above the nominal limit while it catches up. If you need strict limits, use GCRA, which is covered below.
Scope Limits: PER_CLIENT vs. OVERALL
RRateLimiter's RateType argument decides how the limit is scoped, with two options: PER_CLIENT and OVERALL.
When you use RateType.OVERALL, the limit is shared across every Redisson client connected to that key. If you set 100 permits/second, your whole fleet collectively gets 100/second. This is the right choice for protecting a shared downstream resource, like a database or a third-party API with its own quota.
Conversely, RateType.PER_CLIENT means each Redisson client instance gets its own quota. With 100 permits/second across three instances, aggregate throughput can reach 300 permits per second. This is right for per-node capacity limits rather than a global ceiling.
While you might assume PER_CLIENT means "per end user," that's not the case. To limit individual users or IP addresses, you must give each one its own named limiter:
RRateLimiter perUser = redisson.getRateLimiter("api:rate:" + clientId);
perUser.trySetRate(RateType.OVERALL, 20, 1, RateIntervalUnit.MINUTES);
Here, OVERALL is correct: you want that user's 20 requests per minute enforced cluster-wide, and it's the key, not the RateType, that makes the limit "per user."
Using GCRA for Precision Burst Control
The Generic Cell Rate Algorithm (GCRA) is widely regarded as the gold standard for controlling bursts. Instead of counting requests in a given time window, GCRA tracks the theoretical arrival time (TAT) of the next allowed request. This gives a sustained rate plus a defined burst with no boundary spikes.
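To show how little state the algorithm needs, here is a textbook-style GCRA sketch in plain Java. It is not the Redis command or Redisson's implementation; the class name GcraSketch, the millisecond clock passed in by the caller, and the convention that "burst" means extra requests on top of the first one are all assumptions of this sketch.

```java
/** Illustrative textbook GCRA; single process, caller supplies the clock. */
final class GcraSketch {
    private final long emissionIntervalMillis; // spacing between requests at the sustained rate
    private final long toleranceMillis;        // how far ahead of schedule a request may arrive
    private long tatMillis = Long.MIN_VALUE;   // theoretical arrival time; MIN_VALUE = no request yet

    /** burst = extra requests allowed back to back; rate = requests per interval. */
    GcraSketch(int burst, int rate, long intervalMillis) {
        this.emissionIntervalMillis = intervalMillis / rate; // integer division: keep interval divisible by rate
        this.toleranceMillis = emissionIntervalMillis * burst;
    }

    /** Returns 0 if the request is admitted, otherwise the milliseconds to wait before retrying. */
    synchronized long tryAcquire(long nowMillis) {
        long tat = Math.max(tatMillis, nowMillis); // an idle limiter never "banks" more than the burst
        long allowAt = tat - toleranceMillis;      // earliest instant this request is acceptable
        if (nowMillis < allowAt) {
            return allowAt - nowMillis;            // rejected: state is left untouched
        }
        tatMillis = tat + emissionIntervalMillis;  // admitted: push the schedule forward one slot
        return 0;
    }

    public static void main(String[] args) {
        GcraSketch gcra = new GcraSketch(2, 4, 1000); // 4 per second sustained, burst of 2
        for (int i = 0; i < 4; i++) {
            System.out.println("t=0 wait=" + gcra.tryAcquire(0));
        }
        System.out.println("t=250 wait=" + gcra.tryAcquire(250));
    }
}
```

With these settings, three requests at t=0 are admitted (one on schedule plus a burst of two), the fourth is told to wait 250 ms, and one more request is admitted every 250 ms after that. The only stored state is a single timestamp, which is why the algorithm maps so naturally onto one key.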
RGcra is Redisson PRO's object for leveraging Redis's GCRA capabilities. The object stores state in a single Redis key shared across all threads and instances, is thread-safe, and runs on top of Redis's native GCRA command.
Note, however, that the GCRA command is available only in Redis 8.8 or later and is unavailable on Valkey. If you run Valkey rather than Redis, stick with RRateLimiter.
To use RGcra, you pass the burst capacity, replenishment rate, and interval straight to tryAcquire(). There is no separate setup call, and it returns a GcraResult describing the decision:
RGcra gcra = redisson.getGcra("api:gcra:" + clientId);
// tryAcquire(burst, rate, interval): 4 tokens/sec sustained, plus a burst of 2
GcraResult result = gcra.tryAcquire(2, 4, Duration.ofSeconds(1));
if (result.isLimited()) {
    long retryAfter = result.getRetryAfterSeconds();    // when to retry
    long fullBurst = result.getFullBurstAfterSeconds(); // when burst fully refills
    // reject with 429 + Retry-After: retryAfter
} else {
    // proceed
}
// weighted request: acquire 3 tokens at once
GcraResult batch = gcra.tryAcquire(2, 4, Duration.ofSeconds(1), 3);
The isLimited() method tells you whether the request was rejected. getRetryAfterSeconds() gives the wait before it would succeed, returning -1 when not limited. Finally, getFullBurstAfterSeconds() reports when full burst capacity is restored. This is exactly the data you need for accurate response headers.
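As a rough sketch of that last step, the two wait times can be mapped onto response headers as follows. The helper takes plain long values rather than a GcraResult so that it compiles without Redisson on the classpath. The class name RateLimitHeaders is hypothetical, X-RateLimit-Reset is a widespread convention rather than a standard header, and rounding Retry-After up to at least one second is a design choice of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: turns limiter wait times into headers for a 429 response. */
final class RateLimitHeaders {

    /** Both arguments are wait times in seconds, as reported for a rejected request. */
    static Map<String, String> forRejection(long retryAfterSeconds, long fullBurstAfterSeconds) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Retry-After is a standard HTTP header; never advertise a wait below one second.
        headers.put("Retry-After", Long.toString(Math.max(1, retryAfterSeconds)));
        // Conventional, non-standard header (assumed name): seconds until full capacity returns.
        headers.put("X-RateLimit-Reset", Long.toString(Math.max(0, fullBurstAfterSeconds)));
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(forRejection(3, 10)); // {Retry-After=3, X-RateLimit-Reset=10}
    }
}
```

Well-behaved clients can then back off for exactly as long as the limiter says, instead of retrying blindly.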
Building a Spring Boot @RateLimit Annotation
The cleanest way to apply Redisson PRO's distributed rate limiting in a web app is to use a custom annotation in aspect-oriented programming (AOP) style. Here is a reusable pattern for a Spring Boot rate limit annotation API:
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface RateLimit {
    int permits() default 20;
    int seconds() default 60;
    RateType type() default RateType.OVERALL; // OVERALL = cluster-wide; PER_CLIENT = per instance
}

// currentClientId() and TooManyRequestsException are application-specific pieces you supply.
@Aspect
@Component
public class RateLimitAspect {
    private final RedissonClient redisson;

    public RateLimitAspect(RedissonClient redisson) {
        this.redisson = redisson;
    }

    @Around("@annotation(rateLimit)")
    public Object enforce(ProceedingJoinPoint pjp, RateLimit rateLimit) throws Throwable {
        String clientId = currentClientId(); // e.g. API key or remote IP
        // One limiter per endpoint per client: the key carries the scoping.
        String key = "rl:" + pjp.getSignature().toShortString() + ":" + clientId;
        Duration window = Duration.ofSeconds(rateLimit.seconds());
        RRateLimiter limiter = redisson.getRateLimiter(key);
        // The 4th arg is keepAlive: the limiter is deleted once it sits idle for a
        // full window, so per-user-per-endpoint keys don't pile up in Redis forever.
        limiter.trySetRate(rateLimit.type(), rateLimit.permits(), window, window);
        if (!limiter.tryAcquire(1)) {
            throw new TooManyRequestsException(rateLimit.seconds());
        }
        return pjp.proceed();
    }
}
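The aspect throws a TooManyRequestsException that the snippet does not define. One possible shape for it, offered only as a minimal sketch, is a runtime exception that carries the suggested wait. The field and getter names are assumptions; only the single-int constructor is implied by the aspect.

```java
/**
 * Hypothetical exception type matching the constructor call in the aspect.
 * Declared without `public` so the snippet compiles as a single file.
 */
final class TooManyRequestsException extends RuntimeException {
    private final int retryAfterSeconds;

    public TooManyRequestsException(int retryAfterSeconds) {
        super("Rate limit exceeded; retry in about " + retryAfterSeconds + " seconds");
        this.retryAfterSeconds = retryAfterSeconds;
    }

    /** Upper bound on the wait: the aspect passes the full window length. */
    public int getRetryAfterSeconds() {
        return retryAfterSeconds;
    }
}
```

In a Spring MVC application, an @ExceptionHandler method in a @RestControllerAdvice class would be one way to translate this exception into a 429 response with a Retry-After header. Because the aspect passes the whole window length, the value is a conservative upper bound rather than the exact time until the next permit.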
Distributed Rate Limiting and More With Redisson PRO
While rate limiting in a single instance might be fairly straightforward, it gets much more complicated in a distributed system. RRateLimiter is the answer for distributed Java apps, including those you build in the Spring Boot framework. And RRateLimiter is just one of a multitude of features found in Redisson PRO that allow Java developers to leverage the full power of Valkey and Redis. To learn more, take a look at the feature comparison between Redisson and Redisson PRO. For the full API, see the Redisson rate limiter docs and getting-started guide.