Top-K for Redis on Java

Published on July 1, 2026

Some questions in a data-intensive Java application are not "is this item present?" but "which items show up the most?" Think of the top trending search terms over the last hour, the hashtags driving a spike, the best-selling products on a busy storefront, or the handful of API clients responsible for most of your traffic. These are heavy-hitter questions, and they arrive as an endless, high-volume stream of events.

If you've solved the classic "top-k frequent elements" problem before, the shape here is familiar, but with a streaming twist: the data arrives as an unbounded, high-cardinality stream you can't load into an array, and that is exactly what breaks the textbook hash-map-and-heap solution. Keeping an exact counter per distinct item in a HashMap<String, Long> falls apart at scale. The map grows with the number of distinct keys, not with the k leaders you actually care about, so a stream with millions of unique search terms means millions of counters in memory just to surface the top 20. Scanning or sorting that map to find the leaders adds more cost on every read.
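For contrast, here is a minimal sketch of that exact baseline in plain Java: one HashMap counter per distinct item, plus a size-k min-heap scan on every read. ExactTopK is a hypothetical illustration written for this comparison, not part of Redisson.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical exact baseline: memory grows with the number of distinct items.
final class ExactTopK {

    // One exact counter per distinct item, no matter how small k is.
    private final Map<String, Long> counts = new HashMap<>();

    void add(String item) {
        counts.merge(item, 1L, Long::sum);
    }

    int distinctItems() {
        return counts.size();
    }

    // Every read scans all n counters through a size-k min-heap: O(n log k).
    List<String> topK(int k) {
        PriorityQueue<Map.Entry<String, Long>> heap =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // drop the smallest count seen so far
            }
        }
        List<String> leaders = new ArrayList<>();
        while (!heap.isEmpty()) {
            leaders.add(heap.poll().getKey());
        }
        Collections.reverse(leaders); // the heap yields the smallest count first
        return leaders;
    }
}
```

Feeding it three repeated terms plus 1,000 one-off terms leaves 1,003 counters in the map just to answer a top-2 query, which is the growth the Top-K sketch avoids.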

A Top-K sketch solves this by trading exactness for a small, fixed memory footprint. Instead of storing a counter for every item, it keeps a compact summary, maintained with the HeavyKeeper algorithm, that tracks the k most frequent items no matter how many distinct items it has seen. For Java developers, working directly against the low-level Redis TOPK.* commands means manual serialization, command wrangling, and context switching. Redisson abstracts all of that into a clean, object-oriented RTopK interface. Here's how it works.

Tracking the Most Frequent Items With RTopK

Redisson's RTopK object is a probabilistic data structure that tracks the k most frequent items in a stream using the HeavyKeeper algorithm, in a fixed amount of memory no matter how many distinct items it sees. It is backed by the TOPK.* commands of the RedisBloom module and is thread-safe, so a single instance can be shared across application threads; because its state lives in Redis, multiple application nodes can feed and query the same leaderboard.

The workflow is simple. You initialize the structure once with the number of leaders you want to track, feed in occurrences as events arrive, and read back the current leaders — optionally with their approximate counts — at any time. Recording an item with add() returns the item (if any) that was pushed out of the top-K list as a result, or null when nothing was evicted, which makes it easy to react to changes in the leaderboard.

Take a look at this code sample to see how RTopK works for a trending-searches leaderboard:

import org.redisson.Redisson;
import org.redisson.api.RTopK;
import org.redisson.api.RedissonClient;
import org.redisson.api.TopKInfo;
import org.redisson.config.Config;
import java.util.List;
import java.util.Map;

public class TopKExample {

    public static void main(String[] args) {
        // Connect to a Redis server that supports the TOPK.* commands
        // (adjust the address for your deployment)
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        try {
            // Get access to the Top-K structure
            RTopK<String> trending = redisson.getTopK("trending:searches");

            // 1. Initialize once: track the 20 most frequent search terms.
            //    The underlying TOPK.RESERVE fails if the key already exists,
            //    so in a long-lived service this belongs in one-time setup.
            trending.init(20);

            // 2. Record occurrences as events arrive
            // add() returns the item evicted from the leaderboard, or null
            String evicted = trending.add("postgres");
            System.out.println("Evicted by 'postgres': " + evicted);

            // Record several at once (results align positionally to the input)
            List<String> evictedItems =
                    trending.add(List.of("redis", "kafka", "redis"));
            System.out.println("Evicted by the batch: " + evictedItems);

            // Record with an explicit weight, e.g. boost by request volume
            trending.incrementBy("postgres", 5);

            // 3. Query membership: is an item currently among the leaders?
            boolean isLeader = trending.contains("postgres");
            System.out.println("'postgres' is trending: " + isLeader);

            // 4. Read the current leaders, with their approximate counts
            List<String> leaders = trending.list();
            System.out.println("Current top terms: " + leaders);

            Map<String, Long> leadersWithCount = trending.listWithCount();
            System.out.println("Top terms with counts: " + leadersWithCount);

            // 5. Inspect the configured parameters of the structure
            TopKInfo info = trending.getInfo();
            System.out.println("Tracking top " + info.getTopK() + " items");
            System.out.println("Sketch width/depth/decay: "
                    + info.getWidth() + "/" + info.getDepth() + "/" + info.getDecay());
        } finally {
            // Release the connections and threads held by the client
            redisson.shutdown();
        }
    }
}

When you need the leaders together with their frequencies, reach for listWithCount() rather than the older per-item count() method, which has been deprecated since RedisBloom 2.4.0 because its estimates can be inaccurate. listWithCount() returns the leaderboard and the approximate counts in a single call.
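If you render that result directly, it can help not to depend on the iteration order of the returned Map, which this article doesn't specify. The following is a small sketch that assumes only that listWithCount() returns a Map<String, Long>, as in the example above; Leaderboard and byCountDescending are hypothetical helper names, not part of Redisson.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical display helper for a Map<String, Long> of approximate counts.
final class Leaderboard {

    // Returns a copy ordered from highest to lowest estimate; ties are
    // broken alphabetically so the output order is deterministic.
    static Map<String, Long> byCountDescending(Map<String, Long> counts) {
        Map<String, Long> sorted = new LinkedHashMap<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .forEachOrdered(e -> sorted.put(e.getKey(), e.getValue()));
        return sorted;
    }
}
```

In the leaderboard example this would be called as `Leaderboard.byCountDescending(trending.listWithCount())`.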

What "Approximate" Means in Practice

If you haven't worked with a Top-K sketch before, it's worth understanding what it gives up in exchange for that fixed memory footprint. The structure doesn't keep a counter per item. Instead, it hashes each item into a small grid of counters (depth arrays, each width counters wide), where every counter also remembers a fingerprint of the item that currently owns it. When a different item lands on an occupied counter, HeavyKeeper decrements the existing count with a probability that shrinks exponentially as the count grows: decay raised to the power of the count. Because large counts are very unlikely to be decremented, genuine heavy hitters are protected while rare items fade out, and that selective forgetting is exactly what lets the structure run in bounded memory.
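To make the decay rule concrete, here is a minimal, single-process HeavyKeeper sketch in plain Java. It is an illustration only, not how Redis implements TOPK.* internally: HeavyKeeperSketch, its integer-mix hashing, and its simple leader map (a linear scan standing in for a min-heap) are hypothetical simplifications. Its add() mirrors the semantics described above by returning the evicted leader or null.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical, in-memory HeavyKeeper illustration; not a Redisson or Redis API.
final class HeavyKeeperSketch {
    private final int k;
    private final int width;
    private final int depth;
    private final double decay;
    private final int[][] fingerprints;   // owner fingerprint per counter
    private final long[][] counts;        // count per counter
    private final Map<String, Long> leaders = new HashMap<>(); // at most k entries
    private final Random random;

    HeavyKeeperSketch(int k, int width, int depth, double decay, long seed) {
        this.k = k;
        this.width = width;
        this.depth = depth;
        this.decay = decay;
        this.fingerprints = new int[depth][width];
        this.counts = new long[depth][width];
        this.random = new Random(seed);   // seeded so runs are repeatable
    }

    // Row-specific counter index from a simple integer mix (demo quality only).
    private int bucket(String item, int row) {
        int h = item.hashCode() ^ (row * 0x9E3779B9);
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        return Math.floorMod(h, width);
    }

    /** Records one occurrence; returns the item pushed out of the top k, or null. */
    String add(String item) {
        int fp = item.hashCode();
        long estimate = 0;
        for (int row = 0; row < depth; row++) {
            int b = bucket(item, row);
            if (counts[row][b] == 0) {
                fingerprints[row][b] = fp;            // empty counter: claim it
                counts[row][b] = 1;
            } else if (fingerprints[row][b] == fp) {
                counts[row][b]++;                     // same owner: count up
            } else if (random.nextDouble() < Math.pow(decay, counts[row][b])) {
                // Collision: decrement the owner with probability decay^count,
                // so large counts (heavy hitters) are almost never touched.
                if (--counts[row][b] == 0) {
                    fingerprints[row][b] = fp;        // owner decayed away: take over
                    counts[row][b] = 1;
                }
            }
            if (fingerprints[row][b] == fp) {
                estimate = Math.max(estimate, counts[row][b]);
            }
        }
        return updateLeaders(item, estimate);
    }

    // Keeps at most k leaders; a newcomer must beat the weakest one to enter.
    private String updateLeaders(String item, long estimate) {
        if (leaders.containsKey(item) || leaders.size() < k) {
            leaders.put(item, estimate);
            return null;
        }
        String weakest = null;
        for (Map.Entry<String, Long> e : leaders.entrySet()) {
            if (weakest == null || e.getValue() < leaders.get(weakest)) {
                weakest = e.getKey();
            }
        }
        if (estimate > leaders.get(weakest)) {
            leaders.remove(weakest);
            leaders.put(item, estimate);
            return weakest;
        }
        return null;
    }

    /** Current leaders, highest estimated count first. */
    List<String> list() {
        List<String> result = new ArrayList<>(leaders.keySet());
        result.sort((a, b) -> Long.compare(leaders.get(b), leaders.get(a)));
        return result;
    }
}
```

Memory here is fixed at width × depth counters plus at most k leader entries, however many one-off items flow through. With a few terms repeated hundreds of times among a thousand one-off terms, the repeated terms should end up as the leaders, because their large counters are almost never decremented.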

Two consequences matter when you build on it. First, the counts from listWithCount() are estimates: they're ideal for ranking, leaderboards, and dashboards, but not for anything that must reconcile exactly, such as billing or quota enforcement. Second, items sitting right at the boundary of the top k may be reported in or out from one moment to the next — the clear leaders are reliable, the marginal ones less so.
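When that flicker matters, for example before sending an alert, one option is to smooth it on the client side by requiring an item to appear in several consecutive snapshots (successive list() results) before acting on it. The sketch below is a hypothetical helper, StableLeaders, not a Redisson feature; the snapshot cadence and the required streak length are assumptions you would tune.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: an item counts as "stable" only after it has appeared
// in `required` consecutive leaderboard snapshots.
final class StableLeaders {
    private final int required;
    private Map<String, Integer> streaks = new HashMap<>();

    StableLeaders(int required) {
        this.required = required;
    }

    /** Feeds one snapshot; returns the items whose streak has reached `required`. */
    Set<String> update(List<String> snapshot) {
        Map<String, Integer> next = new HashMap<>(); // items absent from the snapshot drop out
        Set<String> stable = new LinkedHashSet<>();
        for (String item : snapshot) {
            int streak = streaks.getOrDefault(item, 0) + 1;
            next.put(item, streak);
            if (streak >= required) {
                stable.add(item);
            }
        }
        streaks = next;
        return stable;
    }
}
```

The clear leaders pass the filter after a short delay, while boundary items that drop in and out keep resetting their streak.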

Tuning the Underlying Sketch

For most workloads, init(int), which uses the default sketch parameters, is all you need. When you want finer control over the accuracy-versus-memory trade-off, init(TopKInitArgs) exposes the parameters of the underlying sketch: width (counters per array, default 8), depth (the number of counter arrays, default 7), and decay (the base probability of decrementing an occupied counter on a collision, raised to the power of that counter's value; default 0.9). A wider, deeper sketch improves accuracy at the cost of more memory:

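import org.redisson.api.TopKInitArgs;

// Track the top 50 items with a much wider sketch than the default
trending.init(TopKInitArgs.topK(50)
        .width(2000)    // counters per array
        .depth(7)       // number of counter arrays
        .decay(0.925)); // base probability of decrementing on a collision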
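To see what the decay parameter does, it helps to compute the decrement probability directly. This sketch assumes the rule described in the Redis TOPK.RESERVE documentation (decay raised to the power of the counter's value); DecayTable is a hypothetical name used only for this illustration.

```java
// Hypothetical illustration of the HeavyKeeper decay rule.
final class DecayTable {

    // On a collision, an occupied counter holding `count` is decremented
    // with probability decay^count.
    static double decrementProbability(double decay, long count) {
        return Math.pow(decay, count);
    }

    public static void main(String[] args) {
        long[] counts = {1, 10, 50, 100};
        for (double decay : new double[] {0.9, 0.925}) {
            for (long c : counts) {
                System.out.printf("decay=%.3f count=%d -> p=%.6f%n",
                        decay, c, decrementProbability(decay, c));
            }
        }
    }
}
```

With the default 0.9, a counter at 10 is decremented about 35% of the time on a collision, but a counter at 50 only about 0.5% of the time. Raising decay to 0.925 lifts that second figure to roughly 2%, so a higher decay makes the sketch forget established counts somewhat more readily.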

Heavy-Hitter Detection in Bounded Memory

Beyond trending leaderboards, Top-K is a natural fit for spotting heavy hitters in high-volume traffic — API calls, log lines, or network flows where a small number of clients, IPs, or keys account for a disproportionate share of activity. Weighting each event by its cost with incrementBy() surfaces the worst offenders, and contains() gives you a cheap test for whether a specific client is currently among the top talkers, all without keeping a counter per distinct key.

RTopK<String> talkers = redisson.getTopK("api:top-callers");
talkers.init(10); // once, at setup time

// Per request: apiKey and requestCost come from your request context.
// Weight by request cost as traffic arrives
talkers.incrementBy(apiKey, requestCost);

// Is this caller currently a heavy hitter?
boolean isHeavyHitter = talkers.contains(apiKey);
List<String> worstOffenders = talkers.list();
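Because contains() reflects an approximate ranking, one reasonable pattern is to treat a positive answer as a signal for closer inspection rather than as grounds for a hard block. The sketch below injects the membership check as a Predicate<String>, so the policy can be tested without Redis; in production you could pass a method reference such as talkers::contains, assuming contains(item) returns a boolean as in the snippet above. HeavyHitterGate and its Decision values are hypothetical names.

```java
import java.util.function.Predicate;

// Hypothetical policy helper around a heavy-hitter membership check.
final class HeavyHitterGate {
    enum Decision { ALLOW, REVIEW }

    private final Predicate<String> isHeavyHitter;

    HeavyHitterGate(Predicate<String> isHeavyHitter) {
        this.isHeavyHitter = isHeavyHitter;
    }

    // A Top-K hit is approximate, so it routes the caller to REVIEW
    // (e.g. stricter rate limiting or logging) instead of rejecting outright.
    Decision decide(String apiKey) {
        return isHeavyHitter.test(apiKey) ? Decision.REVIEW : Decision.ALLOW;
    }
}
```

Exact enforcement, such as quotas, still belongs in an exact counter; the gate only decides who deserves a closer look.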

Redisson also exposes RTopK through its asynchronous, reactive, and RxJava3 APIs (RTopKAsync, RTopKReactive, and RTopKRx), so the same trend-detection logic drops cleanly into non-blocking and reactive Java services.

When to Reach for Top-K

Top-K is the right tool when three things are true at once: the stream is high-cardinality, so that keeping a counter per distinct item would be expensive; you only need the leaders, not every item's exact tally; and an approximate ranking is good enough. Trending search terms, hashtags, popular products and pages, hot keys, and abusive clients all share that shape — the long tail is huge, but only the head matters.

Reach for something else when those conditions don't hold. If you need exact counts or a precise, stable ordering, a sorted set (RScoredSortedSet) is the better fit; and if the number of distinct items is small enough that an exact counter each is cheap, a plain counter or sorted set is simpler than a probabilistic sketch. For more detail on the structure and its tuning, see the Top-K reference in the Redisson documentation.

Optimize High-Performance Java Apps With Redisson PRO

Surfacing the most frequent items across an unbounded stream doesn't have to mean unbounded memory or hand-written command plumbing. With Redisson's RTopK interface, you get server-side, HeavyKeeper-backed Top-K tracking behind a clean Java API, with a fixed memory footprint whether your stream has thousands of distinct items or billions.

To bring enterprise-grade tooling, superior database performance, and highly optimized data structures to your Java applications, learn more about Redisson PRO today.