Skip to content

Cuckoo filter

Cuckoo filter

Java implementation of Valkey or Redis based RCuckooFilter object is a cuckoo filter. A cuckoo filter is a probabilistic data structure for fast set membership testing, similar to a Bloom filter but with support for element deletion and counting. Covers CF.* commands of the Redis Bloom module. This object is thread-safe.

Initialization

The filter must be initialized before use. Simple initialization requires only a capacity value. Advanced initialization allows tuning of bucket size, max iterations, and expansion rate through CuckooFilterInitArgs.

Code examples:

RCuckooFilter<String> filter = redisson.getCuckooFilter("myCuckooFilter");

// simple initialization with capacity only
filter.init(100000);

// advanced initialization with detailed parameters
filter.init(CuckooFilterInitArgs.capacity(100000)
                .bucketSize(4)
                .maxIterations(500)
                .expansion(2));
RCuckooFilter<String> filter = redisson.getCuckooFilter("myCuckooFilter");

// simple initialization with capacity only
RFuture<Void> future = filter.initAsync(100000);

// advanced initialization with detailed parameters
RFuture<Void> advFuture = filter.initAsync(CuckooFilterInitArgs.capacity(100000)
                                    .bucketSize(4)
                                    .maxIterations(500)
                                    .expansion(2));
RedissonReactiveClient redisson = redissonClient.reactive();
RCuckooFilterReactive<String> filter = redisson.getCuckooFilter("myCuckooFilter");

// simple initialization with capacity only
Mono<Void> mono = filter.init(100000);

// advanced initialization with detailed parameters
Mono<Void> advMono = filter.init(CuckooFilterInitArgs.capacity(100000)
                                    .bucketSize(4)
                                    .maxIterations(500)
                                    .expansion(2));
RedissonRxClient redisson = redissonClient.rxJava();
RCuckooFilterRx<String> filter = redisson.getCuckooFilter("myCuckooFilter");

// simple initialization with capacity only
Completable rx = filter.init(100000);

// advanced initialization with detailed parameters
Completable advRx = filter.init(CuckooFilterInitArgs.capacity(100000)
                                    .bucketSize(4)
                                    .maxIterations(500)
                                    .expansion(2));

Adding elements

Elements can be added individually or in bulk. The add() method allows adding the same element multiple times. The addIfAbsent() method adds an element only if it does not already exist in the filter. Bulk operations accept CuckooFilterAddArgs with optional capacity and noCreate parameters.

Code examples:

RCuckooFilter<String> filter = redisson.getCuckooFilter("myCuckooFilter");

// add a single element (allows duplicates)
boolean added = filter.add("element1");

// add element only if it does not already exist
boolean addedNew = filter.addIfAbsent("element2");

// bulk add with optional parameters
Set<String> addedItems = filter.add(
    CuckooFilterAddArgs.<String>items(List.of("a", "b", "c"))
            .capacity(50000)
            .noCreate());

// bulk add only absent elements
Set<String> newItems = filter.addIfAbsent(
    CuckooFilterAddArgs.<String>items(List.of("d", "e", "f"))
            .capacity(50000));
RCuckooFilter<String> filter = redisson.getCuckooFilter("myCuckooFilter");

// add a single element (allows duplicates)
RFuture<Boolean> addFuture = filter.addAsync("element1");

// add element only if it does not already exist
RFuture<Boolean> addNxFuture = filter.addIfAbsentAsync("element2");

// bulk add with optional parameters
RFuture<Set<String>> bulkFuture = filter.addAsync(
    CuckooFilterAddArgs.<String>items(List.of("a", "b", "c"))
            .capacity(50000)
            .noCreate());

// bulk add only absent elements
RFuture<Set<String>> bulkNxFuture = filter.addIfAbsentAsync(
    CuckooFilterAddArgs.<String>items(List.of("d", "e", "f"))
            .capacity(50000));
RedissonReactiveClient redisson = redissonClient.reactive();
RCuckooFilterReactive<String> filter = redisson.getCuckooFilter("myCuckooFilter");

// add a single element (allows duplicates)
Mono<Boolean> addMono = filter.add("element1");

// add element only if it does not already exist
Mono<Boolean> addNxMono = filter.addIfAbsent("element2");

// bulk add with optional parameters
Mono<Set<String>> bulkMono = filter.add(
    CuckooFilterAddArgs.<String>items(List.of("a", "b", "c"))
            .capacity(50000)
            .noCreate());

// bulk add only absent elements
Mono<Set<String>> bulkNxMono = filter.addIfAbsent(
    CuckooFilterAddArgs.<String>items(List.of("d", "e", "f"))
            .capacity(50000));
RedissonRxClient redisson = redissonClient.rxJava();
RCuckooFilterRx<String> filter = redisson.getCuckooFilter("myCuckooFilter");

// add a single element (allows duplicates)
Single<Boolean> addRx = filter.add("element1");

// add element only if it does not already exist
Single<Boolean> addNxRx = filter.addIfAbsent("element2");

// bulk add with optional parameters
Single<Set<String>> bulkRx = filter.add(
    CuckooFilterAddArgs.<String>items(List.of("a", "b", "c"))
            .capacity(50000)
            .noCreate());

// bulk add only absent elements
Single<Set<String>> bulkNxRx = filter.addIfAbsent(
    CuckooFilterAddArgs.<String>items(List.of("d", "e", "f"))
            .capacity(50000));

Checking element existence and counting occurrences

The exists() method checks if an element may exist in the filter. A return value of false guarantees the element is not present. A return value of true means the element may exist (false positives are possible). Multiple elements can be checked at once with exists(Collection). The count() method returns the approximate number of times an element has been added to the filter.

Code examples:

RCuckooFilter<String> filter = redisson.getCuckooFilter("myCuckooFilter");

// check single element
boolean mayExist = filter.exists("element1");

// check multiple elements at once
Set<String> existing = filter.exists(List.of("a", "b", "c", "d"));

// get approximate count of times an element was added
long approxCount = filter.count("element1");
RCuckooFilter<String> filter = redisson.getCuckooFilter("myCuckooFilter");

// check single element
RFuture<Boolean> existsFuture = filter.existsAsync("element1");

// check multiple elements at once
RFuture<Set<String>> mExistsFuture = filter.existsAsync(List.of("a", "b", "c", "d"));

// get approximate count of times an element was added
RFuture<Long> countFuture = filter.countAsync("element1");
RedissonReactiveClient redisson = redissonClient.reactive();
RCuckooFilterReactive<String> filter = redisson.getCuckooFilter("myCuckooFilter");

// check single element
Mono<Boolean> existsMono = filter.exists("element1");

// check multiple elements at once
Mono<Set<String>> mExistsMono = filter.exists(List.of("a", "b", "c", "d"));

// get approximate count of times an element was added
Mono<Long> countMono = filter.count("element1");
RedissonRxClient redisson = redissonClient.rxJava();
RCuckooFilterRx<String> filter = redisson.getCuckooFilter("myCuckooFilter");

// check single element
Single<Boolean> existsRx = filter.exists("element1");

// check multiple elements at once
Single<Set<String>> mExistsRx = filter.exists(List.of("a", "b", "c", "d"));

// get approximate count of times an element was added
Single<Long> countRx = filter.count("element1");

Removing elements

Unlike Bloom filters, cuckoo filters support element deletion. The remove() method deletes an element from the filter and returns true if it was found and removed.

Note: Deleting an element that was never added to the filter may cause false negatives for other elements.

Code examples:

RCuckooFilter<String> filter = redisson.getCuckooFilter("myCuckooFilter");

boolean removed = filter.remove("element1");
RCuckooFilter<String> filter = redisson.getCuckooFilter("myCuckooFilter");

RFuture<Boolean> removeFuture = filter.removeAsync("element1");
RedissonReactiveClient redisson = redissonClient.reactive();
RCuckooFilterReactive<String> filter = redisson.getCuckooFilter("myCuckooFilter");

Mono<Boolean> removeMono = filter.remove("element1");
RedissonRxClient redisson = redissonClient.rxJava();
RCuckooFilterRx<String> filter = redisson.getCuckooFilter("myCuckooFilter");

Single<Boolean> removeRx = filter.remove("element1");

Filter information

The getInfo() method returns a CuckooFilterInfo object containing filter statistics: memory size in bytes, number of buckets, number of sub-filters, number of inserted items, number of deleted items, bucket size, expansion rate, and maximum iterations.

Code examples:

RCuckooFilter<String> filter = redisson.getCuckooFilter("myCuckooFilter");

CuckooFilterInfo info = filter.getInfo();
info.getSize();                 // memory size in bytes
info.getNumberOfBuckets();      // number of buckets
info.getNumberOfFilters();      // number of sub-filters
info.getNumberOfInsertedItems();// number of inserted items
info.getNumberOfDeletedItems(); // number of deleted items
info.getBucketSize();           // items per bucket
info.getExpansionRate();        // expansion rate
info.getMaxIterations();        // max swap attempts
RCuckooFilter<String> filter = redisson.getCuckooFilter("myCuckooFilter");

RFuture<CuckooFilterInfo> infoFuture = filter.getInfoAsync();
RedissonReactiveClient redisson = redissonClient.reactive();
RCuckooFilterReactive<String> filter = redisson.getCuckooFilter("myCuckooFilter");

Mono<CuckooFilterInfo> infoMono = filter.getInfo();
RedissonRxClient redisson = redissonClient.rxJava();
RCuckooFilterRx<String> filter = redisson.getCuckooFilter("myCuckooFilter");

Single<CuckooFilterInfo> infoRx = filter.getInfo();

Use Cases

Cuckoo filters provide fast, memory-efficient set membership testing where occasional false positives are acceptable but false negatives are not. Beyond what a Bloom filter offers, they support deletion and approximate counting, which makes them a fit for membership sets that change over time and for soft frequency limits.

Deduplication of Processed Items

Idempotent message handling, web-crawler URL frontiers, and "process each item once" pipelines need a compact record of what has already been seen. addIfAbsent both records and tests in a single call - it returns true only the first time an element is seen - so duplicates are skipped without a per-id lookup in a backing store. Because membership has no false negatives, a "seen" verdict is safe to act on, and unlike a Bloom filter an id can later be removed once it is safe to reprocess.

RCuckooFilter<String> seen = redisson.getCuckooFilter("processed-events");
seen.init(1_000_000);

// addIfAbsent returns true only the first time this id is seen
if (seen.addIfAbsent(eventId)) {
    process(event);          // first occurrence - handle it
}
// else: duplicate, already processed - skip

// allow the id to be processed again once the window has passed
seen.remove(eventId);
RCuckooFilter<String> seen = redisson.getCuckooFilter("processed-events");

// true only on first sight of the id
RFuture<Boolean> firstSeen = seen.addIfAbsentAsync(eventId);
firstSeen.whenComplete((isNew, exception) -> {
    if (isNew) {
        process(event);
    }
});

// allow reprocessing later
RFuture<Boolean> removed = seen.removeAsync(eventId);
RedissonReactiveClient redisson = redissonClient.reactive();
RCuckooFilterReactive<String> seen = redisson.getCuckooFilter("processed-events");

// true only on first sight of the id
Mono<Boolean> firstSeen = seen.addIfAbsent(eventId);

// allow reprocessing later
Mono<Boolean> removed = seen.remove(eventId);
RedissonRxClient redisson = redissonClient.rxJava();
RCuckooFilterRx<String> seen = redisson.getCuckooFilter("processed-events");

// true only on first sight of the id
Single<Boolean> firstSeen = seen.addIfAbsent(eventId);

// allow reprocessing later
Single<Boolean> removed = seen.remove(eventId);

Mutable Allow/Deny Lists

Revoked-token denylists, blocked IPs, and ban lists are membership sets that change over time. The cuckoo filter's defining advantage over a Bloom filter is deletion: when a token is reinstated or a ban is lifted, its entry is removed with remove instead of forcing a full rebuild. exists serves as a cheap gate - a false result is authoritative (the value was never added, so no backend call is needed), while a true result is a possible match that can be confirmed against the source of truth.

RCuckooFilter<String> revoked = redisson.getCuckooFilter("revoked-tokens");
revoked.init(500_000);

// on revocation
revoked.add(tokenId);

// on validation - false is authoritative: the token was never revoked
if (revoked.exists(tokenId)) {
    // possible match (or rare false positive) - confirm against the store
} else {
    // definitely not revoked - accept without a backend lookup
}

// reinstating a token removes it - not possible with a Bloom filter
revoked.remove(tokenId);
RCuckooFilter<String> revoked = redisson.getCuckooFilter("revoked-tokens");

RFuture<Boolean> addFuture = revoked.addAsync(tokenId);

// false is authoritative - the token was never revoked
RFuture<Boolean> mayBeRevoked = revoked.existsAsync(tokenId);

// reinstate
RFuture<Boolean> removed = revoked.removeAsync(tokenId);
RedissonReactiveClient redisson = redissonClient.reactive();
RCuckooFilterReactive<String> revoked = redisson.getCuckooFilter("revoked-tokens");

Mono<Boolean> added = revoked.add(tokenId);

// false is authoritative - the token was never revoked
Mono<Boolean> mayBeRevoked = revoked.exists(tokenId);

// reinstate
Mono<Boolean> removed = revoked.remove(tokenId);
RedissonRxClient redisson = redissonClient.rxJava();
RCuckooFilterRx<String> revoked = redisson.getCuckooFilter("revoked-tokens");

Single<Boolean> added = revoked.add(tokenId);

// false is authoritative - the token was never revoked
Single<Boolean> mayBeRevoked = revoked.exists(tokenId);

// reinstate
Single<Boolean> removed = revoked.remove(tokenId);

Frequency Capping with Approximate Counting

Soft limits - ad impressions per user, retries per key, showing a tip at most a few times - need an approximate occurrence count rather than exact accounting. Because add permits duplicates and count returns the approximate number of times an element was added, each occurrence is a single add and each cap check is a single count, with no separate per-key counter to maintain. Counts may slightly over-report, which suits soft caps well.

RCuckooFilter<String> impressions = redisson.getCuckooFilter("ad-impressions");
impressions.init(2_000_000);

String key = userId + ":" + campaignId;

// stop before exceeding the cap (count is approximate)
if (impressions.count(key) < 3) {
    showAd(campaignId);
    impressions.add(key);   // record this impression (duplicates allowed)
}
RCuckooFilter<String> impressions = redisson.getCuckooFilter("ad-impressions");
String key = userId + ":" + campaignId;

// approximate number of prior impressions
RFuture<Long> seen = impressions.countAsync(key);
seen.whenComplete((times, exception) -> {
    if (times < 3) {
        showAd(campaignId);
        impressions.addAsync(key);
    }
});
RedissonReactiveClient redisson = redissonClient.reactive();
RCuckooFilterReactive<String> impressions = redisson.getCuckooFilter("ad-impressions");
String key = userId + ":" + campaignId;

// approximate number of prior impressions
Mono<Long> seen = impressions.count(key);

// record an impression (duplicates allowed)
Mono<Boolean> recorded = impressions.add(key);
RedissonRxClient redisson = redissonClient.rxJava();
RCuckooFilterRx<String> impressions = redisson.getCuckooFilter("ad-impressions");
String key = userId + ":" + campaignId;

// approximate number of prior impressions
Single<Long> seen = impressions.count(key);

// record an impression (duplicates allowed)
Single<Boolean> recorded = impressions.add(key);