Concurrency series, part 1: Redis cache penetration under high concurrency - solutions and principle analysis

1. Why use Redis as a cache

Goal: reduce the load on the database.

Redis characteristics:

In-memory storage, so reads and writes are fast

Key-value data model with rich value types

Persistence and key expiration

Data sharing between distributed services


2. Cache penetration

A. The data exists in the cache

Application -> cache hit -> return the value directly from the cache

B. Cache and database are inconsistent (the cache was not updated, or the key expired)

Application -> not in the cache but present in the database -> read the database (take a lock so that only one thread queries the database in a high-concurrency scenario) -> write the result back to the cache
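The "only one thread queries the database" step in scenario B can be sketched with double-checked locking. This is a minimal illustration, not the author's production code: `ConcurrentHashMap`s stand in for Redis and the database, and the class and field names are made up for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheRebuildDemo {
    // In-memory stand-ins for Redis and the database (illustrative only)
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final Map<String, String> db = new ConcurrentHashMap<>();
    static int dbQueries = 0;

    static final Object lock = new Object();

    static String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;                      // cache hit: no lock needed
        }
        // Cache miss: only one thread rebuilds; the others wait, then re-check
        synchronized (lock) {
            value = cache.get(key);            // double-check after acquiring the lock
            if (value == null) {
                dbQueries++;                   // count how often the DB is really hit
                value = db.get(key);           // read from the database
                if (value != null) {
                    cache.put(key, value);     // write the result back to the cache
                }
            }
            return value;
        }
    }

    public static void main(String[] args) {
        db.put("user:1", "Tom");
        System.out.println(get("user:1")); // first call reads the database
        System.out.println(get("user:1")); // second call is served from the cache
        System.out.println(dbQueries);     // the database was queried only once
    }
}
```

With real Redis, the lock would typically be a distributed lock (e.g. SET NX with a TTL) rather than a JVM-local monitor.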

C. The data exists neither in the cache nor in the database

The application misses in Redis, misses in the database, and returns null # a client that keeps querying such a condition causes cache penetration
To stop repeated queries for data that exists in neither the cache nor the database, write the empty result into the cache and give the key an expiration time (the expiration means no extra synchronization is needed once the data is later created)
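The empty-value caching idea above can be sketched as follows. This is a minimal stand-in using a `ConcurrentHashMap` instead of Redis; with real Redis the sentinel would be written with SETEX and a short TTL so it expires automatically. All names here are invented for the illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullCacheDemo {
    // Sentinel value meaning "this key is confirmed absent in the database"
    static final String NULL_SENTINEL = "<NULL>";

    // In-memory stand-ins for Redis and the database (illustrative only)
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final Map<String, String> db = new ConcurrentHashMap<>();
    static int dbQueries = 0;

    static String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            // A cached sentinel means we already know the key does not exist
            return NULL_SENTINEL.equals(value) ? null : value;
        }
        dbQueries++;
        value = db.get(key);
        // Cache the empty result too, so repeated misses stop hitting the DB
        cache.put(key, value == null ? NULL_SENTINEL : value);
        return value;
    }

    public static void main(String[] args) {
        System.out.println(get("missing"));  // null, database queried once
        System.out.println(get("missing"));  // null, answered from the cache
        System.out.println(dbQueries);       // 1
    }
}
```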

Attack: what if someone keeps querying *different* keys that exist in neither the cache nor the database? Caching empty values no longer helps, because every request uses a new key - this needs a different solution.

Extended interview question:

How to quickly judge whether an element exists in a massive data set (for example, 1 billion unordered, variable-length, non-repeating strings)?

Putting the 1 billion items directly in memory # whatever storage type you choose, this costs many gigabytes of memory; by contrast, 1 billion bits is only about 125 MB

The simplest data structure uses just 0 and 1: a bitmap, which is inherently ordered by position

How do we store data as 0s and 1s?

We need a mapping with 1. fixed-length output and 2. uniformity

A hash function plus bit operations provides exactly such a mapping
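The hash-to-bitmap mapping can be sketched like this. It is an illustration only, using `java.util.BitSet` and `String.hashCode()` for the fixed-length mapping; the class and method names are made up.

```java
import java.util.BitSet;

public class BitMappingDemo {
    static final int SIZE = 1 << 20;          // 2^20 bits, roughly 128 KB
    static final BitSet bits = new BitSet(SIZE);

    // Map a string to a bit position: hashCode gives a fixed-length output,
    // masking the sign bit and taking the remainder keeps it in range
    static int position(String s) {
        return (s.hashCode() & 0x7fffffff) % SIZE;
    }

    static void add(String s) {
        bits.set(position(s));                // mark the element's bit as 1
    }

    static boolean mightContain(String s) {
        return bits.get(position(s));         // 1 means "maybe present"
    }

    public static void main(String[] args) {
        add("Mic");
        System.out.println(mightContain("Mic"));    // true
        System.out.println(mightContain("Nathan")); // probably false, but a
        // different string could land on the same bit - a hash collision
    }
}
```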


(In the original figure, the hash maps Mic and Tom to the same position, producing a hash collision:)

How to reduce the probability of hash collisions?

  1. Enlarge the bit array (disadvantage: more memory)

  2. Use more hash functions as extra filter conditions (disadvantage: more hash computation per access)

How to balance 1 and 2? ->

In 1970, Burton Bloom published the Bloom filter

Principle of the Bloom filter

Essence: 1. a bit array (binary vector) 2. a series of independent random mapping (hash) functions

In the figure above, all three of d's positions are 1. Can we be sure d exists? No - those bits may have been set by other elements (a hash collision), so the Bloom filter can misjudge: a false positive

In the figure above, one of e's three positions is 0, so e definitely does not exist


If the Bloom filter says an element exists -> it does not necessarily exist

If the Bloom filter says an element does not exist -> it definitely does not exist


Equivalently: if an element actually exists, the Bloom filter is guaranteed to say it exists

If an element does not actually exist, the Bloom filter may still say it exists


Based on these properties we can solve cache penetration: whatever the Bloom filter says does not exist definitely does not exist, so such requests can be rejected immediately
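The two properties above fall directly out of a minimal Bloom filter implementation. The sketch below (an illustration, not a production filter; class name, seeds, and sizes are chosen arbitrarily) uses k seeded polynomial hashes over one `BitSet`: `add` sets k bits, and `mightContain` returns false as soon as any of them is 0.

```java
import java.util.BitSet;

public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int[] seeds;   // one seed per hash function

    SimpleBloomFilter(int size, int... seeds) {
        this.size = size;
        this.bits = new BitSet(size);
        this.seeds = seeds;
    }

    // Seeded polynomial hash: fixed-length output, different spread per seed
    private int hash(String value, int seed) {
        int h = 0;
        for (char c : value.toCharArray()) {
            h = seed * h + c;
        }
        return (h & 0x7fffffff) % size;
    }

    void add(String value) {
        for (int seed : seeds) {
            bits.set(hash(value, seed));   // set all k positions to 1
        }
    }

    // true  -> the element MAY exist (false positives are possible)
    // false -> the element DEFINITELY does not exist
    boolean mightContain(String value) {
        for (int seed : seeds) {
            if (!bits.get(hash(value, seed))) {
                return false;              // any single 0 bit proves absence
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter bf = new SimpleBloomFilter(1 << 20, 31, 37, 61);
        bf.add("Mic");
        bf.add("Tom");
        System.out.println(bf.mightContain("Mic"));   // true: added elements are always found
        System.out.println(bf.mightContain("ghost")); // usually false; true only on collision
    }
}
```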

Guava - Google's toolkit that wraps utilities for strings, collections, caching, rate limiting and more
(its popular Chinese counterpart is Hutool)

Guava ships a BloomFilter implementation.
Use: a. add the guava dependency (the Maven coordinates below are standard; the version is just one recent example):

		<dependency>
			<groupId>com.google.guava</groupId>
			<artifactId>guava</artifactId>
			<version>31.1-jre</version>
		</dependency>

b. create a filter and measure its false-positive rate

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.nio.charset.StandardCharsets;
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

public class BloomFilterDemo {

    private static final int insertions = 1000000;

    public static void main(String[] args) {
        // Initialize a Bloom filter that stores String data, sized for 1,000,000 insertions
        // The default false-positive probability is 0.03
        BloomFilter<String> bf = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8), insertions);

        // Stores all actual keys, used to check whether a key really exists
        Set<String> sets = new HashSet<>(insertions);

        // Stores all actual keys so they can be taken out for use
        List<String> lists = new ArrayList<>(insertions);

        // Initialize 1,000,000 random, unique strings into the three containers
        for (int i = 0; i < insertions; i++) {
            String uuid = UUID.randomUUID().toString();
            bf.put(uuid);
            sets.add(uuid);
            lists.add(uuid);
        }

        // Number of correct judgments
        int right = 0;

        // Number of false positives
        int wrong = 0;

        for (int i = 0; i < 10000; i++) {
            // When i is divisible by 100, take an existing key; otherwise generate a fresh UUID
            // Between 0 and 10000 there are 100 numbers divisible by 100
            String data = i % 100 == 0 ? lists.get(i / 100) : UUID.randomUUID().toString();

            // mightContain: the element *might* be in the filter
            if (bf.mightContain(data)) {
                if (sets.contains(data)) {
                    // Correct hit: the element really exists
                    right++;
                } else {
                    // False positive: judged present although it does not exist
                    wrong++;
                }
            }
        }

        // Compute the hit rate and the false-positive rate
        NumberFormat percentFormat = NumberFormat.getPercentInstance();
        // Maximum number of decimal places
        percentFormat.setMaximumFractionDigits(2);

        float percent = (float) wrong / 9900;
        float bingo = (float) (9900 - wrong) / 9900;

        System.out.println("Of the 100 really existing elements tested among 1,000,000, the Bloom filter judged as existing: " + right);
        System.out.println("Of the 9900 non-existent elements tested, the Bloom filter wrongly judged as existing: " + wrong
                + ", hit rate: " + percentFormat.format(bingo)
                + ", false-positive rate: " + percentFormat.format(percent));
    }
}

Where the Bloom filter sits: the data in the database is loaded into the Bloom filter before Redis is ever consulted.
Load all keys from the database into the Bloom filter; query the Bloom filter before accessing the Redis cache, and return immediately if the key does not exist, which prevents cache penetration.
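That query order (filter, then cache, then database) can be sketched as below. This is an illustration only: a `Predicate` preloaded with the database's keys stands in for the Bloom filter, and `ConcurrentHashMap`s stand in for Redis and the database; in a real deployment you would use e.g. Guava's `BloomFilter` or a Redis-side filter, and all names here are invented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

public class BloomGuardedCache {
    static final Map<String, String> cache = new ConcurrentHashMap<>(); // Redis stand-in
    static final Map<String, String> db = new ConcurrentHashMap<>();    // database stand-in
    static int dbQueries = 0;

    // Stand-in for a Bloom filter preloaded with every key in the database
    static final Predicate<String> bloom = db::containsKey;

    static String get(String key) {
        // 1. Bloom filter first: keys it says are absent are rejected immediately
        if (!bloom.test(key)) {
            return null;
        }
        // 2. Then the cache
        String value = cache.get(key);
        if (value != null) {
            return value;
        }
        // 3. Finally the database, writing the result back to the cache
        dbQueries++;
        value = db.get(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    public static void main(String[] args) {
        db.put("user:1", "Tom");
        System.out.println(get("no-such-key")); // null, without touching cache or DB
        System.out.println(get("user:1"));      // Tom, read from the DB then cached
        System.out.println(dbQueries);          // 1
    }
}
```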

Why does the Bloom filter have no delete method? Because of hash collisions: several elements may share the same bits, so clearing one element's bits could also "delete" other elements (counting Bloom filter variants support deletion at the cost of extra memory).





Posted by sepodati on Sat, 16 Apr 2022 08:26:42 +0930