Redis cache penetration under high concurrency: solution and principle analysis
1. Why use Redis as a cache
Objective: to reduce the load on the database.
Redis features:
Memory-based storage
Key-value data types
Persistence and key expiration
Data sharing across distributed services
2. Cache penetration
A. The data exists in the cache
Application -> cache hit -> return directly from the cache
B. The cache and database are inconsistent (the cache was not updated, or the key expired)
Application -> cache miss, data exists in the database -> read the database (take a lock so that under high concurrency only one thread queries the database) -> write the result back to the cache
C. The data exists in neither the cache nor the database
Application -> Redis miss, database miss, return null # the same condition keeps being queried, causing cache penetration
To avoid repeated queries for data that exists in neither the cache nor the database, write the empty result into the cache as well, with an expiration time on the key (the TTL avoids stale data if the key later gets a real value)
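A minimal cache-aside sketch of scenarios B and C above. A ConcurrentHashMap stands in for Redis, and a sentinel value stands in for a cached null; names such as `NULL_SENTINEL` and `loadFromDb` are illustrative, not from any real API, and a real setup would also attach a TTL to each entry:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheAsideDemo {
    // Stand-in for Redis; a real setup would also set a TTL per entry.
    private static final Map<String, String> cache = new ConcurrentHashMap<>();
    // Sentinel stored for keys the database does not contain, so repeated
    // lookups of a missing key stop hitting the database (fixes scenario C).
    private static final String NULL_SENTINEL = "<null>";

    // Hypothetical database: only "user:1" exists.
    static String loadFromDb(String key) {
        return "user:1".equals(key) ? "Alice" : null;
    }

    static String get(String key) {
        String v = cache.get(key);
        if (v != null) {                       // scenario A: cache hit
            return NULL_SENTINEL.equals(v) ? null : v;
        }
        synchronized (CacheAsideDemo.class) {  // scenario B: one thread rebuilds
            v = cache.get(key);                // re-check after acquiring the lock
            if (v == null) {
                String fromDb = loadFromDb(key);
                v = (fromDb == null) ? NULL_SENTINEL : fromDb;
                cache.put(key, v);             // cache the value, or the null marker
            }
            return NULL_SENTINEL.equals(v) ? null : v;
        }
    }

    public static void main(String[] args) {
        System.out.println(get("user:1"));   // Alice (loaded from db, then cached)
        System.out.println(get("user:999")); // null (the null marker gets cached)
        System.out.println(cache.size());    // 2: both results are now cached
    }
}
```

The double-check after acquiring the lock ensures that threads queued behind the first rebuilder hit the freshly written cache entry instead of the database.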
Attack scenario: how to handle a flood of queries for ever-different keys, none of which exist in the cache or the database? (Caching nulls no longer helps, because every request uses a new key.)
Extended interview question:
How to quickly determine whether an element exists in a massive data set (e.g., 1 billion unordered, variable-length, non-repeating items)?
Naive approach: put all 1 billion items directly in memory # whatever storage type you use, this takes many gigabytes of memory
The most compact structure uses single 0/1 bits (a bitmap), which is effectively an ordered sequence of bits
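A quick back-of-the-envelope comparison of the two approaches (the 16-bytes-per-item figure is an illustrative lower bound, not a measurement):

```java
public class BitmapMemoryMath {
    public static void main(String[] args) {
        long n = 1_000_000_000L;            // one billion elements
        // Storing each element as even a small 16-byte value:
        long asObjects = n * 16;            // 16,000,000,000 bytes
        // Storing one bit per element in a bitmap:
        long asBits = n / 8;                // 125,000,000 bytes
        System.out.println(asObjects / (1024L * 1024 * 1024) + " GB vs "
                + asBits / (1024 * 1024) + " MB"); // prints "14 GB vs 119 MB"
    }
}
```

So the bitmap shrinks the memory footprint by roughly two orders of magnitude, which is what makes the in-memory approach feasible at all.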
How do we map arbitrary data onto 0/1 bits?
We need a mapping function with: 1. fixed-length output, 2. uniform distribution
A hash function plus bit operations provides such a mapping: hash the key, then set the bit at the resulting position
(e.g., the keys "Mic" and "Tom" may map to the same position, producing a hash collision)
How to reduce the probability of hash collisions?
1. Enlarge the bit array (disadvantage: more memory consumption)
2. Use more hash functions per element (disadvantage: each lookup must compute more hashes, reducing efficiency)
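A toy sketch of the mapping and of both mitigations, using `java.util.BitSet`; the `position` function and its seed mixing are illustrative, not a standard algorithm:

```java
import java.util.BitSet;

public class BitMappingDemo {
    static final int SIZE = 16; // deliberately tiny bit array to provoke collisions

    // Map a key to a bit position: hash it (mixing in a seed so different
    // seeds act like different hash functions), then reduce into the array range.
    static int position(String key, int seed) {
        int h = key.hashCode() ^ (seed * 0x9E3779B9);
        return Math.floorMod(h, SIZE);
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(SIZE);
        bits.set(position("Mic", 1));
        // With only 16 slots, an unrelated key can easily land on a set bit:
        boolean collision = bits.get(position("Tom", 1));
        System.out.println("Tom collides with Mic: " + collision);
        // Mitigation 1: a larger SIZE lowers the chance any single bit is shared.
        // Mitigation 2: with k hash functions (different seeds), ALL k bits must
        // be set before an element is considered present, so a single shared
        // bit is no longer enough to cause a false positive.
        boolean allSet = bits.get(position("Tom", 1)) && bits.get(position("Tom", 2));
        System.out.println("Tom present under 2 hash functions: " + allSet);
    }
}
```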
How to balance trade-offs 1 and 2? ->
In 1970, Burton Bloom published the Bloom filter
Principle of the Bloom filter
Essence: 1. a bit array (binary vector), 2. a series of random mapping (hash) functions
In the figure above, all three of d's bit positions are 1. Can we be sure d exists? No: hash collisions may have set those bits for other elements (a Bloom filter misjudgment, i.e., a false positive)
In the figure above, one of e's three bit positions is 0, so e definitely does not exist
If the Bloom filter judges that an element exists in the set -> it does not necessarily exist
If the Bloom filter judges that an element does not exist -> it definitely does not exist
Therefore, if an element actually exists, the Bloom filter will always judge it as existing
If an element does not actually exist, the Bloom filter may still judge it as existing
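The four properties above can be sketched with a minimal Bloom filter over `java.util.BitSet`; the parameters and hash mixing are illustrative, not tuned:

```java
import java.util.BitSet;

// Minimal Bloom filter: "absent" answers are always reliable,
// "present" answers may be false positives.
public class TinyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int k; // number of hash functions

    TinyBloomFilter(int size, int k) {
        this.bits = new BitSet(size);
        this.size = size;
        this.k = k;
    }

    // i-th hash function, simulated by mixing a seed into the key's hash.
    private int position(String key, int seed) {
        return Math.floorMod(key.hashCode() ^ (seed * 0x9E3779B9), size);
    }

    void put(String key) {
        for (int i = 0; i < k; i++) bits.set(position(key, i));
    }

    // true  = "might be present" (possible false positive);
    // false = "definitely absent" (never a false negative).
    boolean mightContain(String key) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloomFilter bf = new TinyBloomFilter(1 << 16, 3);
        bf.put("d");
        // An inserted element is always reported as (possibly) present:
        System.out.println(bf.mightContain("d")); // true
    }
}
```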
Based on these properties, the Bloom filter can solve cache penetration: if the Bloom filter judges that a key does not exist, it definitely does not exist, so the request can be rejected before it reaches the cache or the database
Guava: Google's utility toolkit wrapping helpers for strings, collections, caching, rate limiting, and more
A popular counterpart in China is Hutool (which additionally covers utilities such as QR codes)
Guava includes a BloomFilter implementation
Usage: a. Add the Guava dependency
```xml
<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>30.1-jre</version>
</dependency>
```
b. Create a filter and test its false-positive (misjudgment) rate
```java
import com.google.common.base.Charsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.text.NumberFormat;
import java.util.*;

public class BloomFilterDemo {

    private static final int insertions = 1000000;

    public static void main(String[] args) {
        // Initialize a Bloom filter for String data, sized for 1,000,000 entries.
        // The default false-positive rate is 0.03; here we request 0.0001.
        BloomFilter<String> bf = BloomFilter.create(
                Funnels.stringFunnel(Charsets.UTF_8), insertions, 0.0001);
        // Stores all actual keys, used to check whether a key really exists
        Set<String> sets = new HashSet<>(insertions);
        // Stores all actual keys so they can be taken out for use
        List<String> lists = new ArrayList<>(insertions);

        // Initialize 1,000,000 random, unique strings into the three containers
        for (int i = 0; i < insertions; i++) {
            String uuid = UUID.randomUUID().toString();
            bf.put(uuid);
            sets.add(uuid);
            lists.add(uuid);
        }

        int right = 0; // correct judgments
        int wrong = 0; // misjudgments (false positives)
        for (int i = 0; i < 10000; i++) {
            // When i is divisible by 100, take an existing key; otherwise
            // generate a fresh random UUID. Between 0 and 10000 there are
            // 100 numbers divisible by 100.
            String data = i % 100 == 0
                    ? lists.get(i / 100)
                    : UUID.randomUUID().toString();
            if (bf.mightContain(data)) {
                if (sets.contains(data)) {
                    right++; // judged present and actually present
                } else {
                    wrong++; // judged present but actually absent: false positive
                }
            }
        }

        NumberFormat percentFormat = NumberFormat.getPercentInstance();
        percentFormat.setMaximumFractionDigits(2); // at most two decimal places
        float percent = (float) wrong / 9900;
        float bingo = (float) (9900 - wrong) / 9900;
        System.out.println("Of the 100 actually existing elements tested, "
                + "the Bloom filter judged present: " + right);
        System.out.println("Of the 9900 non-existing elements tested, "
                + "the Bloom filter wrongly judged present: " + wrong
                + ", hit rate: " + percentFormat.format(bingo)
                + ", false-positive rate: " + percentFormat.format(percent));
    }
}
```
Where the Bloom filter sits: load all keys from the database into the Bloom filter before serving traffic. Query the Bloom filter before accessing the Redis cache; if it reports the key does not exist, return directly, avoiding cache penetration
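A sketch of that lookup flow, filter -> cache -> database. The `BitSet`-based filter and the two maps are stand-ins for Guava's `BloomFilter`, Redis, and a real database, and the hash mixing is illustrative:

```java
import java.util.*;

public class GuardedLookupDemo {
    static final int SIZE = 1 << 16;
    static final BitSet filter = new BitSet(SIZE);          // stand-in Bloom filter
    static final Map<String, String> cache = new HashMap<>(); // stand-in Redis
    static final Map<String, String> db = new HashMap<>();    // stand-in database

    static int pos(String key, int seed) {
        return Math.floorMod(key.hashCode() ^ (seed * 0x9E3779B9), SIZE);
    }

    static void warmUp() {
        db.put("user:1", "Alice");
        // Load every database key into the Bloom filter before serving traffic.
        for (String key : db.keySet()) {
            for (int i = 0; i < 3; i++) filter.set(pos(key, i));
        }
    }

    static String lookup(String key) {
        // 1. Bloom filter first: "absent" is definitive, so return immediately.
        for (int i = 0; i < 3; i++) {
            if (!filter.get(pos(key, i))) return null; // penetration blocked here
        }
        // 2. Cache next.
        String v = cache.get(key);
        if (v != null) return v;
        // 3. Database last; write the result back into the cache.
        v = db.get(key);
        if (v != null) cache.put(key, v);
        return v;
    }

    public static void main(String[] args) {
        warmUp();
        System.out.println(lookup("user:1"));
        System.out.println(lookup("user:999"));
    }
}
```

Note that new keys written to the database must also be added to the filter, otherwise the filter would wrongly reject them.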
Why does the Bloom filter have no delete operation? Because of hash collisions: multiple elements may share bits, so clearing one element's bits could wrongly erase evidence of another element (variants such as the counting Bloom filter support deletion by keeping a counter per slot)
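A toy demonstration of that shared-bit problem, with one hash function and a tiny bit array (illustrative parameters; the key names are chosen only so the collision search below terminates quickly):

```java
import java.util.BitSet;

// Why Bloom filters cannot support delete: elements share bits, so clearing
// one element's bits can erase evidence of another element.
public class BloomDeleteProblem {
    static final int SIZE = 8; // tiny array so a collision is easy to find

    static int pos(String key) {
        return Math.floorMod(key.hashCode(), SIZE);
    }

    public static void main(String[] args) {
        // With only 8 slots, scanning a handful of keys quickly finds a second
        // distinct key that maps to the same bit as "key0".
        String a = "key0", b = null;
        for (int i = 1; b == null; i++) {
            if (pos("key" + i) == pos(a)) b = "key" + i;
        }

        BitSet bits = new BitSet(SIZE);
        bits.set(pos(a));
        bits.set(pos(b)); // same bit: the two elements now share it

        // "Deleting" a by clearing its bit...
        bits.clear(pos(a));
        // ...also destroys b, which is still supposed to be in the set:
        System.out.println("b still looks present: " + bits.get(pos(b))); // false
    }
}
```

After the "delete", a query for b would falsely report "definitely absent", breaking the filter's one reliable guarantee, which is why standard Bloom filters simply forbid deletion.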