Design a caching strategy to dynamically cache hot data


Written up front: our recent large course project needs a hot-data ranking feature. Since we already use Elasticsearch to store the data, the initial idea was to implement the ranking inside ES, but after more thought, using ES for the ranking in this project would be a clumsy choice: the ranking itself is tiny. So in the end we implemented the hot ranking with Redis instead.

Use LRU?

LRU is a common algorithm. If we want TOP10 hot data, we could set the LRU capacity to 10: while the cache is not full, new items go straight in; once it is full, the least recently used item is evicted and the newest item takes its place at the front.
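For illustration, here is a minimal LRU sketch in Java (just to make the mechanism concrete; this class is an example, not part of the project code):

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU sketch: keeps at most `capacity` entries and evicts the
// least recently accessed one when a new entry pushes it over the limit.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true: iteration order is least recently accessed first
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the eldest entry once the cache exceeds its capacity
        return size() > capacity;
    }
}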

This looks like a hot ranking, but it is not. For example, if item No. 2 has been accessed 100 times and item No. 11 only once, LRU would still evict the item with 100 visits simply because it was not accessed most recently. That is unreasonable, so the ranking should be based on each item's access frequency instead.

How to rank by access frequency

Load all the data into memory and record how often each item is accessed. That sounds simple and can be done with a ZSET, but what if you have 1,000,000 records? Storing all of them in Redis like that creates a big key and hurts Redis performance. Could you dedicate a separate server to the leaderboard data? That would be a waste, because a leaderboard is usually only TOP10~TOP100, which takes almost no memory. In our project the data volume is fairly small and every record has an upload time; in general, the more recently a record was uploaded, the more likely it is to reach the TOP10, and TOP10 is all we need. That leaves two options.
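As a rough sketch of that naive approach (the key name policy:visit:all is made up for illustration), every access would simply increment the id's score in one sorted set, which works but makes that single key grow with the full data set:

import java.util.Set;
import org.springframework.data.redis.core.StringRedisTemplate;

// Naive sketch only: one ZSET counts every access for every id.
// With 1,000,000+ ids this single key keeps growing, which is the
// "big key" problem described above.
public class NaiveVisitCounter {
    private final StringRedisTemplate redisTemplate;

    public NaiveVisitCounter(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public void recordVisit(Integer id) {
        redisTemplate.opsForZSet().incrementScore("policy:visit:all", String.valueOf(id), 1d);
    }

    public Set<String> top10() {
        // highest scores first, positions 0..9
        return redisTemplate.opsForZSet().reverseRange("policy:visit:all", 0, 9);
    }
}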

Option 1: cache the 10 most recently uploaded records from the database. Whenever one of those 10 records is accessed, increase its access count by one; accesses to records outside the 10 are ignored. After a period of time, remove the records with the lowest access counts, randomly pick replacements from the database to fill the TOP10 back up, and repeat.

Option 2: option 1 has a small flaw: whatever happens to be accessed most at the beginning can occupy the TOP10 and squeeze out the real TOP10. So in the second scheme we cache 20 records, periodically remove the 5-10 with the lowest access counts, randomly pick new records to bring the set back to 20, but only ever expose the top 10. Otherwise it works like option 1, just with more data cached.

Writing the code

Once the idea is clear, writing the code is the easiest step. How to add the Redis dependency to the project and configure it will not be repeated here, because it has nothing to do with the logic of the code.

Select the latest 20 records

    // Load the latest record ids from the database and reset the ranking ZSET
    public void getCur2MySQL(){
        Set<ZSetOperations.TypedTuple<String>> set = new HashSet<>();
        // ids of the most recently uploaded records
        List<Integer> cur2Ids = baseMapper.getCur2Ids();
        cur2Ids.forEach(e -> {
            // every id starts with a score (access count) of 0
            set.add(new DefaultTypedTuple<>(String.valueOf(e), 0d));
        });
        // clear the old ranking, then add the fresh candidates
        redisTemplate.opsForZSet().removeRange(Constant.POLICY_TOP_10, 0, -1);
        redisTemplate.opsForZSet().add(Constant.POLICY_TOP_10, set);
    }
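The getCur2Ids query itself is not shown above; a possible mapper sketch (assuming the p_policy table has an upload-time column, guessed here as upload_time) could look like this:

    <!-- Hypothetical mapper for getCur2Ids: the 20 most recently uploaded ids.
         The column name upload_time is an assumption, not from the original project. -->
    <select id="getCur2Ids" resultType="integer">
        select id from p_policy
        order by upload_time desc
        limit 20
    </select>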

Increment the count on access

The lookup and the increment here are two separate Redis calls; they could be combined into a Lua script for atomicity, which is left as an optimization for interested readers (a sketch follows the code below).

    // Read one record: bump its access count (if it is a ranking candidate),
    // then load the cached object from the hash
    public PolicyEntity getByPolicyById(Integer id) {
        addVisited(id);
        Object o = redisTemplate.opsForHash().get(Constant.POLICY_HASH_OBJECT, Integer.toString(id));
        return JSON.parseObject((String) o, PolicyEntity.class);
    }

    // Plus one: only increment ids that are already in the ranking ZSET
    public void addVisited(Integer id){
        if (redisTemplate.opsForZSet().score(Constant.POLICY_TOP_10, String.valueOf(id)) != null) {
            redisTemplate.opsForZSet().incrementScore(Constant.POLICY_TOP_10, String.valueOf(id), 1d);
        }
    }
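As a rough idea of what that Lua optimization could look like (an untested sketch), the existence check and the increment can be folded into a single atomic call through Spring Data Redis:

import java.util.Collections;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;

// Sketch: increment the score only if the member is already in the ZSET,
// executed atomically on the Redis side instead of as two separate calls.
public class VisitCounter {
    private static final DefaultRedisScript<Long> INCR_IF_PRESENT = new DefaultRedisScript<>(
            "if redis.call('ZSCORE', KEYS[1], ARGV[1]) then " +
            "  redis.call('ZINCRBY', KEYS[1], 1, ARGV[1]) return 1 " +
            "end return 0",
            Long.class);

    private final StringRedisTemplate redisTemplate;

    public VisitCounter(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public void addVisited(Integer id) {
        redisTemplate.execute(INCR_IF_PRESENT,
                Collections.singletonList(Constant.POLICY_TOP_10), String.valueOf(id));
    }
}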

Get Top10

    public List<PolicyEntity> getTop10() {
        // ids with the highest scores first, positions 0..9
        Set<String> ids = redisTemplate.opsForZSet().reverseRange(Constant.POLICY_TOP_10, 0, 9);
        List<Object> hashKeys = new ArrayList<>(ids);
        // fetch the cached objects for those ids in one call
        List<Object> multiGet = redisTemplate.opsForHash().multiGet(Constant.POLICY_HASH_OBJECT, hashKeys);
        List<PolicyEntity> res = new ArrayList<>();
        multiGet.forEach(d -> res.add(JSON.parseObject((String) d, PolicyEntity.class)));
        return res;
    }
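For completeness, the ranking might be exposed through a simple controller like the following (the class name, path, and mapping are hypothetical, not from the original project):

import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical endpoint that just returns the TOP10 assembled above
@RestController
@RequestMapping("/policy")
public class PolicyController {

    @Autowired
    PolicyService policyService;

    @GetMapping("/top10")
    public List<PolicyEntity> top10() {
        return policyService.getTop10();
    }
}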

The next step is the scheduled task. I use Quartz here; there are other ways to implement scheduled tasks, which you can try if you are interested (one alternative is sketched below).
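One of those alternatives, for example, is Spring's own @Scheduled annotation; a minimal sketch (assuming @EnableScheduling is present and with an example cron value) would be:

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Alternative to Quartz: a plain Spring scheduled task. Requires
// @EnableScheduling on a configuration class; the cron value is only an example.
@Component
public class TopTenRefreshTask {

    @Scheduled(cron = "0 0/30 * * * *")
    public void refreshTopTen() {
        // same logic as the Quartz job below: drop the lowest-scored
        // entries from the ZSET and refill it from the database
    }
}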

Writing the job

Delete the five entries with the lowest access counts and randomly pick five records from the database to replace them.

@Component
public class TopTenQuartzJob extends QuartzJobBean {

    @Autowired
    StringRedisTemplate redisTemplate;
    @Autowired
    PolicyService policyService;

    @Override
    protected void executeInternal(JobExecutionContext jobExecutionContext) throws JobExecutionException {
        // drop the five entries with the lowest scores (lowest access counts)
        redisTemplate.opsForZSet().removeRange(Constant.POLICY_TOP_10, 0, 4);
        // ids that survived the cut
        Set<String> ids = redisTemplate.opsForZSet().range(Constant.POLICY_TOP_10, 0, -1);
        List<Integer> list = ids.stream().map(Integer::valueOf).collect(Collectors.toList());
        // randomly pick replacement ids that are not already in the ZSET
        List<Integer> newIds = policyService.listIdsAndNotIn(list);
        Set<ZSetOperations.TypedTuple<String>> set = new HashSet<>();
        newIds.forEach(e -> set.add(new DefaultTypedTuple<>(String.valueOf(e), 0d)));
        redisTemplate.opsForZSet().add(Constant.POLICY_TOP_10, set);
        System.out.println("deleted");
    }
}

The corresponding statement in the MyBatis XML mapper is:

    <select id="listIdsAndNotIn" resultType="integer">
        select id from p_policy
        <where>
            <!-- guard against an empty list, which would otherwise produce invalid SQL -->
            <if test="list != null and list.size() > 0">
                id not in
                <foreach collection="list" open="(" close=")" separator="," item="id">
                    #{id}
                </foreach>
            </if>
        </where>
        order by RAND() limit 5
    </select>
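The matching method on the MyBatis mapper interface would look roughly like this (a sketch; the interface name is assumed):

import java.util.List;
import org.apache.ibatis.annotations.Param;

// Sketch of the mapper method behind the XML above; @Param("list")
// matches collection="list" in the <foreach>.
public interface PolicyMapper {
    List<Integer> listIdsAndNotIn(@Param("list") List<Integer> list);
}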

Write Trigger and JobDetail

@Configuration
public class QuartzConfig {

    @Value("${quartz.policy.top10.cron}")
    private String cron;

    @Bean
    public JobDetail topTenQuartzJobDetail() {
        // storeDurably() keeps the JobDetail even before a trigger references it
        return JobBuilder.newJob(TopTenQuartzJob.class)
                .storeDurably()
                .build();
    }

    @Bean
    public Trigger topTenQuartzTrigger() {
        // fire the job on the cron expression read from the configuration
        CronScheduleBuilder schedule = CronScheduleBuilder.cronSchedule(cron);
        return TriggerBuilder.newTrigger()
                .forJob(topTenQuartzJobDetail())
                .withSchedule(schedule)
                .build();
    }
}
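The quartz.policy.top10.cron value is read from the application configuration; as an example (the actual interval is up to you), quartz.policy.top10.cron=0 0 0/1 * * ? in application.properties would run the cleanup once an hour.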

This way of building the leaderboard still has a weakness: a record that is accessed very frequently but never happens to be randomly pulled into Redis can never enter the leaderboard. For my project that is acceptable, because we store the most recent files, there are relatively few of them, and the files that make it onto the hot list are usually newly released, so the impact on this project is small.

If anyone has designed a better leaderboard, please leave a comment or send me a private message. Thank you very much!

