Preface
The text and images in this article come from the internet and are for learning and communication only, not for any commercial purpose. If you have any questions, please contact us promptly.
The following article comes from The Zen of Python Data Analysis
Author: little dull bird
Python crawler, data analysis, website development, and other case-study tutorial videos can be watched online for free:
https://space.bilibili.com/523606542

The film was not released in cinemas; instead it chose to premiere on Tencent Video (as a paid, on-demand release). Its current view count is over 90 million and approaching 100 million, which would put it in first place for a single-platform premiere.
Today, starting from the movie's bullet-screen (danmu) comments, let's analyze together what makes this movie shine.
First, I used Python to crawl all of the movie's bullet screens. The crawler is fairly simple, so rather than explain it in detail, here is the code:
import requests
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
}
url = 'https://mfm.video.qq.com/danmu?otype=json&target_id=6480348612%26vid%3Dh0035b23dyt'
# The parameters that control the barrage stream are target_id and timestamp;
# each request returns one packet covering 30 seconds of playback.

comids = []
comments = []
opernames = []
upcounts = []
timepoints = []
n = 15
while True:
    data = {"timestamp": n}
    response = requests.get(url, headers=headers, params=data, verify=False)
    res = response.json()   # parse the JSON response into a dict
    con = res["comments"]
    if res['count'] != 0:   # a count of 0 means we are past the end of the film
        n += 30
        for j in con:
            comids.append(j['commentid'])
            opernames.append(j["opername"])
            comments.append(j["content"])
            upcounts.append(j["upcount"])
            timepoints.append(j["timepoint"])
    else:
        break

data = pd.DataFrame({'id': comids, 'name': opernames, 'comment': comments,
                     'up': upcounts, 'pon': timepoints})
data.to_excel('Fortune diary bullet screen.xlsx')
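Before running the full loop, it can be worth sanity-checking a single packet. A minimal sketch (my addition, not from the original article), assuming the endpoint returns the JSON fields used above:

import requests

url = 'https://mfm.video.qq.com/danmu?otype=json&target_id=6480348612%26vid%3Dh0035b23dyt'
headers = {"User-Agent": "Mozilla/5.0"}

# Fetch one 30-second packet and inspect its structure before looping.
res = requests.get(url, headers=headers, params={"timestamp": 15}, verify=False).json()
print(res['count'])                 # number of comments in this packet
print(list(res['comments'][0]))     # field names available on each comment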
First, read the barrage data with pandas:
import pandas as pd

data = pd.read_excel('Fortune diary bullet screen.xlsx')
data

There are nearly 40,000 bullet screens, with five columns of data: comment id, nickname, content, number of likes, and bullet-screen position (the playback time, in seconds, at which the comment appears).
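A quick structural check (a sketch I added, assuming the column layout above):

print(data.shape)                              # number of bullet screens and columns
print(data['pon'].min(), data['pon'].max())    # pon spans the film, in seconds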
Segment the movie at 6-minute intervals to see the changes in the number of bullet screens in each time period:
time_list = ['{}'.format(int(i/60)) for i in list(range(0, 8280, 360))]   # 6-minute edges, in minutes
pero_list = []
for i in range(len(time_list)-1):
    pero_list.append('{0}-{1}'.format(time_list[i], time_list[i+1]))

# Count the bullet screens that fall inside each 6-minute window
counts = []
for i in pero_list:
    counts.append(int(data[(data.pon >= int(i.split('-')[0])*60) &
                           (data.pon < int(i.split('-')[1])*60)]['pon'].count()))

import pyecharts.options as opts
from pyecharts.globals import ThemeType
from pyecharts.charts import Line

line = (
    Line({"theme": ThemeType.DARK})
    .add_xaxis(xaxis_data=pero_list)
    .add_yaxis("", list(counts), is_smooth=True)
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15), name="Film duration"),
        title_opts=opts.TitleOpts(title="Changes in the number of barrages at different times"),
        yaxis_opts=opts.AxisOpts(name="Number of barrages"),
    )
)
line.render_notebook()
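As a side note, the same per-window counts can be computed more compactly with pandas' pd.cut; a sketch of my own, equivalent to the counting loop above:

import pandas as pd

# Bin pon (seconds) into the same 6-minute windows and count per bin.
edges = list(range(0, 8280, 360))
per_window = pd.cut(data['pon'], bins=edges, right=False).value_counts().sort_index()
print(per_window)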

Judging from the curve, there are two peaks, around the 60-minute and the 120-minute marks, suggesting that the film has at least two climaxes.
To satisfy our curiosity, let's analyze what viewers were saying in the first 6 minutes (the part that can be watched for free) and at the two climaxes.
1. Look at what people were saying in the first six minutes:
# Word-cloud code
import jieba   # Chinese word segmentation
from pyecharts import options as opts
from pyecharts.charts import WordCloud
from pyecharts.globals import SymbolType

def ciyun(content):
    # Segment the text with jieba, keeping words longer than one character
    segment = []
    segs = jieba.cut(content)
    for seg in segs:
        if len(seg) > 1 and seg != '\r\n':
            segment.append(seg)
    # Remove stop words (text denoising)
    words_df = pd.DataFrame({'segment': segment})
    stopwords = pd.read_csv("stopword.txt", index_col=False, quoting=3,
                            sep='\t', names=['stopword'], encoding="utf8")
    words_df = words_df[~words_df.segment.isin(stopwords.stopword)]
    # Count the occurrences of each word and sort by frequency
    words_stat = words_df.groupby('segment').agg(count=pd.NamedAgg(column='segment', aggfunc='size'))
    words_stat = words_stat.reset_index().sort_values(by="count", ascending=False)
    return words_stat

# Bullet screens from the first 6 minutes (0-360 seconds)
data_6_text = ''.join(data[(data.pon >= 0) & (data.pon < 360)]['comment'].values.tolist())
words_stat = ciyun(data_6_text)

words = [(i, j) for i, j in zip(words_stat['segment'].values.tolist(),
                                words_stat['count'].values.tolist())]
c = (
    WordCloud()
    .add("", words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
    .set_global_opts(title_opts=opts.TitleOpts(title='First 6 minutes'))
)
c.render_notebook()
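Note that ciyun assumes a local stopword.txt file (one stop word per line). Before rendering, a quick peek at the frequency table it returns is a useful check:

# Top-10 words in the first 6 minutes, by frequency
print(words_stat.head(10))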

"Xiaobao" ranks first, and words such as "good-looking" and "support" appear. It seems that Xiaobao is still very popular
2. The first climax:
# Bullet screens around the first peak (54-60 minutes)
data_60_text = ''.join(data[(data.pon >= 54*60) & (data.pon < 60*60)]['comment'].values.tolist())
words_stat = ciyun(data_60_text)

words = [(i, j) for i, j in zip(words_stat['segment'].values.tolist(),
                                words_stat['count'].values.tolist())]
c = (
    WordCloud()
    .add("", words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
    .set_global_opts(title_opts=opts.TitleOpts(title='First climax'))
)
c.render_notebook()

The top words are "Xiaobao", "second brother", "hahaha", "good-looking", and so on, indicating that something funny must happen between Xiaobao and his second brother here.
3. The second climax:
# Bullet screens around the second peak (120-128 minutes)
data_120_text = ''.join(data[(data.pon >= 120*60) & (data.pon < 128*60)]['comment'].values.tolist())
words_stat = ciyun(data_120_text)

words = [(i, j) for i, j in zip(words_stat['segment'].values.tolist(),
                                words_stat['count'].values.tolist())]
c = (
    WordCloud()
    .add("", words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
    .set_global_opts(title_opts=opts.TitleOpts(title='The second climax'))
)
c.render_notebook()

Among the high-frequency words we find "good-looking", "tears", and "crying", indicating that the ending of the film is very touching.
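In passing: the three word-cloud blocks above differ only in the time window and the title, so they could be folded into one helper. A sketch of such a refactor (my own, not from the original article), reusing the ciyun function:

def window_cloud(df, start_min, end_min, title):
    # Word cloud for bullet screens between start_min and end_min (minutes)
    text = ''.join(df[(df.pon >= start_min*60) & (df.pon < end_min*60)]['comment'].values.tolist())
    stat = ciyun(text)
    words = list(zip(stat['segment'].values.tolist(), stat['count'].values.tolist()))
    return (
        WordCloud()
        .add("", words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
        .set_global_opts(title_opts=opts.TitleOpts(title=title))
    )

# e.g. window_cloud(data, 120, 128, 'The second climax').render_notebook()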
Next, let's dig out the users who posted the most bullet screens and see what they were talking about. Since some bullet screens have no nickname, we first need to drop those rows:
data1 = data[data['name'].notna()]   # drop bullet screens without a nickname
data2 = pd.DataFrame({'num': data1.value_counts(subset="name")})   # count comments per user
data3 = data2.reset_index()
data3 = data3[data3.num > 100]       # keep users with more than 100 bullet screens

data_text = ''
for i in data3['name'].values.tolist():
    data_text += ''.join(data[data.name == i]['comment'].values.tolist())
words_stat = ciyun(data_text)

words = [(i, j) for i, j in zip(words_stat['segment'].values.tolist(),
                                words_stat['count'].values.tolist())]
c = (
    WordCloud()
    .add("", words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
    .set_global_opts(title_opts=opts.TitleOpts(title='Fan barrage'))
)
c.render_notebook()
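A note on the selection step before we look at the result: the user-by-user loop can be replaced with a single vectorized isin lookup, which is faster on large frames. A sketch under the same assumptions (data3 already filtered to the >100-comment users):

# Select all comments by the most active users in one pass
fan_text = ''.join(data[data['name'].isin(data3['name'])]['comment'].values.tolist())
words_stat = ciyun(fan_text)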

These fans' comments are really glowing!
It seems the film's high box office is well deserved: there is both laughter and tears, which shows that Xiaobao's debut is a great success!