Using Python to analyze Song Xiaobao's debut "Fortune Diary": a look at this film with nearly 100 million views

Preface


The following article comes from "The Zen of Python Data Analysis".

Author: little dull bird

Video tutorials on Python crawlers, data analysis, website development, and other case studies can be viewed online for free:

https://space.bilibili.com/523606542

The film was not released in cinemas; instead it premiered on Tencent Video (as a paid on-demand title). Its view count is currently over 90 million, approaching 100 million, which puts it in first place among single-film online premieres.

Today, starting from the movie's bullet screens (danmu), let's analyze together what the highlights of this movie are.

First, I used Python to crawl all of the movie's bullet screens. The crawler is fairly simple, so I won't go into detail; here is the code:

import requests
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
}
url = 'https://mfm.video.qq.com/danmu?otype=json&target_id=6480348612%26vid%3Dh0035b23dyt'
# The parameters that control the bullet screens are target_id and timestamp;
# timestamp advances by 30 for each request (one packet covers 30 seconds).
comids = []
comments = []
opernames = []
upcounts = []
timepoints = []
n = 15
while True:
    params = {"timestamp": n}
    response = requests.get(url, headers=headers, params=params, verify=False)
    res = response.json()  # parse the JSON response (safer than eval on raw text)
    con = res["comments"]
    if res['count'] != 0:  # check the bullet-screen count to decide whether crawling is finished
        n += 30
        for j in con:
            comids.append(j['commentid'])
            opernames.append(j["opername"])
            comments.append(j["content"])
            upcounts.append(j["upcount"])
            timepoints.append(j["timepoint"])
    else:
        break
data = pd.DataFrame({'id': comids, 'name': opernames, 'comment': comments, 'up': upcounts, 'pon': timepoints})
data.to_excel('Fortune diary bullet screen.xlsx')
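The loop above fires requests back to back, and because verify=False disables certificate checks, requests will warn on every call. A small optional sketch (assuming the same loop as above) that silences the warning and adds a polite delay between requests:

import time
import urllib3
# verify=False skips certificate verification, so suppress the resulting warning
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# ...then, inside the while loop, after each requests.get call:
time.sleep(0.5)  # small pause so we do not hammer the server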


First, read the bullet screen data with pandas:

import pandas as pd
data = pd.read_excel('Fortune diary bullet screen.xlsx', index_col=0)  # the first column is the saved index
data


There are nearly 40,000 bullet screens, with five columns: "comment id", "nickname", "content", "number of likes", and "bullet screen position" (the playback time in seconds).
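Before slicing the data, a quick sanity check is useful. A minimal sketch, assuming the data DataFrame loaded above (column names follow that code):

# Quick sanity check on the crawled bullet screens
print(data.shape)                 # rows and columns
print(data.dtypes)                # column types; 'pon' should be numeric (seconds)
print(data['name'].isna().sum())  # bullet screens without a nickname (dropped later)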

Split the film into 6-minute intervals and look at how the number of bullet screens changes in each period:

time_list = ['{}'.format(i // 60) for i in range(0, 8280, 360)]  # minute marks, one every 6 minutes
pero_list = []
for i in range(len(time_list) - 1):
    pero_list.append('{0}-{1}'.format(time_list[i], time_list[i + 1]))
counts = []
for p in pero_list:
    start_min, end_min = p.split('-')
    counts.append(int(data[(data.pon >= int(start_min) * 60) & (data.pon < int(end_min) * 60)]['pon'].count()))

import pyecharts.options as opts
from pyecharts.globals import ThemeType
from pyecharts.charts import Line

line = (
    Line(init_opts=opts.InitOpts(theme=ThemeType.DARK))
    .add_xaxis(xaxis_data=pero_list)
    .add_yaxis("", counts, is_smooth=True)
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15), name="Film time (minutes)"),
        title_opts=opts.TitleOpts(title="Changes in the number of bullet screens over time"),
        yaxis_opts=opts.AxisOpts(name="Number of bullet screens"),
    )
)
line.render_notebook()

[Line chart: changes in the number of bullet screens over time]

Looking at the change in the number of bullet screens, there are two peaks, around the 60-minute mark and the 120-minute mark, indicating that the film has at least two climaxes.
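To read the peaks off programmatically rather than by eye, here is a short sketch using the pero_list and counts computed above:

# Locate the two intervals with the most bullet screens
peak_idx = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)[:2]
for i in sorted(peak_idx):
    print('Peak interval {} minutes: {} bullet screens'.format(pero_list[i], counts[i]))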

To satisfy our curiosity, let's analyze what viewers were talking about in the first 6 minutes (the part that can be watched for free) and at the two climaxes.

1. Look at what people were saying in the first six minutes:

# Word cloud code
import jieba  # Chinese word segmentation
from pyecharts import options as opts
from pyecharts.charts import WordCloud
from pyecharts.globals import SymbolType

def ciyun(content):
    # Cut the text into words, dropping single characters and line breaks
    segment = []
    segs = jieba.cut(content)  # segment with jieba
    for seg in segs:
        if len(seg) > 1 and seg != '\r\n':
            segment.append(seg)
    # Remove stop words (text denoising)
    words_df = pd.DataFrame({'segment': segment})
    stopwords = pd.read_csv("stopword.txt", index_col=False,
                            quoting=3, sep='\t', names=['stopword'], encoding="utf8")
    words_df = words_df[~words_df.segment.isin(stopwords.stopword)]
    # Count word frequencies, highest first
    words_stat = words_df.groupby('segment').agg(count=pd.NamedAgg(column='segment', aggfunc='size'))
    words_stat = words_stat.reset_index().sort_values(by="count", ascending=False)
    return words_stat

data_6_text = ''.join(data[(data.pon >= 0) & (data.pon < 360)]['comment'].values.tolist())
words_stat = ciyun(data_6_text)
words = [(i, j) for i, j in zip(words_stat['segment'].values.tolist(), words_stat['count'].values.tolist())]
c = (
    WordCloud()
    .add("", words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
    .set_global_opts(title_opts=opts.TitleOpts(title='First 6 minutes'))
)
c.render_notebook()
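Besides the rendered cloud, you can also inspect the frequency table directly; a quick sketch using the words_stat just computed:

# The ten most frequent words in the first 6 minutes of bullet screens
print(words_stat.head(10))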

[Word cloud: first 6 minutes]

"Xiaobao" ranks first, and words such as "good-looking" and "support" also appear; it seems Xiaobao is indeed very popular.

2. The first climax:

data_60_text = ''.join(data[(data.pon >= 54 * 60) & (data.pon < 60 * 60)]['comment'].values.tolist())
words_stat = ciyun(data_60_text)
words = [(i, j) for i, j in zip(words_stat['segment'].values.tolist(), words_stat['count'].values.tolist())]
c = (
    WordCloud()
    .add("", words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
    .set_global_opts(title_opts=opts.TitleOpts(title='First climax'))
)
c.render_notebook()

[Word cloud: first climax]

The top words are "Xiaobao", "second brother", "hahaha", "good-looking", and so on, suggesting that something funny happens between Xiaobao and second brother.

3. The second climax:

data_120_text = ''.join(data[(data.pon >= 120 * 60) & (data.pon < 128 * 60)]['comment'].values.tolist())
words_stat = ciyun(data_120_text)
words = [(i, j) for i, j in zip(words_stat['segment'].values.tolist(), words_stat['count'].values.tolist())]
c = (
    WordCloud()
    .add("", words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
    .set_global_opts(title_opts=opts.TitleOpts(title='The second climax'))
)
c.render_notebook()

[Word cloud: the second climax]

Among the high-frequency words we find "good-looking", "tears", and "crying", indicating that the end of the film is very touching.
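The three word clouds above repeat the same slice-segment-plot pattern. As a minimal refactoring sketch (assuming the data DataFrame and the ciyun function defined earlier; wordcloud_for is a hypothetical helper name), the repetition could be folded into one function:

def wordcloud_for(df, start_min, end_min, title):
    # Join the bullet screens that fall inside [start_min, end_min) minutes
    text = ''.join(df[(df.pon >= start_min * 60) & (df.pon < end_min * 60)]['comment'].values.tolist())
    stats = ciyun(text)
    words = [(s, int(c)) for s, c in zip(stats['segment'].values.tolist(), stats['count'].values.tolist())]
    return (
        WordCloud()
        .add("", words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
        .set_global_opts(title_opts=opts.TitleOpts(title=title))
    )

# For example, the second climax: wordcloud_for(data, 120, 128, 'The second climax').render_notebook()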

Next, let's dig out the people who posted the most bullet screens and see what they were talking about. Because some bullet screens have no nickname, we need to drop those first:

data1 = data[data['name'].notna()]
data2 = pd.DataFrame({'num': data1.value_counts(subset="name")})  # count occurrences per nickname
data3 = data2.reset_index()
data3 = data3[data3.num > 100]  # keep only people with more than 100 bullet screens
data_text = ''
for i in data3['name'].values.tolist():
    data_text += ''.join(data[data.name == i]['comment'].values.tolist())
words_stat = ciyun(data_text)
words = [(i, j) for i, j in zip(words_stat['segment'].values.tolist(), words_stat['count'].values.tolist())]
c = (
    WordCloud()
    .add("", words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
    .set_global_opts(title_opts=opts.TitleOpts(title='Fan bullet screens'))
)
c.render_notebook()

[Word cloud: fan bullet screens]

The praise here is really strong!

It seems the huge view count is well deserved: the film has both laughter and tears, which shows that Xiaobao's debut is a great success!
