A word cloud is a common text-visualization technique that highlights the most frequent words in a large body of text.
This article first introduces three Python libraries for making word clouds (WordCloud, StyleCloud and Pyecharts), then adds an online word cloud website, and finally compares them briefly in terms of code and visual results.
WordCloud, StyleCloud and Pyecharts share one trait: a few lines of code are enough to draw an attractive word cloud, although each exposes a large number of parameters.
WordCloud
WordCloud is the most widely used Python library for making word cloud images. It is simple to use, and the shape of the word cloud can be customized with a mask. StyleCloud, introduced later, is built on top of it.
WordCloud wraps all of its functionality in the WordCloud class; to adjust the style of the word cloud you only need to change a few parameters.
Take a simple circular word cloud as an example.
First build a word-frequency dictionary with collections.Counter, then pass it to the generate_from_frequencies() method of WordCloud to lay out the words.
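The frequency step can be looked at in isolation. The sketch below uses a made-up word list (not the article's danmu.txt data) to show what Counter.most_common() returns and what slicing it with [start:] does:

```python
from collections import Counter

# Made-up sample data, for illustration only.
words = ["a", "b", "a", "c", "a", "b", "d"]
counter = Counter(words)

# most_common() sorts by count, highest first.
ranked = counter.most_common()

# Slicing with [start:] skips the `start` most frequent entries.
start = 1
frequencies = dict(ranked[start:])
print(frequencies)
```

A dictionary of this shape, mapping each word to its count, is what generate_from_frequencies() takes as input.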
As for the shape of the word cloud, the following code uses numpy to build a circular array of 0 and 255 values and passes it as the mask parameter:
```python
import random
import re
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np
from wordcloud import WordCloud

word_list = []
with open("danmu.txt", encoding='utf-8') as f:
    words = f.read()
    for word in words.split('\n'):
        # Keep only lines that contain Chinese characters
        if re.findall('[\u4e00-\u9fa5]+', str(word), re.S):
            word_list.append(word)


def SquareWord(word_list):
    counter = Counter(word_list)  # Count word frequencies
    start = random.randint(0, 15)  # Random integer from 0 to 15 (inclusive)
    # Skip the `start` most frequent entries and keep the rest
    result_dict = dict(counter.most_common()[start:])
    x, y = np.ogrid[:300, :300]  # Coordinate grids for a 300x300 canvas
    # True outside a circle of radius 130 centered at (150, 150)
    mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2
    mask = 255 * mask.astype(int)  # Outside becomes 255 (ignored), inside stays 0
    wc = WordCloud(background_color='black',
                   mask=mask,
                   mode='RGB',
                   font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # Font that supports Chinese
                   ).generate_from_frequencies(result_dict)
    plt.axis("off")
    plt.imshow(wc, interpolation="bilinear")
    plt.show()


SquareWord(word_list)  # Draw the word cloud
```
The effect is as follows:
The most prominent advantage of WordCloud over the other two Python libraries is that **you can customize the mask**: pass a numpy array through the mask parameter to set the shape of the word cloud.
Note, however, that words are drawn only in regions whose value is not 255; regions whose value equals 255 (pure white) are ignored. If the image you want to use as a mask does not meet this condition, preprocess it first so that the background is filled with pure white pixels.
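One possible way to do that preprocessing is sketched below. It assumes a grayscale mask whose background is a single known gray level (0 here); the helper name whiten_background and the tiny sample array are made up for illustration and are not part of WordCloud.

```python
import numpy as np


def whiten_background(image, background_value=0):
    """Return a copy of a grayscale mask with background pixels set to 255.

    `image` is a 2-D array of gray levels; `background_value` is the level
    that currently marks the background (an assumption about the input).
    """
    mask = np.array(image, dtype=np.uint8)  # copy, so the input is untouched
    mask[mask == background_value] = 255    # 255 means "do not draw here"
    return mask


# Tiny 3x3 example: 0 is the background, 120 is the shape.
raw = np.array([[0, 120, 0],
                [120, 120, 120],
                [0, 120, 0]], dtype=np.uint8)
mask = whiten_background(raw)
print(mask)
```

For a real picture, the input array could be obtained with np.array(Image.open(path).convert("L")) from Pillow before calling the helper. Images with a noisy or anti-aliased background would need a threshold rather than an exact match.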
Custom mask word cloud drawing
```python
import random
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud


def AliceWord(word_list):
    counter = Counter(word_list)  # Count word frequencies
    start = random.randint(0, 15)  # Random integer from 0 to 15 (inclusive)
    # Skip the `start` most frequent entries and keep the rest
    result_dict = dict(counter.most_common()[start:])

    # Read the picture to use as the mask
    alic_coloring = np.array(Image.open("D:/Data/WordArt/Alice_mask.png"))
    wc = WordCloud(background_color="white",  # Background color
                   mode="RGB",
                   mask=alic_coloring,  # When None, a canvas 400 wide and 200 high is used
                   min_font_size=4,  # Smallest font size to draw
                   relative_scaling=0.8,  # How strongly frequency drives font size
                   font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # Font that supports Chinese
                   ).generate_from_frequencies(result_dict)
    wc.to_file("D:/Data/WordArt/wordclound.jpg")  # Save the generated word cloud
    plt.axis("off")
    plt.imshow(wc, interpolation="bilinear")
    plt.show()
```
Visualization effect
Finally, here are the main parameters of WordCloud:
- background_color (type -> str): a color name or color code that sets the background color of the word cloud image;
- font_path (type -> str): path to a custom font. This must be set when rendering Chinese text, otherwise the characters are displayed garbled;
- mask (type -> ndarray): customizes the shape of the word cloud; pure white areas are ignored when drawing;
- mode (type -> str): defaults to 'RGB'. With 'RGBA' and background_color=None, the background is transparent;
- relative_scaling (type -> float): how strongly word frequency determines the displayed size, from 0 to 1. The larger the value, the stronger the link. The default, 'auto', behaves like 0.5 unless repeat is enabled;
- prefer_horizontal (type -> float): the ratio of attempts to place a word horizontally rather than vertically. The smaller the value, the more vertical text appears in the word cloud;
In addition to the parameters above, you can also set the colors, stopwords, whether words may repeat, and more.
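One detail is worth knowing about stopwords: according to the WordCloud documentation, the stopwords parameter is ignored when the cloud is built with generate_from_frequencies(), as in the examples above. A minimal sketch of filtering the frequency dictionary by hand instead follows; the helper drop_stopwords and the sample words are made up for illustration.

```python
from collections import Counter


def drop_stopwords(frequencies, stopwords):
    """Return a new dict without the words listed in `stopwords`."""
    return {word: count for word, count in frequencies.items()
            if word not in stopwords}


# Made-up sample data and stopword set, for illustration only.
words = ["哈哈", "的", "好看", "哈哈", "的", "的", "加油"]
my_stopwords = {"的"}

frequencies = drop_stopwords(Counter(words), my_stopwords)
print(frequencies)
```

The filtered dictionary can then be passed to generate_from_frequencies() in place of the unfiltered one.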
For details, please refer to the official documentation:
https://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html#wordcloud.WordCloud
StyleCloud
StyleCloud is built on top of WordCloud and adds several new features:
- 1. Support for color gradients;
- 2. Word cloud colors can be set from ready-made palettes;
- 3. Icons can be used as masks. This is the most useful new feature: any icon from the Font Awesome website can be selected by name, and there is a wide variety to choose from;
- 4. In addition to a text string, it also accepts CSV and TXT files as input;
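For the CSV input mentioned in the last item, a word-count file can be produced with the standard library. The sketch below assumes a two-column layout (word, then count) with a header row; that layout, the column names and the file name are assumptions, so check the StyleCloud documentation for what it actually expects.

```python
import csv
import os
import tempfile
from collections import Counter

# Made-up sample data, for illustration only.
words = ["哈哈", "好看", "哈哈", "加油", "哈哈", "好看"]
counts = Counter(words)

# Write a two-column CSV: word, count. The header row is an assumption.
csv_path = os.path.join(tempfile.mkdtemp(), "word_counts.csv")
with open(csv_path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["word", "count"])
    writer.writerows(counts.most_common())

# Read the file back to confirm its layout.
with open(csv_path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

The resulting path could then be given to gen_stylecloud through its file_path parameter, in place of a plain text file.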
The main program needs only a single function call:
```python
import stylecloud


def Style_WordArt():
    # Draw a word cloud with StyleCloud
    stylecloud.gen_stylecloud(
        file_path="danmu.txt",  # Input text file
        background_color='white',  # Background color
        palette="colorbrewer.qualitative.Dark2_7",  # Palette that sets the text colors
        icon_name='fas fa-cat',  # Icon used as the mask
        font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # Font that supports Chinese
        random_state=40,  # Seed controlling the random text colors
        invert_mask=False,  # Whether to invert the final mask
        output_name="D:/Data/WordArt/styleclound.jpg",  # Output image path
    )
```
The effect is as follows:
To change the mask, you only need to change the icon_name parameter. Thousands of icons are available on the Font Awesome website: https://fontawesome.com/icons?d=gallery&m=free
icon_name is set to the class name of the target icon, as in the following examples.
When icon_name='fas fa-dog':
When icon_name='fab fa-amazon':
To change the word cloud colors, modify the palette parameter. The available palettes are listed on the Palettable website, https://jiffyclub.github.io/palettable/ , which offers a wide range of palette styles.
Each Palettable module contains several submodules, and the palettes to choose from live inside those submodules.
When setting the parameter, give the path of the palette without the leading palettable prefix. For example, to use palettable.colorbrewer.qualitative.Dark2_3 as the palette, set palette='colorbrewer.qualitative.Dark2_3'.
Different palettes produce noticeably different styles:
palette='colorbrewer.qualitative.Paired_10'
palette='lightbartlein.diverging.BlueDarkOrange12_11'
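To make the naming rule concrete, the sketch below splits a palette string into the Palettable module path and the palette name. The helper split_palette_name is hypothetical and written only for illustration; it is not part of StyleCloud or Palettable.

```python
def split_palette_name(palette):
    """Split 'a.b.Name_N' into ('palettable.a.b', 'Name_N').

    Hypothetical helper for illustration; not part of StyleCloud.
    """
    module_path, _, palette_name = palette.rpartition(".")
    return "palettable." + module_path, palette_name


for name in ("colorbrewer.qualitative.Dark2_7",
             "lightbartlein.diverging.BlueDarkOrange12_11"):
    module_path, palette_name = split_palette_name(name)
    print(module_path, "->", palette_name)
```

With Palettable installed, importlib.import_module(module_path) followed by getattr(module, palette_name) should return the palette object, whose hex_colors attribute lists the colors. That makes it easy to check that a palette string is valid before drawing.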
For the other StyleCloud parameters, please refer to the official documentation: https://github.com/minimaxir/stylecloud
Pyecharts
Pyecharts is built on Apache ECharts and is mainly used for general data visualization; the word cloud is only one of its many chart types. Compared with the two word cloud packages above, its word clouds are visually plainer.
However, Pyecharts renders the word cloud to a standalone HTML file, so the result is interactive to some degree.
Code part
```python
import random
import re
from collections import Counter

import pyecharts.options as opts
from pyecharts.charts import WordCloud

word_list = []
with open("danmu.txt", encoding='utf-8') as f:
    words = f.read()
    for word in words.split('\n'):
        # Keep only lines that contain Chinese characters
        if re.findall('[\u4e00-\u9fa5]+', str(word), re.S):
            word_list.append(word)


def Pyecharts_wordArt(word_list):
    counter = Counter(word_list)  # Count word frequencies
    start = random.randint(0, 15)  # Random integer from 0 to 15 (inclusive)
    # Skip the `start` most frequent entries; keep a list of (word, count) tuples
    result_dict = list(counter.most_common()[start:])
    print(result_dict[5:])
    Charts = WordCloud().add(series_name="Pyecharts", data_pair=result_dict,
                             word_size_range=[6, 66]).set_global_opts(
        title_opts=opts.TitleOpts(
            title="Pyecharts",
            title_textstyle_opts=opts.TextStyleOpts(font_size=23)),
        tooltip_opts=opts.TooltipOpts(is_show=True),
    )
    Charts.render("Pyecharts_Wordclound.html")


Pyecharts_wordArt(word_list)
```
Note that the data passed to pyecharts must be a list in which each element holds a word together with its frequency, such as a (word, count) tuple.
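A small sketch of that input shape, using made-up words rather than the article's data: Counter.most_common() already produces a list of (word, count) tuples, which is why the code above can pass its result straight to data_pair.

```python
from collections import Counter

# Made-up sample data, for illustration only.
words = ["哈哈", "好看", "哈哈", "加油", "哈哈", "好看"]

# most_common() returns a list of (word, count) tuples,
# sorted from most to least frequent.
data_pair = Counter(words).most_common()
print(data_pair)  # [('哈哈', 3), ('好看', 2), ('加油', 1)]
```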
Summary
Beyond these three libraries, there is also an online word cloud website, WordArt.com. Its results look better than those of the three libraries above, and adjusting the style is convenient, simple and intuitive. If you only need a small number of word cloud images, it is worth using this website.
To compare these tools, I have ranked them from the following perspectives:
Visualization effect
WordArt > StyleCloud > WordCloud > Pyecharts
Interaction effect
WordArt > Pyecharts > StyleCloud = WordCloud
Automation efficiency
Pyecharts = StyleCloud = WordCloud > WordArt
Ease of use
WordArt > StyleCloud > Pyecharts > WordCloud
Which tool to use in the end depends on your own situation and use case, but it helps to have a basic understanding of each one beforehand.
Well, that's all the content of this article. Finally, thank you for reading. See you next time!