Drawing word clouds: three recommended Python packages plus an online website

A word cloud is a common form of text visualization that highlights the key words and phrases in a large body of text.

This article first introduces three Python libraries for making word clouds, namely WordCloud, StyleCloud and Pyecharts. It then adds an online word cloud website. Finally, it makes a simple comparison between them based on the code involved and the visual result.

WordCloud, StyleCloud and Pyecharts share one trait: a few lines of code are enough to draw an attractive word cloud, but there are many parameters to set.

WordCloud

WordCloud is the most frequently used Python library for making word cloud images. It is simple to use, and the shape of the word cloud can be customized with a mask. StyleCloud, introduced later, is built on top of it.

WordCloud wraps all of its functionality in the WordCloud class. To adjust the style of the word cloud, you only need to change a few parameters.

Take a simple round word cloud as an example.

First build a word-frequency dictionary with collections.Counter, then pass it to the generate_from_frequencies() method of WordCloud to build the cloud.
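As a small illustration of the frequency step, the sketch below counts a made-up list of words with Counter and then drops the most frequent entries, which is the same slicing the article's code applies with a random start. The sample words and the fixed start value are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical sample data; the article reads its words from danmu.txt.
words = ["cat", "dog", "cat", "bird", "cat", "dog", "fish"]

counter = Counter(words)

# most_common() returns (word, count) pairs sorted by descending count.
ranked = counter.most_common()

# Slicing with [start:] skips the `start` most frequent words and keeps the rest.
start = 1  # fixed here; the article picks a random integer between 0 and 15
frequencies = dict(ranked[start:])

print(frequencies)
```

The resulting dictionary maps each remaining word to its count, which is the shape generate_from_frequencies() expects.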

As for the shape of the word cloud, the following code uses numpy to generate a circular binary array and passes it as the mask parameter:

import random
import re
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np
from wordcloud import WordCloud


word_list = []
with open("danmu.txt", encoding='utf-8') as f:
    words = f.read()
    for word in words.split('\n'):
        if re.findall('[\u4e00-\u9fa5]+', str(word), re.S):  # Keep only lines that contain Chinese characters
            word_list.append(word)


def SquareWord(word_list):
    counter = Counter(word_list)  # Count word frequencies
    start = random.randint(0, 15)  # Random integer between 0 and 15 (inclusive)
    result_dict = dict(counter.most_common()[start:])  # Skip the `start` most frequent words and keep the rest

    x, y = np.ogrid[:300, :300]  # Open grid of row/column indices for a 300x300 array
    mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2  # True outside a circle of radius 130 centred at (150, 150)
    mask = 255 * mask.astype(int)  # Outside the circle becomes 255 (ignored); inside stays 0 (drawable)

    wc = WordCloud(background_color='black',
                   mask=mask,
                   mode='RGB',
                   font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # Font path; needed to render Chinese text
                   ).generate_from_frequencies(result_dict)

    plt.axis("off")
    plt.imshow(wc, interpolation="bilinear")
    plt.show()


SquareWord(word_list)  # Draw the circular word cloud

The effect is as follows:

The feature that most distinguishes WordCloud from the other two Python libraries is that you can customize the mask. Pass a numpy array through the mask parameter to set the shape of the word cloud.

Note, however, that text is drawn only in areas whose value is not 255; areas with value == 255 (pure white) are ignored. If the image you want to use as a mask does not meet this condition, it needs to be preprocessed so that the background is filled with pure white pixels.
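One possible way to do this preprocessing is sketched below with NumPy only. It assumes the source image has an alpha channel and that fully transparent pixels are the background; the helper name to_wordcloud_mask and the tiny 4x4 test image are made up for illustration and are not part of the WordCloud API.

```python
import numpy as np


def to_wordcloud_mask(rgba):
    """Turn an RGBA image array into a 2-D mask.

    Fully transparent pixels become 255 (ignored when drawing);
    every other pixel becomes 0 (available for text).
    """
    rgba = np.asarray(rgba)
    alpha = rgba[..., 3]  # alpha channel, shape (height, width)
    return np.where(alpha == 0, 255, 0).astype(np.uint8)


# Hypothetical 4x4 image: transparent everywhere except an opaque 2x2 centre.
image = np.zeros((4, 4, 4), dtype=np.uint8)
image[1:3, 1:3] = [0, 0, 0, 255]

mask = to_wordcloud_mask(image)
print(mask)
```

For a real picture, the input array could be obtained with something like np.array(Image.open(path).convert("RGBA")) from Pillow, and the returned array passed as the mask parameter.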

Custom mask word cloud drawing

from PIL import Image


def AliceWord(word_list):
    counter = Counter(word_list)  # Count word frequencies
    start = random.randint(0, 15)  # Random integer between 0 and 15 (inclusive)
    result_dict = dict(counter.most_common()[start:])  # Skip the `start` most frequent words and keep the rest

    # x, y = np.ogrid[:300, :300]  # Open grid of indices for a 300x300 array
    # mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2  # Circular mask, radius 130, centred at (150, 150)
    # mask = 255 * mask.astype(int)  # Convert to int

    # Read a picture and use it as the mask
    alic_coloring = np.array(Image.open("D:/Data/WordArt/Alice_mask.png"))
    wc = WordCloud(background_color="white",  # Background color
                   mode="RGB",
                   mask=alic_coloring,  # If None, the default canvas (width 400, height 200) is used
                   min_font_size=4,  # Smallest font size used for a word
                   relative_scaling=0.8,  # How strongly word frequency influences font size
                   font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # Font path; needed to render Chinese text
                   ).generate_from_frequencies(result_dict)

    wc.to_file("D:/Data/WordArt/wordclound.jpg")  # Save the generated word cloud
    plt.axis("off")
    plt.imshow(wc, interpolation="bilinear")
    plt.show()

Visualization effect

Finally, here are the main parameter settings in WordCloud:

  • background_color (type -> str), a color name or color code that sets the background color of the word cloud image;
  • font_path (type -> str), a user-defined font path. If Chinese text is rendered, this parameter must be set, otherwise the characters come out garbled;
  • mask (type -> ndarray), customizes the shape of the word cloud; pure white areas are ignored when drawing;
  • mode (type -> str). When it is set to 'RGBA' and background_color is None, the background is transparent; the default is 'RGB';
  • relative_scaling (type -> float), how strongly word frequency influences the final display size, value 0 - 1. The larger the value, the stronger the influence. The default setting ('auto') corresponds to 0.5;
  • prefer_horizontal (type -> float), the ratio of attempts to fit a word horizontally rather than vertically. The smaller the value, the more vertical text appears in the word cloud;

In addition to the above parameters, you can also set the text colors, the stop words, whether words may be repeated, and other options.

For details, please refer to the official documentation:
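As a rough sketch of two of these options, the helpers below filter stop words out of a frequency dictionary and define a color callback. Filtering by hand matters here because the WordCloud documentation states that the stopwords parameter is ignored when generate_from_frequencies is used. The helper names and sample words are hypothetical; the callback's argument list follows the color_func signature described in the WordCloud documentation.

```python
import random
from collections import Counter


def drop_stopwords(frequencies, stopwords):
    """Return a copy of the frequency mapping without the stop words."""
    return {word: n for word, n in frequencies.items() if word not in stopwords}


def grey_color_func(word, font_size, position, orientation,
                    random_state=None, **kwargs):
    """Color callback: return a random shade of grey as an HSL string."""
    rng = random_state if random_state is not None else random
    lightness = rng.randint(60, 100)
    return f"hsl(0, 0%, {lightness}%)"


# Hypothetical sample data.
counts = Counter(["the", "cloud", "the", "word", "cloud", "the"])
filtered = drop_stopwords(counts, stopwords={"the"})

print(filtered)
print(grey_color_func("cloud", 20, (0, 0), None))
```

These could then be combined as WordCloud(color_func=grey_color_func, ...).generate_from_frequencies(filtered).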

https://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html#wordcloud.WordCloud

StyleCloud

StyleCloud is built on top of WordCloud and adds several new features:

  • 1. It supports color gradients;
  • 2. The colors of the word cloud can be set through ready-made palettes;
  • 3. It supports icons as the mask. This is the best of the new features: icons are selected by their Font Awesome class names, and there is a wide variety to choose from;
  • 4. In addition to plain text as input, it also supports csv and txt files;
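For the csv input, a minimal sketch of preparing such a file with the standard library is shown below. It assumes the two-column layout of word plus weight that the StyleCloud README describes, and it assumes the first row is treated as a header; the helper name write_weights_csv, the column titles and the sample words are made up for illustration.

```python
import csv
import os
import tempfile
from collections import Counter


def write_weights_csv(frequencies, path):
    """Write a header row followed by one 'word,weight' row per entry."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "weight"])  # header row (assumed to be skipped on read)
        for word, weight in frequencies.items():
            writer.writerow([word, weight])


# Hypothetical sample data written to a temporary directory.
counts = Counter(["cat", "dog", "cat"])
csv_path = os.path.join(tempfile.mkdtemp(), "weights.csv")
write_weights_csv(counts, csv_path)

# Read the file back to check what was written.
with open(csv_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
```

The resulting path would then be passed as file_path to stylecloud.gen_stylecloud in place of a txt file.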

The main program only needs a single function call:

import stylecloud


def Style_WordArt():
    # Draw a word cloud with StyleCloud
    stylecloud.gen_stylecloud(
        file_path="danmu.txt",  # Text file used as input
        background_color='white',  # Background color
        palette="colorbrewer.qualitative.Dark2_7",  # Color palette for the text
        icon_name='fas fa-cat',  # Font Awesome icon used as the mask
        font_path="D:/Data/fonts/HGXK_CNKI.ttf",  # Chinese font path
        random_state=40,  # Seed controlling the random placement and colors of words
        invert_mask=False,  # Whether to invert the icon mask
        output_name="D:/Data/WordArt/styleclound.jpg",  # Output file path
    )

The effect is as follows:

To modify the mask, you only need to change the icon_name parameter. You can refer to the Font Awesome website, https://fontawesome.com/icons?d=gallery&m=free , where thousands of icons are available.

icon_name is set to the class name of the target icon, as shown below.

When icon_name = 'fas fa-dog':

When icon_name = 'fab fa-amazon':

To change the colors of the word cloud, modify the palette parameter. For the available palettes, please refer to the Palettable website: https://jiffyclub.github.io/palettable/ , which offers a variety of palette templates to choose from.

Each module listed there contains several submodules, and the palettes inside those submodules are what you finally set.

When setting it, you can pick any palette from any module and leave out the leading palettable prefix. For example, to use palettable.colorbrewer.qualitative.Dark2_3 as the color palette, you only need to write palette = 'colorbrewer.qualitative.Dark2_3'.
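The relationship between the full dotted path and the shorter string can be sketched with two small string helpers. Both helper names are hypothetical and are not part of StyleCloud or Palettable; they only show how the name is put together.

```python
def to_palette_argument(full_name):
    """Strip a leading 'palettable.' so the name fits the palette parameter."""
    prefix = "palettable."
    if full_name.startswith(prefix):
        return full_name[len(prefix):]
    return full_name


def split_palette_name(palette):
    """Split 'colorbrewer.qualitative.Dark2_3' into (module path, palette name)."""
    module_path, _, name = palette.rpartition(".")
    return module_path, name


palette = to_palette_argument("palettable.colorbrewer.qualitative.Dark2_3")
print(palette)
print(split_palette_name(palette))
```

The trailing number in a palette name such as Dark2_3 is the number of colors in that palette.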

Setting different color palettes will eventually have different style effects!

palette = 'colorbrewer.qualitative.Paired_10'

palette = 'lightbartlein.diverging.BlueDarkOrange12_11'

For the other parameters of StyleCloud, please refer to the official documentation: https://github.com/minimaxir/stylecloud

Pyecharts

Pyecharts is built on Apache ECharts and is mainly used for data visualization. The word cloud is only one of its many chart types, and compared with the two word cloud packages above, its visual result is weaker.

However, Pyecharts saves the word cloud as a single html file, so the final result has a certain degree of interactivity.

Code part

import random
import re
from collections import Counter

import pyecharts.options as opts
from pyecharts.charts import WordCloud


word_list = []
with open("danmu.txt", encoding='utf-8') as f:
    words = f.read()
    for word in words.split('\n'):
        if re.findall('[\u4e00-\u9fa5]+', str(word), re.S):  # Keep only lines that contain Chinese characters
            word_list.append(word)


def Pyecharts_wordArt(word_list):
    counter = Counter(word_list)  # Count word frequencies
    start = random.randint(0, 15)  # Random integer between 0 and 15 (inclusive)
    result_dict = list(counter.most_common()[start:])  # List of (word, count) pairs, skipping the `start` most frequent words
    print(result_dict[5:])  # Print the pairs, leaving out the first five

    Charts = WordCloud().add(series_name="Pyecharts", data_pair=result_dict, word_size_range=[6, 66]).set_global_opts(
        title_opts=opts.TitleOpts(
            title="Pyecharts", title_textstyle_opts=opts.TextStyleOpts(font_size=23)),
        tooltip_opts=opts.TooltipOpts(is_show=True),
    )
    Charts.render("Pyecharts_Wordclound.html")  # Write the chart to an html file


Pyecharts_wordArt(word_list)

Note that the data passed to Pyecharts must be a list in which every item pairs a word with its frequency, for example [("word", 12), ("cloud", 7)].
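A minimal sketch of how such a list can be produced from raw words is shown below, using only the standard library. The sample words are made up for illustration.

```python
from collections import Counter

# Hypothetical sample data; the article reads its words from danmu.txt.
words = ["cat", "dog", "cat", "bird", "cat", "dog"]

# most_common() already returns a list of (word, count) tuples,
# sorted from the most to the least frequent word.
data_pair = Counter(words).most_common()

print(data_pair)
```

This list has the same shape as the value passed to the data_pair argument in the code above.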

Summary

In addition to these three libraries, here is an online word cloud website, WordArt.com. Its visual results are better than those of the three libraries above, and adjusting the style is convenient, simple and intuitive. If you do not need to produce many word cloud images, I suggest drawing them with this website.

To compare these tools, I have ranked them from the following perspectives.

Visualization effect

WordArt > StyleCloud > WordCloud > Pyecharts

Interaction effect

WordArt > Pyecharts > StyleCloud = WordCloud

Automation efficiency

Pyecharts = StyleCloud = WordCloud > WordArt

Ease of use

WordArt > StyleCloud > Pyecharts > WordCloud

Which tool you finally choose for drawing word clouds depends on your own situation and use case, but it helps to have a basic understanding of each tool beforehand.

Well, that's all the content of this article. Finally, thank you for reading. See you next time!

Posted by stephanie on Tue, 19 Apr 2022 07:52:18 +0930