1, Overview of data analysis
1. What is data
Is sound data?
Audio devices can collect sound and convert it into audio digital signals, which can support many interesting applications, such as iphone of siri,Sogou's voice input and wechat's voice translation show that in this era where sound can be recorded, sound is not only a kind of data, but also a kind of data with strong characteristics of the times
Is the image data?
With the maturity of digital imaging technology, all images can be recorded with high resolution. On this basis, people can do further analysis and modeling to support many interesting applications, such as face recognition, fingerprint recognition and license plate number recognition. In this era when images can be recorded, images are a kind of data, And it is a kind of data with strong characteristics of the times First, all that can be recorded is data; Second, due to the progress of recording technology, the definition of data has strong characteristics of the times
data(data)It is the result of facts or observations, the logical induction of objective things, and the raw raw material used to represent objective things.
2. Understanding data
(1) . type of data
Data types appear to divide data into data with different memory sizes, In the process of data operation, we find that numerical value, currency, date and time can carry out arithmetic operations such as addition, subtraction, multiplication and division, so they are collectively referred to as numerical type; Text can only be simply "counted" and cannot be arithmetically operated. It is still called category type.
(2) Presentation form of data
A two-dimensional table composed of the values of different individuals on multiple flags
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-s16YCsTR-1629336461623)(img/data1/1.png)]
Understanding data 1,Field. A field is a characteristic of a thing or phenomenon, such as employee satisfaction, appraisal score, Department, etc. It's called feature 2,record. Records are the specific manifestations of certain characteristics of things or phenomena, such as human resources, sales, market, etc 3,data sheet. The relational structure formed by organizing fields and records is the data table. It can be seen from the concept that records, fields and data tables are an organic whole, and only when they are interdependent can they exist.
2. What is data analysis
Most professional data analysis: The science and art of collecting, processing and sorting data, and using statistics and mining technology to analyze and interpret data! The most concise data analysis: The simple thing is to analyze the data. Describe data characteristics, predict data trends, and display analysis results, Extracting the information you want from a lot of data is data analysis.
Definition of data analysis and data mining
1,Data analysis is a process of detailed research and summary of data in order to extract useful information and form conclusions. It is the product of the combination of mathematics and computer science 2,Data mining refers to the process of searching hidden information from a large amount of data through algorithms 3,The ultimate purpose of data analysis and mining is to analyze a large number of data, refine the information hidden behind the data, and make the data produce new value
3. Steps of data analysis
The process of data analysis mainly includes six relatively independent and interrelated stages, which are introduced below. 1,Propose analysis objectives 2,collecting data 3,data processing 4,Data analysis and modeling 5,Data visualization 6,Put forward suggestions and promote landing
(1) Propose analysis objectives
Only when the purpose is clear, the data analysis will not deviate from the direction, otherwise the results will not only have guiding significance. 1,Status statistics: use statistical knowledge to make data statistics, such as how many departments there are, turnover rate, etc 2,Problem orientation: for example, what are the factors affecting employee turnover, Fluctuation explanation type: when the sales volume suddenly drops one day and the retention of new users suddenly decreases one day, analysts are often required to explain the reasons for the fluctuation Data double disk type: similar to monthly and quarterly reports, it is common in the Internet field app After a certain function has been online for some time, data analysts often need to review the performance of this function to see if there are any problems 3,Predict the future: for example, whether employees will leave Forecast sales, house prices, whether customers will lose, whether customers will default, etc
(2) . data acquisition
(1) Internal data: internal business data, customer data, etc (2) Network data: statistical information of relevant national departments, various information query platforms and various websites (3) Market Research: use scientific methods to systematically collect, record and sort out marketing information and data, analyze the market situation, understand the current market situation and development trend, and provide objective and correct data for market prediction and marketing strategy (4) Internet worm/Third party statistical tools
(3) . data processing
Data consolidation, data cleaning, data conversion and data standardization,data reduction After data processing, the overall data becomes clean and tidy, which can be directly used for the next analysis
(4) , data analysis and modeling
1,Simple analysis method: data analysis thinking mode+Business indicator construction+Descriptive statistics +Exploratory data analysis 2,Deep business logic modeling and analysis: data mining, use algorithms to build models, analyze and complete complex data analysis, focus on mining data value and looking for patterns and laws.
(5) . data visualization
After drawing a conclusion through data analysis, it also needs to be displayed in charts. As the saying goes, "text is not as good as table, and table is not as good as chart",Use charts to show your conclusions more clearly. Available powerbi/tableau/python Data visualization
(6) Put forward suggestions to promote landing
After drawing conclusions based on the analysis objectives, corresponding improvement suggestions shall be put forward according to the conclusions, and the implementation of the suggestions shall be promoted, so as to complete a complete closed-loop data analysis.
2, Data analysis value
1. Definition and composition
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-wL5Ik3DA-1629336461626)(img/6.png)]
2. Business value case
There is a soybean milk shop downstairs. You patronize it every day. The owner of the soybean milk shop is a girl, beautiful and in good shape, so... You decide to chase her when you are single.
However, how can we get her favor? As a data analyst, you want to conduct data analysis through your professional skills, so as to increase her income.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-U55wJNgR-1629336461627)(img/1.png)]
So you worked hard to collect the sales data of soybean milk shop for nearly a month. The data contains several dimensions, such as category of soybean milk, sales quantity of single product, price of single product, sales date (day of week), weather at that time, etc.
The following is an excerpt of the sales data analysis chart you did for the girl:
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-i1oslrc3-1629336461628)(img/2.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-7QE91w4I-1629336461628)(img/3.png)]
After research and comparison, you find several laws. What is the law?
- On Monday, the sales volume of soybean milk was the best, which was higher than other days. - Sunny days have the highest sales. - Original soybean milk sells the fastest and is in short supply. Black bean milk can't be sold out every time. Half of it needs to be left. Based on the above analysis results, you put forward several suggestions to the girl: First, at the weekend, focus on the weather on Monday. If it is sunny, the stock volume will be increased by 30%%. Second, reduce by 50%The purchase volume of black soybean milk was increased, and the retail price of original soybean milk was increased by 50 cents.
The girl tried to do it and found that she really increased her income and profit income by increasing her income under the condition of constant expenditure. She was very surprised and agreed to be your girlfriend.
Time flies. Two years later, you two are stable and ready to get married
During these two years, you will continue to collect the sales data of soybean milk shops, constantly adjust the sampling fields according to the needs, and analyze the monthly sales report at the end of each month.
Recently, however, you have found that sales have gradually become a bottleneck.
The reason is that there are breakfast shops in the surrounding streets selling homogeneous products. The price of soybean milk is transparent and the price increase space is very low. Therefore, we can only increase profits by reducing costs.
So you began to sort out and think about the data in the past two years.
Through the research and analysis of industry indicators, you find that: First, my sister buys goods from retailers every week. Her purchase quantity is too small and the purchase frequency is too high. At the same time, due to the unstable market price of raw materials, her raw material cost is very high. Second, if you purchase from raw material manufacturers, as long as you increase the purchase volume, you can greatly reduce the unit price of raw materials. At the same time, because the purchase frequency is reduced, you can hedge the risk caused by price fluctuations.
The sales of soybean milk shops are becoming more and more stable; The demand for raw materials is also relatively fixed; Long storage period of raw materials. Therefore, you can reduce the cost by storing raw materials.
What we need to do now is to study the raw material market, analyze the year-on-year and month on month, and choose to hoard the corresponding food raw materials when the market is the lowest in the same period.
Year on year: compared with the previous year, August this year and August last year
Month on month comparison: comparison between August this year and July this year
After further study of the industry, it is found that the price of beans is the highest in March every year. The price of beans is the lowest in August and September.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Euji8MNx-1629336461629)(img/4.png)]
So you suggest that your fiancee: start hoarding beans in August for the whole year.
The fiancee listened to you and did it.
Sure enough, after you changed the purchase channels, you obtained food raw materials of the same quality at almost one-third of the original price, and the profit margin was easily increased by 150%.
After more than ten years, you have opened many branches by storing raw materials every year, reducing costs, improving profit margins. Lao Wang next door was very jealous and learned from you that you opened similar stores and did the same thing.
With the in-depth understanding of the raw material market and the increasingly clear relationship between supply and demand, your wife began to do futures trading of raw materials such as beans. And you, still unremitting data collection, data analysis.
But recently, you suddenly found a new risk point through the comparison and analysis of many years of data, modeling and detection.
What are the risk points?
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-51JqNumk-1629336461631)(img/5.png)]
You find that about every four years is a cycle. Due to climate and rain, the market of beans hangs upside down. Every four years, the originally expensive beans are cheaper in March, and the originally cheap beans become more expensive in August. This year is the cycle year of upside down risk.
You turn around and look at your wife. She is sitting on the sofa with her mobile phone and is preparing to hoard a lot of raw materials.
Before she placed the order, you urgently stopped all this.
Fortunately, everything is in time
Sure enough, the price of beans in August this year was surprisingly expensive. Lao Wang next door mispredicted the market trend and suffered serious losses because he was not aware of the cyclical risk. We had to maintain our business by raising the price of retail items, and the business was bleak.
By accurately predicting the data, you have avoided the expensive price fluctuation period and successfully avoided the risk.
Then, in December, when the price was relatively low, the raw materials were supplemented. Business has not been greatly affected.
The story is over. Through this small case, we can feel how data analysis affects business behavior. Stories don't matter. It's important to have the thinking of data analysis.
3, Data analysis thinking
1. Core thinking mode
Case analysis: when selling air conditioners, there are many offline sales stores across the country. We found that the sales volume in a certain month decreased by 20% compared with last year%. Now we need to analyze the cause
Time level: analyze whether it suddenly plummeted or gradually decreased.
Regional level: sales data of different regions: is it regional influence?
Market environment: do market research
Customer level: conduct customer interviews
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-o1xnDdto-1629336461632)(img/data1/7.png)]
(1) Structured analytical thinking
According to the goal, sort out and form logical and step-by-step analysis ideas
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-L0X0bTer-1629336461632)(img/data1/8.png)]
Structured thinking comes from McKinsey. Pyramid thinking. Each argument revolves around the goal of the previous problem, disassembles layers by layers, and is independent of each other, which will eventually form a pyramid structure. Think about the problem from a macro perspective, rather than looking for reasons only on a few possible arguments.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-mZlMj6iX-1629336461633)(img/data1/9.png)]
[external link picture transfer failed. The source station may have anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-avw5vb0a-1629336461634)(img/data1/10.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-OivKNR0T-1629336461635)(img/data1/11.png)]
Structure is the thinking of analysis, but it is not enough data, and it inevitably has the disadvantage of divergence.
(2) , formulaic analytical thinking
Structuring can help us clarify various arguments of data analysis, but each argument is not necessarily reliable and effective. For example, we can't get these data about the sales volume of competitors for external reasons in the above case. Certain data quantification is required, and many calculation formulas will be used, such as profit=Sales revenue-Cost, profit can be quantified by sales revenue and cost. Structural method+Formula method can assist and quantify business
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-qVS8gDKG-1629336461636)(img/data1/12.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-A8cbv13V-1629336461636)(img/data1/13.png)]
Structured thinking+Formulaic thinking can solve most data analysis problems, but it can not guarantee that the final data analysis conclusion is 100%Correctly, this can only be said to be the objective result of macro analysis, and may not be applicable to specific business scenarios. If you look closely at the various arguments analyzed, sometimes some reasons will be missed due to lack of understanding of the business. At this time, you need a third way of thinking to supplement the missing points. Sometimes it is analyzed for analysis without in-depth understanding of the business, and sometimes the results are not ideal. Good data analysis thinking itself also has business thinking.
(3) Business oriented analytical thinking
Business thinking prevents us from analyzing for analysis and using structured thinking+Most of the time, the final analysis argument obtained through formula disassembly is a phenomenon, and the data is the embodiment of a certain result, but it does not represent the reason
Case 1: how to improve the income by selling corn? How much can we increase the price to get the maximum benefit? profit = Unit Price*Sales, then our strategy is to increase the unit premium or increase sales. 1,Ways to increase unit Premium: Brand building gains a long-term premium, but the defect is that it requires a lot of early marketing investment; Processed commodities occupy more links in the value chain, such as cooked corn, corn juice and corn gluten powder (); Repositioning commodities, such as gift, etc; Price discrimination: different pricing is adopted for different users according to price sensitivity. 2,sales volume=flow x Conversion rate, the above method of increasing unit premium may have an impact on the flow and may also have an impact on the conversion rate. profit = Unit Price x flow x For the conversion rate, what can be adopted on a large scale in the short term should be price discrimination, such as different corn prices at different times and in different business districts, adopting high pricing, and then providing coupons to price sensitive users.
Case 2:How to analyze the loss of users? What is the difference between the loss of new users and the loss of old users? 1)User churn analysis: The problem of user churn, so when you subdivide users here, you can subdivide which stage of the user's life cycle. (user life cycle: introduction, growth, maturity, dormancy, loss.) Indicator: number of lost users = Number of users in this group*Loss rate. Disassemble to see if it is because there are more users at this stage (for example, most users are in recession), or whether the turnover rate of this user group is relatively high Internal and external analysis: a. Internal: it is difficult for novices to get started, unreasonable charges, major problems in products and services, low activity quality, lack of retention means, low user participation b. External: market, competitors, social environment, holidays, etc 2)What is the difference between the loss of new users and the loss of old users: Loss of new users: the reasons may include non target users (rigid loss), products that do not meet the demand (natural loss), products that are difficult to get started (frustrated loss) and competitive products (market loss). New users should consider how to identify lost users with less data support, prevent user loss in advance, and how to recover effective new users. Loss of old users: the reasons may be the decline of user life cycle (natural loss), the difficulty of social evaporation to meet the needs of early users (frustrated loss) and the impact of competitive products (market loss). Old users have more data, which makes it easier to identify lost users. It is more important to prevent user loss. After the loss of users, it is necessary to consider the residual value of the user's life cycle and whether it needs to be recovered.
1, Introduction and installation of excel
1. Definition of Excel
Microsoft Excel is Microsoft For use Windows And [Apple Macintosh]( https://baike.baidu.com/item/Apple A program written by a computer with a Macintosh operating system Spreadsheet Software. Intuitive interface, excellent calculation function and chart tools, coupled with successful marketing, make Excel the most popular personal computer data processing software. After the release of version 5.0 as a component of Microsoft Office in 1993, Excel began to dominate the spreadsheet software on the applicable operating platform.
2. excel installation
link:https://pan. baidu. COM / S / 1awrbzetyikxegdruzozeq extraction code: after mymr copies this content, open Baidu online disk mobile App, which is more convenient to operate
2, Data acquisition
data classification
1,First hand data( Primary data): Also known as raw data. It refers to those directly obtained through personnel interview, inquiry, questionnaire, measurement and other methods, with better timeliness and relevance. 2,Second hand data( Secondary data): Using literature, annual report and database and other previous statistical data. The advantages are low acquisition cost and ready availability. Generally, it can be saved for a long time, which is convenient to generate data trend chart. 3,Second hand data, also known as secondary data, refers to the processed data obtained from peers or some media, such as various data regularly released by the National Bureau of statistics and various data obtained from newspapers and television.
3, Data cleaning
Data cleaning is to process and correct the data with wrong format, correct or delete the wrong data, supplement the missing data, and delete the duplicate and redundant data. Open student case-data processing.xlsx
1. Data consistency processing
For the data collected through statistical investigation, the data format of the same field is often inconsistent, as shown in the figure. This will directly affect the subsequent data analysis, so the data format must be consistent. The data unit of height column is inconsistent The age column is missing
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-8Ms9QYKO-1629336467314)(img/data/1.png)]
(1)Move the mouse pointer to the letter C When the pointer changes to a down arrow, click Select C column (2)Select find and select|"Replace command (3)In the find and Replace dialog box, in find what, enter“ cm",Set "replace with" to blank, and click "replace all" to complete the replacement
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-5FuNwCjm-1629336467315)(img/data/2.png)]
2. Processing of missing data
In the data list, if a cell has a null value, it is considered that the data is missing. There are three methods to deal with missing data: 1,Replace the missing value with the sample mean (or mode, median); 2,Delete records with missing values; 3,Keep the record and delete it temporarily when it is used for analysis (the most common method).
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-f1bo6kW6-1629336467316)(img/data/3.png)]
First of all, to solve how to find missing data, it is obviously unrealistic to search for missing data only by eyes. Generally, we use "positioning conditions" to find cells with missing data. The following demonstration replaces the null values in the age field with "19". (1)Select the age E Columns. (2)Select find and select|"Positioning condition command (3)In the positioning condition dialog box, select the null radio option (4)After clicking OK, E All blank cells in the column are selected (5)Enter the override value "19" and press Ctrl+Enter Key combination confirmation
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-FuM2zfPb-1629336467317)(img/data/4.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-YryYGn7e-1629336467318)(img/data/5.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-NIl40tSN-1629336467319)(img/data/6.png)]
3. Delete duplicate records
The operation of deleting duplicate records is extremely simple. Just click anywhere in the data table and then click data|"Delete duplicates button
3, Data processing
The means of data processing mainly include data transposition, field sorting, field matching, data extraction and data calculation
1. Data transpose
The operation method is to copy the horizontal data first, and then click start when pasting|"Click the triangle arrow under the paste button in the clipboard group, and then click the transpose button
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-HLBHaq9a-1629336467320)(img/data/7.png)]
2. Field breakdown
(1)Select the of the field breakdown worksheet A Column data (2)Click data|"Column button (3)To separate the first word in the field name, select the fixed column width radio option (4)Click the next button and click the mouse on the scale to determine the position of the column (5)Click next to determine the starting cell of the target range F1 (6)Click the finish button
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-d8mv8uGU-1629336467320)(img/data/8.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG kpzcgo7j-1629336467321) (IMG / data / 9. PNG)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-v92oPXGe-1629336467322)(img/data/10.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-l1sCigil-1629336467323)(img/data/11.png)]
3. Field matching
Field matching is to match the fields not in the original data list but in other data lists. Example 1, file "data processing".xlsx"The "whole school list" worksheet in is the basic information of all students of grade 2015 in a school, and the "CET-4 list" worksheet is the list of students of grade 2015 who have applied for CET-4 (1)Match the "Bai Sheng Cheng" ID number to the cell. D2. (2)double-click D2 The fill handle is filled to match all the ID number. The result is shown in the figure.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-LWsgwWib-1629336467324)(img/data/12.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG pwtlvqmo-1629336467324) (IMG / data / 13. PNG)]
1.If exact lookup is used, the fourth item must be written with 0, and the default value cannot be used 2.If fuzzy search is used, the recommended data must be sorted from small to large according to the first column to be searched. At this time, all items that exceed a certain item but fail to reach the next item will be matched to this item
The data on the left is the matching criteria, and the student scores on the right are used vlookup What grade are students in function fuzzy matching
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-0k7QVadV-1629336467325)(img/data/37.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-FWMRRAcN-1629336467326)(img/data/38.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-aYdHl6ys-1629336467327)(img/data/39.png)]
4. Data extraction
Data extraction refers to using some information of some fields in the original data list to get a new field. Turn on function extraction.xlsx Common data extraction functions are left(),right(),mid(),year(),month(),day(),weekday().
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-GrriOAPu-1629336467327)(img/data/14.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-2SJiK5iy-1629336467328)(img/data/15.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-MGTmjtJH-1629336467328)(img/data/16.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-F5H60e3v-1629336467329)(img/data/17.png)]
5. Data calculation
According to the data requirements, write functions to complete the functions
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-W0BHMGKv-1629336467330)(img/data/18.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-iesinsnk4-1629336467331) (IMG / data / 19. PNG)]
The number of years calculated in this way is usually a decimal. If you want to get an integer, you can use int Function rounding, i.e. formula“=int(D2/360)",The result is 3, as shown in the figure. because int The function returns an integer no larger than the parameter in parentheses.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Pw6ZJAGm-1629336467332)(img/data/34.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-RTdn4Bs8-1629336467332)(img/data/35.png)]
If you want to round, use the function round. When round(number,digits)When the second parameter of the function is 0, the first parameter can be rounded, so modify the formula to“=round(D2/360,0)"As shown in the figure.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-3ItOr3Sp-1629336467333)(img/data/36.png)]
4, Classroom practice
Complete the following exercises: 1.Open the movie release table in data processing to find the release month 2.Open the data sorting table in the data processing to sort the information such as home address 3.Open the matching province and province code table in the data processing, calculate and fill in the code through the first two digits of ID card information in the matching province table, and match the province according to the given province code 4.Open data processing to complete data calculation 1
1. Exercise 1
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Kdt7EZZp-1629336467334)(img/data/23.png)]
2. Exercise 2
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-2YG1hVA0-1629336467334)(img/data/25.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-CJywrznf-1629336467335)(img/data/26.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-6uRHTyK2-1629336467336)(img/data/27.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-5GPCkYfA-1629336467337)(img/data/28.png)]
3. Exercise 3
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-C8QkOGId-1629336467338)(img/data/29.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-YTdKihzA-1629336467339)(img/data/30.png)]
4. Exercise 4
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-asC3QBcx-1629336467339)(img/data/31.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-eHBjEu0w-1629336467340)(img/data/32.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-anUvnU6Q-1629336467341)(img/data/33.png)]
1, excel visualization
Statistical chart
Statistical chart is a form of expressing statistical data by using geometric figures or specific images. Its characteristics are visual, expressive and easy to understand. Therefore, drawing statistical map is also one of the important contents of statistical data sorting. The statistical chart can show the overall scale, level, structure, comparative relationship, dependence, development trend and distribution, which is more conducive to statistical analysis and research. The following mainly introduces how to use Excel Software to draw statistical charts. Excel Commonly used statistical charts include column chart, bar chart, pie chart, line chart, scatter chart, etc.
1. Column diagram
Column chart is the most commonly used graph to show data relationships. It is used to display the comparison between various data or the data changes over a period of time. In a column chart, categories are usually organized along the horizontal axis and values are organized along the vertical axis. Column charts are generally used to display data in the following cases.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-hGqMsEUd-1629336475896)(img/data3/1.png)]
1,Select data and select column chart in insert chart 2,Double click the graph to set the color fill on the right 3,Add a data tag plus sign, where you can set and adjust other formats 4,Select a data series to color by data points
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-ny9yFgRY-1629336475899)(img/data3/22.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-8gfbbEXP-1629336475900)(img/data3/23.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-RsnnB1wY-1629336475901)(img/data3/24.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-CNCJ8yOB-1629336475901)(img/data3/25.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Uivc9KB7-1629336475902)(img/data3/26.png)]
2. Multi series column chart
Comparison of the same index value of different departments, regions and products Example 2: according to the "statistical chart" in the workbook.xlsx"A multi series column chart as shown in the data drawing diagram of the "column chart 4" worksheet in.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-K1ObFcLy-1629336475903)(img/data3/2.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-xRS7FYTm-1629336475903)(img/data3/3.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG sapwflqm-1629336475904) (IMG / data3 / 4. PNG)]
3. Bar chart
A bar chart is a bar chart that rotates 90 degrees clockwise°The effect of the obtained effect drawing is the same as that of the column drawing. Generally speaking, if the horizontal axis label or data label of column chart is too long, it will affect the readability of data. At this time, it is recommended to use bar chart instead
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-K2FdJXbu-1629336475904)(img/data3/5.png)]
4. Gantt chart
Gantt chart shows the progress of the project through lines or rectangular bars. In the Gantt chart, the horizontal axis represents the time, the vertical axis represents the project, and the start point, end point and length of the line or rectangular bar represent the start time, end time and duration of the project respectively, as shown in the figure.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-zY3gcSWJ-1629336475905)(img/data3/14.png)]
Drawing steps 1,Select three columns: project name, planned start date and completion days to generate a stacked bar chart 2,Set the fill color and border color of the series "start date" to no fill and no wireless bar 3,Sets the fill color and border color for the series days to complete 4,Set the minimum scale of abscissa to 42500 (May 10) and the adjustment type to 3/14 format 5,Set ordinate labels to display in reverse order 6,Set the spacing between bars to no spacing
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-AP4gadNJ-1629336475906)(img/data3/15.png)]
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-BvjaKicc-1629336475906)(img/data3/16.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-xg8KHMK6-1629336475907)(img/data3/17.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-yPrAJeel-1629336475908)(img/data3/18.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-QDTs4Afv-1629336475908)(img/data3/19.png)]
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-Hv6oKYWM-1629336475909)(img/data3/20.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-m1Y7BfdF-1629336475910)(img/data3/21.png)]
5. Line chart
If you want the chart to reflect the change trend of data, you usually use a broken line chart. The speed index of dynamic sequence is generally displayed in broken line graph. Dynamic series refers to the sequence of index values of the whole at different times, which is also called time series. Example 1: the data shown in the figure is the speed index of a dynamic series. Please draw a broken line chart of month on month growth rate from January to December 2015. The data file is "statistical chart".xlsx"Line chart worksheet in.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-M0S4281p-1629336475910)(img/data3/6.png)]
1,Select January 2015-2015 year-12 Chain growth rate, insert line chart with data mark 2,Set chart area color, insert data label and set font 3,Set the dimension point size and color in the data series 4,Set the axis format label position to low and the line color to red 5,Add gridlines
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-IjyxDFKx-1629336475911)(img/data3/27.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-jwhYIYNc-1629336475911)(img/data3/28.png)]
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-nyYvHKAL-1629336475913)(img/data3/30.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-xMnEJ43D-1629336475913)(img/data3/31.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-nO28DJ16-1629336475914)(img/data3/32.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-WMpdR8bv-1629336475915)(img/data3/34.png)]
6. Double coordinate diagram
When the two series of data are very different, the original appearance of the data cannot be well displayed under the same coordinate axis. At this time, the double coordinate diagram should be used. At the same time, a broken line chart of the month on month growth rate and year-on-year growth rate in 2015 is drawn.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-jor8kxUz-1629336475915)(img/data3/7.png)]
1,Select the chain growth rate and year-on-year growth rate in 2015 and insert the line chart 2,Right click to select data and set it to the correct series name 3,Add data labels, gridlines 4,Set the size of marker points in the data series 5,Set the axis format label position to low and the line color to red
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Adj9WsyQ-1629336475916)(img/data3/35.png)]
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-GxGokSa2-1629336475917)(img/data3/36.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-GYajYV4g-1629336475917)(img/data3/37.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-D1S8hjbF-1629336475918)(img/data3/38.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-zGiyiH6o-1629336475919)(img/data3/39.png)]
7. Pie chart
A whole is divided into several parts, indicating the proportion of each part, which is generally represented by pie chart. That is, the relative number of structures is generally displayed in a pie chart. 1,Simple pie chart 2,When making a composite pie chart, this situation sometimes occurs: the proportion of some values in the pie chart is relatively small, so it is difficult to see these data in the same pie chart. At this time, the readability of small percentage data can be improved by using the composite bar pie chart
[the transfer of external chain pictures fails. The source station may have an anti-theft chain mechanism. It is recommended to save the pictures and upload them directly (img-cxI2vhfD-1629336475919)(img/data3/9.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-KqykVI3v-1629336475920)(img/data3/10.png)]
8. Stock price chart
Stock price chart, as the name suggests, is used to display the fluctuation of stock price, as shown in the figure. However, such charts can also be used to show other volatility data, such as using stock price charts to show daily or annual temperature fluctuations.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-bm5Ex2h4-1629336475921)(img/data3/11.png)]
9. Radar chart
The radar chart is named because its shape is similar to that of radar, as shown in the figure.
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (IMG kksxagkd-1629336475921) (IMG / data3 / 13. PNG)]
[the external link picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-7utHLSzH-1629336475922)(img/data3/12.png)]
1, Common data analysis thinking skills
1,Quadrant method 2,Hypothetical method 3,28 method 4,Funnel method 5,Contrast method
(1) Quadrant method
The analyzed samples are divided by coordinate system, similar to grouping and classification.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-qW20nnzD-1629336482929)(img/data1/20.png)]
1,Select the importance and urgency, insert the scatter chart, select the chart, right-click to select the data, and set the title clearly 2,Remove gridlines 3,set up x,y Coordinate axis interval, add ordinate and abscissa values, and place the data on the left and below 4,Set the drawing area border and coordinate axis line 5,Add data label and set name
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-ezWDG1QD-1629336482931)(img/data1/22.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-hR1jDUdQ-1629336482932)(img/data1/39.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-yDQ4kk13-1629336482933)(img/data1/40.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG bmspfbow-1629336482934) (IMG / data1 / 41. PNG)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-jk0ViI9z-1629336482934)(img/data1/42.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-wyj8QU5N-1629336482935)(img/data1/43.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-7n6mRMsw-1629336482936)(img/data1/44.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-TUQr8tSE-1629336482936)(img/data1/45.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-ERObL3wQ-1629336482937)(img/data1/46.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-EnYyWYPM-1629336482938)(img/data1/17.png)]
According to the click through rate and conversion rate of each marketing activity, find the corresponding data annotation points, and then classify the effect of this marketing activity into each quadrant. The four quadrants represent different effect evaluation. Quadrant 1: high click and high transformation, high click represents that the marketing creativity has moved the audience, and high transformation represents that the moved audience is the target user of the product; Quadrant 2: high conversion and low click. Similarly, high click represents that the audience moved is the target user of the product, but low click represents that the marketing creativity does not move the user; Quadrant 3: low click and low conversion. This quadrant is the worst marketing activity. There are few clicks on advertising, low conversion of click users, invalid creativity and inaccurate users; Quadrant 4: high click and low conversion. The marketing activities in this quadrant need to add chicken legs to planning and copywriting, but they need to deduct performance from the channel. This quadrant of marketing activities is suspected of the title party to a certain extent.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-VCtJ0fgx-1629336482938)(img/data1/18.png)]
What's the use of quadrant analysis? 1.Find common causes of problems Through the quadrant analysis method, the attribution analysis of events with the same characteristics is carried out, and the common causes are summarized. For example, the events in the first quadrant of the above case can extract effective promotion channels and promotion strategies, and the third and fourth quadrants can exclude some invalid promotion channels; 2.Establish grouping optimization strategy The quadrant analysis method for delivery can establish optimization strategies for different quadrants, such as improving the delivery creativity of quadrant 2 and the delivery channels of quadrant 4. Corresponding optimization strategies can be formulated in different application scenarios
(2) . hypothetical method
Before explaining the hypothesis analysis method, let's do a math problem in Grade 6 of primary school: "Xiao Ming and his mother bought 10 books, which cost exactly 100 yuan. The unit price of books is 8 yuan and 13 yuan. How many books did they buy for 8 yuan and 13 yuan respectively? Problem solving ideas: First of all, assuming that the 10 books were bought for 8 yuan, the total of the 10 books is 80 yuan, and there is an extra 20 yuan. Is it a miscalculation? No, obviously the extra 20 is 13 yuan for a book. 13 yuan books are 5 yuan more than 8 yuan books. You can buy 4 books for 20 yuan. Then you can come to the conclusion that there are 4 books for 13 yuan. How many books for 8 yuan? By the way, six books are great. I reward you with a little red flower.
This sixth grade math problem uses the hypothesis method. Assuming that all books are 8 yuan, what is the hypothesis method in data analysis? Simply understand, the hypothesis method is a data analysis method that assumes a quantitative among multiple variables affecting the results and reversely deduces the process.
How to use hypothesis method in operation analysis?
There are two most common scenarios of hypothetical method in operation analysis:
1. Find the cause of known results and make process variable assumptions;
2. Result oriented planning and result data assumptions.
The real purpose of hypothesis method is to put forward hypotheses for unknown factors and verify the authenticity of hypotheses in data derivation
Scenario 1: find the reason for the known results and make process variable assumptions For example, the number of Posts posted by a content community in November decreased by 20 compared with October%,How to analyze the reason for this result? So what are the factors that affect the number of Posts? We can split the number of posts by user hierarchy, 1,For example, the number of Posts posted by old users and the number of Posts posted by new users can also be split according to the specific number of posts. For example, users who post more than 5 posts can post 3-5 Users of post 1-3 Users of this article, after splitting, compare the data of the same dimension in November and October to find out the variables. 2,For example, it is found after disassembly that post 1-3 The number of users decreased by 40 compared with October%,The number of users of other articles is still higher than that in October, so the problem lies in posting 1-3 Users of this article. 3,So post 1-3 Why has the number of users decreased? We can make two assumptions: 4,Suppose you post 1 in October-3 The users of this article have grown into more active users, resulting in posting 3-5 The number of users increased, 1-3 The number of users of articles decreased; 5,Suppose you post 1 in October-3 The churn rate of users in this article is relatively high, and there are few new users in November, resulting in a small number of users in this group. 6,For these two assumptions, you need to post 1 in October-3 Users and posts in November 3-5 Track and analyze the users with articles and more than 5 articles, and analyze the comparison of retention and activity between new users in November and new users in October.
Scenario 2: the process of finding a known target and making a result hypothesis For example, the sales in November was 8 million, and the sales in December KPI How to make a sales plan for 10 million? This is the most common demand when making work plans. It needs to reach 10 million sales in December KPI For example, split sales KPI There are also two dimensions of related influencing factors: 1.Split from the perspective of commodities To achieve 10 million sales, in order to realize this result assumption, make process plans that can support 2 million sales, such as planning in promotion channel budget, warehousing and logistics, manpower allocation, etc; You can also put forward sales growth assumptions for several products; 2.Split from the perspective of the crowd To achieve 10 million sales, on the one hand, tap the purchasing power of old customers, on the other hand, increase the source channels of new customers. Assuming that the re purchase sales of old users are 2 million, design marketing activities for old users. Summary: hypothesis analysis is one of the commonly used data analysis ideas in practical application. The process of data analysis is the process of constantly putting forward and verifying hypotheses. Usually, we encounter data analysis that we don't know how to start, which can be broken through hypothesis analysis.
(3) . 28 method
The 28 law can also be called Pareto law, which originates from the classic 28 law. For example, in terms of personal wealth, it can be said that there are 20 in the world%People in control of 80%Wealth. In data analysis, it can be understood as 20%The data produced 80%The effect needs to revolve around these 20%Data mining. It is often related to the ranking when using the 28 rule, ranking in the top 20%Is valid data. The 28 method is to focus on key analysis and is applicable to any industry. Find the key points, find their characteristics, and then think about how to make the rest 80%To this 20%Transform and improve the effect.
Pareto Diagram is represented by two coordinate axes, and the common abscissa is the factors affecting quality, which are arranged in order from large to small according to the degree of influence; The left ordinate shows the frequency of each factor with a histogram, and the right ordinate shows the cumulative percentage of each factor with a broken line, as shown in the figure.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Wqvnl6qS-1629336482939)(img/data3/8.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-hjznCDnK-1629336482940)(img/data1/23.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-sBFDYKr5-1629336482941)(img/data1/24.png)]
(4) Funnel method
Funnel method is a funnel chart, a bit like an inverted pyramid. It is a process oriented way of thinking. It is often used in the analysis of changes and certain processes such as new user development and shopping conversion rate.
1,Calculate the placeholder data, which is added by the placeholder data and the actual data, B2 Cell formula==($C$2/2)-(C2/2) 2,Select A2:C5 In the data area, on the [insert] tab, select [stacked bar chart]; 3,Double click the ordinate axis and check the reverse order category in the coordinate axis options 4,Select the legend, network line and coordinate axis respectively, and press DELETE Key delete. Select "no fill" for the bar color fill of "sequence 1" 5,Beautify the chart and set the classification spacing of data series format to "0"%" 6,Insert "flowchart" in [shape]-Manual operation ", find a blank space and draw the shape 7,Copy the shape, select the funnel diagram above, and "paste" the shape, as shown in the figure. Similarly, replace the color in turn and paste it 8,Add "data label" and "set data label", and select "category name, value, display guide line"
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-9aksy0xW-1629336482941)(img/data1/25.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG kclnftrz-1629336482942) (IMG / data1 / 26. PNG)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-k84XkRpg-1629336482942)(img/data1/27.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG wmdyogcm-1629336482943) (IMG / data1 / 28. PNG)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-oFcQC84S-1629336482944)(img/data1/29.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-WCReOak8-1629336482944)(img/data1/30.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-lej50Nuf-1629336482945)(img/data1/31.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-tWytHdAg-1629336482946)(img/data1/32.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG nbfw9311-1629336482946) (IMG / data1 / 34. PNG)]
This is a classic marketing funnel, which vividly shows a sub link in the whole process from obtaining users to finally transforming into purchase. The conversion rate of adjacent links refers to quantifying the performance of each step with data indicators. Therefore, the whole funnel model first divides the whole purchase process into steps, then measures the performance of each step with the conversion rate, and finally finds out the problematic link through the abnormal data indicators, so as to solve the problem and optimize the step, so as to finally achieve the purpose of improving the overall purchase conversion rate.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-XbguRo1O-1629336482947)(img/data1/38.png)]
(5) . comparison method
Comparative analysis is widely used in various scenarios of life and work, such as whether personal development is progressing or not, whether business ability is improved, whether the sales target of the enterprise can be completed, the gap between the enterprise and its main competitors, etc. the specific analysis criteria include the following three dimensions
1,Comparison of different time periods That is, the indicators at different times are selected as the comparison standard. Compared with the same period of the previous year, it is called year-on-year, mainly considering the impact of seasonal cycle and light and peak seasons. The comparison with the previous period is called month on month comparison. Understand whether the adjacent time period has progressed or regressed, so as to analyze the cause in time.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-nbdyS2IX-1629336482948)(img/data1/35.png)]
2,Compared with different spatial indicators. That is, different spatial index data are selected for comparison in the same time cycle. a: Compare with similar spaces, such as units, departments and regions at the same level, find out the gap or advantages between themselves and departments at the same level, and analyze their own development direction. b: Compare with the advantage space, such as excellent enterprises, benchmarking departments and industry leaders, understand the position of their own development in the industry, what are the shortcomings, and establish development goals
3.Comparison with different planning standards Compare with the planned standard, that is, compare with the planned quantity, quota quantity and target quantity. For example, compare with the annual plan objectives and activity plan objectives,Understand their own development progress and completion rate through comparison, and analyze whether the expectation and strategy of goal completion need to be adjusted.
1, excel data grouping
1. Concept of statistical grouping
Statistical grouping is a statistical method that divides the whole into several different and related components according to the needs of statistical research and certain signs, and calculates the frequency or proportion of each group.
These components are called the "group" of the whole. According to the number of signs in each group, statistical grouping can be divided into monomial grouping and group distance grouping.
1.Monomial grouping A group of variable values is called monomial grouping, which is generally suitable for discrete variables with little change. For example, if the test scores are calculated on a five point system, the scores of all students can be divided into six groups, namely 5, 4, 3, 2, 1 and 0, as shown in the figure.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-2Frj7HhS-1629336492117)(img/34.png)]
2.Group spacing grouping Taking an interval as a group is called group distance grouping, which is generally suitable for occasions with more continuous variables or discrete data. Group distance grouping can be divided into equal distance grouping and unequal distance grouping. For example, if the students' scores are calculated on the percentage system, the scores of all students can be divided into 10 groups by equidistant grouping, as shown in the table; Unequal distance grouping can also be used to divide into 5 groups, as shown in.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-WjN0yh3H-1629336492118)(img/35.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-NEnnQhfJ-1629336492119)(img/36.png)]
2. Use PivotTable to group
PivotTable reports can Excel The data in the database are grouped to establish various forms of cross data list. PivotTable report combines functions such as filtering and subtotal. You can view data in different ways according to different needs.
3. Count the total sales of each commodity
In the analysis of monthly sales record, the total sales of each commodity are counted
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-KTT9QHNE-1629336492120)(img/37.png)]
The main steps for inserting a pivot table are as follows. (1)Click any cell in the data range, and then select insert|"PivotTable command. (2)All data ranges will be automatically selected in the create PivotTable dialog box. The location of the PivotTable is "new worksheet" by default, as shown in the figure. If you don't want to change the position of the PivotTable, just click OK. (3)Drag the grouping flag to row label, column label or report filter (row label is preferred, followed by column label, try not to drag it to report filter), and drag all the flags (fields) to be counted to value, as shown in the figure.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-6uMI52Ts-1629336492121)(img/38.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-t0jJlSfD-1629336492122)(img/39.png)]
If the statistics is a quality flag, the statistics method defaults to count; If the quantity flag is used for statistics, the statistics method is sum by default. The quality mark indicates the characteristics of the overall unit attribute,Its logo is expressed in words;The quantity flag indicates the characteristics of the overall unit quantity,The sign performance is expressed in numerical value,Flag value If you want to modify the statistics method, you can click the lower triangle on the right, select the value field setting command in the pop-up list box, and then modify the statistics method in the value field setting dialog box
4. Count the total number of "traded goods" in each quarter
2015 The annual sales record counts the total "number of traded goods" in each quarter
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-rIt52cor-1629336492122)(img/40.png)]
(1)Click any cell in the data area of the 2015 sales record worksheet, and then select insert|"The PivotTable command opens the create PivotTable dialog box. (2)The data area to analyze is“'2015 Annual sales record'!$A$1:$F$363",The location of the PivotTable is "existing worksheet" (3)Drag "date" to "line label" and "number of goods sold" to "value" (4)Right click any cell under row label in the PivotTable report, and select Create Group from the pop-up shortcut menu (5)In the group dialog box that opens later, select step size as quarter (6)Click OK, and the statistical results are as shown in the figure.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-vH4hsbh3-1629336492123)(img/41.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-svi9ei9l-1629336492124)(img/42.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-6T24DR98-1629336492124)(img/43.png)]
5. Count the number of clicks in different price ranges
Group workbook data.xlsx"The data in the "commodity details" worksheet in are grouped equidistantly by "unit price" (the group distance is 50), and the "hits" of each group are counted.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-y8Ej2fvy-1629336492125)(img/44.png)]
(1)Click any cell in the data range of the item details worksheet, and then select insert|"With the PivotTable command, drag the unit price to the row label and the number of hits to the value (2)Right click any cell under row label in the PivotTable report, and select Create Group from the pop-up shortcut menu (3)Change the step size to 50
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG trpopygf-1629336492126) (IMG / 45. PNG)]
[external link picture transfer failed. The source station may have anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-UyUCjRuJ-1629336492126)(img/46.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-UjxwuElG-1629336492128)(img/47.png)]
4. PivotTable class exercises
Open the backpack sales record file in the data group. It is required to use the PivotTable to group. The following contents are required to be completed (1) Group according to the store type, and calculate the total number of stores (non duplicate count) and 30 day sales for the grouped data (2) Display the number of store names corresponding to the year (non duplicate) (3) Calculate the sales volume of each price range in steps of 100
1. Exercise (1)
(1) Group according to the store type, and calculate the total number of stores (non duplicate count) and 30 day sales for the grouped data
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-HNt3ZRR7-1629336492128)(img/48.png)]
There is no de duplication of the store name above. Next, we need to calculate the non duplicate quantity
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-W6XOlduX-1629336492129)(img/49.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-35GBSRKk-1629336492130)(img/50.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Z2WB7WA6-1629336492131)(img/51.png)]
2. Exercise (2)
(2) Display the number of store names corresponding to the year (non duplicate)
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-bGSZ4Drl-1629336492132)(img/52.png)]
3. Exercise (3)
(3) Calculate the sales volume of each price range in steps of 100
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-MM99q8Rh-1629336492133)(img/53.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-6pwuSyyy-1629336492133)(img/54.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-V2OqOh5C-1629336492134)(img/55.png)]
1, excel descriptive statistical analysis
1. Concept of descriptive statistical analysis
Descriptive statistics is mainly used to calculate the total index, average index, median, mode, range, variance, standard deviation, etc. Analyze common indicators with donation data
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-ljTNWwYy-1629336492135)(img/57.png)]
2. Total indicators and average indicators
According to the specific content of the overall response, the total index is divided into indicator total and unit total.
(1) Total number of signs: the total number of signs in the whole ∑.
(2) Unit total: the number of individuals contained in the total.
If the collected data are original records one by one and are accurate data, the sum function is used to calculate the total amount of flags, the count function is used to calculate the total amount of units, and the average function is used to calculate the average index.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-7YVlUTUO-1629336492135)(img/56.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Jfczvu3p-1629336492136)(img/58.png)]
[external link picture transfer failed. The source station may have anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-4rpDrBp7-1629336492137)(img/59.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG isdnamxv-1629336492138) (IMG / 60. PNG)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-pzFs3mrZ-1629336492138)(img/62.png)]
3. Median and mode
Median refers to the data in the middle of the sequence when the flag values of each unit of the population are arranged in size order.
If there are even numbers of data, take the average of the middle two numbers. The median is represented by the letter Me (median)
Mode refers to the data that appears most frequently in the population, represented by the letter Mo (mode).
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-NgFQ8r9C-1629336492139)(img/63.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-HkTmKRHs-1629336492140)(img/64.png)]
4. Range, variance and standard deviation
① range = maximum - Minimum
[the external chain image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-3qNpuOar-1629336492141)(img/analysis.png)]
Variance is a statistical value used to measure the degree of data dispersion. All values of the data set are used for calculation, not just individual extreme values (such as maximum and minimum). Therefore, variance can well reflect the overall degree of data dispersion.
What is the square of variance for?
Question 1: if you want to select one of the two players from a and B to participate in the shooting competition? What plan will you design? S: With a high total score. T: If one player shoots five times, the total score is 30; And the other player shot 10 times with a total score of 50 points. Who would you choose? S: It seems appropriate to average. Question 2: who do you choose? A: 3, 5, 6, 7, 9 B: 4, 5, 6, 7, 8 From the data, it can be seen that it is not advisable to use the average to select. Although the average number of rings is the same, there is still a gap between them. A has a maximum of 9 and a minimum of 3 rings, with a large fluctuation range, while B has a maximum of 8 and a minimum of 4, with a small fluctuation range. Therefore, B is more stable and should be selected. Question 3: can the difference between maximum and minimum be analyzed accurately? A: 3, 5, 6, 7, 9 C: 3, 6, 6, 6, 9 It is not difficult to find that although the gap between the most and the least is the same, C is more stable than a. It is also found that when the average is the same, simply comparing the maximum and minimum data can not explain the overall fluctuation of a group of data, and each data has the right to decide. So how to reflect the fluctuation of a group of data? S: Subtract their average from each data to obtain the deviation of each data; Then add the deviations. After a calculation, it is not difficult to find that the deviation sum of a and C is 0. Two methods are summarized. (1)First find the absolute value of each deviation, and then add it; (2)First find the square of each deviation, and then add it. Question 4: calculate the deviation and sum of the following data. A: 3, 5, 6, 7, 9 B: 4, 5, 6, 7, 8 C: 3, 6, 6, 6, 9 Calculate with method 1: A: 8; B: 6; C: 6 (how to compare the stability of B and C?) Calculate with method 2: A: 20; B: 10; C: 18 (according to this algorithm, the stability of the three is different? The purpose of using square instead of absolute value is to widen the gap between data) Question 5: B works alone (sum of squares of deviation): Phase I: 4, 5, 6, 7, 8 Stage 2: 4, 5, 6, 7, 8, 4, 5, 6, 7, 8 Stage 3: 4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 4, 5, 6, 7, 8 …… From the above calculation, it can be seen that the sum of squares of deviations is easy to reach the conclusion that "the more data, the greater the sum of squares of deviations". In fact, their stability should be the same. It is concluded that only by obtaining the average of "square of deviation" can the fluctuation characteristics be truly described. It is only a rule to use square instead of absolute value to calculate variance, and there should be a reason for the rule
Variance is the mean of the square of the distance the data deviates from the mean. But why is it the average of the square of the distance from the average, not the average of the distance from the average? Intuitively, the latter is easier to understand. Completely correct, so there is the standard deviation. Taking the square root of the other party's difference will get the standard deviation. It also reflects the dispersion of the data, but because it is the same dimension as the original data, it is more in line with our intuition and easier to explain
Why do we need standard deviation with variance?
Variance is the mean of the square of the distance the data deviates from the mean. But why is it the average of the square of the distance from the average, not the average of the distance from the average? Intuitively, the latter is easier to understand. Completely correct, so there is the standard deviation. Taking the square root of the other party's difference will get the standard deviation. It also reflects the dispersion of the data, but because it is the same dimension as the original data, it is more in line with our intuition and easier to explain
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-wNDBZe5S-1629336492142)(img/case1.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-JzPGcH56-1629336492142)(img/68.png)]
[external link image transfer failed. The source station may have anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-vhwavAS7-1629336492143)(img/69.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-MWOeID5x-1629336492144)(img/70.png)]
The dimensions (units) of standard deviation and mean are consistent. When describing a fluctuation range, standard deviation is more convenient than variance.
The variance is standardized. Open root sign, unified unit, name standardization
In short, the standard deviation is a measure of the degree to which a set of values are dispersed from the average. A large standard deviation represents a large difference between most values and their average values; A smaller standard deviation means that these values are closer to the average.
The standard deviation is usually determined relative to the average of the sample data and represents the average distance from the average
These indicators are used to describe the degree of difference and dispersion of data. The larger the index value, the greater the dispersion of the data, that is, the greater the fluctuation range of the data, and the worse the representativeness of the average value; The smaller the index value is, the more stable the data is, the smaller the fluctuation range is, and the better the representativeness of the average value is. It is cumbersome to calculate variance and standard deviation directly from a mathematical point of view. stay Excel In, it is much more convenient to calculate with function, as shown in the figure.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-b1dhGF5Z-1629336492145)(img/65.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Nq8BCC74-1629336492146)(img/66.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-ix2Q0R6O-1629336492147)(img/67.png)]
5. Sample variance and population variance
The population variance has finite population and infinite population, and has its own real parameters. This mean is the real true value. When calculating the population variance, it is divided by N. Sample variance is a random part of the population, which is used to estimate the population (the population is generally difficult to know). Many kinds of statistics can be obtained from the sample. The denominator of the population variance is n. The denominator of the sample variance is n-1.
The denominator of sample variance is n-1, It is called "Bessel correction" because we select samples, so the variance calculated from these samples will be less than or equal to the unbiased estimation of the variance of the overall data set. In order to make up for this defect, we put the formula n Change to n-1,So as to improve the value of variance. It is called Bessel correction coefficient.
https://www.zhihu.com/question/20099757
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-b9v57N9Z-1629336492147)(img/76.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-dH4Mm3Cw-1629336492148)(img/77.png)]
6. Descriptive statistical case - airline ticket selling speed
Some customers reported that the ticket office of an airline was too slow. Therefore, the airline collected sample data of the time spent by 100 customers buying tickets (unit: minutes). The results are shown in the table below
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-9hkHMxSP-1629336492149)(img/71.png)]
The airline believes that it is reasonable to handle a ticket business for a customer within five minutes. Does the above data support the airline's statement and is the customer's opinion reasonable? Please properly analyze the above data and answer the following questions. 1.The data are grouped equidistantly and sorted into a frequency distribution table 2.Calculate the median and mode according to the grouped data 3.Analyze whether the opinions put forward by customers are reasonable? Why? 4.Which average indicator is more reasonable to use to analyze the above problems?
(1) Question 1
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-X4Xc2CNh-1629336492150)(img/72.png)]
(2) Question 2
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Erv9KUv0-1629336492150)(img/73.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-VqShPp4W-1629336492151)(img/74.png)]
(3) Question 3
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-CkkHWM4n-1629336492152)(img/75.png)]
Reasonable. Although his average is 3.17 < 5, which is within the normal range, nearly 20% of the ticket purchase time > 5 minutes is beyond the normal range, that is, the speed is too slow. The average is not everything. Therefore, the reason put forward by customers is correct. The phenomenon of buying tickets too slowly does exist.
(4) Question 4
The average is reasonable. It can better reflect the approximate time of ticket purchase. More representative.
7. Descriptive statistical case - calculate average salary
Open the "monthly salary survey" worksheet in the file "descriptive statistics. xlsx", as shown in the figure, please calculate the average monthly salary.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-4ezaa7j-1629336492152) (IMG / 78. PNG)]
For a group( a,b),We call a Is the lower limit of the group, b Is the upper limit of the group; Difference between upper and lower limits( b-a)It's called group distance,(a+b)/2 It's called group median. The group median may not be the average value of the group data, but it is often used as the representative value of the group because of its simple calculation.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-6brQV1sE-1629336492153)(img/79.png)]
Right click to move the row label to adjust the up and down position of the data
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-WY2a7FBy-1629336492154)(img/80.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-z282N3Er-1629336492155)(img/81.png)]
Calculation steps: 1,Get a data table first 2,Get the group median for each salary range 3,Add a column to calculate the value of the group*frequency 4,Calculate average salary
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-QuY4oP6X-1629336492156)(img/82.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Z5bJDpkv-1629336492157)(img/83.png)]
[external chain picture transfer failed. The source station may have anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-kbXYfowv-1629336492158)(img/84.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-O7RycuLT-1629336492159)(img/85.png)]
[the external chain image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (IMG eubuxsaq-1629336492159) (IMG / 86. PNG)]
1, Analysis and prediction of excel dynamic series
1. Concept of dynamic sequence
Dynamic series refers to the sequence of index values of the whole at different times, which is also called time series. For convenience, dynamic sequences are often presented in the form of tables, as shown in the table. Dynamic sequence has two basic elements: time*t*And horizontal values*a*.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG clxlfdmo-1629336497817) (IMG / data1 / 1. PNG)]
2. Speed index of dynamic sequence
The commonly used speed indicators of dynamic series include development speed, total development speed, growth speed, average development speed and average growth speed.
3. Development speed
When studying the dynamic series, if the level values of two different periods are to be compared, the level value of the analysis and research period is called the level of the reporting period; The level value of the comparative base period is called the base period level. 1,Fixed base development speed: the fixed base development speed is the ratio of the level of each reporting period in the dynamic series to the level of a fixed base period, reflecting the development and change degree of the phenomenon in a long period of time. Indicates how many times or percent the level of the reporting period is in the fixed base period. 2,Month on month development speed: the month on month development speed is the ratio of the level in the reporting period to the level in the previous period, indicating the development speed of the phenomenon phase by phase
Classroom case 1
Example 1: file "dynamic sequence analysis".xlsx"The fixed asset investment of an enterprise from 2010 to 2015 is listed in the "development speed 1" worksheet, as shown in the figure. Please calculate the fixed base development speed and month on month development speed over the years. (1)Calculate the development speed of fixed base (2)Calculate the growth rate month on month
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-s2prkc1h-1629336497820)(img/data1/2.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-J6fxuo5j-1629336497820)(img/data1/3.png)]
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-1arJTnKb-1629336497821)(img/data1/4.png)]
3. Overall development speed
Total development speed The total development speed is referred to as the total speed. As the name suggests, the total development speed is the total development speed over a period of time, in numerical terms It should be equal to the final level divided by the initial level
Therefore, in example 1, the fixed base development speed H4 in 2015 is the total development speed. Of course, the formula "= product(D5:H5)" can also be used to calculate the total development speed, as shown in the figure.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-zJco0Zzd-1629336497822)(img/data1/5.png)]
4. Average development speed
The comparison basis of each period is different, so the calculation method of general average cannot be used. At present, the geometric average method is usually used to calculate the average development speed. Mathematically, we put n Product of number n The root of the power is called this n Geometric mean of the number. It is often used to calculate the average development speed in statistical research. This method is also used in calculating the annual average price rise in different periods.
Average development speed = geometric average of month on month development speed.
In Excel, there is a function that can calculate the geometric average of n numbers, that is, the geometric function
That is, the average development rate = geometric (month on month development rate).
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-YLd4W2tT-1629336497823)(img/data1/6.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-L060gCoj-1629336497824)(img/data1/7.png)]
5. Development speed and growth rate
Development speed and growth speed are dynamic relative numbers often used to express the development and change of a dynamic index in a certain period in people's daily social and economic work. Since both are "speed", it shows that the two are inextricably linked. They all abstract the development level of the two periods of comparison into a proportional number to represent the direction and degree of development and change of something during this period of comparison, and analyze and study the law of development and change of things. But there are obvious differences between the two 1,The development speed is generally expressed in percentage. When the proportion is large, it is more appropriate to use multiple. For example, the investment in fixed assets in a certain place was 36.6 billion yuan in 1994 and 32.8 billion yuan in 1993. Compared with 1993, it was 36.6 billion yuan in 1994÷328=1.12,This is the rate of development, expressed as a percentage of 112%,Expressed as a multiple, it is 1.12 Times. 2,The growth rate is a dynamic comparison index calculated by subtraction and division If the calculation result is positive, it is called growth rate or growth rate; If it is negative, it is called reduction speed or reduction rate. For example, the growth rate of fixed asset investment in a certain place in 1994 compared with that in 1993 was:(366-328)÷328=0.12,Expressed as a percentage, it is 12%
It can be seen from the above: growth rate = development rate - 1 (or 100%). Then: if the development rate is expressed as a percentage, the growth rate is the development rate minus 100%,
For example, the growth rate is 12% by subtracting 100% from 112% of the development rate in the above example;
If the development rate is expressed in multiples, the growth rate is the development rate minus 1.
Similarly, the growth rate of a certain period plus 1 (or 100%) is the development rate of this period.
Both year-on-year and month on month comparisons are based on the comparison between a statistical cycle and the previous statistical cycle. The difference is 1,Month on month comparison: it is called month on month comparison when the data of November this year is compared with that of last month (October). 2,Year on year: it is called year on year when the data in November of this year is compared with that in the same period of the previous year (November). "Year on year "is a comparison with the same period last year; "Month on month" is compared with the previous period. For example, the output of a coal mine was 2 million tons in June last year, 2.1 million tons in May this year, and 2.2 million tons in June this year. Then the "year-on-year" increase was 200000 tons, and the "month on month" increase was 100000 tons
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-QKgPD76B-1629336497824)(img/data1/8.png)]
Classroom case 2
Example 3: it is known that the growth rate of fixed asset investment of a company from 2011 to 2015 is shown in the figure. Please calculate the average growth rate in 5 years. The data file is "dynamic sequence analysis".xlsx"Average growth rate 2 worksheet.
The correct solution is: first use the chain growth rate to calculate the chain development rate, then use the chain development rate to calculate the average development rate, and then reduce the average development rate by 1, as shown in the figure. Growth rate month on month=Month on month growth rate+1 Average development speed=geomean(Growth rate month on month). Average growth rate=Average development speed-1
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-POob4C2k-1629336497825)(img/data1/9.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-4IlJ2rth-1629336497826)(img/data1/10.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-FiSizJex-1629336497826)(img/data1/11.png)]
6. Application case - Analysis and forecast of transaction volume over the years of "double 11"
Taking the trading volume over the years of "double 11" as an example, let's talk about the analysis indicators of dynamic series Data: according to official data, the turnover of tmall's "double 11" in 2019 is fixed at 2684.44 Billion yuan, and the transaction volume of the whole network platform reached 410.1 billion yuan. The transaction volume of the "double 11" network from 2013 to 2019 is shown in the figure below. It can be seen that the transaction volume shows an obvious trend of increasing year by year. We will introduce how to describe the dynamic series against the background of the change of transaction volume.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-fmbadxPX-1629336497827)(img/data1/12.png)]
Development speed and growth rate are relative indicators, reflecting the changes of things in a certain period of time. According to the selection of reference indicators, we can calculate three types of indicators. The first category of indicators is the fixed base ratio, which refers to the ratio between the current indicators and the baseline indicators, i.e an/a0,Indicates the general change of things in a certain period of time; The second category of indicators is month on month, which refers to the ratio between the current indicators and the previous indicators, i.e an/an-1,Indicates the change of things year by year; The third category of indicators is year-on-year, which refers to the ratio between the current indicators and the indicators in the same period last year, eliminating the impact of seasonal changes. For example, the ratio of network wide transaction volume in November 2019 and November 2018. Month on month and year-on-year indicators are our most common and important indicators. Based on the above three indicators, we can calculate the development speed and growth speed respectively.
Class exercise 1: 1,Select the transaction volume of the whole network in 2013 as the baseline period index to calculate the base ratio development speed and chain ratio development speed in 2019 2,Find the fixed base growth rate and month on month growth rate in 2019
1. Fixed base rate and chain rate of development in 2019
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-p0wMWnlo-1629336497828)(img/data1/13.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-J84TnlHP-1629336497828)(img/data1/14.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG tiypj5t2-1629336497829) (IMG / data1 / 15. PNG)]
2. Find the fixed base growth rate and month on month growth rate in 2019
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG jlirgmqf-1629336497830) (IMG / data1 / 16. PNG)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-gBOMENRv-1629336497831)(img/data1/17.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Svqszhmb-1629336497832)(img/data1/18.png)]
Class exercise 2: Average development speed and average growth speed are mainly used to describe the average change of something over a period of time. 1,Select the transaction volume of the whole network in 2013 as the baseline index, and the average development speed in 2019 2,2019 Average annual growth rate 3,The average development speed is also a very important indicator. We can use it to roughly estimate the future indicator level Based on the data of the whole network transaction volume from 2013 to 2019, it is assumed that the annual transaction volume will increase according to the average development rate in the past six years to budget the whole network transaction volume in 2020.
1. Average growth rate in 2019
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-6Mr0nUdV-1629336497833)(img/data1/19.png)]
2. Average growth rate in 2019
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-EKBjNjPK-1629336497834)(img/data1/20.png)]
3. The total transaction volume of the whole network in 2020 is budgeted.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-lUcoLpvm-1629336497835)(img/data1/21.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG evunxfc4-1629336497836) (IMG / data1 / 22. PNG)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-3fII2Hzb-1629336497836)(img/data1/23.png)]
7. Application case - simultaneous average detection
Generally, the regular fluctuations caused by seasonal changes are called seasonal changes. In addition to the data fluctuations caused by seasonal changes, there may also be data fluctuations caused by months. For the analysis of such data, we often use the same period average method. The same period average method is to first calculate the average of the same period (quarter or month) according to the dynamic data series, then calculate the seasonal index of each period, and finally predict the data of the next period according to the seasonal index. The specific calculation process is as follows: (1)First, calculate the average number of the same period (quarterly or monthly) according to the data over the years (more than 3 years); (2)Seasonal index= Average for the same period/Total average over the years(Average of average for the same period)×100%; (3)Calculate the forecast value of each period=Average level of last year×Seasonal index of each period. requirement:According to 2012-2015 The sales data of each month in 2016 is used to predict the sales of each month in 2016
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-B5507nSE-1629336497837)(img/data1/24.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-1I4pACIv-1629336497838)(img/data1/25.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-nBcbxnPQ-1629336497839)(img/data1/26.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-X1xUt6Bj-1629336497840)(img/data1/27.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-8DgsWMqP-1629336497840)(img/data1/28.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-awjQZhTw-1629336497841)(img/data1/29.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-M26ZcR63-1629336497842)(img/data1/30.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-DnYOceC9-1629336497843)(img/data1/31.png)]
2, Comprehensive evaluation analysis method
1. Concept of comprehensive evaluation and analysis
Comprehensive evaluation and analysis method refers to the method of comprehensive evaluation of multiple participants by using multiple indicators. The basic idea of comprehensive evaluation analysis method is to transform multiple indicators into an indicator that can reflect the comprehensive situation for analysis and evaluation.
If a student's usual score of a course is 90, the mid-term exam score is 70 and the final exam score is 80, the teacher will finally give a total score of 90 according to the consistent requirements of the school×20%+70×30%+80×50%=79(This is the specific application of comprehensive evaluation analysis method.
2. Comprehensive evaluation and case analysis
Example: a school recruits 3 math teachers. There are 7 candidates who have passed the written test, trial lecture and interview. The assessment results are shown in the figure. If the weights of written test scores, test lecture scores and interview scores are 40%, 30% and 30% respectively. Ask for the comprehensive evaluation score of each candidate and the comprehensive ranking of each candidate, and admit the top 3 according to the ranking. The data file is the "comprehensive evaluation 1" worksheet in the Workbook "comprehensive evaluation analysis. xlsx"
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-545RXxe5-1629336497843)(img/data1/32.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-WW84yK3O-1629336497844)(img/data1/33.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-hqULqEzN-1629336497845)(img/data1/34.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-uL1xo8Je-1629336497845)(img/data1/35.png)]
1, excel correlation analysis
1. Definition of correlation analysis
What is correlation analysis? When studying the relationship between two or more kinds of data, we usually use correlation analysis. What is the use of correlation analysis? Research A And B What does it matter Research A yes B What's the impact
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-0htjjtPE-1629336502307)(img/data2/1.png)]
People are always trying to find some connection between things For example:. Is there a relationship between smoking and life expectancy? The relationship between smoking and lung cancer, the relationship between height and the selection of basketball team members, the relationship between students' learning time and academic performance, the relationship between umbrella sales and season, and so on. The study of the relationship between things is everywhere in life. Here is a table listing students' study time and corresponding grades.
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-psZEh2Za-1629336502310)(img/data2/2.png)]
Correlation analysis is a statistical method to study the correlation degree and size between two or more variables. Its purpose is to reveal whether there is a correlation between phenomena and determine the nature, direction and closeness of the correlation.
2. Correlation diagram
How to analyze the relationship between the two? The most common method for correlation analysis of two variables is to take the values of these two variables as coordinates (x,y) and draw a scatter diagram in rectangular coordinate system. At this time, the scatter diagram is also called "correlation diagram", as shown in the figure.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-oKO8xg2Q-1629336502310)(img/data2/3.png)]
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-L9Sh4Nu8-1629336502311)(img/data2/4.png)]
The correlation diagram can intuitively and vividly show the relationship between variables. (1)The distribution of scattered points is roughly in a straight line, which is called linear correlation (2)The distribution of scatter points is roughly in a curve, which is called curve correlation (3)The distribution of scattered points is disordered, which is said to be irrelevant (4)When one variable increases, the other variable also increases, which is said to be positively correlated, as shown in the figure a As shown in. (5)When one variable increases and the other variable decreases, they are said to be negatively correlated, as shown in the figure b As shown in.
3. Correlation coefficient
https://www.zhihu.com/question/20852004
1,The correlation table and correlation graph can reflect the relationship between the two variables and their correlation direction, but they can not accurately indicate the degree of correlation between the two variables. 2,The correlation coefficient was first developed by statistician Carl·The statistical index designed by Pearson is the quantity to study the degree of linear correlation between variables, which is generally written in letters r express. Due to different research objects, there are many ways to define the correlation coefficient, and the Pearson correlation coefficient is more commonly used. 3,Correlation coefficient is a statistical index used to reflect the close degree of correlation between variables. The correlation coefficient is calculated according to the product difference method, which is also based on the deviation between the two variables and their average values, and the correlation degree between the two variables is reflected by multiplying the two deviations; This paper focuses on the linear single correlation coefficient
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG oxbtdbah-1629336502312) (IMG / data2 / 7. PNG)]
(1) Covariance
It can be popularly understood as: do the two variables change in the same direction in the change process? Or change in the opposite direction? You get bigger and I get bigger, which means that the two variables change in the same direction, and the covariance is positive. When you get bigger and I get smaller, it means that the two variables change in the opposite direction, and the covariance is negative.
positive correlation
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-qjAPakOm-1629336502313)(img/data2/8.png)]
negative correlation
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG iatuo1ll-1629336502313) (IMG / data2 / 9. PNG)]
Conclusion: covariance can only judge the correlation between the two groups of data, not the degree of correlation We want to judge the degree of correlation and divide the covariance by a value. This result can be compared to determine the degree of correlation
The correlation coefficient can also be regarded as covariance: a special covariance after excluding the dimensional influence of two variables. Since it is a special covariance, it: 1,It can also reflect whether the two variables change in the same direction or in the opposite direction. If the change in the same direction is positive, the change in the opposite direction is negative 2,Because it is the normalized covariance, a more important characteristic comes. It eliminates the influence of the change range of the two variables, but only reflects the similarity of the two variables per unit change.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-SSWMMZUg-1629336502314)(img/data2/6.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-YT7dsGz5-1629336502315)(img/data2/5.png)]
summary
In practical application, the general criteria for judging the degree of linear correlation by using correlation coefficient are as follows: (1),When|r|=0 When, it indicates that there is no linear correlation between the two variables; (2),When 0<|r|≤0.3 It is considered that there is a weak linear correlation between the two variables; (3),When 0.3<|r|≤0.5 It is considered that there is a low degree linear correlation between the two variables; (4),When 0.5<|r|≤0.8 It is considered that there is a significant linear correlation between the two variables; (5),When 0.8<|r|<1 It is considered that there is a high linear correlation between the two variables; (6),When|r|=1 It shows that there is a complete linear correlation between the two variables, that is, a linear function relationship; (7),When correlation coefficient r When it is very small or even zero, it can only indicate that there is no linear correlation between variables, but not that there is no correlation between them.
(2) Classroom case 1
investigation××The data of the annual sales of the community supermarket (million yuan) and the resident population of the community (10000 people) are shown in the figure. Please analyze the correlation between the annual sales of the supermarket and the resident population of the community. See the Workbook "correlation and regression analysis" for the data file.xlsx"Correlation factor 1 worksheet for
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-yMeqCAFz-1629336502316)(img/data2/10.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-ilt4s0O6-1629336502317)(img/data2/11.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-dHPCfg0o-1629336502318)(img/data2/12.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-jryiOYUh-1629336502319)(img/data2/13.png)]
2, Regression analysis
1. Concept of regression analysis
- Regression analysis is a statistical analysis method to determine the quantitative relationship between two or more variables.
- Regression is used to estimate the numerical relationship between data elements and deal with regression problems. It is mainly used to predict numerical data
- Regression analysis can be divided into univariate regression analysis and multivariate regression analysis according to the number of variables involved; According to the relationship between independent variables and dependent variables, it can be divided into linear regression analysis and nonlinear regression analysis.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-LI1Mvp8s-1629336502319)(img/data2/14.png)]
For example, in the current case, when there are 600000 people in the community, what is the predicted annual sales of the supermarket?
2. Principle of regression analysis
The basic idea of regression analysis is: when the data is distributed near a straight line (or curve), find out an optimal straight line (or curve) to simulate it. When the sum of the squares of the vertical distances from all points to the line∑(y-y′)2 At the minimum, the best straight line (or curve) is obtained, as shown in the figure. This is the principle of least square method (double multiplication is square).
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-afb4eq0Y-1629336502320)(img/data2/15.png)]
In the final analysis, regression analysis is a statistical method that simulates the relationship between variables into a mathematical equation (also known as regression equation, or trend line equation) according to the principle of least square method, so as to infer the relationship between variables. Therefore, regression analysis is also called mathematical model method.
3. Coefficient of determination
When the relationship between variables can be simulated by a mathematical model, we use the determination coefficient (R2) to determine the fitting effect of the mathematical model.
Mathematically, the closer the determination coefficient R2 is to 1, the better the simulation effect of the mathematical model.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-WDnEvams-1629336502321)(img/data2/16.png)]
4. Use Excel scatter chart and trend line for regression analysis
For beginners, we still suggest that you use the method of "insert scatter diagram first, and then add trend line" to calculate the trend line equation, correlation coefficient and determination coefficient. Finally, determine the simulation effect according to the determination coefficient, and make data prediction according to the trend line equation.
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-yCvVEH7M-1629336502322)(img/data2/17.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-BhMv6UC1-1629336502322)(img/data2/18.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-1L3X0soR-1629336502323)(img/data2/19.png)]
1, Catering data analysis
1. Data set introduction
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-KEJwiHdD-1629336506272)(img/data4/1.png)]
2. Which hotel has the highest number of reviews in China?
Analysis: arrange the comment columns in descending order to get the maximum number of comments. (Sanbao porridge shop)(Nanjing East Road store)(Branch)
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Kuo6DItW-1629336506274)(img/data4/2.png)]
3. Which city's restaurants have the best taste per capita?
Resolution: 1,Directly use the PivotTable to list the average taste values and arrange them in descending order. 2,Do not directly operate on the pivot table, ctrl+a Select all tables and copy a pivot table. In descending order of taste
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-D6nJhxtP-1629336506275)(img/data4/3.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Py4xEeUt-1629336506276)(img/data4/4.png)]
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-z9ITpZh7-1629336506276)(img/data4/5.png)]
4. Which type of catering environment gives the best score?
Resolution: It can be done with PivotTable
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-cH4VGpCj-1629336506277)(img/data4/6.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-jXSBGdjL-1629336506278)(img/data4/7.png)]
5. How many restaurants rated above 8.0 in terms of taste, environment and service? In which city do they account for the most?
Resolution: 1,The screening function can be used to complete the screening of scores; 2,Then copy and paste the city, select all the data, and select the visible cells in the positioning conditions in the search and selection,Copy and paste into the new table 3,Insert a PivotTable, group by city, count the city fields, copy them to a new report, and use the PivotTable to complete the proportion calculation. Add a column of proportion, delete the part of 1, and sort in descending order according to the column of proportion
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-NZIPXw0A-1629336506279)(img/data4/9.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-o4n24uiu-1629336506280)(img/data4/10.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-qWqum1Dm-1629336506280)(img/data4/11.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-VZaDFs7e-1629336506281)(img/data4/12.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-qQKm1JBz-1629336506282)(img/data4/13.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-WqoeCGPp-1629336506283)(img/data4/14.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-8XnhzrXU-1629336506283)(img/data4/15.png)]
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-4Kyem3yQ-1629336506284)(img/data4/16.png)]
6. How many shops with Sichuan cuisine have the word "spicy" and how many have the word "Ma"?
utilize countifs Filter conditions and count
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-67cmfph-1629336506285) (IMG / Data4 / 20. PNG)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-5jHPJgVP-1629336506286)(img/data4/21.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-OXBjkeZD-1629336506287)(img/data4/22.png)]
7. Divide the per capita price into the grades of 05050100150150200200 + and how many are there in each city?
analysis: 1,Will everyone/50 The resulting values are listed separately, with column names Level 2,take Level Column use INT()Function [this function means rounding down, for example: 1.9 The function result is 1] rounded down 3,utilize if The function replaces all values greater than 4 with 4 4,Use PivotTable to convert cities Level Perform perspective and calculate the quantity
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Jc01Uxd1-1629336506288)(img/data4/23.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-JxBBKsH3-1629336506289)(img/data4/24.png)]
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (IMG gaposoeh-1629336506290) (IMG / Data4 / 25. PNG)]
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-ahn11WTT-1629336506291)(img/data4/26.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Fp6apbp2-1629336506292)(img/data4/27.png)]
1, Personnel statistics department data
1. Data set introduction
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-SM782ji7-1629336506292)(img/data4/30.png)]
2. Number of people in each department
Xiao Li is a personnel of a listed company. She will report on her work at the end of 2020. The leader asked her to provide some data, including: 1,Number of people in each department 2,Number and proportion of people of all ages 3,Count the number of people of all ages in all departments 4,Education background of each department 5,Number of employees in each department each year
1,Create a PivotTable report, group by department and count by department 2,Modify PivotTable name
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-9oY1VsO6-1629336506293)(img/data4/31.png)]
[the external link image transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the image and upload it directly (img-VzqJ5IyD-1629336506294)(img/data4/32.png)]
3. Number and proportion of people of all ages
1,Group by age and count by age 2,The data in the column of age are grouped in steps of 10 3,Add a count by age field in the PivotTable report. Right click-Value display mode-Percentage of total
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-tGLogHwE-1629336506294)(img/data4/33.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-G4zr7WdZ-1629336506295)(img/data4/34.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-kcj eiai8-1629336506296) (IMG / Data4 / 35. PNG)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG hyotnfck-1629336506297) (IMG / Data4 / 36. PNG)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-2Scia7of-1629336506298)(img/data4/37.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-rp69gJJd-1629336506298)(img/data4/38.png)]
4. Count the number of people of all ages in all departments
1,Rows are grouped by department and columns are grouped by age to count departments 2,The ages in the column are grouped in steps of 10 3,Modify name
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-qwrkySHg-1629336506299)(img/data4/39.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-zyqje4Yb-1629336506300)(img/data4/40.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG ldwbtsou-1629336506301) (IMG / Data4 / 41. PNG)]
5. Count the number of people with academic qualifications in each department
1,Rows are grouped by department, columns are grouped by education, and departments are counted 2,Modify name
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-Naqp1C6m-1629336506302)(img/data4/42.png)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-08ZMvAev-1629336506303)(img/data4/43.png)]
5. Number of employees in each department every year
1,Rows are grouped by department, columns are grouped by education, and departments are counted 2,Modify name
[external link picture transfer failed. The source station may have anti-theft chain mechanism. It is recommended to save the picture and upload it directly (IMG erunurfj-1629336506304) (IMG / Data4 / 44. PNG)]
[the external chain picture transfer fails. The source station may have an anti-theft chain mechanism. It is recommended to save the picture and upload it directly (img-JPNCD9mY-1629336506304)(img/data4/45.png)]