mysql windowing function
1. Definitions
- In order to display the data before and after the aggregation function.
- The windowing function operates on a group of values. It does not need the group by clause to group the data. It can return the columns of the basic row and aggregate columns in the same row at the same time
Note: the windowing function is in mysql8 Only after 0
2. Grammar
Function name (column) over (option [the option can be partition by column, order by column])
over(partition by xxx) according to xxx Group all rows over(partition by xxx order by aaa) according to xxx Group by aaa sort
3. Classification of windowing function
3.1 aggregate windowing function
- If the function name is an aggregate function, it becomes an aggregate windowing function
- Syntax: aggregate function (column) over(partition by column, order by row)
- Common aggregate functions: sum(),count(),max(),min()
Calculate the number of passes for each student -- Using aggregate functions select student_id,count(sid) from score where num>=60 group by student_id; -- Use windowing function select sid,student_id,count(sid) over(partition by student_id order by student_id) Number of passes from score where num>=60;
summary
The windowing function will not modify the structure of the source data table, but will add the desired results in the last column of the table. If there are multiple rows of data in the grouping, it will be displayed repeatedly
3.2 sorting windowing function
- row_ Number (line number)
- Generate consecutive sequence numbers, regardless of the same score
- Rank
- The ranking of the same score is the same. The rear ranking is the real serial number, and the ranking is jumping
- For example: 12225, there are 3 in the ranking of 2, and the next ranking is 5, because there are 4 scores in front
- dense_ Rank (dense sort)
- The ranking of the same score is the same, which is a continuous ranking
- For example: 12223, there are 3 ranking 2, and the next ranking is 3
- Ntile (group ranking)
- ntile has the concept of bucket
- ntile(6), which divides the total records into 6 barrels. If there are 12 records, there are 2 records in a barrel
- Ranking: 112233445566
- The number size of the sorting result can only be used between buckets. Although the sequence number inside the bucket is the same, the num is not necessarily the same
-- Look at the code differentiation relationship select s.sid,s1.sname,s1.gender,c.cname,s.num, row_number() over(partition by c.cname order by num desc) as row_number ranking, rank() over(partition by c.cname order by num desc) as rank ranking, dense_rank() over(partition by c.cname order by num desc) as dense_rank ranking, ntile(6) over(partition by c.cname order by num desc) as ntile ranking from score as s join student s1 on s.student_id=s1.sid left join course c on s.course_id=c.cid;
Case: -- Query the performance information of the top three students in each subject select * from( select s.sid,s1.sname,s1.gender,c.cname,s.num, dense_rank() over(partition by c.cname order by num desc) as dense_rank ranking from score as s join student s1 on s.student_id=s1.sid left join course c on s.course_id=c.cid) as e where dense_rank ranking<=3;
3.3 others
-
lag(col,n)
The value of the nth row in the statistics window
-
lead(col,n)
The value of the nth row down in the statistics window
These two functions can be used for data subtraction of adjacent rows in the same column
case -- For the following data, for the same user( uid)If you log in again within 2 minutes, it will be judged as cheating, count which users have cheated, and calculate the number of cheating -- Go time difference SELECT *, format(Seconds difference / 60, 3) Minutes difference FROM ( SELECT id, uid, login_time, -- Add a column of time lead (login_time, 1) over ( PARTITION BY uid ORDER BY login_time ) lead_time, -- Find the difference between the two columns of time timestampdiff( SECOND, login_time, ( lead (login_time, 1) over ( PARTITION BY uid ORDER BY login_time )) ) Seconds difference FROM lag_table ) as e; -- Find the number of cheating SELECT uid,count(1) Number of cheating FROM ( SELECT id, uid, login_time, -- Add a column of time lead (login_time, 1) over ( PARTITION BY uid ORDER BY login_time ) lead_time, -- Find the difference between the two columns of time timestampdiff( SECOND, login_time, ( lead (login_time, 1) over ( PARTITION BY uid ORDER BY login_time )) ) Seconds difference FROM lag_table ) as e where format(Seconds difference / 60, 3)<=2 group by uid;
-
first_value(column)
Get the first value of the current row after sorting in the group
-- Take the maximum score of the subject -- In descending order, the first value is the maximum value select s.sid,s1.sname,s1.gender,c.cname,s.num, first_value(num) over(partition by c.cname order by num desc) as firstvalue from score s join student s1 on s.student_id=s1.sid left join course c on s.course_id=c.cid;
-
last_value(column)
- Get the last value of the current row after sorting in the group
- last_ The default statistical range of value() is rows between unbounded preceding and current row, which is to compare the data of the current row with the data before the current row
Modification scope: stay order by Add after rows between unbounded preceding and unbounded following Namely: order by rows between unbounded preceding and unbounded following It changes to: compare the current grouped data with the data and take the last value unbounded Unlimited preceding The forward offset of the current record of the partition current current following Current record backward offset of the partition
-- Take the minimum score of the subject -- According to the descending order, the last value is the minimum value, and the range should be modified select s.sid,s1.sname,s1.gender,c.cname,s.num, last_value(num) over(partition by c.cname order by num desc rows between unbounded preceding and unbounded following) as lastvalue from score s join student s1 on s.student_id=s1.sid left join course c on s.course_id=c.cid;
https://www.luffycity.com/