mysql windowing function

mysql windowing function

1. Definitions

  • In order to display the data before and after the aggregation function.
  • The windowing function operates on a group of values. It does not need the group by clause to group the data. It can return the columns of the basic row and aggregate columns in the same row at the same time

Note: the windowing function is in mysql8 Only after 0

2. Grammar

Function name (column) over (option [the option can be partition by column, order by column])

over(partition by xxx) according to xxx Group all rows
over(partition by xxx order by aaa) according to xxx Group by aaa sort

3. Classification of windowing function

3.1 aggregate windowing function

  • If the function name is an aggregate function, it becomes an aggregate windowing function
    • Syntax: aggregate function (column) over(partition by column, order by row)
    • Common aggregate functions: sum(),count(),max(),min()
Calculate the number of passes for each student
-- Using aggregate functions
select student_id,count(sid) from score where num>=60 group by student_id;

-- Use windowing function
select sid,student_id,count(sid) over(partition by student_id order by student_id) Number of passes 
from score where num>=60;


summary

The windowing function will not modify the structure of the source data table, but will add the desired results in the last column of the table. If there are multiple rows of data in the grouping, it will be displayed repeatedly

3.2 sorting windowing function

  • row_ Number (line number)
    • Generate consecutive sequence numbers, regardless of the same score
  • Rank
    • The ranking of the same score is the same. The rear ranking is the real serial number, and the ranking is jumping
    • For example: 12225, there are 3 in the ranking of 2, and the next ranking is 5, because there are 4 scores in front
  • dense_ Rank (dense sort)
    • The ranking of the same score is the same, which is a continuous ranking
    • For example: 12223, there are 3 ranking 2, and the next ranking is 3
  • Ntile (group ranking)
    • ntile has the concept of bucket
    • ntile(6), which divides the total records into 6 barrels. If there are 12 records, there are 2 records in a barrel
    • Ranking: 112233445566
    • The number size of the sorting result can only be used between buckets. Although the sequence number inside the bucket is the same, the num is not necessarily the same
-- Look at the code differentiation relationship
select s.sid,s1.sname,s1.gender,c.cname,s.num,
row_number() over(partition by c.cname order by num desc) as row_number ranking,
rank() over(partition by c.cname order by num desc) as rank ranking,
dense_rank() over(partition by c.cname order by num desc) as dense_rank ranking,
ntile(6) over(partition by c.cname order by num desc) as ntile ranking
from score as s 
join student s1 on s.student_id=s1.sid
left join course c on s.course_id=c.cid;

Case:
-- Query the performance information of the top three students in each subject
select * from(
select s.sid,s1.sname,s1.gender,c.cname,s.num,
dense_rank() over(partition by c.cname order by num desc) as dense_rank ranking
from score as s 
join student s1 on s.student_id=s1.sid
left join course c on s.course_id=c.cid) as e
where dense_rank ranking<=3;

3.3 others

  • lag(col,n)

    The value of the nth row in the statistics window

  • lead(col,n)

    The value of the nth row down in the statistics window

    These two functions can be used for data subtraction of adjacent rows in the same column

case
-- For the following data, for the same user( uid)If you log in again within 2 minutes, it will be judged as cheating, count which users have cheated, and calculate the number of cheating
-- Go time difference
SELECT *, 
	format(Seconds difference / 60, 3) Minutes difference 
	FROM 
		( SELECT id, 
         		uid, 
         login_time, 
         -- Add a column of time
         lead (login_time, 1) over ( PARTITION BY uid ORDER BY login_time ) lead_time, 
         -- Find the difference between the two columns of time
		timestampdiff( 
            SECOND, 
            login_time, 
            ( lead (login_time, 1) over ( PARTITION BY uid ORDER BY login_time ))
                     ) Seconds difference 
		FROM lag_table ) as e;
-- Find the number of cheating
SELECT uid,count(1) Number of cheating
	FROM 
		( SELECT id, 
         		uid, 
         login_time, 
         -- Add a column of time
         lead (login_time, 1) over ( PARTITION BY uid ORDER BY login_time ) lead_time, 
         -- Find the difference between the two columns of time
		timestampdiff( 
            SECOND, 
            login_time, 
            ( lead (login_time, 1) over ( PARTITION BY uid ORDER BY login_time ))
                     ) Seconds difference 
		FROM lag_table ) as e
	where format(Seconds difference / 60, 3)<=2 group by uid;



  • first_value(column)

    Get the first value of the current row after sorting in the group

    -- Take the maximum score of the subject
    -- In descending order, the first value is the maximum value
    select s.sid,s1.sname,s1.gender,c.cname,s.num,
    first_value(num) over(partition by c.cname order by num desc) as firstvalue
    from score s join student s1 on s.student_id=s1.sid
    left join course c on s.course_id=c.cid;
    

  • last_value(column)

    • Get the last value of the current row after sorting in the group
    • last_ The default statistical range of value() is rows between unbounded preceding and current row, which is to compare the data of the current row with the data before the current row
    Modification scope:
    stay order by Add after rows between unbounded preceding and unbounded following
     Namely: order by rows between unbounded preceding and unbounded following
     It changes to: compare the current grouped data with the data and take the last value
    unbounded Unlimited
    preceding The forward offset of the current record of the partition
    current current
    following Current record backward offset of the partition
    
    -- Take the minimum score of the subject
    -- According to the descending order, the last value is the minimum value, and the range should be modified
    select s.sid,s1.sname,s1.gender,c.cname,s.num,
    last_value(num) over(partition by c.cname order by num desc rows between unbounded preceding and unbounded following) as lastvalue
    from score s join student s1 on s.student_id=s1.sid
    left join course c on s.course_id=c.cid;
    


https://www.luffycity.com/

Tags: Database SQL

Posted by wscreate on Mon, 18 Apr 2022 09:12:22 +0930