MySQL 8.0 窗口函数实战:12 个业务场景高效查询指南
窗口函数(Window Functions)是 MySQL 8.0 的核心增强功能,支持对分组数据执行计算而不聚合结果集。以下是 12 个典型业务场景的实战写法,附 SQL 示例和解析:
场景 1:销售排名
需求:为每个销售员的销售额生成部门内排名
SELECT
salesperson_id,
department,
sales_amount,
RANK() OVER (PARTITION BY department ORDER BY sales_amount DESC) AS dept_rank
FROM sales_records;
说明:
-
PARTITION BY department按部门分组 -
RANK()根据销售额降序排名,同金额并列名次(如 1,1,3)
场景 2:移动平均值
需求:计算股票 7 日移动平均价格
SELECT
trade_date,
stock_price,
AVG(stock_price) OVER (
ORDER BY trade_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM stock_daily;
关键点:
-
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW定义滑动窗口为最近 7 天 - 适用于时间序列平滑处理
场景 3:累计占比
需求:计算每个产品销售额占总销售额的累计百分比
SELECT
product_id,
sales_amount,
SUM(sales_amount) OVER (ORDER BY sales_amount DESC) /
SUM(sales_amount) OVER () * 100 AS cumulative_percent
FROM product_sales;
效果:输出按销售额降序排列的累计占比曲线
场景 4:同环比分析
需求:计算月度销售额环比增长率
SELECT
month,
sales_amount,
(sales_amount - LAG(sales_amount, 1) OVER (ORDER BY month)) /
LAG(sales_amount, 1) OVER (ORDER BY month) * 100 AS growth_rate
FROM monthly_sales;
函数解析:
-
LAG(column, N)获取前第 N 行的数据 - 分母用上月数据,分子为本月增量
场景 5:分组 Top-N
需求:筛选每个品类销量前 3 的产品
WITH ranked_products AS (
SELECT
product_id,
category,
sales_volume,
DENSE_RANK() OVER (PARTITION BY category ORDER BY sales_volume DESC) AS rank_in_cat
FROM products
)
SELECT * FROM ranked_products WHERE rank_in_cat <= 3;
技巧:
- 使用
DENSE_RANK()避免名次断层(如 1,2,2,3) - CTE(公用表表达式)简化嵌套查询
场景 6:首尾记录对比
需求:比较用户首次和最后一次登录地点是否相同
SELECT
user_id,
FIRST_VALUE(login_city) OVER (PARTITION BY user_id ORDER BY login_time) AS first_city,
LAST_VALUE(login_city) OVER (PARTITION BY user_id ORDER BY login_time
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_city
FROM user_logins;
注意:LAST_VALUE() 需指定完整窗口范围,否则默认到当前行
场景 7:连续活跃检测
需求:标记连续 5 天登录的用户
SELECT
user_id,
login_date,
login_date - INTERVAL (DENSE_RANK() OVER (PARTITION BY user_id ORDER BY login_date)) DAY AS grp,
COUNT(*) OVER (PARTITION BY user_id, login_date - INTERVAL (DENSE_RANK() OVER (PARTITION BY user_id ORDER BY login_date)) DAY) AS consecutive_days
FROM logins;
原理:
- 利用
DENSE_RANK()生成序列号 - 日期减序列号得到连续登录的组标识(grp)
- 组内计数 >5 即达标
场景 8:差值填充
需求:用前一个有效值填充缺失的库存数据
SELECT
date,
inventory,
COALESCE(inventory,
LAST_VALUE(inventory) IGNORE NULLS OVER (ORDER BY date)) AS filled_inventory
FROM daily_inventory;
函数:
-
LAST_VALUE(...) IGNORE NULLS跳过空值取最近有效值 -
COALESCE()处理首行为空的情况
场景 9:分位数分析
需求:计算员工薪资的部门四分位数
SELECT
employee_id,
department,
salary,
NTILE(4) OVER (PARTITION BY department ORDER BY salary) AS quartile
FROM employees;
输出:quartile=1 代表最低 25% 薪资,4 代表最高 25%
场景 10:会话分割
需求:将用户事件按 30 分钟超时分割会话
SELECT
user_id,
event_time,
SUM(is_new_session) OVER (ORDER BY user_id, event_time) AS session_id
FROM (
SELECT *,
CASE WHEN TIMESTAMPDIFF(MINUTE,
LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time),
event_time) > 30
THEN 1 ELSE 0 END AS is_new_session
FROM user_events
) t;
逻辑:
- 计算相邻事件时间差 >30 分钟则标记新会话(is_new_session=1)
- 累加标记值生成会话 ID
场景 11:层级累加
需求:计算部门树形结构的预算累计值(从根节点向下)
SELECT
dept_id,
parent_id,
budget,
SUM(budget) OVER (
PARTITION BY root_id
ORDER BY dept_level
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_budget
FROM (
SELECT *,
CONNECT_BY_ROOT dept_id AS root_id,
LEVEL AS dept_level
FROM departments
START WITH parent_id IS NULL
CONNECT BY parent_id = PRIOR dept_id
) t;
组合技:
- 递归 CTE(或
CONNECT BY)生成树形层级 - 窗口函数按层级顺序累加
场景 12:异常波动检测
需求:标记日销售额超过历史均值 ±2 标准差的日期
SELECT
sale_date,
daily_sales,
CASE WHEN ABS(daily_sales - avg_sales) > 2 * std_dev THEN 1 ELSE 0 END AS is_anomaly
FROM (
SELECT *,
AVG(daily_sales) OVER () AS avg_sales,
STDDEV(daily_sales) OVER () AS std_dev
FROM daily_sales
) t;
统计方法:
- 窗口函数计算全局均值(avg_sales)和标准差(std_dev)
- 用
ABS()判断偏离程度
窗口函数核心语法总结
| 组件 | 说明 |
|---|---|
PARTITION BY |
定义分组列(类似 GROUP BY,但不聚合) |
ORDER BY |
指定排序顺序(决定计算顺序) |
| 窗口帧 | |
ROWS N PRECEDING |
包含前 N 行 |
RANGE INTERVAL |
按值范围定义窗口(如时间区间) |
| 常用函数 | |
ROW_NUMBER() |
行号(无并列) |
LEAD()/LAG() |
访问偏移行数据 |
PERCENT_RANK() |
百分比排名 |
最佳实践:
- 在子查询中预先过滤数据,减少窗口计算量
- 对大型数据集,用
INDEX优化PARTITION BY和ORDER BY涉及的列- 避免在窗口帧中使用
UNBOUNDED FOLLOWING(可能导致全表扫描)