早上九点半左右 业务人员反映他们页面打开缓慢,后续页面出现502。
然后我这边收到报警 ,登录数据库服务器(4核cpu)查看 cpu 400% load 30左右
进入到数据库中查看发现好多慢查询
本以为这些慢查询是来自该系统每天的定时任务(该系统相当于一个olap系统,每天会进行批量的数据查询提取。)于是先crontab -e 把所有的定时任务都停掉。但是慢查询还是存在。
所有的慢查询都是同一个模板, 后来询问开发的同事昨天上线了新版本
SELECT (plan.due_amount + IF(plan.overdue_day > 0, (plan.due_amount * extend.supplement_penalty_rate /100), 0) - plan.reduce_amount) due_amount_total, plan.repay_status
FROM mostop_xiaodai_supplement_loan_repay_plan plan LEFT JOIN mostop_xiaodai_loan_info_extend extend ON extend.loan_id = plan.loan_id WHERE plan.base_plan_id = 11124546 AND plan.step_no = 2
查看执行表结构
mysql> show create table mostop_xiaodai_loan_info_extend\G
*************************** 1. row ***************************
Table: mostop_xiaodai_loan_info_extend
Create Table: CREATE TABLE `mostop_xiaodai_loan_info_extend` (
`id` bigint(20) unsigned NOT NULL COMMENT '编号',
`agentid` int(10) unsigned NOT NULL COMMENT '渠道 ID',
`loan_id` bigint(20) unsigned NOT NULL COMMENT '贷款编号',
`create_time` datetime NOT NULL COMMENT '创建时间',
`update_time` datetime NOT NULL COMMENT '更新时间',
`total_rate` decimal(10,6) unsigned DEFAULT NULL COMMENT '总利率',
`service_rate` decimal(10,6) unsigned DEFAULT NULL COMMENT '服务费率',
`intrest_rate` decimal(10,6) unsigned DEFAULT NULL COMMENT '利息费率',
`overdue_rate` decimal(10,6) unsigned DEFAULT NULL COMMENT '逾期总利率',
`overdue_service_rate` decimal(10,6) unsigned DEFAULT NULL COMMENT '逾期服务费率',
`penalty_rate` decimal(10,6) unsigned DEFAULT NULL COMMENT '违约金率',
`is_split` tinyint(4) DEFAULT '0' COMMENT '息费打平,是否需要拆单',
`desired_repay_type` varchar(9) DEFAULT NULL COMMENT '息费打平,理想还款方式',
`desired_total_rate` decimal(10,6) DEFAULT NULL COMMENT '息费打平,理想总利率',
`supplement_overdue_rate` decimal(10,6) DEFAULT NULL COMMENT '息费打平,白条订单逾期总利率',
`supplement_penalty_rate` decimal(10,6) DEFAULT NULL COMMENT '息费打平,白条订单违约金率',
`investor_rate` decimal(10,6) DEFAULT NULL COMMENT '投资人利率',
`investor_repay_type` varchar(9) DEFAULT NULL COMMENT '投资人利率',
PRIMARY KEY (`id`,`agentid`),
UNIQUE KEY `agentid` (`agentid`,`loan_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='贷款信息扩展表'
查看执行计划
mysql> explain SELECT (plan.due_amount + IF(plan.overdue_day > 0, (plan.due_amount * extend.supplement_penalty_rate /100), 0)
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+-------+---------+----------+-------------+
| 1 | SIMPLE | plan | NULL | const | idx_base_planid | idx_base_planid | 8 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | extend | NULL | ALL | NULL | NULL | NULL | NULL | 4690305 | 100.00 | Using where |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+-------+---------+----------+-------------+
虽然表中的联合索引上有loan_id这个列
UNIQUE KEY `agentid` (`agentid`,`loan_id`)
但是根据索引的最左前缀原则,where条件中直接出了loan_id,复合索引出现了断开,所以索引失效。研发同学以为是可以用到表中的索引,没有审核就上线了,所以导致了全表扫描导致服务器的负载超高。
解决办法
添加索引
alter table mostop_xiaodai_loan_info_extend add index IDX_loan_id (loan_id);
添加索引后执行计划
explain SELECT (plan.due_amount + IF(plan.overdue_day > 0, (plan.due_amount * extend.supplement_penalty_rate /100), 0) - plan.reduce_amount) due_amount_total, plan.repay_status
-> FROM mostop_xiaodai_supplement_loan_repay_plan plan LEFT JOIN mostop_xiaodai_loan_info_extend extend ON extend.loan_id = plan.loan_id WHERE plan.base_plan_id = 11124546 AND plan.step_no = 2 ;
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | plan | NULL | const | idx_base_planid | idx_base_planid | 8 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | extend | NULL | ref | IDX_loan_id | IDX_loan_id | 8 | const | 1 | 100.00 | Using where |
+----+-------------+--------+------------+-------+-----------------+-----------------+---------+-------+------+----------+-------------+
服务器负载立马回复正常
通过本次事故
上线前进行SQL审核
应用和数据库单独部署在不同服务器上