这篇文章主要讲解了“数据库中cluster factor对执行计划的影响是什么”,文中的讲解内容简单清晰,易于学习与理解,下面请大家跟着小编的思路慢慢深入,一起来研究和学习“数据库中cluster factor对执行计划的影响是什么”吧!
cluster factor对执行计划的影响
测试环境:Linux 7.6 + ORACLE 19.6.1
1.创建测试环境
1.1 创建测试表并插入数据
CZH@czhpdb > create table test_ffs as select * from hr.employees;
Table created.
CZH@czhpdb > insert into test_ffs select * from test_ffs;
Execution Plan
----------------------------------------------------------
Plan hash value: 296244252
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 107 | 7383 | 3 (0)| 00:00:01 |
| 1 | LOAD TABLE CONVENTIONAL | TEST_FFS | | | | |
| 2 | OPTIMIZER STATISTICS GATHERING | | 107 | 7383 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TEST_FFS | 107 | 7383 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: statistics for conventional DML
Statistics
----------------------------------------------------------
72 recursive calls
89 db block gets
81 consistent gets
12 physical reads
21576 redo size
195 bytes sent via SQL*Net to client
394 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
107 rows processed
上面autotrace执行计划可以看到两个新特性:
1.2 12c R1与19c两个新特性
1.2.1 12c R1新特性OPTIMIZER STATISTICS GATHERING:
# OPTIMIZER STATISTICS GATHERING:12cR1以后的新特性,direct path load时,空表第一次加载数据时会自动收集统计信息。
# Oracle Database 12c introduced online statistics gathering for CREATE TABLE AS SELECT statements and direct-path inserts.
1.2.2 19c新特性real-time statistics
Oracle Database 19c introduces real-time statistics
, which extend online support to conventional DML statements
. Because statistics can go stale between DBMS_STATS jobs, real-time statistics helps the optimizer generate more optimal plans.Whereas bulk load operations gather all necessary statistics, real-time statistics augment rather than replace traditional statistics.
· Oracle introduced new parameters
· "_optimizer_gather_stats_on_conventional_dml" and "_optimizer_use_stats_on_conventional_dml" which are true by default
· "_optimizer_stats_on_conventional_dml_sample_rate" at 100%
· How does real time statistics works?
· By default the "_optimizer_gather_stats_on_conventional_dml" is true so its automatically kicks off
· When a DML operation is currently modifying a table (conventional), Oracle Database dynamically computes values for the most essential statistics if the above parameter is on.
· Consider a example of table that is having lot of inserts and rows are increasing. Real-time statistics keep track of the increasing row count as rows are being inserted. If the optimizer performs a hard parse of a new query, then the optimizer can use the real-time statistics to obtain a more accurate cost estimate.
· DBA_TAB_COL_STATISTICS and DBA_TAB_STATISITICS has columns NOTES tell real time statistics have been used. STATS_ON_CONVENTIONAL_DML
SELECT NVL(PARTITION_NAME, 'GLOBAL') PARTITION_NAME, NUM_ROWS, BLOCKS, NOTES
FROM USER_TAB_STATISTICS
WHERE TABLE_NAME = 'SALES'
ORDER BY 1, 4;
PARTITION_NAM NUM_ROWS BLOCKS NOTES
------------- ---------- ---------- -------------------------
GLOBAL 1837686 3315 STATS_ON_CONVENTIONAL_DML
1.3 插入大量数据并收集统计信息
CZH@czhpdb > set autot off
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > commit;
CZH@czhpdb > CREATE INDEX IDX_TEST_FFS ON TEST_FFS(EMPLOYEE_ID);
CZH@czhpdb > EXEC DBMS_STATS.GATHER_TABLE_STATS(user,’TEST_FFS’,cascade=>true);
1.4 使用Hint 获取sql真实执行计划
# sqlplus中set autotrace与explain plan for都是CBO预估出来的执行计划,可能与真实执行的并不相同,我们使用下面hint获取真实执行计划。
CZH@czhpdb > SELECT salary from test_ffs where employee_id < 100;
no rows selected
真实执行计划:
SYS@orcl2 > select * from table(dbms_xplan.display_cursor('c9qg9su5khysd',null,'allstats last'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID c9qg9su5khysd, child number 0
-------------------------------------
SELECT salary from test_ffs where
employee_id < 100
Plan hash value: 296244252
----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 2288 |
|* 1 | TABLE ACCESS FULL| TEST_FFS | 1 | 1 | 0 |00:00:00.01 | 2288 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("EMPLOYEE_ID"<100)
# 可以看到由于表中数据没有employee_id < 100,我们认为明显走索引的sql并未选择索引,那肯定是由于某些原因,cbo认为走索引并不是最优执行路径,我们就利用10053获取为什么cbo认为全表扫描cost更低。
SYS@orcl2 > alter system flush shared_pool;
System altered.
#如果不清空shared_pool或者使游标失效,软解析开启10053事件,不会生成trace文件。
CZH@czhpdb > ALTER SESSION SET EVENTS '10053 trace name context forever,level 1';
Session altered.
CZH@czhpdb > SELECT salary from test_ffs where employee_id < 100;
no rows selected
CZH@czhpdb > ALTER SESSION SET EVENTS '10053 trace name context off';
Session altered.
19c 10053:
# 可以从下面10053看到DK(distinct key),CLUF(clustering factor),IX_SEL,下一步将根据几个参数计算为何CBO认为走索引cost会高于全表扫描。
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: TEST_FFS Alias: TEST_FFSonline table stats for conventional DML (block count: 2263 row count: 219029) used on (TEST_FFS) block count: 5 -> 2263, row count: 107 -> 219136
#Rows: 219136 SSZ: 0 LGR: 0 #Blks: 2263 AvgRowLen: 69.00 NEB: 0 ChainCnt: 0.00 ScanRate: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 193
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
Index Stats::
Index: IDX_TEST_FFS Col#: 1
LVLS: 1 #LB: 458 #DK: 107 LB/K: 4.00 DB/K: 1524.00 CLUF: 163174.00 NRW: 219136.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1
KKEISFLG: 1
try to generate single-table filter predicates from ORs for query block SEL$1 (#0)
finally: "TEST_FFS"."EMPLOYEE_ID"<100
=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$1 (#0)
Return code in qosdSetupDirCtx4QB: NOCTX
=====================================
SPD: END context at query block level
=====================================
Access path analysis for TEST_FFS
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for TEST_FFS[TEST_FFS]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
kkecdn: Single Table Predicate:"TEST_FFS"."EMPLOYEE_ID"<100
online column stats for conventional DML used on (TEST_FFS.EMPLOYEE_ID) min: 100.00 -> 100.00, max: 206.00 -> 206.00, nnl: 0 -> 0, acl: 4 -> 0
Column (#1): EMPLOYEE_ID(NUMBER)
AvgLen: 22 NDV: 107 Nulls: 0 Density: 0.009346 Min: 100.000000 Max: 206.000000
Using density: 0.009346 of col #1 as selectivity of unpopular value pred
Table: TEST_FFS Alias: TEST_FFS
Card: Original: 219136.000000 Rounded: 2048 Computed: 2048.000000 Non Adjusted: 2048.000000
Scan IO Cost (Disk) = 615.000000
Scan CPU Cost (Disk) = 49272938.720000
Cost of predicates:
io = NOCOST, cpu = 50.000000, sel = 0.009346 flag = 2048 ("TEST_FFS"."EMPLOYEE_ID"<100)
Total Scan IO Cost = 615.000000 (scan (Disk))
+ 0.000000 (io filter eval) (= 0.000000 (per row) * 219136.000000 (#rows))
= 615.000000
Total Scan CPU Cost = 49272938.720000 (scan (Disk))
+ 10956800.000000 (cpu filter eval) (= 50.000000 (per row) * 219136.000000 (#rows))
= 60229738.720000
Access Path: TableScan
Cost: 621.167026 Resp: 621.167026 Degree: 0
Cost_io: 615.000000 Cost_cpu: 60229739
Resp_io: 615.000000 Resp_cpu: 60229739
****** Costing Index IDX_TEST_FFS
SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
Using density: 0.009346 of col #1 as selectivity of unpopular value pred
Access Path: index (RangeScan)
Index: IDX_TEST_FFS
resc_io: 1531.000000 resc_cpu: 11906445
ix_sel: 0.009346 ix_sel_with_filters: 0.009346
Cost: 1532.219121 Resp: 1532.219121 Degree: 1
Best:: AccessPath: TableScan
Cost: 621.167026 Degree: 1 Resp: 621.167026 Card: 2048.000000 Bytes: 0.000000
online column stats for conventional DML used on (TEST_FFS.SALARY) min: 2100.00 -> 2100.00, max: 24000.00 -> 24000.00, nnl: 0 -> 0, acl: 4 -> 0
***************************************
2.调整cluster factor
2.1 cluster factor聚簇因子说明
cluster factor表示索引顺序与表存储数据一致性,顺序扫描索引时,如果索引键值扫描到键值对应的表数据行对应的数据块发生变化时,则cluster factor加1,所以cluster factor最低为表数据块,最大为表数据行,与表存储顺序高度相关,如果表是按照顺序插入,则cluster factor较低,如果表数据为无序插入,则cluster factor较高,这就是为什么同样表数据情况下,执行计划会有时候有差别的原因。
索引扫描成本公式:
INDEX ACCESS I/O COST=BLEVEL+CEIL(#LEAF_BLOCKS*IX_SEL)
TABLE_ACCESS I/O COST=CEIL(CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS)
IX_SEL与IX_SEL_WITH_FILTERS为索引选择率与索引带谓词选择率,一般为1/(DISTINCT KEY)值,本例中走全表扫描时,IX_SEL=1/107=0.009345,则计算走索引成本为:
ACCESS INDEX COST=INDEX ACCESS I/O COST + TABLE ACCESS I/O COST=2+CEIL(458*0.009345)+CEIL(163174*0.009345)=1540
近似等于CBO预计出来的1532,是高于全表扫的COST 615的,所以选择走了全表扫描。
2.2 调整cluster factor
重建表,order by排序,降低cluster factor
CZH@czhpdb > create table test_ffs_03 as select * from test_ffs_02 order by employee_id;
Table created.
CZH@czhpdb > create index idx_test_ffs_03 on test_ffs_03(employee_id);
Index created.
CZH@czhpdb > select clustering_factor,index_name from user_indexes where index_name='IDX_TEST_FFS_03';
CLUSTERING_FACTOR INDEX_NAME
---------------------------------------- --------------------
1128 IDX_TEST_FFS_03
# 可以看到cluster factor明显降低。
CZH@czhpdb > select salary from test_ffs_03 where employee_id < 100;
no rows selected
SYS@orcl2 > select * from table(dbms_xplan.display_cursor('8fpk2b8vzn5y2',null,'allstats last'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 8fpk2b8vzn5y2, child number 0
-------------------------------------
select salary from test_ffs_03 where
employee_id < 100
Plan hash value: 704625359
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
--------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 2 | 1 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TEST_FFS_03 | 1 | 1024 | 0 |00:00:00.01 | 2 | 1 |
|* 2 | INDEX RANGE SCAN | IDX_TEST_FFS_03 | 1 | 1024 | 0 |00:00:00.01 | 2 | 1 |
--------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("EMPLOYEE_ID"<100)
感谢各位的阅读,以上就是“数据库中cluster factor对执行计划的影响是什么”的内容了,经过本文的学习后,相信大家对数据库中cluster factor对执行计划的影响是什么这一问题有了更深刻的体会,具体使用情况还需要大家实践验证。这里是亿速云,小编将为大家推送更多相关知识点的文章,欢迎关注!