在数据仓库中,有事实表、维度表两个概念。
事实表是数据仓库结构中的中央表,它包含联系事实与维度表的数字度量值和键。事实数据表包含描述业务(例如产品销售)内特定事件的数据。
维度表是维度属性的集合。是分析问题的一个窗口。是人们观察数据的特定角度,是考虑问题时的一类属性,属性的集合构成一个维。
如图示
我们以sh用户下的sales表和times表来看,
SALES为事实表
SQL> desc sales
Name Null? Type
----------------------------------------- -------- ----------------------------
PROD_ID NOT NULL NUMBER
CUST_ID NOT NULL NUMBER
TIME_ID NOT NULL DATE
CHANNEL_ID NOT NULL NUMBER
PROMO_ID NOT NULL NUMBER
QUANTITY_SOLD NOT NULL NUMBER(10,2)
AMOUNT_SOLD NOT NULL NUMBER(10,2)
TIMES为维度表
SQL> desc times
Name Null? Type
----------------------------------------- -------- ----------------------------
TIME_ID NOT NULL DATE
DAY_NAME NOT NULL VARCHAR2(9)
DAY_NUMBER_IN_WEEK NOT NULL NUMBER(1)
DAY_NUMBER_IN_MONTH NOT NULL NUMBER(2)
CALENDAR_WEEK_NUMBER NOT NULL NUMBER(2)
FISCAL_WEEK_NUMBER NOT NULL NUMBER(2)
WEEK_ENDING_DAY NOT NULL DATE
WEEK_ENDING_DAY_ID NOT NULL NUMBER
CALENDAR_MONTH_NUMBER NOT NULL NUMBER(2)
FISCAL_MONTH_NUMBER NOT NULL NUMBER(2)
CALENDAR_MONTH_DESC NOT NULL VARCHAR2(8)
CALENDAR_MONTH_ID NOT NULL NUMBER
FISCAL_MONTH_DESC NOT NULL VARCHAR2(8)
FISCAL_MONTH_ID NOT NULL NUMBER
DAYS_IN_CAL_MONTH NOT NULL NUMBER
DAYS_IN_FIS_MONTH NOT NULL NUMBER
END_OF_CAL_MONTH NOT NULL DATE
END_OF_FIS_MONTH NOT NULL DATE
CALENDAR_MONTH_NAME NOT NULL VARCHAR2(9)
FISCAL_MONTH_NAME NOT NULL VARCHAR2(9)
CALENDAR_QUARTER_DESC NOT NULL CHAR(7)
CALENDAR_QUARTER_ID NOT NULL NUMBER
FISCAL_QUARTER_DESC NOT NULL CHAR(7)
FISCAL_QUARTER_ID NOT NULL NUMBER
DAYS_IN_CAL_QUARTER NOT NULL NUMBER
DAYS_IN_FIS_QUARTER NOT NULL NUMBER
END_OF_CAL_QUARTER NOT NULL DATE
END_OF_FIS_QUARTER NOT NULL DATE
CALENDAR_QUARTER_NUMBER NOT NULL NUMBER(1)
FISCAL_QUARTER_NUMBER NOT NULL NUMBER(1)
CALENDAR_YEAR NOT NULL NUMBER(4)
CALENDAR_YEAR_ID NOT NULL NUMBER
FISCAL_YEAR NOT NULL NUMBER(4)
FISCAL_YEAR_ID NOT NULL NUMBER
DAYS_IN_CAL_YEAR NOT NULL NUMBER
DAYS_IN_FIS_YEAR NOT NULL NUMBER
END_OF_CAL_YEAR NOT NULL DATE
END_OF_FIS_YEAR NOT NULL DATE
如果我们创建一个物化视图
create materialized view sales_month_sum
enable query rewrite as
SELECT t.calendar_month_id,
prod_id,
channel_id,
promo_id,
SUM (quantity_sold) quantity_sold,
SUM (amount_sold) amount_sold
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY prod_id,
channel_id,
promo_id,
t.calendar_month_id;
如果我们做如下按月的分组查询
SQL> alter session set query_rewrite_enabled=true;
SQL> alter session set query_rewrite_integrity=trusted;
SQL> set autotrace traceonly
SQL> set line 200
SQL> SELECT t.calendar_month_id,
prod_id,
channel_id,
promo_id,
SUM (quantity_sold) quantity_sold,
SUM (amount_sold) amount_sold
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY prod_id,
channel_id,
promo_id,
t.calendar_month_id;
9068 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3287305789
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9068 | 690K| 13 (0)| 00:00:01 |
| 1 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
可见查询使用的是物化视图,但是如果我需要按年、季度对数据做分组查询呢?
SELECT t.calendar_quarter_id,prod_id,
channel_id,
promo_id,
SUM (quantity_sold) quantity_sold,
SUM (amount_sold) amount_sold
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY prod_id,
channel_id,
promo_id,
t.calendar_quarter_id;
这个查看肯定是不能使用物化视图的,执行计划如下
Execution Plan
----------------------------------------------------------
Plan hash value: 3221963832
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2037 | 79443 | 569 (6)| 00:00:07 | | |
| 1 | HASH GROUP BY | | 2037 | 79443 | 569 (6)| 00:00:07 | | |
|* 2 | HASH JOIN | | 918K| 34M| 546 (2)| 00:00:07 | | |
| 3 | PART JOIN FILTER CREATE | :BF0000 | 1826 | 21912 | 18 (0)| 00:00:01 | | |
| 4 | TABLE ACCESS FULL | TIMES | 1826 | 21912 | 18 (0)| 00:00:01 | | |
| 5 | PARTITION RANGE JOIN-FILTER| | 918K| 23M| 525 (2)| 00:00:07 |:BF0000|:BF0000|
| 6 | TABLE ACCESS FULL | SALES | 918K| 23M| 525 (2)| 00:00:07 |:BF0000|:BF0000|
---------------------------------------------------------------------------------------------------------
Oracle为了是查询重写更加的智能,引入了Dimension的概念。Dimension我们称之为维,它是基于维度表的,用来描述维度表的维度之间的层级关系。
CREATE DIMENSION SH.TIMES_DIM
LEVEL DAY IS
(SH.TIMES.TIME_ID)
LEVEL MONTH IS
(SH.TIMES.CALENDAR_MONTH_ID)
LEVEL QUARTER IS
(SH.TIMES.CALENDAR_QUARTER_ID)
LEVEL YEAR IS
(SH.TIMES.CALENDAR_YEAR_ID)
HIERARCHY CAL_ROLLUP
(DAY CHILD OF
MONTH CHILD OF
QUARTER CHILD OF
YEAR);
LEVEL定义等级,基于维度表,HIERARCHY关键字定义层级关系。由层级关系,我们知道quarter是由month组成的。
我们再次查询
SQL> SELECT t.calendar_quarter_id,prod_id,
channel_id,
promo_id,
SUM (quantity_sold) quantity_sold,
SUM (amount_sold) amount_sold
FROM sales s, times t
WHERE s.time_id = t.time_id
GROUP BY prod_id,
channel_id,
promo_id,
t.calendar_quarter_id;
3375 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3397140165
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 1720 | 36 (14)| 00:00:01 |
| 1 | HASH GROUP BY | | 20 | 1720 | 36 (14)| 00:00:01 |
|* 2 | HASH JOIN | | 128K| 10M| 33 (7)| 00:00:01 |
| 3 | VIEW | | 849 | 6792 | 19 (6)| 00:00:01 |
| 4 | HASH UNIQUE | | 849 | 6792 | 19 (6)| 00:00:01 |
| 5 | TABLE ACCESS FULL | TIMES | 1826 | 14608 | 18 (0)| 00:00:01 |
| 6 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
这次是使用物化视图与times表做关联,性能更高了。
我们对比如下两个查询
SQL> SELECT t.calendar_quarter_id,
prod_id,
channel_id,
promo_id,
SUM (quantity_sold) quantity_sold,
SUM (amount_sold) amount_sold
FROM sales s, times t
WHERE s.time_id = t.time_id AND t.calendar_quarter_id = 1769
GROUP BY prod_id,
channel_id,
promo_id,
t.calendar_quarter_id;
168 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3397140165
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 86 | 33 (7)| 00:00:01 |
| 1 | HASH GROUP BY | | 1 | 86 | 33 (7)| 00:00:01 |
|* 2 | HASH JOIN | | 6423 | 539K| 32 (4)| 00:00:01 |
| 3 | VIEW | | 34 | 272 | 19 (6)| 00:00:01 |
| 4 | HASH UNIQUE | | 34 | 272 | 19 (6)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | TIMES | 90 | 720 | 18 (0)| 00:00:01 |
| 6 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
使用了物化视图
SQL>SELECT t.calendar_quarter_id,
prod_id,
channel_id,
promo_id,
SUM (quantity_sold) quantity_sold,
SUM (amount_sold) amount_sold
FROM sales s, times t
WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01'
GROUP BY prod_id,
channel_id,
promo_id,
t.calendar_quarter_id;
168 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3221963832
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8146 | 373K| | 848 (2)| 00:00:11 | | |
| 1 | HASH GROUP BY | | 8146 | 373K| 3632K| 848 (2)| 00:00:11 | | |
|* 2 | HASH JOIN | | 57459 | 2637K| | 546 (2)| 00:00:07 | | |
| 3 | PART JOIN FILTER CREATE | :BF0000 | 91 | 1820 | | 18 (0)| 00:00:01 | | |
|* 4 | TABLE ACCESS FULL | TIMES | 91 | 1820 | | 18 (0)| 00:00:01 | | |
| 5 | PARTITION RANGE JOIN-FILTER| | 918K| 23M| | 525 (2)| 00:00:07 |:BF0000|:BF0000|
| 6 | TABLE ACCESS FULL | SALES | 918K| 23M| | 525 (2)| 00:00:07 |:BF0000|:BF0000|
-----------------------------------------------------------------------------------------------------------------
没有使用物化视图。
其实条件实质上是一样的,因为t.calendar_quarter_desc = '1998-01' 和t.calendar_quarter_id = 1769 在times表中表示相同的数据。
但是Oracle不知道CALENDAR_QUARTER_DESC与CALENDAR_QUARTER_ID的关系。
我们在创建Dimension时,可以为LEVEL指定属性值。
如下
CREATE DIMENSION SH.TIMES_DIM
LEVEL DAY IS
(SH.TIMES.TIME_ID)
LEVEL MONTH IS
(SH.TIMES.CALENDAR_MONTH_ID)
LEVEL QUARTER IS
(SH.TIMES.CALENDAR_QUARTER_ID)
LEVEL YEAR IS
(SH.TIMES.CALENDAR_YEAR_ID)
HIERARCHY CAL_ROLLUP
(DAY CHILD OF
MONTH CHILD OF
QUARTER CHILD OF
YEAR)
ATTRIBUTE QUARTER DETERMINES
(SH.TIMES.CALENDAR_QUARTER_DESC,
SH.TIMES.DAYS_IN_CAL_QUARTER,
SH.TIMES.END_OF_CAL_QUARTER,
SH.TIMES.CALENDAR_QUARTER_NUMBER)
ATTRIBUTE YEAR DETERMINES
(SH.TIMES.CALENDAR_YEAR,
SH.TIMES.DAYS_IN_CAL_YEAR,
SH.TIMES.END_OF_CAL_YEAR);
我们再次查询
SQL> SELECT t.calendar_quarter_id,
prod_id,
channel_id,
promo_id,
SUM (quantity_sold) quantity_sold,
SUM (amount_sold) amount_sold
FROM sales s, times t
WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = '1998-01'
GROUP BY prod_id,
channel_id,
promo_id,
t.calendar_quarter_id;
168 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3290467316
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 2240 | 33 (7)| 00:00:01 |
| 1 | HASH GROUP BY | | 20 | 2240 | 33 (7)| 00:00:01 |
|* 2 | HASH JOIN | | 17191 | 1880K| 32 (4)| 00:00:01 |
| 3 | VIEW | VW_GBF_5 | 91 | 3094 | 19 (6)| 00:00:01 |
| 4 | HASH GROUP BY | | 91 | 728 | 19 (6)| 00:00:01 |
| 5 | VIEW | | 91 | 728 | 19 (6)| 00:00:01 |
| 6 | HASH UNIQUE | | 91 | 1456 | 19 (6)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | TIMES | 91 | 1456 | 18 (0)| 00:00:01 |
| 8 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
这次就使用了物化视图。