小编给大家分享一下HIVE如何统计各个区域下最热门的TOP3的商品,希望大家阅读完这篇文章之后都有所收获,下面让我们一起去探讨吧!
环境:
hadoop-2.6.0-cdh6.7.0
hive-1.1.0-cdh6.7.0
sqoop-1.4.6-cdh6.7.0
MySQL5.6.39
需求:HIVE统计各个区域下最热门的TOP3的商品,将统计结果导出到MySQL中
MySQL中有:city_info 城市信息表和product_info 商品信息表
HIVE中有: user_click 用户行为日志,按date分区
一、MySQL数据库建库建表及初始化数据
1.1 MySQL创建 ruozedata数据库:
CREATE DATABASE ruozedata;
1.2 创建city_info表
DROP TABLE if exists city_info;
CREATE TABLE `city_info` (
`city_id` int(11) DEFAULT NULL,
`city_name` varchar(255) DEFAULT NULL,
`area` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
初始化city_info表数据
insert into `city_info`(`city_id`,`city_name`,`area`) values (1,'BEIJING','NC'),(2,'SHANGHAI','EC'),(3,'NANJING','EC'),(4,'GUANGZHOU','SC'),(5,'SANYA','SC'),(6,'WUHAN','CC'),(7,'CHANGSHA','CC'),(8,'XIAN','NW'),(9,'CHENGDU','SW'),(10,'HAERBIN','NE');
1.3 创建product_info表
DROP TABLE if exists product_info;
CREATE TABLE `product_info` (
`product_id` int(11) DEFAULT NULL,
`product_name` varchar(255) DEFAULT NULL,
`extend_info` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
初始化product_info表数据
insert into product_info(product_id,product_name,extend_info) values (1,'product1','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (2,'product2','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (3,'product3','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (4,'product4','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (5,'product5','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (6,'product6','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (7,'product7','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (8,'product8','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (9,'product9','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (10,'product10','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (11,'product11','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (12,'product12','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (13,'product13','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (14,'product14','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (15,'product15','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (16,'product16','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (17,'product17','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (18,'product18','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (19,'product19','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (20,'product20','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (21,'product21','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (22,'product22','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (23,'product23','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (24,'product24','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (25,'product25','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (26,'product26','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (27,'product27','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (28,'product28','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (29,'product29','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (30,'product30','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (31,'product31','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (32,'product32','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (33,'product33','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (34,'product34','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (35,'product35','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (36,'product36','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (37,'product37','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (38,'product38','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (39,'product39','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (40,'product40','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (41,'product41','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (42,'product42','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (43,'product43','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (44,'product44','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (45,'product45','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (46,'product46','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (47,'product47','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (48,'product48','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (49,'product49','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (50,'product50','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (51,'product51','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (52,'product52','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (53,'product53','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (54,'product54','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (55,'product55','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (56,'product56','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (57,'product57','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (58,'product58','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (59,'product59','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (60,'product60','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (61,'product61','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (62,'product62','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (63,'product63','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (64,'product64','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (65,'product65','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (66,'product66','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (67,'product67','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (68,'product68','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (69,'product69','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (70,'product70','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (71,'product71','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (72,'product72','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (73,'product73','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (74,'product74','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (75,'product75','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (76,'product76','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (77,'product77','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (78,'product78','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (79,'product79','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (80,'product80','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (81,'product81','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (82,'product82','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (83,'product83','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (84,'product84','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (85,'product85','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (86,'product86','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (87,'product87','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (88,'product88','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (89,'product89','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (90,'product90','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (91,'product91','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (92,'product92','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (93,'product93','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (94,'product94','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (95,'product95','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (96,'product96','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (97,'product97','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (98,'product98','{"product_status":1}');
insert into product_info(product_id,product_name,extend_info) values (99,'product99','{"product_status":0}');
insert into product_info(product_id,product_name,extend_info) values (100,'product100','{"product_status":1}');
1.4 创建MySQL数据库产品统计表product_stat
drop table if exists `product_stat`;
CREATE TABLE `product_stat`(
`product_id` int(11) DEFAULT NULL
,`product_name` varchar(255) DEFAULT NULL
,`area` varchar(255) DEFAULT NULL
,`click_count` int(11)
,`rank` int
,`days` varchar(20)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
;
二、使用sqoop将MySQL的city_info表和product_info表的数据导入到hive(包括创建表)
2.1 操作之前先看看hive里面的表
hive> show databases;
OK
default
hive2
hive2_ruozedata
ruozedata
sqoophive
Time taken: 0.06 seconds, Fetched: 5 row(s)
hive> use ruozedata;
OK
Time taken: 0.069 seconds
hive> show tables;
OK
emp_hive
gw_test
hive_wc
rating_json
ruozedata_emp
ruozedata_emp_managed
ruozedata_emp_partition
ruozedata_person
Time taken: 0.061 seconds, Fetched: 8 row(s)
hive>
2.2 导入city_info表(从MySQL导入数据到hive的ruozedata.city_info表):
sqoop import \
--connect jdbc:mysql://localhost:33066/ruozedata \
--username root \
--password root \
--table city_info -m 1 \
--mapreduce-job-name FromMySQLToHive \
--delete-target-dir \
--create-hive-table \
--hive-table ruozedata.city_info \
--hive-import --hive-overwrite;
2.3 导入product_info表(从MySQL导入数据到hive的ruozedata.product_info表):
sqoop import \
--connect jdbc:mysql://localhost:33066/ruozedata \
--username root \
--password root \
--table product_info -m 1 \
--mapreduce-job-name FromMySQLToHive \
--delete-target-dir \
--create-hive-table \
--hive-table ruozedata.product_info \
--hive-import --hive-overwrite;
2.4 操作完之后再看看hive里面的表:
hive> show tables;
OK
city_info
emp_hive
gw_test
hive_wc
product_info
rating_json
ruozedata_emp
ruozedata_emp_managed
ruozedata_emp_partition
ruozedata_person
Time taken: 0.061 seconds, Fetched: 10 row(s)
hive> select * from city_info;
OK
1BEIJINGNC
2SHANGHAIEC
3NANJINGEC
4GUANGZHOUSC
5SANYASC
6WUHANCC
7CHANGSHACC
8XIANNW
9CHENGDUSW
10HAERBINNE
Time taken: 0.554 seconds, Fetched: 10 row(s)
hive> select * from product_info;
OK
1product1{"product_status":1}
2product2{"product_status":1}
3product3{"product_status":1}
4product4{"product_status":1}
5product5{"product_status":1}
6product6{"product_status":1}
7product7{"product_status":1}
8product8{"product_status":1}
9product9{"product_status":0}
10product10{"product_status":1}
11product11{"product_status":0}
12product12{"product_status":0}
13product13{"product_status":0}
14product14{"product_status":0}
15product15{"product_status":1}
16product16{"product_status":0}
17product17{"product_status":1}
18product18{"product_status":0}
19product19{"product_status":1}
20product20{"product_status":1}
21product21{"product_status":0}
22product22{"product_status":0}
23product23{"product_status":0}
24product24{"product_status":0}
25product25{"product_status":1}
26product26{"product_status":1}
27product27{"product_status":0}
28product28{"product_status":1}
29product29{"product_status":0}
30product30{"product_status":0}
31product31{"product_status":0}
32product32{"product_status":0}
33product33{"product_status":1}
34product34{"product_status":1}
35product35{"product_status":0}
36product36{"product_status":0}
37product37{"product_status":1}
38product38{"product_status":0}
39product39{"product_status":0}
40product40{"product_status":1}
41product41{"product_status":1}
42product42{"product_status":1}
43product43{"product_status":1}
44product44{"product_status":0}
45product45{"product_status":1}
46product46{"product_status":1}
47product47{"product_status":0}
48product48{"product_status":1}
49product49{"product_status":1}
50product50{"product_status":1}
51product51{"product_status":1}
52product52{"product_status":0}
53product53{"product_status":0}
54product54{"product_status":1}
55product55{"product_status":0}
56product56{"product_status":0}
57product57{"product_status":1}
58product58{"product_status":1}
59product59{"product_status":1}
60product60{"product_status":1}
61product61{"product_status":0}
62product62{"product_status":1}
63product63{"product_status":1}
64product64{"product_status":0}
65product65{"product_status":0}
66product66{"product_status":1}
67product67{"product_status":1}
68product68{"product_status":0}
69product69{"product_status":1}
70product70{"product_status":0}
71product71{"product_status":0}
72product72{"product_status":0}
73product73{"product_status":1}
74product74{"product_status":0}
75product75{"product_status":1}
76product76{"product_status":0}
77product77{"product_status":0}
78product78{"product_status":1}
79product79{"product_status":0}
80product80{"product_status":0}
81product81{"product_status":0}
82product82{"product_status":1}
83product83{"product_status":1}
84product84{"product_status":1}
85product85{"product_status":0}
86product86{"product_status":1}
87product87{"product_status":1}
88product88{"product_status":1}
89product89{"product_status":1}
90product90{"product_status":1}
91product91{"product_status":1}
92product92{"product_status":0}
93product93{"product_status":0}
94product94{"product_status":1}
95product95{"product_status":0}
96product96{"product_status":0}
97product97{"product_status":1}
98product98{"product_status":1}
99product99{"product_status":0}
100product100{"product_status":1}
Time taken: 0.141 seconds, Fetched: 100 row(s)
hive>
sqoop成功的将MySQL的city_info表和product_info表的数据导入到hive(包括创建表)
三、在hive里面对表和数据进行操作,最后在进行统计。
3.1 创建用户行为日志表user_click分区表
create table user_click
(
user_id int
,session_id string
,action_time string
,city_id int
,product_id int
)
partitioned by (date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
;
3.2 上传user_click.txt数据文件,并导入数据到表。
上传user_click.txt数据文件到/home/hadoop/data 目录。
[hadoop@hadoop002 data]$ ll
total 62912
-rw-rw-r--. 1 hadoop hadoop 652 Jun 4 16:34 emp.txt
-rw-r--r--. 1 hadoop hadoop 84 Jun 7 09:53 hive_row_number.txt
-rw-rw-r--. 1 hadoop hadoop 34 Jun 11 15:17 hive_wc.txt
-rw-r--r--. 1 hadoop hadoop 63602280 Jun 7 09:54 rating.json
-rw-rw-r--. 1 hadoop hadoop 67 Jun 6 18:30 student.txt
-rwxrwxrwx. 1 hadoop hadoop 725264 Jun 9 21:28 user_click.txt
[hadoop@hadoop002 data]$ pwd
/home/hadoop/data
导入数据到user_click表:
LOAD DATA LOCAL INPATH '/home/hadoop/data/user_click.txt' OVERWRITE INTO TABLE user_click PARTITION(date='2018-06-20');
3.3 在hive创建产品统计分区表product_stat,用于写入统计结果
create table product_stat
(
product_id int
,product_name string
,area string
,click_count int
,rank int
,days string
)
partitioned by (date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
;
3.4 统计各个区域下最热门的TOP3的商品,并写入到统计表product_stat
insert overwrite table product_stat partition(date='2018-06-20')
select
t.product_id
,t.product_name
,t.area
,t.click_count
,t.rank
,'2018-06-20' as days
from
(
select
ci.area as area
,uc.product_id as product_id
,pd.product_name as product_name
,count(uc.product_id) as click_count
,(row_number() over(partition by ci.area order by count(uc.product_id) desc)) as rank
from city_info ci
left join user_click uc on uc.city_id = ci.city_id and uc.date='2018-06-20'
left join product_info pd on pd.product_id = uc.product_id
group by ci.area,uc.product_id,pd.product_name
)t where t.rank<=3
;
查询统计结果
hive> select * from product_stat where date='2018-06-20';
OK
7product7CC3912018-06-202018-06-20
26product26CC3922018-06-202018-06-20
70product70CC3832018-06-202018-06-20
4product4EC4012018-06-202018-06-20
96product96EC3222018-06-202018-06-20
99product99EC3132018-06-202018-06-20
9product9NC1612018-06-202018-06-20
40product40NC1622018-06-202018-06-20
94product94NC1332018-06-202018-06-20
NULLNULLNE012018-06-202018-06-20
67product67NW2012018-06-202018-06-20
56product56NW2022018-06-202018-06-20
48product48NW1932018-06-202018-06-20
38product38SC3512018-06-202018-06-20
88product88SC3422018-06-202018-06-20
33product33SC3432018-06-202018-06-20
16product16SW2012018-06-202018-06-20
95product95SW1922018-06-202018-06-20
60product60SW1932018-06-202018-06-20
Time taken: 0.345 seconds, Fetched: 19 row(s)
hive>
四、使用sqoop将hive的统计表product_stat数据导出到MySQL的统计表product_stat中:
./sqoop export \
--connect jdbc:mysql://localhost:33066/ruozedata \
--username root \
--password root \
--table product_stat \
--export-dir /ruozedata_03/product_stat/date=2018-06-20 \
--input-fields-terminated-by '\t' \
--input-null-string '' --input-null-non-string 0 \
--columns "product_id,product_name,area,click_count,rank,days" \
--update-key product_id,area --update-mode allowinsert \
;
去MySQL里面查看统计数据是否有成功导出:
mysql> select * from product_stat where days='2018-06-20' order by area,rank;
+------------+--------------+------+-------------+------+------------+
| product_id | product_name | area | click_count | rank | days |
+------------+--------------+------+-------------+------+------------+
| 7 | product7 | CC | 39 | 1 | 2018-06-20 |
| 26 | product26 | CC | 39 | 2 | 2018-06-20 |
| 70 | product70 | CC | 38 | 3 | 2018-06-20 |
| 4 | product4 | EC | 40 | 1 | 2018-06-20 |
| 96 | product96 | EC | 32 | 2 | 2018-06-20 |
| 99 | product99 | EC | 31 | 3 | 2018-06-20 |
| 9 | product9 | NC | 16 | 1 | 2018-06-20 |
| 40 | product40 | NC | 16 | 2 | 2018-06-20 |
| 94 | product94 | NC | 13 | 3 | 2018-06-20 |
| 67 | product67 | NW | 20 | 1 | 2018-06-20 |
| 56 | product56 | NW | 20 | 2 | 2018-06-20 |
| 48 | product48 | NW | 19 | 3 | 2018-06-20 |
| 38 | product38 | SC | 35 | 1 | 2018-06-20 |
| 88 | product88 | SC | 34 | 2 | 2018-06-20 |
| 33 | product33 | SC | 34 | 3 | 2018-06-20 |
| 16 | product16 | SW | 20 | 1 | 2018-06-20 |
| 95 | product95 | SW | 19 | 2 | 2018-06-20 |
| 60 | product60 | SW | 19 | 3 | 2018-06-20 |
+------------+--------------+------+-------------+------+------------+
18 rows in set (0.00 sec)
看完了这篇文章,相信你对“HIVE如何统计各个区域下最热门的TOP3的商品”有了一定的了解,如果想了解更多相关知识,欢迎关注编程网行业资讯频道,感谢各位的阅读!