This article explains how to deploy Spark SQL on top of an existing Hive setup, whose metastore lives in MySQL, and how to run a few simple queries against Hive tables.
1. Runtime environment
- JDK 1.8.0_45, 64-bit
- hadoop-2.6.0-cdh5.7.0
- Scala 2.11.8
- spark-2.3.1-bin-2.6.0-cdh5.7.0 (you need to build this yourself)
- hive-1.1.0-cdh5.7.0
- MySQL 5.6
2. Preparing to run Spark SQL
# The Hive metadata is stored in MySQL, so start MySQL first
[root@hadoop001 ~]# su mysqladmin
[mysqladmin@hadoop001 root]$ cd ~
[mysqladmin@hadoop001 ~]$ service mysql start
Starting MySQL                                             [  OK  ]
# Start HDFS
[hadoop@hadoop001 sbin]$ ./start-dfs.sh
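Before launching Spark SQL, you may want to confirm that MySQL and the HDFS NameNode are actually accepting connections. The bash sketch below does a plain TCP check through bash's `/dev/tcp` pseudo-device. The function names `port_open` and `preflight` are hypothetical. The default host `hadoop001` comes from the prompts above. The ports are assumptions: 3306 is MySQL's default, and 8020 is a common NameNode RPC port, so check `fs.defaultFS` in your `core-site.xml` for the real value.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (bash): is anything listening on host:port?
# Assumed ports: 3306 = MySQL default, 8020 = a common NameNode RPC port.

# Returns 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
  local host="$1" port="$2"
  # /dev/tcp/HOST/PORT is a bash feature; `timeout` avoids hanging on filtered ports.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Checks the metastore database and the NameNode; returns 1 if either is down.
preflight() {
  local host="${1:-hadoop001}" status=0 entry name port
  for entry in "mysql:3306" "namenode:8020"; do
    name="${entry%%:*}"
    port="${entry##*:}"
    if port_open "$host" "$port"; then
      echo "${name} (${host}:${port}) is reachable"
    else
      echo "${name} (${host}:${port}) is NOT reachable" >&2
      status=1
    fi
  done
  return "$status"
}
```

Typical use: `preflight hadoop001 && echo "ready for Spark SQL"`. A successful TCP connect only shows that the port is open. It does not prove that the metastore schema or HDFS itself is healthy.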
# Configure Spark SQL: copy Hive's hive-site.xml into Spark's conf directory so both use the same metastore
[hadoop@hadoop001 ~]$ cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
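The copy step can be wrapped in a small bash function. The function fails loudly when `hive-site.xml` is missing, and it prints the metastore JDBC URL that the copied file points at. That lets you confirm Spark will read the same MySQL-backed metastore as Hive. `sync_hive_site` is a hypothetical name. `javax.jdo.option.ConnectionURL` is the standard Hive property for the metastore connection. The `grep` assumes the usual layout, where `<name>` and `<value>` sit on consecutive lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy Hive's hive-site.xml into Spark's conf directory
# and print the metastore JDBC URL it configures.
sync_hive_site() {
  if [[ -z "${HIVE_HOME:-}" || -z "${SPARK_HOME:-}" ]]; then
    echo "HIVE_HOME and SPARK_HOME must both be set" >&2
    return 1
  fi
  local src="$HIVE_HOME/conf/hive-site.xml"
  local dst_dir="$SPARK_HOME/conf"
  if [[ ! -f "$src" ]]; then
    echo "not found: $src" >&2
    return 1
  fi
  mkdir -p "$dst_dir"
  cp "$src" "$dst_dir/" || return 1
  # Assumes <name> and <value> are on consecutive lines, as in a typical hive-site.xml.
  grep -A1 'javax.jdo.option.ConnectionURL' "$dst_dir/hive-site.xml" \
    | grep -o '<value>.*</value>'
  return 0
}
```

Run it as `sync_hive_site` with `HIVE_HOME` and `SPARK_HOME` exported. The printed `<value>` line should name the MySQL database that Hive already uses.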
3. Starting Spark SQL
# Option 1: start with spark-shell
[hadoop@hadoop001 bin]$ ./spark-shell --master local[2] \
  --jars ~/software/mysql-connector-java-5.1.34-bin.jar

scala> spark.sql("use hive_data2").show(false)
scala> spark.sql("select * from emp").show(false)
+-----+------+---------+----+----------+-------+------+------+
|empno|ename |job      |mgr |hiredate  |salary |comm  |deptno|
+-----+------+---------+----+----------+-------+------+------+
|7369 |SMITH |CLERK    |7902|1980-12-17|800.0  |null  |20    |
|7499 |ALLEN |SALESMAN |7698|1981-2-20 |1600.0 |300.0 |30    |
|7521 |WARD  |SALESMAN |7698|1981-2-22 |1250.0 |500.0 |30    |
|7566 |JONES |MANAGER  |7839|1981-4-2  |2975.0 |null  |20    |
|7654 |MARTIN|SALESMAN |7698|1981-9-28 |1250.0 |1400.0|30    |
|7698 |BLAKE |MANAGER  |7839|1981-5-1  |2850.0 |null  |30    |
|7782 |CLARK |MANAGER  |7839|1981-6-9  |2450.0 |null  |10    |
|7788 |SCOTT |ANALYST  |7566|1987-4-19 |3000.0 |null  |20    |
|7839 |KING  |PRESIDENT|null|1981-11-17|5000.0 |null  |10    |
|7844 |TURNER|SALESMAN |7698|1981-9-8  |1500.0 |0.0   |30    |
|7876 |ADAMS |CLERK    |7788|1987-5-23 |1100.0 |null  |20    |
|7900 |JAMES |CLERK    |7698|1981-12-3 |950.0  |null  |30    |
|7902 |FORD  |ANALYST  |7566|1981-12-3 |3000.0 |null  |20    |
|7934 |MILLER|CLERK    |7782|1982-1-23 |1300.0 |null  |10    |
|8888 |HIVE  |PROGRAM  |7839|1988-1-23 |10300.0|null  |null  |
+-----+------+---------+----+----------+-------+------+------+
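The same two statements can also be run without typing them at the `scala>` prompt. You pipe them into `spark-shell` on stdin, and in Spark 2.x the REPL typically evaluates each line and exits at end of input. The bash sketch below rests on a few assumptions. The function names are hypothetical. `MYSQL_JAR` defaults to the connector jar path used above. `DRY_RUN=1` is also hypothetical and prints the command and script instead of launching Spark. `.show(false)` is dropped from the `use` statement because there it would only print an empty result table.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: feed Scala statements to spark-shell on stdin.
# MYSQL_JAR defaults to the connector jar used in the interactive example.
MYSQL_JAR="${MYSQL_JAR:-$HOME/software/mysql-connector-java-5.1.34-bin.jar}"

# The statements from the interactive session above.
emp_query_script() {
  cat <<'EOF'
spark.sql("use hive_data2")
spark.sql("select * from emp").show(false)
EOF
}

run_spark_shell_script() {
  local cmd=("${SPARK_HOME:-.}/bin/spark-shell" --master 'local[2]' --jars "$MYSQL_JAR")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print what would run (quoting not preserved) instead of starting Spark.
    printf '%s\n' "${cmd[*]}"
    emp_query_script
  else
    emp_query_script | "${cmd[@]}"
  fi
}
```

`DRY_RUN=1 run_spark_shell_script` prints the command line followed by the two Scala statements. Without `DRY_RUN`, it starts `spark-shell` and prints the same `emp` table as the interactive session.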
# Option 2: start with spark-sql
[hadoop@hadoop001 bin]$ ./spark-sql --master local[2] \
  --driver-class-path ~/software/mysql-connector-java-5.1.34-bin.jar

# Switch to the database
spark-sql> use hive_data2;
18/08/30 20:36:52 INFO HiveMetaStore: 0: get_database: hive_data2
18/08/30 20:36:52 INFO audit: ugi=hadoop ip=unknown-ip-addr cmd=get_database: hive_data2
Time taken: 0.114 seconds

# Query the data
spark-sql> select * from emp;
18/08/30 20:37:05 INFO DAGScheduler: Job 0 finished: processCmd at CliDriver.java:376, took 1.292944 s
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
8888 HIVE PROGRAM 7839 1988-1-23 10300.0 NULL NULL
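For scripted use, `spark-sql` also accepts Hive CLI style options such as `-e`, which runs a quoted query string and then exits. The bash sketch below assumes the same jar path as above, and `run_spark_sql` and `DRY_RUN` are hypothetical names. It sends stderr to `/dev/null` on the assumption that Spark's default log4j console appender writes INFO lines like the ones shown above to stderr, while query results go to stdout.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a query through spark-sql non-interactively.
# MYSQL_JAR defaults to the connector jar used in the interactive example.
MYSQL_JAR="${MYSQL_JAR:-$HOME/software/mysql-connector-java-5.1.34-bin.jar}"

run_spark_sql() {
  local query="$1"
  local cmd=("${SPARK_HOME:-.}/bin/spark-sql" --master 'local[2]'
             --driver-class-path "$MYSQL_JAR" -e "$query")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print what would run (quoting not preserved) instead of starting Spark.
    printf '%s\n' "${cmd[*]}"
  else
    # Assumes default log4j settings (console logs on stderr); results stay on stdout.
    "${cmd[@]}" 2>/dev/null
  fi
}
```

For example, `run_spark_sql 'use hive_data2; select * from emp'` passes both statements to `-e`, which splits them on `;` the way the interactive prompt does.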