0%

使用Thriftserver和Beeline

我们已经有了spark-shellspark-sql,为什么还要使用Thriftserver呢?

  1. 每一个spark-shell / spark-sql都是一个Spark Application。

  2. Thriftserver不管启动多少个客户端(beeline / code),都只会产生一个Spark Application

启动thriftserver

需要把MySQL JDBC驱动通过jars参数加入进来。

1
2
3
4
cd $SPARK_HOME/

sbin/start-thriftserver.sh --master local[2] \
--jars ~/.m2/repository/mysql/mysql-connector-java/5.1.46/mysql-connector-java-5.1.46.jar

查看进程

1
2
3
4
5
6
7
8
jps -m

21522 SparkSubmit --master local[2] --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
--name Thrift JDBC/ODBC Server
--jars /Users/simon/.m2/repository/mysql/mysql-connector-java/5.1.46/mysql-connector-java-5.1.46.jar spark-internal
3459 Worker --webui-port 8081 spark://localhost:7077
18420 SparkSubmit --master local[2] --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
--jars /Users/simon/.m2/repository/mysql/mysql-connector-java/5.1.46/mysql-connector-java-5.1.46.jar spark-internal

我们看到有两个SparkSubmit进程,21522那个进程就是刚才启动的那个,class是HiveThriftServer2

以spark-shell方式启动的class是SparkSQLCLIDriver

thriftserver端口

默认端口为10000,可以修改为其他端口。

在启动时增加hiveconf参数,比如把端口号设置为14000

1
--hiveconf hive.server2.thrift.port = 14000

beeline访问thriftserver

需要设置jdbc uri: jdbc:hive2://localhost:10000

1
2
3
4
5
6
7
8
9
10
bin/beeline -u jdbc:hive2://localhost:10000 -n simon
Connecting to jdbc:hive2://localhost:10000
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.2.3)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:10000>

查看emp表的内容。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
0: jdbc:hive2://localhost:10000> SELECT * FROM emp;
+--------+---------+------------+-------+-----------+---------+---------+---------+--+
| empno | ename | job | mgr | hiredate | sal | comm | deptno |
+--------+---------+------------+-------+-----------+---------+---------+---------+--+
| 7369 | SMITH | CLERK | 7902 | NULL | 800.0 | NULL | 20 |
| 7499 | ALLEN | SALESMAN | 7698 | NULL | 1600.0 | 300.0 | 30 |
| 7521 | WARD | SALESMAN | 7698 | NULL | 1250.0 | 500.0 | 30 |
| 7566 | JONES | MANAGER | 7839 | NULL | 2975.0 | NULL | 20 |
| 7654 | MARTIN | SALESMAN | 7698 | NULL | 1250.0 | 1400.0 | 30 |
| 7698 | BLAKE | MANAGER | 7839 | NULL | 2850.0 | NULL | 30 |
| 7782 | CLARK | MANAGER | 7839 | NULL | 2450.0 | NULL | 10 |
| 7788 | SCOTT | ANALYST | 7566 | NULL | 3000.0 | NULL | 20 |
| 7839 | KING | PRESIDENT | NULL | NULL | 5000.0 | NULL | 10 |
| 7844 | TURNER | SALESMAN | 7698 | NULL | 1500.0 | 0.0 | 30 |
| 7876 | ADAMS | CLERK | 7788 | NULL | 1100.0 | NULL | 20 |
| 7900 | JAMES | CLERK | 7698 | NULL | 950.0 | NULL | 30 |
| 7902 | FORD | ANALYST | 7566 | NULL | 3000.0 | NULL | 20 |
| 7934 | MILLER | CLERK | 7782 | NULL | 1300.0 | NULL | 10 |
+--------+---------+------------+-------+-----------+---------+---------+---------+--+
14 rows selected (0.411 seconds)

通过程序访问thriftserver

添加依赖

1
2
3
4
5
6
<!-- Thriftserver 支持 -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.1.0</version>
</dependency>

SparkSQLThriftServerApp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
package gy.finolo.spark

import java.sql.DriverManager


object SparkSQLThriftServerApp {

def main(args: Array[String]): Unit = {

// 1. 加载驱动
Class.forName("org.apache.hive.jdbc.HiveDriver")

// 2. 获取连接
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "root", "")
val ps = conn.prepareStatement("SELECT empno, ename, sal FROM emp")
val rs = ps.executeQuery()

while (rs.next()) {
println("empno: " + rs.getInt("empno") +
", ename: " + rs.getString("ename") +
", salary: " + rs.getDouble("sal"))
}

rs.close()
ps.close()
conn.close()
}

}

运行结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
empno: 7369, ename: SMITH, salary: 800.0
empno: 7499, ename: ALLEN, salary: 1600.0
empno: 7521, ename: WARD, salary: 1250.0
empno: 7566, ename: JONES, salary: 2975.0
empno: 7654, ename: MARTIN, salary: 1250.0
empno: 7698, ename: BLAKE, salary: 2850.0
empno: 7782, ename: CLARK, salary: 2450.0
empno: 7788, ename: SCOTT, salary: 3000.0
empno: 7839, ename: KING, salary: 5000.0
empno: 7844, ename: TURNER, salary: 1500.0
empno: 7876, ename: ADAMS, salary: 1100.0
empno: 7900, ename: JAMES, salary: 950.0
empno: 7902, ename: FORD, salary: 3000.0
empno: 7934, ename: MILLER, salary: 1300.0

注意

  1. 通过程序访问thriftserver前,得保证thriftserver已经启动,不然运行程序时,会遇到如下错误:
1
2
Exception in thread "main" java.sql.SQLException: Could not open client transport with JDBC Uri: 
jdbc:hive2://localhost:10000: java.net.ConnectException: Connection refused (Connection refused)
  1. 我安装的Hive版本是hive-1.1.0-cdh5.7.0,但我在pom.xml里依赖的jar版本不对时,可能遇到如下异常。
1
2
3
4
Exception in thread "main" java.sql.SQLException: Could not open client transport with JDBC Uri: 
jdbc:hive2://localhost:10000: Could not establish connection to jdbc:hive2://localhost:10000:
Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null,
configuration:{use:database=default})