0%

Hive环境搭建

下载Hive

我们还是使用cloudera cdh5版本 hive-1.1.0-cdh5.7.0

1
2
3
4
cd /usr/local/hive
wget http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.7.0.tar.gz

tar -zxvf hive-1.1.0-cdh5.7.0.tar.gz

配置环境变量

1
vi ~/.bash_profile

在文件末尾加上如下内容

1
export HIVE_HOME=/usr/local/hive/hive-1.1.0-cdh5.7.0

执行source命令让配置生效

1
source ~/.bash_profile

配置Hive

配置hive-env.sh

1
2
cp conf/hive-env.sh.template conf/hive-env.sh
vi conf/hive-env.sh

配置HADOOP_HOME参数

1
HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0-cdh5.7.0

配置hive-site.xml

可以参考hcatalog/etc/hcatalog/proto-hive-site.xml文件的内容

1
vi conf/hive-site.xml

添加如下内容:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

<configuration>


<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hivemetastoredb?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>PASSWORD</value>
<description>password to use against metastore database</description>
</property>

</configuration>

然后我们还需要把mysql驱动包从Maven仓库拷贝到$HIVE_HOME/lib里面。

1
cp /Users/simon/.m2/repository/mysql/mysql-connector-java/5.1.46/mysql-connector-java-5.1.46.jar lib/

安装Mysql

在这里我就先不讲如何安装Mysql了,可参考 MacOS X安装MySQL

启动Hive

启动hive之前必须得保证hadoop hdfs已经启动。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
bin/hive

ls: /usr/local/spark/spark-2.2.3-bin-hadoop2.6/lib/spark-assembly-*.jar: No such file or directory
2019-01-11 10:12:46,772 WARN [main] mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present. Continuing without it.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hbase/hbase-1.2.0-cdh5.7.0/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-01-11 10:12:46,954 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Logging initialized using configuration in jar:file:/usr/local/hive/hive-1.1.0-cdh5.7.0/lib/hive-common-1.1.0-cdh5.7.0.jar!/hive-log4j.properties
Thu Jan 11 10:12:49 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Jan 11 10:12:49 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Jan 11 10:12:49 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Jan 11 10:12:49 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Jan 11 10:12:51 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Jan 11 10:12:51 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Jan 11 10:12:51 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Thu Jan 11 10:12:51 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
cd $MYSQL_HOME
bin/mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 533
Server version: 5.7.17 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

show databases;
+---------------------------+
| Database |
+---------------------------+
| information_schema |
| hivemetastoredb |
| mysql |
| performance_schema |
+---------------------------+
11 rows in set (0.01 sec)

mysql>
use hivemetastoredb;
Database changed

可以看到之前设置的数据库hivemetastoredb已经成功生成了。此数据库里面,还自动生成了很多表,如TBLS

创建表

回到hive shell

1
2
3
hive> create table hive_wordcount(context string);
OK
Time taken: 1.312 seconds

回到Mysql终端

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
mysql> select * FROM TBLS;
Empty set (0.00 sec)

mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+----------------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+----------------+---------------+--------------------+--------------------+
| 1 | 1548903355 | 1 | 0 | simon | 0 | 1 | hive_wordcount | MANAGED_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+-------+-----------+-------+----------------+---------------+--------------------+--------------------+
1 row in set (0.00 sec)


mysql> select * from COLUMNS_V2;
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
| 1 | NULL | context | string | 0 |
+-------+---------+-------------+-----------+-------------+
1 row in set (0.00 sec)

可以看到已经有一张表hive_wordcount已经成功生成了,这张表里面有一个context字段。

加载数据到表中

1
2
3
4
5
cd $HADOOP_HOME
bin/hadoop fs -text /data/input
19/01/11 11:00:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hello world
hello hello world

我们把hdfs://data/input加载到表中

1
2
3
4
5
hive> LOAD DATA INPATH '/data/input' INTO TABLE hive_wordcount;
Loading data to table default.hive_wordcount
Table default.hive_wordcount stats: [numFiles=1, totalSize=30]
OK
Time taken: 1.078 seconds

如果文件不在hfds里面,则命令为LOAD DATA LOCAL

执行查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
hive> select word, count(1) from hive_wordcount lateral view explode(split(context, ' ')) wc as word group by word;
Query ID = simon_20190111105555_39805c49-890c-4241-8388-56542c90fa2e
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1548765492126_0003, Tracking URL = http://localhost:8088/proxy/application_1548765492126_0003/
Kill Command = /usr/local/hadoop/hadoop-2.6.0-cdh5.7.0/bin/hadoop job -kill job_1548765492126_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-01-11 11:11:01,153 Stage-1 map = 0%, reduce = 0%
2019-01-11 11:11:06,299 Stage-1 map = 100%, reduce = 0%
2019-01-11 11:11:12,475 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1548765492126_0003
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 8786 HDFS Write: 16 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
hello 3
world 2
Time taken: 18.997 seconds, Fetched: 2 row(s)

lateral view explode即把每行记录按照指定分隔符进行拆解。

最后我们成功的得到了如下结果

1
2
hello	3
world 2