
CDH6.3.2 Cluster Planning

Basics

Hardware

| Host | Memory | Disk | Cores |
| --- | --- | --- | --- |
| bjuc49.optaim.com | 128G | 10T | 32 |
| bjuc50.optaim.com | 128G | 10T | 32 |
| bjuc51.optaim.com | 128G | 10T | 32 |
| bjuc52.optaim.com | 128G | 10T | 32 |
| bjuc53.optaim.com | 128G | 10T | 32 |
| bjuc54.optaim.com | 128G | 10T | 32 |
| bjuc55.optaim.com | 128G | 10T | 32 |

Environment

OS:

Ubuntu 18.04

jdk:

oracle-j2sdk1.8-1.8.0+update181-1.x86_64.rpm

mysql:

MariaDB 10.1.45

username=root

password=Qwaszx@zxc

The hive metastore databases on 49 and 50 replicate each other (master-master).
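
A quick way to check that the replication is healthy is a sketch like the following; run it on both 49 and 50 with the root credentials above, and both Slave_IO_Running and Slave_SQL_Running should report Yes:

mysql -uroot -p'Qwaszx@zxc' -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'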

Cloudera Manager:

Cloudera Express 6.3.1

CDH:

CDH6.3.2

Restart the Cloudera Manager Server after configuration changes:

systemctl restart cloudera-scm-server
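
If the server does not come back up, checking the service status and the server log usually shows why (the log path below is the Cloudera Manager default and may differ):

systemctl status cloudera-scm-server
tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log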

Components included in CDH

| Component | Version |
| --- | --- |
| Apache Avro | 1.8.2 |
| Apache Flume | 1.9.0 |
| Apache Hadoop | 3.0.0 |
| Apache HBase | 2.1.4 |
| HBase Indexer | 1.5 |
| Apache Hive | 2.1.1 |
| Hue | 4.3.0 |
| Apache Impala | 3.2.0 |
| Apache Kafka | 2.2.1 |
| Kite SDK | 1.0.0 |
| Apache Kudu | 1.10.0 |
| Apache Solr | 7.4.0 |
| Apache Oozie | 5.1.0 |
| Apache Parquet | 1.9.0 |
| Parquet-format | 2.4.0 |
| Apache Pig | 0.17.0 |
| Apache Sentry | 2.1.0 |
| Apache Spark | 2.4.0 |
| Apache Sqoop | 1.4.7 |
| Apache ZooKeeper | 3.4.5 |

Cluster Planning

| 49 | 50 | 51 | 52 | 53 | 54 | 55 |
| --- | --- | --- | --- | --- | --- | --- |
| mariadb | mariadb | | | | | |
| cm_server | | | | | | |
| cm_agent | cm_agent | cm_agent | cm_agent | cm_agent | cm_agent | cm_agent |
| namenode | namenode | | | | | |
| datanode | datanode | datanode | datanode | datanode | datanode | datanode |
| | | RM | RM | | | |
| NM | NM | NM | NM | NM | NM | NM |
| | | | | zookeeper | zookeeper | zookeeper |
| | | | | kafka | kafka | kafka |
| hive | hive | | | | | |
| spark | spark | spark_history | | | | |
| flink | flink | | | | | |
| sqoop | sqoop | | | | | |
| kudu | kudu | | | | | |
| | impala | | | | | |
| flume | | | | | | |
| hue | | | | | | |
| Oozie | | | | | | |
| sentry | | | | | | |

Data Migration

HDFS data migration (-update only copies files that are missing or changed on the target; -skipcrccheck skips the CRC comparison, which avoids failures when the two clusters use different checksum settings):

hadoop distcp -skipcrccheck -update hdfs://10.11.90.12:8020/zyz/ifansb/ hdfs://10.11.20.52:8020/zyz/ifansb/

hadoop distcp -skipcrccheck -update hdfs://10.11.90.12:8020/user/hive/warehouse/ifansb.db hdfs://10.11.20.52:8020/user/hive/warehouse/ifansb.db
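
A simple sanity check after distcp is to compare directory/file counts and total bytes on both clusters, for example for the warehouse path above (each command prints DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME, and the two outputs should match):

hdfs dfs -count hdfs://10.11.90.12:8020/user/hive/warehouse/ifansb.db
hdfs dfs -count hdfs://10.11.20.52:8020/user/hive/warehouse/ifansb.db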

Hive metastore migration:

1. Export the metastore metadata from the old cluster.
2. Import it into the corresponding database in the new cluster's MySQL.
3. List the upgrade scripts that are required (dry run):
   ./schematool -dbType mysql -upgradeSchemaFrom 1.10 -dryRun -passWord scm -userName scm
4. Execute the required scripts in MySQL in order, then create any tables that are still missing.
5. In Cloudera Manager, run "Update Hive Metastore NameNodes" for the Hive service so that table locations point at the new NameNode.
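
A minimal sketch of steps 1–3 above, assuming the old metastore database is named hive, the MariaDB credentials listed earlier, and the default CDH parcel path for schematool; the placeholders in angle brackets need to be filled in:

# 1. Dump the old cluster's metastore database
mysqldump -h <old-metastore-host> -uroot -p hive > hive_metastore.sql
# 2. Load it into the new cluster's MariaDB (on bjuc49)
mysql -h bjuc49.optaim.com -uroot -p hive < hive_metastore.sql
# 3. Dry-run schematool to list the upgrade scripts, then apply them in MySQL in order
/opt/cloudera/parcels/CDH/lib/hive/bin/schematool -dbType mysql -upgradeSchemaFrom <old-schema-version> -dryRun -userName scm -passWord scm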

Repairing Hive partitions

msck repair table table_name;
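
For example, through beeline against a migrated table (the table name here is hypothetical; the JDBC URL is the ZooKeeper discovery URL used in the JDBC section below):

beeline -u "jdbc:hive2://10.11.20.55:2181,10.11.20.54:2181,10.11.20.53:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n hive -e "msck repair table ifansb.example_table;"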

Refresh permissions (HDFS user-to-group mappings)

hdfs dfsadmin -refreshUserToGroupsMappings
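
YARN caches user-to-group mappings separately, so it is worth refreshing it as well and spot-checking a user's groups afterwards (the username below is a placeholder):

yarn rmadmin -refreshUserToGroupsMappings
hdfs groups <username>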

HiveServer2 HA

<property>
	<name>hive.server2.support.dynamic.service.discovery</name>
	<value>true</value>
</property>
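
This property belongs in hive-site.xml for HiveServer2 (in Cloudera Manager, typically via the HiveServer2 hive-site.xml safety valve). With dynamic service discovery enabled, each HiveServer2 instance registers an ephemeral znode under the configured namespace; a quick way to confirm both instances are registered is a sketch like the following, using one of the ZooKeeper hosts from the planning table:

zookeeper-client -server bjuc53.optaim.com:2181 ls /hiveserver2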

JDBC

# Hive session settings (Hive-on-Spark client timeouts)
hive.spark.client.future.timeout=60s
hive.spark.client.connect.timeout=600s
hive.spark.client.server.connect.timeout=600s
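
These timeouts can also be passed per session when connecting with beeline, assuming they are not on HiveServer2's restricted configuration list:

beeline -n hive \
  --hiveconf hive.spark.client.future.timeout=60s \
  --hiveconf hive.spark.client.connect.timeout=600s \
  --hiveconf hive.spark.client.server.connect.timeout=600s \
  -u "jdbc:hive2://10.11.20.55:2181,10.11.20.54:2181,10.11.20.53:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"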

!connect jdbc:hive2://bjuc55.optaim.com:2181,bjuc54.optaim.com:2181,bjuc53.optaim.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

beeline --hiveconf hive.server2.logging.operation.level=NONE  -u "jdbc:hive2://10.11.20.55:2181,10.11.20.54:2181,10.11.20.53:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"  -n hive