Hadoop 3.1.3 detailed installation tutorial
Setting up the Hadoop runtime environment
Virtual machine environment preparation (execute on all three virtual machines)
1. Prepare three virtual machines. The virtual machine configuration requirements are as follows:
(1) Each virtual machine: 4 GB of memory, a 50 GB hard disk, and the necessary base packages installed:
sudo yum install -y epel-release
sudo yum install -y psmisc nc net-tools rsync vim lrzsz ntp libzstd openssl-static tree iotop git
(2) Modify the static IP of the cloned virtual machine;
sudo vim /etc/sysconfig/network-scripts/ifcfg-ens33
DEVICE=ens33
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
NAME="ens33"
IPADDR=192.168.43.153
PREFIX=24
GATEWAY=192.168.43.2
DNS1=192.168.43.2
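After saving ifcfg-ens33, the new address does not take effect until the network is restarted. A minimal sketch, assuming CentOS 7 with the legacy network service (adjust the interface name if yours differs):
# Restart networking so the static IP takes effect, then verify the address
sudo systemctl restart network
ip addr show ens33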
2. Modify the host name
(1) Modify the host name
sudo hostnamectl --static set-hostname hadoop001
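The same command is run on the other two machines with their own names; the hostnames below follow the cluster plan used later in this tutorial:
sudo hostnamectl --static set-hostname hadoop002   # on the second VM
sudo hostnamectl --static set-hostname hadoop003   # on the third VM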
(2) Configure the host name mapping: open /etc/hosts
sudo vim /etc/hosts
192.168.43.153 hadoop001
192.168.43.154 hadoop002
192.168.43.155 hadoop003
(3) Modify the Windows host mapping file (hosts file)
Go to C:\Windows\System32\drivers\etc and append the following to the hosts file:
192.168.43.153 hadoop001
192.168.43.154 hadoop002
192.168.43.155 hadoop003
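With the mappings in place, a quick way to confirm that the names resolve is to ping the other nodes by name; a sketch from one of the Linux nodes:
# Each host should answer by name if /etc/hosts is correct
ping -c 3 hadoop002
ping -c 3 hadoop003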
3. Turn off the firewall
sudo systemctl stop firewalld
sudo systemctl disable firewalld
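To confirm the firewall is stopped now and will stay off after a reboot, the service state can be checked; a brief sketch:
# Expect "inactive (dead)" and "disabled"
sudo systemctl status firewalld
systemctl is-enabled firewalld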
4. Create a folder in the /opt directory
cd /opt
sudo mkdir software
5. Install JDK
(1) Uninstall existing JDK
rpm -qa | grep -i java | xargs -n1 sudo rpm -e --nodeps
(2) Install JDK (upload all tar packages to the /opt/software directory)
cd /opt/software
sudo tar -zxvf jdk-8u201-linux-x64.tar.gz
sudo vi /etc/profile
Add the following at the end of the file:
export JAVA_HOME=/opt/software/jdk1.8.0_201
export PATH=$JAVA_HOME/bin:$PATH
Make the environment variables take effect:
source /etc/profile
java -version
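If /etc/profile was loaded correctly, java -version reports the JDK that was just unpacked; the first line of output should read roughly:
java version "1.8.0_201"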
6. Install Hadoop
(1) Download the Hadoop installation package
(2) Unzip hadoop-3.1.3.tar.gz
tar -zxvf hadoop-3.1.3.tar.gz
(3) Configure environment variables
sudo vi /etc/profile
Add the following at the end of the file:
export HADOOP_HOME=/opt/software/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Make the environment variables take effect:
source /etc/profile
hadoop version
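Likewise, hadoop version should now resolve and report the unpacked release; the first line of output should read roughly:
Hadoop 3.1.3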
Configure the cluster (execute on all three virtual machines)
1. Configure SSH password-free login
(1) Generate public and private keys
ssh-keygen -t rsa
Press Enter three times to accept the defaults; two files will be generated: id_rsa (the private key) and id_rsa.pub (the public key).
(2) Copy the public key to the target machine for password free login
ssh-copy-id hadoop001
ssh-copy-id hadoop002
ssh-copy-id hadoop003
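To confirm password-free login works, each host should now be reachable without a password prompt; a quick sketch:
# Should run the remote command and return without asking for a password
ssh hadoop002 hostname
ssh hadoop003 hostname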
2. Cluster configuration
(1) Cluster deployment planning
Component   hadoop001              hadoop002                      hadoop003
HDFS        NameNode, DataNode     DataNode                       DataNode, SecondaryNameNode
YARN        NodeManager            ResourceManager, NodeManager   NodeManager
(2) Configure cluster
Configure core-site.xml
cd $HADOOP_HOME/etc/hadoop
vim core-site.xml
The contents of the file are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- 8020 is the NameNode RPC port; 9870 is reserved for the NameNode web UI -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/software/hadoop-3.1.3/data</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>
</configuration>
Configure hdfs-site.xml
vim hdfs-site.xml
The contents of the file are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop003:9868</value>
    </property>
</configuration>
Configure yarn-site.xml
vim yarn-site.xml
The contents of the file are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop002</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>
Configure mapred-site.xml
vim mapred-site.xml
The contents of the file are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Configure workers
vim /opt/software/hadoop-3.1.3/etc/hadoop/workers
The contents of the file are as follows:
hadoop001
hadoop002
hadoop003
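Since the same configuration files are needed on all three machines, one option is to edit them once on hadoop001 and push them to the other nodes with rsync (installed in the first step); a sketch, assuming Hadoop was unpacked to the same path everywhere and the current user can write there:
# Push the edited configuration from hadoop001 to the other nodes
rsync -av /opt/software/hadoop-3.1.3/etc/hadoop/ hadoop002:/opt/software/hadoop-3.1.3/etc/hadoop/
rsync -av /opt/software/hadoop-3.1.3/etc/hadoop/ hadoop003:/opt/software/hadoop-3.1.3/etc/hadoop/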
3. Start the cluster
(1) If the cluster is being started for the first time, format the NameNode on the hadoop001 node. (Note: before formatting, stop any NameNode and DataNode processes left over from a previous start, then delete the data and logs directories.)
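A sketch of that clean-up, run before the format command below (paths follow the layout used in this tutorial):
stop-dfs.sh                                                              # on hadoop001; stops HDFS daemons on all nodes
rm -rf /opt/software/hadoop-3.1.3/data /opt/software/hadoop-3.1.3/logs   # on every node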
hdfs namenode -format
(2) Start HDFS
If startup fails with errors about these variables being undefined, add them at the top of start-dfs.sh and stop-dfs.sh in the sbin/ directory:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
start-dfs.sh
(3) Start YARN on the node where the ResourceManager is configured (hadoop002)
start-yarn.sh
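As with HDFS, if start-yarn.sh complains about undefined user variables, the analogous entries can be added at the top of start-yarn.sh and stop-yarn.sh in the sbin/ directory; a sketch:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root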
(4) View the relevant pages in a browser
NameNode: http://hadoop001:9870
YARN ResourceManager: http://hadoop002:8088
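Beyond the web pages, the JDK's jps tool can confirm that the expected daemons are running on each node; a sketch for hadoop001 (the process list follows the deployment plan above):
jps
# Expected on hadoop001: NameNode, DataNode, NodeManager (plus Jps itself)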