Hadoop 3.1.3 detailed installation tutorial

Construction of Hadoop running environment

Virtual machine environment preparation (execute on all three virtual machines)

1. Prepare three virtual machines. The configuration requirements are as follows:
(1) Each virtual machine: 4 GB memory, 50 GB hard disk, and the necessary installation environment

sudo yum install -y epel-release
sudo yum install -y psmisc nc net-tools rsync vim lrzsz ntp libzstd openssl-static tree iotop git

(2) Modify the static IP of each cloned virtual machine (use a different IPADDR on each machine, matching the host name mappings below).

sudo vim /etc/sysconfig/network-scripts/ifcfg-ens33

DEVICE=ens33
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
NAME="ens33"
IPADDR=192.168.43.153
PREFIX=24
GATEWAY=192.168.43.2
DNS1=192.168.43.2
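
After saving the file, the new address usually takes effect only after the network service is restarted (or the machine is rebooted). On CentOS 7 (an assumption, since the tutorial uses yum and ifcfg files), this can be done as follows:

sudo systemctl restart network
ip addr show ens33    # confirm the new IPADDR is assigned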

2. Modify the host name

(1) Modify the host name (use hadoop001, hadoop002 and hadoop003 on the respective machines)

sudo hostnamectl --static set-hostname hadoop001

(2) Configure the host name mapping: open /etc/hosts

sudo vim /etc/hosts

192.168.43.153 hadoop001
192.168.43.154 hadoop002
192.168.43.155 hadoop003

(3) Modify the Windows host mapping file (hosts file)

Open C:\Windows\System32\drivers\etc\hosts and add:
192.168.43.153 hadoop001
192.168.43.154 hadoop002
192.168.43.155 hadoop003
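
To confirm that the mappings work, a quick check from a Windows command prompt (assuming the virtual machines are running and reachable from the host) is:

ping hadoop001
ping hadoop002
ping hadoop003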

3. Turn off the firewall

sudo systemctl stop firewalld
sudo systemctl disable firewalld
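
To verify that the firewall is stopped and will not start again on boot, one possible check is:

sudo systemctl status firewalld    # the Active line should show: inactive (dead)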

4. Create a folder in the /opt directory

cd /opt
sudo mkdir software

5. Install JDK
(1) Uninstall existing JDK

rpm -qa | grep -i java | xargs -n1 sudo rpm -e --nodeps

(2) Install JDK (upload all tar packages to the /opt/software directory)

cd /opt/software
sudo tar -zxvf jdk-8u201-linux-x64.tar.gz
sudo vi /etc/profile

Add at the end of the file
export JAVA_HOME=/opt/software/jdk1.8.0_201
export PATH=$JAVA_HOME/bin:$PATH

Make the environment variables take effect

source /etc/profile
java -version

6. Install Hadoop
(1) Download the Hadoop installation package
(2) Unzip hadoop-3.1.3.tar.gz

tar -zxvf hadoop-3.1.3.tar.gz

(3) Configure environment variables

sudo vi /etc/profile

Add at the end of the file
export HADOOP_HOME=/opt/software/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Make the environment variables take effect

source /etc/profile
hadoop version

Configure the cluster (execute on all three virtual machines)

1. Configure SSH passwordless login
(1) Generate the public and private keys
Run the command below and press Enter three times; two files will be generated: id_rsa (private key) and id_rsa.pub (public key).

ssh-keygen -t rsa

(2) Copy the public key to each target machine to enable passwordless login

ssh-copy-id hadoop001
ssh-copy-id hadoop002
ssh-copy-id hadoop003
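
To confirm that passwordless login works, a quick check from hadoop001 is:

ssh hadoop002 hostname
ssh hadoop003 hostname
# each command should print the remote host name without prompting for a password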

2. Cluster configuration
(1) Cluster deployment planning

Component   hadoop001             hadoop002                       hadoop003
HDFS        NameNode, DataNode    DataNode                        DataNode, SecondaryNameNode
YARN        NodeManager           ResourceManager, NodeManager    NodeManager

(2) Configure the cluster
Configure core-site.xml

cd $HADOOP_HOME/etc/hadoop
vim core-site.xml

The contents of the file are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/software/hadoop-3.1.3/data</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>
</configuration>

Configure hdfs-site.xml

vim hdfs-site.xml

The contents of the file are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop003:9868</value>
    </property>
</configuration>

Configure yarn-site.xml

vim yarn-site.xml

The contents of the file are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop002</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>

Configure mapred-site.xml

vim mapred-site.xml

The contents of the file are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Configure workers

vim /opt/software/hadoop-3.1.3/etc/hadoop/workers

The contents of the file are as follows:

hadoop001
hadoop002
hadoop003
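
Since all three machines need identical Hadoop configuration, one way to avoid editing every file three times is to edit them on hadoop001 and then push the whole configuration directory to the other nodes. A minimal sketch using the rsync installed in step 1 (assuming the same /opt/software/hadoop-3.1.3 path on every node and the passwordless SSH configured above):

rsync -av /opt/software/hadoop-3.1.3/etc/hadoop/ hadoop002:/opt/software/hadoop-3.1.3/etc/hadoop/
rsync -av /opt/software/hadoop-3.1.3/etc/hadoop/ hadoop003:/opt/software/hadoop-3.1.3/etc/hadoop/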

3. Start the cluster
(1) If the cluster is being started for the first time, format the NameNode on the hadoop001 node. (Note: before reformatting, you must first stop all NameNode and DataNode processes started previously, and then delete the data and log directories.)

hdfs namenode -format
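
As a concrete illustration of the note above, a possible cleanup before reformatting an already-initialized cluster (a sketch, assuming the data and log directories live under /opt/software/hadoop-3.1.3, as configured by hadoop.tmp.dir earlier):

stop-dfs.sh     # on hadoop001
stop-yarn.sh    # on hadoop002
# then, on every node, remove the old data and log directories
rm -rf /opt/software/hadoop-3.1.3/data /opt/software/hadoop-3.1.3/logs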

(2) Start HDFS
If errors about undefined user variables appear during startup (common when running as root), add the following lines to start-dfs.sh and stop-dfs.sh in the sbin/ directory:

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

start-dfs.sh
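
To confirm that HDFS started on each node, run jps; according to the deployment plan above, the expected HDFS processes are:

jps
# hadoop001: NameNode, DataNode
# hadoop002: DataNode
# hadoop003: SecondaryNameNode, DataNode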

(3) Start YARN on the node where the ResourceManager is configured (hadoop002)

start-yarn.sh
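
If a similar user-definition error appears when starting YARN as root, the analogous variables go at the top of start-yarn.sh and stop-yarn.sh in the sbin/ directory (same mechanism as the HDFS scripts above):

YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root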

(4) View the relevant Web pages

NameNode
http://hadoop001:9870

YARN
http://hadoop002:8088
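
As a final functional check (a minimal sketch, assuming the cluster started cleanly), a small file can be written to HDFS and listed back:

hadoop fs -mkdir /input
hadoop fs -put /etc/profile /input
hadoop fs -ls /input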
