Level 1: Deployment, installation and management of Hadoop clusters

We have prepared three virtual servers, which are connected as follows:

server    SSH                 password    ip
master    ssh 172.18.0.2      123123      172.18.0.2
slave1    ssh 172.18.0.3      123123      172.18.0.3
slave2    ssh 172.18.0.4      123123      172.18.0.4

The first step is to initialize the virtual servers on the evassh server:

cd /opt
wrapdocker                        # start the Docker daemon (Docker-in-Docker wrapper)
ulimit -f unlimited               # remove the file size limit for this shell
docker load -i ubuntu16-ssh.tar   # load the prepared Ubuntu 16 image with SSH enabled
docker-compose up -d              # start the three virtual servers in the background

Note: do not SSH from one virtual server directly into another; doing so will cause the configuration data not to be saved. The correct way is to run exit in the current virtual server to return to the evassh server, and then log in to the next virtual server as shown above.
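As an optional sanity check (not part of the original steps), you can confirm on evassh that the three containers are running before continuing:

docker ps    # the master, slave1 and slave2 containers should all be listed as Up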

 

File transfer

Copy the Java and Hadoop installation packages from /opt on evassh to the /opt directory on the master server with the scp command.

scp /opt/jdk-8u141-linux-x64.tar.gz root@172.18.0.2:/opt
scp /opt/hadoop-3.1.0.tar.gz root@172.18.0.2:/opt

The first time you connect you will be asked whether to continue connecting; type yes and then enter the password 123123 to start the transfer.
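As a quick optional check that both packages arrived, you can list the master's /opt directory from evassh (you will be asked for the same password, since passwordless login is not configured yet):

ssh root@172.18.0.2 ls -lh /opt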


Configure password-free login

During cluster construction we will frequently jump between servers over SSH. To avoid entering a password each time (and during Hadoop startup), we configure passwordless login.

1. Generate keys on master, slave1, and slave2 respectively. The commands are as follows:

Generate a key on the master server:

# Enter the master server, enter yes and password 123123 on the keyboard
ssh 172.18.0.2
ssh-keygen -t rsa
# After executing the command, press the Enter key three times in a row to generate the key pair.

Here we can open multiple command line windows, which can greatly reduce the number of jumps between servers.

 

Click the + sign to open additional windows; up to three command-line windows can be opened in total.

Generate the key on the slave1 server:

# Enter the slave1 server, enter yes and password 123123 on the keyboard
ssh 172.18.0.3
ssh-keygen -t rsa

Generate the key on the slave2 server:

# Enter the slave2 server, enter yes and password 123123 on the keyboard
ssh 172.18.0.4
ssh-keygen -t rsa

The hostnames master, slave1 and slave2 are already mapped to their IP addresses on each server, so no extra hostname mapping is needed here.
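If you want to verify this (optional, assuming the mapping lives in /etc/hosts), you can check it on any of the servers:

grep -E 'master|slave[12]' /etc/hosts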

2. Copy the public keys of master, slave1, and slave2 on the master.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh slave1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh slave2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Enter the password 123123 when prompted.

3. Copy the master's authorized_keys file on slave1.

ssh master cat ~/.ssh/authorized_keys >> ~/.ssh/authorized_keys

4. Copy the master's authorized_keys file on slave2.

ssh master cat ~/.ssh/authorized_keys >> ~/.ssh/authorized_keys

Passwordless login between the cluster nodes is now set up.
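A minimal, optional way to verify this from the master: each command below should print the remote hostname without asking for a password (the very first connection to a host may still ask you to confirm its host key; type yes).

for host in master slave1 slave2; do
  ssh "$host" hostname
done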

Install the Java JDK on the cluster

The /opt directory on the master server now contains the Java and Hadoop installation packages sent from the evassh server. Extract the Java installation package to the /usr/local directory.

tar -zxvf /opt/jdk-8u141-linux-x64.tar.gz -C /usr/local/

After extracting the JDK, it must be added to the environment variables before it can be used. Enter the command vim /etc/profile to edit the configuration file and add the following lines at the end of the file (with no spaces around the = signs):

export JAVA_HOME=/usr/local/jdk1.8.0_141
export PATH=$PATH:$JAVA_HOME/bin

Then, save and exit.

Finally, run:

source /etc/profile

to make the configuration take effect.

Then enter:

java -version

If the Java version information is printed, the configuration was successful.

 

Send the extracted JDK and the configuration file (/etc/profile) to slave1 and slave2 with the scp command.

 
#send JDK
scp -r /usr/local/jdk1.8.0_141/ root@slave1:/usr/local/
scp -r /usr/local/jdk1.8.0_141/ root@slave2:/usr/local/
#send configuration file
scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/

Execute the following on the slave1 and slave2 servers respectively:

source /etc/profile

to make the copied configuration take effect.
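To confirm the JDK works on every node, you can run this optional sketch from master (sourcing /etc/profile explicitly because non-interactive SSH sessions do not load it):

for host in slave1 slave2; do
  echo "== $host =="
  ssh "$host" 'source /etc/profile && java -version'
done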

Hadoop Distributed Cluster Construction

Unzip and rename

Extract the Hadoop package to the /usr/local directory and rename the extracted folder to hadoop.

 
tar -zxvf /opt/hadoop-3.1.0.tar.gz -C /usr/local/
cd /usr/local
mv hadoop-3.1.0/ hadoop

Add Hadoop to environment variables

vi /etc/profile

Insert the following code at the end of the file:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Finally, make the changes take effect:

source /etc/profile
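To check that the Hadoop variables took effect (optional), you can run:

hadoop version    # should report Hadoop 3.1.0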

Create folders

mkdir /var/hadoop             # base directory
mkdir /var/hadoop/disk        # DataNode block storage
mkdir /var/hadoop/logs        # log files
mkdir /var/hadoop/tmp         # Hadoop temporary directory
mkdir /var/hadoop/namenode    # NameNode metadata
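Equivalently (an optional shortcut, not in the original steps), the same directories can be created with a single command using brace expansion:

mkdir -p /var/hadoop/{disk,logs,tmp,namenode}
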
Modify the configuration file

Go to the hadoop configuration folder:

cd /usr/local/hadoop/etc/hadoop/
Modify the core-site.xml file. core-site.xml is Hadoop's core configuration file; here we set the HDFS URI (the NameNode address) and the location of the Hadoop temporary folder created above. Add the following inside the <configuration> tag at the end of the file:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/tmp</value>
  </property>
</configuration>

Modify the hdfs-site.xml file. dfs.replication is the number of block replicas; since we have 3 nodes, it is set to 3. dfs.datanode.data.dir is the physical location where each DataNode stores its data blocks, and dfs.namenode.name.dir is where the NameNode stores the HDFS namespace metadata.

 
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/var/hadoop/disk</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/var/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

Modify the yarn-site.xml file. yarn.resourcemanager.hostname designates master as the ResourceManager node, and yarn.nodemanager.aux-services set to mapreduce_shuffle enables the shuffle service that MapReduce needs on every NodeManager.

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Modify the mapred-site.xml file. Setting mapreduce.framework.name to yarn makes MapReduce jobs run on YARN rather than in the local runner.

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Modify the hadoop-env.sh file. This file mainly configures the location of the JDK; add the following line to it:

export JAVA_HOME=/usr/local/jdk1.8.0_141
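If you prefer not to open an editor, the same line can be appended from the shell (an equivalent shortcut to the edit above):

echo 'export JAVA_HOME=/usr/local/jdk1.8.0_141' >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh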

Modify the workers file. In Hadoop 2.6 this file was called slaves; since version 3.0 it is named workers. List all the nodes that should run worker daemons:

master
slave1
slave2
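The same file can also be written in one step from the shell (equivalent to editing it by hand; note this overwrites the existing workers file):

# overwrite the workers file with the three node names
cat > /usr/local/hadoop/etc/hadoop/workers <<'EOF'
master
slave1
slave2
EOF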

 

In Hadoop 3, the start and stop scripts refuse to launch the daemons as the root user unless the operating users are declared, so we set that up next.

Go to the /usr/local/hadoop/sbin directory:

cd /usr/local/hadoop/sbin

Edit the following two files:

vi start-dfs.sh
vi stop-dfs.sh

Add the following parameters at the top of both files:

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Similarly, edit:

vi start-yarn.sh
vi stop-yarn.sh

Also add the following at the top:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
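As a commonly used alternative (an assumption, not what this tutorial does: it relies on the start scripts sourcing hadoop-env.sh), the same user variables can be declared once in hadoop-env.sh instead of patching each script:

# append the daemon-user declarations to hadoop-env.sh
cat >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh <<'EOF'
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF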

Format

Before using Hadoop for the first time, the NameNode must be formatted. Use the following command:

hadoop namenode -format
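Note that hadoop namenode -format is the older form; in Hadoop 3.x the equivalent, non-deprecated command is:

hdfs namenode -format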

If the log reports that the storage directory has been successfully formatted, the format succeeded.

 

Send the configured Hadoop directory to the other two servers:

scp -r /usr/local/hadoop/ root@slave1:/usr/local/
scp -r /usr/local/hadoop/ root@slave2:/usr/local/
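Optionally (not part of the original steps), you can also re-copy /etc/profile to the slaves so that hadoop commands can be run directly on them, since the copy sent earlier predates the HADOOP_HOME entries:

scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/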

Start Hadoop

Next we start Hadoop:

start-all.sh

Use jps to check whether the Hadoop processes started successfully. On master you should see NameNode, SecondaryNameNode, ResourceManager, DataNode and NodeManager (master is listed in workers, so it also runs a DataNode and a NodeManager).
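A quick optional sketch to check every node from master (sourcing /etc/profile explicitly so that jps is on the PATH in the non-interactive session):

for host in master slave1 slave2; do
  echo "== $host =="
  ssh "$host" 'source /etc/profile && jps'
done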
