We have prepared three virtual servers, which are connected as follows:
| server | SSH command | password | IP |
| --- | --- | --- | --- |
| master | ssh 172.18.0.2 | 123123 | 172.18.0.2 |
| slave1 | ssh 172.18.0.3 | 123123 | 172.18.0.3 |
| slave2 | ssh 172.18.0.4 | 123123 | 172.18.0.4 |
First, we need to initialize the virtual servers from the evassh server:
cd /opt
wrapdocker
ulimit -f unlimited
docker load -i ubuntu16-ssh.tar
docker-compose up -d
Note: do not SSH from one virtual server directly into another; doing so will cause the configuration data not to be saved. The correct way is to run exit inside the current virtual server to return to the evassh server, and then log in to the next virtual server using the method above.
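For example, a minimal sketch of the intended workflow between servers (using the IP addresses from the table above):
ssh 172.18.0.2    # log in to master from the evassh server
exit              # when finished, return to the evassh server first
ssh 172.18.0.3    # then log in to slave1, again from the evassh server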
File transfer
Use the scp command to copy the Java installation package and the Hadoop installation package from the evassh server into the /opt directory on the master server.
scp /opt/jdk-8u141-linux-x64.tar.gz root@172.18.0.2:/opt
scp /opt/hadoop-3.1.0.tar.gz root@172.18.0.2:/opt
The first time you connect, you will be asked whether to continue connecting. Type yes and then enter the password 123123 to start the transfer.
Configure password-free login
While building the cluster we frequently jump between servers over SSH. To avoid entering a password every time, especially when starting the cluster, we configure password-free login.
1. Generate keys on master, slave1, and slave2 respectively. The commands are as follows:
Generate a key on the master server:
# Enter the master server; type yes and the password 123123 when prompted
ssh 172.18.0.2
# After running the command, press the Enter key three times in a row to generate the key pair
ssh-keygen -t rsa
Here we can open multiple command line windows, which can greatly reduce the number of jumps between servers.
Click the + sign to open additional windows; up to three command-line windows can be opened in total.
Generate the key on the slave1 server:
# Enter the slave1 server; type yes and the password 123123 when prompted
ssh 172.18.0.3
ssh-keygen -t rsa
Generate the key on the slave2 server:
# Enter the slave2 server; type yes and the password 123123 when prompted
ssh 172.18.0.4
ssh-keygen -t rsa
The hostnames master, slave1, and slave2 are already mapped to their IP addresses, so no hostname mapping needs to be done here.
2. On the master, gather the public keys of master, slave1, and slave2 into authorized_keys.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh slave1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh slave2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The password is 123123.
3. On slave1, append the master's authorized_keys file to slave1's own authorized_keys.
ssh master cat ~/.ssh/authorized_keys >> ~/.ssh/authorized_keys
4. On slave2, append the master's authorized_keys file to slave2's own authorized_keys.
ssh master cat ~/.ssh/authorized_keys >> ~/.ssh/authorized_keys
Password-free login between the cluster nodes is now set up.
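As an optional quick check (assuming the steps above completed), run a remote command from the master; it should no longer prompt for a password:
ssh slave1 hostname
ssh slave2 hostname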
Install the Java JDK on the cluster
The /opt directory on the master server now contains the Java and Hadoop installation packages sent from the evassh server. Extract the Java installation package into the /usr/local directory.
tar -zxvf /opt/jdk-8u141-linux-x64.tar.gz -C /usr/local/
After extracting the JDK, you need to add it to the environment variables before it can be used. Run vim /etc/profile to edit the configuration file and append the following lines at the end of the file (no spaces around the = signs):
export JAVA_HOME=/usr/local/jdk1.8.0_141
export PATH=$PATH:$JAVA_HOME/bin
Then, save and exit.
Finally, run:
source /etc/profile
to make the new configuration take effect.
Then enter:
java -version
If the Java version information is displayed, the configuration was successful.
Send the extracted JDK and the configuration file to slave1 and slave2 with the scp command.
# Send the JDK
scp -r /usr/local/jdk1.8.0_141/ root@slave1:/usr/local/
scp -r /usr/local/jdk1.8.0_141/ root@slave2:/usr/local/
# Send the configuration file
scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/
Then execute the following on both slave1 and slave2:
source /etc/profile
to make the copied configuration take effect.
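As an optional sanity check from the master (assuming password-free login is configured), the JDK on both workers can be verified remotely; /etc/profile is sourced explicitly because a non-interactive SSH command does not load it:
ssh slave1 "source /etc/profile && java -version"
ssh slave2 "source /etc/profile && java -version"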
Hadoop Distributed Cluster Construction
Unzip and rename
Unzip the Hadoop package to the /usr/local directory, and rename the unzipped folder to hadoop.
tar -zxvf /opt/hadoop-3.1.0.tar.gz -C /usr/local/
cd /usr/local
mv hadoop-3.1.0/ hadoop
Add Hadoop to environment variables
vi /etc/profile
Insert the following code at the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Finally make the changes take effect:
source /etc/profile
Create folders
mkdir /var/hadoop
mkdir /var/hadoop/disk
mkdir /var/hadoop/logs
mkdir /var/hadoop/tmp
mkdir /var/hadoop/namenode
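Equivalently, the same directories can be created in one line (a bash brace-expansion shortcut; -p creates missing parent directories and does not fail if a directory already exists):
mkdir -p /var/hadoop/{disk,logs,tmp,namenode}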
Modify the configuration file
Go to the hadoop configuration folder:
cd /usr/local/hadoop/etc/hadoop/
Modify the core-site.xml file. core-site.xml is the core configuration file; in it we need to set the URI of HDFS (the NameNode address) and the location of Hadoop's temporary folder (the /var/hadoop/tmp directory created above). Add the following code inside the configuration tag at the end of the file:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop/tmp</value>
    </property>
</configuration>
Modify the hdfs-site.xml file. dfs.replication is the number of block replicas; since we have 3 nodes, it is set to 3. The property dfs.datanode.data.dir is the physical location where the DataNode stores data blocks, and dfs.namenode.name.dir is where the NameNode stores the HDFS namespace metadata.
<configuration>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/var/hadoop/disk</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/var/hadoop/namenode</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
Modify the yarn-site.xml file
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Modify the mapred-site.xml file
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Modify the hadoop-env.sh file. This file mainly configures the location of the JDK; add the following statement to it:
export JAVA_HOME=/usr/local/jdk1.8.0_141
Modify the workers file. In Hadoop 2.6 this file was called slaves; since version 3.0 it is called workers. Write the hostnames of all nodes into it:
master
slave1
slave2
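One convenient way to write these three lines is a here-document (a sketch that assumes the default file location and overwrites any existing contents of workers):
cat > /usr/local/hadoop/etc/hadoop/workers << EOF
master
slave1
slave2
EOF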
By default, Hadoop cannot yet be started by the root user, so we need to configure this first.
Go to the /usr/local/hadoop/sbin directory:
cd /usr/local/hadoop/sbin
Open the start-dfs.sh and stop-dfs.sh files:
vi start-dfs.sh
vi stop-dfs.sh
Add the following parameters at the top of both files:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Similarly, open start-yarn.sh and stop-yarn.sh:
vi start-yarn.sh
vi stop-yarn.sh
Also add the following at the top:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Format the NameNode
Before using Hadoop for the first time, we need to format the NameNode. Use the following command:
hadoop namenode -format
If the output reports that the storage directory has been successfully formatted, the format succeeded.
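Note that in Hadoop 3.x the hadoop namenode script is reported as deprecated; the equivalent current form of the same command is:
hdfs namenode -format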
Send Hadoop to the other two servers
scp -r /usr/local/hadoop/ root@slave1:/usr/local/
scp -r /usr/local/hadoop/ root@slave2:/usr/local/
Start Hadoop
Next we start Hadoop:
start-all.sh
Use jps on each server to check whether the Hadoop processes started successfully.
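A minimal sketch of this check, assuming the configuration above (because master is also listed in the workers file, it runs the worker daemons as well):
jps                                        # on master: expect NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager, and Jps
ssh slave1 "source /etc/profile && jps"    # on slave1 and slave2: expect DataNode, NodeManager, and Jps
ssh slave2 "source /etc/profile && jps"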