Pseudo-distributed cluster installation
Configuration Environment
linux system: Centos7
Virtual Machine: VMware Workstation 16 Pro
A Linux machine, also known as a node, with a JDK environment installed on it
The processes listed above are the ones the Hadoop cluster will start: NameNode, SecondaryNameNode, and DataNode belong to the HDFS service, while ResourceManager and NodeManager belong to the YARN service. MapReduce does not appear here because it is a computing framework; once the Hadoop cluster is installed, MapReduce programs can be executed on it.
Before installing the cluster, you need to download the Hadoop installation package; here we use Hadoop 3.2.0.
On the Hadoop official website there is a download button; after entering the download page you will find the Apache release archive link, where the installation packages of all versions can be found.
Note: if downloading from this overseas address is slow, you can use a domestic mirror instead, but the mirrors may not provide every version; if you cannot find the version we need, you will have to download it from the official website.
These domestic mirrors host not only Hadoop installation packages but also the installation packages of most Apache projects.
After the installation package is downloaded, we start to install the pseudo-distributed cluster.
We use the bigdata01 machine here.
First, configure the basic environment:
ip, hostname, firewalld, ssh password-free login, JDK
ip: set a static ip
[root@bigdata01 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="de575261-534b-4049-bad0-6a6d55a5f4f0"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.10.130
GATEWAY=192.168.10.2
DNS1=8.8.8.8

[root@bigdata01 ~]# service network restart
Restarting network (via systemctl):                        [  OK  ]

[root@bigdata01 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:76:da:a0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.130/24 brd 192.168.10.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::567e:a2a:64b8:ccab/64 scope link
       valid_lft forever preferred_lft forever
hostname: Set temporary hostname and permanent hostname
[root@bigdata01 ~]# hostname bigdata01
[root@bigdata01 ~]# vi /etc/hostname
bigdata01

Note: it is recommended to configure the mapping between ip and hostname in the /etc/hosts file. Append the following line to /etc/hosts; do not delete the existing content of the file!

[root@bigdata01 ~]# vi /etc/hosts
192.168.10.130 bigdata01
firewalld: temporarily turn off the firewall + permanently disable the firewall
[root@bigdata01 ~]# systemctl stop firewalld
[root@bigdata01 ~]# systemctl disable firewalld
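As an optional sanity check, you can confirm that the firewall is stopped now and will stay off after a reboot; this is a minimal sketch, and the exact output can vary between CentOS 7 builds:

[root@bigdata01 ~]# firewall-cmd --state
not running
[root@bigdata01 ~]# systemctl is-enabled firewalld
disabled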
ssh password-free login
Here we need to briefly explain what ssh is. SSH (Secure Shell) is a protocol that lets you log in to a remote Linux machine securely.
The Hadoop cluster relies on ssh: when we start the cluster, we only need to run the start command on one machine, and Hadoop connects to the other machines through ssh and starts the corresponding processes on them.
The problem is that connecting to another machine over ssh normally requires entering a password, so we need to set up ssh password-free login.
You may wonder why this matters here: password-free login is needed between multiple machines, but our pseudo-distributed cluster has only one machine.
Note that no matter how many machines a cluster has, the steps to start its processes are the same: they are all launched through ssh remote connections. Even with a single machine, Hadoop uses ssh to connect to the machine itself, and right now even an ssh connection to ourselves asks for a password.
The principle behind ssh password-free login
ssh is a secure (encrypted) shell and relies on asymmetric encryption. There are two kinds of encryption, symmetric and asymmetric; with asymmetric encryption, data encrypted with one key can only be processed with the matching key of the pair, which makes this method relatively secure.
Asymmetric encryption produces a key pair consisting of a public key and a private key. The public key is handed out to other machines, while the private key is kept by its owner and never shared.
The ssh login process then works roughly like this: the first machine gives its public key to the second machine in advance. When the first machine wants to connect, the second machine sends it a random string (a challenge); the first machine encrypts the string with its private key and sends the result back.
The second machine then uses the stored public key to check that the returned result really corresponds to the string it sent. If the check passes, the second machine considers the first machine trustworthy and allows the login; otherwise the connection is treated as coming from an illegitimate machine.
Now let's formally configure ssh password-free login. Since we are configuring password-free login to ourselves, the "first machine" and the "second machine" here are the same machine.
First execute ssh-keygen -t rsa on bigdata01
rsa here specifies the encryption algorithm to use.
Note: after executing this command you need to press the Enter key four times (entering nothing each time) until you return to the Linux command line; this completes the operation.
[root@bigdata01 ~]# ssh-keygen -t rsa
After execution, the corresponding public and private key files are generated in the ~/.ssh directory
[root@bigdata01 ~]# ll ~/.ssh/
total 12
-rw-------. 1 root root 1679 Apr  7 16:39 id_rsa
-rw-r--r--. 1 root root  396 Apr  7 16:39 id_rsa.pub
The next step is to copy the public key to the machine that requires password-free login
[root@bigdata01 ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Then you can log in to the bigdata01 machine without password through ssh
[root@bigdata01 ~]# ssh bigdata01
Last login: Tue Apr  7 15:05:55 2020 from 192.168.182.1
[root@bigdata01 ~]#
JDK
Let's start installing the JDK
Following the usual convention in day-to-day work, it is recommended to put all software installation packages in the /data/soft directory.
We do not have a separate data disk here, so we create the /data/soft directory manually.
[root@bigdata01 ~]# mkdir -p /data/soft
Upload the JDK installation package to the /data/soft directory and unpack it:
[root@bigdata01 soft]# tar -zxvf jdk-8u202-linux-x64.tar.gz
Rename the extracted jdk directory:
[root@bigdata01 soft]# mv jdk1.8.0_202 jdk1.8
Configure environment variable JAVA_HOME
[root@bigdata01 soft]# vi /etc/profile
.....
export JAVA_HOME=/data/soft/jdk1.8
export PATH=.:$JAVA_HOME/bin:$PATH
verify
[root@bigdata01 soft]# source /etc/profile
[root@bigdata01 soft]# java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
The basic environment is done, let's start installing Hadoop
1: First upload the hadoop installation package to the /data/soft directory
[root@bigdata01 soft]# ll total 527024 -rw-r--r--. 1 root root 345625475 Jul 19 2019 hadoop-3.2.0. tar.gz drwxr-xr-x. 7 10 143 245 Dec 16 2018 jdk1.8 -rw-r--r--. 1 root root 194042837 Apr 6 23:14 jdk-8u202-1inux-x64. tar.gz
2. Unzip the hadoop installation package
[root@bigdata01 soft]# tar -zxvf hadoop-3.2.0.tar.gz
There are two important directories under the hadoop directory, one is the bin directory and the other is the sbin directory
[root@bigdata01 soft]# cd hadoop-3.2.0 [root@bigdata01 hadoop-3.2.0]# ll total 184 drwxr-xr-x.2 1001 1002 203 Jan 8 2019 bin drwxr-xr-x.3 1001 1002 20 Jan 8 2019 etc drwxr-xr-x.2 1001 1002 106 Jan 8 2019 include drwxr-xr-x.3 1001 1002 20 Jan 8 2019 1ib drwxr-xr-x.4 1001 1002 4096 Jan 8 2019 libexec -rW-rW-r--.1 1001 1002 150569 Oct 19 2018 LICENSE. txt -rw-rw-r--.1 1001 1002 22125 Oct 19 2018 NOTICE . txt -rw-rw-r--.1 1001 1002 1361 Oct 19 2018 README.txt drwxr-xr-x.3 1001 1002 4096 Jan 8 2019 sbin drwxr-xr-x.4 1001 1002 31 Jan 8 2019 share
Let's look at the bin directory first. It contains scripts such as hdfs and yarn, which are mainly used to operate the HDFS and YARN components of the Hadoop cluster.
The sbin directory contains many scripts whose names start with start or stop; these scripts are responsible for starting or stopping components of the cluster.
Another important directory is etc/hadoop, which holds Hadoop's configuration files; installing Hadoop mainly consists of modifying the files in this directory.
Because we will use scripts from both the bin and sbin directories, we configure environment variables for convenience.
[root@bigdata01 hadoop-3.2.0]# vi /etc/profile
.......
export JAVA_HOME=/data/soft/jdk1.8
export HADOOP_HOME=/data/soft/hadoop-3.2.0
export PATH=.:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
[root@bigdata01 hadoop-3.2.0]# source /etc/profile
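As an optional sanity check, you can confirm that the environment variables took effect by running the standard hadoop version command; it should report the version we just unpacked, followed by build details:

[root@bigdata01 hadoop-3.2.0]# hadoop version
Hadoop 3.2.0
...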
3: Modify Hadoop related configuration files
Go to the directory where the configuration file is located
[root@bigdata01 hadoop-3.2.0]# cd etc/hadoop/
[root@bigdata01 hadoop]#
Mainly modify the following files:
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
workers
First, modify the hadoop-env.sh file by appending the following environment variable information to the end of the file.
JAVA_HOME: Specify the installation location of java
HADOOP_LOG_DIR: Storage directory of hadoop logs
[root@bigdata01 hadoop]# vi hadoop-env.sh
.......
export JAVA_HOME=/data/soft/jdk1.8
export HADOOP_LOG_DIR=/data/hadoop_repo/logs/hadoop
Modify the core-site.xml file
Note: the hostname in the fs.defaultFS property needs to match the hostname you configured.

[root@bigdata01 hadoop]# vi core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://bigdata01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop_repo</value>
    </property>
</configuration>
Modify the hdfs-site.xml file and set the number of file copies in hdfs to 1, because now the pseudo-distributed cluster has only one node
[root@bigdata01 hadoop]# vi hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
Modify mapred-site.xml to set the resource scheduling framework used by mapreduce
[root@bigdata01 hadoop]# vi mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Modify yarn-site.xml to set the whitelist of services and environment variables that support running on yarn
[root@bigdata01 hadoop]# vi yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
Modify the workers file and set the hostnames of the slave nodes in the cluster. There is only one node here, so just fill in bigdata01.
[root@bigdata01 hadoop]# vi workers
bigdata01
At this point the configuration files have been modified, but the cluster cannot be started yet, because HDFS is a distributed file system and every file system must be formatted before it is used, just like a brand-new disk has to be formatted before an operating system can be installed on it.
4: Format HDFS
[root@bigdata01 hadoop]# cd /data/soft/hadoop-3.2.0
[root@bigdata01 hadoop-3.2.0]# bin/hdfs namenode -format
If you can see the message successfully formatted, the formatting is successful.
If an error is reported, it is usually caused by a problem in the configuration files; of course, you need to analyze it based on the specific error message.
Note: the format operation should be performed only once. If formatting fails, you can fix the configuration files and format again; but once formatting has succeeded, do not run it again, otherwise the cluster will have problems.
If you really do need to format again, first delete everything under the /data/hadoop_repo directory and then rerun the format command, as sketched below.
Think of it like a new disk: you format it once before installing the operating system for the first time, and you would not format it again casually, because after formatting the operating system has to be reinstalled.
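A minimal sketch of the re-format procedure, assuming /data/hadoop_repo (the hadoop.tmp.dir configured in core-site.xml) is the only directory holding the old cluster data:

[root@bigdata01 hadoop-3.2.0]# rm -rf /data/hadoop_repo/*
[root@bigdata01 hadoop-3.2.0]# bin/hdfs namenode -format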
5: Start a pseudo-distributed cluster
Use the start-all.sh script in the sbin directory
[root@bigdata01 hadoop-3.2.0]# sbin/start-all.sh
When executing it, a lot of ERROR messages appear, indicating that the user variables for the HDFS and YARN processes are missing.
The solution is as follows:
Modify the two script files start-dfs.sh and stop-dfs.sh in the sbin directory and add the following content at the top of each file.
[root@bigdata01 hadoop-3.2.0]# cd sbin/
[root@bigdata01 sbin]# vi start-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

[root@bigdata01 sbin]# vi stop-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Modify the two script files start-yarn.sh and stop-yarn.sh in the sbin directory and add the following content at the top of each file.
[root@bigdata01 sbin]# vi start-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

[root@bigdata01 sbin]# vi stop-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Restart the cluster
[root@bigdata01 sbin]# cd /data/soft/hadoop-3.2.0
[root@bigdata01 hadoop-3.2.0]# sbin/start-all.sh
6: Verify cluster process information
Execute the jps command to view the cluster's process information. Besides the Jps process itself, five processes must be present for the cluster to be considered started normally (a sample of the expected output follows the command below).
[root@bigdata01 hadoop-3.2.0]# jps
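The output should look roughly like the following; the process names are what matter, and the PIDs shown here are only illustrative:

5268 NameNode
5389 DataNode
5590 SecondaryNameNode
5851 ResourceManager
5968 NodeManager
6285 Jps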
You can also verify that the cluster services are working through the web UIs:
HDFS web UI: http://192.168.10.130:9870
YARN web UI: http://192.168.10.130:8088
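If you prefer the command line, a quick check from the Linux machine itself (assuming curl is available) is to request each port and look at the HTTP status code; any normal status code (for example 200 or a redirect) means the service is answering:

[root@bigdata01 hadoop-3.2.0]# curl -s -o /dev/null -w "%{http_code}\n" http://192.168.10.130:9870
[root@bigdata01 hadoop-3.2.0]# curl -s -o /dev/null -w "%{http_code}\n" http://192.168.10.130:8088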
If you want to access these UIs by hostname, you need to modify the hosts file on the Windows machine.
The file is located at: C:\Windows\System32\drivers\etc\hosts
Add the following line to the file; it is simply the ip and hostname of the Linux virtual machine. With this mapping in place, you can access the Linux virtual machine by hostname from the Windows machine.
192.168.10.130 bigdata01
Note: if this file cannot be saved, it is usually a permission problem; open your editor as administrator and try again.
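To verify the mapping (optional), open a Windows Command Prompt and ping the hostname; if the hosts entry is correct, the name should resolve to 192.168.10.130:

C:\> ping bigdata01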
7: Stop the cluster
If you need to modify the cluster configuration files, or need to stop the cluster for any other reason, you can use the following command:
[root@bigdata01 hadoop-3.2.0]# sbin/stop-all.sh
Distributed cluster installation
Environment Preparation: Three Nodes
bigdata01 192.168.10.130
bigdata02 192.168.10.131
bigdata03 192.168.10.132
Note: the basic environment of every node must be configured first: ip, hostname, firewalld, ssh password-free login, and JDK. We do not have enough nodes yet; following the content of the first week, create the additional nodes by cloning. Before that, remove the Hadoop previously installed on bigdata01: delete the extracted directory and remove the Hadoop entries from the environment variables.
Note: on the bigdata01 node we need to delete the hadoop_repo directory under /data and the hadoop-3.2.0 directory under /data/soft to restore the node to a clean state, because these directories record information about the previous pseudo-distributed cluster.
[root@bigdata01 ~]# rm -rf /data/soft/hadoop-3.2.0
[root@bigdata01 ~]# rm -rf /data/hadoop_repo
Suppose we now have three linux machines, all with brand new environments.
Let's get started.
Note: The configuration steps for basic environments such as ip, hostname, firewalld, and JDK for these three machines are no longer recorded here.
bigdata01
bigdata02
bigdata03
The basic environments of these three machines (ip, hostname, firewalld, ssh password-free login, and JDK) are now configured.
Configuring these basics is not the end of it; a few more settings still need to be completed.
Configure /etc/hosts
Because the master node needs to connect to the two slave nodes remotely, it must be able to resolve their hostnames and use those hostnames for remote access; by default only ip-based access works. To use hostnames, the ip and hostname of the corresponding machines must be configured in each node's /etc/hosts file.
So we configure the following entries in the /etc/hosts file of bigdata01. It is best to include the current node's own entry as well, so that the file's content is universal and can be copied directly to the other two slave nodes.
[root@bigdata01 ~]# vi /etc/hosts
192.168.10.130 bigdata01
192.168.10.131 bigdata02
192.168.10.132 bigdata03
Modify the /etc/hosts file of bigdata02
[root@bigdata02 ~]# vi /etc/hosts
192.168.10.130 bigdata01
192.168.10.131 bigdata02
192.168.10.132 bigdata03
Modify the /etc/hosts file of bigdata03
[root@bigdata03 ~]# vi /etc/hosts
192.168.10.130 bigdata01
192.168.10.131 bigdata02
192.168.10.132 bigdata03
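As a quick optional check, verify from bigdata01 that the hostnames now resolve; the same check can be repeated on the other two nodes:

[root@bigdata01 ~]# ping -c 1 bigdata02
[root@bigdata01 ~]# ping -c 1 bigdata03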
Time synchronization between cluster nodes
Whenever a cluster involves multiple nodes, the time on these nodes needs to be kept synchronized. If the clocks drift too far apart, the stability of the cluster will be affected and it may even cause problems in the cluster.
First operate on the bigdata01 node
Use ntpdate -u ntp.sjtu.edu.cn to synchronize the time, but when executing it for the first time it reports that the ntpdate command cannot be found.
[root@bigdata01 ~]# ntpdate -u ntp.sjtu.edu.cn
-bash: ntpdate: command not found
The ntpdate command is not installed by default; install it online with yum by executing yum install -y ntpdate.
[root@bigdata01 ~]# yum install -y ntpdate
Then manually execute ntpdate -u ntp.sjtu.edu.cn to confirm whether it can be executed normally
[root@bigdata01 ~]# ntpdate -u ntp.sjtu.edu.cn
It is recommended to add this time-synchronization command to the Linux crontab scheduler so that it runs every minute; a sample crontab entry is sketched after the command below.
[root@bigdata01 ~]# vi /etc/crontab
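A minimal sketch of the entry to append to /etc/crontab, assuming ntpdate was installed to /usr/sbin (the system crontab format requires the user field, here root):

* * * * * root /usr/sbin/ntpdate -u ntp.sjtu.edu.cn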
Then configure the same time synchronization (including the crontab entry) on the bigdata02 and bigdata03 nodes.
Operate on the bigdata02 node
[root@bigdata02 ~]# yum install -y ntpdate
[root@bigdata02 ~]# vi /etc/crontab
Operate on the bigdata03 node
[root@bigdata03 ~]# yum install -y ntpdate
[root@bigdata03 ~]# vi /etc/crontab
### Completing SSH password-free login
Note: at the moment each machine can only log in to itself without a password. In the end the master node must be able to log in to every node without a password, so the password-free login setup needs to be completed.
First, execute the following commands on the bigdata01 machine to copy its public key to the two slave nodes (you will be prompted for each slave's root password, since password-free login to them is not in place yet).
[root@bigdata01 ~]# scp ~/.ssh/authorized_keys bigdata02:~/
[root@bigdata01 ~]# scp ~/.ssh/authorized_keys bigdata03:~/
Then execute on bigdata02 and bigdata03
bigdata02:
[root@bigdata02 ~]# cat ~/authorized_keys >> ~/.ssh/authorized_keys
bigdata03:
[root@bigdata03 ~]# cat ~/authorized_keys >> ~/.ssh/authorized_keys
To verify, use ssh from the bigdata01 node to connect to the two slave nodes. If no password is required, the setup succeeded, and the master node can now log in to every node without a password.
[root@bigdata01 ~]# ssh bigdata02
[root@bigdata02 ~]# exit
[root@bigdata01 ~]# ssh bigdata03
[root@bigdata03 ~]# exit
Install Hadoop
First install it on the bigdata01 node.
1: Upload the hadoop-3.2.0.tar.gz installation package to the /data/soft directory of the linux machine
[root@bigdata01 soft]# ll
2. Unzip the hadoop installation package
[root@bigdata01 soft]# tar -zxvf hadoop-3.2.0.tar.gz
3. Modify hadoop related configuration files
Go to the directory where the configuration file is located
[root@bigdata01 soft]# cd hadoop-3.2.0/etc/hadoop/
[root@bigdata01 hadoop]#
First modify the hadoop-env.sh file and add environment variable information at the end of the file
[root@bigdata01 hadoop]# vi hadoop-env.sh
export JAVA_HOME=/data/soft/jdk1.8
export HADOOP_LOG_DIR=/data/hadoop_repo/logs/hadoop
Modify the core-site.xml file, note that the hostname in the fs.defaultFS property needs to be consistent with the hostname of the master node
[root@bigdata01 hadoop]# vi core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://bigdata01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop_repo</value>
    </property>
</configuration>
Modify the hdfs-site.xml file: set the number of file replicas in HDFS to 2 (at most 2, because the cluster now has two slave nodes), and configure the node on which the SecondaryNameNode process runs.
[root@bigdata01 hadoop]# vi hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>bigdata01:50090</value>
    </property>
</configuration>
Modify mapred-site.xml to set the resource scheduling framework used by mapreduce
[root@bigdata01 hadoop]# vi mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Modify yarn-site.xml to set the whitelist of services and environment variables that support running on yarn
Note that for distributed clusters, the hostname of the resourcemanager needs to be set in this configuration file, otherwise the nodemanager cannot find the resourcemanager node.
[root@bigdata01 hadoop]# vi yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>bigdata01</value>
    </property>
</configuration>
Modify the workers file and add the hostnames of all slave nodes, one hostname per line.
[root@bigdata01 hadoop]# vi workers
bigdata02
bigdata03
Modify the startup script
Modify the two script files start-dfs.sh and stop-dfs.sh, and add the following content at the top of each file.
[root@bigdata01 hadoop]# cd /data/soft/hadoop-3.2.0/sbin
[root@bigdata01 sbin]# vi start-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

[root@bigdata01 sbin]# vi stop-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Modify the two script files start-yarn.sh and stop-yarn.sh, and add the following content at the top of each file.
[root@bigdata01 sbin]# vi start-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

[root@bigdata01 sbin]# vi stop-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
4: Copy the installation package with the modified configuration on the bigdata01 node to the other two slave nodes
[root@bigdata01 sbin]# cd /data/soft/
[root@bigdata01 soft]# scp -rq hadoop-3.2.0 bigdata02:/data/soft/
[root@bigdata01 soft]# scp -rq hadoop-3.2.0 bigdata03:/data/soft/
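As an optional check, confirm from bigdata01 that the directory arrived on both slaves; running the remote command works without a password because password-free login is already set up:

[root@bigdata01 soft]# ssh bigdata02 "ls -d /data/soft/hadoop-3.2.0"
[root@bigdata01 soft]# ssh bigdata03 "ls -d /data/soft/hadoop-3.2.0"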
5. Format HDFS on the bigdata01 node
[root@bigdata01 soft]# cd /data/soft/hadoop-3.2.0
[root@bigdata01 hadoop-3.2.0]# bin/hdfs namenode -format
6. Start the cluster and execute the following command on the bigdata01 node
[root@bigdata01 hadoop-3.2.0]# sbin/start-all.sh
7. Verify the cluster
Execute the jps command on each of the three machines; the processes you should see on each node are sketched after the commands below.
Execute on the bigdata01 node
[root@bigdata01 hadoop-3.2.0]# jps
Execute on the bigdata02 node
[root@bigdata02 ~]# jps
Execute on the bigdata03 node
[root@bigdata03 ~]# jps
8. Stop the cluster
Execute the stop command on the bigdata01 node
[root@bigdata01 hadoop-3.2.0]# sbin/stop-all.sh
At this point, the Hadoop distributed cluster has been installed successfully.
This is my first time writing up these notes; if there are any mistakes in the article, you are welcome to point them out in the comments.