This case package: link: https://pan.baidu.com/s/1zABhjj2umontXe2CYBW_DQ
Extraction code: 1123 (if the link fails, comment below and I will update it in time)
Table of contents
(1) Configure the master node of the Hadoop cluster
1. Enter the configuration file:
2. Modify the hadoop-env.sh file
3. Configure the core-site.xml file
4. Configure the hdfs-site.xml file
5. Configure the mapred-site.xml file
6. Configure the yarn-site.xml file
7. Set the slave node, that is, modify the workers file
(2) Distribute the configuration file of the cluster master node to other nodes
(3) Format the file system
(4) Start the Hadoop cluster
(5) View the Web interface
1. Turn off the firewall on the three virtual machines
2. Add the IP mapping of the cluster service on the Windows host
(1) Configure the master node of the Hadoop cluster
1. Enter the configuration file:
cd /usr/local/hadoop/etc/hadoop
2. Modify the hadoop-env.sh file
Find the JAVA_HOME parameter location and add the JDK path.
sudo vi hadoop-env.sh
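For example, assuming the JDK is installed under /usr/local/jdk (the same path that is copied to the slave nodes later in this guide), the JAVA_HOME line would look like this:
export JAVA_HOME=/usr/local/jdk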
3. Configure the core-site.xml file
sudo vi core-site.xml
Edit core-site.xml file content
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
</configuration>
This configures the host that runs the HDFS NameNode process (the master node of the Hadoop cluster) and the temporary directory for data generated while the cluster runs.
4. Configure the hdfs-site.xml file
This file is used to configure the two HDFS processes, the NameNode and the DataNode
Open the hdfs-site.xml file
sudo vi hdfs-site.xml
Edit hdfs-site.xml file content
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave01:50090</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>master:50070</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
The configuration above sets the host and port of the SecondaryNameNode (slave01:50090) and of the NameNode web UI (master:50070), the number of HDFS block replicas (the default is 3), and the local directories where the NameNode and DataNode store their data.
5. Configure the mapred-site.xml file
This file is used to specify the MapReduce running framework and is the core configuration file of MapReduce.
Open the mapred-site.xml file. The command is as follows.
sudo vi mapred-site.xml
Edit mapred-site.xml file content
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
</configuration>
6. Configure the yarn-site.xml file
This file is used to configure the manager of the YARN cluster (the ResourceManager and NodeManager)
Open the yarn-site.xml file
sudo vi yarn-site.xml
Edit yarn-site.xml file content
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
This specifies master as the host that runs the ResourceManager; the NodeManager auxiliary service must be set to mapreduce_shuffle for the default MapReduce programs to run correctly.
7. Set the slave node, that is, modify the workers file
This file lists all the slave nodes of the Hadoop cluster so that the startup scripts can start them in one step
Open the workers file
sudo vi workers
Delete the existing content of the file and add the new content, one hostname per line:
master
slave01
slave02
(2) Distribute the configuration file of the cluster master node to other nodes
The master executes the following commands:
sudo scp /etc/profile slave01:/etc/profile
sudo scp /etc/profile slave02:/etc/profile
sudo scp -r /usr/local/hadoop slave01:/usr/local
sudo scp -r /usr/local/hadoop slave02:/usr/local
sudo scp -r /usr/local/jdk slave01:/usr/local
sudo scp -r /usr/local/jdk slave02:/usr/local
scp ~/.bashrc slave01:~/
scp ~/.bashrc slave02:~/
After the commands above finish, execute the following on slave01 and slave02 respectively:
source /etc/profile
source ~/.bashrc
Execute the following commands on slave01 and slave02 to give the hadoop user ownership of the directory
cd /usr/local
sudo chown -R hadoop ./hadoop
(3) Format the file system
Execute the following command on the master node:
hdfs namenode -format
If the command output contains the message "has been successfully formatted", the HDFS file system has been formatted successfully and the cluster can be started officially. Otherwise, check whether the command was typed correctly and whether the earlier Hadoop installation and configuration steps were done properly. Note that the format command only needs to be executed once, before the first startup of the Hadoop cluster; it does not need to be run again on subsequent startups.
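As a quick sanity check (assuming the dfs.namenode.name.dir configured above), you can also verify that formatting created the NameNode metadata directory:
ls /usr/local/hadoop/tmp/dfs/name/current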
(4) Start the Hadoop cluster
start-dfs.sh
start-yarn.sh
After the two scripts finish, the Hadoop cluster is fully started
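To double-check, you can run jps on each node; based on the configuration above, master should show NameNode, ResourceManager, DataNode and NodeManager, slave01 should additionally show SecondaryNameNode, and the number of live DataNodes can be confirmed with a report (a quick verification sketch, not part of the original steps):
jps
hdfs dfsadmin -report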
(5) View the Web interface
1. Execute the following commands on all three virtual machines to turn off the firewall and disable it from starting at boot
sudo service iptables stop
sudo chkconfig iptables off
2. Add the IP mapping of the cluster service on the Windows host
The path for Windows10 and Windows7 operating systems is C:\Windows\System32\drivers\etc\hosts
Add the following content:
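The exact addresses depend on your own virtual machines; the entries below are only hypothetical placeholders, so replace the IPs with the real addresses of master, slave01 and slave02:
192.168.1.101 master
192.168.1.102 slave01
192.168.1.103 slave02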
After performing the above operations, you can open http://master:50070 and http://master:8088 in a browser on the Windows host or on any of the three virtual machines to view the status of the HDFS cluster and the YARN cluster, respectively.