Construction and configuration of a Hadoop cluster

Installing CentOS under VMware

1, Install the virtual machine software (VMware) first

After installation, right-click the VMware shortcut and choose to run it as administrator

You can also set it to run as administrator every time you open it

2, Install CentOS. Version 7 is used here

3, Open VMware and create a new virtual machine

When you reach the boot screen, you can either wait out the 60-second countdown or press the Tab key to go to the next step.

I choose English here. You can choose Chinese.

Continue through the remaining installer options

Set the root password

The installation then runs to completion

4, Configure the network so the machine can ping both the local address and Baidu

1. Restart the service

service network restart

2. Modify the configuration file

vi /etc/sysconfig/network-scripts/ifcfg-eth0

If the file opens empty, cd into the directory below and use "ll" to find the file whose name starts with ifcfg, then edit that file

On CentOS 7 the file is ifcfg-ens33:

cd /etc/sysconfig/network-scripts
ll

If your prompt ends with $, you are an ordinary user; switch to the superuser so the prompt becomes # before editing, as shown below
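
A minimal way to switch (you will be prompted for the root password):

su -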

DEVICE is the device name

HWADDR is the network card (MAC) address

ONBOOT sets whether the network card is activated when the system starts

BOOTPROTO can be set to dhcp, none, bootp or static

dhcp: the card obtains its address via the DHCP protocol

none: no protocol is used when configuring the card

bootp: the card is configured via the BOOTP protocol

static: the card uses a statically assigned address

In the file, change ONBOOT=no to ONBOOT=yes and BOOTPROTO=dhcp to BOOTPROTO=static

Then add the IP address (IPADDR), subnet mask (NETMASK), gateway (GATEWAY) and DNS server (DNS1), as in the sample below
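
For reference, a static configuration could look like this (sample values matching this tutorial's 192.168.235.x network; the VMware NAT gateway is usually the .2 address. Adjust everything to your own setup):

DEVICE=ens33
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.235.233
NETMASK=255.255.255.0
GATEWAY=192.168.235.2
DNS1=192.168.235.2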

(If you are unsure how to modify and save the file in vi: press i to edit, then Esc and :wq to save and quit)

Click "Virtual Network Editor" in the Edit menu

You must run the software as administrator to change these settings

When running as administrator, the three network types become selectable here; choose NAT mode

The last octet of the VM's IP address must differ from the gateway's; a host number of up to three digits (such as 233) works

The subnet and gateway shown in the NAT settings must match the GATEWAY and network configured in the ifcfg file

After modifying, save and exit, then restart the network service

service network restart

View the IP address

ifconfig

3. Ping Baidu and the local address to check that the connection works

ping www.baidu.com
ping 192.168.235.233

Press Ctrl+C to exit ping

4. Set up the yum source

Execute cd /etc/yum.repos.d to enter the /etc/yum.repos.d directory

cd /etc/yum.repos.d

Look at the files in the yum.repos.d directory

CentOS-Base.repo is the network (online) source

CentOS-Media.repo is the local (DVD) source

5. Execute the rename command

To use the local DVD as the yum source, disable the network repo files by renaming them (yum only reads files ending in .repo, so renaming a file disables that repo; rename it back whenever you need it again). The directory contains:

CentOS-Base.repo CentOS-Debuginfo.repo CentOS-fasttrack.repo CentOS-Vault.repo CentOS-Media.repo

Rename the network repos, for example:

mv CentOS-Base.repo CentOS-Base.repo.bak

Then edit the local repo file:

vi CentOS-Media.repo

Change the value of baseurl to file:///media/, change the value of gpgcheck to 0, and change the value of enabled to 1

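After the edit, the [c7-media] section should look roughly like this (a sketch; the exact stanza in your CentOS release may differ slightly):

[c7-media]
name=CentOS-$releasever - Media
baseurl=file:///media/
gpgcheck=0
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7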

6. Mount the installation media

Execute the appropriate command to mount the CentOS DVD under /media:

mount /dev/dvd /media   #Version 6.8
mount /dev/cdrom /media #Version 7

If the first device does not exist on your system, use the second command

7. Update the yum source.

yum clean all
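
Optionally, rebuild the metadata cache afterwards so the first install is faster:

yum makecache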

8. Install software using the Aliyun mirror

To install over the network instead, download Aliyun's repo file with either command:

curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
#wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo

9. Install Java

Connect to the virtual machine with Xshell 7

Press Ctrl+Alt+F to open the file transfer window and upload the JDK package

Put it in the /opt directory

Install Java from the command line: enter the /opt directory and run

rpm -ivh jdk-7u80-linux-x64.rpm
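
You can then check that the JDK was installed correctly:

java -version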

5, Build a fully distributed Hadoop cluster

1. Upload the Hadoop installation package hadoop-xxxxx to the /opt directory of the master virtual machine

Then enter the /opt directory and run the following to extract the Hadoop package:

tar -zxf hadoop-xxxx -C /usr/local

All of the files involved in configuring Hadoop are in /usr/local/hadoop-2.6.4/etc/hadoop/
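
Enter that directory before making the edits below:

cd /usr/local/hadoop-2.6.4/etc/hadoop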

2. Modify the core-site.xml file

vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/log/hadoop/tmp</value>
  </property>
</configuration>

Then create the tmp directory that hadoop.tmp.dir points to:

mkdir -p /var/log/hadoop/tmp

3. Modify the hadoop-env.sh file

vi hadoop-env.sh
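
The screenshot showing the edit is missing here; the usual change in hadoop-env.sh is to point JAVA_HOME at the JDK installed earlier. Assuming the jdk-7u80 RPM's default install path:

export JAVA_HOME=/usr/java/jdk1.7.0_80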

4. Modify the yarn-env.sh file
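
No edit is shown in the original; typically you open the file and set JAVA_HOME here as well (same assumed JDK path as above):

vi yarn-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_80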

5. Copy mapred-site.xml.template to mapred-site.xml, then modify the copy

cp mapred-site.xml.template mapred-site.xml

Modify the mapred-site.xml file:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- jobhistory properties -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>

6. Modify the yarn-site.xml file

vi yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/hadoop/yarn/local</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/data/tmp/logs</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://master:19888/jobhistory/logs/</value>
    <description>URL for job history server</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
  </property>
</configuration>

7. Modify the slaves file

vi slaves
slave1
slave2
slave3

8. Modify the hdfs-site.xml file

vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
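
Optionally, pre-create the HDFS directories configured above on the master (a convenience step; the paths must match the values in hdfs-site.xml):

mkdir -p /data/hadoop/hdfs/name /data/hadoop/hdfs/data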

9. Return to the original directory, then edit the /etc/hosts file

vi /etc/hosts

Append the following lines at the end, replacing the IP addresses and host names with your own:

192.168.235.233 master master.centos.com
192.168.235.234 slave1 slave1.centos.com
192.168.235.235 slave2 slave2.centos.com
192.168.235.236 slave3 slave3.centos.com

10. After modifying, shut down the master and clone it.

In VMware, right-click the master, then choose Manage > Clone

11. Power on the cloned virtual machine slave1

(1) Execute the following command to delete the 70-persistent-net.rules file

rm -rf /etc/udev/rules.d/70-persistent-net.rules

(2) Execute the command ifconfig -a, find the HWADDR (hardware address), and record it (this value is different for each machine)

(3) Modify the /etc/sysconfig/network-scripts/ifcfg-eth0 file: change HWADDR to the address recorded above, change IPADDR to the new machine's IP, and comment out the line beginning with UUID

vi /etc/sysconfig/network-scripts/ifcfg-eth0

As before, if the file opens empty, cd into the directory and use "ll" to find the file whose name starts with ifcfg (ifcfg-ens33 on CentOS 7), then edit it

cd /etc/sysconfig/network-scripts
ll

If your prompt shows $, switch to the superuser (su -) so that it becomes #

(4) Modify the machine name by executing the command

vi /etc/sysconfig/network

Open the file and change the machine name to slave1.centos.com
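
Note: editing /etc/sysconfig/network is the CentOS 6 way of setting the hostname; since version 7 is used here, you can set it directly instead:

hostnamectl set-hostname slave1.centos.com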

(5) Restart the virtual machine using the reboot command.

(6) Verify that slave1 is configured correctly: ping slave1 from the master. If the ping succeeds, the configuration works.

Note that both the master and slave1 virtual machines must be powered on to ping each other
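
For example, run this from the master (stop it with Ctrl+C):

ping slave1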

12. Clone the master to slave2 and slave3 in the same way, repeating steps (1) ~ (5) to modify the corresponding configuration of each.
