Building and configuring a Hadoop cluster
Installing CentOS under VMware
1, Install the virtual machine software (VMware) first
After installation, right-click the VMware shortcut and choose to run it as administrator
You can also set it to run as administrator every time you open it
2, Prepare a CentOS image to install. Version 7 is used here
3, Open VMware and create a new virtual machine
When you reach this screen, you can wait 60 seconds or press the Tab key to go to the next step.
I chose English here. You can choose Chinese instead.
Continue adding
Set the password
The installation then proceeds
4, Configure the network so that the virtual machine can ping the local address and Baidu
1. Restart the network service
service network restart
2. Modify the configuration file
vi /etc/sysconfig/network-scripts/ifcfg-eth0
If vi opens an empty file, the network card has a different name. Use cd to enter the directory below, use "ll" to find the file whose name starts with ifcfg, and edit that file instead
On my machine it is ifcfg-ens33
cd /etc/sysconfig/network-scripts
ll
If the prompt shows $ here, switch to the superuser (for example with su) so that the prompt becomes #
DEVICE is the device name
HWADDR is the hardware (MAC) address of the network card
ONBOOT sets whether the network card is activated when the system starts
BOOTPROTO can be set to dhcp, none, bootp, or static
dhcp: the network card obtains its address through the DHCP protocol
none: no protocol is used
bootp: the BOOTP protocol is used
static: a statically configured address is used
In the file, change ONBOOT=no to ONBOOT=yes and BOOTPROTO=dhcp to BOOTPROTO=static
Then add the IP address (IPADDR), subnet mask (NETMASK), gateway (GATEWAY), and DNS server (DNS1)
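The same edit can be scripted. The sketch below works on a scratch copy rather than the real file, so it is safe to try. The sample file contents, the netmask, the gateway, and the DNS value are assumptions for illustration; only the IP address 192.168.235.233 comes from this tutorial.

```shell
# Work on a scratch copy so nothing under /etc is touched.
workdir=$(mktemp -d)
cfg="$workdir/ifcfg-ens33"

# Illustrative stand-in for an unmodified interface file.
cat > "$cfg" <<'EOF'
TYPE=Ethernet
BOOTPROTO=dhcp
NAME=ens33
DEVICE=ens33
ONBOOT=no
EOF

# Switch to a static address and activate the card at boot.
sed -e 's/^BOOTPROTO=.*/BOOTPROTO=static/' \
    -e 's/^ONBOOT=.*/ONBOOT=yes/' "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"

# Append the addressing fields (assumed values; match them to your NAT subnet).
cat >> "$cfg" <<'EOF'
IPADDR=192.168.235.233
NETMASK=255.255.255.0
GATEWAY=192.168.235.2
DNS1=192.168.235.2
EOF

cat "$cfg"
```

On the real machine, the gateway should be the one shown in the NAT settings of the virtual network editor.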
This is what my file looked like before modification (search online if you don't know how to edit and save a file in vi)
Click Virtual Network Editor in the Edit menu
You need to run VMware as administrator to change the settings
When running as administrator, there are three options here. Choose NAT mode
The last segment of the address here should be different, preferably 3 digits
These two values should be the same
After modifying, save and exit, then restart the service
service network restart
View the IP address
ifconfig
3. Ping Baidu and the local address to see if the connection works
ping www.baidu.com
ping 192.168.235.233
Press Ctrl + C to exit ping
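If you prefer a check that stops by itself, ping can be given a probe count. The helper below is a small sketch; the function name check_host is made up for this example, and loopback is used only so that it runs anywhere.

```shell
# Report whether a host answers within a bounded number of probes.
# -c 3 stops after three echo requests, so no Ctrl + C is needed.
check_host() {
  if ping -c 3 "$1" > /dev/null 2>&1; then
    echo "$1: reachable"
  else
    echo "$1: unreachable"
  fi
}

# Loopback keeps the sketch runnable anywhere; on the virtual machine,
# pass www.baidu.com or 192.168.235.233 instead.
check_host 127.0.0.1
```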
4. Set up the yum source
Run cd /etc/yum.repos.d to enter the /etc/yum.repos.d directory
cd /etc/yum.repos.d
List the files in the yum.repos.d directory
CentOS-Base.repo is the network source
CentOS-Media.repo is the local source
5. Execute the rename command
Renaming a repo file disables it, because yum only reads files ending in .repo; change the name back when you need it again. To download over the network, rename the local source. The directory contains these files:
CentOS-Base.repo CentOS-Debuginfo.repo CentOS-fasttrack.repo CentOS-Vault.repo CentOS-Media.repo
mv CentOS-Media.repo CentOS-Media.repo.bak
vi CentOS-Media.repo.bak
Change the value of baseurl to file:///media/, the value of gpgcheck to 0, and the value of enabled to 1
Before modification:
After modification
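For reference, a modified media stanza might look like the example below. It is written to a scratch file here so the real repo file is not touched. The section name and the gpgkey line are assumptions based on a typical CentOS 7 file; only the baseurl, gpgcheck, and enabled values come from this tutorial.

```shell
# Write the example to a scratch file instead of /etc/yum.repos.d.
repo_example=$(mktemp)
cat > "$repo_example" <<'EOF'
[c7-media]
name=CentOS-$releasever - Media
baseurl=file:///media/
gpgcheck=0
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
EOF
cat "$repo_example"
```

Keep in mind that yum only picks up files whose names end in .repo, so the edited file needs that suffix for the local source to take effect.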
6. Mount
Run one of the following commands to mount the installation disc
mount /dev/dvd /media    # CentOS 6.8
mount /dev/cdrom /media  # CentOS 7
If the first device does not exist, use the second command
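Choosing between the two device names can be sketched as a small check. The helper name first_existing is made up for this example, and the script only prints the mount command instead of running it, because mounting needs root and an attached disc.

```shell
# Print the first argument that exists on this machine; fail if none does.
first_existing() {
  for candidate in "$@"; do
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

if dev=$(first_existing /dev/dvd /dev/cdrom); then
  echo "would run: mount $dev /media"
else
  echo "no disc device found; check that the ISO is connected to the virtual machine"
fi
```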
7. Clear the yum cache so that the modified source takes effect
yum clean all
8. Install software from the Alibaba Cloud (Aliyun) mirror
Open the Alibaba Cloud mirror site
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
# Alternative: wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
9. Install Java
Connect with Xshell 7
Press Ctrl + Alt + F to open the file transfer window
Upload the JDK package to the /opt directory
Install Java from the command line
Enter the /opt directory and install
rpm -ivh jdk-7u80-linux-x64.rpm
V. Build a fully distributed Hadoop cluster
1. Upload the Hadoop installation package hadoop-xxxx to the /opt directory of the virtual machine master
Then enter the /opt directory and run the following command to extract the Hadoop installation package to /usr/local
tar -zxf hadoop-xxxx -C /usr/local
All files involved in Hadoop configuration are in /usr/local/hadoop-2.6.4/etc/hadoop/
2. Modify the core-site.xml file
vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/log/hadoop/tmp</value>
  </property>
</configuration>
Then create a tmp folder in the hadoop directory
mkdir tmp
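Since core-site.xml sets hadoop.tmp.dir to /var/log/hadoop/tmp, it may help to read that path back from the file and create exactly that directory. The sketch below assumes the name and value tags sit on adjacent lines, uses a scratch copy of the file, and only prints the mkdir command because the real path needs root.

```shell
# Scratch copy containing the property of interest.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/log/hadoop/tmp</value>
  </property>
</configuration>
EOF

# Find the <name> line, step to the next line, and print what is inside <value>.
tmp_dir=$(sed -n '/<name>hadoop.tmp.dir<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$conf")

echo "would run: mkdir -p $tmp_dir"
```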
3. Modify the hadoop-env.sh file
vi hadoop-env.sh
4. Modify the yarn-env.sh file
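The text does not show what is changed in these two files. A common change is to point JAVA_HOME at the installed JDK, and the sketch below shows that on scratch copies. The one-line stand-in files and the path /usr/java/jdk1.7.0_80 are assumptions; check the actual location with ls /usr/java after installing the RPM.

```shell
workdir=$(mktemp -d)

# One-line stand-ins for the JAVA_HOME lines of the two scripts (assumed contents).
printf '%s\n' 'export JAVA_HOME=${JAVA_HOME}' > "$workdir/hadoop-env.sh"
printf '%s\n' '# export JAVA_HOME=/home/y/libexec/jdk1.6.0/' > "$workdir/yarn-env.sh"

# Assumed install path of the jdk-7u80 RPM.
jdk_home=/usr/java/jdk1.7.0_80

for f in hadoop-env.sh yarn-env.sh; do
  # Replace the existing (possibly commented-out) JAVA_HOME line with an explicit path.
  sed "s|^#\{0,1\} *export JAVA_HOME=.*|export JAVA_HOME=$jdk_home|" \
    "$workdir/$f" > "$workdir/$f.new" && mv "$workdir/$f.new" "$workdir/$f"
done

cat "$workdir/hadoop-env.sh" "$workdir/yarn-env.sh"
```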
5. Copy mapred-site.xml.template, name the copy mapred-site.xml, and modify it
cp mapred-site.xml.template mapred-site.xml
Modify the mapred-site.xml file
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- jobhistory properties -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
6. Modify the yarn-site.xml file
vi yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/hadoop/yarn/local</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/data/tmp/logs</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://master:19888/jobhistory/logs/</value>
    <description>URL for job history server</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
  </property>
</configuration>
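Misspelled or unclosed tags are easy to introduce when typing these files by hand. A rough sanity check is to compare how many opening and closing value tags a file contains. The sketch below runs on a deliberately broken sample; the helper name count_in_file is made up for this example, and it is not a replacement for a real XML parser.

```shell
# Count occurrences of a literal string in a file (several may share one line).
count_in_file() {
  { grep -o -- "$1" "$2" || true; } | wc -l | tr -d ' '
}

# Deliberately broken sample: the opening tag is misspelled.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<property>
  <name>mapreduce.jobhistory.address</name>
  <valve>master:10020</value>
</property>
EOF

opens=$(count_in_file '<value>' "$sample")
closes=$(count_in_file '</value>' "$sample")

if [ "$opens" -ne "$closes" ]; then
  echo "unbalanced value tags: $opens opening, $closes closing"
fi
```

To check a real file, pass its path in place of the sample, for example yarn-site.xml in the Hadoop configuration directory.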
7. Modify the slaves file
vi slaves
slave1
slave2
slave3
8. Modify the hdfs-site.xml file
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
9. Go to the /etc directory and modify the hosts file
Append the following lines at the end, adjusted to your own IP addresses and host names
192.168.235.233 master master.centos.com
192.168.235.234 slave1 slave1.centos.com
192.168.235.235 slave2 slave2.centos.com
192.168.235.236 slave3 slave3.centos.com
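If you script this step, it is worth making the append idempotent so that running it twice does not duplicate entries. The sketch below uses a scratch file in place of /etc/hosts; the helper name add_host is made up for this example.

```shell
# Scratch stand-in for /etc/hosts.
hosts_file=$(mktemp)
printf '127.0.0.1 localhost\n' > "$hosts_file"

# $1 = IP, $2 = short name, $3 = full name; skip names that are already mapped.
add_host() {
  if ! grep -qw -- "$2" "$hosts_file"; then
    printf '%s %s %s\n' "$1" "$2" "$3" >> "$hosts_file"
  fi
}

# Run the whole set twice to show that the second pass adds nothing.
for pass in 1 2; do
  add_host 192.168.235.233 master master.centos.com
  add_host 192.168.235.234 slave1 slave1.centos.com
  add_host 192.168.235.235 slave2 slave2.centos.com
  add_host 192.168.235.236 slave3 slave3.centos.com
done

cat "$hosts_file"
```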
10. After the modifications, shut down the virtual machine and clone it
Right-click master, then choose Manage and Clone
11. Open virtual machine slave1
(1) Run the following command to delete the 70-persistent-net.rules file
rm -rf /etc/udev/rules.d/70-persistent-net.rules
(2) Run ifconfig -a to view HWADDR, and record it (this value is different for each machine)
If you cannot find it this way, use another method (for example, the graphical interface, if you have one)
(3) Modify the /etc/sysconfig/network-scripts/ifcfg-eth0 file: set HWADDR to the address recorded above, change IPADDR, and comment out the line beginning with UUID
vi /etc/sysconfig/network-scripts/ifcfg-eth0
If vi opens an empty file, the network card has a different name. Use cd to enter the directory below, use "ll" to find the file whose name starts with ifcfg, and edit that file instead
On my machine it is ifcfg-ens33
cd /etc/sysconfig/network-scripts
ll
If the prompt shows $ here, switch to the superuser (for example with su) so that the prompt becomes #
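The three changes in step (3) can be sketched with sed on a scratch copy. The sample file contents and both hardware addresses are placeholders for illustration; the new IP address 192.168.235.234 is the one listed for slave1 in the hosts file.

```shell
# Scratch stand-in for the interface file inherited from the master clone.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
DEVICE=eth0
HWADDR=00:0C:29:AA:BB:CC
UUID=11111111-2222-3333-4444-555555555555
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.235.233
EOF

new_hwaddr=00:0C:29:DD:EE:FF   # placeholder; use the value shown by ifconfig -a
new_ip=192.168.235.234         # slave1, as listed in the hosts file

# Update HWADDR and IPADDR, and comment out the UUID line.
sed -e "s/^HWADDR=.*/HWADDR=$new_hwaddr/" \
    -e "s/^IPADDR=.*/IPADDR=$new_ip/" \
    -e 's/^UUID=/#UUID=/' "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"

cat "$cfg"
```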
(4) Change the machine name by running the following command
vi /etc/sysconfig/network
Open the file and change the machine name to slave1.centos.com
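As a sketch, the rename can also be done with sed. The example edits a scratch copy, and its starting contents (including the old name master.centos.com inherited from the clone) are assumptions.

```shell
# Scratch stand-in for /etc/sysconfig/network as cloned from master.
net_cfg=$(mktemp)
printf 'NETWORKING=yes\nHOSTNAME=master.centos.com\n' > "$net_cfg"

new_name=slave1.centos.com

# Rewrite the HOSTNAME line and leave everything else untouched.
sed "s/^HOSTNAME=.*/HOSTNAME=$new_name/" "$net_cfg" > "$net_cfg.new" && mv "$net_cfg.new" "$net_cfg"

cat "$net_cfg"

# On CentOS 7 the host name is normally set with:
#   hostnamectl set-hostname slave1.centos.com
```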
(5) Restart the virtual machine using the reboot command.
(6) Verify that slave1 is configured correctly by pinging slave1 from master. If the ping succeeds, the configuration works.
Note that both the master and slave1 virtual machines must be running to ping each other