[big data] HBase cluster deployment

Rapid deployment summary

Static IP

File

vi /etc/sysconfig/network-scripts/ifcfg-ens33

Contents

BOOTPROTO=static
ONBOOT=yes

DNS1=192.168.56.1
GATEWAY=192.168.56.1
IPADDR=192.168.56.101

BOOTPROTO: how the IP address is obtained (static here); ONBOOT: whether the interface comes up at boot; DNS1: DNS server IP (the gateway address is used here); GATEWAY: gateway IP; IPADDR: this host's IP

Restart service

service network restart

View IP

ip addr

firewall

Stop temporarily

systemctl stop firewalld

View status

systemctl status firewalld

Disable permanently

systemctl disable firewalld

View status

systemctl list-unit-files | grep firewalld

In practice, use both: stop the firewall now and disable it at boot

host name

View

hostname

Temporary modification

hostname slave0

Permanent modification

vi /etc/hostname

The file /etc/hostname stores a single hostname; usually both the temporary and the permanent change are applied together

Domain name resolution

File

vi /etc/hosts

Append

192.168.25.10 slave0
192.168.25.11 slave1
192.168.25.12 slave2

Format: IP address, a space, then the hostname
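The append step above can be sketched as a script; a /tmp copy stands in for the real /etc/hosts, and the IPs mirror the example:

```shell
# Sketch: append the cluster mappings (demo copy in /tmp; the real file is /etc/hosts)
hosts=/tmp/hosts.demo
: > "$hosts"          # stand-in for the existing hosts file
cat >> "$hosts" <<'EOF'
192.168.25.10 slave0
192.168.25.11 slave1
192.168.25.12 slave2
EOF
grep slave1 "$hosts"
```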

Password free login

Generate a key pair

ssh-keygen -t rsa

Authorize the local machine itself (so it can log in to itself without a password)

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 

Copy the public key to each virtual machine that should accept passwordless logins

scp ~/.ssh/id_rsa.pub slave1:~
scp ~/.ssh/id_rsa.pub slave2:~

On each receiving virtual machine, append the public key to the authorized_keys file

mkdir -p ~/.ssh
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys

The host holding the private key can now log in, without a password, to any host whose authorized_keys contains the matching public key
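One pitfall worth noting: sshd refuses keys when ~/.ssh or authorized_keys is too permissive. A minimal sketch of the expected permissions, demonstrated on a throwaway directory in /tmp:

```shell
# Sketch: sshd requires ~/.ssh to be 700 and authorized_keys 600
# (demonstrated on a throwaway directory, not a real home)
home=/tmp/demo-home
mkdir -p "$home/.ssh"
chmod 700 "$home/.ssh"
touch "$home/.ssh/authorized_keys"
chmod 600 "$home/.ssh/authorized_keys"
stat -c '%a %n' "$home/.ssh" "$home/.ssh/authorized_keys"
```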

Skip DNS lookups in SSH

Configuration file

 vi /etc/ssh/sshd_config 

Set UseDNS to no and remove the leading # (uncomment the line)

Restart SSH service

service sshd restart

Time synchronization

Install ntpdate

yum install -y ntpdate

Synchronize the time

ntpdate -u ntp.aliyun.com

Set up a scheduled task

Find the ntpdate path

which ntpdate

File

vi /etc/crontab 

Cron entry, synchronizing once every 10 minutes (append)

*/10 * * * * root /usr/sbin/ntpdate -u ntp.aliyun.com
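For readers new to cron syntax, the */10 in the minute field means "every minute divisible by 10"; a quick illustrative check:

```shell
# Sketch: the minutes matched by a */10 field are 0, 10, 20, 30, 40, 50
seq 0 59 | awk '$1 % 10 == 0' | tr '\n' ' '
```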

Disable mail-check reminders in the shell

File

vi /etc/profile

Append

unset MAILCHECK

Update

source /etc/profile

Run programs without the ./ prefix

File

vi /etc/profile

Append

## Allow executables in the current directory to run directly
export PATH=.:$PATH

Update

source /etc/profile
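A small sketch of the effect (demo script in /tmp; note that putting "." on PATH is convenient but carries security trade-offs):

```shell
# Sketch: with "." on PATH, a script in the current directory runs without ./
cd /tmp
printf '#!/bin/sh\necho hello\n' > demo.sh
chmod +x demo.sh
PATH=.:$PATH demo.sh
```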

Java

Create directories /opt/software and /opt/module

mkdir -p /opt/software
mkdir -p /opt/module

Upload the installation package to /opt/software

Extract

tar -zxvf jdk-7u79-linux-x64.tar.gz -C /opt/module/

Environment variable /etc/profile (append)

## JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin

Update

source /etc/profile

Verify

java -version

Hadoop

Plan: slave0 runs the NameNode, slave1 the ResourceManager, and slave2 the SecondaryNameNode; all three nodes run a DataNode and a NodeManager

Upload the installation package to /opt/software

Extract

tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/

Environment variable /etc/profile (append)

## HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Update

source /etc/profile

Configuration files

Configuration directory: etc/hadoop/ under the installation directory

hadoop-env.sh (insert at the end)

yarn-env.sh (insert at the top)

mapred-env.sh (insert at the top)

export JAVA_HOME=/opt/module/jdk1.7.0_79

core-site.xml

<configuration>
    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://slave0:9000</value>
    </property>
    <!-- Directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <!-- Number of HDFS block replicas (the cluster has 3 slave nodes; 3 is also the default) -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Address of the HDFS SecondaryNameNode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave2:50090</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <!-- How Reducers obtain their data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Host of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>slave1</value>
    </property>
</configuration>

mapred-site.xml

Copy the template

cp mapred-site.xml.template mapred-site.xml

Configure

<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

slaves file (list the hostnames of all cluster nodes)

slave0
slave1
slave2

Distribute and run

Distribute

scp -rq /opt/module/hadoop-2.7.2/ slave1:/opt/module/
scp -rq /opt/module/hadoop-2.7.2/ slave2:/opt/module/

Format the NameNode (first start only, on slave0)

bin/hdfs namenode -format

Start

slave0 : HDFS

sbin/start-dfs.sh 

slave1 : YARN

sbin/start-yarn.sh 

Stop

slave1 : YARN

sbin/stop-yarn.sh 

slave0 : HDFS

sbin/stop-dfs.sh 

ZooKeeper

Upload the installation package to /opt/software

Extract

tar -zxvf zookeeper-3.4.10.tar.gz -C /opt/module/

Environment variable /etc/profile (append)

## ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin

Update

source /etc/profile

Create the data directory

Create the data/zkData directory under /opt/module/zookeeper-3.4.10/

mkdir -p data/zkData

Configuration file

In the conf/ folder under the installation directory

Copy template file

cp zoo_sample.cfg zoo.cfg

zoo.cfg

Modify

dataDir=/opt/module/zookeeper-3.4.10/data/zkData

Append at the end

## cluster
server.1=slave0:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
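The zoo.cfg edits above can be sketched as a script; the sample content and /tmp path here are stand-ins for the real conf/zoo_sample.cfg:

```shell
# Sketch: fix dataDir and append the ensemble, as described above
conf=/tmp/zoo.cfg
# stand-in for zoo_sample.cfg
printf 'tickTime=2000\ndataDir=/tmp/zookeeper\nclientPort=2181\n' > "$conf"
# point dataDir at the ZooKeeper data directory
sed -i 's|^dataDir=.*|dataDir=/opt/module/zookeeper-3.4.10/data/zkData|' "$conf"
# append the ensemble definition
cat >> "$conf" <<'EOF'
## cluster
server.1=slave0:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
EOF
grep -c '^server\.' "$conf"
```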

Create the myid file in the directory /opt/module/zookeeper-3.4.10/data/zkData/

cd /opt/module/zookeeper-3.4.10/data/zkData/
touch myid

Distribute and run

When distributing, set each node's myid file in /opt/module/zookeeper-3.4.10/data/zkData/ to the number of its server.N line in zoo.cfg: slave0 gets 1, slave1 gets 2, slave2 gets 3

vi /opt/module/zookeeper-3.4.10/data/zkData/myid
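A sketch of choosing the myid value per host (the hostname is hard-coded for the demo; on a real node you would use host=$(hostname), and /tmp stands in for the data directory):

```shell
# Sketch: map hostname -> myid, matching the server.N lines in zoo.cfg
zkdata=/tmp/zkData
mkdir -p "$zkdata"
host=slave1            # on a real node: host=$(hostname)
case "$host" in
  slave0) id=1 ;;
  slave1) id=2 ;;
  slave2) id=3 ;;
esac
echo "$id" > "$zkdata/myid"
cat "$zkdata/myid"
```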

Distribute

scp -rq /opt/module/zookeeper-3.4.10/ slave1:/opt/module/
scp -rq /opt/module/zookeeper-3.4.10/ slave2:/opt/module/

Run

Execute in the installation directory of ZooKeeper on the three virtual machines respectively

bin/zkServer.sh start

Stop

Execute in the installation directory of ZooKeeper on the three virtual machines respectively

bin/zkServer.sh stop

HBase

Upload the installation package to /opt/software

Extract

tar -zvxf hbase-1.3.3-bin.tar.gz -C /opt/module/

Environment variable /etc/profile (append)

## HBASE_HOME
export HBASE_HOME=/opt/module/hbase-1.3.3
export PATH=$PATH:$HBASE_HOME/bin

Update

source /etc/profile

Configure

Change to the /opt/module/hbase-1.3.3/conf/ directory

cd /opt/module/hbase-1.3.3/conf/

hbase-env.sh (insert at the top)

## JDK path
export JAVA_HOME=/opt/module/jdk1.7.0_79
## Use an external ZooKeeper ensemble
export HBASE_MANAGES_ZK=false

hbase-site.xml

<configuration>
  <!-- Raise the maximum allowed clock skew to relax the time-synchronization requirement -->
  <property>
    <name>hbase.master.maxclockskew</name>
    <value>180000</value>
  </property>
  <!-- HDFS root directory for HBase -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://slave0:9000/hbase</value>
  </property>
  <!-- Run as a distributed cluster -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper ensemble nodes -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>slave0,slave1,slave2</value>
  </property>
  <!-- ZooKeeper data directory -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/module/zookeeper-3.4.10/data/zkData</value>
  </property>
</configuration>

regionservers file (list the hostnames of all cluster nodes)

slave0
slave1
slave2

Copy the core-site.xml and hdfs-site.xml of Hadoop to the conf directory of HBase

cp /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml /opt/module/hbase-1.3.3/conf/
cp /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml /opt/module/hbase-1.3.3/conf/

Distribute and run

Distribute

scp -rq /opt/module/hbase-1.3.3/ slave1:/opt/module/
scp -rq /opt/module/hbase-1.3.3/ slave2:/opt/module/

Start (on slave0, the NameNode host, from the HBase installation directory)

bin/start-hbase.sh 

Stop

bin/stop-hbase.sh 

Virtual machine software installation

The following demonstrates downloading and installing VirtualBox 6.1.30 and VMware Workstation Pro 15.5.6.

VirtualBox virtual machine

Download

  1. Open with browser virtualbox.org website

Click "Downloads"

  1. Scroll down and find "VirtualBox older builds"

Click "VirtualBox older builds"

  1. Find "VirtualBox 6.1" and click "VirtualBox 6.1"

  1. Find "Windows hosts" under "VirtualBox 6.1.30" and click "Windows hosts" to download the installation package of VirtualBox 6.1.30

Install

  1. After downloading, right-click to open the installer

  1. next step

  1. You can click "Browse" to select an appropriate installation location, keep the default location here, and then click "next"

  1. Check the first three options as needed; checking "register file associations" is recommended. Next step

  1. The network will be temporarily disconnected during installation. Click "yes" when you are ready

  1. Click "Install" to start installing VirtualBox

  1. Three Windows security prompts will pop up in the middle, all of which click the "Install" button

  1. Click Finish to complete the installation of VirtualBox 6.1.30

  1. Ignore the update prompt, and uncheck "Check for Updates" under "File" → "Preferences" → "Update"

VMware virtual machine

Download

  1. Open with browser vmware.com website

Click "login" → "Customer Connect"

  1. Log in with your own VMware account; if you don't have one, register, or try entering the site directly:

  1. Go to the "CUSTOMER CONNECT home page" and click on all products

  1. Scroll to the bottom, find "VMware Workstation Pro", and click "view downloaded components"

  1. Select version 15.0, find VMware Workstation Pro for Windows in the product list, and click "go to download" on the right

  1. Select version 15.5.6 and click "download now" on the right of the file below

  1. Check "I agree to the terms and conditions listed in the end user license agreement" and click "accept" to download the installation package of VMware Workstation Pro 15.5.6

Install

  1. After downloading, get a file, right click → open

  1. Click "next"

  1. Check "I accept the terms in the license agreement (A)" and click "next"

  1. You can click "change..." above to select an appropriate installation location, keep the default location here, and then click "next"

  1. You can check the above two options as needed, cancel them all here, and click "next"

  1. You can check the above two options as needed, keep the default here, and click "next"

  1. Click Install to start installing VMware Workstation Pro 15.5.6

  1. After installation, click "finish" to exit the installation program

Register

  1. Open VMware Workstation Pro, click "help" → "enter license key"

  1. Search "VMware Workstation 15 Pro key" to find keys, or buy one yourself. Here are some keys found on the Internet:
VMware Workstation 15 Pro Key for:

UG5J2-0ME12-M89WY-NPWXX-WQH88
GA590-86Y05-4806Y-X4PEE-ZV8E0
YA18K-0WY8P-H85DY-L4NZG-X7RAD
UA5DR-2ZD4H-089FY-6YQ5T-YPRX6
B806Y-86Y05-GA590-X4PEE-ZV8E0
ZF582-0NW5N-H8D2P-0XZEE-Z22VA
YG5H2-ANZ0H-M8ERY-TXZZZ-YKRV8
UG5J2-0ME12-M89WY-NPWXX-WQH88
UA5DR-2ZD4H-089FY-6YQ5T-YPRX6
GA590-86Y05-4806Y-X4PEE-ZV8E0
ZF582-0NW5N-H8D2P-0XZEE-Z22VA
YA18K-0WY8P-H85DY-L4NZG-X7RAD

None of these keys is guaranteed to be valid, and each may be limited in time or in number of activations.

Enter the obtained key into the "license key" input box and click "OK"

  1. After activation, you can see the license information in the window opened by "help" → "about VMware Workstation(A)"

Install Linux Cluster

Using the CentOS 7 (1810) system, three virtual machines form a cluster.

CentOS 7 (1810) Download

  1. Open website: centos.org

Click the "Download" button to enter the Download page

  1. Scroll down, find and click the "then click here" tab under "Older Versions"

  1. Scroll down, find the "Base Distribution" panel under "Archived Versions", find "7 (1810)" and click the "Tree" tab on the right

  1. Scroll down, find the "isos" folder, and click in

  1. Click on the "x86_64" folder

  1. We choose to download the "CentOS-7-x86_64-Minimal-1810.iso" file

  1. After downloading, we get such a file

  1. (Optional) To make sure the download was not corrupted, compute the file's SHA256 checksum with a SHA256 tool and compare it with the official value 38d5d51d9d100fd73df031ffd6bd8b1297ce24660dc8c13a3b8b4534a4bd291c. If they differ, the download went wrong and must be repeated

VirtualBox virtual machine installation Linux Cluster

  1. Open VirtualBox and click "new" on the home page

  1. Give the virtual machine a name for easy reference later, e.g. "slave0" (a slave node). The folder is where the virtual machine is stored, so prefer a disk with plenty of free space. The type is "Linux" and the version is "Other Linux (64 bit)" or "RedHat (64 bit)". Next step

  1. Set the VM's memory according to your computer's RAM and how many VMs must run at once. The suggested 512 MB works, but 2 GB or more is better, and it can be adjusted later; 2 GB is used here. Next step

  1. Select "create virtual hard disk now" and click "create"

  1. Keep the default "VDI (VirtualBox disk image)", next

  1. It is recommended to use "dynamic allocation" for virtual hard disk allocation to save hard disk space. Next step

  1. Keep the default location; a disk size of 50 GB or more is recommended (this is the maximum space it can occupy, not an immediate allocation). Click "create"

  1. After creation, and before the first start, some configuration is needed. Select the virtual machine, click "Settings" on the home page, and in the newly opened settings window click "storage" → the empty optical drive ("no disk") → the CD icon to the right of the drive → "select virtual disk"

  1. Select the disk image we just downloaded and open

  1. VirtualBox's NAT network does not let the host actively connect to the virtual machine, so the VM needs a second network card dedicated to host-to-VM traffic. Click "network" → "network card 2" → "enable network connection", and select "host only network" as the connection type. Keep the default interface name, OK

  1. Select the virtual machine and click the start button

  1. Click inside the virtual machine window that opens; a prompt explains that the keyboard and mouse will be captured and used exclusively by the VM, and that pressing the right Ctrl key releases them. You can check "don't prompt again", then click the capture button to start operating the VM

  1. You can turn off the message prompt (it says the keyboard and mouse will be captured). In the VM, press the up arrow "↑" to move the selection to "Install CentOS 7", and press Enter

  1. You can turn off the message prompt (it says your virtual machine supports automatic mouse switching, i.e. the mouse will not be captured). Select a language you like and click "continue (C)"

  1. Confirm whether the "date and time" is set correctly. If not, it needs to be manually changed, and then click "installation location (D)" to select an installation location

  1. Choose the disk according to your own needs. Here I keep the default and directly click "finish (D)"

  1. After coming out, click "start installation" to start installing CentOS

  1. During installation, click "root password" to set the password of root

  1. Enter the password twice. If the password is too short, you need to press "finish (D)" twice to complete the setting

  1. Click "restart (R)" after installation to reboot the virtual machine

  1. Seeing this login interface after restarting indicates that the system has been successfully installed

  1. Click the control in the menu bar above and select "normal shutdown" to shut down the virtual machine

  1. After shutdown, find "slave0" in the main panel, and right-click → "copy" to enter the copy wizard

  1. Change the name and adjust it to the appropriate path. Select "regenerate MAC address for all network cards" as the MAC address setting. Next step

  1. Select "full copy" as the copy type, and click Copy to start copying

  1. After copying, we get "slave1"

  1. Copy "slave2" in the same way, and you can clone the number of units you need according to your own needs. Three units here are enough

VMware Workstation Pro installation Linux Cluster

  1. Open VMware Workstation Pro and click "create new virtual machine" on the home page

  1. Select the "customize (advanced) (C)" option, next

  1. Virtual machine hardware compatibility remains the default, next step

  1. Select "install the operating system later", next

  1. Select "Linux" for the guest operating system and "CentOS 7 64-bit" for the version. Next step

  1. You can give the virtual machine a name to facilitate future operations, "slave0" (slave), and then select the storage location of the virtual machine. Try to choose a disk with a larger capacity. Next step

  1. The processor configuration remains the default first, and can be changed at any time later. Next step

  1. Set the VM's memory according to your computer's RAM and how many VMs must run at once. The suggested 1 GB works, but 2 GB or more is better, and it can be adjusted later; 2 GB is used here. Next step

  1. It is recommended to select NAT type for the network type, so that the host can connect to the virtual machine, and the virtual machine can also connect to the network, and the IP address is relatively stable. Next step

  1. IO controller remains default, next step

  1. The disk type remains the default. Next step

  1. You need to create a new disk. Select "create new virtual disk (V)", and then go to the next step

  1. Set the disk size of the virtual machine according to your own needs. It is recommended to be 50GB or above. Next step

  1. The disk file name remains the default. Next step

  1. After confirming that the configuration information is OK, click Finish to create the virtual machine

  1. After creation, you need to make some configuration to install the operating system. Click "Edit virtual machine settings" on the homepage of slave0 virtual machine

  1. Select "use ISO image file (M)" in virtual machine settings → hardware → CD/DVD (IDE), click "browse (B)...", select the newly downloaded "CentOS-7-x86_64-Minimal-1810.iso", open it and confirm

  1. Return to the "slave0 home page" and click "start this virtual machine" to start the virtual machine

  1. After the virtual machine is started, click the black interface of the virtual machine to enter the control virtual machine, press the up arrow "↑" to adjust the option to the "Install CentOS 7" option, and press enter

  1. Choose a language you like and click "continue (C)"

  1. Confirm whether the "date and time" is set correctly. If not, it needs to be manually changed, and then click "installation location (D)" to select an installation location

  1. Choose the disk according to your own needs. Here I keep the default and directly click "finish (D)"

  1. After coming out, click "start installation" to start installing CentOS

  1. During installation, click "root password" to set the password of root

  1. Enter the password twice. If the password is too short, you need to press "finish (D)" twice to complete the setting

  1. Click "restart (R)" after installation to reboot the virtual machine

  1. Seeing this login interface after restarting indicates that the system has been successfully installed

  1. Click the triangle on the right of the pause button on VMware and select "shut down client (D)" to shut down the virtual machine

  1. Click "shutdown" to confirm the shutdown, and you can check "don't show this message again" according to your own needs

  1. After shutdown, find "slave0" in the "library" panel, right-click → "management" → "clone" to enter the cloning wizard

  1. next page

  1. Clone source remains default, next page

  1. It is recommended to select "create full clone" for clone type. Next

  1. After changing the virtual machine name and storage location, complete

  1. Click close when finished to close the clone virtual machine Wizard

  1. In this way, we get another virtual machine "slave1" with the system installed

  1. Clone another virtual machine in the same way. You can clone the number of virtual machines you need according to your needs. Three virtual machines here are enough

Linux cluster network configuration

VirtualBox virtual machine network configuration

View network information

  1. Click the "management" menu and select "host network manager"

  1. Click the only network card inside, click "properties", and select the "DHCP server" tab to see all network information

Here, my subnet IP is 192.168.56.0, the subnet mask is 24 bits, and the gateway is 192.168.56.100. Next, I will set the network of my virtual machine according to these information.

Enable network card

  1. Start the virtual machine and log in

  1. Use the ip addr command to query the network card name. The network card names I found here are "enp0s3" and "enp0s8"

  1. Enable the network cards with the commands ifup enp0s8 and ifup enp0s3 (i.e. ifup <card name>)

  1. Again, use the ip addr command to view the IP address corresponding to the network card. You can see that the network segment of "enp0s8" is the same as that in the host network manager, so the virtual machine is connected to the host through "enp0s8", and "enp0s3" is connected to the external network through NAT. Therefore, for the convenience of later operations, we only need to configure "enp0s8" as a static IP

  1. Configure the host-only network card by editing its configuration file with vi. The command is vi /etc/sysconfig/network-scripts/ifcfg-enp0s8, i.e. vi /etc/sysconfig/network-scripts/ifcfg-<card name>. After entering the editing interface (press i to start editing), change the value of BOOTPROTO to static (static IP) and ONBOOT to yes (enabled), then add DNS1, GATEWAY and IPADDR. DNS1 and GATEWAY are set to the gateway address found in the host network manager, and IPADDR to an unused IP on the same network segment as that gateway. My configuration is as follows; save and exit (Esc, then :wq, then Enter)
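An illustrative ifcfg-enp0s8 consistent with the values discussed in this section (the addresses are examples, not prescriptions):

```shell
BOOTPROTO=static
ONBOOT=yes
DNS1=192.168.56.100
GATEWAY=192.168.56.100
IPADDR=192.168.56.10
```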

  1. Edit the NAT network card's configuration in the same way; there, only set the value of ONBOOT to yes, then save and exit

  1. Restart the network service with the command service network restart

  1. Check whether the IP address is configured successfully. Use the ip addr command to check the current IP address. You can see that the configuration on my side is successful

  1. Configure the other two hosts in the same way, enabling their network cards and setting static IPs. My IP plan is:
slave0: 192.168.56.10
slave1: 192.168.56.11
slave2: 192.168.56.12

VMware virtual machine network configuration

View network information

  1. Open VMware, select "Edit" → "virtual network editor" in the menu bar to open the virtual network editor

  1. Find and click the virtual network card that selects NAT mode in the virtual network editor, and click "NAT setting (S)..." to see the subnet IP, subnet mask and gateway IP in the newly popped NAT setting window

Here, my subnet IP is 192.168.25.0, the subnet mask is 24 bits, and the gateway is 192.168.25.2. Next, I will set the network of my virtual machine according to these information.

Enable network card

  1. Start the virtual machine and log in

  1. Use the ip addr command to query the network card name. The network card name I found here is "ens33"

  1. Configure the network card by editing its configuration file with vi. The command is vi /etc/sysconfig/network-scripts/ifcfg-ens33, i.e. vi /etc/sysconfig/network-scripts/ifcfg-<card name>. After entering the editing interface (press i to start editing), change the value of BOOTPROTO to static (static IP) and ONBOOT to yes (enabled), then add DNS1, GATEWAY and IPADDR. DNS1 and GATEWAY are set to the gateway address found in the virtual network editor, and IPADDR to an unused IP on the same network segment as that gateway. My configuration is as follows; save and exit (Esc, then :wq, then Enter)
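An illustrative ifcfg-ens33 consistent with the values discussed in this section (the addresses are examples, not prescriptions):

```shell
BOOTPROTO=static
ONBOOT=yes
DNS1=192.168.25.2
GATEWAY=192.168.25.2
IPADDR=192.168.25.10
```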

  1. Restart the network service with the command service network restart

  1. Check whether the IP address is configured successfully. Use the ip addr command to check the current IP address. You can see that the configuration on my side is successful

  1. Configure the other two hosts in the same way, enabling their network cards and setting static IPs. My IP plan is:
slave0: 192.168.25.10
slave1: 192.168.25.11
slave2: 192.168.25.12

Remote connection to Linux virtual machine

There are many tools for connecting remotely to a Linux virtual machine; MobaXterm is used here for the demonstration

Software download

  1. Open mobaxterm.mobatek.net and click the "Download" button

  1. Find and click the "Download now" button under "Home Edition" to download the free personal edition

  1. Click "MobaXterm Portable v21.5" to download the portable (installation-free) version 21.5; you can also download the latest portable version, as the differences should be small

  1. After downloading, right-click and unzip to the current folder

  1. The two files obtained after extraction are the MobaXterm program. You can pin "MobaXterm_Personal_21.5.exe" to the Start menu or send a shortcut to the desktop for easy access; the downloaded archive can then be deleted

Connecting Linux virtual machines

  1. Open "MobaXterm_Personal_21.5.exe" to enter the main interface of MobaXterm

  1. Click the "Session" icon on the main interface to create a new Session. The Session type is SSH. Enter the IP address of the Linux host in the "Remote host" input box below. For convenience, you can check "Specify username" and enter root, so you don't have to enter the user name every time you connect

Set the font: click "Terminal setting" → "Terminal font setting" in the "Session setting" window; in the "Terminal font selection" window that pops up, you can set the font Size (the default is 10; out of personal habit I use 18). Click OK to close the "Terminal font selection" window

Set a session alias: click the "Bookmark setting" tab in the "Session setting" window and change the "Session name". For convenience I use the same name as the virtual machine (slave0); the default, 192.168.25.10 (root), is hard to identify. Click OK to finish creating the session

  1. MobaXterm opens the newly added session automatically (if it does not, click the star on the left, then double-click the session with the matching name). You will be asked for the password, and also for the user name if you did not specify one in the previous step. Enter the password and press Enter to confirm

  1. After connecting, you will be asked whether to save the password. You can choose to save or not. For convenience, I choose "Yes" to save the password

  1. The first time you save a password you will be asked to set a master password, which encrypts your stored credentials; make it as complex as possible (I keep it simple here for convenience). For "Prompt me for my master password", the first option is the most convenient and the last the most secure, though troublesome; I choose the first, then click OK to set the master password

  1. After the connection is completed, you can interact with the Linux virtual machine through MobaXterm

  1. Right-click behavior: the first right-click in the window pops up a choice. The first option, "Show context menu", shows the normal right-click menu; the second, "Paste", makes right-click paste directly. With the first option you paste with the middle mouse button; with the second you must hold Shift and right-click to open the menu. Following my own habit, I choose the first option, OK

  1. Add the other two virtual machines to the session list of MobaXterm according to the above steps. Finally, you can see three sessions in the star panel on the left

Other configurations of Linux Cluster

Turn off firewall

To let the Linux virtual machines reach each other easily, the simplest approach is to turn off the firewall inside the private network

Temporarily close the firewall (effective immediately, but invalid after restart)

Use the command systemctl stop firewalld to stop the firewall, and systemctl status firewalld to confirm that it has stopped

[root@localhost ~]# systemctl stop firewalld
[root@localhost ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2022-03-06 13:18:37 CST; 51s ago
     Docs: man:firewalld(1)
  Process: 6425 ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 6425 (code=exited, status=0/SUCCESS)

Mar 06 13:18:00 localhost.localdomain systemd[1]: Starting firewalld - dyn...
Mar 06 13:18:02 localhost.localdomain systemd[1]: Started firewalld - dyna...
Mar 06 13:18:36 localhost.localdomain systemd[1]: Stopping firewalld - dyn...
Mar 06 13:18:37 localhost.localdomain systemd[1]: Stopped firewalld - dyna...
Hint: Some lines were ellipsized, use -l to show in full.
[root@localhost ~]#

The "Active: inactive (dead)" line above shows that the firewall has been stopped. Success

Permanently close the firewall (effective after restart)

Use the command systemctl disable firewalld to disable the firewall, and systemctl list-unit-files | grep firewalld to check whether it has been disabled successfully

[root@localhost ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@localhost ~]# systemctl list-unit-files | grep firewalld
firewalld.service                             disabled
[root@localhost ~]#

You can see that the firewall has been successfully disabled, success

The other two virtual machines use the same method to turn off the firewall
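As a convenience, once you can ssh to the other nodes (a password prompt is fine at this stage), the same two commands can be pushed to them in a loop. This is only a dry-run sketch: the leading echo prints the commands instead of running them; remove it to execute for real:

```shell
# Dry-run sketch: print the ssh commands that would push the same firewall
# settings to the other nodes (drop the leading "echo" to actually run them,
# which requires ssh access to slave1 and slave2).
for host in slave1 slave2; do
  echo ssh "$host" "systemctl stop firewalld; systemctl disable firewalld"
done
```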

Modify host name

To tell the virtual machines apart, each one's hostname needs to be modified

Use the command hostname to view the current hostname

Temporarily modify the hostname (effective immediately, but invalid after restart)

Temporarily modify the hostname using the command hostname <new hostname>

[root@localhost ~]# hostname
localhost.localdomain
[root@localhost ~]# hostname slave0
[root@localhost ~]# hostname
slave0
[root@localhost ~]#

As you can see, querying the hostname after modification takes effect immediately

Permanently modify the hostname (effective after restart)

The hostname information is saved in /etc/hostname. Linux reads the hostname from this file every time it starts up, so we only need to modify the contents of this file to achieve the effect of permanent modification

[root@slave0 ~]# vi /etc/hostname 
slave0
~                                                                                                                      
~                                                                                                                      
~                                                                                                                      
"/etc/hostname" 1L, 7C written
[root@slave0 ~]# 

The other two virtual machines use the same method to configure the hostname. I configure the three here as slave0, slave1 and slave2 respectively

Configure domain name resolution file

When connecting from one Linux host to another, you need to enter an IP address, but IP addresses are hard to remember and identify. We can use domain names in their place (the same principle by which a browser reaches a website through a URL rather than an IP). Every Linux host has a hosts file that serves as its local domain-name resolution configuration; writing IP/domain-name pairs into this file associates them, and from then on we can reach the other virtual machines by domain name instead of by IP

host file path: /etc/hosts

IP bound domain name format: IP address (space) domain name

[root@slave0 ~]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.25.10 slave0
192.168.25.11 slave1
192.168.25.12 slave2
~                                                                                                                      
~                                                                                                                      
~                                                                                                               
"/etc/hosts" 6L, 222C written
[root@slave0 ~]# 

Through the above operations, we have successfully bound the IPs to domain names. From now on, slave0, slave1 and slave2 stand for 192.168.25.10, 192.168.25.11 and 192.168.25.12 respectively
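For scripted checks, the bindings can be pulled back out of a hosts-format file. list_cluster_hosts below is a hypothetical helper (the name is mine) that prints each name/IP pair while skipping comments and the localhost entries:

```shell
# list_cluster_hosts prints "name IP" for every binding in a hosts-format file,
# skipping comment lines and the localhost entries; the path is a parameter so
# it can be run against a copy as well as against /etc/hosts itself.
list_cluster_hosts() {
  awk '$1 !~ /^#/ && $1 !~ /^(127\.|::1)/ && NF >= 2 {print $2, $1}' "$1"
}
list_cluster_hosts /etc/hosts
```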

We can verify whether it is available by ping

[root@slave0 ~]# ping slave0
PING slave0 (192.168.25.10) 56(84) bytes of data.
64 bytes from slave0 (192.168.25.10): icmp_seq=1 ttl=64 time=0.016 ms
64 bytes from slave0 (192.168.25.10): icmp_seq=2 ttl=64 time=0.026 ms
64 bytes from slave0 (192.168.25.10): icmp_seq=3 ttl=64 time=0.030 ms
^C
--- slave0 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2006ms
rtt min/avg/max/mdev = 0.016/0.024/0.030/0.005 ms
[root@slave0 ~]# ping slave1
PING slave1 (192.168.25.11) 56(84) bytes of data.
64 bytes from slave1 (192.168.25.11): icmp_seq=1 ttl=64 time=0.578 ms
64 bytes from slave1 (192.168.25.11): icmp_seq=2 ttl=64 time=0.317 ms
64 bytes from slave1 (192.168.25.11): icmp_seq=3 ttl=64 time=0.728 ms
^C
--- slave1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2013ms
rtt min/avg/max/mdev = 0.317/0.541/0.728/0.169 ms
[root@slave0 ~]# ping slave2
PING slave2 (192.168.25.12) 56(84) bytes of data.
64 bytes from slave2 (192.168.25.12): icmp_seq=1 ttl=64 time=0.458 ms
64 bytes from slave2 (192.168.25.12): icmp_seq=2 ttl=64 time=0.326 ms
64 bytes from slave2 (192.168.25.12): icmp_seq=3 ttl=64 time=1.02 ms
^C
--- slave2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2011ms
rtt min/avg/max/mdev = 0.326/0.603/1.027/0.305 ms
[root@slave0 ~]# 

success

Configure password free login

We can connect to another virtual machine with ssh <domain name or IP>, for example from slave0 to slave1

[root@slave0 ~]# ssh slave1
The authenticity of host 'slave1 (192.168.25.11)' can't be established.
ECDSA key fingerprint is SHA256:brSII1Ii+yIXjzvMWG1Rxn+3vOTolPZq/rJomBVxl00.
ECDSA key fingerprint is MD5:49:0b:71:88:b6:21:4a:b3:c7:ad:79:88:78:0a:1e:5a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave1,192.168.25.11' (ECDSA) to the list of known hosts.
root@slave1's password: 
Last login: Sun Mar  6 13:35:22 2022 from 192.168.25.1
[root@slave1 ~]# exit
 Logout
Connection to slave1 closed.
[root@slave0 ~]# ssh slave1
root@slave1's password: 
Last login: Sun Mar  6 13:50:28 2022 from 192.168.25.10
[root@slave1 ~]# exit
 Logout
Connection to slave1 closed.
[root@slave0 ~]#

This is troublesome: you have to type yes on the first connection, and a password on every connection

In order to facilitate the interconnection between virtual machines, we can configure password free login

Generate public and private keys

Use the command ssh-keygen -t rsa on slave0 to generate a key pair (keep the default settings, i.e. press Enter three times)

[root@slave0 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:qckAgde752zoURXjtNkmCHTTZiLKrUf1tLQPTmAa0mY root@slave0
The key's randomart image is:
+---[RSA 2048]----+
| ..+o o.+        |
|. o.Eo==**       |
| o.* B.O*oo      |
|  o.=  .*+       |
|   o...oSo       |
|  . o+.o. .      |
|   ..=+          |
|    ..+          |
|   ...           |
+----[SHA256]-----+
[root@slave0 ~]#

View the key pair we generated

[root@slave0 ~]# cd .ssh/
[root@slave0 .ssh]# ll
total 8
-rw-------. 1 root root 1679 Mar  6 14:22 id_rsa
-rw-r--r--. 1 root root  393 Mar  6 14:22 id_rsa.pub
[root@slave0 .ssh]#

As you can see, a key pair was generated: id_rsa is the private key and id_rsa.pub is the public key. Appending the public key id_rsa.pub to the file ~/.ssh/authorized_keys configures password-free login to this machine; do that, then verify that it works

[root@slave0 .ssh]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
[root@slave0 .ssh]# ssh slave0
Last login: Sun Mar  6 14:20:19 2022 from 192.168.25.1
[root@slave0 ~]# exit
 Logout
Connection to slave0 closed.
[root@slave0 .ssh]# 

It works normally. Next, distribute the public key to the machines that need password-free login

Use the command scp <file> <domain name or IP>:<destination path> to transfer the public key to the root user's home directory on slave1 and on slave2

[root@slave0 .ssh]# scp ~/.ssh/id_rsa.pub slave1:~
root@slave1's password: 
id_rsa.pub                                                                           100%  393   322.5KB/s   00:00    
[root@slave0 .ssh]# scp ~/.ssh/id_rsa.pub slave2:~
The authenticity of host 'slave2 (192.168.25.12)' can't be established.
ECDSA key fingerprint is SHA256:brSII1Ii+yIXjzvMWG1Rxn+3vOTolPZq/rJomBVxl00.
ECDSA key fingerprint is MD5:49:0b:71:88:b6:21:4a:b3:c7:ad:79:88:78:0a:1e:5a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave2,192.168.25.12' (ECDSA) to the list of known hosts.
root@slave2's password: 
id_rsa.pub                                                                           100%  393   385.9KB/s   00:00    
[root@slave0 .ssh]# 

Then enable the public key on slave1 and slave2 respectively, that is, append the contents of the public key to the file ~/.ssh/authorized_keys (the file does not exist by default, so create it first)

slave1

[root@slave1 ~]# ls
anaconda-ks.cfg  id_rsa.pub
[root@slave1 ~]# mkdir .ssh
[root@slave1 ~]# cd .ssh/
[root@slave1 .ssh]# touch authorized_keys
[root@slave1 .ssh]# cat ~/id_rsa.pub >> ~/.ssh/authorized_keys 
[root@slave1 .ssh]# 

slave2

[root@slave2 ~]# ls
anaconda-ks.cfg  id_rsa.pub
[root@slave2 ~]# mkdir .ssh
[root@slave2 ~]# cd .ssh/
[root@slave2 .ssh]# touch authorized_keys
[root@slave2 .ssh]# cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
[root@slave2 .ssh]# 

After completing the above operations, slave0 can log in to itself, slave1 and slave2 without password

[root@slave0 .ssh]# ssh slave0
Last login: Sun Mar  6 14:21:19 2022 from 192.168.25.1
[root@slave0 ~]# exit
 Logout
Connection to slave0 closed.
[root@slave0 .ssh]# ssh slave1
Last login: Sun Mar  6 14:17:58 2022 from 192.168.25.1
[root@slave1 ~]# exit
 Logout
Connection to slave1 closed.
[root@slave0 .ssh]# ssh slave2
Last login: Sun Mar  6 14:18:04 2022 from 192.168.25.1
[root@slave2 ~]# exit
 Logout
Connection to slave2 closed.
[root@slave0 .ssh]# 

Decide for yourself whether slave1 and slave2 also need password-free login to the other two machines. For convenience, I configure slave1 to log in to slave0 and slave2 without a password, and slave2 to log in to slave0 and slave1
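The manual mkdir/touch/cat sequence can be wrapped in a small idempotent helper, which is roughly what the ssh-copy-id tool automates for you. append_key below is a sketch (the function name is mine) that appends a key only once and also sets the permissions sshd expects:

```shell
# append_key adds the given public key to an authorized_keys file only if it is
# not already there, creating the directory and file with the permissions sshd
# expects - a hand-rolled sketch of what ssh-copy-id automates.
append_key() {
  key_file=$1 auth_file=$2
  mkdir -p "$(dirname "$auth_file")"
  chmod 700 "$(dirname "$auth_file")"
  touch "$auth_file"
  chmod 600 "$auth_file"
  grep -qxF "$(cat "$key_file")" "$auth_file" || cat "$key_file" >> "$auth_file"
}
# usage on slave1/slave2: append_key ~/id_rsa.pub ~/.ssh/authorized_keys
```

Because the append is guarded by grep, running it twice does not duplicate the key.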

Configure SSH

If you found that logging in to another virtual machine over SSH was slow during the password-free setup just now, you can adjust the SSH configuration

The main cause of slow logins is that when CentOS is logged in to remotely, it asks DNS to verify the client IP for security. On an intranet we do not need to worry about this, and the slow login hurts efficiency, so we turn SSH's DNS verification off

Modify file

Modify the SSH configuration file, SSH configuration file path: /etc/ssh/sshd_config

[root@slave0 ~]# vi /etc/ssh/sshd_config 
.
.
.
#ShowPatchLevel no
UseDNS no
#PidFile /var/run/sshd.pid
#MaxStartups 10:30:100
.
.
.
"/etc/ssh/sshd_config" 139L, 3905C written
[root@slave0 ~]# 

Use the search function of vi to find the line #UseDNS yes in the file, change it to UseDNS no, and remove the leading # to uncomment it

Restart SSH

Restart the SSH service with the command service sshd restart

[root@slave0 ~]# service sshd restart
Redirecting to /bin/systemctl restart sshd.service
[root@slave0 ~]# 

Use the same method to configure the other two virtual machines on demand
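The vi edit above can also be done non-interactively. set_usedns_no below is a hypothetical helper (the name is mine) that rewrites the UseDNS line with sed, whether or not it is still commented out, and keeps a .bak backup:

```shell
# set_usedns_no rewrites the UseDNS line (commented or not) in an
# sshd_config-style file, keeping a backup with a .bak suffix.
set_usedns_no() {
  sed -i.bak 's/^#\{0,1\}UseDNS .*/UseDNS no/' "$1"
}
# usage (as root): set_usedns_no /etc/ssh/sshd_config && service sshd restart
```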

Configure time synchronization

Server programs running on Linux generally have strict timing requirements; at a minimum, the hosts within one cluster should keep their clocks synchronized

Install ntpdate

Time synchronization requires ntpdate. Use the command yum install -y ntpdate to install it (an Internet connection is required)

[root@slave0 ~]# yum install -y ntpdate
Loaded plugins: fastestmirror
Determining fastest mirrors
 * base: mirrors.ustc.edu.cn
.
.
.
Installed:
  ntpdate.x86_64 0:4.2.6p5-29.el7.centos.2

Complete!
[root@slave0 ~]# 

installation is complete

Synchronization time

Use the command ntpdate -u <NTP server address> to synchronize the time. NTP server addresses can be found with a search engine

Here are some ntp servers I found

https://dns.icoa.cn/ntp/

National Time Service Center NTP server: ntp.ntsc.ac.cn
China NTP fast time service: cn.ntp.org.cn
International NTP fast time service: cn.pool.ntp.org
Alibaba Cloud public NTP server: ntp.aliyun.com
Tencent Cloud public NTP server: time1.cloud.tencent.com
Education network NTP server (university self-hosted): ntp.sjtu.edu.cn
Microsoft Windows NTP server: time.windows.com

Here I use Alibaba cloud's ntp.aliyun.com

[root@slave0 ~]# ntpdate -u ntp.aliyun.com
 6 Mar 15:02:57 ntpdate[8252]: adjust time server 203.107.6.88 offset 0.017084 sec
[root@slave0 ~]# 

Synchronization successful

Set scheduled tasks

Synchronizing manually every time is too troublesome. We can set up a scheduled task to synchronize at a fixed interval

Add scheduled tasks to crontab

Before that, we need to know the full path of ntpdate. Use which ntpdate to get the full path of ntpdate

[root@slave0 ~]# which ntpdate
/usr/sbin/ntpdate
[root@slave0 ~]# 

Edit /etc/crontab and set the task to run every 10 minutes (choose the interval yourself)

[root@slave0 ~]# vi /etc/crontab 
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root

# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name  command to be executed

*/10 * * * * root /usr/sbin/ntpdate -u ntp.aliyun.com

~                                                                                                                      
~                                                                                                                      
~                                                                                                                   
"/etc/crontab" 17L, 503C written
[root@slave0 ~]# 

complete

Configure the automatic synchronization time of the other two virtual machines according to the above steps

Disable email reminders

After configuring time synchronization, you will see a message telling you there is new mail in /var/spool/mail/root (it may also appear in English)

[root@slave0 ~]# ls
anaconda-ks.cfg  id_rsa.pub
You have new mail in /var/spool/mail/root
[root@slave0 ~]# 

To remove the reminder, add the line unset MAILCHECK at the end of /etc/profile, then reload the environment variables

[root@slave0 ~]# vi /etc/profile
.
.
.
unset i
unset -f pathmunge

unset MAILCHECK
"/etc/profile" 78L, 1836C written
You have new mail in /var/spool/mail/root
[root@slave0 ~]# source /etc/profile
[root@slave0 ~]# 

Done; the new-mail reminder no longer appears

[root@slave0 ~]# ls
anaconda-ks.cfg  id_rsa.pub
[root@slave0 ~]# 

Disable the mail reminder on the other two virtual machines as needed

Configure direct execution without ./

By default, running a program in the current directory requires the ./ prefix; it cannot be executed by name alone

[root@slave0 ~]# vi hello.sh
name=luckydog
echo $name
~                                                                                                                      
~                                                                                                                      
~                                                                                                                
"hello.sh" [New] 2L, 25C written
[root@slave0 ~]# chmod +x hello.sh 
[root@slave0 ~]# hello.sh
-bash: hello.sh: Command not found
[root@slave0 ~]# ./hello.sh
luckydog
[root@slave0 ~]# 

For convenience, we can configure the environment so that files can be executed directly

  1. Edit the /etc/profile file and add the environment variable export PATH=.:$PATH at the end
[root@slave0 ~]# vi /etc/profile
.
.
.

unset i
unset -f pathmunge

unset MAILCHECK

# Files can be executed directly
export PATH=.:$PATH
"/etc/profile" 81L, 1884C written
[root@slave0 ~]# 
  1. Update environment variable source /etc/profile
[root@slave0 ~]# source /etc/profile
[root@slave0 ~]# 
  1. View the effect
[root@slave0 ~]# hello.sh
luckydog
[root@slave0 ~]# 

success

Configure the other two virtual machines according to your own needs
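What the export PATH=.:$PATH line changes can be demonstrated in isolation; this sketch recreates the hello.sh experiment in a temporary directory:

```shell
# Minimal demonstration of what export PATH=.:$PATH changes: a script in the
# current directory becomes runnable without the ./ prefix.
tmp=$(mktemp -d) && cd "$tmp"
printf '#!/bin/sh\necho luckydog\n' > hello.sh
chmod +x hello.sh
export PATH=.:$PATH    # the line added to /etc/profile
hello.sh               # now found via "." on PATH, prints: luckydog
```

Note that putting "." first in PATH is convenient on a private lab machine, but it does mean a script in the current directory can shadow a system command of the same name.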

Java environment configuration

Most of the server programs we will run are written in Java, so a Java environment is required at run time. Here I use JDK 7u79

Software download

  1. Open the official Oracle website oracle.com, click "Products" → "Hardware and Software" → "Java"

  1. Click the "Download Java" button

  1. Switch to the Java archive panel

  1. Scroll down, find and click "Java SE 7"

  1. Find the "Java SE Development Kit 7u79" panel and click "jdk-7u79-linux-x64.tar.gz" to download the Linux 64 bit GZ package

  1. Check "I reviewed and accept the Oracle Binary Code License Agreement for Java SE" and click "Download jdk-7u79-linux-x64.tar.gz"

  1. You need to log in. If you don't have an account, you can register

  1. Then download; when it finishes you will have the file jdk-7u79-linux-x64.tar.gz

Software installation

Copy the installation package to the Linux virtual machine

First, upload the software to the Linux virtual machine

  1. First, create the directory where the software is stored, and plan to place the software installation package in /opt/software and the software program in /opt/module
[root@slave0 ~]# mkdir -p /opt/software
[root@slave0 ~]# mkdir -p /opt/module
[root@slave0 ~]#
  1. In MobaXterm, click the yellow folder icon on the right, enter the software storage path in the input box above, and click the green check mark to the right of the input box to jump to that directory. Drag the JDK installation package you just downloaded into it; an upload progress bar appears, wait for the upload to finish

  1. After uploading, go to the software installation package storage directory to check whether the installation package exists (if it does not exist, upload again)
[root@slave0 ~]# cd /opt/software/
[root@slave0 software]# ls
jdk-7u79-linux-x64.tar.gz
[root@slave0 software]#

  1. Unzip jdk-7u79-linux-x64.tar.gz into the program directory /opt/module (-z: filter through gzip, -x: extract, -v: verbose output, -f: archive file name, -C: extract into the specified directory)
[root@slave0 software]# tar -zxvf jdk-7u79-linux-x64.tar.gz -C /opt/module/
.
.
.
jdk1.7.0_79/db/bin/stopNetworkServer
jdk1.7.0_79/db/README-JDK.html
jdk1.7.0_79/db/NOTICE
jdk1.7.0_79/README.html
jdk1.7.0_79/THIRDPARTYLICENSEREADME.txt
[root@slave0 software]# 

Environment configuration

  1. Edit /etc/profile to add export JAVA_HOME=/opt/module/jdk1.7.0_79 and export PATH=$PATH:$JAVA_HOME/bin
[root@slave0 software]# vi /etc/profile
.
.
.
# Files can be executed directly
export PATH=.:$PATH

# JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin
"/etc/profile" 85L, 1971C written
[root@slave0 software]# 
  1. Update environment variables
[root@slave0 software]# source /etc/profile
[root@slave0 software]# 

verification

Use the command java -version to check the Java version. If the version is displayed correctly, the configuration succeeded; if it cannot be displayed, the configuration failed and needs to be checked

[root@slave0 software]# java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
[root@slave0 software]# 

success

The other two virtual machines are also configured with the Java environment according to the above steps
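To check the Java setup from scripts (for example over ssh on all three nodes), the version string can be extracted from the java -version output. parse_java_version below is a hypothetical helper (the name is mine):

```shell
# parse_java_version extracts the quoted version string from `java -version`
# output (which java prints on stderr, hence the 2>&1), so scripted checks can
# compare it against the expected 1.7.0_79.
parse_java_version() { awk -F'"' '/version/ {print $2; exit}'; }
java -version 2>&1 | parse_java_version
```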

Hadoop cluster deployment

The version of Hadoop used is hadoop-2.7.2

Three virtual machines are required, and the node function planning is as follows:

The NameNode process of HDFS runs on slave0, the SecondaryNameNode process runs on slave2, and all three nodes run DataNode processes

YARN's ResourceManager process runs on slave1, and all three nodes run NodeManager processes

download

  1. Open the website hadoop.apache.org and click "Download"

  1. Scroll down, find and click "Apache release archive"

  1. Find and click "hadoop-2.7.2/"

  1. Click "hadoop-2.7.2.tar.gz" to start downloading Hadoop 2.7.2

  1. After the download finishes you will have the file hadoop-2.7.2.tar.gz

  1. (optional) to ensure the correct download, you can use SHA256 to verify it

install

Plan to put the software installation package in /opt/software and the software program in /opt/module

  1. In MobaXterm, click the yellow folder icon on the right, enter the software storage path in the input box above, and click the green check mark to jump to that directory. Drag the newly downloaded Hadoop installation package into it; an upload progress bar appears, wait for the upload to finish

  1. After uploading, go to the software installation package storage directory to check whether the installation package exists (if it does not exist, upload again)
[root@slave0 software]# ls
hadoop-2.7.2.tar.gz
jdk-7u79-linux-x64.tar.gz
[root@slave0 software]#

  1. Unzip hadoop-2.7.2.tar.gz into the program directory /opt/module (-z: filter through gzip, -x: extract, -v: verbose output, -f: archive file name, -C: extract into the specified directory)
[root@slave0 software]# tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
.
.
.
hadoop-2.7.2/lib/native/libhadoop.a
hadoop-2.7.2/lib/native/libhdfs.a
hadoop-2.7.2/lib/native/libhadoop.so
hadoop-2.7.2/lib/native/libhadooppipes.a
hadoop-2.7.2/LICENSE.txt
[root@slave0 software]#

to configure

environment variable

  1. Edit /etc/profile and add export HADOOP_HOME=/opt/module/hadoop-2.7.2, export PATH=$PATH:$HADOOP_HOME/bin and export PATH=$PATH:$HADOOP_HOME/sbin
[root@slave0 software]# vi /etc/profile
.
.
.
export PATH=$PATH:$JAVA_HOME/bin

# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
"/etc/profile" 90L, 2101C written
[root@slave0 software]# 
  1. Update environment variables
[root@slave0 software]# source /etc/profile
[root@slave0 software]# 

Hadoop configuration

  1. The configuration files of Hadoop are in the directory /opt/module/hadoop-2.7.2/etc/hadoop/. Jump to this directory first
[root@slave0 software]# cd /opt/module/hadoop-2.7.2/etc/hadoop/
[root@slave0 hadoop]#
  1. Edit hadoop-env.sh and add the Java environment variable export JAVA_HOME=/opt/module/jdk1.7.0_79 at the end
[root@slave0 hadoop]# vi hadoop-env.sh 
.
.
.
# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER

export JAVA_HOME=/opt/module/jdk1.7.0_79
"hadoop-env.sh" 100L, 4266C written
[root@slave0 hadoop]# 
  1. Edit yarn-env.sh and add export JAVA_HOME=/opt/module/jdk1.7.0_79 before the first export statement
[root@slave0 hadoop]# vi yarn-env.sh 
.
.
.
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

export JAVA_HOME=/opt/module/jdk1.7.0_79

# User for YARN daemons
export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}
.
.
.
"yarn-env.sh" 123L, 4609C written
[root@slave0 hadoop]# 
  1. Edit mapred-env.sh and add export JAVA_HOME=/opt/module/jdk1.7.0_79 before the first export statement
[root@slave0 hadoop]# vi mapred-env.sh 
.
.
.
# See the License for the specific language governing permissions and
# limitations under the License.

export JAVA_HOME=/opt/module/jdk1.7.0_79

# export JAVA_HOME=/home/y/libexec/jdk1.6.0/

export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
.
.
.                                                                                                                  
"mapred-env.sh" 29L, 1425C written
[root@slave0 hadoop]# 
  1. Edit core-site.xml and add the following configuration in the configuration tag at the end:
    <!-- Specify the node that runs the HDFS NameNode process -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://slave0:9000</value>
    </property>
    <!-- Specify the storage directory for files Hadoop generates at run time -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>

Operation:

[root@slave0 hadoop]# vi core-site.xml 
.
.
.
<configuration>
    <!-- Specify the node that runs the HDFS NameNode process -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://slave0:9000</value>
    </property>
    <!-- Specify the storage directory for files Hadoop generates at run time -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
</configuration>
"core-site.xml" 30L, 1129C written
[root@slave0 hadoop]# 
  1. Edit hdfs-site.xml and add the following configuration in the configuration tag at the end:
    <!-- Number of HDFS file replicas (the cluster has 3 worker nodes; the default is 3) -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Specify the node that runs the HDFS SecondaryNameNode process -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave2:50090</value>
    </property>

Operation:

[root@slave0 hadoop]# vi hdfs-site.xml 
.
.
.
<configuration>
    <!-- Number of HDFS file replicas (the cluster has 3 worker nodes; the default is 3) -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Specify the node that runs the HDFS SecondaryNameNode process -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave2:50090</value>
    </property>
</configuration>
"hdfs-site.xml" 30L, 1146C written
[root@slave0 hadoop]# 
  1. Edit yarn-site.xml and add the following configuration in the configuration tag at the end:
    <!-- Configure how the Reducer obtains data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Specify the node that runs the YARN ResourceManager process -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>slave1</value>
    </property>

Operation:

[root@slave0 hadoop]# vi yarn-site.xml 
.
.
.
<configuration>

<!-- Site specific YARN configuration properties -->
    <!-- Configure how the Reducer obtains data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Specify the node that runs the YARN ResourceManager process -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>slave1</value>
    </property>
</configuration>
~                                                                                                                      
~                                                                                                                      
"yarn-site.xml" 28L, 1041C written
[root@slave0 hadoop]# 
  1. Write the contents of mapred-site.xml.template file to mapred-site.xml
[root@slave0 hadoop]# cat mapred-site.xml.template >> mapred-site.xml
[root@slave0 hadoop]# 
  1. Edit mapred-site.xml and add the following configuration in the configuration tag at the end:
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

Operation:

[root@slave0 hadoop]# vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
~                                                                                                                      
~                                                                                                                      
~                                                                                                                       
"mapred-site.xml" 25L, 907C written
[root@slave0 hadoop]# 
  1. Edit the slaves file and add the DataNode node domain name
slave0
slave1
slave2

Operation:

[root@slave0 hadoop]# vi slaves 
slave0
slave1
slave2
~                                                                                                                      
~                                                                                                                      
~                                                                                                              
"slaves" 3L, 21C written
[root@slave0 hadoop]# 

complete
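Before distributing the configuration, a quick sanity check helps catch a half-edited file. check_props below is a crude helper I made up that only counts <property> tags; a full well-formedness check would need an XML tool such as xmllint, if it is installed. Run it from the /opt/module/hadoop-2.7.2/etc/hadoop/ directory:

```shell
# check_props is a crude sanity check on an edited *-site.xml: the counts of
# opening and closing <property> tags must match.
check_props() {
  open=$(grep -c '<property>' "$1" 2>/dev/null); open=${open:-0}
  close=$(grep -c '</property>' "$1" 2>/dev/null); close=${close:-0}
  if [ "$open" -eq "$close" ]; then
    echo "$1: OK ($open properties)"
  else
    echo "$1: MISMATCH"
  fi
}
for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
  check_props "$f"
done
```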

distribute

Use the command scp -rq <folder> <domain name or IP>:<destination path> to copy the configured Hadoop to the other two virtual machines (-r: recursive, -q: quiet)

[root@slave0 hadoop]# cd /opt/module/
[root@slave0 module]# scp -rq hadoop-2.7.2/ slave1:/opt/module/
[root@slave0 module]# scp -rq hadoop-2.7.2/ slave2:/opt/module/
[root@slave0 module]# 
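Beyond eyeballing ll on each node, comparing file counts catches a partial copy. This is a dry-run sketch: the leading echo only prints the remote commands; drop it to actually run them over the passwordless ssh configured earlier:

```shell
# Dry-run sketch: after the copy, the file count on each node should match the
# local one (drop the leading "echo" to run the remote counts over ssh).
local_count=$(find /opt/module/hadoop-2.7.2 -type f 2>/dev/null | wc -l)
echo "local: $local_count files"
for host in slave1 slave2; do
  echo ssh "$host" "find /opt/module/hadoop-2.7.2 -type f | wc -l"
done
```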

Check replication on the other two virtual machines

slave1

[root@slave1 ~]# cd /opt/module/
[root@slave1 module]# ll
total 0
drwxr-xr-x. 9 root root 149 Mar  6 20:37 hadoop-2.7.2
drwxr-xr-x. 8   10  143 233 Apr 11  2015 jdk1.7.0_79
[root@slave1 module]# 

slave2

[root@slave2 ~]# cd /opt/module/
[root@slave2 module]# ll
total 0
drwxr-xr-x. 9 root root 149 Mar  6 20:37 hadoop-2.7.2
drwxr-xr-x. 8   10  143 233 Apr 11  2015 jdk1.7.0_79
[root@slave2 module]# 

complete

start-up

  1. For the first startup, the NameNode must be formatted: on the host planned to run the NameNode, run bin/hdfs namenode -format under the Hadoop installation directory
[root@slave0 module]# cd /opt/module/hadoop-2.7.2/
[root@slave0 hadoop-2.7.2]# bin/hdfs namenode -format
22/03/06 20:43:16 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = slave0/192.168.25.10
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.2
.
.
.
22/03/06 20:43:19 INFO namenode.FSImage: Allocated new BlockPoolId: BP-453973894-192.168.25.10-1646570598969
22/03/06 20:43:19 INFO common.Storage: Storage directory /opt/module/hadoop-2.7.2/data/tmp/dfs/name has been successfully formatted.
22/03/06 20:43:19 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
22/03/06 20:43:19 INFO util.ExitUtil: Exiting with status 0
22/03/06 20:43:19 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at slave0/192.168.25.10
************************************************************/
[root@slave0 hadoop-2.7.2]# 

If you see the above information with "successfully formatted", it means that the formatting is successful

22/03/06 20:43:19 INFO common.Storage: Storage directory /opt/module/hadoop-2.7.2/data/tmp/dfs/name has been successfully formatted.
  1. Start HDFS of Hadoop on the virtual machine that needs to start the NameNode process. Here is slave0
[root@slave0 hadoop-2.7.2]# cd /opt/module/hadoop-2.7.2/
[root@slave0 hadoop-2.7.2]# sbin/start-dfs.sh 
Starting namenodes on [slave0]
slave0: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-namenode-slave0.out
slave2: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-slave1.out
slave0: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-slave0.out
Starting secondary namenodes [slave2]
slave2: starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-slave2.out
[root@slave0 hadoop-2.7.2]# 
  1. Start Hadoop's YARN on the virtual machine planned to run the ResourceManager process, here slave1
[root@slave1 module]# cd /opt/module/hadoop-2.7.2/
[root@slave1 hadoop-2.7.2]# sbin/start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-resourcemanager-slave1.out
slave2: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-nodemanager-slave2.out
slave0: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-nodemanager-slave0.out
slave1: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-nodemanager-slave1.out
[root@slave1 hadoop-2.7.2]# 
  1. Run the command jps on all three virtual machines to check that the process lists match the cluster deployment planning table

slave0

[root@slave0 hadoop-2.7.2]# jps
11911 NameNode
12283 NodeManager
12008 DataNode
12392 Jps
[root@slave0 hadoop-2.7.2]# 

slave1

[root@slave1 hadoop-2.7.2]# jps
11076 NodeManager
10974 ResourceManager
11367 Jps
10847 DataNode
[root@slave1 hadoop-2.7.2]# 

slave2

[root@slave2 module]# jps
10954 NodeManager
11074 Jps
10766 DataNode
10840 SecondaryNameNode
[root@slave2 module]# 

The processes match the cluster deployment planning table; the cluster deployment succeeded
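Since passwordless SSH was configured earlier, the per-node jps checks can also be run from a single terminal. A convenience sketch (using the host names from this guide):

```shell
# Run jps on every node and print the output grouped by host; compare each
# list against the cluster deployment planning table.
for host in slave0 slave1 slave2; do
  echo "== $host =="
  ssh -o BatchMode=yes "$host" jps | sort -k2   # sort by process name for easy comparison
done
```

Sorting by the second column (the process name) makes it easy to diff two nodes' lists by eye, since jps PIDs differ on every run.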

ZooKeeper deployment

The version of ZooKeeper used is 3.4.10

download

  1. Open the website zookeeper.apache.org and click the "Download" tab under Getting Started

  1. Find "Older releases are available in the archive." and click the "in the archive" tab

  1. Scroll down, find and click "zookeeper-3.4.10/"

  1. Click "zookeeper-3.4.10.tar.gz" to start downloading ZooKeeper 3.4.10

  1. After the download completes you will have the file zookeeper-3.4.10.tar.gz

  1. (Optional) To make sure the download is intact, verify it against the published SHA-1 checksum
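For example, a verification could look like the following (the archive path is the one used later in this guide; adjust it to wherever the file was saved):

```shell
# Compute the SHA-1 of the downloaded archive and compare the printed digest
# with the one published on the Apache archive page for this release.
archive=/opt/software/zookeeper-3.4.10.tar.gz
if [ -f "$archive" ]; then
  sha1sum "$archive"   # prints: <40-hex-digest>  <path>
else
  echo "archive not found: $archive" >&2
fi
```

If the digests differ, delete the file and download it again.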

install

Plan to put the software installation package in /opt/software and the software program in /opt/module

  1. Click the yellow folder icon on the right side of MobaXterm, enter the software storage path in the input box above it, click the green check mark next to the box to jump to that directory, and drag the newly downloaded ZooKeeper package into it. An upload progress bar appears; wait for the upload to finish

  1. After uploading, go to the software installation package storage directory to check whether the installation package exists (if it does not exist, upload again)
[root@slave0 hadoop-2.7.2]# cd /opt/software/
[root@slave0 software]# ll
total 391220
-rw-r--r--. 1 root root 212046774 Mar  6 17:39 hadoop-2.7.2.tar.gz
-rw-r--r--. 1 root root 153512879 Mar  6 16:06 jdk-7u79-linux-x64.tar.gz
-rw-r--r--. 1 root root  35042811 Mar  6 21:36 zookeeper-3.4.10.tar.gz
[root@slave0 software]# 

  1. Unpack zookeeper-3.4.10.tar.gz into the program directory /opt/module (-z: filter through gzip, -x: extract, -v: verbose output, -f: archive file name, -C: extract into the given directory)
[root@slave0 software]# tar -zxvf zookeeper-3.4.10.tar.gz  -C /opt/module/
.
.
.
zookeeper-3.4.10/bin/zkServer.sh
zookeeper-3.4.10/bin/zkCli.cmd
zookeeper-3.4.10/bin/zkEnv.cmd
zookeeper-3.4.10/ivysettings.xml
[root@slave0 software]# 

to configure

environment variable

  1. Edit /etc/profile and append export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.10 and export PATH=$PATH:$ZOOKEEPER_HOME/bin
[root@slave0 software]# vi /etc/profile
.
.
.
export PATH=$PATH:$HADOOP_HOME/sbin

## ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin                                                                                                               
"/etc/profile" 94L, 2203C written
[root@slave0 software]# 
  1. Update environment variables
[root@slave0 software]# source /etc/profile
[root@slave0 software]# 

ZooKeeper configuration

  1. Create the data directory: under /opt/module/zookeeper-3.4.10/, create data/zkData
[root@slave0 software]# cd /opt/module/zookeeper-3.4.10/
[root@slave0 zookeeper-3.4.10]# mkdir -p data/zkData
[root@slave0 zookeeper-3.4.10]# 
  1. Copy the contents of the configuration template /opt/module/zookeeper-3.4.10/conf/zoo_sample.cfg into a zoo.cfg file in the same directory
[root@slave0 zookeeper-3.4.10]# cd conf/
[root@slave0 conf]# cat zoo_sample.cfg >> zoo.cfg
[root@slave0 conf]# 
  1. Modify zoo.cfg, change the value of dataDir to /opt/module/zookeeper-3.4.10/data/zkData, and then add the following content:
## cluster
server.1=slave0:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888

Operation:

[root@slave0 conf]# vi zoo.cfg 
.
.
.
## do not use /tmp for storage, /tmp here is just
## example sakes.
dataDir=/opt/module/zookeeper-3.4.10/data/zkData
## the port at which the clients will connect
clientPort=2181
.
.
.

## cluster
server.1=slave0:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
"zoo.cfg" 33L, 1036C written
[root@slave0 conf]# 
  1. Create a file named myid in the /opt/module/zookeeper-3.4.10/data/zkData/ directory and give each machine its own number, matching the server.N cluster entries just added to /opt/module/zookeeper-3.4.10/conf/zoo.cfg
[root@slave0 conf]# cd /opt/module/zookeeper-3.4.10/data/zkData/
[root@slave0 zkData]# touch myid
[root@slave0 zkData]# vi myid
1
~                                                                                                              
~                                                                                                                      
~                                                                                                                      
"myid" 1L, 2C written
[root@slave0 zkData]# 

For example, my slave0 is numbered 1, slave1 is numbered 2, and slave2 is numbered 3
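Instead of editing each myid by hand, the numbers can be derived from the server.N lines in zoo.cfg and pushed out over SSH. A sketch, assuming passwordless SSH and the same install path on every node:

```shell
ZK=/opt/module/zookeeper-3.4.10
# Each "server.N=host:2888:3888" line maps host -> myid N.
grep '^server\.' "$ZK/conf/zoo.cfg" | while IFS='=:' read -r key host _; do
  id=${key#server.}   # strip the "server." prefix to get the number
  echo "setting myid=$id on $host"
  ssh -o BatchMode=yes "$host" "echo $id > $ZK/data/zkData/myid"
done
```

Deriving the numbers from zoo.cfg keeps myid and the cluster list from drifting apart, which is a common source of "not currently serving requests" startup errors.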

distribute

Use the command scp -rq <folder> <domain name or IP>:<target path> to copy the configured ZooKeeper to the other two virtual machines

[root@slave0 zkData]# cd /opt/module/
[root@slave0 module]# scp -rq zookeeper-3.4.10/ slave1:/opt/module/
[root@slave0 module]# scp -rq zookeeper-3.4.10/ slave2:/opt/module/
[root@slave0 module]# 

Check the replication on the other two virtual machines and modify the number in the myid file

slave1

[root@slave1 hadoop-2.7.2]# cd /opt/module/
[root@slave1 module]# ll
total 4
drwxr-xr-x. 11 root root  173 Mar  6 20:48 hadoop-2.7.2
drwxr-xr-x.  8   10  143  233 Apr 11  2015 jdk1.7.0_79
drwxr-xr-x. 11 root root 4096 Mar  6 22:21 zookeeper-3.4.10
[root@slave1 module]# vi /opt/module/zookeeper-3.4.10/data/zkData/myid 
2
~                                                                                                                      
~                                                                                                                      
~                                                                                                                
"zookeeper-3.4.10/data/zkData/myid" 1L, 2C written
[root@slave1 module]# 

slave2

[root@slave2 module]# ll
total 4
drwxr-xr-x. 11 root root  173 Mar  6 20:48 hadoop-2.7.2
drwxr-xr-x.  8   10  143  233 Apr 11  2015 jdk1.7.0_79
drwxr-xr-x. 11 root root 4096 Mar  6 22:21 zookeeper-3.4.10
[root@slave2 module]# vi /opt/module/zookeeper-3.4.10/data/zkData/myid 
3
~                                                                                                                  
~                                                                                                                      
~                                                                                                                      
"zookeeper-3.4.10/data/zkData/myid" 1L, 2C written
[root@slave2 module]# 

complete

start-up

  1. Start ZooKeeper on slave0, slave1 and slave2 respectively:

slave0

[root@slave0 module]# cd /opt/module/zookeeper-3.4.10/
[root@slave0 zookeeper-3.4.10]# bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@slave0 zookeeper-3.4.10]# 

slave1

[root@slave1 module]# cd /opt/module/zookeeper-3.4.10/
[root@slave1 zookeeper-3.4.10]# bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@slave1 zookeeper-3.4.10]# 

slave2

[root@slave2 module]# cd /opt/module/zookeeper-3.4.10/
[root@slave2 zookeeper-3.4.10]# bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@slave2 zookeeper-3.4.10]# 
  1. Use the command jps to check the process information on the three virtual machines respectively

slave0

[root@slave0 zookeeper-3.4.10]# jps
11911 NameNode
12283 NodeManager
12008 DataNode
13100 Jps
13046 QuorumPeerMain
[root@slave0 zookeeper-3.4.10]# 

slave1

[root@slave1 zookeeper-3.4.10]# jps
11076 NodeManager
10974 ResourceManager
11966 QuorumPeerMain
12010 Jps
10847 DataNode
[root@slave1 zookeeper-3.4.10]# 

slave2

[root@slave2 zookeeper-3.4.10]# jps
11836 QuorumPeerMain
10954 NodeManager
10766 DataNode
11873 Jps
10840 SecondaryNameNode
[root@slave2 zookeeper-3.4.10]# 

The QuorumPeerMain process is present on all three nodes; the ZooKeeper deployment succeeded
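jps only shows that the JVMs are running; zkServer.sh status also reveals whether the ensemble elected a leader. Checked from one terminal over SSH (same host names as above), exactly one node should report "Mode: leader" and the other two "Mode: follower":

```shell
# The last lines of "zkServer.sh status" include "Mode: leader" or
# "Mode: follower" once the ensemble has formed a quorum.
for host in slave0 slave1 slave2; do
  echo "== $host =="
  ssh -o BatchMode=yes "$host" /opt/module/zookeeper-3.4.10/bin/zkServer.sh status 2>&1 | tail -n 2
done
```

If every node reports "Error contacting service", the ensemble has no quorum yet; check the myid files and the server.N lines in zoo.cfg first.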

HBase deployment

HBase version used is 1.3.3

download

  1. Open the website: hbase.apache.org, find and click the "here" tab under "Download"

  1. Turn to the end of the page, find and click the "Apache Archive" tab

  1. Scroll down to find and click "1.3.3/"

  1. Click "hbase-1.3.3-bin.tar.gz" to start downloading HBase version 1.3.3 installation package

  1. After the download completes you will have the file hbase-1.3.3-bin.tar.gz

install

Plan to put the software installation package in /opt/software and the software program in /opt/module

  1. Click the yellow folder icon on the right side of MobaXterm, enter the software storage path in the input box above it, click the green check mark next to the box to jump to that directory, and drag the newly downloaded HBase package into it. An upload progress bar appears; wait for the upload to finish

  1. After uploading, go to the software installation package storage directory to check whether the installation package exists (if it does not exist, upload again)
[root@slave0 zookeeper-3.4.10]# cd /opt/software/
[root@slave0 software]# ll
total 496104
-rw-r--r--. 1 root root 212046774 Mar  6 17:39 hadoop-2.7.2.tar.gz
-rw-r--r--. 1 root root 107398278 Mar  6 22:58 hbase-1.3.3-bin.tar.gz
-rw-r--r--. 1 root root 153512879 Mar  6 16:06 jdk-7u79-linux-x64.tar.gz
-rw-r--r--. 1 root root  35042811 Mar  6 21:36 zookeeper-3.4.10.tar.gz
[root@slave0 software]# 

  1. Unpack hbase-1.3.3-bin.tar.gz into the program directory /opt/module (-z: filter through gzip, -x: extract, -v: verbose output, -f: archive file name, -C: extract into the given directory)
[root@slave0 software]# tar -zvxf hbase-1.3.3-bin.tar.gz -C /opt/module/
.
.
.
hbase-1.3.3/lib/hbase-server-1.3.3-tests.jar
hbase-1.3.3/lib/hbase-it-1.3.3-tests.jar
hbase-1.3.3/lib/hbase-annotations-1.3.3-tests.jar
[root@slave0 software]# 

to configure

environment variable

  1. Edit /etc/profile to join export HBASE_HOME=/opt/module/hbase-1.3.3 and export PATH=$PATH:$HBASE_HOME/bin
[root@slave0 software]# vi /etc/profile
.
.
.
export PATH=$PATH:$ZOOKEEPER_HOME/bin

## HBASE_HOME
export HBASE_HOME=/opt/module/hbase-1.3.3
export PATH=$PATH:$HBASE_HOME/bin
"/etc/profile" 98L, 2298C written
[root@slave0 software]# 
  1. Update environment variables
[root@slave0 software]# source /etc/profile
[root@slave0 software]# 

HBase configuration

All configuration files live in the /opt/module/hbase-1.3.3/conf/ directory; change to that directory first

[root@slave0 software]# cd /opt/module/hbase-1.3.3/conf/
[root@slave0 conf]# 
  1. Edit hbase-env.sh and add export JAVA_HOME=/opt/module/jdk1.7.0_79 and export HBASE_MANAGES_ZK=false (use the external ZooKeeper instead of the bundled one)
[root@slave0 conf]# vi hbase-env.sh 
.
.
.
## * See the License for the specific language governing permissions and
## * limitations under the License.
## */
## JDK path
export JAVA_HOME=/opt/module/jdk1.7.0_79
## Set the use of external ZooKeeper
export HBASE_MANAGES_ZK=false

## Set environment variables here.
.
.
.
"hbase-env.sh" 140L, 7628C written
[root@slave0 conf]# 
  1. Edit hbase-site.xml and add the following properties inside the configuration tag at the end of the file:
  <!-- Set the maximum clock offset to reduce the requirements for time synchronization -->
  <property>
    <name>hbase.master.maxclockskew</name>
    <value>180000</value>
  </property>
  <!-- Point the HBase root directory at HDFS -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://slave0:9000/hbase</value>
  </property>
  <!-- Enable distributed clusters -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper configuration: set the ZooKeeper cluster nodes -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>slave0,slave1,slave2</value>
  </property>
  <!-- ZooKeeper configuration: set the ZooKeeper data directory -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/module/zookeeper-3.4.10/data/zkData</value>
  </property>

Operation:

[root@slave0 conf]# vi hbase-site.xml 
.
.
.
-->
<configuration>
  <!-- Set the maximum clock offset to reduce the requirements for time synchronization -->
  <property>
    <name>hbase.master.maxclockskew</name>
    <value>180000</value>
  </property>
  <!-- Point the HBase root directory at HDFS -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://slave0:9000/hbase</value>
  </property>
  <!-- Enable distributed clusters -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper configuration: set the ZooKeeper cluster nodes -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>slave0,slave1,slave2</value>
  </property>
  <!-- ZooKeeper configuration: set the ZooKeeper data directory -->
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/module/zookeeper-3.4.10/data/zkData</value>
  </property>
</configuration>
"hbase-site.xml" 49L, 1723C written
[root@slave0 conf]# 
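Before distributing, it is worth confirming that the edited file is still well-formed XML, since a stray tag produces confusing startup failures much later. One way to check it (assuming xmllint from libxml2 is installed; any XML parser works):

```shell
# Parse hbase-site.xml without printing it; xmllint exits non-zero and
# reports the offending line if the XML is malformed.
site=/opt/module/hbase-1.3.3/conf/hbase-site.xml
if [ -f "$site" ]; then
  xmllint --noout "$site" && echo "hbase-site.xml is well-formed"
else
  echo "file not found: $site" >&2
fi
```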
  1. Edit the regionservers file and add the domain name of the HRegionServer node
slave0
slave1
slave2

Operation:

[root@slave0 conf]# vi regionservers 
slave0
slave1
slave2
~                                                                                                                      
~                                                                                                                      
~                                                                                                              
"regionservers" 3L, 21C written
[root@slave0 conf]# 
  1. Copy the core-site.xml and hdfs-site.xml configuration files of Hadoop to the conf directory of HBase
[root@slave0 conf]# cd /opt/module/hadoop-2.7.2/etc/hadoop/
[root@slave0 hadoop]# cp core-site.xml /opt/module/hbase-1.3.3/conf/
[root@slave0 hadoop]# cp hdfs-site.xml /opt/module/hbase-1.3.3/conf/
[root@slave0 hadoop]# 

complete

distribute

Use the command scp -rq <folder> <domain name or IP>:<target path> to copy the configured HBase to the other two virtual machines

[root@slave0 hadoop]# cd /opt/module/
[root@slave0 module]# scp -rq hbase-1.3.3/ slave1:/opt/module/
[root@slave0 module]# scp -rq hbase-1.3.3/ slave2:/opt/module/
[root@slave0 module]# 

Check replication on the other two virtual machines

slave1

[root@slave1 zookeeper-3.4.10]# cd /opt/module/
[root@slave1 module]# ll
total 4
drwxr-xr-x. 11 root root  173 Mar  6 20:48 hadoop-2.7.2
drwxr-xr-x.  7 root root  160 Mar  6 23:40 hbase-1.3.3
drwxr-xr-x.  8   10  143  233 Apr 11  2015 jdk1.7.0_79
drwxr-xr-x. 11 root root 4096 Mar  6 22:27 zookeeper-3.4.10
[root@slave1 module]# 

slave2

[root@slave2 zookeeper-3.4.10]# cd /opt/module/
[root@slave2 module]# ll
total 4
drwxr-xr-x. 11 root root  173 Mar  6 20:48 hadoop-2.7.2
drwxr-xr-x.  7 root root  160 Mar  6 23:40 hbase-1.3.3
drwxr-xr-x.  8   10  143  233 Apr 11  2015 jdk1.7.0_79
drwxr-xr-x. 11 root root 4096 Mar  6 22:28 zookeeper-3.4.10
[root@slave2 module]# 

complete

start-up

  1. Start the HBase cluster with the command bin/start-hbase.sh from the HBase installation directory
[root@slave0 hbase-1.3.3]# bin/start-hbase.sh 
starting master, logging to /opt/module/hbase-1.3.3/logs/hbase-root-master-slave0.out
slave1: starting regionserver, logging to /opt/module/hbase-1.3.3/bin/../logs/hbase-root-regionserver-slave1.out
slave2: starting regionserver, logging to /opt/module/hbase-1.3.3/bin/../logs/hbase-root-regionserver-slave2.out
slave0: starting regionserver, logging to /opt/module/hbase-1.3.3/bin/../logs/hbase-root-regionserver-slave0.out
[root@slave0 hbase-1.3.3]# 
  1. Use the command jps to check the process information on the three virtual machines respectively

slave0

[root@slave0 hbase-1.3.3]# jps
11911 NameNode
13896 HMaster
14685 Jps
12283 NodeManager
12008 DataNode
14456 HRegionServer
13046 QuorumPeerMain
[root@slave0 hbase-1.3.3]# 

slave1

[root@slave1 module]# jps
12663 HRegionServer
11076 NodeManager
10974 ResourceManager
12819 Jps
11966 QuorumPeerMain
10847 DataNode
[root@slave1 module]# 

slave2

[root@slave2 module]# jps
12852 Jps
11836 QuorumPeerMain
10954 NodeManager
10766 DataNode
10840 SecondaryNameNode
12676 HRegionServer
[root@slave2 module]# 

All three virtual machines have HRegionServer processes, and slave0 has HMaster processes. Success
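Beyond jps, the cluster can be asked directly for a summary through the HBase shell (HBASE_HOME/bin is on PATH from the earlier /etc/profile edit). With this deployment plan it should report 1 active master, 0 dead servers, and 3 servers:

```shell
# Ask the running cluster for a status summary; the guard keeps this from
# erroring out when hbase is not on PATH yet.
if command -v hbase >/dev/null; then
  echo "status" | hbase shell
else
  echo "hbase not on PATH; run 'source /etc/profile' first" >&2
fi
```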

source address

https://www.wolai.com/kq9qzZCG33HVkAdATecant


Posted by umrguy on Sat, 30 Jul 2022 02:03:51 +0930