DMMPP two node environment construction

3.1 DMMPP two node environment construction

1. Preliminary planning

1.1 system planning

1.2 IP and port planning

1.3 storage planning

1.4 database initialization parameter planning

2. Install basic database software

(omitted)
PS:

The version of DM server used by each EP of DM MPP shall be consistent. Meanwhile, pay attention to the number of operating system bits, size end mode
The time zone and time setting should be consistent, otherwise unexpected errors may be caused.

3. configure environment variables


The specific configuration of environment variables is as follows:

su - dmdba
vi .bash_profile
source ~/.bash_profile
#################################################
### Dameng Database Environment Configuration ###
export TMP=/tmp
export TMP_DIR=$TMP
export DM_INSTALL_TMPDIR=$TMP
export DM_HOME=/DMDB/app/DM8
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$DM_HOME/bin"
export PATH=$DM_HOME/bin:$DM_HOME/tool:$PATH:$HOME/bin
umask 022
### end ###

4. Initialize database instance

There are two ways to build the Library: dbca graphical tool and dminit command line tool.

Use the dbca graphical tool to build the library here:
(create databases on MPPEP01/MPPEP02 hosts respectively)
Screenshot of MPPEP01:








Screenshot of MPPEP02:




Stop the database instance before configuring the mpp environment

service DmServiceEP01 stop
service DmServiceEP02 stop

5. Edit the configuration file required for the MPP environment

5.1 configuring DM ini

MPPEP01 host database EP01 instance DM INI file

## adjustment dm.ini Partial parameters ##
INSTANCE_NAME = EP01
PORT_NUM = 5236
MAL_INI = 1
MPP_INI = 1

MPPEP02 host database EP02 instance DM INI file

## adjustment dm.ini Partial parameters ##
INSTANCE_NAME = EP02
PORT_NUM = 5236
MAL_INI = 1
MPP_INI = 1
5.2 configuring dmmal ini

Dmmal between all EP nodes Ini communication system configuration file is the same.

[MAL_INST1]
MAL_INST_NAME = EP01
MAL_HOST = 10.10.10.11
MAL_PORT = 5336
MAL_INST_HOST = 192.168.0.11
MAL_INST_PORT = 5236

[MAL_INST2]
MAL_INST_NAME = EP02
MAL_HOST = 10.10.10.12
MAL_PORT = 5336
MAL_INST_HOST = 192.168.0.12
MAL_INST_PORT = 5236

Same dmmal Ini to MPPEP02 data directory

[dmdba@MPPEP01 DMDB]$ scp dmmal.ini 192.168.0.12:/DMDB/data/DMDB/
dmdba@192.168.0.12's password: 
dmmal.ini                                                                                           100%  245     0.2KB/s   00:00    
[dmdba@MPPEP01 DMDB]$ 
5.3 prepare dmmpp Ini configuration file (dmpp.ctl)

dmmpp.ctl is a binary file, which cannot be configured directly by the user. You need to configure dmmpp first ini.

# dmmpp.ini
[SERVICE_NAME1]
MPP_SEQ_NO = 0
MPP_INST_NAME = EP01

[SERVICE_NAME2] 
MPP_SEQ_NO = 1
MPP_INST_NAME = EP02

Use the dmctlcvt tool to convert dmmpp Ini to dmmpp CTL control file, and then transfer this file to each other at each node to ensure that all nodes apply the same dmmpp ctl.

dmctlcvt TYPE=2 SRC=/DMDB/data/DMDB/dmmpp.ini DEST=/DMDB/data/DMDB/dmmpp.ctl
scp /DMDB/data/DMDB/dmmpp.ctl 192.168.0.12:/DMDB/data/DMDB


[dmdba@MPPEP01 DMDB]$ ls -l dmmpp*
-rw-r--r-- 1 dmdba dinstall 41540 12 April 15:01 dmmpp.ctl
-rw-r--r-- 1 dmdba dinstall   106 12 April 15:00 dmmpp.ini
[dmdba@MPPEP01 DMDB]$ scp /DMDB/data/DMDB/dmmpp.ctl 192.168.0.12:/DMDB/data/DMDB
dmdba@192.168.0.12's password: 
dmmpp.ctl                                                                                           100%   41KB  40.6KB/s   00:00    

6. Run MPP

After the MPP front-end environment is configured, start the database instances EP01/EP02 respectively (regardless of the sequence of instance startup).

### PS: the user logs in to any EP node (also known as the master node) for normal database operation.
chkconfig --list |grep ^Dm
service DmServiceEP01 start
service DmServiceEP02 start

7. Create distribution table test

Global connection login mpp (default) disql SYSDBA/SYSDBAglobal@localhost:5236
Log in to MPP disql SYSDBA through local connection/ SYSDBAlocal@localhost :5236

dm_svc.conf configuration

TIME_ZONE=(+8:00)
LANGUAGE=(en)
mppsvc=(192.168.0.11:5236,192.168.0.12:5236)
LOGIN_MODE=(1)


7.1 creating tablespaces
## DDL must be executed in mpp global mode
create tablespace cqsoftbs 
   datafile '/DMDB/data/DMDB/cqsoftbs01.DBF' size 64;
7.2 creating users
create user cqsoft identified by "cqsoft_123"  
   diskspace unlimited 
   default tablespace cqsoftbs;
7.3 authorized users
grant resource to cqsoft;
7.4 create hash distribution table
conn cqsoft/"cqsoft_123" 
create table tab_hash(a1 int, a2 char(10)) distributed by hash(a1);
7.5 creating a random distribution table

create table tab_random(a1 int, a2 char(10)) distributed randomly;

7.6 create range distribution table
create table tab_range(a1 int, a2 char(10)) 
  distributed by range(a1) 
    (
    values equ or less than (100) on EP01,
    values equ or less than (maxvalue) on EP02 
    );    
7.7 create LIST distribution table
create table tab_list (a1 int, a2 char(10)) 
  distributed by list(a2)
   (
     values ('male') on EP01,
     values (default) on EP02
   );
   
-- The instance distribution in the distribution table cannot be specified repeatedly.
values ('female') on EP02,
 values (default) on EP02  
error[-2212]:The distribution table specifies the same instance name[EP02]. 
7.8 create replication distribution table

The data content of this table on each EP node is exactly the same. There is no need to get data from other nodes.

create table tab_replicate (a1 int, a2 char(10))  distributed fully;
7.9 range and horizontal partition table of hash distribution table
create table tab_hash_range_part (a1 int, a2 char(10), a3 char(10))
  partition by range (a1)
   (
     partition p1 values less than (100),
     partition p2 values less than (200),   
     partition p3 values less than (maxvalue)  
   )
 distributed by hash (a1);  

8. Fast data loading dmfldr

Prepare data text file tab_list_distr.txt

1,in
2,male
3,male
4,male
5,male
6,male
7,female
8,female
9,female
10,female
11,female
12,female
13,female
14,female
15,female
16,female
17,female
18,female
19,male
20,male

dmfldr control filetab_ list_ distr.ctl
Note: you'd better check the code of jukeng data file first.

OPTIONS (
 CHARACTER_CODE = 'UTF-8'       
)

LOAD DATA
INFILE '/DMDB/soft/tab_list_distr.txt'
BADFILE '/DMDB/soft/fldr.bad'
INTO TABLE tab_list
FIELDS ','
(A1,A2)

dmfldr load data

su - dmdba
dmfldr cqsoft/"cqsoft_123"@localhost:5236 control=\'/DMDB/soft/tab_list_distr.ctl\'

Global mode query table content:

Local mode query table content:

PS: from the above query results, we know that mpp uses the benefits of distributed storage and distributed computing to improve the efficiency of parallel processing of the system.

9. Stop MPP

***Precautions***

If an EP fails and stops during the operation of DM MPP system, the whole MPP system will be in a state that DM large-scale parallel processing MPP can not normally serve.
At present, all user sessions will be disconnected by the system. Global login is not allowed, but local login is only allowed. Therefore, in order to ensure the high availability of MPP system, we strongly recommend using DM MPP
Deployment scheme combined with data guard.

#Stopping mpp is the same as the normal stopping of database instances, regardless of order. That is, stop the database instance of each EP. service DmServiceEP01 stop service DmServiceEP02 stop

10. Data distribution concept

The data of DM MPP system is distributed in each EP node and supports the hash distribution, random distribution, copy distribution, range distribution and LIST distribution types of table data. Users can select the appropriate distribution type for table data according to the actual situation of the application.
1. Hash distribution
Hash distribution calculates a hash value for the data row according to one or more columns specified in the table definition, and then distributes the row data to the mapped nodes according to the hash value and hash mapping table.
When the join key used in the join query of the table is a hash distribution column, the query plan under MPP will be optimized, such as reducing the number of communication operators in the plan, using indexes, optimizing grouping plans, etc., so as to reduce the distribution of data among nodes and improve the query efficiency.
When using hash distribution, whether the data between nodes is balanced depends on the set hash distribution column and the data in the table.
When the number of nodes changes, the data of each node needs to be redistributed according to the new hash mapping table.

2. Random distribution
There is no distribution column in the random distribution table. When inserting table data, the data will be randomly and evenly distributed to each node according to a certain random algorithm.
The advantage of random distribution is that there is no mapping relationship between data and nodes. After the number of nodes is changed, if there is no requirement for node data balance, there is no need to change the existing data of the node.
Generally speaking, the performance of random distribution is not as good as hash distribution for complex queries and more data distribution between nodes.

3. Replication distribution
The local data of the replication distribution table on each node is a complete copy. When querying the data of the table, it can be completed separately on any node without obtaining data from other nodes.
Replication distribution is generally used for tables with a small amount of data.

4. Range distribution
Range distribution according to the column value range distribution items of one or more columns specified in the table definition, determines which corresponding EP of MPP to store a row of data.

5. LIST distribution
LIST distribution determines which corresponding EP of MPP to store a row of data by specifying the discrete value set of one or more columns in the table. This distribution is used when the column values in the table can be enumerated (enumeration).

11. EP data distribution and zoning table

DM MPP also supports the combination of data distribution and partition table to realize "data distribution and then partition".
After the data is distributed to each node, the query performance can be further improved by partitioning the data in a single node.
The type of distribution and the type of partition can be mixed and matched. For example, establish the range horizontal partition table of the hash distribution table.

Tags: Database

Posted by press711 on Sun, 17 Apr 2022 01:02:44 +0930