3.1 DMMPP two node environment construction
1. Preliminary planning
1.1 system planning
1.2 IP and port planning
1.3 storage planning
1.4 database initialization parameter planning
2. Install basic database software
(omitted)
PS:
The version of DM server used by each EP of DM MPP shall be consistent. Meanwhile, pay attention to the number of operating system bits, size end mode
The time zone and time setting should be consistent, otherwise unexpected errors may be caused.
3. configure environment variables
The specific configuration of environment variables is as follows:
su - dmdba vi .bash_profile source ~/.bash_profile ################################################# ### Dameng Database Environment Configuration ### export TMP=/tmp export TMP_DIR=$TMP export DM_INSTALL_TMPDIR=$TMP export DM_HOME=/DMDB/app/DM8 export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$DM_HOME/bin" export PATH=$DM_HOME/bin:$DM_HOME/tool:$PATH:$HOME/bin umask 022 ### end ###
4. Initialize database instance
There are two ways to build the Library: dbca graphical tool and dminit command line tool.
Use the dbca graphical tool to build the library here:
(create databases on MPPEP01/MPPEP02 hosts respectively)
Screenshot of MPPEP01:
Screenshot of MPPEP02:
Stop the database instance before configuring the mpp environment
service DmServiceEP01 stop service DmServiceEP02 stop
5. Edit the configuration file required for the MPP environment
5.1 configuring DM ini
MPPEP01 host database EP01 instance DM INI file
## adjustment dm.ini Partial parameters ## INSTANCE_NAME = EP01 PORT_NUM = 5236 MAL_INI = 1 MPP_INI = 1
MPPEP02 host database EP02 instance DM INI file
## adjustment dm.ini Partial parameters ## INSTANCE_NAME = EP02 PORT_NUM = 5236 MAL_INI = 1 MPP_INI = 1
5.2 configuring dmmal ini
Dmmal between all EP nodes Ini communication system configuration file is the same.
[MAL_INST1] MAL_INST_NAME = EP01 MAL_HOST = 10.10.10.11 MAL_PORT = 5336 MAL_INST_HOST = 192.168.0.11 MAL_INST_PORT = 5236 [MAL_INST2] MAL_INST_NAME = EP02 MAL_HOST = 10.10.10.12 MAL_PORT = 5336 MAL_INST_HOST = 192.168.0.12 MAL_INST_PORT = 5236
Same dmmal Ini to MPPEP02 data directory
[dmdba@MPPEP01 DMDB]$ scp dmmal.ini 192.168.0.12:/DMDB/data/DMDB/ dmdba@192.168.0.12's password: dmmal.ini 100% 245 0.2KB/s 00:00 [dmdba@MPPEP01 DMDB]$
5.3 prepare dmmpp Ini configuration file (dmpp.ctl)
dmmpp.ctl is a binary file, which cannot be configured directly by the user. You need to configure dmmpp first ini.
# dmmpp.ini [SERVICE_NAME1] MPP_SEQ_NO = 0 MPP_INST_NAME = EP01 [SERVICE_NAME2] MPP_SEQ_NO = 1 MPP_INST_NAME = EP02
Use the dmctlcvt tool to convert dmmpp Ini to dmmpp CTL control file, and then transfer this file to each other at each node to ensure that all nodes apply the same dmmpp ctl.
dmctlcvt TYPE=2 SRC=/DMDB/data/DMDB/dmmpp.ini DEST=/DMDB/data/DMDB/dmmpp.ctl scp /DMDB/data/DMDB/dmmpp.ctl 192.168.0.12:/DMDB/data/DMDB [dmdba@MPPEP01 DMDB]$ ls -l dmmpp* -rw-r--r-- 1 dmdba dinstall 41540 12 April 15:01 dmmpp.ctl -rw-r--r-- 1 dmdba dinstall 106 12 April 15:00 dmmpp.ini [dmdba@MPPEP01 DMDB]$ scp /DMDB/data/DMDB/dmmpp.ctl 192.168.0.12:/DMDB/data/DMDB dmdba@192.168.0.12's password: dmmpp.ctl 100% 41KB 40.6KB/s 00:00
6. Run MPP
After the MPP front-end environment is configured, start the database instances EP01/EP02 respectively (regardless of the sequence of instance startup).
### PS: the user logs in to any EP node (also known as the master node) for normal database operation. chkconfig --list |grep ^Dm service DmServiceEP01 start service DmServiceEP02 start
7. Create distribution table test
Global connection login mpp (default) disql SYSDBA/SYSDBAglobal@localhost:5236
Log in to MPP disql SYSDBA through local connection/ SYSDBAlocal@localhost :5236
dm_svc.conf configuration
TIME_ZONE=(+8:00) LANGUAGE=(en) mppsvc=(192.168.0.11:5236,192.168.0.12:5236) LOGIN_MODE=(1)
7.1 creating tablespaces
## DDL must be executed in mpp global mode create tablespace cqsoftbs datafile '/DMDB/data/DMDB/cqsoftbs01.DBF' size 64;
7.2 creating users
create user cqsoft identified by "cqsoft_123" diskspace unlimited default tablespace cqsoftbs;
7.3 authorized users
grant resource to cqsoft;
7.4 create hash distribution table
conn cqsoft/"cqsoft_123" create table tab_hash(a1 int, a2 char(10)) distributed by hash(a1);
7.5 creating a random distribution table
create table tab_random(a1 int, a2 char(10)) distributed randomly;
7.6 create range distribution table
create table tab_range(a1 int, a2 char(10)) distributed by range(a1) ( values equ or less than (100) on EP01, values equ or less than (maxvalue) on EP02 );
7.7 create LIST distribution table
create table tab_list (a1 int, a2 char(10)) distributed by list(a2) ( values ('male') on EP01, values (default) on EP02 ); -- The instance distribution in the distribution table cannot be specified repeatedly. values ('female') on EP02, values (default) on EP02 error[-2212]:The distribution table specifies the same instance name[EP02].
7.8 create replication distribution table
The data content of this table on each EP node is exactly the same. There is no need to get data from other nodes.
create table tab_replicate (a1 int, a2 char(10)) distributed fully;
7.9 range and horizontal partition table of hash distribution table
create table tab_hash_range_part (a1 int, a2 char(10), a3 char(10)) partition by range (a1) ( partition p1 values less than (100), partition p2 values less than (200), partition p3 values less than (maxvalue) ) distributed by hash (a1);
8. Fast data loading dmfldr
Prepare data text file tab_list_distr.txt
1,in 2,male 3,male 4,male 5,male 6,male 7,female 8,female 9,female 10,female 11,female 12,female 13,female 14,female 15,female 16,female 17,female 18,female 19,male 20,male
dmfldr control filetab_ list_ distr.ctl
Note: you'd better check the code of jukeng data file first.
OPTIONS ( CHARACTER_CODE = 'UTF-8' ) LOAD DATA INFILE '/DMDB/soft/tab_list_distr.txt' BADFILE '/DMDB/soft/fldr.bad' INTO TABLE tab_list FIELDS ',' (A1,A2)
dmfldr load data
su - dmdba dmfldr cqsoft/"cqsoft_123"@localhost:5236 control=\'/DMDB/soft/tab_list_distr.ctl\'
Global mode query table content:
Local mode query table content:
PS: from the above query results, we know that mpp uses the benefits of distributed storage and distributed computing to improve the efficiency of parallel processing of the system.
9. Stop MPP
***Precautions***
If an EP fails and stops during the operation of DM MPP system, the whole MPP system will be in a state that DM large-scale parallel processing MPP can not normally serve.
At present, all user sessions will be disconnected by the system. Global login is not allowed, but local login is only allowed. Therefore, in order to ensure the high availability of MPP system, we strongly recommend using DM MPP
Deployment scheme combined with data guard.
#Stopping mpp is the same as the normal stopping of database instances, regardless of order. That is, stop the database instance of each EP. service DmServiceEP01 stop service DmServiceEP02 stop
10. Data distribution concept
The data of DM MPP system is distributed in each EP node and supports the hash distribution, random distribution, copy distribution, range distribution and LIST distribution types of table data. Users can select the appropriate distribution type for table data according to the actual situation of the application.
1. Hash distribution
Hash distribution calculates a hash value for the data row according to one or more columns specified in the table definition, and then distributes the row data to the mapped nodes according to the hash value and hash mapping table.
When the join key used in the join query of the table is a hash distribution column, the query plan under MPP will be optimized, such as reducing the number of communication operators in the plan, using indexes, optimizing grouping plans, etc., so as to reduce the distribution of data among nodes and improve the query efficiency.
When using hash distribution, whether the data between nodes is balanced depends on the set hash distribution column and the data in the table.
When the number of nodes changes, the data of each node needs to be redistributed according to the new hash mapping table.
2. Random distribution
There is no distribution column in the random distribution table. When inserting table data, the data will be randomly and evenly distributed to each node according to a certain random algorithm.
The advantage of random distribution is that there is no mapping relationship between data and nodes. After the number of nodes is changed, if there is no requirement for node data balance, there is no need to change the existing data of the node.
Generally speaking, the performance of random distribution is not as good as hash distribution for complex queries and more data distribution between nodes.
3. Replication distribution
The local data of the replication distribution table on each node is a complete copy. When querying the data of the table, it can be completed separately on any node without obtaining data from other nodes.
Replication distribution is generally used for tables with a small amount of data.
4. Range distribution
Range distribution according to the column value range distribution items of one or more columns specified in the table definition, determines which corresponding EP of MPP to store a row of data.
5. LIST distribution
LIST distribution determines which corresponding EP of MPP to store a row of data by specifying the discrete value set of one or more columns in the table. This distribution is used when the column values in the table can be enumerated (enumeration).
11. EP data distribution and zoning table
DM MPP also supports the combination of data distribution and partition table to realize "data distribution and then partition".
After the data is distributed to each node, the query performance can be further improved by partitioning the data in a single node.
The type of distribution and the type of partition can be mixed and matched. For example, establish the range horizontal partition table of the hash distribution table.