Database and table of distributed technology

Data segmentation

In short, it is to disperse the data previously stored in one database into multiple databases through certain conditions, so as to reduce the load of a single database. Data segmentation, according to its segmentation rules, can be roughly divided into two types, vertical segmentation and horizontal segmentation.

vertical partitioning

Is to divide into different databases according to different tables or schemas.

advantage:

  • Clear business and clear splitting rules after splitting;
  • Easy expansion and integration between systems;
  • Simple data maintenance

Disadvantages:

  • Some business tables cannot be join ed and can only be called through the interface, which increases the complexity of the system;
  • Cross database transactions are difficult to handle;
  • After vertical segmentation, some business data is too large, and there is still a single performance bottleneck

horizontal partitioning

Split the data in a table into different databases according to certain rules

advantage:

  • Solve the performance bottleneck of single database, big data and high concurrency;
  • The splitting rules are well encapsulated and almost transparent to the application end, so developers do not need to care about the splitting details;
  • The stability and load capacity of the system are improved;

Disadvantages:

  • Splitting rules are difficult to abstract;
  • The consistency of fragmented transactions is difficult to solve;
  • During secondary expansion, data migration and maintenance are difficult.

summary

Common disadvantages are:

  • Distributed transaction problem;
  • Cross database join problem;
  • Management of multiple data sources

There are two main ideas for the management of multiple data sources:

1. Client mode: in each application module, configure the data source you need, directly access the database, and complete the data integration in each module; Client mode: sharding JDBC

2. Intermediate agent mode: the intermediate agent manages all data sources uniformly. The database layer is completely transparent to the developers, and the developers do not need to pay attention to the details of splitting. Intermediate agent mode: MyCat

Database read-write separation

Disadvantages of read-write separation

When the synchronization fails or the synchronization delay is relatively large, the data of the write library and the read library are inconsistent. Can the user accept the inconsistency of this data.

 

Overview of MyCat

  • It is an open source distributed database system. Front end users can regard it as a database agent.
  • The core function of MyCat is to divide databases and tables, that is, a large table is horizontally divided into N small tables, which are then stored in the back-end MySql data
  • The backend supports MySql, Oracle, SQLs server, DB2 and other mainstream databases, as well as NoSql databases such as MongoDB
  • It can not only be used for read-write separation, database and table separation, but also for disaster recovery backup, cloud platform construction, etc., so that your architecture has strong adaptability and flexibility

Application scenario of MyCat

  • Simple read-write separation. At this time, the configuration is the simplest. It supports read-write separation and master-slave switching;
  • Sub database and sub table: for tables with more than 1000w, it can support 100 billion data at most;
  • Multi tenant applications, each application has a database, the application is only connected to MyCat, and the program itself does not need to be modified;
  • Replace HBase to analyze big data

Basic concepts in MyCat

Logical library (Schema)

In the actual development, developers do not need to know the existence of database middleware. Developers only need to have the concept of database.

Therefore, database middleware can be regarded as a logical library composed of one or more database clusters

Logical table

For an application system, a table that reads and writes data is a logical table. The data in the logical table is divided horizontally and distributed in different fragment libraries

dataNode

After the data is segmented, a large table is divided into different sharding databases. The database where each sharding table is located is called a sharding node

Node host (dataHost)

After data segmentation, each sharding node does not necessarily occupy a real physical host. There will be multiple sharding nodes on the same physical host. The host of these sharding nodes is called the node host

Fragment rule

According to a certain business logic, divide the data into a certain partition. This rule is called the partition rule

Global sequence

The system ensures the unique identification of data. This mechanism to ensure the unique identification of data is called global serial number.

Configuration of mycat

server.xml

<?xml version="1.0" encoding="UTF-8"?>
<!-- - - Licensed under the Apache License, Version 2.0 (the "License"); 
	- you may not use this file except in compliance with the License. - You 
	may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 
	- - Unless required by applicable law or agreed to in writing, software - 
	distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT 
	WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the 
	License for the specific language governing permissions and - limitations 
	under the License. -->
<!DOCTYPE mycat:server SYSTEM "server.dtd">
<mycat:server xmlns:mycat="http://io.mycat/">
	<system>
	<property name="nonePasswordLogin">0</property> <!-- 0 For login with password, 1 for login without password ,The default is 0. If it is set to 1, you need to specify the default account-->
	<property name="useHandshakeV10">1</property>
	<property name="useSqlStat">0</property>  <!-- 1 To enable real-time statistics, 0 to disable -->
	<property name="useGlobleTableCheck">0</property>  <!-- 1 To enable all overtime consistency detection and 0 to close -->
		<property name="sqlExecuteTimeout">300</property>  <!-- SQL Execution timeout unit:second-->
		<property name="sequnceHandlerType">2</property>
		<!--<property name="sequnceHandlerPattern">(?:(\s*next\s+value\s+for\s*MYCATSEQ_(\w+))(,|\)|\s)*)+</property>-->
		<!--Must have MYCATSEQ_perhaps mycatseq_Enter the sequence matching process Note MYCATSEQ_With spaces-->
		<property name="sequnceHandlerPattern">(?:(\s*next\s+value\s+for\s*MYCATSEQ_(\w+))(,|\)|\s)*)+</property>
	<property name="subqueryRelationshipCheck">false</property> <!-- When there is an associated query in the sub query,Check whether there is a fragment field in the associated field .default false -->
      <!--  <property name="useCompression">1</property>--> <!--1 For on mysql Compression protocol-->
        <!--  <property name="fakeMySQLVersion">5.6.20</property>--> <!--Set simulated MySQL Version number-->
	<!-- <property name="processorBufferChunk">40960</property> -->
	<!-- 
	<property name="processors">1</property> 
	<property name="processorExecutor">32</property> 
	 -->
        <!--Default to type 0: DirectByteBufferPool | type 1 ByteBufferArena | type 2 NettyBufferPool -->
		<property name="processorBufferPoolType">0</property>
		<!--The default is 65535 64 K be used for sql Maximum text length during parsing -->
		<!--<property name="maxStringLiteralLength">65535</property>-->
		<!--<property name="sequnceHandlerType">0</property>-->
		<!--<property name="backSocketNoDelay">1</property>-->
		<!--<property name="frontSocketNoDelay">1</property>-->
		<!--<property name="processorExecutor">16</property>-->
		<!--
			<property name="serverPort">8066</property> <property name="managerPort">9066</property> 
			<property name="idleTimeout">300000</property> <property name="bindIp">0.0.0.0</property>
			<property name="dataNodeIdleCheckPeriod">300000</property> 5 * 60 * 1000L; //Connection idle check
			<property name="frontWriteQueueSize">4096</property> <property name="processors">32</property> -->
		<!--Distributed transaction switch: 0 means not filtering distributed transactions, 1 means filtering distributed transactions (if only global tables are involved in distributed transactions, then not filtering), and 2 means not filtering distributed transactions,However, distributed transaction logs are recorded-->
		<property name="handleDistributedTransactions">0</property>
		
			<!--
			off heap for merge/order/group/limit      1 On 0 off
		-->
		<property name="useOffHeapForMerge">0</property>

		<!--
			The unit is m
		-->
        <property name="memoryPageSize">64k</property>

		<!--
			The unit is k
		-->
		<property name="spillsFileBufferSize">1k</property>

		<property name="useStreamOutput">0</property>

		<!--
			The unit is m
		-->
		<property name="systemReserveMemorySize">384m</property>


		<!--Whether to adopt zookeeper Coordinated switching  -->
		<property name="useZKSwitch">false</property>

		<!-- XA Recovery Log Log path -->
		<!--<property name="XARecoveryLogBaseDir">./</property>-->

		<!-- XA Recovery Log Log name -->
		<!--<property name="XARecoveryLogBaseName">tmlog</property>-->
		<!--If yes true Strictly observe the isolation level if necessary,Not just select Statement to switch connections in a transaction-->
		<property name="strictTxIsolation">false</property>
		
		<property name="useZKSwitch">true</property>
		
	</system>
	
	<!-- overall situation SQL Firewall settings -->
	<!--Wildcards can be used in the white list%Or*-->
	<!--for example<host host="127.0.0.*" user="root"/>-->
	<!--for example<host host="127.0.*" user="root"/>-->
	<!--for example<host host="127.*" user="root"/>-->
	<!--for example<host host="1*7.*" user="root"/>-->
	<!--In these configurations, for 127.0.0.1 Can be root Account login-->
	<!--
	<firewall>
	   <whitehost>
	      <host host="1*7.0.0.*" user="root"/>
	   </whitehost>
       <blacklist check="false">
       </blacklist>
	</firewall>
	-->

	<user name="root" defaultAccount="true">
		<property name="password">root</property>
		<property name="schemas">foodie-shop-dev</property>
		
		<!-- Table level DML Permission setting -->
		<!-- 		
		<privileges check="false">
			<schema name="TESTDB" dml="0110" >
				<table name="tb01" dml="0000"></table>
				<table name="tb02" dml="1111"></table>
			</schema>
		</privileges>		
		 -->
	</user>

	<user name="user">
		<property name="password">user</property>
		<property name="schemas">foodie-shop-dev</property>
		<property name="readOnly">true</property>
	</user>

</mycat:server>

schema.xml

<?xml version="1.0"?>
<!DOCTYPE mycat:schema SYSTEM "schema.dtd">
<mycat:schema xmlns:mycat="http://io.mycat/">

	<schema name="foodie-shop-dev" checkSQLschema="false" sqlMaxLimit="100" dataNode="dn131">
		<!-- auto sharding by id (long) 
		<table name="user" dataNode="dn131,dn132" rule="mod-long" />
		<table name="province" dataNode="dn131,dn132" type="global"/>
		<table name="o_order" autoIncrement="true" primaryKey="id"  dataNode="dn131,dn132" rule="mod-long">
			<childTable name="order_item" joinKey="order_id" parentKey="id"/>
		</table>
		-->

		<!-- <table name="oc_call" primaryKey="ID" dataNode="dn1$0-743" rule="latest-month-calldate"
			/> -->
		<table name="orders" dataNode="dn131,dn132" rule="sharding-by-murmur">
			<childTable name="order_items" joinKey="order_id" parentKey="id"/>
			<childTable name="order_status" joinKey="order_id" parentKey="id" />
		</table>
	</schema>
	<dataNode name="dn131" dataHost="db131" database="foodie-shop-dev" />
	<dataNode name="dn132" dataHost="db132" database="foodie-shop-dev" />
	<dataHost name="db131" maxCon="1000" minCon="10" balance="0"
			  writeType="0" dbType="mysql" dbDriver="native" switchType="1"  slaveThreshold="100">
		<heartbeat>select user()</heartbeat>
		<writeHost host="M1" url="192.168.73.131:3306" user="imooc"
				   password="Imooc@123456">
		</writeHost>
	</dataHost>
	<dataHost name="db132" maxCon="1000" minCon="10" balance="0"
                          writeType="0" dbType="mysql" dbDriver="native" switchType="1"  slaveThreshold="100">
                <heartbeat>select user()</heartbeat>
                <writeHost host="M1" url="192.168.73.132:3306" user="imooc"
                                   password="Imooc@123456">
                </writeHost>
        </dataHost>
</mycat:schema>

MySQL master-slave configuration

Master node configuration

log-bin,Specifies the name of the file

to configure server-id,The default is 1 

Primary create backup account REPLACTION SLAVE
   grant replaction slave on *.* to 'your username'@'%';

Master lock table FLUSH TABLES WITH READ LOCK

Master found bin-log Location of
  show master status;
Primary backup data
  mysqldump --alldatabases --master-data >dbdump.db

Slave node configuration

to configure server-id,The default is 1 

change master to 
master_host='123.123.123.123',
master_user='repl',
mastet_password='123',
master_logfile='file.000001',
master_log_pos=1460;

start slave;

Global tables and sub tables of mycat

HA of mycat

Sharding jdbc

 

Posted by varsha on Mon, 18 Apr 2022 04:37:49 +0930