Data segmentation
In short, it is to disperse the data previously stored in one database into multiple databases through certain conditions, so as to reduce the load of a single database. Data segmentation, according to its segmentation rules, can be roughly divided into two types, vertical segmentation and horizontal segmentation.
vertical partitioning
Is to divide into different databases according to different tables or schemas.
advantage:
- Clear business and clear splitting rules after splitting;
- Easy expansion and integration between systems;
- Simple data maintenance
Disadvantages:
- Some business tables cannot be join ed and can only be called through the interface, which increases the complexity of the system;
- Cross database transactions are difficult to handle;
- After vertical segmentation, some business data is too large, and there is still a single performance bottleneck
horizontal partitioning
Split the data in a table into different databases according to certain rules
advantage:
- Solve the performance bottleneck of single database, big data and high concurrency;
- The splitting rules are well encapsulated and almost transparent to the application end, so developers do not need to care about the splitting details;
- The stability and load capacity of the system are improved;
Disadvantages:
- Splitting rules are difficult to abstract;
- The consistency of fragmented transactions is difficult to solve;
- During secondary expansion, data migration and maintenance are difficult.
summary
Common disadvantages are:
- Distributed transaction problem;
- Cross database join problem;
- Management of multiple data sources
There are two main ideas for the management of multiple data sources:
1. Client mode: in each application module, configure the data source you need, directly access the database, and complete the data integration in each module; Client mode: sharding JDBC
2. Intermediate agent mode: the intermediate agent manages all data sources uniformly. The database layer is completely transparent to the developers, and the developers do not need to pay attention to the details of splitting. Intermediate agent mode: MyCat
Database read-write separation
Disadvantages of read-write separation
When the synchronization fails or the synchronization delay is relatively large, the data of the write library and the read library are inconsistent. Can the user accept the inconsistency of this data.
Overview of MyCat
- It is an open source distributed database system. Front end users can regard it as a database agent.
- The core function of MyCat is to divide databases and tables, that is, a large table is horizontally divided into N small tables, which are then stored in the back-end MySql data
- The backend supports MySql, Oracle, SQLs server, DB2 and other mainstream databases, as well as NoSql databases such as MongoDB
- It can not only be used for read-write separation, database and table separation, but also for disaster recovery backup, cloud platform construction, etc., so that your architecture has strong adaptability and flexibility
Application scenario of MyCat
- Simple read-write separation. At this time, the configuration is the simplest. It supports read-write separation and master-slave switching;
- Sub database and sub table: for tables with more than 1000w, it can support 100 billion data at most;
- Multi tenant applications, each application has a database, the application is only connected to MyCat, and the program itself does not need to be modified;
- Replace HBase to analyze big data
Basic concepts in MyCat
Logical library (Schema)
In the actual development, developers do not need to know the existence of database middleware. Developers only need to have the concept of database.
Therefore, database middleware can be regarded as a logical library composed of one or more database clusters
Logical table
For an application system, a table that reads and writes data is a logical table. The data in the logical table is divided horizontally and distributed in different fragment libraries
dataNode
After the data is segmented, a large table is divided into different sharding databases. The database where each sharding table is located is called a sharding node
Node host (dataHost)
After data segmentation, each sharding node does not necessarily occupy a real physical host. There will be multiple sharding nodes on the same physical host. The host of these sharding nodes is called the node host
Fragment rule
According to a certain business logic, divide the data into a certain partition. This rule is called the partition rule
Global sequence
The system ensures the unique identification of data. This mechanism to ensure the unique identification of data is called global serial number.
Configuration of mycat
server.xml
<?xml version="1.0" encoding="UTF-8"?> <!-- - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. --> <!DOCTYPE mycat:server SYSTEM "server.dtd"> <mycat:server xmlns:mycat="http://io.mycat/"> <system> <property name="nonePasswordLogin">0</property> <!-- 0 For login with password, 1 for login without password ,The default is 0. If it is set to 1, you need to specify the default account--> <property name="useHandshakeV10">1</property> <property name="useSqlStat">0</property> <!-- 1 To enable real-time statistics, 0 to disable --> <property name="useGlobleTableCheck">0</property> <!-- 1 To enable all overtime consistency detection and 0 to close --> <property name="sqlExecuteTimeout">300</property> <!-- SQL Execution timeout unit:second--> <property name="sequnceHandlerType">2</property> <!--<property name="sequnceHandlerPattern">(?:(\s*next\s+value\s+for\s*MYCATSEQ_(\w+))(,|\)|\s)*)+</property>--> <!--Must have MYCATSEQ_perhaps mycatseq_Enter the sequence matching process Note MYCATSEQ_With spaces--> <property name="sequnceHandlerPattern">(?:(\s*next\s+value\s+for\s*MYCATSEQ_(\w+))(,|\)|\s)*)+</property> <property name="subqueryRelationshipCheck">false</property> <!-- When there is an associated query in the sub query,Check whether there is a fragment field in the associated field .default false --> <!-- <property name="useCompression">1</property>--> <!--1 For on mysql Compression protocol--> <!-- <property name="fakeMySQLVersion">5.6.20</property>--> <!--Set simulated MySQL Version number--> <!-- <property name="processorBufferChunk">40960</property> --> <!-- <property name="processors">1</property> <property name="processorExecutor">32</property> --> <!--Default to type 0: DirectByteBufferPool | type 1 ByteBufferArena | type 2 NettyBufferPool --> <property name="processorBufferPoolType">0</property> <!--The default is 65535 64 K be used for sql Maximum text length during parsing --> <!--<property name="maxStringLiteralLength">65535</property>--> <!--<property name="sequnceHandlerType">0</property>--> <!--<property name="backSocketNoDelay">1</property>--> <!--<property name="frontSocketNoDelay">1</property>--> <!--<property name="processorExecutor">16</property>--> <!-- <property name="serverPort">8066</property> <property name="managerPort">9066</property> <property name="idleTimeout">300000</property> <property name="bindIp">0.0.0.0</property> <property name="dataNodeIdleCheckPeriod">300000</property> 5 * 60 * 1000L; //Connection idle check <property name="frontWriteQueueSize">4096</property> <property name="processors">32</property> --> <!--Distributed transaction switch: 0 means not filtering distributed transactions, 1 means filtering distributed transactions (if only global tables are involved in distributed transactions, then not filtering), and 2 means not filtering distributed transactions,However, distributed transaction logs are recorded--> <property name="handleDistributedTransactions">0</property> <!-- off heap for merge/order/group/limit 1 On 0 off --> <property name="useOffHeapForMerge">0</property> <!-- The unit is m --> <property name="memoryPageSize">64k</property> <!-- The unit is k --> <property name="spillsFileBufferSize">1k</property> <property name="useStreamOutput">0</property> <!-- The unit is m --> <property name="systemReserveMemorySize">384m</property> <!--Whether to adopt zookeeper Coordinated switching --> <property name="useZKSwitch">false</property> <!-- XA Recovery Log Log path --> <!--<property name="XARecoveryLogBaseDir">./</property>--> <!-- XA Recovery Log Log name --> <!--<property name="XARecoveryLogBaseName">tmlog</property>--> <!--If yes true Strictly observe the isolation level if necessary,Not just select Statement to switch connections in a transaction--> <property name="strictTxIsolation">false</property> <property name="useZKSwitch">true</property> </system> <!-- overall situation SQL Firewall settings --> <!--Wildcards can be used in the white list%Or*--> <!--for example<host host="127.0.0.*" user="root"/>--> <!--for example<host host="127.0.*" user="root"/>--> <!--for example<host host="127.*" user="root"/>--> <!--for example<host host="1*7.*" user="root"/>--> <!--In these configurations, for 127.0.0.1 Can be root Account login--> <!-- <firewall> <whitehost> <host host="1*7.0.0.*" user="root"/> </whitehost> <blacklist check="false"> </blacklist> </firewall> --> <user name="root" defaultAccount="true"> <property name="password">root</property> <property name="schemas">foodie-shop-dev</property> <!-- Table level DML Permission setting --> <!-- <privileges check="false"> <schema name="TESTDB" dml="0110" > <table name="tb01" dml="0000"></table> <table name="tb02" dml="1111"></table> </schema> </privileges> --> </user> <user name="user"> <property name="password">user</property> <property name="schemas">foodie-shop-dev</property> <property name="readOnly">true</property> </user> </mycat:server>
schema.xml
<?xml version="1.0"?> <!DOCTYPE mycat:schema SYSTEM "schema.dtd"> <mycat:schema xmlns:mycat="http://io.mycat/"> <schema name="foodie-shop-dev" checkSQLschema="false" sqlMaxLimit="100" dataNode="dn131"> <!-- auto sharding by id (long) <table name="user" dataNode="dn131,dn132" rule="mod-long" /> <table name="province" dataNode="dn131,dn132" type="global"/> <table name="o_order" autoIncrement="true" primaryKey="id" dataNode="dn131,dn132" rule="mod-long"> <childTable name="order_item" joinKey="order_id" parentKey="id"/> </table> --> <!-- <table name="oc_call" primaryKey="ID" dataNode="dn1$0-743" rule="latest-month-calldate" /> --> <table name="orders" dataNode="dn131,dn132" rule="sharding-by-murmur"> <childTable name="order_items" joinKey="order_id" parentKey="id"/> <childTable name="order_status" joinKey="order_id" parentKey="id" /> </table> </schema> <dataNode name="dn131" dataHost="db131" database="foodie-shop-dev" /> <dataNode name="dn132" dataHost="db132" database="foodie-shop-dev" /> <dataHost name="db131" maxCon="1000" minCon="10" balance="0" writeType="0" dbType="mysql" dbDriver="native" switchType="1" slaveThreshold="100"> <heartbeat>select user()</heartbeat> <writeHost host="M1" url="192.168.73.131:3306" user="imooc" password="Imooc@123456"> </writeHost> </dataHost> <dataHost name="db132" maxCon="1000" minCon="10" balance="0" writeType="0" dbType="mysql" dbDriver="native" switchType="1" slaveThreshold="100"> <heartbeat>select user()</heartbeat> <writeHost host="M1" url="192.168.73.132:3306" user="imooc" password="Imooc@123456"> </writeHost> </dataHost> </mycat:schema>
MySQL master-slave configuration
Master node configuration
log-bin,Specifies the name of the file to configure server-id,The default is 1 Primary create backup account REPLACTION SLAVE grant replaction slave on *.* to 'your username'@'%'; Master lock table FLUSH TABLES WITH READ LOCK Master found bin-log Location of show master status; Primary backup data mysqldump --alldatabases --master-data >dbdump.db
Slave node configuration
to configure server-id,The default is 1 change master to master_host='123.123.123.123', master_user='repl', mastet_password='123', master_logfile='file.000001', master_log_pos=1460; start slave;