tencent cloud

Stream Compute Service

Releases Notes and Announcements
Release Notes
Product Introduction
Overview
Strengths
Use Cases
Purchase Guide
Billing Overview
Billing Mode
Refund
Configuration Adjustments
Getting Started
Preparations
Creating a Private Cluster
Creating a SQL Job
Creating a JAR Job
Creating an ETL Job
Creating a Python Job
Operation Guide
Managing Jobs
Developing Jobs
Monitoring Jobs
Job Logs
Events and Diagnosis
Managing Metadata
Managing Checkpoints
Tuning Jobs
Managing Dependencies
Managing Clusters
Managing Permissions
SQL Developer Guide
Overview
Glossary and Data Types
DDL Statements
DML Statements
Merging MySQL CDC Sources
Connectors
SET Statement
Operators and Built-in Functions
Identifiers and Reserved Words
Python Developer Guide
ETL Developer Guide
Overview
Glossary
Connectors
FAQ
Contact Us

Merging MySQL CDC Sources

PDF
포커스 모드
폰트 크기
마지막 업데이트 시간: 2023-11-08 15:58:10
When syncing data, Flink CDC Connector will establish a database connection for each table. This significantly increases the load on the database instance when data is synced among multiple tables or across the database. For this, Stream Compute Service introduced the source merging capability.

Overview

Take the following job for example:
CREATE TABLE `source_1`
(
`f_sequence` INT,
`f_random` INT,
`f_random_str` VARCHAR,
PRIMARY KEY (`f_sequence`) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc' ,
'hostname' = 'ip1',
'port' = '3306',
'username' = 'xxx',
'password' = 'xxx',
'database-name' = 'db1',
'table-name' = 'source_1'
);

CREATE TABLE `source_2`
(
`f_sequence` INT,
`f_random` INT,
`f_random_1` INT,
`f_random_str` VARCHAR,
`f_random_str_1` VARCHAR,
PRIMARY KEY (`f_sequence`) NOT ENFORCED
) WITH (
'connector' = 'mysql-cdc' ,
'hostname' = 'ip1',
'port' = '3306',
'username' = 'xxx',
'password' = 'xxx',
'database-name' = 'db2',
'table-name' = 'source_2'
);


CREATE TABLE `sink_1`
(
`f_sequence` INT,
`f_random` INT,
`f_random_str` VARCHAR,
PRIMARY KEY (`f_sequence`) NOT ENFORCED
) WITH (
'connector' = 'logger'
);

CREATE TABLE `sink_2`
(
`f_sequence` INT,
`f_random` INT,
`f_random_1` INT,
`f_random_str` VARCHAR,
`f_random_str_1` VARCHAR,
PRIMARY KEY (`f_sequence`) NOT ENFORCED
) WITH (
'connector' = 'logger'
);

insert into sink_1 select * from source_1;
insert into sink_2 select * from source_2;
To sync the two tables, which belong to the same database instance, Flink will generate two pipelines. Each CDC source will establish a connection with the MySQL database. When a job has many database connections, the load on the database will be high.

After source merging is enabled, multiple MySQL CDC sources of the same database instance will be merged into one source, reducing the database load. Also, when new data is synced, binlog data only needs to be read once for multiple sources.


How to enable CDC source merging

At the beginning of a SQL job, use the SET command to enable merging of MySQL CDC sources.
SET table.optimizer.mysql-cdc-source.merge.enabled=true;
SET parameters
Option
Default Value
Description
table.optimizer.mysql-cdc-source.merge.enabled
false
Whether to enable merging of MySQL sources. If it is enabled, Stream Compute Service will automatically merge MySQL CDC sources that belong to the same database in a job into one source.
table.optimizer.mysql-cdc-source.merge.default-group.splits
1
The number of sources multiple MySQL CDC sources are merged into (when merging is enabled). If there is a large number of tables, merging them all into one source may not meet performance requirements. In such cases, you can increase the value of this parameter. Stream Compute Service will group the sources as evenly as possible according to the parameter specified.
Assume that source_1, source_2, source_3, and source_4 are MySQL CDC sources from the same database instance (source and sink definitions are omitted). You can use the following SET command to configure the source merging feature.
SET table.optimizer.mysql-cdc-source.merge.enabled=true;
SET table.optimizer.mysql-cdc-source.merge.default-group.splits=2;

insert into sink_1 select * from source_1;
insert into sink_2 select * from source_2;
insert into sink_3 select * from source_3;
insert into sink_4 select * from source_4;
This will automatically divide the four CDC sources into two groups.


도움말 및 지원

문제 해결에 도움이 되었나요?

피드백