【Demo】Syncing Data From MySQL to Doris Using SeaTunnel

Published in

Dev Genius

4 min readMay 7, 2024

Editor | Debra Chen

As data technology rapidly advances, understanding and mastering various tools and techniques becomes increasingly important. To this end, we are preparing to launch a Demo demonstration plan on the Apache SeaTunnel community on how to use connectors, inviting you all who are passionate about data synchronization technology to share your knowledge and practical experience!

In this issue, we have invited Mr. Gao Jun, a PMC member of the community, to record a demo tutorial in the video which themed How to sync data from MySQL to Doris. If you are interested in this plan, feel free to email me(xiyan@whaleops.com) to participate in the Demo recording! Whether you are a data engineer, developer, or technology enthusiast, you are welcome to showcase your technical talents.

Highlight~Highlight~ If you are a user and want to see synchronization scenarios Demo, please scroll down to the bottom and leave a comment, and we will prioritize producing the Demo with the highest demand for synchronization scenarios!

Demo Goals

Our goal is to create a platform for sharing and learning, helping community members better understand and apply various data connectors through specific Demo demonstrations and corresponding documentation. These demos can help beginners learn quickly and also provide a stage for senior experts to showcase innovative solutions.

MySQL to Doris

JDBC MySQL Connector

Description

Read data from external data sources via JDBC.

Supported MySQL Versions

5.5/5.6/5.7/8.0

Supported Engines

Spark
Flink
SeaTunnel Zeta

Dependencies

Spark/Flink engine

You need to ensure that the jdbc driver jar is placed in the ${SEATUNNEL_HOME}/plugins/ directory.

SeaTunnel Zeta engine

You need to ensure that the jdbc driver jar is placed in the ${SEATUNNEL_HOME}/lib/ directory.

Key Features of MySQL Connector

Batch processing
Stream processing
Exactly once
Column projection
Parallel processing
Support for user-defined sharding
Support for reading from multiple tables

Supports query SQL and can achieve projection effect.

Supported Data Source Information

Database Dependencies

Please download the support list corresponding to ‘Maven’ and copy it to ‘SEATNUNNEL_HOME/plugins/jdbc/lib/’.

Data Type Mapping

Source Options

Tips

If partition_column is not set, it will run in single concurrency, and if partition_column is set, it will be executed in parallel according to the concurrency of tasks.

Task Example

Simple:

This example queries type_bin ‘table’ 16 data in your test “database” in single parallel and queries all of its fields. You can also specify which fields to query for final output to the console.

# Defining the runtime environment
env {
  # You can set flink configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
}
source{
    Jdbc {
        url = "jdbc:mysql://localhost:3306/test?serverTimezone=GMT%2b8"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "root"
        password = "123456"
        query = "select * from type_bin limit 16"
    }
}

transform {
    # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
    # please go to https://seatunnel.apache.org/docs/transform-v2/sql
}sink {
    Console {}
}

Parallel:

Read your query table in parallel with the shard field you configured and the shard data You can do this if you want to read the whole table

source {
    Jdbc {
        url = "jdbc:mysql://localhost:3306/test?serverTimezone=GMT%2b8"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "root"
        password = "123456"
        # Define query logic as required
        query = "select * from type_bin"
        # Parallel sharding reads fields
        partition_column = "id"
        # Number of fragments
        partition_num = 10
    }
}

Parallel Boundary:

It is more efficient to specify the data within the upper and lower bounds of the query It is more efficient to read your data source according to the upper and lower boundaries you configured

source {
    Jdbc {
        url = "jdbc:mysql://localhost:3306/test?serverTimezone=GMT%2b8"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "root"
        password = "123456"
        # Define query logic as required
        query = "select * from type_bin"
        partition_column = "id"
        # Read start boundary
        partition_lower_bound = 1
        # Read end boundary
        partition_upper_bound = 500
        partition_num = 10
    }
}

Above is the code and documentation explanation shown in the video. You can combine it with the explanations in the video for hands-on practice. If you are interested in recording, please continue reading below.

How to submit your Demo recording

Participating in the Demo recording is simple!

First, choose a connector you are familiar with and prepare a 5 to 10-minute video demonstration. Make sure your video clearly shows how to configure and use the connector, what problems it solves, and any possible best practices.

Once you have your demo video ready, you can email me via debra.chen12@gmail.com. Remember to include relevant descriptive information, such as the connector you’re demonstrating and an overview of the video content.

That’s how you can participate in the Demo recording, take action now!