Full Link Online Stress Testing for Production Database: Apache ShardingSphere Shadow Database Feature Upgrade

Published in

Dev Genius

6 min readJan 9, 2022

What is full link stress testing?

As the Internet industry is growing rapidly, businesses that operate with large amounts of data are seeing a rapid expansion.

Predictably, ever changing customer demand is having a significant impact on the stability of their whole systems. For example, online food delivery platforms receive most of their customer orders at noon and in the evening. Online shopping sprees and time-limited sales promotions are also good examples.

All businesses are served by a series of business systems which are distributedly deployed in different machines. “Data Plannig” cannot only ensure systems’ stability but also save costs, which are some of the major problems for technology teams.

To precisely get the right service from a specific machine, stress testing should be conducted in the production environment. This can ensure the authenticity of the environment and the data, significantly improving the precision of “data planning”.

Shadow Database and Full-Link Stress Testing

Performing a stress test on an online business system is evidently risky. For example, data corruption or performance problems may arise.

Just imagine how it would compromise customer experience, if customers found their order has been lost or an unpaid order popped up?

Full-link online stress testing implies a significant amount of complicated work which requires cooperation between microservices and middlewares. Apache ShardingSphere focuses on database-level solutions in full-link stress testing.

With ShardingSphere’s powerful SQL parsing capability, Apache ShardingSphere released the shadow database stress testing feature to determine shadow databases by executing SQL, and to meet the online stress testing needs in complicated business scenarios with the flexible configuration of shadow algorithms. By routing stress testing traffic to shadow databases, and normal online traffic to production database, stress testing data will be isolated and the data corruption problem is solved.

Shadow Database Function Upgrading

Shadow database function was initially realized in version 4.1.0 by adding a logic shadow column. By parsing, executing, routing and rewriting SQL, Apache ShardingSphere deleted the shadow column and column value. Users do not need to setup or do anything during this process. They only need to modify SQL accordingly, add shadow fields and corresponding configurations.

Two pain points occur when adding shadow column:

Before performing stress testings, users need to modify test-related SQL according to actual business needs.
SQL modification will increase implementation damage and reduce the accuracy of stress testing results. After discussion in the ShardingSphere community, we decided to upgrade shadow database functions. Apache ShardingSphere 4.1.1 GA shadow database API had relatively simple functionaloity. Whether to open shadow database is determined by corresponding values of the logicColumn.

rules:
- !SHADOW
 column: # shadow field nam
 shadowMappings:
 ds: shadow_ds # production data source name list: shadow database name list

The upgraded shadow database API of 5.0.0 GA is more powerful. Users can determine whether the shadow database function is enabled through the “enable” attribute. Configurable shadow table can determine what needs to be stress tested by means of a table, and supports multiple shadow algorithms. For example, column value matching algorithm, regular expression matching and SQL comment matching algorithm.

rules:
- !SHADOW
  enable: true # shadow databasepn-off switch. Optional value: true/false, default false
  dataSources:
    shadowDataSource:
      sourceDataSourceName: ds # productionDataSourceName
      shadowDataSourceName: shadow_ds # shadowDataSourceName
  tables:
    <shadow-table-name>:
      dataSourceNames: shadowDataSource # #NameListOfShadowDataSourcesRelatedtoShadowDatabases (separate multiple values with ",")
      shadowAlgorithmNames: 
        - <shadow-algorithm-name> # #NameListOfShadowAlgorithmsRelatedtoShadowDatabases
  shadowAlgorithms:
    <shadow-algorithm-name>:
      type: # shadowAlgorithmType
      props: 
        xxx: xxx # shadowAlgorithmAttributeConfiguration

Shadow Database in Practice

Online full-link stress testing diagram:

Prepare stress testing environment:

Suppose an e-commerce website needs to perform online stress testing for an order (demonstrating how to use stand-alone deployment). Suppose stress testing table t_order is an order table, and the ID of the test user is 0.

The data generated by test user order is executed on ds_shadow shadow database, and the production data is executed on ds production database.

Prepare testing environment:

Download ShardingSphere-Proxy 5.0.0 GA from the Download Page, and for installation configuration details, please refer to ShardingSphere-Proxy-Quick-Start.
Configure ShardingSphere-Proxy in the hypothetical stress testing scenario mentioned above:

server.yaml

rules:
  - !AUTHORITY
    users:
      - root@%:root
      - sharding@:sharding
    provider:
      type: NATIVE
  - !TRANSACTION
    defaultType: XA
    providerType: Atomikosprops:
  max-connections-size-per-query: 1
  executor-size: 16  # Infinite by default.
  proxy-frontend-flush-threshold: 128  # The default value is 128.
  proxy-opentracing-enabled: false
  proxy-hint-enabled: false
  sql-show: true
  check-table-metadata-enabled: false
  lock-wait-timeout-milliseconds: 50000 # The maximum time to wait for a lock
  show-process-list-enabled: false
    # Proxy backend query fetch size. A larger value may increase the memory usage of ShardingSphere Proxy.
    # The default value is -1, which means set the minimum value for different JDBC drivers.
  proxy-backend-query-fetch-size: -1
  check-duplicate-table-enabled: false
  sql-comment-parse-enabled: true
  proxy-frontend-executor-size: 0 # Proxy frontend executor size. The default value is 0, which means let Netty decide.
    # Available options of proxy backend executor suitable: OLAP(default), OLTP. The OLTP option may reduce time cost of writing packets to client, but it may increase the latency of SQL execution
    # if client connections are more than proxy-frontend-netty-executor-size, especially executing slow SQL.
  proxy-backend-executor-suitable: OLAP

config-shadow.yaml

schemaName: shadow_poc_databasedataSources:
  ds:
    url: jdbc:mysql://127.0.0.1:3306/ds?serverTimezone=UTC&useSSL=false
    username: root
    password:
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50
    minPoolSize: 1
  ds_shadow:
    url: jdbc:mysql://127.0.0.1:3306/ds_shadow?serverTimezone=UTC&useSSL=false
    username: root
    password:
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50
    minPoolSize: 1rules:
- !SHADOW
  enable: true
  dataSources:
    shadowDataSource:
      sourceDataSourceName: ds
      shadowDataSourceName: ds_shadow  tables:
    t_order:
      dataSourceNames:
        - shadowDataSource
      shadowAlgorithmNames:
        - user-id-insert-match-algorithm
        - simple-note-algorithm  shadowAlgorithms:
    user-id-insert-match-algorithm:
      type: COLUMN_REGEX_MATCH
      props:
        operation: insert
        column: user_id
        regex: "[0]"
    simple-note-algorithm:
      type: SIMPLE_NOTE
      props:
        foo: bar

3. Order service

Order-related businesses are not discussed here. Considering the simplest request taking and order table inserting as an example, the order table structure is as follows:

Order Table Structure

CREATE TABLE `t_order` (
    `id` INT(11) AUTO_INCREMENT COMMENT 'Order ID',
    `user_id` VARCHAR(32) NOT NULL COMMENT 'User ID',
    `sku` VARCHAR(32) NOT NULL COMMENT 'Product ordered sku',
    PRIMARY KEY (`id`)
)ENGINE = InnoDB COMMENT = 'order table';

Stress Testing Process Simulation

Use postman to simulate the order created by test user, which is as follows:

SQL executor being routed to and executed in shadow database can be seen in ShardingSphere-Proxy execution log:

Verify Stress Testing Results :

Shadow database ds_shadow executes query sentenceSELECT * FROM t_order;

Query result:

Production database ds executes query sentence SELECT * FROM t_order; Query result:

The data generated from test user order creation will be routed to shadow database. For more complicated configurations, please refer to Shadow Database Stress Testing in the ShardingSphere official document.

Complete Solution for Full-Link Online Stress Testing — CyborgFlow

As mentioned in the introduction, full-link online stress testing is a complicated task that requires collaboration between microservices and middlewares to meet the needs of different traffic and stress testing tag transmissions.

Additonally, the testing service should be stateless and immediately available. CyborgFlow, which is jointly maintained by Apache ShardingSphere, Apache APISIX and Apache SkyWalking provides out-of-the-box (OoTB) solution to run load test in your online system.

Apache APISIX is responsible for making tags on testing data at the gateway layer, while Apache SkyWalking is responsible for transmission through the whole scheduling link, and finally, Apache ShardingSphere-Proxy will isolate data and route testing data to the shadow database.

The 0.1.0 version of CyborgFlow has been released and is available for download.

Apache ShardingSphere Open Source Project Links:

ShardingSphere Github

ShardingSphere Twitter

ShardingSphere Slack Channel

Contributor Guide

Author

HouYang

SphereEx Middleware Developer, Apache ShardingSphere Contributor.
Currently he focuses on the design and development of ShadowDB and full-link stress testing.