@yzeng1618
Contributor

Purpose of this pull request

#9272

This PR adds a case-insensitivity feature to the Doris connector. During data synchronization, especially when migrating from Oracle to Doris, column names often fail to match because Oracle stores table and column names in uppercase by default, while Doris typically uses lowercase identifiers. The new case_sensitive configuration option lets users control whether column names are treated as case-sensitive, which resolves these mismatches in cross-database migrations.

Does this PR introduce any user-facing change?

Yes. This PR introduces a new configuration option, case_sensitive, that controls whether the Doris connector treats column names as case-sensitive. When it is set to false, the connector converts column names to lowercase so that they match regardless of case. This is particularly useful when migrating data to Doris from databases such as Oracle that use uppercase identifiers by default.
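
As a rough illustration of the behavior described above, the sketch below lowercases a set of Oracle-style column names when a case-sensitivity flag is false and leaves them untouched otherwise. The class and method names (ColumnNameNormalizer, normalize) are hypothetical and chosen for this example only; they are not the connector's actual classes.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the Doris connector. */
final class ColumnNameNormalizer {
    private ColumnNameNormalizer() {}

    /**
     * Returns a copy of the field names. Names are kept as-is when
     * caseSensitive is true and lowercased (locale-independently) otherwise.
     */
    static String[] normalize(String[] fieldNames, boolean caseSensitive) {
        String[] result = new String[fieldNames.length];
        for (int i = 0; i < fieldNames.length; i++) {
            result[i] = caseSensitive
                    ? fieldNames[i]
                    : fieldNames[i].toLowerCase(Locale.ROOT);
        }
        return result;
    }

    public static void main(String[] args) {
        // Oracle typically reports unquoted identifiers in uppercase.
        String[] oracleColumns = {"ID", "USER_NAME", "CREATED_AT"};

        // case_sensitive = false: names are lowercased.
        System.out.println(Arrays.toString(normalize(oracleColumns, false)));
        // case_sensitive = true: names pass through unchanged.
        System.out.println(Arrays.toString(normalize(oracleColumns, true)));
    }
}
```

With the flag set to false the example prints [id, user_name, created_at]; with it set to true the uppercase names are printed unchanged.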

How was this patch tested?

  • Integration tests: Ran actual data synchronization between Oracle and Doris environments to confirm that column name case differences do not affect synchronization.
  • Manual testing: Verified the stability and compatibility of the feature under different configuration combinations.
  • Scenario testing: Mainly covered single-table and multi-table jobs with case_sensitive set to false, as well as jobs that leave the option unset; all results met the requirements.

Contributor

Copilot AI left a comment

Pull Request Overview

This pull request introduces a new configuration option to control case sensitivity within the Doris connector, addressing issues in column name matching when migrating from systems like Oracle. Key changes include adding extensive tests for the case sensitivity feature, updating table/column name processing in both sink and converter components, and integrating the new configuration option via updated options and configuration classes.

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| DorisTypeConvertorV2Test.java, DorisTypeConvertorV1Test.java | Added tests verifying default and override behavior for case sensitivity. |
| DorisStreamLoad.java | Modified to optionally lowercase the table name based on the case_sensitive flag. |
| DorisSinkWriter.java, SeaTunnelRowSerializerFactory.java, SeaTunnelRowSerializer.java | Updated to use a new serializer factory that accepts a case sensitivity flag and processes field names accordingly. |
| AbstractDorisTypeConverter.java | Updated builder method to apply case sensitivity during column name construction. |
| DorisTableConfig.java, DorisSinkOptions.java, DorisSinkConfig.java | Introduced and integrated the new case_sensitive configuration option. |

Comments suppressed due to low confidence (2)

seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/sink/writer/DorisStreamLoad.java:97

  • Consider using toLowerCase(Locale.ROOT) instead of plain toLowerCase() to guarantee locale-insensitive behavior during case conversion.
this.table = dorisSinkConfig.isCaseSensitive() ? tablePath.getTableName() : tablePath.getTableName().toLowerCase();

seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/datatype/AbstractDorisTypeConverter.java:98

  • Consider specifying Locale.ROOT in the toLowerCase() call to ensure consistent, locale-insensitive string conversion.
String columnName = caseSensitive ? typeDefine.getName() : typeDefine.getName().toLowerCase();

String[] fieldNames = seaTunnelRowType.getFieldNames();
String[] processedFieldNames = new String[fieldNames.length];
for (int i = 0; i < fieldNames.length; i++) {
processedFieldNames[i] = caseSensitive ? fieldNames[i] : fieldNames[i].toLowerCase();

Copilot AI May 12, 2025

Consider using toLowerCase(Locale.ROOT) here as well to avoid potential locale-related inconsistencies.

Suggested change
- processedFieldNames[i] = caseSensitive ? fieldNames[i] : fieldNames[i].toLowerCase();
+ processedFieldNames[i] = caseSensitive ? fieldNames[i] : fieldNames[i].toLowerCase(Locale.ROOT);
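
To show why the locale matters here, the following sketch lowercases a sample column name under the root locale and under a Turkish locale. The class and method names are made up for the example. In Turkish, an uppercase "I" lowercases to a dotless "ı" (U+0131), so a name such as USER_ID would no longer match user_id if the JVM happened to run with that default locale.

```java
import java.util.Locale;

/** Hypothetical demo class for illustration; not part of the Doris connector. */
final class LocaleLowercaseDemo {
    private LocaleLowercaseDemo() {}

    /** Locale-independent lowercasing, as suggested in the review. */
    static String lowerRoot(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    /** Lowercasing under an explicit locale, to mimic a JVM default locale. */
    static String lowerIn(String name, Locale locale) {
        return name.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String column = "USER_ID";

        // Root locale: 'I' becomes the ASCII 'i'.
        System.out.println(lowerRoot(column));
        // Turkish locale: 'I' becomes the dotless 'ı' (U+0131).
        System.out.println(lowerIn(column, turkish));
        // The two results differ, so a lookup by "user_id" would miss.
        System.out.println(lowerRoot(column).equals(lowerIn(column, turkish)));
    }
}
```

Plain toLowerCase() uses the JVM's default locale, so passing Locale.ROOT keeps the result the same regardless of where the job runs.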

@Hisoka-X linked an issue on May 12, 2025 that may be closed by this pull request
Member

@Hisoka-X left a comment

Thanks @yzeng1618! Please update the docs.

Comment on lines 372 to 388
# If you want to convert all table and column names to lowercase
sink {
  Doris {
    fenodes = "e2e_dorisdb:8030"
    username = root
    password = ""
    database = "Test_DB"     # Will be converted to "test_db"
    table = "Test_Table"     # Will be converted to "test_table"
    case_sensitive = false   # Convert all names to lowercase
    sink.enable-2pc = "true"
    sink.label-prefix = "test_case_insensitive"
    doris.config = {
      format = "json"
      read_json_by_line = "true"
    }
  }
}
Member

Suggested change: delete the example block quoted above.

Keeping one demo is enough.

Contributor Author

It has been modified.

}
}
# If you want to convert all table and column names to lowercase
Member

ditto.

Contributor Author

It has been modified.

@Hisoka-X changed the title from "[Feature][connector-doris] adds case insensitivity feature to the Doris connector" to "[Feature][Connector-doris] Adds case insensitivity feature" on May 16, 2025
@hailin0 merged commit 9d1cffa into apache:dev on May 17, 2025
5 checks passed
dybyte pushed a commit to dybyte/seatunnel that referenced this pull request Jul 23, 2025
Development

Successfully merging this pull request may close these issues.

[Feature][connector-doris] Case Insensitive During Synchronization

3 participants