[Feature][Connector-doris] Adds case insensitivity feature #9306
Pull Request Overview
This pull request introduces a new configuration option to control case sensitivity within the Doris connector, addressing issues in column name matching when migrating from systems like Oracle. Key changes include adding extensive tests for the case sensitivity feature, updating table/column name processing in both sink and converter components, and integrating the new configuration option via updated options and configuration classes.
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| DorisTypeConvertorV2Test.java, DorisTypeConvertorV1Test.java | Added tests verifying default and override behavior for case sensitivity. |
| DorisStreamLoad.java | Modified to optionally lowercase the table name based on the case_sensitive flag. |
| DorisSinkWriter.java, SeaTunnelRowSerializerFactory.java, SeaTunnelRowSerializer.java | Updated to use a new serializer factory that accepts a case sensitivity flag and processes field names accordingly. |
| AbstractDorisTypeConverter.java | Updated builder method to apply case sensitivity during column name construction. |
| DorisTableConfig.java, DorisSinkOptions.java, DorisSinkConfig.java | Introduced and integrated the new case_sensitive configuration option. |
Comments suppressed due to low confidence (2)
seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/sink/writer/DorisStreamLoad.java:97
- Consider using toLowerCase(Locale.ROOT) instead of plain toLowerCase() to guarantee locale-insensitive behavior during case conversion.
this.table = dorisSinkConfig.isCaseSensitive() ? tablePath.getTableName() : tablePath.getTableName().toLowerCase();
seatunnel-connectors-v2/connector-doris/src/main/java/org/apache/seatunnel/connectors/doris/datatype/AbstractDorisTypeConverter.java:98
- Consider specifying Locale.ROOT in the toLowerCase() call to ensure consistent, locale-insensitive string conversion.
String columnName = caseSensitive ? typeDefine.getName() : typeDefine.getName().toLowerCase();
String[] fieldNames = seaTunnelRowType.getFieldNames();
String[] processedFieldNames = new String[fieldNames.length];
for (int i = 0; i < fieldNames.length; i++) {
    processedFieldNames[i] = caseSensitive ? fieldNames[i] : fieldNames[i].toLowerCase();
Copilot
AI
May 12, 2025
Consider using toLowerCase(Locale.ROOT) here as well to avoid potential locale-related inconsistencies.
Suggested change:
- processedFieldNames[i] = caseSensitive ? fieldNames[i] : fieldNames[i].toLowerCase();
+ processedFieldNames[i] = caseSensitive ? fieldNames[i] : fieldNames[i].toLowerCase(Locale.ROOT);
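The reason for passing an explicit locale can be shown with a small standalone sketch. It is not part of the connector; the class name `LocaleLowercaseDemo` and the sample column name are made up for illustration. `String.toLowerCase()` without arguments uses the JVM's default locale, and under a Turkish locale an uppercase `I` lowercases to the dotless `ı`, so an identifier such as `USER_ID` would no longer match `user_id`.

```java
import java.util.Locale;

public class LocaleLowercaseDemo {
    // Lowercases an identifier using the rules of the given locale.
    public static String lower(String name, Locale locale) {
        return name.toLowerCase(locale);
    }

    public static void main(String[] args) {
        String column = "USER_ID"; // illustrative Oracle-style column name
        Locale turkish = Locale.forLanguageTag("tr-TR");

        String rootResult = lower(column, Locale.ROOT);
        String turkishResult = lower(column, turkish);

        System.out.println(rootResult);    // user_id
        System.out.println(turkishResult); // user_ıd (dotless i, U+0131)
        System.out.println(rootResult.equals(turkishResult)); // false
    }
}
```

With `Locale.ROOT` the result is the same on every machine, regardless of how the JVM's default locale is configured.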
Hisoka-X
left a comment
Thanks @yzeng1618 ! Please update the docs.
…sitive parameter.
docs/en/connector-v2/sink/Doris.md
Outdated
# If you want to convert all table and column names to lowercase
sink {
  Doris {
    fenodes = "e2e_dorisdb:8030"
    username = root
    password = ""
    database = "Test_DB"    # Will be converted to "test_db"
    table = "Test_Table"    # Will be converted to "test_table"
    case_sensitive = false  # Convert all names to lowercase
    sink.enable-2pc = "true"
    sink.label-prefix = "test_case_insensitive"
    doris.config = {
      format = "json"
      read_json_by_line = "true"
    }
  }
}
Keeping one demo is enough.
It has been modified.
docs/zh/connector-v2/sink/Doris.md
Outdated
  }
}
# If you want to convert all table and column names to lowercase
ditto.
It has been modified.
Purpose of this pull request
#9272
This PR adds a case-insensitivity feature to the Doris connector. During data synchronization, especially when migrating from Oracle to Doris, column names often fail to match because Oracle stores table and field names in uppercase by default, while Doris typically uses lowercase identifiers. The new case_sensitive configuration option lets users control whether column names are treated as case-sensitive, which resolves case mismatches when migrating across database systems.
Does this PR introduce any user-facing change?
Yes. This PR introduces a new configuration option, case_sensitive, that controls whether the Doris connector is case-sensitive when processing column names. When it is set to false, the connector converts column names to lowercase, so that column names match case-insensitively. This is particularly useful when migrating data to Doris from databases such as Oracle that use uppercase identifiers by default.
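The behavior described above can be sketched roughly as follows. This is a simplified illustration and not the connector's actual code: the class `FieldNameNormalizer` and its `normalize` method are hypothetical names, and the sample field names are invented. It mirrors the loop quoted in the review, with `Locale.ROOT` applied as the reviewer suggested.

```java
import java.util.Arrays;
import java.util.Locale;

public class FieldNameNormalizer {
    // Hypothetical helper: returns a copy of the field names, unchanged when
    // caseSensitive is true and lowercased (locale-independently) otherwise.
    public static String[] normalize(String[] fieldNames, boolean caseSensitive) {
        String[] processed = new String[fieldNames.length];
        for (int i = 0; i < fieldNames.length; i++) {
            processed[i] = caseSensitive
                    ? fieldNames[i]
                    : fieldNames[i].toLowerCase(Locale.ROOT);
        }
        return processed;
    }

    public static void main(String[] args) {
        // Invented Oracle-style source column names.
        String[] source = {"ID", "USER_NAME", "Created_At"};

        // case_sensitive = false: names are lowercased before being sent to Doris.
        System.out.println(Arrays.toString(normalize(source, false)));
        // [id, user_name, created_at]

        // case_sensitive = true: names pass through unchanged.
        System.out.println(Arrays.toString(normalize(source, true)));
        // [ID, USER_NAME, Created_At]
    }
}
```

Returning a new array instead of modifying the input keeps the original row type's field names intact for any other component that still needs them.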
How was this patch tested?
Check list
New License Guide
release-note.