[Feature][Connector-JDBC] Support reading Oracle BLOB data as string instead of bytes #9305
…o support large blob fields.
Pull Request Overview
This PR addresses the Oracle BLOB data preservation issue by introducing a new configuration option, HANDLE_BLOB_AS_STRING, to allow conditional conversion of BLOB data to a string rather than always converting to Base64‐encoded bytes. Key changes include:
- Adding new tests to verify both BYTE and STRING conversions for BLOB fields.
- Updating type conversion logic in OracleTypeConverter, OracleTypeMapper, and OracleDialect to conditionally use string conversion based on the HANDLE_BLOB_AS_STRING flag.
- Propagating the new configuration option through JdbcConnectionConfig, JdbcCatalogOptions, and related factory and loader classes.
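The flag-dependent mapping described above can be sketched in isolation. This is a minimal illustration, not the connector's code: `BlobMappingSketch`, `TargetType`, `mapBlob`, and `convert` are hypothetical names, and the only behavior assumed from the PR is that a boolean flag decides whether BLOB content is handed downstream as bytes or as a UTF-8 string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for the flag-dependent BLOB mapping; not SeaTunnel's real API. */
final class BlobMappingSketch {

    /** Target kinds a BLOB column can map to in this sketch. */
    enum TargetType { BYTES, STRING }

    /** Picks the target type for a BLOB column based on the flag. */
    static TargetType mapBlob(boolean handleBlobAsString) {
        return handleBlobAsString ? TargetType.STRING : TargetType.BYTES;
    }

    /** Converts raw BLOB bytes into the value passed downstream. */
    static Object convert(byte[] raw, boolean handleBlobAsString) {
        if (mapBlob(handleBlobAsString) == TargetType.STRING) {
            // Assumes the BLOB holds UTF-8 text, as the PR's getString change does.
            return new String(raw, StandardCharsets.UTF_8);
        }
        // Default path: keep the bytes untouched (defensive copy).
        return raw.clone();
    }

    public static void main(String[] args) {
        byte[] raw = "Hello, World!".getBytes(StandardCharsets.UTF_8);
        System.out.println(mapBlob(false) + " -> " + Arrays.toString((byte[]) convert(raw, false)));
        System.out.println(mapBlob(true) + " -> " + convert(raw, true));
    }
}
```

Keeping the decision in one small function means the type mapper and the value converter cannot disagree about what a BLOB column becomes.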
Reviewed Changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| OracleTypeConverterTest.java | Added tests to verify BLOB conversion with and without string conversion enabled. |
| JdbcFieldTypeUtils.java | Introduced special handling for BLOB in getString to decode using UTF-8. |
| JdbcCatalogUtils.java, JdbcSourceFactory.java, OracleTypeMapper.java, OracleTypeConverter.java, OracleDialectFactory.java, OracleDialect.java | Updated constructors and mappings to pass through the HANDLE_BLOB_AS_STRING flag for proper BLOB handling. |
| JdbcDialectLoader.java, JdbcDialectFactory.java | Added overloads to accept JdbcConnectionConfig for consistent option propagation. |
| JdbcSourceConfig.java, JdbcOptions.java, JdbcConnectionConfig.java | Integrated the new HANDLE_BLOB_AS_STRING configuration at the source level. |
| OracleCatalogFactory.java, OracleCatalog.java, JdbcCatalogOptions.java | Updated catalog creation and options to include HANDLE_BLOB_AS_STRING. |
| OracleTypeUtils.java (connector-cdc) | Adjusted the call to OracleTypeConverter to explicitly disable BLOB to string conversion for CDC. |
    java.sql.Blob blob = (java.sql.Blob) obj;
    try {
        byte[] bytes = blob.getBytes(1, (int) blob.length());
        return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
    } finally {
        blob.free();
Copilot AI, May 12, 2025
The getString method unconditionally converts a java.sql.Blob to a UTF-8 string. Consider integrating a check against the HANDLE_BLOB_AS_STRING configuration so that the conversion happens only when explicitly enabled.
Suggested change:

    - java.sql.Blob blob = (java.sql.Blob) obj;
    - try {
    -     byte[] bytes = blob.getBytes(1, (int) blob.length());
    -     return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
    - } finally {
    -     blob.free();
    + if (HANDLE_BLOB_AS_STRING) {
    +     java.sql.Blob blob = (java.sql.Blob) obj;
    +     try {
    +         byte[] bytes = blob.getBytes(1, (int) blob.length());
    +         return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
    +     } finally {
    +         blob.free();
    +     }
    + } else {
    +     throw new SQLException("BLOB handling as string is disabled by configuration.");
+1
I tried making this change, but it requires modifying many other files and adapting their logic, and it carries a high risk of parameter-passing errors. To keep things simple, I have removed this part.
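For reference, the guard discussed in this thread can be sketched as a standalone helper that takes the flag as a parameter. This is only an illustration under assumptions: `BlobStringReader` and `readBlob` are hypothetical names, the size check is an addition that is not part of the PR, and `SerialBlob` from the JDK stands in for a driver-provided BLOB.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

/** Hypothetical helper; it only illustrates the guard suggested in the review. */
final class BlobStringReader {

    /** Decodes a BLOB as UTF-8 text, but only when the flag is enabled. */
    static String readBlob(Blob blob, boolean handleBlobAsString) throws SQLException {
        if (!handleBlobAsString) {
            // The blob is left untouched so the caller can still read it as bytes.
            throw new SQLException("BLOB handling as string is disabled by configuration.");
        }
        try {
            long length = blob.length();
            // Extra check (an assumption, not in the PR): avoid a silent overflow in the int cast.
            if (length > Integer.MAX_VALUE) {
                throw new SQLException("BLOB of " + length + " bytes does not fit in one byte array.");
            }
            byte[] bytes = blob.getBytes(1, (int) length);
            return new String(bytes, StandardCharsets.UTF_8);
        } finally {
            // Release driver-side resources once the content has been copied.
            blob.free();
        }
    }

    public static void main(String[] args) throws SQLException {
        Blob blob = new SerialBlob("Hello, World!".getBytes(StandardCharsets.UTF_8));
        System.out.println(readBlob(blob, true));
    }
}
```

Passing the flag as an argument avoids a static lookup inside the utility, though it still means every caller has to carry the option, which is the propagation cost described in the reply above.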
    public boolean handleBlobAsString = JdbcOptions.HANDLE_BLOB_AS_STRING.defaultValue();
Copilot AI, May 12, 2025
[nitpick] Consider making the handleBlobAsString field private and providing accessor methods to ensure consistency with other configuration fields, improving encapsulation.
Suggested change:

    - public boolean handleBlobAsString = JdbcOptions.HANDLE_BLOB_AS_STRING.defaultValue();
    + private boolean handleBlobAsString = JdbcOptions.HANDLE_BLOB_AS_STRING.defaultValue();
    + public boolean isHandleBlobAsString() {
    +     return handleBlobAsString;
    + }
    + public void setHandleBlobAsString(boolean handleBlobAsString) {
    +     this.handleBlobAsString = handleBlobAsString;
    + }
+1 for this. @yzeng1618
It has been modified.
Hisoka-X left a comment
Please update the docs and e2e.
…tead of bytes (apache#9305) Co-authored-by: zengyi <[email protected]>
Description
The JDBC connector currently does not preserve the original content of BLOB fields read from Oracle databases. The following example demonstrates the issue:
In the Oracle source table (TEST_BLOB_TABLE), we have BLOB data with different content types:
- Row 1: Simple text "Hello, World!"
- Row 2: XML content
- Row 3: HTML content
However, after synchronization to the Doris target table, all BLOB data is converted to Base64-encoded strings:
- Row 1: "SGVsbG8sIFdvcmxkIQ=="
- Row 2: "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4..."
- Row 3: "PCFET0NUWVBFIGh0bWw+PGh0bWwgc3R5bGU9Im92..."
This transformation makes the data unusable in its original form. Users cannot directly work with the text, XML, or HTML content as they could in the source database. Instead, they would need to perform additional Base64 decoding steps to retrieve the original content.
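The difference between the two representations can be reproduced with the JDK alone. The sketch below takes the Row 1 content from the example and shows the Base64 form, the direct UTF-8 form, and the extra decoding step a consumer would otherwise need. `Base64VersusText` and its method names are illustrative only and are not part of the connector.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: contrasts Base64-encoded BLOB bytes with direct UTF-8 decoding. */
final class Base64VersusText {

    /** How the BLOB content appears in the target when its bytes are Base64-encoded. */
    static String asBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    /** What the requested feature would deliver: the bytes decoded as UTF-8 text. */
    static String asText(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** The post-processing step a consumer needs today to recover the text. */
    static String decodeBase64ToText(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "Hello, World!".getBytes(StandardCharsets.UTF_8);
        System.out.println(asBase64(raw));                      // SGVsbG8sIFdvcmxkIQ==
        System.out.println(asText(raw));                        // Hello, World!
        System.out.println(decodeBase64ToText(asBase64(raw)));  // Hello, World!
    }
}
```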
Usage Scenario
This feature is essential for users who need to accurately transfer Oracle BLOB data to target systems while preserving the original content format. Specific scenarios include:
Without this feature, users must implement additional post-processing steps to decode and reconstruct the original data, significantly complicating their data pipelines.
Related issues
no