@yzeng1618
Contributor

Search before asking

  • I searched the existing feature requests and found no similar requirement.

Description

The JDBC connector currently fails to properly preserve the original content when processing BLOB fields from Oracle databases. This issue is clearly demonstrated in the provided example:

In the Oracle source table (TEST_BLOB_TABLE), we have BLOB data with different content types:

  • Row 1: Simple text "Hello, World!"
  • Row 2: XML content
  • Row 3: HTML content

However, after synchronization to the Doris target table, all BLOB data is converted to Base64-encoded strings:

  • Row 1: "SGVsbG8sIFdvcmxkIQ=="
  • Row 2: "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4..."
  • Row 3: "PCFET0NUWVBFIGh0bWw+PGh0bWwgc3R5bGU9Im92..."

This transformation makes the data unusable in its original form. Users cannot directly work with the text, XML, or HTML content as they could in the source database. Instead, they would need to perform additional Base64 decoding steps to retrieve the original content.
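
The extra decoding step described above can be illustrated with a minimal, standalone sketch. It uses only the JDK's `java.util.Base64` and the Row 1 value from the example; the class and method names (`BlobBase64Demo`, `encode`, `decode`) are hypothetical and are not part of the connector. It also assumes the original BLOB bytes were UTF-8 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BlobBase64Demo {
    // Encode raw bytes as Base64, matching the form seen in the target table.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decode a Base64 string back to text (assumes the original bytes were UTF-8).
    static String decode(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Row 1 from the example above.
        System.out.println(encode("Hello, World!".getBytes(StandardCharsets.UTF_8)));
        System.out.println(decode("SGVsbG8sIFdvcmxkIQ=="));
    }
}
```

Every consumer of the target table would need a step like `decode` before it could read the content, which is the overhead this request aims to remove.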

Usage Scenario

This feature is essential for users who need to accurately transfer Oracle BLOB data to target systems while preserving the original content format. Specific scenarios include:

  1. Data Migration Projects: When migrating databases containing BLOB fields with text, XML, HTML, or other structured content from Oracle to systems like Doris, users need the original content to remain usable.
  2. Document Management Systems: Organizations storing documents (HTML, XML, JSON) in Oracle BLOB fields need to maintain the document structure during data synchronization.
  3. Application Integration: When applications rely on specific data formats stored in BLOB fields, the integrity of these formats must be preserved during data transfer.
  4. Data Analysis: Analysts working with structured data stored in BLOB fields need the original format for proper analysis rather than encoded strings.

Without this feature, users must implement additional post-processing steps to decode and reconstruct the original data, significantly complicating their data pipelines.

Related issues

no

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Contributor

Copilot AI left a comment


Pull Request Overview

This PR addresses the Oracle BLOB data preservation issue by introducing a new configuration option, HANDLE_BLOB_AS_STRING, to allow conditional conversion of BLOB data to a string rather than always converting to Base64-encoded bytes. Key changes include:

  • Adding new tests to verify both BYTE and STRING conversions for BLOB fields.
  • Updating type conversion logic in OracleTypeConverter, OracleTypeMapper, and OracleDialect to conditionally use string conversion based on the HANDLE_BLOB_AS_STRING flag.
  • Propagating the new configuration option through JdbcConnectionConfig, JdbcCatalogOptions, and related factory and loader classes.
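
At the type-mapping level, the conditional behavior summarized above amounts to choosing a target column type from the flag. The following is only a sketch under stated assumptions: `TargetType` and `mapBlob` are hypothetical stand-ins, not the actual OracleTypeConverter or OracleTypeMapper code.

```java
final class BlobTypeMappingSketch {
    // Hypothetical stand-in for the connector's column types.
    enum TargetType { BYTES, STRING }

    // Pick the target type for an Oracle BLOB column based on the flag.
    static TargetType mapBlob(boolean handleBlobAsString) {
        return handleBlobAsString ? TargetType.STRING : TargetType.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(mapBlob(false));
        System.out.println(mapBlob(true));
    }
}
```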

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| OracleTypeConverterTest.java | Added tests to verify BLOB conversion with and without string conversion enabled. |
| JdbcFieldTypeUtils.java | Introduced special handling for BLOB in getString to decode using UTF-8. |
| JdbcCatalogUtils.java, JdbcSourceFactory.java, OracleTypeMapper.java, OracleTypeConverter.java, OracleDialectFactory.java, OracleDialect.java | Updated constructors and mappings to pass through the HANDLE_BLOB_AS_STRING flag for proper BLOB handling. |
| JdbcDialectLoader.java, JdbcDialectFactory.java | Added overloads to accept JdbcConnectionConfig for consistent option propagation. |
| JdbcSourceConfig.java, JdbcOptions.java, JdbcConnectionConfig.java | Integrated the new HANDLE_BLOB_AS_STRING configuration at the source level. |
| OracleCatalogFactory.java, OracleCatalog.java, JdbcCatalogOptions.java | Updated catalog creation and options to include HANDLE_BLOB_AS_STRING. |
| OracleTypeUtils.java (connector-cdc) | Adjusted the call to OracleTypeConverter to explicitly disable BLOB to string conversion for CDC. |

Comment on lines +66 to +71
    java.sql.Blob blob = (java.sql.Blob) obj;
    try {
        byte[] bytes = blob.getBytes(1, (int) blob.length());
        return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
    } finally {
        blob.free();

Copilot AI May 12, 2025


The getString method unconditionally converts a java.sql.Blob to a UTF-8 string. Consider integrating a check against the HANDLE_BLOB_AS_STRING configuration so that the conversion happens only when explicitly enabled.

Suggested change
-   java.sql.Blob blob = (java.sql.Blob) obj;
-   try {
-       byte[] bytes = blob.getBytes(1, (int) blob.length());
-       return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
-   } finally {
-       blob.free();
+   if (HANDLE_BLOB_AS_STRING) {
+       java.sql.Blob blob = (java.sql.Blob) obj;
+       try {
+           byte[] bytes = blob.getBytes(1, (int) blob.length());
+           return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
+       } finally {
+           blob.free();
+       }
+   } else {
+       throw new SQLException("BLOB handling as string is disabled by configuration.");

Member


+1

Contributor Author


I attempted this change, but it would require extensive changes to other files and some logic adaptations, with a high risk of parameter-passing errors. To keep things simple, I have removed this part.
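
For reference, the flag-gated conversion discussed in this thread can be sketched in isolation. This is not the code that was merged: `BlobStringSketch`, `readBlob`, and `fromBytes` are hypothetical names, the flag is passed as a plain parameter rather than read from the connector configuration, and the JDK's `SerialBlob` stands in for a driver-provided BLOB so the example runs without a database.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

final class BlobStringSketch {
    // Hypothetical helper: returns a String when the flag is on, raw bytes otherwise.
    static Object readBlob(Blob blob, boolean handleBlobAsString) throws SQLException {
        try {
            // JDBC BLOB positions are 1-based; the int cast limits this sketch to BLOBs under 2 GiB.
            byte[] bytes = blob.getBytes(1, (int) blob.length());
            return handleBlobAsString ? new String(bytes, StandardCharsets.UTF_8) : bytes;
        } finally {
            // Release the BLOB's resources whether or not the read succeeded.
            blob.free();
        }
    }

    // Convenience wrapper for the demo: builds an in-memory BLOB and rethrows unchecked.
    static Object fromBytes(byte[] raw, boolean handleBlobAsString) {
        try {
            return readBlob(new SerialBlob(raw), handleBlobAsString);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "Hello, World!".getBytes(StandardCharsets.UTF_8);
        System.out.println(fromBytes(raw, true));                   // String mode
        System.out.println(((byte[]) fromBytes(raw, false)).length); // byte[] mode
    }
}
```

Note that decoding as UTF-8 only round-trips cleanly when the BLOB actually holds UTF-8 text; for arbitrary binary content the byte[] path is the safe one.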

Comment on lines 67 to 68
public boolean handleBlobAsString = JdbcOptions.HANDLE_BLOB_AS_STRING.defaultValue();


Copilot AI May 12, 2025


[nitpick] Consider making the handleBlobAsString field private and providing accessor methods to ensure consistency with other configuration fields, improving encapsulation.

Suggested change
-   public boolean handleBlobAsString = JdbcOptions.HANDLE_BLOB_AS_STRING.defaultValue();
+   private boolean handleBlobAsString = JdbcOptions.HANDLE_BLOB_AS_STRING.defaultValue();
+   public boolean isHandleBlobAsString() {
+       return handleBlobAsString;
+   }
+   public void setHandleBlobAsString(boolean handleBlobAsString) {
+       this.handleBlobAsString = handleBlobAsString;
+   }

Member


+1 for this. @yzeng1618

Contributor Author


It has been modified.

Member

@Hisoka-X left a comment


Please update the docs and e2e tests.

@Hisoka-X changed the title from "[Feature][Connector-JDBC] Fix Oracle BLOB data format preservation issue" to "[Feature][Connector-JDBC] Supprot read Oracle BLOB data as string instead of bytes" on May 15, 2025
@hailin0 merged commit 454a88f into apache:dev on May 16, 2025
6 checks passed
dybyte pushed a commit to dybyte/seatunnel that referenced this pull request Jul 23, 2025
3 participants