[Fix][Transform-V2] Reduce embedding precision from double to float #9635

xiaochen-zhou · 2025-07-28T16:35:14Z

Purpose of this pull request

Reduce embedding precision from double to float. Closes #9611.

Does this PR introduce any user-facing change?

no

How was this patch tested?

Existing tests

Check list

If your PR adds any new Jar binary package, please add a License Notice according to the New License Guide.
If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
If you are contributing the connector code, please check that the following files are updated:
1. Update plugin-mapping.properties and add new connector information in it
2. Update the pom file of seatunnel-dist
3. Add ci label in label-scope-conf
4. Add e2e testcase in seatunnel-e2e
5. Update connector plugin_config

xiaochen-zhou · 2025-07-28T16:41:54Z

I think we can start by reducing the embedding precision from double to float. The precision loss isn't specific to Zhipu; it affects almost all models that return the embedding as double, such as Qianfan and the OpenAI models.

So, as a quick fix, we can switch to float for now and add a note in the docs to let users know. @Hisoka-X
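To illustrate what switching from double to float means for an embedding vector, here is a minimal Java sketch. It is not the code from this PR: the class `EmbeddingPrecisionSketch` and the helpers `toFloatVector` and `maxAbsError` are hypothetical names, and the sample values are made up. It only shows that narrowing each component to float keeps roughly 7 significant decimal digits, which is the trade-off the docs note should mention.

```java
import java.util.Arrays;
import java.util.List;

public class EmbeddingPrecisionSketch {

    // Hypothetical helper (not a SeaTunnel API): narrow each double component to float.
    static float[] toFloatVector(List<Double> embedding) {
        float[] out = new float[embedding.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = embedding.get(i).floatValue();
        }
        return out;
    }

    // Largest absolute difference between the original doubles and the narrowed floats.
    static double maxAbsError(List<Double> embedding, float[] narrowed) {
        double max = 0.0;
        for (int i = 0; i < narrowed.length; i++) {
            max = Math.max(max, Math.abs(embedding.get(i) - (double) narrowed[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        // Made-up values standing in for a model response that returns doubles.
        List<Double> embedding =
                Arrays.asList(0.0123456789012345, -0.9876543210987654, 0.5);
        float[] narrowed = toFloatVector(embedding);
        System.out.println(Arrays.toString(narrowed));
        System.out.println("max abs error: " + maxAbsError(embedding, narrowed));
    }
}
```

For components with magnitude below 1, the rounding error per component stays under about 6e-8, which is usually negligible for similarity search but is still a user-visible change in the stored values.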

Hisoka-X · 2025-07-29T02:29:30Z

> So, as a quick fix, we can switch to float for now and add a note in the docs to let users know. @Hisoka-X

+1. Next step, we should support double vector type.

Hisoka-X · 2025-07-29T02:30:59Z

Thanks @xiaochen-zhou . Could you add a test case to cover it?

xiaochen-zhou · 2025-07-29T03:11:51Z

> Thanks @xiaochen-zhou . Could you add a test case to cover it?

OK.

loupipalien · 2025-07-30T04:29:14Z

> > So, as a quick fix, we can switch to float for now and add a note in the docs to let users know. @Hisoka-X
>
> +1. Next step, we should support double vector type.

@Hisoka-X @xiaochen-zhou Another question: is there a plan to support multimodal embeddings? https://www.volcengine.com/docs/82379/1523520

xiaochen-zhou · 2025-07-30T05:45:50Z

> > > So, as a quick fix, we can switch to float for now and add a note in the docs to let users know. @Hisoka-X
> >
> > +1. Next step, we should support double vector type.
>
> @Hisoka-X @xiaochen-zhou Another question: is there a plan to support multimodal embeddings? https://www.volcengine.com/docs/82379/1523520

I think this suggestion is great, and I would be happy to try implementing it. @Hisoka-X

Hisoka-X · 2025-07-30T07:57:20Z

> > > So, as a quick fix, we can switch to float for now and add a note in the docs to let users know. @Hisoka-X
> >
> > +1. Next step, we should support double vector type.
>
> @Hisoka-X @xiaochen-zhou Another question: is there a plan to support multimodal embeddings? https://www.volcengine.com/docs/82379/1523520

+1

Hisoka-X

Thanks @xiaochen-zhou

[Fix][Transform-V2] Reduce embedding precision from double to float

591adcb

github-actions bot added document Transform-v2 labels Jul 28, 2025

[Fix][Transform-V2] Reduce embedding precision from double to float

bc5d1f3

add EmbeddingVectorTest#testVectorPrecision()

be93914

Hisoka-X approved these changes Jul 30, 2025


github-actions bot added approved reviewed labels Jul 30, 2025

corgy-w approved these changes Jul 31, 2025


corgy-w merged commit c1d2172 into apache:dev Jul 31, 2025
5 checks passed

xiaochen-zhou deleted the embedding-float branch August 3, 2025 08:14
