Merged
Changes from 15 commits
43 commits
02a62e9
init bedrock model files
SEZ9 Mar 30, 2025
2d06f2e
Merge branch 'apache:dev' into dev
SEZ9 Mar 30, 2025
b0350d3
init parameters
SEZ9 Mar 30, 2025
c75cad1
Merge branch 'dev' of https://github.com/SEZ9/seatunnel into dev
SEZ9 Mar 30, 2025
11f476b
test complete
SEZ9 Apr 5, 2025
4ee4dd5
fix type
SEZ9 Apr 5, 2025
af08bbe
fix type
SEZ9 Apr 6, 2025
590cf92
fix typo
SEZ9 Apr 6, 2025
88e616c
Merge branch 'apache:dev' into dev
SEZ9 Apr 7, 2025
6b4b3ff
change link
SEZ9 Apr 7, 2025
e531857
Merge branch 'dev' of https://github.com/SEZ9/seatunnel into dev
SEZ9 Apr 7, 2025
5f43a40
trigger build
SEZ9 Apr 7, 2025
72246c5
update doc
SEZ9 Apr 7, 2025
6e1a424
updated EmbeddingTransformFactory
SEZ9 Apr 8, 2025
8d7b4e3
add e2e transform amazon model
SEZ9 Apr 8, 2025
384db01
Merge branch 'apache:dev' into dev
SEZ9 Apr 9, 2025
e973325
trigger build
SEZ9 Apr 9, 2025
5261ac2
Merge branch 'apache:dev' into dev
SEZ9 Apr 11, 2025
dcce04e
Merge branch 'apache:dev' into dev
SEZ9 Apr 20, 2025
61c6bce
add dependencies check list
SEZ9 Apr 20, 2025
c4f6717
Update known-dependencies.txt
SEZ9 Apr 20, 2025
a07024b
Update known-dependencies.txt
SEZ9 Apr 20, 2025
7287e45
Update known-dependencies.txt
SEZ9 Apr 20, 2025
53f622e
modified aws region option key name
SEZ9 Apr 21, 2025
0b4bdd7
Merge branch 'apache:dev' into dev
SEZ9 Apr 25, 2025
9e004b5
Update embedding_transform.conf
SEZ9 Apr 25, 2025
059d1b7
Update embedding_transform.conf
SEZ9 Apr 25, 2025
23fcc1f
Update embedding.md
SEZ9 Apr 26, 2025
aaf88da
add support amazon endpoint , modified e2e mock test
SEZ9 Apr 26, 2025
d76c740
update bedrock e2e URI
SEZ9 Apr 26, 2025
d847009
change e2e only bedrock cohere
SEZ9 Apr 26, 2025
efbaa87
Update embedding_transform.conf
SEZ9 Apr 27, 2025
0ab05c0
Update mock-embedding.json
SEZ9 Apr 27, 2025
c36114e
Update mock-embedding.json
SEZ9 Apr 27, 2025
ee121ac
Update BedrockModel.java
SEZ9 Apr 27, 2025
0298311
Update TestEmbeddingIT.java
SEZ9 Apr 27, 2025
33aca70
remove useIdleConnectionReaper
SEZ9 Apr 28, 2025
1e53c59
Merge branch 'apache:dev' into dev
SEZ9 May 7, 2025
7894179
Merge branch 'apache:dev' into dev
SEZ9 May 17, 2025
b9f8313
Merge branch 'apache:dev' into dev
SEZ9 May 27, 2025
6cd605d
Merge branch 'apache:dev' into dev
SEZ9 May 28, 2025
05cb544
Merge branch 'apache:dev' into dev
SEZ9 May 29, 2025
fd6ae5e
Update Elasticsearch.md
SEZ9 May 29, 2025
33 changes: 17 additions & 16 deletions docs/en/transform-v2/embedding.md
@@ -10,25 +10,26 @@ different API endpoints.

## Options

| Name | Type | Required | Default Value | Description |
|----------------------------------|--------|----------|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| model_provider | enum | yes | - | The model provider for embedding. Options may include `QIANFAN`, `OPENAI`, etc. |
| api_key | string | yes | - | The API key required to authenticate with the embedding service. |
| secret_key | string | yes | - | The secret key required for additional authentication with the embedding service. |
| single_vectorized_input_number | int | no | 1 | The number of inputs vectorized in one request. Default is 1. |
| vectorization_fields | map | yes | - | A mapping between input fields and their corresponding output vector fields. |
| model | string | yes | - | The specific model to use for embedding (e.g: `text-embedding-3-small` for OPENAI). |
| api_path | string | no | - | The API endpoint for the embedding service. Typically provided by the model provider. |
| dimension | int | no | - | TThe vector dimension defaults to 2048. The Embedding-3 model supports custom vector dimensions, and it is recommended to choose dimensions of 256, 512, 1024, or 2048. |
| oauth_path | string | no | - | The API endpoint for the oauth service. |
| custom_config | map | no | | Custom configurations for the model. |
| custom_response_parse | string | no | | Specifies how to parse the response from the model using JsonPath. Example: `$.choices[*].message.content`. |
| custom_request_headers | map | no | | Custom headers for the request to the model. |
| custom_request_body | map | no | | Custom body for the request. Supports placeholders like `${model}`, `${input}`. |
| Name | Type | Required | Default Value | Description |
|--------------------------------|--------|----------|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| model_provider | enum | yes | - | The model provider for embedding. Options may include `AMAZON`, `QIANFAN`, `OPENAI`, etc. |
| api_key | string | yes | - | The API key required to authenticate with the embedding service. |
| secret_key | string | yes | - | The secret key required for additional authentication with the embedding service. |
| region | string | no | | AWS Region. Required for use Amazon Bedrock model. |
Member


Suggested change
| region | string | no | | AWS Region. Required for use Amazon Bedrock model. |
| amazon.region | string | no | | AWS Region. Required for use Amazon Bedrock model. |

Contributor Author


Renamed `region` to `aws_region`.

| single_vectorized_input_number | int | no | 1 | The number of inputs vectorized in one request. Default is 1. |
| vectorization_fields | map | yes | - | A mapping between input fields and their corresponding output vector fields. |
| model | string | yes | - | The specific model to use for embedding (e.g: `text-embedding-3-small` for OPENAI). |
| api_path | string | no | - | The API endpoint for the embedding service. Typically provided by the model provider. |
| dimension | int | no | - | TThe vector dimension defaults to 2048. The Embedding-3 model supports custom vector dimensions, and it is recommended to choose dimensions of 256, 512, 1024, or 2048. |
| oauth_path | string | no | - | The API endpoint for the oauth service. |
| custom_config | map | no | | Custom configurations for the model. |
| custom_response_parse | string | no | | Specifies how to parse the response from the model using JsonPath. Example: `$.choices[*].message.content`. |
| custom_request_headers | map | no | | Custom headers for the request to the model. |
| custom_request_body | map | no | | Custom body for the request. Supports placeholders like `${model}`, `${input}`. |

### model_provider

The providers for generating embeddings include common options such as `DOUBAO`, `QIANFAN`, and `OPENAI`. Additionally,
The providers for generating embeddings include common options such as `AMAZON`, `DOUBAO`, `QIANFAN`, and `OPENAI`. Additionally,
you can choose `CUSTOM` to implement requests and retrievals for custom embedding models.
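
For orientation, the options in the table above might be combined for the `AMAZON` provider roughly as follows. This is a minimal sketch, not taken from the PR. The model ID, the region value, the field names `book_intro` and `book_intro_vector`, and the orientation of the `vectorization_fields` mapping (output vector field on the left, input field on the right) are assumptions for illustration. The region key is written as `region`, as documented in this diff, but the review thread indicates it was renamed, so the merged documentation should be checked for the final key name.

```hocon
transform {
  Embedding {
    # Provider added by this change.
    model_provider = AMAZON
    # Assumed example: a Cohere embedding model ID on Amazon Bedrock.
    model = "cohere.embed-english-v3"
    # Placeholder credentials; how they map to AWS credentials is not shown in this diff.
    api_key = "xxxxxxxxxx"
    secret_key = "xxxxxxxxxx"
    # Key name as documented in this diff; the thread says it became `aws_region`.
    region = "us-east-1"
    # Hypothetical fields: the left side is assumed to be the output vector field.
    vectorization_fields {
      book_intro_vector = book_intro
    }
  }
}
```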

### api_key
33 changes: 17 additions & 16 deletions docs/zh/transform-v2/embedding.md
@@ -8,25 +8,26 @@

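## Options

| Name                             | Type   | Required | Default Value | Description                                                                                                             |
|----------------------------------|--------|----------|---------------|-------------------------------------------------------------------------------------------------------------------------|
| model_provider                   | enum   | yes      | -             | The provider of the embedding model. Options include `QIANFAN`, `OPENAI`, etc.                                          |
| api_key                          | string | yes      | -             | The API key used to authenticate with the embedding service.                                                            |
| secret_key                       | string | yes      | -             | A secret key used for additional authentication. Some providers may require this key for secure API requests.           |
| single_vectorized_input_number   | int    | no       | 1             | The number of inputs vectorized in a single request. Default is 1.                                                      |
| vectorization_fields             | map    | yes      | -             | A mapping between input fields and their corresponding output vector fields.                                            |
| model                            | string | yes      | -             | The specific embedding model to use. For example, if the provider is OPENAI, you can specify `text-embedding-3-small`.  |
| api_path                         | string | no       | -             | The API of the embedding service. Usually provided by the model provider.                                               |
| dimension                        | int    | no       | 2048          | The vector dimension defaults to 2048. The Embedding-3 model supports custom vector dimensions; 256, 512, 1024, or 2048 is recommended. |
| oauth_path                       | string | no       | -             | The API of the oauth service.                                                                                           |
| custom_config                    | map    | no       |               | Custom configuration for the model.                                                                                     |
| custom_response_parse            | string | no       |               | How to parse the model response using JsonPath. Example: `$.choices[*].message.content`.                                |
| custom_request_headers           | map    | no       |               | Custom headers for requests sent to the model.                                                                          |
| custom_request_body              | map    | no       |               | Custom configuration for the request body. Supports placeholders such as `${model}`, `${input}`.                        |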
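| Name                           | Type   | Required | Default Value | Description                                                                                                             |
|--------------------------------|--------|----------|---------------|-------------------------------------------------------------------------------------------------------------------------|
| model_provider                 | enum   | yes      | -             | The provider of the embedding model. Options include `AMAZON`, `QIANFAN`, `OPENAI`, etc.                                |
| api_key                        | string | yes      | -             | The API key used to authenticate with the embedding service.                                                            |
| secret_key                     | string | yes      | -             | A secret key used for additional authentication. Some providers may require this key for secure API requests.           |
| region                         | string | no       |               | For Amazon Bedrock models: the region that model requests are sent to. Must be specified when using Amazon Bedrock.     |
| single_vectorized_input_number | int    | no       | 1             | The number of inputs vectorized in a single request. Default is 1.                                                      |
| vectorization_fields           | map    | yes      | -             | A mapping between input fields and their corresponding output vector fields.                                            |
| model                          | string | yes      | -             | The specific embedding model to use. For example, if the provider is OPENAI, you can specify `text-embedding-3-small`.  |
| api_path                       | string | no       | -             | The API of the embedding service. Usually provided by the model provider.                                               |
| dimension                      | int    | no       | 2048          | The vector dimension defaults to 2048. The Embedding-3 model supports custom vector dimensions; 256, 512, 1024, or 2048 is recommended. |
| oauth_path                     | string | no       | -             | The API of the oauth service.                                                                                           |
| custom_config                  | map    | no       |               | Custom configuration for the model.                                                                                     |
| custom_response_parse          | string | no       |               | How to parse the model response using JsonPath. Example: `$.choices[*].message.content`.                                |
| custom_request_headers         | map    | no       |               | Custom headers for requests sent to the model.                                                                          |
| custom_request_body            | map    | no       |               | Custom configuration for the request body. Supports placeholders such as `${model}`, `${input}`.                        |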

### embedding_model_provider

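The model provider used to generate embeddings. Common options include `DOUBAO`, `QIANFAN`, and `OPENAI`; you can also choose `CUSTOM` to implement requests and retrieval for a custom embedding
The model provider used to generate embeddings. Common options include `AMAZON`, `DOUBAO`, `QIANFAN`, and `OPENAI`; you can also choose `CUSTOM` to implement requests and retrieval for a custom embedding
model.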

### api_key