Merged
Changes from 3 commits
Binary file removed .idea/icon.png
Member: why delete this?

Contributor (Author): I accidentally deleted it.

16 changes: 16 additions & 0 deletions docs/en/connector-v2/source/CosFile.md
@@ -72,6 +72,8 @@ To use this connector you need put hadoop-cos-{hadoop.version}-{version}.jar and
| compress_codec | string | no | none |
| archive_compress_codec | string | no | none |
| encoding | string | no | UTF-8 |
| binary_chunk_size | int | no | 1024 |
| binary_complete_file_mode | boolean | no | false |
| common-options | | no | - |

### path [string]
Member: Add description too?

@@ -365,6 +367,18 @@ Note: gz compressed excel file needs to compress the original file or specify th
Only used when file_format_type is json,text,csv,xml.
The encoding of the file to read. This param will be parsed by `Charset.forName(encoding)`.

### binary_chunk_size [int]

Only used when file_format_type is binary.

The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory.

### binary_complete_file_mode [boolean]

Only used when file_format_type is binary.

Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false.

### common options

Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details.
@@ -420,6 +434,8 @@ source {
region = "ap-chengdu"
path = "/seatunnel/read/binary/"
file_format_type = "binary"
binary_chunk_size = 2048
binary_complete_file_mode = false
}
}
sink {
16 changes: 16 additions & 0 deletions docs/en/connector-v2/source/FtpFile.md
@@ -68,6 +68,8 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
| archive_compress_codec | string | no | none |
| encoding | string | no | UTF-8 |
| null_format | string | no | - |
| binary_chunk_size | int | no | 1024 |
| binary_complete_file_mode | boolean | no | false |
| common-options | | no | - |

### host [string]
@@ -380,6 +382,18 @@ null_format to define which strings can be represented as null.

e.g: `\N`

### binary_chunk_size [int]

Only used when file_format_type is binary.

The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory.

### binary_complete_file_mode [boolean]

Only used when file_format_type is binary.

Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false.

### common options

Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details.
@@ -482,6 +496,8 @@ source {
password = tianchao
path = "/seatunnel/read/binary/"
file_format_type = "binary"
binary_chunk_size = 2048
binary_complete_file_mode = false
}
}
sink {
14 changes: 14 additions & 0 deletions docs/en/connector-v2/source/HdfsFile.md
@@ -71,6 +71,8 @@ Read data from hdfs file system.
| archive_compress_codec | string | no | none |
| encoding | string | no | UTF-8 | |
| null_format | string | no | - | Only used when file_format_type is text. null_format to define which strings can be represented as null. e.g: `\N` |
| binary_chunk_size | int | no | 1024 | Only used when file_format_type is binary. The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory. |
| binary_complete_file_mode | boolean | no | false | Only used when file_format_type is binary. Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false. |
| common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details. |

### delimiter/field_delimiter [string]
@@ -159,6 +161,18 @@ Note: gz compressed excel file needs to compress the original file or specify th
Only used when file_format_type is json,text,csv,xml.
The encoding of the file to read. This param will be parsed by `Charset.forName(encoding)`.

### binary_chunk_size [int]

Only used when file_format_type is binary.

The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory.

### binary_complete_file_mode [boolean]

Only used when file_format_type is binary.

Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false.
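
A minimal sketch of how the two options might sit together in an HdfsFile source block (the `fs.defaultFS` address below is a placeholder):

```hocon
source {
  HdfsFile {
    fs.defaultFS = "hdfs://localhost:9000"  # placeholder namenode address
    path = "/seatunnel/read/binary/"
    file_format_type = "binary"
    binary_chunk_size = 4096                # read in 4 KB chunks
    binary_complete_file_mode = false       # keep chunked reading
  }
}
```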

### Tips

> If you use spark/flink, In order to use this connector, You must ensure your spark/flink cluster already integrated hadoop. The tested hadoop version is 2.x. If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you download and install SeaTunnel Engine. You can check the jar package under ${SEATUNNEL_HOME}/lib to confirm this.
18 changes: 17 additions & 1 deletion docs/en/connector-v2/source/LocalFile.md
@@ -67,7 +67,9 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you
| compress_codec | string | no | none |
| archive_compress_codec | string | no | none |
| encoding | string | no | UTF-8 |
| null_format | string | no | - |
| binary_chunk_size | int | no | 1024 |
| binary_complete_file_mode | boolean | no | false |
| common-options | | no | - |
| tables_configs | list | no | used to define a multiple table task |

@@ -363,6 +365,18 @@ null_format to define which strings can be represented as null.

e.g: `\N`

### binary_chunk_size [int]

Only used when file_format_type is binary.

The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory.

### binary_complete_file_mode [boolean]

Only used when file_format_type is binary.

Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false.

### common options

Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details
@@ -477,6 +491,8 @@ source {
LocalFile {
path = "/seatunnel/read/binary/"
file_format_type = "binary"
binary_chunk_size = 2048
binary_complete_file_mode = false
}
}
sink {
14 changes: 14 additions & 0 deletions docs/en/connector-v2/source/OssFile.md
@@ -203,6 +203,8 @@ If you assign file type to `parquet` `orc`, schema option not required, connecto
| compress_codec | string | no | none | Which compress codec the files used. |
| encoding | string | no | UTF-8 |
| null_format | string | no | - | Only used when file_format_type is text. null_format to define which strings can be represented as null. e.g: `\N` |
| binary_chunk_size | int | no | 1024 | Only used when file_format_type is binary. The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory. |
| binary_complete_file_mode | boolean | no | false | Only used when file_format_type is binary. Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false. |
| file_filter_pattern | string | no | | Filter pattern, which used for filtering files. |
| common-options | config | no | - | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details. |

@@ -221,6 +223,18 @@ The compress codec of files and the details that supported as the following shown
Only used when file_format_type is json,text,csv,xml.
The encoding of the file to read. This param will be parsed by `Charset.forName(encoding)`.

### binary_chunk_size [int]

Only used when file_format_type is binary.

The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory.

### binary_complete_file_mode [boolean]

Only used when file_format_type is binary.

Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false.
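
A non-authoritative sketch for OssFile (bucket, endpoint, and credential values are placeholders):

```hocon
source {
  OssFile {
    bucket = "oss://example-bucket"          # placeholder bucket
    access_key = "xxxxxxxxxxx"
    access_secret = "xxxxxxxxxxx"
    endpoint = "oss-cn-beijing.aliyuncs.com" # placeholder endpoint
    path = "/seatunnel/read/binary/"
    file_format_type = "binary"
    binary_complete_file_mode = true         # read each file as a single chunk
  }
}
```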

### file_filter_pattern [string]

Filter pattern, which used for filtering files.
14 changes: 14 additions & 0 deletions docs/en/connector-v2/source/S3File.md
@@ -211,6 +211,8 @@ If you assign file type to `parquet` `orc`, schema option not required, connecto
| archive_compress_codec | string | no | none | |
| encoding | string | no | UTF-8 | |
| null_format | string | no | - | Only used when file_format_type is text. null_format to define which strings can be represented as null. e.g: `\N` |
| binary_chunk_size | int | no | 1024 | Only used when file_format_type is binary. The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory. |
| binary_complete_file_mode | boolean | no | false | Only used when file_format_type is binary. Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false. |
| file_filter_pattern | string | no | | Filter pattern, which used for filtering files. |
| filename_extension | string | no | - | Filter filename extension, which used for filtering files with specific extension. Example: `csv` `.txt` `json` `.xml`. |
| common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details. |
@@ -301,6 +303,18 @@ Note: gz compressed excel file needs to compress the original file or specify th
Only used when file_format_type is json,text,csv,xml.
The encoding of the file to read. This param will be parsed by `Charset.forName(encoding)`.

### binary_chunk_size [int]

Only used when file_format_type is binary.

The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory.

### binary_complete_file_mode [boolean]

Only used when file_format_type is binary.

Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false.
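
A hedged sketch for S3File (the bucket name is a placeholder; endpoint and credential options are omitted for brevity):

```hocon
source {
  S3File {
    bucket = "s3a://seatunnel-test"   # placeholder bucket
    path = "/seatunnel/read/binary/"
    file_format_type = "binary"
    binary_chunk_size = 2048          # 2 KB chunks
    binary_complete_file_mode = false
  }
}
```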

## Example

1. In this example, We read data from s3 path `s3a://seatunnel-test/seatunnel/text` and the file type is orc in this path.
14 changes: 14 additions & 0 deletions docs/en/connector-v2/source/SftpFile.md
@@ -99,6 +99,8 @@ The File does not have a specific type list, and we can indicate which SeaTunnel
| archive_compress_codec | string | no | none |
| encoding | string | no | UTF-8 |
| null_format | string | no | - | Only used when file_format_type is text. null_format to define which strings can be represented as null. e.g: `\N` |
| binary_chunk_size | int | no | 1024 | Only used when file_format_type is binary. The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory. |
| binary_complete_file_mode | boolean | no | false | Only used when file_format_type is binary. Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false. |
| common-options | | no | - | Source plugin common parameters, please refer to [Source Common Options](../source-common-options.md) for details. |

### file_filter_pattern [string]
@@ -254,6 +256,18 @@ Note: gz compressed excel file needs to compress the original file or specify th
Only used when file_format_type is json,text,csv,xml.
The encoding of the file to read. This param will be parsed by `Charset.forName(encoding)`.

### binary_chunk_size [int]

Only used when file_format_type is binary.

The chunk size (in bytes) for reading binary files. Default is 1024 bytes. Larger values may improve performance for large files but use more memory.

### binary_complete_file_mode [boolean]

Only used when file_format_type is binary.

Whether to read the complete file as a single chunk instead of splitting into chunks. When enabled, the entire file content will be read into memory at once. Default is false.
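
A minimal sketch for SftpFile (host, port, and credentials below are placeholders):

```hocon
source {
  SftpFile {
    host = "sftp.example.com"        # placeholder host
    port = 22
    user = "seatunnel"               # placeholder credentials
    password = "pass"
    path = "/seatunnel/read/binary/"
    file_format_type = "binary"
    binary_complete_file_mode = true # load each file whole into memory
  }
}
```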

### schema [config]

#### fields [Config]