Skip to content

Conversation

@chestnut-c
Copy link
Contributor

@chestnut-c chestnut-c commented Jul 8, 2025

Purpose of this pull request

If the batch task configuration property checkpoint.interval is missing or checkpointEnable=false, the checkpoint storage policy configured in seatunnel.yaml will not be triggered, and the default loaclfile will be used to avoid not enabling checkpoint but actually using the checkpoint storage configured in seatunnel.yaml. #9542

Just like the PR I closed last time #9543 , this time I resubmitted the PR after modification.

Does this PR introduce any user-facing change?

No

How was this patch tested?

yes,When running batch tasks locally, remove the task configuration checkpoint.interval, but the checkpoint storage configured in seatunnel.yaml is still hdfs. After the modification, the task runs normally.

Check list

@github-actions github-actions bot added the Zeta label Jul 8, 2025
@Hisoka-X
Copy link
Member

Hi @chestnut-c . Could you help to check the root reason of ClassNotFound before we consider how to fix it? #9542 (comment)

@chestnut-c
Copy link
Contributor Author

Hi @chestnut-c . Could you help to check the root reason of ClassNotFound before we consider how to fix it? #9542 (comment)

hi @Hisoka-X The purpose of this PR is to ask why checkpoint will be disabled, and the system will still create checkpoint Storage later. Maybe I didn't express it clearly. hahha

@Hisoka-X
Copy link
Member

Hisoka-X commented Jul 10, 2025

why checkpoint will be disabled, and the system will still create checkpoint Storage later.

When we disabled checkpoints, it is indeed unnecessary to create extra checkpoint storage. But re-examining this code,
we have actually created the checkpoint storage when the server starts.

Perhaps we should modify it like this:

  1. Remove the checkpoint storage creation every time the CheckpointManager is created, and reuse the created checkpoint storage.
  2. When a job does not have checkpoints enabled, skip reading data from the checkpoint storage, which can reduce unnecessary overhead.

    and

Regarding the logic of skipping checkpoint storage reading, for the sake of subsequent maintainability, we can inject a storage named empty into the checkpoint manager when the checkpoint is disabled, and all read and write operations will have no additional effect.

@chestnut-c
Copy link
Contributor Author

chestnut-c commented Jul 10, 2025

why checkpoint will be disabled, and the system will still create checkpoint Storage later.

When we disabled checkpoints, it is indeed unnecessary to create extra checkpoint storage. But re-examining this code, we have actually created the checkpoint storage when the server starts.

Perhaps we should modify it like this:

  1. Remove the checkpoint storage creation every time the CheckpointManager is created, and reuse the created checkpoint storage.

  2. When a job does not have checkpoints enabled, skip reading data from the checkpoint storage, which can reduce unnecessary overhead.

    and

Regarding the logic of skipping checkpoint storage reading, for the sake of subsequent maintainability, we can inject a storage named empty into the checkpoint manager when the checkpoint is disabled, and all read and write operations will have no additional effect.

@Hisoka-X OK, maybe I misunderstood. I think if it is not enabled, there is no need to trigger the relevant logic. Since this is the framework design, we can skip this issue. thx bro

@Hisoka-X
Copy link
Member

  1. Remove the checkpoint storage creation every time the CheckpointManager is created, and reuse the created checkpoint storage.

  2. When a job does not have checkpoints enabled, skip reading data from the checkpoint storage, which can reduce unnecessary overhead.

    and

Regarding the logic of skipping checkpoint storage reading, for the sake of subsequent maintainability, we can inject a storage named empty into the checkpoint manager when the checkpoint is disabled, and all read and write operations will have no additional effect.

@chestnut-c , Could you help to perform an optimization in this way?

@chestnut-c
Copy link
Contributor Author

chestnut-c commented Jul 14, 2025

When we disabled checkpoints, it is indeed unnecessary to create extra checkpoint storage. But re-examining this code,
we have actually created the checkpoint storage when the server starts.

ok , @Hisoka-X pls help me to review this again

Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chestnut-c for update. I left a problem that needs improve.

Hisoka-X
Hisoka-X previously approved these changes Jul 16, 2025
Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if ci passes. Thanks @chestnut-c

@Hisoka-X Hisoka-X changed the title [Hotfix][Zeta] Reset checkpoint storage when disabled checkpoint in batch mode [Fix][Zeta] Avoid redundant checkpoint reads when disabled checkpoint in batch mode Jul 16, 2025
@Hisoka-X Hisoka-X changed the title [Fix][Zeta] Avoid redundant checkpoint reads when disabled checkpoint in batch mode [Fix][Zeta] Avoid redundant checkpoint reads when disabled checkpoint Jul 16, 2025
…ss obtain Storage to directly reuse the CheckpointStorage created by the server for the first time
@chestnut-c
Copy link
Contributor Author

I found that sometimes some unit test cases in the CI pipeline would time out, sometimes fail, and sometimes succeed, which was very strange. @Hisoka-X

@nielifeng nielifeng requested a review from Copilot July 24, 2025 01:34
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request fixes an issue where checkpoint reads were occurring unnecessarily when checkpointing was disabled in batch tasks. The fix ensures that when checkpoint.interval is missing or checkpointEnable=false, the system avoids using the configured checkpoint storage and instead defaults to local file storage.

Key changes include:

  • Modifying checkpoint manager initialization to use a shared checkpoint storage service
  • Adding checkpoint enable checks before performing checkpoint operations
  • Creating comprehensive tests to verify the behavior when checkpoints are disabled

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
CheckpointService.java Added getter annotation to expose checkpoint storage
CheckpointManager.java Added checkpoint enable checks and refactored to accept checkpoint storage as parameter
JobMaster.java Modified to pass checkpoint storage from service during manager initialization
CheckpointStorageTest.java Added test case with mock storage to verify no redundant checkpoint reads
CheckpointManagerTest.java Updated constructor calls to include checkpoint storage parameter
CheckpointCoordinatorTest.java Updated constructor calls to include checkpoint storage parameter
batch_fake_to_console_without_checkpoint_interval.conf Added test configuration file without checkpoint interval
Comments suppressed due to low confidence (1)

seatunnel-engine/seatunnel-engine-server/src/test/java/org/apache/seatunnel/engine/server/checkpoint/CheckpointStorageTest.java:93

  • The await assertion on line 94 uses the same list reference 'savepoint1' that was populated before the await block. This test may pass incorrectly since the list content won't change during the wait period. Consider fetching a fresh list inside the assertion lambda.
        await().atMost(5000, TimeUnit.MILLISECONDS)

@chestnut-c
Copy link
Contributor Author

@Hisoka-X thanks , the unit case tests all is ok hahahha

Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chestnut-c

@chestnut-c
Copy link
Contributor Author

@corgy-w please help review.

@corgy-w corgy-w merged commit bfd70cd into apache:dev Jul 28, 2025
5 checks passed
LeonYoah added a commit to LeonYoah/seatunnel that referenced this pull request Sep 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] [Zeta] 'checkpoint.interval' configuration of env is missing, checkpoint be disabled, but still create storage.

3 participants