
Conversation


@melanieclarke melanieclarke commented Aug 4, 2025

Partially resolves JP-3930

This continues work on ensuring input models are not modified by Steps, following up on #9709, which introduced new data copies for some outlier_detection use cases.

For outlier_detection, I added an internal function that decides, based on the input, whether a copy is needed, and returns an open model. I realized something similar was needed for skymatch and tweakreg, so I'm proposing here to add a Step.prepare_output function to support this use case more generally, and to expand its use to the other image3 steps that expect ModelLibrary input.

This is drawing on @braingram's suggestion here: #8588 (comment)
Unlike the function proposed in that PR, though, this one would explicitly support the expectation that input models are not modified after they are provided to a step or pipeline. Also unlike that function, this one is not meant to be used as "with self.open_model(input_data) as model:" at the top of every step, because whether the open model should be closed within the step varies by use case.

Since the Step function can check whether it is part of a pipeline, I added one extra clause to the implementation from OutlierDetectionStep: if self.parent is not None, copies are not made. This recovers the memory costs introduced in #9709.

I think we could eventually expand the use of this function to the remaining Steps, and use it to be more intentional about when we copy input data and when we close the input models. If we don't make shallow copies with "with datamodels.open(model)", and we do make deep copies when needed, then the only time an opened model should need to be closed within a step is when the output is a completely new model generated within the step (not a modified version of the input model).
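To illustrate the copy decision, here is a rough sketch using toy stand-ins (FakeModel and fake_open are invented for the example) rather than the real jwst datamodels API; the actual prepare_output implementation differs in detail:

```python
import copy


class FakeModel:
    """Toy stand-in for a JWST datamodel."""

    def __init__(self, data=None):
        self.data = data


def fake_open(source):
    """Toy stand-in for datamodels.open: a filename yields a fresh model."""
    if isinstance(source, FakeModel):
        return source
    return FakeModel(data=source)


class Step:
    def __init__(self, parent=None):
        # `parent` is set when the step runs inside a pipeline
        self.parent = parent

    def prepare_output(self, input_data):
        """Return an open model that the step may safely modify."""
        if not isinstance(input_data, FakeModel):
            # A filename: opening it creates a fresh model, no copy needed.
            return fake_open(input_data)
        if self.parent is not None:
            # Inside a pipeline: the pipeline owns the model, modify in place.
            return input_data
        # Standalone call with an open model: copy to protect the user's input.
        return copy.deepcopy(input_data)
```

With this logic, a standalone call with an open model pays for one deep copy, while the same step inside a pipeline pays nothing, which is where the #9709 memory costs come back.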

See also the discussion here: spacetelescope/stdatamodels#512

Tasks

  • If you have a specific reviewer in mind, tag them.
  • add a build milestone, i.e. Build 12.0 (use the latest build if not sure)
  • Does this PR change user-facing code / API? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see changelog readme for instructions)
    • update or add relevant tests
    • update relevant docstrings and / or docs/ page
    • start a regression test and include a link to the running job (click here for instructions)
      • Do truth files need to be updated ("okified")?
        • after the reviewer has approved these changes, run okify_regtests to update the truth files
  • if a JIRA ticket exists, make sure it is resolved properly


melanieclarke commented Aug 4, 2025

Initial regtests here:
https://github.com/spacetelescope/RegressionTests/actions/runs/16731840817

Comparing to regtests for #9709, performance is restored to what it was before the change in all cases.

@melanieclarke
Collaborator Author

@emolter @braingram @tapastro @penaguerrero - I'd like your thoughts on this proposed change to Step.open_model, when you have a chance. I'll leave it at draft for now.


codecov bot commented Aug 4, 2025

Codecov Report

❌ Patch coverage is 95.34884% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.43%. Comparing base (94c0254) to head (ad36c6c).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
jwst/datamodels/container.py 0.00% 1 Missing ⚠️
jwst/tweakreg/tweakreg_step.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9723      +/-   ##
==========================================
+ Coverage   83.38%   83.43%   +0.05%     
==========================================
  Files         366      366              
  Lines       37770    37784      +14     
==========================================
+ Hits        31493    31525      +32     
+ Misses       6277     6259      -18     



emolter commented Aug 5, 2025

Thanks for continuing to work on this Melanie! Overall I think something like this would be very beneficial. To re-state what I view as the design requirements,

  1. Steps or Pipelines should never modify their user-specified input model. This requires making a copy sometimes.
  2. Steps or Pipelines should avoid making copies whenever possible.
  3. All datamodels need to be closed upon exiting a Pipeline, or a Step that was called standalone, if they were not open to begin with.
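The first two requirements pull in opposite directions. As a toy illustration of requirement 1 (all names here are invented for the example, not real jwst code), a step that copies before modifying always leaves the caller's model untouched, at the price of the copy requirement 2 tries to avoid:

```python
import copy


class ToyModel:
    """Stand-in for a datamodel; only requirement 1 is exercised here."""

    def __init__(self, data):
        self.data = data


def toy_step(model):
    # Work on a deep copy so the user-specified input is never modified
    # (requirement 1), at the cost of one copy (the tension with requirement 2).
    result = copy.deepcopy(model)
    result.data = result.data * 2
    return result


original = ToyModel(data=3)
result = toy_step(original)
assert original.data == 3  # input untouched
assert result.data == 6
```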

I think your proposal achieves all of these, but I'm still trying to work through when/if closing needs to happen. My expectations would be as follows:
If I do

fname = "foo.fits"
result = ResampleStep.call(fname)

my expectation would be that result is an open datamodel, and the only open datamodel. But if I do

model = dm.open(fname)
result = ResampleStep.call(model)

now I expect model and result to both be open datamodels. If I do

result = Image3Pipeline.call(fname)

I would expect that the individual Steps within the pipeline will keep the model open when passed between them, to avoid having to close and re-open it.

Finally, if I call the pipeline from the command-line, I'd expect result to also close itself at the end.

Can you say a bit more about how this works?

@melanieclarke
Collaborator Author

Thanks for taking a look, Ned. I agree with all your requirements and expectations, except I might restate requirement 3 as:
"All datamodels opened within a Pipeline, or a Step that was called standalone, need to be closed upon exiting if they were not open to begin with, unless they are returned as the result of the step." That is, open input models should stay open, and result models should be returned open.

The way I would envision this working in the future is:

  1. A Step calls self.open_model on its input: output_model = self.open_model(input_data)
  2. If the model type is not changed by the step (e.g. flat correction), it proceeds to modify output_model as needed, and return it. No need for explicit copies, and no need to close anything.
    • If input_data was a model and the step is called standalone, it is still open and unmodified since a copy was made. The output_model is returned open.
    • If input_data was a model and the step is called as part of a pipeline, output_model is input_data and it remains open, but is modified in place.
    • If input_data was a file, only one model is opened, and it is returned as output_model.
  3. If a new model is generated by the step (e.g. resampling), it creates the new model with reference to output_model. Before returning, if output_model is input_data, leave it open; otherwise, close it because it is no longer needed. Return the new model as the result.
    • If input_data was a model and the step is called standalone, output_model is a copy. This is usually a necessary copy, because if the step is skipped, the status is set in output_model and it is returned open, as the result. If the step completes, the output_model copy is closed before returning and only the new model and the input_data stay open.
    • If input_data was a model and the step is called as part of a pipeline, output_model is input_data and it remains open. If the calling pipeline no longer needs input_data after the step completes, it can close it.
    • If input_data was a file, output_model is opened from it, but closed before returning, so only the new model is open after the step completes.
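The two patterns above can be sketched with toy objects (prepare_output here is a minimal stand-in with an explicit in_pipeline flag; the real jwst steps differ in detail):

```python
import copy


class ToyModel:
    """Stand-in for a datamodel that tracks whether it has been closed."""

    def __init__(self, data=0):
        self.data = data
        self.closed = False

    def close(self):
        self.closed = True


def prepare_output(input_data, in_pipeline=False):
    """Minimal stand-in for the proposed helper."""
    if not isinstance(input_data, ToyModel):
        return ToyModel(data=input_data)  # "open" a file: fresh model
    if in_pipeline:
        return input_data  # pipeline-owned: modify in place
    return copy.deepcopy(input_data)  # standalone: protect caller's model


def flat_like_step(input_data, in_pipeline=False):
    """Pattern 2: model type unchanged; modify and return, close nothing."""
    output_model = prepare_output(input_data, in_pipeline)
    output_model.data += 1
    return output_model


def resample_like_step(input_data, in_pipeline=False):
    """Pattern 3: a new model is generated with reference to output_model."""
    output_model = prepare_output(input_data, in_pipeline)
    new_model = ToyModel(data=output_model.data * 2)
    if output_model is not input_data:
        # We opened or copied it ourselves and it is no longer needed.
        output_model.close()
    return new_model
```

In the pattern-3 standalone case, only the new model and the caller's original stay open; in the pipeline case, the input is returned to the pipeline open and untouched by the close logic.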

@penaguerrero
Contributor

Thanks for this work, Melanie! I think the code you propose is a good generalization for both steps that take regular models and steps that take a model container or library as input, and I agree that it is better to decide whether to make this copy at a higher level than the step. This should improve the memory footprint overall in all pipelines! When I ran the test for Detector1 in #8588 (comment), the memory footprint on my laptop for that heavy 2.6 GB uncal MIRI file went down from about 60 GB to about 40 GB. I am curious to see the changes with your proposal.

Collaborator

@braingram braingram left a comment


Thanks for putting this together. I left some inline comments and overall the approach looks good to me.


melanieclarke commented Aug 6, 2025

It sounds like everyone so far generally likes this approach, so I will work on adding some more tests to verify the prepare_output function in different use cases. I'll leave it in draft until Tyler has a chance to weigh in.

Edit: tests added in bcd0064

@penaguerrero
Contributor

I remembered that when I worked on this for all the steps of detector 1, I made a copy when necessary and then immediately deleted the original object. This reduced memory quite a bit.

@melanieclarke
Collaborator Author

I remembered that when I worked on this for all the steps of detector 1, I made a copy when necessary and then immediately deleted the original object. This reduced memory quite a bit.

I think that won't work in general, because it would violate the user's assumption that a datamodel provided to a step will still exist after the step completes. I'm hoping, though, that we can get similar savings by not making copies when they're not needed. We'll also definitely need to think about where we can usefully delete objects after they're no longer needed, especially within the pipelines, which often know better than the steps which objects are still needed.

@melanieclarke melanieclarke changed the title WIP JP-3930: Step function for opening models JP-3930: Step function for opening models Aug 7, 2025
@melanieclarke melanieclarke added this to the Build 12.1 milestone Aug 7, 2025
@melanieclarke melanieclarke marked this pull request as ready for review September 15, 2025 15:04
@melanieclarke melanieclarke requested review from a team as code owners September 15, 2025 15:04
@melanieclarke
Collaborator Author

@tapastro - I'm taking this PR out of draft because it would be nice to get into this build if possible. It restores performance to previous benchmarks for some stage 3 use cases.



emolter commented Sep 15, 2025

The new test test_ngroup_1 is failing on the regtest run. It looks like just a network blip, but since the test is new it would be nice to be sure. Would you mind rerunning those?

Would it be helpful for me to re-review this?


melanieclarke commented Sep 15, 2025

Would it be helpful for me to re-review this?

Yes, please! It's been a minute, so it would be very helpful to have you recheck it.

I'll rerun the regtests.

Edit for regtest results: test_ngroup_1 is passing, but there are unrelated failures from a MIRI CRDS delivery. I'll run again when those are okified.

Edit again: all passing now.

Collaborator

@emolter emolter left a comment


Thanks for adding the detailed test suite! This looks good to me - regtests for memory usage were helpful to determine that this is doing what we expect.

@tapastro tapastro merged commit f76fe12 into spacetelescope:main Sep 16, 2025
28 checks passed
@melanieclarke melanieclarke deleted the open_model branch September 16, 2025 18:20
