Skip to content

Conversation

@emolter
Copy link
Collaborator

@emolter emolter commented Jul 9, 2025

Resolves JP-4052

Closes #9613

This PR addresses errors noticed in ops for a large WFSS dataset. In short, extract_1d and combine_1d lacked graceful handling of all-NaN slit data and spectra, respectively. This PR adds those catches and allows the failing program to process successfully.

Tasks

  • If you have a specific reviewer in mind, tag them.
  • add a build milestone, i.e. Build 12.0 (use the latest build if not sure)
  • Does this PR change user-facing code / API? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see changelog readme for instructions)
    • update or add relevant tests
    • update relevant docstrings and / or docs/ page
    • start a regression test and include a link to the running job (click here for instructions)
      • Do truth files need to be updated ("okified")?
        • after the reviewer has approved these changes, run okify_regtests to update the truth files
  • if a JIRA ticket exists, make sure it is resolved properly

@emolter
Copy link
Collaborator Author

emolter commented Jul 9, 2025

@emolter emolter added this to the Build 12.1 milestone Jul 9, 2025
@codecov
Copy link

codecov bot commented Jul 9, 2025

Codecov Report

❌ Patch coverage is 0% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (af59781) to head (3f38a9f).
⚠️ Report is 439 commits behind head on main.

Files with missing lines Patch % Lines
jwst/pipeline/calwebb_spec3.py 0.00% 9 Missing ⚠️
jwst/combine_1d/combine1d.py 0.00% 8 Missing ⚠️
jwst/combine_1d/combine_1d_step.py 0.00% 7 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #9625   +/-   ##
=====================================
  Coverage   0.00%   0.00%           
=====================================
  Files        367     367           
  Lines      37217   37286   +69     
=====================================
- Misses     37217   37286   +69     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@emolter emolter marked this pull request as ready for review July 10, 2025 18:16
@emolter emolter requested review from a team as code owners July 10, 2025 18:16
@emolter
Copy link
Collaborator Author

emolter commented Jul 10, 2025

It's still not clear to me why there are slits being generated from calwebb_spec2 that are all-NaN and have unphysical source_xpos/source_ypos, but the present PR does allow the failing program to process successfully. Maybe a follow-up ticket is needed on the former?

Copy link
Collaborator

@melanieclarke melanieclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix and test coverage look good to me.

If I'm interpreting Pey Lian's request correctly - I don't see an easy way to add a forward-looking xfail test here for the potential other problems in spec2 that are triggering the all-NaN condition in the first place. Whatever errors there may be would be in a different part of the code entirely.

Can you please rerun the regtests? The last posted ones got mixed up with the CRDS change for MIRI.

@emolter
Copy link
Collaborator Author

emolter commented Jul 10, 2025

restarting regtests https://github.com/spacetelescope/RegressionTests/actions/runs/16205746138

There seem to be some failures in x1d from spec2 that I didn't expect. Will look into what is causing those.

@melanieclarke
Copy link
Collaborator

From the regtest results, it looks like this is unexpectedly impacting NIRSpec MOS data, which did not previously crash in extraction. There are known situations where there are all-NaN slits for MOS, for sources that partially fall on a detector in a region where there are valid wavelengths defined but no valid flat data. Up to now, we have produced empty spectra for these slits -- this change would instead skip them.

I'm a little leery of introducing this change to existing products to the next RC. Maybe we should defer it to the next build after all? Or find a narrower fix that addresses the specific crash for WFSS?

@emolter
Copy link
Collaborator Author

emolter commented Jul 11, 2025

I believe there's an easy way to fix this. I don't think the skip clause in extract is strictly necessary to get the failing dataset to process through the pipeline - the failures reported in the ticket were in combine1d (or make_wfss_multicombined for the current dev version).

I added the extract skip just because I didn't see the point in processing an all-NaN slit, and that was also why I was suggesting to look into why they were being created in spec2, well upstream of the ops failure. But if these are expected for certain products, then it would make sense to me to catch the error in combine_1d only.

My one concern with getting that done today is that I won't have time to run the full asn through the pipeline again, since it takes ~24 hours on my computer. I created a smaller asn subset that fails on main which I'd assume to be representative, and would probably feel safe calling it good enough if that processed successfully.

@melanieclarke
Copy link
Collaborator

I believe there's an easy way to fix this. I don't think the skip clause in extract is strictly necessary to get the failing dataset to process through the pipeline - the failures reported in the ticket were in combine1d (or make_wfss_multicombined for the current dev version).

Okay, that makes sense. I did wonder how we had missed a crash in extract_1d for that condition.

I added the extract skip just because I didn't see the point in processing an all-NaN slit, and that was also why I was suggesting to look into why they were being created in spec2, well upstream of the ops failure. But if these are expected for certain products, then it would make sense to me to catch the error in combine_1d only.

I also don't think there's much point in processing an all-NaN slit, but I think INS should weigh in. Can you file a follow up ticket, for next build?

My one concern with getting that done today is that I won't have time to run the full asn through the pipeline again, since it takes ~24 hours on my computer. I created a smaller asn subset that fails on main which I'd assume to be representative, and would probably feel safe calling it good enough if that processed successfully.

That sounds safe enough to me too -- the combine1d fix alone should do no harm at least, since it is currently crashing in that case. We can test the full data set next week, and fix any additional problems in later releases.

@emolter
Copy link
Collaborator Author

emolter commented Jul 11, 2025

Ticket filed here, feel free to edit as you see fit https://jira.stsci.edu/browse/JP-4058

@emolter
Copy link
Collaborator Author

emolter commented Jul 11, 2025

Copy link
Collaborator

@melanieclarke melanieclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- thanks, Ned.

@emolter emolter merged commit cfa58c7 into spacetelescope:main Jul 11, 2025
28 checks passed
@emolter emolter deleted the JP-4052 branch July 11, 2025 16:39
@tapastro tapastro modified the milestones: Build 12.1, 12.0.1 Jul 11, 2025
@tapastro
Copy link
Contributor

@meeseeksdev backport to release/1.19.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

B11.3.1 OPS: error processing WFSS data in spec3

4 participants