Skip to content

Conversation

@emolter
Copy link
Collaborator

@emolter emolter commented Jan 7, 2025

Resolves JP-1649

Closes #5262

This PR adds the option to pass in user-defined sky levels to the skymatch step.

Tasks

  • request a review from someone specific, to avoid making the maintainers review every PR
  • add a build milestone, i.e. Build 11.3 (use the latest build if not sure)
  • Does this PR change user-facing code / API? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • update or add relevant tests
    • update relevant docstrings and / or docs/ page
    • start a regression test and include a link to the running job (click here for instructions)
      • Do truth files need to be updated ("okified")?
        • after the reviewer has approved these changes, run okify_regtests to update the truth files
  • if a JIRA ticket exists, make sure it is resolved properly
news fragment change types...
  • changes/<PR#>.general.rst: infrastructure or miscellaneous change
  • changes/<PR#>.docs.rst
  • changes/<PR#>.stpipe.rst
  • changes/<PR#>.datamodels.rst
  • changes/<PR#>.scripts.rst
  • changes/<PR#>.fits_generator.rst
  • changes/<PR#>.set_telescope_pointing.rst
  • changes/<PR#>.pipeline.rst

stage 1

  • changes/<PR#>.group_scale.rst
  • changes/<PR#>.dq_init.rst
  • changes/<PR#>.emicorr.rst
  • changes/<PR#>.saturation.rst
  • changes/<PR#>.ipc.rst
  • changes/<PR#>.firstframe.rst
  • changes/<PR#>.lastframe.rst
  • changes/<PR#>.reset.rst
  • changes/<PR#>.superbias.rst
  • changes/<PR#>.refpix.rst
  • changes/<PR#>.linearity.rst
  • changes/<PR#>.rscd.rst
  • changes/<PR#>.persistence.rst
  • changes/<PR#>.dark_current.rst
  • changes/<PR#>.charge_migration.rst
  • changes/<PR#>.jump.rst
  • changes/<PR#>.clean_flicker_noise.rst
  • changes/<PR#>.ramp_fitting.rst
  • changes/<PR#>.gain_scale.rst

stage 2

  • changes/<PR#>.assign_wcs.rst
  • changes/<PR#>.badpix_selfcal.rst
  • changes/<PR#>.msaflagopen.rst
  • changes/<PR#>.nsclean.rst
  • changes/<PR#>.imprint.rst
  • changes/<PR#>.background.rst
  • changes/<PR#>.extract_2d.rst
  • changes/<PR#>.master_background.rst
  • changes/<PR#>.wavecorr.rst
  • changes/<PR#>.srctype.rst
  • changes/<PR#>.straylight.rst
  • changes/<PR#>.wfss_contam.rst
  • changes/<PR#>.flatfield.rst
  • changes/<PR#>.fringe.rst
  • changes/<PR#>.pathloss.rst
  • changes/<PR#>.barshadow.rst
  • changes/<PR#>.photom.rst
  • changes/<PR#>.pixel_replace.rst
  • changes/<PR#>.resample_spec.rst
  • changes/<PR#>.residual_fringe.rst
  • changes/<PR#>.cube_build.rst
  • changes/<PR#>.extract_1d.rst
  • changes/<PR#>.resample.rst

stage 3

  • changes/<PR#>.assign_mtwcs.rst
  • changes/<PR#>.mrs_imatch.rst
  • changes/<PR#>.tweakreg.rst
  • changes/<PR#>.skymatch.rst
  • changes/<PR#>.exp_to_source.rst
  • changes/<PR#>.outlier_detection.rst
  • changes/<PR#>.tso_photometry.rst
  • changes/<PR#>.stack_refs.rst
  • changes/<PR#>.align_refs.rst
  • changes/<PR#>.klip.rst
  • changes/<PR#>.spectral_leak.rst
  • changes/<PR#>.source_catalog.rst
  • changes/<PR#>.combine_1d.rst
  • changes/<PR#>.ami.rst

other

  • changes/<PR#>.wfs_combine.rst
  • changes/<PR#>.white_light.rst
  • changes/<PR#>.cube_skymatch.rst
  • changes/<PR#>.engdb_tools.rst
  • changes/<PR#>.guider_cds.rst

@codecov
Copy link

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 97.77778% with 1 line in your changes missing coverage. Please review.

Project coverage is 76.94%. Comparing base (24cfbcb) to head (6ef5998).
Report is 871 commits behind head on main.

Files with missing lines Patch % Lines
jwst/skymatch/skymatch_step.py 97.61% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9053      +/-   ##
==========================================
+ Coverage   76.92%   76.94%   +0.01%     
==========================================
  Files         498      498              
  Lines       45810    45844      +34     
==========================================
+ Hits        35240    35273      +33     
- Misses      10570    10571       +1     
Flag Coverage Δ *Carryforward flag
nightly 77.38% <ø> (-0.01%) ⬇️ Carriedforward from a4eaadf

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@emolter emolter added this to the Build 11.3 milestone Jan 7, 2025
@emolter emolter requested review from drlaw1558 and goudfroo January 7, 2025 17:46
@emolter emolter marked this pull request as ready for review January 7, 2025 17:50
@emolter emolter requested a review from a team as a code owner January 7, 2025 17:50
@emolter
Copy link
Collaborator Author

emolter commented Jan 7, 2025

initial round of regression tests started here: https://github.com/spacetelescope/RegressionTests/actions/runs/12656574738

edit: all passing

@mcara
Copy link
Member

mcara commented Jan 7, 2025

  1. I think user-supplied sky values could be provided the same way custom group_id or tweakreg_catalog are currently provided in the ASN. This way all these values are bundled with the corresponding input cal file, which, IMO is more robust than a separate skylist.
  2. I am not sure this PR is even needed as currently users could simply edit input cal files and set:
cal_model.meta.background.method = "user"  # optional
cal_model.meta.background.level = custom_sky_value
cal_model.meta.background.subtracted = False

Then save the model and disable sky subtraction step.

Actually, you could add sky_subtracted word as well so that users could indicate whether the sky value has been subtracted.

@emolter
Copy link
Collaborator Author

emolter commented Jan 8, 2025

Thanks for the suggestions Mihai. There's a relevant discussion on the JIRA ticket: https://jira.stsci.edu/browse/JP-1649

This PR follows David Law's suggestion at the bottom: add a "user" option and a parameter encoding a list of sky levels. Let's call that suggestion (0).
Your suggestion (1) of including this in the association was also raised by Paul, and your suggestion (2) of just skipping skymatch after editing the cal files is similar to the workaround Paul ended up using.

I think ultimately any of 0, 1, or 2 would work, but they should be documented, including (especially?) if (2) ends up being our suggested workflow.

Wondering if @tapastro and @melanieclarke have opinions on this.

Copy link
Collaborator

@melanieclarke melanieclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, since it's just a list of floating point values, it's most easily and directly provided via a parameter, as implemented here. Option 0 sounds like the most user friendly to me.

@mcara
Copy link
Member

mcara commented Jan 8, 2025

I think, since it's just a list of floating point values, it's most easily and directly provided via a parameter, as implemented here. Option 0 sounds like the most user friendly to me.

There are several problems with this approach:

  1. When users will eventually want to add custom regions for sky matching, we will introduce a new parameter: skyreglist, and then another parameter and so on.
  2. As I mentioned above, one issue is that a separate list can be or become out of order with the input image list. For example, take a look at this code:
    for group_index, (group_id, group_inds) in enumerate(library.group_indices.items()):
    sky_images = []
    for index in group_inds:
    model = library.borrow(index)
    try:
    sky_images.append(self._imodel2skyim(model, index))
    finally:
    library.shelve(model, index, modify=False)
    if len(sky_images) == 1:
    images.extend(sky_images)
    else:
    images.append(SkyGroup(sky_images, id=group_index))

    Let's assume input images belong to different groups and are not sorted by the group ID:
    [image1_g1, image2_g2, image3_g1, image4_g2] with sky values: [sky1, sky2, sky3, sky4]
    
    Then the code above will create two sky groups: SkyGroup1 (with images 1 and 3) and SkyGroup2 (with images 2 and 4). After flattening these groups in
    # unpack groups into individual images. assumption is that user-defined background values
    # have been figured out on a per-model basis, agnostic of group membership.
    images_flattened = []
    for im in images:
    if isinstance(im, SkyGroup):
    images_flattened.extend(im)
    else:
    images_flattened.append(im)
    the order will be: [image1_g1, image3_g1, image2_g2, image4_g2]. This does not correspond to the user-provided order for sky values: [sky1, sky2, sky3, sky4]. To make things worse, it will be difficult for the users to even know they need to check the log file to detect this kind of errors.
  3. Even if we consider that the previous issue could be solved - see, for example, the skyfile parameter in the drizzlepac - creation of a SkyImage or SkyGroup is expensive: it involves creating spherical polygons for the outlines of each image (and for groups, computing the union of these regions). Why do all of this? The purpose of skymatch is to figure out background values and the culmination of this process is in setting computed values in input model's meta:
    dm.meta.background.method = str(self.skymethod)
    dm.meta.background.level = sky
    dm.meta.background.subtracted = self.subtract
    if self.subtract:
    dm.data[...] = sky_image.image[...]
    Why not leave skymatch alone and simply in the Step class read sky values and update meta and return. There is nothing else to do here. In fact, I am not even sure this belongs in this step as there is no computation performed.

@drlaw1558
Copy link
Contributor

Jumping in to clarify quickly; the aim of this change is to make it simpler for users to handle the case where backgrounds are complex and hard to match. Specifying particular regions for matching seems like a rabbit hole I'd be reluctant to explore, as it could become arbitrarily complicated. Rather, the intent is for users to measure offset values in whatever complex manner they want outside the pipeline, and for us to then give a simple way within the pipeline of applying those values (rather than hacking cal files). Might a two-element list of [filename, value] deal with the ordering issue?

@emolter
Copy link
Collaborator Author

emolter commented Jan 8, 2025

Melanie and I discussed this more at our stand-up today. I think we agree with Mihai that it might be good to avoid touching the skymatch() (formerly match()) routine itself, and we agree with David that the present UI is the best solution given that the "minimum competency" for a user is to change pipeline parameters and not necessarily to directly modify datamodels. So I will work on a way to move the fix into the top-level Step while retaining the current UI. David, I'll use the [filename, value] syntax if I find it's unavoidable (or preferred) to do it that way.

@mcara
Copy link
Member

mcara commented Jan 8, 2025

Might a two-element list of [filename, value] deal with the ordering issue?

Yes, it might. That is exactly how it works in the drizzlepac. But wouldn't it be just as easy to open asn table in an editor and add sky values to each member? We already have the machinery to handle this for group_id and tweakreg_catalog. Why not support 1-2 new keywords?

@mcara
Copy link
Member

mcara commented Jan 8, 2025

@emolter
Copy link
Collaborator Author

emolter commented Jan 8, 2025

But wouldn't it be just as easy to open asn table in an editor and add sky values to each member?

I think the concern is, no, it would not be just as easy. It would require a user to understand the structure of an association file. While I agree that it isn't particularly complicated to change, the expectation is usually that the asn files will "just work".

With regards to adding 1-2 new keywords to the asn, it could be just as easily argued that there's no harm adding 1-2 new input parameters to skymatch. Plus, step parameters are typically easier for a user to find in the documentation.

@mcara
Copy link
Member

mcara commented Jan 8, 2025

Actually, for tweakreg there is already such a parameter: catfile. However, tweakreg supports both custom catalogs via this argument and ASN tables.

Copy link
Member

@mcara mcara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that skylist is actually a list and not a text file. I wonder how will this work for very long lists, like 100s. One has to be extremely careful not to swap the order of values or files. If one decides to edit the list of input files and remove a (==one) problematic image from the ASN, how easy it will be to edit this skylist. I simply think this approach is too error-prone.

Personally, I do not like this approach but ultimately it is up to you, @drlaw1558 and @melanieclarke @tapastro . But I just cannot approve this in the current form.

I think for the very least skylist could be (or could be allowed to be) a text file with two columns: filename, skyval.

@tapastro
Copy link
Contributor

The current implementation doesn't allow providing a list using strun - some parsing is required. It may also be simpler to provide a text file for a command-line argument instead the complete list, as Mihai suggests.

Copy link
Contributor

@kmacdonald-stsci kmacdonald-stsci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me and I have nothing more to add than what Mihai commented.

@emolter
Copy link
Collaborator Author

emolter commented Jan 13, 2025

Newest version of the code requires a filename containing a two-column list of (filename, skyval) pairs. How does this look to everyone @mcara @tapastro ?

@melanieclarke
Copy link
Collaborator

Newest version of the code requires a filename containing a two-column list of (filename, skyval) pairs. How does this look to everyone @mcara @tapastro ?

For a similar change for tweaking offsets in cube building, we decided to go with an ASDF file as input, listing filenames, RA, and Dec offsets. For consistency, perhaps we should do that here too?

@emolter
Copy link
Collaborator Author

emolter commented Jan 13, 2025

For a similar change for tweaking offsets in cube building, we decided to go with an ASDF file as input, listing filenames, RA, and Dec offsets. For consistency, perhaps we should do that here too?

There are other places in the code (e.g., custom catalog in tweakreg) that require a text file. So regardless of the format it would be consistent with some other steps and inconsistent with others. Happy to make it ASDF if that's what you want to do though.

@melanieclarke
Copy link
Collaborator

Happy to make it ASDF if that's what you want to do though.

I don't have a strong preference -- I think both approaches have drawbacks. It might be useful to consider in the future a more general policy on how we take input for pipeline steps like this.

For the filename matching, if the step is run in the middle of a pipeline, which filename is it looking for? The suffix from the step immediately prior to skymatch, or input file suffix?

@emolter
Copy link
Collaborator Author

emolter commented Jan 13, 2025

For the filename matching, if the step is run in the middle of a pipeline, which filename is it looking for? The suffix from the step immediately prior to skymatch, or input file suffix?

Right now it looks for model.meta.filename which I don't think should be changed through the steps (?)

@mcara
Copy link
Member

mcara commented Jan 13, 2025

For a similar change for tweaking offsets in cube building, we decided to go with an ASDF file as input, listing filenames, RA, and Dec offsets. For consistency, perhaps we should do that here too?

Is ASDF file easier for the users than a standard text file? What is the advantage? I would have to look at cube builder to see an example, I guess.

Why not just do it in ASN (and skyfile as well, why not)? In tweakreg step catalogs can be provided either via ASN or a simple text file. Wouldn't ASDF make the user learn the structure of the ASDF files?

@melanieclarke
Copy link
Collaborator

Is ASDF file easier for the users than a standard text file? What is the advantage? I would have to look at cube builder to see an example, I guess.

Why not just do it in ASN (and skyfile as well, why not)? In tweakreg step catalogs can be provided either via ASN or a simple text file. Wouldn't ASDF make the user learn the structure of the ASDF files?

Yes, ASDF is more complex and requires more work on the user's part. The advantage is that you can store more complicated structures and that you can validate it. I'm not advocating for it here, but I do think we should start thinking about how we manage input from users more consistently across the pipeline steps.

I have no problem with supporting sky values in the ASN file as well, especially if we are using tweakreg as a model for how users provide override inputs in image3.

@emolter
Copy link
Collaborator Author

emolter commented Jan 14, 2025

I have no problem with supporting sky values in the ASN file as well, especially if we are using tweakreg as a model for how users provide override inputs in image3.

I'm not sure I see the advantage of supporting two different ways to define the same background levels. If we allow this to be passed as a parameter or as part of the asn, then we need to:

  • test and maintain both
  • document both
  • define which should take precedence (or if a failure should occur) when both are defined at the same time

I already find the way tweakreg does the sky catalog to be perhaps slightly overcomplicated.

I also don't quite see the appeal of modifying asn files after they are created in order to do this. Yes, it's a way to keep the filename and the sky level next to each other, but so is a two-column text file and in my experience it's easier to create the latter.

If that's what is desired by the various stakeholders I'll happily code it or modify what Mihai already sent earlier, but the original suggestion on the ticket of how this should be implemented is closer to the current solution than to the pass-as-asn solution.

@emolter
Copy link
Collaborator Author

emolter commented Jan 14, 2025

@melanieclarke
Copy link
Collaborator

melanieclarke commented Jan 14, 2025

If that's what is desired by the various stakeholders I'll happily code it or modify what Mihai already sent earlier, but the original suggestion on the ticket of how this should be implemented is closer to the current solution than to the pass-as-asn solution.

I agree that implementing pass-as-asn goes a bit beyond what is currently requested. I suggest we go with the text file solution for now, and I will file a ticket to examine how we take user input for pipeline steps that does not fit neatly into simple parameters.

Update: ticket is here: https://jira.stsci.edu/browse/JP-3850

Copy link
Collaborator

@melanieclarke melanieclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready to go now. Thanks for all the useful discussions, everyone, and for all the updates, Ned.

@melanieclarke melanieclarke merged commit 242a9dd into spacetelescope:main Jan 15, 2025
31 checks passed
@emolter emolter deleted the JP-1649 branch January 15, 2025 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

skymatch would be improved by offering an option to input user-defined sky values

6 participants