Skip to content

Conversation

@adriangb
Copy link
Contributor

@adriangb adriangb commented Dec 6, 2025

This is a nice improvement because it:

  1. Makes the PhysicalExprAdapter trait that users might need to implement simpler (less boilerplate for users).
  2. Decouples these two transformations so that we can replace partition values and then apply a projection without having to also do the schema mapping (it would be from the logical schema to the logical schema, confusing and a waste of compute). I ran into this need in Push down projection expressions into ParquetOpener #19111. I think there may be other ways of doing it (e.g. piping in the expected output schema from ParquetSource) but it felt nicer this way and I expect other places may also need the decoupled transformation.
  3. I think we can use it in the future to implement Replace constant columns with literals #19089.
  4. It's less lines of code 😄

@github-actions github-actions bot added core Core DataFusion crate datasource Changes to the datasource crate labels Dec 6, 2025
@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Dec 6, 2025
@adriangb adriangb changed the title Split out partition column handling from PhysicalExprAdapter trait Clean up PhysicalExprAdapter and PhysicalExprSimplifier Dec 6, 2025
&mut self,
expr: Arc<dyn PhysicalExpr>,
) -> Result<Arc<dyn PhysicalExpr>> {
pub fn simplify(&self, expr: Arc<dyn PhysicalExpr>) -> Result<Arc<dyn PhysicalExpr>> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is backwards compatible (aside from compiler warnings that the variable doesn't need to be mutable) and means we can keep multiple references, arc it, etc.

@adriangb
Copy link
Contributor Author

adriangb commented Dec 6, 2025

This is blocked by one of #19129 or #19130, I think either should make it work. Currently some tests fail because we can't prune based on 1 = 2.

As a follow up to this we should be able to delete PartitionPruningStatistics and hence remove CompositePruningStatistics and simplify FilePruner. They'll need to go through the deprecation process, and we'll need to document that you should call replace_columns_with_literals instead.

@github-actions github-actions bot added the common Related to common crate label Dec 6, 2025
@adriangb
Copy link
Contributor Author

adriangb commented Dec 6, 2025

I've merged #19129 into this branch to get CI passing. I'll fix the conflicts once we merge that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate core Core DataFusion crate datasource Changes to the datasource crate physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant