Unify casting logic #17114

@adriangb

Description

While working on #16589 (comment), we realized that there are now two paths of casting / adaptation logic:

  1. SchemaAdapter, which supports nested structs as of #16371 ("Add nested struct casting support and integrate into SchemaAdapter").
  2. The Cast expr (e.g. select 1::text in SQL, or implicit casts), which uses the arrow cast kernel and does not support nested structs and similar cases.

It would be good to unify these.

This very point was discussed in apache/arrow-rs#7176, and one idea that came up was for arrow to develop some sort of SchemaAdapter of its own.

One important performance consideration, and maybe something to have a broader discussion on, is that SchemaAdapter can pre-compute the work to be done and then avoid any sort of introspection in the hot path. This is not possible with a PhysicalExpr.
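To make the pre-computation point concrete, here is a minimal std-only sketch. Everything in it (`DataType`, `Column`, `Field`, `Step`, `plan`, `apply`) is a hypothetical toy stand-in and not the DataFusion or arrow-rs API. It assumes the source and target schemas are known before any batch arrives, so name lookups and type comparisons run once in `plan`, and the per-batch `apply` only walks the precomputed steps.

```rust
// Toy model only: hypothetical stand-ins, not DataFusion or arrow-rs types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Clone, Debug, PartialEq)]
enum Column {
    Int32(Vec<i32>),
    Int64(Vec<i64>),
}

struct Field {
    name: &'static str,
    data_type: DataType,
}

/// One precomputed instruction per target column.
#[derive(Debug, PartialEq)]
enum Step {
    /// Pass the source column at this index through unchanged.
    Copy(usize),
    /// Widen the Int32 source column at this index to Int64.
    WidenToInt64(usize),
}

/// Runs once per (source, target) schema pair. All name lookups and
/// type comparisons happen here, outside the hot path.
fn plan(source: &[Field], target: &[Field]) -> Result<Vec<Step>, String> {
    let mut steps = Vec::with_capacity(target.len());
    for t in target {
        let idx = source
            .iter()
            .position(|s| s.name == t.name)
            .ok_or_else(|| format!("missing column {}", t.name))?;
        let step = match (source[idx].data_type, t.data_type) {
            (a, b) if a == b => Step::Copy(idx),
            (DataType::Int32, DataType::Int64) => Step::WidenToInt64(idx),
            (a, b) => return Err(format!("unsupported cast {:?} -> {:?}", a, b)),
        };
        steps.push(step);
    }
    Ok(steps)
}

/// Hot path: runs once per batch and does no schema introspection.
fn apply(steps: &[Step], batch: &[Column]) -> Vec<Column> {
    steps
        .iter()
        .map(|step| match step {
            Step::Copy(i) => batch[*i].clone(),
            Step::WidenToInt64(i) => match &batch[*i] {
                Column::Int32(v) => Column::Int64(v.iter().map(|x| i64::from(*x)).collect()),
                // The plan was built for a different schema; a real
                // implementation would return an error instead of panicking.
                other => panic!("batch does not match plan: {:?}", other),
            },
        })
        .collect()
}
```

A caller would build the plan once with `plan(&source, &target)` and then call `apply(&steps, &batch)` for every incoming batch. An expression that only sees its input at evaluation time has to redo the matching that `plan` does on each call.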

Thus I would like to propose the following rough course of action:

  1. Unify the code paths. This can be something as naive as dynamically building a SchemaAdapter each time a Cast PhysicalExpr gets called, or it could be something like refactoring the code so that it is shared.
  2. Think about some sort of PhysicalExpr::optimize(inputs) that can, in this case, pre-compute the needed casts and build efficient data structures to apply them in a loop. I think this could also benefit many other expressions that need to do prep work for each execution.
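As a rough illustration of what step 2 could look like, here is a std-only sketch of an `optimize`-style hook. The `Expr` trait, its `optimize` signature, and the `Cast`, `Identity`, and `WidenInt32` types are all hypothetical and are not the existing PhysicalExpr API. The sketch assumes the input type is known at planning time and that an expression may return a specialized replacement for itself.

```rust
use std::sync::Arc;

// Toy model only: hypothetical stand-ins, not DataFusion or arrow-rs types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Clone, Debug, PartialEq)]
enum Column {
    Int32(Vec<i32>),
    Int64(Vec<i64>),
}

impl Column {
    fn data_type(&self) -> DataType {
        match self {
            Column::Int32(_) => DataType::Int32,
            Column::Int64(_) => DataType::Int64,
        }
    }
}

/// Hypothetical stand-in for a physical expression over one input column.
trait Expr {
    fn evaluate(&self, input: &Column) -> Result<Column, String>;

    /// Hypothetical hook: given the input type known at planning time,
    /// optionally return a specialized expression. `None` means "keep self".
    fn optimize(&self, _input_type: DataType) -> Option<Arc<dyn Expr>> {
        None
    }
}

/// General cast: inspects the input type on every call.
struct Cast {
    to: DataType,
}

impl Expr for Cast {
    fn evaluate(&self, input: &Column) -> Result<Column, String> {
        match (input, self.to) {
            (Column::Int32(v), DataType::Int64) => {
                Ok(Column::Int64(v.iter().map(|x| i64::from(*x)).collect()))
            }
            (c, to) if c.data_type() == to => Ok(c.clone()),
            (c, to) => Err(format!("unsupported cast {:?} -> {:?}", c.data_type(), to)),
        }
    }

    fn optimize(&self, input_type: DataType) -> Option<Arc<dyn Expr>> {
        // Decide once which concrete kernel is needed for this input type.
        let specialized: Arc<dyn Expr> = match (input_type, self.to) {
            (a, b) if a == b => Arc::new(Identity),
            (DataType::Int32, DataType::Int64) => Arc::new(WidenInt32),
            _ => return None,
        };
        Some(specialized)
    }
}

/// Specialized: the cast is a no-op.
struct Identity;

impl Expr for Identity {
    fn evaluate(&self, input: &Column) -> Result<Column, String> {
        Ok(input.clone())
    }
}

/// Specialized: only handles Int32 -> Int64.
struct WidenInt32;

impl Expr for WidenInt32 {
    fn evaluate(&self, input: &Column) -> Result<Column, String> {
        match input {
            Column::Int32(v) => Ok(Column::Int64(v.iter().map(|x| i64::from(*x)).collect())),
            other => Err(format!("expected Int32, got {:?}", other.data_type())),
        }
    }
}
```

A planner could call `Cast { to: DataType::Int64 }.optimize(DataType::Int32)` once and evaluate the returned expression per batch, falling back to the original `Cast` when `optimize` returns `None`. Returning a replacement expression is only one possible design. The hook could instead cache state inside the expression itself.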

Metadata

    Labels

    PROPOSAL EPIC: A proposal being discussed that is not yet fully underway
