-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[CALCITE-7254] Add rule for sharing trivially equivalent RelNodes within Combine #4658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
core/src/main/java/org/apache/calcite/rel/rules/CombineSimpleEquivalenceRule.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableCombine.java
Outdated
Show resolved
Hide resolved
b60609a to
25e3ae0
Compare
core/src/test/java/org/apache/calcite/test/enumerable/EnumerableCombineTest.java
Show resolved
Hide resolved
| // Combine both queries | ||
| return builder.combine(2).build(); | ||
| }) | ||
| .returnsUnordered( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that AssertQuery.returnsUnordered(String... lines) can return multiple lines. In JdbcTest, some examples use the format {column_name}={value}, such as returnsUnordered("DID2=1", "DID2=2"). I'm not sure if the current QUERY_0=xxx format is well-suited for future Quidem tests. Assuming that in the future I could use a setting like set enable_combine = true to allow multiple SQL statements to execute simultaneously, would we still want to obtain two separate result sets—just like when the two SQL statements are executed separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good questions, I've updated the PR so Combine now returns one row per input query. Each row holds an array of maps, where each map represents a row from that query with column names as keys.
We are constrained by JDBC's limitations of returning a single result set per query but this would allow iterating over individual query results with resultSet.next() and then iterating over each query's results via the array.
For example, combining:
SELECT name FROM depts;
and
SELECT empid, name, deptno FROM emps WHERE deptno = 10;
Yields:
-- Row 1
QUERY=[
{name=Sales},
{name=Marketing},
{name=HR}
],
-- Row 2
QUERY=[
{empid=100, name=Bill, deptno=10},
{empid=150, name=Sebastian, deptno=10} ,
{empid=110, name=Theodore, deptno=10}
]
Later, we should probably allow for aliases for queries so they don't have the generic QUERY column label but that may be more challenging (see the syntax for MULTI described in CALCITE-6188).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the very detailed explanation. I think the current format of the result presentation looks quite good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To later accommodate named queries I've changed the format once more. From the comment in EnumerableCombine:
The output format is a wide table where each column corresponds to a query (named QUERY_0, QUERY_1, etc.) and each row contains a struct with that query's column values for that row index. The number of output rows equals the maximum row count across all input queries. Queries with fewer rows have null values for the additional rows.
Example output for two queries:
QUERY_0 | QUERY_1
------------------------ | ------------------------
{empno=100, name=Bill} | {deptno=10, name=Sales}
{empno=110, name=Eric} | {deptno=20, name=HR}
{empno=120, name=Ted} | null
core/src/main/java/org/apache/calcite/rel/rules/CombineSimpleEquivalenceRule.java
Show resolved
Hide resolved
04a59dd to
95e2419
Compare
core/src/test/java/org/apache/calcite/test/enumerable/EnumerableCombineTest.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/rel/rules/CombineSimpleEquivalenceRule.java
Outdated
Show resolved
Hide resolved
xiedeyantu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good overall, just left a few minor comments.
| * @param queryLists array of lists, one per query | ||
| * @return list of Object arrays representing combined rows | ||
| */ | ||
| public static List<Object[]> combineQueryResults(List[] queryLists) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a small test for this method?
| // Verify both columns have values (no cross-contamination) | ||
| Object query0 = resultSet.getObject("QUERY_0"); | ||
| Object query1 = resultSet.getObject("QUERY_1"); | ||
| if (query0 == null || query1 == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to use AssertThat for positive validation? I don't understand why the throw new AssertionError approach is used here.
|
There are still some minor issues in the CI that need to be fixed. |
| for (int rowIdx = 0; rowIdx < maxRows; rowIdx++) { | ||
| Object[] row = new Object[queryLists.length]; | ||
| for (int queryIdx = 0; queryIdx < queryLists.length; queryIdx++) { | ||
| List queryList = queryLists[queryIdx]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @Nullable annotation is required here; otherwise CI will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you - that was it.
xiedeyantu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think we can squash all the commits, and I'll leave the rest of the merging to you.
|
Thank you so much for the reviews @xiedeyantu, much appreciated! |
|



CALCITE-7254
This is a first pass, but I’m not yet convinced it’s the right approach. We may need to gather potential shared sub-expressions in a dedicated context and defer the rewrite until later. At the moment, the suggester identifies candidate spools and performs the replacement directly within the rule, which may unintentionally prevent the optimizer from exploring cheaper plans.
Likewise, certain rules may prevent shareable components from being recognized. For example, filter pushdown can eliminate a join tree that would otherwise be reusable.