Skip to content

Support rank function in TopK optimization of window function#17210

Open
Sh-Zh-7 wants to merge 4 commits intomasterfrom
feat/szh/support_dense_in_window_optim
Open

Support rank function in TopK optimization of window function#17210
Sh-Zh-7 wants to merge 4 commits intomasterfrom
feat/szh/support_dense_in_window_optim

Conversation

@Sh-Zh-7
Copy link
Contributor

@Sh-Zh-7 Sh-Zh-7 commented Feb 21, 2026

Previous TopK optimization only support row_number window function, this PR add rank window function as well.

This PR also stop sort elimination when using TopKRankingOperator and RowNumberOperator since they may accept out of order data.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request extends the TopK optimization for window functions to support the rank() function in addition to the previously supported row_number() function. The key difference between these functions is that rank() assigns the same rank value to rows with equal sort key values (peers), while row_number() assigns unique sequential numbers to all rows.

Changes:

  • Implemented rank() window function support in TopK optimization with proper handling of tied values (peer groups)
  • Prevented sort elimination and sort-to-stream-sort transformations for TopKRanking and RowNumber operators since they may accept out-of-order data
  • Added comprehensive test coverage for rank() functionality including edge cases with ties and multiple partitions

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
WindowFunctionOptimizationTest.java Updated test queries from row_number() to rank() to verify TopK optimization works with rank function
TopKRankingOperatorTest.java New comprehensive unit test file with 12 test cases covering rank() behavior with ties, partitions, and edge cases
IoTDBWindowFunction3IT.java Added integration tests for rank() including filter pushdown, basic rank computation, and tie handling
TopKRankingOperator.java Enabled RANK ranking type by instantiating GroupedTopNRankBuilder with appropriate comparator and equals/hash strategy
GroupedTopNRankBuilder.java New builder class that manages rank computation using heap-based accumulator and peer group detection
GroupedTopNRankAccumulator.java Core accumulator implementing rank logic with max-heap for efficient top-N selection and peer group management
TsBlockWithPositionEqualsAndHash.java New interface for equality and hash computation needed for peer group detection in rank()
SimpleTsBlockWithPositionEqualsAndHash.java Implementation of equals/hash interface supporting all TSDataType values with proper null handling
TopNPeerGroupLookup.java Specialized hash table for efficient peer group lookup using composite (groupId, rowId) keys
RowIdComparisonHashStrategy.java Combined interface merging comparison and hashing strategies with default equals implementation
TransformSortToStreamSort.java Added visitors to prevent sort-to-stream-sort transformation for TopKRanking and RowNumber nodes
SortElimination.java Added visitors to prevent sort elimination for TopKRanking and RowNumber nodes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 74 to 93
TsBlock newPage,
int groupCount,
int[] groupIds,
TsBlockWithPositionComparator comparator,
RowReferenceTsBlockManager pageManager) {
int currentGroups = groupIdToHeapBuffer.getTotalGroups();
groupIdToHeapBuffer.allocateGroupIfNeeded(groupCount);

for (int position = 0; position < newPage.getPositionCount(); position++) {
int groupId = groupIds[position];
if (groupId >= currentGroups || groupIdToHeapBuffer.getHeapValueCount(groupId) < topN) {
return position;
}
long heapRootNodeIndex = groupIdToHeapBuffer.getHeapRootNodeIndex(groupId);
if (heapRootNodeIndex == UNKNOWN_INDEX) {
return position;
}
long rightPageRowId = peekRootRowIdByHeapNodeIndex(heapRootNodeIndex);
TsBlock rightPage = pageManager.getTsBlock(rightPageRowId);
int rightPosition = pageManager.getPosition(rightPageRowId);
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent naming convention: parameters and variables use "page" terminology (newPage, pageManager, rightPage) while the codebase typically uses "TsBlock" terminology. For consistency with the rest of the IoTDB codebase, consider renaming: newPage → newTsBlock, pageManager → tsBlockManager, rightPage → rightTsBlock. The GroupedTopNRankBuilder already uses the consistent naming (tsBlockManager, tsBlock), so this file should follow the same pattern.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +145 to +163
@Override
public PlanNode visitTopKRanking(TopKRankingNode node, Context context) {
PlanNode newNode = node.clone();
for (PlanNode child : node.getChildren()) {
newNode.addChild(child.accept(this, context));
}
context.setCannotEliminateSort(true);
return newNode;
}

@Override
public PlanNode visitRowNumber(RowNumberNode node, Context context) {
PlanNode newNode = node.clone();
for (PlanNode child : node.getChildren()) {
newNode.addChild(child.accept(this, context));
}
context.setCannotEliminateSort(true);
return newNode;
}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation correctly prevents sort elimination for TopKRanking and RowNumber nodes, but there's a comment at line 169-173 (not shown in diff but nearby) that lists only 3 situations where sort cannot be eliminated. With these new visitor methods, there are now 5 situations. Consider updating that comment to include TopKRanking and RowNumber nodes, or generalize it to avoid becoming stale when new cases are added.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +193 to +208
public void testPushDownLimitIntoWindowWithRank() {
// TopKRanking(RANK, topN=2) keeps rank≤2 per partition, then LIMIT 2 on final result
// d1 rank≤2: 1.0(r=1), 3.0(r=2), 3.0(r=2) → 3 rows sorted by time: 09:05,09:09,09:10
// d2 rank≤2: 2.0(r=1), 4.0(r=2) → 2 rows
// ORDER BY device, time LIMIT 2 → first 2 from d1
String[] expectedHeader = new String[] {"time", "device", "value", "rk"};
String[] retArray =
new String[] {
"2021-01-01T09:05:00.000Z,d1,3.0,2,", "2021-01-01T09:07:00.000Z,d1,5.0,4,",
};
tableResultSetEqualTest(
"SELECT * FROM (SELECT *, rank() OVER (PARTITION BY device ORDER BY value) as rk FROM demo) ORDER BY device, time LIMIT 2",
expectedHeader,
retArray,
DATABASE_NAME);
}
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test comment and expected result are inconsistent with the SQL query. The comment states "TopKRanking(RANK, topN=2) keeps rank≤2 per partition", but the SQL query does not have a WHERE clause to filter by rank. Without such a filter, all rows should be returned with their computed rank values, not just those with rank≤2. The expected result showing value=5 with rank=4 is correct for the query without a WHERE clause, but contradicts the comment. Either add a WHERE clause like "WHERE rk <= 2" to match the comment, or update the comment to correctly describe that this test computes ranks for all rows and then applies LIMIT 2 to the final result set after ordering by device and time.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Limit still uses TopKRankingNode.

@Test
public void testRankWithPartitionDescendingAndTies() {
// d1: values [5, 3, 3, 1] DESC → 5(r=1),3(r=2),3(r=2),1(r=4) → keep rank≤2
// d2: values [6, 2, 4] DESC → 6(r=1),4(r=2),2(r=3) → keep rank≤2
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment incorrectly describes the data for d2. The comment states "d2: values [6, 2, 4]" but looking at the actual test data arrays, d2 only has two values: 6 (at index 4) and 4 (at index 5). There is no value 2 for d2 in this test case. The comment should be corrected to "d2: values [6, 4] DESC → 6(r=1),4(r=2) → keep rank≤2".

Suggested change
// d2: values [6, 2, 4] DESC → 6(r=1),4(r=2),2(r=3) → keep rank≤2
// d2: values [6, 4] DESC → 6(r=1),4(r=2) → keep rank≤2

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHAT R U TALKING ABOUT.

}

public static long nextPowerOfTwo(long x) {
return 1L << 64 - Long.numberOfLeadingZeros(x - 1L);
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operator precedence in this expression is potentially problematic. The subtraction operator has higher precedence than the left shift operator, so this is evaluated as 1L << (64 - Long.numberOfLeadingZeros(x - 1L)). While this is likely the intended behavior, it would be clearer and safer to add explicit parentheses: 1L << (64 - Long.numberOfLeadingZeros(x - 1L)). Additionally, when x is 1, x-1 is 0, and Long.numberOfLeadingZeros(0) returns 64, resulting in 1L << (64 - 64) = 1L << 0 = 1, which is correct. However, shifting by 64 or more bits can lead to undefined behavior in Java, so this should be handled carefully, though in practice the shift amount is masked to the lower 6 bits for long values.

Suggested change
return 1L << 64 - Long.numberOfLeadingZeros(x - 1L);
return 1L << (64 - Long.numberOfLeadingZeros(x - 1L));

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
6.6% Duplication on New Code (required ≤ 5%)

See analysis details on SonarQube Cloud

@codecov
Copy link

codecov bot commented Feb 26, 2026

Codecov Report

❌ Patch coverage is 76.98413% with 145 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.75%. Comparing base (8782662) to head (5680b37).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
...engine/execution/operator/TopNPeerGroupLookup.java 57.89% 64 Missing ⚠️
...execution/operator/GroupedTopNRankAccumulator.java 88.17% 37 Missing ⚠️
...erator/SimpleTsBlockWithPositionEqualsAndHash.java 60.00% 18 Missing ⚠️
...ational/planner/optimizations/SortElimination.java 0.00% 12 Missing ⚠️
...ine/execution/operator/GroupedTopNRankBuilder.java 90.62% 9 Missing ⚠️
...anner/optimizations/TransformSortToStreamSort.java 0.00% 4 Missing ⚠️
...xecution/operator/RowIdComparisonHashStrategy.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17210      +/-   ##
============================================
+ Coverage     39.70%   39.75%   +0.05%     
  Complexity      282      282              
============================================
  Files          5101     5107       +6     
  Lines        341925   342675     +750     
  Branches      43527    43674     +147     
============================================
+ Hits         135773   136247     +474     
- Misses       206152   206428     +276     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants