Support rank function in TopK optimization of window function by Sh-Zh-7 · Pull Request #17210 · apache/iotdb

Sh-Zh-7 · 2026-02-21T06:32:26Z

Previous TopK optimization only support row_number window function, this PR add rank window function as well.

This PR also stop sort elimination when using TopKRankingOperator and RowNumberOperator since they may accept out of order data.

Copilot

Pull request overview

This pull request extends the TopK optimization for window functions to support the rank() function in addition to the previously supported row_number() function. The key difference between these functions is that rank() assigns the same rank value to rows with equal sort key values (peers), while row_number() assigns unique sequential numbers to all rows.

Changes:

Implemented rank() window function support in TopK optimization with proper handling of tied values (peer groups)
Prevented sort elimination and sort-to-stream-sort transformations for TopKRanking and RowNumber operators since they may accept out-of-order data
Added comprehensive test coverage for rank() functionality including edge cases with ties and multiple partitions

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
WindowFunctionOptimizationTest.java	Updated test queries from row_number() to rank() to verify TopK optimization works with rank function
TopKRankingOperatorTest.java	New comprehensive unit test file with 12 test cases covering rank() behavior with ties, partitions, and edge cases
IoTDBWindowFunction3IT.java	Added integration tests for rank() including filter pushdown, basic rank computation, and tie handling
TopKRankingOperator.java	Enabled RANK ranking type by instantiating GroupedTopNRankBuilder with appropriate comparator and equals/hash strategy
GroupedTopNRankBuilder.java	New builder class that manages rank computation using heap-based accumulator and peer group detection
GroupedTopNRankAccumulator.java	Core accumulator implementing rank logic with max-heap for efficient top-N selection and peer group management
TsBlockWithPositionEqualsAndHash.java	New interface for equality and hash computation needed for peer group detection in rank()
SimpleTsBlockWithPositionEqualsAndHash.java	Implementation of equals/hash interface supporting all TSDataType values with proper null handling
TopNPeerGroupLookup.java	Specialized hash table for efficient peer group lookup using composite (groupId, rowId) keys
RowIdComparisonHashStrategy.java	Combined interface merging comparison and hashing strategies with default equals implementation
TransformSortToStreamSort.java	Added visitors to prevent sort-to-stream-sort transformation for TopKRanking and RowNumber nodes
SortElimination.java	Added visitors to prevent sort elimination for TopKRanking and RowNumber nodes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-25T02:25:05Z

...main/java/org/apache/iotdb/db/queryengine/execution/operator/GroupedTopNRankAccumulator.java

+      TsBlock newPage,
+      int groupCount,
+      int[] groupIds,
+      TsBlockWithPositionComparator comparator,
+      RowReferenceTsBlockManager pageManager) {
+    int currentGroups = groupIdToHeapBuffer.getTotalGroups();
+    groupIdToHeapBuffer.allocateGroupIfNeeded(groupCount);
+
+    for (int position = 0; position < newPage.getPositionCount(); position++) {
+      int groupId = groupIds[position];
+      if (groupId >= currentGroups || groupIdToHeapBuffer.getHeapValueCount(groupId) < topN) {
+        return position;
+      }
+      long heapRootNodeIndex = groupIdToHeapBuffer.getHeapRootNodeIndex(groupId);
+      if (heapRootNodeIndex == UNKNOWN_INDEX) {
+        return position;
+      }
+      long rightPageRowId = peekRootRowIdByHeapNodeIndex(heapRootNodeIndex);
+      TsBlock rightPage = pageManager.getTsBlock(rightPageRowId);
+      int rightPosition = pageManager.getPosition(rightPageRowId);


Inconsistent naming convention: parameters and variables use "page" terminology (newPage, pageManager, rightPage) while the codebase typically uses "TsBlock" terminology. For consistency with the rest of the IoTDB codebase, consider renaming: newPage → newTsBlock, pageManager → tsBlockManager, rightPage → rightTsBlock. The GroupedTopNRankBuilder already uses the consistent naming (tsBlockManager, tsBlock), so this file should follow the same pattern.

Copilot · 2026-02-25T02:25:05Z

...a/org/apache/iotdb/db/queryengine/plan/relational/planner/optimizations/SortElimination.java

+    @Override
+    public PlanNode visitTopKRanking(TopKRankingNode node, Context context) {
+      PlanNode newNode = node.clone();
+      for (PlanNode child : node.getChildren()) {
+        newNode.addChild(child.accept(this, context));
+      }
+      context.setCannotEliminateSort(true);
+      return newNode;
+    }
+
+    @Override
+    public PlanNode visitRowNumber(RowNumberNode node, Context context) {
+      PlanNode newNode = node.clone();
+      for (PlanNode child : node.getChildren()) {
+        newNode.addChild(child.accept(this, context));
+      }
+      context.setCannotEliminateSort(true);
+      return newNode;
+    }


The implementation correctly prevents sort elimination for TopKRanking and RowNumber nodes, but there's a comment at line 169-173 (not shown in diff but nearby) that lists only 3 situations where sort cannot be eliminated. With these new visitor methods, there are now 5 situations. Consider updating that comment to include TopKRanking and RowNumber nodes, or generalize it to avoid becoming stale when new cases are added.

Copilot · 2026-02-25T02:25:05Z

integration-test/src/test/java/org/apache/iotdb/relational/it/db/it/IoTDBWindowFunction3IT.java

+  public void testPushDownLimitIntoWindowWithRank() {
+    // TopKRanking(RANK, topN=2) keeps rank≤2 per partition, then LIMIT 2 on final result
+    // d1 rank≤2: 1.0(r=1), 3.0(r=2), 3.0(r=2) → 3 rows sorted by time: 09:05,09:09,09:10
+    // d2 rank≤2: 2.0(r=1), 4.0(r=2) → 2 rows
+    // ORDER BY device, time LIMIT 2 → first 2 from d1
+    String[] expectedHeader = new String[] {"time", "device", "value", "rk"};
+    String[] retArray =
+        new String[] {
+          "2021-01-01T09:05:00.000Z,d1,3.0,2,", "2021-01-01T09:07:00.000Z,d1,5.0,4,",
+        };
+    tableResultSetEqualTest(
+        "SELECT * FROM (SELECT *, rank() OVER (PARTITION BY device ORDER BY value) as rk FROM demo) ORDER BY device, time LIMIT 2",
+        expectedHeader,
+        retArray,
+        DATABASE_NAME);
+  }


The test comment and expected result are inconsistent with the SQL query. The comment states "TopKRanking(RANK, topN=2) keeps rank≤2 per partition", but the SQL query does not have a WHERE clause to filter by rank. Without such a filter, all rows should be returned with their computed rank values, not just those with rank≤2. The expected result showing value=5 with rank=4 is correct for the query without a WHERE clause, but contradicts the comment. Either add a WHERE clause like "WHERE rk <= 2" to match the comment, or update the comment to correctly describe that this test computes ranks for all rows and then applies LIMIT 2 to the final result set after ordering by device and time.

Limit still uses TopKRankingNode.

Copilot · 2026-02-25T02:25:06Z

...g/apache/iotdb/db/queryengine/execution/operator/process/window/TopKRankingOperatorTest.java

+  @Test
+  public void testRankWithPartitionDescendingAndTies() {
+    // d1: values [5, 3, 3, 1] DESC → 5(r=1),3(r=2),3(r=2),1(r=4) → keep rank≤2
+    // d2: values [6, 2, 4] DESC → 6(r=1),4(r=2),2(r=3) → keep rank≤2


The comment incorrectly describes the data for d2. The comment states "d2: values [6, 2, 4]" but looking at the actual test data arrays, d2 only has two values: 6 (at index 4) and 4 (at index 5). There is no value 2 for d2 in this test case. The comment should be corrected to "d2: values [6, 4] DESC → 6(r=1),4(r=2) → keep rank≤2".

Suggested change

// d2: values [6, 2, 4] DESC → 6(r=1),4(r=2),2(r=3) → keep rank≤2

// d2: values [6, 4] DESC → 6(r=1),4(r=2) → keep rank≤2

WHAT R U TALKING ABOUT.

Copilot · 2026-02-25T02:25:06Z

...de/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/TopNPeerGroupLookup.java

+  }
+
+  public static long nextPowerOfTwo(long x) {
+    return 1L << 64 - Long.numberOfLeadingZeros(x - 1L);


The operator precedence in this expression is potentially problematic. The subtraction operator has higher precedence than the left shift operator, so this is evaluated as 1L << (64 - Long.numberOfLeadingZeros(x - 1L)). While this is likely the intended behavior, it would be clearer and safer to add explicit parentheses: 1L << (64 - Long.numberOfLeadingZeros(x - 1L)). Additionally, when x is 1, x-1 is 0, and Long.numberOfLeadingZeros(0) returns 64, resulting in 1L << (64 - 64) = 1L << 0 = 1, which is correct. However, shifting by 64 or more bits can lead to undefined behavior in Java, so this should be handled carefully, though in practice the shift amount is masked to the lower 6 bits for long values.

Suggested change

return 1L << 64 - Long.numberOfLeadingZeros(x - 1L);

return 1L << (64 - Long.numberOfLeadingZeros(x - 1L));

sonarqubecloud · 2026-02-26T02:51:10Z

Quality Gate failed

Failed conditions
6.6% Duplication on New Code (required ≤ 5%)

See analysis details on SonarQube Cloud

codecov · 2026-02-26T03:47:36Z

Codecov Report

❌ Patch coverage is 76.98413% with 145 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.75%. Comparing base (8782662) to head (5680b37).
⚠️ Report is 2 commits behind head on master.

Files with missing lines	Patch %	Lines
...engine/execution/operator/TopNPeerGroupLookup.java	57.89%	64 Missing ⚠️
...execution/operator/GroupedTopNRankAccumulator.java	88.17%	37 Missing ⚠️
...erator/SimpleTsBlockWithPositionEqualsAndHash.java	60.00%	18 Missing ⚠️
...ational/planner/optimizations/SortElimination.java	0.00%	12 Missing ⚠️
...ine/execution/operator/GroupedTopNRankBuilder.java	90.62%	9 Missing ⚠️
...anner/optimizations/TransformSortToStreamSort.java	0.00%	4 Missing ⚠️
...xecution/operator/RowIdComparisonHashStrategy.java	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #17210      +/-   ##
============================================
+ Coverage     39.70%   39.75%   +0.05%     
  Complexity      282      282              
============================================
  Files          5101     5107       +6     
  Lines        341925   342675     +750     
  Branches      43527    43674     +147     
============================================
+ Hits         135773   136247     +474     
- Misses       206152   206428     +276

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sh-Zh-7 added 2 commits February 21, 2026 14:25

Support rank function in TopK optimization.

2b32a38

Add license header to all newly-added files.

ed79765

JackieTien97 requested a review from Copilot February 25, 2026 02:19

Copilot started reviewing on behalf of JackieTien97 February 25, 2026 02:20 View session

Copilot AI reviewed Feb 25, 2026

View reviewed changes

Sh-Zh-7 added 2 commits February 26, 2026 10:28

Merge master and resolve conflict.

c817cb0

Modified comments and symbol name for copilot.

5680b37

	// d2: values [6, 2, 4] DESC → 6(r=1),4(r=2),2(r=3) → keep rank≤2
	// d2: values [6, 4] DESC → 6(r=1),4(r=2) → keep rank≤2

	return 1L << 64 - Long.numberOfLeadingZeros(x - 1L);
	return 1L << (64 - Long.numberOfLeadingZeros(x - 1L));

Conversation

Sh-Zh-7 commented Feb 21, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Sh-Zh-7 Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Sh-Zh-7 Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Sh-Zh-7 Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Sh-Zh-7 Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Sh-Zh-7 Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud bot commented Feb 26, 2026

Quality Gate failed

Uh oh!

codecov bot commented Feb 26, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants