GH-49340: [R] Preserve row order in write_dataset() #49343
Open
marberts wants to merge 5 commits into apache:main
jonkeane (Member) requested changes on Feb 22, 2026
Thanks for the contribution!
Would you mind please writing some tests for this behavior? Somewhere in https://github.com/apache/arrow/blob/main/r/tests/testthat/test-dataset-write.R (+ following similar patterns there) would be lovely.
jonkeane (Member) reviewed on Feb 23, 2026
Thanks for the tests, I have some suggestions about naming + slightly more idiomatic expectations.
It also looks like there are some cpp linting issues too: https://github.com/apache/arrow/actions/runs/22290480080/job/64535896409?pr=49343#step:6:42
Rationale for this change
write_dataset(df) need not preserve the row-ordering of df across partitions. The arrow C++ library was recently updated (since 21.0.0) so that row ordering can be preserved when writing across partitions. This is useful for cases where it is assumed that row-ordering is unchanged within each partition.
Created on 2026-02-20 with reprex v2.1.1
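As a rough illustration of the concern, the sketch below writes a partitioned dataset and checks whether the rows inside one partition file are still in their original order. The data frame, the column names id and g, and the temporary path are made up for this example. The commented-out call assumes the preserve_order argument proposed in this PR, so it will only work on a build that includes this change.

```r
library(arrow)

# Hypothetical data: `id` records the original row position,
# `g` is the partitioning key.
n <- 1e6
df <- data.frame(id = seq_len(n), g = rep(c("a", "b"), length.out = n))

path <- tempfile()
write_dataset(df, path, partitioning = "g")

# With the argument proposed in this PR (assumption: only available
# once this change is part of the installed arrow package):
# write_dataset(df, path, partitioning = "g", preserve_order = TRUE)

# Read the file for partition g = "a" directly, so that the order seen
# here is the order in which the rows were written to that file.
files_a <- list.files(file.path(path, "g=a"), full.names = TRUE)
part_a <- read_parquet(files_a[1])

# TRUE would indicate that rows were reordered within the partition.
is.unsorted(part_a$id)
```

Whether reordering actually shows up depends on the data size and on threading, so a single run that returns FALSE does not show that order is guaranteed.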
What changes are included in this PR?
Added an argument preserve_order to write_dataset() that sets FileSystemDatasetWriteOptions.preserve_order to true in the call to ExecPlan_Write().
Are these changes tested?
Partially. The change is small, so I haven't written unit tests. I can revisit this if necessary.
Are there any user-facing changes?
Yes, there is a new argument in write_dataset(). The default keeps the current behavior and the argument appears after all existing arguments, so the change is backwards compatible.
Related issue: #49340