GH-35806: [R] Improve error message for null type inference with sparse CSV data #49338

Open
thisisnic wants to merge 4 commits into apache:main from thisisnic:GH-35806-null-type-error-message
thisisnic (Member) commented Feb 19, 2026

Rationale for this change

When reading a CSV with sparse data (many missing values followed by actual values), Arrow can infer a column's type as `null` based on the first block of data. When non-null values appear later, conversion fails, and the error message suggests using `skip = 1` to skip a header row, which is misleading in this case.

What changes are included in this PR?

Adds a specific check for "conversion error to null" that provides a helpful message explaining the cause (type inference from sparse data) and the solution (change the block size to use for inference).
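
The check described above could look roughly like the following minimal sketch in base R. The function name `explain_null_conversion_error`, the sample error text, and the wording of the hint are assumptions made for illustration; they are not taken from this PR's diff.

```r
# Hypothetical helper, not the PR's actual code: append an informational
# bullet when the error text mentions a conversion error to null.
explain_null_conversion_error <- function(msg) {
  if (grepl("conversion error to null", msg, fixed = TRUE)) {
    msg <- c(
      msg,
      # The name "i" marks an informational bullet in rlang-style messages.
      i = paste(
        "The column type may have been inferred as null because the first",
        "block of data contained only missing values.",
        "Consider specifying column types explicitly."
      )
    )
  }
  msg
}

# Illustrative error text; real Arrow messages may be worded differently.
err <- "Invalid: In CSV column #1: CSV conversion error to null: invalid value '5'"
hinted <- explain_null_conversion_error(err)

# Messages that do not match are returned unchanged.
unrelated <- explain_null_conversion_error("Invalid: some other failure")
```

Matching on the fixed substring keeps the hint from attaching to other conversion errors, such as the header-row case that the existing `skip = 1` advice is meant for.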

Are these changes tested?

Yes, added a test in test-dataset-csv.R.

Are there any user-facing changes?

Yes, improved error message when CSV type inference fails due to sparse data.


This PR was authored by Claude (Opus 4.5) and reviewed by @thisisnic.

🤖 Generated with Claude Code

…h sparse CSV data

When a CSV column contains only missing values in the first block of data,
Arrow infers the type as null. If a non-null value appears later, the
conversion fails with an unhelpful error suggesting `skip = 1`.

This change adds a specific check for "conversion error to null" and
provides a more helpful message explaining the cause (type inference
from sparse data) and the solution (specify column types explicitly).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@thisisnic thisisnic requested a review from jonkeane as a code owner February 19, 2026 09:31
@github-actions

⚠️ GitHub issue #35806 has been automatically assigned in GitHub to PR creator.

@thisisnic thisisnic marked this pull request as draft February 19, 2026 13:50
@thisisnic (Member, Author)

I'm not totally happy with the error message; I'll rewrite it before marking this ready for review.

@github-actions github-actions bot added the "awaiting changes" and "awaiting change review" labels and removed the "awaiting committer review" and "awaiting changes" labels Feb 23, 2026
@thisisnic thisisnic marked this pull request as ready for review February 23, 2026 14:20
@jonkeane jonkeane (Member) left a comment

I like the improved message, though I'm not totally sure I follow why we can assert the reason for why null was inferred? And also wouldn't this similarly error if someone specified null manually and then there was data(??)

msg <- c(
msg,
i = paste(
"Column type was inferred as null because the first block of data",

Suggested change
- "Column type was inferred as null because the first block of data",
+ "Column type was inferred as `null` because the first block of data",

or maybe even NULL?

@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label Feb 23, 2026