
[SPARK-55636][CONNECT] Add detailed errors in case of deduplication of invalid columns #54422

Open
pranavdev022 wants to merge 2 commits into apache:master from pranavdev022:dedup-errors-connect

pranavdev022 (Contributor) commented Feb 23, 2026

What changes were proposed in this pull request?

This PR updates the error handling for invalid deduplicate column names in Spark Connect to use the standard UNRESOLVED_COLUMN_AMONG_FIELD_NAMES error class instead of throwing the generic INTERNAL_ERROR.

| Example | Classic | Connect (Before) | Connect (After) |
| --- | --- | --- | --- |
| (screenshot) | Cannot resolve column name "artist_id" among (id, song_name, artist_name). | [INTERNAL_ERROR] Invalid deduplicate column artist_id SQLSTATE: XX000 | [UNRESOLVED_COLUMN_AMONG_FIELD_NAMES] Cannot resolve column name "artist_id" among (id, song_name, artist_name). SQLSTATE: 42703 |
| (screenshot) | Cannot resolve column name "cont.f1" among (id, cont). | [INTERNAL_ERROR] Invalid deduplicate column cont.f1 SQLSTATE: XX000 | [UNRESOLVED_COLUMN_AMONG_FIELD_NAMES] Cannot resolve column name "cont.f1" among (id, cont). SQLSTATE: 42703 |
| (screenshot) | works | works | works |
| (screenshot) | Cannot resolve column name "song.names" among (id, song.name, artist_name). | [INTERNAL_ERROR] Invalid deduplicate column song.names SQLSTATE: XX000 | [UNRESOLVED_COLUMN_AMONG_FIELD_NAMES] Cannot resolve column name "song.names" among (id, song.name, artist_name). SQLSTATE: 42703 |
| (screenshot) | works | works | works |
| (screenshot) | Cannot resolve column name "cont.value" among (id, cont.val). | [INTERNAL_ERROR] Invalid deduplicate column cont.value SQLSTATE: XX000 | [UNRESOLVED_COLUMN_AMONG_FIELD_NAMES] Cannot resolve column name "cont.value" among (id, cont.val). SQLSTATE: 42703 |
| (screenshot) | (screenshot) | same | same |

Why are the changes needed?

The previous error message in Spark Connect was not consistent with classic Spark and lacked helpful context.
This change aligns Spark Connect error messages with classic Spark, providing users with:

  1. The correct error class (UNRESOLVED_COLUMN_AMONG_FIELD_NAMES instead of INTERNAL_ERROR).
  2. The correct SQLSTATE (42703 instead of XX000).
  3. A list of available column names to help users fix the issue.
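To make the three points concrete, here is a minimal, Spark-free sketch of the check in plain Scala. It assumes case-insensitive matching against top-level column names only (Spark's default resolution when `spark.sql.caseSensitive` is false). `DedupColumnError` and `DedupColumns.resolve` are hypothetical names for illustration, not Spark APIs; only the message text is taken from the table above.

```scala
// Hypothetical sketch: DedupColumnError and DedupColumns are illustrative
// names, not Spark APIs. The message format follows the table in this PR.
final class DedupColumnError(val columnName: String, val fieldNames: Seq[String])
    extends Exception {
  val errorClass: String = "UNRESOLVED_COLUMN_AMONG_FIELD_NAMES"
  val sqlState: String = "42703"

  // Renders "[CLASS] message SQLSTATE: code", listing every available column.
  override def getMessage: String =
    s"""[$errorClass] Cannot resolve column name "$columnName" among """ +
      s"""(${fieldNames.mkString(", ")}). SQLSTATE: $sqlState"""
}

object DedupColumns {
  // Assumption: case-insensitive matching on top-level column names, like
  // Spark's default resolver when spark.sql.caseSensitive is false.
  private def sameName(a: String, b: String): Boolean = a.equalsIgnoreCase(b)

  /** Returns the columns matched by `requested`; throws on the first unknown name. */
  def resolve(requested: Seq[String], available: Seq[String]): Seq[String] =
    requested.distinct.flatMap { name =>
      val matches = available.filter(col => sameName(col, name))
      // Report the offending name together with every available field name,
      // instead of a bare "invalid column" internal error.
      if (matches.isEmpty) throw new DedupColumnError(name, available)
      matches
    }
}
```

Because only top-level names are compared, `DedupColumns.resolve(Seq("cont.f1"), Seq("id", "cont"))` fails in this sketch, which is consistent with the classic behavior shown in the table's second row.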

Does this PR introduce any user-facing change?

Yes. Error messages for invalid deduplicate column names in Spark Connect are now more detailed and consistent with classic Spark.

How was this patch tested?

Tested using a custom image built with the proposed changes.

Was this patch authored or co-authored using generative AI tooling?

No

pranavdev022 changed the title from "[to-do] Add detailed errors in case of deduplication of invalid columns" to "[SPARK-55636][CONNECT] Add detailed errors in case of deduplication of invalid columns" on Feb 23, 2026.
khakhlyuk (Contributor) left a comment

LGTM, thanks for the fix!

```scala
else Deduplicate(allColumns, queryExecution.analyzed)
} else {
  val toGroupColumnNames = rel.getColumnNamesList.asScala.toSeq
  val fieldNames = allColumns.map(_.name).mkString(", ")
```
Contributor (review comment):

Nitty nit: this line can be moved between lines 1448 and 1449, since it's not used in the outer scope.

pranavdev022 (Contributor, Author) replied:

Yes, moved it; that's better.
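The reviewer's point is about variable scope: the joined field-name string is only needed when a column fails to resolve. Below is a minimal sketch of that shape in plain Scala, assuming case-insensitive matching; `DedupScope` and `resolveColumn` are hypothetical names, and this is not the code changed in the PR.

```scala
// Hypothetical sketch of the review suggestion; DedupScope and resolveColumn
// are illustrative names, not the code changed in this PR.
object DedupScope {
  /** Right(matches) if `name` resolves, otherwise Left(error message). */
  def resolveColumn(name: String, allColumns: Seq[String]): Either[String, Seq[String]] = {
    val matches = allColumns.filter(_.equalsIgnoreCase(name))
    if (matches.nonEmpty) {
      Right(matches)
    } else {
      // The joined field list is only needed for the error message, so it is
      // built inside this branch rather than in the enclosing scope.
      val fieldNames = allColumns.mkString(", ")
      Left(s"""Cannot resolve column name "$name" among ($fieldNames).""")
    }
  }
}
```

Returning `Either` only keeps the sketch self-contained. The PR itself reports the failure as an UNRESOLVED_COLUMN_AMONG_FIELD_NAMES error.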

holdenk (Contributor) commented Feb 23, 2026

Thanks for working on error messages/debugging/dev ex :)
