fix: use UTF-8 encoding for CLI file output on Windows #1789

Open
Br1an67 wants to merge 1 commit into unclecode:main from Br1an67:fix/issue-1762-cli-encoding

Conversation

Br1an67 commented Mar 1, 2026

Summary

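On Windows, open() called without an explicit encoding uses the locale's preferred encoding, typically a legacy code page such as cp1252, which Python reports as the 'charmap' codec. These code pages cannot encode every Unicode character. When crawled content contains a character that the active code page cannot represent, such as × (U+00D7), writing to file fails with 'charmap' codec can't encode character.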

The fix explicitly sets encoding='utf-8' on all file write operations in the CLI.
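
The failure and the fix can be reproduced on any platform with a minimal sketch. It assumes a hypothetical helper `write_output` (not the actual code in crawl4ai/cli.py) and simulates the Windows default by passing `encoding="cp1252"` explicitly. The sample text includes → and ✓, which cp1252 cannot encode; which characters fail in practice depends on the active code page.

```python
import os
import tempfile


def write_output(path, text, encoding="utf-8"):
    """Hypothetical helper: write text to a file with an explicit encoding."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


# Sample crawled content; the arrow and check mark are not in cp1252.
content = "320 × 240 → résumé ✓"

tmp_dir = tempfile.mkdtemp()
legacy_path = os.path.join(tmp_dir, "legacy.txt")
utf8_path = os.path.join(tmp_dir, "utf8.txt")

# Simulate the old behaviour: a legacy code page rejects some characters.
try:
    write_output(legacy_path, content, encoding="cp1252")
    legacy_failed = False
except UnicodeEncodeError as exc:
    legacy_failed = True
    print(f"legacy write failed: {exc}")

# The fixed behaviour: UTF-8 can encode every Unicode character.
write_output(utf8_path, content)
with open(utf8_path, encoding="utf-8") as f:
    round_tripped = f.read()

print(legacy_failed, round_tripped == content)
```

Passing the encoding explicitly makes the output independent of the machine's locale, so the same file is produced on Windows, Linux, and macOS.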

Fixes #1762

List of files changed and why

  • crawl4ai/cli.py — Added encoding='utf-8' to all 4 open(output_file, 'w') calls

How Has This Been Tested?

Verified that all file write paths now use UTF-8 encoding, which supports the full Unicode range.
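
One way to make this kind of verification repeatable is a small static check that flags text-mode write calls to `open()` that lack an `encoding=` keyword. The sketch below is an illustration only: `find_unencoded_writes` is a hypothetical function, not part of crawl4ai, and it only recognizes direct calls to the built-in name `open` with a literal mode string.

```python
import ast


def find_unencoded_writes(source):
    """Return line numbers of text-mode write open() calls without encoding=."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
        ):
            continue

        # The mode may be the second positional argument or a keyword.
        mode = None
        if len(node.args) >= 2 and isinstance(node.args[1], ast.Constant):
            mode = node.args[1].value
        for kw in node.keywords:
            if kw.arg == "mode" and isinstance(kw.value, ast.Constant):
                mode = kw.value.value

        is_text_write = (
            isinstance(mode, str)
            and any(c in mode for c in "wax+")
            and "b" not in mode
        )
        has_encoding = any(kw.arg == "encoding" for kw in node.keywords)
        if is_text_write and not has_encoding:
            hits.append(node.lineno)
    return sorted(hits)


# Illustrative input: only the first call should be flagged.
sample = '''
with open(output_file, "w") as f:
    f.write(a)
with open(output_file, "w", encoding="utf-8") as f:
    f.write(b)
with open(output_file, "wb") as f:
    f.write(c)
'''

print(find_unencoded_writes(sample))
```

Running the function over the source of crawl4ai/cli.py and expecting an empty list would be one possible regression check; binary-mode writes are skipped because they take no encoding.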

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

On Windows, open() defaults to the system encoding (e.g. cp1252)
which cannot encode all Unicode characters. This causes a
'charmap' codec error when writing crawled content to files.

Explicitly set encoding='utf-8' on all file writes in the CLI.

Fixes unclecode#1762

Development

Successfully merging this pull request may close these issues.

[Bug]: CLI Error charmap