fix(builtins): handle multi-byte UTF-8 in tr expand_char_set() #468

Merged: chaliy merged 2 commits into main from claude/fix-436-Y2nIj on Mar 2, 2026
@chaliy (Contributor) commented on Mar 2, 2026

Summary

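  • expand_char_set() indexed spec.as_bytes() and cast each byte with bytes[i] as char, which split every multi-byte UTF-8 character into several wrong single-byte characters
  • Rewrote it to iterate over chars, collecting the spec into a Vec<char> and indexing that instead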
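
For readers without the diff at hand, here is a minimal sketch of the difference between the two approaches. It is not the merged code: `expand_bytes_buggy` is a hypothetical name that reproduces only the `bytes[i] as char` pattern described above, and this `expand_char_set` is an illustrative reimplementation that assumes the spec supports just literals and simple `a-z` style ranges (no escapes or character classes). Treating a reversed range such as `z-a` as three literals is also an assumption made for the example.

```rust
/// Hypothetical reproduction of the old pattern: every UTF-8 byte is cast
/// to its own `char`, so one multi-byte character becomes several wrong ones.
fn expand_bytes_buggy(spec: &str) -> Vec<char> {
    spec.as_bytes().iter().map(|&b| b as char).collect()
}

/// Illustrative char-based expansion (assumed behavior, not the merged code).
/// Handles literals and forward ranges such as `a-z` or `α-γ`.
fn expand_char_set(spec: &str) -> Vec<char> {
    // Collect once so indexing is per Unicode scalar value, not per byte.
    let chars: Vec<char> = spec.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let is_range =
            i + 2 < chars.len() && chars[i + 1] == '-' && chars[i] <= chars[i + 2];
        if is_range {
            // Inclusive range over chars; skips the surrogate gap automatically.
            out.extend(chars[i]..=chars[i + 2]);
            i += 3;
        } else {
            // Literal character (including a trailing or unmatched '-').
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}
```

With the byte-based version, a spec of `é` (bytes 0xC3 0xA9) expands to the two characters U+00C3 and U+00A9, whereas the char-based version returns the single character `é`, and a range like `α-γ` expands to `α`, `β`, `γ`.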

Test plan

  • Unit test: test_tr_multibyte_utf8
  • Unit test: test_tr_multibyte_utf8_range
  • Unit test: test_cut_multibyte_utf8_chars
  • Unit test: test_expand_char_set_multibyte

Closes #436

Rewrite expand_char_set() to use char-based iteration instead of
byte-based. The old code used spec.as_bytes() and `bytes[i] as char`
which corrupted multi-byte UTF-8 characters.

Closes #436
@chaliy force-pushed the claude/fix-436-Y2nIj branch from e29bcd9 to 9e4ff61 on March 2, 2026 01:42
@chaliy merged commit afc3373 into main on Mar 2, 2026
17 checks passed

Development

Successfully merging this pull request may close these issues.

[M-13] cut/tr builtins silently drop multi-byte UTF-8 characters (TM-UNI-017)

2 participants