Convert UTF16 to Latin1
Latest revision as of 20:16, 12 May 2025
Problem. We received a data file in XML, but it is encoded in UTF16. Cuis is based on Latin1 characters, so, where possible, the data should be converted to that character set before any further processing.
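To make the byte layout concrete, here is a small workspace sketch. It assumes the same `unsignedShortAt:bigEndian:` selector used in the solution is available on ByteArray; the names `sample` and `firstCodeUnit` are made up for this illustration.

```smalltalk
"'Hi' as UTF16 little-endian, preceded by the byte order mark FF FE."
sample := #[255 254 72 0 105 0].
"Each character takes two bytes. For text that fits in Latin1 the high byte is 0."
"Read the first code unit after the BOM (bytes 3 and 4), low byte first."
firstCodeUnit := sample unsignedShortAt: 3 bigEndian: false.
```

Here `firstCodeUnit` is 72, the code point of $H, which Latin1 can represent directly.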
Solution. Provided by Juan on the mailing list on 11-Jun-2022.
"Read the raw bytes of the file."
utf16 := 'expo-test-IT-UTF16.xml' asFileEntry binaryContents.
possibleBOM := utf16 copyFrom: 1 to: 2.
isLittleEndian := true. "used when there is no BOM; use your best guess"
"FF FE marks little-endian data; strip the BOM."
possibleBOM = #[255 254] ifTrue: [
    isLittleEndian := true.
    utf16 := utf16 copyFrom: 3 to: utf16 size ].
"FE FF marks big-endian data; strip the BOM."
possibleBOM = #[254 255] ifTrue: [
    isLittleEndian := false.
    utf16 := utf16 copyFrom: 3 to: utf16 size ].
"Decode one 16-bit code unit at a time. Code points above 255 have no Latin1 equivalent."
String streamContents: [ :out |
    index := 1.
    [index < utf16 size] whileTrue: [
        codePoint := utf16 unsignedShortAt: index bigEndian: isLittleEndian not.
        out nextPut: (Character codePoint: codePoint).
        index := index + 2 ]].
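The loop above passes every code unit straight to `Character codePoint:`. As a hedged variation, not part of Juan's original answer, the sketch below replaces any code point above 255 with `$?`, on the assumption that a lossy placeholder is acceptable for your data. The sample bytes and the variable `latin1` are hypothetical; only selectors already used in the solution are called.

```smalltalk
"Hypothetical sample without a BOM: 'Hi' followed by U+20AC (euro sign), little-endian."
utf16 := #[72 0 105 0 172 32].
isLittleEndian := true.
latin1 := String streamContents: [ :out |
    index := 1.
    [index < utf16 size] whileTrue: [
        codePoint := utf16 unsignedShortAt: index bigEndian: isLittleEndian not.
        "Latin1 covers code points 0 to 255 only; substitute $? for anything else."
        out nextPut: (codePoint <= 255
            ifTrue: [ Character codePoint: codePoint ]
            ifFalse: [ $? ]).
        index := index + 2 ]].
```

With this input the euro sign (code point 8364) falls outside Latin1, so it comes out as a question mark while the first two characters pass through unchanged. Surrogate pairs are not handled specially: each half is above 255 and would become its own `$?`.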