Convert UTF16 to Latin1

From Cuis CookBook

Problem. We have an XML data file, but it is encoded in UTF-16. Cuis is based on Latin1 characters, so, if possible, we should convert the file to that character set before operating on it.

Solution. Provided by Juan on the mailing list on 11-Jun-2022.

"Read the raw bytes of the file."
utf16 := 'expo-test-IT-UTF16.xml' asFileEntry binaryContents.
"The first two bytes may be a byte order mark (BOM)."
possibleBOM := utf16 copyFrom: 1 to: 2.
isLittleEndian := true. "default when there is no BOM; use your best guess"
"FF FE means little-endian. Strip the BOM."
possibleBOM = #[255 254] ifTrue: [
       isLittleEndian := true.
       utf16 := utf16 copyFrom: 3 to: utf16 size ].
"FE FF means big-endian. Strip the BOM."
possibleBOM = #[254 255] ifTrue: [
       isLittleEndian := false.
       utf16 := utf16 copyFrom: 3 to: utf16 size ].
"Decode one 16-bit unit at a time. Surrogate pairs are not handled."
String streamContents: [ :out |
     index := 1.
     [index < utf16 size] whileTrue: [
       codePoint := utf16 unsignedShortAt: index bigEndian: isLittleEndian not.
       out nextPut: (Character codePoint: codePoint).
       index := index + 2 ]].
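
The loop above can be tried without a file by feeding it a literal ByteArray. The following is only a sketch, not part of Juan's solution: the sample bytes are made up for illustration, and replacing code points above 255 with $? is an assumption about what you may want for characters that do not fit in Latin1.

```smalltalk
| utf16 isLittleEndian index codePoint result |
"Hypothetical sample: H, i and the euro sign (U+20AC) in UTF-16 little-endian, no BOM."
utf16 := #[72 0 105 0 172 32].
isLittleEndian := true.
result := String streamContents: [ :out |
     index := 1.
     [index < utf16 size] whileTrue: [
       codePoint := utf16 unsignedShortAt: index bigEndian: isLittleEndian not.
       "Assumption: anything outside Latin1 is replaced by a question mark."
       out nextPut: (codePoint <= 255
            ifTrue: [ Character codePoint: codePoint ]
            ifFalse: [ $? ]).
       index := index + 2 ]].
result
```

Evaluating this with "print it" in a Workspace should show the two Latin1 characters followed by a question mark in place of the euro sign. Whether silently substituting a character is acceptable depends on your data; signalling an error instead may be safer.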