I have to read a UTF-16 encoded file using Node.js (in chunks, because it is very large). The data from the file will go into MongoDB, so I will need to convert it to UTF-8. From googling, it seems that this is simply not supported by Node, and that I will have to convert the raw data from a buffer myself. But I think there ought to be a better way that I'm just not finding. Any suggestions?
Thanks.
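Node supports UCS-2 (the ucs2 encoding, which current Node versions treat as an alias for utf16le, i.e. little-endian UTF-16), the form of UTF-16 that JavaScript strings use. Try using that.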
See this pull request.
Replace the usual utf8 encoding you'd pass when reading a text file with ucs2:
var fs = require('fs');
var fileContents = fs.readFileSync('import.csv', 'ucs2');
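Since the question is about a file too large to read in one go, here is a minimal sketch of the same idea applied chunk by chunk. It assumes the file is little-endian UTF-16; `readUtf16LeInChunks`, `chunkSize` and `onText` are names made up for illustration. `StringDecoder` holds back an incomplete code unit or a lone high surrogate at the end of a chunk, so characters split across chunk boundaries should come out intact.

```javascript
const fs = require('fs');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: reads a UTF-16LE file in fixed-size chunks and
// passes each decoded piece of text to the onText callback.
function readUtf16LeInChunks(filePath, chunkSize, onText) {
  const decoder = new StringDecoder('utf16le');
  const buffer = Buffer.alloc(chunkSize);
  const fd = fs.openSync(filePath, 'r');
  try {
    let bytesRead;
    // position = null means "continue from the current file position".
    while ((bytesRead = fs.readSync(fd, buffer, 0, chunkSize, null)) > 0) {
      // The decoder keeps any trailing partial character for the next call.
      const text = decoder.write(buffer.subarray(0, bytesRead));
      if (text.length > 0) onText(text);
    }
    // Flush whatever the decoder was still holding at end of file.
    const rest = decoder.end();
    if (rest.length > 0) onText(rest);
  } finally {
    fs.closeSync(fd);
  }
}
```

You would call it as `readUtf16LeInChunks('import.csv', 64 * 1024, function (text) { ... })`. The strings handed to the callback are ordinary JavaScript strings, so no manual conversion should be needed before inserting them; if you ever need the UTF-8 bytes explicitly, `Buffer.from(text, 'utf8')` gives them.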
Also, for anyone arriving from Google: if additional � (question mark) characters appear in a parsed file, the likely cause is a UTF-16 file being read as UTF-8. Read the file as UTF-16/UCS-2 and the extra characters will disappear.
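To illustrate the symptom, here is a small sketch that decodes the same bytes both ways. It assumes a little-endian file that starts with a byte order mark; `stripBom` is a made-up helper, not a Node API. Note that decoding as utf16le leaves the BOM in the string as a leading U+FEFF, which you may want to drop yourself.

```javascript
// Bytes of a small UTF-16LE file that starts with a byte order mark (FF FE).
const bytes = Buffer.from('\ufeffa,b', 'utf16le');

// Wrong encoding: the BOM bytes are not valid UTF-8 and turn into U+FFFD,
// and the zero high bytes of each character show up as NUL characters.
const wrong = bytes.toString('utf8');

// Right encoding: the text is intact, with the BOM as a leading U+FEFF.
const right = bytes.toString('utf16le');

// Hypothetical helper: drop a leading BOM if one is present.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

console.log(JSON.stringify(wrong));
console.log(JSON.stringify(stripBom(right)));
```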