node version is 0.6.17.
I was trying to get the number of bytes of a character (SO question), but now I'm testing how to get the REAL number of bytes. By REAL I mean the UTF-8 encoded length: a character with a value greater than 127 has to be encoded with one or more extra bytes (UTF8 wiki).
Please see:
console.log (Buffer.byteLength ("a", "utf8")); //bytes: 1, UNICODE hex: 0x61 (1), REAL hex: 0x61 (1)
console.log (Buffer.byteLength ("¡", "utf8")); //bytes: 2, UNICODE hex: 0xA1 (1), REAL hex: 0xC2A1 (2)
console.log (Buffer.byteLength ("↑", "utf8")); //bytes: 3, UNICODE hex: 0x2191 (2), REAL hex: 0xE28691 (3)
console.log (Buffer.byteLength ("𤁥", "utf8")); //bytes: 3, UNICODE hex: 0x24065 (3), REAL hex: 0xF0A481A5 (4)
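For reference, the REAL byte counts in the comments above follow from the UTF-8 code point ranges. This is a minimal sketch; `utf8LengthOfCodePoint` is a hypothetical helper name used for illustration, not a Node API.

```javascript
// Hypothetical helper (not part of Node): number of bytes UTF-8 needs
// to encode a single Unicode code point.
function utf8LengthOfCodePoint(codePoint) {
  if (codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError("Not a Unicode code point: " + codePoint);
  }
  if (codePoint <= 0x7F) return 1;    // ASCII
  if (codePoint <= 0x7FF) return 2;   // e.g. 0xA1
  if (codePoint <= 0xFFFF) return 3;  // rest of the BMP, e.g. 0x2191
  return 4;                           // above the BMP, e.g. 0x24065
}

console.log(utf8LengthOfCodePoint(0x61));    // 1
console.log(utf8LengthOfCodePoint(0xA1));    // 2
console.log(utf8LengthOfCodePoint(0x2191));  // 3
console.log(utf8LengthOfCodePoint(0x24065)); // 4
```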
Here we have 2 possibilities:
1. Buffer.byteLength() returns the UNICODE number of bytes. Example: the Unicode value for ¡ is 0xA1 (1 byte). If this is true, then the function is bugged because it returns 2 (the REAL length is 2).
2. Buffer.byteLength() returns the REAL number of bytes. Example: the REAL hex value for 𤁥 (U+24065) is 0xF0A481A5 (4 bytes). If this is true, then the function is bugged because it returns 3 (the UNICODE length is 3).

What do you think? Is the function bugged?
Solved:
https://github.com/joyent/node/issues/3262#issuecomment-5677385
Node.js 0.6 only supports the BMP character set (0x0000 - 0xFFFF). Versions 0.7 and above support characters greater than 0xFFFF (not tested).
The function returns the REAL length, so examples 1, 2, 3 are correct and 4 is incorrect.
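If the REAL length is needed on a version that miscounts characters above 0xFFFF, it can be computed by hand. The following is only a sketch under my own assumptions: `utf8ByteLength` is a hypothetical name, and it counts an unpaired surrogate as 3 bytes, which is a choice I made rather than something the linked issue specifies.

```javascript
// Hypothetical helper (not part of Node): counts the UTF-8 bytes of a
// JavaScript string, treating a valid surrogate pair as one code point.
function utf8ByteLength(str) {
  var bytes = 0;
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        bytes += 4; // valid pair: one code point above 0xFFFF
        i++;        // skip the low surrogate
      } else {
        bytes += 3; // assumption: unpaired high surrogate counts as 3 bytes
      }
    } else {
      bytes += 3;   // rest of the BMP, including other unpaired surrogates
    }
  }
  return bytes;
}

console.log(utf8ByteLength("a"));            // 1
console.log(utf8ByteLength("\u00A1"));       // 2
console.log(utf8ByteLength("\u2191"));       // 3
console.log(utf8ByteLength("\uD850\uDC65")); // 4 (U+24065 as a surrogate pair)
```

The example uses `\u` escapes so that it does not depend on the encoding of the source file.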