In a loop, I am reading a UTF-8 encoded stream, say 10 bytes per iteration. Because the stream is passed to a buffer first, I must specify the read length in bytes before converting it to a UTF-8 string. The problem is that a read sometimes ends partway through a multi-byte character, leaving it incomplete. I need to fix this.
Is there a way to detect whether a string ends with an incomplete character, or some check I can perform on the last character of my string to determine this?
Ideally, the solution should not be tied to a single encoding.
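For UTF-8 specifically, the "check on the last character" can be done on the raw bytes before decoding: a lead byte announces how long its sequence is (`110xxxxx` = 2 bytes, `1110xxxx` = 3, `11110xxx` = 4), and continuation bytes look like `10xxxxxx`. The sketch below is UTF-8 only, so it does not meet the "not tied to a single encoding" preference; the function name `incompleteUtf8Tail` is hypothetical.

```javascript
// Hypothetical helper: returns how many bytes at the end of `buf` belong to an
// unfinished UTF-8 sequence (0 if the buffer ends on a character boundary).
function incompleteUtf8Tail(buf) {
  // A sequence is at most 4 bytes long, so only the last 3 bytes can be the
  // start of an unfinished one.
  for (let i = 1; i <= Math.min(3, buf.length); i++) {
    const byte = buf[buf.length - i];
    if ((byte & 0xc0) === 0x80) continue; // continuation byte: keep looking back

    // Found an ASCII or lead byte: how long should its sequence be?
    let needed = 1;
    if ((byte & 0xe0) === 0xc0) needed = 2;
    else if ((byte & 0xf0) === 0xe0) needed = 3;
    else if ((byte & 0xf8) === 0xf0) needed = 4;

    // Fewer bytes present than the lead byte announced: the tail is incomplete.
    return needed > i ? i : 0;
  }
  // Empty buffer, or three trailing continuation bytes (a complete 4-byte
  // sequence or invalid data): nothing is waiting for more bytes.
  return 0;
}

const chunk = Buffer.from('a€'); // 61 E2 82 AC
console.log(incompleteUtf8Tail(chunk.subarray(0, 3))); // 2
```

In the reading loop, a non-zero result tells you how many trailing bytes to hold back and prepend to the next read.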
If a buffer ends with an incomplete character and you convert it into a string and then initialize a new buffer from that string, the new buffer will be a different length (longer if you're using utf8, shorter if you're using ucs2) than the original.
Something like:
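```javascript
// `buf` is the Buffer holding the bytes read so far.
// Buffer.from() replaces the deprecated `new Buffer()` constructor.
var b2 = Buffer.from(buf.toString('utf8'), 'utf8');
if (b2.length !== buf.length) {
  // buffer has an incomplete character
} else {
  // buffer is OK
}
```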
Substitute your desired encoding for 'utf8'.
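A minimal sketch of the same round-trip check as a reusable function that takes the encoding as a parameter and includes the try-catch recommended in the note below. The name `endsWithIncompleteChar` is hypothetical, and treating a thrown error as "incomplete" is an assumption about how a future implementation might behave.

```javascript
// Hypothetical helper wrapping the round-trip length comparison.
function endsWithIncompleteChar(buf, encoding) {
  try {
    // Decode, re-encode, and compare lengths: an incomplete trailing
    // character changes the byte count (utf8 inserts a replacement
    // character, ucs2 drops the odd trailing byte).
    const roundTrip = Buffer.from(buf.toString(encoding), encoding);
    return roundTrip.length !== buf.length;
  } catch (err) {
    // Assumption: if toString ever throws on incomplete input, report it as incomplete.
    return true;
  }
}

const euro = Buffer.from('a€', 'utf8');
console.log(endsWithIncompleteChar(euro, 'utf8'));                 // false
console.log(endsWithIncompleteChar(euro.subarray(0, 3), 'utf8'));  // true
```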
Note that this depends on how the current implementation of Buffer#toString handles incomplete characters, which isn't documented. It's unlikely to change in a way that would produce equal-length buffers, but a future implementation might throw an error instead, so you should probably wrap the code in a try-catch block.
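A different technique, which avoids depending on Buffer#toString's handling of incomplete input, is Node's built-in `string_decoder` module: `StringDecoder#write()` holds back any incomplete multi-byte character at the end of a chunk and prepends it to the next write, so each returned string contains only whole characters, for whichever supported encoding you pass. The wrapper name `decodeChunks` is hypothetical.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical wrapper: decodes a sequence of byte chunks whose boundaries
// may split characters.
function decodeChunks(chunks, encoding) {
  const decoder = new StringDecoder(encoding);
  let text = '';
  for (const chunk of chunks) {
    // Returns only complete characters; leftover bytes are kept internally.
    text += decoder.write(chunk);
  }
  text += decoder.end(); // flush any bytes still held back
  return text;
}

// Simulate fixed-size reads that split multi-byte characters.
const bytes = Buffer.from('a€b😀', 'utf8');
const chunks = [];
for (let i = 0; i < bytes.length; i += 2) chunks.push(bytes.subarray(i, i + 2));
console.log(decodeChunks(chunks, 'utf8')); // a€b😀
```

In the question's loop, you would keep one decoder for the whole stream and call `decoder.write(buf)` on each 10-byte buffer. If the source is a Node readable stream, `readable.setEncoding('utf8')` performs this multi-byte-safe decoding for you.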