Detect partial or incomplete characters read from a buffer

Question

I am reading a UTF-8 encoded stream in a loop, a fixed number of bytes (say 10) per iteration. Because the data is read into a buffer first, I have to specify the read length in bytes before converting the buffer to a UTF-8 string. The problem is that a read sometimes ends in the middle of a multi-byte character, so the resulting string ends with a partial, incomplete character. I need to fix this.

Is there a way to detect if a string ends with an incomplete character, or some check that I can perform on the last character of my string to determine this?

Preferably, the solution should not be tied to a single encoding.

node.js
character-encoding
buffer

Answer 1

If a buffer ends with an incomplete character and you convert it to a string and then create a new buffer from that string, the new buffer will have a different length than the original (longer if you're using utf8, shorter if you're using ucs2).

Something like:

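// buf is the chunk that was just read.
// Round-trip it through a string and compare the byte lengths.
var b2 = Buffer.from(buf.toString('utf8'), 'utf8');
if (b2.length !== buf.length) {
   // buffer has an incomplete character
} else {
   // buffer is OK
}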

Substitute your desired encoding for 'utf8'.
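
As a rough, self-contained sketch of the same round-trip check, the comparison can be wrapped in a small function that takes the encoding as a parameter. The name looksIncomplete is hypothetical, and the try/catch is only a defensive assumption in case decoding ever throws on malformed input.

```javascript
// Hypothetical helper (the name is illustrative, not a Node.js API):
// returns true when decoding `buf` and re-encoding the result changes
// the byte length, which suggests a broken or truncated character.
function looksIncomplete(buf, encoding) {
  try {
    const roundTrip = Buffer.from(buf.toString(encoding), encoding);
    return roundTrip.length !== buf.length;
  } catch (err) {
    // Assumption: if a decoder ever throws on malformed input,
    // treat the buffer as incomplete as well.
    return true;
  }
}

const euro = Buffer.from('€', 'utf8'); // 3 bytes: E2 82 AC
console.log(looksIncomplete(euro, 'utf8'));                // whole character
console.log(looksIncomplete(euro.subarray(0, 2), 'utf8')); // cut after 2 of 3 bytes

const ab = Buffer.from('ab', 'ucs2'); // 4 bytes, 2 per character
console.log(looksIncomplete(ab.subarray(0, 3), 'ucs2'));   // odd length, cut mid-character
```

Two caveats on this sketch. It flags any malformed sequence in the buffer, not only one at the end. And if the decoder substitutes a single U+FFFD (three bytes in UTF-8) for a truncated sequence, a 4-byte character cut after its third byte may round-trip to the same length and slip through; that case is not asserted here, so test it on your Node.js version.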

Note that this depends on how the current implementation of Buffer#toString handles incomplete characters, which isn't documented. It's unlikely to change in a way that would produce equal-length buffers, but a future implementation might throw an error instead, so you should probably wrap the code in a try-catch block.
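
If the goal is simply to avoid broken characters when reading fixed-size chunks, a documented alternative (not part of the answer above) is Node's built-in string_decoder module. Its StringDecoder holds back the bytes of an incomplete trailing character and prepends them to the next chunk. A minimal sketch, where decodeChunks and the sample data are made up for illustration:

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: decodes a list of byte chunks into one string.
// The decoder keeps any incomplete trailing character in its internal
// buffer until the following chunk supplies the remaining bytes.
function decodeChunks(chunks, encoding = 'utf8') {
  const decoder = new StringDecoder(encoding);
  let text = '';
  for (const chunk of chunks) {
    text += decoder.write(chunk);
  }
  // Flush whatever is still buffered once the input is exhausted.
  text += decoder.end();
  return text;
}

// 'a€b' is 61 E2 82 AC 62 in UTF-8; split it in the middle of '€'.
const bytes = Buffer.from('a€b', 'utf8');
const chunks = [bytes.subarray(0, 2), bytes.subarray(2)];
console.log(decodeChunks(chunks));
```

In a read loop you would create the decoder once and call decoder.write(chunk) for each chunk read, instead of calling chunk.toString() on every chunk. StringDecoder only understands the encodings Node supports (for example utf8 and utf16le), so it is not fully encoding-agnostic.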