API reference: http://bespin.cz/~ondras/html/index.html
I need to count the number of chars in a line:
a b c d
e f g h
If I know that all the chars are ASCII values then I can do:
Local<String> str = ...
String::AsciiValue s (str->ToString ());
unsigned char c;
for (int i=0; (c = (*s)[i]) != 0; i++){
    //...
}
But the string can contain characters encoded with more than 1 byte:
↓ ↓ a b
↓ a b c
I cannot convert the string to a char* because in this case ↓ is encoded in 3 bytes, so a loop that treats every byte as one char will count 3 chars instead of 1.
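To make the mismatch concrete, here is a minimal standalone sketch in plain C++ (no V8 types; `sampleLine` and `naiveCharCount` are names made up for illustration). It assumes the first sample line is stored as UTF-8 and shows what a byte-wise count reports for it.

```cpp
#include <cstddef>
#include <string>

// The first sample line, "↓ ↓ a b", written out as UTF-8 bytes.
// ↓ is U+2193, which UTF-8 encodes as the three bytes E2 86 93.
std::string sampleLine() {
    return "\xE2\x86\x93 \xE2\x86\x93 a b";
}

// Hypothetical helper: counts one "char" per byte, as the ASCII loop does.
std::size_t naiveCharCount(const std::string& bytes) {
    return bytes.size();
}
```

For the sample line this returns 11, although the line contains only 7 characters (two arrows, three spaces, `a` and `b`).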
So I need to get the substring. In JavaScript it is pretty simple:
var s = "↓ ↓ a b";
var c;
for (var i=0; i<s.length; i++){
    c = s.substring (i, i + 1);
    //or c = s[i];
}
I need to do the same in C++.
Local<String> str = ...
for (int i=0; i<str->Length (); i++){
    //???
    //Another alternative is to get the String of each position, something like this:
    //Local<String> s = str->Get (i);
}
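For comparison, the JavaScript loop above walks the string one UTF-16 code unit at a time. A rough standalone analogue in plain C++ could look like the sketch below. It uses `std::u16string` instead of any V8 type, and `splitCodeUnits` is a hypothetical helper name, so treat it as an illustration of the intended behaviour rather than a V8 solution.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a UTF-16 string into substrings of one code unit each,
// similar to calling s.substring(i, i + 1) for every i in JavaScript.
// Characters outside the Basic Multilingual Plane occupy two code units
// (a surrogate pair) and would be split in two, as they are in JavaScript.
std::vector<std::u16string> splitCodeUnits(const std::u16string& s) {
    std::vector<std::u16string> parts;
    parts.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        parts.push_back(s.substr(i, 1));
    }
    return parts;
}
```

With the sample line written as `u"\u2193 \u2193 a b"`, the result has 7 elements, one per visible character, because ↓ fits in a single UTF-16 code unit.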
Assuming that you are using this implementation of String::AsciiValue, there appears to be a length() method that returns the length of the string.
Solved.
UTF-8 code points: https://en.wikipedia.org/wiki/UTF-8
The basic idea is to mask the first byte of each multibyte char and check how many of the following bytes must be skipped to read the whole char.
unsigned char masks[5] = { 192, 224, 240, 248, 252 };
Local<String> str = ...
String::Utf8Value s (str->ToString ());
unsigned char c;
int utf8Bytes = 0;
for (int i=0; (c = (*s)[i]) != 0; i++){
    //Ignore utf8 check for one byte chars
    if (c > 127){
        //Skip the remaining bytes of the current multibyte char
        if (utf8Bytes){
            utf8Bytes--;
            continue;
        }
        //Check whether it is the first byte of a utf8 multibyte char:
        //the index of the longest matching mask plus one is the number of bytes that follow
        for (int j=4; j>=0; j--){
            if ((c & masks[j]) == masks[j]){
                utf8Bytes = j + 1;
                break;
            }
        }
        if (utf8Bytes){
            //Do something if it's a multibyte char
        }
        continue;
    }
    //Do something to check lines, chars, etc
}
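The same masking idea can also be written as a small standalone helper without any V8 types. The sketch below is only an illustration: `utf8SequenceLength` and `countCharsPerLine` are hypothetical names, the input is assumed to be well-formed UTF-8 in a `std::string`, and lead bytes are limited to sequences of at most 4 bytes (the limit set by RFC 3629), so the 5 and 6 byte masks from the table above are not used.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the total length in bytes of the UTF-8 sequence that starts with
// `lead`, or 0 if `lead` is a continuation byte or not a valid lead byte.
int utf8SequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: one-byte (ASCII) char
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // 10xxxxxx or invalid lead byte
}

// Counts the chars (code points) on each line, splitting on '\n'.
// A final line without a trailing newline is included if it is not empty.
std::vector<std::size_t> countCharsPerLine(const std::string& utf8) {
    std::vector<std::size_t> counts;
    std::size_t current = 0;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c == '\n') {
            counts.push_back(current);
            current = 0;
            ++i;
            continue;
        }
        int len = utf8SequenceLength(c);
        if (len == 0) len = 1;  // malformed byte: count it as one char and move on
        ++current;
        i += static_cast<std::size_t>(len);
    }
    if (current > 0) counts.push_back(current);
    return counts;
}
```

Jumping ahead by the sequence length replaces the `utf8Bytes` countdown: each multibyte char is counted once at its first byte, and its continuation bytes are never visited.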