Message130

Author stephen
Recipients
Date 2008-01-19.04:58:07
Content
Michael Sperber writes:

 > Could you give a hint about detecting UTF-8?  (I know what UTF-8 looks
 > like, but not enough about the other coding systems to be able to say
 > what distinguishes them.)

There are a lot of coding systems.  But basically if you have as many
as 3 non-ASCII characters, the chance that natural language text in
some other coding system "looks like" UTF-8 is vanishingly small.
Except at the beginning and end of the string, a single byte >= 0xC0
gives you information about *at least* three other bytes: the
preceding one may *not* be >= 0xC0, the following N bytes (N is 1 to
3, as determined by the lead byte) must be in the range 0x80 to 0xBF,
and the next one after that must *not* be in that range, since it has
to be either ASCII or the lead byte of the next character.
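
To make the byte rules above concrete, here is a rough sketch in
Python of a purely structural check.  It is only an illustration, not
what the 'undecided' coding system actually does; the names
continuation_count and looks_like_utf8, and the default threshold of
3 multibyte characters, are my own assumptions.  It also narrows
"lead byte" to 0xC2..0xF4, because 0xC0, 0xC1 and 0xF5 and above never
occur in valid UTF-8, and it does not try to reject overlong forms or
surrogates.

```python
def continuation_count(lead: int) -> int:
    """Return how many continuation bytes a lead byte announces, or -1."""
    if 0xC2 <= lead <= 0xDF:
        return 1
    if 0xE0 <= lead <= 0xEF:
        return 2
    if 0xF0 <= lead <= 0xF4:
        return 3
    # Continuation bytes (0x80..0xBF) and the bytes 0xC0, 0xC1,
    # 0xF5..0xFF cannot start a character.
    return -1


def looks_like_utf8(data: bytes, min_non_ascii: int = 3) -> bool:
    """Structural check only: every lead byte is followed by exactly the
    announced number of bytes in 0x80..0xBF, with no stray bytes, and at
    least min_non_ascii multibyte characters are present."""
    i = 0
    multibyte_chars = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:  # plain ASCII carries no evidence either way
            i += 1
            continue
        n = continuation_count(byte)
        if n < 0:
            return False
        tail = data[i + 1 : i + 1 + n]
        if len(tail) != n or any(not 0x80 <= b <= 0xBF for b in tail):
            return False
        multibyte_chars += 1
        i += 1 + n
    return multibyte_chars >= min_non_ascii


if __name__ == "__main__":
    text = "naïve café déjà"
    print(looks_like_utf8(text.encode("utf-8")))
    print(looks_like_utf8(text.encode("latin-1")))
```

Note that this sketch rejects a sequence that is cut off at the end of
the input, so a caller that sees text in chunks would have to hold back
a trailing partial character before testing.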

However, this should all already be part of the 'undecided' coding
system.  If it's not working, there's probably something tricky going
on with process buffers.

_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta@xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta
History
Date User Action Args
2008-01-19 04:58:07 stephen link issue104 messages
2008-01-19 04:58:07 stephen create